In recent years, the robotics community has pushed robots to operate outside of factories. To operate in such unstructured scenarios, robots need compliant elements embedded in their design and controllers capable of preserving the body's elasticity. Feedforward control actions can preserve this elasticity, but their effectiveness relies on the accuracy of the model. To avoid the cost and time associated with accurate parameter identification, data-driven paradigms, e.g., learning-based control, are often employed. Iterative Learning Control (ILC) is a data-driven feedforward approach with theoretical guarantees on tracking performance; however, it requires a new learning phase for every new trajectory. Reinforcement Learning (RL) approaches, on the other hand, generate robust policies that address this generalisation issue, but at the cost of long training times and high computational effort; moreover, the resulting policy remains dependent on the system modelled in simulation. This thesis proposes a new learning framework, Reinforcement Iterative Learning Control (RILC), which embeds ILC as an inner learning loop within RL, preserving ILC's properties while exploiting the high-quality samples it generates to improve the robot's performance. The controller's effectiveness is tested in experiments on a 2-degree-of-freedom serial elastic robot with varying trajectories and model mismatches.