Reinforcement Learning

Aishwarya
3 min read · Oct 1, 2024


Everything you need to know to get started with Reinforcement Learning

Reinforcement Learning (RL) is a framework for solving control tasks or decision tasks, where agents interact with an environment, learning through trial and error by receiving rewards as feedback.

Figure 1: The Basic Reinforcement Learning Model. In this model, the agent (Mario) interacts with the environment, represented as the Mario game. At a given state Sₜ, Mario selects an action Aₜ from the action space A. After taking the action, the environment responds by providing a reward Rₜ₊₁ and transitioning to the next state Sₜ₊₁.
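The loop in Figure 1 can be sketched in a few lines of Python. The environment below is a hypothetical toy (a short corridor the agent must walk to the end), not Mario, and the agent simply acts at random — the point is only the Sₜ → Aₜ → Rₜ₊₁, Sₜ₊₁ cycle:

```python
import random

class ToyEnv:
    """A hypothetical 1-D corridor: the agent starts at position 0
    and earns a reward of +1 when it reaches position 3."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (step left) or +1 (step right)
        self.state = max(0, self.state + action)
        done = self.state == 3
        reward = 1.0 if done else 0.0
        return self.state, reward, done  # S_{t+1}, R_{t+1}, terminal flag

random.seed(0)
env = ToyEnv()
state, done, total_reward = 0, False, 0.0
while not done:
    action = random.choice([-1, 1])         # A_t drawn from the action space
    state, reward, done = env.step(action)  # environment returns R_{t+1}, S_{t+1}
    total_reward += reward
print(total_reward)  # 1.0 once the agent stumbles to the end
```

Real RL libraries (e.g. Gymnasium) expose essentially this same `reset`/`step` interface; a learning agent would replace `random.choice` with a policy that improves from the rewards it collects.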

This mirrors how humans learn — many of our actions result from repeated trial and error until they become habits, like eating with a spoon and knowing where our mouth is. Similarly, in new environments, we adapt and learn through interaction.

Figure 2. In a meme analogy, the agent (the baby) interacts with the environment (eating the lemon) and learns from the reward (the sour taste).

The central idea in Reinforcement Learning is to maximize the cumulative reward.

Reinforcement Learning differs from other learning paradigms. In supervised learning, we give the model instructive feedback that specifies the correct action. In reinforcement learning, we give the model evaluative feedback: we assess how good or bad the chosen action was, not whether it was the correct one.

Training an Agent

There are two primary ways to train an agent in reinforcement learning:

1. Policy-Based Methods

In this approach, we train the agent directly by learning a policy that specifies which action to take in a given state. Policies come in two types:

  • Deterministic: Returns a single action for each state.
  • Stochastic: Returns a probability distribution over the action space, from which the action is sampled.
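The distinction between the two policy types can be illustrated with a small sketch. Both policies, their states, and the action names below are hypothetical examples, not part of any particular algorithm:

```python
import random

# Deterministic policy: maps each state to exactly one action.
def deterministic_policy(state):
    return "right" if state < 3 else "left"

# Stochastic policy: maps a state to a probability distribution over
# the action space, then samples an action from that distribution.
# (Here the distribution is fixed; in practice it depends on the state.)
def stochastic_policy(state, rng=random):
    probs = {"left": 0.2, "right": 0.8}
    actions, weights = zip(*probs.items())
    return rng.choices(actions, weights=weights)[0]

print(deterministic_policy(1))  # always "right" for states below 3
print(stochastic_policy(1))     # "right" 80% of the time, "left" 20%
```

Stochastic policies are useful because they keep the agent exploring: even a well-trained agent occasionally tries the lower-probability action and may discover a better outcome.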

2. Value-Based Methods

In this approach, we train the agent indirectly by learning the value of states (or of actions taken in states). The agent then derives a policy from these values, for example by always choosing the highest-valued action, and refines the values through trial and error. The value function estimates the expected cumulative reward of starting from state s (or of taking action a in state s) and following the policy thereafter. This method can be further classified into two types:

  • State-Value: Estimates the expected return of being in a given state and following the policy from there.
  • Action-Value: Estimates the expected return of taking a given action in a given state and following the policy afterwards.
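A minimal sketch of the two value functions, using made-up numbers for a hypothetical two-state environment (the tables are illustrative, not learned):

```python
# State-value V(s): expected return from state s under the policy.
V = {0: 0.5, 1: 0.8}

# Action-value Q(s, a): expected return from taking action a in state s
# and following the policy afterwards.
Q = {
    (0, "left"): 0.1, (0, "right"): 0.6,
    (1, "left"): 0.4, (1, "right"): 0.9,
}

# A value-based agent acts greedily with respect to Q: in each state it
# picks the action with the highest estimated value.
def greedy_action(state, actions=("left", "right")):
    return max(actions, key=lambda a: Q[(state, a)])

print(greedy_action(0))  # "right", since Q[(0, "right")] > Q[(0, "left")]
```

This is the idea behind methods like Q-learning: instead of representing the policy directly, the agent maintains a table (or neural network) of action values and reads the policy off it greedily.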



Written by Aishwarya

Data Science Practitioner | Machine Learning Enthusiast
