Sampath Balivada

Reinforcement Learning

Published on Sep 16, 2020

Why Reinforcement Learning?

Reinforcement Learning (RL) gives us a way to build adaptable, intelligent machines. Adaptability is one of the things that makes humans more reliable than a computer: given a problem, humans can look around and figure things out with the resources at hand. This is highly unstructured behaviour and generally involves past experience combined with instinct.

What is RL?

"Reinforcement Learning is essentially a mathematical formalisation of a decision-making problem."

RL has its roots in the study of how brains make decisions and gradually branched into computer science. Some people see it as a possible path towards what is known as the singularity.
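To make the "mathematical formalisation" in the quote above a little more concrete, here is the objective as it is commonly written in textbooks. The symbols are not from this post and are assumptions for illustration: a policy $\pi$ (the agent's strategy), a reward $r_t$ received at step $t$, and a discount factor $\gamma$ that weighs near-term rewards more heavily than distant ones.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad 0 \le \gamma < 1
```

In words: among all possible strategies, look for the one that collects the most reward on average over time.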

Did I encounter RL before?

AlphaGo, a well-known computer program designed by Google's DeepMind in a team led by David Silver, winner of the 2019 ACM Prize in Computing, is probably the first example of RL that I explicitly heard about.

For those unfamiliar with it, AlphaGo was the first computer program to defeat a professional human Go player, the first to defeat a Go world champion, and is arguably the strongest Go player in history. You can check the wiki for more info on AlphaGo, but we'll now move on to the next question.

How does RL work then?

[Figure: RL-1-1.png]

The above picture gives us an abstract understanding of how RL systems work. It is basically how a child, or any human, learns. The learner performs an action (or a set of actions), and the environment responds with a change to its own state and a suitable reward for the action performed.

For instance, a child may kick a ball, and in response the ball moves. This is a change within the environment that the child can observe. In addition, if the ball moves in the direction the child intended, that counts as a positive reward; otherwise the child receives a negative reward. An RL system works in a similar fashion.
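The ball example can be sketched as a tiny interaction loop. This is only a minimal sketch under my own assumptions: `BallEnvironment`, `run_episode` and `random_policy` are hypothetical names made up for illustration, the ball lives on a simple number line, and the agent here does not learn anything yet. It only shows the act, observe, get-rewarded cycle.

```python
import random


class BallEnvironment:
    """Hypothetical toy environment: a ball on a line, kicked left or right."""

    def __init__(self, target=3):
        self.target = target   # where the child wants the ball to end up
        self.position = 0      # current ball position (the observable state)

    def step(self, action):
        """Apply an action (-1 = kick left, +1 = kick right).

        Returns the new state and the reward for that action.
        """
        old_distance = abs(self.target - self.position)
        self.position += action
        new_distance = abs(self.target - self.position)
        # Positive reward if the ball moved towards the target, negative otherwise.
        reward = 1 if new_distance < old_distance else -1
        return self.position, reward


def run_episode(env, policy, steps=10):
    """The interaction loop: pick an action, observe the new state and reward."""
    state = env.position
    total_reward = 0
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(state):
    """An untrained agent: kicks in a random direction."""
    return random.choice([-1, 1])


if __name__ == "__main__":
    random.seed(0)
    print("total reward:", run_episode(BallEnvironment(), random_policy))
```

A learning agent would replace `random_policy` with something that uses the rewards it has seen to choose better kicks over time.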

Where do the rewards for the agent come from?

It is easy to determine a reward for a video game, because video games are designed around reward systems such as scores. But it can get quite difficult when we venture into the real world, where defining the reward can sometimes be harder than the RL problem itself. For instance, a food-making robot needs a human to taste the food and judge whether it is good.
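The contrast between the two cases can be sketched in a few lines. Both functions are hypothetical and the names are mine: `game_reward` assumes the game exposes a numeric score, and `cooking_reward` assumes a human taster gives a rating on a made-up 0 to 5 scale.

```python
def game_reward(previous_score, current_score):
    """In a video game, the reward can simply be the change in score."""
    return current_score - previous_score


def cooking_reward(human_rating, max_rating=5):
    """A cooking robot has no built-in score, so a human must supply one.

    The rating (assumed to be on a 0..max_rating scale) is rescaled
    to the range [-1, 1] so that bad food gives a negative reward.
    """
    if not 0 <= human_rating <= max_rating:
        raise ValueError("rating out of range")
    return 2 * human_rating / max_rating - 1
```

The first function is computed automatically at every step, while the second one needs a person in the loop for every single dish, which is what makes real-world rewards expensive.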

So, what is Deep RL?

In normal reinforcement learning, we actually perform a lot of manual work to build a reinforcement learning agent (the agent in RL is the component that decides what action to take). This typically includes manually extracting features and designing policies (a policy is an agent's strategy) with a significant amount of trial and error.

But in Deep RL, we utilize Deep Neural Networks to extract features and design policies optimized for the goal at hand. This goal can be winning a game, aiming for the highest score or anything else.

This helps in extracting better features, and automating the process makes it much easier than before to apply RL to multiple domains.

Deep Models are what allow reinforcement learning algorithms to solve complex problems end to end.
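To give a rough picture of what "a neural network as the policy" means, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `NeuralPolicy` and its helpers are hypothetical names, the network has a single hidden layer, and the weights are random and untrained. It only shows the shape of the idea: raw observation in, probabilities over actions out, with no hand-written features in between.

```python
import math
import random


def make_layer(n_inputs, n_outputs, rng):
    """Create a weight matrix and a bias vector with small random values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def dense(inputs, layer):
    """Fully connected layer: weighted sum of the inputs plus a bias."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class NeuralPolicy:
    """Hypothetical policy: maps a raw observation to action probabilities."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.hidden = make_layer(n_inputs, n_hidden, rng)
        self.output = make_layer(n_hidden, n_actions, rng)

    def action_probabilities(self, observation):
        # The hidden layer plays the role of the feature extractor.
        # Training (not shown here) would adjust its weights.
        features = relu(dense(observation, self.hidden))
        return softmax(dense(features, self.output))


policy = NeuralPolicy(n_inputs=4, n_hidden=8, n_actions=3)
probs = policy.action_probabilities([0.1, -0.2, 0.7, 0.0])
print(probs)
```

In a real Deep RL system the weights would be updated from rewards, so that actions leading to higher reward become more probable.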

And hear me out...

The Reinforcement Learning problem is the AI problem!

Yeah, in theory it is possible, and kind of obvious, that all AI problems can be cast as reinforcement learning problems.

For instance, we can build an image classifier with RL by giving a positive reward when the agent gets the image label right and a negative reward when the agent is wrong.

In practice this can be a lot to build, and there are simpler ways to build image classifiers, but in principle any AI problem can be approached using RL/Deep RL.
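The image classifier framing above can be sketched as a very small trial-and-error learner. This is a toy under my own assumptions, not how real classifiers are built: the "images" are just string IDs, `train_classifier_with_rewards` is a hypothetical name, and the agent keeps a simple table of average rewards per image and label instead of a neural network, so it cannot generalise to unseen images.

```python
import random


def train_classifier_with_rewards(examples, labels, episodes=2000,
                                  epsilon=0.1, seed=0):
    """Learn labels purely from +1 / -1 rewards (epsilon-greedy guessing)."""
    rng = random.Random(seed)
    # value[x][label] = average reward seen so far for guessing `label` on x
    value = {x: {label: 0.0 for label in labels} for x, _ in examples}
    count = {x: {label: 0 for label in labels} for x, _ in examples}

    for _ in range(episodes):
        x, true_label = rng.choice(examples)
        if rng.random() < epsilon:
            guess = rng.choice(labels)                       # explore
        else:
            guess = max(labels, key=lambda l: value[x][l])   # exploit
        # The agent never sees the true label, only the reward.
        reward = 1 if guess == true_label else -1
        count[x][guess] += 1
        value[x][guess] += (reward - value[x][guess]) / count[x][guess]

    # Final answer: the label with the best average reward for each input.
    return {x: max(labels, key=lambda l: value[x][l]) for x in value}


# Hypothetical stand-ins for images and their correct labels.
examples = [("img_cat_1", "cat"), ("img_dog_1", "dog"), ("img_cat_2", "cat")]
learned = train_classifier_with_rewards(examples, ["cat", "dog"])
print(learned)
```

Compared with ordinary supervised learning, the agent here has to discover the right label by guessing, which is part of why this route is usually more work than it is worth for classification.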

So, what kind of problems are ideal to be solved using RL?

  • Using physical tools with a robot.
  • Playing games. Usually video games, but also physical games if the agent is given the required mechanical ability.
  • Possibly controlling traffic to prevent traffic jams.

Why Learn RL Now?

  1. Advances in Deep Learning.
  2. Advances in Reinforcement Learning.
  3. Improved computational capabilities.

Other forms of acquiring knowledge.

  • Learning from demonstrations.
  • Learning from observing the world.
  • Learning from other tasks. (Transfer Learning/Meta Learning)

What can Deep RL do now?

  • Acquire high proficiency in domains governed by simple and known rules.
  • Learn simple skills with raw inputs, given enough experience.
  • Learn by imitating human-provided expert behaviour.

Do we have any challenges? Yeah...

  • Humans are quick learners, while RL agents are not, and agents need a lot of computing power.
  • Humans use past experiences efficiently, but an RL agent needs explicit guidance in using past experiences.
  • The reward function is often unclear or quite complex.
  • The role of prediction is not quite clear.

Let's end it here. This is what I know after day 1 of learning reinforcement learning. I thank all the researchers, engineers and every person who contributed to today's learning.

References

rail.eecs.berkeley.edu/deeprlcourse

ai.stackexchange.com/questions/8476/what-do..

stackoverflow.com/questions/46260775/what-i..

slideslive.com/38923170/mixed-autonomy-traf..

en.wikipedia.org/wiki/AlphaGo

 