Understanding Deep Reinforcement Learning

Welcome to the Bulletin by Remix Robotics, where we share a summary of the week's need-to-know robotics and automation news.

In today's email -

  • Understanding Machine learning
  • Tesla’s a potential child killer?
  • Get your eyelashes on point
  • How to democratise AI

Snippets

Slow Down, Dummy! - Tesla issued a cease and desist after The Dawn Project published videos showing a Tesla Model 3 hitting child dummies in Full Self-Driving Beta mode. Unfazed, The Dawn Project justified the videos as trying “to stop Tesla from putting Full Self-Driving cars that will mow down a child wearing a safety vest in a school crosswalk into the hands of 100,000 untrained consumers.” Looks like Elon Musk and The Dawn Project’s billionaire founder Dan O’Dowd may be battling this one out in court.

Robots Up, Rising - A record number of robots have been purchased in the first half of 2022, as North American companies have struggled to keep factories and warehouses going amidst Covid-19 related challenges. Companies ordered a record 12,305 machines in the second quarter valued at $585 million, 25% more units than during the same period a year ago. With a tight labour market and falling productivity, the industry has turned to robots to keep production stable and consistent going forward.

I guess there’s a robot for everything now - Luum, an eyelash extension start-up, closed a $700k crowdfunding round. The company uses robotics, artificial intelligence and computer vision to apply lashes and plans to move from its prototype into salons. Truly eye-opening.

Breakthrough in Quantum AI - Contrary to previously held assumptions, a new proof has discovered that training quantum neural networks only requires a small amount of data. The need for large data sets could have been a roadblock to quantum AI, but this breakthrough has removed that issue. Potential direct applications include more efficient compiling for quantum computers and distinguishing phases of matter for materials discovery.

Programming in Sea ++ - The OceanOneK robot has been developed to dive to depths that would kill a human diver, meaning that the robot is able to retrieve delicate objects — without breaking them — at breath-taking depths. Utilising a haptic system, the OceanOneK avatar is controlled by humans to complete missions that were previously impossible.

It’s coming home - Lots of robot football news. DeepMind has demonstrated a multi-agent football simulation and researchers from UC Berkeley have succeeded in training quadrupedal robots to control soccer balls. Watch out Ronaldo, the robots are coming!

The First Robot CEO - NetDragon Websoft, a Chinese company that develops and operates multiplayer online games and mobile applications has announced the appointment of its new CEO ‘Ms. Tang Yu’. The AI-powered virtual humanoid robot is the world’s first robot to hold an executive position and will oversee the company’s ‘organizational and efficiency department’. Can’t wait to see how this plays out.

The Big Idea


Part 1 - Machine learning primer

Deep Reinforcement Learning (DRL) is the hottest topic in robotics research. Every one of our Bulletins has featured the technology - our Deep Dive on grasping found it at the heart of universal grasping, and last week we saw the technique used to teach robots to draw. Understanding DRL is becoming a necessity for anyone working in robotics, and over the next two weeks we’re going to provide a primer on the tech and its application in robotics.

This week will introduce machine learning principles for non-AI experts. We’ll cover the main approaches, how they work and important decision criteria. In the next few weeks, we’ll get deeper into DRL and its value for roboticists.

Understanding AI Lingo

For the uninitiated, AI jargon can be very, very off-putting. Even experts use phrases like AI, machine learning and deep learning interchangeably. These are not the same thing, but how everything fits together can be very unclear. The best taxonomy we’ve seen comes from OpenAI researcher Richard Ngo.

To break it down -

Artificial Intelligence - is the attempt to develop computer programs that possess the capabilities associated with intelligence in humans: language skills, visual perception, motor control, and so on.

AI was pioneered in the 50s, and until the 2000s researchers focused on Symbolic AI. This school of thought tries to represent intelligence using a formal language of logic, using theory and rules to program intelligence from first principles. When Deep Blue beat Kasparov at chess, a human had hard-coded the rules of chess and the system used formal logic to decide on the best chess move.

Turns out this isn't very scalable - Symbolic AI gets very complex very fast. Today the leading AI approaches let computers learn for themselves. The new approach runs on machine intuition rather than theory.

Machine Learning - involves a process by which an AI receives input data, produces output data, and then is given feedback on its output (known as the learning, training or optimisation process).

There are a million flavours of Machine Learning and it gets very complex very quickly, but at a high level you can organise methods by the type of data they learn from:

  1. Supervised
  2. Unsupervised
  3. Reinforcement
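To make the distinction concrete, here’s a toy sketch of what the training data looks like in each of the three settings. Every name and number below is invented purely for illustration:

```python
# Supervised: labelled (input, output) pairs - the "babysitter"
# has tagged every example with the right answer.
supervised_data = [
    ([1.0, 2.0], "cat"),
    ([3.0, 1.5], "dog"),
]

# Unsupervised: inputs only, no labels - the algorithm must find
# structure in the data by itself.
unsupervised_data = [
    [1.0, 2.0],
    [3.0, 1.5],
]

# Reinforcement: (state, action, reward) transitions collected as
# an agent interacts with its environment and receives feedback.
reinforcement_data = [
    ({"position": 0}, "move_right", +1.0),
    ({"position": 1}, "move_left", -0.5),
]
```

The key difference is where the feedback comes from: explicit labels, no labels at all, or a reward signal earned through interaction.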

You can teach a machine using a range of techniques of which the most popular is Deep Neural Networks, AKA Deep Learning.

We’ll introduce these concepts in this article and dive deeper into DRL. Optimisation and neural network techniques will be left for another time.

Supervised Learning

Supervised Learning is the task of learning from tagged data; its goal is to generalise. It is ‘supervised’ because a human babysitter is required to train the system. The babysitter needs to be very diligent and provide pairs of labelled input and output data from which the system learns. Given enough data, the system is able to generalise trends and predict the output for new inputs.

Supervised systems can be used for -

  • Classification - Categorising data into predefined classes. Examples include email and spam classification and identifying cancer tumour cells.
  • Regression - Predicts a continuous numerical output by finding a correlation between dependent and independent variables. It’s best suited to continuous variables like predicting house prices, market trends and weather patterns.
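A minimal sketch of supervised classification, using a toy 1-nearest-neighbour classifier (one of the simplest supervised methods, chosen here for brevity - the labelled points and labels are made up):

```python
import math

# Labelled training pairs: (input features, output label).
# The features and labels are invented for illustration.
training_pairs = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((5.0, 5.0), "not spam"),
    ((4.8, 5.2), "not spam"),
]

def predict(x):
    """Return the label of the training example closest to x."""
    def distance(pair):
        point, _label = pair
        return math.dist(point, x)
    _, label = min(training_pairs, key=distance)
    return label

print(predict((1.1, 0.9)))  # falls near the "spam" examples
print(predict((5.1, 4.9)))  # falls near the "not spam" examples
```

Having learned from labelled pairs, the system generalises: it predicts a label for inputs it has never seen.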

Supervised Learning is one of the most prevalent forms of machine learning, but is limited by the volume of training data required and the effort needed to label this data. The babysitter needs to work hard to make the system functional, which limits how easily the system can be extended to new problems.

Unsupervised Learning

Unsupervised Learning is the task of learning from unlabelled data; its goal is to compress. A babysitter is still required, but they can be far less diligent — only checking up on the algorithm at the end. No pre-specified outputs are required to train a model. Instead, Unsupervised Learning is used to identify existing patterns in the data. It reduces the costs associated with annotating large datasets, and it's great if the user doesn't know what they’re looking for, or wants to understand the data before using a supervised model.

Unsupervised systems can be used for -

  • Clustering - Divides datasets into groups based on the features of the variables without using pre-specified categories.
  • Association - Finds patterns and relationships in the data, but instead of using these relationships for prediction, the model attempts to discover general rules in the dataset itself.  This is often used by retailers to identify purchasing habits - “If a person buys X they are likely to buy Y”.
  • Dimensionality Reduction - Reduces the number of variables in a dataset. It is used in conjunction with other types of modelling to make the final model less complex. The objective is to simplify the data without losing too much information. This compression has its limitations, and should only be attempted if training is unfeasibly slow and there is a lot of similar information in the dataset.
  • Anomaly Detection -  Identifies outliers. This is often used in fraud detection.
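Clustering is the easiest of these to sketch in code. Below is a minimal k-means example (k = 2) on invented 1-D data - note there are no labels anywhere; the algorithm groups points purely by their values:

```python
# Unlabelled 1-D data: two obvious groups, one near 1 and one near 8.
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]

def kmeans_1d(points, iterations=10):
    """Tiny k-means for k = 2 clusters on 1-D data."""
    # Start the two cluster centres at the extremes of the data.
    centres = [min(points), max(points)]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest centre.
            nearest = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            clusters[nearest].append(p)
        # Move each centre to the mean of its assigned points.
        centres = [sum(c) / len(c) for c in clusters]
    return centres, clusters

centres, clusters = kmeans_1d(data)
print(centres)  # one centre settles near 1.0, the other near 8.0
```

No babysitter specified the groups in advance - the structure was already in the data, and the algorithm compressed six points down to two representative centres.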

Unsupervised Learning requires a lot less initial input than supervised methods, but is often more challenging. It's hard to know if the model is overfitting the data and finding patterns that don't exist, so an expert is often still needed to evaluate results.

Jack Pearson

London