Machine learning is a core technology of AI (artificial intelligence). Its methods are commonly divided into three types, “supervised learning,” “unsupervised learning,” and “reinforcement learning,” according to how learning proceeds and what kind of input data is used.
In this column, I will explain each of these three learning methods in detail.
Supervised learning is a method in which a computer builds a model from data whose correct answers, either labels or numerical values, are already known. It is the simplest form of machine learning, and in classification and prediction tasks it tends to produce results close to the judgments a human would make.
Supervised learning is also used to predict the future from historical data, for example forecasting markets for stock trading or estimating which customers are likely to buy a product often. To improve accuracy, prepare training data from which features are easy to extract, so that the model can classify data the way its human designers intended.
Supervised learning tasks: “identification” and “regression”
There are two typical tasks in supervised learning: “identification” (more often called classification) and “regression.” Identification takes an input such as an image and assigns it to one of several predetermined classes, such as dog or cat. Regression predicts a numerical value, for example estimating the sales volume of books from the temperature.
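As a rough illustration of the regression task, the sketch below fits a straight line to a handful of temperature and book-sales pairs using ordinary least squares. The numbers are made up for the example, and the function name `fit_line` is hypothetical; a straight line is only one possible model and is assumed here for simplicity.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: temperature (input) and books sold (known answer).
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
book_sales = [120.0, 110.0, 100.0, 90.0, 80.0]

slope, intercept = fit_line(temperatures, book_sales)

# Predict the sales volume for a temperature that was not in the data.
predicted_sales = slope * 22.0 + intercept
```

The known sales figures play the role of the correct answers: the line is chosen so that its predictions stay as close as possible to them.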
For example, to classify shoe images with supervised learning, prepare thousands to tens of thousands of pictures of various shoes as training data, and label each image with its correct answer: “these are high heels,” “these are pumps,” “these are boots.” The labeled images are then fed into the machine learning program, which examines the pairs of image and correct label to find a model that successfully distinguishes “high heels,” “pumps,” and “boots.”
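The shoe example can be sketched in miniature. Real image classification works on pixel data and far more samples; here each shoe is reduced to two made-up measurements (heel height and shaft height in centimeters), and the model is a nearest-centroid classifier, chosen only because it is short. The function names `train_centroids` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each label to get one centroid per class."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (heel, shaft), label in samples:
        sums[label][0] += heel
        sums[label][1] += shaft
        counts[label] += 1
    return {
        label: (total[0] / counts[label], total[1] / counts[label])
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up labeled training data: ((heel height, shaft height), correct label).
training_data = [
    ((10.0, 5.0), "high heels"), ((11.0, 6.0), "high heels"), ((9.0, 5.0), "high heels"),
    ((4.0, 5.0), "pumps"), ((5.0, 4.0), "pumps"), ((3.0, 5.0), "pumps"),
    ((3.0, 25.0), "boots"), ((4.0, 30.0), "boots"), ((5.0, 28.0), "boots"),
]

centroids = train_centroids(training_data)
label_for_new_shoe = classify(centroids, (9.5, 5.5))
```

The labels attached to the training data are what make this supervised: the model only learns the three classes because a person named them in advance.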
Unsupervised learning is a method that works on input data without correct-answer labels, finding groups that share common characteristics and extracting information that characterizes the data.
Clustering is a typical unsupervised learning task. It automatically finds items with similar characteristics and divides the data into several groups. For example, you can group users with similar purchasing behavior from purchase data that includes attributes such as age, gender, and spending tendencies, or extract user preferences from survey data.
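To make the grouping of users concrete, here is a minimal sketch using k-means, one common clustering algorithm (the text above does not name a specific one, so this choice is an assumption). Each user is a made-up pair of age and monthly spending, and the starting centroids are fixed by hand so the result is repeatable. The function name `k_means` is hypothetical.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: recompute each centroid from its current members.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


# Made-up user data: (age, monthly spending). No labels are provided.
users = [(22, 30), (25, 35), (24, 28), (58, 120), (61, 130), (63, 125)]

assignments, centroids = k_means(users, initial_centroids=[users[0], users[3]])
```

No one told the program which users belong together; the two groups emerge only from how close the data points are to each other. In practice the features would usually be rescaled first so that no single attribute dominates the distance.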
Advantages and disadvantages of unsupervised learning
The advantage of unsupervised learning is that it is easier to get started with than supervised learning: supervised learning requires correct-answer labels to be attached by hand, whereas unsupervised learning can group data without them.
On the other hand, it is not possible to predict what criteria the computer will use to form groups. It may find a grouping that humans would never have imagined, but that grouping may also turn out not to be practically useful. Another disadvantage is that it can be unclear how to make use of the results derived from unsupervised learning.
Unlike supervised learning and unsupervised learning, “reinforcement learning” is a method of finding the best decisions by actually acting, and it suits tasks that take time to produce results or require many repetitions. It has been used in self-driving cars, robot control, and games, as typified by AlphaGo.
As with unsupervised learning, only inputs are prepared as training data, but rewards are given according to the quality of the output (behavior). The computer learns by trial and error which actions yield the maximum “reward,” and in this way approaches the optimal solution.
Learning method and its principle in the case of reinforcement learning
First, decide the reward in advance, for example “+1 point for success.” At the start the computer has no idea what to do, so it picks randomly from the actions made available to it. When it receives a reward, however, it remembers what state it was in and what action it took to earn that reward.
Next, while still keeping some random movement, it acts using that memory as a clue. If it earns a reward again, it again remembers what it did in what state. By repeating this process, it accumulates pairs of “state” and “action” that lead to a reward.
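The process described above can be sketched with tabular Q-learning and epsilon-greedy action selection, which is one common way to implement it rather than the only one. The environment is a made-up corridor of five positions where reaching the rightmost position earns +1. All names and parameter values (`train`, `alpha`, `gamma`, `epsilon`) are assumptions chosen for the illustration.

```python
import random

N_STATES = 5                  # positions 0..4 on a line; position 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Move one position and return (next_state, reward)."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0   # "+1 point if you can do it"
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Mostly follow the memory, but keep some random movement."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly, so behavior is random until a reward has been seen.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The "memory": an estimated value for every (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap the length of one attempt
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the remembered value toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == N_STATES - 1:
                break
    return q


q_table = train()
# The learned "state -> action" pairs for every non-goal position.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
```

After training, `policy` lists the preferred action in each position, which corresponds to the pairs of “state” and “action” that lead to a reward.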
When you start keeping a dog or cat, you use food to train it to follow human instructions. Once the animal has received a reward from a human, it begins to try things out to find how to get the reward (= food) again. Eventually it works out which behavior earns the reward and can obtain it reliably. Reinforcement learning is based on the same idea.