Introduction to machine learning

What machine learning is about, types of learning and classification algorithms, introductory examples.

This is the first article in my series of notes on machine learning, a sub-field of Artificial Intelligence that has fascinated me for some time. The main source of knowledge will be the Machine Learning course on Coursera, taught by Andrew Ng of Stanford University, along with other books and online tutorials.

What is machine learning in a nutshell

Arthur Lee Samuel had a nice definition of machine learning: the field of study that gives computers the ability to learn without being explicitly programmed. I also like what Drew Conway in his book Machine Learning for Hackers says about machine learning: it's just statistics made by computers.

In general, machine learning is used to make predictions on data. But instead of hard-coding those predictions with a custom algorithm, you let the program itself figure out the best output, based on the input data. This is also called data-driven prediction or decision making. Sounds like magic, doesn't it? Let me show a couple of examples and everything will be clearer.

Regression versus classification algorithms

Machine learning tasks are typically classified into several broad categories, depending on which side of your program you are looking at. If you think of the output side of machine learning, that is the outcome of your program, the two best-known tasks are regression and classification.


So you have a bunch of data and you want to make a prediction on it. For example, you are collecting real-estate information in your city, because you want to predict the price of a house given, say, its size in square feet. You start gathering data and you end up with something like picture 1:

Picture 1. House prices given their size.

Each dot on the graph is a survey. For example, you found out that a 1000 square feet house is worth about $200,000 (are those fantasy numbers? I don't know, sorry). You also found out that a ~1300 square feet house is worth ~$250,000. Machine learning will help you answer questions like: how much is a 1100 square feet house worth, given my input data?

In this case the output of your machine learning algorithm takes continuous values, i.e. any number from $0 to $400,000. This is a regression problem. The weird name is historical: it goes back to Francis Galton's 19th-century studies of "regression toward the mean". In practice you fit a line to your data (the dotted one in picture 1), with a corresponding mathematical equation. If you know the equation, you can find any output (y) given any input (x). The operation is called linear regression and I will deal with it in future chapters.
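To make the idea concrete, here is a minimal sketch in plain Python that fits a line with ordinary least squares and then asks it about the 1100 square feet house. Only the 1000 and 1300 square feet surveys come from the text above; the other two data points, and the names `fit_line`, `sizes` and `prices`, are made up for illustration.

```python
# Fantasy survey data: the 1000 and 1300 sq ft points come from the text,
# the other two are invented for this sketch.
sizes = [800, 1000, 1300, 1600]            # square feet
prices = [160000, 200000, 250000, 310000]  # dollars


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)

# Once the equation is known, any input x gives an output y.
predicted = slope * 1100 + intercept
print(f"A 1100 sq ft house is worth about ${predicted:,.0f}")
```

With these numbers the prediction lands between the two neighbouring surveys, which is what you would expect from a line drawn through the dots.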


Let's now change the subject: you want to know if a watermelon is more or less sweet given its size. As always you start collecting data and you finally end up with a chart like the following one (picture 2):

Picture 2. Watermelon sweetness given size. Full dots: more sweet, empty dots: less sweet.

Each dot is a survey, where full dots are sweeter watermelons. Fantasy numbers here, too. Then, given a new watermelon of, say, 41 centimeters in diameter, you want to know whether its flavor will be more or less sweet.

In this case the output of your machine learning algorithm takes discrete values: more sweet or less sweet. You are basically classifying things, like putting labels on each outcome, and that's called classification.

The vertical dotted line is the decision boundary generated by the algorithm, used to separate the two classes (in problems with more dimensions it becomes a hyperplane). Your program decided that watermelons below ~33 cm are classified as "more sweet", and those above as "less sweet". More on that in future chapters, of course.
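Here is a minimal sketch of how such a boundary could be found, assuming a very simple strategy: try every midpoint between neighbouring sizes and keep the one that mislabels the fewest watermelons. This is not necessarily the algorithm behind picture 2, the data is invented, and `fit_threshold` and `classify` are hypothetical names.

```python
# Invented surveys: diameter in cm and whether the melon was "more sweet".
diameters = [25, 28, 30, 32, 34, 36, 39, 43]
is_sweet = [True, True, True, True, False, False, False, False]


def fit_threshold(sizes, labels):
    """Return the cut-off that mislabels the fewest melons,
    assuming melons below the cut-off are the sweeter ones."""
    points = sorted(set(sizes))
    # Candidate boundaries: midpoints between neighbouring sizes.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(threshold):
        # Count melons whose predicted label differs from the known one.
        return sum((s < threshold) != label for s, label in zip(sizes, labels))

    return min(candidates, key=errors)


def classify(diameter, threshold):
    """Discrete output: one of two labels."""
    return "more sweet" if diameter < threshold else "less sweet"


threshold = fit_threshold(diameters, is_sweet)
print(f"Boundary at {threshold} cm; a 41 cm watermelon is {classify(41, threshold)}")
```

Note how the output is a label rather than a number: that is the whole difference from the regression case.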

Supervised versus unsupervised learning

I've talked about the output side of a machine learning program so far. When you think of the input side instead, that is the data you feed into it, two broad learning categories come up: supervised and unsupervised learning.

Supervised learning

In supervised learning you give the algorithm the right answers in advance. For example, take another look at the house pricing dataset in picture 1. There, for every point (a size in square feet) I told the program the right price. The algorithm then has to produce more of those right answers for sizes it has not seen.

The watermelon example in picture 2 was a supervised learning task as well. I told the program the sweetness of each watermelon in advance, and it just had to predict the output for new watermelon sizes given as input.

Unsupervised learning

Unsupervised learning brings yet more black magic onto the scene: you let the algorithm figure out the labels itself. This approach brings in the concept of clustering: the task of grouping objects so that those in the same group (called a cluster) are more similar to each other than to those in other groups.

Unsupervised learning is great when you don't know how to label things in advance. For example, let's think of an image grouping problem. You have a bunch of pictures you want to sort based on what they portray. You don't provide the algorithm with the right labels, maybe because even you don't know what the pictures are about. The task of the machine learning program is to find similarities in the input data and figure out by itself the best way to sort the pictures into proper groups, or clusters.
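As a rough illustration of clustering, here is a minimal sketch of k-means on one-dimensional data with two clusters. K-means is just one common clustering algorithm, picked here for its simplicity; the article does not commit to any particular one. The numbers are invented and `kmeans_1d` is a hypothetical helper. Notice that no labels are passed in: the groups come out of the data alone.

```python
# Invented, unlabeled measurements: nobody told us which group each belongs to.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]


def kmeans_1d(data, iterations=10):
    """Two-cluster k-means on plain numbers.

    Centroids start at the smallest and largest value, which keeps the
    sketch deterministic (real implementations usually start randomly).
    """
    centroids = [min(data), max(data)]
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[], []]
        for v in data:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans_1d(values)
print("Cluster 1:", sorted(clusters[0]))
print("Cluster 2:", sorted(clusters[1]))
```

The algorithm never learns what the two groups mean. It only discovers that the values fall into two clumps, and naming those clumps is still up to you.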


Machine Learning Course @ Coursera - Supervised Learning (link)
Machine Learning Course @ Coursera - Unsupervised Learning (link)
Wikipedia - Machine Learning (link)
Wikipedia - Arthur Samuel (link)
