Machine learning use is on the rise in organizations across industries. With an ever-growing number of machine learning techniques and tools to choose from, it is getting harder to pick the right one for the job.
There are many machine learning styles to choose from. The right choice depends on the type of problem you are trying to solve using machine learning.
In this guide, I aim to give you a brief introduction to the most important machine learning types in use today, and when to use each. Specifically, we will go over machine learning by type of training, machine learning by learning volume, and machine learning by style of learning.
There are four categories of machine learning algorithms by the type of training they use.
In supervised learning, the training data includes the desired solutions, known as labels. For instance, if the training data set is a list of users for a given website, and the goal is to predict which of these users will convert to paying customers, the label for each user would be whether or not they did indeed buy something on the website.
Classification problems, such as user conversion predictors or spam filters, are one type of problem that can be solved using supervised learning algorithms. In classification problems, the goal is to predict which of two or more classes a given data point belongs to (e.g., in the case of a spam filter, whether an email belongs to the class ‘spam’ or to the class ‘not spam’).
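To make labels concrete for the spam-filter case, here is a minimal sketch of a supervised classifier in Python. It uses multinomial naive Bayes, one simple classification algorithm among many, chosen only because it fits in a few lines. The six training emails and their labels are made up for illustration. Training counts words per class, and prediction picks the class with the highest score.

```python
import math
from collections import Counter

# Hypothetical labeled training emails: (text, label)
TRAINING_DATA = [
    ("win money now", "spam"),
    ("claim your free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for tomorrow", "not spam"),
    ("lunch with the team tomorrow", "not spam"),
    ("project status and agenda", "not spam"),
]

def train_naive_bayes(data):
    """Count documents and words per class; these counts are the trained model."""
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in data:
        word_counts[label].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary

def classify(model, text):
    """Return the class with the highest (unnormalized) log-posterior, using Laplace smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.split():
            # Unseen words get a count of 1 instead of 0 (add-one smoothing)
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAINING_DATA)
print(classify(model, "free money now"))               # spam
print(classify(model, "agenda for the team meeting"))  # not spam
```

Because every training email carries its desired answer (the label), the classifier can learn which words are associated with each class.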
Regression problems, such as predicting how much a house should cost given its location, surface area and other attributes, are another type of problem that can be solved using supervised learning. In regression problems, the goal is to predict a target numeric value given a set of features called predictors.
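As a sketch of regression, the snippet below fits an ordinary least squares line that predicts a house price from a single predictor, surface area. The five data points are hypothetical, and a real model would use several predictors (location, number of rooms, and so on). With one predictor, the fitted slope is the covariance of x and y divided by the variance of x.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: surface area (m²) -> sale price (thousands)
areas = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 310, 350]

slope, intercept = fit_simple_linear_regression(areas, prices)
print(round(slope, 2), round(intercept, 2))  # 2.55 24.5
print(round(intercept + slope * 100, 1))     # predicted price for 100 m²: 279.5
```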
Popular supervised learning algorithms include:
In unsupervised learning, the training data is not labeled. Unsupervised learning algorithms are used to determine the level of similarity between data points, or to learn association rules within a data set, such as that “on Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer.”
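The diapers-and-beer finding is an association rule. Rule-mining algorithms such as Apriori score candidate rules with measures like support (how often the items appear together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both for the rule {diapers} → {beer} on a handful of made-up, unlabeled baskets. It shows the scoring only, not a full rule-mining algorithm.

```python
# Hypothetical, unlabeled shopping baskets
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of all baskets that contain itemset."""
    return count(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent): baskets with both / baskets with antecedent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(support({"diapers", "beer"}, baskets))       # 0.6
print(confidence({"diapers"}, {"beer"}, baskets))  # 0.75
```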
Popular unsupervised learning algorithms include:
In semi-supervised learning, the training data contains a large amount of unlabeled data and a small amount of labeled data. Most semi-supervised learning algorithms are combinations of supervised and unsupervised algorithms.
One popular example of a semi-supervised learning algorithm is Deep Belief Networks (DBNs). DBNs are based on unsupervised components called Restricted Boltzmann Machines (RBMs) stacked on top of one another. The RBMs are trained in an unsupervised manner, and then the whole system is fine-tuned using a supervised learning algorithm. Deep Belief Networks have been used successfully to identify objects or persons in image, video and motion-capture data.
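A DBN is too large for a short example, so the sketch below illustrates the general semi-supervised idea with a different, simpler technique: self-training (pseudo-labeling). A nearest-centroid model is fit on the few labeled points, the unlabeled point it is most confident about receives a predicted label, and the process repeats until no unlabeled points remain. The data, class names, and distance-based confidence measure are all assumptions made for illustration.

```python
def centroids(labeled):
    """Mean position of each class: the 'model' in this sketch."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Assign x to the class whose centroid is nearest."""
    return min(model, key=lambda label: abs(x - model[label]))

def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point the current model is most sure about."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        model = centroids(labeled)  # supervised step on labeled + pseudo-labeled data
        # Confidence proxy: distance to the nearest centroid (smaller = more confident)
        x = min(unlabeled, key=lambda u: min(abs(u - c) for c in model.values()))
        unlabeled.remove(x)
        labeled.append((x, predict(model, x)))
    return centroids(labeled)

labeled = [(1.0, "small"), (9.0, "large")]  # a little labeled data
unlabeled = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5]   # a lot of unlabeled data
model = self_train(labeled, unlabeled)
print(model)  # {'small': 1.75, 'large': 8.25}
```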
In reinforcement learning, the learning system (a.k.a. the agent) observes the environment, performs actions, and receives rewards or penalties in return. Over time, it learns which actions lead to the most reward. AlphaGo, Google DeepMind’s AI, which beat a professional human Go player for the first time in 2015, is an example of reinforcement learning at work.
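AlphaGo combined reinforcement learning with deep neural networks and tree search, far beyond what fits here. As a minimal sketch of the observe-act-reward loop only, the code below uses tabular Q-learning, a standard reinforcement learning algorithm, on a hypothetical five-cell corridor: the agent starts in cell 0 and receives a reward of 1 for reaching cell 4. The hyperparameters (alpha, gamma, epsilon) are illustrative values, not tuned ones.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reward for reaching the goal
    return next_state, 0.0, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, otherwise take the best-known action
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Move Q toward reward + discounted value of the best next action
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # learned greedy action in cells 0..3
```

After training, the greedy policy should be to move right (+1) in every cell, since that is the shortest path to the reward.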
There are two types of machine learning by learning volume.
In batch learning, the system is trained using all available data. This usually takes a lot of time and computing resources, so it is typically done offline: the system is trained first, then pushed to production.
The biggest downside of batch learning is that it is slow. When you have new data, you have to retrain the system from scratch. Yes, the training-evaluating-deploying cycle can be automated, but it will still be slow, and the system can usually only be retrained once a day or once a week. For huge amounts of data, this may be altogether impossible.
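A minimal sketch of the batch workflow, assuming a toy one-parameter model y ≈ w · x trained with batch gradient descent: every update uses the entire dataset, and when new data arrives, training starts again from scratch on old plus new data. The data points, learning rate, and epoch count are hypothetical.

```python
def train_batch(data, lr=0.01, epochs=200):
    """Batch gradient descent for y ≈ w * x: every update looks at the full dataset."""
    w = 0.0  # training always starts from scratch
    for _ in range(epochs):
        # Gradient of the mean squared error over ALL instances
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
w = train_batch(data)

# New data arrives: a batch system retrains on old + new data from scratch
data += [(4.0, 8.2), (5.0, 9.8)]
w = train_batch(data)
print(round(w, 2))  # 1.99
```

The cost of each retraining grows with the total amount of data collected so far, which is why batch systems are usually retrained on a schedule rather than continuously.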
With online learning, the system is trained incrementally: data instances are fed into the system individually or in small groups (a.k.a. mini-batches). When new data arrives, the system can learn from it on the fly. What is more, once data has been used for training, you can even discard it.
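Unlike batch training, online learning updates the model as each instance arrives. Here is a minimal sketch for a toy one-parameter model y ≈ w · x (a hypothetical example): each instance from the stream triggers a single stochastic gradient step and is then dropped, so nothing is stored. The generator stands in for data arriving over time, and the learning rate of 0.01 is an illustrative choice.

```python
def online_update(w, x, y, lr=0.01):
    """One stochastic gradient step for y ≈ w * x using a single new instance."""
    return w - lr * 2 * (w * x - y) * x

def data_stream():
    """Hypothetical stream of (x, y) pairs arriving one at a time (true relation: y = 2x)."""
    for x in [1.0, 2.0, 3.0, 4.0, 5.0] * 40:
        yield x, 2.0 * x

w = 0.0
for x, y in data_stream():
    w = online_update(w, x, y)  # learn on the fly; the instance is not kept
print(round(w, 3))  # 2.0
```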
Online learning can be used to do out-of-core learning, or learning on huge data sets that do not fit in memory.
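Out-of-core learning combines online updates with reading the data in chunks. In the sketch below, io.StringIO stands in for a file too large to load at once; read_in_chunks is a hypothetical helper that keeps at most chunk_size rows in memory, and each chunk is used as a mini-batch for one gradient step on a toy model y ≈ w · x. The data and learning rate are assumptions.

```python
import io
from itertools import islice

def read_in_chunks(file_obj, chunk_size):
    """Yield lists of (x, y) rows, holding at most chunk_size rows in memory."""
    while True:
        lines = list(islice(file_obj, chunk_size))
        if not lines:
            return
        yield [tuple(map(float, line.split(","))) for line in lines]

def minibatch_update(w, batch, lr=0.05):
    """One gradient step for y ≈ w * x, averaged over a mini-batch."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad

# Stand-in for a file too large for memory (hypothetical data: y = 3x)
big_file = io.StringIO("".join(f"{x},{3 * x}\n" for x in [1, 2, 3, 4] * 500))

w = 0.0
for batch in read_in_chunks(big_file, chunk_size=100):
    w = minibatch_update(w, batch)  # each chunk is discarded after this step
print(round(w, 3))  # 3.0
```

In practice, libraries offer this pattern directly; for example, scikit-learn estimators such as SGDRegressor provide a partial_fit method for incremental training.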
In online learning systems, you can set the learning rate, which controls how quickly the system adapts to new data. A high learning rate makes the system adapt quickly, but it will also forget older knowledge faster. Conversely, a low learning rate results in a system that is more resistant to change and less sensitive to noise in the new data.
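To see the trade-off, consider the simplest possible online "model": a running estimate of a single value, updated as estimate += learning_rate * (x - estimate), which is an exponentially weighted moving average. The hypothetical stream below sits at 0 for a while and then shifts to 10.

```python
def online_mean(stream, learning_rate):
    """Track a value online: each observation moves the estimate by learning_rate of the gap."""
    estimate = 0.0
    for x in stream:
        estimate += learning_rate * (x - estimate)
    return estimate

# Hypothetical stream whose underlying value shifts from 0 to 10
stream = [0.0] * 20 + [10.0] * 5

print(online_mean(stream, learning_rate=0.5))             # 9.6875: adapts quickly
print(round(online_mean(stream, learning_rate=0.05), 2))  # 2.26: still dominated by old data
```

After five new observations, the high learning rate has almost caught up with the new value, while the low one still mostly reflects the old data. The same property that makes the low rate slow to adapt also makes it less swayed by a single noisy observation.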
One big downside of online learning is that if bad data is fed into the system, its performance will decline. This is why it is important to monitor data quality closely, and to turn off learning (and revert to a previous state) if bad data is detected.
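One possible way to implement this safeguard is sketched below under several assumptions: a small, trusted validation set is available, the model is the toy y ≈ w · x, and "bad data" is detected when an update makes validation error jump by more than a hypothetical tolerance. When that happens, the update is discarded, the previous weight is kept, and learning is switched off. Real systems might instead (or also) check incoming data for anomalies before training on it.

```python
def online_update(w, x, y, lr=0.01):
    """One stochastic gradient step for y ≈ w * x."""
    return w - lr * 2 * (w * x - y) * x

def validation_error(w, validation):
    """Mean squared error on a trusted, held-out validation set."""
    return sum((w * x - y) ** 2 for x, y in validation) / len(validation)

def guarded_learning(w, stream, validation, tolerance=0.5):
    """Apply online updates, but revert and stop learning if validation error jumps."""
    baseline = validation_error(w, validation)
    for x, y in stream:
        candidate = online_update(w, x, y)
        if validation_error(candidate, validation) > baseline + tolerance:
            # Likely bad data: keep the previous state and turn learning off
            return w, False
        w = candidate
        baseline = min(baseline, validation_error(w, validation))
    return w, True

validation = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical trusted data, y = 2x
stream = [(1.0, 2.0), (2.0, 4.0), (5.0, -100.0)]   # the last instance is corrupted
w, still_learning = guarded_learning(1.9, stream, validation)
print(round(w, 4), still_learning)  # 1.9098 False
```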
There are two types of machine learning by style of learning.
An instance-based learning system learns the data examples by heart, then generalizes that knowledge to new cases using a measure of similarity. No model is learned in this case. Instead, the examples are literally stored in memory, and the training instances are the knowledge.
One example of instance-based learning is the k-Nearest Neighbors (KNN) classifier. KNN classifies a new data point by comparing it to the previously seen examples: the classes of the k closest training instances, typically combined by majority vote, determine the predicted class of the new data point.
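A minimal KNN sketch in Python (math.dist requires Python 3.8 or later). There is no training step: the labeled instances are kept as-is and, for each query, sorted by Euclidean distance, with a majority vote among the k closest. The points, labels, and k = 3 are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Predict the class of query by majority vote among its k nearest training instances."""
    # No model is built: the stored training instances are the knowledge
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: ((feature_1, feature_2), class)
training = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
            ((6.0, 6.0), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue")]

print(knn_classify(training, (1.2, 1.4)))  # red
print(knn_classify(training, (6.2, 6.1)))  # blue
```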
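Lastly, model-based learning is the most widely used type of machine learning today. Instead of memorizing the examples, a model-based system builds a model from the training data (typically by searching for the model parameter values that minimize a cost function) and then uses that model to make predictions on new cases.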
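In contrast to instance-based KNN, a model-based learner compresses the training data into parameters. The sketch below uses a nearest-centroid classifier, chosen only as one of the simplest model-based methods: training computes one mean point per class, after which the training instances can be discarded and predictions use the centroids alone. The data and labels are hypothetical.

```python
import math

def fit_nearest_centroid(training):
    """Training step: summarize each class by its mean point (the model's parameters)."""
    sums, counts = {}, {}
    for (x, y), label in training:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label]) for label, (sx, sy) in sums.items()}

def predict(model, query):
    """Prediction step: uses only the learned centroids, not the training data."""
    return min(model, key=lambda label: math.dist(model[label], query))

training = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
            ((6.0, 6.0), "blue"), ((7.0, 5.5), "blue"), ((6.5, 7.0), "blue")]
model = fit_nearest_centroid(training)
del training  # the training instances are no longer needed

print(predict(model, (6.2, 6.1)))  # blue
```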
I hope you have found this high-level breakdown of machine learning types useful. We cover each of these machine learning styles in much more detail in our Classic Machine Learning course for teams, along with a lot of other topics in machine learning.