5 Essential Machine Learning Techniques

Oleksii Tsymbal, Chief Innovation Officer at MobiDev

Machine learning is a field of research aimed at teaching machines to perform cognitive tasks in a way similar to the human mind. While machines are typically much more limited in cognitive ability than the average human, they can process vast amounts of information quickly and derive useful insights.

“The goal of machine learning algorithms is to gain valuable business insights. Here the point is not to get more data, the point is to get the “right” data.”

Liudmyla Taranenko - Data Science Engineer at MobiDev

The “learning” in machine learning refers to a process in which machines review existing data and learn new skills and knowledge from that data. Machine learning systems use algorithms to find patterns in datasets, which might include structured data, unstructured textual data, numeric data, or even rich media like audio files, images and videos. Machine learning algorithms are computationally intensive, requiring specialized infrastructure to run at large scale.

Machine Learning Methods

Below are the main disciplines in machine learning. Most machine learning algorithms fall into one of these categories:

Figure: Categorization of machine learning algorithms (source: mathworks.com)

1. Supervised Learning

Use supervised learning if you know in advance what you want to teach a machine. This typically requires exposing the algorithm to a large set of labeled training data, comparing the model’s output with the known answers, and adjusting the parameters until the results are good enough. You can then test the model by letting it make predictions for a “validation data set”, in other words, data it has not seen during training.
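As a minimal sketch of this train-then-validate loop, the Python example below uses a one-nearest-neighbor classifier on a handful of labeled examples and measures its accuracy on held-out examples. The data set, the meaning of the features, and the function names (predict, accuracy) are illustrative assumptions, not taken from a real project.

```python
import math

# Hypothetical labeled data: (monthly income in thousands, debt ratio) -> credit risk label.
training_set = [
    ((2.0, 0.9), "high"),
    ((2.5, 0.8), "high"),
    ((3.0, 0.7), "high"),
    ((6.0, 0.2), "low"),
    ((7.0, 0.3), "low"),
    ((8.0, 0.1), "low"),
]

# Held-out examples that are never used as training data.
validation_set = [
    ((2.2, 0.85), "high"),
    ((7.5, 0.15), "low"),
]

def predict(features, labeled_examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(labeled_examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]

def accuracy(examples, labeled_examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(x, labeled_examples) == y for x, y in examples)
    return correct / len(examples)

validation_accuracy = accuracy(validation_set, training_set)
print(f"validation accuracy: {validation_accuracy:.2f}")
```

A nearest-neighbor model has no parameters to adjust, which keeps the sketch short; with most other models, the validation score is what guides the parameter tuning described above.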

Common supervised learning tasks typically implement prediction, regression, or classification. A few examples of applications of supervised learning:

  • In the financial industry, machine learning can analyze historical data and then identify the financial risk of organizations and individuals.
  • In the marketing industry, machine learning can analyze past behavior patterns, then predict customers’ behavior and personalize the experience.
  • In the knowledge management field, machine learning helps classify text into categories.

2. Unsupervised Learning

Unsupervised learning enables a machine to explore a set of data. After the initial exploration, the machine tries to identify hidden patterns that connect different variables. This type of learning can help sort data into groups, based only on statistical properties. Unsupervised learning does not require labeled training data, so it is often faster and easier to set up than supervised learning.

“Data scientists spend up to 80% of their time cleaning the gathered data before training the ML model, which is not a guarantee of the entire absence of errors and bias. That is why it is often difficult to reach the ideal data quality and meet all the data standard requirements. And that’s how we came to unsupervised machine learning approaches.”

Liudmyla Taranenko - Data Science Engineer at MobiDev

A few examples of applications of unsupervised learning:

  • In the eCommerce industry, clustering algorithms are used to identify related products or related items for smart recommendation systems.
  • In the cybersecurity industry, machine learning can help identify anomalous activity on computer networks.
  • In social media analysis, machine learning can identify the emotional sentiment of a message, by grouping together messages with similar sentiment or tone.
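The network anomaly use case from the list above can be sketched without any labels. The example below flags values that lie far from the mean, measured in standard deviations (a z-score). The traffic numbers, the threshold of 2.5, and the function name find_anomalies are assumptions chosen for illustration; production systems typically use more robust methods.

```python
import statistics

# Hypothetical requests-per-minute counts from one host; the values are invented.
requests_per_minute = [52, 48, 50, 51, 49, 53, 47, 50, 400, 51]

def find_anomalies(values, threshold=2.5):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / spread > threshold]

anomalies = find_anomalies(requests_per_minute)
print(anomalies)
```

No example is ever marked as “normal” or “anomalous” in advance: the decision comes only from the statistical properties of the data itself.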

3. Semi-Supervised Learning

Semi-supervised learning combines techniques from unsupervised and supervised learning. For example, manually labeling some of the data can provide the algorithm with an example on how the rest of the data set should be grouped.

An example application of semi-supervised learning is detecting identity fraud. Supervised learning is used to define what is considered an “anomaly”, and anomalous cases are then categorized using unsupervised learning methods.

4. Reinforcement Learning (RL)

Reinforcement learning enables a machine to interact with an environment. A simple example is repeatedly playing a video game, and providing a reward when the algorithm takes the desired action. By repeating the process thousands or millions of times, the machine can eventually learn from its experience.
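To make the reward loop concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, on a toy environment. The environment (a corridor of five cells with a reward in the last cell), the hyperparameters, and all names are assumptions made for illustration; a video game would need a far larger state representation.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative hyperparameters

def step(state, action_index):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q_row, rng):
    """Epsilon-greedy choice; ties between equal values are broken at random."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

def train(episodes=500, seed=0):
    """Play many episodes and return the learned table of action values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for every non-goal cell.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It only receives a reward on reaching the goal, and the repeated updates propagate that reward back to earlier cells.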

“Reinforcement learning is the best way to simulate human creativity in a machine by running many possible scenarios. The model can even be adapted to complete complex behavioral tasks. It’s an ideal solution for solving all kinds of optimization problems.”

Serhii Maksymenko, Data Science Solution Architect

Reinforcement learning is also combined with deep learning (DL). Deep reinforcement learning is often used for training autonomous decision-making systems, in cases that require more than what is possible to achieve with supervised learning or unsupervised learning techniques.

AlphaGo is a notable example of reinforcement learning. AlphaGo is an artificial intelligence (AI) engine that uses reinforcement learning to learn how to play Go, one of the world’s most complex strategy games, with roughly 10^170 legal board positions. The system played against itself repeatedly until it managed to defeat a world champion Go player.

5 Essential Machine Learning Techniques

1. Regression

Regression methods are used for training supervised ML. The goal of regression techniques is typically to explain or predict a specific numerical value from an existing data set. For example, regression methods can take historical property pricing data, and then predict the price of a similar property.

Linear regression is considered the simplest and most basic method. In this case, a dataset is modeled using the following equation: 

y = m * x + b, where m is the slope of the line and b is its intercept (the value of y when x = 0)

A regression model is trained on multiple pairs of data (x, y). Training means finding the slope of the line and its position (the intercept) so that the line has the minimal total distance from all known data points, usually measured as the sum of squared vertical distances. This is the line that best approximates the observations in the data, and it can help make predictions for new unseen data.
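The fitting step can be sketched with the standard closed-form least squares solution for a single input variable. The property sizes and prices below are invented and lie exactly on a line so that the result is easy to check by hand; the function name fit_line is likewise just an illustrative choice.

```python
# Hypothetical data: property size in square meters -> sale price in thousands.
sizes = [50.0, 60.0, 80.0, 100.0, 120.0]
prices = [150.0, 180.0, 240.0, 300.0, 360.0]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = m * x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance      # slope
    b = mean_y - m * mean_x        # intercept
    return m, b

m, b = fit_line(sizes, prices)
predicted_price = m * 90.0 + b     # prediction for an unseen 90 square meter property
print(f"m={m:.2f}, b={b:.2f}, predicted={predicted_price:.1f}")
```

With real, noisy data the points would not fall exactly on the line, and the same formulas would return the line with the smallest sum of squared errors.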

2. Classification

Classification algorithms can explain or predict a class value. Classification is an essential component for many AI applications, but it is especially useful for eCommerce. For example, classification algorithms can help predict if a customer will purchase a product, or not. The two classes in this case are “yes” and “no”. Classification algorithms are not limited to two classes and can be used to classify items into a large number of categories.

Logistic regression is considered the simplest and most basic classification algorithm. A logistic regression algorithm can take more than one input, and use the data to estimate the probability of an event occurring. An interesting use of this algorithm can be seen in predicting university admittance results. The algorithm, in this case, analyzes two test scores to estimate the university admittance probability.

The output is a probability between zero and one. A value of one represents absolute certainty that the student will be admitted, and any value greater than 0.5 is treated as a prediction that the student will be accepted by the university.
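The sketch below shows how an already-trained logistic regression model could turn two test scores into a probability and then into a yes/no prediction. The bias and weights are hypothetical values picked by hand, not the result of fitting real admission data, and the function names are illustrative.

```python
import math

# Hypothetical parameters of an already-trained model; real values come from fitting.
BIAS = -8.0
WEIGHT_SCORE_1 = 0.06
WEIGHT_SCORE_2 = 0.07

def admission_probability(score_1, score_2):
    """Logistic (sigmoid) function applied to a weighted sum of two test scores."""
    z = BIAS + WEIGHT_SCORE_1 * score_1 + WEIGHT_SCORE_2 * score_2
    return 1.0 / (1.0 + math.exp(-z))

def predict_admitted(score_1, score_2, threshold=0.5):
    """Classify as admitted when the estimated probability exceeds the threshold."""
    return admission_probability(score_1, score_2) > threshold

for scores in [(85, 90), (40, 45)]:
    probability = admission_probability(*scores)
    print(scores, round(probability, 3), predict_admitted(*scores))
```

The 0.5 threshold is a convention rather than a requirement; it can be raised or lowered when one kind of mistake is more costly than the other.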

3. Clustering

Clustering algorithms are unsupervised learning methods. A few common clustering algorithms are K-means, mean-shift, and expectation-maximization. They group data points according to similar or shared characteristics. 

Grouping or clustering techniques are particularly useful when there is a need to segment or categorize large volumes of data. Examples include segmenting customers by different characteristics to better target marketing campaigns, and recommending news articles that certain readers will enjoy. Clustering is also effective in discovering patterns in complex data sets that may not be obvious to the human eye.
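As an illustration of customer segmentation, here is a minimal sketch of K-means written from scratch. The customer data and the hand-picked starting centroids are assumptions for the example; library implementations usually choose starting centroids randomly and stop once the assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain K-means: alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical customers: (orders per year, average basket value).
customers = [(2, 20), (3, 25), (1, 22), (20, 150), (22, 160), (19, 155)]
labels, centers = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centers)
```

The algorithm is never told which customers are occasional and which are frequent buyers; the two groups emerge from the distances between data points.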

4. Decision Trees

The decision tree algorithm classifies objects by answering “questions” about their attributes located at the nodal points. Depending on the answer, one of the branches is selected, and at the next junction, another question is posed, until the algorithm reaches the tree’s “leaf”, which indicates the final answer. 

Decision tree applications include knowledge management platforms for customer service, predictive pricing, and product planning. 

A typical example of decision trees is identifying the insurance premium that should be charged based on an individual’s situation. The decision tree can define a complex map of criteria such as location, types of insured events, environmental conditions, etc., and determine risk categories based on claims submitted and amounts spent. The system can then evaluate new claims for insurance coverage, categorizing them by risk category and potential financial damage.
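The question-and-branch structure can be sketched as a small nested data structure. Note that this tree is written by hand to show how classification works once a tree exists; a decision tree learning algorithm would derive the questions and thresholds from historical claims data. The attributes, thresholds, and risk categories are invented for illustration.

```python
# Each internal node asks a yes/no question; each leaf is a risk category.
# The questions, thresholds, and categories are invented for illustration.
RISK_TREE = {
    "question": lambda a: a["flood_zone"],
    "yes": {
        "question": lambda a: a["claims_last_5_years"] >= 2,
        "yes": "high risk",
        "no": "medium risk",
    },
    "no": {
        "question": lambda a: a["claims_last_5_years"] >= 3,
        "yes": "medium risk",
        "no": "low risk",
    },
}

def classify(applicant, node=RISK_TREE):
    """Walk from the root to a leaf, following the branch chosen by each answer."""
    while isinstance(node, dict):
        branch = "yes" if node["question"](applicant) else "no"
        node = node[branch]
    return node

applicant = {"flood_zone": True, "claims_last_5_years": 0}
print(classify(applicant))
```

Because every prediction is a readable chain of answers, a decision tree makes it easy to explain why a given applicant landed in a given category.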

5. Neural Networks

Neural networks mimic the structure of the brain: each artificial neuron connects to several other neurons, and together millions of neurons create a complex cognitive structure. Neural networks have a multilayer structure: neurons in one layer transmit data to several neurons in the next, and so on. Ultimately, the data reaches the output layer, where the network makes a decision about how to solve a problem, classify an object, etc. The study of neural networks with many such layers is known as “deep learning”.
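The layer-to-layer flow of data can be sketched as a forward pass through a tiny network with two inputs, three hidden neurons, and one output neuron. The weights and biases are hand-picked placeholders; a real network learns them from data during training, which this sketch does not cover.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: every output neuron sees every input value."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

# Hypothetical, hand-picked parameters; a real network learns them from data.
HIDDEN_WEIGHTS = [[0.5, -0.6], [-0.3, 0.8], [0.9, 0.1]]  # 2 inputs -> 3 hidden neurons
HIDDEN_BIASES = [0.0, 0.1, -0.2]
OUTPUT_WEIGHTS = [[1.2, -0.7, 0.4]]                       # 3 hidden -> 1 output neuron
OUTPUT_BIASES = [0.05]

def forward(inputs):
    """Pass the inputs through the hidden layer, then through the output layer."""
    hidden = dense_layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)

output = forward([1.0, 0.5])
print(output)
```

Deep networks follow the same pattern with many more layers and neurons per layer.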

Neural networks are used for a wide variety of applications. In healthcare, they are used in the analysis of medical images, to speed up diagnostic procedures and search for drugs. In the telecommunications and media industries, neural networks can be used for machine translation, fraud detection, and virtual assistant services. The financial industry uses them for fraud detection, portfolio management and risk analysis.

Machine Learning Techniques with MobiDev

While working on projects involving machine learning techniques, we have gathered many useful insights. Here we will share some of them.

1. The high quality of prepared data does not always lead to expected results. 

The process of data preparation often requires domain knowledge. Since data science engineers develop products in different domain areas, they are unlikely to know all of them in depth. As a result, no matter how carefully the data is gathered, it may still be prepared or labeled incorrectly. An ML model that processes incorrect data will not add value for business optimization.

A good way to avoid such misconceptions is to use unsupervised machine learning techniques. The idea is not to label data yourself but to let the ML algorithm do it for you. By applying dimensionality reduction, data clustering, anomaly detection, and association mining algorithms, it’s possible not only to identify well-known data categories but also to find hidden structures in the data and gain more valuable business insights.

Figure: An example of a clustering algorithm in action. Clustering defines a structure in a set of unlabeled data.

2. Predictive systems often require multiple machine learning techniques.

When developing demand forecasting systems for retail, we used several ML approaches, including regression, time series analysis, random forests, decision trees, and feature engineering. Isn’t that a lot, you may ask? It depends on the expected result. For example, in demand forecasting, the ML model may process different categories of data from both internal and external sources, structured and unstructured. When one ML technique is not enough to reach the expected accuracy, data scientists combine more methods until the results are accurate enough.

Figure: Demand forecasting models integration in retail

One more reason to use several ML techniques is the vulnerability of forecasting systems to anomalies. For example, a demand forecasting algorithm can treat the COVID-19 pandemic as an anomaly. Since forecasting models mostly process historical data, they cannot recognize sudden changes in demand, so the accuracy of the forecasts will be low.

There are several ways to reduce this vulnerability:

  1. Utilize the latest POS data
  2. Utilize transfer learning 
  3. Utilize natural language processing (NLP)
  4. Utilize cascade modeling 
  5. Collect the data about new behavior in the market

3. Neural networks integration requires an elaborate software architecture.

The main requirements for neural network-based systems that process images and video recordings relate to data storage capacity. For example, in manufacturing, AI-based visual inspection systems for defect detection commonly process video recordings from the production lines.

The video recordings for a single production lot can exceed 10 GB, depending on the duration of the production process, so an appropriate data storage approach is required. In this case, it’s recommended to use a cloud streaming service, but it’s also possible to consider a local server or serverless architecture.

Machine learning approaches are not limited to the techniques described in this article. The more sophisticated the use case, the more advanced the techniques that are applied. In practice, success depends more on sound business ideas that apply machine learning services than on inventing new approaches.
