In the past decade or so, machine learning and, more broadly, data science have taken the technology world by storm. Almost every tech enthusiast has, or wants to have, a piece of it. In 2012, Harvard Business Review called the job of data scientist "The Sexiest Job of the 21st Century", and six years on, the title still holds.
But what makes it so appealing? Let’s have a closer look.
Machine learning refers to the various techniques and ways in which a machine or a computer learns to perform a certain task and get better at it without being explicitly programmed for it.
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
— Tom M. Mitchell (Machine Learning)
Just the idea of a machine learning by itself to perform tasks has been one of the biggest driving forces behind the widespread excitement and enthusiasm around this subfield of Artificial Intelligence (AI) today.
Exciting recent developments like self-driving cars and game-playing bots have kindled curiosity in this area even more. Meanwhile, applications such as the recommendation systems used by Netflix and Amazon, Google Translate, and voice assistants like Siri, Alexa and the Google Assistant are already making our day-to-day lives far more convenient than they were a few years ago.
Machine Learning Work-flow
Machine learning broadly comprises the following steps:
- Data Collection: The very first and the most important step is to collect relevant data corresponding to our problem statement. Accurate data collection is essential to maintaining the integrity of our machine learning project.
- Data Pre-Processing: The data gathered in the previous step is most probably not yet fit for use by our machine learning algorithm, as it might be incomplete, inconsistent and likely to contain errors and missing values. After taking care of these inconsistencies, errors and missing values, we move on to feature engineering. A feature is an attribute or property shared by all of the independent units on which analysis or prediction is to be done. Feature engineering is the process of using domain knowledge of the data to create relevant features that make machine learning algorithms perform well.
- Model Training: The pre-processed data is first divided into two parts: a training dataset and a test dataset. A train/test ratio of 70/30 or 80/20 is common for smaller datasets; as the primary dataset grows, the test fraction usually shrinks, with ratios as skewed as 99/1 for very large datasets. Model training is the process by which a machine learning algorithm takes insights from the training dataset and learns specific parameters that minimize the loss, i.e. how badly it performs on the training data.
- Model Evaluation: After the model is trained, it is evaluated, using some evaluation metric, on the test dataset, which it has never seen before. By "never seen before" we mean that the model has not gained any insights from this dataset while learning the parameters mentioned above; it must perform on the test set using only the knowledge gained from the training dataset. Some of the most common evaluation metrics are accuracy, F1 score, Mean Absolute Error (MAE) and Mean Squared Error (MSE).
- Performance Improvement: The performance of the model can often be improved further, on both the training and test datasets, using techniques like cross-validation and hyper-parameter tuning, by trying out multiple machine learning algorithms and keeping the one that performs best, or, better still, by using ensemble methods that combine the results of multiple algorithms.
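The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real pipeline: the "model" is a one-variable least-squares line fitted by its closed-form solution, and the dataset is synthetic, invented purely for the example.

```python
import random

# Toy dataset: x = hours studied, y = exam score (synthetic).
random.seed(0)
data = [(x, 10 + 8 * x + random.uniform(-5, 5)) for x in range(100)]

# Train/test split (80/20).
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Training": closed-form least squares for y = a*x + b.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in train) / \
    sum((x - mean_x) ** 2 for x, _ in train)
b = mean_y - a * mean_x

# "Evaluation": Mean Squared Error on the held-out test set.
mse = sum((y - (a * x + b)) ** 2 for x, y in test) / len(test)
print(f"learned y ~ {a:.2f}x + {b:.2f}, test MSE = {mse:.2f}")
```

Because the test points were held out of the fit, the test MSE is an honest estimate of how the line would do on new data.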
Types of machine learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Supervised learning is the machine learning task of establishing a relationship, or more specifically learning a function, between a given set of input features and an output feature using the training dataset. This learned function is then used to predict outputs for new inputs from the test dataset. Supervised learning operates on labelled training data, i.e. for every set of input features we have the corresponding correct output from which to learn the mapping.
Supervised Learning is mainly divided into two specific tasks: Classification and Regression.
- Classification: The supervised learning task in which each set of input variables is classified into one of a fixed set of possible classes, that is, the output takes only certain discrete values.
For Example, given the performance of students of a class in previous examinations throughout the year we wish to predict if a particular student will Pass/Fail the final examination. This type of problem is commonly referred to as a binary classification problem as we are trying to predict outputs in form of just two classes, that is, “Pass” or “Fail”. Problems with more than two classes to predict upon are referred to as multi-class classification problems.
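The Pass/Fail example can be sketched with about the simplest classifier there is: a decision stump that learns a single threshold on a student's average previous score. The scores and labels below are made up for illustration.

```python
# Synthetic training data: (average previous score, final result).
train = [(35, "Fail"), (42, "Fail"), (48, "Pass"), (55, "Pass"),
         (30, "Fail"), (60, "Pass"), (45, "Fail"), (52, "Pass")]

def accuracy(threshold, data):
    # Predict "Pass" when the score reaches the threshold; count matches.
    correct = sum(("Pass" if score >= threshold else "Fail") == label
                  for score, label in data)
    return correct / len(data)

# "Training": pick the threshold that best separates the training data.
best = max(range(0, 101), key=lambda t: accuracy(t, train))
print("learned threshold:", best)
print("training accuracy:", accuracy(best, train))
```

The output is always one of two discrete classes, "Pass" or "Fail", which is exactly what makes this a binary classification problem.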
- Regression: The supervised learning task in which, for each set of input variables, the output is represented by a continuous value.
Modifying the previous example, given the performance of students of a class in previous examinations throughout the year we wish to predict the percentage scores of students in the final examination. The score is represented as a continuous value, for example, a percentage score of 96.75/100 or 59.25/100.
In contrast to supervised learning, unsupervised learning is the machine learning task in which inferences are drawn from datasets consisting of input data without labelled responses, and the main job at hand is to learn the structure and patterns of the given data. As we do not have labelled data, evaluating the model's performance is no longer straightforward, unlike in supervised learning. One of the most important unsupervised learning techniques is clustering.
Clustering: It is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
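A minimal k-means clustering sketch makes this concrete: points are repeatedly assigned to their nearest centroid, and each centroid then moves to the mean of its cluster. The one-dimensional data and the naive initialisation are both illustrative choices.

```python
# Synthetic 1-D data with two obvious groups, around 2 and around 10.
points = [1.0, 1.5, 2.0, 2.5, 9.0, 9.5, 10.0, 10.5]
centroids = [points[0], points[-1]]   # naive initialisation: two extremes

for _ in range(10):                   # a few assignment/update rounds
    clusters = [[], []]
    for p in points:
        # Assignment step: each point joins its nearest centroid.
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

print("centroids:", centroids)
```

No labels were used anywhere: the grouping emerges purely from the structure of the data, which is the defining trait of unsupervised learning.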
Reinforcement learning (RL) is the machine learning task concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It follows a trial-and-error approach: the agent is rewarded or penalized for each action, and on the basis of the rewards it accumulates, the model trains itself. Once trained, it is ready to act on new situations presented to it.
Consider a child learning how to walk. He has quite a few tasks to master before he can actually walk: standing up, maintaining balance and taking his first steps. Say his mother stands close by with chocolates, which are the child's rewards for successfully accomplishing each of these sub-tasks; if he fails, he gets no reward. So, step by step, the child learns to do the things that fetch him a reward and to avoid those that do not.
In this analogy, the child is the agent, while the floor and the reward system (the mother) constitute the environment he interacts with. He takes actions (walking) and moves from one state to another (one sub-task to the next) to maximize his rewards, where getting a chocolate is a positive reward and not getting one is a negative reward.
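The walking analogy can be sketched with Q-learning, one standard RL algorithm. States 0 to 4 stand for stages of the task, with state 4 as the goal (the chocolate); the agent can step forward or back. All the numbers here (rewards, learning rate, episode count) are illustrative assumptions, not anything canonical.

```python
import random

random.seed(1)
GOAL, ALPHA, GAMMA, EPS = 4, 0.5, 0.9, 0.2
# Q[state][action]: expected reward for each (state, action) pair.
Q = [[0.0, 0.0] for _ in range(GOAL + 1)]   # action 0 = back, 1 = forward

for _ in range(500):                        # training episodes
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit what we know, sometimes explore.
        if random.random() < EPS:
            action = random.choice([0, 1])
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        nxt = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if nxt == GOAL else -0.1   # chocolate vs. no chocolate
        # Q-learning update rule.
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
        state = nxt

policy = ["back" if q[0] > q[1] else "forward" for q in Q[:GOAL]]
print("learned policy:", policy)
```

After enough trial-and-error episodes, the learned policy points forward in every state: the agent has discovered that moving toward the goal maximizes its cumulative reward.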
Deep Learning has been one of the biggest contributors to AI moving forward as rapidly as it is today. Deep learning is a subfield of machine learning which makes use of a certain kind of machine learning algorithm known as Artificial Neural Networks (ANNs), vaguely inspired by the human brain. The major attraction of Neural Networks has been their ability to learn complex non-linear representations of input data.
As seen in the figure above, deep learning models tend to keep improving with larger amounts of data, whereas traditional machine learning models stop improving after a certain saturation point. Larger (deeper) neural networks tend to perform even better still, but they incur larger computational costs. So why is deep learning taking off now?
It is because of these three major reasons:
- Development of better algorithmic techniques and approaches.
- The abundance of data today, courtesy of the Internet.
- Enhanced computational powers with the advent of GPUs.
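The "non-linear representations" mentioned earlier can be made concrete with a hand-wired toy network. No single linear unit can compute XOR, but two hidden units feeding one output unit can; the weights below are chosen by hand purely for illustration, standing in for what training would normally discover.

```python
# A hard threshold stands in for an activation function.
step = lambda z: 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: acts like OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: acts like AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND, i.e. XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

The hidden layer re-represents the inputs (as "at least one is on" and "both are on"), and that intermediate representation is what makes the non-linear XOR function expressible, which is the essential idea behind deeper networks as well.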
Machine learning is growing so fast, and so well, that it has applications in nearly every field of study. Big companies are increasingly investing in ML-based solutions to problems too difficult or time-consuming for humans to solve. And there are plenty of resources available today, across countless platforms and in many different forms, that did not even exist five years ago, to help one get started in machine learning.
There’s never been a better time to get into the field of machine learning and AI in general.