Tutorial
Supervised And Unsupervised Learning

Supervised And Unsupervised Learning Introduction

Message from the Writer

Here you will learn about Supervised and Unsupervised Learning. I hope you will enjoy it. If you have any questions, please feel free to ask me in the comment section. I will try to answer your questions as soon as possible. Thank you for reading this article. Have a nice day.

Are you excited to learn about machine learning?

a. Supervised learning

What is supervised learning?

⚠️

In simple terms: supervised learning is a machine learning technique in which the model is trained on labeled data. The model learns from the labeled data and then predicts the output for new data. Labeled data means that each input is already tagged with the correct output.

Supervised learning is the process of providing both input data and the correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for the following tasks:

  • Image classification
  • Speech recognition
  • Text classification
  • Medical diagnosis
  • Stock market prediction
  • Weather forecasting

How Supervised Learning Works

In supervised learning, models are trained on a labeled dataset in which each input is already tagged with the correct output. Once training is complete, the model is tested on a test dataset, which is data the model has never seen before. The accuracy of the model is measured by comparing its predicted outputs with the actual outputs.
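The comparison of predicted and actual outputs can be sketched in a few lines of Python. This is only an illustration: the `accuracy` helper and the label lists are hypothetical and the values are made up for the example.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical test-set labels and model predictions (illustrative values only).
actual_labels = ["Square", "Triangle", "Rectangle", "Triangle"]
predicted_labels = ["Square", "Triangle", "Square", "Triangle"]

test_accuracy = accuracy(predicted_labels, actual_labels)
print(f"Accuracy: {test_accuracy:.2f}")  # 3 of the 4 predictions are correct
```

Accuracy is the simplest such measure; other measures may suit a task better when some classes are much rarer than others.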

Suppose we have a dataset of different shapes, including squares, rectangles, triangles, and other polygons. The first step is to train the model on each shape. For training, we provide the input data and the correct output data: the input is an image of the shape, and the correct output is the name of the shape.

  • If the given shape has four sides and all the sides are equal, it is labeled a Square.
  • If the given shape has four sides and not all the sides are equal, it is labeled a Rectangle.
  • If the given shape has three sides, it is labeled a Triangle.
  • If the given shape has six equal sides and six equal angles, it is labeled a Hexagon.
  • If the given shape has more than four sides and is not a regular hexagon, it is labeled a Polygon.
  • If the given shape has fewer than three sides, it is labeled None.

Once the machine has been trained on all of these shapes, it classifies each new shape on the basis of its number of sides and predicts the output.
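A minimal sketch of the shape example, assuming each shape has already been reduced to two hypothetical features (number of sides, and whether all sides are equal) rather than a raw image. The classifier here is a simple 1-nearest-neighbour rule chosen for illustration; it is not the only way to learn these labels.

```python
# Labeled training data: (number_of_sides, all_sides_equal) -> shape name.
# The feature encoding is an assumption made for this sketch.
training_data = [
    ((4, 1), "Square"),
    ((4, 0), "Rectangle"),
    ((3, 0), "Triangle"),
    ((3, 1), "Triangle"),
    ((5, 0), "Polygon"),
    ((6, 1), "Hexagon"),
    ((8, 0), "Polygon"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_shape(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training_data, key=lambda ex: squared_distance(ex[0], features))
    return label


print(predict_shape((4, 1)))  # Square
print(predict_shape((3, 0)))  # Triangle
print(predict_shape((7, 0)))  # Polygon (closest example is the 8-sided polygon)
```

The model never sees explicit rules; it only sees labeled examples and reuses the label of the most similar one.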

Steps Involved

  • → Determine the type of training dataset.
  • → Collect the labeled training data.
  • → Split the dataset into training, validation, and test sets.
  • → Choose a suitable algorithm for the model, such as a support vector machine or a decision tree.
  • → Train the model by running the algorithm on the training set.
  • → Where needed, tune the control parameters on the validation set, which is a subset held out from the training data.
  • → Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
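The splitting step can be sketched as follows. The function name `split_dataset`, the 70/15/15 proportions, and the toy examples are assumptions for illustration; suitable proportions depend on how much data is available.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the examples and split them into training, validation, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, validation, test


# Toy (input, output) pairs; the values are made up for the example.
examples = [(x, 2 * x) for x in range(20)]
train_set, validation_set, test_set = split_dataset(examples)
print(len(train_set), len(validation_set), len(test_set))  # 14 3 3
```

Shuffling before splitting helps ensure that each subset is representative of the whole dataset, and the test set is left untouched until the final evaluation.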

Types

1. Regression

Regression algorithms are used if there is a relationship between the input variable and the output variable. It is used for the prediction of continuous values. For example, if we want to predict the price of a house, then we can use regression algorithms. The output of the regression algorithm is a continuous value.

Below are some examples of regression algorithms:

  • → Linear Regression
  • → Regression Trees
  • → Non-Linear Regression
  • → Bayesian Linear Regression
  • → Polynomial Regression
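As an illustration of regression, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The house areas and prices are hypothetical numbers invented for the example, and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house area (square metres) and price (in thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # continuous-valued prediction
print(f"Predicted price for 90 square metres: {predicted_price:.1f}")
```

The output is a continuous number rather than a class label, which is what distinguishes regression from classification.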

2. Classification

Classification algorithms are used when the output variable is categorical, which means it takes one of two or more classes, such as Yes-No, Male-Female, or True-False. For example, if we want to predict whether a customer will buy a product or not, we can use a classification algorithm. The output of a classification algorithm is a discrete value.

Below are some examples of classification algorithms, which are used in tasks such as spam filtering:

  • → Random Forest
  • → Decision Trees
  • → Logistic Regression
  • → Support Vector Machines
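To make the idea of a discrete output concrete, here is a minimal sketch of logistic regression trained by batch gradient descent. The data (minutes spent on a product page and whether the customer bought) are hypothetical, and the learning rate and epoch count are arbitrary choices for this toy example.

```python
import math


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit a weight and a bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical data: minutes on the product page -> bought (1) or not (0).
minutes = [1, 2, 3, 4, 6, 7, 8, 9]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
weight, bias = train_logistic(minutes, bought)


def will_buy(x):
    """Turn the predicted probability into a discrete Yes/No answer."""
    return "Yes" if sigmoid(weight * x + bias) >= 0.5 else "No"


print(will_buy(2), will_buy(8))
```

The model itself produces a probability; thresholding it at 0.5 is what yields the discrete class.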

Advantages

  • Supervised learning models can predict outputs for new data on the basis of prior experience.
  • In supervised learning, we have an exact idea of the classes of objects and of the relationship between the input and output variables.
  • Supervised learning models help us solve various real-world problems, such as fraud detection and spam filtering.
  • Supervised learning algorithms are easy to implement and understand.
  • Supervised learning algorithms are used for both classification and regression problems.

Disadvantages

  • Supervised learning models can struggle with very complex tasks.
  • Supervised learning cannot reliably predict the correct output if the test data differ substantially from the training data.
  • Training can require a lot of computation power and time.
  • We need enough knowledge about the classes in the dataset to label the data and train the model.
  • Many supervised learning algorithms cannot handle missing values directly, so the data usually have to be cleaned first.
  • Noisy data can reduce the accuracy of supervised learning models.
  • Outliers in the dataset can distort supervised learning models.

b. Unsupervised Learning

What is Unsupervised learning?

⚠️

In simple terms: in unsupervised learning, models are trained on unlabeled data and allowed to act on that data without supervision.

The challenge in unsupervised learning is that, unlike in supervised learning, we have only input data and no corresponding output data. Unsupervised learning is about finding the underlying structure of a dataset, grouping the data by similarity, and representing the dataset in a compressed format.

Example: suppose we give an unsupervised learning model a dataset of cat and dog images. The algorithm is never trained on labeled examples from this dataset, so it has no prior idea of the features that distinguish the images. Instead, it identifies the features of the images on its own and clusters the images into groups based on the similarities between them.

Why Use Unsupervised Learning?

  • Unsupervised learning is used to find hidden patterns in the data.
  • Unsupervised learning is similar to the way humans learn to think from their own experiences, which brings it closer to real AI.
  • Unsupervised learning works on unlabeled and uncategorized data, which makes it valuable where labeling is impractical.
  • In the real world, we do not always have output data corresponding to the input data, so we need unsupervised learning to deal with such cases.

How Unsupervised Learning Works

Unsupervised learning works as follows. We take an unlabeled dataset and train the model on it. The model first interprets the raw data to find hidden patterns, and then a suitable algorithm, such as k-means clustering or hierarchical clustering, is applied. The algorithm divides the data objects into groups according to the similarities and differences between them.

Types

1. Clustering

In clustering, objects are grouped so that objects with similar properties stay in the same group and have few or no similarities with objects in other groups. Cluster analysis finds commonalities between data objects and categorizes them according to the presence or absence of those commonalities.

Below are some examples of clustering algorithms:

  • → K-Means Clustering
  • → Hierarchical Clustering
  • → Density-Based Clustering
  • → Gaussian Mixture Model
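As a sketch of how clustering groups unlabeled data, here is a minimal k-means implementation. The points are made up for the example, and the initial centroids are simply the first k points, which is a simplifying assumption to keep the result deterministic; practical implementations usually pick the initial centroids more carefully.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Unlabeled 2-D points forming two visibly separate groups (illustrative values).
points = [(1, 1), (8, 8), (1.5, 2), (9, 9.5), (1, 0.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are supplied at any point; the cluster numbers are arbitrary identifiers that the algorithm assigns purely from the distances between points.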

2. Association

Association rule learning is an unsupervised learning method used to find relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules can improve the effectiveness of marketing strategies: for example, people who buy item X (say, bread) also tend to buy item Y (butter or jam). Market basket analysis is a typical application of association rules.

Below are some examples of association rule algorithms:

  • → Apriori Algorithm
  • → Eclat Algorithm
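The bread-and-butter idea can be quantified with two standard measures, support and confidence, which algorithms such as Apriori build on. The sketch below only computes these measures on a handful of hypothetical shopping baskets; it is not an implementation of Apriori or Eclat.

```python
# Hypothetical shopping baskets (made up for this example).
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "butter"}))       # 3 of the 5 baskets contain both
print(confidence({"bread"}, {"butter"}))  # 3 of the 4 bread baskets contain butter
```

A rule such as "bread implies butter" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.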

Algorithms

Below are some of the algorithms used in unsupervised learning:

  • → K-means clustering
  • → Hierarchical clustering
  • → Anomaly detection
  • → Neural networks
  • → Principal Component Analysis
  • → Independent Component Analysis
  • → Apriori algorithm
  • → Singular value decomposition

Advantages

  • → Unsupervised learning is used to find hidden patterns in the data.
  • → Unsupervised learning is similar to the way humans learn to think from their own experiences, which brings it closer to real AI.
  • → Unsupervised learning works on unlabeled and uncategorized data, which makes it valuable where labeling is impractical.
  • → In the real world, we do not always have output data corresponding to the input data, so we need unsupervised learning to deal with such cases.

Disadvantages

  • → Unsupervised learning is intrinsically more difficult than supervised learning as it does not have corresponding output.
  • → Unsupervised learning is not suitable for the prediction of future events.
  • → The results of unsupervised learning can be less accurate, because the input data is not labeled and the algorithm has no known correct output to learn from.

Supervised Learning vs Unsupervised Learning

Supervised Learning | Unsupervised Learning
Trained with labeled data | Trained with unlabeled data
Takes direct feedback to check whether it is predicting the correct output | Does not take direct feedback
Predicts the output for the given input | Finds the hidden patterns in the data
Input data is provided to the model along with the output data | Only input data is provided to the model
The goal is for the model to predict the output when given new data | The goal is for the model to find hidden patterns and useful insights in an unknown dataset
Needs supervision to train the model | Does not need supervision to train the model
Can be categorized into Classification and Regression problems | Can be categorized into Clustering and Association problems
Can be used where we know the inputs as well as the corresponding outputs | Can be used where we know the inputs but not the corresponding outputs
Results can be checked against known outputs, so they are generally more accurate | Results are approximate and may be less accurate
Not close to true Artificial Intelligence, because the model must first be trained on labeled data before it can predict the correct output | Closer to true Artificial Intelligence, because it learns in a way similar to how a child learns everyday things from experience
Examples: Linear Regression, Logistic Regression, Support Vector Machine, KNN (k-nearest neighbors), Multi-class Classification, Decision Tree, Bayesian Logic, etc. | Examples: K-means clustering, Hierarchical clustering, Anomaly detection, Neural Networks, Principal Component Analysis, Independent Component Analysis, Apriori algorithm, Singular value decomposition