Note: Assignment 1 is at the end of this document; scroll down for instructions.

Session 1

Labeled vs unlabeled data

Unlabeled data

Typically, unlabeled data consists of samples of natural or human-created artifacts that you can obtain relatively easily from the world. Some examples of unlabeled data might include photos, audio recordings, videos, news articles, tweets, x-rays (if you were working on a medical application), etc. There is no "explanation" for each piece of unlabeled data -- it just contains the data, and nothing else.

Labeled data

Labeled data typically takes a set of unlabeled data and augments each piece of that unlabeled data with some sort of meaningful "tag," "label," or "class" that is somehow informative or desirable to know. For example, labels for the above types of unlabeled data might be whether this photo contains a horse or a cow, which words were uttered in this audio recording, what type of action is being performed in this video, what the topic of this news article is, what the overall sentiment of this tweet is, whether the dot in this x-ray is a tumor, etc.

[Figure: examples of labeled vs unlabeled data]

Supervised Learning

In general, supervised learning occurs when a system is given input and output variables with the intention of learning how they are mapped together, or related. The goal is to produce a mapping function accurate enough that, when new input is given, the algorithm can predict the output.

Training data for supervised learning includes a set of examples with paired input subjects and desired output (also referred to as the supervisory signal). For example, in an application of supervised learning for image processing, an AI system might be provided with labeled pictures of vehicles in categories such as cars or trucks. After a sufficient amount of observation, the system should be able to categorize unlabeled pictures.

In supervised machine learning, we train the model with previously labeled data. The model learns from this labeled data and then predicts on similar new data that is not labeled.

The term "supervised" itself indicates that the model is built under supervision before it starts predicting on unseen data.

Examples of Supervised Learning:

  • Classification vs Regression: Regression and classification are both forms of prediction. Regression predicts a value from a continuous set, whereas classification predicts which class an input belongs to.

    For example, the price of a house, depending on its 'size' (in some unit) and, say, its 'location', can be a 'numerical value' (which can be continuous): this is regression.

    Similarly, the price can be predicted in words, viz., 'very costly', 'costly', 'affordable', 'cheap', and 'very cheap': this is classification.
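
To make the difference concrete, here is a minimal sketch in plain Python. The house sizes, prices, and labels are made-up numbers, and the helper names (`fit_line`, `predict_price`, `nearest_neighbor_label`) are illustrative assumptions, not part of the session material. Regression is done with a least-squares line fit, and classification uses a simple nearest-neighbor rule, chosen only because it is short.

```python
# Hypothetical training data: house sizes (sq ft) with a price (regression target)
# and a price category (classification target). All numbers are made up.
sizes = [600, 800, 1000, 1200, 1500]
prices = [30.0, 40.0, 50.0, 60.0, 75.0]
labeled_examples = [
    (600, "cheap"),
    (800, "cheap"),
    (1000, "affordable"),
    (1200, "affordable"),
    (1500, "costly"),
]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_price(size, intercept, slope):
    """Regression: the prediction is a continuous number."""
    return intercept + slope * size


def nearest_neighbor_label(size, examples):
    """Classification: the prediction is the label of the closest labeled example."""
    closest = min(examples, key=lambda example: abs(example[0] - size))
    return closest[1]


intercept, slope = fit_line(sizes, prices)
print(predict_price(1100, intercept, slope))            # a number
print(nearest_neighbor_label(1400, labeled_examples))   # a class label
```

Both functions answer a question about the same house, but the first returns a value on a continuous scale while the second returns one of a fixed set of labels.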


Unsupervised Learning

**Please review the section on labeled and unlabeled data above to understand unsupervised learning.**

Unsupervised learning is the training of an artificial intelligence (AI) algorithm using information that is neither classified nor labeled and allowing the algorithm to act on that information without guidance.

In unsupervised learning, an AI system is presented with unlabeled, uncategorised data and the system’s algorithms act on the data without prior training. The output is dependent upon the coded algorithms. Subjecting a system to unsupervised learning is one way of testing AI.

Examples of Unsupervised Learning:

  • Clustering: This is used to separate different types of data into groups so that, when given a new data point, the model can tell which group it belongs to.
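
As a rough illustration of clustering, the sketch below runs a one-dimensional version of k-means in plain Python. K-means is only one possible clustering algorithm and is not named in the session notes; the data points, starting centers, and function names (`kmeans_1d`, `assign`) are assumptions made for this example.

```python
def assign(point, centers):
    """Return the index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))


def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to centers and recomputing the centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for point in points:
            groups[assign(point, centers)].append(point)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers


# Unlabeled data: no point comes with a tag saying which group it belongs to.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers = kmeans_1d(points, centers=[0.0, 5.0])

print(centers)               # learned group centers
print(assign(8.0, centers))  # group index for a new data point
```

Once the centers are learned, a new data point is placed in the group whose center is nearest, which matches the description above.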


Supervised and Unsupervised Machine Learning Algorithms

Linear Regression

Problem statement:

Let's say we have some data on house sizes (in sq ft) and their corresponding prices. We need to figure out how to use this data to predict the price of a new house (the test data will have only the size; we won't know its price).


So we have training data x and output y. We need to find a hypothesis h that maps x to y.

Cost function:

A cost function measures the performance of a machine learning model on given data. It quantifies the error between predicted values and expected values and presents it as a single real number. Depending on the problem, the cost function can be formed in many different ways. The purpose of a cost function is to be either minimized (the value is then usually called a cost, loss, or error) or maximized (the value is then called a reward). For linear regression we look for the parameters that minimize it.
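
For linear regression, one common choice is the squared-error cost. The sketch below assumes a hypothesis of the form h(x) = theta0 + theta1 * x and the usual 1/(2m) scaling used in the linked lecture series; the tiny data set and the function names are illustrative assumptions.

```python
def hypothesis(x, theta0, theta1):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(xs, ys, theta0, theta1):
    """Squared-error cost: the sum of squared prediction errors divided by 2m."""
    m = len(xs)
    squared_errors = (
        (hypothesis(x, theta0, theta1) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


# Made-up training data that lies exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

print(cost(xs, ys, 0.0, 1.0))  # perfect fit, so the cost is zero
print(cost(xs, ys, 0.0, 0.5))  # worse fit, larger cost
print(cost(xs, ys, 0.0, 0.0))  # even worse fit, even larger cost
```

The single number returned by `cost` is what lets us compare two candidate hypotheses: the one with the lower cost fits the training data better.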


Linear regression with one variable:

Let's say theta0 is 0, so we have only one parameter, theta1. Let's try to plot a graph of the hypothesis vs x and of the cost function vs theta1.

[The above image explanation](https://www.youtube.com/watch?v=yR2ipCoFvNo&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=6)

We have a plot of the cost function vs theta1, and from it we can find the value of theta1 for which the cost function is minimum.
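
Reading the minimum off such a plot can be imitated by evaluating the cost on a grid of candidate values. This is a minimal sketch with theta0 fixed at 0; the data, the grid range, and the name `cost_theta1` are assumptions for illustration.

```python
def cost_theta1(xs, ys, theta1):
    """Squared-error cost for the hypothesis h(x) = theta1 * x (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Made-up training data that lies exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Candidate slopes from -1.0 to 3.0 in steps of 0.1.
candidates = [i / 10 for i in range(-10, 31)]
costs = [cost_theta1(xs, ys, theta1) for theta1 in candidates]

# The candidate with the lowest cost plays the role of the bottom of the curve.
best_theta1 = min(candidates, key=lambda theta1: cost_theta1(xs, ys, theta1))
print(best_theta1)
```

A grid search like this only works for one or two parameters, which is one reason an iterative method such as gradient descent is used in practice.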

If we consider both theta0 and theta1, we get a bowl-shaped cost function graph, and we try to find the values of theta0 and theta1 for which the cost function is minimum.

Let's talk about gradient descent. Suppose we have a cost function graph: how do we reach the minimum point? Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Parameters refer to coefficients in linear regression and weights in neural networks.
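
The sketch below applies gradient descent to linear regression with one variable, assuming the squared-error cost with 1/(2m) scaling. The learning rate, number of steps, data, and function name are illustrative choices, not values from the session.

```python
def gradient_descent(xs, ys, learning_rate=0.1, steps=2000):
    """Fit h(x) = theta0 + theta1 * x by repeatedly stepping against the gradient."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(error * x for error, x in zip(errors, xs)) / m
        # Update both parameters together, using gradients from the same step.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Made-up training data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # should end up close to 1 and 2
```

If the learning rate is too large the updates can overshoot and diverge, and if it is too small convergence becomes slow, so this value usually needs some experimentation.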

This is what the math is said to look like:
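
Assuming the standard formulation used in the linked lecture series, with learning rate $\alpha$, $m$ training examples, and hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$, the cost function and the gradient descent update can be written as:

$$
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
$$

$$
\theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
$$

$$
\theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
$$

Both parameters are updated simultaneously, and the updates are repeated until the cost stops decreasing noticeably.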

This is what actually happens:

[Interested readers can go through Andrew Ng's machine learning lectures 2.1 to 2.8](https://www.youtube.com/watch?v=kHwlB_j7Hkc&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=4)

linear regression explained

play with linear regression online

Good read

Please look for Medium articles on linear regression with gradient descent.

Assignment_1:

  1. Add a folder named Assignment_1 to the PIKTORML git repo and write an md file, Assignment_1a.md, describing at least 2 real-life examples of supervised/unsupervised learning, clustering, and classification problems. Explain them clearly and in detail. There should be only one md file in this folder, so remove any README.md file from the Assignment_1 folder if it exists.
  2. Coding assignment:
    1. Open the link 'Assignment_1b' and make a copy in your Google Drive. Open it in Google Colab and follow the instructions in the file: run each cell and add comments explaining the code.
    2. Open the link 'Assignment_1c' and make a copy in your Google Drive. Open it in Google Colab and follow the instructions in the file.
  3. Save these files in the Assignment_1 folder of your local git repo.
  4. Add and commit them to git.
  5. Push the changes to git.
  6. Submit the link to the Assignment_1 git folder through the Assignment_1 submission link on the LMS.

Session_1 Video

Session1