Fake news has become one of the biggest problems of our age. It has a serious impact on our online as well as offline discourse. More and more often we see conflicting claims about the same topic and wonder whether they are true or not. The task of classifying whether a piece of news is fake can be tackled with Python and Machine Learning: we can train a classifier that predicts whether a given article is fake or not.
This is the project I am working on while learning the concepts of Machine Learning and Data Science.
- Aim - The aim of the project is to build a fake news classifier using Natural Language Processing.
- We will take a dataset of labeled news articles and apply classification techniques with frequency-based vectorizers such as the TF-IDF vectorizer and the Count vectorizer.
- In NLP (Natural Language Processing) we encounter stop words, which we will remove, and we will reduce the remaining words to their root form using a stemming technique.
- We can later test different models such as the Naive Bayes Model, the Random Forest Model and K-NN (K-Nearest Neighbour) for accuracy and performance on unseen articles, using both the TF-IDF vectorizer and the Count vectorizer.
I am using a dataset from kaggle.com which contains the following fields:
- id: unique id for a news article
- title: the title of a news article
- author: author of the news article
- text: the text of the article; could be incomplete
- label: a label that marks the article as potentially unreliable
Label 1 -> the article is fake (unreliable)
Label 0 -> the article is not fake (reliable)
- Data Preprocessing - Before training the model we have to preprocess the data, i.e. inspect the structure of the dataset and find out how many values are null.
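A minimal sketch of this step with pandas, assuming the Kaggle CSV is saved locally as `train.csv` (the filename and the fill-with-empty-string choice are assumptions of this sketch):

```python
import pandas as pd

# Load the Kaggle dataset (filename assumed; adjust the path to your copy)
df = pd.read_csv("train.csv")

# Inspect the data structure: column names, dtypes, non-null counts
df.info()

# Count how many values are null in each column
print(df.isnull().sum())

# One simple option: fill missing text fields with an empty string
df = df.fillna("")
```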
- Text Cleaning - After handling the inconsistent data, we have to clean the text itself: removing numbers attached to letters, converting all uppercase letters to lowercase, replacing all \n characters with spaces and removing all non-ASCII characters.
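One possible implementation of this cleaning step with plain regular expressions (the exact rules and the helper name `clean_text` are assumptions of this sketch, not the project's final code):

```python
import re

def clean_text(text: str) -> str:
    # Replace newlines with spaces
    text = text.replace("\n", " ")
    # Remove digits that are attached to letters (e.g. "covid19" -> "covid")
    text = re.sub(r"(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])", "", text)
    # Convert everything to lowercase
    text = text.lower()
    # Drop non-ASCII characters
    text = text.encode("ascii", errors="ignore").decode()
    return text

print(clean_text("Breaking\nNEWS covid19 café update"))
# -> "breaking news covid caf update"
```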
- Removing stop words and stemming the text - In natural language processing, words that can be removed from a sentence without changing its meaning are called stop words, for example "a", "an", "the", "in", "on" etc. The Porter Stemming Algorithm is then used to strip common morphological and inflectional endings from words. For more detail about the algorithm you can refer to the link.
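A sketch of both steps using NLTK's English stop word list and Porter stemmer (assuming NLTK is installed; the helper name is hypothetical):

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

# Download the stop word list once (no-op if already present)
nltk.download("stopwords")

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def remove_stopwords_and_stem(text: str) -> str:
    words = text.split()
    # Keep only non-stop words and reduce each one to its stem
    stemmed = [stemmer.stem(w) for w in words if w not in stop_words]
    return " ".join(stemmed)

print(remove_stopwords_and_stem("the markets are running fast today"))
# -> "market run fast today"
```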
- Tf-idf Vectorizer - TF-IDF stands for “term frequency-inverse document frequency”: the weight assigned to each token depends not only on its frequency in a document but also on how common that term is across the entire corpus.
- Count Vectorizer - The most straightforward one: it counts the number of times a token shows up in the document and uses this value as its weight (a scikit-learn sketch of both vectorizers follows below).
For more details: Click Here
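A sketch of both vectorizers from scikit-learn, applied to the cleaned `text` column and `label` column described above (the 80/20 split and the vectorizer parameters are illustrative assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split

# Split the cleaned articles and their labels into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# TF-IDF: token weight = term frequency scaled down by document frequency
tfidf = TfidfVectorizer(stop_words="english", max_df=0.7)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# Count: token weight = raw number of occurrences in the document
count = CountVectorizer(stop_words="english")
X_train_count = count.fit_transform(X_train)
X_test_count = count.transform(X_test)
```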
- We are using three models:
- Naive Bayes Model
- Random Forest Model
- K-NN
- We use both the TF-IDF Vectorizer and the Count Vectorizer to convert our text strings into numerical representations, then initialize the Naive Bayes Model, the Random Forest Model and K-Nearest Neighbour and fit them on those representations.
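One way to initialize and fit the three models on the TF-IDF features (the same pattern applies to the Count vectorizer features; the specific hyperparameters below are assumptions):

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "K-Nearest Neighbour": KNeighborsClassifier(n_neighbors=5),
}

# Fit every model on the TF-IDF representation of the training articles
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)
```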
- At the end we compare all the different models using the confusion matrix and accuracy score from scikit-learn.
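A sketch of the comparison step with scikit-learn's metrics, continuing from the dictionary of fitted models above:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Evaluate each fitted model on the held-out TF-IDF test set
for name, model in models.items():
    y_pred = model.predict(X_test_tfidf)
    print(name)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```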