Short Name

Detect email phishing with Watson Natural Language Classifier

Short Description

Train Watson Natural Language Classifier to detect email phishing attempts.

Offering Type

Cognitive

Introduction

In this Code Pattern, we will build an app that classifies email, labeling each message as "Phishing", "Spam", or, if it does not appear suspicious, "Ham". We'll use IBM Watson Natural Language Classifier (NLC) to train a model on email examples from the EDRM Enron email dataset. The custom NLC model can be built quickly in the Web UI and integrated into our Node.js app using the Watson Developer Cloud Node.js SDK, and the app is then used from a browser.

Author

By Zia Mohammad

Code

https://github.com/IBM/nlc-email-phishing

Video

https://www.youtube.com/watch?v=vnnUYAi9Zy4

Overview

When the reader has completed this Code Pattern, they will understand how to:

Build a Watson Natural Language Classifier model using the Web UI.
Create a Node.js app that uses the NLC model to classify emails as phishing or not.
Use the Watson Developer Cloud SDK for Node.js.

Flow

User interacts with Natural Language Classifier (NLC) GUI to train the model.
EDRM data is loaded to the NLC service to provide sample emails for training.
User sends email text to the application to have it classified.
App uses Watson Natural Language Classifier to determine if text is phishing, spam, or ham.
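
The last two steps of the flow can be sketched in Node.js roughly as follows. This is a minimal illustration, not the code from the repository: `nlcClient` is a hypothetical stand-in for a Watson NLC SDK client, and the `classifierId` parameter name and the optional `result` wrapper are assumptions that depend on the SDK version. The response shape (`top_class` plus a `classes` array of `class_name` and `confidence` pairs) follows the NLC classify response.

```javascript
// Sketch only. `nlcClient` is any object whose classify() resolves to an
// NLC-style response, either bare or wrapped in { result: ... } (assumption).

// Reduce a classify response to the single label the app displays.
function summarize(body) {
  const classes = Array.isArray(body.classes) ? body.classes : [];
  // Rank a copy by confidence, highest first, so the input is not mutated.
  const ranked = [...classes].sort((a, b) => b.confidence - a.confidence);
  // Fall back to top_class when no per-class confidences are present.
  const best = ranked[0] || { class_name: body.top_class, confidence: null };
  return {
    label: best.class_name,
    confidence: best.confidence,
    // Anything other than "Ham" is treated as suspicious.
    suspicious: best.class_name !== 'Ham',
  };
}

// Send the email text to the classifier and summarize the answer.
async function classifyEmail(nlcClient, classifierId, emailText) {
  const text = emailText.trim();
  if (!text) {
    throw new Error('email text is empty');
  }
  const response = await nlcClient.classify({ classifierId, text });
  // Some SDK versions wrap the payload in `result`; accept both shapes.
  return summarize(response.result || response);
}
```

Passing the client in as an argument keeps the sketch independent of any particular SDK release and makes it easy to exercise with a stub before wiring in real credentials.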

Included components

Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
Watson Natural Language Classifier: An IBM Cloud service to interpret and classify natural language with confidence.

Featured technologies

Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
Node.js: An open-source JavaScript run-time environment for executing server-side JavaScript code.

Blog

Blog Title

Spam e-mail classification with Watson Natural Language Classifier

Blog Author

Scott D'Angelo and Zia Mohammad

Blog Content

SPAM? Not to be mistaken for the canned meat, the unwanted texts, tweets, and emails that we receive come at a cost. Enterprises around the globe deal with spam messages; the Radicati Research Group stated that spam will cost businesses $20.5B annually due to decreased productivity. Although traditional spam classifiers exist, the key differentiators required for enterprise solutions are scalability and the ability to own your own data.

So what can you do? Since we know what Spam looks like and have email examples, we can use AI tools to help us determine the nature of email content. IBM Watson Natural Language Classifier (NLC) is a good fit for this use case. By providing the training data, we give the NLC service all that it needs to determine which emails are Spam and which are not (which we label "Ham"). The convenient drag-and-drop GUI makes creating the model simple, and the Watson Developer Cloud SDKs can then be used to integrate the service into your application.
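
To make the "training data" concrete: NLC is trained from a CSV file in which each row holds an example text followed by its class label. The sketch below shows one way such rows could be prepared in Node.js. The helper names are hypothetical, and the 1024-character cap on each text value is an assumption about the service limit that should be checked against the NLC documentation.

```javascript
// Assumed per-example text limit for NLC training data.
const MAX_TEXT_LENGTH = 1024;

// Build one "text,label" CSV row from a raw email body.
function toTrainingRow(emailText, label) {
  // Collapse newlines and runs of whitespace so each example is one line.
  const flattened = emailText.replace(/\s+/g, ' ').trim().slice(0, MAX_TEXT_LENGTH);
  // Quote the text and double any embedded quotes, per usual CSV escaping.
  const quoted = '"' + flattened.replace(/"/g, '""') + '"';
  return quoted + ',' + label;
}

// Turn [{ text, label }, ...] into the body of a training CSV file.
function toTrainingCsv(examples) {
  return examples
    .filter((ex) => ex.text && ex.text.trim()) // skip empty emails
    .map((ex) => toTrainingRow(ex.text, ex.label))
    .join('\n');
}
```

For example, `toTrainingCsv([{ text: 'Verify your account now', label: 'Phishing' }])` yields a single row that can be saved to a `.csv` file and uploaded through the GUI.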

Build your own tools

Use the open source code to run through this Code Pattern and quickly get up to speed. You can adapt this code for your own purposes and use other Watson services to solve complex problems with simple-to-use AI tools.
