Ryan #864

17 changes: 8 additions & 9 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/1.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,16 @@
<!--title={Introduction}-->
<!--title={lab tool}-->

## tweepy
## lab tool

For this lab we will utilize the skills you've gained working with APIs to visualize tweets using the **tweepy** Twitter API.
Before starting this project, we need to set up the environment for this lab and create the ID and keys required to use the tool.

The idea is simple, given a topic, all hashtags with greater than 5% frequency pertaining to that topic are plotted in a pie graph. All hashtags with less than 5% frequency fall under an "Other" category.
For this lab, we will use Python and `tweepy`, a Python library for the Twitter API.

Hashtags provide an efficient way of deducing how tweeters feel about the topic they are tweeting about, since Twitter users use hashtags to summarize their tweets, often with more emotion. Therefore hashtags provide a sufficient summary of the tweet - there is a lesser need to process every character and word of a tweet if the hashtags are available.
To use the API, we need to get keys from the Twitter developer website.

By seeing the most common hashtags associated with a topic, we can evaluate what Twitter users are discussing under the scope of a greater topic and how people feel about the topic at hand. It's easy to get caught in our own echo chambers on social media, and analyzing the most common hashtags across *all* tweets for a certain topic helps us analyze the feelings behind a topic in a more objective manner.
#### Steps:

Here is an example of what we will be aiming to accomplish at the end of this lab:

![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/pieplot.png)
1. Create environment.
2. Authentication.


14 changes: 14 additions & 0 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/11.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
<!--title={Create Environment}-->

## Create Environment

Before starting this project, we need to create the environment for this lab.

For this lab, we will use Python and `tweepy`, a Python library for the Twitter API.

We will use `pip install` to install all the packages. Be sure to distinguish between `pip3` and `pip`: on systems that have both Python 2 and Python 3, `pip` may install packages for Python 2.

#### Steps:

1. Access `tweepy` API
2. Load packages
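
Before moving on, it can help to confirm that the packages are visible to the Python interpreter you are actually running. The snippet below is a minimal sketch: the helper name `missing_packages` and the package list are assumptions for illustration, not part of the lab, so adjust the list to match the lab's import card.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list (hypothetical); edit it to match the lab's imports.
required = ["tweepy", "pandas", "matplotlib", "re", "collections"]

for name in missing_packages(required):
    print(f"Not installed: {name} (try `pip install {name}`)")
```

If nothing is printed, every package in the list was found.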
13 changes: 10 additions & 3 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/111.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,14 @@
<!--title={Accessing the Tweepy API}-->

## Install tweepy and get keys
## Access the Tweepy API

To access the `tweepy` API, we need to complete two steps:

1. Install `tweepy`.

2. Get keys.

#### Install tweepy

If you're on Anaconda, installing Tweepy is simple. Just type the following:

Expand All @@ -10,8 +18,7 @@ conda install tweepy

After installing the API, you will also have to create a developer account with Twitter in order to access it. This process is quick and straightforward: just click [this](https://developer.twitter.com/en/apply-for-access.html) link to get started.

### Get keys
#### Get keys

In the previous step, registering gives you the keys you will need later. Click the "create app" button and answer a few questions.

![](https://github.com/ryansxl/xshuai/blob/master/111.png?raw=true)
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/112.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Loading Packages}-->

## Import packages
## Load Packages

The following packages will need to be installed in order to complete the necessary functions of the lab. By now you are already familiar with loading Python packages thanks to your previous labs.

Expand Down
8 changes: 5 additions & 3 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/12.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@
<!--title={Authentication}-->

## steps
## Authentication

In order to complete the various functions and methods we will perform, we need to log in to Twitter through our program.

To complete this we need to go through 3 simple steps:
To complete this we need to go through 3 simple steps.

#### Steps:

1. Define your search keys
2. Create the access token to login
3. Finally, accessing the API
3. Access the API
4 changes: 2 additions & 2 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/121.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
<!--title={Defining Keys}-->

## Get the keys
## Define Keys

Defining the keys to login is simple, the information you need is all provided when you're developer account is created (The keys you get in card 111md). Type the following to store it in a variable:
Defining the keys to log in is simple: the information you need is all provided when your developer account is created (the keys you got in card 111.md). Type the following to store them in variables:

``` python
consumer_key= 'yourkeyhere'
Expand Down
6 changes: 4 additions & 2 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/122.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
<!--title={Creating Access Token}-->

Now we must create our access token, this is a key step to complete the login process.
## Creating Access Token

Now we must create our access token, which is a key step to complete the login process.

First, store the token values in a variable:

Expand All @@ -9,7 +11,7 @@ access_token= 'yourkeyhere'
access_token_secret= 'yourkeyhere'
```

Second, use the OAuthHandler() and set_access_token() methods to create the instance that will allow login.
Second, use the `OAuthHandler()` and `set_access_token()` methods to create the instance that will allow login.

``` python
auth = tw.OAuthHandler(consumer_key, consumer_secret)
Expand Down
7 changes: 6 additions & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/123.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,13 @@
<!--title={Accessing the API}-->

Now that we have created our access token we can finally access the API, this can be done in a simple line.
## Accessing the API

Now that we have created our access token, we can finally access the API, which can be done in a single line.

``` python
api = tw.API(auth, wait_on_rate_limit=True)
```

`auth` is responsible for authenticating your access to the API via your keys.

`wait_on_rate_limit` specifies whether the call waits (instead of quitting) once you have hit the rate limit, since the API limits how many times you can call it in a given period.
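
The three authentication steps can be put together as follows. This is a minimal sketch rather than the lab's own code: the helper names `load_keys` and `make_api` and the environment variable names are hypothetical, chosen so the keys do not have to be typed into the script. The `tweepy` calls are the same ones shown in the cards above.

```python
import os


def load_keys(env=os.environ):
    """Read the four credentials from environment variables.

    The variable names below are assumptions for this sketch.
    """
    names = [
        "TWITTER_CONSUMER_KEY",
        "TWITTER_CONSUMER_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET",
    ]
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing credentials: {missing}")
    return tuple(env[name] for name in names)


def make_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Run the three steps: define keys, create the access token, access the API."""
    import tweepy as tw  # imported here so load_keys works without tweepy installed

    auth = tw.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tw.API(auth, wait_on_rate_limit=True)


# Usage, once the environment variables are set:
# api = make_api(*load_keys())
```

Keeping the keys out of the source file also avoids committing them to a repository by accident.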
13 changes: 11 additions & 2 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/2.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,16 @@
<!--title={Finding Tweets}-->

## Get data about "climate change"
## Finding Tweets

Now that we've authenticated, we're ready to search for tweets. Let's start by searching for all tweets surrounding the topic of climate change (here "climate change" is your query string).

![sample image](https://www.diggitmagazine.com/sites/default/files/styles/inline_image/public/Climate%20change%20photo_1.jpg?itok=2BfiKsqU)
![](./images/earth.png)



In this card, we will get tweet data on the topic of "climate change".

#### Steps:

1. Get tweet data on the topic of "climate change".
2. Store the data.
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/21.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={The -filter method}-->

## Get data about "climate change"
## The -filter method

To search for tweets under the climate change hashtag, we will use the -filter method.
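
The lab's actual query is not shown here, so the following is only a sketch of how such a query string might be built. It assumes the filter in question is Twitter's `-filter:retweets` search operator, which excludes retweets from the results. The helper name `build_query` is hypothetical.

```python
def build_query(hashtag, exclude_retweets=True):
    """Build a search string for a hashtag, optionally excluding retweets."""
    tag = hashtag if hashtag.startswith("#") else "#" + hashtag
    # "-filter:retweets" is assumed to be the filter this card refers to.
    return f"{tag} -filter:retweets" if exclude_retweets else tag


search_words = build_query("climatechange")
print(search_words)  # #climatechange -filter:retweets
```

Excluding retweets keeps the same tweet from being counted many times in the frequency analysis.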

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/22.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
<!--title={Using Tweets}-->

## Store the data
## Using Tweets

Now that we've found the recent tweets containing the hashtags that we will eventually analyze, we need to store the tweets in an organized manner for analysis.
2 changes: 2 additions & 0 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/221.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
<!--title={Grabbing 1000 Recent Tweets}-->

## Grabbing 1000 Recent Tweets

For our analysis we need a large enough sample for credible findings. We will grab 1,000 tweets under the climate change hashtag.

To accomplish this, we will use the Cursor method to iterate through the tweets. You may remember seeing this method in a previous lab.
Expand Down
2 changes: 2 additions & 0 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/222.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
<!--title={Adding Tweets to a List}-->

## Adding Tweets to a List

Now we can use a list comprehension to iterate through the items we just found and collect them in a list.

``` python
Expand Down
6 changes: 3 additions & 3 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/3.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
<!--title={Cleaning Tweets}-->

## Deal with Data
## Cleaning Tweets

As you saw from the output of our lists there are links to the tweets. While this may be nice to track the source of the tweets it will be a hinderance when parsing through the list for analysis.
The output of our lists contains links to the tweets. Although these may be useful for tracking the source of the tweets, they will be a hindrance when parsing the list for analysis.

We will use regular expressions to accomplish the data cleaning. Throughout the previous labs you have gone through you may by now know that cleaning data is the longest portion of analysis projects.
We will use regular expressions to clean the data. From the previous labs, you already know that cleaning data is the first stage of an analysis project.
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/311.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Using Regular Expressions}-->

## Module re
## Using Regular Expressions

You may remember seeing ```import re``` while we were loading our packages earlier. Re stands for ```regular expressions```. Regular expressions are a special syntax that is used to identify patterns in a string.
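
As a small illustration of pattern matching, the sketch below pulls the hashtags out of a made-up tweet. The tweet text is invented for this example, and the pattern `#\w+` (a `#` followed by one or more word characters) is one simple way to describe a hashtag.

```python
import re

# Invented example tweet, not real data.
tweet = "Heat records fall again #ClimateChange #climateaction https://t.co/abc123"

# "#\w+" matches a "#" followed by one or more letters, digits, or underscores.
hashtags = re.findall(r"#\w+", tweet)
print(hashtags)  # ['#ClimateChange', '#climateaction']
```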

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/312.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={re.sub method}-->

## re.sub
## re.sub method

`re.sub` allows you to substitute a selection of characters defined using a regular expression, with something else.
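
A short example of the call shape, `re.sub(pattern, replacement, string)`, using an invented sentence: every run of digits is replaced with the letter `N`.

```python
import re

text = "Sea level rose 3 mm in 2019"

# Replace each run of digits (\d+) with the placeholder "N".
masked = re.sub(r"\d+", "N", text)
print(masked)  # Sea level rose N mm in N
```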

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/313.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Creating a remove_url function}-->

## Create a "remove" function
## Creating a remove_url function

Using the `re.sub` method we just looked at, we can create a function that removes URLs from the items in our list.
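
The lab's own function is not reproduced here. As a minimal sketch of the idea, one possible `remove_url` could look like this. The pattern is an assumption: it treats anything starting with `http://`, `https://`, or `www.` and running up to the next whitespace as a URL.

```python
import re

# Assumed URL pattern for this sketch: a scheme or "www." followed by non-space characters.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_url(text):
    """Remove URLs from text and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())


print(remove_url("Act now https://t.co/abc #climatechange"))  # Act now #climatechange
```

Splitting and re-joining on whitespace removes the double space that would otherwise be left where the URL used to be.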

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/32.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Creating a List of Clean Tweets}-->

## Delete URL in data
## Creating a List of Clean Tweets

Now that we have finished removing URLs from our tweets, we can add them to a list for analysis.

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/33.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Addressing Case Issues}-->

## Deal with list
## Addressing Case Issues

Another challenge we will address is capitalization, which complicates the analysis of text data: the same word written with different capitalization is counted as different words.

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/331.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={.lower() Method}-->

## Make word lowercase
## .lower() Method

To begin to remedy this issue we can make each word lowercase using the string method `.lower()`. In the code below, this method is applied using a list comprehension.
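
A minimal sketch of the idea, using an invented list of words (the variable names are assumptions for illustration):

```python
words = ["#ClimateChange", "NOW", "Earth"]

# Apply .lower() to every word with a list comprehension.
words_lower = [word.lower() for word in words]
print(words_lower)  # ['#climatechange', 'now', 'earth']
```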

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/332.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={set() Method}-->

## Unique list
## set() Method

Now all of the words in your list are lowercase. You can again use the `set()` function to return only unique words.
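
For instance, with an invented list in which one hashtag appears twice, `set()` keeps a single copy of each word. Sets are unordered, so the sketch sorts the result before printing it.

```python
words_lower = ["#climatechange", "earth", "#climatechange", "now"]

# set() drops duplicates; the order of a set is not guaranteed.
unique_words = set(words_lower)

print(len(words_lower), len(unique_words))  # 4 3
print(sorted(unique_words))  # ['#climatechange', 'earth', 'now']
```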

Expand Down
2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/333.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Creating a List of Lower Case Words from Tweets}-->

## Deal with list
## Creating a List of Lower Case Words from Tweets

Right now, you have a list of lists that contains each full tweet and you know how to lowercase the words.
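
Combining the two ideas might look like the sketch below, which assumes the cleaned tweets are plain strings. The variable names `clean_tweets` and `words_in_tweet` and the tweet texts are made up for this example.

```python
# Invented cleaned tweets (URLs already removed).
clean_tweets = [
    "Act NOW #ClimateChange",
    "The Earth is warming #climatechange",
]

# Lowercase each tweet, then split it into a list of words.
words_in_tweet = [tweet.lower().split() for tweet in clean_tweets]
print(words_in_tweet)
# [['act', 'now', '#climatechange'], ['the', 'earth', 'is', 'warming', '#climatechange']]
```

Each inner list holds the lowercase words of one tweet, so both spellings of the hashtag now match.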

Expand Down
6 changes: 4 additions & 2 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/4.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
<!--title={Calculating Hashtag Frequency}-->

Now we will incorporate some elementary math to enable us to display the frequencies of each hashtag and plot it as you will see later.
## Calculating Hashtag Frequency

To get the count of how many times each word appears in the sample, you can use the built-in `Python` library `collections`, which helps create a special type of a `Python dictionary.`
Now we will use some elementary math to compute the frequency of each hashtag, which we will plot later.

To get the count of how many times each word appears in the sample, you can use the built-in `Python` library `collections`, which helps us create a special type of `Python` dictionary.
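
The special dictionary in question is `collections.Counter`. The sketch below assumes the nested list-of-lists layout from the earlier cards, flattens it with `itertools.chain`, and keeps only the words that start with `#`. The data and variable names are invented for illustration.

```python
import collections
import itertools

# Invented data in the list-of-lists layout (one inner list per tweet).
words_in_tweet = [
    ["act", "#climatechange"],
    ["#climatechange", "#cop26"],
    ["#cop26", "#climatechange"],
]

# Flatten the nested lists into one list of words.
all_words = list(itertools.chain.from_iterable(words_in_tweet))

# Keep only hashtags, then count how often each one appears.
hashtags = [word for word in all_words if word.startswith("#")]
counts = collections.Counter(hashtags)

print(counts.most_common(2))  # [('#climatechange', 3), ('#cop26', 2)]
```

`most_common(n)` returns the `n` most frequent items as `(item, count)` pairs, most frequent first.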

2 changes: 1 addition & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/5.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
<!--title={Plotting Hashtag Frequency}-->

## Visualize the data
## Plot Hashtag Frequency

Now that we have cleaned the data (seemingly) we can plot it to show our findings!

Expand Down
2 changes: 2 additions & 0 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/51.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
<!--title={Using the pd.DataFrame}-->

## Using the pd.DataFrame

Based on the counter, you can create a `Pandas Dataframe` for analysis and plotting that includes only the top 15 most common words.

``` python
Expand Down
4 changes: 3 additions & 1 deletion Module_Twitter_API/labs/Twitter Hashtag Frequency/52.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
<!--title={Creating a Visualization}-->

## Creating a Visualization

Using this `Pandas Dataframe`, you can create a horizontal bar graph of the top 15 most common words in the tweets as shown below.

```python
Expand All @@ -20,5 +22,5 @@ These are simple commands and parameters that we have encountered before. The plo

With that, we are now done! Below is the output of the common words found in our Tweets.

![Imgur](https://i.imgur.com/GloG9zm.png)
![Imgur](./images/result.png)

30 changes: 30 additions & 0 deletions Module_Twitter_API/labs/Twitter Hashtag Frequency/readme.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
<!--title={Introduction}-->

For this lab we will utilize the skills you've gained working with APIs to visualize tweets using the **tweepy** Twitter API.

The idea is simple: given a topic, all hashtags with greater than 5% frequency pertaining to that topic are plotted in a pie chart. All hashtags with less than 5% frequency fall under an "Other" category.
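
The 5% rule can be sketched in a few lines. This is an illustration under stated assumptions, not the lab's implementation: the helper name `group_rare` and the counts are made up, and a hashtag at exactly 5% is placed in "Other", since the description leaves that case open.

```python
from collections import Counter


def group_rare(counts, threshold=0.05):
    """Merge hashtags whose share is not above the threshold into 'Other'."""
    total = sum(counts.values())
    grouped = Counter()
    if total == 0:
        return grouped
    for tag, n in counts.items():
        key = tag if n / total > threshold else "Other"
        grouped[key] += n
    return grouped


# Invented counts that sum to 100, so each count is also a percentage.
counts = Counter({"#climatechange": 60, "#cop26": 36, "#earth": 3, "#green": 1})
print(dict(group_rare(counts)))
# {'#climatechange': 60, '#cop26': 36, 'Other': 4}
```

The grouped counts are what would be passed to the pie chart, one slice per key.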

Hashtags provide an efficient way of deducing how tweeters feel about the topic they are tweeting about, since Twitter users use hashtags to summarize their tweets, often with more emotion. Hashtags therefore provide a sufficient summary of the tweet: there is less need to process every character and word of a tweet if the hashtags are available.

By seeing the most common hashtags associated with a topic, we can evaluate what Twitter users are discussing under the scope of a greater topic and how people feel about the topic at hand. It's easy to get caught in our own echo chambers on social media, and analyzing the most common hashtags across *all* tweets for a certain topic helps us analyze the feelings behind a topic in a more objective manner.

By the end of this card, we will visualize the frequency of hashtags in tweets. To achieve this goal, there are several steps to complete.

#### Steps:

1. Install lab tool

2. Find tweets

3. Clean tweet data

4. Calculate hashtag frequency

5. Plot hashtag frequency

#### Examples:

Here is an example of what we will be aiming to accomplish at the end of this lab:

![](./images/pieplot.png)