diff --git a/.gitignore b/.gitignore index 0fa494df..c098e050 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,3 @@ - Module4_Labs/.DS_Store Module4_Labs/Lab2_Doubly_Linked_List/.DS_Store Module4_Labs/Lab3_File_System/.DS_Store diff --git a/Module3_Labs/Lab3_Minesweeper/Cards/2.md b/Module3_Labs/Lab3_Minesweeper/Cards/2.md index b40fb616..24043d50 100644 --- a/Module3_Labs/Lab3_Minesweeper/Cards/2.md +++ b/Module3_Labs/Lab3_Minesweeper/Cards/2.md @@ -6,15 +6,15 @@ Here are a couple examples of how your output should look: -![Capture](C:\Users\kevin\Documents\Programming\Darlene\Rewritten\Minesweeper\Minesweeper\Capture.PNG) +![Capture](./Capture.PNG) After the first move "55", then the board looks like this: -![Capture(2)](C:\Users\kevin\Documents\Programming\Darlene\Rewritten\Minesweeper\Minesweeper\Capture(2).PNG) +![Capture(2)](./Capture(2).PNG) A little further down the game, I have made some more moves, and I know a flag is at "36". If I want to place a flag there, it would look like this: -![Capture(3)](C:\Users\kevin\Documents\Programming\Darlene\Rewritten\Minesweeper\Minesweeper\Capture(3).PNG) +![Capture(3)](./Capture(3).PNG) Notice the "F" at column 3, row 6. 
diff --git a/Module4.2_Intermediate_Data_Structures/activities/Act3_Binary Heaps/1.md b/Module4.2_Intermediate_Data_Structures/activities/Act3_Binary Heaps/1.md index fc2c2790..7061aa1e 100644 --- a/Module4.2_Intermediate_Data_Structures/activities/Act3_Binary Heaps/1.md +++ b/Module4.2_Intermediate_Data_Structures/activities/Act3_Binary Heaps/1.md @@ -54,5 +54,3 @@ class BinHeap: - - diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/1.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/1.md new file mode 100644 index 00000000..a25f4583 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/1.md @@ -0,0 +1,14 @@ + + +For this lab, we will be analyzing tweets using Twitter's API and the `tweepy` package in Python by creating a Word Cloud to observe the frequency of words used in a celebrity's tweets. + +Analyzing the words used by a celebrity in an increasingly cluttered social media world has many uses. + +In this day and age, a prominent social media presence can make all the difference for a celebrity's public persona. A celebrity can use social media to generate excitement from millions of fans on Twitter, and if done right, propel their fame to new heights. Conversely, celebrities have to be careful about what they post on a site like Twitter, because one offensive tweet can go viral for the wrong reasons and destroy their reputation, not just on social media but in real life as well. Therefore, the words that celebrities choose when tweeting are vitally important to cultivating an online persona and propelling their fame. Seeing a word cloud of the words in their tweets can help us start to find common trends and determine what kind of persona they wish to project on social media. + +To do this in Python using `tweepy`, you should start by importing the necessary libraries.
The libraries we will be using are: + +`tweepy`, `pandas`, `sys`, `csv`, `WordCloud` and `STOPWORDS` from `wordcloud`, `matplotlib`, `matplotlib.pyplot`, `string`, `re`, `PIL` + +The next thing we want is to be able to use the python-twitter API client. To do that, you need to acquire and declare a set of application tokens. Name the tokens `consumer_key`, `consumer_secret`, `access_token_key`, and `access_token_secret`. + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/11.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/11.md new file mode 100644 index 00000000..4c384a8d --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/11.md @@ -0,0 +1,24 @@ + + +Import the following libraries: + +- `tweepy` + +- `pandas` + +- `sys` + +- `csv` + +- `WordCloud` and `STOPWORDS` from `wordcloud` + +- `matplotlib` + +- `matplotlib.pyplot` + +- `string` + +- `re` + +- `PIL` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/111.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/111.md new file mode 100644 index 00000000..713c4a5f --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/111.md @@ -0,0 +1,36 @@ + + +An example of importing libraries is as below: + +```python +import math +``` + +You can import a library and give it a preferred name, as below: + +```python +import math as mt +``` + +Import the following libraries: + +- `tweepy` + +- `pandas` + +- `sys` + +- `csv` + +- `WordCloud` and `STOPWORDS` from `wordcloud` + +- `matplotlib` + +- `matplotlib.pyplot` + +- `string` + +- `re` + +- `PIL` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/12.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/12.md new file mode 100644 index 00000000..5ba73d1a --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/12.md @@ -0,0 +1,7 @@ + + +Head over to your
Twitter Developer Account and create an app. After your app is created, you will see a new page that shows all the information you need. + +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part2.png) + +Copy the information needed and declare `consumer_key`, `consumer_secret`, `access_token_key`, and `access_token_secret`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/121.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/121.md new file mode 100644 index 00000000..574b7cdd --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/121.md @@ -0,0 +1,6 @@ + + +Head over to your Twitter Developer Account and create an app. Fill out the fields on the next page that looks like this: + +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part1.png) + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/122.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/122.md new file mode 100644 index 00000000..4af68c35 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/122.md @@ -0,0 +1,17 @@ + + +After your app is created, you will see a new page that shows all the information you need. 
+ +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part2.png) + +Copy the information needed and declare the keys as follows: + +```python +consumer_key = '' +consumer_secret = '' +access_token_key = '' +access_token_secret = '' +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/2.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/2.md new file mode 100644 index 00000000..1f43d941 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/2.md @@ -0,0 +1,8 @@ + + +The next thing we want to do is to create a function to extract tweets. Define a function `get_tweets(username)` that obtains the tweets from the user whose username is passed in as the argument. + +In our `get_tweets(username)` function, we first authorize with our consumer key and consumer secret by declaring the variable `auth` using the `tweepy.OAuthHandler` function imported from `tweepy`. Then, we set the user's access key and access secret by using the `auth.set_access_token` function. + +After we are done with that, we call the API by declaring the variable `api` using the `tweepy.API` function. + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/21.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/21.md new file mode 100644 index 00000000..178e89d5 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/21.md @@ -0,0 +1,7 @@ + + +Define a function called `get_tweet()` which takes the parameter `username`. Then, authorize with the consumer key and consumer secret by using the function `OAuthHandler()` from the `tweepy` library. + +The next thing we want to do is to gain access to the access key and access secret. We do so by using the `set_access_token()` function. + +Once we are done with the authorization procedure, we create the API client by calling `tweepy.API()`.
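The steps above can be assembled into one sketch. This is a minimal illustration, not the lab's definitive solution: the placeholder token strings are hypothetical, and the `tweepy` import is placed inside the function only so the sketch stands alone (normally it would sit at the top of the file).

```python
def get_tweet(username):
    # Deferred import so this sketch is self-contained; normally
    # `import tweepy` goes at the top of the module.
    import tweepy

    # Hypothetical placeholder tokens; paste in your real app credentials.
    consumer_key, consumer_secret = '', ''
    access_token_key, access_token_secret = '', ''

    # Authorize with the consumer key/secret, attach the access tokens,
    # then build the API client.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token_key, access_token_secret)
    api = tweepy.API(auth)
    return api
```

The rest of the card's work (fetching and saving tweets) would go inside this same function after `api` is created.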
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/211.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/211.md new file mode 100644 index 00000000..f0f50ef6 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/211.md @@ -0,0 +1,20 @@ + + +Define a function called `get_tweet()` which takes the parameter `username` as below: + +```python +def get_tweet(username): +``` + +Under that function, we authorize with the consumer key and consumer secret by using the function `OAuthHandler()` from the `tweepy` library as below: + +```python +auth = tweepy.OAuthHandler(consumer_key,consumer_secret) +``` + +After that, we want to gain access to the access key and the access secret. Use the `set_access_token()` function as follows: + +```python +auth.set_access_token(access_token_key,access_token_secret) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/212.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/212.md new file mode 100644 index 00000000..409c22de --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/212.md @@ -0,0 +1,8 @@ + + +The next thing we want to move on to is to call the Twitter API. Do so with the following: + +```python +api = tweepy.API(auth) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/3.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/3.md new file mode 100644 index 00000000..96345795 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/3.md @@ -0,0 +1,9 @@ + + +For our next step, we want to obtain a number of tweets from the user and write the resulting list of tweets to a new .csv file. + +To do that, we first declare an empty list and name it `tfile`. Then, create a for loop to access the items in `tweepy.Cursor()` and append tweet data into the `tfile` list.
The information that we want to append to `tfile` consists of `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`. + +Once we have all the data we need in our `tfile` list, we copy it into a new .csv file. Declare a variable `outfile` that names our new .csv file. Copy the data from `tfile` into the .csv file by using the `open` and `writerow` functions. + +Once we are done with this, we have completed our `get_tweet()` function and we can move on to define our main function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/31.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/31.md new file mode 100644 index 00000000..7468015e --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/31.md @@ -0,0 +1,4 @@ + + +Create a list called `tfile`. Then, create a `for` loop to access the items in the user's timeline by calling `tweepy.Cursor(api.user_timeline,screen_name=username).items()`. Within the `for` loop, use the `append()` function on `tfile` to append `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`.
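As a rough illustration of the row built for each tweet, here is a sketch that uses a stand-in object in place of a real `tweepy` tweet; the sample values are made up, and real `tweepy` tweet objects may spell the favourites attribute `favorite_count`:

```python
from types import SimpleNamespace

# Stand-in for one tweet yielded by tweepy.Cursor(...).items();
# the attribute names mirror the ones listed in this card.
tweet = SimpleNamespace(id_str="123", source="web", created_at="2019-01-01",
                        retweet_count=2, favourite_count=5, text="hello world")
username = "some_celebrity"  # hypothetical handle

tfile = []
# Appending one list per tweet keeps each row's fields together,
# which suits csv.writer later on.
tfile.append([username, tweet.id_str, tweet.source, tweet.created_at,
              tweet.retweet_count, tweet.favourite_count,
              tweet.text.encode("utf-8")])
```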
+ + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/311.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/311.md new file mode 100644 index 00000000..d6fee25b --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/311.md @@ -0,0 +1,21 @@ + + +Create a list and name it `tfile` by doing the following: + +```python +tfile = [] +``` + +Then, use a for loop to access the items in the user's timeline: + +```python +for tweet in tweepy.Cursor(api.user_timeline,screen_name=username).items(): +``` + +In the for loop, add to `tfile` with the `append()` function: + +```python +tfile.append() +``` + +The data that we want to append are `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/32.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/32.md new file mode 100644 index 00000000..fe884e5c --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/32.md @@ -0,0 +1,9 @@ + + +After obtaining tweet data into `tfile`, we want to copy the data into a .csv file. To do so, we create a .csv file and open it by using the following code: + +```python +with open(outfile,'w+') as file: +``` + +Then, copy the data from `tfile` by using the `writerows(tfile)` function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/321.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/321.md new file mode 100644 index 00000000..9d428e0b --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/321.md @@ -0,0 +1,3 @@ + + +To create a .csv file, we declare a variable `outfile` and store the name of the .csv file as `username + "_tweets_V1.csv"`.
Then type `print("writing to " + outfile)` on the following line to confirm which file we are writing to. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/322.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/322.md new file mode 100644 index 00000000..843bb4de --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/322.md @@ -0,0 +1,18 @@ + + +Open and write to the .csv file by using the following line of code: + +```python +with open(outfile,'w+') as file: +``` + +Under the opened file, use the `csv.writer(file, delimiter=',')` function to specify how our data should be separated. In this case, we want the fields to be separated by a comma. Assign the result to a variable called `writer`. + +Using `writer`, we want to write to our .csv file. To make our data tidy and easy to understand, we write the categories on the first row of the .csv file and then add the data from `tfile` in the rows below it as shown: + +```python +writer.writerow(['User_Name','Tweet_ID','Source','Created_date','Retweet_count', + 'Favourite_count','Tweet']) +writer.writerows(tfile) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/4.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/4.md new file mode 100644 index 00000000..4fb16204 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/4.md @@ -0,0 +1,12 @@ + + +The next thing we will move on to is to create our main function. This function takes the tweets we wrote to a .csv file in the previous function, cleanses them, and outputs a Wordcloud based on the most frequently repeated words. + +To do that, we first define our `main()` function. In that function, we start by obtaining the tweet-filled .csv file with the `get_tweets()` function we defined earlier.
+ +Then, you should pick a celebrity whose tweets you want to examine, and pass the celebrity's Twitter handle into `get_tweets()`. + +Then, use the `read_csv()` function from the `pandas` library. + +Please leave the code utilizing the `re` library inside of `main()`. Also, please make sure you name the object returned by `read_csv` "bg". Bear in mind the "cleaned" data will be in the DataFrame "bg3". + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/41.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/41.md new file mode 100644 index 00000000..e24d4367 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/41.md @@ -0,0 +1,3 @@ + + +The next step is to define a `main()` function and do the rest of our tasks there. To read the .csv file that we generated from the `get_tweets()` function, we declare a variable `bg` that calls the function `read_csv()` from the `pandas` library. Print out the first 5 rows of data from `bg` using the `head()` function.
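As a hedged sketch of the `read_csv()` and `head()` steps, here is a runnable example that substitutes a tiny inline CSV for the real generated file (the column names and values are made up for illustration):

```python
import io
import pandas as pd

# Toy stand-in for the generated .csv file; the real code would pass
# the outfile name produced by get_tweets() instead of a StringIO.
csv_text = "User_Name,Tweet\nsome_celebrity,hello\nsome_celebrity,world\n"
bg = pd.read_csv(io.StringIO(csv_text), encoding='utf-8')

# Print the first rows to confirm the file was read correctly.
print(bg.head())
```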
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/411.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/411.md new file mode 100644 index 00000000..d6b0fb37 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/411.md @@ -0,0 +1,16 @@ + + +The next thing we want to do is to define a `main()` function as follows: + +```python +def main(): +``` + + + +To generate the .csv file under the `main()` function that obtains tweets from a certain user, call the function `get_tweets()` as below: + +```python +get_tweets() +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/412.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/412.md new file mode 100644 index 00000000..49648350 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/412.md @@ -0,0 +1,14 @@ + + +Read the .csv file generated by the `get_tweets()` function by declaring a variable that calls the `read_csv()` function. The `read_csv()` function works like this: + +```python +bg = pd.read_csv(,encoding='utf-8') +``` + +Print the first `n` rows from your .csv file to make sure everything is going smoothly by using the `print()` and `head()` function like this: + +```python +print(bg.head(n)) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/42.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/42.md new file mode 100644 index 00000000..72c0962e --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/42.md @@ -0,0 +1,3 @@ + + +After we have obtained our cleansed tweets in `bg2`, we create a new variable `bg3` that makes `bg2` into a data frame using the `DataFrame` function from the `pandas` library. Print out `bg3` to check for the right data frame output. 
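A small sketch of the `DataFrame` step, using a made-up `bg2` list standing in for the cleansed tweets:

```python
import pandas as pd

# Hypothetical cleansed tweets standing in for bg2.
bg2 = ["great show tonight", "thanks for the support"]

# Wrap the cleaned strings in a one-column DataFrame named 'tweet'.
bg3 = pd.DataFrame(bg2, columns=['tweet'])
print(bg3)  # check that the data frame looks right
```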
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/421.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/421.md new file mode 100644 index 00000000..69045e59 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/421.md @@ -0,0 +1,9 @@ + + +After obtaining our cleansed tweets in `bg2`, we create a new variable `bg3` to form a data frame for `bg2` as follows: + +```python +bg3 = pd.DataFrame(bg2, columns = ['tweet']) +``` + +Additionally, print out `bg3` to make sure we have the data frame output that we want. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/5.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/5.md new file mode 100644 index 00000000..f9b27878 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/5.md @@ -0,0 +1,3 @@ + + +Our last main step is to create a Wordcloud based on the data frame of cleansed tweets. To do that, use functions from the `matplotlib`, `wordcloud`, and `matplotlib.pyplot` libraries. After that, compile your entire code and you are done! \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/51.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/51.md new file mode 100644 index 00000000..9daca4b0 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/51.md @@ -0,0 +1,4 @@ + + +We start by setting the parameters of our Wordcloud plot. Use the `rcParams` dictionary from the `matplotlib` library to do so. The parameters we want to set are `figure.figsize`, `font.size`, `savefig.dpi`, and `figure.subplot.bottom`.
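For example, the four parameters might be set like this; the specific values here are illustrative, not prescribed by the lab:

```python
import matplotlib as mpl

# rcParams is a dictionary of plot defaults; assigning to its keys
# changes the settings for every figure created afterwards.
mpl.rcParams['figure.figsize'] = (12, 8)      # width, height in inches
mpl.rcParams['font.size'] = 12                # base font size in points
mpl.rcParams['savefig.dpi'] = 100             # resolution for saved figures
mpl.rcParams['figure.subplot.bottom'] = 0.1   # bottom margin of subplots
```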
+ diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/511.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/511.md new file mode 100644 index 00000000..55ab9579 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/511.md @@ -0,0 +1,9 @@ + + +We start by setting the parameters of our Wordcloud plot. Use the `rcParams` dictionary from the `matplotlib` library to do so, as follows: + +```python +mpl.rcParams[''] = +``` + +The parameters we want to set are `figure.figsize`, `font.size`, `savefig.dpi`, and `figure.subplot.bottom`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/52.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/52.md new file mode 100644 index 00000000..1f5f0007 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/52.md @@ -0,0 +1,7 @@ + + +The next thing we want to do is to create the Wordcloud using `STOPWORDS` and `WordCloud` from the `wordcloud` library. + +Set the stopwords using `set(STOPWORDS)`. Then, create a variable `text` to join all the tweets in `bg3`. + +Create the wordcloud using the function `WordCloud().generate(str(text))`. The parameters we want to set in the `WordCloud()` constructor are `background_color`, `stopwords`, `max_words`, `max_font_size`, and `random_state`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/521.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/521.md new file mode 100644 index 00000000..53e41333 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/521.md @@ -0,0 +1,16 @@ + + +Now, we want to create the Wordcloud using `STOPWORDS` and `WordCloud` from the `wordcloud` library.
We start by setting the stopwords, as follows: + +```python +stopwords = set(STOPWORDS) +``` + +Next, create a variable `text` that joins all the tweets in `bg3`, separated with a space: + +```python +text = " ".join() +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/522.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/522.md new file mode 100644 index 00000000..c9a18ce8 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/522.md @@ -0,0 +1,21 @@ + + +We want to set the parameters in our wordcloud by doing the following: + +```python +cloud = WordCloud( + background_color = , + stopwords = stopwords, + max_words = , + max_font_size = , + random_state = ) +``` + +After that, we generate the wordcloud as follows: + +```python +wordcloud = cloud.generate(str(text)) +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/53.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/53.md new file mode 100644 index 00000000..a960cafd --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/53.md @@ -0,0 +1,10 @@ + + +After we have created our Wordcloud, we want to display it. To do so, we use functions from the `matplotlib.pyplot` library below: + +- `matplotlib.pyplot.figure()` +- `matplotlib.pyplot.imshow()` +- `matplotlib.pyplot.axis()` +- `matplotlib.pyplot.show()` + +Once you have done the above, you can choose to add another line of code to save the Wordcloud you generated with the `savefig()` function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/531.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/531.md new file mode 100644 index 00000000..c4f5bb76 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/531.md @@ -0,0 +1,22 @@ + + +After we have created our Wordcloud, we want to display it. 
To do so, we use functions from the `matplotlib.pyplot` library. We start by plotting a figure: + +```python +fig = matplotlib.pyplot.figure(1) +``` + +Then, we adjust some parameters and display our wordcloud by doing the following: + +```python +matplotlib.pyplot.imshow() +matplotlib.pyplot.axis('off') +matplotlib.pyplot.show() +``` + +Once you have done the above, you can choose to add another line of code to save the Wordcloud you generated with the `savefig()` function as shown: + +```python +fig.savefig("",dpi=1400) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/54.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/54.md new file mode 100644 index 00000000..890c6e96 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/54.md @@ -0,0 +1,3 @@ + + +Now that we are done with all our code, we can compile it all together and run it. Congratulations for successfully generating a wordcloud to visualize tweets! \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/541.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/541.md new file mode 100644 index 00000000..caee57af --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Celebrities/541.md @@ -0,0 +1,28 @@ + + +Compile your code in the following manner and you are done! Congratulations! 
+ +```python +#import your libraries +import + +#declare your keys +consumer_key = +consumer_secret = +access_token_key = +access_token_secret = + +#Function to extract tweets +def get_tweets(username): + + + +#Function to generate Wordcloud +def main(): + + + +#Call the main() function +main() +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/1.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/1.md new file mode 100644 index 00000000..02db1ccd --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/1.md @@ -0,0 +1,12 @@ + + +For this lab, we will be analyzing tweets using Twitter's API and the `tweepy` package in Python by creating a Word Cloud to observe the frequency of words used in a company's tweets. + +Analyzing the words used by a company in an increasingly cluttered social media world has many uses. Companies use social media to generate excitement around their products and increase awareness of their company, so the words they choose in their tweets can provide plenty of insight into the marketing strategies and values of companies on Twitter. How do companies advertise their products? How often do they attack the products of other companies? What kind of feelings are they trying to conjure in relation to their products? Seeing a word cloud of their most common words is an easy way to help us paint a picture of a company's marketing on Twitter. + +To do this in Python using `tweepy`, you should start by importing the necessary libraries. The libraries we will be using are: + +`tweepy`, `pandas`, `sys`, `csv`, `WordCloud` and `STOPWORDS` from `wordcloud`, `matplotlib`, `matplotlib.pyplot`, `string`, `re`, `PIL` + +The next thing we want is to be able to use the python-twitter API client. To do that, you need to acquire and declare a set of application tokens. Name the tokens `consumer_key`, `consumer_secret`, `access_token_key`, and `access_token_secret`.
+ diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/11.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/11.md new file mode 100644 index 00000000..4c384a8d --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/11.md @@ -0,0 +1,24 @@ + + +Import the following libraries: + +- `tweepy` + +- `pandas` + +- `sys` + +- `csv` + +- `WordCloud` and `STOPWORDS` from `wordcloud` + +- `matplotlib` + +- `matplotlib.pyplot` + +- `string` + +- `re` + +- `PIL` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/111.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/111.md new file mode 100644 index 00000000..713c4a5f --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/111.md @@ -0,0 +1,36 @@ + + +An example of importing libraries is as below: + +```python +import math +``` + +You can import a library and give it a preferred name, as below: + +```python +import math as mt +``` + +Import the following libraries: + +- `tweepy` + +- `pandas` + +- `sys` + +- `csv` + +- `WordCloud` and `STOPWORDS` from `wordcloud` + +- `matplotlib` + +- `matplotlib.pyplot` + +- `string` + +- `re` + +- `PIL` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/12.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/12.md new file mode 100644 index 00000000..5ba73d1a --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/12.md @@ -0,0 +1,7 @@ + + +Head over to your Twitter Developer Account and create an app. After your app is created, you will see a new page that shows all the information you need. + +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part2.png) + +Copy the information needed and declare `consumer_key`, `consumer_secret`, `access_token_key`, and `access_token_secret`. 
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/121.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/121.md new file mode 100644 index 00000000..574b7cdd --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/121.md @@ -0,0 +1,6 @@ + + +Head over to your Twitter Developer Account and create an app. Fill out the fields on the next page that looks like this: + +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part1.png) + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/122.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/122.md new file mode 100644 index 00000000..4af68c35 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/122.md @@ -0,0 +1,17 @@ + + +After your app is created, you will see a new page that shows all the information you need. + +![alt](https://python-twitter.readthedocs.io/en/latest/_images/python-twitter-app-creation-part2.png) + +Copy the information needed and declare the keys as follows: + +```python +consumer_key = '' +consumer_secret = '' +access_token_key = '' +access_token_secret = '' +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/2.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/2.md new file mode 100644 index 00000000..1f43d941 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/2.md @@ -0,0 +1,8 @@ + + +The next thing we want to do is to create a function to extract tweets. Define a function `get_tweets(username)` that obtains the tweets from the user with username indicated in the parenthesis. + +In our `get_tweets(username)` function, we first get authorization to our consumer key and consumer secret by declaring the variable `auth` using the `tweepy.OAuthHandler` function imported from `tweepy`. 
Then, we set the user's access key and access secret by using the `auth.set_access_token` function. + +After we are done with that, we call the API by declaring the variable `api` using the `tweepy.API` function. + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/21.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/21.md new file mode 100644 index 00000000..178e89d5 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/21.md @@ -0,0 +1,7 @@ + + +Define a function called `get_tweet()` which takes the parameter `username`. Then, authorize with the consumer key and consumer secret by using the function `OAuthHandler()` from the `tweepy` library. + +The next thing we want to do is to gain access to the access key and access secret. We do so by using the `set_access_token()` function. + +Once we are done with the authorization procedure, we create the API client by calling `tweepy.API()`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/211.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/211.md new file mode 100644 index 00000000..f0f50ef6 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/211.md @@ -0,0 +1,20 @@ + + +Define a function called `get_tweet()` which takes the parameter `username` as below: + +```python +def get_tweet(username): +``` + +Under that function, we authorize with the consumer key and consumer secret by using the function `OAuthHandler()` from the `tweepy` library as below: + +```python +auth = tweepy.OAuthHandler(consumer_key,consumer_secret) +``` + +After that, we want to gain access to the access key and the access secret.
Use the `set_access_token()` function to do the following: + +```python +auth.set_access_token(access_token_key,access_token_secret) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/212.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/212.md new file mode 100644 index 00000000..409c22de --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/212.md @@ -0,0 +1,8 @@ + + +The next thing we want to move on to is to call the Twitter API. Do so with the following: + +```python +api = tweepy.API(auth) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/3.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/3.md new file mode 100644 index 00000000..96345795 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/3.md @@ -0,0 +1,9 @@ + + +For our next step, we want to obtain a number of tweets from the user and write them from the list into a new .csv file. + +To do that, we first declare an empty list and name it `tfile`. Then, create a for loop to access the items in `tweepy.Cursor()` and append tweet data into the `tfile` list. The information that we want to append into `tfile` is `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`. + +Once we have all the data we need in our `tfile` list, we copy it into a new .csv file. Declare a variable `outfile` that names our new .csv file. Copy the data from `tfile` into the .csv file by using the `open` and `writerow` functions. + +Once we are done with this, we have completed our `get_tweet()` function and we can move on to define our main function.
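The csv-writing steps described above can be illustrated with the standard library alone. This is a minimal sketch with made-up sample rows standing in for real tweet data; `tfile`, `outfile`, and the shortened column list are assumptions following the lab's naming:

```python
import csv

# Hypothetical rows standing in for collected tweet data
tfile = [
    ["exampleuser", "101", "first example tweet"],
    ["exampleuser", "102", "second example tweet"],
]

# Name the output file after the user, as the lab suggests
outfile = "exampleuser" + "_tweets_V1.csv"

with open(outfile, "w+", newline="") as file:
    writer = csv.writer(file, delimiter=",")
    writer.writerow(["User_Name", "Tweet_ID", "Tweet"])  # header row
    writer.writerows(tfile)                              # one row per tweet
```

Opening with `newline=""` avoids blank lines between rows on some platforms; the cards that follow show the exact columns the lab uses.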
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/31.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/31.md new file mode 100644 index 00000000..7468015e --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/31.md @@ -0,0 +1,4 @@ + + +Create a list called `tfile`. Then, create a `for` loop to access the items in the user's timeline by calling `tweepy.Cursor(api.user_timeline,screen_name= username).items()`. Within the `for` loop, use the `append()` function on `tfile` to append `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`. + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/311.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/311.md new file mode 100644 index 00000000..d6fee25b --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/311.md @@ -0,0 +1,21 @@ + + +Create a list and name it `tfile` by doing the following: + +```python +tfile = [] +``` + +Then, use a for loop to access the items in the user's timeline: + +```python +for tweet in tweepy.Cursor(api.user_timeline,screen_name=username).items(): +``` + +In the for loop, append to `tfile` with the `append()` function: + +```python +tfile.append() +``` + +The data that we want to append are `username`, `tweet.id_str`, `tweet.source`, `tweet.created_at`, `tweet.retweet_count`, `tweet.favourite_count`, and `tweet.text.encode("utf-8")`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/32.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/32.md new file mode 100644 index 00000000..fe884e5c --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/32.md @@ -0,0 +1,9 @@ + + +After obtaining tweet data into `tfile`, we want to copy the data into a .csv file.
To do so, we create a .csv file and open it by using the following code: + +```python +with open(outfile,'w+') as file: +``` + +Then, copy the data from `tfile` by using the `writerows(tfile)` function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/321.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/321.md new file mode 100644 index 00000000..9d428e0b --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/321.md @@ -0,0 +1,3 @@ + + +To create a .csv file, we declare a variable `outfile` and store the name of the .csv file as `username + "_tweets_V1.csv"`. Then type `print("writing to " + outfile)` in the following line to make sure that we are writing to our .csv file. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/322.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/322.md new file mode 100644 index 00000000..843bb4de --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/322.md @@ -0,0 +1,18 @@ + + +Open and write in the .csv file by using the following line of code: + +```python +with open(outfile,'w+') as file: +``` + +Under the opened file, use the `csv.writer(file, delimiter)` function to specify how our data should be separated. In this case, we want them to be separated by a comma. Declare this function in a variable called `writer`. + +Using `writer`, we want to write in our .csv file.
To make our data tidy and easy to understand, we write the categories on the first row of the .csv file and then add the data from `tfile` in the rows below it as shown: + +```python +writer.writerow(['User_Name','Tweet_ID','Source','Created_date','Retweet_count', + 'Favourite_count','Tweet']) +writer.writerows(tfile) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/4.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/4.md new file mode 100644 index 00000000..93a84433 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/4.md @@ -0,0 +1,12 @@ + + +The next thing we will move on to is to create our main function. This function takes the tweets we saved to a .csv file in the previous function, cleanses them, and outputs a Wordcloud based on the most frequently repeated words. + +To do that, we first define our `main()` function. In that function, we start by obtaining the tweet-filled .csv file with the `get_tweets()` function we defined earlier. + +Then, you should pick a company that you want to examine the marketing strategy of, and pass the company's Twitter handle into `get_tweets()`. + +Then, use the `read_csv()` function from the `pandas` library. + +Please leave the code utilizing the `re` library inside of `main()`. Also please make sure you name your object from `read_csv` "bg". Bear in mind the "cleaned" data will be in the DataFrame "bg3". + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/41.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/41.md new file mode 100644 index 00000000..e24d4367 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/41.md @@ -0,0 +1,3 @@ + + +The next step is to define a `main()` function and do the rest of our tasks there.
To read the .csv file that we generated from the `get_tweets()` function, we declare a variable `bg` that stores the result of the `read_csv()` function from the `pandas` library. Print out the first 5 rows of data from `bg` using the `head()` function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/411.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/411.md new file mode 100644 index 00000000..d6b0fb37 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/411.md @@ -0,0 +1,16 @@ + + +The next thing we want to do is to define a `main()` function as follows: + +```python +def main(): +``` + + + +Under the `main()` function, call `get_tweets()` to generate the .csv file of tweets from a given user, as below: + +```python +get_tweets() +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/412.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/412.md new file mode 100644 index 00000000..49648350 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/412.md @@ -0,0 +1,14 @@ + + +Read the .csv file generated by the `get_tweets()` function by declaring a variable that calls the `read_csv()` function.
The `read_csv()` function works like this: + +```python +bg = pd.read_csv(,encoding='utf-8') +``` + +Print the first `n` rows from your .csv file to make sure everything is going smoothly by using the `print()` and `head()` functions like this: + +```python +print(bg.head(n)) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/42.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/42.md new file mode 100644 index 00000000..72c0962e --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/42.md @@ -0,0 +1,3 @@ + + +After we have obtained our cleansed tweets in `bg2`, we create a new variable `bg3` that makes `bg2` into a data frame using the `DataFrame` function from the `pandas` library. Print out `bg3` to check for the right data frame output. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/421.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/421.md new file mode 100644 index 00000000..69045e59 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/421.md @@ -0,0 +1,9 @@ + + +After obtaining our cleansed tweets in `bg2`, we create a new variable `bg3` to form a data frame for `bg2` as follows: + +```python +bg3 = pd.DataFrame(bg2, columns = ['tweet']) +``` + +Additionally, print out `bg3` to make sure we have the data frame output that we want. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/5.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/5.md new file mode 100644 index 00000000..f9b27878 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/5.md @@ -0,0 +1,3 @@ + + +Our last main step is to create a Wordcloud based on the data frame of cleansed tweets. To do that, use the functions from `matplotlib`, `wordcloud`, and `matplotlib.pyplot` libraries.
After that, compile your entire code and you are done! \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/51.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/51.md new file mode 100644 index 00000000..9daca4b0 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/51.md @@ -0,0 +1,4 @@ + + +We start by setting the parameters of our Wordcloud plot. Use the `rcParams` dictionary from the `matplotlib` library to do so. The parameters we want to set are `figure.figsize`, `font.size`, `savefig.dpi`, and `figure.subplot.bottom`. + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/511.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/511.md new file mode 100644 index 00000000..55ab9579 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/511.md @@ -0,0 +1,9 @@ + + +We start by setting the parameters of our Wordcloud plot. Use the `rcParams` dictionary from the `matplotlib` library to do so, as follows: + +```python +mpl.rcParams[''] = +``` + +The parameters we want to set are `figure.figsize`, `font.size`, `savefig.dpi`, and `figure.subplot.bottom`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/52.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/52.md new file mode 100644 index 00000000..1f5f0007 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/52.md @@ -0,0 +1,7 @@ + + +The next thing we want to do is to create the Wordcloud using `STOPWORDS` and `WordCloud` from the `wordcloud` library. + +Set the stopwords using `set(STOPWORDS)`. Then, create a variable `text` to join all the tweets in `bg3`. + +Create the wordcloud using the function `WordCloud().generate(str(text))`.
The parameters we want to edit in the `WordCloud()` function are `background_color`, `stopwords`, `max_words`, `max_font_size`, and `random_state`. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/521.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/521.md new file mode 100644 index 00000000..53e41333 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/521.md @@ -0,0 +1,16 @@ + + +Now, we want to create the Wordcloud using `STOPWORDS` and `WordCloud` from the `wordcloud` library. We start by setting the stopwords, as follows: + +```python +stopwords = set(STOPWORDS) +``` + +Next, create a variable `text` that joins all the tweets in `bg3`, separated with a space: + +```python +text = " ".join() +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/522.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/522.md new file mode 100644 index 00000000..c9a18ce8 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/522.md @@ -0,0 +1,21 @@ + + +We want to set the parameters in our wordcloud by doing the following: + +```python +cloud = WordCloud( + background_color = , + stopwords = stopwords, + max_words = , + max_font_size = , + random_state = ) +``` + +After that, we generate the wordcloud as follows: + +```python +wordcloud = cloud.generate(str(text)) +``` + + + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/53.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/53.md new file mode 100644 index 00000000..a960cafd --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/53.md @@ -0,0 +1,10 @@ + + +After we have created our Wordcloud, we want to display it. 
To do so, we use the functions from the `matplotlib.pyplot` library listed below: + +- `matplotlib.pyplot.figure()` +- `matplotlib.pyplot.imshow()` +- `matplotlib.pyplot.axis()` +- `matplotlib.pyplot.show()` + +Once you have done the above, you can choose to add another line of code to save the Wordcloud you generated with the `savefig()` function. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/531.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/531.md new file mode 100644 index 00000000..c4f5bb76 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/531.md @@ -0,0 +1,22 @@ + + +After we have created our Wordcloud, we want to display it. To do so, we use functions from the `matplotlib.pyplot` library. We start by plotting a figure: + +```python +fig = matplotlib.pyplot.figure(1) +``` + +Then, we adjust some parameters and display our wordcloud by doing the following: + +```python +matplotlib.pyplot.imshow() +matplotlib.pyplot.axis('off') +matplotlib.pyplot.show() +``` + +Once you have done the above, you can choose to add another line of code to save the Wordcloud you generated with the `savefig()` function as shown: + +```python +fig.savefig("",dpi=1400) +``` + diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/54.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/54.md new file mode 100644 index 00000000..890c6e96 --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/54.md @@ -0,0 +1,3 @@ + + +Now that we are done with all our code, we can compile it all together and run it. Congratulations on successfully generating a wordcloud to visualize tweets!
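One optional refinement, a common Python convention rather than a lab requirement: guard the final call to `main()` so the file's functions can be imported elsewhere without running the whole pipeline. The `main()` body here is a placeholder standing in for the wordcloud code:

```python
def main():
    # Placeholder for the wordcloud pipeline built in this lab
    return "wordcloud generated"

# Run main() only when this file is executed directly,
# not when it is imported as a module
if __name__ == "__main__":
    print(main())
```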
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/541.md b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/541.md new file mode 100644 index 00000000..caee57af --- /dev/null +++ b/Module_Twitter_API/labs/Week 1/Visualizing Tweets - Companies/541.md @@ -0,0 +1,28 @@ + + +Compile your code in the following manner and you are done! Congratulations! + +```python +#import your libraries +import + +#declare your keys +consumer_key = +consumer_secret = +access_token_key = +access_token_secret = + +#Function to extract tweets +def get_tweets(username): + + + +#Function to generate Wordcloud +def main(): + + + +#Call the main() function +main() +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/1.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/1.md new file mode 100644 index 00000000..7765475a --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/1.md @@ -0,0 +1,15 @@ + + +For this lab we will utilize the skills you've gained working with APIs to visualize tweets using the **tweepy** Twitter API. + +The idea is simple: given a topic, all hashtags with greater than 5% frequency pertaining to that topic are plotted in a pie graph. All hashtags with less than 5% frequency fall under an "Other" category. + +Hashtags provide an efficient way of deducing how tweeters feel about the topic they are tweeting about, since Twitter users use hashtags to summarize their tweets, often with more emotion. Therefore hashtags provide a sufficient summary of the tweet - there is less need to process every character and word of a tweet if the hashtags are available. + +By seeing the most common hashtags associated with a topic, we can evaluate what Twitter users are discussing under the scope of a greater topic and how people feel about the topic at hand.
It's easy to get caught in our own echo chambers on social media, and analyzing the most common hashtags across *all* tweets for a certain topic helps us analyze the feelings behind a topic in a more objective manner. + +Here is an example of what we will be aiming to accomplish at the end of this lab: + +![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/pieplot.png) + + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/11.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/11.md new file mode 100644 index 00000000..f3a5f03b --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/11.md @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/111.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/111.md new file mode 100644 index 00000000..80112577 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/111.md @@ -0,0 +1,12 @@ + + +If you're on Anaconda, installing Tweepy is simple; just type the following: + +``` python +conda install tweepy +``` + +After installing the package you will also have to create a developer account with Twitter in order to access the API; this process is quick and straightforward. Just click [this](https://developer.twitter.com/en/apply-for-access.html) link to get started. + + + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/112.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/112.md new file mode 100644 index 00000000..7f14cc6e --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/112.md @@ -0,0 +1,30 @@ + + +The following packages will need to be installed in order to complete the necessary functions of the lab. By now you are already familiar with loading Python packages thanks to your previous labs.
+ ``` python +import os +import pandas as pd +import matplotlib.pyplot as plt +import seaborn as sns +import itertools +import collections + +import tweepy as tw +import nltk +from nltk.corpus import stopwords +import re +import networkx + +import warnings +warnings.filterwarnings("ignore") + +sns.set(font_scale=1.5) +sns.set_style("whitegrid") +``` + +As we progress through the lab you will see how all these packages play a key role in developing our program. + + \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/12.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/12.md new file mode 100644 index 00000000..62065a92 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/12.md @@ -0,0 +1,9 @@ + + +In order to complete the various functions and methods we will perform, we need to log in to Twitter through our program. + +To complete this we need to go through 3 simple steps: + +1. Define your search keys +2. Create the access token to log in +3. Finally, access the API \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/121.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/121.md new file mode 100644 index 00000000..a51a01ac --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/121.md @@ -0,0 +1,9 @@ + + +Defining the keys to log in is simple; the information you need is all provided when your developer account is created.
Type the following to store them in variables: + +``` python +consumer_key= 'yourkeyhere' +consumer_secret= 'yourkeyhere' +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/122.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/122.md new file mode 100644 index 00000000..f60cd59b --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/122.md @@ -0,0 +1,18 @@ + + +Now we must create our access token; this is a key step to complete the login process. + +First, store the token values in variables: + +``` python +access_token= 'yourkeyhere' +access_token_secret= 'yourkeyhere' +``` + +Second, use the `OAuthHandler()` and `set_access_token()` methods to create the instance that will allow login. + +``` python +auth = tw.OAuthHandler(consumer_key, consumer_secret) +auth.set_access_token(access_token, access_token_secret) +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/123.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/123.md new file mode 100644 index 00000000..64d068fd --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/123.md @@ -0,0 +1,9 @@ + + +Now that we have created our access token we can finally access the API; this can be done in a single line. + +``` python +api = tw.API(auth, wait_on_rate_limit=True) + +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/2.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/2.md new file mode 100644 index 00000000..c85e0037 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/2.md @@ -0,0 +1,5 @@ + + +Now that we've authenticated, we're ready to search for tweets. Let's start by searching for all tweets surrounding the topic of climate change.
("climate change" being your query string) + +![sample image](https://www.diggitmagazine.com/sites/default/files/styles/inline_image/public/Climate%20change%20photo_1.jpg?itok=2BfiKsqU) \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/21.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/21.md new file mode 100644 index 00000000..e8156afc --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/21.md @@ -0,0 +1,11 @@ + + +In order to search for tweets under our desired hashtag while excluding retweets, we will use the `-filter` operator in our query. + +To accomplish this, we write the following: + +``` python +search_term = "#climate+change -filter:retweets" +``` + +Here we are telling the **tweepy** API to filter for recent tweets containing the climate change hashtag. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/22.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/22.md new file mode 100644 index 00000000..3cd01ff2 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/22.md @@ -0,0 +1,3 @@ + + +Now that we've found the recent tweets containing the hashtags that we will eventually analyze, we need to store the tweets in an organized manner for analysis. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/221.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/221.md new file mode 100644 index 00000000..6175be94 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/221.md @@ -0,0 +1,15 @@ + + +For our analysis we need an adequate sample size for credible findings. We will grab 1,000 tweets under the climate change hashtag for our analysis. + +To accomplish this we will use the Cursor method to iterate through the tweets; you may remember seeing this method from a previous lab.
+ ``` python +tweets = tw.Cursor(api.search, + q=search_term, + lang="en", + since='2018-11-01').items(1000) +``` + +Using the Cursor method, we tell the iterator to search through the API for tweets with the climate change hashtag, in English. There will be 1000 items in this iteration, drawn from tweets posted since November 1st, 2018. + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/222.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/222.md new file mode 100644 index 00000000..87dc80d7 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/222.md @@ -0,0 +1,22 @@ + + +Now we can use list comprehension to collect our recently found items in a list. + +``` python +all_tweets = [tweet.text for tweet in tweets] +``` + +Here's yet another example of when list comprehension comes in handy, especially in data analysis. + +Now let's see the output we get... + +``` python +all_tweets[:5] +# Below is the output of the first 5 results +['@InsuranceBureau Hey! Yoohoo! Hey! @InsuranceBureau! \nMaybe sometime before today, and everyday from now on, you sh… https://t.co/sWc2XT1DO8', + 'Our rulers are golfing and trail running while human civilization burns down. \n\nNew piece by @KateAronoff. #climate… https://t.co/R6HZ78oK67', + '"These findings lend themselves to a somewhat controversial idea: that we might be able to manipulate these marine… https://t.co/71w3y6fWfA', + 'Information based on proven data about #climate change and how this affects #waterAvailability is so important! Tha… https://t.co/YDe1k1sJKj', + 'Here’s what @EmoryUniversity is doing to tackle #climate change.
You can get involved by visiting… https://t.co/eQsGGsob1J'] +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/3.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/3.md new file mode 100644 index 00000000..68392164 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/3.md @@ -0,0 +1,5 @@ + + +As you saw from the output of our lists, there are links in the tweets. While these may be nice for tracking the source of the tweets, they will be a hindrance when parsing through the list for analysis. + +We will use regular expressions to accomplish the data cleaning. From the labs you have gone through, you may by now know that cleaning data is the longest portion of analysis projects. \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/311.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/311.md new file mode 100644 index 00000000..1602b646 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/311.md @@ -0,0 +1,11 @@ + + +You may remember seeing ```import re``` while we were loading our packages earlier. `re` stands for ```regular expressions```. Regular expressions are a special syntax that is used to identify patterns in a string. + +While this lesson will not cover regular expressions in depth, it is helpful to understand what the syntax below does: + +``` +([^0-9A-Za-z \t])|(\w+:\/\/\S+) +``` + +It tells the search to find all strings that look like a URL and replace them with nothing – `""`. It also removes other punctuation, including hashtags - `#`.
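To convince yourself of what this pattern does, here is a standalone sketch applying it to a made-up tweet string (the sample text and its URL are hypothetical):

```python
import re

# The lab's pattern: remove any character that is not alphanumeric,
# a space, or a tab, OR remove anything shaped like a URL
pattern = r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)"

sample = "Loving this #climate article! https://t.co/abc123"

# Substitute matches with "", then normalize the whitespace
cleaned = " ".join(re.sub(pattern, "", sample).split())
print(cleaned)  # Loving this climate article
```

The `#` and `!` are stripped by the first alternative, and the whole URL is stripped by the second.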
\ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/312.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/312.md new file mode 100644 index 00000000..e7ed3ba9 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/312.md @@ -0,0 +1,10 @@ + + +`re.sub` allows you to substitute a selection of characters, defined using a regular expression, with something else. + +In the function defined below, this line takes the text in each tweet and replaces the URL with nothing: + +``` python +re.sub("([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", tweet) +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/313.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/313.md new file mode 100644 index 00000000..9ad6e696 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/313.md @@ -0,0 +1,15 @@ + + +Using the `re.sub` method we just looked at, we can create a function that removes URLs from the items of our list. + +```python +def remove_url(txt): + return " ".join(re.sub("([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split()) +``` + +This is a simple function that accomplishes quite a lot. We replace URLs found in a text string with nothing. + +The parameter that the function takes in is a text string which we'd like to parse and remove URLs from. The function returns the same text string with URLs removed. + + + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/32.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/32.md new file mode 100644 index 00000000..61642fe5 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/32.md @@ -0,0 +1,21 @@ + + +Now that we have finished removing URLs from our tweets, we can add them to a list for analysis.
+ +Again, we will use list comprehension to accomplish this task: + +``` python +all_tweets_no_urls = [remove_url(tweet) for tweet in all_tweets] +all_tweets_no_urls[:5] +``` + +Displaying the first 5 elements of the list, we see a much cleaner result: + +```python +['InsuranceBureau Hey Yoohoo Hey InsuranceBureau Maybe sometime before today and everyday from now on you sh', + 'Our rulers are golfing and trail running while human civilization burns down New piece by KateAronoff climate', + 'These findings lend themselves to a somewhat controversial idea that we might be able to manipulate these marine', + 'Information based on proven data about climate change and how this affects waterAvailability is so important Tha', + 'Heres what EmoryUniversity is doing to tackle climate change You can get involved by visiting'] +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/33.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/33.md new file mode 100644 index 00000000..ff6ab3fc --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/33.md @@ -0,0 +1,15 @@ + + +Another issue we will address is capitalization, which complicates the analysis of text data. If you are trying to create a list of unique words in your tweets, words with capitalization will be different from words that are all lowercase.
+ +Here's an example: + +```python +# Note how capitalization impacts unique returned values +ex_list = ["Dog", "dog", "dog", "cat", "cat", ","] + +# Get unique elements in the list +set(ex_list) +{',', 'Dog', 'cat', 'dog'} +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/331.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/331.md new file mode 100644 index 00000000..6b42bcc0 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/331.md @@ -0,0 +1,16 @@ + + +To begin to remedy this issue, we can make each word lowercase using the string method `.lower()`. In the code below, this method is applied using a list comprehension. + +```python +# The same list with mixed capitalization +words_list = ["Dog", "dog", "dog", "cat", "cat", ","] + +# Make all elements in the list lowercase +lower_case = [word.lower() for word in words_list] + +# Display all elements in the list +lower_case +['dog', 'dog', 'dog', 'cat', 'cat', ','] +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/332.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/332.md new file mode 100644 index 00000000..02802ae3 --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/332.md @@ -0,0 +1,10 @@ + + +Now all of the words in your list are lowercase. You can again use the `set()` function to return only unique words. + +```python +# Now you have only unique words +set(lower_case) +{',', 'cat', 'dog'} +``` + diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/333.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/333.md new file mode 100644 index 00000000..4dc90c4b --- /dev/null +++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/333.md @@ -0,0 +1,51 @@ + + +Right now, you have a list that contains each full tweet as a single string, and you know how to lowercase the words.
+
+To split and lowercase the words in all of the tweets, you can chain the methods `.lower()` and `.split()` together in a list comprehension.
+
+``` python
+# Create a list of lists containing lowercase words for each tweet
+words_in_tweet = [tweet.lower().split() for tweet in all_tweets_no_urls]
+words_in_tweet[:2]
+
+```
+
+Our output will give us data that is clean and ready to use:
+
+``` python
+[['insurancebureau',
+ 'hey',
+ 'yoohoo',
+ 'hey',
+ 'insurancebureau',
+ 'maybe',
+ 'sometime',
+ 'before',
+ 'today',
+ 'and',
+ 'everyday',
+ 'from',
+ 'now',
+ 'on',
+ 'you',
+ 'sh'],
+ ['our',
+ 'rulers',
+ 'are',
+ 'golfing',
+ 'and',
+ 'trail',
+ 'running',
+ 'while',
+ 'human',
+ 'civilization',
+ 'burns',
+ 'down',
+ 'new',
+ 'piece',
+ 'by',
+ 'katearonoff',
+ 'climate']]
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/4.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/4.md
new file mode 100644
index 00000000..0ca1ac13
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/4.md
@@ -0,0 +1,6 @@
+
+
+Now we will incorporate some elementary math to enable us to display the frequencies of each hashtag, which we will plot later.
+
+To get the count of how many times each word appears in the sample, you can use the built-in `Python` library `collections`, which provides a special type of `Python` dictionary.
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/5.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/5.md
new file mode 100644
index 00000000..383d4d45
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/5.md
@@ -0,0 +1,5 @@
+
+
+Now that we have (seemingly) cleaned the data, we can plot it to show our findings!
+
+We will begin by using the Pandas library, which you are familiar with by now, to create a DataFrame. From there we will create a bar graph, which will be the most visually pleasing option in this instance.
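As a preview of the counting step described above, here is a minimal sketch of how `collections.Counter` tallies a flattened list of words (the tweet words below are invented purely for illustration):

```python
import collections
import itertools

# Toy stand-in for a list of word lists, one inner list per tweet
words_in_tweet = [["climate", "change", "is", "real"],
                  ["climate", "action", "now"],
                  ["the", "climate", "change", "debate"]]

# Flatten the list of lists into a single list of words
all_words = list(itertools.chain(*words_in_tweet))

# Counter behaves like a dictionary mapping each word to its frequency
counts = collections.Counter(all_words)
print(counts.most_common(2))  # [('climate', 3), ('change', 2)]
```

The same two calls, `itertools.chain` to flatten and `Counter.most_common` to rank, are what we apply to the real tweet data.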
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/51.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/51.md
new file mode 100644
index 00000000..9a808806
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/51.md
@@ -0,0 +1,12 @@
+
+
+Based on the counter, you can create a Pandas `DataFrame` for analysis and plotting that includes only the top 15 most common words.
+
+``` python
+clean_tweets_no_urls = pd.DataFrame(counts_no_urls.most_common(15),
+                                    columns=['words', 'count'])
+
+clean_tweets_no_urls.head()
+```
+
+This will return a Pandas DataFrame containing two columns: the words and the frequencies with which they appear in our tweets.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/52.md b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/52.md
new file mode 100644
index 00000000..97235026
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Hashtag Frequency/52.md
@@ -0,0 +1,24 @@
+
+
+Using this Pandas `DataFrame`, you can create a horizontal bar graph of the top 15 most common words in the tweets as shown below.
+
+```python
+fig, ax = plt.subplots(figsize=(8, 8))
+
+# Plot horizontal bar graph
+clean_tweets_no_urls.sort_values(by='count').plot.barh(x='words',
+                                                       y='count',
+                                                       ax=ax,
+                                                       color="purple")
+
+ax.set_title("Common Words Found in Tweets (Including All Words)")
+
+plt.show()
+```
+
+These are simple commands and parameters that we have encountered before. The plot displays the frequency of all words in the tweets on climate change, after URLs have been removed.
+
+With that, we are now done! Below is the output of the common words found in our Tweets.
+
+![Imgur](https://i.imgur.com/GloG9zm.png)
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/1.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/1.md
new file mode 100644
index 00000000..6b3279dc
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/1.md
@@ -0,0 +1,13 @@
+
+
+For this lab we will utilize the skills you've gained working with APIs to visualize tweets using the **tweepy** Twitter API.
+
+The idea is simple: given a hashtag, the top 15 words pertaining to that hashtag are displayed and plotted on a bar graph.
+
+Twitter users use hashtags to summarize their feelings and associate their tweets with a greater idea or belief. By seeing the most common words associated with a hashtag, we can gain a general sense of what people are talking about and how they feel about the topic. It's easy to get caught in our own echo chambers on social media, and analyzing the most common words across *all* tweets for a certain hashtag helps us assess the sentiment behind it in a more objective manner.
+
+Here is an example of what we will be aiming to accomplish at the end of this lab:
+
+![sample image](https://i.imgur.com/TpBec4E.png)
+
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/11.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/11.md
new file mode 100644
index 00000000..f3a5f03b
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/11.md
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/111.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/111.md
new file mode 100644
index 00000000..80112577
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/111.md
@@ -0,0 +1,12 @@
+
+
+If you're on Anaconda, installing the Tweepy package is simple; just type the following in your terminal:
+
+``` shell
+conda install tweepy
+```
+
+After installing the package you will also have to create a developer account with Twitter in order to access the API; this process is quick and straightforward. Just click [this](https://developer.twitter.com/en/apply-for-access.html) link to get started.
+
+
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/112.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/112.md
new file mode 100644
index 00000000..7f14cc6e
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/112.md
@@ -0,0 +1,30 @@
+
+
+The following packages will need to be installed in order to complete the necessary functions of the lab. By now you are already familiar with loading Python packages thanks to your previous labs.
+
+``` python
+import os
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+import itertools
+import collections
+
+import tweepy as tw
+import nltk
+from nltk.corpus import stopwords
+import re
+import networkx
+
+import warnings
+warnings.filterwarnings("ignore")
+
+sns.set(font_scale=1.5)
+sns.set_style("whitegrid")
+```
+
+As we progress through the lab you will see how all these packages play a key role in developing our program.
+
+
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/12.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/12.md
new file mode 100644
index 00000000..62065a92
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/12.md
@@ -0,0 +1,9 @@
+
+
+In order to complete the various functions and methods we will perform, we need to log in to Twitter through our program.
+
+To complete this we need to go through 3 simple steps:
+
+1. Define your search keys
+2. Create the access token to log in
+3. Access the API
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/121.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/121.md
new file mode 100644
index 00000000..a51a01ac
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/121.md
@@ -0,0 +1,9 @@
+
+
+Defining the keys to log in is simple; the information you need is all provided when your developer account is created.
Type the following to store your keys in variables:
+
+``` python
+consumer_key = 'yourkeyhere'
+consumer_secret = 'yourkeyhere'
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/122.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/122.md
new file mode 100644
index 00000000..f60cd59b
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/122.md
@@ -0,0 +1,18 @@
+
+
+Now we must create our access token; this is a key step to complete the login process.
+
+First, store the token values in variables:
+
+``` python
+access_token = 'yourkeyhere'
+access_token_secret = 'yourkeyhere'
+```
+
+Second, use the `OAuthHandler()` and `set_access_token()` methods to create the instance that will allow login.
+
+``` python
+auth = tw.OAuthHandler(consumer_key, consumer_secret)
+auth.set_access_token(access_token, access_token_secret)
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/123.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/123.md
new file mode 100644
index 00000000..64d068fd
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/123.md
@@ -0,0 +1,9 @@
+
+
+Now that we have created our access token we can finally access the API; this can be done in a single line.
+
+``` python
+api = tw.API(auth, wait_on_rate_limit=True)
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/2.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/2.md
new file mode 100644
index 00000000..eed95ca1
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/2.md
@@ -0,0 +1,5 @@
+
+
+Now that we've authenticated we're ready to search for tweets. Let's start by searching for tweets that contain a hashtag of your choice, preferably a hashtag that is more thought-provoking. You can use the example **#climatechange** if you'd like.
+
+![sample image](https://www.diggitmagazine.com/sites/default/files/styles/inline_image/public/Climate%20change%20photo_1.jpg?itok=2BfiKsqU)
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/21.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/21.md
new file mode 100644
index 00000000..e8156afc
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/21.md
@@ -0,0 +1,11 @@
+
+
+In order to search for tweets under our desired hashtag, we will use the `-filter` operator to find tweets under the climate change hashtag while excluding retweets.
+
+To accomplish this we write the following:
+
+``` python
+search_term = "#climate+change -filter:retweets"
+```
+
+Here we are telling the **tweepy** API to filter for recent tweets containing the climate change hashtag.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/22.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/22.md
new file mode 100644
index 00000000..3cd01ff2
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/22.md
@@ -0,0 +1,3 @@
+
+
+Now that we've found the recent tweets containing the hashtag that we will eventually analyze, we need to store the tweets in an organized manner for analysis.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/221.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/221.md
new file mode 100644
index 00000000..6175be94
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/221.md
@@ -0,0 +1,15 @@
+
+
+For our analysis we need a sufficiently large sample size for credible findings. We will grab 1,000 tweets under the climate change hashtag for our analysis.
+
+To accomplish this we will use the Cursor method to iterate through the tweets; you may remember seeing this method from a previous lab.
+
+``` python
+tweets = tw.Cursor(api.search,
+                   q=search_term,
+                   lang="en",
+                   since='2018-11-01').items(1000)
+```
+
+Using the Cursor method, we tell the iterator to search the API for tweets containing the climate change hashtag, written in English. There will be 1,000 items in this iteration, drawn from tweets posted since November 1st, 2018.
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/222.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/222.md
new file mode 100644
index 00000000..87dc80d7
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/222.md
@@ -0,0 +1,22 @@
+
+
+Now we can use list comprehension to store the text of each of our recently found tweets in a list.
+
+``` python
+all_tweets = [tweet.text for tweet in tweets]
+```
+
+Here's yet another example of when list comprehension comes in handy, especially in data analysis.
+
+Now let's see the output we get...
+
+``` python
+all_tweets[:5]
+# Below is the output of the first 5 results
+['@InsuranceBureau Hey! Yoohoo! Hey! @InsuranceBureau! \nMaybe sometime before today, and everyday from now on, you sh… https://t.co/sWc2XT1DO8',
+ 'Our rulers are golfing and trail running while human civilization burns down. \n\nNew piece by @KateAronoff. #climate… https://t.co/R6HZ78oK67',
+ '"These findings lend themselves to a somewhat controversial idea: that we might be able to manipulate these marine… https://t.co/71w3y6fWfA',
+ 'Information based on proven data about #climate change and how this affects #waterAvailability is so important! Tha… https://t.co/YDe1k1sJKj',
+ 'Here’s what @EmoryUniversity is doing to tackle #climate change. 
You can get involved by visiting… https://t.co/eQsGGsob1J']
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/3.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/3.md
new file mode 100644
index 00000000..68392164
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/3.md
@@ -0,0 +1,5 @@
+
+
+As you saw from the output of our list, there are links in the tweets. While these may be nice for tracking the source of a tweet, they will be a hindrance when parsing through the list for analysis.
+
+We will use regular expressions to accomplish the data cleaning. From the previous labs you have gone through, you may know by now that cleaning data is the longest portion of analysis projects.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/311.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/311.md
new file mode 100644
index 00000000..1602b646
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/311.md
@@ -0,0 +1,11 @@
+
+
+You may remember seeing ```import re``` while we were loading our packages earlier. `re` stands for ```regular expressions```. Regular expressions are a special syntax that is used to identify patterns in a string.
+
+While this lesson will not cover regular expressions in depth, it is helpful to understand that the syntax below:
+
+```
+([^0-9A-Za-z \t])|(\w+:\/\/\S+)
+```
+
+tells the search to find all substrings that look like a URL and replace them with nothing – `""`. It also removes other punctuation, including hashtags – `#`.
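To see the pattern in action before it is wrapped in a function, here is a small self-contained sketch using Python's built-in `re` module; the tweet text and URL are invented purely for illustration (the pattern is written as a raw string, but it is the same regular expression):

```python
import re

# An invented tweet containing punctuation, a hashtag, and a URL
tweet = "Love this! #climate https://t.co/abc123"

# The same pattern as above: remove URLs and any non-alphanumeric characters
cleaned = re.sub(r"([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", tweet)
print(cleaned)  # prints "Love this climate " (a trailing space remains where the URL was)
```

Note that spaces are preserved by the character class, which is why leftover whitespace still needs a `.split()`/`" ".join()` cleanup afterwards.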
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/312.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/312.md
new file mode 100644
index 00000000..e7ed3ba9
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/312.md
@@ -0,0 +1,10 @@
+
+
+`re.sub` allows you to substitute a selection of characters, defined using a regular expression, with something else.
+
+In the function defined below, this line takes the text in each tweet and replaces the URL with nothing:
+
+``` python
+re.sub("([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", tweet)
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/313.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/313.md
new file mode 100644
index 00000000..9ad6e696
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/313.md
@@ -0,0 +1,15 @@
+
+
+Using the `re.sub` method we just looked at, we can create a function that removes URLs from the items of our list.
+
+```python
+def remove_url(txt):
+    return " ".join(re.sub("([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", txt).split())
+```
+
+This is a simple function that accomplishes quite a lot: it replaces any URLs found in a text string with nothing.
+
+The function takes in a text string which we'd like to parse, and it returns the same string with the URLs removed.
+
+
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/32.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/32.md
new file mode 100644
index 00000000..61642fe5
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/32.md
@@ -0,0 +1,21 @@
+
+
+Now that we have finished removing URLs from our tweets, we can add them to a list for analysis.
+
+Again, we will use list comprehension to accomplish this task:
+
+``` python
+all_tweets_no_urls = [remove_url(tweet) for tweet in all_tweets]
+all_tweets_no_urls[:5]
+```
+
+Displaying the first 5 elements of the list, we see a much cleaner result:
+
+```python
+['InsuranceBureau Hey Yoohoo Hey InsuranceBureau Maybe sometime before today and everyday from now on you sh',
+ 'Our rulers are golfing and trail running while human civilization burns down New piece by KateAronoff climate',
+ 'These findings lend themselves to a somewhat controversial idea that we might be able to manipulate these marine',
+ 'Information based on proven data about climate change and how this affects waterAvailability is so important Tha',
+ 'Heres what EmoryUniversity is doing to tackle climate change You can get involved by visiting']
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/33.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/33.md
new file mode 100644
index 00000000..ff6ab3fc
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/33.md
@@ -0,0 +1,15 @@
+
+
+Another challenge we will address is capitalization, which complicates any analysis of text data. If you are trying to create a list of unique words in your tweets, words with capitalization will be treated as different from words that are all lowercase.
+
+Here's an example:
+
+```python
+# Note how capitalization impacts unique returned values
+ex_list = ["Dog", "dog", "dog", "cat", "cat", ","]
+
+# Get unique elements in the list
+set(ex_list)
+{',', 'Dog', 'cat', 'dog'}
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/331.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/331.md
new file mode 100644
index 00000000..6b42bcc0
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/331.md
@@ -0,0 +1,16 @@
+
+
+To begin to remedy this issue, we can make each word lowercase using the string method `.lower()`. In the code below, this method is applied using a list comprehension.
+
+```python
+# The same list with mixed capitalization
+words_list = ["Dog", "dog", "dog", "cat", "cat", ","]
+
+# Make all elements in the list lowercase
+lower_case = [word.lower() for word in words_list]
+
+# Display the lowercased list
+lower_case
+['dog', 'dog', 'dog', 'cat', 'cat', ',']
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/332.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/332.md
new file mode 100644
index 00000000..02802ae3
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/332.md
@@ -0,0 +1,10 @@
+
+
+Now all of the words in your list are lowercase. You can again use the `set()` function to return only unique words.
+
+```python
+# Now you have only unique words
+set(lower_case)
+{',', 'cat', 'dog'}
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/333.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/333.md
new file mode 100644
index 00000000..4dc90c4b
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/333.md
@@ -0,0 +1,51 @@
+
+
+Right now, you have a list of lists that contains each full tweet, and you know how to lowercase the words.
+
+To split and lowercase the words in all of the tweets, you can chain the methods `.lower()` and `.split()` together in a list comprehension.
+
+``` python
+# Create a list of lists containing lowercase words for each tweet
+words_in_tweet = [tweet.lower().split() for tweet in all_tweets_no_urls]
+words_in_tweet[:2]
+
+```
+
+Our output will give us data that is clean and ready to use:
+
+``` python
+[['insurancebureau',
+ 'hey',
+ 'yoohoo',
+ 'hey',
+ 'insurancebureau',
+ 'maybe',
+ 'sometime',
+ 'before',
+ 'today',
+ 'and',
+ 'everyday',
+ 'from',
+ 'now',
+ 'on',
+ 'you',
+ 'sh'],
+ ['our',
+ 'rulers',
+ 'are',
+ 'golfing',
+ 'and',
+ 'trail',
+ 'running',
+ 'while',
+ 'human',
+ 'civilization',
+ 'burns',
+ 'down',
+ 'new',
+ 'piece',
+ 'by',
+ 'katearonoff',
+ 'climate']]
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/4.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/4.md
new file mode 100644
index 00000000..7b13d4aa
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/4.md
@@ -0,0 +1,5 @@
+
+
+Now we will incorporate some elementary math to enable us to display the frequencies of each word, which we will plot later.
+
+To get the count of how many times each word appears in the sample, you can use the built-in `Python` library `collections`, which provides a special type of `Python` dictionary.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/41.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/41.md
new file mode 100644
index 00000000..868d3546
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/41.md
@@ -0,0 +1,31 @@
+
+
+To begin, flatten your list, so that all words across the tweets are in one list.
Note that you could flatten your list with another list comprehension like this: `all_words = [item for sublist in tweets_nsw for item in sublist]`
+
+While the list comprehension approach works in this case, we can instead use the `itertools` library to flatten the list as follows:
+
+``` python
+# List of all words across tweets
+all_words_no_urls = list(itertools.chain(*words_in_tweet))
+
+# Create counter
+counts_no_urls = collections.Counter(all_words_no_urls)
+
+counts_no_urls.most_common(15)
+[('climate', 865),
+ ('change', 667),
+ ('the', 547),
+ ('to', 446),
+ ('of', 252),
+ ('is', 239),
+ ('a', 233),
+ ('and', 226),
+ ('in', 203),
+ ('climatechange', 197),
+ ('on', 176),
+ ('for', 134),
+ ('are', 101),
+ ('we', 93),
+ ('about', 75)]
+```
+
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/5.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/5.md
new file mode 100644
index 00000000..90c001e9
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/5.md
@@ -0,0 +1,5 @@
+
+
+Now that we have (seemingly) cleaned the data, we can plot it to show our findings!
+
+We will begin by using the Pandas library, which you are familiar with by now, to create a DataFrame. From there we will create a bar graph, which will be the most visually pleasing option in this instance.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/51.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/51.md
new file mode 100644
index 00000000..9a808806
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/51.md
@@ -0,0 +1,12 @@
+
+
+Based on the counter, you can create a Pandas `DataFrame` for analysis and plotting that includes only the top 15 most common words.
+
+``` python
+clean_tweets_no_urls = pd.DataFrame(counts_no_urls.most_common(15),
+                                    columns=['words', 'count'])
+
+clean_tweets_no_urls.head()
+```
+
+This will return a Pandas DataFrame containing two columns: the words and the frequencies with which they appear in our tweets.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/52.md b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/52.md
new file mode 100644
index 00000000..97235026
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 2/Twitter Word Frequency/52.md
@@ -0,0 +1,24 @@
+
+
+Using this Pandas `DataFrame`, you can create a horizontal bar graph of the top 15 most common words in the tweets as shown below.
+
+```python
+fig, ax = plt.subplots(figsize=(8, 8))
+
+# Plot horizontal bar graph
+clean_tweets_no_urls.sort_values(by='count').plot.barh(x='words',
+                                                       y='count',
+                                                       ax=ax,
+                                                       color="purple")
+
+ax.set_title("Common Words Found in Tweets (Including All Words)")
+
+plt.show()
+```
+
+These are simple commands and parameters that we have encountered before. The plot displays the frequency of all words in the tweets on climate change, after URLs have been removed.
+
+With that, we are now done! Below is the output of the common words found in our Tweets.
+
+![Imgur](https://i.imgur.com/GloG9zm.png)
+
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/1.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/1.md
new file mode 100644
index 00000000..6b894357
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/1.md
@@ -0,0 +1,22 @@
+
+
+For this lab, we will be conducting sentiment analysis on specific US airlines. **Sentiment Analysis** is the analysis of language to identify emotions (positive, negative, etc.).
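Before touching the Twitter API, the core idea of scoring text can be illustrated with a hand-rolled toy scorer. Note this is not how the `TextBlob` package used in this lab works internally; the word lists and example sentences below are invented purely for illustration:

```python
# Toy sentiment scorer: counts positive vs. negative words from a tiny
# invented lexicon. Real tools like TextBlob use a far richer lexicon
# and return a continuous polarity score instead of a simple count.
POSITIVE = {"great", "love", "comfortable", "friendly"}
NEGATIVE = {"delayed", "lost", "rude", "terrible"}

def toy_sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(toy_sentiment("Great crew and friendly service"))          # positive
print(toy_sentiment("My bag was lost and the flight delayed"))   # negative
```

The lab's pipeline does the same thing at a higher level: classify each gathered tweet as positive, neutral or negative, then aggregate the results for plotting.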
+
+To perform sentiment analysis we'll be gathering tweets referencing airlines using Twitter's API as well as the `tweepy` and `TextBlob` packages to determine whether the gathered tweets have a positive, neutral or negative attitude towards the airline in question. Then we'll graph that information, both in the form of a bar and line graph.
+
+Twitter presence is an important point of emphasis for companies. Twitter users can quickly form a positive or negative opinion of a company, depending on what other users are tweeting about it. Companies perform sentiment analysis frequently to gauge what Twitter's opinion of their company is, and what steps they can take to ensure a positive Twitter presence. For this lab we'll conduct the kind of elementary sentiment analysis a company might run; this is an example of what you should be making by the end:
+
+![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/AirlineSentimentExample.png)
+
+To get started, you'll need to sign up for a Twitter API developer account.
+
+
+
+After you've signed up for the developer account you need to make a new Twitter app and get four credentials for future use:
+
+* consumer key
+* consumer secret
+* access token
+* access token secret
+
+Paste those credentials into the appropriate area in your starter code.
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/11.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/11.md
new file mode 100644
index 00000000..12df60dc
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/11.md
@@ -0,0 +1,5 @@
+
+
+The following is the signup page for getting a Twitter developer account. Try to navigate to this page by yourself, and follow the steps for an *academic* account.
+
+![img](https://lh4.googleusercontent.com/bOVrW7NkR9zdzVGR5Wpn4blHLWwsbRapfxYJdsFB2MXaEGDfD6GQ7REp8h42A3fSQmHDLtpAhsxEuSymYElifWq_dn4742hYwzfhO2nmZce6u5CtLhh8mJmBLSQ4KydLGG9NMWNp9F4)
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/111.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/111.md
new file mode 100644
index 00000000..72f9811e
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/111.md
@@ -0,0 +1,6 @@
+
+
+If you couldn't find it, the Twitter Developer application is found [here](https://developer.twitter.com/en/apps). Using the Twitter API requires an account, so you'll need to follow the steps, starting with clicking “Apply”.
+
+
+
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/112.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/112.md
new file mode 100644
index 00000000..53c496d2
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/112.md
@@ -0,0 +1,15 @@
+
+
+The Twitter developer application process starts at this page; please complete the form. Once you click 'student', proceed to the rest of the application.
+
+
+
+When you're prompted to explain your use, you can explain that you'll be using Twitter to perform sentiment analysis for learning purposes.
+
+
+
+You will also be analyzing Twitter data; you can say this is also part of the exercise.
+
+
+![img](https://lh4.googleusercontent.com/bOVrW7NkR9zdzVGR5Wpn4blHLWwsbRapfxYJdsFB2MXaEGDfD6GQ7REp8h42A3fSQmHDLtpAhsxEuSymYElifWq_dn4742hYwzfhO2nmZce6u5CtLhh8mJmBLSQ4KydLGG9NMWNp9F4)
+
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/113.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/113.md
new file mode 100644
index 00000000..f56ca29c
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/113.md
@@ -0,0 +1,7 @@
+
+
+
+After finishing your application, confirm your email, and your account should be processed and reviewed swiftly.
+
+
+![img](https://lh4.googleusercontent.com/8BKvmctSfLQEKERSZIc9_3jKl7lnpkRJO3736TBuIkfwBzZhkZMmPL8hUnNjrCf27SqX1iZaHOv1RBrNfB2V1990cl9z35ojA-RjoDnN0vgn5XWuDhwMjpbbhHLj5J1qcuq4M2KSC4g)
\ No newline at end of file
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/12.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/12.md
new file mode 100644
index 00000000..11b31eef
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/12.md
@@ -0,0 +1,12 @@
+
+
+When you have finished configuring your account, head back to Twitter Developer [here](https://developer.twitter.com/en/apps); your app menu should look like this:
+
+![img](https://lh6.googleusercontent.com/c2Eey4CUXd9gi3LFLPvbpKpDr1_qNTyZGHMKngCAjZ_prK1rITeI7AnLtWPRr0v_gRIGIxbT6MQUl7GAQ8wq6Hx1_JuFZFOhcUaPPhbf8RPTSprIvtluuqKWf3LULkCqRP-1FaPrkAU)Click on “Create an app”. Fill out the app details; for the website URL field, you can input any website; we used https://bitproject.org. Leave the OAuth Callback URL, TOS and Privacy Policy fields blank.
+
+After creating your app, head to its “Keys and tokens” section, where you will find an API key and API secret key. Copy these keys for later use.
(these keys in the picture have been erased)
+
+![img](https://lh4.googleusercontent.com/fLq7LZu_w2JKb2HCFHptAT1Ln4Z00JNMNq47knue29sH5HzWCSWbx_o6xpSeT0qOytCI7CLF8HqTdxlRQ_wb4JC9x_TnvSYgr8Ssjd3BKZBThHii-CkInXZ5UHO8mFVZU2L2e6DwpoE)
+
+There is also an area to generate an access token and access secret token; please generate them and keep track of those as well.
+
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/121.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/121.md
new file mode 100644
index 00000000..07a09b03
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/121.md
@@ -0,0 +1,6 @@
+
+
+Head back to Twitter Developer [here](https://developer.twitter.com/en/apps); your app menu should look like this:
+
+![img](https://lh6.googleusercontent.com/c2Eey4CUXd9gi3LFLPvbpKpDr1_qNTyZGHMKngCAjZ_prK1rITeI7AnLtWPRr0v_gRIGIxbT6MQUl7GAQ8wq6Hx1_JuFZFOhcUaPPhbf8RPTSprIvtluuqKWf3LULkCqRP-1FaPrkAU)Click on “Create an app”.
+
diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/122.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/122.md
new file mode 100644
index 00000000..3f94fc25
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/122.md
@@ -0,0 +1,6 @@
+
+
+In the creation of your app, fill out all the required information for your app details. For the website URL field, you can input any website; we used https://bitproject.org. Leave the OAuth Callback URL, TOS and Privacy Policy fields blank.
+ +![img](https://lh6.googleusercontent.com/wCWo0frQNm2aPD3Fv30kMC90DQDk880eGb1KTGrL5I7dOjis95GoVBI2zJJ3tacIz-0ux9HFpgAYeB4Ym_LC2OAPabCMRzGeiRtnVRUbKAqn_PdGyMLunDhZCo_h-4XIysnYivjUwnI) + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/123.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/123.md new file mode 100644 index 00000000..1fef04ca --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/123.md @@ -0,0 +1,10 @@ + + +After creating your app, head to its “Keys and tokens” section, where you will find an API key and API secret key. Copy these keys for later use. (these keys in the picture have been erased) + +![img](https://lh4.googleusercontent.com/fLq7LZu_w2JKb2HCFHptAT1Ln4Z00JNMNq47knue29sH5HzWCSWbx_o6xpSeT0qOytCI7CLF8HqTdxlRQ_wb4JC9x_TnvSYgr8Ssjd3BKZBThHii-CkInXZ5UHO8mFVZU2L2e6DwpoE) + + + +Below the consumer API keys, there is also an area to generate an access token and access secret token, please generate them and keep track of those as well. + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/2.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/2.md new file mode 100644 index 00000000..78fc90ed --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/2.md @@ -0,0 +1,12 @@ + + +The goal of this card is to set up authentication so your application can use Tweet information. You should: + +* Paste your keys and tokens in the starter code. +* Configure OAuth authentication with your consumer key and secret +* Set your access tokens and create a API object in `tweepy` to fetch tweets. + +Bear in mind we will be using the `us_search` dictionary for the rest of this lab. 
+ +![image](https://images.pexels.com/photos/58639/pexels-photo-58639.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940) + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/21.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/21.md new file mode 100644 index 00000000..7e10b1bf --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/21.md @@ -0,0 +1,10 @@ + + +Paste your keys and tokens in the allocated space. You'll want to authenticate in three steps: + +1. Configure OAuth authentication with your consumer key and secret with the `OAuthHandler(consumer_key, consumer_secret)` call. This should create an `auth` object +2. Set your access tokens with `auth.set_access_token()` +3. Create a API object in `tweepy` to fetch tweets: `tweepy.API(auth, wait_on_rate_limit=True)` + +![image](https://images.pexels.com/photos/46148/aircraft-jet-landing-cloud-46148.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=650&w=940) + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/211.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/211.md new file mode 100644 index 00000000..e36831ea --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/211.md @@ -0,0 +1,12 @@ + + + +Remember all those credentials you generated? Paste them appropriately in the starter code. + +```python +consumer_key = 'xxx' +consumer_secret = 'xxx' +access_token = 'xxx' +access_token_secret = 'xxx' +``` + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/212.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/212.md new file mode 100644 index 00000000..0d30563d --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/212.md @@ -0,0 +1,16 @@ + + +We are now going to authenticate using our app credentials. 
Please look at the next two lines of code: + +```python +auth = OAuthHandler(consumer_key, consumer_secret) +``` + +This line generates an authentication object using our consumer key and secret. + +```python +auth.set_access_token(access_token, access_token_secret) +``` + +This line enables access to our authentication object with our access token and access secret token. + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/213.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/213.md new file mode 100644 index 00000000..e7466b3c --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/213.md @@ -0,0 +1,8 @@ + + +The `tweepy` package allows us to very easily use Twitter's API within a Python environment. The line below will give us an API object that will allow us to fetch tweets. + +```python +api = tw.API(auth, wait_on_rate_limit=True) +``` + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/3.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/3.md new file mode 100644 index 00000000..a930dbf2 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/3.md @@ -0,0 +1,25 @@ + + + +Now that our application is set up, we're going to produce a dataframe that stores two items: the date and the sentiment of each tweet found. + + + +Complete the function `produce_dataframe(search_dict, num_tweets)`, with the following descriptions of the parameters: + +* `search_dict` - a dictionary of {key: value} pairs, where each `key` is the date you will be searching for and each `value` is the name of the airline you will search for. +* `num_tweets` - the number of tweets to fetch for each query in the search dictionary + + + +The function `produce_dataframe(search_dict, num_tweets)` should complete the following checklist: + +* Return a dataframe using the `pandas` library; the column names should be 'date' and 'sentiment'.
Set rownames to 'positive', 'neutral', and 'negative.' +* Fill the returned dataframe with the sentiment counts for each search query. + + + +You are not allowed to change any function definition or code that was given to you. The end result should look like this: + +![image]() + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/31.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/31.md new file mode 100644 index 00000000..857614b8 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/31.md @@ -0,0 +1,17 @@ + + + + +In order to add items to the dataframe, we first need to create a dataframe. We can use the `pandas` library to create one. + +The function we want to use is `pandas.DataFrame(arguments)`. Use the arguments: + +* `index` to set the rownames to 'positive', 'neutral', and 'negative.' +* `columns` to set the column names to `date` and `sentiment` + + + +If you run your code before inserting any data, your dataframe should look like this: + +![image]() + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/311.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/311.md new file mode 100644 index 00000000..2e74590e --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/311.md @@ -0,0 +1,24 @@ + + + + +We can create a `pandas` dataframe with the function `pd.DataFrame()`. Say we want a dataframe with the following structure (it currently holds no data): + +| | Airline_0 | Airline_1 | ... | +| -------- | --------- | --------- | ---- | +| positive | ... | ... | ... | +| neutral | ... | ... | ... | +| negative | ... | ... | ...
| + +We need the arguments: + +* `data` - this argument defaults to `None`; we set it to an explicit empty list `[]` +* `index` - we'll set this to the list `['positive', 'neutral', 'negative']` + +Our final code looks like: + +```python +df = pd.DataFrame([], index=['positive', 'neutral', 'negative']) +``` + +Now we need to get the tweets from the cursor object. diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/312.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/312.md new file mode 100644 index 00000000..f32b7d59 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/312.md @@ -0,0 +1,17 @@ + + +The `Cursor` in `tweepy` will allow us to find `num_tweets` tweets given a search query `val`. + +```python +# Collect tweets +tweets = tw.Cursor(api.search, q=val, lang="en", since=date_since).items(num_tweets) +``` + +Because we are given a dictionary with search queries, we want to iterate through this dictionary and call the above line for each search query (each query corresponds to one airline): + +```python +for key, val in search_dict.items(): + # Collect tweets + tweets = tw.Cursor(api.search, q=val, lang="en", since=date_since).items(num_tweets) +``` + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/32.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/32.md new file mode 100644 index 00000000..17ef4fba --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/32.md @@ -0,0 +1,14 @@ + + + + + +Now that we've created our dataframe, we need to fill it. Recall that the parameter `search_dict` is a dictionary of {key: value} pairs, where each `key` is the date you will be searching for and each `value` is the name of the airline you will search for. + + + +After producing the dataframe, you should iterate through the `search_dict` and call the `tweepy.Cursor()` object to get the tweets with the search query.
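Putting the pieces so far together — the empty dataframe plus the per-query loop — here is a minimal runnable sketch of the filling logic. Since live Twitter access isn't assumed here, precomputed `TextBlob`-style polarity scores stand in for fetched tweets; the dates and scores below are invented for illustration only:

```python
import pandas as pd

def tally_sentiments(polarities):
    # Mirror the lab's counting rule: >0 positive, ==0 neutral, <0 negative
    positive = sum(1 for p in polarities if p > 0)
    neutral = sum(1 for p in polarities if p == 0)
    negative = sum(1 for p in polarities if p < 0)
    return [positive, neutral, negative]

# Hypothetical stand-in for the polarity scores of fetched tweets
scores_by_date = {
    '2020-01-14': [0.5, 0.0, -0.2, 0.1],
    '2020-01-15': [0.0, -0.4],
}

df = pd.DataFrame([], index=['positive', 'neutral', 'negative'])
for key, polarities in scores_by_date.items():
    df[key] = tally_sentiments(polarities)  # one column per search key
```

Each dictionary key becomes a column of three counts, exactly as `df[key] = [positive, neutral, negative]` does in the real loop.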
+ + + +![image]() diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/321.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/321.md new file mode 100644 index 00000000..6a27d795 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/321.md @@ -0,0 +1,30 @@ + + + + +For each tweet object in the cursor, we can use the `.text` attribute to get the text of the tweet itself. Then for each text, we make a `TextBlob` object out of that text. + +From there, simply check the values of the `.sentiment.polarity` attribute. Positive polarity indicates positive sentiment, zero polarity indicates neutral sentiment, and negative polarity indicates negative sentiment. + +We keep track of positive, neutral and negative tweets with counter variables. At the end, we add the acquired sentiment data to the dataframe. + +Return `df` at the end. + +```python +for key, val in search_dict.items(): + # ... + + positive = 0 + neutral = 0 + negative = 0 + for t in tweets: + analysis = TextBlob(t.text) + if analysis.sentiment.polarity > 0: + positive += 1 + elif analysis.sentiment.polarity == 0: + neutral += 1 + else: + negative += 1 + df[key] = [positive, neutral, negative] +return df +``` \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/33.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/33.md new file mode 100644 index 00000000..69b0311c --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/33.md @@ -0,0 +1,13 @@ + + +Now that we've got the tweets, we need to iterate through the tweets to find the sentiments. + + + +We also want to keep track of how many tweets of each sentiment we have. Make sure to initialize counters for 'positive', 'neutral' and 'negative' first. + + + +You need to use the `TextBlob` module to get the `sentiment.polarity` attribute.
For each date key in the tweets we have, set that date's column in the dataframe to the sentiment counts. + +![image]() \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/4.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/4.md new file mode 100644 index 00000000..45feff80 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/4.md @@ -0,0 +1,7 @@ +# Set-up Dataframes for Graph + +Now that we have our raw dataframe of dates and sentiments, we need to calculate the total tweets and the percentage of positive/neutral/negative tweets per day. + +Set up *one* dataframe that contains all of that data. This is what your dataframe should look like: + +![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/Airline_DF.PNG) diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/5.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/5.md new file mode 100644 index 00000000..90e99ac0 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/5.md @@ -0,0 +1,14 @@ +# Graphing + +Now that we have our dataframe with the number of positive, neutral, and negative tweets for each airline, it's time to graph! + + +We'll be graphing a bar graph representing the total number of tweets per day for each airline. + + +Inside the function `produce_graph(df, keys)`, create a bar graph, with each bar labelled with an appropriate date for the tick. Then on top of that bar graph, create a line graph with three lines for the positive, neutral, and negative sentiment on *each day*. + +Data presentation is everything: without putting your due diligence into your presentation, fewer people will read your analytics! So make sure your graph is properly titled, with a legend, x-axes, y-axes, and x and y labels. Don't forget to have 2 y-axes! (one for your bar graph and one for your line graph).
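The two-axis setup described above can be sketched as follows. The dates and counts here are invented stand-ins for the values in your dataframe, and rendering is done off-screen; `twinx()` gives the line graph its own y-axis on top of the bar graph's axis:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical per-day data; substitute the columns of your dataframe
dates = ["01/14", "01/15", "01/16"]
totals = [120, 95, 140]        # total tweets per day -> bars
pos_pct = [40, 35, 50]         # sentiment percentages -> lines
neu_pct = [45, 40, 30]
neg_pct = [15, 25, 20]

fig, ax_bar = plt.subplots()
ax_bar.bar(dates, totals, color="lightgray", label="total tweets")
ax_bar.set_xlabel("Date")
ax_bar.set_ylabel("Total tweets")

ax_line = ax_bar.twinx()       # second y-axis sharing the same x-axis
ax_line.plot(dates, pos_pct, label="positive")
ax_line.plot(dates, neu_pct, label="neutral")
ax_line.plot(dates, neg_pct, label="negative")
ax_line.set_ylabel("Sentiment (%)")

ax_bar.set_title("Tweets per day with sentiment breakdown")
fig.legend(loc="upper right")
```

This is only a layout sketch, not the lab's exact solution; the title, labels, and legend calls are the same ones you are asked to supply.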
As a reminder this is what your graph should look like: + +![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/AirlineSentimentExample.png) + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/51.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/51.md new file mode 100644 index 00000000..49a59138 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/51.md @@ -0,0 +1,12 @@ +# Setting Up Bars + +Remember that even though this is a stacked bar graph, you are still essentially graphing numbers on a bar graph, with the caveat that the bars are on top of each other, and so the location of the bars needs to be controlled. + +To make your bars on your graph, there are two steps you need to take: + +* Locate the positive, neutral and negative lists in your data frame. That is the data you will be graphing. +* Use `plt.bar` to graph bar graphs. + * Start off with just graphing the positive bars and making sure those work. + * Then graph the neutral bars, setting the **bottom** to be the positive bars. + * Do the same for negative bars. + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/52.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/52.md new file mode 100644 index 00000000..65ab6ddc --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/52.md @@ -0,0 +1,22 @@ +# Making Graph Pretty + +Now that your bars are set up, your graph most likely looks unorganized. To organize our graph better, we can set margins to the x ticks to space them out. 
Because the specifics of how to space out x tick labels can get quite complicated and out of the scope of this bootcamp, I'll provide a chunk of code for you here: + +```python +# space out x ticks and give margins +plt.gca().margins(x=0) +plt.gcf().canvas.draw() +tl = plt.gca().get_xticklabels() +maxsize = max([t.get_window_extent().width for t in tl]) +m = 0.1 # inch margin +s = maxsize/plt.gcf().dpi*7+2*m +margin = m/plt.gcf().get_size_inches()[0] + +plt.gcf().subplots_adjust(left=margin, right=1.-margin) +plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1]) +``` + +This code should space out your x ticks and set margins. + +Don't forget a title (`plt.title`) and a legend (`plt.legend`)! + diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/53.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/53.md new file mode 100644 index 00000000..224384de --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/53.md @@ -0,0 +1,5 @@ +# Text Labels + +To add text labels, we'll have to iterate through all the bars and append an appropriate label to each one. So firstly, set up a list called `labels`, that has all the positive, neutral and negative data *in one list*. + +We can use `ax.patches` to find a list of all the bars currently in the graph. We can then `zip` the labels and patches together, iterate through that, and for each patch use `ax.text` to attribute a label to each bar. Make sure you have an appropriate location when using `ax.text`! \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/6.md b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/6.md new file mode 100644 index 00000000..45f890d6 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Airline Sentiment Analysis/6.md @@ -0,0 +1,3 @@ +# `main()` + +It's time to put all of our functions together into a program! 
Call `produce_dataframe()` and `produce_graph()` inside of your `main()` function with the proper parameters, and run `main()` to see your graph! \ No newline at end of file diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/1.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/1.md new file mode 100644 index 00000000..d51df678 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/1.md @@ -0,0 +1,15 @@ + + + +With the wealth of information at our disposal, American politics have increasingly become a game of misinformation and narrative twisting. To win elections, oftentimes it doesn't matter what political candidates actually do or say, but how candidates can control the narrative surrounding their campaigns, and what the American people think of their candidates. + +Twitter reactions are a valuable source of political opinions from regular Americans, because anyone can tweet their honest feelings about political candidates. Using sentiment analysis on Americans' tweets can give us genuine insight on the sentiment behind every candidate in the race, outside of media spin or campaign biases. For this lab, we'll dive into the tweets during and after the 7th Democratic Debate to gauge how average Americans perceive political candidates, and graph our results in a stacked bar graph! Here is what the result should look like: + +![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/6thDemDebateGraph.png) + +To do this, we'll be gathering tweets referencing political candidates using Twitter's API as well as the `tweepy` and `TextBlob` packages to determine whether the gathered tweets have a positive, neutral or negative attitude towards the candidates. + +Proceed to Twitter Developer [here](https://developer.twitter.com/en/apps), and you'll need to make a new Twitter app and get four credentials for future use: consumer key, consumer secret, access token and access secret token.
+ +Paste those credentials into the appropriate area in your starter code. + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/11.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/11.md new file mode 100644 index 00000000..6f5bdcdf --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/11.md @@ -0,0 +1,7 @@ + + +The Twitter developer application process starts here, please complete the form. + + +![img](https://lh4.googleusercontent.com/bOVrW7NkR9zdzVGR5Wpn4blHLWwsbRapfxYJdsFB2MXaEGDfD6GQ7REp8h42A3fSQmHDLtpAhsxEuSymYElifWq_dn4742hYwzfhO2nmZce6u5CtLhh8mJmBLSQ4KydLGG9NMWNp9F4) + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/111.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/111.md new file mode 100644 index 00000000..6e78cbed --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/111.md @@ -0,0 +1,6 @@ + + +Proceed to Twitter Developer [here](https://developer.twitter.com/en/apps). Implementing sign-in with Twitter requires a Twitter developer account, so click “Apply”. + + + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/112.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/112.md new file mode 100644 index 00000000..7c1b766b --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/112.md @@ -0,0 +1,8 @@ + + +The Twitter developer application process starts here, please complete the form. + +As you are most likely a student, click on the student option to begin. If another option is more relevant, click on that instead. 
+ +![img](https://lh4.googleusercontent.com/bOVrW7NkR9zdzVGR5Wpn4blHLWwsbRapfxYJdsFB2MXaEGDfD6GQ7REp8h42A3fSQmHDLtpAhsxEuSymYElifWq_dn4742hYwzfhO2nmZce6u5CtLhh8mJmBLSQ4KydLGG9NMWNp9F4) + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/113.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/113.md new file mode 100644 index 00000000..35d947ef --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/113.md @@ -0,0 +1,7 @@ + + +After finishing your application, confirm your email and your account should be processed and reviewed swiftly. + +![img](https://lh4.googleusercontent.com/8BKvmctSfLQEKERSZIc9_3jKl7lnpkRJO3736TBuIkfwBzZhkZMmPL8hUnNjrCf27SqX1iZaHOv1RBrNfB2V1990cl9z35ojA-RjoDnN0vgn5XWuDhwMjpbbhHLj5J1qcuq4M2KSC4g) + +After you have been approved, you will then be able to access the necessary API keys. Be patient! This process can last up to two weeks. diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/12.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/12.md new file mode 100644 index 00000000..cf334045 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/12.md @@ -0,0 +1,13 @@ + + +When you have finished configuring your account, head back to Twitter Developer [here](https://developer.twitter.com/en/apps); your app menu should look like this: + +![qf76jiT](https://i.imgur.com/qf76jiT.png) + +Click on “Create an app” and fill out app details. + +After creating your app, head to its “Keys and tokens” section, where you will find an API key and API secret key. Copy these keys for later use (these keys in the picture have been erased). The keys will be used to access Twitter's API and will be inserted into the starter code. + +There is also an area to generate an access token and access secret token; please generate them and keep track of those as well. They will be used in the starter code.
+ + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/121.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/121.md new file mode 100644 index 00000000..d0f61e73 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/121.md @@ -0,0 +1,13 @@ + + +Head back to Twitter Developer [here](https://developer.twitter.com/en/apps); your app menu should look like this: + + +![qf76jiT](https://i.imgur.com/qf76jiT.png)Click on “Create an app”. Fill out app details; for the website URL field, you can input any website; we used https://bitproject.org. Leave the OAuth Callback URL, TOS and Privacy Policy fields blank. + +After creating your app, head to its “Keys and tokens” section, where you will find an API key and API secret key. Copy these keys for later use. (these keys in the picture have been erased) + +![img](https://lh4.googleusercontent.com/fLq7LZu_w2JKb2HCFHptAT1Ln4Z00JNMNq47knue29sH5HzWCSWbx_o6xpSeT0qOytCI7CLF8HqTdxlRQ_wb4JC9x_TnvSYgr8Ssjd3BKZBThHii-CkInXZ5UHO8mFVZU2L2e6DwpoE) + +There is also an area to generate an access token and access secret token; please generate them and keep track of those as well. + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/122.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/122.md new file mode 100644 index 00000000..7ff3a60f --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/122.md @@ -0,0 +1,14 @@ + + + +In the creation of your app, fill out all the required information for your app details. For the website URL field, you can input any website; we used https://bitproject.org. Leave the OAuth Callback URL, TOS and Privacy Policy fields blank. + + OAuth Callback URLs are used for providing directions on where a user should go after signing in with their Twitter credentials. They can even be used to redirect a user to a specific page, which won't be necessary for our lab.
TOS stands for 'terms of service', which, again, we don't need to specify here. + +Privacy Policy is similar to TOS, but would explain to users what the data collected from them would be used for. Since there will be no users other than yourself, it is not relevant. + + +![img](https://lh6.googleusercontent.com/wCWo0frQNm2aPD3Fv30kMC90DQDk880eGb1KTGrL5I7dOjis95GoVBI2zJJ3tacIz-0ux9HFpgAYeB4Ym_LC2OAPabCMRzGeiRtnVRUbKAqn_PdGyMLunDhZCo_h-4XIysnYivjUwnI) + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/123.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/123.md new file mode 100644 index 00000000..3e8edec2 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/123.md @@ -0,0 +1,11 @@ + + +After creating your app, head to its “Keys and tokens” section, where you will find an API key and API secret key. Copy these keys for later use. (these keys in the picture have been erased) + +![img](https://lh4.googleusercontent.com/fLq7LZu_w2JKb2HCFHptAT1Ln4Z00JNMNq47knue29sH5HzWCSWbx_o6xpSeT0qOytCI7CLF8HqTdxlRQ_wb4JC9x_TnvSYgr8Ssjd3BKZBThHii-CkInXZ5UHO8mFVZU2L2e6DwpoE) + + + +Below the consumer API keys, there is also an area to generate an access token and access secret token; please generate them and keep track of those as well. Remember, these keys will be used to access Twitter's API and will be inserted into the starter code. + + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/2.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/2.md new file mode 100644 index 00000000..785101ad --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/2.md @@ -0,0 +1,7 @@ + + +So far we have created a Twitter developer account and an application. We will next use the keys and tokens that have been automatically generated to continue creating our API. + +We will need to paste the keys and tokens in the allocated space.
Then configure OAuth authentication with your consumer key and secret, set your access tokens, and create an API object in `tweepy` to fetch tweets. After the authentication process, we should have access to data on Twitter. + +Bear in mind we will be using the `dem_search` dictionary for the rest of this lab. diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/21.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/21.md new file mode 100644 index 00000000..04dfde57 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/21.md @@ -0,0 +1,15 @@ + + +Paste your keys and tokens in the allocated space. You'll want to authenticate in three steps: + +1. Configure OAuth authentication with your consumer key and secret with the `OAuthHandler(consumer_key, consumer_secret)` call. This should create an `auth` object. + + Configuring OAuth authentication allows an API client app to access a user's Twitter account without having to handle or store the user's login info. + +2. Set your access tokens with `auth.set_access_token()`. + + Setting the access tokens gives us authorization to a specific application that will give us access to specific parts of a user's data. + +3. Create an API object in `tweepy` to fetch tweets: `tweepy.API(auth, wait_on_rate_limit=True)` + + This call returns an API object we can use to fetch tweets. diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/211.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/211.md new file mode 100644 index 00000000..ef9f121b --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/211.md @@ -0,0 +1,19 @@ + + +Remember all those credentials you generated? Paste them appropriately in the starter code. + +We will need to add the following information in our code so we can retrieve tweets!
+ +```python +consumer_key = 'xxx' +consumer_secret = 'xxx' +access_token = 'xxx' +access_token_secret = 'xxx' +``` + +A consumer key is the API key that identifies your application as the consumer of the service. + +The consumer secret is the consumer's "password" and accompanies the consumer key. Together, these two pieces of information are used to request access to a user's resources from Twitter. + +An access token is given to the consumer once they have completed the authorization. This token is provided by the service provider and will determine what access privileges that consumer will receive over a particular user's resources. When the consumer wants to access data, they will need to provide the access token when requesting the information. diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/212.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/212.md new file mode 100644 index 00000000..bf8671f3 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/212.md @@ -0,0 +1,22 @@ + + +We are now going to authenticate using our app credentials. Please look at the next two lines of code: + +```python +auth = OAuthHandler(consumer_key, consumer_secret) +``` + +This line generates an authentication object using our consumer key and secret. + +```python +auth.set_access_token(access_token, access_token_secret) +``` + +This line enables access to our authentication object with our access token and access secret token. + +A consumer key is different from an access token: the key is what identifies the application as a consumer, while an access token grants the consumer access to the user's resources. Both are essential to the API: one establishes who the consumer is, and the other must be presented whenever the consumer requests data.
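Putting the scattered snippets together, the whole credential-and-authentication sequence might look like the sketch below. The `'xxx'` strings are placeholders for your own keys, and `tweepy` is assumed to be imported as `tw`, matching the starter code's `tw.API` usage:

```python
import tweepy as tw

# Placeholder credentials -- substitute the keys and tokens from your app
consumer_key = 'xxx'
consumer_secret = 'xxx'
access_token = 'xxx'
access_token_secret = 'xxx'

# Identify our app to Twitter with the consumer key/secret...
auth = tw.OAuthHandler(consumer_key, consumer_secret)
# ...then authorize it to act on our account with the access tokens
auth.set_access_token(access_token, access_token_secret)

# The API object is what we will use to fetch tweets
api = tw.API(auth, wait_on_rate_limit=True)
```

This is a configuration sketch only; nothing is fetched until you call the API with real credentials.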
Once you have authenticated the credentials, we will begin working through the starter code! diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/213.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/213.md new file mode 100644 index 00000000..4347b0cd --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/213.md @@ -0,0 +1,14 @@ + + +The `tweepy` package allows us to very easily use Twitter's API within a Python environment. The line below will give us an API object that will allow us to fetch tweets. + +The `wait_on_rate_limit` argument determines whether `tweepy` automatically waits for the rate limit to replenish when it is hit. + +To keep fetching tweets without errors, we will set `wait_on_rate_limit` equal to `True`. + + +```python +api = tw.API(auth, wait_on_rate_limit=True) +``` + +With the API object created, our next step will be to fetch tweets and analyze them! diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/3.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/3.md new file mode 100644 index 00000000..5e059159 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/3.md @@ -0,0 +1,4 @@ +# Producing Dataframes + +First, we'll have to complete the function `produce_dataframe()`. Please reference the description in the starter code. Bear in mind we provide what will be passed into the `search_dict` parameter for you, and you are not allowed to change any function definition or given code. + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/31.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/31.md new file mode 100644 index 00000000..27371b7d --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/31.md @@ -0,0 +1,12 @@ + + + + +Our end result is going to be a dataframe indexed by sentiment categories `['positive', 'neutral', 'negative']` and our columns being the various candidates.
From this dataframe, one should be able to easily look up the number of positive/neutral/negative tweets regarding a candidate. + +First, make an empty dataframe `df` indexed by the array `['positive', 'neutral', 'negative']`. + +A `search_dict` is simply a dictionary with candidate names mapped to appropriate search queries on Twitter. We can use `tweepy` to search for tweets including those search queries. + +For each search query, use the `Cursor` object from `tweepy` to fetch `n` tweets matching each query. (`n` corresponds to the parameter `num_tweets` in this case.) + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/311.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/311.md new file mode 100644 index 00000000..4e41893f --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/311.md @@ -0,0 +1,20 @@ + + + + +We want a dataframe with the following structure: + + + +| | Candidate_0 | Candidate_1 | ... | +| -------- | --------------- | --------------- | ---- | +| positive | 321 | 23 | ... | +| neutral | 76 | 32 | ... | +| negative | \# example data | \# example data | ... | + +Our dataframe `df` will do the trick: + +```python +df = pd.DataFrame([], index=['positive', 'neutral', 'negative']) +``` + diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/312.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/312.md new file mode 100644 index 00000000..f32b7d59 --- /dev/null +++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/312.md @@ -0,0 +1,17 @@ + + +The `Cursor` in `tweepy` will allow us to find `num_tweets` tweets given a search query `val`.
+
+```python
+# Collect tweets
+tweets = tw.Cursor(api.search, q=val, lang="en", since=date_since).items(num_tweets)
+```
+
+Because we are given a dictionary of search queries, we want to iterate through it and call the line above for each query (each query corresponds to one airline):
+
+```python
+for key, val in search_dict.items():
+    # Collect tweets
+    tweets = tw.Cursor(api.search, q=val, lang="en", since=date_since).items(num_tweets)
+```
+

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/32.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/32.md
new file mode 100644
index 00000000..b96fa681
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/32.md
@@ -0,0 +1,15 @@
+
+
+
+
+Our `Cursor` object contains the list of tweets matching our search query.
+
+Iterate through this cursor object and determine whether each tweet's sentiment is positive, neutral, or negative.
+
+* You'll have to use `TextBlob` as well as the `sentiment.polarity` attribute.
+
+Keep a count of positive, neutral, and negative tweets.
+
+When done iterating through the tweets, assign the counts into the dataframe so that the sentiment data shows up in your dataframe.
+
+The function should return `df`.
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/321.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/321.md
new file mode 100644
index 00000000..6a27d795
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/321.md
@@ -0,0 +1,30 @@
+
+
+
+
+For each tweet object in the cursor, we can use the `.text` attribute to get the text of the tweet itself. Then, for each text, we make a `TextBlob` object out of it.
+
+From there, simply check the value of the `.sentiment.polarity` attribute. Positive polarity indicates positive sentiment, zero polarity indicates neutral sentiment, and negative polarity indicates negative sentiment.
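This polarity-to-label mapping can be sanity-checked in isolation with a small helper (the name `polarity_to_sentiment` is purely illustrative and not part of the starter code):

```python
def polarity_to_sentiment(polarity):
    """Map a TextBlob-style polarity score in [-1, 1] to a sentiment label."""
    if polarity > 0:
        return "positive"   # positive polarity -> positive sentiment
    elif polarity == 0:
        return "neutral"    # zero polarity -> neutral sentiment
    return "negative"       # negative polarity -> negative sentiment

print(polarity_to_sentiment(0.8))   # → positive
print(polarity_to_sentiment(0.0))   # → neutral
print(polarity_to_sentiment(-0.5))  # → negative
```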
+
+We keep track of positive, neutral, and negative tweets with counter variables. At the end, we add the acquired sentiment data to the dataframe and return `df`.
+
+```python
+for key, val in search_dict.items():
+    # ...
+
+    positive = 0
+    neutral = 0
+    negative = 0
+    for t in tweets:
+        analysis = TextBlob(t.text)
+        if analysis.sentiment.polarity > 0:
+            positive += 1
+        elif analysis.sentiment.polarity == 0:
+            neutral += 1
+        else:
+            negative += 1
+    df[key] = [positive, neutral, negative]
+return df
+```
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/33.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/33.md
new file mode 100644
index 00000000..b5fc3605
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/33.md
@@ -0,0 +1,3 @@
+
+
+Now let's fill out `init_dataframes()`. Call your function `produce_dataframe` on the defined dictionaries `low_cost_search` and `luxury_search` to produce two dataframes consisting of sentiment data on low-cost airlines and luxury airlines. Print *the entirety* of these dataframes to the Python console and return *both* of them.
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/331.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/331.md
new file mode 100644
index 00000000..9d87b411
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/331.md
@@ -0,0 +1,14 @@
+
+
+To do this, call `produce_dataframe()` twice, once on `low_cost_search` and once on `luxury_search`, passing 100 as the number of tweets each time.
+
+Then we print the *entirety* of each dataframe using `.to_string()` and return the resulting dataframes.
+
+```python
+def init_dataframes():
+    low_cost_df = produce_dataframe(low_cost_search, 100)
+    luxury_cost_df = produce_dataframe(luxury_search, 100)
+    print(low_cost_df.to_string())
+    print(luxury_cost_df.to_string())
+    return low_cost_df, luxury_cost_df
+```
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/4.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/4.md
new file mode 100644
index 00000000..9cb3384a
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/4.md
@@ -0,0 +1,17 @@
+# Graphing
+
+Now that we have our dataframes with the number of positive, neutral, and negative tweets for each airline, it's time to graph!
+
+Inside of the function `produce_graph(df, keys)`, create a stacked bar graph, with each bar labelled with its number of positive/neutral/negative tweets. Data presentation is everything: without putting due diligence into your presentation, fewer people will read your analytics! So make sure your graph is properly titled, with a legend, x and y axes, and x and y labels.
+
+Here are some formatting rules (and hints!) that were used in the graph below:
+
+* Width of bars is 0.40
+* The legend is placed outside the graph using `bbox_to_anchor`
+* 0.1 inch margin between x-axis labels
+* Think about what `matplotlib` function you would use to set up a bar graph. How would you set up bars so that the bottom of one bar is set at the same location as the top of another?
+* You can set up text labels by initializing a figure `fig` with `plt.figure`, then initializing a subplot `ax` with `fig.add_subplot`, and adding text labels at a location `(x, y)` with `ax.text(x, y, ...)`
+
+Remember that this is the result you are aiming for:
+
+![](https://projectbit.s3-us-west-1.amazonaws.com/darlene/labs/6thDemDebateGraph.png)
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/41.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/41.md
new file mode 100644
index 00000000..49a59138
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/41.md
@@ -0,0 +1,12 @@
+# Setting Up Bars
+
+Remember that even though this is a stacked bar graph, you are still essentially graphing numbers on a bar graph, with the caveat that the bars sit on top of each other, so the location of each bar needs to be controlled.
+
+To make the bars on your graph, there are two steps you need to take:
+
+* Locate the positive, neutral, and negative lists in your dataframe. That is the data you will be graphing.
+* Use `plt.bar` to draw the bars.
+  * Start off with just graphing the positive bars and making sure those work.
+  * Then graph the neutral bars, setting the **bottom** to be the positive bars.
+  * Do the same for the negative bars.
+

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/42.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/42.md
new file mode 100644
index 00000000..65ab6ddc
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/42.md
@@ -0,0 +1,22 @@
+# Making the Graph Pretty
+
+Now that your bars are set up, your graph most likely looks disorganized. To organize it better, we can set margins on the x ticks to space them out.
Because the specifics of how to space out x tick labels can get quite complicated and are out of the scope of this bootcamp, I'll provide a chunk of code for you here:
+
+```python
+# space out x ticks and give margins
+plt.gca().margins(x=0)
+plt.gcf().canvas.draw()
+tl = plt.gca().get_xticklabels()
+maxsize = max([t.get_window_extent().width for t in tl])
+m = 0.1  # inch margin
+s = maxsize/plt.gcf().dpi*7+2*m
+margin = m/plt.gcf().get_size_inches()[0]
+
+plt.gcf().subplots_adjust(left=margin, right=1.-margin)
+plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1])
+```
+
+This code should space out your x ticks and set the margins.
+
+Don't forget a title (`plt.title`) and a legend (`plt.legend`)!
+

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/43.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/43.md
new file mode 100644
index 00000000..224384de
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/43.md
@@ -0,0 +1,5 @@
+# Text Labels
+
+To add text labels, we'll have to iterate through all the bars and attach an appropriate label to each one. So first, set up a list called `labels` that has all the positive, neutral, and negative data *in one list*.
+
+We can use `ax.patches` to get a list of all the bars currently in the graph. We can then `zip` the labels and patches together, iterate through that, and for each patch use `ax.text` to attach a label to the bar. Make sure you choose an appropriate location when calling `ax.text`!
\ No newline at end of file

diff --git a/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/5.md b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/5.md
new file mode 100644
index 00000000..45f890d6
--- /dev/null
+++ b/Module_Twitter_API/labs/Week 3/Democratic Debate Sentiment/5.md
@@ -0,0 +1,3 @@
+# `main()`
+
+It's time to put all of our functions together into a program!
Call `init_dataframes()` and `produce_graph()` inside of your `main()` function with the proper parameters, then run `main()` to see your graphs!
\ No newline at end of file
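A minimal sketch of what `main()` could look like is below. It assumes `init_dataframes()` and `produce_graph(df, keys)` are defined as described in the earlier cards, and it assumes the `keys` argument is the list of airline names, taken here from each dataframe's columns:

```python
def main():
    # init_dataframes() prints both sentiment dataframes and returns them
    low_cost_df, luxury_df = init_dataframes()

    # Graph each dataframe, passing its airline names (the columns) as keys.
    # (Passing the columns as keys is an assumption; adapt to your starter code.)
    produce_graph(low_cost_df, list(low_cost_df.columns))
    produce_graph(luxury_df, list(luxury_df.columns))
```

Run `main()` once your functions are complete to produce both graphs.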