extract.update_status(status= 'Using Python to access Twitter') 

Using Python to retrieve data from Twitter

Author: Yusra Farooqui

15/05/2018  

Overview

If you understood the script in the page heading, you may have guessed that this article is about Twitter data mining using Python, and you guessed right. If you are new to this kind of data mining, after reading this article you will be able to extract the tweets of any given profile, or tweets containing a given hashtag. Pretty interesting, isn’t it? The article covers the preliminary steps for projects that work with Twitter data. A moderate knowledge of Python 3 is recommended to follow the scripts and methods in this article.

According to IFL Science, we currently produce roughly 2.5 quintillion bytes of data a day. With the ever-increasing popularity of social media and the way humans are connecting via electronic devices, that number is set to multiply over the coming years. The Internet population has grown by 7.5 percent since 2016 and now includes over 3.7 billion people. The US alone generates 2,657,700 gigabytes of Internet data every minute. You can find more information on the world’s data generation at Data Never Sleeps. With all this data at our disposal, we have a great opportunity for consumer data mining. Consumer analytics is one of the most sought-after disciplines in the business world. It is always useful to understand how consumer data can be collected and what kind of analysis can be used to create meaning out of it.

In this article I will demonstrate how data scientists can mine tweets with Python. The data can be used to study engagement, trends and popularity, to run sentiment analysis and to explore other interesting analyses. We will be using the standard (free) Twitter APIs, which consist of REST APIs and Streaming APIs. You can retrieve tweets by search terms such as ‘Royal Wedding’, by user location, by user ID such as ‘realDonaldTrump’, by date and by a variety of other rules, albeit with some limitations.

The code is written in iPython 3.5.2. For the most part the script relies only on tweepy, a Python library made to access the Twitter API. The code covers the following topics:

1.     Accessing Twitter API

2.     Twitter Rate Limits

3.     Retrieving tweets from own and user timelines

4.     Live-streaming tweets

5.     Searching tweets with Emojis

First things first

To access Twitter data in Python we need to create an app that can interact with the Twitter API. I use the term ‘Twitter data’ instead of ‘tweets’ because a lot of information apart from the tweet itself is returned with each request. To create a Twitter app, begin by registering a new application at Twitter apps. You will be prompted to log into your Twitter account, after which your app is created. You can then retrieve your consumer key and consumer secret, along with your access token and access token secret, from your app. Never reveal these credentials to anyone. You can also regenerate these keys in the app. These keys are your gateway to retrieving information from Twitter.

There are several libraries built for working with Twitter data. We will be using tweepy, one of the most widely used Python wrappers for the Twitter API. Begin by installing tweepy via pip and then run the following script in your Python console:

import tweepy

consumer_key = 'your consumer key'

consumer_secret = 'your consumer secret'

access_token = 'your access token'

access_token_secret = 'your access secret'

 

def  twitter_Access():

     auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

     auth.set_access_token(access_token, access_token_secret)

     api = tweepy.API(auth)

     return api

 

extract = twitter_Access()

Before running the code, fill in the consumer key, consumer secret, access token and access token secret that you obtained from your Twitter app. They are used for authentication. The function twitter_Access() sets up the connection to the Twitter API, and we use it to create the variable extract. tweepy takes care of the OAuth authentication details, so you don’t have to worry about them. The extract object will be our point of entry into Twitter.
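Since the keys should never be shared, one option is to keep them out of the script altogether. The sketch below is an assumption on my part rather than something tweepy requires: it reads the four values from environment variables whose names (TWITTER_CONSUMER_KEY and so on) are purely illustrative, and fails early if one is missing.

```python
import os

# Hypothetical variable names; pick whatever naming suits your setup.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_KEYS)


# Demonstration with placeholder values instead of real keys.
demo_env = {name: "placeholder-" + str(i) for i, name in enumerate(REQUIRED_KEYS)}
consumer_key, consumer_secret, access_token, access_token_secret = load_credentials(demo_env)
```

Calling load_credentials() without an argument reads from the real environment, and the four returned values can then take the place of the hard-coded strings at the top of the script.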

Limitations

Before we proceed, it is essential to understand the limitations of the data we are collecting. It may seem that you can access every single tweet on Twitter at any given time. Unfortunately, that is not the case. Twitter has policies on how we can process tweets and, most importantly, on how much data we can retrieve with the standard (free) API in third-party applications. The standard API rate limits cap the number of calls you can make to the Twitter API. The limits are applied per endpoint in 15-minute windows; many endpoints allow 15 requests per window, while some allow more. Additionally, the standard search API searches against a sampling of recent tweets published in the past 7 days and is focused on relevance, not completeness. Therefore, some tweets will be missing from your sample. There is also a cap on the number of tweets that you can retrieve, which varies with the 'get' method you are using. If you abuse the rate limits, your app can get blacklisted. To avoid this, you can enable the wait-on-rate-limit option when creating the API object in the previously defined code:

api = tweepy.API(auth, wait_on_rate_limit=True)

However, if you have decided to run for President of the United States and Twitter is your main channel for reaching people, then it’s best to get a subscription to the premium API. This will allow you to search for tweets dating back to 2006. You can find more information on the rate limits at the official Twitter developer site. If you want to check your rate limit in Python, use the following code:

data = extract.rate_limit_status()

display(data)

print(data['resources']['statuses']['/statuses/home_timeline'])

print(data['resources']['users']['/users/lookup'])
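Each entry in the returned dictionary describes one endpoint. As a small illustration, the sketch below works on a hand-written sample that mimics that structure (the numbers are made up, and the field names limit, remaining and reset are the ones I would expect in the rate limit response) and reports how many calls are left and how long it takes until the window resets.

```python
import time

# Hand-written sample mimicking the rate limit response; values are invented.
sample = {
    "resources": {
        "statuses": {
            "/statuses/home_timeline": {"limit": 15, "remaining": 3, "reset": 1526400900},
        }
    }
}


def summarise_limit(data, family, endpoint, now=None):
    """Return (remaining calls, seconds until the window resets)."""
    entry = data["resources"][family][endpoint]
    now = time.time() if now is None else now
    # 'reset' is a Unix timestamp; never report a negative wait.
    wait = max(0, int(entry["reset"] - now))
    return entry["remaining"], wait


remaining, wait = summarise_limit(
    sample, "statuses", "/statuses/home_timeline", now=1526400000
)
print(remaining, "calls left; window resets in", wait, "seconds")
```

With live data you would pass the dictionary returned by extract.rate_limit_status() instead of the sample and leave out the now argument.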


Basic Syntax

The first thing that you generally do with Twitter data is explore the different ways of retrieving it. As this section is more about learning than analysis, I have kept the tweet counts to a minimum.

Extract own tweets

Let’s start with retrieving the first 5 tweets on our home timeline:

for tweets in tweepy.Cursor(extract.home_timeline).items(5):

    print(tweets.text)

Your output will look something like this:

Pokalfinale: Der grandiose Abgang des Eintracht-Torwarts https://t.co/FKFQz5zAX9 https://t.co/8nY3yuqXlE

U.S. government bonds pay more than debt from other developed nations for first time in nearly 20 years https://t.co/15nNjDMQIi

RT @Florian_Kain: Zum Zuhören: Der brandneue Podcast #BILD Direkt: Was hinter Alice Weidels Kopftuchmädchen-Rede steckt - meine Analyse - s…

Studies have long shown that gun violence in PG-13 movies has been rising, sometimes exceeding what is shown in pop… https://t.co/ahddvZt19A

Jokes von Charles & Harry - So lustig war die  Hochzeitsparty https://t.co/AVbHZyEkN1

Search Term for Tweets

We can also search for specific terms on Twitter. You might have noticed that some tweets in the previous result are truncated. One way to avoid this is to pass tweet_mode="extended", which returns the full text of each tweet in its full_text attribute. Below I have created the Royalwed object, which holds tweets that mention the royal wedding, and have set it up to print the first 5 tweets:

Royalwed = extract.search(q="Royal Wedding", tweet_mode="extended")

for tweet in Royalwed[:5]:

    print(tweet.full_text)

    print()

The results will vary depending on when you run the code. As I mentioned earlier, there is a cap on how many tweets each request returns. In the result below you will notice that the abbreviation RT appears at the start of every tweet; it indicates a retweet.

RT @isabeIIehuppert: throwback to the greatest royal wedding of all time https://t.co/itUFR3dyCJ

RT @drizzyxcole: this the Royal Wedding not a Love and Hip Hop special https://t.co/gdvjTsLTX7

RT @bieberbacio: Kris Jenner programming next royal wedding between prince George and Stormi https://t.co/y6y2p3XoTZ

RT @mytylkamizono: in light of the royal wedding here is my favourite tumblr post of all time https://t.co/XY4Ir1bsAx

RT @STARKlNDLER: i'm only watching one (1) royal wedding this weekend https://t.co/seIGfav52D
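If retweets are not wanted in the analysis, they can be filtered out after retrieval. The sketch below is a simple heuristic and not a tweepy feature: it treats any text beginning with 'RT @' as a retweet, and it is run here on plain strings taken from the outputs shown in this article.

```python
def is_retweet(text):
    """Heuristic: treat any text that starts with 'RT @' as a retweet."""
    return text.startswith("RT @")


sample_texts = [
    "RT @drizzyxcole: this the Royal Wedding not a Love and Hip Hop special https://t.co/gdvjTsLTX7",
    "Jokes von Charles & Harry - So lustig war die  Hochzeitsparty https://t.co/AVbHZyEkN1",
]

# Keep only the texts that are not retweets
originals = [text for text in sample_texts if not is_retweet(text)]
print(originals)
```

With real results you would apply is_retweet to tweet.full_text (or tweet.text) for each tweet in Royalwed.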

Tweets from other profiles

You can also retrieve tweets from another user. If you don’t set the count parameter, you will get the 20 most recent statuses posted by the specified user; the example below asks for 100:

extract.user_timeline(screen_name="BarackObama", count=100, tweet_mode='extended')
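The call above returns a list of tweet objects that you will usually want to store. The sketch below assumes the result was assigned to a variable and that each tweet exposes id, created_at and full_text; to keep it runnable without credentials, it uses made-up stand-in objects instead of real tweets and writes them to an in-memory CSV.

```python
import csv
import io
from types import SimpleNamespace

# Stand-ins for tweet objects; the ids, dates and texts are invented.
tweets = [
    SimpleNamespace(id=1, created_at="2018-05-15 10:00:00", full_text="First example tweet"),
    SimpleNamespace(id=2, created_at="2018-05-15 11:30:00", full_text="Second tweet, with a comma"),
]


def tweets_to_csv(tweets, handle):
    """Write a header row and one row per tweet to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["id", "created_at", "text"])
    for tweet in tweets:
        writer.writerow([tweet.id, tweet.created_at, tweet.full_text])


buffer = io.StringIO()
tweets_to_csv(tweets, buffer)
print(buffer.getvalue())
```

To write a real file, open one with open('tweets.csv', 'w', newline='', encoding='utf-8') (the file name is only an example) and pass that handle instead of the buffer.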

Publishing a tweet from Python

With a single call you can also publish a tweet directly to your Twitter account:

extract.update_status(status='Updating from Python')

There is a long list of things that we can do with the Twitter API, such as add a friend, remove a friend, follow all your followers and much more. However, we will focus primarily on searching for users' tweets and for terms in tweets.

Live streaming tweets

We can also live stream tweets, which simply means receiving tweets in real time. You might end up with a huge amount of data, as the stream will keep returning tweets that match your query term. Therefore, it is critical to keep an eye on the process. Let’s explore a minimal working example. First, import the following from tweepy:

from tweepy import Stream

from tweepy.streaming import StreamListener

Make sure you have already completed the first step of creating your consumer and access variables. Run the following code in your console. You can change the quoted text passed to track to search for any other terms:

class listener(StreamListener):

    def on_data(self, data):

        print(data)

        return True

    def on_error(self, status):

        print(status)

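auth = tweepy.OAuthHandler(consumer_key, consumer_secret)

auth.set_access_token(access_token, access_token_secret)

twitter_stream = Stream(auth, listener())

# track expects a list of terms; a bare string would be split into single characters
twitter_stream.filter(track=['car'])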

For an in-depth understanding of this method, detailed documentation on streaming is available in the tweepy documentation. You can stop the process whenever you feel you have enough data to move forward with your analysis.

Note: Do not let the process run for long if you don't have enough memory for the data.
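One way to keep the stream from running away is to stop after a fixed number of tweets. The sketch below shows only the counting logic, without tweepy, on hand-made JSON strings; the class name and the limit are hypothetical. My understanding is that in a tweepy StreamListener, returning False from on_data closes the stream, so the same method body could be used in the listener class above, but treat that as an assumption to verify against your tweepy version.

```python
import json


class BoundedCollector:
    """Collects tweet texts and signals a stop once `limit` is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.texts = []

    def on_data(self, data):
        message = json.loads(data)
        # Streams also carry non-tweet messages; keep only those with text.
        if "text" in message:
            self.texts.append(message["text"])
        # False means "stop"; True means "keep listening".
        return len(self.texts) < self.limit


# Simulated stream of raw JSON messages (contents are invented).
raw_messages = [
    json.dumps({"text": "my car broke down"}),
    json.dumps({"limit": {"track": 5}}),
    json.dumps({"text": "new car day"}),
    json.dumps({"text": "this one is never read"}),
]

collector = BoundedCollector(limit=2)
for raw in raw_messages:
    if not collector.on_data(raw):
        break
```

After the loop, collector.texts holds at most two texts, so memory use stays bounded no matter how busy the search term is.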

Searching tweets by Emojis

Now let’s say you want to search for emojis instead of text in a tweet. This is a little tricky, but thanks to the emoji package it is entirely possible. First install the emoji package via pip. After importing it in Python, I create a variable ThumbsUp that corresponds to this emoji: 👍. Run the following code to search for 👍 in recent tweets:

import emoji

ThumbsUp = emoji.emojize(':thumbs_up:')

# Use a separate name for the results so the emoji module is not overwritten
results = extract.search(q=ThumbsUp, count=5)

for tweet in results:

    print(tweet.text)

        

I get the following results:

RT @MF_Subthai: [THAISUB] BTS Wins Top Social Artist @ 2018 Billboards Music Awards 

ปีที่ 2 แล้ว พลังอาร์มี่สุดยอด 👍

>> https://t.co/jkkKn…

2 Knot with one in summer plumage present from the hide on BGM with c80 Blackwits.👍 https://t.co/iU4qfGmr95

Hello followers!!.💪😉

https://t.co/XwjqBim2x6

CONGRATULATIONS!!.👍

#bodybuildingcom  #newyorkpro 

#bodybuilding  #fitnessaddict 

#gymlife  #gym  #fitness

Relatively simple, isn’t it? With this snippet you don’t have to worry about the Unicode code points for emojis. If you are interested in other emojis but don’t know their codes, you can look them up on this cheat sheet. You will have noticed that the tweets here appear in different languages, because the search above does not restrict the language. The standard search API accepts a lang parameter with an ISO 639-1 code (for example lang='en') for this, although Twitter’s language detection is best-effort. More advanced filtering is available with the paid API plans.
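Under the hood an emoji is an ordinary Unicode character, so once tweets are retrieved you can work with it like any other string. The sketch below uses only the standard library and writes the thumbs-up emoji by its code point (U+1F44D) instead of going through the emoji package; the sample texts are adapted from the results above, plus one invented line.

```python
import unicodedata

THUMBS_UP = "\U0001F44D"  # the thumbs-up emoji written as a Unicode escape

sample_texts = [
    "CONGRATULATIONS!!." + THUMBS_UP,
    "Hello followers!!.",
    "Great " + THUMBS_UP + " job " + THUMBS_UP,  # invented example
]

# Official Unicode name of the character
name = unicodedata.name(THUMBS_UP)

# Number of thumbs-up emojis in each text
counts = [text.count(THUMBS_UP) for text in sample_texts]
print(name, counts)
```

The same count could feed a simple engagement measure, for example the share of retrieved tweets that contain the emoji at least once.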

Conclusion

We have established that a lot can be done with the Twitter API and that we can retrieve a wide variety of information. If you want to explore all facets of the Twitter API, you will find this user guide from Twitter particularly useful. So far, we have not explored what exactly we can do with this data. If you are intrigued and want to work with Twitter data, you may want to see my rudimentary project on sentiment analysis of Donald Trump and Barack Obama using Python. There is a lot that can be achieved here, whether you are trying to measure the success of your campaign or doing research on suicide prevention. All in all, the possibilities for analysing tweets are many. You will find a detailed Python script for the code used in this article in my GitHub repository.