
Twitter API and Microsoft Text Analytics API in Python

With the French presidential election approaching, and since I'm working on a project that involves social media APIs and sentiment analysis, I've decided to post an example that uses these technologies to give an idea of each major candidate's popularity.

Solution description:

1. Collect social media posts related to each candidate. For this example the main source is Twitter.

2. Extract the sentiment for each candidate from the Twitter posts collected in step 1.

Implementation:

For a quick implementation I decided to use Python, but I'll definitely post a C# version as well.

Code:

1. Twitter Client code:

The code is pretty basic: it streams the posts as JSON to a text file, applying a list of filters, namely the names of the candidates I've decided to include:

#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access the Twitter API
access_token = "{your access token}"
access_token_secret = "{your access token secret}"
consumer_key = "{your consumer key}"
consumer_secret = "{your consumer secret}"

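class StdOutListener(StreamListener):
    """Prints every raw tweet (one JSON document per line) to stdout."""

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        # Print the HTTP status code returned by the Streaming API
        print(status)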


if __name__ == '__main__':
    #This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    #This line filters the Twitter stream to capture tweets matching these keywords
    stream.filter(track=['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon','Jean-Luc Melenchon'])

I run this code with the following command and leave it running for several hours, or even days, to collect as many tweets as possible:
python TwitterClientVersion2.py > stream_2017_03_28.txt
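
Before sending anything to a sentiment service, it can be worth checking what the capture actually contains. The following is only a sketch: `count_mentions` is a hypothetical helper that is not part of the original script, and it assumes that each non-blank line of the file is one JSON document with a `text` field, as produced by the listener above.

```python
import json

# Candidate names, spelled exactly as in the stream filter above
CANDIDATES = ['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron',
              'Benoit Hamon', 'Jean-Luc Melenchon']

def count_mentions(lines, names=CANDIDATES):
    """Count the tweets whose text mentions each name (hypothetical helper)."""
    counts = dict((name, 0) for name in names)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between tweets
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON, e.g. a truncated last line
        if not isinstance(tweet, dict):
            continue  # skip bare values such as a status code printed by on_error
        text = tweet.get('text') or ''
        for name in names:
            if name in text:
                counts[name] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle on the capture can be passed directly. The match is an exact, case-sensitive substring test, so a tweet that spells a name with accents (for example "François Fillon") is not counted under 'Francois Fillon'.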

2. Microsoft Sentiment Analytics:

At the time of writing, Microsoft offers a set of REST services for free under the name Microsoft Cognitive Services. These APIs encapsulate intelligent algorithms (mainly artificial intelligence) for:
1. Vision
2. Speech: recognition, ...
3. Language: language understanding, language recognition, Text Analytics, ...
4. Knowledge
5. Search

In this post we'll use the Text Analytics API to calculate the sentiment of each post.
The API returns a sentiment score for every document (a number between 0 and 1, where values close to 1 indicate positive sentiment and values close to 0 negative sentiment). It limits both the number of documents per request (1000) and the size of each document.
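
To make the 1000-document limit concrete, here is a minimal sketch of how tweets could be grouped into request bodies. `build_batches` is a hypothetical helper, not the code used in this post. It assumes the request body is a JSON object with a `documents` list whose items carry `id` and `text`. The optional `language` field is also an assumption, shown because the tweets collected here are mostly in French.

```python
import json

MAX_DOCUMENTS_PER_REQUEST = 1000  # limit quoted in this post

def build_batches(tweets, language=None, batch_size=MAX_DOCUMENTS_PER_REQUEST):
    """Turn (id, text) pairs into JSON request bodies of at most batch_size documents."""
    batches = []
    documents = []
    for tweet_id, text in tweets:
        doc = {"id": str(tweet_id), "text": text}
        if language:
            # Assumption: the service accepts an optional language code such as "fr"
            doc["language"] = language
        documents.append(doc)
        if len(documents) == batch_size:
            batches.append(json.dumps({"documents": documents}))
            documents = []
    if documents:
        # Flush the last, partially filled batch
        batches.append(json.dumps({"documents": documents}))
    return batches
```

With this sketch, 2500 tweets produce three request bodies holding 1000, 1000 and 500 documents.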

########### Python 2.7 #############
import httplib, urllib, json

class setimentScore:
    """Sentiment score of a single document returned by the API."""
    def __init__(self, data):
        self.score = data.get('score',0)
        self.id = data.get('id',0)
    def __repr__(self):
        return 'setimentScore(id=%s, score=%s)' % (self.id, self.score)

class scores:
    """Accumulates the per-document scores of one or more API responses."""
    def __init__(self, s):
        self.s = encode(s)
        data = json.loads(self.s)
        self.scores = []
        for sc in data.get('documents',[]):
            self.scores.append(setimentScore(sc))
    def __repr__(self):
        return 'scores(%s)' % self.s
    def append(self, s):
        # Parse the response passed in; parsing self.s here would
        # re-count the first batch for every later batch
        data = json.loads(encode(s))
        for sc in data.get('documents',[]):
            self.scores.append(setimentScore(sc))

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '{Your Key}',
}

params = urllib.urlencode({
})

def AnalyseTextSentiment(inputTexts):
    # inputTexts is a list of JSON request bodies, one per batch of documents
    conn = httplib.HTTPSConnection('westus.api.cognitive.microsoft.com')
    try:
        scs = None
        for inputText in inputTexts:
            conn.request("POST", "/text/analytics/v2.0/sentiment?%s" % params, inputText, headers)
            response = conn.getresponse()
            data = response.read()
            if scs is None:
                scs = scores(data)
            else:
                scs.append(data)
        # scs stays None when there was no batch to send
        allScores = scs.scores if scs is not None else []
        total = 0
        nbResults = len(allScores)
        for s in allScores:
            total += s.score
        if nbResults == 0:
            print("Avg Score= 0")
        else:
            print("Avg Score= %f"  % (total / nbResults))
        print("Total Score= %f" % total)
        print("Nb Scores= %f" % nbResults)
    except Exception as e:
        # Not every exception has errno/strerror, so print the exception itself
        print("Error: %s" % e)
    finally:
        conn.close()

def encode(txt):
    if txt:
        try:
            return txt.encode('utf-8')
        except:
            try:
                return txt.decode().encode('utf-8')
            except:
                return txt
    return ""

def ReadFileWithFilter(filePath, filterText):
    # Build JSON request bodies of at most 1000 tweets that mention filterText
    documents = []
    batches = []
    ids = set()
    with open(filePath, "r") as f:
        for line in f:
            if line.strip() == '':
                continue
            data = json.loads(line)
            # Skip lines that are not tweets (e.g. a printed status code)
            if not isinstance(data, dict) or data.get('text') is None:
                continue
            twittText = encode(data.get('text'))
            currentId = encode(data.get('id_str'))
            # Keep each tweet only once
            if filterText in twittText and currentId not in ids:
                ids.add(currentId)
                # The API documents the key as lowercase "id"
                documents.append({"id": currentId, "text": twittText})
                if len(documents) >= 1000:
                    batches.append(json.dumps({"documents": documents}))
                    documents = []
    if len(documents) > 0:
        batches.append(json.dumps({"documents": documents}))
    print("Generated %d batches" % len(batches))
    return batches

candidats = ['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon','Jean-Luc Melenchon']
for candidat in candidats:
    print(candidat)
    print("=============")
    candidateTweets = ReadFileWithFilter(r'.\stream_2017_03_26.txt', candidat)
    AnalyseTextSentiment(candidateTweets)

Below is an example output:
Marine Le Pen
=============
Generated 5 batches
Avg Score= 0.631855
Total Score= 3159.275843
Nb Scores= 5000.000000
Francois Fillon
=============
Generated 1 batches
Avg Score= 0.648703
Total Score= 24.650710
Nb Scores= 38.000000
Emmanuel Macron
=============
Generated 2 batches
Avg Score= 0.706528
Total Score= 1413.056910
Nb Scores= 2000.000000
Benoit Hamon
=============
Generated 1 batches
Avg Score= 0.681619
Total Score= 109.058961
Nb Scores= 160.000000
Jean-Luc Melenchon
=============
Generated 1 batches
Avg Score= 0.727667
Total Score= 8.732005
Nb Scores= 12.000000
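
The three figures printed for each candidate are simple aggregates of the per-document scores. The sketch below shows that aggregation on its own, without any network call. `average_score` is a hypothetical helper, and the response shape it assumes (a `documents` list whose items carry `id` and `score`) is inferred from the way the script above parses the responses.

```python
import json

def average_score(response_bodies):
    """Return (average, total, count) over the documents of several responses."""
    total = 0.0
    count = 0
    for body in response_bodies:
        payload = json.loads(body)
        # Documents the service could not score are assumed to be absent from this list
        for doc in payload.get("documents", []):
            total += doc.get("score", 0)
            count += 1
    average = total / count if count else 0.0
    return average, total, count
```

The average is the total divided by the number of scores: for the first candidate above, 3159.275843 / 5000 gives the 0.631855 shown.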
