Twitter API and Microsoft Text Analytics API in Python

Since we are getting closer to the French presidential election, and since I'm working on a project that involves social media APIs and sentiment analysis, I've decided to post an example that uses these technologies to give an idea of each major candidate's popularity.

Solution description:

1. Collect social media information related to each candidate. For this example the main source is Twitter.

2. Extract the sentiment for each candidate from the Twitter posts collected previously.

Implementation:

For a quick implementation I decided to use Python, but I'll definitely post a C# version as well.

Code:

1. Twitter Client code:

The code is pretty basic: I'm streaming the posts to a text file as JSON, applying a list of filters (the names of the candidates I've decided to include):

#Import the necessary classes from the tweepy library
import sys
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

#Variables that contain the user credentials to access the Twitter API
access_token = "{your access token}"
access_token_secret = "{your access token secret}"
consumer_key = "{your consumer key}"
consumer_secret = "{your consumer secret}"

class StdOutListener(StreamListener):

    def on_data(self, data):
        #Write each raw JSON post to stdout, which is redirected to a file
        print(data)
        return True

    def on_error(self, status):
        #Report errors on stderr so they do not end up in the JSON file
        sys.stderr.write("%s\n" % status)


if __name__ == '__main__':
    #This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    #This line filters the Twitter stream to capture posts matching the keywords:
    stream.filter(track=['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon','Jean-Luc Melenchon'])

I run this code with the command below and leave it running for several hours or even days to collect as many tweets as possible:
python TwitterClientVersion2.py > stream_2017_03_28.txt
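
Before sending anything to the sentiment API, it can help to check how many unique tweets were captured per candidate. Below is a minimal sketch of such a check, not part of the original scripts. It assumes Python 3 (unlike the Python 2.7 listings in this post), one JSON object per line in the output file, and the `text` and `id_str` fields that the analysis code further down also reads. The helper name `count_mentions` and the sample lines are hypothetical.

```python
import json
from collections import Counter

CANDIDATES = ['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron',
              'Benoit Hamon', 'Jean-Luc Melenchon']

def count_mentions(lines, candidates=CANDIDATES):
    """Count unique tweets mentioning each candidate in an iterable of JSON lines."""
    counts = Counter({name: 0 for name in candidates})
    seen_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the redirected output contains blank lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip anything that is not valid JSON
        if not isinstance(tweet, dict):
            continue  # e.g. a bare status code that ended up in the file
        text = tweet.get('text')
        tweet_id = tweet.get('id_str')
        if not text or tweet_id in seen_ids:
            continue  # no text, or a tweet we have already counted
        seen_ids.add(tweet_id)
        for name in candidates:
            if name in text:
                counts[name] += 1
    return dict(counts)

# Hypothetical sample lines standing in for the real stream file
sample = [
    '{"id_str": "1", "text": "Emmanuel Macron en meeting"}',
    '',
    '{"id_str": "1", "text": "Emmanuel Macron en meeting"}',
    '{"id_str": "2", "text": "Marine Le Pen et Emmanuel Macron"}',
    '420',
]
print(count_mentions(sample))
```

On the real data the function can be fed the file object directly, for example `count_mentions(open('stream_2017_03_28.txt', encoding='utf-8'))`.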

2. Microsoft Sentiment Analytics:

At the time I'm writing this post, a series of REST services is available for free under the name Microsoft Cognitive Services. These APIs encapsulate intelligent algorithms (mainly artificial intelligence) for:
1. Vision
2. Speech: recognition, etc.
3. Language: language understanding, language recognition, Text Analytics, etc.
4. Knowledge
5. Search

In this post we'll be using the Text Analytics API to calculate the sentiment of each post.
This API returns a sentiment score (a number between 0 and 1) for each document. It has some limitations on the number of documents per request (1000) and on the size of each document.
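
To make the 1000-documents-per-request limit concrete, here is a minimal sketch of the batching and averaging logic on its own, without any network call. It assumes Python 3 and the request and response shapes used by the listing in this post: a `documents` array of `id`/`text` pairs going out, and a `documents` array of `id`/`score` pairs coming back. The function names and the fake responses are hypothetical and only serve as an illustration.

```python
import json

MAX_DOCS_PER_REQUEST = 1000  # limit quoted in the text above

def build_batches(tweets, batch_size=MAX_DOCS_PER_REQUEST):
    """Split (id, text) pairs into JSON request bodies of at most batch_size documents."""
    batches = []
    for start in range(0, len(tweets), batch_size):
        chunk = tweets[start:start + batch_size]
        body = {"documents": [{"id": tweet_id, "text": text}
                              for tweet_id, text in chunk]}
        batches.append(json.dumps(body))
    return batches

def average_score(response_bodies):
    """Average the 'score' fields found across several JSON responses."""
    scores = []
    for raw in response_bodies:
        for doc in json.loads(raw).get("documents", []):
            scores.append(doc.get("score", 0))
    return sum(scores) / len(scores) if scores else 0.0

# 2500 made-up tweets need three requests: 1000 + 1000 + 500 documents
tweets = [(str(i), "tweet number %d" % i) for i in range(2500)]
batches = build_batches(tweets)
print(len(batches))  # 3

# Hand-written responses in the assumed shape, not real API output
fake_responses = [
    '{"documents": [{"id": "1", "score": 0.25}, {"id": "2", "score": 0.75}]}',
    '{"documents": [{"id": "3", "score": 0.5}]}',
]
print(average_score(fake_responses))  # 0.5
```

Keeping the batching separate from the HTTP call makes it easy to test the limit handling without spending API quota.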

########### Python 2.7 #############
import httplib, urllib, base64, json

class setimentScore:
    #Holds the sentiment score returned for a single document
    def __init__(self, data):
        self.score = data.get('score',0)
        self.id = data.get('id',0)
    def __repr__(self):
        return '<setimentScore %s>' % self.id

class scores:
    #Collects the scores from one or more API responses
    def __init__(self, s):
        self.s = encode(s)
        self.scores = []
        self.append(s)
    def __repr__(self):
        return '<scores %s>' % self.s
    def append(self, s):
        #Parse the response passed in (not the first one) and add its scores
        data = json.loads(encode(s))
        for sc in data.get('documents',[]):
            self.scores.append(setimentScore(sc))

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '{Your Key}',
}

params = urllib.urlencode({
})

def AnalyseTextSentiment(inputTexts):
    #Each item of inputTexts is a JSON request body holding up to 1000 documents
    conn = httplib.HTTPSConnection('westus.api.cognitive.microsoft.com')
    try:
        scs = None
        for inputText in inputTexts:
            conn.request("POST", "/text/analytics/v2.0/sentiment?%s" % params, inputText, headers)
            response = conn.getresponse()
            data = response.read()
            if scs is None:
                scs = scores(data)
            else:
                scs.append(data)
        #scs stays None when there were no batches to send
        allScores = scs.scores if scs is not None else []
        total = 0.0
        nbResults = len(allScores)
        for s in allScores:
            total += s.score
        if nbResults == 0:
            print("Avg Score= 0")
        else:
            print("Avg Score= %f"  % (total / nbResults))
        print("Total Score= %f" % total)
        print("Nb Scores= %f" % nbResults)
    except Exception as e:
        #Not every exception carries errno/strerror, so print the exception itself
        print("Error: %s" % e)
    finally:
        conn.close()

def encode(txt):
    if txt:
        try:
            return txt.encode('utf-8')
        except:
            try:
                return txt.decode().encode('utf-8')
            except:
                return txt
    return ""

def ReadFileWithFilter(filePath, filterText):
    #Build JSON request bodies (max 1000 documents each) from the tweets mentioning filterText
    documents = []
    batches = []
    ids = set()
    with open(filePath, "r") as f:
        for line in f:
            if line.strip() == '':
                continue
            try:
                data = json.loads(line)
            except ValueError:
                continue  #Skip lines that are not valid JSON
            if not isinstance(data, dict) or data.get('text') is None:
                continue
            twittText = encode(data.get('text'))
            currentId = encode(data.get('id_str',0))
            if filterText in twittText and currentId not in ids:
                ids.add(currentId)
                documents.append({"id":currentId,"text":twittText})
                if len(documents) >= 1000:
                    batches.append(json.dumps({"documents":documents}))
                    documents = []
    if len(documents) > 0:
        batches.append(json.dumps({"documents":documents}))
    print("Generated %d batches" % len(batches))
    return batches

candidats = ['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon','Jean-Luc Melenchon']
for candidat in candidats:
    print(candidat)
    print("=============")
    candidatBatches = ReadFileWithFilter(r'.\stream_2017_03_26.txt', candidat)
    AnalyseTextSentiment(candidatBatches)

Below is an example output:

Marine Le Pen
=============
Generated 5 batches
Avg Score= 0.631855
Total Score= 3159.275843
Nb Scores= 5000.000000
Francois Fillon
=============
Generated 1 batches
Avg Score= 0.648703
Total Score= 24.650710
Nb Scores= 38.000000
Emmanuel Macron
=============
Generated 2 batches
Avg Score= 0.706528
Total Score= 1413.056910
Nb Scores= 2000.000000
Benoit Hamon
=============
Generated 1 batches
Avg Score= 0.681619
Total Score= 109.058961
Nb Scores= 160.000000
Jean-Luc Melenchon
=============
Generated 1 batches
Avg Score= 0.727667
Total Score= 8.732005
Nb Scores= 12.000000
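
To compare the candidates side by side, the figures above can be ranked by average score. This is a minimal sketch, assuming Python 3; the numbers are copied from the example output and the helper `rank_by_average` is a hypothetical name.

```python
# (name, total score, number of scores) copied from the example output above
results = [
    ("Marine Le Pen", 3159.275843, 5000),
    ("Francois Fillon", 24.650710, 38),
    ("Emmanuel Macron", 1413.056910, 2000),
    ("Benoit Hamon", 109.058961, 160),
    ("Jean-Luc Melenchon", 8.732005, 12),
]

def rank_by_average(rows):
    """Return (name, average, count) tuples sorted by average score, highest first."""
    ranked = [(name, total / count, count) for name, total, count in rows if count]
    return sorted(ranked, key=lambda row: row[1], reverse=True)

for name, avg, count in rank_by_average(results):
    print("%-20s avg=%.6f n=%d" % (name, avg, count))
```

The sample sizes differ widely (from 12 to 5000 scores), so the averages based on only a handful of tweets should be read with caution.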
