With the French presidential election getting closer, and since I'm working on a project that involves social media APIs and sentiment analysis, I've decided to post an example that uses these technologies to give an idea of each major candidate's popularity.
Solution description:
1. Collect social media information related to each candidate: For this example the main source is Twitter.
2. Extract the sentiment for each candidate from the Twitter posts collected previously.
Implementation:
For a quick implementation I decided to use Python, but I'll definitely post a C# version as well.
Code:
1. Twitter Client code:
The code is pretty basic: I stream the posts to a text file in JSON, applying a list of filters, which are the names of the candidates I've decided to include:
# Import the necessary classes from the tweepy library (tweepy 3.x API)
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# Variables that contain the user credentials to access the Twitter API
access_token = "{your access token}"
access_token_secret = "{your access token secret}"
consumer_key = "{your consumer key}"
consumer_secret = "{your consumer secret}"

class StdOutListener(StreamListener):
    # Print each raw JSON message to stdout so it can be redirected to a file
    def on_data(self, data):
        print(data)
        return True

    # Print the status code when the stream reports an error
    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by keyword
    stream.filter(track=['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon', 'Jean-Luc Melenchon'])
I run this code with the following command and leave it running for several hours, or even days, to collect as many tweets as possible:
python TwitterClientVersion2.py > stream_2017_03_28.txt
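Before scoring anything, it can help to check how many tweets the capture holds for each candidate. The sketch below assumes the stream file contains one JSON message per line, possibly with blank lines in between, and that tweets carry a `text` field. `count_mentions` is a hypothetical helper name, not part of tweepy, and the sketch is written for Python 3.

```python
import json


def count_mentions(lines, names):
    """Count how many tweets mention each name (case-sensitive substring match)."""
    counts = {name: 0 for name in names}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        try:
            message = json.loads(line)
        except ValueError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(message, dict):
            continue
        text = message.get("text") or ""
        for name in names:
            if name in text:
                counts[name] += 1
    return counts


# Tiny in-memory sample standing in for the lines of the stream file
sample = [
    '{"id_str": "1", "text": "Emmanuel Macron en meeting"}',
    '',
    '{"id_str": "2", "text": "Marine Le Pen et Emmanuel Macron"}',
]
print(count_mentions(sample, ["Marine Le Pen", "Emmanuel Macron"]))
```

On the real capture, the same function could be called with an open file object instead of `sample`, since iterating over a file yields its lines.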
2. Microsoft Sentiment Analytics:
At the time of writing, a series of REST services called Microsoft Cognitive Services is available for free. These APIs encapsulate intelligent algorithms (mainly artificial intelligence) for:
1. Vision
2. Speech: recognition, ...
3. Language: language understanding, language recognition, Text Analytics, ...
4. Knowledge
5. Search
In this post we'll use the Text Analytics API to calculate the sentiment of each post.
The API returns a sentiment score between 0 and 1 for each document, where scores close to 1 indicate positive sentiment and scores close to 0 indicate negative sentiment. It limits each request to 1,000 documents and also limits the size of each document.
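Because a request accepts at most 1,000 documents, the collected tweets have to be sent in batches. The following is a minimal sketch of that batching step, assuming each document is a dict with an id and a text. `make_batches` and `batch_size` are hypothetical names chosen for illustration, the sketch is written for Python 3, and no request is actually sent.

```python
import json


def make_batches(documents, batch_size=1000):
    """Split documents into JSON request bodies of at most batch_size documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    bodies = []
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        bodies.append(json.dumps({"documents": chunk}))
    return bodies


# 2,500 placeholder documents end up in three request bodies
docs = [{"id": str(i), "text": "tweet %d" % i} for i in range(2500)]
print(len(make_batches(docs)))
```

Slicing by a fixed step keeps every body within the limit and leaves the remainder in a last, smaller batch.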
########### Python 2.7 #############
import httplib, urllib, json

class setimentScore:
    # Holds the sentiment score returned for one document
    def __init__(self, data):
        self.score = data.get('score', 0)
        self.id = data.get('id', 0)

    def __repr__(self):
        return '<setimentScore %s>' % self.id

class scores:
    # Accumulates the scores found in every response body
    def __init__(self, s):
        self.s = encode(s)
        self.scores = []
        self.append(self.s)

    def __repr__(self):
        return '<scores %s>' % self.s

    def append(self, s):
        # Parse the response passed in (s), not the first one stored in self.s
        data = json.loads(s)
        for sc in data.get('documents', []):
            self.scores.append(setimentScore(sc))

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '{Your Key}',
}

params = urllib.urlencode({})

def AnalyseTextSentiment(inputTexts):
    # Send each batch to the sentiment endpoint and print the aggregated scores
    conn = httplib.HTTPSConnection('westus.api.cognitive.microsoft.com')
    try:
        scs = None
        for inputText in inputTexts:
            conn.request("POST", "/text/analytics/v2.0/sentiment?%s" % params, inputText, headers)
            response = conn.getresponse()
            data = response.read()
            if scs is None:
                scs = scores(data)
            else:
                scs.append(data)
        allScores = [] if scs is None else scs.scores
        total = 0
        nbResults = len(allScores)
        for s in allScores:
            total += s.score
        if nbResults == 0:
            print("Avg Score= 0")
        else:
            print("Avg Score= %f" % (total / nbResults))
        print("Total Score= %f" % total)
        print("Nb Scores= %f" % nbResults)
    except Exception as e:
        # Not every exception carries errno/strerror, so print the exception itself
        print("Error: {0}".format(e))
    finally:
        conn.close()

def encode(txt):
    # Return txt as a UTF-8 byte string; fall back to the raw value if conversion fails
    if txt:
        try:
            return txt.encode('utf-8')
        except Exception:
            try:
                return txt.decode().encode('utf-8')
            except Exception:
                return txt
    return ""

def ReadFileWithFilter(filePath, filterText):
    # Build JSON request bodies of at most 1000 distinct tweets that mention filterText
    documents = []
    batches = []
    ids = set()
    with open(filePath, "r") as f:
        for line in f:
            if line.strip() == '':
                continue
            data = json.loads(line)
            if data.get('text') is None:
                continue
            twittText = encode(data.get('text'))
            currentId = encode(data.get('id_str'))
            if filterText in twittText and currentId not in ids:
                ids.add(currentId)
                documents.append({"Id": currentId, "text": twittText})
                if len(documents) >= 1000:
                    batches.append(json.dumps({"documents": documents}))
                    documents = []
    if len(documents) > 0:
        batches.append(json.dumps({"documents": documents}))
    print("Generated %d batches" % len(batches))
    return batches

candidats = ['Marine Le Pen', 'Francois Fillon', 'Emmanuel Macron', 'Benoit Hamon', 'Jean-Luc Melenchon']
for candidat in candidats:
    print(candidat)
    print("=============")
    tweetBatches = ReadFileWithFilter(r'.\stream_2017_03_26.txt', candidat)
    AnalyseTextSentiment(tweetBatches)
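The averaging step can also be looked at in isolation. The sketch below assumes each response body is a JSON object with a `documents` list whose items carry an `id` and a `score`, which are the fields the script reads. `average_score` is a hypothetical helper written for Python 3, and it works on response strings that have already been downloaded rather than calling the service.

```python
import json


def average_score(response_bodies):
    """Return (average, total, count) over all scores found in the response bodies."""
    total = 0.0
    count = 0
    for body in response_bodies:
        data = json.loads(body)
        for doc in data.get("documents", []):
            total += doc.get("score", 0)
            count += 1
    average = total / count if count else 0.0
    return average, total, count


# Two made-up response bodies, used only to exercise the function
bodies = [
    '{"documents": [{"id": "1", "score": 0.5}, {"id": "2", "score": 1.0}]}',
    '{"documents": [{"id": "3", "score": 0.0}]}',
]
print(average_score(bodies))
```

Each body is parsed on its own, so the scores of every batch contribute to the total and to the count.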
Below is an example output:
Marine Le Pen
=============
Generated 5 batches
Avg Score= 0.631855
Total Score= 3159.275843
Nb Scores= 5000.000000
Francois Fillon
=============
Generated 1 batches
Avg Score= 0.648703
Total Score= 24.650710
Nb Scores= 38.000000
Emmanuel Macron
=============
Generated 2 batches
Avg Score= 0.706528
Total Score= 1413.056910
Nb Scores= 2000.000000
Benoit Hamon
=============
Generated 1 batches
Avg Score= 0.681619
Total Score= 109.058961
Nb Scores= 160.000000
Jean-Luc Melenchon
=============
Generated 1 batches
Avg Score= 0.727667
Total Score= 8.732005
Nb Scores= 12.000000
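The averages above rest on very different sample sizes, from 12 tweets to 5,000, which is worth keeping in mind when comparing candidates. As a quick sanity check, the sketch below recomputes each average from the reported total and count and lists the candidates by sample size. The numbers are copied from the output above; `results` and `recompute` are illustrative names.

```python
# (total score, number of scores) copied from the example output
results = {
    "Marine Le Pen": (3159.275843, 5000),
    "Francois Fillon": (24.650710, 38),
    "Emmanuel Macron": (1413.056910, 2000),
    "Benoit Hamon": (109.058961, 160),
    "Jean-Luc Melenchon": (8.732005, 12),
}


def recompute(results):
    """Return {name: average}, each average being total / count rounded to 6 decimals."""
    return {name: round(total / n, 6) for name, (total, n) in results.items()}


averages = recompute(results)
# List candidates from the largest sample to the smallest
for name, (total, n) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
    print("%-20s n=%5d avg=%.6f" % (name, n, averages[name]))
```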