Twitter Sentiment Analysis

July 16, 2014

I finally got around to working on this problem, however simple it may be.

The algorithm was proposed by another ‘Data Science’ course participant. I haven’t implemented the algorithm from this paper yet.

I can explore that later.

The simple algorithm discussed in the forums works as follows.

1. Find all words in a tweet that exist in a master list. This list already associates a valence score with each word. Scores can be positive or negative numbers.

2. Find the scores of these words and add them. This is the total score of the tweet.

3. Find all words from the tweet that don’t exist in the master list. These are the non-sentimental words.

4. If such a non-sentimental word occurs in a tweet with a positive score, add 1 to a value associated with this word. If it occurs in a tweet with a negative score, or if the score is ‘0’, subtract 1 from that value. The effect on the sentiment of treating a score of ‘0’ like a negative score (the else branch of the if statement) is not explored. As I mentioned, this is a simple algorithm.
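
To make the four steps concrete, here is a minimal sketch on made-up data. The master list, its scores, the sample tweets, and the function names are all invented for illustration. It keeps a single net value per word, exactly as step 4 describes.

```python
from collections import defaultdict

# Hypothetical master list: word -> valence score (made-up values).
master_scores = {"good": 3, "happy": 3, "bad": -3, "awful": -3}

def tweet_score(words, scores):
    """Steps 1-2: sum the scores of the words found in the master list."""
    return sum(scores[w] for w in words if w in scores)

def update_word_values(tweets, scores):
    """Steps 3-4: nudge each unlisted word by the sign of its tweet's score."""
    values = defaultdict(int)
    for text in tweets:
        words = text.lower().split()
        total = tweet_score(words, scores)
        for w in words:
            if w not in scores:
                # Positive tweet: +1. Negative or zero score: -1.
                values[w] += 1 if total > 0 else -1
    return dict(values)

tweets = ["good coffee today", "awful traffic today", "coffee again"]
word_values = update_word_values(tweets, master_scores)
print(word_values)
```

In this toy run, "today" ends at 0 because it appears once in a positive tweet and once in a negative one, while "again" ends at -1 because its only tweet has a score of 0.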

The script below does this with a dictionary that maps each non-sentimental word to a list of two values, one for the positive accumulator and one for the negative accumulator. The value reported for each word is the positive count minus the negative count.

import json
import os
import re
import sys

class Sentiment(object):

 
   def __init__(self):
        # Expect two readable files: the sentiment file, then the tweet file.
        if len(sys.argv) < 3:
            sys.exit("Usage: <script> <sentiment_file> <tweet_file>")
        for path in sys.argv[1:3]:
            if not (os.path.isfile(path) and os.access(path, os.R_OK)):
                sys.exit("Either files are missing or they are not readable")

        self.nonsentimentalwords = {}
        self.sent_file = open(sys.argv[1], 'r')

   def loadscores(self):
        self.scores = {}  # term -> integer valence score
        for line in self.sent_file:
            if not line.strip():
                continue  # skip blank lines
            term, score = line.split("\t")  # the file is tab-delimited
            self.scores[term] = int(score)
        self.sent_file.close()

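   def score(self, text):
        # Sum the valence scores of every word found in the master list.
        count = 0
        for s in text.split():
            # Lower-case the lookup to match the other methods
            # (assumes the master list itself is lower-case).
            count = count + self.scores.get(s.lower(), 0)
        return count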
   
   def scorenonsentimentalwords(self, text, count):
        # Bump the positive or negative accumulator of each known
        # non-sentimental word, depending on the tweet's total score.
        for s in text.split():
            key = s.lower()
            if key not in self.scores and key in self.nonsentimentalwords:
                if count > 0:
                    self.nonsentimentalwords[key][0] += 1
                else:
                    # Negative and zero scores are treated alike.
                    self.nonsentimentalwords[key][1] += 1

   def addnonsentimentalwords(self, text):
        # Register words missing from the master list with zeroed
        # [positive, negative] accumulators, keyed by lower-case form.
        for s in text.split():
            key = s.lower()
            if key not in self.scores and key not in self.nonsentimentalwords:
                self.nonsentimentalwords[key] = [0, 0]
                
   def analyze(self):
        with open(sys.argv[2], 'r') as f:
            for data in f:
                try:
                    d = json.loads(data)
                    if d.get('text') and d.get('lang') == 'en':
                        # Keep only ASCII letters and whitespace.
                        tex = re.sub(r"[^A-Za-z\s]", "", d['text'])
                        count = self.score(tex)
                        self.addnonsentimentalwords(tex)
                        self.scorenonsentimentalwords(tex, count)
                except (ValueError, KeyError, TypeError, AttributeError):
                    print("Error")
        # Report positive minus negative occurrences for each word.
        for key, value in self.nonsentimentalwords.items():
            print(str(key) + " " + str(value[0] - value[1]))
            
                  
if __name__ == '__main__':


    sentiment=Sentiment()
    sentiment.loadscores()
    sentiment.analyze()
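
The script takes two command-line arguments: the tab-delimited sentiment file first, then the tweet file with one JSON object per line. As a rough illustration of those two input formats, the sketch below builds tiny in-memory stand-ins for both and parses them the same way `loadscores` and `analyze` do. The sample words, scores, and tweets are invented for this example.

```python
import json
import re

# Invented stand-in for the sentiment file: one "term<TAB>score" pair per line.
sentiment_lines = ["good\t3\n", "bad\t-3\n"]

# Invented stand-in for the tweet file: one JSON object per line.
tweet_lines = [
    json.dumps({"text": "Good morning, world!", "lang": "en"}),
    json.dumps({"text": "Bonjour le monde", "lang": "fr"}),
    json.dumps({"delete": {"status": {"id": 1}}}),  # a record with no 'text' key
]

scores = {}
for line in sentiment_lines:
    term, score = line.split("\t")
    scores[term] = int(score)  # int() tolerates the trailing newline

cleaned = []
for line in tweet_lines:
    d = json.loads(line)
    # Keep only English tweets that carry a 'text' field.
    if d.get("text") and d.get("lang") == "en":
        # Drop everything except ASCII letters and whitespace.
        cleaned.append(re.sub(r"[^A-Za-z\s]", "", d["text"]))

print(scores)
print(cleaned)
```

Only the first sample tweet survives the filter, with its punctuation stripped. Running the real script would look something like `python sentiment.py <sentiment_file> <tweet_file>`, where `sentiment.py` is just a placeholder name for wherever you saved the code above.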

Filed under Python
