Python: How to total the scores for all the tweets in a region and divide by the number of tweets (lots of info provided)
I have 2 files, one containing 200 tweets and the other containing keywords and their values. A typical tweet looks like:
[41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 life moviee.
and the keywords look like:
love,10
like,5
best,10
hate,1
I use the 2 numbers at the beginning of each tweet to determine the region the tweet was made in (shown below in the code). For each individual tweet (each line in the file), I add up the values of the keywords it contains and divide by the number of keywords found, which gives me a score for that tweet.

My question is: how would I be able to total the scores of all the tweets in a region and divide by the number of tweets in that region? Below, I put happynesstweetscore, which is how I calculated the score for the individual tweets in the file (each line) that contain keywords. For that part, I'm not sure how to add the values depending on the region and then divide them by the number of tweets in that region. Should I add them to a list depending on the region? I don't know. By the way, I divided the tweets into 4 regions (latitude, longitude) using these values (rectangles), which are used at the bottom of the code:
p1 = (49.189787, -67.444574)
p2 = (24.660845, -67.444574)
p3 = (49.189787, -87.518395)
p4 = (24.660845, -87.518395)
p5 = (49.189787, -101.998892)
p6 = (24.660845, -101.998892)
p7 = (49.189787, -115.236428)
p8 = (24.660845, -115.236428)
p9 = (49.189787, -125.242264)
p10 = (24.660845, -125.242264)

from collections import Counter

try:
    keyw_path = input("enter file named keywords: ")
    keyfile = open(keyw_path, "r")
except IOError:
    print("error: file not found.")
    exit()

# read the keywords into a list
keywords = {}
wordfile = open('keywords.txt', 'r')
for line in wordfile.readlines():
    word = line.replace('\n', '')
    if not (word in keywords.keys()):  # checks the word doesn't exist.
        keywords[word] = 0  # adds the word to the db.
wordfile.close()

# read the file name from the user and open the file.
try:
    tweet_path = input("enter file named tweets: ")
    tweetfile = open(tweet_path, "r")
except IOError:
    print("error: file not found.")
    exit()

# calculating the sentiment values
with open('keywords.txt') as f:
    sentiments = {word: int(value) for word, value in (line.split(",") for line in f)}

with open('tweets.txt') as f:
    for line in f:
        values = Counter(word for word in line.split() if word in sentiments)
        if not values:
            continue

keyw = ["love", "like", "best", "hate", "lol", "better", "worst", "good",
        "happy", "haha", "please", "great", "bad", "save", "saved", "pretty",
        "greatest", 'excited', 'tired', 'thanks', 'amazing', 'glad', 'ruined',
        'negative', 'loving', 'sorry', 'hurt', 'alone', 'sad', 'positive',
        'regrets', 'god']

with open('tweets.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if any(word in line for word in keyw):
            newfile.write(line)

def score(tweet):
    total = 0
    for word in tweet:
        if word in sentiments:
            total += 1
    return total

def total(score):
    sum = 0
    for number in score:
        if number in values:
            sum += 1

# classifying the regions
class Region:
    def __init__(self, lat_range, long_range):
        self.lat_range = lat_range
        self.long_range = long_range

    def contains(self, lat, long):
        return self.lat_range[0] <= lat and lat < self.lat_range[1] and \
               self.long_range[0] <= long and long < self.long_range[1]

eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))

eastscore = 0
centralscore = 0
pacificscore = 0
mountainscore = 0
happyscoree = 0

for line in open('newfile.txt'):
    line = line.split(" ")
    lat = float(line[0][1:-1])  # stripping [ and ,
    long = float(line[1][:-1])  # stripping ]
    if eastern.contains(lat, long):
        eastscore += score(line)
    elif central.contains(lat, long):
        centralscore += score(line)
    elif mountain.contains(lat, long):
        mountainscore += score(line)
    elif pacific.contains(lat, long):
        pacificscore += score(line)
    else:
        continue
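OK, there are 2 places to work on: the line parsing and the score function. Have the score function return both the total score and the valid word count, so that you can keep track of both in the main block. Right now you are not pulling the value associated with the keywords in the score function; it only counts how many keywords were found.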
def score(tweet):
    total_value = 0  # sum of the values of the keywords found
    total_count = 0  # number of keywords found
    for word in tweet:
        word = word.lower()  # presuming the keywords are in lower case
        if word in sentiments:
            total_value += sentiments[word]  # you have to pull the value
            total_count += 1  # and keep a running total
    return total_value, total_count
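As a quick check of what the two return values mean, here is a minimal, self-contained sketch. The `sentiments` values are the four keywords from the question's example; the tweet text is made up for illustration, and dividing value by count follows the per-tweet score described in the question.

```python
# Keyword values copied from the question's example keywords file.
sentiments = {"love": 10, "like": 5, "best": 10, "hate": 1}

def score(words):
    """Return (sum of keyword values, number of keywords found)."""
    total_value = 0
    total_count = 0
    for word in words:
        word = word.lower()  # assumes the keywords are stored in lower case
        if word in sentiments:
            total_value += sentiments[word]
            total_count += 1
    return total_value, total_count

# Hypothetical tweet text, not taken from the real data file.
value, count = score("I love the best movies".split())

# Per-tweet score: total value divided by keywords found (None if no keywords).
tweet_score = value / count if count else None
print(value, count, tweet_score)  # prints: 20 2 10.0
```

Note that a word with trailing punctuation, such as "moviee." in the sample tweet, would not match a keyword unless the punctuation is stripped first.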
This would then be called like so:
# the counters start at zero, just like the score totals
eastcount = centralcount = mountaincount = pacificcount = 0

for line in open('newfile.txt'):
    line = line.split(" ")
    lat = float(line[0][1:-1])  # stripping [ and ,
    long = float(line[1][:-1])  # stripping ]
    if eastern.contains(lat, long):
        line_score, line_count = score(line)
        eastscore += line_score
        eastcount += line_count
    elif central.contains(lat, long):
        line_score, line_count = score(line)
        centralscore += line_score
        centralcount += line_count
    elif mountain.contains(lat, long):
        line_score, line_count = score(line)
        mountainscore += line_score
        mountaincount += line_count
    elif pacific.contains(lat, long):
        line_score, line_count = score(line)
        pacificscore += line_score
        pacificcount += line_count
    else:
        continue
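The loop above accumulates keyword values and keyword counts per region. If what you want is the figure described in the question (the sum of per-tweet scores in a region divided by the number of tweets in that region), one possible sketch keeps two dictionaries keyed by region name. The helper names `region_of` and `average_by_region` and the sample tweets are hypothetical; the bounds are the ones from the question, and tweets without any keyword are skipped on the assumption that they were already filtered out.

```python
from collections import defaultdict

# Bounds taken from the question's rectangles; all regions share one latitude range.
LAT_RANGE = (24.660845, 49.189787)
LONG_RANGES = {
    "eastern": (-87.518395, -67.444574),
    "central": (-101.998892, -87.518395),
    "mountain": (-115.236428, -101.998892),
    "pacific": (-125.242264, -115.236428),
}

def region_of(lat, lon):
    """Return the region name containing the point, or None if outside all."""
    if not (LAT_RANGE[0] <= lat < LAT_RANGE[1]):
        return None
    for name, (west, east) in LONG_RANGES.items():
        if west <= lon < east:
            return name
    return None

def average_by_region(tweets, sentiments):
    """tweets is an iterable of (lat, lon, words) tuples."""
    score_sums = defaultdict(float)   # sum of per-tweet scores per region
    tweet_counts = defaultdict(int)   # number of scored tweets per region
    for lat, lon, words in tweets:
        name = region_of(lat, lon)
        values = [sentiments[w.lower()] for w in words if w.lower() in sentiments]
        if name is None or not values:
            continue  # outside every region, or no keywords to score
        score_sums[name] += sum(values) / len(values)
        tweet_counts[name] += 1
    return {name: score_sums[name] / tweet_counts[name] for name in tweet_counts}

# Hypothetical data for illustration only.
sentiments = {"love": 10, "like": 5, "best": 10, "hate": 1}
tweets = [
    (41.92, -88.78, ["love", "this", "movie"]),  # central, tweet score 10.0
    (40.00, -90.00, ["like", "hate", "it"]),     # central, tweet score 3.0
    (40.71, -74.00, ["best", "day"]),            # eastern, tweet score 10.0
]
averages = average_by_region(tweets, sentiments)
print(averages)  # prints: {'central': 6.5, 'eastern': 10.0}
```

Using dictionaries instead of eight separate variables also removes the repeated if/elif branches, since the region name selects which totals to update.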
If your line parsing works, I'll leave it alone, but I prefer explicit over cute in the tedium of data preparation. Explicit makes sense at midnight, when you have to revisit the code; cute requires 2 cups of coffee.
for line in open('newfile.txt'):
    line = line.replace('[', '')  # kill the leading bracket
    text = line.split(']')  # split coords / words using the trailing bracket
    coordinates = text[0].split(',')  # split the coords on the comma.... iirc
    lat = float(coordinates[0].replace(' ', ''))  # kill spaces
    long = float(coordinates[1].replace(' ', ''))  # ditto
    word_stream = text[1]  # later called via score(word_stream)
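The same explicit parsing can be wrapped in a small function and checked against the sample tweet from the question. This is only a sketch: `parse_tweet_line` is a hypothetical name, and it assumes every line starts with a bracketed "lat, long" pair as in that sample. It returns the text as a list of words, because passing a plain string to `score()` would make the loop iterate over single characters.

```python
def parse_tweet_line(line):
    """Split one tweet line into (lat, lon, words)."""
    # Everything before the first ']' holds the coordinates, the rest is text.
    coords_part, _, rest = line.partition("]")
    lat_text, lon_text = coords_part.lstrip("[ ").split(",")
    # float() ignores surrounding whitespace, so no extra stripping is needed.
    return float(lat_text), float(lon_text), rest.split()

# Sample line copied from the question.
sample = "[41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 life moviee."
lat, lon, words = parse_tweet_line(sample)
print(lat, lon, words)
```

The `words` list still contains the leading number, the date and the time; they are harmless for scoring because none of them is a keyword.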