Python: How to total the scores for all the tweets in a region and divide by the number of tweets (lots of info provided)

I have two files: one containing 200 tweets, and one containing keywords and their values. A typical tweet looks like:

[41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 life moviee.

and the keywords look like:

love,10
like,5
best,10
hate,1

The two numbers at the beginning of a tweet are used to determine the region the tweet was made in (shown below in the code). For each individual tweet (each line in the file), I add up the values of the keywords it contains and divide that total by the number of keywords found in the tweet, which gives me its score. My question is: how would I be able to total the scores of all the tweets in a region and divide by the number of tweets in that region? Below, I have put happynesstweetscore, which is how I calculated the score for the individual tweets in the file (each line) that contain keywords. For this part, I'm not sure how to add the values depending on the region and then divide them by the number of tweets in that region. Should I add them to a list depending on the region and then sum them? I don't know. I divided the tweets into 4 regions (latitude, longitude) using these values (rectangles), the way it is done at the bottom of the code:

    p1 = (49.189787, -67.444574)
    p2 = (24.660845, -67.444574)
    p3 = (49.189787, -87.518395)
    p4 = (24.660845, -87.518395)
    p5 = (49.189787, -101.998892)
    p6 = (24.660845, -101.998892)
    p7 = (49.189787, -115.236428)
    p8 = (24.660845, -115.236428)
    p9 = (49.189787, -125.242264)
    p10 = (24.660845, -125.242264)

    from collections import Counter

    try:
        keyw_path = input("enter file named keywords: ")
        keyfile = open(keyw_path, "r")
    except IOError:
        print("error: file not found.")
        exit()

    # read the keywords into a list
    keywords = {}
    wordfile = open('keywords.txt', 'r')
    for line in wordfile.readlines():
        word = line.replace('\n', '')
        if not(word in keywords.keys()):  # checks the word doesn't already exist
            keywords[word] = 0  # adds the word to the db
    wordfile.close()

    # read the file name from the user and open the file
    try:
        tweet_path = input("enter file named tweets: ")
        tweetfile = open(tweet_path, "r")
    except IOError:
        print("error: file not found.")
        exit()

    # calculating sentiment values
    with open('keywords.txt') as f:
        sentiments = {word: int(value) for word, value in (line.split(",") for line in f)}

    with open('tweets.txt') as f:
        for line in f:
            values = Counter(word for word in line.split() if word in sentiments)
            if not values:
                continue

    keyw = ["love", "like", "best", "hate", "lol", "better", "worst", "good",
            "happy", "haha", "please", "great", "bad", "save", "saved", "pretty",
            "greatest", 'excited', 'tired', 'thanks', 'amazing', 'glad', 'ruined',
            'negative', 'loving', 'sorry', 'hurt', 'alone', 'sad', 'positive',
            'regrets', 'god']

    with open('tweets.txt') as oldfile, open('newfile.txt', 'w') as newfile:
        for line in oldfile:
            if any(word in line for word in keyw):
                newfile.write(line)

    def score(tweet):
        total = 0
        for word in tweet:
            if word in sentiments:
                total += 1
        return total

    def total(score):
        sum = 0
        for number in score:
            if number in values:
                sum += 1

    # classifying regions
    class Region:
        def __init__(self, lat_range, long_range):
            self.lat_range = lat_range
            self.long_range = long_range

        def contains(self, lat, long):
            return self.lat_range[0] <= lat and lat < self.lat_range[1] and\
                   self.long_range[0] <= long and long < self.long_range[1]

    eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
    central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
    mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
    pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))

    eastscore = 0
    centralscore = 0
    pacificscore = 0
    mountainscore = 0
    happyscoree = 0

    for line in open('newfile.txt'):
        line = line.split(" ")
        lat = float(line[0][1:-1])  # stripping [ and ,
        long = float(line[1][:-1])  # stripping ]
        if eastern.contains(lat, long):
            eastscore += score(line)
        elif central.contains(lat, long):
            centralscore += score(line)
        elif mountain.contains(lat, long):
            mountainscore += score(line)
        elif pacific.contains(lat, long):
            pacificscore += score(line)
        else:
            continue

OK, there are two places to work on: the line parsing and the score function. Have the score function return both the total score and the valid word count, so you can keep track of both in the main block. Right now you are not pulling the value associated with the keywords in the score function...

    def score(tweet):
        total_count = 0   # number of keywords found in the tweet
        total_value = 0   # running sum of the keyword values
        for word in tweet:
            word = word.lower()    # presuming the keywords are all in lower case
            if word in sentiments:
                total_value += sentiments[word]  # you have to pull the value
                total_count += 1                 # and keep a running count
        return total_value, total_count

This would be called like so:

    # eastcount, centralcount, mountaincount and pacificcount must be
    # initialized to 0 beforehand, just like the *score variables
    for line in open('newfile.txt'):
        line = line.split(" ")
        lat = float(line[0][1:-1])  # stripping [ and ,
        long = float(line[1][:-1])  # stripping ]
        if eastern.contains(lat, long):
            line_score, line_count = score(line)
            eastscore += line_score
            eastcount += line_count
        elif central.contains(lat, long):
            line_score, line_count = score(line)
            centralscore += line_score
            centralcount += line_count
        elif mountain.contains(lat, long):
            line_score, line_count = score(line)
            mountainscore += line_score
            mountaincount += line_count
        elif pacific.contains(lat, long):
            line_score, line_count = score(line)
            pacificscore += line_score
            pacificcount += line_count
        else:
            continue
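To get from running totals to the per-region figure the question asks for, one option is to keep the totals in dictionaries keyed by region name instead of in eight separate variables. The following is only a sketch under some assumptions: the per-tweet score is taken to be the sum of keyword values divided by the number of keywords found (as the question describes), tweets with no keywords or outside every rectangle are skipped, and the names `REGIONS`, `find_region`, `tweet_score` and `region_averages` are hypothetical helpers, not part of the original code.

```python
from collections import defaultdict

# Bounding boxes from the question: (lat_min, lat_max), (long_min, long_max).
REGIONS = {
    "eastern": ((24.660845, 49.189787), (-87.518395, -67.444574)),
    "central": ((24.660845, 49.189787), (-101.998892, -87.518395)),
    "mountain": ((24.660845, 49.189787), (-115.236428, -101.998892)),
    "pacific": ((24.660845, 49.189787), (-125.242264, -115.236428)),
}


def find_region(lat, long):
    """Return the name of the region containing the point, or None."""
    for name, ((lat_lo, lat_hi), (long_lo, long_hi)) in REGIONS.items():
        if lat_lo <= lat < lat_hi and long_lo <= long < long_hi:
            return name
    return None


def tweet_score(words, sentiments):
    """Mean keyword value for one tweet, or None if it has no keywords."""
    values = [sentiments[w.lower()] for w in words if w.lower() in sentiments]
    if not values:
        return None
    return sum(values) / len(values)


def region_averages(tweets, sentiments):
    """tweets: iterable of (lat, long, words). Returns {region: mean tweet score}."""
    totals = defaultdict(float)   # sum of tweet scores per region
    counts = defaultdict(int)     # number of scored tweets per region
    for lat, long, words in tweets:
        region = find_region(lat, long)
        score = tweet_score(words, sentiments)
        if region is None or score is None:
            continue  # assumption: unscored or out-of-area tweets are ignored
        totals[region] += score
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}
```

A region only appears in the result if at least one scored tweet fell inside it, so the division never hits a zero count. If tweets without keywords should count toward the denominator instead, increment `counts[region]` before the `score is None` check.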

If your line parsing works, I'll leave it alone, but I prefer explicit over cute for the tedium of data preparation. Explicit makes sense at midnight, when you have to revisit the code; cute requires two cups of coffee.

    for line in open('newfile.txt'):
        line = line.replace('[', '')   # kill the leading bracket
        text = line.split(']')         # split coords / words using the trailing bracket
        coordinates = text[0].split(',')    # split coords on comma.... iirc
        lat = float(coordinates[0].replace(' ', ''))   # kill spaces
        long = float(coordinates[1].replace(' ', ''))  # ditto
        word_stream = text[1].split()  # list of words; later called via score(word_stream)
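The explicit parsing above can also be wrapped in a small function so it is easy to check against a sample line. This is a sketch only: `parse_tweet` is a hypothetical name, and it assumes every line has the layout of the sample tweet in the question (bracketed coordinates, then a number, a date and a time, then the tweet text). It also strips trailing punctuation so that a word like "love." would still match a keyword, which the original code does not do.

```python
def parse_tweet(line):
    """Split one raw tweet line into (lat, long, words)."""
    # Everything before the closing bracket holds the coordinates.
    coords_part, _, rest = line.partition("]")
    lat_text, long_text = coords_part.lstrip("[").split(",")
    lat = float(lat_text)    # float() tolerates surrounding spaces
    long = float(long_text)

    # Assumed layout after the bracket: a number, a date, a time, then the text.
    fields = rest.split()
    stripped = (w.strip(".,!?;:\"'").lower() for w in fields[3:])
    words = [w for w in stripped if w]  # drop tokens that were only punctuation
    return lat, long, words


lat, long, words = parse_tweet(
    "[41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 life moviee."
)
```

The returned `words` list can be passed directly to a scoring function that iterates over words.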
