python - Completing a function to add values depending on specific "regions" (more info provided)
I have 2 files: one containing 200 tweets, and one containing keywords and their values. A typical tweet looks like this (my code is provided below):
    [41.923916200000001, -88.777469199999999] 6 2011-08-28 19:24:18 life moviee.
(Only the numbers in the brackets and the words after the time are relevant.)
The keywords file looks like this:
    love,10
    like,5
    best,10
    hate,1
I use the 2 numbers at the beginning of each tweet to determine the region the tweet was made in (shown below in the code). For each individual tweet (each line in the file), I add up the values of the keywords it contains and divide that total by the number of keywords found, which gives me the tweet's score. My question is: how am I able to total the scores of the tweets in a region and divide by the number of tweets in that region? Below, I put happynesstweetscore, which is how I calculated the score for the individual tweets in the file (each line) that contain keywords. For this part, I'm not sure how to add the values depending on the region and then divide them by the number of tweets in that region. Should I add them to a list depending on the region and then sum them? I don't know. I started with this:
    def score(tweet):
        total_value = 0
        total_count = 0
        for word in tweet:
            if word in sentiments:
                total_value += sentiments[word]
                total_count += 1
        return total_value, total_count
But I don't know how to use it to total the scores of the tweets in each region individually and divide by the number of tweets in that region.
I divided the tweets into 4 regions by latitude and longitude, using these values (rectangle corners), in the way shown at the bottom of the code:
    p1 = (49.189787, -67.444574)
    p2 = (24.660845, -67.444574)
    p3 = (49.189787, -87.518395)
    p4 = (24.660845, -87.518395)
    p5 = (49.189787, -101.998892)
    p6 = (24.660845, -101.998892)
    p7 = (49.189787, -115.236428)
    p8 = (24.660845, -115.236428)
    p9 = (49.189787, -125.242264)
    p10 = (24.660845, -125.242264)

    from collections import Counter

    try:
        keyw_path = input("enter file named keywords: ")
        keyfile = open(keyw_path, "r")
    except IOError:
        print("error: file not found.")
        exit()

    # read the keywords list
    keywords = {}
    wordfile = open('keywords.txt', 'r')
    for line in wordfile.readlines():
        word = line.replace('\n', '')
        if not(word in keywords.keys()):  # checks the word doesn't exist.
            keywords[word] = 0  # adds the word to the db.
    wordfile.close()

    # read the file name from the user and open the file.
    try:
        tweet_path = input("enter file named tweets: ")
        tweetfile = open(tweet_path, "r")
    except IOError:
        print("error: file not found.")
        exit()

    # calculating sentiment values
    with open('keywords.txt') as f:
        sentiments = {word: int(value) for word, value in (line.split(",") for line in f)}

    with open('tweets.txt') as f:
        for line in f:
            values = Counter(word for word in line.split() if word in sentiments)
            if not values:
                continue

    keyw = ["love", "like", "best", "hate", "lol", "better", "worst", "good", "happy", "haha", "please", "great", "bad", "save", "saved", "pretty", "greatest", 'excited', 'tired', 'thanks', 'amazing', 'glad', 'ruined', 'negative', 'loving', 'sorry', 'hurt', 'alone', 'sad', 'positive', 'regrets', 'god']

    with open('tweets.txt') as oldfile, open('newfile.txt', 'w') as newfile:
        for line in oldfile:
            if any(word in line for word in keyw):
                newfile.write(line)

    def score(tweet):
        total = 0
        for word in tweet:
            if word in sentiments:
                total += 1
        return total

    def total(score):
        sum = 0
        for number in score:
            if number in values:
                sum += 1

    # classifying regions
    class Region:
        def __init__(self, lat_range, long_range):
            self.lat_range = lat_range
            self.long_range = long_range

        def contains(self, lat, long):
            return self.lat_range[0] <= lat and lat < self.lat_range[1] and \
                   self.long_range[0] <= long and long < self.long_range[1]

    eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
    central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
    mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
    pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))

    eastscore = 0
    centralscore = 0
    pacificscore = 0
    mountainscore = 0
    happyscoree = 0

    for line in open('newfile.txt'):
        line = line.split(" ")
        lat = float(line[0][1:-1])  # stripping the [ and the ,
        long = float(line[1][:-1])  # stripping the ]
        if eastern.contains(lat, long):
            eastscore += score(line)
        elif central.contains(lat, long):
            centralscore += score(line)
        elif mountain.contains(lat, long):
            mountainscore += score(line)
        elif pacific.contains(lat, long):
            pacificscore += score(line)
        else:
            continue
Let's say, as you said, you have a file containing data like:
    love,10
    movie,5
First of all, create a dictionary from that file.
    kw_to_score = {}
    kw_file = 'keywords.txt'
    with open(kw_file, 'r') as kwf:
        for line in kwf.readlines():
            word, score = line.split(',')
            kw_to_score[word] = int(score)
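If the keywords file might contain blank lines or stray whitespace, the same idea can be wrapped in a small helper. This is only a sketch: `load_keywords` is a name made up for illustration, and lowercasing the keywords is an assumption, not something the question asks for.

```python
def load_keywords(lines):
    """Build a {keyword: value} dict from lines such as 'love,10'."""
    keywords = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines instead of failing on them
        word, value = line.split(",", 1)
        # Assumption: keywords are stored lowercased.
        keywords[word.strip().lower()] = int(value)
    return keywords


# Works on any iterable of lines, so a list stands in for the file here.
kw_to_score = load_keywords(["love,10\n", "like,5\n", "\n", "hate,1"])
print(kw_to_score)  # {'love': 10, 'like': 5, 'hate': 1}
```

Because the helper takes any iterable of lines, an open file object can be passed to it directly, for example inside `with open('keywords.txt') as kwf:`.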
Once you've done that, you need to create the score function:
    def score(tweet, keywords):
        score = 0
        count = 0
        for word in tweet.split():  # split the words on spaces
            if word in keywords:
                score += keywords[word]
                count += 1
        return score, count
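One thing to watch: splitting on spaces alone means a word like "love!" or "Love" will not match the key "love". A possible variant, assuming you want matching to ignore case and surrounding punctuation (the name `score_tweet` is just for illustration), could look like this:

```python
import string


def score_tweet(text, keywords):
    """Return (sum of keyword values, number of keywords found) for one tweet."""
    total = 0
    count = 0
    for raw in text.split():
        # Assumption: strip leading/trailing punctuation and ignore case.
        word = raw.strip(string.punctuation).lower()
        if word in keywords:
            total += keywords[word]
            count += 1
    return total, count


sample_keywords = {"love": 10, "best": 10, "movie": 5}
print(score_tweet("I love this, best movie!", sample_keywords))  # (25, 3)
```

Dividing the first value by the second gives the per-tweet score described in the question, as long as the count is not zero.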
After that, continue with the regions:
    class Region:
        def __init__(self, lat_range, long_range):
            self.lat_range = lat_range
            self.long_range = long_range
            self.score = 0  # add a new field
            self.quantity = 0  # add a new field

        def contains(self, lat, long):
            return self.lat_range[0] <= lat and lat < self.lat_range[1] and \
                   self.long_range[0] <= long and long < self.long_range[1]

    eastern = Region((24.660845, 49.189787), (-87.518395, -67.444574))
    central = Region((24.660845, 49.189787), (-101.998892, -87.518395))
    mountain = Region((24.660845, 49.189787), (-115.236428, -101.998892))
    pacific = Region((24.660845, 49.189787), (-125.242264, -115.236428))

    for line in open('newfile.txt'):
        fields = line.split(" ")
        lat = float(fields[0][1:-1])  # stripping the [ and the ,
        long = float(fields[1][:-1])  # stripping the ]
        for region in (eastern, central, mountain, pacific):
            if region.contains(lat, long):
                # pass the raw line (score() splits it itself) and the dict
                # of keywords mapped to their scores
                region_score, count = score(line, kw_to_score)
                region.score += region_score
                region.quantity += count
Then you just need to do:
    print(eastern.score / eastern.quantity)  # gives the average (assumes eastern.quantity > 0)
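Note that the approach above pools keyword values and keyword counts per region. If what you want is what the question describes (score each tweet first, then average those scores over the number of tweets in the region), a rough sketch could look like the following. The names `parse_tweet`, `happiness` and `average_by_region`, the regular expression, and the sample tweets are all made up for illustration; only the region boundaries and keyword values come from the question.

```python
import re


class Region:
    def __init__(self, name, lat_range, long_range):
        self.name = name
        self.lat_range = lat_range
        self.long_range = long_range
        self.total = 0.0  # sum of per-tweet scores
        self.tweets = 0   # number of scored tweets

    def contains(self, lat, long):
        return (self.lat_range[0] <= lat < self.lat_range[1]
                and self.long_range[0] <= long < self.long_range[1])

    def average(self):
        # None when the region has no scored tweets (avoids dividing by zero).
        return self.total / self.tweets if self.tweets else None


# Assumed line layout: "[lat, long] <rest of the tweet>"
LINE_RE = re.compile(r"\[\s*(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\]\s*(.*)")


def parse_tweet(line):
    match = LINE_RE.match(line)
    if match is None:
        return None
    lat, long, rest = match.groups()
    return float(lat), float(long), rest


def happiness(text, keywords):
    """Per-tweet score: sum of keyword values / number of keywords found."""
    values = [keywords[w] for w in text.lower().split() if w in keywords]
    return sum(values) / len(values) if values else None


def average_by_region(lines, regions, keywords):
    for line in lines:
        parsed = parse_tweet(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        lat, long, text = parsed
        tweet_score = happiness(text, keywords)
        if tweet_score is None:
            continue  # tweets without keywords are not counted
        for region in regions:
            if region.contains(lat, long):
                region.total += tweet_score
                region.tweets += 1
                break
    return {region.name: region.average() for region in regions}


keywords = {"love": 10, "like": 5, "best": 10, "hate": 1}
regions = [
    Region("eastern", (24.660845, 49.189787), (-87.518395, -67.444574)),
    Region("central", (24.660845, 49.189787), (-101.998892, -87.518395)),
]
sample = [  # hypothetical tweets in the question's format
    "[41.9239162, -88.7774692] 6 2011-08-28 19:24:18 i love this movie",
    "[41.0, -90.0] 6 2011-08-28 19:30:00 i hate and like mondays",
    "[40.0, -75.0] 6 2011-08-28 19:31:00 best day",
]
averages = average_by_region(sample, regions, keywords)
print(averages)  # {'eastern': 10.0, 'central': 6.5}
```

Here the two central tweets score 10 and 3, so the central average is 6.5. Keeping the running total and the tweet count on each region object means no separate list per region is needed.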