twitter - How to write multiple txt files in Python?
I am preprocessing tweets in Python. The unprocessed tweets are in a folder, with each file containing one unprocessed tweet and named 1.txt, 2.txt, ..., 10000.txt. I want to preprocess them and write them to new files, also named 1.txt, 2.txt, ..., 10000.txt. My code is as follows:
    for filename in glob.glob(os.path.join(path, '*.txt')):
        with open(filename) as file:
            tweet = file.read()

        def processtweet(tweet):
            tweet = tweet.lower()
            tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'url', tweet)
            tweet = re.sub('@[^\s]+', 'user', tweet)
            tweet = re.sub('[\s]+', ' ', tweet)
            tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
            tweet = tweet.translate(None, string.punctuation)
            tweet = tweet.strip('\'"')
            return tweet

        fp = open(filename)
        line = fp.readline()
        count = 0
        processedtweet = processtweet(line)
        line = fp.readline()
        count += 1
        name = str(count) + ".txt"
        file = open(name, "w")
        file.write(processedtweet)
        file.close()
But this code gives me only one new file, named 1.txt, that has been preprocessed. How can I write the other 9999 files? Is there a mistake in my code?
Your count is getting reset to 0 on every pass through the loop by the line count = 0, so every time you write a file, you write to "1.txt". Why are you trying to reconstruct the filename instead of using the existing filename of the tweet you are preprocessing? Also, you should move the function definition outside the loop:
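    import glob
    import os
    import re
    import string

    def processtweet(tweet):
        tweet = tweet.lower()
        # replace links and @mentions with placeholder tokens
        tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+))', 'url', tweet)
        tweet = re.sub('@[^\s]+', 'user', tweet)
        # collapse runs of whitespace, then turn #tag into tag
        tweet = re.sub('[\s]+', ' ', tweet)
        tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
        # Python 2 str API; on Python 3 pass str.maketrans('', '', string.punctuation)
        tweet = tweet.translate(None, string.punctuation)
        tweet = tweet.strip('\'"')
        return tweet

    # path is the folder holding the tweets, as in the question
    for filename in glob.glob(os.path.join(path, '*.txt')):
        with open(filename) as file:
            tweet = file.read()
        processedtweet = processtweet(tweet)
        # reuse the existing filename; this overwrites the original tweet
        with open(filename, "w") as file:
            file.write(processedtweet)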
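Since the answer's loop overwrites each input file, a variant that keeps the originals may be useful. The following is only a sketch under a few assumptions: it targets Python 3 (hence the `str.maketrans` form of `translate`), it assumes UTF-8 files, and the names `process_tweet`, `process_folder`, `in_dir` and `out_dir` are hypothetical, not from the original post. It reuses each input file's base name, so 1.txt in the input folder becomes 1.txt in the output folder.

```python
import glob
import os
import re
import string


def process_tweet(tweet):
    """Normalize one tweet, mirroring the steps in the answer above."""
    tweet = tweet.lower()
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'url', tweet)
    tweet = re.sub(r'@[^\s]+', 'user', tweet)
    tweet = re.sub(r'[\s]+', ' ', tweet)
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Python 3 equivalent of translate(None, string.punctuation)
    tweet = tweet.translate(str.maketrans('', '', string.punctuation))
    return tweet.strip('\'"')


def process_folder(in_dir, out_dir):
    """Write a processed copy of every .txt file in in_dir into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for filename in sorted(glob.glob(os.path.join(in_dir, '*.txt'))):
        with open(filename, encoding='utf-8') as src:
            tweet = src.read()
        # Keep the original name (1.txt stays 1.txt) instead of counting.
        target = os.path.join(out_dir, os.path.basename(filename))
        with open(target, 'w', encoding='utf-8') as dst:
            dst.write(process_tweet(tweet))
        written.append(target)
    return written
```

Calling `process_folder('tweets', 'tweets_clean')` (both folder names are placeholders) would leave the raw tweets untouched and return the list of files it wrote, which makes it easy to confirm that all 10000 outputs were produced.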