How do I loop through scraped data and export the results to a CSV file, with each dictionary as a new row, in Python?
I'm new to coding, and I'm messing around with how to export scraped data to a CSV file.
The problem:
My script goes through a set of similar pages and extracts data from each page into a dictionary, i.e. each scraped page has a dictionary associated with it. The dictionaries all have the same keys but different values.
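In other words, once everything is scraped I end up with data shaped roughly like this (made-up values and a trimmed set of keys, just to illustrate):

papers = [
    {'topics': 'Economy', 'link': 'https://...', 'heading': 'Paper 1', 'date': '10 May 2018'},
    {'topics': 'Health', 'link': 'https://...', 'heading': 'Paper 2', 'date': '11 May 2018'},
]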
I want to export each individual dictionary, once scraped, to the CSV file, with each dictionary occupying one row, but I'm struggling to figure out the syntax.
Do I need to create a dictionary of dictionaries? Or can each scraped dictionary be appended to a single CSV file?
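From what I've read, it looks like a plain list of dictionaries plus csv.DictWriter might be all that's needed; here's a minimal sketch of what I mean (untested, and it assumes every dictionary has exactly the same keys):

import csv

rows = [
    {'heading': 'Paper 1', 'date': '10 May 2018'},
    {'heading': 'Paper 2', 'date': '11 May 2018'},
]

keys = rows[0].keys()
with open('out.csv', 'wb') as f:  # 'wb' because this is Python 2's csv module
    writer = csv.DictWriter(f, keys)
    writer.writeheader()    # one header row
    writer.writerows(rows)  # one row per dictionary

Alternatively, I guess I could open the file in append mode ('ab') and call writer.writerow(d) for each dictionary as it's scraped, writing the header just once at the start; is one approach better than the other?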
Cheers!
This is what I have so far:
import csv
import urllib2
from pprint import pprint
from bs4 import BeautifulSoup

papers = []
urls = []
# renamed from 'dict', which shadows the built-in dict type
record = {'topics': 0, 'link': 0, 'heading': 0, "summary intro": 0, "summary text": 0, "date": 0}

# a and c (the URL prefix and suffix) are defined earlier in my script
for i in range(1, 4):
    url = str(a) + str(i) + str(c)
    urls.append(url)

pprint(urls)

for url in urls:
    print url
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)
    soup.find_all('div', class_="bp-paper-item commons")  # (result currently unused)
    for link in soup.find_all('a', class_="title"):
        pdflist = []
        pdflink1 = 'https://researchbriefings.parliament.uk'
        pdflink2 = link.get('href')
        pdflink = pdflink1 + pdflink2
        x = str(pdflink)
        record['link'] = x
        pdfsoup = urllib2.urlopen(x).read()  # opens the link to the pdf
        pdfdata = BeautifulSoup(pdfsoup)
        for date in pdfdata.find_all('div', id="bp-published-date"):
            record['date'] = date.text.encode('utf-8').strip()
        for heading in pdfdata.find_all('h1'):
            record['heading'] = heading.text.encode('utf-8').strip()
        for topics in pdfdata.find_all('div', id="bp-summary-metadata"):
            record['topics'] = topics.text.encode('utf-8').strip()
        for downloadlink in pdfdata.find_all('div', id="bp-summary-fullreport"):
            dl = downloadlink.find('a', id="bp-summary-fullreport-link")
            print dl

tocsv = [record]
keys = tocsv[0].keys()
with open('ukparl.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(tocsv)
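One thing I suspect is wrong: I only ever create one dictionary and keep overwriting its values, and tocsv = [record] is built after the loops, so at best I'd get a single row (the last page scraped). My guess at a fix is to copy the dictionary at the end of each iteration and collect the copies in the papers list, something like this (sketch only):

    for link in soup.find_all('a', class_="title"):
        # ... fill in record as above ...
        papers.append(record.copy())  # copy, so the next page doesn't overwrite this row

and then, after all the loops:

keys = papers[0].keys()
with open('ukparl.csv', 'wb') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(papers)  # each dictionary becomes one row

Is that the right idea?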