How do I loop through scraped data, and export the results to a CSV file, with each dictionary as a new row in Python?

- July 15, 2015

I am new to coding and am messing around with how to export scraped data to CSV.

Problem

My script goes through a set of similar pages, extracts data from each page, and stores it in a dictionary. Each dictionary has the same keys but different values, i.e. each scraped page has a dictionary associated with it, although the keys are the same.

I want to export the individual dictionaries, once scraped, to a CSV file, with each dictionary occupying one row, but I am struggling to figure out the syntax.

Do I need to create a dictionary of dictionaries? Or can each scraped dictionary be appended to a single CSV file?

Cheers,

This is what I have so far:

    # Python 2 script (urllib2, print statement)
    import csv
    import urllib2
    from pprint import pprint

    from bs4 import BeautifulSoup

    papers = []
    urls = []
    # One shared dictionary, reused for every page (note: the name shadows the built-in dict)
    dict = {'topics': 0, 'link': 0, 'heading': 0, "summary intro": 0, "summary text": 0, "date": 0}

    # Build the list of index pages; a and c (the URL prefix and suffix) are not shown in the post
    for i in range(1, 4):
        url = str(a) + str(i) + str(c)
        urls.append(url)
    pprint(urls)

    for url in urls:
        print url
        html = urllib2.urlopen(url).read()
        soup = BeautifulSoup(html)
        soup.find_all('div', class_="bp-paper-item commons")  # result is not used
        for link in soup.find_all('a', class_="title"):
            pdflist = []
            pdflink1 = 'https://researchbriefings.parliament.uk'
            pdflink2 = link.get('href')
            pdflink = pdflink1 + pdflink2
            x = str(pdflink)
            dict['link'] = x
            pdfsoup = urllib2.urlopen(x).read()  # opens link to pdf
            pdfdata = BeautifulSoup(pdfsoup)
            for date in pdfdata.find_all('div', id="bp-published-date"):
                dict['date'] = date.text.encode('utf').strip()
            for heading in pdfdata.find_all('h1'):
                dict['heading'] = heading.text.encode('utf').strip()
            for topics in pdfdata.find_all('div', id="bp-summary-metadata"):
                dict['topics'] = topics.text.encode('utf').strip()
            for downloadlink in pdfdata.find_all('div', id="bp-summary-fullreport"):
                dl = downloadlink.find('a', id="bp-summary-fullreport-link")
                print dl
            # The file is re-opened in 'wb' mode for every link, inside the loop
            tocsv = [dict]
            keys = tocsv[0].keys()
            with open('ukparl.csv', 'wb') as output_file:
                dict_writer = csv.DictWriter(output_file, keys)
                dict_writer.writeheader()
                dict_writer.writerows(tocsv)
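
One possible way to get one row per dictionary, sketched here under some assumptions: a dictionary of dictionaries should not be needed, because csv.DictWriter can write any sequence of plain dictionaries. The script above re-opens ukparl.csv in write mode for every link, which truncates the file each time, so only the last page would survive. The sketch below opens the file once and writes a row for each dictionary as it arrives. It is written for Python 3, unlike the Python 2 script above. scrape_pages, write_rows, FIELDNAMES, and the example.invalid links are hypothetical names and placeholder data introduced only for illustration; the real scraping loop would take the place of scrape_pages.

```python
import csv

# Column order for the CSV; these are the keys used in the question's dictionary.
FIELDNAMES = ["topics", "link", "heading", "summary intro", "summary text", "date"]


def scrape_pages():
    """Hypothetical stand-in for the scraping loop: yields one dict per page."""
    for n in range(1, 4):
        # Build a fresh dict on every iteration so that each page gets its
        # own values instead of overwriting a single shared dictionary.
        yield {
            "topics": "topic %d" % n,
            "link": "https://example.invalid/paper/%d" % n,  # placeholder URL
            "heading": "Heading %d" % n,
            "summary intro": "",
            "summary text": "",
            "date": "2015-07-%02d" % n,
        }


def write_rows(rows, path):
    """Write an iterable of dicts to `path`, one dict per CSV row.

    Returns the number of data rows written (the header is not counted).
    """
    # Open the file once, before looping over the rows. Opening it in "w"
    # mode inside the loop would truncate it on every iteration.
    with open(path, "w", newline="", encoding="utf-8") as output_file:
        writer = csv.DictWriter(output_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        count = 0
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Calling write_rows(scrape_pages(), "ukparl_example.csv") would produce a header line followed by three data rows. Because write_rows accepts any iterable, a list of dictionaries collected during scraping would work the same way as the generator used here.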
