Extract data from an HTTPS site into Python using urllib ("Your request cannot be completed" error)

- April 15, 2012

I've been attempting to extract the contents of an HTTPS website with Python using urllib. I've used four lines of code:

    import urllib

    fhand = urllib.urlopen('https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startpage=1#search-results')
    for line in fhand:
        print line.strip()

The connection appears to be working and the page is being opened by Python. However, I'm getting a few different error messages in the output: in the title, the heading, and the paragraph shown below. I had expected the output to be a series of HTML tags containing the data available on the website, such as address, base rates, and case number (i.e. the HTML that is visible under Elements in the Google Chrome developer tools). Can anyone guide me towards getting this data into Python, please?

Thank you & regards

<!doctype html>
<html class="no-branding"><head><meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>your request cannot be completed - gov.uk</title>
<link href="/edge-assets/gone.css" media="screen" rel="stylesheet" type="text/css">
<!--[if lte ie 8]><link href="/edge-assets/ie.css" media="screen" rel="stylesheet" type="text/css"><![endif]-->
<link rel="icon" href="/edge-assets/govukfavicon.ico" type="image/x-icon" />
</head>
<body>
<div id="wrapper">
<div id="banner" role="banner">
<div class="inner">
<h1> <a href="https://www.gov.uk/"> <img src="/edge-assets/govuk-logo.png" alt="gov.uk"> </a> </h1>
</div>
</div>
<div id="message" role="main">
<div class="inner">
<div id="detail">
<h2>sorry, there was a problem handling your request.</h2>
<p class="call-to-action">please try again shortly.</p>
</div>
<div id="footer"> </div>
</div>
</div>
</div>
</body></html>

Some websites block requests when the User-Agent header is not specified, or when it is one they consider undesirable. Try adding a User-Agent to the headers of the request:

    import urllib2

    # Send a browser-like User-Agent so the site does not reject the request.
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = 'https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startpage=1#search-results'
    req = urllib2.Request(url, headers=headers)
    f = urllib2.urlopen(req)
    s = f.read()
    print s
    f.close()
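The snippet above is Python 2. On Python 3, urllib2 was merged into urllib.request, so the same idea might look roughly like the sketch below. The helper names build_request and fetch_html, the 10-second timeout, and the UTF-8 fallback are illustrative assumptions, not part of the original answer.

```python
import urllib.request

URL = ("https://www.tax.service.gov.uk/view-my-valuation/"
       "list-valuations-by-postcode?postcode=w1a&startpage=1#search-results")


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a GET request carrying a browser-like User-Agent header.

    build_request is a hypothetical helper name used only in this sketch.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Fetch the page and return its body as text (timeout is an assumed value)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (performs a real network request):
# print(fetch_html(URL))
```

Whether the site accepts this particular User-Agent value is not guaranteed; it is simply the value used in the answer above.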

Alternatively, you can pip install requests and use print(requests.get(url).text).
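A slightly fuller sketch of the requests route, assuming the third-party requests package is installed, could pass the same User-Agent header explicitly. The function names browser_headers and fetch_with_requests and the timeout value are hypothetical and chosen only for illustration.

```python
def browser_headers(user_agent="Mozilla/5.0"):
    """Return headers that present the client as a browser (hypothetical helper)."""
    return {"User-Agent": user_agent}


def fetch_with_requests(url, timeout=10):
    """Fetch a page with requests and return its text (timeout is an assumed value)."""
    # Imported here so this sketch still loads where requests is not installed.
    import requests

    response = requests.get(url, headers=browser_headers(), timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning the body silently.
    response.raise_for_status()
    return response.text


# Usage (performs a real network request):
# url = "https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startpage=1"
# print(fetch_with_requests(url))
```

Without the headers argument, requests sends its own default User-Agent, which the site may or may not accept.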
