Extract data from an HTTPS site into Python using urllib ("Your request cannot be completed" error)
I've been attempting to extract the contents of an HTTPS website into Python using urllib. I've used 4 lines of code:
import urllib
fhand = urllib.urlopen('https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startpage=1#search-results')
for line in fhand:
    print line.strip()
The connection appears to be working and the page is being opened by Python. However, I'm getting a few different error messages in the output: in the title, the heading, and the paragraph shown below. I had expected the output to be a series of HTML tags containing the data available on the website, such as address, base rates, and case number (i.e. the HTML that is visible under Elements in the Google Chrome developer tools). Can anyone guide me towards getting this data into Python, please?
Thanks & regards
<!doctype html>
<html class="no-branding"><head><meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>your request cannot be completed - gov.uk</title>
<link href="/edge-assets/gone.css" media="screen" rel="stylesheet" type="text/css">
<!--[if lte ie 8]><link href="/edge-assets/ie.css" media="screen" rel="stylesheet" type="text/css"><![endif]-->
<link rel="icon" href="/edge-assets/govukfavicon.ico" type="image/x-icon" />
</head>
<body>
<div id="wrapper">
<div id="banner" role="banner">
<div class="inner">
<h1>
<a href="https://www.gov.uk/">
<img src="/edge-assets/govuk-logo.png" alt="gov.uk">
</a>
</h1>
</div>
</div>
<div id="message" role="main">
<div class="inner">
<div id="detail">
<h2>sorry, there was a problem handling your request.</h2>
<p class="call-to-action">please try again shortly.</p>
</div>
<div id="footer">
</div>
</div>
</div>
</div>
</body></html>
Some websites block requests when the User-Agent header is missing or is one they consider undesirable. Try adding a User-Agent to the headers of the request:
import urllib2

# Identify the client with a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.tax.service.gov.uk/view-my-valuation/list-valuations-by-postcode?postcode=w1a&startpage=1#search-results'

req = urllib2.Request(url, headers=headers)
f = urllib2.urlopen(req)
s = f.read()
print s
f.close()
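The snippet above is Python 2 code (urllib2 and the print statement). In Python 3 that functionality lives in urllib.request, so a rough equivalent might look like the sketch below. It has not been tested against this particular site, and the helper names build_request and fetch_html, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request

URL = ('https://www.tax.service.gov.uk/view-my-valuation/'
       'list-valuations-by-postcode?postcode=w1a&startpage=1#search-results')


def build_request(url, user_agent='Mozilla/5.0'):
    """Return a Request that carries an explicit User-Agent header.

    Hypothetical helper; the default value mirrors the answer above.
    """
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Fetch url and return the response body decoded to text.

    The timeout and the UTF-8 fallback are assumptions, not site requirements.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

Calling print(fetch_html(URL)) would then print the page source. If the site still returns the error page, it is probably filtering on something other than the User-Agent.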
Alternatively, you can pip install requests and then use print(requests.get(url).text).
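By default requests sends its own User-Agent (python-requests/<version>), so if the site filters on that header, the requests call may need the same headers argument. A minimal sketch, assuming the requests package is installed; the fetch_html helper name and the timeout value are hypothetical choices, not part of the original answer:

```python
import requests

URL = ('https://www.tax.service.gov.uk/view-my-valuation/'
       'list-valuations-by-postcode?postcode=w1a&startpage=1#search-results')
HEADERS = {'User-Agent': 'Mozilla/5.0'}  # same header as in the urllib2 answer


def fetch_html(url, headers=HEADERS, timeout=10):
    """Return the page body as text, raising on HTTP error status codes."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Turn 4xx/5xx responses into exceptions instead of silently
    # returning an error page
    response.raise_for_status()
    return response.text
```

With this in place, print(fetch_html(URL)) plays the role of the one-liner above.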