json - Python 'ascii' codec can't encode character with requests.get


I have a Python program that crawls data from a site and returns JSON. The crawled site has a meta tag with charset = iso-8859-1. Here is the source code:

    url = 'https://www.example.com'
    source_code = requests.get(url)
    plain_text = source_code.text

After extracting the information with Beautiful Soup, I create the JSON. The problem is that some symbols are displayed as \u0080 or \x80 (in Python), so I can't use or decode them in PHP. I tried plain_text.decode('iso-8859-1') and plain_text.decode('cp1252') so that I could encode the result as UTF-8 afterwards, but every time I get the error: 'ascii' codec can't encode character u'\xf6' in position 8496: ordinal not in range(128).

Edit

The new code, after @chriskoston's suggestion to use .content instead of .text:

    url = 'https://www.example.com'
    source_code = requests.get(url)
    plain_text = source_code.content
    the_sourcecode = plain_text.decode('cp1252').encode('utf-8')
    soup = BeautifulSoup(the_sourcecode, 'html.parser')

Encoding and decoding are now possible, but the character problem is still there.

Edit 2

The solution is to use .content.decode('cp1252'):

    url = 'https://www.example.com'
    source_code = requests.get(url)
    plain_text = source_code.content.decode('cp1252')
    soup = BeautifulSoup(plain_text, 'html.parser')
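A likely reason cp1252 helps here: in ISO-8859-1 the byte 0x80 is an invisible control character (U+0080), while in cp1252 the same byte is the euro sign. The sketch below uses Python 3 syntax and made-up sample bytes, not data from the crawled site:

```python
# Hypothetical sample: "5 €" as a Windows-1252 server would send it.
raw = b"5 \x80"

# ISO-8859-1 maps every byte to the code point with the same number,
# so 0x80 becomes the control character U+0080.
as_latin1 = raw.decode("iso-8859-1")

# cp1252 maps 0x80 to the euro sign, U+20AC.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```

Pages that declare iso-8859-1 but actually contain cp1252 bytes are common, which would explain why the declared charset gives the wrong characters.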

Special thanks to Tomalak for the solution:

You must store the result of decode() somewhere, because it does not modify the original variable.
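A small sketch of that point, in Python 3 syntax with made-up sample bytes:

```python
# Hypothetical sample: "schön" encoded as cp1252.
raw = b"sch\xf6n"

# Calling decode() without keeping the result changes nothing:
raw.decode("cp1252")
still_bytes = isinstance(raw, bytes)  # raw is untouched

# Assign the return value to keep the decoded text.
text = raw.decode("cp1252")
print(text)
```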

Another thing:

  • decode() turns a list of bytes into a string.
  • encode() does the opposite: it turns a string into a list of bytes.
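The two directions can be sketched as a round trip (Python 3 syntax; the sample text is only an illustration):

```python
text = "Preis: 5 €"

# encode(): string -> bytes
data = text.encode("utf-8")

# decode(): bytes -> string
restored = data.decode("utf-8")

print(type(data).__name__, type(restored).__name__)
```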

BeautifulSoup is happy with strings; you don't need to use encode() at all.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.example.com'
    response = requests.get(url)
    # Decode the raw bytes ourselves instead of relying on response.text
    html = response.content.decode('cp1252')
    soup = BeautifulSoup(html, 'html.parser')
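Since the goal in the question is JSON that PHP can read, here is a minimal sketch of that last step, assuming the text was decoded with the right codec. It uses Python 3 syntax; the key name "price" and the sample bytes are hypothetical:

```python
import json

# Hypothetical scraped value, decoded from cp1252 bytes.
price = b"5 \x80".decode("cp1252")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# so the output is plain ASCII with a \u20ac escape for the euro sign.
payload = json.dumps({"price": price})

print(payload)
```

PHP's json_decode() understands \uXXXX escapes, so with the correct code point (U+20AC rather than U+0080) the value should come through as a euro sign.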

Hint: for working with HTML you might want to look at pyquery instead of BeautifulSoup.

