Special character use in Python 2.6
I'm more than a bit tired, so here goes:
I'm doing some HTML scraping in Python 2.6.5 with BeautifulSoup on an Ubuntu box.
The reason for Python 2.6.5: BeautifulSoup sucks under 3.1.
I'm trying to run the following code:
    # data retrieval from html files from detherm
    # -*- coding: utf-8 -*-
    import sys,os,re,csv
    from BeautifulSoup import BeautifulSoup

    sys.path.insert(0, os.getcwd())
    raw_data = open('download.php.html','r')
    soup = BeautifulSoup(raw_data)

    for numdiv in soup.findAll('div', {"id" : "sec"}):
        currenttable = numdiv.find('table',{"class" : "data"})
        if currenttable:
            numrow=0
            numcol=0
            data_list=[]
            for row in currenttable.findAll('td', {"class" : "datahead"}):
                numrow=numrow+1
            for ncol in currenttable.findAll('th', {"class" : "datahead"}):
                numcol=numcol+1
            for col in currenttable.findAll('td'):
                col2 = ''.join(col.findAll(text=True))
                if col2.index('±'):
                    col2=col2[:col2.index('±')]
                print(col2.encode("utf-8"))
            ref=numdiv.find('a')
            niceref=''.join(ref.findAll(text=True))
Now, due to the ± signs, I get the following error when trying to interpret the code with:
python code.py
    Traceback (most recent call last):
      File "detherm-wtest.py", line 25, in <module>
        if col2.index('±'):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
How do I solve this? Putting a u in front of the literal, so that '±' -> u'±', results in:
    Traceback (most recent call last):
      File "detherm-wtest.py", line 25, in <module>
        if col2.index(u'±'):
    ValueError: substring not found
The current code file's encoding is UTF-8.
Thank you.
Byte strings like "±" (in Python 2.x) are encoded in the source file's encoding, which might not be what you want. If col2 is a unicode object, you should use u"±" instead, as you already tried. You might know that somestring.index raises an exception if it doesn't find an occurrence, whereas somestring.find returns -1. Therefore, this
    if col2.index('±'):
        col2=col2[:col2.index('±')] # not indented correctly in question btw
    print(col2.encode("utf-8"))
should be
    if u'±' in col2:
        col2=col2[:col2.index(u'±')]
    print(col2.encode("utf-8"))
so that the if statement doesn't lead to an exception.
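To make both points concrete, here is a minimal sketch. It first checks that "±" is two bytes in UTF-8, the first being the 0xc2 named in the UnicodeDecodeError, and then wraps the truncation in a small function built on find. The name strip_uncertainty and the sample strings are made up for illustration and are not part of the original script. Note also that index returns 0 for a match at the very start of the string, which an if treats as false, so testing with in or comparing find against -1 is safer.

```python
# -*- coding: utf-8 -*-
# Sketch only: strip_uncertainty is a hypothetical helper, not part of the
# original script.

PLUS_MINUS = u'\u00b1'  # the "±" sign as a unicode literal

# In UTF-8 the sign is two bytes, starting with 0xc2 (the byte named in the
# UnicodeDecodeError).
encoded = PLUS_MINUS.encode('utf-8')
assert encoded == b'\xc2\xb1'

# Decoding those bytes as ASCII fails; Python 2 attempts this implicitly
# when a byte string is mixed with a unicode object.
try:
    encoded.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False


def strip_uncertainty(text, marker=PLUS_MINUS):
    """Return the part of text before marker, or text unchanged."""
    pos = text.find(marker)  # find() returns -1 instead of raising
    if pos == -1:
        return text
    return text[:pos]


print(ascii_ok)                                         # False
print(strip_uncertainty(u'1.23 \u00b1 0.05').strip())   # 1.23
print(strip_uncertainty(u'42'))                         # 42
```

In the loop from the question, the call would be something like col2 = strip_uncertainty(col2) before the print, assuming col2 is a unicode object as BeautifulSoup normally returns.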