Special character use in Python 2.6


I'm a bit tired, but here goes:

I am doing some HTML scraping in Python 2.6.5 with BeautifulSoup on an Ubuntu box.

The reason for Python 2.6.5: BeautifulSoup sucks under 3.1.

I am trying to run the following code:

    # data retrieval from html files, detherm
    # -*- coding: utf-8 -*-

    import sys,os,re,csv
    from BeautifulSoup import BeautifulSoup

    sys.path.insert(0, os.getcwd())

    raw_data = open('download.php.html','r')
    soup = BeautifulSoup(raw_data)

    for numdiv in soup.findAll('div', {"id" : "sec"}):
        currenttable = numdiv.find('table',{"class" : "data"})
        if currenttable:
            numrow=0
            numcol=0
            data_list=[]
            for row in currenttable.findAll('td', {"class" : "datahead"}):
                numrow=numrow+1
            for ncol in currenttable.findAll('th', {"class" : "datahead"}):
                numcol=numcol+1
            for col in currenttable.findAll('td'):
                col2 = ''.join(col.findAll(text=True))
            if col2.index('±'):
            col2=col2[:col2.index('±')]
                print(col2.encode("utf-8"))
            ref=numdiv.find('a')
            niceref=''.join(ref.findAll(text=True))

Now, due to the ± signs, I get the following error when trying to interpret the code with:

python code.py

    Traceback (most recent call last):
      File "detherm-wtest.py", line 25, in <module>
        if col2.index('±'):
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

How do I solve this? Putting a u in front, so that I have '±' -> u'±', results in:

    Traceback (most recent call last):
      File "detherm-wtest.py", line 25, in <module>
        if col2.index(u'±'):
    ValueError: substring not found

The current code file encoding is UTF-8.

Thank you.

Byte strings like "±" (in Python 2.x) are encoded in the source file's encoding, which might not be what you want. If col2 is really a unicode object, you should use u"±" instead, as you already tried. You might know that somestring.index raises an exception if it doesn't find an occurrence, whereas somestring.find returns -1. Therefore, this

    if col2.index('±'):
        col2=col2[:col2.index('±')] # not indented correctly in question btw
        print(col2.encode("utf-8"))

should be

    if u'±' in col2:
        col2=col2[:col2.index(u'±')]
        print(col2.encode("utf-8"))

so that the if statement doesn't lead to an exception.
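For illustration, here is a minimal sketch of the same idea built on find instead of index. The names PLUS_MINUS and strip_uncertainty are made up for this example and are not from the question. It assumes the cell text is already a unicode object, and it writes the sign as the escape u'\u00b1' so the result doesn't depend on the source file's encoding. It also shows where the 0xc2 in the first traceback comes from: it is the first byte of the UTF-8 encoding of ±.

```python
# -*- coding: utf-8 -*-

# Hypothetical constant: the ± sign as an escape, independent of file encoding.
PLUS_MINUS = u'\u00b1'

# In UTF-8 the sign is two bytes; 0xc2 is the byte named in the UnicodeDecodeError.
encoded = PLUS_MINUS.encode('utf-8')
assert encoded == b'\xc2\xb1'


def strip_uncertainty(text, marker=PLUS_MINUS):
    """Return the part of text before the first marker.

    If the marker is absent, text is returned unchanged.
    """
    pos = text.find(marker)  # find returns -1 where index would raise ValueError
    if pos == -1:
        return text
    return text[:pos]
```

With this, strip_uncertainty(u'12.5 \u00b1 0.3') gives u'12.5 ' (the trailing space is kept, as in the slicing above), and a string without the sign comes back unchanged. Comparing against -1 also avoids another trap in the original test: index returns 0 for a match at the very start of the string, and 0 is falsy in an if.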

