
Cannot handle Chinese characters in unicode strings? #7

@GoogleCodeExporter

As the title says: can this function not handle Chinese text passed as a unicode string?
For example, cuttest(u"我喜欢python和c++。")
fails with:
Traceback (most recent call last):
  File "D:\bluecat2\Desktop\smallseg_0.5.1\test_fenci.py", line 41, in <module>
    cuttest(u"我喜欢python和c++。")
  File "D:\bluecat2\Desktop\smallseg_0.5.1\test_fenci.py", line 18, in cuttest
    wlist = seg.cut(text)
  File "D:\bluecat2\Desktop\smallseg_0.5.1\smallseg.py", line 56, in cut
    text = text.decode('utf-8','ignore')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Windows, Python 2.7
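The traceback points at line 56 of smallseg.py, `text = text.decode('utf-8','ignore')`. In Python 2, calling `.decode()` on an object that is already `unicode` first encodes it implicitly with the default ASCII codec, and that hidden step is what raises `UnicodeEncodeError` on the Chinese characters. The sketch below reproduces that implicit step explicitly and shows one possible guard, assuming the intended behavior is to decode only byte strings and pass unicode through unchanged. `to_unicode` is a hypothetical helper, not part of smallseg, and this has not been checked against the rest of `cut`.

```python
# -*- coding: utf-8 -*-
# Sketch only: `to_unicode` is a hypothetical helper, not smallseg's API.
from __future__ import print_function


def to_unicode(text, encoding='utf-8'):
    """Return `text` as a unicode string.

    Byte strings (Python 2 `str`, Python 3 `bytes`) are decoded, dropping
    undecodable bytes as line 56 of smallseg.py does with 'ignore'.
    Strings that are already unicode are returned unchanged, so Python 2's
    implicit ASCII encode before decode() is never triggered.
    """
    if isinstance(text, bytes):
        return text.decode(encoding, 'ignore')
    return text


# The implicit step Python 2 runs before u"...".decode('utf-8'):
# an ASCII encode, which cannot represent the Chinese characters.
try:
    u"我喜欢python和c++。".encode('ascii')
except UnicodeEncodeError as exc:
    # prints: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    print(exc)

sample = u"我喜欢python和c++。"
print(to_unicode(sample) == sample)                   # unicode passes through
print(to_unicode(sample.encode('utf-8')) == sample)   # UTF-8 bytes are decoded
```

Until a guard like this is applied inside `cut`, passing a UTF-8 byte string (for example `u"我喜欢python和c++。".encode('utf-8')`) should avoid the implicit ASCII step, since the traceback shows `cut` decoding its input as UTF-8.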


Original issue reported on code.google.com by blurr...@gmail.com on 22 Feb 2012 at 12:50
