Unicode

Python 3.6.0 » Documentation » Python HOWTOs » Unicode HOWTO
https://docs.python.org/3/howto/unicode.html

with open(filename, encoding='utf-16') as f:
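A minimal sketch of the `encoding` argument in use, assuming Python 3. The sample text and the file name `names.txt` are made up for illustration. It writes a string as UTF-16, reads it back, and checks that the raw bytes start with a byte order mark.

```python
import codecs
import os
import tempfile

text = "Franz Müller"  # hypothetical sample text

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "names.txt")  # hypothetical file name

    # Encode on the way out ...
    with open(filename, "w", encoding="utf-16") as f:
        f.write(text)

    # ... and decode on the way in, using the same encoding.
    with open(filename, encoding="utf-16") as f:
        decoded = f.read()

    # Binary mode shows the bytes that are actually on disk.
    with open(filename, "rb") as f:
        raw = f.read()

has_bom = raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(decoded == text)
print(has_bom)
```

Opening the same file without `encoding=` falls back to a platform-dependent default, which would most likely either raise an error or return garbage for UTF-16 data.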

>>> from nltk.corpus import names

>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])

>>> len(labeled_names)
7944

>>> for i in range(3):
...     print(labeled_names[i])
...
(u'Aamir', 'male')
(u'Aaron', 'male')
(u'Abbey', 'male')

>>> labeled_names[881][0]
u'Franz'
>>> type(labeled_names[881][0])
<type 'unicode'>
>>> type(labeled_names[881][0].decode("utf-8"))
<type 'unicode'>
>>> labeled_names[881][0].encode("ascii")
'Franz'
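The session above is Python 2. As a rough comparison, here is how the same steps might look in Python 3, where `str` is always Unicode and `encode()` returns `bytes`. The literal `"Franz"` stands in for `labeled_names[881][0]` so the sketch runs without NLTK, and `"Zoë"` is a made-up non-ASCII example.

```python
name = "Franz"  # stand-in for labeled_names[881][0]

print(type(name))               # <class 'str'>

# encode() turns text into bytes; decode() turns bytes back into text.
encoded = name.encode("ascii")
print(type(encoded))            # <class 'bytes'>
roundtrip = encoded.decode("utf-8")
print(roundtrip == name)        # True

# Unlike Python 2's unicode type, a Python 3 str has no decode() method.
print(hasattr(name, "decode"))  # False

# Encoding fails when the target codec cannot represent a character.
try:
    "Zoë".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)             # True
```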

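For a Unicode string, ch.isdigit() returns True if ch has the Unicode property Numeric_Type=Digit or Numeric_Type=Decimal. That covers every character in category Nd (decimal number) but only some in category No (other number): superscript digits such as ² count, while fractions such as ½ do not.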
http://stackoverflow.com/questions/9480419/best-way-to-check-the-type-of-a-variable/27797640#27797640
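A small sketch, assuming Python 3, that compares each character's general category (from `unicodedata.category`) with the results of `isdecimal()`, `isdigit()` and `isnumeric()`. The four sample characters are chosen for illustration.

```python
import unicodedata

# ASCII 5, Arabic-Indic digit three, superscript two, vulgar fraction one half
samples = ["5", "\u0663", "\u00b2", "\u00bd"]

rows = []
for ch in samples:
    rows.append((
        ch,
        unicodedata.category(ch),  # general category, e.g. 'Nd' or 'No'
        ch.isdecimal(),
        ch.isdigit(),
        ch.isnumeric(),
    ))

for row in rows:
    print(row)
```

Both ² and ½ are in category No, yet only ² passes `isdigit()`, so the general category alone does not predict the result.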

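Python 2 had a unicode() function for converting a value to a Unicode string. Python 3 has no unicode(): every str is already Unicode, so 2to3 rewrites unicode() calls as str().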
http://www.diveintopython3.net/porting-code-to-python-3-with-2to3.html

Python 2.7 > Unicode strings
https://docs.python.org/2/tutorial/introduction.html#unicode-strings
