ASCII code

def orc(file_name):
    file_object = open(file_name)
    while True:
        paragraph = file_object.readline()
        if paragraph == "":
            # print("EOF has been reached")
            return file_object.close()

>>> orc("temp.txt")



parser

with open(filename, encoding='utf-16') as f:
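A minimal sketch of what the encoding argument does, assuming a small UTF-16 file. The file name sample_utf16.txt and the sample text are made up for the illustration and are not from these notes.

```python
import os
import tempfile

# Hypothetical sample text and file name, used only for this sketch.
text = "Chapter 1\nnaïve café\n"
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.txt")

# Writing with encoding="utf-16" prepends a byte order mark (BOM).
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(text)

# Reading with the same encoding consumes the BOM and returns str.
with open(path, encoding="utf-16") as f:
    lines = f.read().splitlines()

# The raw bytes start with the BOM: FF FE or FE FF, depending on byte order.
with open(path, "rb") as f:
    raw = f.read()

print(lines)
print(raw[:2])
```

Opening the same file without the encoding argument would fall back to the platform default and would most likely fail or return garbage.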

>>> from nltk.corpus import names

>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])

>>> len(labeled_names)

>>> for i in range(3):
...     print(labeled_names[i])
...
(u'Aamir', 'male')
(u'Aaron', 'male')
(u'Abbey', 'male')

>>> labeled_names[881][0]
>>> type(labeled_names[881][0])
<type 'unicode'>
>>> type(labeled_names[881][0].decode("utf-8"))
<type 'unicode'>
>>> labeled_names[881][0].encode("ascii")
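The u'...' prefixes and <type 'unicode'> above come from Python 2. A rough Python 3 equivalent is sketched here, where text is str, encode() produces bytes, and only bytes has decode(). The name "Zoë" is a made-up example and is not taken from the corpus.

```python
name = "Aaron"                       # str: a sequence of Unicode code points
encoded = name.encode("ascii")       # bytes
decoded = encoded.decode("utf-8")    # back to str (ASCII is a subset of UTF-8)

accented = "Zoë"                     # hypothetical name with a non-ASCII letter
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # 'ë' has no ASCII representation, so the strict encoder refuses it.
    ascii_ok = False

utf8_bytes = accented.encode("utf-8")

print(type(name), type(encoded))
print(ascii_ok, utf8_bytes)
```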

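ch.isdigit() returns True if ch has the Unicode property Numeric_Type=Digit or Numeric_Type=Decimal. That covers every character in category Nd and some in category No (superscript digits, for example), but not all of No: vulgar fractions such as ½ are in No yet are not digits.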
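A quick check of how isdigit() treats characters from the Nd and No categories, next to isdecimal() and isnumeric(). The four sample characters are my own choice for illustration.

```python
import unicodedata

# ASCII digit, Arabic-Indic digit three, superscript two, vulgar fraction one half
samples = ["5", "\u0663", "\u00b2", "\u00bd"]

report = {}
for ch in samples:
    report[ch] = (
        unicodedata.category(ch),  # general category, e.g. "Nd" or "No"
        ch.isdecimal(),
        ch.isdigit(),
        ch.isnumeric(),
    )
    print(repr(ch), report[ch])
```

Superscript two and one half share the category No, but only the former counts as a digit.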

Python 2 had a unicode() function; in Python 3 the str type is already Unicode, and str(b, encoding) decodes a bytes object.

import urllib.request
import xml.etree.ElementTree as ET

url = ‘

f = urllib.request.urlopen(url)
data = f.read().decode("utf-8")


root = ET.fromstring(data)
-> ParseError
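The ParseError is what ElementTree raises when its input is not well-formed XML, which is common for HTML pages. A small sketch with two made-up strings (neither comes from the URL above):

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs for illustration.
well_formed = "<root><item>1</item></root>"
not_well_formed = "<html><body><br></body></html>"  # <br> is never closed

root = ET.fromstring(well_formed)

try:
    ET.fromstring(not_well_formed)
    error = None
except ET.ParseError as exc:
    # The parser stops at the first mismatched tag.
    error = exc

print(root.tag, root[0].text)
print(type(error).__name__)
```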


>>> from bs4 import BeautifulSoup
>>> html_tag = BeautifulSoup(data, "html.parser")("html")[0]

XML instance:

Socket Programming

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('', 80))
s.send(b'GET HTTP/1.0\r\n\r\n')
while True:
    data = s.recv(512)    # or 1024
    if len(data) < 1:     # an empty result means the server closed the connection
        break
    print(data.decode(), end='')
s.close()
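The receive loop can be tried without a network connection. The sketch below uses socket.socketpair() as a stand-in for a real server; the helper name recv_all and the canned response are assumptions made for the example.

```python
import socket


def recv_all(sock, bufsize=512):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if len(data) < 1:      # empty bytes: the other side has closed
            break
        chunks.append(data)
    return b"".join(chunks)


# Two connected sockets; 'server' plays the role of the remote host.
server, client = socket.socketpair()
server.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
server.close()                 # closing is what ends the client's loop

response = recv_all(client)
client.close()

header, _, body = response.partition(b"\r\n\r\n")
print(header.decode())
print(body.decode())
```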


import urllib.request

url = ‘
s = urllib.request.urlopen(url)
for line in s:
    print(line.decode().strip())


import urllib.request

url = ‘

with urllib.request.urlopen(url) as f:
    data = f.read().decode("utf-8")


The with statement
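A sketch of what the with statement guarantees: the cleanup step runs when the block exits, even if an exception is raised inside it. The resource() context manager and the events list are hypothetical names used only to make the order of steps visible.

```python
from contextlib import contextmanager

events = []


@contextmanager
def resource(name):
    events.append("open " + name)
    try:
        yield name
    finally:
        # Runs on normal exit and on exceptions alike.
        events.append("close " + name)


with resource("a") as r:
    events.append("use " + r)

try:
    with resource("b"):
        raise ValueError("boom")
except ValueError:
    events.append("handled")

print(events)
```

File objects and the responses returned by urllib.request.urlopen() follow the same pattern: leaving the block closes them.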


import urllib.request
url = ‘
local_filename, headers = urllib.request.urlretrieve(url)
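urlretrieve() can be tried offline with a file: URL. In this sketch the temporary directory, the file names and the explicit destination argument are assumptions; with a remote URL and no second argument the data goes to a temporary file instead. The Python documentation lists urlretrieve() under the legacy interface.

```python
import pathlib
import tempfile
import urllib.request

# Hypothetical local source file standing in for a remote resource.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "source.txt"
source.write_text("hello\n", encoding="utf-8")
url = source.as_uri()                      # file:// URL for the source

destination = str(workdir / "copy.txt")
local_filename, headers = urllib.request.urlretrieve(url, destination)

with open(local_filename, encoding="utf-8") as f:
    content = f.read()

print(local_filename)
print(content)
```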

for bytes() or decode():

Python Source Code Encoding

By default, Python source files are treated as encoded in UTF-8.
In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments — although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow.

In Python 2, by contrast, Python will default to ASCII as the standard source encoding if no other encoding hints are given.

The default encoding was set to "ascii" in version 2.5.
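A small illustration of an encoding hint and of non-ASCII source text. The variable names and strings are made up. The coding comment only acts as a declaration when it sits on the first or second line of a file; in Python 3 it is optional for UTF-8.

```python
# -*- coding: utf-8 -*-
# Encoding declaration in the PEP 263 style (effective only on line 1 or 2).

greeting = "Grüße, 世界"   # non-ASCII characters in a string literal
größe = 3                  # non-ASCII identifier, accepted by Python 3

# len() counts code points; the UTF-8 encoding needs more bytes than that.
print(len(greeting), len(greeting.encode("utf-8")))
print(größe)
```

Portable code would still use an ASCII name such as groesse for the identifier, as the convention quoted above suggests.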

ANSI format


'\ufeffChapter'.encode('utf-8').decode('utf-8-sig')
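A sketch of why the utf-8-sig codec is useful here: files saved by some Windows editors as "UTF-8" start with a byte order mark, which plain utf-8 decoding keeps as the character \ufeff. The byte string below is built by hand for the illustration.

```python
import codecs

# Bytes as they would appear at the start of a BOM-prefixed UTF-8 file.
raw = codecs.BOM_UTF8 + "Chapter".encode("utf-8")

plain = raw.decode("utf-8")        # BOM survives as '\ufeff'
stripped = raw.decode("utf-8-sig") # BOM is removed

print(repr(plain))
print(repr(stripped))
```

The same codec name can be passed to open(), for example open(filename, encoding="utf-8-sig").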