ASCII code

def orc(file_name):
    file_object = open(file_name)
    while True:
        paragraph = file_object.readline()
        if paragraph == "":
            # print("EOF has been reached")
            return file_object.close()

>>> orc("temp.txt")



Python 3.6.0 » Documentation » Python HOWTOs » Unicode HOWTO

with open(filename, encoding='utf-16') as f:

>>> from nltk.corpus import names

>>> labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])

>>> len(labeled_names)

>>> for i in range(3):
...     print(labeled_names[i])
...
(u'Aamir', 'male')
(u'Aaron', 'male')
(u'Abbey', 'male')

>>> labeled_names[881][0]
>>> type(labeled_names[881][0])
<type 'unicode'>
>>> type(labeled_names[881][0].decode("utf-8"))
<type 'unicode'>
>>> labeled_names[881][0].encode("ascii")

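ch.isdigit() returns True if ch has the Unicode property Numeric_Type=Digit or Numeric_Type=Decimal. That covers every character in category Nd and some, but not all, characters in category No (for example '²' is a digit, '½' is not).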
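
A quick way to check how isdigit() relates to the Nd and No categories is to print the category next to the three numeric string tests. This is only a sketch: the helper name describe() and the sample characters are my own choices, not from the Unicode HOWTO.

```python
import unicodedata

def describe(ch):
    """Return (general category, isdecimal, isdigit, isnumeric) for one character."""
    return (unicodedata.category(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())

# ASCII digit, Arabic-Indic digit three, superscript two, vulgar fraction one half
for ch in ["7", "\u0663", "\u00b2", "\u00bd"]:
    print(repr(ch), describe(ch))
```

The superscript two is in category No and still counts as a digit, while the fraction one half is also No but only passes isnumeric().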

Python 2 had a unicode() function; in Python 3 that role is taken by str().
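
For comparison, a small Python 3 sketch of the same ideas: str() with an encoding argument (or bytes.decode()) turns bytes into text, and encoding to ASCII fails on non-ASCII characters unless an error handler is given. The sample name is made up for illustration.

```python
raw = "Zo\u00eb".encode("utf-8")     # bytes, as they might arrive from a file or socket

# Two equivalent ways to turn bytes into text in Python 3
text_a = str(raw, "utf-8")
text_b = raw.decode("utf-8")

# Encoding to ASCII raises for characters outside the ASCII range
try:
    text_a.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With an error handler the unencodable character is replaced by '?'
replaced = text_a.encode("ascii", errors="replace")
print(text_a == text_b, ascii_failed, replaced)
```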

Python 2.7 > Unicode strings

Parsing XML from a webpage


import urllib.request
import xml.etree.ElementTree as ET

url = ‘

f = urllib.request.urlopen(url)
data = f.read().decode("utf-8")


root = ET.fromstring(data)
-> ParseError
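
One common cause of this ParseError is that the page is HTML that is not well-formed XML (an unclosed tag is enough). I am assuming that was the case here. The sketch below reproduces the effect offline; the helper name try_parse() and both sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def try_parse(markup):
    """Return the root tag if markup is well-formed XML, else the parser's message."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError as exc:
        return "ParseError: {}".format(exc)

print(try_parse("<root><item>1</item></root>"))        # well-formed XML
print(try_parse("<html><body><br></body></html>"))     # <br> is never closed
```

A lenient HTML parser such as BeautifulSoup accepts the second string, which is why it is a common fallback.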


>>> from bs4 import BeautifulSoup
>>> html_tag = BeautifulSoup(data, 'html.parser')('html')[0]

XML instance:

Socket Programming

Socket Programming HOWTO

18.1. socket — Low-level networking interface

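import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('', 80))
s.send(b'GET HTTP/1.0\r\n\r\n')    # HTTP lines end with CRLF
while True:
    data = s.recv(512)    # or 1024
    if len(data) < 1:     # empty result: the server closed the connection
        break
    print(data.decode())
s.close()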


21.6. urllib.request — Extensible library for opening URLs

import urllib.request

url = ‘
s = urllib.request.urlopen(url)
for line in s:
    print(line.decode('utf-8').rstrip())


import urllib.request

url = ‘

with urllib.request.urlopen(url) as f:
    data = f.read().decode('utf-8')


The with statement


import urllib.request
url = ‘
local_filename, headers = urllib.request.urlretrieve(url)

Python: How to get the Content-Type of an URL?
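
A minimal sketch of one way to do this with urllib.request: the response headers object offers get_content_type() and get_content_charset(). The helper name content_type() is my own, and the data: URL is only there so the example runs without a network connection.

```python
import urllib.request

def content_type(url):
    """Return (media type, charset) from the response's Content-Type header."""
    with urllib.request.urlopen(url) as response:
        headers = response.headers      # an email.message.Message (or subclass)
        return headers.get_content_type(), headers.get_content_charset()

# A data: URL keeps the example offline; an http(s) URL is passed the same way.
print(content_type("data:text/html;charset=utf-8,hello"))
```

The charset returned here is a natural argument for decode() when reading the response body.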

Using Python to Access Web Data
Coursera, Oct 26 — Dec 14, 2015.

for bytes() or decode():

for more info on the “with” statement:

Python Source Code Encoding

2.2. The Interpreter and Its Environment
2.2.1. Source Code Encoding
By default, Python source files are treated as encoded in UTF-8.
In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments — although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow.
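
To see which encoding Python would pick for a given source file, the standard library's tokenize.detect_encoding() can be called on the raw bytes. This is a small sketch: the helper name source_encoding() and the two sample sources are made up for illustration.

```python
import io
import tokenize

def source_encoding(source_bytes):
    """Return the encoding Python would use to read this source code."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

plain = b"print('hello')\n"
declared = b"# -*- coding: latin-1 -*-\nprint('hello')\n"

print(source_encoding(plain))       # no declaration: the UTF-8 default applies
print(source_encoding(declared))    # the declaration is honoured (normalized name)
```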

PEP 0263 — Defining Python Source Code Encodings

Under PEP 263, which describes Python 2, Python will default to ASCII as the standard source encoding if no other encoding hints are given.

The default encoding was set to “ascii” in version 2.5.

ANSI format


'\ufeffChapter'.encode('utf-8').decode('utf-8-sig')
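
The same effect can be had when reading a file: opening it with encoding='utf-8-sig' drops a leading byte order mark, while plain 'utf-8' keeps it as '\ufeff'. A sketch using a temporary file with made-up content:

```python
import os
import tempfile

# Write a small UTF-8 file that starts with a byte order mark (BOM)
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbfChapter 1\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    with_bom = f.read()         # the BOM survives as '\ufeff'
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.read()      # the BOM is stripped

os.remove(path)
print(repr(with_bom), repr(without_bom))
```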