html.unescape(s)

html — HyperText Markup Language support > html.unescape(s)
https://docs.python.org/3.6/library/html.html
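
As a quick illustration of what html.unescape(s) does, here is a minimal sketch using only the standard library. The sample strings are made up for this note and do not come from any particular page.

```python
import html

# Named, decimal, and hexadecimal references all decode to plain characters.
samples = ["H&amp;E", "&lt;p&gt;", "&#62;", "&#x3e;"]
decoded = [html.unescape(s) for s in samples]
for raw, text in zip(samples, decoded):
    print(repr(raw), "->", repr(text))

# html.escape() goes the other way for &, < and > (and quotes by default),
# so escaping and then unescaping gives back the original string.
round_trip = html.unescape(html.escape("H&E <stain>"))
print(round_trip)
```

Note that the decoded text contains a bare "&", which is fine as plain text but is no longer valid inside XML markup.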

Python 3.4.1 html parser is unable to process “H&E”, even after html.unescape()

Parsing XML from a webpage

import urllib.request
import xml.etree.ElementTree as ET

url = 'http://www.oxfordlearnersdictionaries.com/us/definition/english/felicity'

# Download the page and decode the bytes to text.
f = urllib.request.urlopen(url)
data = f.read().decode('utf-8')

print(len(data))

# The page is HTML rather than well-formed XML, so this call fails.
root = ET.fromstring(data)
# -> ParseError
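
To see the ParseError without fetching the page, the following sketch feeds ElementTree a few small strings. The helper name try_parse and the sample strings are hypothetical, chosen only for this note. They show two common reasons HTML is rejected as XML: a bare "&" and a tag that is never closed.

```python
import xml.etree.ElementTree as ET

def try_parse(markup):
    """Return the root element, or None if markup is not well-formed XML."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        print("ParseError:", err)
        return None

well_formed = "<p>H&amp;E</p>"                     # ampersand escaped: valid XML
bare_ampersand = "<p>H&E</p>"                      # bare "&": not valid XML
unclosed_tag = "<html><body><br></body></html>"    # <br> never closed

for markup in (well_formed, bare_ampersand, unclosed_tag):
    root = try_parse(markup)
    if root is not None:
        # The parser decodes &amp; itself, so no html.unescape() is needed here.
        print("parsed:", root.tag, repr(root.text))
```

One consequence: calling html.unescape() on markup before handing it to an XML parser turns "&amp;" into a bare "&" and can make otherwise valid input fail.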

>>> from bs4 import BeautifulSoup
>>>
>>> html_tag = BeautifulSoup(data, 'html.parser')('html')[0]
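
If installing BeautifulSoup is not an option, the standard library's html.parser module is also tolerant of HTML that is not well-formed XML. This is a different technique from the bs4 call above, shown only as a rough sketch. The class name TagAndTextCollector and the sample page string are made up for this note.

```python
from html.parser import HTMLParser

class TagAndTextCollector(HTMLParser):
    """Collect start-tag names and non-blank text chunks from an HTML string."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode references such as
        # &amp; before the text is passed to handle_data().
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

# Hypothetical sample: a bare "&" and an unclosed <br>, both invalid as XML.
page = "<html><body><p>H&E stain</p><br></body></html>"

collector = TagAndTextCollector()
collector.feed(page)
collector.close()   # flush any buffered text

print(collector.tags)
print(collector.text)
```

Unlike BeautifulSoup, this does not build a tree; it only reports tags and text in document order.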

XML instance:
https://d18ky98rnyall9.cloudfront.net/aFJF93QMEeWtlRLKY8QGgw.processed/full/360p/index.mp4

XML tutorials
http://www.w3schools.com

XML examples
http://www.w3schools.com/xml/xml_examples.asp

Extracting data from XML: geocode example
file format: .py
adapted for Python 3.4.1 from: http://www.pythonlearn.com/code/geoxml.py
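
The general pattern for pulling values out of XML with ElementTree can be sketched as follows. The XML below is a hand-written, hypothetical sample shaped like a geocoding response. It is not actual output from any service, and the element names are assumptions made for this note.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not a real API response.
sample = """<GeocodeResponse>
  <status>OK</status>
  <result>
    <formatted_address>Ann Arbor, MI, USA</formatted_address>
    <geometry>
      <location>
        <lat>42.28</lat>
        <lng>-83.74</lng>
      </location>
    </geometry>
  </result>
</GeocodeResponse>"""

tree = ET.fromstring(sample)

# findall() returns every direct child named "result".
results = tree.findall("result")
first = results[0]

# find() accepts a simple path, so nested elements can be reached in one call.
lat = float(first.find("geometry/location/lat").text)
lng = float(first.find("geometry/location/lng").text)
address = first.find("formatted_address").text

print("lat", lat, "lng", lng)
print(address)
```

Element text is always a string, so numeric fields need an explicit float() or int() conversion.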

Extracting data from XML: exercise
file format: .py
reference: https://docs.python.org/3.4/library/xml.etree.elementtree.html

from:
Using Python to Access Web Data
Coursera, Oct 26 — Dec 14, 2015.
https://www.coursera.org/learn/python-network-data