NLTK word_tokenize()

http://www.nltk.org/book/ch03.html#tokenization_index_term

>>> from nltk import word_tokenize

>>> word_tokenize('cells structure')
['cells', 'structure']

>>> with open('h.txt') as f:
...     word_tokenize(f.read())
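word_tokenize() does more than str.split(): it separates punctuation from the words it is attached to. As a rough illustration only, the sketch below uses a small hypothetical regular expression (TOKEN_RE and simple_tokenize are my own names, not NLTK's). NLTK's real tokenizer is Treebank-style and handles more cases; for example, it splits "cell's" into "cell" and "'s", which this pattern does not.

```python
import re

# Hypothetical pattern, far simpler than NLTK's tokenizer:
# a word (letters/digits/underscore, optionally joined by ' or -), or any single symbol.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def simple_tokenize(text):
    """Return the word and punctuation tokens found in text."""
    return TOKEN_RE.findall(text)

text = "The cell's nucleus (see Fig. 1) is large."
print(text.split())
# ['The', "cell's", 'nucleus', '(see', 'Fig.', '1)', 'is', 'large.']
print(simple_tokenize(text))
# ['The', "cell's", 'nucleus', '(', 'see', 'Fig', '.', '1', ')', 'is', 'large', '.']
```

With str.split(), "(see" and "large." stay glued to their punctuation; the regex version separates them, which is the main thing a tokenizer adds.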

Source code for nltk.collocations
http://www.nltk.org/api/nltk.html
http://www.nltk.org/_modules/nltk/collocations.html

>>> import nltk
>>> dir(nltk.collocations)

>>> print(nltk.collocations.__doc__)

>>> with open('Histology14_Ch01_ALL.txt') as f:
...     nltk.Text(word_tokenize(f.read())).collocations()

cell nuclei;
electron microscopy;
fluorescent compounds;
glass slides;
gold particles;
labeled secondary; <=======
light microscope;
light microscopy;
MEDICAL APPLICATION;
nucleic acids;
objective lens;
organic solvents;
primary antibody;
resolving power;
secondary antibody;
secretory granules;
situ hybridization; <=======
tissue components;
tissue section;
tissue sections

>>> with open('Histology14_Ch01_i.txt') as f:
...     nltk.Text(word_tokenize(f.read())).collocations()

matrix components;
tissue biology
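Text.collocations() ranks word pairs by an association measure. As far as I can tell from the NLTK source, it builds a BigramCollocationFinder, keeps pairs seen at least twice, drops words shorter than three characters or in the English stopword list, and ranks the rest by likelihood ratio. That filter would explain a fragment like "situ hybridization": "in" is removed, so only the second half of "in situ hybridization" survives. The sketch below reproduces the idea in plain Python, but it swaps in PMI (pointwise mutual information) for the likelihood ratio and uses a tiny made-up stopword set; STOPWORDS, keep and bigram_collocations are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical, tiny stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "and", "are", "by", "in", "is", "of", "the", "to", "with"}

def keep(word):
    """Drop very short words and stopwords, similar in spirit to Text.collocations()."""
    return len(word) >= 3 and word.lower() not in STOPWORDS

def bigram_collocations(tokens, n=10, min_freq=2):
    """Rank adjacent word pairs by PMI (not NLTK's default likelihood ratio)."""
    total = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scored = []
    for (w1, w2), freq in bigrams.items():
        if freq < min_freq or not (keep(w1) and keep(w2)):
            continue
        # PMI = log2( P(w1 w2) / (P(w1) * P(w2)) )
        p_pair = freq / total
        p_w1, p_w2 = unigrams[w1] / total, unigrams[w2] / total
        scored.append((math.log2(p_pair / (p_w1 * p_w2)), w1, w2))
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [f"{w1} {w2}" for _, w1, w2 in scored[:n]]

tokens = ("in situ hybridization uses labeled probes ; "
          "in situ hybridization detects nucleic acids ; "
          "nucleic acids bind probes").split()
print(bigram_collocations(tokens))  # ['nucleic acids', 'situ hybridization']
```

The pair "in situ" occurs twice but is filtered out because of "in", leaving the same kind of fragment seen in the output above.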

 

Naturally, the quality of the collocations is also higher than computer-generated lists – as we would expect from a manually produced compilation.
p. 174
http://nlp.stanford.edu/fsnlp/promo/colloc.pdf
phrasal verbs: a good example of collocations whose words are often non-adjacent
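Adjacent bigrams miss a phrasal verb like "looked ... up". NLTK's collocation tools accept a window_size argument for this case (Text.collocations(num=20, window_size=2) and BigramCollocationFinder.from_words(words, window_size=2)); window_size=2 means adjacent words only. A plain-Python sketch of the same windowing idea, with a hypothetical helper windowed_pairs:

```python
from collections import Counter

def windowed_pairs(tokens, window_size=2):
    """Count ordered pairs (w1, w2) where w2 is at most window_size - 1 words after w1."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window_size]:
            pairs[(w1, w2)] += 1
    return pairs

tokens = "she looked the word up in the dictionary".split()
adjacent = windowed_pairs(tokens, window_size=2)
wider = windowed_pairs(tokens, window_size=4)
print(adjacent[("looked", "up")], wider[("looked", "up")])  # 0 1
```

A wider window finds more separated pairs but also more noise, so it is usually combined with a frequency threshold and an association measure.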

related:
Collocations dictionary
https://franzcalvo.wordpress.com/2015/09/07/collocations

 

 

SQLite

SQLite
http://www.sqlite.org

https://docs.python.org/3.5/library/sqlite3.html

import sqlite3

conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS Counts')

cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')

fname = input('Enter file name: ')
if len(fname) < 1:
    fname = 'mbox.txt'

with open(fname) as fh:
    for line in fh:
        # Only "From: " header lines carry the sender's address
        if not line.startswith('From: '):
            continue
        pieces = line.split()
        email = pieces[1]
        # Keep the domain part of the address (everything after "@")
        organization = email.split('@')[1]
        cur.execute('SELECT count FROM Counts WHERE org = ?', (organization,))
        row = cur.fetchone()
        if row is None:
            cur.execute('INSERT INTO Counts (org, count) VALUES (?, 1)',
                        (organization,))
        else:
            cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                        (organization,))
        print(organization)

# Commit once after the loop instead of once per line
conn.commit()

sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'

for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])

cur.close()
conn.close()
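The SELECT, then INSERT or UPDATE, sequence above needs an if/else in Python. A common alternative, sketched here under the assumption that org is declared PRIMARY KEY (the table above has no such constraint), is INSERT OR IGNORE followed by an unconditional UPDATE. The sketch uses an in-memory database and made-up sample lines instead of emaildb.sqlite and mbox.txt; count_orgs is a hypothetical name.

```python
import sqlite3

def count_orgs(lines):
    """Count 'From: ' lines per e-mail domain using an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # org is the PRIMARY KEY, so INSERT OR IGNORE only creates a row the first time.
    cur.execute("CREATE TABLE Counts (org TEXT PRIMARY KEY, count INTEGER)")
    for line in lines:
        if not line.startswith("From: "):
            continue
        org = line.split()[1].split("@")[1]
        cur.execute("INSERT OR IGNORE INTO Counts (org, count) VALUES (?, 0)", (org,))
        cur.execute("UPDATE Counts SET count = count + 1 WHERE org = ?", (org,))
    conn.commit()
    rows = cur.execute(
        "SELECT org, count FROM Counts ORDER BY count DESC, org LIMIT 10"
    ).fetchall()
    conn.close()
    return rows

# Hypothetical sample lines (not taken from mbox.txt).
sample = [
    "From: alice@example.edu\n",
    "Subject: this line is skipped\n",
    "From: bob@example.edu\n",
    "From: carol@sample.org\n",
]
print(count_orgs(sample))  # [('example.edu', 2), ('sample.org', 1)]
```

The extra "org" in ORDER BY only makes ties come out in a predictable order.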

 

Python: Writing files

Codecademy_FileIO

my_list = [i**2 for i in range(1, 11)]
# Generates a list of squares of the numbers 1 to 10

f = open("output.txt", "w")

for item in my_list:
    f.write(str(item) + "\n")

f.close()

=============

with open("text.txt", "w") as textfile:
    textfile.write("Success!")

with open("text.txt", "w") as my_file:
    my_file.write("I am Sam")
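Both snippets above open text.txt in "w" mode, so the second one replaces "Success!" with "I am Sam": "w" truncates an existing file, while "a" appends to it. A small check, using a temporary directory so no real text.txt is touched (write_then_read is a hypothetical helper):

```python
import os
import tempfile

def write_then_read(path, mode, text):
    """Write text with the given mode, then return the file's full contents."""
    with open(path, mode) as f:
        f.write(text)
    with open(path) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")
    write_then_read(path, "w", "Success!")
    after_w = write_then_read(path, "w", "I am Sam")        # "w" truncates first
    after_a = write_then_read(path, "a", "\nI am Sam too")  # "a" appends

print(repr(after_w))  # 'I am Sam'
print(repr(after_a))  # 'I am Sam\nI am Sam too'
```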

The Python Tutorial > 7. Input and Output
https://docs.python.org/3.5/tutorial/inputoutput.html

6.2.2. Closing Files
Open files consume system resources
6.2.4 Writing to files
http://www.diveintopython.net/file_handling/file_objects.html
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes.
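The Codecademy squares loop can be written with with, so the file is closed automatically when the block ends. A sketch (writing into a temporary directory instead of the current one is my own choice) that also checks f.closed and reads the numbers back:

```python
import os
import tempfile

my_list = [i**2 for i in range(1, 11)]  # squares of 1 to 10

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")
    with open(path, "w") as f:
        for item in my_list:
            f.write(str(item) + "\n")
    # The with-block has ended, so the file is already closed here.
    was_closed = f.closed
    with open(path) as f:
        read_back = [int(line) for line in f]

print(was_closed)  # True
print(read_back)   # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```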

Say It

Say It: Pronunciation from Oxford
By Oxford University Press
Updated: Aug 05, 2015
Version: 1.1.5
https://elt.oup.com/catalogue/items/global/dictionaries/9780194279598
https://itunes.apple.com/app/id919978521
sound wave

https://elt.oup.com/searchresults?cc=us&selLanguage=en&searchtype=cat&fq=&q=pronunciation

dateModified 2013-03-17
http://busyteacher.org/14864-10-coolest-pronunciation-tools-esl.html
Sounds: The Pronunciation App
By Macmillan Education
http://www.macmillanenglish.com/educational-apps
http://www.macmillaneducationapps.com/soundspron
https://itunes.apple.com/us/app/sounds-the-pronunciation-app/id442713833

furniture
/ˈfɜːrnɪtʃər/ http://www.oxfordlearnersdictionaries.com/us/definition/english/furniture

much
/mʌtʃ/ http://dictionary.cambridge.org/us/pronunciation/english/much
\ˈməch\

clutch \ˈkləch\
http://media.merriam-webster.com/soundc11/c/clutch01.wav

such \ˈsəch\

about \ə-ˈbau̇t\
banana \bə-ˈna-nə\

heed -> OALD /hiːd/: “Key”
M-W \ˈhēd\: “Key”

feed -> 🙂
M-W \ˈfēd\

bag -> [OALD] “Bag”, “Bad”, “Ban”

bat -> [MW] “Back”,

Guide to IPA Symbols
http://learnersdictionary.com/help/ipa
http://media.merriam-webster.com/audio/prons/en/us/mp3/a/about001.mp3

http://www.engvid.com

http://www.cambridgemobileapps.com

for word in line.split():

>>> lista = []

>>> for line in open("Histology_Ch01_01.txt"):
...     for word in line.split():
...         lista.append(word)
...
>>> len(lista)
358

>>> for word in sorted(set(lista)):
...     if not (word.isalpha() and word.islower()):
...         print(word)
...

“tissue”
“web,”
&
(ECM).
1.
13e
2013
Advances
Anthony
Basic
Cells
Chapter
ECM
ECM.
Familiarity
Greek
Histology
Histology,
Introduction
Its
Junqueira’s
L.
Many
Mescher
Methods
Organs
Study
Study:
The
These
This
Thus,
Tissues
approaches.
biochemistry,
biology,
biology.
cell-specific
cells’
cells,
cells.
components:
extensively,
fibers,
immunology,
linings.
macromolecules,
membranes.
molecules.
noncellular,
organ.
organs.
physiology,
products.
receptors.
structures,
students.
study.
subject.
tissues,
together.
whole.
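Most of the entries printed above are ordinary words with punctuation attached ("cells,", "cells.", "study."), capitalized words, or words wrapped in the curly quotes used in the chapter text. A sketch that strips leading and trailing punctuation (including curly quotes) and lowercases before comparing; normalize and PUNCT are hypothetical names, and note that this also merges "Cells" with "cells", which may or may not be wanted:

```python
import string

# ASCII punctuation plus the curly quotes that appear in the chapter text.
PUNCT = string.punctuation + "\u201c\u201d\u2018\u2019"

def normalize(word):
    """Strip leading/trailing punctuation and lowercase the word."""
    return word.strip(PUNCT).lower()

words = ["cells,", "cells.", "Cells", "cells\u2019", "(ECM).",
         "\u201ctissue\u201d", "cell-specific"]
normalized = sorted({normalize(w) for w in words})
print(normalized)  # ['cell-specific', 'cells', 'ecm', 'tissue']
# Whatever still fails isalpha() after normalizing:
print([w for w in normalized if not w.isalpha()])  # ['cell-specific']
```

Inner characters are left alone, so hyphenated words and possessives such as "Junqueira’s" keep their internal punctuation.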

————————————–

A period followed by an uppercase word does NOT necessarily mark the end of a sentence.
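The word list above contains "Anthony", "L." and "Mescher": "L." is a middle initial, so a rule that ends a sentence at every period followed by an uppercase word would cut the author's name in two. A minimal sketch contrasting that naive rule with one extra, hypothetical heuristic (treat a single capital letter plus a period as an initial); NLTK's sent_tokenize relies on the trained Punkt model instead, which handles abbreviations more generally:

```python
def split_sentences(text, skip_initials=False):
    """Split text after a word ending in '.' when the next word starts uppercase."""
    words = text.split()
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        next_word = words[i + 1] if i + 1 < len(words) else ""
        if not (word.endswith(".") and next_word[:1].isupper()):
            continue
        # Hypothetical heuristic: "L." (one capital letter + period) is an initial.
        if skip_initials and len(word) == 2 and word[0].isupper():
            continue
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

text = "Histology 13e is by Anthony L. Mescher. Many figures show tissues."
print(split_sentences(text))
# ['Histology 13e is by Anthony L.', 'Mescher.', 'Many figures show tissues.']
print(split_sentences(text, skip_initials=True))
# ['Histology 13e is by Anthony L. Mescher.', 'Many figures show tissues.']
```

The heuristic still fails on other abbreviations ("Fig. 1", "Dr. Smith"), which is why a trained model is usually preferred.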