NLTK Essentials [BOOK, 2015]

NLTK Essentials
July 27, 2015
by Nitin Hardeniya
https://www.amazon.com/NLTK-Essentials-Nitin-Hardeniya/dp/1784396907

Mastering Natural Language Processing with Python
Deepti Chopra, Nisheeth Joshi, Iti Mathur
June 2016
https://www.packtpub.com/big-data-and-business-intelligence/mastering-natural-language-processing-python
https://www.amazon.com/Mastering-Natural-Language-Processing-Python/dp/1783989041

https://www.packtpub.com/big-data-and-business-intelligence/natural-language-processing-python-and-nltk

Text Analytics with Python
A Practical Real-World Approach to Gaining Actionable Insights from your Data
Authors: Sarkar, Dipanjan
http://www.apress.com/us/book/9781484223871

http://pyparsing.wikispaces.com
http://infohost.nmt.edu/tcc/help/pubs/pyparsing/web/index.html
2013

https://www.amazon.com/Web-Scraping-Python-Collecting-Modern/dp/1491910291

Python Projects for Kids
2016
https://www.amazon.com/dp/1782175067

Java Programming for Kids
2014
https://www.amazon.com/dp/B00O9GAGYQ


NLTK word_tokenize()

NLTK word_tokenize()
http://www.nltk.org/book/ch03.html#tokenization_index_term

>>> from nltk import word_tokenize

>>> word_tokenize('cells structure')
['cells', 'structure']

>>> with open('h.txt') as f:
...     word_tokenize(f.read())

Source code for nltk.collocations
http://www.nltk.org/api/nltk.html
http://www.nltk.org/_modules/nltk/collocations.html

>>> import nltk
>>> dir(nltk.collocations)

>>> print(nltk.collocations.__doc__)

>>> with open('Histology14_Ch01_ALL.txt') as f:
...     nltk.Text(word_tokenize(f.read())).collocations()

cell nuclei;
electron microscopy;
fluorescent compounds;
glass slides;
gold particles;
labeled secondary; <=======
light microscope;
light microscopy;
MEDICAL APPLICATION;
nucleic acids;
objective lens;
organic solvents;
primary antibody;
resolving power;
secondary antibody;
secretory granules;
situ hybridization; <=======
tissue components;
tissue section;
tissue sections

>>> with open('Histology14_Ch01_i.txt') as f:
...     nltk.Text(word_tokenize(f.read())).collocations()

matrix components;
tissue biology
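
The two arrowed entries in the first listing ("labeled secondary", "situ hybridization") read like fragments of longer phrases. Here is a minimal stdlib-only sketch of how that can happen when only adjacent word pairs are counted and pairs containing very short words are dropped. It is a simplified stand-in, not NLTK's method: it ranks by raw frequency, whereas the linked nltk.collocations source uses its own association measures. The stop list, the thresholds and the sample text are assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list for this sketch; NLTK ships a much longer one.
STOPWORDS = {"the", "and", "for", "with", "are", "was"}

def simple_collocations(text, min_count=2, top_n=5):
    """Return the most frequent adjacent word pairs as 'w1 w2' strings."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    pairs = Counter(zip(tokens, tokens[1:]))

    def keep(word):
        # Drop pairs that contain a very short word or a stop word.
        return len(word) >= 3 and word not in STOPWORDS

    kept = {
        pair: n for pair, n in pairs.items()
        if n >= min_count and all(keep(w) for w in pair)
    }
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(kept.items(), key=lambda item: (-item[1], item[0]))
    return [" ".join(pair) for pair, _ in ranked[:top_n]]

# Made-up sample text, not taken from the histology chapter.
sample = (
    "In situ hybridization detects nucleic acids in tissue sections. "
    "In situ hybridization uses labeled probes on tissue sections."
)
print(simple_collocations(sample))
```

The pair ("in", "situ") is as frequent as ("situ", "hybridization") in the sample, but it is discarded because "in" is too short, so only the tail of the three-word term survives.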

 

Naturally, the quality of the collocations is also higher than computer-generated lists – as we would expect from a manually produced compilation.
p. 174
http://nlp.stanford.edu/fsnlp/promo/colloc.pdf
phrasal verbs: good example of a collocation with often non-adjacent words
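
A small sketch of why non-adjacent collocations need a window rather than strict adjacency. It counts ordered word pairs that occur within a few tokens of each other. The three sentences, the function name and the window size are all made up for illustration and come from neither the book chapter nor NLTK.

```python
from collections import Counter

def window_pairs(tokens, window=3):
    """Count ordered pairs (w1, w2) where w2 follows w1 within `window` tokens."""
    counts = Counter()
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + 1 + window]:
            counts[(first, second)] += 1
    return counts

# Hypothetical example sentences with the phrasal verb "turn off".
sentences = [
    "she turned the radio off".split(),
    "he turned off the light".split(),
    "they turned the engine off".split(),
]

adjacent = Counter()
windowed = Counter()
for sent in sentences:
    adjacent.update(window_pairs(sent, window=1))
    windowed.update(window_pairs(sent, window=3))

print(adjacent[("turned", "off")])  # only the sentence where the words touch
print(windowed[("turned", "off")])  # all three sentences
```

With adjacency alone the pair is seen once; with a window of three tokens it is seen in every sentence.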

related:
Collocations dictionary
https://franzcalvo.wordpress.com/2015/09/07/collocations

 

 

Intro to NLP (UM 2015)

Introduction to Natural Language Processing
University of Michigan
Coursera, October 5 – December 27, 2015
https://www.coursera.org/course/nlpintro

Greedy transition-based parsing
https://class.coursera.org/nlp/lecture/177

NLTK dependency parsing
https://github.com/nltk/nltk/wiki/Dependency-Parsing
https://github.com/nltk/nltk/blob/develop/nltk/parse/transitionparser.py

NLTK BOOK
6. Learning to Classify Text
http://www.nltk.org/book/ch06.html

Assignment 1
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=3 Feature extracting question
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=143
Ms. Haag plays Elianti

Assignment on word similarity
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=59
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=59#post-244

Assignment 2, part 3A
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=163#post-632

the assignments for this class have been developed and tested on Python 2.7 and NLTK 2.

DLNLP
http://web.eecs.umich.edu/~radev/dlnlp/list.txt
—————————————
CS224d: Deep Learning for Natural Language Processing
March-June 2015
http://cs224d.stanford.edu
—————————————

International Workshop on Semantic Evaluation 2015
http://alt.qcri.org/semeval2015

SDP 2015: Broad-Coverage Semantic Dependency Parsing
http://alt.qcri.org/semeval2015/task18

————————————–
NL generation & information extraction
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=172#post-667
NL generation
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=44#post-186
NACLO for Week 7

https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-566
week 6
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-436
week 5
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-435
week 4
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-361
week 3
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-272
week 2
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-182
week 1
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-99

some good papers in NLP
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=48#post-211
NLP libraries in Java
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=33#post-128

this course is more introductory than …
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=46#post-197

volunteers: covering installation of Python and NLTK on different platforms

http://www.ioling.org

https://www.linkedin.com/in/sabinebuchholz

scikit-learn 0.15.2

scikit-learn
http://scikit-learn.org

>>> import sklearn
>>> sklearn.__version__
'0.15.2'
https://pypi.python.org/pypi/scikit-learn/0.15.2

dependencies
scikit-learn is tested to work under Python 2.6, Python 2.7, and Python 3.4.
The required dependencies to build the software are

  • a working C/C++ compiler.

Create a separate environment
http://conda.pydata.org/docs/using/envs.html

—————————-

NumPy (4.6 MB) download
http://sourceforge.net/projects/numpy/files
the below notes are about building Numpy, which for most users is *not* the recommended way to install Numpy.  Instead, use either a complete scientific Python distribution or a binary installer

——————————–

Dragomir Radev, September 2015
Here is how to install a specific older version of a Python library:
pip uninstall scikit-learn
or
pip uninstall sklearn
then
pip install scikit-learn==0.15.2

Hint: the following packages conflict with each other:
  – scikit-learn ==0.15.2
  – python 3.5*
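
The conflict hint fits the dependency note above: Python 3.5 is not among the versions listed as tested for 0.15.2. Below is a tiny sketch for checking the running interpreter against that list before pinning the old release. The helper name is hypothetical, and "listed as tested" is only a proxy here; it is not the metadata conda actually consults.

```python
import sys

# Interpreter versions the notes above list as tested for scikit-learn 0.15.2.
TESTED_PYTHONS = {(2, 6), (2, 7), (3, 4)}

def is_tested_python(version_info=None):
    """Return True if (major, minor) is among the versions listed as tested."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) in TESTED_PYTHONS

print(is_tested_python((3, 5)))  # False
print(is_tested_python((2, 7)))  # True
print(is_tested_python())        # depends on the interpreter running this
```

If the check fails, a separate environment with one of the listed Python versions (see the conda link above) is the likely way out.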

conda

Conda
http://conda.pydata.org
© Continuum Analytics

.tar.gz files
http://www.gzip.org
http://www.7-zip.org

WordNet::Similarity

WordNet::Similarity
http://wn-similarity.sourceforge.net
a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet.

WordNet
A lexical database for English
https://wordnet.princeton.edu

http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi

UMLS::Similarity v1.41 released! (July 17, 2014)
http://www.d.umn.edu/~tpederse

North American Chapter of the Association for Computational Linguistics
http://naacl.org

NLTK (Python)
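
To make "semantic similarity based on WordNet" concrete, here is a stdlib-only sketch of the simplest path-based idea: score two concepts by 1 / (shortest path length + 1) in an is-a hierarchy. The mini-taxonomy is invented for illustration and is not WordNet data; the function names are hypothetical and this is not the code of WordNet::Similarity or NLTK.

```python
from collections import deque

# Hypothetical mini-taxonomy (child -> parent); real WordNet is far larger.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}

def shortest_path(a, b):
    """Number of edges between a and b, treating is-a links as undirected."""
    neighbours = {}
    for child, parent in HYPERNYM.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:  # breadth-first search
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connection found

def path_similarity(a, b):
    """1 / (shortest path length + 1); identical concepts score 1.0."""
    dist = shortest_path(a, b)
    return None if dist is None else 1.0 / (dist + 1)

print(path_similarity("dog", "wolf"))
print(path_similarity("dog", "cat"))
```

In this toy hierarchy "dog" and "wolf" share a direct parent, so they score higher than "dog" and "cat", which only meet at "carnivore".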

challenge:
How Strong Is Your Vocabulary?
http://www.merriam-webster.com/quiz/index.htm

from:
Introduction to Natural Language Processing
University of Michigan
Coursera, October 5 – December 27, 2015
https://www.coursera.org/course/nlpintro