>>> from nltk import word_tokenize
>>> with open('h.txt') as f:
Source code for nltk.collocations()
>>> with open('Histology14_Ch01_ALL.txt') as f:
labeled secondary; <=======
situ hybridization; <=======
>>> with open('Histology14_Ch01_i.txt') as f:
Naturally, the quality of the collocations is also higher than in computer-generated lists, as we would expect from a manually produced compilation.
Phrasal verbs: a good example of collocations whose words are often non-adjacent
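Because the parts of a phrasal verb can be separated by other words ("turn the light off"), counting only adjacent pairs misses them. The following is a minimal sketch, using only the standard library, of counting word pairs within a sliding window. The function name `windowed_pairs`, the sample sentence, and the window of 4 tokens are assumptions made for illustration; NLTK's `BigramCollocationFinder.from_words` accepts a `window_size` argument that serves a similar purpose.

```python
from collections import Counter


def windowed_pairs(tokens, window_size=2):
    """Count ordered pairs (w1, w2) where w2 follows w1 within the window.

    window_size=2 counts adjacent words only; larger values also pair
    words that have up to (window_size - 2) tokens between them.
    """
    counts = Counter()
    for i, w1 in enumerate(tokens):
        # Pair w1 with each of the next (window_size - 1) tokens.
        for w2 in tokens[i + 1 : i + window_size]:
            counts[(w1, w2)] += 1
    return counts


# Hypothetical sample text; a real run would use tokens from a corpus file.
tokens = "please turn the light off and turn the radio off".split()

adjacent = windowed_pairs(tokens, window_size=2)
windowed = windowed_pairs(tokens, window_size=4)

print(adjacent[("turn", "off")])  # pair never occurs side by side
print(windowed[("turn", "off")])  # pair is found once the window widens
```

A wider window recovers more separated pairs but also admits more accidental co-occurrences, so the raw counts would normally be filtered by frequency or ranked with an association measure before being read as collocations.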
Morphological similarity: Stemming
Stemming and Lemmatization with Python NLTK
Martin Porter’s official site:
a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet.
A lexical database for English
UMLS::Similarity v1.41 released! (July 17, 2014)
North American Chapter of the Association for Computational Linguistics
How Strong Is Your Vocabulary?
Introduction to Natural Language Processing
University of Michigan
Coursera, October 5 – December 27, 2015