scikit-learn 0.15.2

scikit-learn
http://scikit-learn.org

0.15.2
>>> import sklearn
>>> sklearn.__version__
https://pypi.python.org/pypi/scikit-learn/0.15.2

dependencies
scikit-learn is tested to work under Python 2.6, Python 2.7, and Python 3.4.
The required dependencies to build the software are

  • a working C/C++ compiler.

Create a separate environment
http://conda.pydata.org/docs/using/envs.html

—————————-

NumPy (4.6 MB) download
http://sourceforge.net/projects/numpy/files
The notes below are about building NumPy, which for most users is *not* the recommended way to install NumPy. Instead, use either a complete scientific Python distribution or a binary installer.

——————————–

Dragomir Radev, September 2015
Here is how to install a specific older version of a Python library:
pip uninstall scikit-learn
or
pip uninstall sklearn
then
pip install scikit-learn==0.15.2
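
A minimal sketch for confirming that the pinned version is the one that ended up installed. The helper names `parse_version` and `is_pinned_version` are made up for this note, and the parsing assumes a purely numeric dotted version string such as 0.15.2 (no "dev" or "rc" suffixes).

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.15.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_pinned_version(installed, pinned="0.15.2"):
    """Return True when the installed version equals the pinned one."""
    return parse_version(installed) == parse_version(pinned)


# Typical use, after the pip commands above:
#   import sklearn
#   is_pinned_version(sklearn.__version__)
```

Comparing tuples of integers rather than raw strings means that "0.15.2" and " 0.15.2\n" are treated as the same version.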

Hint: the following packages conflict with each other:
  – scikit-learn ==0.15.2
  – python 3.5*
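
The conflict is consistent with the tested interpreters listed earlier in these notes (Python 2.6, 2.7 and 3.4): Python 3.5 is not among them. A small sketch of that check follows; the set `TESTED_PYTHONS` and the function name are illustrative, and this is not how conda itself resolves packages.

```python
import sys

# Interpreter versions the notes say scikit-learn 0.15.2 is tested under.
TESTED_PYTHONS = {(2, 6), (2, 7), (3, 4)}


def is_tested_python(version_info=None):
    """Return True if the (major, minor) pair is one of the tested versions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in TESTED_PYTHONS
```

Calling `is_tested_python()` with no argument checks the running interpreter; passing a tuple such as `(3, 5, 0)` checks a specific version.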

conda

Conda
http://conda.pydata.org
© Continuum Analytics

.tar.gz files
http://www.gzip.org
http://www.7-zip.org

Assignment on word similarity
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=59
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=59#post-244

DLNLP
http://web.eecs.umich.edu/~radev/dlnlp/list.txt
—————————————
CS224d: Deep Learning for Natural Language Processing
March-June 2015
http://cs224d.stanford.edu
—————————————

International Workshop on Semantic Evaluation 2015
http://alt.qcri.org/semeval2015

SDP 2015: Broad-Coverage Semantic Dependency Parsing
http://alt.qcri.org/semeval2015/task18

————————————–
NL generation & information extraction
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=172#post-667
NL generation
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=44#post-186
NACLO for Week 7
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-566
week 6
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-436
week 5
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-435
week 4
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-361
week 3
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-272
week 2
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-182
week 1
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=27#post-99

Assignment 2, part 3A
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=163#post-632

some good papers in NLP
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=48#post-211
NLP libraries in Java
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=33#post-128

this course is more introductory than …
https://class.coursera.org/nlpintro-001/forum/thread?thread_id=46#post-197

the assignments for this class have been developed and tested on Python 2.7 and NLTK 2.

volunteers: covering installation of Python and NLTK on different platforms

http://www.ioling.org

LTAG!

http://ltaggame.com
LTAG! is an absurd, irreverent card game based on Lexicalized Tree Adjoining Grammar

Compete and co-operate to generate offensive yet grammatical English sentences made of partial syntactic trees.
The first player to use up all of their cards wins!

The Birth of a Word

(a doctoral thesis)
by Brandon Cain Roy
MIT. February 2013

using a small number of words (52 actively used words by 16 months of age) in concert with situational context to communicate effectively in a wide range of day-to-day situations

From a more technical standpoint, the fundamental meaningful unit in a language isn’t the word but the morpheme, a word’s more basic constituents.
p71

The Word Birth Browser was developed in Java, retrieving data from a local SQLite3 database containing a version of the corpus.
p88

Dromi notes that in the weeks of decreasing vocabulary growth rate, her daughter seemed to be exploring the words she had already learned, refining their use, and generally consolidating the lexicon. We find this a compelling idea, and since our first analysis in (Roy et al., 2009) we have wondered whether the drop in word birth rate could coincide with an increase in the child’s use of syntax.
p94

It is hard to imagine that with 669 words the child’s communicative needs are satisfied. Then again, the child has responsive caregivers and the range of activities in a 9–24 month old’s life are limited. The introduction of a new toy, activity or other experience (such as going to the zoo) could contribute new words in the child’s lexicon, but at a certain point the child’s vocabulary may be sufficient for the activities of everyday life.
p. 95

If the drop in vocabulary growth rate is not a statistical artifact, as suggested in the previous section, what else could contribute to the “vocabulary implosion” observed? Before 19 months of age, the child has 444 words in his productive vocabulary. If word learning is partly fueled by “communicative need”, does the decrease in vocabulary growth rate indicate that the child has achieved some level of communicative sufficiency at 18 months? Or does communicative growth transition from learning new words to combining words together in new ways?
p98-100

By 24 months of age, the child had learned 669 words. He learned these words through exposure to them in his environment. But why did he learn these words, and in the order that he learned them? In the next chapter, we consider the relationship between lexical acquisition and the rich linguistic environment of a young child’s first years.
p102

Children’s early language learning is sometimes described as “effortless”, and to adults witnessing the seemingly autonomous birth and growth of language it may indeed appear so. But a better adjective might be “remarkable” when one accounts for the numerous challenges that young learners face in acquiring their first language.
p103

Children’s exposure to language is primarily through speech, and unlike text there are no “spaces” marking word boundaries. As Peters (1983) discusses, although the units of speech are words, children do not necessarily partition the speech stream into their final adult word forms. Even assuming the words and the concepts are available to the child, the mapping between them must be learned.
p103

Elizabeth Spelke and her colleagues argue that children come into the world equipped with systems of core knowledge about objects, agents, number, geometry as well as social knowledge (Spelke, 1994; Spelke and Kinzler, 2007).
Such systems of core knowledge may provide a necessary substrate for early learning, including language acquisition. Children are also sensitive to statistical regularities in the speech they hear, which can help in segmenting words (Saffran et al., 1996). Another skill children bring to bear, of particular relevance to word learning, is the ability to infer the referential intent of others. In the case of learning names for objects, a child must associate the name to what the speaker is referring to, even if that is not the child’s focus of attention when the name is uttered (Baldwin, 1991).
p104

Paul Bloom (2000, p. 90) says, “People cannot learn words unless they are exposed to them. We can explain much of the character of children’s vocabularies in terms of this banal fact” and as such, characterizing the learning environment is crucial in understanding early word learning.
p104

In the case of word learning, strong evidence for the positive link between the total amount of maternal speech and children’s vocabulary size was provided by Hart and Risley (1995).
p105

Exposure to caregiver speech affects more than just the words that are learned. In recent work, Hurtado et al. (2008) showed that it also positively impacts children’s speech processing efficiency.
Children exposed to more caregiver speech at 18 months knew more words and were faster at word recognition at 24 months. One of the interesting results of this study was the substantial overlap in the effect of maternal speech input on these two outcomes, suggesting that increased processing efficiency supports faster lexical learning, but also that greater lexical knowledge contributed to faster processing efficiency. To use Snow’s analogy, these findings suggest that the developmental “strands” of speech processing skill and lexical knowledge are both entangled and mutually supportive.
p.106

whether a word is salient in particular contexts. It need not be salient in all contexts to have a high recurrence, but if it is salient in some situations …
p111

general argument for the role of structured, predictable context as supporting word learning.
p142

Recurrence
But frequency is the weakest predictor in the ensemble of variables we have considered. Instead, in the purely linguistic domain, a word’s recurrence better predicts its age of acquisition.
Recurrence measures how clustered a word is in time; a high recurrence word is one that, when it is used, is used repeatedly over a short duration. For learners with a limited working memory, a word with high recurrence may occur frequently enough in a short duration to take hold in memory.
p163-164
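
To make the idea of "clustered in time" concrete, here is a rough sketch. It is not the thesis's recurrence measure, whose exact definition is not reproduced in these notes. It assumes a word's uses are given as timestamps in seconds, and the 60-second window and the name `recurrence_score` are made up for illustration.

```python
def recurrence_score(timestamps, window=60.0):
    """Fraction of consecutive uses of a word that fall within `window` seconds.

    A score near 1.0 means the word tends to be repeated in short bursts;
    a score near 0.0 means its uses are spread out in time.
    """
    times = sorted(timestamps)
    if len(times) < 2:
        return 0.0
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    close = sum(1 for gap in gaps if gap <= window)
    return float(close) / len(gaps)


bursty = [0, 5, 10, 15, 1000, 1005]   # repeated within a few seconds
spread = [0, 500, 1000, 1500]         # same word, evenly spaced uses
```

With these toy inputs the bursty word scores higher than the evenly spread one, even though the bursty word is not much more frequent overall, which is the contrast with plain frequency drawn above.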

KL-divergence is measuring a word’s scope or “groundedness”, with the idea that more grounded words are more strongly tied to other aspects of experience and are more tightly woven into the child’s understanding.
p165
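
For reference, a minimal sketch of KL-divergence between two discrete distributions. Which distributions the thesis compares is not stated in these notes; the example below assumes, purely for illustration, a word's distribution over hypothetical contexts ("kitchen", "bedroom") against a background distribution over the same contexts.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits for discrete distributions given as dicts of probabilities."""
    total = 0.0
    for outcome, p_prob in p.items():
        if p_prob == 0.0:
            continue  # terms with p = 0 contribute nothing
        q_prob = q.get(outcome, 0.0)
        if q_prob == 0.0:
            return float("inf")  # p puts mass where q has none
        total += p_prob * math.log(p_prob / q_prob, 2)
    return total


background = {"kitchen": 0.5, "bedroom": 0.5}   # hypothetical overall usage
grounded_word = {"kitchen": 1.0}                # word used only in one context
diffuse_word = {"kitchen": 0.5, "bedroom": 0.5}  # word used everywhere equally
```

Under these assumptions a word tied to a single context diverges from the background by 1 bit, while a word spread like the background diverges by 0, matching the intuition that higher divergence means a more "grounded" word.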

————————–

Hurtado, N., Marchman, V., and Fernald, A. (2008). Does input influence uptake? Links between maternal talk, processing speed and vocabulary size in Spanish-learning children.
Developmental Science, 11(6):F31–F39.

TED video: semantic analysis, influencer, it’s like building a microscope …

How To Write A Sentence

Think You Know ‘How To Write A Sentence’?
July 14, 2011
http://www.npr.org/2011/01/25/133214521/stanley-fish-demystifies-how-to-write-a-sentence

Most people know a good sentence when they read one, but New York Times columnist Stanley Fish says most of us don’t really know how to write them ourselves. His new book, How To Write A Sentence: And How To Read One, is part ode, part how-to guide to the art of the well-constructed sentence.