for word in line.split():

>>> lista = []

>>> for line in open("Histology_Ch01_01.txt"):
...     for word in line.split():
...         lista.append(word)
...

>>> len(lista)
358

>>> for word in sorted(set(lista)):
...     if not (word.isalpha() and word.islower()):
...         print(word)
...

"tissue"
"web,"
&
(ECM).
1.
13e
2013
Advances
Anthony
Basic
Cells
Chapter
ECM
ECM.
Familiarity
Greek
Histology
Histology,
Introduction
Its
Junqueira's
L.
Many
Mescher
Methods
Organs
Study
Study:
The
These
This
Thus,
Tissues
approaches.
biochemistry,
biology,
biology.
cell-specific
cells'
cells,
cells.
components:
extensively,
fibers,
immunology,
linings.
macromolecules,
membranes.
molecules.
noncellular,
organ.
organs.
physiology,
products.
receptors.
structures,
students.
study.
subject.
tissues,
together.
whole.

————————————–

period followed by uppercase -NOT-> (necessarily) end of sentence, e.g., “Anthony L. Mescher”
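A minimal Python sketch of that rule: split at a period followed by whitespace and an uppercase letter, except when the word before the period is a single capital letter (an initial such as “L.”) or appears in a small abbreviation list. The `ABBREVIATIONS` set and `split_sentences` are hypothetical names; a real splitter needs far more cases.

```python
import re

# Hypothetical abbreviation list; extend it for the corpus at hand.
ABBREVIATIONS = {"Ch", "Fig", "Dr", "vs", "e.g", "i.e"}


def split_sentences(text):
    """Split at '. ' + uppercase, except after initials or known abbreviations."""
    sentences = []
    start = 0
    for m in re.finditer(r"\.\s+(?=[A-Z])", text):
        words_before = text[start:m.start()].split()
        last = words_before[-1] if words_before else ""
        # A single capital letter before the period ("L.") is treated as an initial.
        if re.fullmatch(r"[A-Z]", last) or last in ABBREVIATIONS:
            continue
        sentences.append(text[start:m.start() + 1].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = ("Junqueira's Basic Histology was written by Anthony L. Mescher. "
        "Histology is the study of the tissues of the body.")
for s in split_sentences(text):
    print(s)
```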

 

Tokenization

[Figure: Issues in tokenization]

from:
NLP
Stanford, June 2012
https://www.coursera.org/course/nlp
http://online.stanford.edu/course/natural-language-processing

issue:
“periodic acid-Schiff (PAS) reagent”

hyphen
“magnified 1000-1500 times”
ord('-') == 45

en dash
“Figure 1–1”
ord('–') == 8211
en dash is used between two quantities or dates to suggest a range
http://www.w3.org/wiki/Common_HTML_entities_used_for_typography
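A small sketch, using only the standard `unicodedata` module, that lists the dash characters actually present in a string (Unicode category Pd) and, as a heuristic, rewrites a hyphen between two digits as an en dash. The function names are hypothetical. Digits around a dash do not always mean a range (“Figure 1–1” is a figure label), so the replacement would still need review.

```python
import re
import unicodedata


def dash_characters(text):
    """Return (char, code point, Unicode name) for every dash-like character."""
    return [(ch, ord(ch), unicodedata.name(ch))
            for ch in text if unicodedata.category(ch) == "Pd"]


def normalize_numeric_ranges(text):
    """Replace a hyphen between two digits with an en dash (U+2013)."""
    return re.sub(r"(?<=\d)-(?=\d)", "\u2013", text)


sample = "magnified 1000-1500 times (Figure 1\u20131)"
print(dash_characters(sample))
print(normalize_numeric_ranges(sample))
```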

“Junqueira’s Basic Histology”
“when a structure’s three-dimensional volume is cut into …”
“microscope’s resolving power”
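One common way to tokenize these possessives is to split off the clitic, as Penn Treebank-style tokenizers do (“structure’s” -> “structure” + “’s”). The sketch assumes both the ASCII apostrophe and U+2019 can occur; `split_possessive` is a hypothetical helper, and it will also split contractions such as “it’s”.

```python
import re

APOSTROPHES = "'\u2019"  # ASCII apostrophe and right single quotation mark


def split_possessive(token):
    """Split a possessive clitic off a token, Penn Treebank style."""
    # Singular possessive: structure's / structure’s -> structure + 's
    m = re.fullmatch(rf"(.+?)([{APOSTROPHES}]s)", token)
    if m:
        return [m.group(1), m.group(2)]
    # Plural possessive: cells' / cells’ -> cells + '
    m = re.fullmatch(rf"(.+s)([{APOSTROPHES}])", token)
    if m:
        return [m.group(1), m.group(2)]
    return [token]


for tok in ["microscope\u2019s", "Junqueira's", "cells'", "cells"]:
    print(tok, "->", split_possessive(tok))
```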

[Figure: hyphenated words, Ch. 1: 117 rows]
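A table like this can be produced with a regular expression over the chapter text. The pattern is an assumption: it counts tokens joined by ASCII hyphens (allowing “&” so that “H&E-stained” survives) and ignores en dashes, so numeric ranges written with a hyphen (“1000-1500”) are caught too and would need filtering.

```python
import re
from collections import Counter

# Word characters (plus "&") joined by one or more ASCII hyphens.
HYPHENATED = re.compile(r"\b[\w&]+(?:-[\w&]+)+\b")


def hyphenated_words(text):
    """Count hyphenated tokens in text."""
    return Counter(HYPHENATED.findall(text))


sample = ("periodic acid-Schiff (PAS) reagent: PAS-positive and H&E-stained "
          "material, PAS-positive cells, magnified 1000-1500 times (Figure 1\u20131)")
for word, count in sorted(hyphenated_words(sample).items()):
    print(word, count)
```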

Junqueira’s Basic Histology

Junqueira’s Basic Histology: Text and Atlas, Thirteenth Edition
http://www.mhprofessional.com/product.php?isbn=0071780335

Chapter 1. Histology & Its Methods of Study > Preparation of Tissues for Study >
“Other spatial units commonly used in histology include the nanometer (1 nm = 0.001 μm = 10⁻⁶ mm = 10⁻⁹ m) and angstrom (1 Å = 0.1 nm or 10⁻⁴ μm).”

Chapter 1. Histology & Its Methods of Study > Preparation of Tissues for Study >
“freezing, unlike fixation, does not inactivate most enzymes.”

“can be used with living, cultures cells (Figure 1–5).”

“The wavelength in the electron beam is much shorter …”

“For SEM specimens are coated with metal atoms …”

“The secondary antibody labeled with peroxidase was then applied and the localized brown color ? produced histochemically with the peroxidase substrate 3,3′-diamino-azobenzidine (DAB).”

“blood vessels and other tubular structures appear in sections as round or oval shapes whose size and shape depend on …”

“transverse cuts through tubular organelles such mitochondria.”

=============================

levels of organization:
“electrostatic (salt) linkages with ionizable radicals of molecules in tissues.”
“an antibody made against the tissue protein of interest” {in ECM}
“If the cell or tissue antigen of interest is detected by …” {in ECM}

=============================

represented in plural form:
“clearing solvents such as toluene dissolve cell lipids in fixed tissues”
“Many GAGs are synthesized while attached to a core protein and are part of a class of macromolecules called proteoglycans”
content of anionic carboxyl and sulfate groups
“electrostatic interactions underlying acidophilia”
“frontiers of light microscopy”
“the greater density of microfilaments at the cell periphery”
“structures made of highly organized subunits.” -> “structure made of highly organized subunits”
“substances containing highly oriented molecules” -> “substance containing highly oriented molecules”
“the interaction of tissue components with beams of electrons.”
“cluster of cells”
“Hybridization at stringent conditions …”
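A naive sketch of the plural-to-singular mapping used above. The suffix rules and the `IRREGULAR_PLURALS` table are hypothetical and cover only regular plurals plus a few words from the chapter; the caller must say which word is the head, since the sketch does no parsing. A dictionary-based lemmatizer would be more robust.

```python
# Small, hypothetical exception table: irregular or invariant plurals.
IRREGULAR_PLURALS = {
    "lenses": "lens",
    "mitochondria": "mitochondrion",
    "nuclei": "nucleus",
    "series": "series",
    "species": "species",
}


def singularize(noun):
    """Naive singularizer for regular English plurals (not a full lemmatizer)."""
    lower = noun.lower()
    if lower in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[lower]
    if lower.endswith("ies") and len(lower) > 4:
        return noun[:-3] + "y"
    if lower.endswith(("sses", "xes", "ches", "shes")):
        return noun[:-2]
    if lower.endswith("s") and not lower.endswith(("ss", "us", "is")):
        return noun[:-1]
    return noun


def singularize_head(phrase, head_index=0):
    """Singularize the word at head_index; the caller must know where the head is."""
    words = phrase.split()
    words[head_index] = singularize(words[head_index])
    return " ".join(words)


print(singularize_head("structures made of highly organized subunits"))
print(singularize_head("substances containing highly oriented molecules"))
```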

represented in its usual form (but an issue when using dictionaries):
“stain more readily with basic dyes and are termed basophilic”

tokenization issue:
“periodic acid-Schiff (PAS) reagent”

multi-word concept:
“transformation of 1,2-glycol groups present in the sugars into aldehyde residues”

and/or:
“Basophilic or PAS-positive material can be …”
“as well as fluorescence, phase-contrast, differential interference, confocal, and polarizing microscopy”
“magnifying power of the objective and ocular lenses” -> “magnifying power of the ocular lens”
“the power of the bright-field and other light microscopes”
“when passing through cellular and extracellular structures”
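The coordinated modifiers above can be expanded into one phrase per conjunct. This sketch assumes the head noun is the single last word of the phrase and that conjuncts are separated by commas, “and” or “or”; it keeps the head as written (plural stays plural) and cannot tell which reading is intended in cases like “bright-field and other light microscopes”. `expand_coordination` is a hypothetical name.

```python
import re

# Separators between conjuncts: ", ", ", and ", ", or ", " and ", " or ".
_CONJ_SPLIT = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def expand_coordination(phrase):
    """'X and Y head' -> ['X head', 'Y head'], assuming a one-word head."""
    words = phrase.split()
    head = words[-1]
    modifiers = " ".join(words[:-1])
    conjuncts = [c for c in _CONJ_SPLIT.split(modifiers) if c]
    return [f"{c} {head}" for c in conjuncts]


print(expand_coordination("objective and ocular lenses"))
print(expand_coordination("fluorescence, phase-contrast, differential "
                          "interference, confocal, and polarizing microscopy"))
```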

==================================

“Lipid-rich structures [thing] of cells are best revealed with …”
“Cryofracture has been particularly useful in the study of membrane structure [arrangement].”
“an idea of the whole composition and structure [arrangement] of a cell or tissue”
“The TEM allows the observation of cells with all its internal structures [thing]”
“… because many tissue structures [thing] are thicker than the section. Round structures [thing] seen microscopically may be portions of spheres or tubes.”
“Because structures [thing] in a tissue have different orientations, …”

==================================

adj, v have authority == "j"

included:
“microscope’s resolving power”

“when a structure’s three-dimensional volume is cut into …”
“… are used to preserve tissue structure by …”

==================================

CONCEPTS NOT EXPLICITLY STATED:

“cross-linking and denaturing proteins” -> “peptide cross-linking [GO]”

==================================

IDIOMS:

“the viewer must always keep in mind that components …” [OAAD]

=================================

adjectives that end in “-ed” (pattern: A N):
aminated sugar
cleared tissue
fixed tissue
impregnated tissue
ionized amino group
specialized method …
stained preparations
striated muscle

adjectives that end in “-ing” (pattern: A N):
magnifying power
polarizing microscopy
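Candidates for both lists can be collected with a suffix test on word bigrams. This is only a candidate generator under stated assumptions: a first word of five or more letters ending in -ed/-ing, followed by a word not in a small hypothetical stopword list. The stopword filter removes some verb uses (“tissue was fixed in formalin”), but not all, and words such as “during” also pass the suffix test, so a POS tagger or manual review is still needed.

```python
import re

# Hypothetical stopword list: a following function word rules out the A N pattern.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "by", "with", "to",
             "and", "or", "is", "are", "was", "were", "be", "under"}


def suffix_adjective_pairs(text, suffixes=("ed", "ing"), min_len=5):
    """Candidate (adjective, noun) pairs whose first word ends in -ed or -ing."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [(w1, w2) for w1, w2 in zip(tokens, tokens[1:])
            if len(w1) >= min_len and w1.endswith(suffixes)
            and w2 not in STOPWORDS]


text = ("Stained preparations of fixed tissue show striated muscle "
        "under the polarizing microscope.")
print(suffix_adjective_pairs(text))
```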

“the quality of its objective lens” -> “objective lens quality” (G+)
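The “N1 of N2” -> “N2 N1” rewrite can be sketched with one regular expression. The determiner list and `invert_of_phrase` are hypothetical, and the output is only a candidate: it works for short phrases like the one above but produces nonsense for longer ones (“axis of the light emerging from the polarizer”).

```python
import re

# Leading "the" is dropped; an optional determiner/possessive before N2 is dropped too.
_OF_PHRASE = re.compile(
    r"(?:the\s+)?(?P<n1>.+?)\s+of\s+(?:(?:the|its|their|a|an)\s+)?(?P<n2>.+)")


def invert_of_phrase(phrase):
    """Rewrite 'N1 of (det) N2' as the compound 'N2 N1'; None if there is no 'of'."""
    m = _OF_PHRASE.fullmatch(phrase.strip())
    if m is None:
        return None
    return f"{m.group('n2')} {m.group('n1')}"


print(invert_of_phrase("the quality of its objective lens"))
```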

“… allows them to be digitally reconstructed into a 3D image.” -> digital reconstruction

UV is used as a noun:
“… emitting a characteristic blue fluorescence under UV.”

“The ability to rotate the direction of vibration of polarized light is called birefringence and is a feature of crystalline substances”

challenge for KE:
“repetitive, periodic macromolecular structure”

{metallic}
“collected on small metal grids”

longest concepts ({n} = number of words):
“axis of the light emerging from the polarizer” {8}
“artificial spaces between cells and other tissue component” -> “artificial space between a cell and other tissue component” {9}
“impossibility of differentially staining all tissue components on one slide” -> N/A
“a series of processes that began with collecting the tissue and ended with mounting a coverslip on the slide” {19}
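The counts in braces can be reproduced by splitting on whitespace, which treats a hyphenated word as one word. A small sketch over the three concepts that have counts, using the rewritten form where one is given; `longest_concepts` is a hypothetical helper.

```python
concepts = [
    "axis of the light emerging from the polarizer",
    "artificial space between a cell and other tissue component",
    "a series of processes that began with collecting the tissue "
    "and ended with mounting a coverslip on the slide",
]


def longest_concepts(concepts, n=3):
    """Return (word count, concept) pairs, longest first."""
    return sorted(((len(c.split()), c) for c in concepts), reverse=True)[:n]


for count, concept in longest_concepts(concepts):
    print(count, concept)
```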

acronyms:
“digoxigenin-labeled complementary DNA (cDNA)”
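Acronym definitions of the form “long form (SF)” can be paired automatically. The sketch is a simplified version of the Schwartz and Hearst heuristic: the acronym's characters are matched right-to-left inside the preceding words, and its first character must begin a word. The window size and helper names are assumptions, and the published algorithm applies further checks.

```python
import re


def best_long_form(short, long):
    """Match short's characters right-to-left in long; the first one must start a word."""
    s_i = len(short) - 1
    l_i = len(long) - 1
    while s_i >= 0:
        c = short[s_i].lower()
        if not c.isalnum():
            s_i -= 1
            continue
        # Move left until c is found (at a word start when c is the first character).
        while (l_i >= 0 and long[l_i].lower() != c) or \
                (s_i == 0 and l_i > 0 and long[l_i - 1].isalnum()):
            l_i -= 1
        if l_i < 0:
            return None
        l_i -= 1
        s_i -= 1
    start = long.rfind(" ", 0, l_i + 1) + 1
    return long[start:]


def find_abbreviations(text):
    """Return (short form, long form) pairs for patterns like 'long form (SF)'."""
    pairs = []
    for m in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", text):
        short = m.group(1)
        max_words = min(len(short) + 5, len(short) * 2)
        words = text[:m.start()].split()[-max_words:]
        long_form = best_long_form(short, " ".join(words))
        if long_form:
            pairs.append((short, long_form))
    return pairs


text = ("digoxigenin-labeled complementary DNA (cDNA) probes and the "
        "periodic acid-Schiff (PAS) reagent")
print(find_abbreviations(text))
```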

see also:
https://franzcalvo.wordpress.com/2014/12/24/tokenization

Introduction to Confocal Microscopy
http://www.nature.com/jid/journal/v132/n12/full/jid2012429a.html

Freeze fracture and etching
2014
https://www.leica-microsystems.com/science-lab/brief-introduction-to-freeze-fracture-and-etching

Explant culture
https://en.wikipedia.org/wiki/Explant_culture

 

Word classes

the focus [is] on how cells’ structure and arrangement optimize functions specific to each organ.

cells and ECM form a continuum that functions together and reacts to stimuli and inhibitors together.

the precise combination of these tissues allows the functioning of each organ and of the organism as a whole.

============

ANP inhibits Na+-H+ antiport in proximal tubular brush border membrane

[to form]
tissues of the body are each formed by several types of cell-specific associations between cells and ECM.

[the use of]
makes histology dependent on the use of microscopes

The most common procedure used in histologic research is

the more common methods used to study cells and tissues, focusing on microscopic approaches.

with the focus on how cells’ structure and arrangement optimize functions

[advances]
Advances in biochemistry, molecular biology, … are essential for a better knowledge of tissue biology.

Sectioning fixed and embedded tissue.

to expose the tissue for sectioning (slicing) on a microtome.

the preparation of tissue sections or slices that can be studied

slices that can be studied with the light microscope

The trimmed tissue specimen is mounted in the paraffin block holder. {ADJ?}

“functioning of each organ” -> “functioning of an organ”

“the higher temperatures needed for paraffin embedding”: “higher temperature” -> “high temperature”?

========================

Word classes (or parts of speech)
http://www.oxforddictionaries.com/words/word-classes-or-parts-of-speech

http://grammar.yourdictionary.com/parts-of-speech/verbs/regular-verb-list.html
missing: arrange, PRACTICE

http://www.enchantedlearning.com/wordlist/regularverbs.shtml

http://en.wikipedia.org/wiki/Regular_and_irregular_verbs

http://www.elearnenglishlanguage.com/blog/learn-english/grammar/verbs-irregular

irregular verbs
http://www.selfstudy.cambridge.org/media/11024/9780521189392p292-301.pdf

370 irregular verbs (longer list available)
http://www.englishpage.com/irregularverbs/irregularverbs.html

power verb list
http://www.jobskills.info/resume_edge/power_verb.htm
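The regular/irregular split matters when mapping forms like “formed” back to “to form”. A rough sketch of past-participle formation that consults an irregular table first and then applies the regular -ed rules. The table is a tiny hypothetical excerpt (the lists linked above are the real source), and consonant doubling (“embed” -> “embedded”) is not handled.

```python
# Tiny hypothetical excerpt of an irregular-verb table (base -> past participle).
IRREGULAR_PAST_PARTICIPLES = {
    "begin": "begun",
    "bind": "bound",
    "cut": "cut",
    "make": "made",
    "take": "taken",
}

VOWELS = set("aeiou")


def past_participle(verb):
    """Irregular lookup first, then regular -ed spelling rules."""
    verb = verb.lower()
    if verb in IRREGULAR_PAST_PARTICIPLES:
        return IRREGULAR_PAST_PARTICIPLES[verb]
    if verb.endswith("e"):
        return verb + "d"  # arrange -> arranged
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"  # study -> studied
    return verb + "ed"  # form -> formed (no consonant doubling)


for v in ["form", "arrange", "practice", "study", "stain", "cut"]:
    print(v, "->", past_participle(v))
```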

=============

ultrafast endocytosis
http://www.nature.com/nature/journal/v504/n7479/full/nature12809.html

Compound modifiers & compound nouns

http://en.wikipedia.org/wiki/Hyphen
The use of the hyphen in English compound nouns and verbs has, in general, been steadily declining.

Compound modifiers are groups of two or more words that jointly modify the meaning of another word.
When a compound modifier other than an adverb–adjective combination appears before a term, the compound modifier is often hyphenated to prevent misunderstanding, such as in American-football player or little-celebrated paintings.

diamino-azobenzidine
3H-fucose
acid-Schiff
biotin-avidin
bright-field
cell-specific
charge-coupled
computer-driven
cross-linking
DAPI-stained
digoxigenin-labeled
electron-dense
FITC-labeled
fluorescein-phalloidin
H&E-stained
Hoechst-stained
image-enhancement
in-phase
lipid-rich
lipid-soluble
lysozyme-containing
long-chain
mucin-rich
mucus-secreting
N-terminal
PAS-amylase
PAS-positive
PAS-stained
peroxidase-labeled
phase-contrast
RNA-rich
self-digestion
spray-coated
tritium-labeled
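Because hyphen use is declining, a compound from this list may appear in a dictionary or another text as open (“bright field”), closed (“brightfield”) or hyphenated. A sketch that generates the three variants and returns the first one found in a vocabulary; the function names and the example vocabulary are made up for illustration.

```python
def compound_variants(term):
    """Spelling variants of a hyphenated compound: hyphenated, open, closed."""
    parts = term.split("-")
    if len(parts) == 1:
        return {term}
    return {term, " ".join(parts), "".join(parts)}


def lookup_compound(term, vocabulary):
    """Return the first variant of term (in sorted order) found in vocabulary, or None."""
    for variant in sorted(compound_variants(term)):
        if variant in vocabulary:
            return variant
    return None


vocabulary = {"brightfield", "phase contrast"}  # hypothetical dictionary entries
for term in ["bright-field", "phase-contrast", "lipid-rich"]:
    print(term, "->", lookup_compound(term, vocabulary))
```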

related:
https://franzcalvo.wordpress.com/2015/10/13/noun-noun-phrases

“compound name” == “name of a compound”
http://www.chem4kids.com/files/atom_naming.html