a large proportion of individuals i.e. 85.8% for S. haematobium were “zero egg excretors”… there was overwhelming evidence of overdispersion.

Chipeta, M.G., Ngwira, B.M., Simoonga, C. et al. BMC Res Notes (2014) 7: 856. https://doi.org/10.1186/1756-0500-7-856
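
A rough way to see what "overdispersion" and an excess of zeros look like in count data is to compare the sample variance with the mean and the observed share of zeros with the share a Poisson model would predict. The sketch below uses made-up egg counts (hypothetical, not the study's data), and the helper names are illustrative only.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by the mean.

    Close to 1 for Poisson-like counts, well above 1 when overdispersed.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Hypothetical egg counts (NOT the study's data): 86 zeros and 14 excretors.
counts = [0] * 86 + [1, 2, 3, 5, 8, 12, 20, 35, 60, 150, 4, 7, 9, 1]

observed_zeros = zero_fraction(counts)
# Under a Poisson model with the same mean, P(count = 0) = exp(-mean).
poisson_zeros = math.exp(-mean(counts))

print(f"dispersion index: {dispersion_index(counts):.1f}")
print(f"observed zero share: {observed_zeros:.2f}")
print(f"Poisson-expected zero share: {poisson_zeros:.3f}")
```

A dispersion index far above 1 together with many more zeros than the Poisson model expects is the usual informal signal that a plain Poisson regression is a poor fit for such data.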


Linear regression

Statistics and Probability > Exploring bivariate numerical data

Logistic regression



Economics Claims a Precision Rarely Found

Thanks for Mr. Roberts’s views on how dismal the science is in the dismal science.
May 24, 2016

Regarding Kyle Peterson’s “The Weekend Interview with Russ Roberts: When All Economics Is Political” (May 14): … reliance on regression analysis to find confirmation of our preconceptions

… all systems, and especially economic systems, are highly nonlinear in how they respond over time to any disturbances to the system, especially human, but also natural, disturbances.
It is effectively impossible to include all effects in any model of such a system, so we simplify for the sake of obtaining a computational forecast estimate in our lifetime.
Unfortunately, any nonlinear system unwinds over time in ways not exactly predictable given the limits of our computational model.
It is hubris to think that more data can make predictions of large material, human, environmental and econometric ensembles more reliable.
This is commonly known in chaos theory as the butterfly effect.
Even meteorologists …
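
The sensitivity to initial conditions that the letter calls the butterfly effect can be illustrated with a toy nonlinear system. The sketch below uses the logistic map with r = 4, a standard textbook example of chaos. It is not an economic model, and the starting value and the size of the perturbation are arbitrary choices made for illustration.

```python
def logistic_map(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion (arbitrary choice).
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)

# Absolute gap between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap starts out negligible and grows until the two trajectories are unrelated, even though the rule generating them is fully deterministic.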

This letter responds to:
When All Economics Is Political
The dismal science has too much junk science, says Russ Roberts, an evangelist for humility in a discipline where it is often hard to find.
By Kyle Peterson
May 13, 2016

Training Data Scientists (2014)

Structure Data 2014: How Will We Train Data Scientists of the Future?
GIGAOM, April 13, 2014
AnnaLee Saxenian — Professor and Dean, UC Berkeley

12:50 the core that everybody would agree to is pretty small:

  • statistics
  • computer science programming
  • Big Data tools

Introduction to Big Data
September 2015
by University of California, San Diego

Sampling bias

Accuracy of X-rays





D1: tuberculosis
D2: no tuberculosis
T+: positive X-ray
T-: negative X-ray
It is useful to find P(D1|T+), the probability that an individual has the disease given that he tests positive. This probability is also called the predictive value of a positive test.

Remark: Because of sampling bias, it is not correct to estimate P(D1|T+) directly from the observed data, i.e., the numbers given in the table. That incorrect estimate is 22/73, which has a large upward bias and overestimates P(D1|T+).

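We use Bayes' theorem to find P(D1|T+). Bayes' theorem says:

$$P(D_1 \mid T^{+}) = \frac{P(T^{+} \mid D_1)\,P(D_1)}{P(T^{+} \mid D_1)\,P(D_1) + P(T^{+} \mid D_2)\,P(D_2)}$$
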

There are two important things here:
1. Prior probability: P(D1), the probability of TB before seeing the data. Usually, a judgement call has to be made as to what prior probability to use. For the present problem, it seems reasonable to use the population prevalence as the prior probability. In 1987, there were 9.3 TB cases per 100,000 population. Therefore, we specify: …
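
The calculation described above can be sketched in a few lines. The prior (9.3 per 100,000) and the naive estimate 22/73 come from the notes. The table itself is not reproduced here, so the two conditional probabilities below are assumptions: they use counts of 22 positive X-rays among 30 TB cases and 51 among 1,790 non-TB cases, chosen only because they are consistent with the 22/73 figure. The function name is illustrative.

```python
def positive_predictive_value(prior, sensitivity, false_positive_rate):
    """Bayes' theorem for P(D1 | T+).

    prior               = P(D1)
    sensitivity         = P(T+ | D1)
    false_positive_rate = P(T+ | D2)
    """
    numerator = sensitivity * prior
    denominator = numerator + false_positive_rate * (1.0 - prior)
    return numerator / denominator


# Prior from the notes: 9.3 TB cases per 100,000 population (1987).
prior = 9.3 / 100_000

# ASSUMED counts (the table is missing from these notes):
# 22 of 30 TB cases and 51 of 1,790 non-TB cases had a positive X-ray.
sensitivity = 22 / 30
false_positive_rate = 51 / 1790

# Naive estimate read straight off the biased sample, as in the remark.
naive = 22 / 73
ppv = positive_predictive_value(prior, sensitivity, false_positive_rate)

print(f"naive estimate: {naive:.3f}")
print(f"Bayes estimate: {ppv:.5f}")
```

Under these assumed inputs the Bayes estimate is a small fraction of one percent, far below the naive 22/73, which is the upward bias the remark warns about.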

Biostatistics, 2003
http://homepage.cs.uiowa.edu/~jian/S101/Notes/notes3.pdf .
Research Interests: analysis of high-dimensional data
Teaching: Topics in High Dimensional Data Analysis (STAT:7190)


Bayes Theorem

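The multiplication rule for the occurrence of both of two or more events is as follows: If A, B, and C are independent, then

$$P(A \text{ and } B \text{ and } C) = P(A) \times P(B) \times P(C)$$

If two events such as B and D are not independent, then

$$P(B \text{ and } D) = P(B \mid D) \times P(D) = P(B) \times P(D \mid B)$$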

The multiplication rule for probabilities when events are not independent can be used to derive one form of an important formula called Bayes’ theorem. Because P(B and D) equals both P(B | D) × P(D) and P(B) × P(D | B), these latter two expressions are equal. Assuming P(B) and P(D) are not equal to zero, we can solve for one in terms of the other, as follows:

$$P(B \mid D) = \frac{P(B) \times P(D \mid B)}{P(D)}$$

which is found by dividing both sides of the equation by P(D). Similarly,

$$P(D \mid B) = \frac{P(D) \times P(B \mid D)}{P(B)}$$

In the equation for P(B | D), P(B) in the right-hand side of the equation is sometimes called the prior probability, because its value is known prior to the calculation; P(B | D) is called the posterior probability, because its value is known only after the calculation.
The two formulas of Bayes’ theorem are important because investigators frequently know only one of the pertinent probabilities and must determine the other. Examples are diagnosis and management
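
As a quick numerical check of the two forms of Bayes' theorem, the sketch below starts from a made-up joint probability for two events B and D and confirms that each conditional probability can be recovered from the other. All numbers and names here are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def bayes(likelihood, prior, evidence):
    """Return P(X | Y) = P(Y | X) * P(X) / P(Y)."""
    if evidence == 0:
        raise ValueError("the conditioning event must have nonzero probability")
    return likelihood * prior / evidence


# Hypothetical probabilities for two dependent events B and D.
p_b = 0.30        # P(B), the prior for B
p_d = 0.20        # P(D)
p_b_and_d = 0.12  # P(B and D); not equal to P(B) * P(D), so B and D are dependent

# Conditional probabilities taken directly from the definition.
p_b_given_d = p_b_and_d / p_d
p_d_given_b = p_b_and_d / p_b

# The same quantities obtained through Bayes' theorem.
p_b_given_d_bayes = bayes(p_d_given_b, p_b, p_d)
p_d_given_b_bayes = bayes(p_b_given_d, p_d, p_b)

print(f"P(B | D): direct {p_b_given_d:.3f}, via Bayes {p_b_given_d_bayes:.3f}")
print(f"P(D | B): direct {p_d_given_b:.3f}, via Bayes {p_d_given_b_bayes:.3f}")
```

In practice only one of the two conditional probabilities is usually known, which is why being able to move between them matters.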

Basic & Clinical Biostatistics, Fourth Edition
Copyright © 2004 by The McGraw-Hill Companies, Inc.


Markov processes

Ch. 5 Markov processes
by Scott E Page
Director, Center for the Study of Complex Systems
University of Michigan
http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R10Markov.pdf .

Ch 1 Decision Trees
http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R4Decision.pdf .
other resources: http://www.pgey.com/pdf/example-of-a-decision-tree.html


Untangling Skill and Luck
July 15, 2010
The outcomes for most activities combine skill and luck.