2017
Excel in data visualization:
- The New York Times
- The Economist
- McKinsey Global Institute
6:22
Vocabulary has increased
The more words you have, the more concepts you have.
The more concepts you have, the smarter you are.
related:
https://franzcalvo.wordpress.com/2015/06/10/why-our-iq-levels-are-higher-than-our-grandparents
7:34
Calculus is routinely taught in high school
7:40
there’s been no continuous change: Scandinavia
Scandinavia does a better job than the rest of us bringing up the bottom
industrial revolution skills
10:18
Examples of the new tools: Statistics and probability theory
– sample
– population
– sample bias
– randomness
– law of large numbers (23:30)
– normal distribution
– standard deviation
– statistical significance
– regression to the mean (29:27-> 32:32)
26:57 sophomore slump, second novels, albums
https://en.wikipedia.org/wiki/Sophomore_slump
– base rate
– correlation (odds) 17:55
Perceived and actual correlations across two occasions and across 20 (21:44)
+ abilities (test scores)
+ traits (honesty)
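The contrast between correlations across two occasions and across 20 can be illustrated with a small simulation. This is only a sketch under stated assumptions: each person has a stable trait, every single observation adds independent noise, and the names `simulate`, `noise_sd` and `k` are hypothetical, not taken from the lecture.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_people=2000, noise_sd=3.0, k=20, seed=0):
    """Return (r_single, r_aggregate).

    r_single:    correlation between two single occasions.
    r_aggregate: correlation between two averages of k occasions each.
    Assumption: observation = stable trait + independent Gaussian noise.
    """
    rng = random.Random(seed)
    traits = [rng.gauss(0, 1) for _ in range(n_people)]

    def observe(trait):
        return trait + rng.gauss(0, noise_sd)

    def average_of_k(trait):
        return sum(observe(trait) for _ in range(k)) / k

    single_a = [observe(t) for t in traits]
    single_b = [observe(t) for t in traits]
    mean_a = [average_of_k(t) for t in traits]
    mean_b = [average_of_k(t) for t in traits]
    return pearson(single_a, single_b), pearson(mean_a, mean_b)


r_single, r_aggregate = simulate()
print(f"two single occasions:       r = {r_single:.2f}")
print(f"two averages of 20 occasions: r = {r_aggregate:.2f}")
```

Under these assumptions the expected correlation is 1 / (1 + noise_sd²) = 0.1 for single occasions and 1 / (1 + noise_sd² / k) ≈ 0.69 for averages of 20, so a trait that looks almost unpredictable from one occasion becomes fairly predictable once many occasions are aggregated.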
10:47 Scientific methodology
– control group
– randomized control experiment
– confounded variable
multiple regression analysis (39:40)
control for social class: the prestige of their occupation
– self-selection
– independence of observations
– natural experiment
– artifact
11:01 Decision Theory
– cost/benefit analysis
– opportunity cost (48:59)
– sunk cost (52:31)
– loss aversion
a large proportion of individuals i.e. 85.8% for S. haematobium were “zero egg excretors”… there was overwhelming evidence of overdispersion.
Source:
Chipeta, M.G., Ngwira, B.M., Simoonga, C. et al. BMC Res Notes (2014) 7: 856. https://doi.org/10.1186/1756-0500-7-856
Logistic regression
https://www.medcalc.org/manual/logistic_regression.php
https://www.class-central.com/course/canvas-network-applied-logistic-regression-6579
https://www.canvas.net/browse/osu/courses/applied-logistic-regression
Economics Claims a Precision Rarely Found
Thanks for Mr. Roberts’s views on how dismal the science is in the dismal science.
May 24, 2016
http://www.wsj.com/articles/economics-claims-a-precision-rarely-found-1464121659
Regarding Kyle Peterson’s “The Weekend Interview with Russ Roberts: When All Economics Is Political” (May 14): … reliance on regression analysis to find confirmation of our preconceptions
… all systems, and especially economic systems, are highly nonlinear in how they respond over time to any disturbances to the system, especially human, but also natural, disturbances.
It is effectively impossible to include all effects in any model of such a system, so we simplify for the sake of obtaining a computational forecast estimate in our lifetime.
Unfortunately, any nonlinear system unwinds over time in ways not exactly predictable given the limits of our computational model.
It is hubris to think that more data can make predictions of large material, human, environmental and econometric ensembles more reliable.
This is commonly known in chaos theory as the butterfly effect.
Even meteorologists …
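The sensitivity to initial conditions that the letter calls the butterfly effect can be seen in a toy model. The sketch below uses the logistic map, a standard textbook example of chaos; it is an assumption chosen for illustration and not a model of any economic system, and the function name `logistic_trajectory` is hypothetical.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)
gaps = [abs(x - y) for x, y in zip(a, b)]

# Print the gap every 10 steps to show how quickly it grows.
for step in range(0, 101, 10):
    print(f"step {step:3d}: gap = {gaps[step]:.3e}")
```

The two trajectories stay close for the first few steps and then separate completely, which is why a better measurement of the starting point buys only a few extra steps of useful forecast in a system like this.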
this is a letter to:
When All Economics Is Political
The dismal science has too much junk science, says Russ Roberts, an evangelist for humility in a discipline where it is often hard to find.
By Kyle Peterson
May 13, 2016
http://www.wsj.com/articles/when-all-economics-is-political-1463178093
12:50 the core that everybody would agree to is pretty small:
related:
from:
Introduction to Big Data
September 2015
by University of California, San Diego
https://www.coursera.org/learn/intro-to-big-data
Accuracy of X-rays
D1: tuberculosis
D2: no tuberculosis
T+: positive X-ray
T-: negative X-ray
It is useful to find P(D1|T+), the probability that an individual has the disease given that he tests positive. This probability is also called the predictive value of a positive test.
Remark: Because of sampling bias, it is not correct to estimate P(D1|T+) directly from the observed data, i.e., the numbers given in the table. This incorrect estimate gives 22/73, which has a large upward bias and overestimates P(D1|T+).
We use the Bayes theorem to find P(D1|T+). The Bayes theorem says:
P(D1|T+) = P(T+|D1) × P(D1) / [P(T+|D1) × P(D1) + P(T+|D2) × P(D2)]
There are two important things here:
1. Prior probability: P(D1), the probability of TB before having the data. Usually, a judgement call has to be made as to what prior probability to use. For the present problem, it seems reasonable to use the population prevalence as the prior probability. In 1987, there were 9.3 TB cases per 100,000 population. Therefore, we specify: …
from:
Biostatistics, 2003
http://homepage.cs.uiowa.edu/~jian/S101/s101.html
http://homepage.cs.uiowa.edu/~jian/S101/Notes/notes3.pdf
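The X-ray calculation can be sketched in a few lines. The prior of 9.3 cases per 100,000 and the naive estimate 22/73 come from the excerpt above; the sensitivity and false-positive rate below are hypothetical placeholder values, because the table they would be read from is not reproduced in these notes.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(D1|T+) by Bayes' theorem.

    prevalence:          P(D1), the prior probability of disease
    sensitivity:         P(T+|D1)
    false_positive_rate: P(T+|D2)
    """
    true_positive = sensitivity * prevalence
    false_positive = false_positive_rate * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


prior = 9.3 / 100_000  # population prevalence quoted in the excerpt

# ASSUMPTION: illustrative test characteristics, not taken from the source table.
ppv = positive_predictive_value(prior, sensitivity=0.73, false_positive_rate=0.03)

print(f"Bayes estimate of P(D1|T+): {ppv:.5f}")
print(f"Naive estimate 22/73:       {22 / 73:.3f}")
```

With a prior this small, the Bayes estimate comes out orders of magnitude below 22/73, which is the upward bias the remark warns about.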
Research Interests: analysis of high-dimensional data
Teaching: Topics in High Dimensional Data Analysis (STAT:7190)
http://homepage.cs.uiowa.edu/~jian/7190/index.html
related:
https://franzcalvo.wordpress.com/2014/07/23/measurement-error-variance-and-bias
The multiplication rule for the occurrence of both of two or more events is as follows: If A, B, and C are independent, then
P(A and B and C) = P(A) × P(B) × P(C)
If two events such as B and D are not independent, then
P(B and D) = P(B | D) × P(D) = P(B) × P(D | B)
The multiplication rule for probabilities when events are not independent can be used to derive one form of an important formula called Bayes’ theorem. Because P(B and D) equals both P(B | D) × P(D) and P(B) × P(D | B), these latter two expressions are equal. Assuming P(B) and P(D) are not equal to zero, we can solve for one in terms of the other, as follows:
P(B | D) = P(B) × P(D | B) / P(D)
which is found by dividing both sides of the equation by P(D). Similarly,
P(D | B) = P(D) × P(B | D) / P(B)
In the equation for P(B | D), P(B) in the right-hand side of the equation is sometimes called the prior probability, because its value is known prior to the calculation; P(B | D) is called the posterior probability, because its value is known only after the calculation.
The two formulas of Bayes’ theorem are important because investigators frequently know only one of the pertinent probabilities and must determine the other. Examples are diagnosis and management
Basic & Clinical Biostatistics, Fourth Edition
Copyright © 2004 by The McGraw-Hill Companies, Inc.
http://mhprofessional.com/product.php?cat=116&isbn=0071410171
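The multiplication rule and the two forms of Bayes' theorem can be checked numerically on a small table. The counts below are hypothetical, chosen only so the arithmetic is easy to follow; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def probabilities(both, b_only, d_only, neither):
    """Return P(B), P(D) and P(B and D) from the four cells of a 2x2 table."""
    total = both + b_only + d_only + neither
    p_b = Fraction(both + b_only, total)
    p_d = Fraction(both + d_only, total)
    p_b_and_d = Fraction(both, total)
    return p_b, p_d, p_b_and_d


# ASSUMPTION: made-up counts for two events B and D among 200 observations.
p_b, p_d, p_b_and_d = probabilities(both=30, b_only=10, d_only=20, neither=140)

# Conditional probabilities from their definition.
p_b_given_d = p_b_and_d / p_d
p_d_given_b = p_b_and_d / p_b

# Multiplication rule: both factorizations give the same joint probability.
assert p_b_given_d * p_d == p_b_and_d == p_b * p_d_given_b

# Bayes' theorem: recover one conditional probability from the other.
assert p_b * p_d_given_b / p_d == p_b_given_d
assert p_d * p_b_given_d / p_b == p_d_given_b

# B and D are independent only if P(B and D) = P(B) x P(D).
print("P(B | D) =", p_b_given_d)
print("P(D | B) =", p_d_given_b)
print("independent:", p_b_and_d == p_b * p_d)
```

In this table P(B and D) differs from P(B) × P(D), so the events are not independent and the conditional form of the multiplication rule is the one that applies.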
Ch. 5 Markov processes
by Scott E Page
Director, Center for the Study of Complex Systems
University of Michigan
http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R10Markov.pdf
Ch 1 Decision Trees
http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R4Decision.pdf
other resources: http://www.pgey.com/pdf/example-of-a-decision-tree.html
==================
Untangling Skill and Luck
July 15, 2010
http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R15SkillandLuck.pdf
The outcomes for most activities combine skill and luck.
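A minimal simulation shows how mixing skill and luck produces regression to the mean, the mechanism behind the sophomore slump. It is a sketch under a simple assumption, outcome = skill + luck with both drawn from the same normal distribution, and the function name `sophomore_slump` is hypothetical.

```python
import random


def sophomore_slump(n=10_000, top_fraction=0.1, seed=1):
    """Average first and second outcome of the top first-round performers.

    Assumption: outcome = fixed skill + fresh, independent luck each round.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    first = [s + rng.gauss(0, 1) for s in skill]
    second = [s + rng.gauss(0, 1) for s in skill]

    # Select the people whose first outcome is in the top fraction.
    cutoff = sorted(first, reverse=True)[int(n * top_fraction) - 1]
    top = [i for i in range(n) if first[i] >= cutoff]

    mean_first = sum(first[i] for i in top) / len(top)
    mean_second = sum(second[i] for i in top) / len(top)
    return mean_first, mean_second


mean_first, mean_second = sophomore_slump()
print(f"top performers, first outcome:  {mean_first:.2f}")
print(f"same people, second outcome:    {mean_second:.2f}")
```

The top group still does better than average the second time, because its skill is real, but it falls back toward the mean, because the good luck that helped select it does not repeat.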
Joint, Marginal and Conditional Probabilities
by Elizabeth A. Albright, PhD
http://sites.nicholas.duke.edu/statsreview/probability/jmc