MVA: Data Science and ML

Data Science and Machine Learning Essentials
Microsoft Virtual Academy. Level 300
02 November 2015
https://mva.microsoft.com/en-us/training-courses/data-science-and-machine-learning-essentials-14100

MVA
https://www.youtube.com/channel/UCEayK1pXZjg7_SW_bKQFIyg

Data Science with Microsoft SQL Server 2016 – Free eBook

Gartner’s Hype Cycles

Source: Gartner (August 2015)

Gartner’s Hype Cycles
http://www.gartner.com/technology/research/hype-cycles

Top IT Trends & Predictions in 2015
http://www.gartner.com/technology/topics/trends.jsp

Gartner’s 2016 Hype Cycle for Emerging Technologies Identifies Three Key Trends That Organizations Must Track to Gain Competitive Advantage
http://www.gartner.com/newsroom/id/3412017

cited by:
Introduction to Big Data
September 2015
by University of California, San Diego
https://www.coursera.org/learn/intro-to-big-data

Distributed innovation

How the U.S. Gets Manufacturing Policy All Wrong
June 2, 2015
By Martin Neil Baily
Bernard L. Schwartz Chair in Economic Policy Development at the Brookings Institution
http://www.wsj.com/articles/how-the-u-s-gets-manufacturing-policy-all-wrong-1433301281
Washington measures success by the number of jobs, when it should be focused on speeding up automation

…distributed innovation, in which crowdsourcing is used to find radical solutions to technical challenges much more quickly and cheaply than with traditional in-house research and development.

…putting robots in place of workers. There will still be good jobs in manufacturing, especially for those with big-data, programming and other specialized skills needed for advanced manufacturing.

It is hard to let go of old ways of thinking, but continuing to chase yesterday’s goals only puts off the inevitable. Instead of dragging out the fight for more manufacturing jobs, we need to focus on speeding up the manufacturing revolution, funding basic science and engineering, and ensuring that tech talent and best practice companies want to produce in the U.S.

Training Data Scientists (2014)

Structure Data 2014: How Will We Train Data Scientists of the Future?
GIGAOM, April 13, 2014
AnnaLee Saxenian — Professor and Dean, UC Berkeley
https://www.youtube.com/watch?v=8sv6Ul4ybNg

[12:50] the core that everybody would agree to is pretty small:

  • statistics
  • computer science / programming
  • Big Data tools

related:
data_scientist

from:
Introduction to Big Data
September 2015
by University of California, San Diego
https://www.coursera.org/learn/intro-to-big-data

Our emotional state biases our expectations for the future

Mining Books To Map Emotions Through A Century
April 01, 2013
http://www.npr.org/blogs/health/2013/04/01/175584297/mining-books-to-map-emotions-through-a-century

“Generally speaking, the usage of these commonly known emotion words has been in decline over the 20th century,” Bentley says. We used words that expressed our emotions less in the year 2000 than we did 100 years earlier — words about sadness and joy and anger and disgust and surprise.

In fact, there is only one exception that Bentley and his colleagues found: fear. “The fear-related words start to increase just before the 1980s,” he says.

…this method — mining vast amounts of written language — is incredibly promising.

…language analysis seems so promising to him — as a new window that might offer a different, maybe even more objective, view into our culture. Because, he says, it’s difficult for people today to guess the emotions of people of different times.

“Our current emotional state completely biases our memories of the past and our expectations for the future,” Pennebaker says. “And, using these language samples, we are able to peg how people are feeling over time.
That’s what I love about it as a historical marker, so we can get a sense of how groups of people — or entire cultures — might have felt 10 years ago, or 100 years ago.”
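
The method described here, counting how often a fixed set of emotion words appears in books year by year, is easy to sketch. Below is a minimal, illustrative Python version; the tiny word list and the (year, text) corpus are placeholders, not the word lists or corpus the researchers actually used.

  import re
  from collections import Counter

  # Illustrative word list only; the study used curated emotion-word categories.
  FEAR_WORDS = {"fear", "afraid", "terror", "dread", "panic"}

  def fear_rate_by_year(corpus):
      """corpus: iterable of (year, text) pairs.
      Returns {year: share of all words that year that are fear-related}."""
      totals, hits = Counter(), Counter()
      for year, text in corpus:
          words = re.findall(r"[a-z']+", text.lower())
          totals[year] += len(words)
          hits[year] += sum(w in FEAR_WORDS for w in words)
      return {year: hits[year] / totals[year] for year in totals if totals[year]}

  sample = [(1900, "No dread here, only joy."),
            (1985, "Panic and fear were everywhere.")]
  print(fear_rate_by_year(sample))  # e.g. {1900: 0.2, 1985: 0.4}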

see also:
https://franzcalvo.wordpress.com/2014/09/02/our-use-of-little-words-can-uh

We’ve become loose in applying the term “mental disorder” to …
https://franzcalvo.wordpress.com/2013/08/26/is-emotional-pain-necessary

https://franzcalvo.wordpress.com/2014/06/15/less-and-less-willing-to-sit-with-our-emotions

Big Data has spawned a cult of infallibility

Forget YOLO: Why ‘Big Data’ Should Be The Word Of The Year
by Geoff Nunberg
December 20, 2012
http://www.npr.org/2012/12/20/167702665/geoff-nunbergs-word-of-the-year-big-data

Whatever the sticklers say, data isn’t a plural noun like “pebbles.” It’s a mass noun like “dust.”

It’s only when all those little chunks are aggregated that they turn into Big Data; then the software called analytics can scour it for patterns.

You idly click on an ad for a pair of red sneakers one morning, and they’ll stalk you to the end of your days.
It makes me nostalgic for the age when cyberspace promised a liberating anonymity.
I think of that famous 1993 New Yorker cartoon by Peter Steiner: “On the Internet, nobody knows you’re a dog.”
Now it’s more like, “On the Internet, everybody knows what brand of dog food you buy.”

In some circles, Big Data has spawned a cult of infallibility — a vision of prediction obviating explanation and math trumping science.
In a manifesto in Wired, Chris Anderson wrote, “With enough data, the numbers speak for themselves.”

The trouble is that you can’t always believe what they’re saying.
When you’ve got algorithms weighing hundreds of factors over a huge data set, you can’t really know why they come to a particular decision or whether it really makes sense.

When I was working with systems like these some years ago at the Xerox Palo Alto Research Center, we used to talk about a 95 percent solution.
So what if Amazon’s algorithms conclude that I’d be interested in Celine Dion’s greatest hits, as long as they get 19 out of 20 recommendations right?
But those odds are less reassuring when the algorithms are selecting candidates for the no-fly list.
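
To put the 95 percent figure in perspective, here is a back-of-the-envelope sketch in Python; the volumes are made-up round numbers, and the point is only that the same 1-in-20 error rate carries very different costs depending on what is being decided.

  hit_rate = 0.95  # "19 out of 20 recommendations right"
  for decision, volume in [("product recommendations", 1_000_000),
                           ("no-fly screening decisions", 1_000_000)]:
      wrong = (1 - hit_rate) * volume
      print(f"{decision}: ~{wrong:,.0f} wrong calls per {volume:,} decisions")
  # The expected error count is the same either way; what differs is the cost of each wrong call.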

I don’t know if the phrase Big Data itself will be around 20 years from now, when we’ll probably be measuring information in humongobytes.
People will be amused to recall that a couple of exabytes were once considered big data, the way we laugh to think of a time when $15,000 a year sounded like big money.
But 19 out of 20 is probably still going to be a good hit rate for those algorithms, and people will still feel the need to sort out the causes from the correlations — still asking the old question, what are patterns for?

related:
https://franzcalvo.wordpress.com/2014/08/18/weighing-brain-activity-with-the-balance

May 14, 2012
https://franzcalvo.wordpress.com/2015/03/09/algorithms-extending-the-power-of-the-human-mind

June 7, 2015
https://franzcalvo.wordpress.com/2015/06/09/algorithms-some-natural-neutral-world

How Quantum Computers and Machine Learning Will Revolutionize Big Data

How Quantum Computers and Machine Learning Will Revolutionize Big Data
Quanta Magazine. October 14, 2013
https://www.simonsfoundation.org/quanta/20131009-the-future-fabric-of-data-analysis

Large Hadron Collider (LHC) scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.

The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade. Since Intel co-founder Gordon E. Moore first defined it in 1965, the so-called Moore’s law — which predicts that the number of transistors on integrated circuits will double every two years — has dominated the computer industry.
While that growth rate has proved remarkably resilient, for now, at least, “Moore’s law has basically crapped out; the transistors have gotten as small as people know how to make them economically with existing technologies,” said Scott Aaronson, a theoretical computer scientist at the Massachusetts Institute of Technology.
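
As a rough illustration of the doubling rule quoted above, this short Python sketch projects a transistor count forward under a two-year doubling period. The 1971 starting point (the Intel 4004, roughly 2,300 transistors) is an illustrative anchor of mine, not a figure from the article.

  def project_transistors(count_t0, year_t0, year):
      """Project a transistor count forward assuming a doubling every two years."""
      return count_t0 * 2 ** ((year - year_t0) / 2)

  # ~2,300 transistors in 1971 projected to 2013, the year of the article:
  print(f"{project_transistors(2300, 1971, 2013):,.0f}")  # roughly 4.8 billion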

Instead, since 2005, many of the gains in computing power have come from adding more parallelism via multiple cores, with multiple levels of memory.
The preferred architecture no longer features a single central processing unit (CPU) augmented with random access memory (RAM) and a hard drive for long-term storage.
Even the big, centralized parallel supercomputers that dominated the 1980s and 1990s are giving way to distributed data centers and cloud computing, often networked across many organizations and vast geographical distances.

Alon Halevy, a computer scientist at Google, says the biggest breakthroughs in big data are likely to come from data integration.