The Master Algorithm: A Book Recommendation


Author: Guo Ruidong (郭瑞东)

Machine learning has nearly become a kind of "kitsch." I say this not only because it quietly and pervasively shapes our lives and our thinking in every domain, but also because so many people who know nothing about machine learning or big data still follow the trend and feel obliged to put "big data" on their slides, as if it were a magic incantation. The Master Algorithm is a mirror that decodes that incantation. The book contains no formulas and no code; what it offers are incisive explanations of the essence of machine-learning algorithms, everyday stories built around those algorithms, and conceptual models of the field's core methods. In a word, it is a popular-science book that any reader with high-school mathematics and no computer-science background can understand. If you do not want to remain ignorant of the machine-learning algorithms that now govern nearly every aspect of daily life, this book is essential reading. There is currently no Chinese edition.

The following remarks are aimed at readers with at least some professional background. Such readers can skim the book, and through the author's narrative a machine-learning practitioner will recognize the logical engines hidden behind the mathematics of the algorithms they use every day. The title reflects the author's ambition to unify the field's various schools of thought and ultimately propose machine learning's own "Newton's three laws." The book traces the development of the algorithms in common use today, including decision trees, genetic algorithms, neural networks, naive Bayes and Bayesian networks, hidden Markov models, k-nearest neighbors, and support vector machines; it also covers unsupervised learning. Along the way, the author discusses the two biggest obstacles in machine learning: overfitting and the curse of dimensionality.

Excerpt from the book's preface:

Algorithms increasingly run our lives. They find books, movies, jobs, and dates for us, manage our investments, and discover new drugs. More and more, these algorithms work by learning from the trails of data we leave in our newly digital world. Like curious children, they observe us, imitate, and experiment. And in the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask.

Machine learning is the automation of discovery (the scientific method on steroids) that enables intelligent robots and computers to program themselves. No field of science today is more important yet more shrouded in mystery. Pedro Domingos, one of the field’s leading lights, lifts the veil for the first time to give us a peek inside the learning machines that power Google, Amazon, and your smartphone. He charts a course through machine learning’s five major schools of thought, showing how they turn ideas from neuroscience, evolution, psychology, physics, and statistics into algorithms ready to serve you. Step by step, he assembles a blueprint for the future universal learner, the Master Algorithm, and discusses what it means for you, and for the future of business, science, and society.

If data-ism is today’s rising philosophy, this book will be its bible. The quest for universal learning is one of the most significant, fascinating, and revolutionary intellectual developments of all time. A groundbreaking book, The Master Algorithm is the essential guide for anyone and everyone wanting to understand not just how the revolution will happen, but how to be at its forefront.

Excerpts from the book:

The most popular option [in Bayesian inference], however, is to drown our sorrows in alcohol, get punch drunk, and stumble around all night. The technical term for this is Markov chain Monte Carlo, or MCMC for short. The “Monte Carlo” part is because the method involves chance, like a visit to the eponymous casino, and the “Markov chain” part is because it involves taking a sequence of steps, each of which depends only on the previous one. The idea in MCMC is to do a random walk, like the proverbial drunkard, jumping from state to state of the network in such a way that, in the long run, the number of times each state is visited is proportional to its probability. We can then estimate the probability of a burglary, say, as the fraction of times we visited a state where there was a burglary.

A “well-behaved” Markov chain converges to a stable distribution, so after a while it always gives approximately the same answers. For example, when you shuffle a deck of cards, after a while all card orders are equally likely, no matter the initial order; so you know that if there are n possible orders, the probability of each one is 1/n. The trick in MCMC is to design a Markov chain that converges to the distribution of our Bayesian network. One easy option is to repeatedly cycle through the variables, sampling each one according to its conditional probability given the state of its neighbors. People often talk about MCMC as a kind of simulation, but it’s not: the Markov chain does not simulate any real process; rather, we concocted it to efficiently generate samples from a Bayesian network, which is itself not a sequential model.
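The "easy option" Domingos describes — cycling through the variables and resampling each one given its neighbors — is Gibbs sampling. Below is a minimal sketch on the classic textbook burglary/earthquake/alarm network (the conditional-probability numbers are my own illustrative choices, not taken from the book). It estimates P(Burglary | Alarm = true) exactly as the excerpt says: as the fraction of visited states in which a burglary occurred.

```python
import random

random.seed(0)

# Illustrative CPTs (my own numbers) for Burglary -> Alarm <- Earthquake:
P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95,       # P(Alarm=true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}

def bernoulli(p):
    return random.random() < p

def gibbs_burglary(n_steps=100_000):
    """Estimate P(Burglary | Alarm=true) by Gibbs sampling over B and E."""
    b, e = False, False          # arbitrary initial state
    visits_b = 0
    for _ in range(n_steps):
        # Resample B from P(B | e, Alarm=true), proportional to P(B) P(A | B, e)
        w1 = P_B * P_A[(True, e)]
        w0 = (1 - P_B) * P_A[(False, e)]
        b = bernoulli(w1 / (w1 + w0))
        # Resample E from P(E | b, Alarm=true), proportional to P(E) P(A | b, E)
        w1 = P_E * P_A[(b, True)]
        w0 = (1 - P_E) * P_A[(b, False)]
        e = bernoulli(w1 / (w1 + w0))
        visits_b += b
    return visits_b / n_steps

print(gibbs_burglary())
```

With these CPTs the exact posterior is about 0.37; in the long run the fraction of visited burglary states converges to that value, just as the excerpt promises.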

The origins of MCMC go all the way back to the Manhattan Project, when physicists needed to estimate the probability that neutrons would collide with atoms and set off a chain reaction. But in more recent decades, it has sparked such a revolution that it’s often considered one of the most important algorithms of all time. MCMC is good not just for computing probabilities but for integrating any function. Without it, scientists were limited to functions they could integrate analytically, or to well-behaved, low-dimensional integrals they could approximate as a series of trapezoids. With MCMC, they’re free to build complex models, knowing the computer will do the heavy lifting. Bayesians, for one, probably have MCMC to thank for the rising popularity of their methods more than anything else.
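The "series of trapezoids" alternative the passage mentions is replaced by sampling: averaging a function at random points estimates its integral, with no analytic antiderivative needed. The sketch below uses plain Monte Carlo (direct independent samples, the simpler cousin of MCMC mentioned later in the excerpt) on a one-dimensional example; the function and interval are my own illustration.

```python
import math
import random

random.seed(1)

def mc_integrate(f, a, b, n=200_000):
    """Plain Monte Carlo estimate of the integral of f over [a, b]:
    (b - a) times the average of f at uniformly random points."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Example: integrate sin(x) over [0, pi]; the exact value is 2.
est = mc_integrate(math.sin, 0.0, math.pi)
print(est)
```

MCMC proper becomes necessary when, unlike here, you cannot draw independent samples from the distribution directly — which is exactly the Bayesian-network situation in the previous paragraphs.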

On the downside, MCMC is often excruciatingly slow to converge, or fools you by looking like it’s converged when it hasn’t. Real probability distributions are usually very peaked, with vast wastelands of minuscule probability punctuated by sudden Everests. The Markov chain then converges to the nearest peak and stays there, leading to very biased probability estimates. It’s as if the drunkard followed the scent of alcohol to the nearest tavern and stayed there all night, instead of wandering all around the city like we wanted him to. On the other hand, if instead of using a Markov chain we just generated independent samples, like simpler Monte Carlo methods do, we’d have no scent to follow and probably wouldn’t even find that first tavern; it would be like throwing darts at a map of the city, hoping they land smack dab on the pubs.

Inference in Bayesian networks is not limited to computing probabilities. It also includes finding the most probable explanation for the evidence, such as the disease that best explains the symptoms or the words that best explain the sounds Siri heard. This is not the same as just picking the most probable word at each step, because words that are individually likely given their sounds may be unlikely to occur together, as in the “Call the please” example. However, similar kinds of algorithms also work for this task (and they are, in fact, what most speech recognizers use).
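The "similar kinds of algorithms" used by speech recognizers include the Viterbi algorithm, which finds the most probable hidden-state sequence of a hidden Markov model rather than picking the most probable state at each step in isolation. A minimal sketch, using a standard toy weather HMM (states, observations, and probabilities are my own illustrative example, not from the book):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence explaining an observation sequence."""
    # V[t][s] = (probability of the best path ending in s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return path[::-1]

# Toy HMM: hidden weather, observed activities.
states = ['Rainy', 'Sunny']
start_p = {'Rainy': 0.6, 'Sunny': 0.4}
trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
           'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
          'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}

best = viterbi(['walk', 'shop', 'clean'], states, start_p, trans_p, emit_p)
print(best)  # ['Sunny', 'Rainy', 'Rainy']
```

Note that the jointly best explanation can differ from greedily picking the likeliest state at each step — the same point the "Call the please" example makes for words.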

Most importantly, inference includes making the best decisions, guided not just by the probabilities of different outcomes but also by the corresponding costs (or utilities, to use the technical term). The cost of ignoring an e-mail from your boss asking you to do something by tomorrow is much greater than the cost of seeing a piece of spam, so often it’s better to let an e-mail through even if it does seem fairly likely to be spam.
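The spam example reduces to comparing expected costs rather than raw probabilities. A minimal sketch of that decision rule (the cost figures are my own illustrative assumptions):

```python
# Illustrative, assumed costs: missing an urgent e-mail from your boss is
# far worse than glancing at one piece of spam.
COST_MISS_IMPORTANT = 50.0   # cost of filtering a legitimate, urgent e-mail
COST_SEE_SPAM = 1.0          # cost of letting one spam message through

def filter_as_spam(p_spam):
    """Filter only when the expected cost of letting the message through
    exceeds the expected cost of filtering it."""
    cost_let_through = p_spam * COST_SEE_SPAM
    cost_filter = (1 - p_spam) * COST_MISS_IMPORTANT
    return cost_let_through > cost_filter

# With this cost ratio, even a message that is 90% likely to be spam
# is let through; the filtering threshold is about 98% spam probability.
print(filter_as_spam(0.90))  # False
print(filter_as_spam(0.99))  # True
```

This is why, as the excerpt says, it is often better to let an e-mail through even when it seems fairly likely to be spam: the asymmetric costs shift the decision threshold far above one half.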

Click the "Read the original" link to download an electronic copy of the book.

Submissions welcome:

We hope more like-minded people will join us as contributing authors, propose topics, or recommend good questions to us — sometimes a question matters more than its answer.

You can:

1) Write about advances in a field you know well, or your outlook on its future. Even if your piece is not fully mature, we will offer revision suggestions or help you polish it.

2) Share your cross-disciplinary thinking. If knowledge from one field has led you to ideas in another and opened your mind, any genuinely novel idea deserves to be heard.

3) Suggest topics that interest you, and join the 混沌 (Chaos) community in discussing them.

Alternatively, if you run a public account of your own with a similar outlook, we would welcome you to join us. Submissions: guoruidong517@126.com

