The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Pedro Domingos

Chapter Six

Sharon Bertsch McGrayne tells the history of Bayesianism, from Bayes and Laplace to the present, in The Theory That Would Not Die (Yale University Press, 2011). A First Course in Bayesian Statistical Methods,* by Peter Hoff (Springer, 2009), is an introduction to Bayesian statistics.

The Naïve Bayes algorithm is first mentioned in Pattern Classification and Scene Analysis,* by Richard Duda and Peter Hart (Wiley, 1973). Milton Friedman argues for oversimplified theories in “The methodology of positive economics,” which appears in Essays in Positive Economics (University of Chicago Press, 1966). The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005). “Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes-like methods in information retrieval.

“First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works. “The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web. Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work. Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online at arxiv.org/pdf/cs/0504020v2.pdf). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs. “Engineers look to Kalman filtering for guidance,” by Barry Cipra (SIAM News, 1993), is a brief introduction to Kalman filters, their history, and their applications.

Judea Pearl’s pioneering work on Bayesian networks appears in his book Probabilistic Reasoning in Intelligent Systems* (Morgan Kaufmann, 1988). “Bayesian networks without tears,”* by Eugene Charniak (AI Magazine, 1991), is a largely nonmathematical introduction to them. “Probabilistic interpretation for MYCIN’s certainty factors,”* by David Heckerman (Proceedings of the Second Conference on Uncertainty in Artificial Intelligence, 1986), explains when sets of rules with confidence estimates are and aren’t a reasonable approximation to Bayesian networks. “Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data,” by Eran Segal et al. (Nature Genetics, 2003), is an example of using Bayesian networks to model gene regulation. “Microsoft virus fighter: Spam may be more difficult to stop than HIV,” by Ben Paynter (Fast Company, 2012), tells how David Heckerman took inspiration from spam filters and used Bayesian networks to design a potential AIDS vaccine. The probabilistic or “noisy” OR is explained in Pearl’s book.* “Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base,” by M. A. Shwe et al. (Parts I and II, Methods of Information in Medicine, 1991), describes a noisy-OR Bayesian network for medical diagnosis. Google’s Bayesian network for ad placement is described in Section 26.5.4 of Kevin Murphy’s Machine Learning* (MIT Press, 2012). Microsoft’s player rating system is described in “TrueSkill™: A Bayesian skill rating system,”* by Ralf Herbrich, Tom Minka, and Thore Graepel (Advances in Neural Information Processing Systems 19, 2007).

Modeling and Reasoning with Bayesian Networks,* by Adnan Darwiche (Cambridge University Press, 2009), explains the main algorithms for inference in Bayesian networks. The January/February 2000 issue* of Computing in Science and Engineering, edited by Jack Dongarra and Francis Sullivan, has articles on the top ten algorithms of the twentieth century, including MCMC. “Stanley: The robot that won the DARPA Grand Challenge,” by Sebastian Thrun et al. (Journal of Field Robotics, 2006), explains how the eponymous self-driving car works. “Bayesian networks for data mining,”* by David Heckerman (Data Mining and Knowledge Discovery, 1997), summarizes the Bayesian approach to learning and explains how to learn Bayesian networks from data. “Gaussian processes: A replacement for supervised neural networks?,”* by David MacKay (NIPS tutorial notes, 1997; online at www.inference.eng.cam.ac.uk/mackay/gp.pdf), gives a flavor of how the Bayesians co-opted NIPS.

The need for weighting the word probabilities in speech recognition is discussed in Section 9.6 of Speech and Language Processing,* by Dan Jurafsky and James Martin (2nd ed., Prentice Hall, 2009). My paper on Naïve Bayes, with Mike Pazzani, is “On the optimality of the simple Bayesian classifier under zero-one loss”* (Machine Learning, 1997; expanded journal version of the 1996 conference paper). Judea Pearl’s book,* mentioned above, discusses Markov networks along with Bayesian networks. Markov networks in computer vision are the subject of Markov Random Fields for Vision and Image Processing,* edited by Andrew Blake, Pushmeet Kohli, and Carsten Rother (MIT Press, 2011). Markov networks that maximize conditional likelihood were introduced in “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,”* by John Lafferty, Andrew McCallum, and Fernando Pereira (International Conference on Machine Learning, 2001).

The history of attempts to combine probability and logic is surveyed in a 2003 special issue* of the Journal of Applied Logic devoted to the subject, edited by Jon Williamson and Dov Gabbay. “From knowledge bases to decision models,”* by Michael Wellman, John Breese, and Robert Goldman (Knowledge Engineering Review, 1992), discusses some of the early AI approaches to the problem.
