{"id":111,"date":"2020-05-30T10:48:09","date_gmt":"2020-05-30T10:48:09","guid":{"rendered":"https:\/\/www.pcube.tech\/blog\/?p=111"},"modified":"2020-05-30T10:48:09","modified_gmt":"2020-05-30T10:48:09","slug":"machine-learning","status":"publish","type":"post","link":"https:\/\/www.pcube.tech\/blog\/machine-learning\/","title":{"rendered":"Machine Learning"},"content":{"rendered":"<p>Machine Learning (ML) is coming into its own, with a growing recognition that ML can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems. ML provides potential solutions in all these domains and more, and is set to be a pillar of our future civilization.<\/p>\n<p>The supply of able ML designers has yet to catch up to this demand. A major reason for this is that ML is just plain tricky. This Machine Learning tutorial introduces the basics of ML theory, laying down the common themes and concepts, making it easy to follow the logic and get comfortable with the fundamentals.<\/p>\n<p><strong>What is Machine Learning?<\/strong><\/p>\n<p>So what exactly is \u201cmachine learning\u201d anyway? ML is actually a\u00a0<em>lot<\/em>\u00a0of things. 
The field is quite vast and is expanding rapidly, being continually partitioned and sub-partitioned ad nauseam into different sub-specialties and\u00a0types of machine learning.<\/p>\n<p>There are some basic common threads, however, and the overarching theme is best summed up by this oft-quoted statement made by Arthur Samuel way back in 1959:\u00a0<em>\u201c[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.\u201d<\/em><\/p>\n<p>And more recently, in 1997,\u00a0Tom Mitchell\u00a0of Carnegie Mellon University gave a \u201cwell-posed\u201d definition that has proven more useful to engineering types:\u00a0<em>\u201cA computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.\u201d<\/em><\/p>\n<p>So if you want your program to predict, for example, traffic patterns at a busy intersection (task T), you can run it through a machine learning algorithm with data about past traffic patterns (experience E) and, if it has successfully \u201clearned\u201d, it will then do better at predicting future traffic patterns (performance measure P).<\/p>\n<p><em>ML solves problems that cannot be solved by numerical means alone.<\/em><\/p>\n<p>Among the different types of ML tasks, a crucial distinction is drawn between supervised and unsupervised learning:<\/p>\n<ul>\n<li><strong>Supervised machine learning:<\/strong> The program is \u201ctrained\u201d on a pre-defined set of \u201ctraining examples\u201d, which then facilitate its ability to reach an accurate conclusion when given new data.<\/li>\n<li><strong>Unsupervised machine 
learning:<\/strong> The program is given a bunch of data and must find patterns and relationships therein.<\/li>\n<\/ul>\n<p>We will primarily focus on supervised learning here, but the end of the article includes a brief discussion of unsupervised learning with some links for those who are interested in pursuing the topic further.<\/p>\n<p><strong>Supervised Machine Learning<\/strong><\/p>\n<p>In the majority of supervised learning applications, the ultimate goal is to develop a finely tuned predictor function h(x) (sometimes called the \u201chypothesis\u201d). \u201cLearning\u201d consists of using sophisticated mathematical algorithms to optimize this function so that, given input data\u00a0x\u00a0about a certain domain (say, square footage of a house), it will accurately predict some interesting value\u00a0h(x)\u00a0(say, market price for said house).<\/p>\n<p>In practice,\u00a0x\u00a0almost always represents multiple data points. So, for example, a housing price predictor might take not only square footage (x<sub>1<\/sub>) but also number of bedrooms (x<sub>2<\/sub>), number of bathrooms (x<sub>3<\/sub>), number of floors (x<sub>4<\/sub>), year built (x<sub>5<\/sub>), zip code (x<sub>6<\/sub>), and so forth. Determining which inputs to use is an important part of ML design. However, for the sake of explanation, it is easiest to assume a single input value is used.<\/p>\n<p>So let\u2019s say our simple predictor has this form:<\/p>\n<p>h(x) = \u03b8<sub>0<\/sub> + \u03b8<sub>1<\/sub>x<\/p>\n<p>where\u00a0\u03b8<sub>0<\/sub>\u00a0and\u00a0\u03b8<sub>1<\/sub>\u00a0are constants. Our goal is to find the perfect values of\u00a0\u03b8<sub>0<\/sub>\u00a0and\u00a0\u03b8<sub>1<\/sub>\u00a0to make our predictor work as well as possible.<\/p>\n<p>Optimizing the predictor\u00a0h(x)\u00a0is done using\u00a0<strong>training examples<\/strong>. For each training example, we have an input value\u00a0x_train, for which a corresponding output,\u00a0y, is known in advance. For each example, we find the difference between the known, correct value\u00a0y, and our predicted value\u00a0h(x_train). 
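Concretely, the predictor and the per-example differences described above can be sketched in a few lines of Python. This is a minimal illustration; the toy training pairs and the θ values below are placeholders, not data from this post:

```python
# Minimal sketch of a univariate linear predictor h(x) = theta0 + theta1 * x
# and the per-example differences used to measure its "wrongness".

def h(x, theta0, theta1):
    """Predict an output (e.g. a satisfaction rating) from an input x."""
    return theta0 + theta1 * x

def errors(training_set, theta0, theta1):
    """Difference between each known output y and the prediction h(x_train)."""
    return [y - h(x_train, theta0, theta1) for x_train, y in training_set]

# Hypothetical training pairs (salary in $k, satisfaction rating).
data = [(30, 30), (60, 52), (90, 68), (120, 85)]
print(errors(data, theta0=13.12, theta1=0.61))
```

Running every training example through `errors` yields exactly the list of per-example "wrongness" measurements that the tuning step then drives toward zero.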
With enough training examples, these differences give us a useful way to measure the \u201cwrongness\u201d of\u00a0h(x). We can then tweak\u00a0h(x)\u00a0by tweaking the values of\u00a0\u03b8<sub>0<\/sub>\u00a0and\u00a0\u03b8<sub>1<\/sub>\u00a0to make it \u201cless wrong\u201d. This process is repeated over and over until the system has converged on the best values for\u00a0\u03b8<sub>0<\/sub>\u00a0and\u00a0\u03b8<sub>1<\/sub>. In this way, the predictor becomes trained, and is ready to do some real-world predicting.<\/p>\n<p><strong>Machine Learning Examples<\/strong><\/p>\n<p>We stick to simple problems in this post for the sake of illustration, but the reason ML exists is that, in the real world, the problems are much more complex. On this flat screen we can draw a picture of, at most, a three-dimensional data set, but ML problems commonly deal with data in millions of dimensions, and with very complex predictor functions. ML solves problems that cannot be solved by numerical means alone.<\/p>\n<p>With that in mind, let\u2019s look at a simple example. Say we have the following training data, wherein company employees have rated their satisfaction on a scale of 1 to 100:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-113\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11.png\" alt=\"\" width=\"1720\" height=\"1177\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-1024x701.png 1024w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-768x526.png 768w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-1536x1051.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-990x677.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-1320x903.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/11-379x259.png 379w\" sizes=\"(max-width: 
1720px) 100vw, 1720px\" \/><\/p>\n<p>First, notice that the data is a little noisy. That is, while we can see that there is a pattern to it (i.e. employee satisfaction tends to go up as salary goes up), it does not all fit neatly on a straight line. This will always be the case with real-world data (and we absolutely want to train our machine using real-world data!). So then how can we train a machine to perfectly predict an employee\u2019s level of satisfaction? The answer, of course, is that we can\u2019t. The goal of ML is never to make \u201cperfect\u201d guesses, because ML deals in domains where there is no such thing. The goal is to make guesses that are good enough to be useful.<\/p>\n<p>It is somewhat reminiscent of the famous statement by British mathematician and professor of statistics\u00a0George E. P. Box\u00a0that \u201call models are wrong, but some are useful\u201d.<\/p>\n<p><em>The goal of ML is never to make \u201cperfect\u201d guesses, because ML deals in domains where there is no such thing. The goal is to make guesses that are good enough to be useful.<\/em><\/p>\n<p>Machine Learning builds heavily on statistics. For example, when we train our machine to learn, we have to give it a statistically significant random sample as training data. If the training set is not random, we run the risk of the machine learning patterns that aren\u2019t actually there. And if the training set is too small (see\u00a0law of large numbers), we won\u2019t learn enough and may even reach inaccurate conclusions. For example, attempting to predict company-wide satisfaction patterns based on data from upper management alone would likely be error-prone.<\/p>\n<p>With this understanding, let\u2019s give our machine the data we\u2019ve been given above and have it learn it. First we have to initialize our predictor\u00a0h(x)\u00a0with some reasonable values of\u00a0\u00a0and\u00a0. 
Now our predictor looks like this when placed over our training set:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-114\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/12.png\" alt=\"\" width=\"224\" height=\"23\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/12.png 224w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/12-215x23.png 215w\" sizes=\"(max-width: 224px) 100vw, 224px\" \/><\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-115\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13.png\" alt=\"\" width=\"1720\" height=\"1178\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-1024x701.png 1024w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-768x526.png 768w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-1536x1052.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-990x678.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-1320x904.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/13-379x259.png 379w\" sizes=\"(max-width: 1720px) 100vw, 1720px\" \/><\/p>\n<p>If we ask this predictor for the satisfaction of an employee making $60k, it would predict a rating of 27:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-116\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14.png\" alt=\"\" width=\"1720\" height=\"1178\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-1024x701.png 1024w, 
https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-768x526.png 768w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-1536x1052.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-990x678.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-1320x904.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/14-379x259.png 379w\" sizes=\"(max-width: 1720px) 100vw, 1720px\" \/><\/p>\n<p>It\u2019s obvious that this was a terrible guess and that this machine doesn\u2019t know very much.<\/p>\n<p>So now, let\u2019s give this predictor\u00a0<em>all<\/em>\u00a0the salaries from our training set, and take the differences between the resulting predicted satisfaction ratings and the actual satisfaction ratings of the corresponding employees. If we perform a little mathematical wizardry, we can calculate, with very high certainty, that values of 13.12 for\u00a0\u03b8<sub>0<\/sub>\u00a0and 0.61 for\u00a0\u03b8<sub>1<\/sub>\u00a0are going to give us a better predictor.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-117\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/15.png\" alt=\"\" width=\"224\" height=\"23\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/15.png 224w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/15-215x23.png 215w\" sizes=\"(max-width: 224px) 100vw, 224px\" \/><\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-118\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16.png\" alt=\"\" width=\"1720\" height=\"1178\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-1024x701.png 1024w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-768x526.png 768w, 
https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-1536x1052.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-990x678.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-1320x904.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/16-379x259.png 379w\" sizes=\"(max-width: 1720px) 100vw, 1720px\" \/><\/p>\n<p>And if we repeat this process, say 1500 times, our predictor will end up looking like this:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-119\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/17.png\" alt=\"\" width=\"224\" height=\"23\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/17.png 224w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/17-215x23.png 215w\" sizes=\"(max-width: 224px) 100vw, 224px\" \/><\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-120\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18.png\" alt=\"\" width=\"1720\" height=\"1178\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-1024x701.png 1024w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-768x526.png 768w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-1536x1052.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-990x678.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-1320x904.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/18-379x259.png 379w\" sizes=\"(max-width: 1720px) 100vw, 1720px\" \/><\/p>\n<p>At this point, if we repeat the process, we will find that\u00a0\u03b8<sub>0<\/sub>\u00a0and\u00a0\u03b8<sub>1<\/sub>\u00a0won\u2019t change by any appreciable amount anymore and thus we see that the system has 
converged. If we haven\u2019t made any mistakes, this means we\u2019ve found the optimal predictor. Accordingly, if we now ask the machine again for the satisfaction rating of the employee who makes $60k, it will predict a rating of roughly 60.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-121\" src=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19.png\" alt=\"\" width=\"1720\" height=\"1178\" srcset=\"https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19.png 1720w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-300x205.png 300w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-1024x701.png 1024w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-768x526.png 768w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-1536x1052.png 1536w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-990x678.png 990w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-1320x904.png 1320w, https:\/\/www.pcube.tech\/blog\/wp-content\/uploads\/2020\/05\/19-379x259.png 379w\" sizes=\"(max-width: 1720px) 100vw, 1720px\" \/><\/p>\n<p>Now we\u2019re getting somewhere.<\/p>\n<p><strong>Machine Learning Regression: A Note on Complexity<\/strong><\/p>\n<p>The above example is technically a simple problem of\u00a0univariate linear regression, which in reality can be solved by deriving a simple normal equation and skipping this \u201ctuning\u201d process altogether. However, consider a predictor that takes input in four dimensions and has a variety of polynomial terms, for example something like:<\/p>\n<p>h(x) = \u03b8<sub>0<\/sub> + \u03b8<sub>1<\/sub>x<sub>1<\/sub> + \u03b8<sub>2<\/sub>x<sub>2<\/sub><sup>2<\/sup> + \u03b8<sub>3<\/sub>x<sub>1<\/sub>x<sub>3<\/sub> + \u03b8<sub>4<\/sub>x<sub>4<\/sub><sup>3<\/sup><\/p>\n<p>Deriving a normal equation for such a function is a significant challenge. Many modern machine learning problems take thousands or even millions of dimensions of data to build predictions using hundreds of coefficients. 
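For the simple salary example, the repeated tweak-and-remeasure loop described earlier can be sketched as plain gradient descent on the mean squared error. This is a hypothetical sketch with made-up training pairs; the post does not show its actual update rule or data:

```python
# Gradient-descent sketch of the iterative "tuning" described above:
# repeatedly nudge theta0 and theta1 to shrink the squared error.
# The training pairs below are illustrative, not the post's actual data.

def train(data, lr=0.0001, steps=1500):
    theta0, theta1 = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the (halved) mean squared error w.r.t. each theta.
        g0 = sum((theta0 + theta1 * x) - y for x, y in data) / n
        g1 = sum(((theta0 + theta1 * x) - y) * x for x, y in data) / n
        theta0 -= lr * g0
        theta1 -= lr * g1
    return theta0, theta1

data = [(30, 32), (60, 50), (90, 68), (120, 86)]  # (salary $k, rating)
t0, t1 = train(data)
```

With a learning rate small enough to keep the updates stable, each pass reduces the measured "wrongness", which is exactly the convergence behavior described above; in practice the inputs would also be rescaled so both parameters converge at a comparable rate.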
Predicting how an organism\u2019s genome will be expressed, or what the climate will be like in fifty years, are examples of such complex problems.<\/p>\n<p>Fortunately, the iterative approach taken by ML systems is much more resilient in the face of such complexity. Instead of using brute force, a machine learning system \u201cfeels its way\u201d to the answer. For big problems, this works much better. While this doesn\u2019t mean that ML can solve all arbitrarily complex problems (it can\u2019t), it does make for an incredibly flexible and powerful tool.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine Learning (ML) is coming into its own, with a growing recognition that ML can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems. ML provides potential solutions in all these domains and more, and is set to be a pillar 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":122,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/posts\/111"}],"collection":[{"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/comments?post=111"}],"version-history":[{"count":1,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/posts\/111\/revisions"}],"predecessor-version":[{"id":123,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/posts\/111\/revisions\/123"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/media\/122"}],"wp:attachment":[{"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/media?parent=111"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/categories?post=111"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pcube.tech\/blog\/wp-json\/wp\/v2\/tags?post=111"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}