In this post I wanted to dive into a little more detail on supervised versus unsupervised machine learning. Basically, as described in the last post in this series, the focus has to do with the variables that are known and can be observed. The detail below comes from Dataconomy. I’ll provide the link at the end of the article.
Supervised learning is the Data mining task of inferring a function from labeled training data.The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called thesupervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a “reasonable” way.
In Data mining, the problem of unsupervised learning is that of trying to find hidden structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution.
The Dataconomy link below goes into more detail with examples using fruit. The idea is that once you have successfully trained the machine to reproduce the expected responses, using labeled training data, then you can accurately (to some degree) predict future occurrences. Once the machine recognizes a thing, or some data, then when it sees the functions associated with that data again at some point, it will be able to somewhat accurately guess that this is once again an occurrence of the same “thing”. As one can easily tell, this is purely in layman’s terms. I am new to data science and my posts on this subject are from a selfish desire to keep applying what I am learning and to hopefully add some knowledge along the way.
So, above, we have trained our machine to let us know when it sees future occurrences of a particular fruit using supervised learning. Now, where we will most likely see errors, is when we have functions or features that are common to more than one type of fruit. In general terms, though, our guesses should suffice for this type of activity. What if we are not even sure that what we are looking at is a fruit? Not only could we be unaware of the “thing” we are looking at, but we might also not be able to determine any clear features to describe the “thing”. In this case, we do not know what we are looking at or how to describe it. In fact, we might not know that there is a “thing” at all. This is where unsupervised learning can come in to play. This to me, is the more exciting part of data science. Knowing I have a particular type of issue and looking for future examples of that issue, is perfect supervised learning application for an industry like information technology, but sometimes we will never have enough data to accurately define a problem exists or be able to describe it.
Using unsupervised learning, we can just use our data to tell us what is happening. It might take even more time to find an algorithm that provides us the best view of our data set and gives us any clarity. The thing to remember with the algorithms is that the math is always right. When we choose an algorithm to apply to our data set, it will perform the mathematical functions it was designed to reproduce. That does not mean that the answer provided by any particular algorithm will be helpful for a given scenario. This is where I see the “science” in data science really becoming useful and creative.
Here is a great video describing supervised and unsupervised learning:
- Dataconomy – What’s the difference between supervised and unsupervised learning
- Machine Learning Mastery – Supervised vs unsupervised learning
Like what you see here? Sign up for our newsletter