For this week's blog I will be covering a podcast from The Testing Show on machine learning, which can be found here. The podcast starts with a discussion of a recent tech story about Facebook's AI chat bots developing their own language and having to be shut down. To understand this story, the podcast goes into some detail about what machine learning is. The guest on the show, Peter Varhol, defined it as working on the feedback principle: we produce better results by comparing the results produced by the machine learning algorithm to the actual results, then feeding that difference back in to adjust the algorithm and produce incrementally better results (a rough sketch of this loop is shown below). After this they discuss what they think might have happened with the Facebook chat bots, moving through multiple views: intrigued, then skeptical, then really questioning whether the bots developed a language at all or just produced gibberish.
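To make the feedback principle a little more concrete, here is a minimal sketch of my own (not code from the podcast): a toy "model" with a single parameter makes a prediction, the prediction is compared to the actual result, and the error is fed back to nudge the parameter toward incrementally better output. The data and learning rate are made up for illustration.

```python
# Hypothetical training data: inputs paired with the actual results we want to match.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

weight = 0.0          # the single parameter our toy "model" learns
learning_rate = 0.05  # how strongly each piece of feedback adjusts the model

for step in range(200):
    for x, actual in data:
        predicted = weight * x               # the algorithm's current result
        error = actual - predicted           # compare it to the actual result
        weight += learning_rate * error * x  # feed the error back in to adjust

print(f"learned weight: {weight:.3f}")  # converges toward 2.0
```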
Although the podcast does not come to any big conclusion, the final view on the topic was the most thought provoking. That view questioned whether the chat bots really created their own language, and if so, how? The hosts talked about how language is semantic, and how it would be impossible right now for a computer to understand semantics, let alone create a new language. One host theorized that the chat bots created a language of gibberish to make up for the fact that they could not fully understand what was being asked of them semantically. I thought this was a really interesting point; I had never considered the semantics of language, and it shows how far computer scientists have to go to reach real artificial intelligence.
In the last part of the podcast they dive deeper into how machine learning works and some of its pros and cons. One pro is that we can set up a series of algorithms and iterative processes to achieve something we simply couldn't do on our own. The example given here is the Facebook chat bot again: even if Facebook released the source code for the bots, the algorithms would likely be too complicated for almost anyone to understand, so it would be hard to verify the bots' results. As you can see from the example, our pro also becomes our con, because once we get what seems like a reasonable result from our machine learning algorithm, it becomes very hard to trace that answer back through the code.
The last machine learning topic covered in this podcast is the difference between supervised and unsupervised learning. Supervised learning is pretty straightforward: we know the result from our training data and try to get a result close to that known result. Unsupervised learning is more difficult to explain; we don't know the expected result and are just trying to optimize something. The example given for this is online airline ticket sales, where the algorithm is built to maximize profits for the airline. Online ticket prices change roughly three times a day, and there is no exact formula for how the airlines set their prices; the algorithm is simply trying to optimize the amount of money made. A small sketch contrasting the two styles follows below.
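Here is my own rough illustration of the contrast, following the podcast's framing (the numbers, demand curve, and function names are hypothetical, not from the show): supervised learning measures predictions against known results in the training data, while the airline-style example has no known "right" price, only a quantity (revenue) to maximize.

```python
# Supervised: the training data includes the known result, and we score a
# predictor by how close its output comes to that known result.
labeled_data = [(100, 120), (200, 230), (300, 340)]   # (input, known result)

def supervised_error(predict):
    return sum(abs(predict(x) - known) for x, known in labeled_data)

print(supervised_error(lambda x: x + 20))  # smaller is better; 30 here

# Unsupervised, in the podcast's optimization sense: no expected result exists,
# so the algorithm just searches for the ticket price that maximizes revenue
# under a hypothetical demand curve.
def simulated_revenue(price):
    demand = max(0.0, 500 - 2 * price)  # made-up demand curve for illustration
    return price * demand

best_price = max(range(50, 250), key=simulated_revenue)
print(best_price, simulated_revenue(best_price))  # a price of 125 maximizes revenue
```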
This was a very interesting listen on a topic that I don't know a whole lot about, but which seems to be the way of the future in tech. I picked this podcast because it is an interesting topic that comes up all the time in articles and discussion boards, and I wanted more information about it, as well as how to test it. Although the podcast did not get around to talking about how machine learning is tested, there is a recently released second part which should go into more detail on that. This first part, however, gave a good overview of the topic and the issues that may arise, which gives you some things to think about in terms of how testing will be affected compared to testing a standard algorithm. I enjoyed this podcast quite a bit and will be listening to the second part to see how to go about testing machine learning; I will note my expectations for testing with machine learning before listening and see how they change afterwards.
From the blog CS@Worcester – Dhimitris CS Blog by dnatsis and used with permission of the author. All other rights reserved by the author.