http://news.mit.edu/machine-learning-image-object-recognition-0918
Speech recognition systems usually require hundreds of thousands of transcripts in order to work properly. This new machine-learning model instead learns through audio-visual associations, much as a child does, by correlating speech with related images. The researchers then modified the model to associate specific words with specific pixels. It works by dividing the image into a grid of cells, each a patch of pixels, and dividing the audio into segments of the spectrogram. It then compares every image cell with every audio segment and produces a similarity score for each pair. The researchers call this comparison method a “matchmap”. One good use of this is learning translations between the world's languages. An estimated 7,000 languages are spoken worldwide, and only about 100 have enough transcription data for speech recognition. With this model, speakers of two different languages can describe the same image, and the machine can learn the speech signals of both languages and match the words that refer to the same things, treating them as translations of one another. This is interesting because it means the model does not require any written text to learn to translate. For languages that are not commonly written down, the machine could translate meanings where the methods in common use today cannot.
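The matchmap comparison described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' implementation: it assumes each image cell and each audio segment has already been turned into a feature vector of the same length, and it assumes the similarity score is a plain dot product. The function names `matchmap` and `best_cell_per_segment` and the toy one-hot vectors are made up for the example.

```python
import numpy as np

def matchmap(image_cells, audio_segments):
    """Score every audio segment against every image cell.

    image_cells:    array of shape (rows, cols, dim), one vector per grid cell
    audio_segments: array of shape (segments, dim), one vector per audio segment
    Returns an array of shape (segments, rows, cols) of similarity scores.
    Assumption: similarity is the dot product of the two vectors.
    """
    return np.einsum("rcd,td->trc", image_cells, audio_segments)

def best_cell_per_segment(scores):
    """For each audio segment, return the (row, col) of the highest-scoring cell."""
    segments = scores.shape[0]
    flat_best = scores.reshape(segments, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_best, scores.shape[1:])
    return list(zip(rows.tolist(), cols.tolist()))

# Toy data (hypothetical): a 2x2 grid whose four cells have one-hot features,
# and three audio segments that each match exactly one of those cells.
identity = np.eye(4)
image_cells = identity.reshape(2, 2, 4)
audio_segments = np.stack([identity[3], identity[0], identity[2]])

scores = matchmap(image_cells, audio_segments)
best = best_cell_per_segment(scores)
print(scores.shape)  # (3, 2, 2)
print(best)          # [(1, 1), (0, 0), (1, 0)]
```

In a real system the feature vectors would come from trained image and audio networks rather than hand-built one-hot vectors, and the grid and the number of audio segments would be much larger.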
This method is worth noting because machine learning is a growing area of computer science, and an approach like this could open up many possibilities. It is a new and innovative way to attack the problem, and it may become something we need to know for future jobs or projects. With the matchmap system, a speech recognition system no longer needs to be trained by hand on hundreds of thousands of transcriptions in order to function properly. This matters more and more because new words keep entering the dictionary and coming into common use over time. Currently the model can only recognize a few hundred words, but in the future it could help advance the machine learning field while also improving existing speech recognition software such as Siri.
From the blog CS-443 – Timothy Montague Blog by Timothy Montague and used with permission of the author. All other rights reserved by the author.