Category Archives: Independent Study

When It’s Easier to Just Do Everything [More] Manually

Sometimes doing things the hard way is a lot easier. The more tools you use and the more complicated those tools are, the more complexity you have to deal with. So while it may be nice to call a few simple methods and have a framework do everything for you behind the scenes, you'll have to learn how the framework works, and you may realize down the line that it can't do everything you want it to do. There may even be incompatibilities with other parts of your program.

This week in my independent study, I tried to figure out how I could run a machine learning model on Android. I had some success, but quickly discovered some complications. Android has the option of using TensorFlow Lite, which seems great. However, I built my model using Keras, so I needed to convert the model. That was relatively straightforward, but before I could start calling my model, I realized that I needed to extract audio features on Android. That would mean running Python code on Android, particularly Librosa and NumPy, which led me to look at yet more frameworks just to get everything running.
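The Keras-to-TensorFlow Lite conversion step itself is short. This is only a sketch of it, assuming a trained Keras model object named model and a placeholder output file name:

import tensorflow as tf

# Assumes a trained Keras model object named `model`
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the converted model so it can be bundled with the Android app
with open("spoken_digits.tflite", "wb") as f:
    f.write(tflite_model)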

This would lead to a bloated app, so I looked into Google Cloud services and thought about running server-side code there. I had already set up a way to upload and download files with Google Firebase, so this seemed reasonable. But it is a paid service and would take even more work to make functional.

I already have all the code running on my personal machine, so what if I just set up a server with a REST API to upload and download files and run the necessary Python code locally? If I could get that working, it would be trivial to call the code I’m already running.

Getting the server to upload and download files is what I did this week. I used Flask, which makes it very easy to get a basic server up and running. For the time being, data will only be transmitted over WiFi, since uncompressed audio files will be sent back and forth.
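A minimal sketch of that kind of Flask server is below. The route names and the uploads directory are placeholders rather than my exact code, but the upload and download endpoints work the same way:

import os
from flask import Flask, request, send_from_directory
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"  # placeholder directory for incoming audio files
os.makedirs(UPLOAD_DIR, exist_ok=True)

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # Expects a multipart form with a "file" field containing the audio
    f = request.files["file"]
    f.save(os.path.join(UPLOAD_DIR, secure_filename(f.filename)))
    return {"status": "ok"}

@app.route("/download/<path:filename>", methods=["GET"])
def download(filename):
    return send_from_directory(UPLOAD_DIR, filename)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # reachable from the phone over local WiFi

On the Android side, the app just needs to POST a recording to the upload route and GET results back.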

While there was some additional work to figure out HTTP requests on Android, already knowing the basic building blocks gives me much more flexibility moving forward. But with great flexibility comes great responsibility, and proper error checking will be an important part of development. Security measures are also very important to consider before deploying an app to production.

The next iteration will involve running the machine learning code with a REST API call and getting back both the results of the model’s prediction and any data I will need to plot within the app.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Improving the Spoken Digit Speech Recognition Machine Learning Model

After getting a simple machine learning model to recognize spoken digits, I was able to begin the iterative process of improving the model. Using only MFCCs, the model was failing more often than desirable, reaching a maximum of 60% accuracy on validation data (my own voice, which was not used in training the model).

Below you will see plots of a sample of results from validating the model. For each digit, there are the extracted MFCC features, the actual spoken digit, the digit predicted by the model, and the certainty. There is also a plot of the certainty for every other digit for that recording.

This is just a sample of a larger validation set, and the actual results of this first model were only 45% accurate. But it shows that for all of these digits except 3 and 5, the model was 99% to 100% certain of the result. The differences in the MFCCs are subtle, but stark differences in color appear more likely to be correct, whereas the 5 is clearly closer in color to the 1 it was mistaken for. Additionally, every single audio clip of 3 was mistaken for a 0 using this model.

Retraining the model with different parameters may help in this case, but we can also hypothesize about the reason for these mistakes. Perhaps the MFCC is finding patterns in vowels that make “zero” and “three” look identical. If that’s the case, features that can detect consonants might help improve results. This sounds pretty obvious anyway, so it might be a good next step on the next iteration.
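As a rough sketch of what that next step might look like (not something I've implemented yet), librosa can extract features such as the zero-crossing rate, which tends to spike on noisy, unvoiced consonants like the "th" in "three", and these could be stacked alongside the MFCCs. The file name here is hypothetical:

import numpy as np
import librosa

y, sr = librosa.load("three.wav", sr=None)  # hypothetical recording of a spoken "three"

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
zcr = librosa.feature.zero_crossing_rate(y)              # high on unvoiced consonants
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)

# Stack everything into one feature matrix, one column per time slice
features = np.vstack([mfcc, zcr, contrast])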

But first, let’s retrain the model without any changes.

Okay! This 3 was very accurately predicted. But the total accuracy of validation was only 50% (remember, this only shows a sample size of 10). Inspection of the actual results now shows that 3 is sometimes mistaken for a 2, and vice versa. This model is slightly better, but still flawed. Which makes sense, because no changes have been made to the model and we just got lucky that it learned to be a bit better this time.

I’ve been training with 25 epochs, and getting 95-97% accuracy during training, and 93-97% accuracy using test data (from the same dataset as the training data, which was not used to train the model). Those results are pretty good, so maybe we can use fewer epochs and prevent some overfitting.
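Rather than hand-picking a smaller epoch count, one option is to let Keras stop training once the test loss stops improving. The snippet below is only a sketch, assuming a compiled Keras model named model and pre-split arrays x_train, y_train, x_test, y_test:

import tensorflow as tf

# Stop once validation loss hasn't improved for 3 epochs, keeping the best weights seen
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

model.fit(
    x_train, y_train,
    epochs=25,
    validation_data=(x_test, y_test),
    callbacks=[early_stop],
)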

This certainly looks promising. With 95% accuracy during training, and 93.8% accuracy using test data, the results are still pretty good. However, the validation data with my voice is now 57.5% accurate! Only a single 3 was mistaken for a 0.

So I’m using a dataset of 4 voices to train and test, and my own voice to validate. But more data is probably better, so let’s use my voice to train the model and take a random sample to validate.

The plot is looking good! Each of these was very accurately predicted. Accuracy on the training and test data was 97%. The validation data was 100% accurate. Of course, now that the validation data comes from the same voices the model was trained on, it's more likely to be correct. Furthermore, the sample is small. So let's see what happens if we use a new voice to validate. I had my roommate record himself saying each digit and used only his voice for validation data.

In general, the model is much more certain of its guesses. The final validation result was 80% accuracy, so not perfect, but a major improvement. This much of an improvement came just from adding more data and making small modifications to the model.

The importance of collecting data in order to improve a model is apparent. Even with 80% accuracy, there is still some predictive power. If the model proves useful, further data can be collected as it is used, and this new data can be cleaned and used to train better models.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

A Machine Learning Model That Recognizes Spoken Digits (Introduction)

This week, I managed to prove (to myself, at least) the power of MFCCs in speech recognition. I was quite skeptical that I could get anything to actually recognize speech, despite many sources saying how vital they are to DSP and speech recognition.

A tutorial on Tensorflow I found a couple of months ago sparked the idea: if 2-dimensional images can be represented as a 1-dimensional array and used to train the model, perhaps the same could be done with features extracted from an audio file. After all, the extracted features are nothing but an array of coefficients. So this week, armed with weeks of knowledge of basic Tensorflow and signal processing, I finally tried to get it to work. And of course, many problems arose.

After hours of struggling with mismatches in the shape of the data, waiting for the huge dataset to reload when I made a mistake, and getting no results, I finally put together the last piece of code that made it run correctly, and immediately second-guessed the accuracy of the model (“0.99 out of 100, right???”).

Of course, when training a model, a result this good could be a case of overfitting. And indeed it is, because it is only 95% accurate when using separate test data. Even this percentage isn't the whole story. The test data comes from the same dataset, which has a lot of recordings of each digit but uses only 4 voices. It's quite possible that there are patterns in those voices that would not exist in other voices. This would make the model great on a random sample from the original dataset, but possibly useless for someone else. There's also the problem of noise, which MFCCs are strongly affected by. So naturally, I recorded my own voice speaking digits and ran it through the model. Unfortunately, I could only manage approximately 50% accuracy, although it is consistently accurate with digits 0, 1, 2, 4 and 6. Much better than chance, at least!

This is a very simple model, which allows you to extract only MFCCs from an audio recording of a spoken digit (0 through 9) and plug it into the model to get an answer. But MFCCs may not tell the whole story, so the next step will be to use additional extracted features to get this model to perform better. There is also much more tweaking I can do with the model to see if I can obtain better results.
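To give a rough idea of the shape of this, here is a minimal sketch of the approach (not my exact code): extract 13 MFCCs per time slice with librosa, pad or trim every clip to the same number of frames, flatten the matrix into a 1-dimensional array, and feed it to a small dense network with ten outputs. The frame count and layer sizes are placeholder choices.

import numpy as np
import librosa
import tensorflow as tf

N_MFCC, N_FRAMES = 13, 44  # placeholder: every clip is padded/trimmed to the same shape

def extract_mfcc(path):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    mfcc = librosa.util.fix_length(mfcc, size=N_FRAMES, axis=1)  # uniform length for the model
    return mfcc.flatten()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(N_MFCC * N_FRAMES,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Padding every clip to the same length is one way to deal with the shape mismatches mentioned above.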

I’d like to step through the actual code next week and describe the steps taken to achieve this result. In the meantime, I have a lot more tweaking and refactoring to do.

I would like to mention a very important concept that I studied this week in the context of DSP: convolution. With the help of Allen Downey's ThinkDSP and a related lecture, I learned a bit more about filtering signals. Convolution is essentially sweeping one signal over another to get a new signal. In DSP, this is used for things such as low-pass filters and adding echo to audio.

Think of an impulse as an instantaneous tone consisting of many (or all) frequencies. If you record this noise in a room, you will get a recording of the “impulse response”. That is, how all of the frequencies are affected by the room over time. The discrete Fourier transform of this response is essentially a filter, because it gives the amplitude of each frequency in the impulse response, including all echoes and any muffling. Multiplying these amplitudes by the DFT of an entirely different audio signal will modify each frequency in the exact same way. And thus, to the human ear, this different audio signal will sound as if it were played in the same room. If this concept is interesting, I encourage you to watch the lecture and work through the examples in the book.
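Here's a hedged numpy sketch of that idea; the function and signal names are mine, not from the book:

import numpy as np

def apply_room(dry_signal, impulse_response):
    """Make a recording sound as if it were played in the room that produced the impulse response."""
    n = len(dry_signal) + len(impulse_response) - 1
    dry_spectrum = np.fft.rfft(dry_signal, n)       # DFT of the original audio
    room_filter = np.fft.rfft(impulse_response, n)  # DFT of the impulse response acts as a filter
    # Multiplying the spectra is equivalent to convolving the two signals in the time domain
    return np.fft.irfft(dry_spectrum * room_filter, n)

scipy.signal.fftconvolve does the same thing in one call, but writing it out makes the "multiply the DFTs" idea concrete.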

I think these topics may come in handy if I need to pre-process recordings, in the event that noise is in fact causing errors in the above model.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Analysis and Comparison of Ascending and Descending Scales

With added pressure in realizing the semester is half over, as well as an upcoming interview for a position dealing with DSP and machine learning, I came into this week with newfound motivation. The focus that comes with a little bit of pressure is paradoxically quite freeing.

I had some issues when attempting to compare features between audio files. In hindsight, it was an obvious mistake that I had already learned in theory. But of course, applying theoretical knowledge always reveals the points of weak understanding.

As I've written in the past, MFCCs (mel-frequency cepstral coefficients) are most common in speech processing. Time slices are taken from the audio file, and Librosa calculates the 13 coefficients commonly used for speech processing. The MFCC is an array of time slices, each represented by 13 coefficients. These are plotted below, with color representing magnitude (from dark blue to dark red), time slices on the y-axis, and coefficients on the x-axis. The waveform, MFCC delta, and chromagram are also plotted.
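The extraction itself looks roughly like this (a sketch with a hypothetical file name, leaving out the plotting code):

import librosa

y, sr = librosa.load("ascending_scale.wav", sr=None)  # hypothetical recording

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13 coefficients, time slices)
mfcc_delta = librosa.feature.delta(mfcc)            # frame-to-frame change in each coefficient
chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # energy in each of the 12 pitch classes over time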

The chromagram is of particular interest, as it shows which pitches are present over time, revealing that the scale on the left is ascending and the scale on the right is descending. You can even see where my finger slipped playing the descending scale.

Analysis of an ascending and descending scale

This shows the importance of scale invariance when comparing features, which will also come into play in machine learning. This is why frames of equal length, which usually overlap, are taken from an audio sample.

Originally, I was extracting features without cutting the audio files to the same length. This resulted in MFCC matrices of different sizes, and attempting to plot the difference between the features caused an error. Files of the same length, however, naturally resulted in two arrays of the same size. Because the originals were only slightly off, I wanted to be sure that my understanding was correct, so I made the ascending scale exactly half the length and ran the program again.

Indeed, cutting the first sample in half reveals that the resulting matrix has half as many MFCC time slices. Librosa extracts the first 13 mel-frequency coefficients, so each array has a length of 13, and each time slice has one of these arrays. Trying to find the difference by subtracting one matrix from the other results in this error message:

ValueError: operands could not be broadcast together with shapes (44,13) (87,13)
Analysis after cutting the ascending scale in half

Also notice that the chromagram only reveals four major frequencies. And because the chromagram is plotted over time on the same x-axis as before, the notes end at approximately the halfway point.
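The comparison itself boils down to something like the following sketch (the file names are hypothetical):

import numpy as np
import librosa

ascending, sr = librosa.load("ascending.wav", sr=None)
descending, _ = librosa.load("descending.wav", sr=None)

# Trim both signals to the shorter length so the MFCC matrices have matching shapes
n = min(len(ascending), len(descending))
mfcc_asc = librosa.feature.mfcc(y=ascending[:n], sr=sr, n_mfcc=13)
mfcc_desc = librosa.feature.mfcc(y=descending[:n], sr=sr, n_mfcc=13)

difference = np.abs(mfcc_asc - mfcc_desc)  # same shape, so the subtraction now works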

Plotting the absolute difference between MFCC features may not be visually illuminating, but it potentially has uses for pattern identification. The real utility comes from comparing an audio sample to existing files. Take a look at the ascending versus descending scales:

The absolute difference in MFCC features between ascending and descending scales

There is little difference in the higher coefficients, but some strong differences in the first coefficient. There are irregular differences through the rest of the plot, both in time and within coefficients. In isolation, this doesn’t reveal much. But when instead comparing two ascending scales offset by 0.1 seconds, the differences are very small. There are regular spikes in the first coefficient however, likely due to the earlier change of note in one sample.

The absolute difference in MFCC features between ascending scales, offset by 0.1 seconds

This lack of difference is one example of how a machine learning algorithm can detect whether an audio sample fits into a group. Actually training these models will be the topic for next week.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Creating Chords from Sine Waves in Python

This week as part of my independent study I worked on feature extraction in Python. I have been using Python for Engineers as a reference, and it describes basic digital signal processing. As an exercise, I expanded on the code found in that chapter.

It’s a non-trivial task to create a sine wave in code (although compared to the most complex aspects of DSP, it’s a cakewalk). A sine wave will create the purest tone possible, as it creates a constant oscillation. This oscillation is what you perceive as a pitch. The equation for a sine wave is given as:

y(t) = A sin(2πft + φ)

where A is amplitude, f is the frequency, t is time, and φ is the phase in radians. We can ignore phase because this simply indicates where the wave starts at t=0, and this doesn’t matter to our ear. We hear the same oscillation regardless of where it starts.

Take a look at the code for creating a sine wave. Some of the details aren't as important, but you can see the book for a description. The line that actually creates the sine wave values is:

sine_wave = [np.sin(2 * np.pi * frequency * x/sampling_rate) for x in range(num_samples)]

If you aren’t familiar with list comprehension in Python, this is just using the sine wave equation above, substituting time, t, as a specified number of samples divided by the sampling rate. The result is a list of values representing a sine wave. In reality, this is all an audio file is, with some additional encoding (and usually more interesting oscillations than a sine wave).

So what if we wanted to do this for a chord of multiple sine waves? Maybe using more list comprehension? Sure.

# Note Frequencies
a4 = 440
c5 = 523.25
e5 = 659.25
chord = [a4, c5, e5]

sine_waves = [[np.sin(2 * np.pi * freq * x/sampling_rate) for x in range(num_samples)] for freq in chord]

This is doing the same as above, only it's doing it for each frequency in a list of frequencies. But that's the easy part. The original code just multiplies the samples by the amplitude, then packs them as 16-bit integers and writes them to the file:

for s in sine_wave:
    wav_file.writeframes(struct.pack('h', int(s*amplitude)))

But we can’t simply do that for each sine wave in succession, or we’d get different sine waves playing one after another. That’s an arpeggio, not a chord!

So we have to get a little creative. But not too creative. If you think of each sine wave playing from a separate speaker, what you hear is the sum of the air pressure from each speaker. A single speaker is the same story: it’s just playing the sum of the three sine waves. So then, iterating through each index, we can add the amplitudes of each individual sine wave. I also had to reduce it enough to store the value as a short int, by dividing by two.

# Only write samples to the end of the shortest sine wave
shortest_sample_len = min([len(j) for j in sine_waves])
for i in range(shortest_sample_len):
    # Sum the i-th sample of each sine wave, then halve it to keep the value within short int range
    current_frequencies = [wave[i] for wave in sine_waves]
    value = sum(current_frequencies) / 2
    wav_file.writeframes(struct.pack('h', int(value*amplitude)))
Output of the sums of the sine waves before multiplying by amplitude

Python for Engineers goes on to describe how to use a Fast Fourier Transform to get the frequencies from the wave file with a single sine wave. But the code works just as well for the sine wave chord! This is because the FFT produces an array in which each index corresponds to a frequency, and the value at that index is that frequency's amplitude. This means that regardless of the number of tones in a sample, the FFT can be plotted and will reveal outliers: those indices with a much higher amplitude. These are the frequencies in the audio file.

One last thing to keep in mind is that since the indices are used to represent the frequencies, they will be whole numbers. For example, c5 = 523.25 Hz will show up as a spike at indices 523 and 524, with 523 having the larger value of the two.

Output of finding the frequencies of the chord created above
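That frequency-finding step boils down to something like this sketch, reusing the sine_waves, shortest_sample_len, and sampling_rate variables from above:

import numpy as np

# Sum the sine waves again (as in the loop above) to get the chord as one array
chord_samples = np.sum([wave[:shortest_sample_len] for wave in sine_waves], axis=0)

spectrum = np.abs(np.fft.rfft(chord_samples))
bin_width = sampling_rate / len(chord_samples)  # each FFT index covers this many Hz

# The three largest spikes correspond to the chord's frequencies
peaks = np.argsort(spectrum)[-3:]
print(sorted(peaks * bin_width))  # roughly 440, 523, and 659 Hz, depending on the clip length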

Full code for creating a chord in Python is posted here.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Independent Study: The Plateau, the Snag, and the Obstacle

Researching more on audio feature extraction last week gave me a lot to think about. As it settled in, I came up with great ideas for hypotheses to test and specific applications for them. The kind of ideas that lead to turning off your cell phone, skipping lunch and dinner, and getting things working.

Unfortunately, as mentioned last week, I do not have the time to learn audio processing algorithms to the extent that I can implement them, so I will have to make a choice on a library that will help. A further complication was getting code to run on Android. Libraries exist to run Python code on Android, but the Python libraries I've found for audio analysis are lacking in features, and I would need to use a combination of them.

The most comprehensive library I’ve found is Essentia. Of course, it’s so comprehensive I won’t be able to use the code for a professional app without a commercial license. Luckily there is a noncommercial license available that will allow me to get results for the project and determine if there are commercial applications.

Essentia is a C++ library. So the good news is I get to use JNI and the Android NDK (Native Development Kit) to run C++ code. Getting C++ code to run from an Android Activity is straightforward enough, but I do worry about potential complications in running Essentia. There are a number of dependencies that I fear might cause trouble with a feature I might want in the future. These are kept to a minimum with a special flag during compilation for Android. But alas, my paranoia strikes.

Because Essentia is open source, I am at least able to see implementations of the audio processing algorithms, and the code is well documented with references to studies. Signal processing is a degree, not just a semester-long project; I've certainly come to appreciate that fact over the past couple of weeks. But this will be a great overview, and using the code will still require understanding of the underlying processes.

Progress was made on converting files from audio to basic byte data. When I was considering Python libraries, I was under the assumption I could use WAV files and easily get byte data (for example, with librosa). Android’s MediaRecorder doesn’t support saving WAV files, so other formats must be used.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Audio Feature Extraction

This week I've continued working on the Android application itself, but I've also started to "dig deeper" (apprenticeship pattern blog post coming soon) and learn a bit more about audio processing.

I started with general concepts of feature extraction a couple of months ago now. Understanding how to use Python libraries to extract features is simple enough, but actually understanding how they work and why they work is another story. This week’s research has revealed an underlying elegance to the concept of signal processing and helped me reach a higher level of understanding and excitement for this project.

The single best explanation has been a video by YouTuber 3Blue1Brown on Fourier series. I recommend his entire channel because he has an elegant way of describing and visualizing every topic he covers. He helped me understand the beauty of calculus, and in my process of digging deeper I wound up watching his video on the uncertainty principle, which was surprisingly relevant to signal processing. Understanding the specifics of the math behind signals and waves, and recognizing that mathematical equations are a language for describing straightforward physical phenomena, is key. This knowledge makes daunting concepts easier to break down. Seeing the same concepts used in different contexts also helps solidify them in your mind. And if you're implementing this in code, it will make it much easier to remember the necessary logical steps required to extract a feature.

This entire tangent (and however useful, it was an unexpected tangent) started with trying to better understand the types of feature extraction that are used in speech recognition. By the way, you know you’re digging deeper when an article with an estimated read time of 11 minutes takes you a few hours to get through with all the additional research.

And universally, as far as I can tell, the first step in signal processing and feature extraction is the Fourier transform, which is simply turning a raw audio signal into separate sine and cosine signals. I say simply, but as the 3Blue1Brown video states, this seems a bit like figuring out which colors make up a mixed-up can of paint. It turns out, however, that clever math makes it quite obvious which summation of sine and cosine signals makes up a complex signal. I encourage you to watch the video to understand why.

The summation of cosine and sine waves is considered the frequency domain, while the original signal is in the time domain. From the resulting frequency domain, the individual signals can be normalized by taking the log magnitude of the signals and performing an inverse Fourier transform.

This is a new concept called a cepstrum, and it is one of many possible transformations you can make on a signal to begin to analyze the data. Its usefulness comes from the ability to see changes in individual waves. Additional operations can be performed to reveal new insights into patterns in a signal. Determining which of these works best is part of the process.
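The computation itself is short. Here's a hedged numpy sketch of a real cepstrum, following the description above:

import numpy as np

def real_cepstrum(signal):
    """Inverse Fourier transform of the log magnitude spectrum of a signal."""
    spectrum = np.fft.fft(signal)
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)  # small epsilon avoids log(0) on empty bins
    return np.fft.ifft(log_magnitude).real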

These individual transformations would be very interesting to implement in code. I may not get a chance to do so for this project, but the understanding of the underlying operations will help in using existing libraries.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Android Audio Recording and Playback, and an Animation False Start

This week, I spent a lot of time with animations in Android. And yet, I still could not get transitions to work to my liking. After reading Android’s docs, I resorted to a few different videos and tutorials, some of which seemed straightforward but were using methods of Android past.

The goal was to animate the title of a single audio file in a list so that it moves to the top of the screen and becomes a heading for the details about that file. Animations are easy enough if you're transitioning between Activities. But it appears that having a RecyclerView in a Fragment to list audio files adds some complications. I did successfully animate between the screens, but the first item in the RecyclerView's list was the one animated, and it abruptly changed to the correct title when the motion ended. The issue is that this transition animation requires two elements to have a shared "transitionName". Because I am using Card objects to display each of the audio files, only one Card can use that transitionName and be animated.

The solution is to set the name in the RecyclerView's ViewHolder, so that when objects are bound to a specific card, they get a unique transitionName. This can then be applied to the Fragment's View before the animation begins. Attempts to do this caused some problems due to Android's lifecycle. Many a blog post has been written on this subject, but I'd like to discuss it in the near future to gain a stronger understanding.

All of this is to say: I want to get from point A to point B efficiently by recognizing which subjects and features are most important. Understanding the Android lifecycle is clearly more important than an animation, and apparently prerequisite knowledge. And recording and playback are at the heart of the app itself. So my progress on animation is stashed in Git and ready for me to continue once I accomplish these other tasks.

Luckily, getting audio to record and play back was a much more enjoyable process. This might be due to my greater interest in the feature, but I was able to break down the problem and troubleshoot issues much easier.

I erroneously believed that my simple spike project would easily translate to my app. Android's guide to MediaRecorder and MediaPlayer made it simple to get something quickly up and running. However, using their code directly would create a nightmare of an Activity, which would neither properly separate concerns nor follow basic OOP principles. Furthermore, I needed the recording to begin immediately upon opening a new "RecordActivity". This caused some issues with Android's lifecycle, so I took the opportunity to explore that. The problem came from trying to start recording in onCreate(), which did not provide enough time to load the MediaRecorder into memory. The solution was to start recording in the onResume() event. However, this may be called more than once in the life of an Activity, so I simply check whether the MediaRecorder is currently recording, and start recording if it isn't.

I spent a bit too much time trying to find the recorded audio files in the phone's physical storage. It seems they do not appear there, and I haven't found a good explanation for this. Luckily, Android Studio's Device File Explorer did reveal that the files were saved and contained properly recorded audio.

From there, implementing audio playback as a Service (which in Android is essentially an Activity without a UI) was rather smooth. This also allows playback to be initiated from anywhere in the app by passing the file name in a single line of code.

I have always been a function-over-form kind of guy. I made significant progress on the app this week in the function department. Hopefully the form will come in time.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Android Activities and Fragments: Getting Them Straight

AndroidX has brought a few changes to the Android framework, but the general architecture remains the same. Likewise, the few years since I first learned Android have completely changed how I feel about it. At this point, I am developing the basic architecture of my independent study app.

There are a lot of conflicting opinions about Activities and Fragments in the Android Developer community. A few years ago with my limited Android experience, I did not completely understand. I likely don’t completely understand now. However, I have better tools and more programming experience to see how they should be used, as well as to make my own decisions on how to use them.

At first, I found myself paralyzed with confusion about how Google wants its developers to use Activities and Fragments. As an example of how programming concepts translate well to other technologies, learning Angular helped me understand the difference. An Activity is a single "thing" that a user does, and can be thought of as a web page. A Fragment should be used for a modular UI component, and functions as Components do in Angular.

This isn't a perfect analogy, as the frameworks are very different, but it is a good way to proceed when deciding how to structure your app. Google's Introduction to App Architecture guide is a great explanation, and the most important thing to remember is to maintain a separation of concerns. In the end, Activities and Fragments aren't a significant part of your app. They contain your app. They are something your app uses to work within the Android framework, and your business logic should live elsewhere, because Android will pause or stop any Fragments or Activities it needs to if, for example, memory is running low. There is no guarantee they will maintain state unless you take additional steps to ensure they do.

In researching opinions on how to use them, I saw people mention that some developers decide on a single Activity and add all features with Fragments. This is a tempting solution, but it might result in complex logic to control navigation. Furthermore, Fragments are meant to communicate through their parent Activity. In a large app, this would likely result in many implemented interfaces and complicated callbacks. Bloated Activities are a big no.

Likewise, others mentioned using only Activities and not adding complexity with Fragments. This seems a bit more reasonable, but it gives up the reusability that Fragments bring to the UI. Only one Activity can run at a time. The beauty of Fragments is that they can easily be dropped into a layout and reused. If they are designed to interact through their parent Activity, two Fragments can be shown at the same time on a larger tablet, even if they must be shown on separate screens on a phone. Creating only Activities would mean either writing new Activities with repeated code for a tablet, or reusing the phone UI at the expense of user experience.

I’ll reiterate: separation of concerns. In Android, or any framework, understand the philosophy behind a class and component before deciding to try to simplify things. It’s likely that they were designed to prevent the problems you will run into.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

CS-499: Independent Study Introduction

This semester, I’m building an Android app for an independent study.

The Proposal

After building a breadboard computer and beginning to understand electronics, I started to learn about audio electronics. This sparked (or reignited) a latent interest in audio processing. Working in a call center selling audio equipment is actually the reason I was motivated to return to school to study computer science, so I feel there is good reason to pursue it in my final semester.

I also began with Python and moved to Android apps early in my programming learning process, so I'd like to refresh these skills and dig deeper. This project will serve as a constant reminder of how far I've come from those early struggling days.

So the app will use Python machine learning libraries to analyze user audio data and provide the user feedback based on this data. I am purposely being vague; not because I think I have the next big idea on my hands, but because I expect many changes as I struggle with the machine learning model.

Regardless of where the model winds up, this is a software development independent study. I will have a working, professional app within the next 4 months, using the technologies I have proposed.

The Motivation

Why, though? As an independent study with an already-busy schedule, I'm going to have to set aside time each week to work on this project, no matter what. Originally, I wanted to take Robotics this semester and was signed up for it, but unfortunately there is not enough time in my schedule. On Tuesdays I'm sure I will find my mind wandering, dreaming of playing with robots instead of struggling through machine learning and Android Studio.

But that is part of my reasoning. I want to find the motivation to do things with a self-imposed deadline. These are tools I want to learn, to create and potentially sell a product. At the end of this degree, I want to be able to show a project to future employers and say, "this is what I did. Not because I had to, but because I enjoy it." I want to have users who give me unfiltered feedback. I want to fail, figure out why I failed, and eventually succeed.

Of course, I have done all of these to some extent already. But this is me following my current interests and goals.

The Progress

I have made a couple small spike projects to begin relearning Android and get started with Tensorflow. I have already built the back-end and gotten an app to communicate with it. I’ve also done basic user authentication.

When I first proposed this project, I set a schedule of features and tasks to complete. Due to other projects that used the same technologies, and a few flashes of motivation, I've already worked ahead a bit, but I still plan to complete each portion according to the schedule as best as I can. Work on the machine learning model will be concurrent as I adjust it.

Next week, I will go into more detail on the tasks I’ve completed so far.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.