Author Archives: James Young

Reflection on Independent Study

With this year's independent study presentation complete and the semester coming to a close, I have a project that works. Naturally, there's a lot more I wish I could have done. But I've learned a lot, not only about individual technologies, but also about how to plan a large project and schedule tasks.

Setting deadlines for certain portions of the project was helpful, but I did deviate from the schedule a bit in order to finish more important parts first or to handle prerequisite issues that turned out to be unexpectedly necessary. Some flexibility is certainly required when entering into a project without a working knowledge of the technologies at hand.

I love throwing myself into things that I know will be difficult, and that's kind of what I was going for in this project. Turning big problems into smaller, manageable problems is one of the main reasons I enjoy software. But it's a balancing act, because I was essentially throwing myself into three big problems: learn signal processing, learn machine learning, and implement a mobile application. There were a few weeks of struggling with new technologies for more hours than I had planned, and knowing that I was barely scratching the surface of only one of these big problems caused considerable stress. At times, I thought there would be no way I could get a single portion done, let alone all three.

But somehow each week I got over a new learning hump just in time to implement my goal for the week, while concurrently doing the same for my capstone sprints. Deadlines are a beautiful thing. I did a speech in my public speaking class in my first year that discussed Parkinson’s law, which says work will expand to fill the time available for its completion. This idea has followed me ever since and has proven to be true.

In preparing my presentation over the last couple of weeks, I found a couple of issues and had a couple of realizations about how I could do things differently (read: better). I implemented some of them as I went along, and I was tempted to completely revamp the machine learning model before my presentation. Instead of the inevitable all-nighter that would have required, I managed to restrain myself and save it for the future. But this shows the importance of presenting your work as you go along, as one does in a Scrum work environment. Writing about and reflecting on issues and solutions in a simple way forced me to re-conceptualize things, both in my blog posts throughout the semester and in my final presentation.

While I had guidance from my advisor on how to approach and complete the project, planning and implementation were on me. There were definite pros compared to my capstone's team project. For example, I knew every change that was made, and I had to understand all the working parts. Getting things done was mostly efficient because I only had to coordinate my own tasks. However, in my capstone I was able to bounce ideas off of team members who could provide a different perspective when we both understood the language or framework at hand. Delegating tasks also made it easier to completely understand the subtle details that allow for efficient use of a technology. Both of these experiences taught me transferable skills which I'll be able to use in future solo and team projects.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Capstone Sprint 3 Retrospective

The third sprint in my capstone was a race to presenting a working project. This sprint brought new difficulties in coordinating the team and getting work done in a logical order. The issue wasn't our teamwork, but rather trying to work around the pressure of getting the work done. Many issues took longer to resolve than expected, meaning we had to put a pause on some issues to help each other get prerequisite issues done.

My contributions

Set up CI for Angular to automate testing and prevent failed pipeline merges

Create Angular Service to update backend

Create Docker container for Spring Boot

Create Docker container for Angular

Establish communication between all three Docker containers (MongoDB included)

Retrospective

I was really happy this sprint with how our team members dropped what they were doing for a team meeting. We all had snags with almost ALL of our issues, and the order in which we had planned to do things was not possible. For example, while setting up the CI, I decided to explore Docker in a bit more detail so we could potentially use the same Docker image for the CI as we used in our project. This meant that we as a team had to focus on Docker sooner than expected.

At the same time, there were some problems while working on different issues in parallel. The biggest problem came from a couple of incidents of copying and pasting code from other branches to get it working in someone's own code, instead of waiting for the code to be merged. If someone else's issue is blocking you from working, the team should try to resolve that issue as soon as possible first, then pull the updated master branch. While copy and paste might seem easier in the short term, it has the potential to cause issues as people try to merge their own requests. For example, code was copied from one of my branches because a feature was required for another issue. Those changes were then merged from a different feature branch. My merge request was never approved because the changes were already on master. When it came time to merge one of my requests, it looked like I had made no changes compared to master, because my changes had erroneously already been committed in another branch. It was especially confusing, since changes had been made to my original code in the meantime.

This could have been solved if I had blocked other issues from being merged before my issue was merged, which is a feature of GitLab. However, there were also a couple cases where certain features stopped working because code was merged to master without consulting other team members while resolving conflicts. This led to some of the work from my issues being completely erased. Luckily it was easy to add back in thanks to version control, but the extra effort could have easily been prevented. Checking the git diff more carefully while merging would help in this effort.

Branch names were also confusing in a couple of cases. "Working_branch" followed by initials is not a useful name, although I understand wanting to signal to your teammates that it is your branch. Appending initials to a name describing the feature would be more useful for everyone. Even better, GitLab has options to prevent modification of a branch except through merge requests. You can name the branch after a feature, anyone can help you, and you can accept changes only if you want them. This also makes it easier to find a branch when helping your team members with issues.

These problems did teach us about features in GitLab we were missing out on. Our team could improve by following the GitLab workflow and maintaining consistent software development processes. Deviating from this workflow hurts productivity because other members have a certain expectation on how things are being done, and for example, shouldn’t have to check to make sure a past bug fix is still on the master branch.

Despite these issues we got a lot of work done, even though it came down to finishing the night before because of all of our snags. I had a great time with this team and although we’ll no longer be working on this project on a sprint team, I’d like to continue working with any of them who will continue to contribute to this project. We’ve all learned a lot about GitLab and the technologies we’ve used and have adapted well to the new workflow.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Navigation in Android Applications

With the semester coming to a close, I have been taking all of the working pieces I have built over the past few months and putting them together. This includes not only the Tensorflow code and the server, but also the Android application.

Having shown my current app to a few people to test the machine learning aspect, I was still presenting them with a clunky, ugly user interface with few features and poor-quality images. The user experience and interface are what I purposely saved for last.

The final app will showcase and describe digital signal processing techniques, as that was the focus of most of my work. As such, I had to begin setting up a way to navigate through different parts of the app and creating new "Activities" using the different "Fragments" I have created. This has been surprisingly smooth sailing so far. The tough part has been the navigation, because while it's possible to create buttons that simply open up new pages, Android has principles that should be followed and even a Navigation component that can help define the user flow of the application.

The reason for defining navigation principles is to facilitate a consistent user experience across Android applications. For example, I have personally pressed the "Up" button in an app expecting to be brought to the previous page, but instead was brought to the Android home screen. This is a sign of poorly maintained Activities, because previously visited pages should be on the back stack. If the Up button brings the user to the home screen, it means the back stack of previous Activities was cleared at some point.

But the user might enter a page using a deep link: going directly to a portion of the app that isn't the normal entry point. In this case, you still want the user to be able to return to the standard "previous" page. To assist with this in Android, you can define a navigation graph in XML that describes how the user moves between pages. This looks very much like a user flow diagram that should be created before implementing the app.

A top-level navigation graph
From https://developer.android.com/guide/navigation/navigation-design-graph

This also allows for nested navigation graphs, so that if one screen leads to two or more sections of the app, these can be defined in isolation and reused. In the image above, the "in_game" screen is its own navigation graph, which the "match" screen navigates to. If the match page had a "game_options" screen, this could be another defined graph that could be linked to. Furthermore, the game options could be reached from any other page just by linking to the defined graph, because once in the game options, or any other portion of the app for that matter, the possible paths a user can take should not change. All in the name of a consistent user experience.
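As a concrete illustration, a minimal navigation graph with one nested graph might look something like the sketch below. The IDs and class names are hypothetical, not taken from the image or from my app; they just show how a nested graph is declared and linked to.

```xml
<!-- res/navigation/nav_graph.xml (hypothetical example) -->
<navigation xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:id="@+id/nav_graph"
    app:startDestination="@id/titleFragment">

    <fragment
        android:id="@+id/titleFragment"
        android:name="com.example.app.TitleFragment">
        <!-- An action linking the title screen to the nested graph -->
        <action
            android:id="@+id/action_title_to_in_game"
            app:destination="@id/in_game" />
    </fragment>

    <!-- A nested graph, defined in isolation and reusable from any screen -->
    <navigation
        android:id="@+id/in_game"
        app:startDestination="@id/gameFragment">
        <fragment
            android:id="@+id/gameFragment"
            android:name="com.example.app.GameFragment" />
    </navigation>
</navigation>
```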

With more defined in XML and handled by the Android framework, less care must be taken to manually monitor and control navigation through the app. While the Navigation component is not required in Android, it vastly simplifies the process of adhering to Android's navigation guidelines.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Running Server Side Code and Serving Up Some Tasty Results

Storing files on a server from a mobile app is a nifty trick, but this week for my independent study I began running code on the server to extract audio features and make predictions on a spoken digit.

In its current state, my app allows a user to record an audio file. Once done, the file is uploaded to the server, which extracts the audio features and submits them to the machine learning model, which currently predicts a spoken digit. The server then allows the user to look at specific information about the audio file: a graph of the certainty of which digits were spoken, the waveplot, and the MFCC features.

This basic framework allows room for growth in the future. First, I have been taking care to design the app to easily add additional features for the user’s viewing pleasure and plan on adding a spectrogram and FFT this week. Second, the machine learning model is currently trained on MFCC features only, but this can be retrained to work better using other features. And although it currently only guesses spoken digits, additional models can be trained to make a more complex system to analyze different kinds of audio data with different applications.

The biggest issue with what I’ve wanted to do in this project has been finding datasets large enough to train a model. I’d love to extend the features of the machine learning aspect of this app, but unfortunately the amount of work required is way out of scope for a single person in a single semester. Although there are many large human speech datasets, training a model in a supervised manner would require hours of manually labeling the data.

Luckily, I’ve learned enough about signal processing to make that a main aspect of the project. And as I said at the beginning of the semester, my main goal was to gain experience in the Android framework and software development in general. Having to overcome unexpected challenges and find creative ways to approach them has probably been the most important learning experience in this project.

I also continue to be reminded of the importance of knowing the shape of your data and what it actually represents before trying to work with it. MFCC features just aren't displayed in the same way as a spectrogram or a waveplot, so each of these requires special considerations in plotting and, in the future, in training machine learning models with them.
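To illustrate the point, here is a rough sketch (not my actual plotting code) of how each feature type ends up needing its own treatment with librosa and matplotlib; the file name is a placeholder.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("recording.wav", sr=None)  # placeholder file name

fig, axes = plt.subplots(3, 1, figsize=(8, 9))

# Waveplot: amplitude over time
# (librosa.display.waveplot in older versions, waveshow in newer ones)
librosa.display.waveshow(y, sr=sr, ax=axes[0])
axes[0].set_title("Waveplot")

# Spectrogram: frequency content over time, usually shown on a dB scale
S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="log", ax=axes[1])
axes[1].set_title("Spectrogram")

# MFCCs: coefficient index on the y-axis rather than frequency
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
librosa.display.specshow(mfccs, sr=sr, x_axis="time", ax=axes[2])
axes[2].set_title("MFCC")

plt.tight_layout()
plt.show()
```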

And to finish, I'd like to describe my biggest issue of the week. I had to determine how I wanted to get the data to a user after running server-side code. The naive approach would be to send all the data at once as a response, but not only would this take a long time, the user might not even want all of it. Instead, I send an HTTP request to get a JSON object of metadata for a given audio recording. This contains all the extracted features, each with a link for download, if desired. Then, the app itself can determine whether they should be downloaded. In my case, I currently have an interface that handles the API calls and passes back each file download link individually in a callback method when the HTTP request is successful. The app displays each link as it is received.
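A rough sketch of that metadata-first approach is below. The route names, fields, and hard-coded values are hypothetical, just to show the shape of the response the app parses; the real server builds this from the results of feature extraction.

```python
import os
from flask import Flask, jsonify, send_from_directory

app = Flask(__name__)
FEATURE_DIR = "features"  # assumed directory of per-recording output files

@app.route("/recordings/<recording_id>/metadata")
def get_metadata(recording_id):
    # Hard-coded example values; the real response is built from the
    # feature-extraction and prediction results for this recording.
    base = f"/recordings/{recording_id}/files"
    return jsonify({
        "recording": recording_id,
        "prediction": {"digit": 3, "certainty": 0.97},
        "features": [
            {"name": "certainty_plot", "url": f"{base}/certainty.png"},
            {"name": "waveplot", "url": f"{base}/waveplot.png"},
            {"name": "mfcc", "url": f"{base}/mfcc.png"},
        ],
    })

@app.route("/recordings/<recording_id>/files/<filename>")
def download_feature(recording_id, filename):
    # The app requests individual files only if it decides to download them
    return send_from_directory(os.path.join(FEATURE_DIR, recording_id), filename)
```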

This week I also had to refactor an old project for an assignment and chose my first attempt at a Scrabble game in Python. The contrast between that one and this one was a reminder of the tools I’ve picked up over the past 4 years. I never would have been able to juggle this many different technologies and still understand the architecture without the help of many software engineering concepts.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Capstone Sprint 2 Retrospective

This second sprint brought with it some challenges in moving to online classes during the ongoing epidemic, but also a stronger grasp on communication and documentation using GitLab and Discord, both out of necessity and intentional effort. We have learned to work better with the LibreFoodPantry workflow and are ready to go into our next sprint with our REST API, database, and frontend all working in isolation.

My contributions

Create file with definition of done.

Change to the default .gitignore file for Spring and remove unnecessary tracked files from before we added the .gitignore.

Research internationalization support and decide that it is better saved until a more final version of the front-end is complete.

Research Angular Testing and create a Spike project that covers most cases we will encounter.

Integrate ID scanner to get Student ID with an Angular component and create tests to have 100% code coverage.

Retrospective

For the first half of the sprint, we were still having weekly meetings to work together. One of our troubles last sprint was that we were discussing things in person and not doing well at documenting the reasons for the decisions we made. We improved on this even while having in-person meetings. By the second half, although we were all coping with changes brought on by moving to online classes, we did well in keeping each other updated and communicating through GitLab. In hindsight, it's probably a good experience to be forced to do this, especially if this epidemic inspires more software companies to promote working from home.

The biggest issue we had as a team was working with merge requests. There were a couple cases where code on a feature branch was not kept up to date with the master branch. As a result, there were a lot of merge conflicts to work together on resolving as a team. Overall, working through these together as a team was a good experience, because this is bound to happen when working in tandem with version control. However, now we will be reminding ourselves to pull changes from origin/master as we are working on our local branches.

We also improved at creating merge requests for each individual feature, although this took a few weeks for us all to do efficiently. GitLab has a great feature where you can tightly bind an issue to a merge request, but this caused a couple of problems for me. When the merge request is accepted, the issue is automatically closed. This messes with our workflow, because we want issues to stay in the "done" column, to be closed only by the product owner. Moving forward, issues should still be linked with their merge requests, but we will have to take care that the description doesn't include a "Closes" tag.

Furthermore, when a branch is automatically made in GitLab, it creates a very verbose branch name, which is simply annoying if your Git isn’t configured to autocomplete branch names when pressing “tab”. In the future, I will create a new merge request and manually select my already-created branch. Then I will manually link the issue.

The team’s willingness to quickly meet over Discord about an issue we were having was the best thing about this sprint. In the few cases where something occurred outside of class time that required all of us, we were able to set up a time the same day or the next day and resolve the problem. This flexibility to schedule work within the sprint is what helped us get as much work done as we did.

The next sprint will involve combining our individual pieces into a working product that is capable of storing actual checkout transactions. There is still a lot to learn and to do, but we are well on our way to finishing a viable product that we are proud of, albeit with much room to grow in the future. We will have to pay close attention in the next sprint to creating well-written documentation as we combine our API, database, and front end so that future developers can easily recreate what we've done and get it running.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

When It’s Easier to Just Do Everything [More] Manually

Sometimes doing things the hard way is a lot easier. The more tools you use and the more complicated those tools are, the more complexity you have to deal with. So while it may be nice to call a few simple methods and have a framework do everything for you behind the scenes, you'll have to learn how the framework works, and you may realize down the line that it can't do everything you want it to do. There may even be incompatibilities with other parts of your program.

This week in my independent study, I tried to figure out how I could run a machine learning model on Android. I had some success, but quickly discovered some complications. Android has the option of using TensorFlow Lite, which seems great. However, I built my model using Keras, so I needed to convert the model. That was relatively straightforward, but before I started calling my model, I realized that I needed to extract audio features on Android. This required running Python code on Android, particularly Librosa and NumPy, which led me to look at other potential frameworks to get it all to run.
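For reference, the Keras-to-TensorFlow Lite conversion itself is only a few lines; something like the sketch below (file names are placeholders) was the straightforward part.

```python
import tensorflow as tf

# Load the model trained with Keras (placeholder file name)
model = tf.keras.models.load_model("digit_model.h5")

# Convert it to the TensorFlow Lite format that Android can run
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("digit_model.tflite", "wb") as f:
    f.write(tflite_model)
```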

Bundling those Python frameworks would lead to a bloated app, so I looked into Google Cloud services and thought about running server-side code there. I had already set up a way to upload and download files with Google Firebase, so this seemed reasonable. But it is a paid service, and it would have been even more work to make it functional.

I already have all the code running on my personal machine, so what if I just set up a server with a REST API to upload and download files and run the necessary Python code locally? If I could get that working, it would be trivial to call the code I’m already running.

Getting the server to upload and download files is what I did this week. I used Flask, which makes it very easy to get a basic server up and running. For the time being, data can only be transmitted over WiFi, as uncompressed audio files will be sent back and forth.
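A stripped-down sketch of that kind of Flask server is below; the route names are hypothetical, and the real version needs the error checking and security measures mentioned next.

```python
import os
from flask import Flask, request, send_from_directory
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"
os.makedirs(UPLOAD_DIR, exist_ok=True)

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # The Android client sends the recording as multipart form data
    file = request.files["file"]
    name = secure_filename(file.filename)
    file.save(os.path.join(UPLOAD_DIR, name))
    return {"status": "ok", "filename": name}

@app.route("/download/<filename>")
def download(filename):
    return send_from_directory(UPLOAD_DIR, filename)

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the phone can reach the server over local WiFi
    app.run(host="0.0.0.0", port=5000)
```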

While there was some additional work to figure out HTTP requests on Android, already knowing the basic building blocks gives me much more flexibility moving forward. But with great flexibility comes great responsibility, and proper error checking will be an important part of development moving forward. Security measures are also very important to consider before deploying an app to production.

The next iteration will involve running the machine learning code with a REST API call and getting back both the results of the model’s prediction and any data I will need to plot within the app.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Dig Deeper

For my final blog post on apprenticeship patterns, I wanted to discuss my favorite pattern. Software is so pervasive now that anyone can make a working product with little more than superficial knowledge of a language and a framework. This is great motivation to continue, but it may lead one to erroneously believe they are an expert. Finishing a product, even a successful one, doesn’t make you an expert programmer.

Digging deeper means going below surface-level knowledge of a technology and learning the nitty-gritty, bit-y details. The caveat is to not become too specialized. The book warns to keep your perspective of the project as a whole, and to learn only as much detail as necessary to help with a given task or problem.

I was originally taught to treat new classes as a black box, and I only found it frustrating once I graduated to more complicated tools. To truly understand how something is meant to work, you have to look inside. Another example: I’ve taken a few introductory classes that used metaphors to explain concepts and/or taught from the top down, adding detail over time. Biology class was boring and difficult because I had to memorize that a blue circle will separate the green, spiked lines so that two red hexagons can copy each of them. It wasn’t until high school, which provided an understanding of underlying chemical reactions, that biology became interesting and easy to remember.

So it is with software. I've been exceedingly frustrated with new tools when I tried to play without understanding. Sometimes, it works. Other times, when things begin to get confusing, diving in becomes a necessity. Another caveat: you don't know what you don't know, and if you assume you're doing it right, you may be wrong. Even if it works.

This is another pattern that requires balance. Learning details provides diminishing returns over time, but you should mostly understand why you need to do something a certain way, and how it is working. If you can explain this in simple words, you’re probably on the right track. This applies not only to software tools, but work processes as well.

You may not always agree with how a technology was designed. No one will tell you that the modern Internet is a perfect design, because it has been manipulated into working in a world it wasn’t designed for. Created in a world of text, it now works in a world of streaming video across billions of devices. This would never have been achieved without engineers and developers who understood the basic building blocks of the technology. Be like them.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Improving the Spoken Digit Speech Recognition Machine Learning Model

After getting a simple machine learning model to recognize spoken digits, I was able to begin the iterative process of improving the model. Using only MFCCs, the model was failing more than desired, reaching a maximum of 60% accuracy when using validation data (my own voice, which was not used in training the model).
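As a rough idea of what a single validation step looks like, here is a hypothetical sketch of extracting MFCCs from one of my recordings and asking the trained Keras model for a prediction. The file names, sample rate, and feature shape are assumptions for illustration, not the exact values I used.

```python
import numpy as np
import librosa
import tensorflow as tf

model = tf.keras.models.load_model("digit_model.h5")  # placeholder file name

def predict_digit(path, n_mfcc=13, max_frames=40):
    y, sr = librosa.load(path, sr=8000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate to a fixed number of frames so the input shape
    # matches what the model was trained on
    mfcc = np.pad(mfcc, ((0, 0), (0, max(0, max_frames - mfcc.shape[1]))))
    mfcc = mfcc[:, :max_frames]
    probs = model.predict(mfcc.reshape(1, -1))[0]
    return int(np.argmax(probs)), float(np.max(probs))

digit, certainty = predict_digit("my_voice_three.wav")
print(f"Predicted {digit} with certainty {certainty:.2f}")
```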

Below you will see plots of a sample of results from validating the model. For each digit, there are the extracted MFCC features, the actual spoken digit, the digit predicted by the model, and the certainty. There is also a plot of the certainty for each of the other digits for that recording.

This is just a sample of a larger validation set, and the actual results for this first model were only 45% accurate. But this shows that for all of these digits except 3 and 5, the model was 99% to 100% certain of the result. The differences in the MFCCs are subtle, but stark differences in color appear to be more likely to be correct, whereas 5 is clearly closer in color to 1, which it was mistaken for. Additionally, every single audio clip of 3 was mistaken for a 0 using this model.

Retraining the model with different parameters may help in this case, but we can also hypothesize about the reason for these mistakes. Perhaps the MFCC is finding patterns in vowels that make “zero” and “three” look identical. If that’s the case, features that can detect consonants might help improve results. This sounds pretty obvious anyway, so it might be a good next step on the next iteration.

But first, let’s retrain the model without any changes.

Okay! This 3 was very accurately predicted. But the total accuracy on validation was only 50% (remember, this only shows a sample size of 10). Inspection of the actual results now shows that 3 is sometimes mistaken for a 2, and vice versa. This model is slightly better, but still flawed. Which makes sense, because no changes have been made to the model and we just got lucky that it learned to be a bit better this time.

I’ve been training with 25 epochs, and getting 95-97% accuracy during training, and 93-97% accuracy using test data (from the same dataset as the training data, which was not used to train the model). Those results are pretty good, so maybe we can use fewer epochs and prevent some overfitting.

This certainly looks promising. With 95% accuracy during training, and 93.8% accuracy using test data, the results are still pretty good. However, the validation data with my voice is now 57.5% accurate! Only a single 3 was mistaken for a 0.

So I’m using a dataset of 4 voices to train and test, and my own voice to validate. But more data is probably better, so let’s use my voice to train the model and take a random sample to validate.

The plot is looking good! Each of these was very accurately predicted. Accuracy during training and on the test data was 97%. The validation data was 100% accurate. Of course, now that the validation data contains only voices that were used in training, it's more likely to be correct. Furthermore, the sample is small. So let's see what happens if we use a new voice to validate. I had my roommate record himself saying each digit and used only his voice for validation data.

In general, the model is much more certain of its guesses. The final validation result was 80% accuracy, so not perfect, but a major improvement. This much improvement came just from adding more data and making small modifications to the model.

The importance of collecting data in order to improve a model is apparent. Even with 80% accuracy, there is still some predictive power. If this can be found to be useful, further data can be collected as it is used and this new data can be cleaned and used to train better models.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

A Machine Learning Model That Recognizes Spoken Digits (Introduction)

This week, I managed to prove (to myself, at least) the power of MFCCs in speech recognition. I was quite skeptical that I could get anything to actually recognize speech, despite many sources saying how vital they are to DSP and speech recognition.

A tutorial on Tensorflow I found a couple of months ago sparked the idea: if 2-dimensional images can be represented as a 1-dimensional array and used to train the model, perhaps the same could be done with features extracted from an audio file. After all, the extracted features are nothing but an array of coefficients. So this week, armed with weeks of knowledge of basic Tensorflow and signal processing, I finally tried to get it to work. And of course, many problems arose.
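A minimal sketch of that idea is below: flatten the 2-dimensional MFCC array into a 1-dimensional vector and feed it to a small dense network, just as one would with flattened image pixels. The layer sizes and input shape are illustrative assumptions, not the exact architecture I ended up with.

```python
import tensorflow as tf

n_mfcc, max_frames, n_digits = 13, 40, 10  # assumed feature shape and classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_mfcc * max_frames,)),  # flattened MFCCs
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(n_digits, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X_train: rows of flattened MFCC arrays, y_train: the spoken digits (0-9)
# model.fit(X_train, y_train, epochs=25, validation_split=0.1)
```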

After hours of struggling with mismatches in the shape of the data, waiting for the huge dataset to reload when I made a mistake, and getting no results, I finally put together the last piece of code that made it run correctly, and immediately second-guessed the accuracy of the model (“0.99 out of 100, right???”).

Of course, when training a model, a result this good could be a case of overfitting. And indeed it is, because it is only 95% accurate when using separate test data. And even this percentage isn’t the whole story. The test data comes from the same dataset, which has a lot of recordings of each digit, but using only 4 voices. It’s quite possible that there are patterns found in the voices that would not exist in other voices. This would make it great using a random sample from the original dataset, but possibly useless for someone else. There’s also the problem of noise, which MFCC is strongly affected by. So naturally, I recorded my own voice speaking digits and ran it with the model. Unfortunately, I could only manage approximately 50% accuracy, although it is consistently accurate with digits 0, 1, 2, 4 and 6. Much better than chance, at least!

This is a very simple model, which allows you to extract only MFCCs from an audio recording of a spoken digit (0 through 9) and plug it into the model to get an answer. But MFCCs may not tell the whole story, so the next step will be to use additional extracted features to get this model to perform better. There is also much more tweaking I can do with the model to see if I can obtain better results.

I’d like to step through the actual code next week and describe the steps taken to achieve this result. In the meantime, I have a lot more tweaking and refactoring to do.

I would like to mention a very important concept that I studied this week in the context of DSP: convolution. With the help of Allen Downey’s ThinkDSP and related lecture, I learned a bit more detail on filtering of signals. Convolution is essentially sweeping one signal over another to get a new signal. In DSP, this is used for things such as low-pass filters and adding echo to audio.

Think of an impulse as an instantaneous tone consisting of many (or all) frequencies. If you record this impulse in a room, you will get a recording of the "impulse response". That is, how all of the frequencies are affected by the room over time. The discrete Fourier transform of this response is essentially a filter, because it gives the amplitude of each frequency in the impulse response, including all echoes and any muffling. Multiplying these amplitudes by the DFT of an entirely different audio signal will modify each frequency in the exact same way. And thus, to the human ear, this different audio signal will sound like it does in the same room. If this concept is interesting, I encourage you to watch the lecture and work through the examples in the book.
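A small sketch of that idea, using NumPy and SciPy with placeholder file names and assuming mono recordings, is below: multiplying the DFT of a signal by the DFT of an impulse response applies the "room" to the signal.

```python
import numpy as np
from scipy.io import wavfile

sr, impulse_response = wavfile.read("room_impulse_response.wav")
_, voice = wavfile.read("spoken_digit.wav")

# Length of the full (non-circular) convolution
n = len(voice) + len(impulse_response) - 1

# Convolution in the time domain is multiplication in the frequency domain
spectrum = np.fft.rfft(voice, n) * np.fft.rfft(impulse_response, n)
in_the_room = np.fft.irfft(spectrum, n)

# Equivalent but slower for long signals: np.convolve(voice, impulse_response)
in_the_room /= np.max(np.abs(in_the_room))  # normalize before writing
wavfile.write("digit_in_room.wav", sr, in_the_room.astype(np.float32))
```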

I think these topics may come in handy if I need to pre-process recordings, in the event that noise is in fact causing errors in the above model.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.

Learn How You Fail

All the big CEOs and entrepreneurs will give you trite advice on their LinkedIn pages: you need to fail a lot in order to succeed a lot.

I can’t be too hard on anyone who wants to get this message out, because I had to hear it at some point too. Failure meant incompetence to me, not part of the path to success. But what if failures don’t stop? The “Learn How You Fail” apprenticeship pattern attempts to solve the problem of lingering failures despite increased success.

This is likely due to something you are repeatedly doing that causes the failures. Home in on these behaviors, because changing them is the only way to stop failing. Furthermore, you may simply not be good at some things, and finding those things can prevent wasted time and effort.

A potential problem with this pattern is that a lot of failure is likely due to blind spots in our perception of ourselves, so it’s likely not easy to solve the problem of uncovering the cause of our failures. But as software developers, our skills in pattern recognition should be top notch. Carefully documenting your failures and how they connect should reveal the patterns.

A coworker asked me years ago what my chosen superpower would be. I answered that I would want to know everything. Omniscience. Besides briefly earning me the nickname “know-it-all”, this reflects my own desire to fix my unfixable flaws. Now, I do think significant effort should be put into things that you are especially bad at. But this pattern is a good reminder to let go of the things that you’ve tried to do and simply haven’t worked out.

Success in many cases is about figuring out what you are good at, not fixing all of your weaknesses. This is the reason the US President has a cabinet, Congress has subcommittees, companies have divisions, and software team members have roles. We can't all do everything.

My biggest issue with this pattern going forward will be striking a balance. There will always be a desire to learn as much as possible. Gaining more skill and broadening horizons may be tempting, but knowing when to give up in a specific area is the challenge. The CEOs who credit their success to past failures knew what to give up on within their chosen field. Almost always, this doesn’t mean choosing a brand new career; it means finding what you’re doing wrong in the current one.

From the blog CS@Worcester – Inquiries and Queries by James Young and used with permission of the author. All other rights reserved by the author.