This week, I read a blog post called “Netflix App Testing at Scale,” which is based on an interview with Ken Yee, a Senior Engineer at Netflix. It looks at how Netflix tests its Android app, one of the most widely used streaming apps in the world. With over a million lines of code, 400+ modules, and support for all kinds of devices (including foldables and Android Go phones), testing at Netflix isn’t just about making sure the app works; it’s about making sure it works everywhere. I chose this article because we’ve been covering testing frameworks and strategies in class, and this felt like the real-world version of everything we’ve been learning. I also use Netflix a lot, so it was interesting to learn how they keep it running smoothly through so many updates and features. This blog helped me connect the theory from class to an actual large-scale product.
Netflix used to have a separate team of SDETs (Software Development Engineers in Test), but now every feature team handles its own testing. That includes unit tests, screenshot tests, and end-to-end tests. They still have two SDETs who help across teams, but quality is everyone’s job now. I thought that was cool because it encourages developers to think about testing earlier and more often, rather than tossing it over to QA at the end.
The post also goes into the frameworks they use. For unit tests, they use tools like Strikt (for fluent assertions), Turbine (to help with Kotlin Flows), and Mockito (for mocks). They also use Hilt for dependency injection and Robolectric when they need to test Android-specific logic. What stood out to me was how conscious they are of performance: each layer of test framework (plain unit → Hilt → Robolectric → device tests) adds more time, so they encourage developers to keep tests as fast and simple as possible. That’s a great tip I’ll definitely remember for my own projects.
I also learned a lot from their section on flakiness. I hadn’t realized how much small, intermittent failures can undermine a test suite, or how much fixing them improves reliability. Finally, Netflix uses screenshot testing heavily. They use Paparazzi for Jetpack Compose UI, localization testing to check designs across different languages, and even visual accessibility checks. It was interesting to find out how much they care about accessibility and localization.
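To make the point about test layers concrete, here is a minimal sketch of the cheapest layer, a plain JVM unit test with no dependency injection, no Robolectric, and no device. It is not taken from the Netflix post: it is written in Java rather than Kotlin, and `TitleSource`, `WatchlistFormatter`, and the sample strings are hypothetical names invented for illustration. The only assumption is that the logic under test can be kept free of Android types.

```java
import java.util.List;

public class PlainUnitTestSketch {

    // Hypothetical dependency: stands in for whatever supplies data to a feature.
    interface TitleSource {
        List<String> fetchTitles();
    }

    // Hypothetical class under test: pure logic with no Android types,
    // so it can run on the plain JVM without Robolectric or an emulator.
    static class WatchlistFormatter {
        private final TitleSource source;

        WatchlistFormatter(TitleSource source) {
            this.source = source;
        }

        String summary() {
            List<String> titles = source.fetchTitles();
            if (titles.isEmpty()) {
                return "Your list is empty";
            }
            return titles.size() + " titles, starting with " + titles.get(0);
        }
    }

    // Tiny assertion helper so the sketch does not depend on a test library.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // A lambda acts as a hand-written fake: no mocking library or DI container needed.
        WatchlistFormatter empty = new WatchlistFormatter(() -> List.of());
        WatchlistFormatter two = new WatchlistFormatter(() -> List.of("Title A", "Title B"));

        check(empty.summary().equals("Your list is empty"), "empty list summary");
        check(two.summary().equals("2 titles, starting with Title A"), "two-title summary");
        System.out.println("all checks passed");
    }
}
```

Because the dependency is passed in through the constructor, a test at this level only needs a one-line fake. Heavier layers such as Hilt or Robolectric would be reserved for code that truly needs an injected graph or Android framework classes.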
This blog gave me a better understanding of how layered and thoughtful good testing needs to be, especially at scale. I’ll definitely use what I learned about speed, flakiness, and strategy in my future development work.
From the blog cameronbaron.wordpress.com by cameronbaron and used with permission of the author. All other rights reserved by the author.