What is stability testing?

It’s no secret that Android as a software platform has something of a quality problem. A recent Google study of Play Store reviews revealed that most one-star reviews are caused by stability or performance issues, namely crashes or freezes.

At the same time last year, Google also announced that it would start to reduce the visibility of poorly performing apps in the Play Store, based on data provided by its machine learning algorithms.

Image source: Google I/O 2017 presentation

As part of the Android Vitals initiative Google has listed common causes for poor performance:

  • App not responding
  • Crashes
  • Slow rendering (refreshing under 60 frames per second)
  • Frozen frames (a frame rendering that takes longer than 0.7 seconds)
  • Stuck partial wake locks
  • Excessive wakeups
  • Excessive background Wi-Fi scans
  • Excessive background network usage
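
The two rendering thresholds in the list above can be turned into a simple check. The following is a minimal sketch in Python, not part of Android Vitals itself: the function names and the sample render times are made up for illustration, and it assumes that a frame counts as slow when it takes longer than 1/60 of a second and that a frozen frame is reported only as frozen, not also as slow.

```python
# Thresholds taken from the list above; all names here are hypothetical.
SLOW_FRAME_MS = 1000.0 / 60.0   # a slower frame cannot keep up with 60 frames per second
FROZEN_FRAME_MS = 700.0         # 0.7 seconds


def classify_frame(duration_ms: float) -> str:
    """Label one frame render time as 'ok', 'slow' or 'frozen'."""
    if duration_ms > FROZEN_FRAME_MS:
        return "frozen"
    if duration_ms > SLOW_FRAME_MS:
        return "slow"
    return "ok"


def summarize_frames(durations_ms):
    """Count how many frames fall into each category."""
    counts = {"ok": 0, "slow": 0, "frozen": 0}
    for duration in durations_ms:
        counts[classify_frame(duration)] += 1
    return counts


if __name__ == "__main__":
    sample = [8.0, 16.0, 20.0, 35.5, 710.0]  # made-up render times in milliseconds
    print(summarize_frames(sample))
```

In a long test run, counts like these are most useful when tracked over time, since a growing share of slow frames points to degrading performance.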

With a closer look you’ll notice that all of these bad behaviors can be categorized as either stability, performance or battery consumption issues. The good news is that with proper testing it’s possible to catch all of these problems while the product is still in development.

How this applies to OEMs and Carriers

Android OEMs and Carriers that supply large amounts of software to the ecosystem should be aware of these potential issues as well. After all, the Android OS also consists of many, many system applications that are critical to user experience, like the launcher and call applications.

System apps are not ranked in the Play Store the way 3rd party apps are, but in a way bad behavior in them hurts the user experience more than it does in 3rd party applications: system apps are used more often, and they are more difficult or even impossible to restart to remedy a problem. Particularly severe issues may even cause customers to return their devices to stores.

The above chart shows all crashes, resets and freezes in system applications that were visible to the end user. The “score” in the chart starts at 100; 5 points are deducted for each reset and 2 points for each application crash or freeze. While this is of course an imperfect heuristic, it does at least provide a method to benchmark how different devices perform in the long run.
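
The scoring heuristic described above is simple enough to write down directly. This is a minimal sketch in Python; the function name is hypothetical, and since the text does not say whether the score is capped at zero, the sketch leaves it uncapped.

```python
def stability_score(resets: int, crashes: int, freezes: int) -> int:
    """Start at 100, deduct 5 per reset and 2 per application crash or freeze."""
    # Assumption: no lower bound is applied, so a very unstable device can go negative.
    return 100 - 5 * resets - 2 * (crashes + freezes)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(stability_score(resets=2, crashes=3, freezes=1))  # 100 - 10 - 8 = 82
```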

Functional testing is not enough

Functional testing is what we usually understand as “typical” testing. A developer makes a change to code and a test set is run through to make sure that the change did not break anything vital. Ideally you would like to keep the number of tests low so that the time to feedback is short.

In stability testing the tests are repeated for very long periods of time – think hours or days – to see how a product would really behave in the long run. Here we are less interested in individual test case failures and instead focus on finding the really scary problems that won’t manifest themselves in brief checks – crashes, degrading performance and increased battery drain.

Typically, most of the testing effort in a software development project is geared towards shallow functional testing, while little emphasis is placed on long-term testing that would give a better understanding of how the product really performs over time.

However, these two types of testing are complementary: functional testing finds the easy problems, and stability testing roots out the difficult-to-find bugs that would eventually have drastic consequences.

If you consider this from the point of view that testing is done to make sure that things work, then functional testing ensures that things work for the 1st time and stability testing makes sure that they will work on the 100th iteration just the same.
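
The difference between the 1st and the 100th iteration can be illustrated with a small loop. The sketch below is in Python and is only an assumption about how such a loop might look, not how any particular tool works; `run_stability_loop` and the toy `LeakyFeature` class are hypothetical names. It repeats one test until an iteration or time budget runs out, records failures instead of stopping at the first one, and keeps per-iteration durations so that slowly degrading performance can be spotted afterwards.

```python
import time


def run_stability_loop(test, max_iterations, max_seconds):
    """Repeat `test` and collect failures and per-iteration durations."""
    failures = []    # (iteration number, error description)
    durations = []   # seconds spent in each iteration
    start = time.monotonic()
    for iteration in range(1, max_iterations + 1):
        if time.monotonic() - start > max_seconds:
            break  # time budget used up
        began = time.monotonic()
        try:
            test()
        except Exception as exc:  # keep going: later iterations may fail differently
            failures.append((iteration, repr(exc)))
        durations.append(time.monotonic() - began)
    return failures, durations


class LeakyFeature:
    """Toy stand-in for a feature that only breaks after many uses."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls >= 100:
            raise RuntimeError("resource exhausted")


if __name__ == "__main__":
    failures, durations = run_stability_loop(LeakyFeature(), max_iterations=100, max_seconds=60)
    print(len(durations), "iterations,", len(failures), "failure(s)")
```

Run once, the toy feature passes, which is all a functional test would see. Only the repeated run reaches the failing 100th call.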

Selecting a test automation tool for stability testing

As a general rule, if a tool can be used for stability testing, it can be used for functional testing as well. Testing for stability does pose some extra requirements for a testing tool, though.

First, the tool itself should be robust enough and be able to scale in terms of testing time. Aside from actually being able to run tests for 100+ hours, the tool should also automatically recover from any kind of connection issues that might happen.
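
One common way to get this kind of automatic recovery is to retry a failed step after reconnecting, waiting a little longer each time. The following Python sketch shows the idea under stated assumptions: `run_with_reconnect`, `step` and `reconnect` are hypothetical names, and a lost device connection is assumed to surface as Python's built-in `ConnectionError`. It does not describe how any specific tool implements recovery.

```python
import time


def run_with_reconnect(step, reconnect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run `step`; on a connection error, wait, reconnect and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller record the failure
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
            reconnect()


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_step():
        # Made-up step that loses its connection on the first two calls.
        state["calls"] += 1
        if state["calls"] <= 2:
            raise ConnectionError("device disconnected")
        return "ok"

    print(run_with_reconnect(flaky_step, reconnect=lambda: None, base_delay=0.01))
```

The `sleep` parameter exists so that the waiting can be replaced in tests. In a run of 100+ hours the retry limit matters as well, because a device that never comes back should end the run rather than block it forever.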

Second, the tool should provide as much data as possible on any found bugs to support the developers who have to fix them. Logs or profiling data do not amount to much in functional testing, where bugs can typically be reproduced easily and quickly, but in stability testing we’re hunting for bugs that might take a while or need specific circumstances to happen. Arming your developers with as much data as possible from the moment a bug happened and from the time leading up to it can greatly reduce the time and effort it takes to fix the bug.
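
Keeping data from before a bug happened usually means buffering recent output continuously and saving it when something goes wrong. Below is a minimal sketch of that idea in Python, using a fixed-size buffer from the standard library. The class name, the capacity and the log lines are hypothetical, and a real tool would likely capture far more than plain log lines.

```python
from collections import deque


class RecentLogBuffer:
    """Keep only the most recent log lines so they can be saved when a bug occurs."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self._lines = deque(maxlen=capacity)

    def add(self, line):
        self._lines.append(line)

    def snapshot(self):
        """Return the buffered lines, oldest first, for example to attach to a bug report."""
        return list(self._lines)


if __name__ == "__main__":
    buffer = RecentLogBuffer(capacity=3)
    for number in range(1, 6):
        buffer.add(f"log line {number}")  # made-up log lines
    print(buffer.snapshot())  # only the three most recent lines remain
```

The fixed capacity keeps memory use constant however long the test runs, which matters when a run lasts for days.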

There are open source test automation frameworks and commercial test automation tools with graphical user interfaces for Android, but they rarely have logging or profiling capabilities suitable for long-term testing.

Profilence’s Tau tool was designed with these two requirements in mind and provides a good solution for Android OEMs, carriers and enterprise app developers alike who need to ensure their products are of high quality before launch.

If you’re developing an app that’s going to be published on the Google Play store, then you can take advantage of the Play Console that’s part of Google’s aforementioned Android Vitals initiative. The web service aggregates data on crashes, freezes and poor performance from those of your end users who have opted in to the service. You won’t have much data or reproducible scenarios for the bugs, but at least you’ll have a better understanding of what kind of problems your end users are seeing.