What is TestOps (How Microsoft Does it)

After reading the title of this article, you might be thinking, “Oh no, not another Ops!”

You’ve probably heard of DevOps, DevSecOps, AiOps, BizOps, DataOps, ValueOps, etc., but how about TestOps?

I learned all about it in a recent interview I had with Oren Rubin, the founder of Testim, and Maor Frankel, an engineer at Microsoft.

I know what you’re thinking. It’s just another marketing term.

But this one isn’t just a buzzword.

Large enterprises like Microsoft have successfully implemented TestOps to scale their automation testing efforts.

Read on to find out how.

What is TestOps?

In a nutshell, TestOps is all about automation at scale.

How can you continue to increase your automation test suite size and maintain your tests while not decreasing the speed of your testing efforts?

Getting all your automated tests to run correctly may not seem like an issue for a simple application with a small team, but what happens when a large number of people are working on your application and codebase while the application itself keeps growing?

Hopefully, before you get to this point you've thought about how you’ll handle that amount of scale and manage the number of developers working on the same code simultaneously.

On top of all this, many companies are pushing engineers to release software quicker and with higher quality.

Faster Release Cycles

With many firms moving towards continuous integration, continuous testing, and delivery, the need to increase the number of automated tests developed sprint after sprint has grown.

I know that since I began my testing career almost 25 years ago, web applications have been getting larger and larger.

The number of hands that handle testing applications is increasing, so it makes sense that we’d need tools to manage all these changes.

The Need for Test Speed

The speed at which teams are releasing software these days is staggering.

Oren mentioned that many companies want to be able to deploy continuously, which basically means that they want to release with a click of a button.

The developer changes a few lines of code, and a single click builds, tests, and deploys everything.

Maor mentioned that at Microsoft, they release versions to production every few minutes, which means that a new version in production has to be tested continuously to ensure software quality.
And that's when you must think about automation at scale differently.

Enter the need for TestOps CI.

The Four Features of TestOps

TestOps is the process that enables your teams to scale your large automation test suites across your organization, leveraging four core levers:

  • Planning
  • Control
  • Management
  • Insights

1. Planning

The key to testing in a CI/CD pipeline is to start testing before you get to a release.

And to do that you need to plan your test-related activities ahead of time.

You’d think this would be obvious, but you'd be surprised how often I talk to folks who tell me they didn't do any planning for their automation efforts.

Some basic questions to start asking in the planning phase are:

  • How can I plan who's doing what?
  • Do we have test automation SDETs to help?
  • Does your team have the technical skills needed?
  • How do we assign testing tasks to new features that need to be tested, and who should handle this?
  • Are there reusable components we can already use, and which ones do we need to build?

During the planning phase, Maor believes your teams should first determine whether your test should be a unit test or an end-to-end test.

Remember: the sooner and smaller, the better!

For instance, when you want to test a particular component, it would usually be a unit test.

If you want to test the integration between two components, usually a page or a feature, that would be a UI test that runs without a real back end, for instance by using mock data.

You might want to look into API Testing first, as well.

And when you want to test the integration between the front end and the back end, you probably want just a few full-blown end-to-end scripts that test the whole scenario.

It actually gets much more complicated than that, but that's the approach the team at Microsoft tries to follow.
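As a rough illustration of the middle layer described above, here is a minimal sketch of a feature-level test that swaps the back end for mock data. The `render_cart_summary` function, the `get_cart` method, and the cart data are all hypothetical names invented for this example; they are not from Microsoft's or Testim's code.

```python
from unittest.mock import Mock


def render_cart_summary(api_client, user_id):
    """Hypothetical front-end logic: format the cart returned by the back end."""
    cart = api_client.get_cart(user_id)
    total = sum(item["price"] * item["qty"] for item in cart["items"])
    return f"{len(cart['items'])} items, total ${total:.2f}"


def test_cart_summary_with_mock_backend():
    # The real back end is replaced with mock data, so only the feature is exercised.
    fake_api = Mock()
    fake_api.get_cart.return_value = {
        "items": [{"price": 2.50, "qty": 2}, {"price": 1.00, "qty": 1}]
    }
    assert render_cart_summary(fake_api, user_id=42) == "2 items, total $6.00"
    # Verify the feature asked the back end for the right user's cart.
    fake_api.get_cart.assert_called_once_with(42)
```

Because nothing here touches a network or a browser, a test like this can run on every change, leaving the slower end-to-end scripts for the few scenarios that need the whole stack.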

Also, planning doesn't just mean building new stuff.

It also encompasses knowing how to answer critical questions, like:

  • Which tests do we have technical debt on?
  • Which tests are flaky and need to be fixed?
  • How do we ensure that the test is really flaky and not a real bug?
  • Which tests should be quarantined because of a specific, known bug that was recently introduced?

You need to plan and manage these types of situations to ensure everyone knows what's going on, and once your team members are all on the same page, you need to find a way to control all your testing activities.

2. Control

Control enables the whole organization—not just testers—to own testing.

Whole-team ownership of testing is key to TestOps.

With it comes the need for control of your entire testing process, including unit testing, functional testing, performance testing, etc.

The word control often has a negative connotation, but in this context, it's a good thing.

You need to ensure that each contributor is also creating high-quality tests and code.

Given that intention, how can you create higher standards?

Code Reviews

Requiring code and test reviews can help ensure that nobody is merging directly to the main branch without checks.

Enforce reviews and scan people's tests for code duplications by asking, “Are you sure you want to add that instead of reusing the same logging method that somebody else used?”

Maor mentioned that at Microsoft, they use checklists their teams have to go over after they’ve written and completed a feature.

Controlling, Not Delaying, the SDLC

Controlling the SDLC is a balancing act because you don't want to get to a point where you become a roadblock.

Generally, your teams should be allowed to manage themselves.

Allow them to review their own tests, code, and changes, and give them the freedom to decide what's essential for them.

That's the way they do it at Microsoft.

But to succeed with control, you should also have a solid management system in place.

3. Management

The third piece of TestOps is management.

Writing a testing framework for a team of five or ten people where everybody can communicate and manage themselves is one thing.

But doing it for dozens, sometimes hundreds of developers who can’t necessarily communicate with each other all the time is a challenge.

That's an entirely different thing to manage.

Making things more complicated is the shift left: testing earlier in the software development process.

Remember: when we talk about scale, it isn’t just the number of tests but also the number of times you're running them.

You're running them earlier and earlier.

You find the bugs much faster.

Management is a critical piece when you’re trying to scale, because you want to catch any bugs that you’ve introduced as soon as possible.

That must happen before you commit your code.

That means you have to start thinking about how to manage that, and how to move things to an earlier place in that process.

Selenium is not enough to scale automation

Selenium and Appium are outstanding APIs for writing tests, but they don't give you that whole infrastructure to manage and run your tests.

Maor said that's where Testim helped them.

Their application was written very quickly and didn't have automation from day one.

They were looking for something that would offer them the entire suite of testing tools they needed—Selenium grids, test management, user management, etc.

Testim gave them most of what they needed without them having to try to develop an in-house solution.

Using a vendor tool saved them a lot of time by providing the test management features they needed out of the box.

That's a big part of automation today: it's not just writing the tests but maintaining the whole process.

Test Ownership

The concept of test ownership is critical to effective management.

If a test fails, who owns it?

Having a system that automatically tracks who to contact when a specific test fails can be a huge time saver.

Being able to go directly to the individual who wrote a test to discuss possible errors and how to fix them is extremely helpful.
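One way to picture such a tracking system is a small lookup from test path to owner. This is only a sketch under stated assumptions: the path patterns and team names below are hypothetical, and the article does not describe how Microsoft or Testim actually stores ownership.

```python
from fnmatch import fnmatch

# Hypothetical ownership rules: the first matching pattern wins.
OWNERS = [
    ("tests/checkout/*", "payments-team"),
    ("tests/login/*", "identity-team"),
    ("tests/*", "qa-platform"),  # fallback owner for everything else under tests/
]


def owner_of(test_path, rules=OWNERS):
    """Return the owner to contact for a failing test, or None if no rule matches."""
    for pattern, owner in rules:
        if fnmatch(test_path, pattern):
            return owner
    return None
```

Listing specific patterns before the fallback keeps the rules easy to read, and a `None` result makes unowned tests visible instead of silently assigning them to someone.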

Maor said that using pull requests is also a big help for the folks at Microsoft.

A team member who’s not familiar with how to write tests can ask somebody to look at their test and review it before committing it.

Getting test feedback sooner rather than later has significantly improved the quality of the tests running in production.

Testim also employs a concept of branches, where developers can make their changes separately instead of putting them directly into the master version of the test.

This reduces the noise of false-positive test failures, ensuring a developer doesn't commit until they are 100% finished with what they were working on.

Another helpful feature is the ability to quarantine and evaluate tests.

Evaluating is helpful because new tests generally take some time to become stable.

Testim allows you to evaluate tests by having a known failing test run without failing the whole suite.

You can also quickly put the test in evaluation mode, let it run, and gain confidence with it over time before officially adding it to your test suite.

You can also quarantine a test.

Say, for example, a test has a problem or is failing due to a known bug the team is working on.
You can tag it as quarantined until someone fixes it.

The key is not having CI continuously failing.

These techniques build confidence in your tests and earn your team’s trust that a CI failure signals a real issue needing immediate attention.
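The quarantine and evaluation ideas above can be sketched as a result summary in which only failures of active tests break the build. The status names and the `Outcome` type are assumptions made for this example, not Testim's actual data model.

```python
from dataclasses import dataclass

# Assumed status labels; failures with these statuses never fail CI.
NON_BLOCKING = {"quarantined", "evaluating"}


@dataclass
class Outcome:
    name: str
    passed: bool
    status: str = "active"  # "active", "evaluating", or "quarantined"


def summarize(results):
    """Split failures into blocking and non-blocking; CI fails only on the former."""
    blocking = [r.name for r in results if not r.passed and r.status not in NON_BLOCKING]
    ignored = [r.name for r in results if not r.passed and r.status in NON_BLOCKING]
    return {"ci_passed": not blocking, "blocking": blocking, "non_blocking": ignored}
```

The non-blocking failures are still reported, so a quarantined test stays visible until someone fixes it rather than disappearing from the run.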

The last piece of TestOps is insights.

4. Insights

The whole point of continuous integration and continuous delivery is to act on feedback produced by delivering software to your customer quickly.

The way to do this is by using insights.

As automation engineers, we commonly talk about flaky automation.

One critical capability you should always have is being able to say, “Show me all my flaky tests.”

Things get complicated quickly when you have multiple “flaky” tests per run and you’re spending most of your time triaging random failures.
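A simple way to answer "show me all my flaky tests" is to look for tests that both passed and failed against the same code. The sketch below assumes CI history is available as `(test_name, commit, passed)` tuples; that record shape is a hypothetical choice for illustration, and real tools may use richer signals.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """runs: iterable of (test_name, commit, passed) tuples from CI history.

    A test is flagged as flaky if it both passed and failed on the same commit.
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in runs:
        outcomes[(test_name, commit)].add(passed)
    return sorted(
        {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    )
```

A test that fails consistently on one commit is not flagged, which helps separate a flaky test from a real bug introduced by that commit.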

Artificial Intelligence in Testing

AI insights can help.

AI can give you insights when troubleshooting failing tests by informing you that a bunch of your tests failed for the same reason. It can also alert you to the exact step in your application that is causing the issue.
This saves you an incredible amount of time. You don’t need to go over all the bad tests—just one.
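The grouping idea can be approximated without any AI at all. The sketch below is a plain text-normalization heuristic, not the machine-learning approach the article refers to: it strips volatile details such as numbers from error messages so that failures with the same root cause land in one bucket. The message formats are made up for the example.

```python
import re
from collections import defaultdict


def failure_signature(message):
    """Normalize an error message so volatile details do not split groups."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # memory addresses, ids
    sig = re.sub(r"\d+", "<n>", sig)  # timings, counts, line numbers
    return sig.strip().lower()


def group_failures(failures):
    """failures: iterable of (test_name, error_message). Returns signature -> tests."""
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[failure_signature(message)].append(test_name)
    return dict(groups)
```

With the failures grouped this way, you can investigate one representative test per signature instead of every failing test.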

If you’re able to view the history of a failed test, you’ll more than likely see that there’s a past bug report associated with it.

If someone’s already investigated it, you don't need to investigate that issue again for each test, because you can see the reports from previous runs.

That's a big help as well.

Test Automation Code Duplication

Having some insight into code duplication will also help you to scale your automation more efficiently.

But how do you guide your teams to use already-existing code?

If there are just ten tests and ten reusable components, it’s not going to be an issue.

If you're talking about a massive application with thousands of tests and thousands of reusable steps, however, how do you find the ones you need?

As you know, it's hard to scan thousands of tests manually.

Machine learning can make it easier to scan a thousand tests and find insights.

It can scan all the steps in your test and identify existing identical code that is already part of the framework.

Even better, it can alert you to the existing function you should use instead of you checking in duplicate code.
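To make the duplicate-detection idea concrete, here is a minimal sketch that uses exact matching on normalized steps rather than machine learning. It assumes each test is represented as a list of step strings, which is a hypothetical format chosen for illustration.

```python
from collections import defaultdict


def normalize(step):
    """Collapse whitespace and case so trivially different steps compare equal."""
    return " ".join(step.lower().split())


def find_duplicate_sequences(tests, window=3):
    """tests: dict of test_name -> list of step strings.

    Returns each run of `window` consecutive steps that appears in more than one test.
    """
    seen = defaultdict(set)
    for name, steps in tests.items():
        normalized = [normalize(s) for s in steps]
        for i in range(len(normalized) - window + 1):
            seen[tuple(normalized[i:i + window])].add(name)
    return {seq: sorted(names) for seq, names in seen.items() if len(names) > 1}
```

Any sequence this reports is a candidate for extraction into a shared step, such as a single login function that every test reuses.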

How cool would it be, before writing any code, to be notified that someone has already written a login step and prompted to see if you want to use the existing function?

As they say, more code, more headaches.

So less code, fewer headaches, and insights can help in this area.

Put TestOps Into Place

As we’ve seen, the speed at which you're expected to release software will only increase.

You’ll need a modern approach/process in place to help with quality assurance.

Getting a good TestOps center in place is a great starting point.

Also, run a proof of concept (POC) with a tool like Testim to see if it can help scale your automation efforts.
