Why test data management is needed for test automation
Whenever I hear the phrase “test data management”, I think of Hermey the Elf from Rudolph the Red-Nosed Reindeer, who wanted to be a dentist rather than making toys as one of Santa’s helpers.
His catchphrase was “I want to be independent!”
That should also be your goal as you’re writing your tests because one of the main causes of the test automation failures I’ve seen is poor test data management.
Your tests need to be able to run independently to avoid many issues that can arise if those tests make assumptions or rely on certain data being in a known state.
Tests need to run against different environments
Too many testers assume that the data their tests need will already be in whatever environment those tests run in. But if you’re like most testers, your tests need to run in several places: on a developer’s machine, in staging, in a CI environment, and so on.
I recently saw this in my own framework during a code review; a team had hardcoded a database key into their test.
The problem is that we need to run our tests in multiple environments, and this particular team’s tests kept failing in our CI environment, even though they ran fine in the team’s development environment. After taking a closer look at their test code, we noticed that they had hardcoded a table key that was not consistent across all the database backends we test against. The solution was simply to look up the record by its unique name instead of its database id, since the name will always be the same, no matter the environment.
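Here is a minimal sketch of that fix in Python, using the standard library’s sqlite3 module. The table name, column names, and the procedure_id_by_name helper are all hypothetical and only there for illustration; the team’s real schema was different. Two in-memory databases stand in for the development and CI backends, and the same name ends up with a different id in each.

```python
import sqlite3


def make_backend(names):
    """Build an in-memory database; ids are assigned in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE procedure_type ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO procedure_type (name) VALUES (?)",
        [(name,) for name in names],
    )
    return conn


def procedure_id_by_name(conn, name):
    """Resolve the id at run time from the unique, environment-independent name."""
    row = conn.execute(
        "SELECT id FROM procedure_type WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no procedure type named {name!r}")
    return row[0]


def name_for_id(conn, type_id):
    """Show what a hardcoded id actually points at in a given backend."""
    row = conn.execute(
        "SELECT name FROM procedure_type WHERE id = ?", (type_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical backends: same data, loaded in a different order.
dev = make_backend(["x-ray", "cleaning"])
ci = make_backend(["cleaning", "filling", "x-ray"])

HARDCODED_ID = 1  # happens to be "x-ray" on the developer's machine only

dev_id = procedure_id_by_name(dev, "x-ray")
ci_id = procedure_id_by_name(ci, "x-ray")
wrong_name_in_ci = name_for_id(ci, HARDCODED_ID)
```

The hardcoded id silently points at a different record in the second backend, while the lookup by name finds the right record in both.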
Tests need data to be in a known state
The hardcoded key is one of the most recent test data issues I’ve dealt with, but the more common scenario is that the environment doesn’t have data in the particular state that your test expects.
For example: say you’re testing a medical application and your test needs to check out a patient. There are some things that need to be in place, and in a certain state, in order for the test to run. Obviously, you need a patient. That patient also needs to be in a checked-in state and have had a procedure performed on them. Then and only then can you check them out. If you assume there will always be a patient in this state, your tests are eventually going to fail.
What happens if another test changes the state of the patient before your test runs? Your test will fail because another test changed the data you were expecting to be available.
Tests should be responsible for managing their own data
Each of your tests should be responsible for managing its own test data, which means that it is critical that each test set up the data it needs in order to run. This will also make your tests independent, in that they can run without having to rely on any other tests.
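As a rough sketch of what that looks like for the patient checkout example, here is some Python. The Clinic and Patient classes are hypothetical stand-ins for the application under test, and make_patient_ready_for_checkout is an illustrative helper name, not part of any real framework. The point is only that the test builds the patient it needs, in the state it needs, before it does anything else.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Patient:
    name: str
    checked_in: bool = False
    checked_out: bool = False
    procedures: list = field(default_factory=list)


class Clinic:
    """Hypothetical stand-in for the medical application under test."""

    def __init__(self):
        self._patients = {}

    def add_patient(self, name):
        self._patients[name] = Patient(name)

    def check_in(self, name):
        self._patients[name].checked_in = True

    def record_procedure(self, name, procedure):
        self._patients[name].procedures.append(procedure)

    def check_out(self, name):
        patient = self._patients[name]
        # Checkout is only valid for a checked-in patient with a procedure.
        if not patient.checked_in or not patient.procedures:
            raise ValueError(f"{name} is not ready to be checked out")
        patient.checked_out = True
        return patient


def make_patient_ready_for_checkout(clinic):
    """Create a brand-new patient and walk it into the required state."""
    name = f"test-patient-{uuid.uuid4().hex}"  # unique, so no test shares it
    clinic.add_patient(name)
    clinic.check_in(name)
    clinic.record_procedure(name, "x-ray")
    return name


def check_out_scenario(clinic):
    """The test itself: set up its own data, then exercise checkout."""
    name = make_patient_ready_for_checkout(clinic)
    patient = clinic.check_out(name)
    assert patient.checked_out
    return patient
```

Because the scenario never looks for an existing patient, it does not matter what other tests have done to the rest of the data before it runs.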
Running tests in parallel
Another important reason your tests should be able to create and manage their own test data is so you can run tests in parallel. Agile and DevOps demand more automation, and practices like continuous integration and delivery require automated tests that can be run quickly and reliably. So it’s critical that your test suite can be run as fast as possible, and that usually involves running tests simultaneously rather than in a slow sequential method.
Having each of your tests be independent, not relying on any other tests, will allow you to run them in parallel.
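To make that concrete, here is a small Python sketch that runs fifty copies of a test at the same time against one shared data store. SharedStore is a hypothetical stand-in for a shared database, and the thread pool from the standard library stands in for whatever parallel runner you actually use. Each test creates a uniquely named record, so no two tests ever touch the same data.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class SharedStore:
    """Hypothetical stand-in for a database shared by every test."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def create(self, key, status):
        with self._lock:
            if key in self._status:
                raise KeyError(f"{key} already exists")
            self._status[key] = status

    def update(self, key, status):
        with self._lock:
            self._status[key] = status

    def get(self, key):
        with self._lock:
            return self._status[key]

    def count(self):
        with self._lock:
            return len(self._status)


def independent_test(store):
    """Create this test's own record, change it, and verify the result."""
    key = f"patient-{uuid.uuid4().hex}"  # unique per test run
    store.create(key, "checked-in")
    store.update(key, "checked-out")
    return store.get(key) == "checked-out"


store = SharedStore()
# Run 50 tests across 8 worker threads, in no guaranteed order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(independent_test, [store] * 50))

all_passed = all(results)
```

If every test instead updated one fixed record, such as a single shared patient, the outcome would depend on which test happened to run last.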
Don’t send your test to the island of misfit toys – Make your test like Hermey
Seb Rose summed it up nicely in episode 53 of TestTalks when he said, “I can’t stress enough how fundamental independent scenarios/tests are to successful automated testing.” I couldn’t agree more.
So remember — you should design your tests to be like Hermey the Elf and be…independent!
Manage Test Data
A great resource on strategies for managing your test data is Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation by Jez Humble and David Farley. It has a whole chapter called Managing Data that I found very helpful and that you should check out.