105: Data Strategies in Testing with Paul Merrill

5 June 2016, 08:57 AM

By Test Guild

On today’s show we’ll be test talking about data strategies in testing with Paul Merrill, Principal Software Development Engineer in Test and Founder at Beaufort Fairmont, a consulting company dedicated to ridding the world of bad code.

Many of the testers I know have a testing or automation framework strategy, but most seem to overlook having a data strategy in place. It doesn’t matter what automation framework you’re using; if you haven’t planned how you will manage the data your tests need in order to interact with the system(s) under test, you’re in trouble.

A data strategy is a combination of procedure and infrastructure that affects the way tests interact with data to simulate your systems under test. That’s what today’s show is all about. Check it out.

About Paul

Paul Merrill is Principal Software Engineer in Test and founder of Beaufort Fairmont Automated Testing Services. Over the last 16 years Paul has led, managed and implemented software and testing solutions for numerous projects and industries.

Paul works with Beaufort Fairmont’s clients every day to implement automated testing solutions that will “rid the world of bad code”.

Paul co-hosts “Reflection as a Service”, a podcast about Software Development, Automated Testing and Entrepreneurism.

Quotes & Insights from this Test Talk on Data Strategies in Testing

The genesis of my data strategies in testing was that I was trying to get down into why it is that we always have to deal with this, and how we deal with isolating test cases in such a way that they can run in parallel and on multiple different machines, but still interact with shared resources. The key to it really was that shared resource. The first one that I’m trying to overcome is the idea of data strategies and dealing with data sources.
Refresh data source basically means we’re going to go in and refresh the data source prior to or after we run a set of test cases. It’s not a particularly ingenious approach. It’s one that many of us have used before. You can run that refresh against the data source directly or you can run it through the system under test.
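As a rough illustration of the refresh strategy described in this quote (not code from the show), here is a minimal Python sketch that runs the refresh directly against the data source. The `customers` table, the baseline rows, and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever shared data source a team actually uses.

```python
import sqlite3

# Hypothetical baseline rows; a real suite might load these from a fixture file.
BASELINE_CUSTOMERS = [(1, "Ada Example"), (2, "Grace Sample")]


def refresh_data_source(conn: sqlite3.Connection) -> None:
    """Reset the shared table to a known baseline (a direct refresh)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE IF EXISTS customers")
        conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO customers (id, name) VALUES (?, ?)", BASELINE_CUSTOMERS
        )


def customer_names(conn: sqlite3.Connection) -> list:
    """Return all customer names ordered by id."""
    return [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]


conn = sqlite3.connect(":memory:")
refresh_data_source(conn)

# A test case mutates the shared data...
with conn:
    conn.execute("DELETE FROM customers WHERE id = 1")
names_after_test = customer_names(conn)

# ...and the next refresh restores the baseline before the next set of tests.
refresh_data_source(conn)
names_after_refresh = customer_names(conn)
```

Running the same refresh through the system under test would replace the SQL above with calls to the application's own API, which exercises more of the stack at the cost of speed.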
With each client that we walk into, one of the questions that we ask when we start talking about automated testing is: how do you manage your data? What is it that you do in your test environments to get good data to test on, and what is the procedure to do it? If there’s a blank look on people’s faces, which many times there is, then we have to do some education and talk through this. That’s another part of where this presentation and some writing about it helps. It’s very important when you’re doing this work to be able to formalize your thinking in order to help other people understand, because that’s really all I’m trying to do with my company and whatever else: help other people do this more effectively. That’s the main reason I feel more folks should look into these data strategies in testing.
Basically, what I decided to do was have each test case create the data that it needed, which is not novel. Plenty of people do that. In fact, it was creating unique data each time a test case ran, and there was no cleanup whatsoever. If the data’s unique each time and you can keep that data in memory in order to go back and verify it, the only need for cleanup is caused by whatever the constraints in your environment are. If, for instance, your constraint is that the file system is only so big, that might be one that we need to pay attention to.
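The idea in this quote, unique data per test run with the expected values held in memory and no cleanup, could be sketched in Python roughly as follows. This is an illustration under assumptions, not Paul's implementation: `SHARED_STORE`, `create_customer`, and `unique_customer` are hypothetical stand-ins for a real system under test and its data.

```python
import uuid

# Hypothetical stand-in for the shared resource behind the system under test.
SHARED_STORE = {}


def create_customer(store: dict, name: str, email: str) -> None:
    """Pretend 'system under test' call that persists a customer."""
    store[email] = {"name": name, "email": email}


def unique_customer() -> dict:
    """Build customer data that will not collide with any other test run."""
    token = uuid.uuid4().hex
    return {
        "name": f"Test User {token[:8]}",
        "email": f"user-{token}@example.test",
    }


def run_test_case(store: dict) -> dict:
    # Keep the generated data in memory so it can be verified afterwards.
    expected = unique_customer()
    create_customer(store, expected["name"], expected["email"])

    # Verify using only the data this test created; nothing is cleaned up.
    actual = store[expected["email"]]
    assert actual == expected
    return expected


# Two runs against the same shared store never touch each other's data.
first = run_test_case(SHARED_STORE)
second = run_test_case(SHARED_STORE)
```

Because each run only reads back what it wrote, the same test can run in parallel or on several machines against one shared store. The store does keep growing, which is the environmental constraint the quote warns about.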
One tool that I love in C# and in some other languages (I think it’s available in Java and there’s something similar to it in JavaScript as well) is a tool called Faker. It’ll create things like names or phone numbers or addresses or company names, all that kind of stuff. It makes them look like real data and it’ll generate something randomly for you each time you use it. You can even use it with numbers and things like that. How you generate data is up to you with each of these strategies, but with data generation and batch cleanup, basically you’re just doing a batch cleanup at the end. You know what data you’ve created and you go in and clear it out after an entire test suite runs.
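A minimal Python sketch of data generation with batch cleanup, as described in this quote, might look like the following. It is only an illustration: the `CreatedDataTracker` class and the name lists are hypothetical, and a small standard-library generator stands in for a Faker-style library so the example stays self-contained.

```python
import random
import uuid

# Tiny stand-in for a Faker-style generator; a real suite would likely use a library.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Hopper", "Lovelace", "Hamilton", "Torvalds"]


class CreatedDataTracker:
    """Generates test records and remembers them for one cleanup pass at the end."""

    def __init__(self, store: dict) -> None:
        self.store = store
        self.created_keys = []

    def create_person(self):
        key = uuid.uuid4().hex  # unique key, so records never collide
        person = {
            "name": f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}",
            "phone": "555-" + "".join(random.choice("0123456789") for _ in range(4)),
        }
        self.store[key] = person
        self.created_keys.append(key)
        return key, person

    def batch_cleanup(self) -> int:
        """Remove only the records this suite created; return how many were removed."""
        removed = 0
        for key in self.created_keys:
            if self.store.pop(key, None) is not None:
                removed += 1
        self.created_keys.clear()
        return removed


# The store already holds data that the suite must not delete.
store = {"preexisting": {"name": "Seed Record", "phone": "555-0000"}}
tracker = CreatedDataTracker(store)

for _ in range(3):  # the "test suite" creates its own data as it runs
    tracker.create_person()
size_during_suite = len(store)

removed = tracker.batch_cleanup()  # one cleanup after the whole suite
```

Tracking keys at creation time is what makes the cleanup safe: only records the suite itself generated are removed, so data owned by other suites or people sharing the environment is left alone.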
My one piece of actionable advice for data strategies in testing is to determine the goals of your team with regard to test automation. Understand the surroundings in your environment and the forces that are acting on you. Then make a determination about what to do and go forward from there. I think so often we spend time just looking at technologies and tools to solve problems, but there’s so much more going on. Until you look at the entire scene to understand all the details and all the nuances going on with the people around you and the policy and whatever, the decision you make is usually inaccurate.

Resources

Paul’s Workshop at TISQA 2014: Integration Testing: Why, When, and How?
Data Strategies in Testing Slide Deck
Gang of Four’s book, Design Patterns
Faker – This library is a port of Ruby's stympy/faker gem (as well as Perl's Data::Faker library) that generates fake data.

Connect with Paul

May I Ask You For a Favor?

Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page.

Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.


Test Talks is sponsored by the fantastic folks at Sauce Labs. Try it for free today!