139: Test Trend Analysis with Hugh McCamphill

5 February 2017, 08:42 AM

By Test Guild

Do you have a hard time getting historical data for your test runs? Do you have flaky tests, but aren't sure which tests are consistently flaky? In this episode, Hugh McCamphill, a Principal Software Engineer in Test with Liberty IT in Belfast, shares how you can instrument your test result information to provide actionable data, paving the way for more robust, reliable and timely test results. Listen now to discover how you can move away from the ‘re-run’ culture and better support your continuous integration goals of having quick, reliable, deterministic tests.

About Hugh McCamphill

Hugh works as a Principal Software Engineer in Test with Liberty IT in Belfast. He has over ten years of experience in testing and automation, ranging from working as a technical tester and automation engineer, to developing automation frameworks, to providing training and coaching that help development teams test more effectively. He has been the organizer of the Belfast Selenium Meetup since 2013, and is a co-organizer of TinyTestBash Belfast 2017.

Quotes & Insights from this Test Talk

I'll just make a comment on what I talked about with the re-run culture. That was part of the reason why I put this together using the visualization because I wanted people to [inaudible 00:03:18] on the second or third go. I was able to identify that this was a recurring problem across a period of time. You may say that a test passes this time, but the next time and it may pass the second time. If this keeps happening, then obviously it's suggesting that there's something wrong, and a question … A video, and a blog from Scott of Watermelon, I think, where they had an issue. They were in a continuous delivery environment. If the tests all went green, then they pushed to production, but they were set up to re-run the tests. In this particular instance, a test failed the first time. They re-ran the set of tests that failed, it went green, okay, push to production. What they found was that when the users started to use it, customers started losing sessions or crashing, and then they went in and had a closer look and found an obscure caching bug. Basically, it was something that was inherently wrong with the application, and the tests were highlighting it, but as Scott says, people hadn't really [inaudible 00:04:52] enough. The signal was there to say that something was wrong, but basically they were ignoring it.
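One way to surface the "passed on the second go" signal described above is to count, per test, how many runs failed first and only went green on a re-run. The following is a minimal Python sketch, not anything shown in the episode: the record layout, the sample data, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical history records: (run_id, test_name, attempt_number, passed).
HISTORY = [
    ("run-1", "test_login", 1, False),
    ("run-1", "test_login", 2, True),   # went green only on the re-run
    ("run-2", "test_login", 1, True),
    ("run-3", "test_login", 1, False),
    ("run-3", "test_login", 2, True),   # and again
    ("run-1", "test_search", 1, True),
    ("run-2", "test_search", 1, True),
    ("run-3", "test_search", 1, True),
]


def passed_only_on_rerun(history):
    """Count, per test, the runs whose first attempt failed but whose last attempt passed."""
    attempts = defaultdict(dict)  # (run_id, test_name) -> {attempt_number: passed}
    for run_id, test_name, attempt, passed in history:
        attempts[(run_id, test_name)][attempt] = passed

    counts = defaultdict(int)
    for (_run_id, test_name), results in attempts.items():
        first_attempt = results[min(results)]
        last_attempt = results[max(results)]
        if not first_attempt and last_attempt:
            counts[test_name] += 1
    return dict(counts)


def recurring_rerun_passes(history, threshold=2):
    """Return tests that needed a re-run to pass in at least `threshold` runs."""
    counts = passed_only_on_rerun(history)
    return sorted(name for name, count in counts.items() if count >= threshold)


print(recurring_rerun_passes(HISTORY))
```

A test that shows up in this list repeatedly is the kind of recurring signal the quote warns against ignoring, even though every individual pipeline run ended green.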
Partly based on a tweet from Noah Sussman around storing unit test results over time in Graphite, and then being able to visualize those unit test results from a time perspective. That combined with having discussions with some other folks within the company around being able to track some of this information over time. I thought we could do something similar except with our UI Tests, with our UI automation.
So, I looked around a little. I came across the ELK Stack. It was Elasticsearch, Kibana, that seemed to be quite a good fit. People liked its visualization capabilities and parsing capabilities once you have the data in there. But in terms of those tools, Elasticsearch is a time series database, so it would allow you to store data in a database associated with a timestamp. Part of the tooling under the same company is another tool called Kibana which acts as the visualization for Elasticsearch, so they integrate really well. As I mentioned in the talk, there are a lot of tools that would do something similar. I just went for this because it was something that was open source. I could get set up without much overhead or cost, but there are plenty of other tools that would do the same thing. I know I've spoken to other people who are doing something similar, actually with Graphite, for their UI Tests.
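To make "data associated with a timestamp" concrete, here is a minimal Python sketch of what a single test result might look like as a JSON document, together with the newline-delimited body that Elasticsearch's bulk endpoint accepts (one action line followed by one document line per result). The field names, the `ui-test-results` index name, and both helper functions are assumptions for illustration; the episode does not describe the actual schema Hugh used.

```python
import json
from datetime import datetime, timezone


def result_document(test_name, status, duration_seconds, finished_at, failure_reason=None):
    """Build one timestamped test result document (field names are hypothetical)."""
    return {
        # "@timestamp" is a common convention for the time field; it is an assumption here.
        "@timestamp": finished_at.astimezone(timezone.utc).isoformat(),
        "test_name": test_name,
        "status": status,
        "duration_seconds": duration_seconds,
        "failure_reason": failure_reason,
    }


def bulk_body(index_name, documents):
    """Serialize documents as newline-delimited JSON: an action line, then the document."""
    lines = []
    for document in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(document))
    # A bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


doc = result_document(
    "test_login",
    "failed",
    12.5,
    datetime(2017, 2, 5, 8, 42, tzinfo=timezone.utc),
    failure_reason="TimeoutException",
)
print(bulk_body("ui-test-results", [doc]))
```

The sketch only builds the payload; sending it, and defining dashboards on top of it in Kibana, is left out because those details depend on the installation.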
I think then when you get into top slowest tests and top failing tests, I could visually see that there were outliers. So on a suite of tests I would have reasonable expectations that most of them, based on that particular application, should roughly take the same time. There were a couple of outliers on this particular application, which suggested that there was going to be something more to why these particular tests were slower than all the others. Part of the challenge is that we can do the visualization, but then obviously we need to go and actually investigate what those issues are, but without the visualization it's much harder to see where those issues might be in the first place.
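The same outlier check can be approximated without a dashboard. The Python sketch below takes the median duration of each test across runs and flags tests whose median is well above the suite's median. The sample data, the factor of 3, and the function names are assumptions chosen for illustration, not values from the episode.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (test_name, duration_seconds) records collected across several runs.
RECORDS = [
    ("test_login", 4.0), ("test_login", 5.0), ("test_login", 40.0),  # one odd slow run
    ("test_search", 5.0), ("test_search", 6.0), ("test_search", 5.5),
    ("test_checkout", 30.0), ("test_checkout", 31.0), ("test_checkout", 29.0),
    ("test_profile", 4.5), ("test_profile", 5.0), ("test_profile", 5.5),
]


def median_durations(records):
    """Median duration per test, which damps the effect of a single unusual run."""
    by_test = defaultdict(list)
    for test_name, seconds in records:
        by_test[test_name].append(seconds)
    return {test_name: median(values) for test_name, values in by_test.items()}


def slow_outliers(records, factor=3.0):
    """Tests whose median duration exceeds `factor` times the suite-wide median."""
    per_test = median_durations(records)
    suite_median = median(per_test.values())
    flagged = [name for name, value in per_test.items() if value > factor * suite_median]
    return sorted(flagged, key=lambda name: per_test[name], reverse=True)


print(slow_outliers(RECORDS))
```

Using the median means `test_login` is not flagged for its single 40-second run, while `test_checkout`, which is slow on every run, is.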
I think at the very least, watch your slowest tests over time and watch your top failing tests over time and then combined with that other graph, which is possibly a bit easier to get, which is: Which tests are failing for the same reason. That way I think you can start to prioritize and deal with those first. And I think that we're not going to solve all the problems in one go, but I think if you can prioritize. Okay, these groups of tests are failing the most, okay, let's deal with those or these groups of tests are the slowest. That suggests that we need to dig in further. The other, I guess, point that maybe wasn't emphasized was that this information is being captured over time. It's not like we're looking at an individual run and drawing conclusions. We should be able to see those patterns over time, so if this test is consistently slow, or maybe there was just a single run where there was just a weird operation connecting to Sauce Labs or something, but by capturing it over time we should be able to filter those out.
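The two views mentioned here, top failing tests and tests failing for the same reason, reduce to simple counting and grouping once results are stored. This is a minimal Python sketch under assumed field names and sample data; the exception names are only examples of what a failure reason might contain.

```python
from collections import Counter, defaultdict

# Hypothetical stored results gathered over several runs.
RESULTS = [
    {"test": "test_checkout", "status": "failed", "reason": "TimeoutException"},
    {"test": "test_checkout", "status": "failed", "reason": "TimeoutException"},
    {"test": "test_checkout", "status": "failed", "reason": "StaleElementReferenceException"},
    {"test": "test_login", "status": "failed", "reason": "TimeoutException"},
    {"test": "test_login", "status": "failed", "reason": "TimeoutException"},
    {"test": "test_login", "status": "passed", "reason": None},
    {"test": "test_search", "status": "failed", "reason": "StaleElementReferenceException"},
    {"test": "test_search", "status": "passed", "reason": None},
    {"test": "test_profile", "status": "passed", "reason": None},
]


def top_failing(results, n=3):
    """Return the n tests with the most failures as (test, failure_count) pairs."""
    failures = Counter(r["test"] for r in results if r["status"] == "failed")
    return failures.most_common(n)


def failures_by_reason(results):
    """Group failing tests by failure reason so shared causes can be tackled together."""
    groups = defaultdict(set)
    for r in results:
        if r["status"] == "failed":
            groups[r["reason"]].add(r["test"])
    return {reason: sorted(tests) for reason, tests in groups.items()}


print(top_failing(RESULTS, n=2))
print(failures_by_reason(RESULTS))
```

Grouping by reason supports the prioritization described above: one fix for a shared cause can clear several failing tests at once.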
It goes back to my comment earlier about complex type Elements. If you're not dealing with a standard Select Element, what's the code that's been written to interact with that, to click on the button to the side, to find the particular DIV within that nested Element that corresponds to the value that you want to select from the DropDown. How do all those micro-interactions synchronize with each other in a way that's still quick, but still robust at the same time. That can be a hard balance to get. I know, I've been there. It gets awfully tempting to start putting sleeps in there and as I said before, I'm quite bullish about making sure those things are kept to an absolute minimum. So this is more general, but I think it's something that people possibly don't do enough. Even if you are not interested in committing to the Selenium project, there is so much value in going in and getting familiar with the underlying code because there's an awful lot of patterns. I think that if you are trying to build some functionality on top of Selenium, personally I think there's value in trying to maintain some of those patterns for consistency, but yeah, just generally a lot of good information about how some of the functionality works behind the scenes is contained right in the source code.
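On the point about keeping fixed pauses (sleeps) to a minimum, a common alternative is to poll for a condition and return as soon as it holds, which stays quick when the page is fast and robust when it is slow. The helper below is a generic Python sketch with a hypothetical name and default values; it is not Selenium's own code, although Selenium's Python bindings provide `WebDriverWait` for the same purpose.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds elapse.

    Returns the truthy value, or raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Short pause between polls; the total wait ends as soon as the condition holds.
        time.sleep(poll_interval)


# Usage with a stand-in condition that becomes true on the third check.
calls = {"count": 0}


def dropdown_option_visible():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(dropdown_option_visible, timeout=2.0, poll_interval=0.01)
```

In a real test, the condition would check the state of the nested element (for example, that the option DIV is displayed) instead of counting calls.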

Resources

Connect with Hugh McCamphill

Twitter: @hughleo01
LinkedIn: Hugh McCamphill LinkedIn Profile

May I Ask You For a Favor?

Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page.

Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

Sponsored by Sauce Labs

Test Talks is sponsored by the fantastic folks at Sauce Labs. Try it for free today!