About This Episode:
As testers, we need to change with the times. What made sense to automate ten years ago may no longer apply today. In this episode, Gojko Adžić will share five universal rules for test automation. Discover why the test pyramid may no longer apply to modern test automation, and learn about some newer approaches to testing. Listen up and find out how to automate non-deterministic tests.
The Test Guild Automation Podcast is sponsored by the fantastic folks at Sauce Labs. Try it for free today!
Full Transcript Gojko Adzic
Intro:[00:00:02] Welcome to the Test Guild Automation podcast, where we all get together to learn more about automation and software testing, with your host, Joe Colantonio.
Joe Colantonio:[00:00:16] Hey, it's Joe, and welcome to another episode of the TestGuild Automation podcast. Today, we'll be talking with Gojko Adzic all about You're Testing What? and all kinds of topics around testing and automation. If you don't know, Gojko is a world-renowned consultant, author, and speaker. Some of my favorite books by him are Fifty Quick Ideas to Improve Your Tests, Specification by Example, and Running Serverless: Introducing AWS Lambda and the Serverless Application Model. Gojko is also a frequent speaker; you have probably seen him at multiple software development and testing conferences. And he is one of the creators of some of the cool software solutions I've been seeing lately, MindMup and Narakeet, which I think we're going to dive into a little today as well. So you don't want to miss this episode. We're going to be talking about a topic that Gojko expanded on at this year's Robocon, You're Testing What?, where he presented five universal rules for test automation that will help you bring continuous integration testing to the darkest corners of your system. You don't want to miss this episode. Check it out.
Joe Colantonio: [00:01:14] The Test Guild Automation podcast is sponsored by the fantastic folks at Sauce Labs, the cloud-based test platform that helps ensure your favorite mobile apps and websites work flawlessly in every browser, operating system, and device. Get a free trial: visit TestGuild.com/SauceLabs and click on the Exclusive Sponsor section to try it free for 14 days. Check it out.
Joe Colantonio: [00:01:42] Hey, Gojko, welcome back to the Guild.
Gojko Adzic: [00:01:46] Hello.
Joe Colantonio: [00:01:46] Awesome, awesome to have you back on the show. It's been I don't know, it's been years, I think since you've been on here. So it's great to have you. Gojko, is there anything I missed in your bio that you want the Guild to know more about?
Gojko Adzic: [00:01:55] No, no, that's perfectly fine. I kind of hate boring people with my bio anyway.
Joe Colantonio: [00:01:59] It's a killer bio. I'd be bragging about it everywhere. So it's really great to have you. You are an expert. As I mentioned, this talk is going to be kind of around some points you brought up at this year's Robocon. The first thing I was really struck by is that when you started off this presentation, you mentioned how there's been kind of a shift in testing, almost a paradigm shift, from where we used to use the test pyramid for deterministic testing to a more non-deterministic type of testing. So I just wonder if you can maybe start off with that point: how has testing changed from when we may have started to where we are now as testers?
Gojko Adzic: [00:02:33] I think what people usually do now with their applications and how they design them has fundamentally changed in the last 20 years, which is not surprising, because technology evolves all the time. Twenty years ago, maybe a very small portion of people depended heavily, at runtime, on other companies. And today, it's very, very rare to see a software package that is an isolated island on its own and doesn't work with anything on the Internet. As soon as we're using services and the Internet, we start depending a lot on changes that other companies might make without our involvement at all. One episode that really helped nail that in my head was when our system pretty much completely went down as a result of somebody at Google changing the sequence of events they sent out when we were connecting to the real-time Google Drive API a few years ago. And, you know, this happened without any of our deployments, without any real involvement from us. I was actually on a flight when that change happened in runtime, and when I landed, I had a very short connection and my phone was just overwhelmed with panic emails from our monitoring system, explaining that users were having all these problems in the front end. It took us a while to realize that basically there was a bug that happened after we tested and deployed, and we hadn't retouched that piece of software for a long time. So what I was getting at is that the test pyramid and things like that were generally amazing for an age where everything is under your control. In an age where major components of risk happen after deployment, and happen without any influence or control from the people working on a specific piece of software, we need to be able to deal with these things much more effectively.
And this is where testing small, isolated units of code and focusing on the speed of isolated testing doesn't really bring as many benefits, because these things only happen when the system is highly integrated and deployed in production.
Joe Colantonio: [00:04:59] So I guess what's scary is it was hard enough to do when it was deterministic, but now that it's non-deterministic, how do you know what's risky then? Is it always the areas that are out of your control that maybe you should start focusing on when you're doing the testing?
Gojko Adzic: [00:05:11] I think, you know, all the good practices we had looking at deterministic stuff are still there. Those risks are still there. It's just that there are some new and additional types of risks. And one of the things that is really interesting for me is to see how these trends of observability and testing in production are kind of meeting with traditional testing techniques, and how people who are in a quality role (now we can start throwing buzzwords around: is it quality control, is it quality assurance, is it reliability?) can really start merging these things, and lots of interesting overlapping concepts start appearing between site reliability engineering, testability, and observability. I think that's a very interesting space, and we'll see lots of practices cross-pollinating these communities. Looking at production risks and observability risks, there's a whole industry emerging there. From that segment of the industry we now have concepts like the reliability budget: how long a system can go down, what kind of risks you can take, and what kind of risks you must derisk. I think that will start informing how we do software testing, and it's already informing how we do software testing in a different way.
Joe Colantonio: [00:06:39] Absolutely. And I think that was your first rule. Of the five rules you presented, the first was to test where the risks are, and you did mention observability. I'm wondering, is that why I've been hearing more about SRE types of approaches to testing now? Back in the day when I started, you'd never touch production, but now it's like you have to test in production. So observability, chaos engineering, those types of approaches are going to become more common?
Gojko Adzic: [00:06:59] Absolutely. You mentioned testing in production. I remember a while ago I was at a testing conference and there was a running joke about some company there, that they were testing their software in production. And that was a wholly derogatory term. Like, you couldn't offend anybody more than by suggesting that they were testing their software in production. Whereas, you know, testing software in production is a reality for many teams today, and if it's actually done right, it's quite a liberating way of not having to spend a lot of time looking for stuff that you can't even derisk before going into production. For example, the whole idea of canary deployments, which comes from the kind of chaos engineering world and things like that, is: well, launch two versions of your app and give five percent of your users access to one version and 95 percent the other. Watch if some really weird crap is happening, and if not, keep increasing that percentage. In some sense, this allows you to catch things that you were not even able to catch before deployment, because you don't get real user traffic before deployment; you can only make some assumptions. On the other hand, it's also kind of liberating if you do small changes. For example, performance testing and load testing was usually a big problem on anything that's not the production environment, because people would set up a staging environment, but they never wanted to pay as much money for it. I worked with a big TV station a while ago, and they had this massively expensive storage in the production environment. They would never pay for something like that for the testing environment, and their production environment, therefore, was CPU-bound.
But their testing environment was I/O-bound, because the storage was slow, and they had all these wonderful performance tests that they would run, but they were never relevant because it's a completely different bottleneck. With canary deployments, you don't actually have to have a separate copy of production or something; you have a single production. If you are doing relatively low-risk releases, where performance is not going to go up or down horribly in a single release, then monitoring for that in production is several orders of magnitude cheaper than having a separate testing environment that is just there for running your performance tests once every six months, and it opens up a whole other set of practices. And this is where, again, site reliability engineering comes in; that's kind of Google's term for DevOps, because of course they have to invent their own terms for everything. There are lots and lots of really good practices that we can take, combining the traditional testing approaches with DevOps and chaos engineering and things like that, because at the end of the day, they're all about creating a high-quality product and managing your risk. You're never going to eliminate the risk completely, and people who think that testing is about eliminating risk are deluding themselves. But we can do really good stuff to derisk something, or reduce the risk, or reduce uncertainty. And all of these practices are about reducing uncertainty.
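The canary rollout described above (start at five percent, watch, then ramp up) boils down to a deterministic traffic-splitting decision per user. Here is a minimal sketch of that decision in Python; the function and variable names are illustrative, not from any particular deployment tool:

```python
import hashlib

def assign_version(user_id: str, canary_percent: int) -> str:
    """Deterministically route a user to 'canary' or 'stable'.

    Hashing the user ID (rather than picking randomly) keeps the
    assignment sticky: the same user sees the same version for a
    given percentage, and canary users stay on canary as it ramps up.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"

# Ramp up 5% -> 25% as long as monitoring stays quiet.
users = [f"user-{i}" for i in range(1000)]
at_5 = sum(assign_version(u, 5) == "canary" for u in users)
at_25 = sum(assign_version(u, 25) == "canary" for u in users)
```

Because the bucket only depends on the user ID, raising the percentage strictly grows the canary group instead of reshuffling it, which is what makes the gradual "watch and increase" loop safe.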
Joe Colantonio: [00:10:20] Absolutely. Now, I don't know if this ties in to rule two, with testing in production, especially performance testing and fast feedback. Rule two was that tests should help us change the code faster. What did you mean by that rule?
Gojko Adzic: [00:10:32] I myself have several times fallen into this trap where we build wonderful, wonderful tests, but then those tests are kind of like cement. They are so hard to change that they are preventing us from changing the code. And there are lots of ways people fall into this trap. Usually it's when they are trying to test something really complex and don't think about decoupling or remodeling it; sometimes it's because the APIs are such that you can't easily test something. And we build these frameworks that help us derisk the software but slow down how quickly we can change the software, in a horribly mistaken ordering of priorities. I remember a team I worked with where they had something like 7,000 tests testing the user interface. And then when they wanted to change a couple of things in the website design, that broke half of those tests, because they were all very, very tightly coupled to DOM paths and identifiers and things like that. At that point, you really have a conflict between maintaining the test suite and improving the software, and there really should not be that kind of conflict, because if there is, the tests will always lose: people will disable failed tests, forget to update them, and delete them. And I think my rule number two that I presented at Robocon was really that tests should help us facilitate change. This is inspired by a quote; I don't know where David got the quote from, but I learned about it from my colleague David Evans. There's this joke about what's the purpose of the brakes on a train. Usually people answer that, well, the purpose of the brakes is to stop the train when it's moving. But no: a train, if you stop powering it, will stop at some point on its own; it just might crash first. So the purpose of the brakes is to help stop the train in a reliable way.
But really, think about why the brakes are there: if you didn't have brakes, then your maximum speed would be one where you could safely stop the train on its own, which wouldn't be that much. The fact that we have brakes allows us to drive trains at very high speeds, because we can reliably stop when we need to stop. And I think, from that perspective, tests are very similar. The purpose of automated tests in particular should be that we can move development at a much faster speed, because they provide confidence that we can stop when we need to stop. But if they're slowing down how fast we're developing code, then those brakes are on all the time, and we're driving with the handbrake on. That's not that good. So we need to design tests, automated tests in particular, so that they can be changed easily. This is not such a problem with manual tests, because you can adjust them. But automated tests have to be designed so that they can be changed easily and don't slow down development. If they start slowing down development, then they're going to lose.
Joe Colantonio: [00:13:48] Absolutely. And like I said, this is based on a keynote you gave at Robocon. I'll have a link to it in the show notes so people can actually watch it there. But as I was moderating this during the event, I wrote down some notes, and one of the notes I wrote during rule two was something about Appraise and visual approval testing. I have heard of visual validation testing; I don't know if it's the same thing. So where does visual approval testing come in, and what is visual approval testing? That should be the first question.
Gojko Adzic: [00:14:10] Approval testing is a relatively unknown testing technique. I think people should know more about it. Basically, approval testing is a way of automating tests where the test automation system doesn't know if a difference in test results is good or bad, but it knows how to measure the difference between an old baseline and what the system does now, and to present the difference to humans, who can make a decision very quickly. With approval testing, if the change was good, then a human can just say, "I approve," and the new state becomes the baseline for the next test. Approval testing is incredibly useful for situations where it's difficult to describe an expectation upfront but you kind of know if it's good or bad once you see it. It's also good for catching these unknowns that people can't even predict. Traditional unit testing, for example, automation based on fixed expectations, is looking for expected outcomes, and if it's not looking for something specific, it will not notice it. Approval testing helps you spot if something else is there, something you might not have been expecting. And the reason I was mentioning that in the keynote is that for MindMup we had lots and lots of tests that were trying to inspect visuals from a kind of "this is what I expect" perspective, and visuals are really tricky to pin down like that, because the fact that you're looking at some specific element somewhere and it's okay doesn't mean that there's not something really ugly on that page as well, or that something else is covering some other element, and things like that. So visual stuff is really something where the machines are not that good at telling you if the outcome is okay or not, but they're really, really good at saying: there's a difference here. I don't know if the difference is what you wanted or not, but here's a difference.
And this is where, instead of these really difficult, hardcoded tests, we moved to basically automating 99 percent of the testing process, leaving the one percent that really required human opinion for the humans. We realized that every major change we wanted to do breaks a lot of these expectation-based tests, and what we end up doing is running some example documents through the exporters and the visual rendering, and then we look at them and say, well, this is good or bad. So we decided: let's just automate that part. We might not be able to automate it entirely, but let's automate the part we can to optimize our time. This is where this tool called Appraise got started, and it's now open source; people can download it from appraise.qa. It basically sets up a fixture, like the usual kinds of frameworks for tests based on expectations, or kind of unit testing. It runs up a website or takes an image of an app in action, depending on how you configure it, and then compares that to the baseline image. If there's a difference, it just highlights the differences for a human to approve. And by moving from the model of "we have to specify everything up front" to "hey, let the machine look for unexpected stuff and show it to us," we've been able to replace thousands and thousands of horrible tests with probably hundreds of visual tests that are much, much easier to maintain and give us a lot more value.
Joe Colantonio: [00:17:40] And I think it's a perfect match of testing, because it's using automation but also using a person to work with that automation to see whether or not it's an issue. So I think it's the best of both worlds. And what I like about this project, hopefully this won't get us off-topic, I don't want to waste time, is that this is the real world. You've done this in the real world with Narakeet, where you do something with video and audio. So this is not like some heady type of thing you just made up; I believe it came out of actual real-world experience. Could you just talk a little bit about that, maybe?
Gojko Adzic: [00:18:06] Absolutely. So Appraise actually came out of working with MindMup, not Narakeet, because with MindMup the visual look and feel is one of our key advantages. And it's something that absolutely is what we want to test. It's not just the functionality; this whole visual look and feel is very, very complicated, and that's why we had to look for a way to speed up that part of the job. And, you know, luckily or unluckily, I quite enjoy the fact that there's only two of us on the team and we do everything, and we compete with companies that probably have hundreds of employees. In order to do that, we have to rely heavily on automation. But at the same time, we need to place humans where humans need to play a role. And that kind of comes back to, I guess, the third rule I presented in the keynote, which is that we really should automate to assist people, not to replace them. For stuff that can be replaced, if a machine can do what I do, it absolutely should do it. I've been to enough testing conferences to get allergic to people talking about how automation will never replace them, and why do programmers always think that automation can replace testers? I'm primarily a developer. I develop software for a living. I learned how to test it as well, but I respect that there are people who are much, much better at testing than I am. And every time some tester complains, "why don't developers automate themselves rather than automate testers," I tell them to look at the history of development, where developers have been automating themselves for the last 50 years. We have compilers, we have linters, we have dependency management tools, we have IDEs. They're all taking stuff that developers were doing manually and automating it.
You know, one of my favorite books when I started making software for money was Refactoring by Martin Fowler. He was capturing manual processes that developers were doing over and over again back then, processes that were error-prone, and codifying them in the book so that people could do them reliably. And what happened as a result of the book is that ten years later you have automated right-click refactoring, even in all popular IDEs; actually, I think about a year or two after his book, you already started seeing automation there. So I think testers are kind of afraid of automation; it's something that they think is going to take away their job. But if you look at what is actually happening on the programming side of the industry, people are just taking bits and blobs of automatable parts of the job that are slow, repetitive, and reproducible, that just need to be done over and over and over again and that nobody wants to do manually. What I now do as a developer, compared to what I was doing 22 years ago when I started, is completely different. Fair enough, we had compilers back then, but now automation is on a completely different level and I can be much more productive. In the same way, I look for opportunities like that in my testing on all my products. And this is where Appraise came from: basically a side product we built for testing MindMup. I've used it on Narakeet as well, to test videos, which is another one of these cases. I don't know if a video is right or wrong by looking at the content in terms of bytes, but if I watch a video, I know if it's right or wrong. And the videos were even trickier, because with the DOM and the visuals that we have on MindMup, at least it's deterministic. Video is lossy by design: with MPEG compression, you can run it twice on the same input and get slightly different results, because your CPU was busy and the compressor optimized something differently.
And comparing the binary output is kind of silly. But looking at the video and seeing, well, the transition still plays the same and a human eye can't spot the difference, therefore it's okay: this is really good. So I've modified Appraise to allow it to turn a video into some keyframes and then lay them out as an image. And that's how I'm testing that the transitions on Narakeet are working okay and that the audio and video synchronization is working okay. And it's turned out to work amazingly well, actually.
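The idea of judging lossy video by eye rather than by bytes reduces to a perceptual frame comparison: two frames count as the same if only an imperceptible fraction of pixels differ beyond a small tolerance. Here is a minimal sketch over plain grayscale pixel grids; real keyframe extraction from MPEG would need an external tool such as ffmpeg, which is assumed away here, and the thresholds are made-up examples:

```python
def frames_match(baseline, candidate, tolerance=10, max_diff_ratio=0.01):
    """Treat two frames as equal if few pixels differ noticeably.

    baseline/candidate: equal-sized 2D lists of 0-255 grayscale values.
    tolerance: per-pixel difference a human wouldn't notice.
    max_diff_ratio: fraction of pixels allowed to exceed the tolerance.
    """
    total = 0
    differing = 0
    for row_a, row_b in zip(baseline, candidate):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > tolerance:
                differing += 1
    return (differing / total) <= max_diff_ratio

# Compression jitter should pass; different content should fail.
frame_a = [[100, 101], [99, 100]]
frame_b = [[103, 100], [100, 98]]   # small encoder noise
frame_c = [[200, 20], [255, 0]]     # visibly different content
```

The tolerance absorbs exactly the non-determinism Gojko describes: two encodes of the same input never match byte-for-byte, but they do match within a perceptual threshold.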
Joe Colantonio: [00:22:28] I love this approach, and hopefully it's going to catch on even more, because I get asked all the time: how do I test audio and video? I think this is one way that can definitely help a lot of folks. And I love your point; what I have in my notes is, as you mentioned, you can't automate the whole thing, but you can automate the deterministic part. A lot of people just say, "I can't automate the whole thing, forget about it." But I think this approach is really, really a good approach that is going to help a lot of people.
Gojko Adzic: [00:22:50] And I think, you know, even without looking at crazy non-deterministic stuff.
Joe Colantonio: [00:22:54] Yeah
Gojko Adzic: [00:22:55] I don't know how many times I've seen teams in, say, a large financial institution say: well, you know, this big database thing is just slow, and it's non-deterministic because of unique IDs and things like that, and we can't automate it. Well, you can automate like 99 percent of it. You can at least take the cleanup: people are spending a lot of time just removing old data, and it's not rocket science to clean up the data from the database, it's just horribly error-prone when done manually. You can automate data loading in most cases. Even if you can't automate 100 percent of it, you can automate a bunch of things there. And I think looking for opportunities where we can automate to assist people in doing their job is an amazingly fruitful approach.
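The "automate the cleanup even if you can't automate the whole test" point fits naturally into a setup/teardown wrapper that loads fixture data and guarantees it is removed afterwards, whatever happens in between. A sketch using SQLite as a stand-in for the big slow database; the table and column names are invented for illustration:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def seeded_db(rows):
    """Load fixture rows, hand the connection to a (possibly manual) test,
    and guarantee the cleanup runs afterwards, even if the test fails."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    try:
        yield conn
    finally:
        conn.execute("DELETE FROM accounts")  # automated, not error-prone
        conn.close()

with seeded_db([(1, "alice"), (2, "bob")]) as conn:
    count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

The human tester can do the interesting exploratory work inside the `with` block; the boring, error-prone load and cleanup happen automatically around it.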
Joe Colantonio: [00:23:39] Well, great point. I think when we think about automation, we automatically think of automation testing. I think that term you came up with, automation assistance, is much better, and I think it saves teams a lot of time. So that's an awesome point. And that brings us to your rule four, which was that results should focus on the wildly important stuff. What did you mean by that?
Gojko Adzic: [00:23:57] What I meant by that is that test results are usually for humans to consume. And if we want to optimize human time in dealing with a large automated test suite, then I don't want to be reading through garbage and spending my time looking at stuff that's accidental complexity, that's not really helping me make any decisions I need to make. What I want from a test result is something that quickly helps me make an important decision. And the decision I need to make is: should I deploy this or should I not deploy this? Are there problems here that prevent me from proceeding? Should I roll back and go fix this stuff? Lots and lots of testing frameworks just bombard people with data in the results, because they want to show how grand and how useful they are, spitting out numbers and percentages and stuff like that all the time. If I'm not making a concrete decision based on something, I don't want to see the data at all. Doug Hubbard, in his wonderful book How to Measure Anything, made the point that the value of a piece of information is proportional to the decision it helps you make. Every time I see these code coverage reports where it's all, you know, 76 percent, 55 percent, 98 percent, and people have this massive report that's green and orange and red, and they never make any decisions based on it, it's just a total waste. If you're making an important decision based on it, brilliant; if you're not making a decision, why are you showing it? And I think, looking at Appraise and how we were trying to optimize our time, we realized that basically what we need to do is optimize showing the difference. I don't want to see thousands and thousands of side-by-side images that are all the same or have minor differences that don't really matter. If something's really importantly different, I want to see what the difference is.
So then I can make a decision: should I approve or reject this? We also invested a lot in figuring out, if something is different, how to find the root cause of the difference very quickly and include that in the test results, not requiring us to dig through five levels of files to figure out what actually changed. So what I mean by "results should focus on the wildly important" is: look at the decisions you want to make and try to figure out what data you need to make those decisions. Put that in front of the people who are making decisions. Everything else is just noise. It can be there for later investigation, if it needs to be there for later investigation. But I think a big mistake lots of testers make, especially when they are starting in the industry, is to overwhelm their audience with data, with test reports, with stuff that people really don't care about, instead of just presenting the key stuff that helps people make decisions. That's what I meant by focusing on the wildly important.
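Rule four can be read in code as: collapse the raw pile of results into the one decision the reader must make (deploy or not) plus only the evidence behind it. A hypothetical sketch; the result dictionary shape here is invented purely for illustration:

```python
def summarize(results):
    """Reduce raw test results to a deploy decision plus the evidence.

    results: list of dicts like {"name": ..., "status": "pass" | "fail",
    "diff": ...}. Only failures, with their diffs, reach the reader;
    passing results are noise for the deploy decision.
    """
    failures = [r for r in results if r["status"] != "pass"]
    return {
        "deploy": not failures,  # the one decision the report must answer
        "blockers": [
            {"name": r["name"], "diff": r.get("diff")} for r in failures
        ],
    }

report = summarize([
    {"name": "home page", "status": "pass"},
    {"name": "checkout", "status": "fail", "diff": "total shows $0.00"},
])
```

Everything that was filtered out can still live in a raw log for later investigation; the summary is what a human sees first.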
Joe Colantonio: [00:26:52] Another awesome point. And your last rule was to focus on intent, not current implementation. Interesting. Why would you say something like that?
Gojko Adzic: [00:26:58] So, again, this is something that I often catch myself doing, and I've seen it with lots of other teams that I worked with as a consultant, and with people I interviewed for the books: people often describe their tests in the context of the current implementation. Say you have a process of the customer registering: going to one page, going to another page, going to a third page. We very often describe our tests in terms of how something is done, not what needs to be done. And when the underlying implementation changes, say a synchronous process becomes asynchronous, or the design of the web page slightly changes, or the login button is no longer on this page but on some other page, then tests that are too tied to the implementation tend to all break, and they tend to be difficult to understand. Of course, the actual automation has to be bound to how something is implemented at that very moment, because it's automating that. But the description of the test shouldn't be. The description of the test should stay relatively universal, regardless of the actual implementation. And some testing frameworks allow people to separate the definition of a test from the automation of the test. Now, this has a bunch of names: there was data-driven testing with tables and things like that, then you had fixtures and test definitions, and even with unit testing tools there are tests and then automation components, and a bunch of other terminological variations. But they're all implementations of this idea that we have two layers: a separate informational layer that explains the intent of the system, and then a layer that automates the testing of that intent. If these two are mixed, then the intent is not clear and the tests are difficult to maintain. They become the cement.
This is a very, very common reason why tests become cement for the code, and then they cannot facilitate change. At the same time, it's difficult to add new intent cases and reuse the automation. Once we have these two layers separate, then it's very, very easy to add another intent; you can just reuse the existing automation. This is where all those data-driven tests come in, because you just add another piece of data. But at the same time, it's very easy to adjust the implementation, because if the button moves, all we need to do is change one automation component. All the intent-level descriptions are still the same; they're not tied to the implementation. In one of the best usages of this idea, I worked with a large investment bank where they were migrating from a legacy system. I somehow feel that these kinds of large organizations spend most of their time migrating from a legacy system to something new, and by the time they finish this something new, somebody else in the company has already started a migration from that and declared it legacy. But the tests that describe an intent really are the same, regardless of whether you're running against the legacy system or not, because the intent is the same; only the implementation is different. So describing an intent on one level and automating that intent on a lower level allows people to run the same tests against two different systems, or two different implementations of the same idea. And I think that's very powerful.
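The two-layer split described above can be sketched like this: the intent layer is plain data with no UI or API details, and migrating from the legacy system to the new one means swapping only the driver function underneath. All names and drivers here are illustrative stand-ins, not any real framework:

```python
# Intent layer: what must hold, in domain terms, with no UI details.
REGISTRATION_CASES = [
    {"email": "a@example.com", "password": "s3cret!", "expect": "registered"},
    {"email": "not-an-email", "password": "s3cret!", "expect": "rejected"},
]

# Automation layer: how the intent is exercised against one implementation.
def register_via_legacy(email, password):
    """Stand-in driver for the old system; a real one would call its API."""
    return "registered" if "@" in email else "rejected"

def register_via_new(email, password):
    """Stand-in driver for the replacement; same intent, new plumbing."""
    return "registered" if "@" in email and len(password) >= 6 else "rejected"

def run_cases(driver):
    """Run the same intent-level cases against any driver."""
    return [driver(c["email"], c["password"]) == c["expect"]
            for c in REGISTRATION_CASES]
```

Adding a new business rule means adding one more dictionary to the intent table; replacing the system under test means writing one more driver, while the cases themselves never change.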
Joe Colantonio: [00:30:30] So Gojko, I always learn so much from your presentations. I really enjoyed this one as well. It's been a while since I saw one of your presentations, so I think this is a must-see. As I mentioned, it will be in the show notes for folks to check out. But Gojko, before we go, is there one piece of actionable advice you can give to someone right now that they could take away and implement to help with their automation testing efforts? And what's the best way to find or contact you?
Gojko Adzic: [00:30:49] So the top level of your tests, the entry point that humans read, should explain what you're testing, not how you're testing it.
Joe Colantonio: [00:30:57] Love it.
Gojko Adzic: [00:30:57] That's kind of the one piece of advice I suggest people take away. Then, of course, there will be lower levels that automate how you're testing it. But if you separate what you're testing from how you're testing it, you will have a much, much easier time maintaining things. You will have a much easier time showing the diff, you will have a much easier time focusing on the wildly important, and everything else will fall into place much more easily.
Joe Colantonio: [00:31:19] Awesome. And Gojko, the best place for folks to contact you if they want to learn more?
Gojko Adzic: [00:31:23] My website is gojko.net; gojko.com works as well. Probably a bunch of other extensions work too, so that's the easiest way to get in touch.
Joe Colantonio: [00:31:32] Thanks again for your automation awesomeness. If you missed something of value we covered in this episode, head on over to TestGuild.com/a351, and while you're there, make sure to click on the Try for Free Today link under the Exclusive Sponsor section to learn all about SauceLabs' awesome products and services. And if the show has helped you in any way, why not rate it and review it on iTunes? Reviews really help the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation podcast. I'm Joe. My mission is to help you succeed in creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.
Outro: [00:32:15] Thanks for listening to the Test Guild Automation podcast. Head on over to TestGuild.com for full show notes, amazing blog articles, and online testing conferences. Don't forget to subscribe to the Guild to continue your testing journey.