About this Episode:
What do potatoes have to do with performance testing? In this episode, Joey Hendricks shares his open-source, Python-based performance testing framework for unit testing, Quick Potato. Discover how it can help catch problematic performance bottlenecks in the early stages of the development life cycle. Listen in to find out how to truly shift your performance testing left in your SDLC.
TestGuild Performance Exclusive Sponsor
SmartBear is dedicated to helping you release great software, faster, so they made two great tools. Automate your UI performance testing with LoadNinja and ensure your API performance with LoadUI Pro. Try them both today.
About Joey Hendricks
Joey Hendricks has been working for two years as a junior performance engineer at a major consulting company, spending that entire time as an external consultant for prominent Dutch financial institutions. Because he is newer to the field, I'm excited to get his fresh take on performance testing. This interview is based on a session he did at this year's Neotys PAC event, which I'll have a link to in the show notes.
Learn more about the free PAC performance events
Since its beginning, the Neotys Performance Advisory Council has aimed to promote engagement between experts from around the world and to create relevant, value-added content shared between members on topics that are on the minds of today's performance testers. Some of the topics addressed during these councils and virtual summits are DevOps, Shift Left/Right, Test Automation, Blockchain, and Artificial Intelligence.
Connect with Joey Hendricks
- Company: www.accenture.com/nl-en
- LinkedIn: joey-hendricks
Full Transcript Joey Hendricks
Joe [00:01:52] Hey, Joey! Welcome to the Guild.
Joey [00:01:55] Hi! Thank you for having me, Joe. It's a great honor to be on your podcast.
Joe [00:01:59] Awesome. Thank you so much. I was really excited to see the session you did for Neotys and really want to show more people as well on my podcast. So thank you for agreeing to do this. Before we dive in, though, Joey, is there anything I missed in your bio that you want the Guild to know more about?
Joey [00:02:14] Yeah, not really. I think you are spot on with the bio. I love it.
Joe [00:02:19] Cool. Awesome. So I think the first thing we need to go over is that a lot of times when people think of performance testing, they think of a stress test or a load test. I don't know if they're necessarily familiar with performance testing on a unit level. So at a high level, what is performance testing on a unit level?
Joey [00:02:34] On a high level, it's basically three sets of tests: you have a boundary test and a regression test that you can do on the unit level, and the third one is where you start profiling, checking things out, and playing with the results. So those are basically the three levels of a unit performance test. It depends a bit on what you're trying to do and what you're trying to test, but that's about it.
Joe [00:02:58] All right. So let's attack these levels then. First off, who performs these types of tests?
Joey [00:03:03] Well, it's typically done by a developer. It could be included in your unit test framework, where you would define a couple of tests for key parts of your functionality which, when they fail, produce results you can use to debug your application. So it's typically run by developers.
Joe [00:03:21] Nice. So, Joey, as I mentioned in the intro, you have been working at a major consulting company for a while now. When you're on-site with customers did you notice that there's a gap in this type of performance testing in this area?
Joey [00:03:32] Actually, I see almost nobody doing it. And it's not something I've only picked up from my experiences as a consultant; I see it a lot in the community and online. When I'm on Reddit, or when I talk to people online or offline, people at clients, I see that people barely touch profiling tools, and they only touch them when it's absolutely necessary. They're not always the easiest to use and they're quite complicated. So since I have experience with Python, I thought, let's try to find a way to make it a little bit easier and more accessible, and to get a very clear-cut way to pull performance metrics out of one of those tools.
Joe [00:04:17] No, I agree. I've been in software engineering for over 20 years and this is an area I hardly see anyone address. I maybe worked at one company that kind of did it, but not even at the level that you described. So how do you get more people involved in this type of testing? Is it just a lack of awareness, do you think, that people aren't doing this? Or, like you said, is it maybe so hard to start messing around with it at first?
Joey [00:04:37] It's difficult. The type of testing I'm describing, to recap quickly, is regression tests on the code level, maybe a boundary test, or just profiling itself, and I hardly see it done. I think part of it is the difficulty, and part of it is that people are so focused on delivering functionality that they forget to do this, because they're just battling through the sprint to make sure they can deliver the functionality they promised. Sometimes I see performance put in the back seat until it really becomes a problem, when I pick it up with a load testing tool or find it in AppDynamics. That kind of stuff happens in the later stages, where it's obviously very expensive to go back and fix it, because we have to find it first. If a developer had picked it up earlier while he was doing unit testing, with what I would call a unit performance test, he would have found that bottleneck earlier. He could have solved it in, like, ten minutes, but now it takes me and maybe three other people to find out where this bottleneck is. That's obviously quite a waste of time and resources that you want to prevent.
Joe [00:05:52] No, absolutely. And it always falls back to… once again, we've been doing performance for a long, long time. Back in the day, they used to wait until the whole application was developed and then throw a load on it, like, "Oh, it doesn't work." Your approach is more agile, more shift left almost, where before you even get to that stage, you're actually baking performance into your application code so that maybe you can scale better, because you're handling it at the unit level. Am I understanding that correctly?
Joey [00:06:17] Yeah. Consider it the same thing as test-driven development, which is used for functional testing. Before we start any piece of functionality, in the agile world we establish the definition of done, and the definition of done should not only say that it should work in this or that way. It should also say that it should run within four or five seconds, or that it should come back within a certain amount of time. You want that in there.
Joe [00:06:44] Absolutely. So, you know, once again, we talked about three types of tests: boundary, regression, and profiling. I guess before we touch those, we should talk a little bit about Quick Potato. What's Quick Potato?
Joey [00:06:55] So I still laugh at the name. I have a thing with potatoes. I picked it up from social media, where people would post an image of a potato whenever somebody wrote a very long post, like a potato stands for a long story. So I started coding, and I thought, if the code runs long, that image comes to mind. I was thinking about a framework that would performance test long-running code, so it had to have the word quick in it, and the first thing that came to mind was potato. So I glued those two words together and that's it. That's the story behind it, so maybe a little bit funny, but Quick Potato is a performance testing framework which I built for the Python programming language. It basically gives you those three layers and allows you to set boundaries on your code. If one of those boundaries is breached, then your test fails. It could be a minimum value, a maximum value, an average, or a certain percentile. You can program that in, no problem. The second one is a regression test. So I would run a regression test on the code, and if you make a change to your code, let's say you put a time.sleep somewhere in there that the previous iteration of your code didn't have, then if you run it again, you might have a second of delay because of that sleep. Because of the delay, your test would then fail, because you have a difference between your first test and your second test. That difference is what I look for in those regression tests. Inside the framework, I'm using a t-test to do that, a very simple statistical test, which is quite cool. What that allows you to do, if you have those two types of tests inside your unit test framework, is to create boundaries so you can check that your code doesn't overstep them, and to check whether any changes you just made to the code have impacted the performance. That way, if something goes wrong, those tests will automatically fail. And with Quick Potato you can then generate reporting. One of the reports I generate with it is flame graphs. So I generate a flame graph of what happened during the test, and then you can quickly compare it with the flame graph that was generated on the previous test. You can compare them in one view and see that this function is new and that one is slowing down. If it was expected that it was going to slow down, then you just pass the test and you can continue. So that's a little bit of the idea behind it: you can automatically check two tests and see if there are any changes, finding bottlenecks a little bit earlier and making sure you don't waste your time in the later stages of development.
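To make the boundary and regression ideas concrete, here is a minimal sketch of a boundary check and a t-test-based comparison of two timing runs. It is illustrative only and does not use Quick Potato's actual API; the helper names, iteration count, and thresholds are assumptions.

```python
import time
from statistics import mean
from scipy import stats


def measure(func, iterations=30):
    """Time a callable repeatedly and return the samples in seconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def boundary_check(samples, max_average=0.5):
    """Boundary test: pass only if the average response time stays under the limit."""
    return mean(samples) <= max_average


def regression_check(baseline, candidate, alpha=0.05):
    """Regression test: pass only if a t-test finds no significant difference."""
    _, p_value = stats.ttest_ind(baseline, candidate)
    return p_value >= alpha  # a small p-value means the two runs likely differ


# Example: a code change that adds a sleep should trip the regression check.
baseline = measure(lambda: sum(range(10_000)))
candidate = measure(lambda: (time.sleep(0.01), sum(range(10_000))))
print("boundary ok:", boundary_check(candidate))
print("no regression:", regression_check(baseline, candidate))
```

In this sketch a boundary failure means the average breached the limit, and a regression failure means the t-test found a statistically significant difference between the baseline run and the new run, which mirrors the comparison Joey describes.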
Joe [00:09:38] Very cool. So is this tool only for applications written in Python? Or can you use this framework to create your tests to run against other applications, regardless of what technology they're written in?
Joey [00:09:48] For now, it only works against Python, because I'm hooking into a Python profiler. It's the built-in Python profiler, cProfile, for the people that are interested. I spin it up on demand and then I profile the code that comes after. Once the test stops, the profiler stops and then you can see what's happening. That's a bit how it works.
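For readers who haven't used it, this is roughly what spinning up Python's built-in cProfile on demand looks like. The helper function below is a hypothetical illustration, not Quick Potato's code.

```python
import cProfile
import pstats


def profile_call(func, *args, **kwargs):
    """Run a callable under cProfile and print the most expensive calls."""
    profiler = cProfile.Profile()
    profiler.enable()                 # start collecting call statistics on demand
    try:
        return func(*args, **kwargs)
    finally:
        profiler.disable()            # stop as soon as the code under test returns
        stats = pstats.Stats(profiler).sort_stats("cumulative")
        stats.print_stats(10)         # show the ten most expensive calls


profile_call(sorted, range(1_000_000), reverse=True)
```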
Joe [00:10:11] Nice. So how would someone incorporate this into the life cycle if they're Python developers? I assume this isn't something that's meant to be shipped to production. How does it work? How do they integrate it with their application to get the benefits without slowing anything down or causing any unintended consequences?
Joey [00:10:26] So, a bit technical. I'll go a bit into the Python stuff. I'm using a decorator for now, and I also have plans to not use a decorator, but I'm using a decorator, which, if you're unfamiliar, is basically a function that you can put on top of another function. So it's a little bit of a nested function system. The nested function of the decorator can then profile the code that's running inside it. Once it exits, it passes back the output of that function through the decorator, and your code will run functionally as normal, so it will have zero impact. But you need to put those decorators on the methods that you wish to test. So if there are certain methods that you only wish to profile, then you need to put those decorators on them. I'm working on a way to also spin up the profiler without using the decorators, but I really like the decorator approach, because you'll have a visible line of code on top of your business logic, your actual code, that tells you, "Okay, this code is performance-critical," because you see the name of the decorator there, and I named the decorator performance-critical. So above every such function in your Python code, you'll have a line of code saying performance-critical. Also, those profilers are only enabled when you're running your tests. You have to turn on a boolean, and then your tests will kick off the profilers and make sure they only run during your tests. Obviously, when you have your profiler attached or integrated so deeply into your code, it also gives you some benefits. It allows you, and Quick Potato does that too, to spin up the decorator on demand. So say you have a Django REST API with a certain transaction in it, and you want to profile only that particular piece of code live in production because you're having issues with it. You're taking the hit of the performance loss you're going to get by turning on cProfile, because obviously, when you profile stuff, it automatically becomes slower. But you're taking that risk of making it slower so you can get more debug statistics out of your code. You could use that information to debug a live production system by purposely slowing down one or more transactions to look at them for debugging. It would obviously be much cooler if that overhead were much lower, so you could do what APM tooling does and just flick it on and see live information about your code and ultimately the profile. That's what Quick Potato cannot do as of now, but it is something I would find really awesome to put into the project.
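To illustrate the mechanism Joey describes, here is a rough sketch of a "performance-critical" style decorator gated by a boolean flag. The implementation, the flag name, and the example function are illustrative assumptions, not Quick Potato's actual source.

```python
import cProfile
import functools
import pstats

PROFILING_ENABLED = False  # your test run (or an on-demand switch) flips this to True


def performance_critical(func):
    """Profile the wrapped function only while profiling is switched on."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not PROFILING_ENABLED:
            return func(*args, **kwargs)   # zero extra work in normal operation
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            profiler.disable()
            pstats.Stats(profiler).sort_stats("tottime").print_stats(5)
    return wrapper


@performance_critical
def business_logic():
    return sum(i * i for i in range(100_000))
```

The visible decorator line above the function is the marker Joey likes: it documents that the code underneath is performance-critical while leaving its behavior untouched when profiling is off.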
Joe [00:13:07] Nice. And this is all documented in your GitHub as well, so it's very clear how to integrate it with the code. So once you have your decorators, then I guess it's just like any other tool: when you're running CI/CD, if you have that flag on, it'll run your tests automatically, and there are reports if there are any issues. So it's not only local on the developer's machine, but also, like you said, in production, and maybe in your staging environments. And before you merge your code, you can check to make sure there are no performance issues before it goes to your main branch.
Joey [00:13:36] Yeah. It works the same as any other unit test. I designed it in a way that it clicks nicely into your unit test frameworks. In my GitHub, I have an example where I just used the standard unittest framework that comes with Python; I use that library as an example and hook into that. So wherever that can run, Quick Potato can run, and you can integrate those unit performance tests, as I like to call them.
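As a rough illustration of how a performance check can sit inside Python's standard unittest framework, here is a minimal, self-contained example. The test subject, threshold, and structure are assumptions for illustration and do not show Quick Potato's own hooks.

```python
import time
import unittest


def slow_lookup():
    """Stand-in for the performance-critical code under test."""
    time.sleep(0.05)
    return 42


class TestLookupPerformance(unittest.TestCase):
    def test_lookup_stays_within_boundary(self):
        start = time.perf_counter()
        slow_lookup()
        elapsed = time.perf_counter() - start
        # Boundary: the call must come back within 100 milliseconds.
        self.assertLess(elapsed, 0.1)


if __name__ == "__main__":
    unittest.main()
```

Because it is just another test case, any CI/CD pipeline that already runs the unit test suite would pick it up automatically, which is the integration point Joe and Joey discuss.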
Joe [00:14:00] So I'll be honest, I've always had a hard time getting developers to do any type of testing, maybe just because I've worked for dysfunctional companies. They do some minimal unit testing, but it's hard to get past that. Have you found, or do you know, a way to get developers more involved with performance testing, to say, "Look, not only can you do your normal unit testing, we have this great tool you can use to help with your performance as well"? Do you see that they normally embrace this, or how much encouragement do they need, in your experience?
Joey [00:14:26] That's a very hard question. My personal opinion is that the performance and the quality of the code you've written go hand in hand. If you have well-performing code, it's usually also well thought through and better developed. That's not always the case, because mistakes happen or problems arise that you didn't expect, but higher-quality code is usually also higher-performing code. So if you're keen as a developer to write better, higher-quality code, then you would take performance seriously, and you would take the testing seriously too: functional unit testing, obviously, but you could then also integrate performance unit testing alongside it, making sure that both are well kept and that you're executing them well. The concept is new for me, too. I've been pushing this for about a month or two as of this recording, and I haven't really gotten much feedback from developers yet. So it's very hard to get them to embrace it, and it's also very hard to get them interested. But as soon as people have a problem, if people have burned their fingers on something, they tend to not want to burn them again. So it's also a matter of time before people encounter problems and want to prevent those problems for themselves, and then these types of tooling get a little more footing. But like I said, it's quite a hard question, and I believe there's probably a better answer out there.
Joe [00:15:45] I guess only time will tell then. But I agree, it's a hard thing to get done. So I guess one way to find out is if, like you said, it actually found something before it got into production, or helped someone debug an issue quickly while it was in production. Has it been live long enough to get that type of feedback yet?
Joey [00:16:04] No, no, no. It's basically only been live since the Neotys PAC.
Joe [00:16:07] So, okay.
Joey [00:16:09] It hasn't been on the open-source market that long and there are still a lot of improvements that it needs. But I'm really happy to work on this because it's becoming a bit of a pet project, which is fun because I can pour a lot of my technical know-how into it and create cool stuff with it. As you can imagine, there is a lot of room for improvement, for making it cooler and fancier, and for letting it do more cool stuff. But it's obviously a great basis for testing this concept out and seeing whether developers appreciate it and whether it actually brings the benefits that I think it could bring, because that's still up for debate and not fully tested yet.
Joe [00:16:49] So the information it captures is pretty cool. Like you said, there's some visualization; I think you said it was a flame graph. What are you using for that? Is it Tableau, or is that built into the framework?
Joey [00:17:01] I've built it in completely. So, how it came about: I got introduced to flame graphs through a framework called DEXTER for AppDynamics. DEXTER is a cool plugin for AppDynamics; for anybody using AppDynamics who wants to pull flame graphs out of their AppDynamics instances, DEXTER was created to do that. I really liked using it. That was the first time I was introduced to the concept, and I was intrigued by it. So I read up and found out that the guy who made those was Brendan Gregg, who works for Netflix and is quite a big name in the performance engineering space. I read his blog, learned how these things work, and was really intrigued by them. These things are really powerful, because they allow you to get one image of the entire profile and see exactly where slowdowns happen and where CPU time is being spent, even if flame graphs are usually a bit more complicated to interpret. So, inspired by his work, I set out to create my own flame graphs for Quick Potato. They're built in and renderable by code, so you can automate that process. If you want them automatically generated every single time a test fails, that's possible. Or, better said, you can have them generated every single time a test is run, so you always have flame graphs of each iteration or each test; that's possible too. So I don't use Tableau for that. I do use Tableau for other analyses in my performance testing work, especially when looking at raw data.
Joe [00:18:32] Like I said, these graphs are really cool; once again, I'll have them in the show notes. But you know, another issue I've seen people have, just to change topics a little bit, and this happens all the time, is not having realistic test environments. So usually they get in trouble because even if they do test, they're not testing in a reliable testing environment. Any tips on how to create a more reliable test environment when you're doing this type of testing?
Joey [00:18:54] It depends a bit on what you're trying to test. If you're trying to test something end to end and you're doing a full-blown performance test, then you obviously want the most production-like environment you have. With DevOps and more infrastructure as code, it's becoming easier and easier to spin up a production-like environment for full-blown load testing, and that makes it a little bit easier. But also for unit performance testing, it helps to have infrastructure as code ready so you can just click a button and a database is spun up, quickly filled with a certain amount of data, and you can run a performance test against it. Then you know: I'm running against an environment that has 50 percent of the data, or 25 percent of the data, so those queries are going to be faster, and I'm expecting that. If you have those kinds of environments that you can spin up really quickly, it's very easy to do a unit performance test against an environment which is reasonably production-like. And then you can validate: okay, I'm generating these queries a lot of times, are they within the boundaries that I'm expecting, are they performant? Obviously, I'm not going to assume that my queries will come back at the same speed as they would in production; production has more data, so it could be slower. But I can validate whether they behave in the way I'm expecting. So if they run for five seconds and I'm expecting them to run for five seconds, then it's okay. But if I'm expecting them to run for five seconds and they're running for 15 or 20 seconds, then I know something is wrong, maybe with my query, maybe with my environment, and I can go investigate and validate whether everything is working. And with a sound mind I can say, okay, there's a problem in my testing environment, my code is running fine, but at least I have checked it. If you just check whether your code works functionally against any random database, then you're not doing a really good job of testing. A testing environment is quite difficult to get right when you're doing real end-to-end testing. Getting there is basically checking the environment, checking whether everything is as production-like as you expect, and asking questions and questions and questions, especially when you're going to do full-blown load testing. And if this can be spun up with code, with infrastructure as code, then you can do it much quicker, which is obviously easier. So, a little bit like that.
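As a small illustration of the kind of check Joey describes, the sketch below provisions a throwaway database with a little test data, times a query against it, and compares the result with a boundary chosen for that scaled-down environment. The table, query, data volume, and threshold are all hypothetical.

```python
import sqlite3
import time

EXPECTED_SECONDS = 5.0  # expectation for this scaled-down environment, not production

# Provision a throwaway database with some test data (a stand-in for
# infrastructure as code spinning up a partially filled environment).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
connection.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",) if i % 3 else ("closed",) for i in range(10_000)],
)

# Time the query the application would run and compare it with the boundary.
start = time.perf_counter()
connection.execute("SELECT * FROM orders WHERE status = 'open'").fetchall()
elapsed = time.perf_counter() - start

if elapsed > EXPECTED_SECONDS:
    raise AssertionError(
        f"Query took {elapsed:.2f}s, expected under {EXPECTED_SECONDS:.1f}s; "
        "investigate the query or the environment."
    )
```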
Joe [00:21:21] Great advice. So I guess this just popped into my head, and it's kind of the opposite issue. We talked about how it's hard to get developers into this type of performance testing; are there any situations where you see someone go overboard with this approach, baking performance into the code to the point where it's not helpful?
Joey [00:21:41] Yeah, you can go overboard with this. When you reach a Catch-22 scenario where you add checks that don't make sense, or you add criteria to your tests that are always going to pass, then just keeping those tests around is useless. Having an environment in which they can't be used also renders these tests useless. So if you're going to run these tests against an environment or against a database, then you need to trust that environment. If you do not trust that environment, those tests will be rendered useless and won't have the value you would want them to have. That's about it with the Catch-22.
Joe [00:22:19] Great. So I guess before you even get into a Catch-22, is there anything inherent to Python, coding patterns that are known to be bad for performance, that a Python developer needs to know about before even doing any type of testing?
Joey [00:22:33] I don't think I have enough experience to give a good tip on that, because I use Python mainly to automate stuff daily and to write these kinds of tooling. I don't really have a clear "don't do this because it hurts performance." The one tip that comes to mind right away is: if you're using any type of framework that can generate SQL queries, make sure that whatever it renders makes sense, because sometimes these packages can render a query that's not as efficient as you would want it to be. I've seen a case of that happening once in my career. So when I write queries in Python, I check that they're logical, just as a sanity check, so they run in the way I'm expecting.
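As an example of that sanity check, most ORMs let you print the SQL they generate before you trust it. The sketch below uses SQLAlchemy with a made-up model; Joey doesn't name a specific framework, so this is only one possible way to do it.

```python
from sqlalchemy import Column, Integer, String, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Customer(Base):
    """Hypothetical model used only to show the generated SQL."""
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    country = Column(String)


query = select(Customer).where(Customer.country == "NL")
print(query)  # inspect the rendered SQL for obvious inefficiencies before running it
```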
Joe [00:23:27] Absolutely. Once again, that's really… it usually comes out of a performance issue with the SQL statement somehow being dynamic, or too huge, or pulling back too much data. So that's great advice. Okay, Joey, before we go, is there one piece of actionable advice you can give to someone to help them with their performance testing efforts? And what's the best way to find or contact you?
Joey [00:23:45] Well, first, if you have a lot of questions, the best way to contact me is through LinkedIn. Just send me a message; I always respond quite promptly. Second, on your first question, the advice I would give to other performance testers and also developers is to test early. Try to catch these problems as fast as you can, and fail as often as you can. That way you'll spend less time debugging issues that you shouldn't be debugging.
Rate and Review TestGuild Performance Podcast
Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.