Traffic-Driven Testing for Faster Execution with Nate Lee

By Test Guild

About This Episode:

In this episode, Nate Lee, co-founder of Speedscale, delves into the exciting topic of using traffic flow to revolutionize testing environments. He discusses how capturing and utilizing traffic can accelerate test execution, create realistic test scenarios, and provide invaluable insights. We also explore the shift from deterministic to nondeterministic testing, the importance of replicating production conditions, and how Speedscale is transforming the paradigm of load testing. Join us as we uncover the innovative approach of leveraging traffic to enhance testing and ensure performance and resiliency in the ever-evolving world of automation.

See the magic for yourself now: https://links.testguild.com/speed

Exclusive Sponsor

Discover TestGuild – a vibrant community of over 34,000 of the world's most innovative and dedicated Automation testers. This dynamic collective is at the forefront of the industry, curating and sharing the most effective tools, cutting-edge software, profound knowledge, and unparalleled services specifically for test automation.

We believe in collaboration and value the power of collective knowledge. If you're as passionate about automation testing as we are and have a solution, tool, or service that can enhance the skills of our members or address a critical problem, we want to hear from you.

Take the first step towards transforming your and our community's future. Check out our done-for-you awareness and lead-generation demand packages, and let's explore the awesome possibilities together.

About Nate Lee


Nate has served in a variety of roles within the DevOps and Service Virtualization space over the past 13 years. He started off in a presales role at iTKO, helping pioneer the Service Virtualization space before the company was acquired by CA Technologies. There he served as a presales leader for strategic accounts before heading up Product Management. He has worked with Fortune 500 companies across a variety of verticals, establishing Service Virtualization standards and implementing scalable testing practices. As a Georgia Tech Computer Science and MBA grad, you'll most likely find him outdoors on two wheels in Georgia when he's not innovating with his Speedscale buddies.

Connect with Nate Lee

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:04] Get ready to discover the most actionable end-to-end automation advice from some of the smartest testers on the planet. Hey, I'm Joe Colantonio, host of the Test Guild Automation Podcast, and my goal is to help you succeed with creating automation awesomeness.

[00:00:25] Hey, wouldn't it be great to be able to shine a spotlight on your Kubernetes applications and illuminate what's going on in your apps so you could troubleshoot with high fidelity, debug with fewer headaches, and quickly uncover blind spots and unusual behavior, especially with all kinds of traffic in a cloud-native world? If so, this episode is for you, because we have Nate Lee joining us, who is going to talk all about Kubernetes, traffic testing, mocking, and a whole bunch more. You don't want to miss it. Check it out.

[00:00:51] This episode of the TestGuild Automation Podcast is sponsored by the Test Guild. Test Guild offers amazing partnership plans that cater to your brand awareness, lead generation, and thought leadership goals to get your products and services in front of your ideal target audience. Our satisfied clients rave about the results they've seen from partnering with us from boosted event attendance to impressive ROI. Visit our website and let's talk about how Test Guild could take your brand to the next level. Head on over to TestGuild.info and let's talk.

[00:01:27] Joe Colantonio Hey, Lee, welcome to the Guild.

[00:01:30] Nate Lee Hey, Joe. Nice to be here. Appreciate it.

[00:01:32] Joe Colantonio Absolutely. Before we get into it, can you just tell us a little bit more about yourself? I know you have over 10 years of experience in DevOps, but I know you have an interesting background—you're one of the founders of a really cool tool. So just let us know a little bit more about yourself.

[00:01:45] Nate Lee Yeah. I've got tons of experience in the DevOps and observability space. I originally started out as kind of a sales engineer at a startup that founded the service virtualization space. Service virtualization, service mocking—it's all kind of the same thing; one's a little more hyped-up version of the other—but the idea is you can simulate these backend endpoints, right? So I did sales engineering there, eventually moved into product management, and helped pioneer that space with a lot of Fortune 500 companies. Then I joined forces with a few of my Georgia Tech colleagues, Ken Ahrens and Matt LeRay, who come from companies like New Relic and Observe—heavy DevOps monitoring experience there too—and we developed Speedscale. With a lot of the monitoring and observability tools, I mean, they're great canaries in the coal mine, if you will. But how do we figure things out before the canary keels over? That was the problem statement, right? Instead of reacting to these issues as the alerts come up, could we preemptively tell you that there is going to be an issue and fix it beforehand? There's an increasing focus on resiliency and scalability nowadays, especially with companies in the news talking about outages all the time. That's why we developed Speedscale. It's a way for us to use traffic to preemptively catch issues that could affect production before they happen.

[00:03:06] Joe Colantonio Awesome. Now, I do trends every year, and one of the trends is moving from deterministic testing to non-deterministic testing, because we're moving from bare metal to the cloud, we're using all these services, you're in production, you're in the wild, and you have no idea what the heck's going on. And that's why I think I see an increase in the need for tools like Speedscale. Is that the use case? Is that why there seems to be a bump in relevancy for tooling like yours, rather than old-school monitoring like we had back in the day?

[00:03:36] Nate Lee Yeah, that's definitely part of it. Actually, I was on a customer call yesterday, as a matter of fact, and they put deterministic and non-deterministic testing in layman's terms. They said: I could run a few tests a million times, or I could run a million tests a few times. And the latter is going to give me more coverage; it's going to cover more use cases. Right now, I think a lot of the tools ecosystem is equipped to let you run a few tests over and over again. Part of the reason for that is tests take a lot of work to build and maintain from release to release. That's one of the reasons we turned to traffic—deterministic versus non-deterministic testing and transitioning from one to the other. But it's really to address what we thought were the three primary reasons you're delayed when you're testing. The first delay is, I need to write the test. And what are you doing when you write the test? You're trying to simulate how users interact with your systems. You're trying to imitate traffic that hits your applications. The second one is, I've been with lots of companies where they built up a gigantic automation framework in whatever tool, then they find out, well, we don't have the right environments to run this in. And if I don't have the environments to run it in, then how am I going to run early and often? If you can auto-generate these environments based on the traffic that you see, you can move faster, you can test early and often. The third reason is, once I do have the environments, the next bottleneck becomes data. I don't have the right data. I need to test with Platinum Medallion members, but I only have non-Medallion members in my test database. How do I get more Platinum Medallion members? Or I've got to set up X, Y, and Z users before I can even test the delete operation—if they don't exist in the database, I'm just going to get user not found, but that's not really what I'm trying to do; I'm trying to test the deletion operation. So it's a combination of those factors, definitely. And we're seeing the market trend shift that way as well.

[00:05:47] Joe Colantonio Nice. I definitely agree. I did an article once on the four destroyers of automation, and you mentioned two of them: bad data and bad environments. That seems to be what everyone's struggling with. So how is your approach different, then? Most people, when they think of testing, think of functional testing using a functional tool, maybe like Playwright or Selenium. And I think you take a different approach—you keep mentioning traffic. So what does traffic mean, and how is your approach different from the tools you would traditionally use for functional testing?

[00:06:17] Nate Lee Yeah. In fact, we're using the word testing, but we don't think of our solution as testing at all. We actually think of it as kind of the anti-test. With most testing solutions, the test is the core element of the framework. You've got a Selenium or Cypress or Playwright test, or you've got a Postman or JMeter test. That test is the base element, and then you copy-paste that test and create more of them. The way we look at the world, traffic is your base element. You're taking spans of traffic—we call them snapshots. So you're taking snapshots of traffic and actually going through that traffic and saying, do I like this traffic? Is this representative of the day-to-day interactions that we're seeing? Right there, immediately, it's more realistic, because testing is usually a guess at how I'm going to be encountering my customers. For example, maybe I'm testing the ordering of basketballs, but really the thing that sells the most at my retail store is footballs. I could be testing the wrong user flow or the wrong SKU. So there's something to be said for testing realistically—or replaying traffic realistically. Secondly, once you do have an idea of what you want to exercise in your application, you have to go build it. You have to write code to write more code; usually there's a scripting language associated with these solutions. But if you can just look through spans of traffic and say, hey, this is a good user flow, I want to use this traffic flow, you can take that session, multiply it, and direct that traffic at your environments. The next differentiator—and the question that pops up at that point—is, well, if I'm replaying traffic and I don't have the right test data set up or the right SKU available, then none of this is going to succeed anyway. We've preempted that: when we're capturing traffic, we also capture the interactions that are going to the backend systems. The way we do this is by using a Kubernetes sidecar—we can work with other containerized environments too, but sidecars are a very well-understood Kubernetes pattern. By capturing this traffic, the developer now has a way to self-service: go into the solution and understand, okay, I want to exercise this API—what do I need backend-wise and environment-wise to exercise it, and how do I get it? Conventionally, you might do this in your organization by saying, well, I need to talk to the lead engineer, I need to talk to the architect, I need to talk to the DBA to get this instance going. Then you've got to go set it up in a cluster somewhere, or maybe you've already got the cluster, but the problem is everybody and their brother is already in that cluster. They're using the data. They're running their own performance tests, which may conflict with yours. You get the idea. So these are just a few ways we're different. By looking at traffic, we can auto-generate the traffic that's hitting your app, and we can auto-generate the test environment as well. The idea is we can give you orders-of-magnitude faster execution.
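To make the "snapshot" idea Nate describes a little more concrete, here is a minimal, illustrative sketch of recording request/response pairs by sitting in front of a service and forwarding traffic to it. This is not Speedscale's implementation—the upstream URL, port, and file name are assumptions for the example, and it only handles successful GETs.

```python
# Illustrative sketch: a tiny recording proxy that forwards GET requests to a
# service and logs each request/response pair as one line of a "snapshot" file.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:8080"   # hypothetical service being observed
SNAPSHOT_FILE = "snapshot.jsonl"     # one recorded request/response pair per line

class RecordingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the request to the real service (sketch: 2xx GETs only).
        with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
            content_type = upstream.headers.get("Content-Type", "application/octet-stream")

        # Record the pair so it can become part of a replayable snapshot.
        with open(SNAPSHOT_FILE, "a") as f:
            f.write(json.dumps({
                "method": "GET",
                "path": self.path,
                "status": status,
                "body": body.decode("utf-8", errors="replace"),
            }) + "\n")

        # Return the real response to the caller unchanged.
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 9000), RecordingProxy).serve_forever()
```

The point of the sketch is only that once traffic is captured as data, the same span can be inspected, multiplied, and replayed instead of hand-writing each test.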

[00:09:33] Joe Colantonio That's something a lot of people don't realize: a lot of the breakdowns in testing—in good testing—have to do with assumptions. You make assumptions from the beginning all the way to production, and then you're like, oh boy, we were completely wrong. So it sounds like you have a way to listen to the traffic that's going on in production—what people are actually doing—and then you take that and you're able to generate tests that realistically mimic exactly what's happening in production, if I understand it correctly.

[00:10:00] Nate Lee Yeah. Absolutely.

[00:10:01] Joe Colantonio Nice. So are you able then to create environments based on this traffic? Because a lot of times you have dependencies on all these other services in production that you may not want to use in your staging or QA environments, because it costs money, or you don't want to open up a firewall to get to them from those environments. So does it automatically do the mocking for you? How does that work?

[00:10:24] Nate Lee Yeah, that's exactly what we do. You can think of it as auto-generated service mocks. It's kind of on both sides of the equation. If there's a certain workload in the middle, like the Kubernetes workload you're trying to test, you're trying to exercise those backends. There may be one or two, there may be nine or ten—there could be a huge multitude of them. If you're using a solution like WireMock, or maybe you've got a ninja engineer somewhere who's building them in JavaScript or Python, regardless of the approach—from talking to customers, they can take anywhere from two days to a week to build each one manually. But by using a Kubernetes listener, you can actually pick up the traffic—we do it using sidecars again. We capture the traffic that comes into the system you're trying to test, as well as the backend calls you're making, and they're actually synchronized. That's another huge piece: maintaining the state of these backends in conjunction with the invocations. It could be APIs, it could be REST or JSON messages, it could be databases as well—open-standard databases like Dynamo or Postgres. Those databases are often the delay when setting up these environments. Being able to have a repeatable environment that you can spin up over and over again in a new namespace has a couple of benefits. One, they're short-lived. It's not like a staging environment. I read an article the other day—"Dear staging, I'm breaking up with you. It's not working out." I think it was on devops.com or something like that, but it makes sense. Everybody's using staging, they're stomping on each other's data, and things are always broken there. So it's like, well, what's the point of staging anymore? These short-lived environments, for all intents and purposes, look like real, full-fledged end-to-end environments; however, they're miniaturized. One of our customers calls them bubble environments—they exist just around that API. So the idea is you can do these isolation tests, these isolated traffic replays, that give you faster validation and aren't as complex to set up. Engineers have a lot to worry about nowadays, and SDETs have a lot to worry about. As the cloud gets even more complex, it's tough to figure out, well, what all do I need in this environment, how do I keep it running, and do I have drift from production? You want to test like you do in production—like it looks in production. And if you've got drift, well, hey, it didn't break when we were testing it, but production is different. How do we minimize that?
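As a rough illustration of the auto-generated service mock idea, the sketch below stands up a throwaway backend that answers with previously recorded responses. It is illustrative only, not Speedscale's mechanism; the snapshot format simply matches the hypothetical recorder sketch above.

```python
# Illustrative sketch: serve recorded responses so the service under test can
# point at this mock instead of the real backend dependency.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# path -> recorded response, built from the hypothetical snapshot file
RECORDED = {}
with open("snapshot.jsonl") as f:
    for line in f:
        pair = json.loads(line)
        RECORDED[pair["path"]] = pair

class MockBackend(BaseHTTPRequestHandler):
    def do_GET(self):
        pair = RECORDED.get(self.path)
        if pair is None:
            self.send_response(404)   # nothing recorded for this path
            self.end_headers()
            return
        body = pair["body"].encode("utf-8")
        self.send_response(pair["status"])
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Run this in the "bubble" environment; the workload's backend URL points here.
    HTTPServer(("localhost", 9001), MockBackend).serve_forever()
```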

[00:13:07] Joe Colantonio This happened to us all the time. I worked for a healthcare company and no one knew how to create the full system. You'd have all these people who had their little specialty, and you'd pray to God that your staging and QA environments had all the pieces that mimicked production. We never got it right, never got it right. So this sounds like it can also illuminate things—like create a diagram: hey, did you know in production you're using this service and you have no coverage here? Can it do those types of things for you as well?

[00:13:34] Nate Lee Yeah, it can. I think the latest term is platform engineering—it was agile, then DevOps, and now it's platform engineering. I don't know who's catching the bug this time, but there's this idea of reducing cognitive load for the developers and testers, because there are just so many microservices that it's tough for humans to reason about. One of the big features—kind of a byproduct of listening to traffic—is that we can paint a picture and show you what the interconnected systems are. When we illustrate these connectivity points, it has actually been quite eye-opening for our customers. Like, oh wait, we don't call this system—or, hey, we only send this call once. No, actually, according to the traffic we just recorded, you make five calls, and here's the conversation. Oh, you know what? That's been in our code for the last six years and we never noticed. Another interesting thing is monitoring tools like Prometheus health checks—oh gosh, it's hitting this API six times a second. And you know what? The observability tools don't show you that, but you're basically inundating your APIs with your own monitoring traffic, and you could tune that down and still get the benefits. There have also been cases where we've seen, oh wait, this illustration must be incorrect because it shows our staging environment connected to a prod database—that can't be right. And we're like, well, this isn't documentation; we're generating this from real traffic that we're observing. Hold on a second, let me go fix that real quick and come back. It's been quite eye-opening—turning the lights on for folks. There was another case where engineers were supposed to be checking a CDN before making a real call—that's a performance gain—and by having our listener on in staging, engineering managers were able to see what was actually happening and say, oh, let's fix that before we get to production.

[00:15:27] Joe Colantonio You bring up a good point there. People who work for an enterprise often don't realize how many monitoring tools are on production adding overhead, because one team may use New Relic—I'm just naming names—another team may use a different monitoring tool, and then all of a sudden it's like, why is it slow? That's a great use case: oh, you know what, we're using four different monitoring solutions and that's adding overhead.

[00:15:51] Nate Lee You don't want to DoS your own stuff.

[00:15:53] Joe Colantonio Right. I guess that gets me to another question: fundamentally, what type of testing are we doing? Mostly realistic functional testing, but it looks like you also see the end-to-end transaction. Say a transaction takes, I don't know, 10 seconds—it seems like with this tool you could say it's taking 10 seconds, but nine of them are taking place in this API or in this database. So if you fix that, you automatically fix the whole end-to-end time for your end user. Is that another use case that you see?

[00:16:24] Nate Lee Actually, we don't—that's more the distributed tracing side of things, and we don't claim to be a full-fledged monitoring tool, by the way. A tool like that could probably show you the delays there. But let's say you did observe that in production and you wanted to verify it. We have, on multiple occasions, had teams say the database is the delay, and others say, no, your API calls are the delay, and they point fingers. Well, if you install our sidecar on the API, we can automatically understand how you're interacting with the database, mock the database, and zero out its latency. Then any latency you're seeing is all a factor of the API, and you can prove it without a shadow of a doubt. There's definitely faster root cause analysis and troubleshooting with Speedscale, because you can shorten your mean time to resolution. The whole reason is that with the ability to mock or generate traffic at will, you can essentially isolate any component in what is typically a tightly coupled microservices architecture. The analogy is being able to walk up to any plane, grab the wing, stick it in a wind tunnel, and then stick it back on. That capability, I think, is becoming more and more critical in understanding the resiliency and performance of your cloud architecture. To get back to your question, we focus primarily on API testing, so we don't play at all in the Selenium, Cypress, or Playwright UI testing space. Our philosophy is that it's kind of an iceberg: the UI is the part you see above the waterline, and most of your application actually exists below the waterline as APIs. APIs are great for programmatic, repeated testing. However, if one of the microservices blows up, it can cause cascading failures. If there are contract changes—everybody works remotely nowadays; I change the contract and I don't talk to the other team across the country—that can easily proliferate into a problem. Having a centralized dashboard with listeners where you can self-service—I want to test this API, I want to test that API, I want to build an environment for it, I want to simulate the transactions going into it—becomes a huge force multiplier for engineering teams, as well as just getting everybody on the same page: hey, I want to look at this API, what is it doing? We're not looking at documentation that's 10 years old. A lot of people are focusing on AI—and we are too, actually, to help generate the tests—but there's a wealth of information in the traffic that's going through. And actually, I don't think we've covered this yet, but a primary concern when it comes to listening to or observing production traffic is security.

[00:19:18] Joe Colantonio That was my next question. I was going to say, a lot of people are like, I'm not using this tool—I work for a healthcare insurance company. Wait, you're doing what to what in production?

[00:19:26] Nate Lee Yeah.

[00:19:27] Joe Colantonio No.

[00:19:28] Nate Lee Absolutely. Yeah. So with those teams, we proceed carefully. We are SOC 2 Type 2 compliant, and we have redaction engines built into the tool for PII redaction, hashing, and obfuscation—that's natively built in. Of course, we're happy to do a technical proof and proceed crawl, walk, run with non-sensitive data first to prove out the technology. It's definitely something we want to proceed carefully with, but we're already in a variety of Fortune 100 enterprises where we've gone through the security scans. A big part of the simulation, though, is that once you have even a small span of traffic, you can actually replace the data. So if you capture a two-minute span of traffic and it has five users, you can take the message schemas, the general message format, and the mocks, and augment certain data fields to make two minutes of data look like 10,000 users, if that makes sense. You can take that traffic and run it at 1x speed as more of a functional regression, or you can change up the format of the traffic replay, multiply it to look like thousands of users, and take out the delays—now it's turned into a load and performance test, or a soak test. The same with the mocks: you can spin up multiple mocks, so if there are third parties or other internal APIs—authentication APIs, inventory check APIs—you can make them perform a lot faster and understand what the API you're responsible for can do at full chat, assuming the backends were performing optimally. I call this the scientific method. You learned it in grade school: you water the lima beans with coffee, coke, and water and see which one grows the best. But then let's say you engineer a new super bean—well, you want to run the same experiment again and validate that it's better. Oftentimes, though, when you make changes and go back to test in staging, the whole staging environment has changed, so you don't really know if you made an improvement or not. Having the ability to replay the same traffic with the same backend responses again—there's something to be said for that. It's harder to do than people think.
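To illustrate the data-multiplication idea—keeping the recorded message shape but swapping data fields so a handful of users looks like thousands—here is a minimal sketch. The snapshot file, the `/users/<id>` path shape, and the substitution rule are all assumptions for the example, not Speedscale's parametrization engine.

```python
# Illustrative sketch: expand a short recorded span into many parametrized
# variants by rewriting the user identifier in each recorded path.
import copy
import json
import re

with open("snapshot.jsonl") as f:           # hypothetical recorded span
    recorded = [json.loads(line) for line in f]

def multiply(pairs, copies):
    """Yield `copies` parametrized variants of every recorded request."""
    for i in range(copies):
        for pair in pairs:
            variant = copy.deepcopy(pair)
            # Swap whatever identifies the user in the recorded path so each
            # copy looks like a distinct user, e.g. /users/5 -> /users/10003.
            variant["path"] = re.sub(r"/users/\d+", f"/users/{10000 + i}", variant["path"])
            yield variant

expanded = list(multiply(recorded, copies=1000))
print(f"{len(recorded)} recorded requests expanded to {len(expanded)} replayable ones")
```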

[00:21:48] Joe Colantonio I started my career as a performance engineer. Once again, we'd do all kinds of performance tests to try to mimic what happened in production—and this is before containers and everything—and we never got it right. So this is awesome. It sounds like you can actually mimic how many transactions per hour you expect and run experiments. Okay, it's the holiday season—Black Friday—we expect a 40% bump; can you handle it? It's like chaos engineering, almost like resilience testing. From one capture you can do all kinds of testing: a functional test, spin it up as a load test, use it for a chaos test—what happens if this service goes away? My understanding is you can do all these different scenarios from that one capture.

[00:22:27] Nate Lee Yeah, thanks for bringing that up. Chaos is another capability: once we have the traffic patterns, we can start to inject 404s, we can inject non-responsiveness, we can inject high latency—like, hey, every once in a while we want you to take 30 seconds to respond—and see what happens. Do you hold the thread or the connection open? Do you close it gracefully? Does your application just crash and burn? I call it extra credit: you can get through your normal testing cycles and then go for extra credit with chaos testing against the backends. A lot of times these third-party providers don't give you much—especially the sandboxes, which are sometimes pay-to-play. If you're integrating with Visa or Amex, you get two sandboxes for free and they only go up to 10 TPS; if you need more, you have to pay for it. So you're limited by these third-party connections, but if you can simulate them at will, you can run them locally on your machine—on Minikube or Docker Compose. That's a huge developer efficiency boost, because all these traffic generators and mocks we build are containerized, so they can run anywhere, as many times as you like. And the other cool thing you touched on, Joe: there are companies doing these experiments to figure out whether, say, Graviton processors are faster than Intel processors—cloud providers keep coming out with new architectures. We've actually done that with our customers: an apples-to-apples comparison where we replay the exact same traffic against the Intel cloud instance versus a Graviton one, and in one case we figured out that one was 40% faster than the other. And they were like, oh, we can re-architect and move over to this new one.
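For the chaos idea Nate mentions—making a mocked backend misbehave on purpose—here is a minimal, illustrative sketch. The fault rate, delay, and canned response are arbitrary assumptions; the point is only that once backend responses come from a mock, errors and latency can be injected deliberately.

```python
# Illustrative sketch: a mock backend that randomly injects 404s or a long
# delay so you can watch how the service under test copes.
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

FAULT_RATE = 0.2      # 20% of responses misbehave (arbitrary for the sketch)
SLOW_SECONDS = 30     # the "take 30 seconds to respond" case from the conversation

class ChaoticMock(BaseHTTPRequestHandler):
    def do_GET(self):
        if random.random() < FAULT_RATE:
            if random.random() < 0.5:
                self.send_response(404)   # inject an error response
                self.end_headers()
                return
            time.sleep(SLOW_SECONDS)      # inject high latency
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 9002), ChaoticMock).serve_forever()
```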

[00:24:11] Joe Colantonio Oh my gosh. A lot of times people go down a rabbit hole of, we're going to rewrite everything because I have the latest and greatest technology—and once again, architecture. I used to have to test different configurations to see which one would perform better. Before you build out the whole thing, it sounds like you can use this almost as an exploratory session before you even write a line of code—really just stage a small version of what you think and see how it behaves.

[00:24:38] Nate Lee Yeah. If you're running optimizations—like code optimizations—you can just run the traffic locally on your laptop, so we can replay the traffic locally. If you're re-architecting and you just want to turn the workload into a Graviton workload and test that first, you can do that. There are also other things that I think are harder to test with conventional solutions, like horizontal pod autoscaling, or the circuit breaker pattern in a microservices architecture: is that going to work? Are we going to fail over properly? For those types of things, I could wait for a high-load event like the Super Bowl, or I could preempt that, run a high-traffic event myself, and make sure we're scaling properly. People think self-healing and autoscaling are kind of a silver bullet, but they don't realize these horizontal pod autoscaling features take a couple of minutes to spin up. And what happens during those two minutes? Are you dropping orders?
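As a rough sketch of generating a high-traffic event yourself rather than waiting for the Super Bowl, the example below replays a recorded span concurrently against a local instance of the service. The target URL, concurrency level, and multiplication factor are assumptions for illustration.

```python
# Illustrative sketch: replay recorded requests concurrently to see whether
# autoscaling keeps up, counting failures that occur while pods spin up.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://localhost:8080"   # e.g. the workload running on Minikube or Docker Compose
CONCURRENCY = 50

with open("snapshot.jsonl") as f:  # hypothetical recorded span
    recorded = [json.loads(line) for line in f]

def fire(pair):
    try:
        with urllib.request.urlopen(TARGET + pair["path"], timeout=10) as resp:
            return resp.status
    except Exception:
        return None  # failures show up here, e.g. while pods are still scaling

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    # Multiply the span to raise the load well above the recorded rate.
    statuses = list(pool.map(fire, recorded * 100))

errors = sum(1 for s in statuses if s is None)
print(f"sent {len(statuses)} requests, {errors} failed during the high-traffic event")
```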

[00:25:38] Joe Colantonio Once again, every time I talk to anyone about Kubernetes, performance or resiliency always comes down to configuration, which could be different based on the needs of your application. So you can use this for configuration testing as well, then?

[00:25:50] Nate Lee Absolutely. Really, when we set out to build Speedscale, it was the continuous testing dream—that was our dream. I read somewhere that continuous testing is the ability to understand the state of your quality at will. At any given time, do you know how good your code quality is? A lot of people don't. Agile helps, two-week sprints help, but because of the environment and data constraints, it's hard to run tests repeatedly and reliably. The idea is that if you've got the spans of traffic, whenever you do a code commit or you've got a build candidate, you can run 10 minutes of traffic compressed into 10 seconds against your latest build and understand the SRE golden signals, which Google describes as throughput, latency, headroom, and error rate. You get a cursory idea of that, and you can run it every time. And by the way, if things have changed—the way people are hitting our application has changed, it's a new holiday season—well, you can just go out and re-record traffic and add that on. You don't have to re-script your whole regression library or bring in another army of engineers. Your existing engineers can spend less time building scripts and more time building features.
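Here is a minimal sketch of checking those signals after a short replay—throughput, latency, and error rate for a build candidate. It is illustrative only; the target URL and snapshot file are assumptions, and a real run would add headroom/saturation measurements.

```python
# Illustrative sketch: replay a recorded span sequentially and summarize
# throughput, median latency, and error rate for the build under test.
import json
import statistics
import time
import urllib.request

TARGET = "http://localhost:8080"   # hypothetical build candidate

with open("snapshot.jsonl") as f:
    recorded = [json.loads(line) for line in f]

latencies, errors = [], 0
start = time.monotonic()
for pair in recorded:
    t0 = time.monotonic()
    try:
        with urllib.request.urlopen(TARGET + pair["path"], timeout=5):
            pass
    except Exception:
        errors += 1
    latencies.append(time.monotonic() - t0)
elapsed = time.monotonic() - start

print(f"throughput : {len(recorded) / elapsed:.1f} req/s")
print(f"p50 latency: {statistics.median(latencies) * 1000:.0f} ms")
print(f"error rate : {errors / len(recorded):.1%}")
```

Comparing these numbers between two builds is the kind of "max TPS dropped from 80 to 60" signal Nate describes a little later.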

[00:27:20] Joe Colantonio Another issue this reminds me of: developers complaining about waiting around for tests to finish before their code can move on. You're not saying get rid of all end-to-end testing—like you said, this is API testing—but it at least allows your developers to have a sanity check before the code is committed to the next round of the pipeline. So it's a great time saver.

[00:27:38] Nate Lee Yeah, absolutely. It's a huge time saver. SREs win too, I think, because SREs are like, well, am I going to burn my error budget? Am I going to stick to my SLIs and SLOs? The testers said they tested it, so I just have to go on that. Well, if you run 10 minutes of traffic against your app, you can actually say, hey, your max TPS used to be 80, but for some reason the code change you just made dropped it to 60. Or, hey, your latency went from 20 milliseconds up to 70 milliseconds with this last code change—something changed. Even if it's just about making better-educated decisions about releases, that can be quite educational, I think, and prevent these outages.

[00:28:23] Joe Colantonio Absolutely. Most of the people listening to this probably have a tester background, so they'll be familiar with tools like LoadRunner, JMeter, and Postman. I think you've given all the reasons this is different, but I just don't know if we've made the connection of how different they are. How would you compare this to, say, LoadRunner or Postman? Does it replace them? Does it work together with them? Is it completely different? How would you explain it to someone who's a tester, basically?

[00:28:49] Nate Lee For API development tools like Postman, Stoplight, or Hoppscotch, I think it's very complementary. In fact, we can ingest Postman collections, and we can take traffic recordings and export them to Postman collections as well. A lot of times, when you're building these collections, it can be quite a headache: what are the headers, how do I get this message to actually send back a response? It's a bit of a guessing game, trial and error. If you can record traffic and then export to Postman collections, it can be a huge accelerator for those engineers. So it's complementary to those. Compared to load testing tools like JMeter or Locust or K6—the majority of our customers are using tools like that—basically there are so many microservices now, they're in the hundreds, that they just can't build scripts for every single one.

[00:29:41] Joe Colantonio Right?

[00:29:42] Nate Lee I don't know if you've ever heard the pets versus cattle analogy.

[00:29:46] Joe Colantonio Yes.

[00:29:47] Nate Lee Yeah. With the cloud shift, it was like, hey, the servers are now cattle, not pets—you don't curate them one by one and care for them; you just have a huge herd of servers. I think tests are going through the same shift. You don't curate each one of these tests—hey, there's a version change, let me update it with a new request and response. Instead, you just go out, grab more traffic, see if it's got the stuff that you need, and then run the traffic. Compared with LoadRunner or JMeter or K6 scripts, we can basically run a lot more load tests by capturing traffic, multiplying it, and then stressing your application with it. We've had customers say, hey, we're going to try out Speedscale, but we've already got a big K6 practice, so we actually built a K6 exporter: we can capture traffic and dump out a whole slew of K6 scripts. Even just for that value-add—so many K6 scripts, in fact, that they couldn't all be uploaded to K6 Cloud at once; I just don't think they were expecting that volume. We can be an accelerator in that regard, and you can also just use us for our mocks. But yeah, it's a different mindset, if you will. Like I said, don't test—replay traffic.
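To show the shape of the "export captured traffic to K6" idea, here is a minimal sketch that turns recorded request paths into a generated K6 script. It is illustrative only—not Speedscale's exporter—and emits only simple GETs against an assumed local target.

```python
# Illustrative sketch: generate a basic K6 script from recorded request paths.
import json

with open("snapshot.jsonl") as f:          # hypothetical recorded span
    paths = [json.loads(line)["path"] for line in f]

lines = [
    "import http from 'k6/http';",
    "",
    "export default function () {",
]
for path in paths:
    lines.append(f"  http.get('http://localhost:8080{path}');")
lines.append("}")

with open("generated_k6_script.js", "w") as f:
    f.write("\n".join(lines) + "\n")

print(f"wrote a K6 script covering {len(paths)} recorded requests")
```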

[00:31:09] Joe Colantonio Is there any better source than production for what your users are actually experiencing? It seems like the most logical thing, but it's counterintuitive to the way we've probably been developing software for at least as long as I've been around.

[00:31:21] Nate Lee Well, I think the mindset is changing. I've talked to several business executives and technology leaders who have said that testing is really just a dress rehearsal for production. Production's the real deal, and we're just preparing for it. If I can shift production left, or simulate production conditions, that's kind of the whole point, really. I don't think people realize a solution like this exists. There actually are open-source alternatives—there's call replay and VCR—and I think curl and hey are kind of the one-by-one transaction versions: if you can curl a transaction you saw in prod, that's great. But we've combined a traffic replay solution with a mocking solution, like WireMock or service virtualization. Having both of those value props in one solution is really where the acceleration comes from—the velocity to move quickly.

[00:32:25] Joe Colantonio Another question to keep in mind—and like I said, I haven't tried this—do you need to record traffic to use Speedscale? How does that process work?

[00:32:33] Nate Lee Recording is definitely the fastest way, and specifically recording in Kubernetes is the fastest—Kubernetes has a lot of hooks to ingest traffic, so it's very conducive to it. But you don't have to record. In fact, we think the ways of sending us data are going to multiply—it's going to be commoditized—and we keep a close eye on eBPF open source. We can ingest files, and we also have a browser recorder, so you can install us in a browser and we capture basically a HAR file that records all the HTTP transactions. We don't do browser DOM event recording—mouse clicks—like conventional UI testing tools, but we do have a way to capture UI interactions, and that's what we capture. Really, we feel like our IP is around how we interpret that data, parametrize it, and make it replayable.
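As a small illustration of working with a browser-side capture like the HTTP archive Nate describes, the sketch below reads the transactions out of a HAR file and replays the GET requests. The file name is an assumption, and a real replay would need auth handling, parametrization, and non-GET methods.

```python
# Illustrative sketch: replay the GET requests recorded in a browser HAR file.
import json
import urllib.request

with open("capture.har") as f:                   # hypothetical browser recording
    entries = json.load(f)["log"]["entries"]     # standard HAR structure

for entry in entries:
    request = entry["request"]
    if request["method"] != "GET":
        continue                                 # sketch handles GETs only
    try:
        with urllib.request.urlopen(request["url"], timeout=5) as resp:
            print(resp.status, request["url"])
    except Exception as exc:
        print("failed", request["url"], exc)
```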

[00:33:28] Joe Colantonio And that's the secret sauce—I think that's the most important piece. So, love it. Awesome. Okay, Nate, before we go, is there one piece of actionable advice you can give to someone to help them with their production Kubernetes automated testing efforts, and what's the best way to find or contact you, or learn more about Speedscale?

[00:33:44] Nate Lee I think oftentimes when people try to improve their performance and resiliency, they feel like they have to master the discipline—like, I have to be a performance testing expert before I set out; I need to learn and consume all the information, get certified. Really, I would focus on the most critical API that you have. Maybe it's the checkout API, or maybe it's your inventory lookup API—the one that's the slowest—and mock up an environment for it. You don't have to use any mocking solution; just figure out how you can simulate it, or have a repeatable Terraform or YAML definition for that environment so you can spin it up over and over again. If you do that, then you can repeatedly run tests. I'm not advocating any particular commercial off-the-shelf or open-source solution—just focus on repeatability, because stable and predictable environments can get you a lot further in testing than writing a whole slew of test scripts. And you can find us at speedscale.com. We've got tons of blogs devoted to how it differs from this or that. It's really a paradigm shift—using traffic instead of tests. Like I said, we're kind of the anti-test, but we provide all the same benefits. People ask, well, how do I update the tests and how do I maintain the tests? You just grab new traffic. We can patch snapshots with request-response pairs and Postman collections if you've got new functionality, but it's just a complete mind warp. I'm happy to talk about it—obviously, I'm quite passionate about this subject.

[00:35:22] Thanks again for your automation awesomeness. Links to everything we covered in this episode are over at testguild.com/a482. And if the show has helped you in any way, why not rate and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe. My mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.

[00:35:58] Hey, thanks again for listening. If you're not already part of our awesome community of 27,000 of the smartest testers, DevOps, and automation professionals in the world, we'd love to have you join the fam at testguild.com. And if you're in the DevOps, automation, or software testing space, or you're a test tool provider and want to offer real-world value that can improve the skills of or solve a problem for the Guild community, I'd love to hear from you. Head on over to testguild.info and let's make it happen.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
A person is speaking into a microphone on the "TestGuild News Show" with topics including weekly DevOps, automation, performance, and security testing. "Breaking News" is highlighted at the bottom.

SimpleQA, Playwright in DevOps, Testing too big? TGNS140

Posted on 11/04/2024

About This Episode: Are your tests too big? How can you use AI-powered ...

Mudit Singh TestGuild Automation Feature

AI as Your Testing Assistant with Mudit Singh

Posted on 11/03/2024

About This Episode: In this episode, we explore the future of automation, where ...

Eli Farhood TestGuild DevOps Toolchain

The Emerging Threats of AI with Eli Farhood

Posted on 10/30/2024

About this DevOps Toolchain Episode: Today, you're in for a treat with Eli ...