AI’s Impact on Testing Careers with Jason Arbon

By Test Guild

About This Episode:

In this episode, special guest Jason Arbon joins us for a truly eye-opening conversation about the rapidly evolving world of AI-driven software testing.

Jason introduces us to the concept of MCP, a powerful AI workflow engine that's changing how testers interact with tools, and offers a behind-the-scenes look at the origins of Testers.AI, a platform that automates comprehensive software testing using AI personas tailored for everything from usability to security.

In this candid discussion, Jason explores why the testing community initially hesitated to embrace AI solutions, how the landscape is shifting, and the surprising “happy dance” moment when AI outpaced traditional test teams in speed and coverage.

The conversation doesn’t shy away from the difficult questions:

What does this mean for human testers?
How reliable are AI-generated results?
And what skills will testers need in the new era where AI handles the bulk of basic testing?

Whether you're intrigued, skeptical, or just curious about how AI is reshaping quality assurance, this episode is a must-listen.

Sponsored by BrowserStack

This episode is brought to you by our friends at BrowserStack. Many QA teams struggle with limited automation coverage due to steep learning curves, infrastructure hassles, and the need for advanced coding skills. That’s where BrowserStack’s Low-Code Automation comes in.

It’s an intuitive, AI-powered, end-to-end testing platform that lets anyone—regardless of coding experience—create automated tests in minutes. With features like natural language test creation, self-healing tests, visual validations, and cross-browser/device support, it helps teams automate more, faster.

Trusted by over 50,000 teams, including Fortune 500 companies, BrowserStack Low-Code Automation helps you scale testing without slowing down.

👉 Learn more at: https://testguild.me/ailowcode

About Jason Arbon


Jason Arbon is the CEO at Checkie.AI. His mission is to test all the world's apps. Google's AI investment arm led the funding for his previous company (test.ai). Jason previously worked on several large-scale products: web search at Google and Bing, the web browsers Chrome and Internet Explorer, operating systems such as Windows CE and ChromeOS, and crowd-sourced testing infrastructure and data at uTest.com. Jason has also co-authored two books: How Google Tests Software and App Quality: Secrets for Agile App Teams.

Connect with Jason Arbon

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:35] Joe Colantonio Hey, in this episode, our special guest, Jason Arbon, is here for a truly eye-opening conversation about AI-driven software testing. And in this candid discussion, he doesn't shy away from some really difficult questions, like: what does AI mean for human testers? How reliable are AI-generated tests and results? And what skills do testers need in an era where AI handles the bulk of basic testing? Whether you're intrigued, skeptical, or just curious about how AI can be used for quality assurance activities, this episode is a must-listen. But before we get into it, let's hear from this week's awesome sponsor, BrowserStack, and their AI-powered low-code automation solution. Check it out.

[00:01:21] Hey, let's face it, many testing solutions require high coding proficiency, excluding non-technical team members from the testing process. Steep learning curves, script maintenance, and the need for skilled resources and infrastructure setup are other significant challenges. As a result, QA teams are forced to limit automation to critical applications, leaving significant gaps in automation coverage. What do you do? Well, introducing BrowserStack Low-Code Automation. It's a powerful end-to-end low-code testing platform built upon best-in-class BrowserStack capabilities that lets anyone create automated tests in minutes without coding expertise. With their intuitive test recorder, you can create your first automated tests with ease: simply interact with your application, capture complex actions like hovering, scrolling, and file uploads, and generate test steps with meaningful, human-readable descriptions. Timeouts are intelligently configured for each test step, ensuring stability without manual intervention. Visual validation ensures specific UI elements appear correctly, improving accuracy and test stability. But here's an interesting piece: their low-code automation tool is powered by AI. It uses natural language to create test steps and simplify automation of complex user journeys. When UI changes occur, self-healing technology automatically adapts your tests to prevent failures. You can also easily scale your low-code testing with automation best practices for testing across different devices and browsers. And you can bring modularity, reusability, and extensibility into your low-code tests with advanced features like modules, variables, data-driven testing, and more. Join over 50,000 customers, including some Fortune 500 businesses, who trust BrowserStack Low-Code Automation, and take your automation testing to the next level using the special link down below.

[00:03:24] Joe Colantonio Hey Jason, welcome back to the show.

[00:03:28] Jason Arbon Thanks, Joe, it's been a little while. Actually, it feels like a few years in testing world, but only a couple months.

[00:03:33] Joe Colantonio For sure. So talking about that, a lot goes on in a few months. Since we spoke, MCP, I'm already off topic, MCP has been a huge thing. What's the deal with MCP? What is it? Why is it important?

[00:03:46] Jason Arbon It's basically a way to let people type to and interact with tools from a prompt. The idea is, usually you'd have, like in the old days of COM, actually, so that dates me, very strict APIs, and you had to code to them and understand them and read the documentation and all this stuff. Even with command lines, you have to memorize all the switches and stuff like that. The idea is that MCP lets you take a tool and share it, in a way, with an AI, like an LLM. And then the person just says what they want, and it figures out which tool to use and how to use it for you. And it can even cascade calls: it can make a call to one tool and then take the output of that and put it into another one. So it's really a high-level workflow engine, a natural language kind of workflow engine on top of other tools. It enables that kind of magic.
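
To make that concrete, here is a minimal sketch of what exposing a tool over MCP can look like, assuming the official Python MCP SDK and its FastMCP helper; the server name, tool, and ASCII-art example are illustrations borrowed from Jason's description, not anything from testers.ai.

```python
# Minimal MCP server sketch (assumes: pip install mcp). An LLM host connected
# to this server can be asked in plain language to "convert this image to
# ASCII art" and will discover and call the tool on its own.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("image-tools")

@mcp.tool()
def image_to_ascii(image_path: str, width: int = 80) -> str:
    """Convert an image file to ASCII art (stub logic for the sketch)."""
    # Real logic would open the image and map pixel brightness to characters;
    # a stub keeps this example self-contained.
    return f"[ascii art for {image_path} at width {width}]"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an LLM host can list and invoke the tool
```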

[00:04:31] Joe Colantonio Is that the same as the agentic AI?

[00:04:34] Jason Arbon No, kind of. It depends on what people want to call agentic. The purest definition of an agentic thing is that you have a bunch of AI agents that collaborate together, often without much human interference, to accomplish a goal together. MCP applications these days are more often about accomplishing a single thing, like download this image, convert it to ASCII art, and then post it to my webpage. But it's not really a long-running set of multiple agents concurrently working on different aspects and then rolling them up. So it's baby agentic, I guess I would say.

[00:05:10] Joe Colantonio Gotcha! Jason, every time I talk to you, it seems like you have another company up and running, very prolific.

[00:05:16] Jason Arbon It's serial failure, Joe.

[00:05:18] Joe Colantonio I wouldn't say failure. Because you're iterating, it seems like. So the next one looks pretty mind blowing. You gave it a little teaser of it at Automation Guild called Testers AI. Why did you create Testers AI? I guess that's something good to start with.

[00:05:32] Jason Arbon The funny thing is, and this is actually probably interesting for the testing role, I actually created testers.ai much earlier than checkie.ai. I created it a while ago, but testers weren't ready for it. Just from the early reception I got from talking to testers, the human testers, it sounded threatening, because they're kind of scared of AI, to be frank. And then you go, well, this thing is testers.ai, that's a little concerning. So I avoided that and created the Checkie site for a while as an umbrella for everyone, to feel that it's more friendly. And also, there's a lot of the testing debate, you probably had some earlier this morning, the difference between testing and checking.

[00:06:08] Joe Colantonio Yes.

[00:06:08] Jason Arbon And so it was also a joke on checking. Because, like, oh, the AI just checks, it doesn't do everything. Don't worry about it. Don't be concerned. But what's happened is, I think both the community and the world have moved fast enough that now people want agentic things. They want testers to be AI as much as possible and to use AI as much as possible. So the mood, the environment, has changed, and I went back to testers.ai. That's actually what that loop was about.

[00:06:32] Joe Colantonio All right, very nice. So what does it do, and why were people afraid of it? If people go there now, it looks like there are different personas of testers, or testing activities, that you can utilize.

[00:06:47] Jason Arbon Yeah, it's actually a very similar thread to the one I've been on the last year and a half, two years. The idea is that you create these different personas, exactly like these testers. There's a tester focused on usability, there's a testing agent focused on performance, there's a testing agent focused on security. It's like the ideal test team, actually, right? You have one of every specialty in the world of testing. Usually, you can't afford that or find those people. And by the way, if you look at their pictures, they're the most beautiful test team on the planet. Did you notice? They are AI-generated, but AI doesn't generate people like me; they're all beautiful people. I left it that way. So it's the most beautiful testing team on the planet. But the idea is that you just point them at a website and they look for bugs for you across all those different dimensions of quality. And the crazy thing, Joe, this is the eureka that's happened since we last checked in, and it was mind-blowing to me. I'm an optimist, and people call me an AI cheerleader or something sometimes, but I really got this working, and I ran onto the porch and did a happy dance. It's recorded on my porch video, apparently; my kid found it. But it's crazy. What happened, and this is what's recently been pushed up to testers.ai, is that these agents can find bugs, but they can also create test cases. Functional test cases: they can look at a webpage, auto-generate a full test suite of hundreds of tests, and then execute them. They can do login, happy paths, sad paths, edge cases for search, form fill, just interactivity, menus, all that stuff. They can generate a full suite, execute it, and then generate a report that visually shows you what they did and the bugs they found, all in less than an hour. It's kind of crazy. What's crazy about it is that this is better coverage than most test teams have. I've worked on a lot of test teams in the past. Maybe they've all, for some reason, not been very good, but this is better coverage than we would deliver in months, and it can happen in less than an hour. We're entering a new world, Joe. And, not to get overly excited, it's also kind of funny because all these other tools and teams are trying to fix one little piece of their workflow, right? Like fixing selectors is this amazing thing. With AI, you just don't need selectors. If you have an AI-first approach, not just in terms of technology but in terms of your testing workflow, you let the AI do all the bulk work first. And the humans come in after, review it, and sometimes say, bad robot, that's a false positive or a false negative, but the vast majority of the data, the bugs and the coverage, is legit. It's net a hundred times faster than doing it yourself. We're in a new world, Joe. This is something I didn't expect to happen this year, frankly, and it's running and demonstrable today.

[00:09:33] Joe Colantonio What's the quality, though? Because once again, I had another interview, pretty eye-opening there as well.

[00:09:39] Jason Arbon Tell us who it was, Joe.

[00:09:41] Joe Colantonio Well, you'll be able to tell when it's released, but they showed a graph of someone using an application, and they had multiple profiles of different users using the application as real users. And then they showed an automated script and what that looked like. And it's totally different. They were saying how different the intent is from the behavior. So how good are the results from these agents?

[00:10:05] Jason Arbon So a couple of things I'll note. I challenge you to ask the people who talk about this stuff what tests they have run lately on a real-world app and what bugs they've found that anyone actually fixed. We'll leave that aside for now. The real story with agents is, first of all, say it's completely true that AI agents, or even just regular automation, can't catch all these nuanced issues and stuff like that. One thing is that most test teams don't have the time to do that in the first place, because they're doing all of the rote basic work. They basically can't keep on top of the basic testing. A lot of people who teach or give presentations on the nuance of some sophisticated testing procedure, very rarely is that stuff executed in production, because no one has the time. I highly recommend people let AI or some other form of testing do the basics so they can focus on those types of things. But secondly, if you actually look at the agents that I have running, and you know me, I try not to self-promote too much because it gets dated too, you can generate not only personas in there; there are user feedback personas in this stuff that I'm working on. It will look at the webpage and dynamically generate personas that would like to interact with that page. Say it's a medical device company: it will create hospital administrators, it will create personas from the medical world that match that application, and then ask them for not only functional coverage but qualitative feedback on that website. And even more fun is that you can also create your own personas. I was working with a travel agency, a travel search company, and one of their personas was a great example. And also, these people were talking in very nebulous, high-level kinds of vagaries. This is a real company trying to do real testing on a real travel search site. They have some personas defined. One of their personas is a very rich, first-class, international traveler who always travels with their dog. You can imagine this lady with the dog in the purse or something, right? But they have very special things they're searching for and looking for, and they need their itinerary. And so the question is, how well does it work for that person? You can create these personas and feed them into the agents. And this is where tester creativity is maximized: they spend all day thinking about these nuanced personas. But then you feed them into the agents, the agents ingest them, and they give feedback based on that persona. So A, it's kind of true what they're saying, but the reality is no one really does that. And also, the reality is you can actually do it better, faster, and cheaper, ironically, with AI and automation. And ask them again what tests they've actually run and what bugs they've filed that were fixed.

[00:12:47] Joe Colantonio This sounds cool, but if I'm a tester, I don't know if I want it to be true. Like you said, I still think there's resistance, because if this does all that, then what does a tester do?

[00:12:58] Jason Arbon Right. So I almost should have asked you to ask that question. I've been interacting with people on this over the last few months, and there are two parts to it. The second one is the finding I've had; I'll share that in a second. First, I'll tell you, I think it's worse than you believe.

[00:13:15] Joe Colantonio Worse.

[00:13:15] Jason Arbon Testers, I believe now, after interacting with a lot of humanity over the last few months, don't actually care about quality. Very rare. Very rarely does a tester care about quality; what they care about is the mechanics of testing. So they love to know they're adding value and getting paid for putting their headphones on and executing the same suite manually over and over again, maybe a little exploratory testing on Friday afternoon, but they like that activity. And if you're an automation developer, they love to create page object models.

[00:13:50] Joe Colantonio Yes.

[00:13:50] Jason Arbon That's what they're there for. Most of them interviewed to be a developer and didn't pass the interview loop. So this is a practice job to get to their developer role so they can create more buttons, I guess. It's not interesting to me, but they like the craft. They like doing the work and the execution of it. They not only don't care about quality, they also don't care about efficiency, because the inefficiency in the process is their fun. And that's what they're there for. I think the vast majority of testers don't actually care about quality, they don't care about efficiency, and they absolutely don't want to use these tools, to be very clear. And I think they don't want to use these types of tools because it threatens their job, they like things to be predictable, and it ruins their fun. Let me tell you one quick anecdote that's beautiful. It's not just a testing-specific thing. There are the AI IDEs. I know you know this, but for our audience, there are tools where developers can code, right? And they can type in prompts, add a button in the top right corner for sign-in or whatever, and it goes off and generates the code for them. It sounds like a tool, maybe even a tease, but maybe it could work. Well, one of those companies just got a buyout offer for $3 billion, I think yesterday. So it's legit; there's real interest and people are using it. They asked the CEO of this thing in a Q&A, what's the biggest concern or problem your users have? You'd think he'd say something about some feature or wish, but no, he stopped himself and said, the developers don't want to be testers. And I was like, let me think about that. I'm still thinking about it today. Because what it does is it turns that developer into a tester, because the AI is doing most of the coding. Now they're accountable for it. They have to test it now. They're the tester. They might fix things up and rejigger it a little bit, adding human value and review, and they have to review the code, and they're accountable for it, and they're testing it now. Most of their job is prompting, waiting, going to get a coffee while the agentic stuff runs. They come back and have to fix things up a little bit. Sometimes it just works, but they have to test it. And they don't want to be testers. The analogy is that the same thing is coming to the testing world, where AI can and will do the bulk of the testing, and it's demonstrable today. If you want to live in denial, that's fine, but it's demonstrable. But guess what testers like to do? They like to do the testing. Just like the developers like to code, testers like to do the testing; they don't like to be the test manager. Everyone asks, who tests the tests, right? Well, no one really does. We know that, Joe. Rarely does anyone really review tests, but that's the mode going forward: the bulk of the work for testers will be reviewing the output and tests of the AI agents and testing bots, and most testers won't like that and don't want to do that. So it's going to be a very interesting period here. And I'll finish up with the IDE thing and what just happened. I just launched this this week, but with MCP, back to what you said at the top of the hour, we've used MCP and added our testing agents into the IDE experience.
So this is what's coming next next: the developer has to test the code, and the AI generates it, right? Now with MCP they can just say in their IDE, in their tool, hey, check this for me, or can you test this website out for me? And guess what happens, Joe?

[00:17:09] Joe Colantonio Spins up, nice.

[00:17:11] Jason Arbon It goes and starts testing it. And then guess what happens: the bugs from that get auto-uploaded back into the IDE. And guess what the AI coding agent does, Joe? It looks at the bug report list and starts to fix them, and then it produces a new build and sends it back to the AI testing agents. And at some point, they're not perfect yet, they're not genius yet, but at some point most of the basic stuff has been found, fixed, resolved, and verified. And then the humans come in. The humans now are starting to become the exception handlers, in a very polite way. Only after the AI can't figure it out do the humans come in, handle the exceptions, and fix up the things the AI couldn't figure out or do. But that net process is measurably 10 to 100 times faster than the similar human process, especially when you measure latency. That's what's coming next, next. The question is, only a few testers, I think, will make the leap across this threshold.

[00:18:12] Joe Colantonio What are they going to do? What are the skills they need then or habits?

[00:18:15] Jason Arbon I'd say there's good news, but there's a preface to that, and I promise to answer the question. The really interesting thing here is that AI is generating code. I use this basic napkin math: AI can generate code, say, 10 times faster. And apparently testers know more than the CEOs of Google, Microsoft, and OpenAI. It's amazing, Joe. They're freaking geniuses. They're just trapped doing manual test execution every day, but they know better than all these guys. Sorry, you can tell I've spent too much time in my cave reading LinkedIn. But the reality is that these agents are going to produce 10 times the amount of code. And if you think humans are only needed for 10% of the work, and the amount of code goes up 10x, then that 10% of the future is what we have today. I think we actually need as many or more testers than we have today, but I think there's going to be a shortage of testers, ironically, because guess what? Most of those testers will opt out of the system, because they won't want to do that work. Just as the developers don't want to be testers, they don't want to be reviewers of tests, and they don't want to embrace the technology. They like what they're doing today, and the world is just going to move past them. So ironically, we might have a shortage of testers, not even just a break-even, but a shortage. But those who remain are the testers that are smart enough, experienced enough, and care about quality enough that they will look at, review, and appreciate the work that the AI did. Just like some developers go, that's not as smart as me, I'm going to still manually craft all my C code or something. Amazon has stopped hiring engineers, Salesforce claimed to stop hiring engineers too, switching over to AI, and that trend is going to keep going. And Facebook, I think, just said they're getting rid of most mid-level engineers this year or next. There's going to be an outflow from the testing world at the very moment we need very experienced, inquiring, and passionate testers. That's what I think is going to happen; there will be so many test results to review. I'll give you one quick example. Just go to testers.ai.

[00:20:21] Joe Colantonio We'll have a link for it down below as well so people can check it out also. Okay, good.

[00:20:25] Jason Arbon And I've got a brilliant business model: I'm giving away AI testing agents for free. It's genius. But if you go there, even better now, there are test results for almost 500 homepages of companies around the world, like the top web pages on the planet. The AI has already tested them, and you can browse around and look at the bugs it's found and stuff like that. Not only am I giving the tool away for free, I'm actually testing people's pages for free. I need a business partner. But it's demonstrable that no tester could have done that. And it's happening now. This is the reality of the world.

[00:20:56] Joe Colantonio I've done trivial things with, like, GPT. Sometimes I get good output, and sometimes I'm like, what the heck? Something stupid like an image: here is an image, keep it exactly how it is, just change this one thing. And every time I do it in a chat, it still changes it. And that's just trivial. How can we trust the AI agents to be doing what they say they're doing?

[00:21:22] Jason Arbon Perfect example. Again, there are two points; usually you're supposed to do three, but I'll just do two because I don't have a lot of RAM. One is that picture that came out; I'll just make the philosophical argument. A, you could call it user error, because you didn't specify the change you wanted well enough. B, and this is my AI apology, it's your opinion. If I executed the same prompt, maybe the picture it gave would have been what I wanted. It doesn't know what you want; there's variety and ambiguity in that. But also, the reality is it's just not that smart yet. But if you look at even last week's releases, the new ChatGPT image stuff is getting pretty darn good. And guess what we're talking about, Joe? What did you ask it to do? You asked it to take a picture and swap Fabio's hair in for yours. I saw your prompts, and guess what? The hair just wasn't flowing. A one-percenter problem. And by the way, if you're really worried about it, wait six weeks, maybe three months, and it will be even better.

[00:22:29] Joe Colantonio But that's a trivial problem. What about at an enterprise that's doing, say, radiology software? I mean, that's completely different.

[00:22:38] Jason Arbon Okay, good point. If you look around, one thing is, if you look at studies, actual academic ones, the AI on average is better than most radiologists themselves, literally. And guess how radiologists are using AI: they let the AI go through thousands of these scans efficiently, and then they look at the anomalies, the ones that are at the extrema, right? Just like I'm saying developers are using AI and testers should be using it. First, you shouldn't be waiting for the AI to be perfect; by the way, it never will be. And also, beauty is in the eye of the beholder. Some testers think AI is wrong because they're not a very good tester, frankly. Or sometimes they think it's really wonderful because they're not a great, experienced tester and they're fine with basic responses. In terms of what I think you're getting at, how do you rely on this stuff that is variable and not always predictable, even if you exclude the human in the loop? The way we do it at testers.ai, and I've had to do this not just because it's cool but because I had to control quality, is that almost everything that's done is peer reviewed by another agent. There's actually a test manager agent that reviews the work of the other laboring AI agents. We also use multiple LLMs, by the way, to cross-check each other. And for every flow, even though the agent thinks it did well, it double-checks it again and asks: here is the intent, here are the actual results, here's the consequence, does that seem correct to you? So there are like four levels of eval and checking, because I'm a tester, right? But the funny thing is this, think of the reverse. Every tester compares this against some perfect, ideal world that doesn't exist. Look at your Selenium scripts in Java. When you do a flow, right, and you're clicking through an app, and you get to the fourth page where you're supposed to be, how do you know that you're there? Guess what everyone does: they grep, or they do a regex looking for a string on that page, which will be "shopping cart" or something, right? That's how they make sure they got there. There's so little validation in existing test automation, it's embarrassing. The AI looks at console logs, network activity, a screenshot of the app, has the context of the whole flow, and has, to a degree, the combined intelligence of the planet embedded in it. And when it looks at that last radiology chart picture right on the screen, the application the doctor is going to review, it looks at the entire screen. If you asked the Selenium automation engineer, is this the right page? Guess what, it's a radiology chart for somebody that has a problem, right? Guess what the AI can do? The AI not only verifies holistically that you got there correctly and that the flow made sense, but that the screen makes sense for a human to look at. It can find UI bugs, and if the page is confusing, or difficult to read, or has usability issues or anything like that, it finds those. Selenium scripts don't find any of that stuff. But most importantly, it can even give you an assessment, on this exact example you gave me, of that radiology image. You can ask it, do you think there's a problem in here?
Does this look like a problematic scan, as I was expecting? You can actually get that. You talk about the test team, and somebody listening might have worked on a medical product in the past: your automation never did any of that. There are imperfections in the radiologists themselves, but the interesting thing is that on average it's faster, delivers far better coverage, is far less expensive, and can give you far more validation than we usually have today in most situations. Even in that radiology case, because the AI can actually look at that radiological output. Try the latest GPT on a radiology image; you might not be able to with the hosted version, but if you ran it locally, you could. And it's better than any Java Selenium code I've ever seen in my life.
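
As an illustration of that layered checking, and not the actual testers.ai implementation, here is a hypothetical Python sketch of a second "test manager" agent re-evaluating a worker agent's verdict against the evidence it collected; call_llm, StepResult, and the model name are stand-ins you would wire to your own LLM client.

```python
from dataclasses import dataclass


def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a chat-completion call to the named model (assumed client)."""
    raise NotImplementedError("wire this to your own LLM client")


@dataclass
class StepResult:
    intent: str          # what the test step was supposed to accomplish
    evidence: str        # console logs, network calls, screenshot summary
    worker_verdict: str  # pass/fail as judged by the agent that executed it


def manager_review(result: StepResult, reviewer_model: str = "other-llm") -> str:
    """Ask a second agent, ideally on a different model, whether the verdict holds."""
    prompt = (
        "You are reviewing another testing agent's work.\n"
        f"Intent: {result.intent}\n"
        f"Evidence: {result.evidence}\n"
        f"Worker verdict: {result.worker_verdict}\n"
        "Does the evidence support the verdict? Reply pass, fail, or "
        "needs-human, with a one-line reason."
    )
    return call_llm(reviewer_model, prompt)
```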

[00:26:29] Joe Colantonio And it's almost doing it like a real user because it's using visuals. Is that what it is? It's all image based now, or is it?

[00:26:34] Jason Arbon Well, it depends on who you're talking to. Our stuff is a combination of a lot of things. It looks at the DOM, also visuals, also network activity, the APIs being called, and console logs. That's what it looks at today. It puts all those together, not just to find bugs, but also to figure out how to execute the flow and then verify that the flow was executed correctly.

[00:26:54] Joe Colantonio I guess another thing that's been freaking me out: I've been reading this site where they wrote a book called Modern-Day Oracles or BS Machines? How to Thrive in a ChatGPT World. My question to you before we get into that, and I guess it's related, is: is an LLM thinking? Are these agents thinking?

[00:27:11] Jason Arbon Nobody knows. Well, the funny thing is, Joe, we cannot define what human thinking is. We don't even know what consciousness is. Somebody this week published maybe an MRI-based way, through kind of the entropy of the brain state, to decide if a person is conscious or not, like asleep or awake. But that's as close as we get. And yet humans say, oh, this isn't thinking.

[00:27:32] Joe Colantonio Hold on. So I could be asleep right now? I don't know.

[00:27:34] Jason Arbon I just don't think you're a conscious agent. You're just part of my simulation. Actually, you were an in-app purchase that I proactively opted into, and I had to do a lot of farming to get you, and I appreciate it. The funny thing is, humans think we're so smart, but we have a ton of biases, and one of them is that we think we are smart and we judge everything else against the way we think and make our decisions. Like I was saying, beauty is in the eye of the beholder; it depends on a person's experience or what their mood is. You've seen the studies where judges changed their sentencing decisions depending on whether it was before or after lunch. It's dramatic; it's a significant amount. And frankly, Anthropic had an article, they released a paper maybe last week, where they're trying to trace through and debug these deep neural nets to figure out what thinking is happening or not. But the weird thing is this: at the end of the day, we don't even know how our own brains work, so I think it's hubris, in that context, to criticize how another thing might be thinking, or to determine whether it's thinking or not, because we can't even decide how we think. And really, at the end of the day, the proof is in the pudding. The real metrics today, from a testing and quality perspective, are these evals, right, that measure everything from math to science to world knowledge to factuality. Testers always worry about hallucination. Hallucination has dropped like a rock; it used to be a big problem, and now it's far fewer and further between. The reality is we don't know how they think, we don't know how we think, and at the end of the day it doesn't actually matter; what matters is what they can do from an applied engineering perspective. From an applied engineering perspective, they can generate better tests, and no human wants to do that rote work anyway. People should challenge themselves to a test against the AI. I did this at a conference recently, by the way. I had people in a workshop sit down, I gave everybody 10 minutes, a room of 50 people: find bugs on the conference webpage. No one found much of anything, just a couple of things here or there. I ran the AI in the same time, actually less time than that, and these were at least 50 testers who care enough to go to a conference. The AI found like 40 issues, and by the way, some real nasty security ones. But the best bug was this: the AI did a bunch of pre-thinking and found some problems, some sketchy stuff. I won't mention the details because they're security issues, sketchy stuff in the auth and login area. It found some bugs and issues. But a human, during the break, looked at that result, went off and dug deeper, and found that there were some catastrophic security problems. Those have since been fixed, but it was the combination of the AI, whether it's thinking or not, with the human, thinking or not, that's where the real power comes from. The basics can get done very expensively and slowly with humans, or quickly with machines, but the best is when the AI finds the smoke and then the testers find the fire. That's where I think we're evolving. But most testers just want to look for smoke, file the bug, and move on.

[00:30:33] Joe Colantonio I think you brought up a great point there, though: you had people actually try it in a conference setting to challenge them. I'd challenge people who may not agree with you to go to testers.ai, download it, and try it for themselves. So what can they try? Can you explain a little bit more? I know you've mentioned personas, so they would download it; how hard is it to get up and running? And are they able to run API tests, accessibility tests, performance tests? How does that all work?

[00:30:58] Jason Arbon Yeah, so I try to make it as easy as possible. You just download a binary. You do have to pick, it's very complicated, which platform you're on: Windows, Ubuntu, or Mac, and you have to know if your Mac is Intel or ARM. So we lose about 40% of testers right there, I think. Then you download it to your machine and mark it as safe to run, because you trust me, and I'm not monitoring your image queries, Joe. You just download it and then you run it. The name of the app is Testers; on Windows it's testers.exe. And you have to go to the console. If you double-click it, and this is my biggest issue right now, people don't see anything run, because it doesn't have a GUI; it's a command line. So you have to open up a terminal and type testers.exe space test. T-E-S-T, it's four characters. You hit enter, and after that you add your URL, whatever URL you want tested. And if you're behind a firewall, it can test internal URLs, as long as it can get to the webpage. What it does then is it wakes up, automatically loads your webpage, and loads all the agents. You can see them dance around, do their inspection and testing, and chat back and forth; they're talking to each other about the issues they've found. You see the test manager come in and triage them. It all happens automatically while you're just watching. And then it will generate functional test cases. In the demo mode, when you first download it and run with no flags, it will report those bugs and then execute three of the functional tests that it defined. So it'll define like 20 or 30 simple ones and then execute a few of them for you. Right there, you'll see it clicking through; it records a video of the whole event and provides a test report. That's what it does: download it, run one command line, and it will do all of that. You can do all these customizations, but really, we're in a new world; even if only that were possible, that's a pretty crazy world. Two tips: if people want to generate custom personas, it's just a flag. You can tell it what type. For a friend, Kevin, I think you know Kevin, for his website, he works at a church in Utah, you just type dash custom personas and then put the prompt in quotes, say, returned missionaries looking for the singles ward to date. And it creates a bunch of user personas that match that criteria. Then on the next run, it will run them against the website, and you get feedback from them on what the website should be doing and whether their expectations are met by the features and content.
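
For reference, the command sequence described above would look roughly like the sketch below, wrapped in Python only to keep this article's examples in one language; treat it as a paraphrase of the episode rather than official documentation, and note that the flag spelling --custom-personas and the example URL are assumptions based on the spoken "dash custom personas."

```python
# Rough paraphrase of the CLI usage described in the episode (Windows binary
# name shown; the URL and flag spelling are assumptions, not official docs).
import subprocess

# Demo run: point the agents at a page, let them report bugs and execute a few
# of the generated functional tests.
subprocess.run(["testers.exe", "test", "https://your-site.example"], check=False)

# Second run, seeding custom personas from a short prompt.
subprocess.run(
    [
        "testers.exe", "test", "https://your-site.example",
        "--custom-personas",
        "returned missionaries looking for the singles ward to date",
    ],
    check=False,
)
```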

[00:33:30] Joe Colantonio I don't understand how it gets the context for that. How does it know how to match such an obscure kind of persona?

[00:33:39] Jason Arbon It's crazy. I did the same with the world travelers. And by the way, this is the funny part: I asked it for rich world travelers that travel with their dog, right? Just generate those, and it generated a bunch of them. The funny part was that almost all of them were older, but you'd expect that a little bit. But how does it do it? There's a lot of Arbon magic in there, but the reality is, if you ask GPT right now, any chat LLM, you can actually ask it right now to generate personas that match a description, and it's pretty darn good. And it's a hundred times, actually probably a thousand times, less expensive than a human doing it, and probably a hundred times faster. There were a lot of painful evenings trying to figure out how to get that into the execution and bug reporting, though. What people don't understand about LLMs, testers especially, is that LLMs really do have the context of the world in them. But testers, for whatever reason, cannot get it out of there, because they think everything has to be scripted, written precisely, triple-checked by a human. The reality is that the AI even knows more about testing than almost any tester. I challenged them: take a quiz. I had it take, what are those standardized tests? What's the big standardized test?

[00:34:54] Joe Colantonio ISTQ or something or IS?

[00:34:55] Jason Arbon Yeah, ISTQB, whatever. And it passed with flying colors. Ironically, the only one it didn't really do well on is the AI test manager one, and I looked at the questions; I'm a little sus as to whether they're relevant or not. And then they put a banner up, I don't know if I told you this before, after I did that and published slides showing it passing these tests: the certification site now says you cannot automatically test it and you cannot publish results from any testing that you do on the test page. Just funny. But yeah, testers just don't know how to unlock it. I think the reality is they don't want it to work. And they also don't sit down for four or five hours and learn how to use it. The biggest trap is they ask one question and it gives them a crappy answer. They ask a simple question, usually, like the person you talked to earlier today, a question with misspellings and poor grammar, and they're stupid questions, and then they get a stupid answer. But guess what? You can also ask the AI to describe quantum mechanics as if you're talking to a five-year-old. It adapts to the question. A lot of times the AI is deliberately giving you a non-sophisticated answer because you've asked it a simple question, and testers always ask simple questions first. They do the BVTs, they do the smoke tests first. But guess what? Smoke tests on AI give you simple, dumb answers, and then they move on. You really need to push it, ask it multiple questions, give it a lot of context, and then judge the thing. But if people can't do that, they're not really ready to use AI, frankly.

[00:36:27] Joe Colantonio Another use case came into my head. I was reading this book, Modern-Day Oracles or BS Machines, and they said if someone is prompting it incorrectly, they're generating garbage and they don't know it because they're not an expert. They copy it and publish it on a blog, and over time, how do LLMs learn? They scrape these sites. So does it degenerate over time if it starts training on that content? Is that a real fear? Because someone said that could possibly happen.

[00:36:52] Jason Arbon The reality is, there have always been people who were anti-bicycle at one point, anti-locomotive, anti-plow at some point. That's actually two-or-three-year-old thinking. Most of the LLMs today use reinforcement learning, and they use hundreds of thousands or more human inputs to do that reinforcement learning with, to make the model say the right things and to teach it what is a fact and what is just rumor. Is the earth flat or not, right? If enough people say the earth is flat, the LLMs still won't say the earth is flat unless you ask them to tell you that. I think the people who talk and write about those things, frankly, don't understand the technology, and usually they're computer science professors who were not part of the LLM craze and did not make any money on this stuff. If you look up almost all of the naysayers, they fall into that category. And also, almost all the testers who complain about AI and LLMs in testing, did you look at them? They were formerly employed somewhere recently and are upset, usually with their job hunt.

[00:37:59] Joe Colantonio These are professors, I'm going to look them up. That's very interesting.

[00:38:03] Jason Arbon Yeah, I'm just guessing. But usually they've studied some different field that didn't take off, so they're kind of jelly. That's the reality. Especially like Marcus: if you look at their tweets, they've always made these protests, oh, AI will never do this, AI could never do that. And then guess what? Six or twelve months later, they're demonstrably proven wrong. So at some point you've got to ask, who do you believe? And what you should believe is math. There are these things called trend lines, and how the models are doing on the evals. And by the way, money speaks, because a billion people are now using AI and finding some value in it. Usually it's people who have an axe to grind or who teach a different thing. Like the person you may have talked to earlier, and I'm just guessing, people who are used to teaching others how to do things manually, and then go, oh, here's a puzzle, can you solve this puzzle? We'll see how creative you are. And the measure is finding some very niche, weird little test case or anomaly, right? But 99.9% of testing is not that. It's making sure the site still works as expected and works for the average, normal user. That's the bulk of testing. And if you can accomplish that, you're in the top 1% of testing teams. It's very rare to get to that point.

[00:39:13] Joe Colantonio Okay, Jason, before we go, is there one piece of actionable advice you can give to someone to help them with their AI testing efforts? And what's the best way to find or contact you?

[00:39:21] Jason Arbon I'm so happy you asked that. I listen to the show all the time and think, I have some plugs, and this one isn't even commercial because it's free. Just go to testers.ai, download the thing, and play with it. It's free; you can get free bugs, and in less than an hour you can see what AI can do today. And it'll get better. It's not perfect, but try it out. The key advice for testers is: test things. Instead of imagining how they won't work in some weird, obscure case, or how they won't cover something very unique to you, just try this stuff, test it out. And the key thing is to compare it with an oracle, and guess who the oracle should be? You. Put yourself to the same test that you gave the AI and then compare yourself with it. If you just poke holes in things, that's what the lowest, most degenerate tester does: they just find problems. They don't appreciate the complexity, they don't appreciate the value, they're just super excited to find a bug. Be a tester, be a critical thinker, and actually measure these things. That's what I recommend. That's my advice. Be a tester, ironically.

[00:40:20] Thanks again for your automation awesomeness. Links to everything we covered in this episode can be found at testguild.com/a542. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, and my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.

[00:40:54] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com where you become part of our elite circle driving innovation, software testing, and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:41:38] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
Test-Guild-News-Show-Automation-DevOps

AI Test Management, AI Prompts for Playwright, Codex and More TGNS158

Posted on 05/19/2025

About This Episode: Have you seen the lates AI Powered Test Management Tool? ...

Showing 81 of 6864 media items Load more Uploading 1 / 1 – Judy-Mosley-TestGuild_AutomationFeature.jpg Attachment Details Judy Mosley TestGuild Automation Feature

Building a Career in QA with Judy Mosley

Posted on 05/18/2025

About This Episode: In today’s episode, host Joe Colantonio sits down with Judy ...

Jacob Leverich TestGuild DevOps Toolchain

Observability at Scale with AI with Jacob Leverich

Posted on 05/14/2025

About this DevOps Toolchain Episode: In this episode of the DevOps Toolchain podcast, ...