About This Episode:
In this episode of the TestGuild podcast, Joe Colantonio sits down with Ben Fellows,
founder of LoopQA and QA thought leader, to explore how AI is reshaping test automation.
Ben shares lessons from his popular AI test automation workshops, diving deep into topics like:
How AI turns hours of page object coding into minutes
Why “augmented coding” beats “vibe coding” for serious QA work
Practical ways teams can leverage Cursor, Playwright, and AI to boost productivity
What QA leaders need to know about shifting roles, scaling code reviews, and IT security concerns
Key trends coming in 2026 that could redefine how we write tests
Whether you’re curious about AI’s real impact on QA, looking for ways to speed up your automation, or wondering what’s next for Playwright and MCP, this conversation will give you actionable insights and inspiration.
About Ben Fellows
Ben Fellows is the CEO of LoopQA, a company specializing in embedding Senior onshore QA team members to enhance software quality. A strong advocate for automation and emerging tools like Cursor, Ben believes in leveraging technology to drive significant improvements in QA efficiency and effectiveness.
Beyond his leadership at LoopQA, Ben actively contributes to the QA community through thought leadership, content creation, and hosting The Daily CTO podcast. His mission is to develop the next generation of QA Directors, empowering them to take greater ownership of processes, metrics, and overall quality within organizations.
Before his career in tech, Ben was a “West Wing kid” and began his professional journey working on Capitol Hill—a unique foundation that shaped his strategic and leadership mindset.
Connect with Ben Fellows
- Company: www.workwithloop.com
- Blog: www.workwithloop.com/blog
- LinkedIn: benfellows
Rate and Review TestGuild
Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
[00:00:35] Joe Colantonio Are you spending hours writing page objects or reviewing endless PRs, only to feel like you're falling behind? Imagine cutting that time down to minutes without sacrificing quality. That's the power of AI in QA, and today we're diving into it. Hey, I'm Joe Colantonio, and today you're in for a treat. If you've been hearing all the buzz about AI in QA but are still wondering what it actually means for your day-to-day testing, you'll want to stick around. My guest, Ben Fellows, founder of LoopQA, speaker and workshop leader, has been helping QA teams and directors of testing get hands-on with AI, Playwright, and augmented coding. In this conversation and demo, Ben breaks down what's hype, what's real, and actually shows how AI can slash hours of tedious coding into just minutes. We're talking practical demos, page objects in seconds, new roles for QA, and even what's coming next in 2026. Trust me, this is one of those episodes that I think is going to change how you think about testing forever. So grab your favorite cup of coffee or tea and let's do it.
[00:01:39] Hey, quick question. Are you overwhelmed by test tool research? You're not alone. Choosing an automation testing tool can be a rabbit hole of hundreds of opinions, each promising to be the best, but none tailored to your team's specific goals, stack, or budget. And that's where the free TestGuild Tool Matcher comes in. Answer a few quick questions, think test type, budget, tech stack, and instantly get a personalized shortlist from over 300 trusted tools. This will save you weeks of demos and sleuthing. Get expert-curated recommendations backed by insights from my podcast interviews, blog posts, and webinars. Make confident decisions faster. Think of it as a G2-style encyclopedia of test tooling awesomeness. If you're ready to cut through the clutter, all you need to do is go to TestGuild.com, click on the Tools menu, and select the Tool Matcher to find your perfect test automation tool in seconds. Powered by the TestGuild community, where automation pros get smarter, faster. Check it out.
[00:02:43] Joe Colantonio Hey Ben, welcome to The Guild.
[00:02:47] Ben Fellows Hey Joe, I appreciate you having me.
[00:02:49] Joe Colantonio I'm really excited to have you. I've been following you for a while, and I've seen a lot of posts on LinkedIn about, I guess, the workshops you're giving. A lot of industry leaders like Jim Hazen, I think Butch, what's his last name, Mayhew?
[00:03:02] Ben Fellows Mayhew, yup.
[00:03:04] Joe Colantonio They've been recommending it, saying this was a great workshop. I know you've been talking about Playwright for a while, but now you're coming in with AI as well. So thanks so much for joining us today.
[00:03:12] Ben Fellows Yeah, happy to share more.
[00:03:15] Joe Colantonio Awesome. Alright Ben, I guess before we get into it though, a little background for the folks that may be new to you: how did you get into automation, and maybe a little bit more about how you founded your company and what that's all about as well?
[00:03:25] Ben Fellows Yeah, for sure. So I think, like nearly everyone in QA, particularly in the States, I got into QA accidentally, just kind of through lived experience. Randomly, I was in politics in college, worked on Capitol Hill for a couple of years, had the opportunity to join a startup, sort of got to join a tech team, and it was very much that kind of learn to build the plane and fly it at the same time experience. We built a box office management solution in Ruby on Rails. And during that time, we just really had death by a thousand cuts. No matter how much testing we did, we would always break something when we released, and I really got to see firsthand the emotional toll that QA done poorly, or development done poorly, can cause on a team. That set me on a journey, when I decided I wanted to start my own company, to figure out, what is this methodology? What are these practices? Why is this so hard for technology companies? I founded a QA company six and a half years ago now. Knowing what I know now, I was wildly naive, maybe, in some ways. And then in the last couple of years I have really been trying to understand the impact of what I call augmented coding, or what a lot of people call AI and stuff like that, and its impact on QA. I have a ton of thoughts and would love to get into it on this and show a little bit of what we're doing in our workshops and kind of what we're talking about.
[00:04:45] Joe Colantonio Love it, love it. So I guess before we get into that as well, why Playwright? It seems like you jumped on board with Playwright pretty early as well.
[00:04:54] Ben Fellows Basically, when we first started off, we were doing a lot of manual testing from this UX/UI perspective. I kind of focus on a few traits when it comes to QA: emotional intelligence, critical thinking, analytical ability, and ambition. Why Playwright? Simply because we were using a proprietary tool originally when we started, from a company called Essential Test, a great team, shout out to Brian behind that team, but they had some limitations. It only works on PC, a couple of other things, and some of our clients said, hey, we really want more of an open source solution. We had jumped over to Robot Framework, using the Playwright library within Robot Framework, and then we were like, well, why not just start using Playwright itself? The thing that struck me immediately about Playwright when I started using it, compared to Selenium and Cypress and some of the other ones, was the flexibility of locators. There was never something I couldn't get access to in the DOM. The ability to run in parallel by default, the ability to have multiple browser contexts, multiple browsers up and going at the same time and be interacting during a test. So we really kind of accidentally got into it, but then fell in love with it very quickly.
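To illustrate what Ben means by running multiple browser contexts in a single test, here is a minimal sketch of a Playwright test that drives two isolated sessions at once. The URL, labels, and chat scenario are hypothetical placeholders, not anything from Ben's suite.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example: two isolated browser contexts (two "users") in one test.
test('two users interact in the same test', async ({ browser }) => {
  const aliceContext = await browser.newContext(); // separate cookies/storage per context
  const bobContext = await browser.newContext();

  const alicePage = await aliceContext.newPage();
  const bobPage = await bobContext.newPage();

  await alicePage.goto('https://example.com/chat'); // placeholder URL
  await bobPage.goto('https://example.com/chat');

  // Both sessions stay live during the test, so they can interact with each other.
  await alicePage.getByRole('textbox').fill('Hello Bob');
  await alicePage.getByRole('button', { name: 'Send' }).click();
  await expect(bobPage.getByText('Hello Bob')).toBeVisible();

  await aliceContext.close();
  await bobContext.close();
});
```

Because Playwright Test runs spec files across parallel workers by default, suites built from tests like this scale out without extra configuration.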
[00:06:01] Joe Colantonio Absolutely love it, love it. All right. Before we ease into the workshops as well, I'm trying to tease it out, I know you also are doing, I think, one-on-one director sessions as well. And I'm just curious to know, from your point of view, before we go into demo mode, is there a common thing you see QA leaders struggling with when it comes to AI, or misconceptions?
[00:06:23] Ben Fellows Yeah, there's a lot, actually, because there's a lot of pressures. I think the biggest thing with AI to understand from a QA perspective is that there are really two ways that I think we see AI affecting QA. There are the agentic, browser-based agents that are opening browsers, driving the browsers, using images, and doing testing, and I think a lot of private companies are selling, at best, helpful solutions and, at worst, kind of magic solutions of just this idea of, oh, we can open the browser and do all your tests. And I think there's some interesting stuff happening there. But at the same point, I think it's not really production ready or enterprise ready, at least as far as I've seen with some of the MCP stuff that I've played with. It's slow and expensive to run, but there's a lot of promise. And then there is basically what I want to talk more about, which is AI as kind of a coding tool, and understanding how we use traditional test automation but use AI to help. When it comes to these director conversations, I think they're getting a lot of pressure from boards to say AI everything. And I think that's probably the wrong approach. I think thinking about targeted AI implementations at bottlenecks is really the right approach. Team roles and team size, I'm seeing a lot of pressure on figuring that out, because I just had a conversation with someone yesterday who was like, I cannot keep up with the PRs anymore. My team is producing so much more code, and it's high quality code, but I can't keep up. And I was saying, yeah, well, one of the things that we're seeing is actually a readjustment in how many code reviewers you need relative to the people producing the code. Do you need more of one sort of role than another? And it's been interesting, because some companies are really open to this idea that we fundamentally need to rethink roles and responsibilities and the distribution of those roles, and then at other companies, their management is saying, use AI, but they're much stricter. And so it's this kind of challenging balance. So yeah, a lot of different thoughts. Happy to go in any direction you find most interesting though.
[00:08:21] Joe Colantonio I guess the first one: why is there more code? Is it because they're using AI to generate the code now, and that's why there's so much code?
[00:08:28] Ben Fellows Yeah, exactly. Basically, and this is what I can give a couple demos of, for example, to me, the best use of AI today is to think about it as a productivity tool. There are so many different narratives about AI, so many different narratives about what it can do. And the thing that I think it does very, very, very well today is something like this: okay, as an automation engineer, I have to write a class or a page object model from a website, and that might take me three, four hours to do. Now, and I can show you this, it would take about 30 to 40 seconds. Six months ago, it would have gotten most of those locators wrong, but today, with the tooling and the models where they are, I'm seeing roughly 80 to 90% of the locators are accurate, and then I'm just debugging the ones that are not accurate. So now, that 3-4 hours of work is compressed down into, let's call it 15, 20 minutes. If I'm done with that ticket and I ship it, I'm now onto the next ticket. And so across our projects, we're seeing anywhere from a 3X to 5X increase in high quality code per engineer per project. And I know the old adage that more code isn't necessarily a good thing, but what I'm trying to explain to people is, no, this is the same code that would have been produced by that engineer. It's just happening that much faster.
[00:09:44] Joe Colantonio I love that. That's a good point. Along those lines, Ben, do you see people more curious or skeptical when you bring AI into QA, especially on LinkedIn? I see how some people are really anti-AI, others are full-blown AI, and it kind of gets into, I don't know, AI chaos sometimes.
[00:10:08] Ben Fellows What I would say I've experienced over LinkedIn is kind of the five stages of grief, or the five stages of change, or whatever. Over the last couple of years posting about AI, I feel like I've seen this unfold directly in front of me. I forget the order of the stages, but it's like denial, anger, then kind of acceptance-but-it-doesn't-apply-to-me. There are these five stages. If I posted about AI a year and a half, two years ago, it was a lot of just, you're a charlatan, you're kind of crazy, you're dumb, blah, blah. If I post about AI today, it feels like the industry has pretty much come to grips with this idea of, no, it's a useful tool and it's here. I think the challenge with AI is it fundamentally changes our jobs from people who write code to people who review code. For me, I love that, because I don't get a ton of joy out of the writing of the code, and I really enjoy thinking about the architecture and reviewing the code. There are a lot of humans out there, though, who really love the art of writing code. And I think that's where there is legitimate emotion around this idea that it's taking something away from people who put 20 years of their career into perfecting it and have a ton of pride in it.
[00:11:29] Joe Colantonio 100% agree. So that's why I think seeing is believing. I think some of these demos that people have been raving about on LinkedIn are going to help the audience as well. You talked about page objects, and I know some people just love creating page objects. So I'm just curious to know, then, how AI can help accelerate that or maybe make it easier, better for them.
[00:11:47] Ben Fellows Yeah, for sure. So one of the things I cover is page objects. Here's kind of a demo app that I play with. It's just something I built myself that I use to manage a bunch of my relationships and stuff like that. The traditional way you're going to approach this situation is you're going to open it and go through the DOM, right? You're either going to have a framework for this or you're going to go through it yourself, and you're basically going to write all your locators. Depending on the circumstance, more and more I'm pushing for QA teams to have direct access to their repos, because you don't even really need to open the DOM anymore. Let's start by assuming that the QA team has access to their repo. In this repo, we have a test folder with a Playwright suite in it, and then you have your code base. Let's assume that I'm a QA person being told I want to write my page object model for the dashboard. We're going to go over to the prompt. We're going to say, add a POM file for the dashboard. And we're going to give it some context of what the actual dashboard component is that we want to do it for, and we can go and pull it in here. Oops, wrong file. Here. And what it's going to do is add a page object file for the dashboard. What this is doing is taking something that's probably two to three to four hours' worth of work, and it's going to do it in probably two minutes, right? And we're not even opening the browser, which is amazing. We're literally just pointing Cursor at the component. It's going to read the component, and it's going to write a page object model file that is actually fairly good. Now, we could add a lot more context. We could do stuff like Cursor rules, but you can see this is a 5-word prompt. A lot of people think, oh, we need perfect prompts. And actually, what I tell people is, as long as you have an engineer who understands what the output should be, and they're evaluating the output against that, the prompt doesn't really matter nearly as much. We could make this probably better, but just for the sake of this demo. And what I always start off with is basically this: it doesn't say Playwright, it doesn't say anything about best practices, it doesn't say anything about how you want to do it. And for the most part, it's going to do a very, very good job. We could offer it more context. We could put in the Playwright locators docs URL. We could do some of these other things, and these are all things I try to cover in the workshop. But the output is going to be pretty good. And so you can see it wrote 529 lines. We're going to open this, and you can see it has a dashboard page class, which is exactly what I would expect. It's got all your locators defined as read-only, and here you have your constructor. This is exactly how I would write test automation. Now, some of these things I don't love off the bat. I don't really love the use of .first here, so that would be something I would immediately call out. I'd go through, and we can start to review this and edit this. But the fact is, if I tried to sit down and just write out this code, I mean, you're talking hours of work. And it's done, and it's started to add methods. It's added methods for get total contacts count, get needs attention count, all of these different things that I would need if I was writing test automation and interacting with this page, right?
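For readers following along, here is a minimal sketch of the kind of Playwright page object Ben is describing. The class name and the two count methods echo the demo, but the selectors, card text, and route are hypothetical placeholders rather than the actual generated code.

```typescript
import { Page, Locator } from '@playwright/test';

// Hypothetical sketch of the kind of page object the demo generates from the dashboard component.
export class DashboardPage {
  readonly page: Page;
  readonly totalContactsCard: Locator;
  readonly needsAttentionCard: Locator;
  readonly addContactButton: Locator;

  constructor(page: Page) {
    this.page = page;
    // Placeholder locators; the real output derives these from the component's markup.
    this.totalContactsCard = page.locator('.stat-card', { hasText: 'Total Contacts' });
    this.needsAttentionCard = page.locator('.stat-card', { hasText: 'Needs Attention' });
    this.addContactButton = page.getByRole('button', { name: 'Add Contact' });
  }

  async goto() {
    await this.page.goto('/dashboard');
  }

  // Methods mirror what an automation engineer would otherwise write by hand.
  async getTotalContactsCount(): Promise<number> {
    const text = await this.totalContactsCard.innerText();
    return parseInt(text.replace(/\D/g, ''), 10);
  }

  async getNeedsAttentionCount(): Promise<number> {
    const text = await this.needsAttentionCard.innerText();
    return parseInt(text.replace(/\D/g, ''), 10);
  }
}
```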
And we can take it a step further though. While it works, and I'll turn it back to you to ask a question, we can actually take it a step further and not just say, okay, we've gotten this productivity boost and we're much faster. Let's go ahead and tell it: add test IDs to the component and update the POM. And this is the thing that's really happened in the last three, four months: the AI is actually good enough now to update a front-end component without breaking it. If I had given this prompt six months ago, it would have broken the entire front-end component. Now, I can give this prompt and 99 out of a hundred times it's just going to update the front-end component with test IDs, update the POM to use them, and the front-end component's behavior will not be affected in the least. So not only have we been much more productive, we're actually going to have a significantly less flaky set of test automation as well, because we have test IDs that are independent of locators changing and anything else. I'll pause there because that was a lot of information. What questions, thoughts, or concerns come to mind so far?
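As a rough illustration of that second step, here is what the same page object might look like once data-testid attributes exist in the component and the POM is updated to use them. The test-id names are hypothetical placeholders; the point is that the locators no longer depend on CSS classes or visible text that might change.

```typescript
import { Page, Locator } from '@playwright/test';

// Hypothetical sketch of the page object after the "add test IDs to the component" step.
// Assumes the front-end component now carries matching data-testid attributes,
// e.g. <div data-testid="dashboard-total-contacts">...</div>.
export class DashboardPage {
  readonly totalContactsCard: Locator;
  readonly needsAttentionCard: Locator;
  readonly addContactButton: Locator;

  constructor(readonly page: Page) {
    // getByTestId targets the data-testid attribute, so renamed classes or copy changes don't break tests.
    this.totalContactsCard = page.getByTestId('dashboard-total-contacts');
    this.needsAttentionCard = page.getByTestId('dashboard-needs-attention');
    this.addContactButton = page.getByTestId('dashboard-add-contact');
  }
}
```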
[00:16:00] Joe Colantonio The first one, with the locators: do developers usually have an issue with that? Is this a culture change, or, I haven't been doing development in 5, 6 years, maybe this is the norm, but is it the norm that your testers work directly in the repo as developers and are able to change the actual production code?
[00:16:18] Ben Fellows Completely company specific. Some companies are very comfortable with this change, other companies are very not comfortable with this change. And one of the things that I talk about is, being QA, we always focus on the pragmatic reality compared to the theory. We can all understand the theory, but we're oftentimes put in situations where the pragmatic reality is a bit more challenging. Let's assume that you didn't have access to the code. What you could do is just copy and paste the whole DOM directly into the agent, and the same concepts apply. And so what I try to do with these workshops is say, in a perfect world, look at how amazingly fast and stable this can be. But yeah, for most companies, maybe their automation suite is not in their code base, maybe it's not a mono repo. There are a lot of things around this that can throw flags, and so maybe only part of what I'm showing applies. Maybe it's just, hey, I'm going to copy the whole DOM and make locators from it, but I'm not going to get the benefit of test IDs, right?
[00:17:19] Joe Colantonio Absolutely. And I think what's important is, as you mentioned, the person needs to know what they're doing. A lot of people, I think, call it vibe testing. I don't know if you would call this vibe testing, but you're not just taking what it's doing on faith. You're actually looking at what it does, and you still have to, as a tester, understand the code. It's not like saying knowing code is obsolete; you still need to understand not only the code, but the domain, as a tester. Is that correct?
[00:17:43] Ben Fellows Yeah. And I always call it code augmentation, because vibe coding, I forget who coined it, and it was originally not a derogatory term, but all of these developers over the last 6 months, 12 months, they would always be like, well, I do real coding, and all these other vibe coders, it's not real. So to me, what I'm trying to say is, okay, vibe coding is whatever; this is augmented coding, where we're using these tools to produce the same quality of code that we'd otherwise be doing, and we're just doing it much faster. And so what we can see is, once it starts to finish its test IDs and we go look, it's the same class, the same style, and everything, it's just added a test ID. And then once it finishes, we can go to the POM and do the same. I don't know that I directly answered your question. I think I got a little too hung up on the vibe coding piece, but that is the one thing I'm trying to tell people: the actual output of the code is the exact same. It's just a question of how quickly you get there.
[00:18:40] Joe Colantonio Love it. Okay. Cool. How does MCP change this? Is this using MCP? A lot of people are talking about Playwright MCP. Does it know Playwright in context? Like, what is MCP? Are you using MCP behind the scenes there?
[00:18:56] Ben Fellows Yeah, for sure. So this is nothing to do with an MCP server. This is just simply using Cursor to interact with a code base and do prompting, essentially. And you can see, by the way, as we finished, you have all your test IDs now. So we have a page object file that didn't exist 10 minutes ago, and the front end didn't have any test IDs. In less than 20 minutes, we have a page object file that has methods for every major element and has test IDs. When it comes to the use of an MCP server, this is where it starts to become interesting, and in my perspective, you start to be right on the edge of, is it production ready or not? What you're doing here with Cursor is just augmented coding; you have a human who's still going to go interact with the internet. What MCP is doing, as far as I understand it in general, is it's basically a set of protocols that allow AI to understand the tools it can use to interact with other things. Specifically with Playwright, it allows AI to drive a browser. But you could also use MCP to interact with JIRA or all these other things. There are a lot of companies, particularly private companies, trying to use MCPs in the world of QA to say, why would you yourself navigate to the dashboard to write test cases when you can just open an MCP with Playwright? And you could say, go to the dashboard here and write test cases based on what you see. Underneath the surface, all it's doing is basically opening the browser, navigating to it, and in theory writing the test cases. And there are a lot of ways you could do it. You could use Claude's computer use, you could use a couple of other ones, but Playwright has a whole library designed around interacting with the browser. What they've done is they've given AI the ability to drive the browser, and this is where you start to get into that concept of how, quote unquote, magical can it be, right? Because in theory with an MCP, you could say, hey, open my app and write an entire test suite from what you find. And in theory, I've turned off these emojis so many times, but they stay on, in theory, an MCP system can open the browser, can navigate through everything, and it can basically write your entire test suite. Does it work that well? In my experience, no. Is it really interesting for one-off use cases? Sure. And do I see a world, as it gets better and better, where it can take on bigger and bigger tasks? Absolutely. And so it's one of those areas where I think one of the challenges with AI is trying to figure out what I should be keeping an eye on for the future, what's useful in the future, and where does it really excel right now? And so my perspective is this code augmentation crushes right now. It can take traditional test automation suites and make you exponentially more productive. This MCP side of things, I'm keeping an eye out. I love Debbie O'Brien's demos that she's doing. I'm like, okay, at some point in the next 6 months to 12 months to 18 months, we might be flipping away from test automation with the DOM to test automation through MCPs, but I'm still in the camp of waiting to see it work at scale. It's a little too expensive and slow for me. And so I'm trying to figure that out.
[00:22:13] Joe Colantonio I have been doing silly things with it. I have a local application I created to get stats for me from Libsyn, because you need to manually log in, and I used to have an assistant that did it. And now I just have a button, they put in the name, and it uses the Playwright MCP to go grab the stats and throw them into the little app I have. It's pretty cool.
[00:22:31] Ben Fellows Well, and it goes back to what's interesting: Playwright has two libraries, you have Playwright Test and you have Playwright core. And what's fascinating, and once again, I only know so much about Playwright, the team itself knows more, is that suddenly you have all these agents interacting with browsers, and Playwright, across all these agents, seems to be the default way that a lot of them are choosing to interact with the browser. And so what's interesting is there are almost now two communities of Playwright users. There are the traditional test automation users who are like, hey, we're here and we're QA. And then there are all the Playwright users coming in because they're doing business cases or, like what you're saying, a non-test-automation use case. And I sort of see, maybe from the outside, and I know nothing about the conversations, I wonder how much tension there will be on the Playwright team at Microsoft to start to be like, wait a second, which one has more business use cases, and trying to figure that out. I'll be curious to watch them navigate that.
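As a rough illustration of that second community, here is a sketch of the kind of non-test automation Joe describes, using the core Playwright library outside the test runner to log in and grab a number. The site, credentials, labels, and test IDs are hypothetical placeholders, not Joe's actual app or Libsyn's UI.

```typescript
import { chromium } from 'playwright';

// Hypothetical sketch: a plain script (no test runner) that logs in and scrapes a stat.
async function grabStats(showName: string): Promise<string> {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  // Placeholder URL and selectors; adapt to the real dashboard.
  await page.goto('https://dashboard.example.com/login');
  await page.getByLabel('Email').fill(process.env.STATS_USER ?? '');
  await page.getByLabel('Password').fill(process.env.STATS_PASS ?? '');
  await page.getByRole('button', { name: 'Log in' }).click();

  await page.getByPlaceholder('Search shows').fill(showName);
  const downloads = await page.getByTestId('total-downloads').innerText();

  await browser.close();
  return downloads;
}

grabStats('TestGuild Automation Podcast').then(console.log).catch(console.error);
```

The same `playwright` package that powers the test runner also works as a plain browser-automation library, which is why agents and business tooling keep reaching for it.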
[00:23:28] Joe Colantonio Do you worry about Microsoft owning Playwright? Random question.
[00:23:34] Ben Fellows Yes, in some ways. I think there's always risk with so many of the tools that we use that they're actually owned by private companies. I think there are pros and cons to it. On one hand, I know the team well and I'm stunned by how good a job they do and how much they produce. And it goes back to, I know the Selenium folks, I don't really know the Cypress folks as much, there's a reason why Playwright became so popular so fast. On the other hand, I think there is inherent risk with a private company, but no, the short answer is I don't worry too much about it. I just always try to have backup plans in the back of my mind, but yeah, no, I don't worry too much about it, to be completely honest with you.
[00:24:24] Joe Colantonio And so just to clarify, this isn't Playwright only, right? Could this approach also be agnostic, where it doesn't really necessarily matter? They could use Selenium or Puppeteer or whatever if they had to.
[00:24:38] Ben Fellows Yes, this is language agnostic, this is framework agnostic, et cetera. Cursor, which is also a private tool, and it goes back to that idea, I'm basing a lot of this stuff on Cursor, which is a private tool. The most important thing is understanding the models that we're using. I tend to use Claude 4 Sonnet, which is a premium model, but GPT-5 is obviously out. The biggest takeaway I would say is this: if you put this prompt in with an older, cheaper model, it will not perform nearly as well. If someone says, Joe, I watched this interview and this guy lied because I did the same thing and it didn't perform, the first question I always ask is, which model are you using? Because if you're using a cheap model, you're not going to get nearly the same level of performance. But no, any language, any framework, this is code in general, it has nothing to do with Playwright. And one of the real benefits, particularly when you have well-documented stuff, is that if it ever does for some reason struggle with what you're trying to do, one of the great things about Cursor, and this is not limited to Cursor, is that if I pull up the Cypress docs and put the URL right there, it'll go read the page and figure out what to do. Or you can go and index the docs and it'll just go from there. So yeah, completely agnostic.
[00:25:55] Joe Colantonio Love it. Good point about the model. This drives me nuts. You see people, once again on LinkedIn, who will pick holes in using AI, but they're using a model that wasn't designed to do what you're doing. I don't know if that makes sense, but they almost make it a straw man, when you're like, well, that's not even the model I'm using. It's not even helping your point, basically.
[00:26:15] Ben Fellows Well, and it's such a learned skill, and I didn't really appreciate this until a month or two ago, and I've started to appreciate it more and more when it comes to these webinars. A lot of engineers that I'm working with want to memorize prompts, they want to memorize this stuff. And the challenge is that doing augmented coding is really a skill that you learn over time: you start to understand the patterns of how AI behaves, and you start to understand what it can and cannot do. For example, it's going to research a lot of the files, but it still struggles with huge file sizes. It just does. And there are a lot of these things that you just pick up over time, where you're like, okay, so this is how I have to start to think about the architecture and the decisions. And what I'm trying to do with these workshops is focus on inspiring people to go in a direction, rather than having people come out of them with a memorized set of prompts and stuff like that, because just using this tool, I would say it took me a couple hundred hours before I was fully like, okay, I got this, I'm moving forward.
[00:27:20] Joe Colantonio Absolutely. I guess another point is, maybe I'm wrong, maybe you don't agree with this, a lot of testers have said, oh, we're developers now, so we need to write production-ready code. A lot of people complain that the code isn't beautiful or that it writes a lot of code, but this isn't for production. As long as it works, I would think it's fine. Do you have any thoughts on that?
[00:27:42] Ben Fellows I do. I think it writes better code than people expect it can, particularly if you know what kind of standards you want and you just start to give it feedback. The other thing that I find really interesting is that there are fewer and fewer barriers between QA and developers when it comes to code and code practices. For example, if I said something like, find me the file that handles goals on the front end. As a QA, when I'm getting into a project, I now have a tool where I can just spend an hour or two understanding the code base that I'm interacting with. And this does a really fantastic job. And so when it starts to think about, okay, what are the decisions the dev made here? What are the code quality things the dev did? I can even ask it questions like, hey, based on how you see this implementation, what things do you think I should actually test? I mean, obviously I'm going to still use my brain and critically think, and the danger here is that it's good enough that people start to over-rely on it. But within 30 seconds, it found the component file where goals was. It then gives me a nice little summary. It gives me the key features. It also tells me some other files where the goal functionality is used, which is all accurate, looking at this. And then I can say something like, what is the unit and integration test coverage of it? And it can probably identify that there is none. Now, funny enough, I've hit a spending limit. Side note, this stuff's expensive. And actually, this raises a really interesting point. I am spending hundreds of dollars per month per engineer on this, and so our tools budget is going way up. That being said, if you go back to those productivity pieces, I don't have to hire as many humans anymore, because I am seeing that three to five X. And there is this really interesting larger question of the shaping of expenses and stuff like that. As you saw, my spending budget per engineer is upwards of like $250. I just hit another limit. I'm also running, though, the most expensive models besides Opus. Opus is the most expensive, and this is kind of the second most expensive. But the point I was trying to make is, I think that as QA you should spend time understanding the code, because you now have this thing that can actually inform you of the decisions that were made, and it should no longer be the black box that it once was.
[00:30:10] Joe Colantonio Love it. I guess it'll also give suggestions. You can just ask it, what areas do you think, like you said, don't have code coverage, or what should I look at more, for whatever your goal is? Obviously, it's going to give suggestions, but you as the human need to say, well, that's a good one, that's not a good one, that's an okay one, type deal.
[00:30:27] Ben Fellows Yeah, 100%. I mean, it goes back to, like everything, unfortunately, for engineers who struggle it might introduce more bad code, but I think for engineers who are already your great engineers, they'll just have that much more impact on your code base and your projects.
[00:30:48] Joe Colantonio Now, do you recommend teams use like a template in Cursor so all the code is similar, like you define a persona or a level, you're checking for security issues, so that anyone on your team who prompts is going to get the same style, the same code, the same objectives? I don't know if that makes sense, but I think Cursor has templates, right?
[00:31:09] Ben Fellows Yeah, Cursor templates, there's Cursor rules, which you can do, where you can basically say, anytime this happens, always do this. For example, more and more projects are publishing Cursor rules. So this project uses a Convex backend, and Convex actually publishes Cursor rules for you, like, here are all your instructions for how you should follow best practices with Convex. That way, whenever you're working, you can always just default back to it. To your point, you could extend that. I always find Cursor rules to be helpful, but for whatever reason, it doesn't always follow the Cursor rules maybe as directly as I would expect. I do think, and it goes back to the hesitancy here, the danger is that it's such an endorphin rush now, because you can produce feature, feature, feature, and you don't want to stop and add automated tests, you don't want to stop and review the code. And so you can almost become a bit of a dopamine addict in a weird way here, where it's just such a high building this stuff. So it's still about trying to say, okay, even though you have these rules, you still have to stop, you still have to add those automated tests, you still, to your point, have to make sure that you're following your own standards.
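For readers who haven't seen one, here is a rough sketch of what a project-level Cursor rules file might contain. The exact file name and format depend on the Cursor version (a root-level .cursorrules file is the older convention; newer versions support rules under .cursor/rules), and every line below is a hypothetical example of a team standard, not Ben's actual rules.

```text
# .cursorrules (illustrative only)
- All UI automation uses Playwright with TypeScript.
- Prefer getByTestId and getByRole locators; avoid XPath and brittle CSS chains.
- New page objects live in tests/pages and expose async methods, not raw locators.
- Every feature change must add or update the matching spec under tests/e2e.
- Flag any hard-coded waits (waitForTimeout) in generated code for review.
```

Rules like these give every engineer's prompts a shared baseline, which is exactly the consistency Joe is asking about, though as Ben notes, the agent may not follow them perfectly every time.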
[00:32:17] Joe Colantonio Ben, I think it's a no-brainer, but do you get people not even knowing where to start? Like, I know Cursor, you know Cursor, is this something people typically know already? Or do you have to say, all right, to get started, use these tools, get started with this prompting? How do you recommend people get started, I guess?
[00:32:34] Ben Fellows A surprising number of people have still never even used Cursor, but I think the biggest thing I'm finding is that enterprise is obviously going to be a little slow to adopt out of concern for security, and with AI in general, people are always a bit concerned when it comes to security. What I find interesting is, if you are concerned about using Copilot, for example, which is owned by GitHub, Copilot has very strong enterprise security policies. If you don't believe they're going to follow their own terms of use, then you shouldn't be putting your code in Copilot. So I'm a little frustrated at times, because it feels like there's selective fear with some of these tools, where it's like, okay, so if GitHub's willing to lie to you about how your code is used with their AI models, then you should never put any code in GitHub, because then they're lying about everything, right? Similar with Microsoft, similar with all these tools. Cursor is a startup, so I think there's always a little more hesitancy on the enterprise side, granted, it's a billion-dollar startup, and I know they are pushing into enterprise. They have a lot of enterprise security, but there's a lot of hesitancy around the models and getting these through IT security clearances and stuff like that. That's the thing I most hear: oh, I've played with it on a personal level, but I can't use it professionally because I'm limited. Six months ago, I would have told you there's still some resistance; at this point, I actually get almost no resistance anymore. The only resistance I hear sometimes is people saying, oh, well, it's useful for other people, but not me. But you start to show them this and their eyes go, oh, wait a second, I get it now. The primary blocker at this point is IT security and people getting comfortable with the idea of AI, and, kind of, oh, is it secretly training against my code base and stuff like that.
[00:34:22] Joe Colantonio Absolutely. So is that the lightbulb moment you've seen with, like, Jim and Butch? I think they all mentioned the page objects. Is that the wow factor that gets people, or is there something else that you show as well that gets them juiced up?
[00:34:35] Ben Fellows I cover page objects. I cover data factories. So here you have a schema that's a couple of thousand lines long, and writing data factories is a huge headache. This thing can write data factories super, super fast. I could pull in the schema and say, write me a data factory for contacts, and it would create a factory with a bunch of methods for seeding a contact through API endpoints. Another one of the things I try to show: I have spent so much time with poorly written Swagger docs that don't give me the shape of the object I need and don't give me what requirements already need to be in place to use an API endpoint. Not only can AI write much better Swagger documents, I can just point AI at the endpoint and say, tell me the shape of the object this test needs, tell me what the dependencies are. And so from an information gathering perspective, that's where I feel like with Butch, that's where he kind of got it. The thing I find most interesting is that people think, when they get into my workshops, that they're going to see some magic MCP thing, which is fine, and I don't really get into that too much. What I think they are surprised by is, wait a second, I'm writing code, and this can just write the exact same code that I would be writing, a thousand times faster. And I think people have a hard time, pre my workshops, dissecting the different use cases, and maybe this is just kind of clearing up the practical use cases, if that is clear at all.
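To make the data-factory use case concrete, here is a minimal sketch of a contact factory with an API seeding helper. The Contact fields, the /api/contacts endpoint, and the use of @faker-js/faker are assumptions for illustration only, not Ben's actual schema or tooling.

```typescript
import { faker } from '@faker-js/faker';

// Hypothetical contact shape; a real factory would be derived from the project's schema.
interface Contact {
  firstName: string;
  lastName: string;
  email: string;
  needsAttention: boolean;
}

// Build a contact with sensible defaults, letting tests override only what matters.
export function buildContact(overrides: Partial<Contact> = {}): Contact {
  return {
    firstName: faker.person.firstName(),
    lastName: faker.person.lastName(),
    email: faker.internet.email(),
    needsAttention: false,
    ...overrides,
  };
}

// Seed a contact through the API (placeholder endpoint) so UI tests start from known data.
export async function seedContact(baseURL: string, overrides: Partial<Contact> = {}) {
  const contact = buildContact(overrides);
  const response = await fetch(`${baseURL}/api/contacts`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(contact),
  });
  if (!response.ok) throw new Error(`Seeding failed: ${response.status}`);
  return response.json();
}
```

A factory like this is exactly the kind of tedious, pattern-based code Ben suggests handing to the agent first, then reviewing.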
[00:36:05] Joe Colantonio Yep, absolutely. So actually we're only like 4 months away from 2026. I'm asking people this now because I do like trends every year. Is there anything you see coming up in 2026 people need to be more aware of?
[00:36:19] Ben Fellows Yeah, so I think what I'm most interested in is whether, at some point, we stop writing DOM-based automation. So, for example, you can start to see this a little bit with Playwright already. Playwright has its MCP, which is now interacting with the web without you having to write code to interact with the DOM, and then you basically still have a bunch of traditional Playwright automation. What I'm going to be curious about is, do automation suites, in the next six months to a year to two years, start to just become natural language in many ways? Now, obviously there's Cucumber and Gherkin and all these other frameworks, but what I'm talking about is that there's no longer a locator layer, and instead you're just relying on AI to do what the human brain does, which is, oh, I need to interact with this form. The downside of that is that inherently it would be a lot slower. There's a lot about that that would be interesting, but that's probably the number one trend I'm interested in. The other one I'm interested in is that there are a lot of people arguing there's going to be this slowdown in how smart it gets, and I think there are lots of software developers trying to use that as a reason not to adopt it. They're like, oh no, there's this evidence that it's not getting as smart with each release, or something like that. To me, as long as the context windows get bigger and bigger and better, it's going to continue to be more useful. This prompt can just look across the whole code base. Back when GPT dropped two and a half, three years ago, you could do 500 characters; Gemini can now take a million. What I'm going to be very curious about is how its ability to absorb code before it makes a decision has an impact, because the more code it can absorb, the better decision it's going to make. So those are the two trends I'd say I'm most interested in.
[00:38:06] Joe Colantonio Absolutely. So when you say not DOM, do you mean it's going to be image based?
[00:38:10] Ben Fellows Yes, loosely. So my theory on that front is, and I don't know if you've played with computer use, for example, at all, which is through code, basically it's taking a series of snapshots very, very quickly, evaluating each action the way a human would, then going. Now, the downside of that is it's very expensive and very slow. The upside is you can start writing tests in seconds, because there is no more DOM interaction, there's no more class and stuff like that. You're just like, hey, log into this app, do this thing. What I'm curious about is what the relationship between these two things is going to be, because if you can just run tests and it's stable, fast, easy, and you're giving it human-based instructions and it's interacting with the browser like a human would, there are also a lot of really interesting acceptance criteria possibilities, like, hey, look at this page and tell me what might be wrong with it at each step. For example, I just gave it a screenshot of a formatting error. I didn't give it any clue. I just said, hey, what do you see that could be a bug on this page? And it immediately identified the formatting error. So there are things about the way it processes information that make assertions, for example, so much easier, because it's going to run hundreds of thousands of assertions every time it looks at an image, like a human brain would. Yeah, that's kind of the thought process, but right now, slow, expensive. So who knows where it's going to go.
[00:39:36] Joe Colantonio And that's more like a real user. Testers have always said they didn't like automation because it's not like a real user. Well, if it's using image-based things, then it's just like a real user, because you're not interacting with the DOM in a way that a real user wouldn't, right?
[00:39:53] Ben Fellows Yeah, 100%. And it goes back to humans. I think we vastly underestimate how much our brains can process when we're looking at something. It's stunning when you actually think about it, like, oh, I'm looking at this webpage and I'm analyzing so much. I remember when image-based input first dropped for AI, I was sitting in an airport and I took an image and told the AI, this is my golden screenshot of what it should look like. And then I took another image of the website, where I had changed one tag from critical to medium. And then I said, I just finished test automation. What's the difference? Do you see any failures? And within seconds, it told me, this tag on your golden was critical, this tag in your output was medium, so therefore something failed. And what's wild about that is that's exactly how the human brain would approach it. If you or I were writing test automation for that, we would have to write a thousand assertions for every possible little detail on that page. In that moment, sitting in that airport, it was like, okay, whether it's tomorrow or a year or three years from now, how we do assertions within automation is going to fundamentally change, along with how we actually write test automation.
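Here is a rough sketch of the golden-screenshot comparison Ben describes, with Playwright capturing the current state and a vision-capable model doing the diff. The askVisionModel function below is a hypothetical stand-in for whatever multimodal model API you would actually call, and the file paths and URL are placeholders.

```typescript
import { chromium } from 'playwright';
import { readFile } from 'node:fs/promises';

// Placeholder: wire this to a vision-capable model of your choice.
// It is NOT a real library call; it only marks where the model request would go.
async function askVisionModel(prompt: string, images: Buffer[]): Promise<string> {
  throw new Error('Not implemented: connect this to a multimodal model API');
}

async function visualCheck(url: string): Promise<string> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const current = await page.screenshot({ fullPage: true }); // screenshot of the build under test
  await browser.close();

  const golden = await readFile('golden/dashboard.png'); // previously approved "golden" image

  // Instead of hand-writing hundreds of assertions, ask the model to compare the two images.
  return askVisionModel(
    'Compare these two screenshots. The first is the approved golden image, ' +
      'the second is from the latest build. List any differences that look like bugs.',
    [golden, current]
  );
}

visualCheck('https://dashboard.example.com').then(console.log).catch(console.error);
```

The structure mirrors Ben's airport experiment: one approved image, one fresh capture, and a natural-language verdict instead of a wall of per-element assertions.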
[00:41:05] Joe Colantonio Absolutely. And that's why I loved AppliTools when they first came out. It was groundbreaking in 2015. And now I think that's becoming the norm for where the industry is going, for sure.
[00:41:15] Ben Fellows Yeah, exactly. And now you don't even really have to use golden screenshots anymore, because you can just ask AI, like a human, what it's expecting, and it's starting to get opinionated enough where it's like, oh yeah, this is what I would expect. Or, hey, do you see anything funky about this page? And it's going to tell you, funky being a highly technical term.
[00:41:34] Joe Colantonio Does this work with legacy code, or does it only work well with greenfield applications?
[00:41:41] Ben Fellows No, it works with legacy code for sure. The bigger the code base, the more context you should put into it. So, for example, let's assume you had a huge code base. The first thing I would do is actually work backwards. Rather than just starting off with something, I would ask it to do research in the code base and find things, because then, once I'm telling it to do stuff, I'm going to make sure I'm pulling in the four or five files that I think I want it to interact with as the starting point, so I'm giving it that context. The bigger the code base, the more you have to think about the order of operations: make sure you can give it the context for where you want to be in that code base, and do a little more research up front. That's really the only big difference.
[00:42:26] Joe Colantonio Okay, Ben, we can talk for hours, but before we go, is there one piece of actionable advice you can give to someone to help them with their automation testing efforts and what's the best way to find you or contact you or learn more about your business or maybe getting some of these one-on-one demos themselves?
[00:42:40] Ben Fellows Yeah, for sure. First, on how to find me: LinkedIn is where I'm most active, Ben Fellows on LinkedIn. I also do some stuff on YouTube, which you can check out. A little bit more about my business: we do a lot of content, we do a lot of webinars, but ultimately, if you want people that embed on your team and understand all this stuff, that's really where we shine. We staff engineers on teams, oftentimes for six months, but sometimes up to three, four years. And basically what we focus on is providing a productivity boost alongside upskilling teams. So what we're trying to do is beachhead a lot of these initiatives within companies and then really bring the existing QA teams on board and using it. When it comes to where to start and my most important advice, I think that, for better or worse, the best experience with AI is paid. So I do think that if you're going to do this, you should set aside a couple hundred bucks as part of your experimentation. I know that's very hard for a lot of people. The best thing you could do is go to your bosses and say, hey, I want to download Cursor, I want to set up a premium model, I want to do a POC. And then I would just start with tedious tasks. What are the things that are most tedious and pattern-based that you can try to replace? That's where I would start, and that's why I always start with page object models, database schemas, these things that are remarkably tedious and fairly pattern-based. I pay for a premium model, I have Cursor, and it's made my life a lot easier. So that's where it starts.
[00:44:07] Awesome. Definitely give Ben a follow. You can find links to all this awesomeness down below.
[00:44:11] Thanks again for your automation awesomeness. For links to everything we covered in this episode, head on over to testguild.com/a558. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, and my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.
[00:44:47] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com where you become part of our elite circle driving innovation, software testing, and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.
[00:45:30] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.