AI in Testing: Innovation or Just Noise? with Fitz Nowlan

By Test Guild

About this DevOps Toolchain Episode:

Today, special guest Fitz Nowlan, VP of AI and Architecture at SmartBear, dives deep into the ever-evolving world of AI in software testing.

Get resource: What is OpenTelemetry Guide: https://testguild.me/TelemetryGuide

This episode is packed with insights from Fitz, who shares his front-row perspective on what’s genuinely innovative versus what’s just hype in today’s AI-powered DevOps landscape.

Fitz also tackles hot topics like MCP servers, agentic AI, and the game-changing impact of AI-driven self-healing in test automation. He explores how leading tools like Playwright and Selenium integrate MCP servers, discusses the challenge of identifying meaningful AI features, and highlights practical strategies for evaluating new tech without getting lost in industry buzzwords.

Fitz also offers a behind-the-scenes look at SmartBear’s approach to customer-driven innovation, the evolution of visual and intent-based testing, and how agentic AI might soon revolutionize the role of QA teams.

Whether you’re curious about the pitfalls of flaky tests, the future of test data generation, or simply how to keep up with today’s breakneck pace of change, this episode promises actionable insights you won’t want to miss.

TestGuild DevOps Toolchain Exclusive Sponsor

This episode is sponsored by SmartBear — makers of Insight Hub.

If you're curious about OpenTelemetry but find it a little overwhelming, SmartBear has you covered. They just released a free eBook: 'What is OpenTelemetry? A Straightforward Guide.'

It breaks down exactly what OpenTelemetry is, why it matters for application monitoring, and how you can start using it without the confusion.

Grab your free copy now: https://testguild.me/TelemetryGuide

About Fitz Nowlan


Fitz is the VP, AI and Architecture at SmartBear and leads AI integrations across various software testing products and services. Prior to SmartBear, Fitz was a software engineer and founder, and earned his PhD in CS from Yale University.

Connect with Fitz Nowlan

Rate and Review TestGuild DevOps Toolchain Podcast

Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:00] Get ready to discover some of the most actionable DevOps techniques and tooling, including performance and reliability, from some of the world's smartest engineers. Hey, I'm Joe Colantonio, host of the DevOps Toolchain Podcast, and my goal is to help you create DevOps toolchain awesomeness.

[00:00:19] Joe Colantonio Hey, is all the buzz about AI and DevOps testing innovation or just noise? Well, depending on what you think, you're in for a treat because we have a special guest joining us to talk all about how AI is solving today's testing challenges, myths and facts. Joining us today, we have Fitz. Fitz is the VP of AI and Architecture at SmartBear and leads all AI innovation across various software testing products and services. Prior to SmartBear, he was a software engineer and founder and earned his PhD in CS from Yale University. So he really knows his stuff, really is in the weeds of AI, what it can do, and what it can't do. You don't want to miss this episode. Check it out.

[00:00:58] Hey, before we get into it, this episode is sponsored by SmartBear, makers of Insight Hub. If you're curious about OpenTelemetry but find it a little bit overwhelming, SmartBear has you covered. They just released a free e-book, What is OpenTelemetry? A Straightforward Guide, that I think is a must-read. It breaks down exactly what OpenTelemetry is, why it matters for application monitoring, and how you can start using it right now without any confusion. Grab your free copy now at testguild.me/telemetryguide or click the link in the comments down below.

[00:01:31] Joe Colantonio Hey Fitz, welcome back to The Guild.

[00:01:35] Fitz Nowlan Hey Joe, thanks so much for having me. It's good to be back. It's been a couple of times now.

[00:01:38] Joe Colantonio Yeah, always excited to have you. You always seem to have your pulse on what's going on, and you work for a company that's evolved a lot in AI as well. You really have a front-row seat to AI, the evolution of what's gone on, what's working for your customers and what's not. I want to look at maybe some recent AI launches in software testing and beyond and see what stands out as truly impactful versus what just feels like marketing fluff. The first thing that hits me right now, and I'll get your opinion on what's hot and what's not, is MCP servers. They've been all over the place. What's the deal with that? Is that something you see as fluff or hype, or is it the real deal?

[00:02:16] Fitz Nowlan I think it's the real deal, but I think the impact of them is still very early days. The idea behind the MCP server is that you want to allow your customers at runtime to extend the functionality of your AI-driven application. They can plug in their own data sources, like their code base or internal files, internal databases, any kind of data source that they have that they want to make available to your application as a vendor, so they can then use it within your application. I think the concept is straightforward, it makes sense, and I think it will gain a ton of traction, but it's still kind of early days where people need to identify what is that killer application where I need my special data to be plugged into it. I think historically all these cloud apps have been just the vendor's data residing in the application. And now finally we're realizing, oh, with some of these AI applications we want the customer's data to be injected at runtime.

[00:03:14] Joe Colantonio Is it almost like a wrapper that creates an API for your particular application, and you then feed your data to a third-party LLM for that context? Is that how it works?

[00:03:24] Fitz Nowlan Kind of. Imagine you have a vendor that you purchase from, and that vendor has introduced AI or LLM-powered features into their application. But one of the things that LLM needs is the customer's data. And so MCP allows a customer, or really the vendor, to add the customer's data as a data source to the LLM, basically making data available at runtime to the LLM that wasn't available to the vendor when they were building their application. Imagine you're Salesforce and you're working with some really large customer of yours, and that large customer has a database full of historical interactions with a bunch of their opportunities and their leads. Salesforce may not have that entire database or that whole data warehouse. It may be massive. However, the customer can plug into Salesforce through an MCP server to make the customer's data warehouse available to Salesforce's AI at runtime.
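
For listeners who want to picture the mechanics, here is a minimal sketch of a customer-side MCP server, assuming the official Python MCP SDK; the server name and the `query_crm_history` tool are hypothetical illustrations, not a SmartBear or Salesforce integration.

```python
# Minimal sketch of a customer-side MCP server (assumes the Python `mcp` SDK).
# The tool name and the CRM lookup below are hypothetical illustrations.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-data")

@mcp.tool()
def query_crm_history(account_id: str) -> str:
    """Return past interactions for an account from the customer's own warehouse."""
    # In a real server this would query the internal database; stubbed here.
    fake_warehouse = {"acme": "2023: renewal call; 2024: expansion opportunity"}
    return fake_warehouse.get(account_id, "no history found")

if __name__ == "__main__":
    # The vendor's AI application can now discover and call query_crm_history at runtime.
    mcp.run()
```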

[00:04:20] Joe Colantonio Gotcha. And like I said, it gives more context then, because the AI is only trained on what it's trained on, public data. But this would be more private data on your specific situation. So the results in theory would hopefully be better, I assume.

[00:04:34] Fitz Nowlan That's right. Yeah. And there could be two aspects, or two ways you could work it. You could make it so that the MCP server allows the vendor to train on your customer data and therefore get an even finer-tuned LLM. But you could also use it more in a RAG-style approach, retrieval-augmented generation, where there's no fine-tuning or training occurring, but they're incorporating the data into their prompts at runtime. It's still going to be more personalized, but it wouldn't necessarily be a full-on trained model. There are kind of two approaches there.
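
As a rough illustration of the RAG-style option Fitz describes (no fine-tuning, just injecting retrieved customer data into the prompt at runtime), here is a toy sketch; `call_llm` and the word-overlap retrieval are placeholders for a real provider and a real vector store.

```python
# Toy RAG sketch: retrieve the customer's own records, then inject them into the prompt.
# `call_llm` is a stand-in for whatever chat-completion API you actually use.
from typing import List

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

def retrieve(question: str, documents: List[str], k: int = 3) -> List[str]:
    # Toy retrieval: rank documents by words shared with the question.
    # A real system would use embeddings and a vector store instead.
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def answer_with_context(question: str, documents: List[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    prompt = f"Use only this customer data:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```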

[00:05:02] Joe Colantonio Gotcha! I know Playwright and Selenium both released MCP servers. What's the benefit of that then, being able to interact with a browser that way, or an API for a browser?

[00:05:12] Fitz Nowlan Yeah, I haven't used either of their MCP servers, their extensions through MCP, but I could imagine that the benefits there are a few things: either exposing a new data source that's internal to the customer who's building the Playwright or Selenium tests, or exposing things like source code or other parts of the application that aren't accessible at the browser level, where you need to plug in an extra data source to make that available.

[00:05:38] Joe Colantonio Gotcha! At SmartBear, I guess, like I said, everyone's releasing these MCPs. Are you doing anything with any of your solutions that's actually going to take advantage of this, where you're going to do your own SmartBear MCP server?

[00:05:48] Fitz Nowlan We don't have any plans to do like a SmartBear branded MCP server. However, we do have two applications where we're already kind of talking about how do we want to expose MCP capabilities to our customers. We're trying to, in other words, we're paying attention to the latest and greatest, but we're not just rushing out and just getting something out just to say, Hey, we do MCP too. We're really trying to focus on the customer use case and understand where is it that they have a data source that's rich and valuable in the context of one of our applications and then lean into that. I actually have an MCP meeting tomorrow to talk about it, but we don't have any concrete plans right now to say, this or that product has an MCP server capability.

[00:06:27] Joe Colantonio How do you evaluate whether an AI feature or tool is meaningful, has a meaningful purpose, and isn't just noise, where you're doing it just to, like you said, keep up with what everyone else is doing? Do you have a framework you use to go through with your team: okay, does this actually make sense, or are we just doing it because it's cool and it would be nice to have?

[00:06:46] Fitz Nowlan Yeah, it's a great question. I mean, the most basic baseline is always build something that customers want, build something that people will pay for. And so that's where a lot of our, no, I shouldn't say a lot, that's a big part of where our AI features are rooted: when we talk to customers, they say, we want to do something like this, is this possible? And so we have our Halo AI Labs, which is sort of a beta program that we have available to all of our customers across our products, where they get first access to our new AI features. We have AI-powered features in there, and sometimes when we give them to customers, if we find great usage, then we're going to look to promote those to production for everyone. And other times, if we build a feature where customers aren't so enthused by it, or it's just not so practical or useful, then we can just quickly unpublish it, basically. The Halo AI Labs is a great way for us to get that early feedback from customers. The other part of it, how do we know what's noise and what's not? On the AI team that I run at SmartBear, we focus on trying to be cutting edge and trying to be in touch with both the SmartBear products and the SmartBear customers, but also kind of where the AI is going. So some of it is intuition, to put it plainly. We try to lean in where we feel like there's value, where there's a real differentiator. It's kind of a mix of those two things, customer feedback and intuition.

[00:08:00] Joe Colantonio I've spoken with a few companies and they're kind of now pulling back on AI-powered type terminology. I've been seeing a lot of AI-powered test case generation tools. In your experience, do you think these tools are fundamentally changing workflows, or are they still more AI assistants? Do you see people starting to pull away from AI and just using a word like intelligence? They don't necessarily care how you get the end result, as long as the end result is correct.

[00:08:25] Fitz Nowlan Yeah, that's a good point. I don't have a strong opinion. I think this is kind of related to your question, and it's just coming to my head. The thought that I have here is, with the rise of agentic AI, I think organizations will start to see that as a true step change, a true increase in value being brought to their customers. And so it's possible that the LLM wrappers, which is what most of the AI features in the world today are, will kind of fall back into being something simpler, more baseline intelligence, as opposed to full-on AI-powered XYZ. In other words, I wonder if AI-powered is kind of being reserved for agentic AI features in the future. Whereas those LLM wrappers, which rapidly lose their value over time as the models get better and better, I wonder if they're falling down into more like intelligent test case authoring as opposed to full-on AI-powered authoring. But that's just kind of an off-the-cuff thought there.

[00:09:21] Joe Colantonio Makes sense. Where do you actually see innovation happening in AI for testing, then? Like real innovation. I know you created a really great solution that was acquired by SmartBear, because I think it took really good advantage of AI and did it in a way that made sense and wasn't just hype. Where do you see real innovation happening in AI for testing specifically?

[00:09:40] Fitz Nowlan I do think that it's the agentic AI. We're working at SmartBear on a full-on, not just a QA tester, but a full-on QA department abstraction. The idea is that with agents now, you can have an army or a fleet of agents that perform all the different functions that you might find in your whole QA department, not a single manual QA tester. I think with the ability of these reasoning models to really churn in a loop towards a single objective or a single goal, and then being able to pick, call it five to ten different goals that a QA department might perform, you can really start to focus in, in an expensive way currently with all the tokens, but focus in on solving different individual problems that are all related to QA and testing. And so that's a research project that we have around agentic QA. I think there's real value there. Now, in terms of what's in the market today, I think we're just scratching the surface of what a lot of different providers are doing. One that comes to mind is Cognition's Devin AI. That's obviously around software development rather than software testing, but it is an agentic flow and it is a higher level of abstraction than some of these other LLM wrappers we've seen that are doing things like, again, to use the software development analogy, code completion or code reviews. Those are kind of single insertions of AI into the software development process today, as opposed to a full junior developer abstraction, which is a level above that. That's an AI that's doing the whole role of the junior developer rather than just one little task within the software development cycle. I think we'll see similar things on the QA side as well, and we hope to be at the forefront of that. One other technology that's new and worth mentioning is computer use: OpenAI's computer use model and Anthropic's computer use model. These are really different because these are models that are trained to interact with UIs. And I know this very well because we were interacting with UIs two years ago using GPT-4, the original GPT-4 and 3.5, and we had to do a very complex process for interacting with UIs. Now, today, with these models that are trained to produce X/Y pixel coordinates, they can really drive interfaces. They can really drive whole UIs with AI, and that is not something that the previous models could do. And I know, because we tried to get them to spit out correct locations and coordinates for interfaces before. This is really powerful and this is very revolutionary, and we are investing a lot in our use of these computer use models.
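
To sketch what a computer-use loop might look like, the snippet below captures a screenshot, asks a model for pixel coordinates, and clicks them; `ask_computer_use_model` is a hypothetical stand-in for OpenAI's or Anthropic's computer-use APIs, and pyautogui handles the local screen actions.

```python
# Sketch of a screenshot -> coordinates -> click loop driven by a computer-use model.
# `ask_computer_use_model` is a hypothetical placeholder, not a real vendor SDK call.
import io
import pyautogui

def ask_computer_use_model(screenshot_png: bytes, goal: str) -> tuple[int, int]:
    """Return the x, y pixel coordinates the model wants to click next (stubbed)."""
    raise NotImplementedError("call your computer-use model provider here")

def drive_ui(goal: str, max_steps: int = 10) -> None:
    for _ in range(max_steps):
        image = pyautogui.screenshot()        # capture the current UI state
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")      # encode the screenshot as PNG bytes
        x, y = ask_computer_use_model(buffer.getvalue(), goal)
        pyautogui.click(x, y)                 # act on the model's chosen coordinates
```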

[00:12:13] Joe Colantonio What I'm confused about is, SmartBear has had TestComplete forever. How is this different than TestComplete? Is it just now catching up to what you've all been doing for like 20 years with all these tools? Or is it different? How is it different than, like, a WinRunner back in the day, or Silk, or something like that?

[00:12:29] Fitz Nowlan I hate to pre-announce, but TestComplete will be using these approaches to drive what we call self-healing. TestComplete for years was using locator-based approaches and some visual approaches to identify elements as part of a test script. But when that interface would change, we didn't have any intelligent, dynamically-intelligent-at-runtime ability to find the same element that was there logically but looked different or was in a different location. And so now with AI, and some of these computer use models, though in this particular feature we're not using computer use just yet, we are now able to bring LLMs into the fold to make that decision for where some logical element is now residing or what it now looks like. And so the idea is that we can look at an interface like a human would and say, oh, that doesn't have the exact underlying locator that it used to have deep in the code, but logically it's the same thing. That's the element we should interact with. And TestComplete will be rolling this out, I don't know, in Q2 maybe, and we'll call it self-healing. It's AI-driven self-healing. And that's the difference: in the past it was static, it was code-based, it was a locator. Today it's logical and it's intent-based.

[00:13:44] Joe Colantonio All right. So once again, I don't want to get bogged down on this because I'm just trying to wrap my head around it. You still have the ability to do, like, an image compare, this image versus that image at a different scale, but this is more sophisticated, I assume, because now you're using actual AI rather than just a generic algorithm to figure out: yeah, the image is a little different, but I can still understand it because I have context and I'm able to reason within my model that this is still the same image or the same element.

[00:14:11] Fitz Nowlan Yeah, you're spot on, and your knowledge of TestComplete is awesome, I appreciate it. You are spot on. Basically, the previous versions would do a visual image compare, and they were trained using a computer vision model. And so they would identify objects that look slightly different but are basically the same. This is now interpreting it using GenAI or an LLM to say, this object is logically the same as this other object, even if visually they are different, we'll say. It's the next level of intelligence for that self-healing approach.
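
An illustrative sketch of the idea, not TestComplete's actual implementation: when a stored locator stops matching, hand the step's intent and the current candidate elements to an LLM and ask which one is logically the same control. `call_llm` is a placeholder for a real model call.

```python
# Illustrative LLM-assisted self-healing: pick the element that serves the same
# logical purpose as the one the old locator used to find.
from typing import Optional

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

def heal_locator(intent: str, old_locator: str, candidates: list[str]) -> Optional[str]:
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    prompt = (
        f"The test step intends to: {intent}\n"
        f"The old locator '{old_locator}' no longer matches anything.\n"
        f"Current page elements:\n{numbered}\n"
        "Reply with the number of the element that serves the same purpose, or 'none'."
    )
    reply = call_llm(prompt).strip()
    # Return the healed candidate, or None if the model found no logical match.
    return candidates[int(reply)] if reply.isdigit() else None
```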

[00:14:42] Joe Colantonio Nice. All right. So I just want to go back to some of those abstractions, to put a little more legs on them. When you were talking about abstractions, are you talking about having, like, a model in agentic AI that's a Security Sally or an accessibility persona, where you say: here's my application, I want to test for security, I have a persona, Security Sally, run any type of test you normally would as a security expert?

[00:15:05] Fitz Nowlan Yeah, I think we haven't decided if that is exactly the future for us. But I do think that type of approach will emerge. I think there's a benefit to specializing your agents in AI, just like there is in having the roles at your organization, your human roles, specialized. If nothing else, a really basic example where this can help is that it allows you to regression test the performance of your agents better. If they're supposed to do kind of one thing, and you have a class of challenges or problems or tasks that are all related in that same vein, then it's easier to validate the performance of your agent if this agent is Security Sally, for example. So yeah, I think that we will see that type of specialization in agents. And then of course, it gives customers the flexibility to turn certain things on or off, to use certain agents at certain times for their specific needs. I think you will see specialization in these QA agents in the same way that we see it in your actual QA workforce.

[00:16:02] Joe Colantonio Any time anyone hears of AI, especially if you start thinking about abstracting the QA team or your testing team, they start thinking: I'm losing my job. Where do you see people, actual flesh and blood, when it comes to roles or jobs in the future?

[00:16:17] Fitz Nowlan Yeah, so much of it, I think, is in the oversight of a Security Sally, for example; it's in defining the requirements for what security means for your particular application. I think, over time, progress always changes workforces at organizations. Technological innovation has, for years, changed the makeup of your workforce and changed the needs and the requirements and the tasks and the job. That's not going to change going forward; there will always be a dynamic, some level of fluidity to the roles at organizations. There definitely will be more disruption in that vein. It's not going to happen overnight, and the disruption won't be 180 degrees different. It's going to be slight changes. Instead of performing that manual test that you had to do, now I need you to oversee the execution of some manual tests that you authored, or that you oversaw, or that you guided, performed by a fleet of agents. Another example: maybe rather than writing that Playwright script, we need you to write code to build our product instead. In other words, it's a small shift of perspective in these roles, but I don't think they go away entirely, and certainly not overnight.

[00:17:23] Joe Colantonio Gotcha! It's hard to gauge, though, because we've been talking for two years and all of a sudden we have MCP, we have the new computer use models, and ChatGPT with images has gotten like a thousand times better. You'll be posting on LinkedIn about it. How do you even know as a company, how do you keep up with all this?

[00:17:40] Fitz Nowlan Yeah, it's super challenging. I think one of the best parts about coming on with you is you give me some of these tidbits that you've picked up through your sensors and filters. It's really hard. And even internally, we talk about how one person, or even one team, can't keep up on everything. Some things are hype and noise, and some things are substantive and real. I mean, you can't keep up with all of it. You kind of have to pick your spots, apply a filter, and if you feel intuitively like there's something there, then dig into that. Don't feel pressure to read every single article, because it can't be done. And I think, back to the point behind your question, we have been talking for two years and the landscape is still rapidly changing. But I would also say I don't know if there have been full-on industries that have died overnight as a result of these things. I think that kind of gives me confidence that things won't change overnight. And a lot of times, too, the other thing is we live in this tech bubble, in this AI bubble. If you go out and you talk to people at Thanksgiving or at a holiday, their lives aren't impacted. There's a big wide world out there that's not necessarily going to change overnight. And so I think there's some perspective that can be maintained that way.

[00:18:48] Joe Colantonio That's true. My wife thinks I'm nuts. I take pictures of the calories on food and ask, which one should I eat? Which one's better? I use it for all kinds of weird things. She's like, she would never do that. So that's where true adoption really is.

[00:18:57] Fitz Nowlan She's just like, how about you just eat the vegetables and skip the carbs.

[00:19:04] Joe Colantonio Sure. We talk a lot about success, but I'm curious to know if you could share any failures, or maybe not failures, but have you seen any AI applications in testing that seem really impressive but end up maybe causing more complexity than value?

[00:19:16] Fitz Nowlan It's a good question. In testing, maybe not one that I've personally experienced, but one that I've heard about is this: for years in automated web testing, you use Playwright or Selenium or Cypress. There's this idea now that you use AI to generate Selenium and Cypress and Playwright tests, and then run those tests. But I think that fundamentally misses one of the main challenges of those tests, which is that they are brittle and they break because the application changes, and those tests are just like the locators in the TestComplete case. They're specific to that implementation, to the underlying HTML or the underlying structure of the application. And so while you can use AI to update those tests, you're still required to keep a human in the loop to tweak the tests and make them work. I feel like that's kind of a leaky abstraction, in the sense that you're using AI to generate tests, but these are tests that you are only generating to test the quality of your app. Instead of generating those tests with AI, you should test the app with AI and basically rise to a level above, using agentic AI, using a QA agent. That's what we're investing in. We're hoping to skip that process of improving your code-based tests with AI and instead get rid of the code-based tests entirely.

[00:20:40] Joe Colantonio No, that's a beautiful point. A lot of people, because they're developers, spend so much time coding these tests, thinking they're like hardcore developers, and they're missing the point of actually testing the application as a real user would, which would be more visual. It seems like we're going back to more visual-based testing, which would be better, I would think, not worse.

[00:20:59] Fitz Nowlan Yeah, well, I think that's exactly right. It's a great point. The point there is that the user has no idea whether the underlying code of the application is a button or a label, for example. Why would your test need to rely on that fact? Because your application ultimately is just trying to allow the user to click on something. And if that's communicated visually, then you want your test to assert that intent, not to assert the way it was coded originally.

[00:21:27] Joe Colantonio 100%. Now, one of the biggest issues I've seen with tests being flaky is test data. And for some reason, every time I talk to a vendor, it's like, we have something, but not really. Do you see that as an issue that AI in the future will be able to tackle better? Because it seems like, for some reason, after all these years, people still have problems with test data generation. There are solutions out there, but they're kind of expensive, and they're hard to integrate into functional tests in DevOps.

[00:21:53] Fitz Nowlan Yeah, I like this area of investment a lot. And we, SmartBear, within TestComplete actually have the Intelligent Quality add-on, I think is the name of it, where they use an LLM to generate test data. You basically give it your instructions, your plain-text instructions: I want first names and last names, I want zip codes, I want a sentence or a description, a message related to this or that requirement, and then the LLM will generate that data for you. And we've been live in production with that for about a year, and we have some decent usage from customers, repeat usage of that particular feature and purchases of that add-on. More broadly, we're investing in that for API test creation. That's a case where you need test data for your API tests, but the data itself is fairly well defined. These APIs have shapes; they have types on a lot of these fields that you're going to use in your API test. It's a good area for LLM data generation, where you want something realistic and authentic, and the bar for correct is easily measured. You don't need to generate values across the entire expanse of all possible values. You just need a number between one and ten, and that will work. And so translating those, we'll say, API definitions from OpenAPI specification docs into actual data can now be done by LLMs, whereas previously it would have been very hard to get a general-purpose binding from your OpenAPI spec into data using just regexes, for example. That's a long way of saying, yeah, I think LLMs will definitely help with data generation. We've done a little bit of that at SmartBear, and we're doing more of that in development now.
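
Here is a rough sketch of the general pattern (not the internals of the TestComplete add-on): pass an OpenAPI schema fragment to an LLM and ask for realistic values that respect the declared types and ranges; `call_llm` and the example schema are placeholders.

```python
# Sketch of OpenAPI-schema-driven test data generation with an LLM.
# `call_llm` is a placeholder for whatever model provider you actually use.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM provider here")

def generate_request_body(schema: dict) -> dict:
    prompt = (
        "Generate one realistic JSON object matching this OpenAPI schema. "
        "Respect every type, format, and range. Return JSON only.\n"
        + json.dumps(schema, indent=2)
    )
    # Parse the model's reply back into a dict usable in an API test.
    return json.loads(call_llm(prompt))

# Hypothetical schema fragment, echoing the examples from the conversation.
example_schema = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "zipCode": {"type": "string", "pattern": "^[0-9]{5}$"},
        "age": {"type": "integer", "minimum": 1, "maximum": 10},
    },
    "required": ["firstName", "zipCode"],
}
```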

[00:23:32] Joe Colantonio Nice. We talked a little about TestComplete; it's a very mature product. I'm curious to know, if people are trying to implement AI into their more legacy-type applications, is it hard to integrate AI? Is that something that's only for newer greenfield applications? Are you able to build in a way to be more AI-ready for whatever innovations might come down the road in a year or two?

[00:23:52] Fitz Nowlan I'll say it probably is harder to integrate into legacy applications, probably for two reasons. One is just that the UI or the data model, the structure of the application, may not lend itself well to sort of modern LLM concepts. The second challenge would probably be the implementation of it. If the implementation doesn't have a clean, what we would call in code a narrow waist, where you can hook in at the code level, it's going to be challenging to comprehensively or coherently call out to AI in that application. I would say it probably is more challenging, but I would also say that just because it's challenging doesn't mean that you shouldn't do it. And of course, in the TestComplete case, it's a large, long-running product. It's done very well for us, and I think it still has a lot of room to grow. Just because something's hard doesn't mean you shouldn't do it; it just means that you've got to be really intentional about it. But we are investing in AI in that product because we think there are a lot of benefits it can bring.

[00:24:49] Joe Colantonio I don't know why I'm thinking this, but are there any AI-powered features at SmartBear that you think are currently underrated or misunderstood by your users?

[00:24:56] Fitz Nowlan Oh, that's a really good question. I think the one that probably comes to mind is in QMetry, which was a recent SmartBear acquisition. They have a host of AI features there that I think are underrated. They can generate full manual test plans, so not just individual test cases, but a whole group of related test cases, from a requirement or a user story. Basically you describe your application, or maybe you even give it the functional spec that a PM wrote, and they can come up with the full test plan for that, load it into your test management software, which is QMetry in this case, and then update those as the application requirements change. Based on a prompt, based on natural language, they can update your test cases. And then they can also deduplicate test cases that are the same, that you maybe have authored, or that different people have authored over time, that are really functionally testing the same thing. Those, I think, are some underappreciated AI features that we have in the SmartBear portfolio today.

[00:25:53] Joe Colantonio Nice. So we talked about MCP, which I've been hearing a lot about, and agentic AI, which before MCP was the hot thing. Is there anything else you see on the horizon that's about to blow up?

[00:26:03] Fitz Nowlan The one thing I really wanted to highlight or talk about coming in was this: I think a lot of the initial AI features were, and still are to some extent, LLM wrappers, a wrapper around the LLM where you're providing a small jump over a gap in a process or a workflow using AI. That's, again, the case of auto-genning your Playwright tests using an LLM, a small wrapper around the LLM: say, here's the current code, here's the application, generate me a test, and it's going to be great at that. And I think those wrappers lose their value over time. What I'm starting to see now, which I think is the next big trend that's worth paying attention to, is this notion of agentic reasoning, where you don't guide the LLM into producing an output for that specific problem or that specific gap in your workflow, but instead you just give the AI a ton of tokens and you say, in a loop: keep working towards that objective. And here's what you said previously. Here's a little bit of a guide, a little bit of a nudge. Keep going. Basically, the idea is that if you had a big enough budget, the latest and greatest models will eventually produce something looking like a solution. And maybe it won't be quite as good or quite as coherent as what a human would do today. But the key insight here is that I think we're getting to the point where you don't need to impose the structure or the guardrails on the LLM. You can actually just let the LLM talk to itself, in the same way that you would give a human a task and tell them to go work on something. I think we're getting to that point now, where the o3 reasoning models, for example, are pretty close to being able to solve some of these base-level tasks with a big enough budget. That has an impact for application builders like us, because I think a lot of our AI integrations today are very focused on solving a specific task with guardrails and wrappers around the exact type of output that the LLM can produce. And I think if we get to the point where we can more generally ask the AI to solve a problem, that unlocks a whole bunch of new application use cases.
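
A bare-bones sketch of that agentic loop, assuming a hypothetical `call_llm` provider: feed the model its own prior output plus the objective, and keep going until it signals it is done or the step budget runs out.

```python
# Sketch of an agentic reasoning loop: no per-step guardrails, just the objective,
# the model's prior output, and a nudge to keep going. `call_llm` is a placeholder.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your reasoning model here")

def work_toward_objective(objective: str, budget_steps: int = 20) -> str:
    transcript = ""
    for _ in range(budget_steps):
        prompt = (
            f"Objective: {objective}\n"
            f"Here is what you said previously:\n{transcript}\n"
            "Keep going. Write DONE on the last line when the objective is met."
        )
        step = call_llm(prompt)
        transcript += "\n" + step          # accumulate the model's own working notes
        if step.strip().endswith("DONE"):  # the model signals it believes it is finished
            break
    return transcript
```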

[00:28:12] Joe Colantonio What's holding it back? Is it the power, like the token cost, the money? So once that drops, it's going to be more and more common?

[00:28:19] Fitz Nowlan I think it's the token cost. It just needs tons and tons of tokens. I think that's one piece of it. Also, I don't think the models are quite, quote unquote, smart enough yet. In other words, I don't think you could just put GPT-4o in a loop and let it run forever with infinite tokens; it still won't get to the answer for anything other than a basic problem. But I think as the models get a little bit better, and we just saw Llama 4 come out in the last couple of weeks, I think the models will keep getting better. I think token costs will come down. And I think those two things together will make it so that you could solve baseline tasks. Not everything, but we'll see the point where you don't need to direct the LLM to choose one of these four options. You can just give it an open-ended question and it will produce what looks like an answer that a human might come up with.

[00:29:06] Joe Colantonio I don't know why this is in my head, but it's like a thousand monkeys banging on a keyboard would eventually write Shakespeare with enough time.

[00:29:13] Fitz Nowlan That's right. I think it is that, but if you can compress time down, you'll get that output, you'll get that Shakespearean output in a reasonable amount of time. And to kind of flip it around, when someone asks a human a question, they have thousands, millions and billions of neurons firing away, banging on that keyboard, to produce that output. It's just that it happens so fast that we don't know all the options they've discarded.

[00:29:40] Joe Colantonio True. Switching gears here, I don't know why, but do you think we have it backwards a lot of the time, where we start from requirements using AI? Wouldn't it make more sense, if you already had an application, to use AI in production to listen to what your users are doing, AI that could automatically, visually see what they're doing and create those tests for you, which then would become the requirements that you start from when you shift left? Does that make any sense?

[00:30:05] Fitz Nowlan Absolutely. I'm smiling because we just had a conversation like this internally, I don't know, a couple of months ago, about how it's a continuum. Shift left was on a line, left to right, but it's actually a circle, and the circle includes software development, software release, monitoring and observation, and then it cycles back around to testing and then feeding into your software development. If you imagine that you start with an idea for an application you want, those are your requirements. That has to come first. There has to be some need, a customer need that needs to be met, or a personal need that you need met, to build an application. You have requirements and you define what that application should be, what it should look like. Maybe it's in plain text. Maybe you just draw a mock. It doesn't matter how you represent it, but you have an idea for what the application should be, what should exist. From there, you can now use LLMs to generate that code. You can generate the code and then build the artifact. You can test the artifact using AI; you can build tests from that original requirement document. You could publish it live and you could see how users use the application. And from that usage, you could then inform improvements, et cetera. It's all a continuum. And I think the point is that the AI acts as the glue across those points. There used to be the requirements definition point, the code implementation point, the test implementation point, and then the live monitoring. Those were all sort of concrete points. They're no longer on a line; they're on a circle, and the glue between all of those points on the edge of that circle is the LLM.

[00:31:44] Joe Colantonio Alright. A crazy thought just popped into my head when you were talking there. Say you're starting a greenfield application. How far away are we from recording, say, a radiologist doing their job while they talk through what they're doing, when you don't have an application yet, and then using that video and that text to train an LLM that then creates the application for you? So it's like really shift left, shift right: before you even write code, you're kind of, not spying on them, but you actually watch them on video and have the AI do the reasoning to create what they really need, rather than you assuming you know what they need by having someone create all these requirements and then going back and forth.

[00:32:20] Fitz Nowlan I hope we're far away, because I'm toast when that happens. But, concretely speaking, we have that for toy applications. You see it now with all the vibe coding stuff: developers basically say, I'm not going to write a single line of code, I'm just going to describe what I want, and I'm going to keep describing, keep using. And they can build things; the one I saw was a recipe application that works. But you also see the vibe coding disasters where there's no security, and someone started making a million dollars a month or whatever it was, and then they get hacked because there was no security built in. I think that for toy applications, we're probably pretty close. For real, legitimate B2B applications, I think we're still a ways away from what you're describing, taking a video of a radiologist or a really complex process, observing what they're doing, and then building the agent that does that. I think we are probably years away from that or more. So I think there's a spectrum there, a gradient of the applications that can be built. But certainly for the toy applications, with bolt.new or, there's a bunch of them, those app builders, just based on your prompts, I think you can get a working application running on your local machine in a very short time.

[00:33:27] Joe Colantonio Okay Fitz, before we go, is there one piece of actionable advice you can give to someone to help them with their DevOps automation testing efforts? And once again, what's the best way to find or contact you?

[00:33:35] Fitz Nowlan Yeah. The piece of advice, I think, is to stay current. And I think I said something like this last time: be aware of what's an automatable process and where you are really making a value-added decision, and try to automate the stuff you can and preserve your brain power for the things that really matter. And the way to reach us would just be Fitznowlan@smartbear.com, and you can find me online and on LinkedIn as well.

[00:33:57] Joe Colantonio And we'll have links to everything we talked about in the comments down below.

[00:34:00] For links to everything of value we covered in this DevOps Toolchain Show, head on over to Testguild.com/p185. So that's it for this episode of the DevOps Toolchain Show. I'm Joe, and my mission is to help you succeed in creating end-to-end, full-stack DevOps toolchain awesomeness. As always, test everything and keep the good. Cheers!

[00:34:23] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com where you become part of our elite circle driving innovation, software testing, and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:35:06] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
Test-Guild-News-Show-Automation-DevOps

AI Test Management, AI Prompts for Playwright, Codex and More TGNS158

Posted on 05/19/2025

About This Episode: Have you seen the lates AI Powered Test Management Tool? ...

Showing 81 of 6864 media items Load more Uploading 1 / 1 – Judy-Mosley-TestGuild_AutomationFeature.jpg Attachment Details Judy Mosley TestGuild Automation Feature

Building a Career in QA with Judy Mosley

Posted on 05/18/2025

About This Episode: In today’s episode, host Joe Colantonio sits down with Judy ...

Jacob Leverich TestGuild DevOps Toolchain

Observability at Scale with AI with Jacob Leverich

Posted on 05/14/2025

About this DevOps Toolchain Episode: In this episode of the DevOps Toolchain podcast, ...