About This Episode:
Are you over-testing the wrong features and missing critical bugs? This one tool might fix that.
Did an AI agent really lie to cover up deleting production data? Yep—and we’ve got the full story.
Is your test suite helping your business make millions—or slowing it down? Here’s how to tell.
Find out in this episode of the Test Guild News Show for the week of Aug 3rd. So, grab your favorite cup of coffee or tea, and let's do this.
Support the show and learn more about AI and our sponsor: https://testguild.me/ZAPTESTNEWS
Exclusive Sponsor
Discover ZAPTEST.AI, the AI-powered platform revolutionizing testing and automation. With Plan Studio, streamline test case management by directly importing from your common ALM component into Plan Studio and leveraging AI to optimize cases into reusable, automation-ready modules. Generate actionable insights instantly with built-in snapshots and reports. Powered by Copilot, ZAPTEST.AI automates script generation, manages object repositories, and eliminates repetitive tasks, enabling teams to focus on strategic goals. Experience risk-free innovation with a 6-month No-Risk Proof of Concept, ensuring measurable ROI before commitment. Simplify, optimize, and automate your testing process with ZAPTEST.AI.
Start your test automation journey today—schedule your demo now! https://testguild.me/ZAPTESTNEWS
Links to News Mentioned in this Episode
time | news | link |
0:23 | ZAPTEST AI | https://testguild.me/ZAPTESTNEWS |
1:02 | RISK Calculator | https://testguild.me/riskcalc |
2:35 | Playwright Performance | https://testguild.me/wbfq3p |
3:38 | AI Lies | https://testguild.me/rn5i5i |
4:55 | QA Dying? | https://testguild.me/f1xvvf |
6:26 | Stack Overflow Survey | https://testguild.me/9afut6 |
7:27 | Webinar of the Week | https://testguild.me/xifvgn |
8:18 | Test Autonomy | https://testguild.me/6ybmms |
9:17 | Follow the Money: Promptfoo | https://testguild.me/eei8zd |
News
[00:00:00] Are you over-testing the wrong features and missing critical bugs? This one tool might help you fix that. Did an AI agent really lie to cover up deleting production data? Yep, and we've got the full story. Is your test suite helping your business make millions, or slowing it down? Here's how to tell. Find out on this episode of the Test Guild News Show for the week of August 3rd. Grab your favorite cup of coffee or tea and let's do this.
[00:00:23] Hey, before we get into the news, I want to thank this week's sponsor, ZAPTEST AI, an AI-driven platform that can help you supercharge your automation efforts. It's really cool because their intelligent Copilot generates optimized code snippets, while their Plan Studio can help you effortlessly streamline your test case management. And what's even better is you can experience the power of AI in action with their risk-free six-month proof of concept, featuring a dedicated ZAP expert at no upfront cost. Unlock unparalleled efficiency and ROI in your testing process. Don't wait; schedule your demo now and see how it can help you improve your test automation efforts using the link down below.
[00:01:02] Joe Colantonio Let's start with something that can sharpen your test planning right now and help you focus on what matters: risk-based testing. I just wrote a new blog post that outlines a practical approach to implementing risk-based testing. What I did is take information shared by Bob Cruz at Automation Guild, along with a session Jeanne Harris did at Automation Guild as well, and bring them together into this blog post. As you know, in modern software delivery environments, especially those using Agile, DevOps, or continuous integration, testing everything just isn't feasible. Instead, risk-based testing lets testers allocate limited resources where they're needed most by aligning test efforts with the areas of highest risk. The article introduces a framework called GREETS to guide testers through six key factors to consider when assessing risk, and it also breaks down the formula Bob Cruz uses to quantify risk. Along those lines, I actually created a new free tool at Test Guild called the Test Guild Risk Scoring Calculator. You enter module-specific details across various risk dimensions, such as complexity, change frequency, and potential business impact, and it generates a structured score to guide your test planning, plus a risk analysis matrix once everything is filled out. It's a brand new tool and I'm looking for input, so if you want help prioritizing by risk, definitely check out the calculator down below and let me know your thoughts.
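To give you a rough feel for how a score like that comes together, here's a minimal sketch in TypeScript. The actual GREETS factors and the weighting in the blog post and calculator may differ; the dimension names and the likelihood-times-impact combination below are just illustrative assumptions.

```typescript
// A minimal sketch of a risk score in the spirit of the calculator.
// Each dimension is scored 1-5 and combined into a single priority score;
// the real GREETS formula may weight things differently.
interface ModuleRisk {
  name: string;
  complexity: number;      // 1 (simple) .. 5 (highly complex)
  changeFrequency: number; // 1 (stable) .. 5 (changes every sprint)
  businessImpact: number;  // 1 (cosmetic) .. 5 (revenue-critical)
}

// Classic risk heuristic: likelihood (how likely it breaks) x impact (how much it hurts).
function riskScore(m: ModuleRisk): number {
  const likelihood = (m.complexity + m.changeFrequency) / 2;
  return likelihood * m.businessImpact; // ranges 1 .. 25
}

const modules: ModuleRisk[] = [
  { name: 'checkout', complexity: 4, changeFrequency: 5, businessImpact: 5 },
  { name: 'help-center', complexity: 2, changeFrequency: 1, businessImpact: 2 },
];

// Sort so the highest-risk modules get tested first.
modules
  .sort((a, b) => riskScore(b) - riskScore(a))
  .forEach((m) => console.log(`${m.name}: ${riskScore(m).toFixed(1)}`));
```

Running that puts checkout far above help-center, which is exactly the kind of prioritization signal the calculator gives you in matrix form.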
[00:02:35] Joe Colantonio All right, based on my conversations, I know a lot of you are using Playwright for your end-to-end testing, but are you leaving performance on the table? If so, here's how to catch slowdowns before your users do. In this article, João explains how testers can convert Playwright end-to-end tests into performance audits by integrating Lighthouse. He emphasizes that even a hundred-millisecond improvement in page load time can raise conversion rates by up to 10%, making performance a business-critical concern rather than a purely technical one. He then breaks down how integrating Lighthouse directly into a Playwright test provides clear, tangible benefits: early detection of performance regressions, enforcement of performance budgets during CI/CD, combined quality reporting covering UX and accessibility, and objective failures based on thresholds rather than subjective judgments about speed. He also recommends using the integration in several contexts, like smoke testing in production to validate baseline performance, regional testing for globally deployed apps, and pull request validations to prevent regressions before merging.
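If you want to try the pattern yourself, here's a minimal sketch using the community playwright-lighthouse package. The article's exact setup may differ, and the URL, port, and threshold values below are just placeholders.

```typescript
// audit.spec.ts - a minimal sketch of running a Lighthouse audit inside a Playwright test.
// Assumes `npm i -D playwright-lighthouse lighthouse`; your setup may differ.
import { test, chromium } from '@playwright/test';
import { playAudit } from 'playwright-lighthouse';

test('home page meets performance budget', async () => {
  // Lighthouse attaches over the Chrome DevTools Protocol, so the browser
  // must expose a remote debugging port.
  const browser = await chromium.launch({
    args: ['--remote-debugging-port=9222'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com'); // swap in your app's URL

  // Fail the test when scores drop below the budget: an objective
  // threshold instead of a subjective "feels slow" judgment.
  await playAudit({
    page,
    port: 9222,
    thresholds: {
      performance: 90,
      accessibility: 90, // combined quality reporting: perf and a11y in one run
    },
  });

  await browser.close();
});
```

Because it's just another Playwright test, the same spec can run as a production smoke test or as a pull request gate in CI.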
[00:03:38] We talk a lot about AI and testing, but when AI testing goes wrong, it can go very wrong. This next story is a wake-up call for anyone putting agents near production code. Not sure how many of you have seen this yet; I've been away for about two weeks on vacation and actually found it on LinkedIn via Arthur, who is a security expert. Replit's CEO apologized after its AI agent wiped a company's code base in a test run and lied about it. This happened while an investor was evaluating the AI over a 12-day period for its ability to handle coding tasks. During this test, the agent was instructed not to alter live code, but it proceeded to wipe the company's production database, erasing its data. Following the deletion, the AI fabricated results that falsely reported successful test outcomes. It even generated fake user profiles and reports, presenting them as valid replacements for the deleted data. The article also says the AI hid and lied about what it had done, obscuring the impact of its actions. It then goes into how Replit has since implemented new safeguards, including stricter separation between development and production environments, enforced code-freeze mechanisms, and enhanced backup protocols. The company emphasized that while AI can assist in coding, human oversight remains critical, especially in production systems. With all the news around AI, are you feeling the pressure about AI replacing QA or developers? Well, this next perspective flips the narrative and shows where testers can still lead.
[00:05:05] So in this article, Thomas Howard argues that, contrary to fears, quality assurance is not disappearing; instead, it's transforming into an AI-augmented, continuous discipline deeply integrated across the software development life cycle. Thomas stresses that problems in test automation often stem not from tester skills, but from poor data sets and incomplete model training. Skewed or insufficient data can lead AI-generated tests to miss edge cases or critical defects. He emphasizes the importance of ongoing data governance, model retraining, and oversight by cross-functional teams to maintain dataset integrity. He also highlights that AI systems excel at repetitive tasks and pattern recognition, but they can't replace human judgment in exploratory testing, complex UI flows, accessibility evaluations, or performance tuning. He recommends blending automation with deliberate human oversight, particularly in areas requiring empathy and domain knowledge.
[00:06:01] Looking ahead, he describes QA in 2025 and beyond as a hyper-connected, AI-augmented practice. He also notes that traditional QA milestones, like unit, integration, and system gates, may give way to an iterative loop of design, validation, and learning. In this model, AI handles the heavy lifting of routine verification, freeing humans to guide strategy, ethics, and innovation.
[00:06:26] Also, want to know where AI is actually making developers better and where it's creating headaches for testers? The numbers don't lie. Stack Overflow just released its fifteenth annual developer survey, based on input from over 49,000 developers across 177 countries. First, while 84% of respondents indicate they use or intend to use AI-powered tools in their daily workflows, their confidence in AI-generated code is dropping. Nearly half of developers, 45%, cite "almost right, but not quite" outputs from AI tools as their top frustration, and two-thirds said they spend more time debugging these near-correct suggestions. And while trust falters, human collaboration rises: 75% of developers say they still consult a colleague when they doubt AI's answers, especially in complex or high-risk contexts. Meanwhile, 64% of developers do not view AI as a job threat, a modest decline from 68% the previous year. If any of those numbers grab you, take a deeper dive using the special link down below.
[00:07:27] Next up is our Webinar of the Week. It's called Why 2026 Testing Needs One Platform, Not Many: Tackling Multimodal, Agentic, Multi-Tool Chaos Before It's Too Late. And here's what's in it for you: are you spending hours triaging red builds that turn out to be flaky tests? Struggling to keep up with microservices, mobile, APIs, and now AI features all shipping at once? Or maybe your testing strategy still assumes one size fits all across different tech stacks? In this session, we've invited an expert product engineer at Qyrus to break down a new testing approach built for AI-era velocity; think multimodal and agentic orchestration. It's all about getting from firefighting to flow, and doing it with a strategy that works across your entire tech stack. So definitely register using the link down below, and I hope to see you there.
[00:08:18] Speaking of autonomous testing, here's another article that outlines a progression in software testing from manual, to scripted, to automated, toward what the author describes as autonomous testing. He frames the shift as a response to the rising complexity and fragility of modern test suites, where traditional automation still demands significant human intervention for test design, maintenance, and triaging. He defines autonomous testing as a system that can independently generate, execute, adapt, and analyze tests with minimal oversight, and he identifies three pillars underpinning this model: dynamic test generation based on observing user flows, self-healing test maintenance driven by application behavior, and intelligent results analysis that minimizes human triage. And it's not just theory; he also introduces a new open-source project, the agentic QA framework, which he published on GitHub. This framework aims to operationalize autonomous testing principles using AI agents that can observe, decide, and act on quality signals across the software lifecycle.
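To make those three pillars a bit more concrete, here's a hypothetical sketch of that observe-decide-act loop in TypeScript. None of these interfaces or names come from the actual agentic QA framework; check its GitHub repo for the real API.

```typescript
// A hypothetical sketch of an autonomous-testing agent: the three pillars
// expressed as an observe/decide/act cycle. All names here are illustrative.
interface QualitySignal {
  kind: 'user-flow' | 'ui-change' | 'test-failure';
  detail: string;
}

interface TestCase {
  name: string;
  run: () => Promise<boolean>; // true = passed
}

class AutonomousTestAgent {
  // Pillar 1: dynamic test generation from observed user flows.
  generateTests(signals: QualitySignal[]): TestCase[] {
    return signals
      .filter((s) => s.kind === 'user-flow')
      .map((s) => ({
        name: `covers: ${s.detail}`,
        run: async () => true, // placeholder: a real agent drives the app here
      }));
  }

  // Pillar 2: self-healing maintenance when application behavior shifts.
  heal(test: TestCase, signal: QualitySignal): TestCase {
    return { ...test, name: `${test.name} (healed after ${signal.detail})` };
  }

  // Pillar 3: intelligent results analysis that minimizes human triage,
  // escalating only genuine failures instead of every red build.
  async analyze(tests: TestCase[]): Promise<string[]> {
    const failures: string[] = [];
    for (const t of tests) {
      if (!(await t.run())) failures.push(t.name);
    }
    return failures;
  }
}
```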
[00:09:18] And last up is our Follow the Money segment. Promptfoo has just raised $18.4 million in Series A funding, a round aimed at expanding the company's efforts to build what it calls the definitive AI security stack. The company is focused on the security challenges introduced by new AI architectures, and I think this just highlights how testers need to be more involved in security testing as well.
[00:09:41] Alright, for links to everything we covered in this news episode, head on over to the links in the comments down below. So that's it for this episode of the Test Guild News Show. I'm Joe, and my mission is to help you succeed in creating end-to-end, full-stack pipeline automation awesomeness. As always, test everything and keep the good. Cheers.