Is Performance Testing Dead? LLM Testing, AI Reliability, TestMu and More TGNS166

By Test Guild

About This Episode:

Looking for fresh insights and global testing connections? TestMu 2025 is just around the corner

Are you testing LLMs the right way, or setting yourself up for hidden failures?

And, is performance testing really dead in the age of AI? The answer might surprise you.

Find out in this episode of the Test Guild News Show for the week of Aug 17th.

So, grab your favorite cup of coffee or tea, and let's do this.

Links to News Mentioned in this Episode

Time | News Title | News URL
0:19 | TestMu | https://testguild.me/wt63da
0:55 | LLM Testing | https://testguild.me/8sno86
2:11 | LLM Coding Study | https://testguild.me/tpmqc2
4:03 | The H.U.M.N. Method | https://testguild.me/zr7xpq
5:27 | Webinar of the Week | https://testguild.me/dzz4hg
6:23 | Testing Tools Survey | https://testguild.me/6q8uvi
6:55 | Chaos Engineering | https://testguild.me/i96cue
8:00 | Open AI Telemetry | https://testguild.me/6y0itd
8:52 | ZAPTEST AI | https://testguild.me/ZAPTESTNEWS

News

[00:00:00] Joe Colantonio What global online conference is just around the corner? Are you testing LLMs the right way or setting yourself up for hidden failures? And is performance testing really dead in the age of AI? The answer might surprise you. Find out on this episode of the Test Guild News Show for the week of August 17th. Grab your favorite cup of coffee or tea and let's do this.

[00:00:20] Joe Colantonio All right, if you're looking for fresh insights and community networking, I've got the event just for you. I'll be kicking off TestMu tomorrow, August 19th, but make sure to register even if you miss day one, because the event runs through the 21st. And I believe even if you can't make it, everyone who registers will get a recording. They also have an excellent lineup, including speakers like Angie Jones, Andy Knight, Michael Bolton, Debbie O'Brien, Simon Stewart, and a bunch more, plus over 60 sessions covering topics like AI, accessibility, CI/CD, how to future-proof your career, and more. Don't miss it. Register now using the link down below, and I hope to see you there.

[00:00:56] Joe Colantonio Are you wondering how to test LLMs? If so, here's a great article on LLM testing by Monika. It goes from correctness to confidence, explaining what LLM testing involves, the challenges, and how to validate large language models. Unlike traditional software testing, where you know the expected outcomes, with LLMs the same input doesn't always produce the same output, which makes old pass/fail QA approaches insufficient. Instead, Monika recommends testers focus on confidence, not just correctness. She outlines five key dimensions: groundedness, safety, robustness, consistency, and cost. Classic testing methods also need to adapt: functional testing means checking ranges of valid outputs, regression testing means comparing behavior before and after updates, and safety testing involves prompts designed to reveal bias or toxicity. Her conclusion is that LLM testing is about building trust in evolving, unpredictable systems, and that testers must measure acceptable ranges, not perfect answers, while exploring tools like Promptfoo, DeepEval, and LangChain to monitor and govern real-world use. Definitely another must-read article, and you can find out more using the link down below.
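
To make the "acceptable ranges, not perfect answers" idea concrete, here's a minimal Python sketch of what range-based LLM checks could look like. The call_llm helper, the required facts, and the thresholds are hypothetical placeholders rather than anything from Monika's article; tools like Promptfoo or DeepEval package up similar checks for you.

```python
# Minimal sketch: score an LLM answer against acceptable ranges, not exact matches.
import re

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call (OpenAI client, local model, etc.).
    raise NotImplementedError

REQUIRED_FACTS = ["refund window is 30 days"]      # facts the answer must contain
BANNED_PATTERNS = [re.compile(r"(?i)guarantee")]   # phrasing we never want to see
MAX_WORDS = 120                                    # cost/verbosity guardrail

def check_answer(answer: str) -> list[str]:
    """Return a list of violations instead of a single pass/fail verdict."""
    problems = []
    for fact in REQUIRED_FACTS:
        if fact.lower() not in answer.lower():
            problems.append(f"missing fact: {fact!r}")
    for pattern in BANNED_PATTERNS:
        if pattern.search(answer):
            problems.append(f"banned phrasing matched: {pattern.pattern}")
    if len(answer.split()) > MAX_WORDS:
        problems.append("answer too long (cost/verbosity)")
    return problems

def test_refund_policy_prompt():
    # Sample the same prompt several times: with LLMs the same input does not
    # always produce the same output, so tolerate some variance across runs.
    prompt = "What is our refund policy?"
    results = [check_answer(call_llm(prompt)) for _ in range(5)]
    failure_rate = sum(1 for r in results if r) / len(results)
    assert failure_rate <= 0.2, f"too many out-of-range answers: {results}"
```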

[00:02:12] Joe Colantonio Speaking of LLMs, there's an interesting study on the real-world implications of how AI models behave that you need to know about as well. A new study from Sonar has revealed significant security and code quality challenges across leading coding models, with important implications for software testers working with AI-generated code. The research analyzed over 4,000 Java programming assignments across five major models. While all models demonstrated strong capabilities in generating syntactically correct code and handling common algorithms, the security findings present serious concerns for testing teams. According to Sonar's CEO, Tariq Shaukat, understanding each model's unique strengths and failure patterns is crucial for safe implementation. The study found that all evaluated models produced critical security vulnerabilities, including hard-coded credentials and path traversal injections. For Claude Sonnet 4, nearly 60% of vulnerabilities reached blocker-level severity, while Llama 3.2 hit over 70% and GPT-4 reached 62.5%. Perhaps most concerning for testers, the research reveals a troubling pattern: better functional performance often correlates with higher security risk. The study also identified what researchers call distinct coding personalities, and all models showed a consistent bias toward messy code, with over 90% of identified issues being code smells, which indicate poor structure and future technical debt. With Gartner predicting that 90% of enterprise software engineers will use AI code assistants by 2028, Sonar emphasizes the need for a trust-but-verify approach, using tools like SonarQube and other platforms to analyze AI-generated code for security flaws, which is definitely something testers need to be involved with as well.
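
For testers reviewing AI-generated code, the two vulnerability classes mentioned above look roughly like this. This is a generic Python illustration of the patterns, not code from the Sonar study (which analyzed Java), and the file paths and variable names are made up.

```python
# Illustration of two common AI-generated-code flaws and safer alternatives.
from pathlib import Path
import os

# --- Risky pattern: hard-coded credential and an unchecked file path ---
DB_PASSWORD = "hunter2"  # hard-coded credential: flagged by most SAST tools

def read_report_unsafe(filename: str) -> str:
    # A filename like "../../etc/passwd" escapes the reports directory
    # (path traversal).
    return open("reports/" + filename).read()

# --- Safer pattern: secret from the environment, path resolved and checked ---
DB_PASSWORD_SAFE = os.environ.get("DB_PASSWORD", "")

REPORTS_DIR = Path("reports").resolve()

def read_report_safe(filename: str) -> str:
    candidate = (REPORTS_DIR / filename).resolve()
    if not candidate.is_relative_to(REPORTS_DIR):  # Python 3.9+
        raise ValueError("path escapes the reports directory")
    return candidate.read_text()
```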

[00:04:03] Joe Colantonio Speaking of having testers in the loop with AI and LLMs, I came across another approach that might help you with this as well. It was published by Peter Souza, who's released both a PDF and a video describing a QA framework he's created called H.U.M.N., built to solve what he calls the AI production paradox: while AI can generate code and plans quickly, humans still slow things down with approvals, which creates a bottleneck. H.U.M.N. stands for human-centric, user-focused, multidisciplinary, and nuanced, and the method shifts QA from being a gate at the end to a continuous partner from the very start of a project. He breaks it down like this: human-centric means trusting instincts through exploratory gut checks before code is written, and user-focused anchors testing on real user experience, not just specs. He also talks about integrating QA across design, product, engineering, legal, and compliance, and about focusing on system-level workflows and readiness reports beyond ticket checks. The framework includes seven practical templates for bugs, automation, pull requests, epics, and more, ready to plug into tools like Jira or Monday.com. Peter also contrasts the old QA-as-gatekeeper model with the new QA-as-partner model, urging teams to adopt continuous risk assessments from day one.

[00:05:27] Joe Colantonio Next up is the webinar of the week, and it's all about how to prevent AI-driven failures before they cost you: performance testing in the age of AI. I'll be joined by two experts, Stephen Feloney and Don Jackson, focusing on how AI enhances rather than replaces performance testing. The presentation will examine four specific areas where performance testing was incorrectly predicted to become obsolete: first, microservices architecture; second, cloud testing environments; third, test automation frameworks; and fourth, AI implementation. It also emphasizes that while AI creates operational efficiencies, performance testers remain necessary to guide these implementations effectively. The session will cover the historical context, current state, and future direction of performance testing in AI-enhanced environments. Make sure to register now using the link down below. Even if you can't make the live session, I'll be sending out a link to the replay a few days after the event.

[00:06:23] Joe Colantonio Do you want to help shape the future of software testing tools? Well, I just found out that a leading software testing platform is conducting market research to understand how development teams discover and evaluate testing tools. The survey aims to help testing platforms better understand the decision-making process behind testing tool adoption within engineering teams. Participants who complete this brief survey will be entered into a selection process for gift cards from major retailers, including Amazon and Starbucks. I love Amazon cards, and I always do surveys like this because you never know.

[00:06:55] Joe Colantonio I just came across a new announcement: chaos engineering pioneer Gremlin has launched Reliability Intelligence, an AI-driven solution to help teams detect and fix reliability issues in complex systems. With coding assistants speeding up production deployments by 70%, as we've seen in previous news items today, Gremlin argues that outages and failures are becoming a bigger risk. The platform combines automated fault injection, continuous resilience analysis, and an MCP server for LLM integration. Some key features include experiment analysis, which compares test results against baselines to spot anomalies and explain failures; recommended remediations, which turn millions of test patterns into specific, actionable fixes; and LLM integration, letting testers query data and build custom dashboards with natural language. Kolton Andrus, the CEO of Gremlin, says the tool provides actionable recommendations based on a deep understanding of your system and its dependencies.
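
As a rough idea of what "compare experiment results against a baseline to spot anomalies" means in practice, here's a small Python sketch. This is not Gremlin's API; the metric names, sample values, and tolerance are invented purely for illustration.

```python
# Sketch: flag metrics that degrade during a fault-injection run vs. a baseline.
from statistics import mean

def detect_anomalies(baseline: dict[str, list[float]],
                     experiment: dict[str, list[float]],
                     tolerance: float = 0.25) -> dict[str, str]:
    """Flag metrics that degraded more than `tolerance` relative to the baseline."""
    anomalies = {}
    for metric, baseline_samples in baseline.items():
        base = mean(baseline_samples)
        observed = mean(experiment.get(metric, baseline_samples))
        if base and (observed - base) / base > tolerance:
            anomalies[metric] = f"{base:.1f} -> {observed:.1f} (+{(observed - base) / base:.0%})"
    return anomalies

# Example: p95 latency during a fault-injection run vs. a steady-state baseline.
baseline = {"p95_latency_ms": [210, 220, 205], "error_rate_pct": [0.2, 0.3, 0.25]}
during_fault = {"p95_latency_ms": [480, 510, 495], "error_rate_pct": [0.25, 0.3, 0.25]}
print(detect_anomalies(baseline, during_fault))
# {'p95_latency_ms': '211.7 -> 495.0 (+134%)'}
```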

[00:08:00] Joe Colantonio I also found this next announcement on my LinkedIn feed. It was posted by Scott Moore and links to an article on how Traceloop has released OpenLLMetry, an open-source observability framework built on top of OpenTelemetry and designed specifically for monitoring large language model applications. Because OpenLLMetry extends OpenTelemetry, it can plug directly into existing observability tools like Datadog, Honeycomb, New Relic, or Splunk, and it supports instrumentation for nearly every major LLM provider. The setup is also lightweight, with just two lines of code to start capturing telemetry. It ships with a Traceloop SDK while still outputting standard OpenTelemetry data, making it easy to integrate without losing compatibility. And this is big: instead of black-box LLM behavior, you can now get actual telemetry connected to your existing monitoring pipelines.
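
For context, the advertised two-line setup looks roughly like this, based on the OpenLLMetry quickstart; the app name is a placeholder, so check the project's README for the current parameters.

```python
# Initialize OpenLLMetry via the Traceloop SDK (install with: pip install traceloop-sdk).
from traceloop.sdk import Traceloop

Traceloop.init(app_name="my-llm-app")  # auto-instruments supported LLM clients

# From here, calls made through instrumented providers emit standard
# OpenTelemetry traces that Datadog, Honeycomb, New Relic, or Splunk can ingest.
```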

[00:08:50] Joe Colantonio Next up, I want to thank this week's sponsor, ZAPTEST AI, an AI-driven platform that can help you supercharge your automation efforts. It's really cool because their intelligent copilot generates optimized code snippets, while their Plan Studio can help you effortlessly streamline your test case management. And what's even better is that you can experience the power of AI in action with their risk-free six-month proof of concept, featuring a dedicated ZAP expert at no upfront cost. Unlock unparalleled efficiency and ROI in your testing process. Don't wait, schedule your demo now and see how it can help you improve your test automation efforts using the link down below.

[00:09:30] Joe Colantonio All right, for links to everything we covered in this news episode, head on over to the links down below. That's it for this episode of the Test Guild News Show. I'm Joe, and my mission is to help you succeed in creating end-to-end, full-stack pipeline automation awesomeness. As always, test everything and keep the good. Cheers.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
A man with glasses and a beard speaks animatedly into a microphone. Text reads "TestGuild News Show: Weekly DevOps Automation, Performance Testing, and AI Reliability. Breaking News.

AI IDEs, Self-Healing Tests, and a New Way to Score Quality and more TGNS168

Posted on 09/16/2025

About This Episode: Are you overlooking this new tool that finds accessibility bugs? ...

How Vibium Could Become the Selenium for AI Testing

How Vibium Could Become the Selenium for AI Testing with Jason Huggins

Posted on 09/07/2025

About This Episode: In this episode of the TestGuild Automation Podcast, Joe Colantonio ...

Two men are featured in a promotional image for "TestGuild Automation Testing" discussing Playwright and AI in QA, with the text "with Ben Fellows.

Playwright, Cursor & AI in QA with Ben Fellows

Posted on 08/31/2025

About This Episode: In this episode of the TestGuild podcast, Joe Colantonio sits ...