Look, I've been doing this testing thing for over 25 years now. I first wrote about the AI “three waves” back in 2017, and honestly?
I thought I was just documenting a trend.
Turns out, I was watching a revolution.
(Quick note: I originally wrote this in 2017 and updated it in 2023 to keep up with all the innovation in this space since it was first posted. Now in late 2025, we're living in what I predicted – the third wave is real, and it's changing everything.)
Here's what's wild: back in 2017, when I first started talking about AI in testing on my TestGuild podcast, people thought I was overselling it. “Joe, is this just hype?” they'd ask. Now in 2025, 81% of development teams use AI in their testing workflows.
The question isn't “should we use AI?” anymore. It's “which AI tool won't waste our time?”
And that's what this guide is really about.
Tool | Best For | Key AI Feature | My Connection |
---|---|---|---|
BlinqIO | Cucumber + GenAI | AI meets prompt engineering | Had founders on podcast A485 |
testers.ai | Autonomous everything | AI agents write & run tests | Ex-Google Chrome testing team |
Mabl | Agentic workflows | Autonomous test agents | Multiple podcast appearances |
Katalon | All-in-one platform | Self-healing + AI generation | Gartner Magic Quadrant pick |
Applitools | Visual testing | Visual AI pioneer | Adam Carmi on A450 |
ACCELQ | Codeless automation | Generative AI creation | Guild sponsor + webinars |
BrowserStack | Test observability | AI root cause analysis | Long-time Guild partner |
Testim | Reducing flaky tests | ML-powered locators | Oren Rubin interview |
LambdaTest KaneAI | LLM-powered | Natural language tests | Cloud testing platform |
TestResults.io | No selectors | Selector-free testing | Tobias on podcast |
Tricentis | Enterprise | Fully codeless AI | Guild conference sponsor |
The Three Waves: How We Got Here
Before I dive into tools, you need to understand this framework.
It'll help you cut through the AI washing and marketing BS that's everywhere now.
First Wave: The Vendor Lock-In Era (1990s-2000s)
I cut my teeth on WinRunner. Man, I loved that tool. Then Mercury killed it for QTP, and my heart broke a little.
That was the first wave – proprietary tools that locked you in:
- WinRunner, Silk Test, QTP – The OGs
- Proprietary everything – Each vendor had their own scripting language (TSL for WinRunner, anyone?)
- Record and playback – Sounded great, produced brittle garbage
- Expensive as hell – Enterprise pricing before “enterprise” pricing was cool
The problem? When the vendor pivoted (or died), you were screwed. Your entire test infrastructure could become obsolete overnight.
Enter the second wave.
Second Wave: Open Source Changes Everything (2004-2020)
Then Selenium happened. And Selenium changed everything.
I've interviewed dozens of people on my podcast about this shift – from Jason Huggins (Selenium's creator) to folks building Cypress and Playwright.
The second wave was all about:
- Open source – Free, community-driven, no vendor lock-in
- Web-first – Built for the modern web app explosion
- Developer-focused – Real programming, not wizard-driven nonsense
- Explosion of tools – Cypress, Playwright, Appium, and hundreds more
But here's what nobody talks about: this wave just moved the pain. Instead of paying vendors, you paid engineers. Instead of brittle record-playback, you got brittle selectors. Different problem, same headache.
By 2017, we were seeing early ML attempts – basic self-healing, visual AI.
But the real AI explosion? That didn't hit until ChatGPT launched in late 2022.
Suddenly everyone was racing to add LLMs to their testing tools.
Third Wave: AI That Actually Works (2020-Present)
Here's where we are now. And I'm not gonna lie – after interviewing hundreds of testing experts on TestGuild, I'm cautiously optimistic about this wave.
What makes a tool “third wave”?
- Self-healing – Tests adapt when your app changes
- Natural language – Write tests in plain English
- Autonomous agents – AI that can reason and make decisions
- Visual intelligence – “Sees” your app like a human does
- Predictive smarts – Knows which tests to run and when
The big shift? Third wave tools don't just run your tests faster. They actively reduce the maintenance burden that's been killing teams since the Selenium days.
Look, I'm a skeptic by nature. But after testing these tools myself and talking to teams using them in production, this is real. Not perfect.
Not magical. But real.
The 11 Tools Actually Worth Your Time
Alright, let's get into it.
I've personally tested, used, or extensively interviewed founders/users of every tool here.
No fluff, just what I've learned.
1. BlinqIO: Where Cucumber Meets Generative AI
Podcast Connection: I had founders Guy Arieli and Tal Barmeir on episode A485 to talk about “AI Meets Cucumber: A New Testing Approach Using Prompt Engineering”. They're also a Platinum Sponsor at Automation Guild 2025.
Here's what got me excited: Guy and Tal are serial entrepreneurs with 25 years in testing. Their previous company, Experitest (now Digital.ai), was at the forefront of mobile test automation. Instead of retiring with a pile of money, they built BlinqIO.
The Innovation: BlinqIO calls Cucumber a “test speak” language – a way to communicate precisely with AI. Their AI virtual testers translate test scenarios into automation code, and here's the kicker: they work 24/7. As Tal told me on the podcast, “You can have an army of virtual testers underneath you that work during the night.”
What Makes It Third Wave:
- AI Test Engineer – Automatically generates BDD (Gherkin) scenarios from feature requirements
- AI Recorder – Captures test steps and generates Playwright code + business descriptions
- Self-healing – Detects UI changes and automatically recovers/fixes tests
- No vendor lock-in – Complete access to your project code in a private Git repository
- Multilingual – Supports testing in 50+ languages
From Our Podcast Conversation: Guy emphasized how generative AI creates a “synthetic human brain” that dramatically boosts tester productivity.
Unlike tools that replace testers, BlinqIO augments them – testers direct the AI army.
Real Results:
- RedHat Test Automation Engineer reported 10x boost in test creation efficiency
- Vodafone Team Leader praised seamless integration into team processes
Best For: Teams already using or familiar with Cucumber/BDD, organizations wanting AI without vendor lock-in, global companies needing multilingual testing
Pricing: Freemium model available
Check it out: blinq.io
2. testers.ai: The Ex-Google Team Bringing Chrome-Level Testing to Everyone
What I Love: Built by engineers who tested Chrome at Google. These folks know what actual enterprise testing looks like.
When I first saw testers.ai, I thought “oh great, another ‘autonomous AI' promise.” Then I dug deeper. The team behind this – they're the ones who built the testing infrastructure that keeps Chrome running for billions of users. That's a different pedigree than most startups.
The Hook: AI agents that write AND run tests. No scripts to maintain. No brittle selectors. No manual clicking through the same flows for the hundredth time.
Two Types of Checks:
Autonomous Static Checks – AI scans your app for the basics you're probably missing:
- Performance issues
- Privacy & consent problems
- Security vulnerabilities
- Third-party supply chain risks
- API design issues
- Error handling gaps
Autonomous Dynamic Checks – This is where it gets interesting. AI analyzes your app and generates interactive tests covering:
- Happy paths (the obvious stuff)
- Edge cases (the stuff that breaks in production)
- Invalid inputs (the stuff users WILL try)
- Statistically likely bugs (based on patterns across millions of apps)
Plus – and this is clever – it gives you “Copilot fix prompts” you can paste directly into GitHub Copilot or Cursor to fix issues.
Real Talk: The claim is that tests that used to take 8-12 hours to write now run in minutes. I haven't validated that personally, but knowing their background, I believe the tech is solid. Bonus: Jason Arbon has been on multiple TestGuild podcast episodes and Automation Guild sessions.
Best For: Teams who want Google-level testing without hiring a Google-sized QA team
Pricing: Not published, targeting teams who previously couldn't afford this level of coverage
Check it out: testers.ai
3. Mabl: Agentic AI (Finally Living Up to the Hype)
Podcast Connection: I've been following mabl since they started, and recently had them on TestGuild to talk about their new agentic workflows.
When mabl talks about “agentic workflows,” they mean AI that acts like a skilled human tester. Not just running scripts – actually thinking about what to test.
What's New in 2025:
- Test Creation Agent – Give it requirements in plain English, it builds your test suite
- mabl MCP Server – IDE integration that lets you query tests with natural language
- Auto TFA – Autonomous root cause analysis for every failure
- Visual Assist – Adapts tests when UI changes
My Take: I've seen a lot of tools claim “autonomous” testing. Mabl is one of the few actually delivering on it. Their approach to test creation from user stories is legitimately impressive.
Real Results I've Heard:
- One team told me they'd save $240K over 2 years vs. Selenium
- Another said they went from 2 weeks of work to 2 hours
Best For: Teams ready to embrace truly autonomous testing, unified testing across web/mobile/API
Pricing: Starts around $450/month
Learn more: mabl.com
4. Katalon: The Gartner-Approved Choice
Podcast Connection: We've had Katalon folks on multiple times discussing their AI features.
Katalon got named a Visionary in the 2025 Gartner Magic Quadrant. That's enterprise-speak for “these folks are legit.”
What I like about Katalon is they're not chasing hype. They've built a solid all-in-one platform that works for teams at different skill levels.
Key Features:
- No-code test creation (for beginners)
- Full scripting capabilities (for experts)
- Self-healing scripts (reduces maintenance)
- AI-powered test generation
- Covers web, mobile, API, and desktop
My Take: If you need ONE tool that does everything reasonably well, Katalon's your answer. It's not the flashiest, but it's reliable.
Best For: Teams with mixed technical skills, organizations wanting an all-in-one solution
Pricing: Free tier available (actually usable), premium starts at $208/month
Check it out: katalon.com
5. Applitools: Visual AI That Made Me a Believer
Podcast Connection: I interviewed founder Adam Carmi back in the early days (listen to episode 43), and he's been back on the show multiple times.
I'll be honest – when Adam first told me about visual validation testing in 2015, I thought it was BS. “An algorithm that finds bugs without explicitly defining elements? Come on.”
Then I tried it. And my skeptical mind was blown.
Why Applitools Is Different: No pixel-by-pixel comparisons. No fragile baseline images. Their Visual AI actually understands what matters visually and what doesn't.
What's New in 2025:
- AI-based self-healing execution cloud
- Automated maintenance grouping – ML clusters similar changes across pages/browsers/devices
- Smart diff prioritization – AI knows what's a bug vs. an intentional change
Real Story: One company saved a million dollars a year by replacing thousands of assertion lines with visual checkpoints. A MILLION. DOLLARS.
My Take: If you're doing any UI testing and not using Applitools, you're working too hard.
Best For: Visual regression testing, cross-browser validation, teams obsessed with UI/UX quality
Pricing: Starts at $199/month
Try it: applitools.com
6. ACCELQ: Generative AI Gets Real
What I've Seen: ACCELQ's approach to generative AI is different than most. They're using LLMs to actually understand test intent, not just generate scripts.
Key Features:
- Plain English test creation – No rigid syntax, just describe what you want
- Autonomous healing – Handles complex element type changes automatically
- Logic insights – AI analyzes your test design and suggests optimizations
- Reusable test assets – Reduces duplication across your test suite
My Take: The “logic insights” feature is underrated. It's like having a senior test engineer review your work and suggest improvements.
Best For: Teams wanting to scale test coverage fast, organizations moving from manual to automated
Pricing: Custom enterprise pricing
Learn more: accelq.com
7. BrowserStack Test Observability: AI Debugging That Doesn't Suck
What It Does: Turns test failure chaos into clear root causes using AI.
Look, everyone has test reporting. BrowserStack's Test Observability actually uses AI to tell you WHY tests failed and whether it's a product bug, automation issue, or environment problem.
Key Features:
- AI-powered root cause analysis – No more digging through logs for hours
- AI-based tagging – Automatically categorizes failures
- Smart prioritization – Tells you what to fix first
- Works anywhere – BrowserStack, local, other platforms
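To make the three buckets concrete (product bug, automation issue, environment problem), here is a rough Python sketch of failure triage. To be clear about what it is: a hand-written keyword heuristic standing in for the AI, not BrowserStack's model or API. The `RULES` patterns, the `categorize` function, and the sample messages are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hand-written patterns standing in for a trained model (an assumption for
# this sketch). The first matching rule wins, so order matters.
RULES = [
    ("environment", re.compile(r"connection refused|dns|503 service", re.I)),
    ("automation", re.compile(r"no such element|stale element", re.I)),
    ("product", re.compile(r"assert|expected .* but", re.I)),
]


def categorize(error_message: str) -> str:
    """Return the bucket for a failure message, or 'unknown' if no rule matches."""
    for label, pattern in RULES:
        if pattern.search(error_message):
            return label
    return "unknown"


# Made-up failure messages, one per bucket plus one the rules cannot place.
failures = [
    "NoSuchElementException: no such element: #login-button",
    "ConnectionError: connection refused by staging-db",
    "AssertionError: expected total 42.00 but got 40.00",
    "Something odd happened",
]
summary = Counter(categorize(message) for message in failures)
print(summary)
```

A keyword list like this breaks down quickly on real logs, which is the gap the AI-based analysis is meant to close.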
My Take: If you have a large test suite and spend hours debugging failures, this pays for itself immediately.
Best For: Teams with 100+ tests, distributed teams needing unified observability
Pricing: Starts at $29/month (add-on to BrowserStack)
Try it: browserstack.com/test-observability
8. TestResults.io: No More Selector Hell
Podcast Connection: I had founder Tobias Müller on the show (episode on Next Gen Functional Visual Testing), and what he showed me was legitimately innovative.
The big idea: What if you never had to deal with XPath, CSS selectors, or element IDs ever again?
How It Works: You describe what users do in plain language. TestResults.io figures out the rest. No selectors. Just user journeys.
Key Benefits:
- 3x faster testing (according to their data)
- Eliminates flakiness through AI stability
- Massive maintenance reduction
- Works across any platform users can interact with
My Take: If selector maintenance is killing your team (and it probably is), check this out.
Best For: Cross-platform testing, teams tired of selector maintenance
Pricing: Custom
Try it: testresults.io
9. Testim: ML for Locator Intelligence
Podcast Connection: I spoke with co-founder Oren Rubin about their mission to make test automation accessible beyond just developers.
Testim uses machine learning specifically to solve the “flaky test” problem that drives everyone crazy.
How It Works: Multiple fallback strategies for finding elements. If one locator breaks, ML automatically tries others. Tests self-correct when UI changes.
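The fallback idea fits in a few lines of Python. This is not Testim's implementation or API, just a minimal illustration. It assumes a hypothetical `find` callable that returns an element or `None`, and a hand-ordered list of locators where a real tool would use ML to rank them.

```python
from typing import Callable, Optional, Sequence


def find_with_fallback(
    find: Callable[[str], Optional[object]],
    locators: Sequence[str],
) -> tuple[object, str]:
    """Try each locator in order and return (element, locator_that_worked).

    `find` is a hypothetical lookup function (an assumption for this sketch):
    it takes a locator string and returns an element, or None if nothing matches.
    """
    for locator in locators:
        element = find(locator)
        if element is not None:
            return element, locator
    raise LookupError(f"No locator matched: {list(locators)}")


# Toy "page": the login button's id is no longer #login-button,
# but its data-test attribute and visible text still match.
page = {
    "[data-test=login]": "<button>",
    "text=Log in": "<button>",
}

element, used = find_with_fallback(
    page.get,
    ["#login-button", "[data-test=login]", "text=Log in"],
)
print(used)  # the first locator that still matches
```

Returning the locator that worked matters as much as the element: that's the signal a self-healing tool can use to update the test instead of silently passing.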
Key Features:
- ML-powered locators – Multiple ways to find elements
- Smart execution – AI optimizes test order
- Intelligent grouping – Related failures grouped for efficient debugging
- Auto-healing – Tests fix themselves
My Take: They're laser-focused on one problem (flaky tests) and solving it well. I respect that approach.
Best For: Developer teams, CI/CD environments, teams fighting test flakiness
Pricing: Starts at $450/month
Learn more: testim.io
10. LambdaTest KaneAI: Modern LLMs Meet Testing
What's Different: Built on modern large language models – think GPT-level natural language understanding.
KaneAI lets you create, debug, and evolve tests using natural language. And because it's LambdaTest, you get their entire cloud infrastructure for cross-browser testing.
Key Features:
- Natural language test creation
- LLM-powered debugging
- Autonomous test evolution
- Integrates with LambdaTest's cross-browser platform
My Take: This is where testing is heading – conversational interfaces powered by modern AI.
Best For: Teams wanting cutting-edge LLM tech, cloud-based cross-browser testing
Pricing: LambdaTest starts at $15/month
Check it out: lambdatest.com/kane-ai
11. Tricentis: Enterprise AI at Scale
What It Is: The big enterprise play. Fully AI-driven, fully codeless, built for massive scale.
If you're a large enterprise with SAP, mainframes, and a complex application portfolio, Tricentis is built for your world.
Key Features:
- AI-powered test design and generation
- Automated maintenance at enterprise scale
- Intelligent test execution optimization
- Packaged application testing (SAP, Salesforce, etc.)
My Take: Not for startups. But if you're a Fortune 500 with complex enterprise apps, this is the tool built for you.
Best For: Large enterprises, SAP environments, complex application portfolios
Pricing: Custom enterprise pricing
Learn more: tricentis.com
Overwhelmed? Use My Free Tool Matcher
Look, I get it – 11 tools is a lot to process. That's exactly why I built the TestGuild Tool Matcher.
Answer a few quick questions about your tech stack, budget, and testing goals, and it'll shortlist the best options from over 300 tools (including all the ones in this article).
Takes about 60 seconds. Completely free. No email required, no sales BS. Just a straight answer about what tools actually fit your needs.
How to Actually Choose (By Pain Point)
By Team Size
Small Teams (1-10): Go with testers.ai, BlinqIO, or LambdaTest KaneAI. Low learning curve, affordable, fast value.
Mid-Size Teams (11-50): Mabl, Katalon, or Testim. Good balance of power and usability.
Enterprise (51+): Tricentis, ACCELQ, or Katalon Enterprise. Built for scale.
By Primary Pain Point
“Our tests are flaky as hell” → Testim or BrowserStack
“Maintenance is killing us” → TestResults.io or testers.ai
“We need visual testing” → Applitools (no question)
“Want plain English tests” → testers.ai, BlinqIO, or ACCELQ
“Need autonomous agents” → Mabl (most advanced)
“Love Cucumber/BDD” → BlinqIO (built for it)
“Need everything in one” → Katalon or Tricentis
By Technical Skill
Non-Technical Team:
- testers.ai
- BlinqIO
- ACCELQ
Mixed Skills:
- Mabl
- Katalon
- Testim
Highly Technical: Any of these work. Focus on integration capabilities.
Real Talk: Is AI in Testing Just Hype?
I get asked this on every podcast episode. Here's my honest answer:
In 2017? Yes, mostly hype.
In 2023? Getting real, but oversold.
In 2025? It's mainstream. The question isn't “is it hype?” but “which tools actually deliver?”
After interviewing Jason Huggins (Selenium creator), Ben Fellows (LoopQA), Guy Arieli and Tal Barmeir (BlinqIO), Jim Trentadue, and dozens of other testing leaders on TestGuild, here's what I've learned:
AI won't replace QA engineers. But it WILL change what they do:
- Less time writing/maintaining scripts
- More time on exploratory testing
- More time on test strategy
- More time analyzing quality trends
- More time on complex scenarios AI can't handle
The teams winning aren't the ones avoiding AI. They're the ones figuring out how to work WITH it.
Wait… Is There Already a Fourth Wave?
Podcast Connection: I just had Don Jackson on episode A554, and what he showed me made me question everything I thought I knew about where automation testing is heading.
Don recently joined Perfecto (now part of Perforce), and their new agentic AI approach is fundamentally different from every tool in this article. Here's why:
Third Wave vs Fourth Wave: The Critical Difference
Third Wave Tools (everything above in this guide):
- AI helps CREATE scripts
- AI MAINTAINS scripts
- AI HEALS scripts when they break
- But there's still a SCRIPT being executed
Perfecto's Fourth Wave Approach: No script. Ever.
Instead, you write goal-oriented prompts in natural language. Don's example from our podcast:
“Book a flight from San Francisco to New York in business class, prefer an aisle seat, second preference is window seat. If there are no flights available that have one of those seats, I don't want to sit in the middle. Come back with an error message.”
That's it. That's your entire “test.”
How It Actually Works
At runtime, the AI:
- Takes a screenshot of your application
- Interrogates the image to understand context
- Makes decisions about what to do next to achieve the goal
- Handles UI changes automatically (because there's no brittle script to break)
- Works across web, iOS native, Android native, mobile responsive – all from ONE test
Don's tagline says it all: “No scripts, no frameworks, no maintenance”
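The runtime loop described above can be sketched roughly like this in Python. Everything in it is an assumption for illustration: `take_screenshot`, `decide`, and `perform` are hypothetical callables standing in for the device and the vision model, and the `max_steps` cap is an added safety net. It is not Perfecto's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # e.g. "tap", "type", "done", "fail"
    detail: str = ""   # e.g. what to tap, or an error message


def run_goal(
    goal: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, list[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 25,
) -> list[Action]:
    """Observe -> decide -> act until the goal is reached or we give up.

    All three callables are hypothetical stand-ins (assumptions for this
    sketch): a real tool would back `decide` with a vision-capable model.
    """
    history: list[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()               # 1. capture the current state
        action = decide(goal, screen, history)   # 2. choose the next step
        history.append(action)
        if action.kind in ("done", "fail"):      # 3. stop on a terminal action
            return history
        perform(action)                          # 4. act, then observe again
    history.append(Action("fail", f"gave up after {max_steps} steps"))
    return history


# Toy run: a scripted "model" that taps twice, then declares success.
scripted = iter([Action("tap", "Login"), Action("tap", "Submit"), Action("done")])
performed: list[Action] = []
steps = run_goal(
    "Log into the application",
    take_screenshot=lambda: b"fake-png",
    decide=lambda goal, screen, history: next(scripted),
    perform=performed.append,
)
print([a.kind for a in steps])
```

Notice there's no selector anywhere in the loop. The price is one screenshot-and-decide round trip per step, and the returned history is what you would audit while you're still building trust in the tool.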
Real Example That Blew My Mind
I asked Don about reliability concerns. He told me about testing a weather app:
He wrote: “If the app isn't installed, go install it.”
What the AI did autonomously:
- Recognized it was on an Android device (not iOS)
- Swiped from bottom to check app catalog
- Didn't find it, so did a search
- Still not found, clicked home
- Opened Play Store (not App Store – it knew!)
- Searched for the app
- Clicked install
- Waited and checked progress bar repeatedly until done
- Clicked “Open” when button appeared
No explicit loop scripting. No device-specific logic. No progress bar waits coded. Just one simple prompt.
As Don said on the podcast: “Think about how hard that would be to script today.”
What Makes This “Fourth Wave”?
The difference is agency – real, autonomous decision-making:
Third Wave Example:
AI generates:
click('#login-button')
type('#username', 'test@test.com')
type('#password', 'password123')
click('#submit')
Script is created and executed.
Fourth Wave Example:
Prompt: "Log into the application"
AI figures out HOW at runtime based on what it sees.
Things That Were Previously “Untestable”
Don mentioned several beta customers finding use cases nobody expected:
Financial Services Company: Stock trading app with dynamic graphs. The AI can now validate:
- If a price point is higher than the previous point, it should show green (not red)
- The chart visualization matches the numbers in the table below
- All this with DYNAMIC data (no static test data required)
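The first rule in that list is simple enough to write down as a check. This Python sketch covers only the rule itself and assumes the prices and rendered colors have already been read off the screen, which is the hard part the AI handles. The function name and the "green"/"red" strings are illustrative choices, not anything from Perfecto.

```python
def rising_points_not_green(prices: list[float], colors: list[str]) -> list[int]:
    """Return indexes of points that rose versus the previous point
    but were not rendered green.

    Only the rule stated above is checked; falling or unchanged points
    are ignored because the example does not say how they should look.
    """
    if len(prices) != len(colors):
        raise ValueError("prices and colors must be the same length")
    return [
        i
        for i in range(1, len(prices))
        if prices[i] > prices[i - 1] and colors[i] != "green"
    ]


# Works on whatever data is on screen right now; no static fixture needed.
prices = [101.2, 101.9, 101.4, 102.0]
colors = ["green", "green", "red", "red"]
print(rising_points_not_green(prices, colors))
```

Because the check compares each point to its neighbor rather than to a stored expected value, it holds up no matter what the market did today.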
E-commerce Company: Product images with descriptions
- They run marketing campaigns where descriptions change (“Sale on Laptops!” added to everything)
- Couldn't test these campaigns before (static data problem)
- Now they can validate: “Does the text match the picture? If it says ‘HP laptop with 17-inch screen and 10-key', does the image show the HP logo and 10-key keyboard?”
Accessibility Testing: One prompt: “Make sure this page matches WCAG 2.0 standards”
The AI grabs those standards, checks compliance, reports back. Done.
My Honest Take (The Skeptic's View)
Look, I've been in automation for 25+ years. I've seen a LOT of “revolutionary” promises that turned into vaporware.
When Don first described this 18 months ago, I thought it was interesting theory. When he demoed it, I was intrigued. Now that it's actually released and I've seen real customer results?
The Good:
- Solves the selector maintenance nightmare
- Works across platforms without rewriting
- Enables non-technical testers to automate
- Handles complex scenarios that were too hard to script
The Trade-offs:
- Slower than traditional scripts (it's taking screenshots and processing them)
- Requires good prompting skills (vague prompts = vague results)
- You need to build trust through auditing early on
- Not a replacement for API testing or unit testing
The Real Question: Is this production-ready today?
For some use cases – absolutely. For dynamic UIs like Salesforce Lightning (Don's example), for exploratory testing, for applications that change frequently.
For high-speed regression suites where you need maximum performance? Maybe not yet.
The Controversial Take: Scripters vs Testers
Don said something on the podcast that's going to upset some people:
“Some of the best testers I've known in my career are the worst scripters. And conversely, some of the best scripters were the worst testers because they didn't have that destructive mindset. Wouldn't it be amazing if I could have my best testers be able to do automation?”
I've seen this my entire career. The business experts who understand the domain can't automate. The automation experts don't understand the business context.
Fourth wave tools might finally bridge that gap.
Exploratory Testing, Automated
This is what really got me excited. Don described a beta customer who asked the AI:
“Find all the different paths to get to the shopping cart.”
The AI found 12 paths.
The customer only knew about 9.
Think about that. Automated exploratory testing that discovers things your manual testers missed.
Should You Adopt This Now?
Immediate Use Cases:
- Salesforce Lightning testing (notoriously difficult to automate)
- Dynamic applications that change frequently
- Multilingual testing (works in 98% of languages)
- Accessibility compliance checking
- Exploratory test automation
Wait a Bit If:
- You have stable apps with established automation
- You need maximum execution speed
- Your team isn't comfortable with AI/prompting
- You're just getting started with automation (learn traditional first)
My Prediction
In our podcast conversation, Don mentioned he'd been talking about this concept for 18 months and calling it “goal-oriented testing.” The fact that multiple companies (including Perfecto) are now building this approach tells me something:
This is where testing is going.
Not in 10 years. In the next 2-3 years.
The tools in the main part of this article (third wave) are amazing and will continue to evolve. But I think we're watching the fourth wave emerge right now.
Check it out: Perfecto – Look for their Agentic AI features (released July 15, 2025)
Watch my hands-on review: TestGuild YouTube Channel – I did a deep dive showing this in action
Hear the full conversation: TestGuild Podcast Episode A554 with Don Jackson
My Actual Recommendation
Stop overthinking it. Pick 2-3 tools from this list based on your primary pain point. Get trial access. Build the same 5 tests in each. See which one clicks with your team.
For the adventurous: Try Perfecto's new agentic AI on one particularly painful automation scenario. See if runtime decision-making works better than scripting.
Don't wait for perfect. Start experimenting this quarter.
The teams I see succeeding with third-wave tools aren't necessarily the ones with the biggest budgets or most engineers. They're the ones who started early and learned by doing.
And the teams that will lead in the fourth wave? They're experimenting with these agentic approaches RIGHT NOW.
Stay Connected
Want more? Here's how to keep learning:
TestGuild Podcast: Every week I interview testing leaders about what's actually working. We've covered AI testing extensively with folks from BlinqIO, Applitools, Testim, Reflect, and many more. Subscribe here
Automation Guild Conference: My annual online conference brings together the biggest names in test automation. We'll have sessions specifically on AI testing tools. Learn more
Weekly Newsletter: I send out weekly updates on the latest tools, trends, and techniques. No BS, just actionable insights. Join 40,000+ subscribers