About This Episode:
In today's episode, I continue my conversation with legendary software testing expert Michael Bolton as we explore the world of artificial intelligence and its impact on testing.
Check out this week's sponsor: https://testguild.me/ailowcode
I know I tend to have really pro-AI testing experts on the show, and I wanted to give another point of view to balance things out. I can't think of anyone more perfect to do that than Michael.
So, is AI truly the revolution it's marketed to be?
Listen to hear how Michael cuts through the hype to reveal what these tools really are: probabilistic text generators that often seduce us with their confident-sounding outputs.
You'll discover when AI can be genuinely helpful in testing and when it's likely to lead us astray. Michael shares fascinating examples of AI inconsistencies and explains why the overconfidence these systems project can be dangerously misleading.
Whether you're concerned about AI replacing testers or looking to incorporate these tools into your workflow responsibly, this episode provides the critical thinking framework you need.
Stay tuned for part two of my conversation with Michael Bolton, where we separate AI fact from fiction in the world of software testing.
Exclusive Sponsor
Sponsored by BrowserStack
This episode is brought to you by our friends at BrowserStack. Many QA teams struggle with limited automation coverage due to steep learning curves, infrastructure hassles, and the need for advanced coding skills. That’s where BrowserStack’s Low-Code Automation comes in.
It’s an intuitive, AI-powered, end-to-end testing platform that lets anyone—regardless of coding experience—create automated tests in minutes. With features like natural language test creation, self-healing tests, visual validations, and cross-browser/device support, it helps teams automate more, faster.
Trusted by over 50,000 teams, including Fortune 500 companies, BrowserStack Low-Code Automation helps you scale testing without slowing down.
Learn more at: https://testguild.me/ailowcode
About Michael Bolton
Michael Bolton is a consulting software tester and testing teacher who helps people to solve testing problems that they didn't realize they could solve. In 2006, he became co-author (with James Bach) of Rapid Software Testing (RST), a methodology and mindset for testing software expertly and credibly in uncertain conditions and under extreme time pressure. Since then, he has flown over a million miles to teach RST in 35 countries on six continents.
Michael has over 30 years of experience testing, developing, managing, and writing about software. For over 20 years, he has led DevelopSense, a Toronto-based testing and development consultancy. Prior to that, he was with Quarterdeck Corporation for eight years, during which he managed the company's flagship products and directed project and testing teams both in-house and around the world.
Connect with Michael Bolton
- Email: michael@developsense.com
- Company: www.developsense.com
- LinkedIn: www.linkedin.com/in/michael-bolton-08847
- Twitter: www.twitter.com/michaelbolton
Rate and Review TestGuild
Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
[00:00:35] Joe Colantonio Hey, in today's episode, I continue my conversation with legendary software testing expert Michael Bolton, as we explore the world of artificial intelligence and its impact on testing. Now, I know I tend to have a lot of really pro-AI testing experts on the show. A lot of people say, hey, Joe, enough is enough with AI, so I wanted to give another point of view to balance things out. And I can't think of anyone better or more perfect to do that than Michael Bolton. So is AI truly the revolution it's marketed to be? Well, Michael cuts through all the hype and reveals what he thinks these tools really are. What are they? Well, you need to listen to the show to find out. You'll also discover when AI can be genuinely helpful in testing and when it's likely to lead us astray. Michael also shares some fascinating examples of AI inconsistencies and explains why the overconfidence these systems project can be dangerously misleading. So whether you're concerned about AI replacing testers, or looking to incorporate these tools into your workflows responsibly, this episode provides the critical thinking frameworks you need. So stay tuned for part two of my conversation with Michael Bolton.
[00:01:41] Hey, before we get into it, a quick word from our awesome sponsor for this episode. Hey, let's face it, test automation with traditional code-based automation tools requires high coding proficiency, excluding non-technical team members from the testing process. Steep learning curves, high test maintenance, and the need for skilled resources and infrastructure setup are other significant challenges. As a result, QA teams are forced to limit automation to critical applications, leaving significant gaps in automation coverage. What do you do? Well, introducing BrowserStack Low-Code Automation. It's a powerful end-to-end low-code testing platform built upon best-in-class BrowserStack capabilities that lets anyone create automated tests in minutes without coding expertise. With their intuitive test recorder, you can create your first automated test with ease. Simply interact with your application, capture complex actions like hovering, scrolling, and file uploads, and generate test steps with meaningful, human-readable descriptions. Timeouts are intelligently configured for each test step, ensuring stability without manual intervention. Visual validation ensures specific UI elements appear correctly, improving accuracy and test stability. But here's an interesting piece: their low-code automation tool is powered by AI. It uses natural language to create test steps and simplify automation of complex user journeys. When UI changes occur, self-healing technology automatically adapts your tests to prevent failures. You can easily scale low-code testing with automation best practices while bringing modularity, reusability, and extensibility into your low-code tests with advanced features like modules, variables, data-driven testing, and many more. Join over 50,000 customers, including some Fortune 500 businesses, who trust BrowserStack Low-Code Automation, and take your automation testing to the next level using the special link down below.
[00:03:47] Joe Colantonio I'm just curious to know, with AI coming aboard, how much does it muck all this up? We talked about what testing is. We talked about automation. But now I've been seeing AI testing, AI automation. It's like everything now has AI, and everyone says, well, this is all fine and good and dandy, but AI is going to solve all these issues now, all these misunderstandings.
[00:04:10] Michael Bolton Well, what's our history? What's our history with claims that X is going to solve everything, or X is going to do this or that? I mean, look at self-driving cars. My favorite example, because it's so evocative. It seems that in 2014, there were lots of people making claims that by 2024, there would be no more human-driven cars on the road. I mean, people were making that claim. And here's the problem with it. There is a kind of, Carole Cadwalladr had a wonderful word for it, broligarchy, that sees a problem and solves 20% of it, the easy part of it, in 5% of the time. Problem solving doesn't quite work by projecting the time it took to solve the easy problem into the time to solve the hard problems. The hard problem with self-driving cars is that driving is not the business of keeping the car between two lines on the road. That's the easy part to solve. The problem is not keeping the car from hitting the car immediately in front of it when the car immediately in front of it stops. That's a problem, and that's got to be solved, and enormous steps were taken for quite a while in solving that problem. But what people who are fascinated by technology often forget is that technology is always an extension of ourselves, an extension to us. And what is driving? Driving is not a mechanical activity. Driving is a cognitive and social activity wherein lots of stuff depends on communication between drivers. Lots of stuff depends on things that drivers do that involve things other than operating the steering wheel and the brake pedal: maintaining situational awareness, communicating via both behavior and gesture with other users of the road, and working out, as we're incredibly capable of doing, humans are just amazing this way and machines are terrible at it, other people's intentions. What they focused on was, once again, as Harry Collins would point out, the behaviors of driving and not the actions of it, the behavior plus the intentions. When it comes to AI, there's a bunch of different problems that come up. First, AI is a marketing term, not an engineering term, because any kind of reasonably sophisticated software can be marketed as AI, if we think of intelligence in terms of something that takes in some kind of sensible data, data that we can take in with our senses, and then processes it somehow, applies decision rules to it, and then produces an action. The example I like to use is Shazam. Shazam, the music software: you hold it up to the speaker in the pub, and it goes, what's that song? And it tells you what the song is, and you think, wow, that's amazing. How clever it is, how smart it is. Well, you can market that as artificial intelligence if you want to. Nobody's stopping you. In 2018, I was working on a particular kind of artificial intelligence: pattern recognition in brainwaves for people who were unable to speak or unable to gesture. What this thing did was, through a band that you wore around your head and sensors attached to the band, it would read your brainwaves and try to figure out which of six or eight blinky lights you were looking at on a tablet or a phone screen. And then a bunch of techniques would be applied to that data collection, through programming that was selected rather than programming that was intentionally designed, which is essentially how machine learning works: you throw a zillion different possible candidate programs or candidate algorithms at something.
And then you choose the one that fits the data the best. That kind of machine learning is something people have classified as artificial intelligence, presumably because the process of algorithm development and the selection of the appropriate algorithm has been abstracted out by algorithms. And so we get a candidate, a machine learning model, that does something, and that has the feeling of intelligence about it. And then we come up with these things that generate long strings of text stochastically. And from that, we look at these things, which are literally models of how human speech would go, and we say, wow, that looks a lot like human speech. Our brains say, it must be intelligent. But our brains are wrong, because we know how these things actually generate text. They generate text probabilistically based on training data. And sometimes that gets refined a little bit by various kinds of tuning and feedback, assessment or assay of this stuff that users supply: is it what you were looking for, thumbs up, or not, thumbs down, and stuff gets refined that way. But it's still just spitting out text. One of the things that we have to worry about is that risk, the risk that what we see, which is very legible once again, and which is really seductive, is not right. Let's think about the problems associated with AI and what makes GPTs particularly problematic. The first thing is that they're algorithmically obscure. They're not written intentionally by socially aware people. A machine learning model is generated and selected by various kinds of algorithms, and we don't have the source code available. So if it's producing unhappy output, we don't know exactly why. It's really hard to trace back through layers and layers of back propagation and the data upon which it's acting. That interaction is really hard to trace, and by and large, we're unable to do that in any kind of reliable way. So what this means is that the product isn't easily, certainly not easily, checkable. There's an obscure relationship between the input and the output, and we don't know the details of the process by which that's happening. That's one kind of problem. Another kind of problem is that when we retrain the model or try to fix it in some way, we don't know the effects that retraining is going to have. By changing the weights in a large language model, we don't know how the change of one weight is going to affect all the other weights, because of the phenomenally complex interactions between these things, where we need huge farms of servers to process all the linear math that goes into those sets of relationships that spit out a string. So we can't fix them. We don't know how they work on a low level, and therefore we can't fix them, and therefore they're vulnerable to regression kinds of bugs, bugs where it gets better in one way but worse in another way. On a social level, one of the problems we're facing is that there are claims of these things thinking like humans. There's very great excitement about how good the output looks. And that causes people to turn what might otherwise be reasonable desires into wishes and fantasies. Anything that works great in one context, or appears to work in one context, is vulnerable. We have to experiment with it really thoroughly. Another thing that is challenging about these things is that we are inserting them into places where humans once fit. We're trying to put them into the human social order.
But these things are not themselves social agents. They're not brought up like people. They don't have responsibility like people; they don't feel responsibility. They are not social agents, but they look like they are. And we anthropomorphize. Not only do they look like they are, by the way, they'll say all kinds of stuff to seduce us into believing that they care for us. Apologies for any confusion, right? Or, sure, I can help you with that. Well, that's very nice, but it pulls psychologically on us in ways that cause us to read stuff in that's not there.
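As a toy illustration of what Michael means by generating text probabilistically based on training data, here is a sketch (not anything demonstrated on the show) of a tiny bigram model: it emits text by sampling which word tends to follow which in a made-up training corpus. Real LLMs use vastly larger neural models, but the core idea of sampling likely continuations is the same.

```python
import random
from collections import defaultdict

# A tiny "training corpus"; a real model is trained on vastly more text.
corpus = "testing is a process of evaluation testing is a process of exploration".split()

# Count which word follows which, i.e. build a bigram table.
following = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word].append(next_word)

def generate(start, length=6):
    """Emit text by repeatedly sampling a plausible next word; output varies run to run."""
    words = [start]
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate("testing"))  # e.g. "testing is a process of exploration testing"
```

The point of the sketch is only that nothing in the loop understands the words; it is picking statistically plausible continuations, which is why the output can look fluent while carrying no intent behind it.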
[00:14:11] Joe Colantonio Or builds trust where it shouldn't be.
[00:14:13] Michael Bolton Well, yeah, yeah. The important thing to remember about that is it's not the thing that's building the trust, right? GPT is not building the trust.
[00:14:24] Joe Colantonio Right, right.
[00:14:24] Michael Bolton We are. Yeah, exactly.
[00:14:26] Michael Bolton We are, exactly. So we've got to be responsible for that. Now, there's a wonderful blog post by a guy named Baldur Bjarnason called The Large Language Mentalist, and the subtitle is something like how large language models reproduce the psychic's con. Because psychics use the same kind of approaches. They'll give you an answer that looks really good or that sounds really good, but that isn't based on any kind of knowledge or understanding of you or what you want. But you, what do you call it, the client of the psychic, you fill in the details, and when you fill in the details, all of a sudden the psychic looks masterful and brilliant and insightful, when what's actually happened is that they've said something that, for instance, fits the Barnum effect. You are a reasonably diligent and thoughtful person, but every now and again, your discipline slips and you do something careless, and so on and so forth. Find a person on earth that doesn't apply to. These things are very sophisticated on that basis. They're really clever at making us feel good. And it's a mistake, or at least very risky, to put them in places where a human once fit. And we all know about this from our interactions with chatbots. When we're trying to get a problem solved, or we're trying to get customer service, these things are enormously frustrating, because they're not great with specific details and they're not great at dealing with our frustration with them. And we've got lots of experiences and lots of accounts of that. But there's another aspect of it which is really bad, really bad these days. It's starting to dampen a little bit as more and more people are getting experience with these things and realizing that the things that we've been warning about for the last two and a half years have come to pass. There's lots of fear of missing out. There's lots of investment, lots of money riding on this stuff, and there's lots of weird boardroom stuff that says, AI is the next big thing. We gotta get on it. We must apply it. Well, that's not engineering.
[00:16:57] Joe Colantonio What if your company has the mandate: thou shall use AI in your day-to-day work to save the company time?
[00:17:04] Michael Bolton I'd tell them to take a hike. As a professional, I'll use the tools that I believe help me discharge my service to you in appropriate ways. Thou shall use AI is dumb advice from my point of view. It's like saying, thou shall use a marker instead of a ballpoint pen. I mean, it's just, why? Now, of course, if you are developing machine learning models, I guess you're going to be using them in your work. But the social aggressiveness and the corporate defensiveness about these things, well, we spent a lot of money on this, we've got to get that money back, it really buys into the sunk cost fallacy. Here are cases in which I actually fairly regularly use ChatGPT. I use it as a thesaurus sometimes. Sometimes I ask the bot, because this is something it should be really great at, right? It's a large language model. It's got thousands and thousands of instances of people using this stuff. So I'll ask it for a synonym for something. I'm trying to write something, I can't get just the right word, I ask it for a synonym, and the results are just kind of underwhelming, unfortunately. And part of that is because it's based on stuff that people have said before, already, in the past. And it's probabilistic in that way. The competition between a large language model and a thesaurus is at best 50-50. Usually, looking something up in a thesaurus is a wiser approach. If I'm doing something in testing work that requires me to write a very short body of very simple code that is easily analyzable and where risk is low, I'll use it. I remember one thing recently where I wanted to make a change to a whole bunch of PowerPoint slides, and I've got code somewhere that iterates through a body of these things, but I was too lazy to look it up and revise it. I asked ChatGPT for it. And on the fourth or fifth time, it got it right. But it's actually a wash whether I would have done better just going back to the code that I'd already written to do this stuff and reapplying that, taking out the special casing that was in it or something.
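For readers who want to try the kind of short, low-risk scripting Michael describes, here is a minimal sketch of a bulk PowerPoint edit, assuming the python-pptx library; the file name and the specific text substitution are hypothetical placeholders, not his actual code.

```python
# A minimal sketch of a bulk PowerPoint edit, assuming the python-pptx library.
# The file path and the text being swapped are hypothetical placeholders.
from pptx import Presentation

def replace_text_in_deck(path, old, new):
    """Iterate over every shape on every slide and swap one string for another."""
    deck = Presentation(path)
    for slide in deck.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    if old in run.text:
                        run.text = run.text.replace(old, new)
    deck.save(path)

if __name__ == "__main__":
    replace_text_in_deck("deck.pptx", "On Mouse Click", "On Click")
```

Code like this is exactly the "short, easily analyzable, low-risk" category he mentions: the whole thing fits on a screen, and you can read and test every line before trusting it.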
[00:19:50] Joe Colantonio I just want to make sure, though, because a lot of times people think you're against automation, and no, you use it in the right place, when it makes sense. When can you use AI? I use AI all the time. I use it to take pictures of the ingredients of things to compare calories, to see which one's healthier. Or even if I go to a restaurant and I don't know what the dishes are, I take a picture and say, hey, what does this mean? And it gives me good results. So as a tester, are there any cases where you could see that it can help you?
[00:20:17] Michael Bolton Okay, well, that's one nice example. One AI-based application that I really liked, I think it's called Picture This, is for when you take a photo of a particular kind of flower in a botanical garden, which my wife is especially into. I remember using this thing for the first time in Butchart Gardens, near Victoria, BC, not Vancouver. And it was pretty good, or so it seemed, at recognizing certain kinds of flowers and identifying them. Here's the thing, though: nothing in particular was on the line there. There's no risk involved. First of all, I'll give you a set of things whereby I say, yeah, go ahead, use large language models or use machine learning models, if you like, in these kinds of circumstances. I could do this as a build, would you prefer that I do it as a build, instead of just blasting a list at you? Okay, well, let's do it. It doesn't matter that much. Let's do it this way, start, and then change those all to on click, and there we go. When can AI in one form or another be okay? Well, first of all, when you use the output for inquiry rather than control; we said the same thing about metrics for years and years. If you're using it to help you frame better questions, to give you something where, eh, let's think about this in a slightly different way, then it's pretty safe, it's okay for that. When the output is used for discovery and analysis rather than for pushing aside responsibility. If it's used for the purpose of delegating a decision to machinery, that's not okay; but when it's a tool to help us discover things or analyze things, that might be okay. The unreliability turns into a problem with that, as we shall shortly see. In general, the simpler the model and the more constrained the set of features, that is, if you like, not the hyperparameters but the actual parameters, the things that it is processing, when that's relatively speaking constrained or low, you're probably on safer ground. And there's a wonderful pair of authors, Narayanan and Kapoor, who have written a book on this, and this is one of their essays on that subject. When risk is low: when we're taking pictures of ingredients or taking pictures of flowers and using this as a classifier and we're saying, well, that appears to be an amaryllis, nothing really at stake there. I'd want to be a lot more careful about it when I'm taking pictures of mushrooms and I'm considering eating them. That's the sort of thing where I want somebody with expertise to let me know whether this thing is poisonous or not. Anytime there's no risk of loss or harm or wasted time or diminished value or bad feelings, opportunity costs, societal consequences, then we're probably not too badly off. When the volume of output is relatively low and it's easy to scrutinize the output, again, no really serious problem there. When risk is elevated, when the possibility of trouble is higher, but we are very careful with the output, and we have it examined and scrutinized and vetted by people with the necessary expertise to make an expert judgment on it, then again, not so bad, could be okay.
When variability is tolerable or even welcome, when we're getting it to help us to inspire something or create something, although I must say my experience is that the level of creativity from these things is low, and it's by definition suspect, because something that looks inspiring to me is basically the output of something that's been trained on a lot of artists' proprietary work. And my wife, who's a graphic artist, has a pretty strong objection to that. Various kinds of societies and businesses have tried their very best all the way along to make sure that artists don't get any money for anything, so this really weaponizes that. When the actual creativity isn't the point, but just getting jiggled is, right? When you do the creative bit, that's okay. When variation is all right because of our capacity to repair, to fill in the blanks, to fill in the gaps, to recognize the difference between what the machine does and what a person would do in a similar kind of circumstance. And here is what I believe is actually the most powerful and interesting use of large language models: when we use them as a mirror or a lens, when we use them as a reflection or a focusing mechanism on what people are like. Go ahead.
[00:25:46] Joe Colantonio Like, are you going to give an example of what that means?
[00:25:50] Michael Bolton The most famous one that comes to mind at the moment is the case in which Amazon, in the 2017, 2018 period, somewhere in there, it might've been a little earlier, decided that it was going to apply machine learning algorithms to selecting who should get hired at Amazon. And you probably remember the outcome of that. The outcome was that the ideal Amazon employee would be a programmer dude between 25 and 35 years old, always a guy, and typically a white guy. Well, this is not good advice; what it is, is a mirror on how Amazon actually hired people in that era.
[00:26:37] Michael Bolton When we look at the output from any kind of machine learning algorithm, geez, it reminds me of that line from the movie The Russia House, where one of the characters watches another character go in to talk to the Russians, and he's got a list of things that the spies on our side want to know. And one of the spies on our side is not really happy with this idea. He says, I don't like lists. They tell you too much about the people who made them. Well, in this case, I think one of the more powerful uses of AI is to reflect on what we're like and what our world is like. That is super powerful. Now, here's an instance of when I think it's problematic. Let's have a look here. I was asked to review a book, and in the book there's an example of how to use an LLM to generate test data, or to generate an object. That's the assignment: You're a JSON to SQL transformer. Convert the JSON object delimited by triple hashes into an SQL statement that will create a table to insert the transformed records into, and then create insert statements to add each record to a database.
[00:28:08] Joe Colantonio I think this is a perfect example of using AI. I'm curious to know what your results are.
[00:28:12] Michael Bolton Well, this is the result from the book. Now, for people who are only listening, it's going to be hard to see this, but basically it comes up with three records for people, and there's data associated with each person, so that's one thing in the JSON. It's an array of three people, and two of them have additional needs. I took that exact prompt, and what I saw was three different results: there's the one from the book, and I did two trials of this using the exact same prompt. And what we see are some pretty interestingly significant differences between each one. For instance, in the book, the first name and the last name are characterized as varchar of 255 characters each. In my first trial, it's 100 characters. In my second trial, it's 50 characters for each of first name and last name. The total price field in this JSON, in the book and in my second trial, is a decimal with two decimal places. In my first trial, though, it came back as an integer. The third instance of a problem here: the additional needs field in the book and in my first trial was a varchar of 255 characters, and in my second trial, a varchar of 100. Now, this is sort of interesting, because when we get down to the actual insertion of the records, this is what happens. In the book, it provides the data for this total price field as a decimal with two decimal places. In my first trial, it provides an int value into an int field. In my second trial, it puts an int value into a decimal field. Is this a problem or not? My SQL's a little rusty; I don't know whether it would accept that, or whether it would filter it and reject it. But my more important point here is the level of inconsistency we're getting in the test data. Our experience with machine learning models in general suggests that once your output gets to a certain length, it starts to go bad. It starts to get worse and worse: more variable, less consistent. And by the way, that doesn't improve in any kind of consistent way between one model and the next, even when those models are supposedly incremental versions of the same thing, o1 and o3 in the ChatGPT family, the OpenAI family of models. Those things have varying degrees of success and accuracy. But one consistent thing we've noticed is that when your output gets longer, the error rate gets higher.
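To make the kind of inconsistency Michael describes concrete, here is a small sketch (not from the book) that diffs the column definitions across several generated CREATE TABLE statements; the sample DDL strings below are illustrative stand-ins for the book's output and the two trials, not the actual model output.

```python
import re
from collections import defaultdict

# Illustrative stand-ins for the book's output and two re-runs of the same prompt.
trials = {
    "book":    "CREATE TABLE people (first_name VARCHAR(255), last_name VARCHAR(255), total_price DECIMAL(10,2), additional_needs VARCHAR(255));",
    "trial_1": "CREATE TABLE people (first_name VARCHAR(100), last_name VARCHAR(100), total_price INT, additional_needs VARCHAR(255));",
    "trial_2": "CREATE TABLE people (first_name VARCHAR(50), last_name VARCHAR(50), total_price DECIMAL(10,2), additional_needs VARCHAR(100));",
}

def column_types(ddl):
    """Crude parse: map column name -> declared type for one CREATE TABLE statement."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    cols = {}
    for part in re.split(r",(?![^()]*\))", body):  # split on commas not inside parentheses
        name, _, col_type = part.strip().partition(" ")
        cols[name] = col_type.strip()
    return cols

# Group declared types by column across trials and flag any disagreement.
by_column = defaultdict(dict)
for label, ddl in trials.items():
    for name, col_type in column_types(ddl).items():
        by_column[name][label] = col_type

for name, types in by_column.items():
    if len(set(types.values())) > 1:
        print(f"INCONSISTENT {name}: {types}")
```

Running the same prompt a few times and diffing the results this way is a cheap way to surface exactly the varchar-length and decimal-versus-int drift described above, without trusting any single run.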
[00:31:37] Joe Colantonio Isn't this a great example, though, of AI assistance, where you do the thinking? You don't trust the output. You look at the output, it looks good, and you fix it, like you did, and say, okay, it got me started. It's a starter. And then I apply my thinking, knowing that it's not a thinking output. I don't know if that makes sense.
[00:31:54] Michael Bolton Well, that would be great if we did it thoroughly and reliably and consistently and from an informed perspective. It would be really easy for me, as somebody rusty in SQL.
[00:32:11] Joe Colantonio Let's get rolled.
[00:32:12] Michael Bolton Yeah, to get rolled, right, for us to fall asleep. And these things, if they're nothing else, they tend to be very reassuring.
[00:32:20] Joe Colantonio Yeah.
[00:32:20] Michael Bolton Right. They tend not to provoke skepticism about themselves. And there's a report that I saw, I guess it was a LinkedIn post today, but it's based on a report from a week or two ago, where somebody was looking at, I think, the o3 model, and noting that a lot of what it said was simply untrue, in terms of, well, what did you do to get this result? Well, it says, I created a Python program to create this result. Did you run it? Oh yeah, absolutely, I ran it. Okay, well, let's see the code then. It spits out some code, and I think it's code for factoring a 512-digit number, or something of that nature. Anyway, it's got a routine that starts, I think, with the first 10,000 prime numbers, and an easy way of deciding that a number is non-prime: if one of the first ten thousand primes goes into it, then that number is not prime. Well, it gives this long listing of the program that it ran and the output that it got. And it turns out the person who is actually testing this thing points out that this huge number is actually divisible by three. Now, it's funny, because there's a really easy way to check that. Do you know it? Look at the digits of the number and sum them, and if the result is divisible by three, so is the number, and you can keep doing that successively. So, for example, is 999 divisible by three? Sum the digits: the answer is 27. Sum the digits of that, and the answer is 9, and 9 divides by 3. So yes, 999 is indeed divisible by 3. Is it the last two digits or the last three digits? It's embarrassing to say it this way, and, ah, I don't remember. But in the same way that it's easy to identify whether a number is divisible by two, if the last digit is even, you can do that. Anyway, the person points out, as he's testing this, that the number is indeed divisible by 3. Oh, I apologize for any confusion. When we give it a hard problem, we're setting ourselves up for it to mess up a hard problem in a way that is going to fool us through the confidence that these things are designed to exude. We're going to get fooled by its confidence. That's one of the really hard things about testing a large language model: the output tends to be voluminous and requires an immense degree of scrutiny by means other than a large language model. Otherwise, people say, well, if something is 90% correct, and then we run it against something that's also 90% correct, then we're good. But it's 90% applied to the 90%. Two processes that you put together that are each 90% correct give you an 81% chance of correct output, an 81% kind of reassurance of a correct output. It goes the opposite way from the way people would think. It's devilishly expensive and difficult to test these things. And the easy way to get around the problem is not to use them when it's not a good idea to use them. I went through a list of places in which it's okay to use them. Well, if any one of those conditions doesn't hold, then it might not be a good idea to use them.
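For the record, the standard divisibility-by-three rule sums all of the digits, not just the last two or three. Here is a quick sketch of that check, along with the compounding-accuracy arithmetic Michael mentions:

```python
def divisible_by_3(n: int) -> bool:
    """Standard rule: a number is divisible by 3 iff the sum of ALL its digits is.
    Applying the rule repeatedly reduces the number to a single digit."""
    while n >= 10:
        n = sum(int(digit) for digit in str(n))
    return n in (0, 3, 6, 9)

assert divisible_by_3(999)       # 9+9+9 = 27 -> 2+7 = 9 -> divisible
assert not divisible_by_3(1000)  # 1+0+0+0 = 1 -> not divisible

# Chaining two independent checks that are each 90% reliable does not keep you at 90%;
# the reliabilities multiply, so overall confidence drops rather than rises.
p_generator, p_checker = 0.90, 0.90
print(p_generator * p_checker)   # 0.81
```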
[00:36:39] Joe Colantonio All right, so Michael, obviously, the buzz is AI, testers are being burdened with having to now test AI. Are there any traditional approaches to actually testing AI that you can recommend?
[00:36:51] Michael Bolton Well, one traditional approach that actually works is to recognize that our job is not to confirm that everything's okay, but to go through a process of exploration, experimentation, and experiencing, with a focus on finding problems that matter. That's going to work, because our job is not to confirm that everything's okay, but to highlight trouble. One of the problems with a certain kind of traditional approach to testing, the idea of checking the output, is that the output from AI is non-deterministic. There's no way of doing unit testing in the traditional sense for a machine learning model when the output of that model is going to be generative and text-based. You can't do a unit check on something like that. Unless you ask it to produce stuff in a very specific way, in a JSON, then I guess you could do a unit check on the JSON; but to generate a JSON, maybe you want something a lot more normal than using a GPT to do it: a little bit of code in the scripting or programming language of your choice. But unit testing isn't going to work for that. The kind of testing that's in what we would refer to as the discipline frame of testing, the low-level, programmer-style stuff: there's no low-level program at work here, no conscious one. That stuff isn't going to work. And it's not enough to show that the machinery can work, because people are going to fix up, they're going to repair in their minds, the output from it. Anything that's based on determinism or precision isn't going to work for something that's generated in a much more freeform kind of way. I mean, LLMs generate freeform text; they generate JSONs too, and those JSONs are checkable, but not the lower-level process of the data creation, that sort of stuff. You've got to look at the output, you've got to examine it, and if you've got a program coming out of a GPT, you've got to test it carefully, and review it, and look for problems in it; otherwise you'll miss those problems. When we see a magic trick and we have the perception that something magical happened, that's not actual evidence of magic. We're filling in the stuff that provides a magical explanation for it. Another general thing is that just because we don't understand its behavior doesn't mean it's intelligent. A lot of the time, when we say somebody's intelligent, we don't really know what intelligence is. We don't really understand what intelligence is. But if something appears to be intelligent, then we infer that it's intelligent. And that's actually kind of risky, because of the magic trick phenomenon. Something that appears to be is not necessarily what is. However, I guess I could say that while automated checking isn't going to work very well for LLMs, the processes of using tools to help us analyze results, to help us investigate results, to search, sort, filter, and parse data, those things are all still available to us as helpful. Nonetheless, the volume of stuff that's coming from an LLM and its inherent variability is going to make that work hard, which actually is kind of fine with me. I like that kind of hard work, but there are some people who want that stuff to go away, and I've got sad news for them, unfortunately.
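As one small illustration of scrutinizing generative output with ordinary tooling, parsing and filtering rather than exact-match checking, here is a sketch with a hypothetical record shape loosely mirroring the earlier JSON-to-SQL example; it flags records that violate basic structural expectations so a human can investigate them. The field names and sample output are assumptions for illustration only.

```python
import json

# Hypothetical structural expectations for generated test-data records;
# the field names loosely mirror the JSON-to-SQL example discussed earlier.
REQUIRED_FIELDS = {"first_name": str, "last_name": str, "total_price": (int, float)}

def problems_in(record):
    """Return a list of structural problems found in one generated record."""
    issues = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            issues.append(f"{field} has type {type(record[field]).__name__}")
    return issues

# raw_output stands in for text returned by a model; in real use it varies run to run.
raw_output = '[{"first_name": "Ada", "last_name": "Lovelace", "total_price": "12.50"}]'
for i, record in enumerate(json.loads(raw_output)):
    for issue in problems_in(record):
        print(f"record {i}: {issue}")  # e.g. total_price has type str
```

Checks like this don't decide that the output is good; they narrow down where a person's attention and judgment are needed, which is the inquiry-not-control use Michael describes.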
[00:41:02] Joe Colantonio Absolutely. Michael, it almost seems like there's going to be more of a need for testers. A lot of people ask, am I going to be replaced by AI and all this? It sounds like not only are you not going to be replaced, but with all this code being generated, all these gaps are going to make testers even more important. Along those lines, if someone wants to learn how to be a tester, as we defined it in this episode, how can they learn more? I know you have a bunch of stuff on your website. You do a lot of training on rapid software testing. Where can people find out more?
[00:41:32] Michael Bolton Well, my website is developsense.com. I would also point people to James's website, satisfice.com; he blogs a little less than I do. We offer classes, both online and in person. For public classes, I do the live ones, that is, the in-person ones, and James does the online versions of them, and he runs those fairly frequently. So watch his site and the rapidsoftwaretesting.com website for those things. For in-house classes, if you're at an organization that's got a crew of testers that you want trained in the Rapid Software Testing approach, James will do it online; I will do it online or in person, depending on what you want. I love in-person work. To me, it's so much more engaging and interactive in ways that I think are really important to preserve; being in the room with people has a certain kind of magic to it. Finally, and this will come out, I imagine, after this podcast emerges, James and I are working on a book, currently slated for release in September of 2025, called Taking Testing Seriously: The Rapid Software Testing Approach. This book has been a long time in coming, and one of the things about writing a book is that you get to find the gaps in either your own understanding of something or your capacity to explain it. So James and I have spent untold hours arguing about what's going into the book and what's not going in, and how to say it just so and just right. It's like a good meal in that way. You can't spit it out off the end of a fast-food assembly line; it's got to be prepared and, to some degree, practiced, because we're trying to make a book that's going to stand the test of time. And that's tricky in a world that's moving all the time. So we're looking for things that are more or less eternal about the business of looking at products critically, stuff that's going to last for a while.
[00:43:55] Joe Colantonio I'll have to have both of you on a show to do a promotion for that book for sure. That'd be awesome.
[00:44:00] Michael Bolton That would be wonderful and we absolutely welcome that and thank you for it.
[00:44:05] Thanks again for your automation awesomeness. For links to everything of value we covered in this episode, head on over to testguild.com/a544. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, and my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.
[00:44:39] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com, where you become part of our elite circle driving innovation in software testing and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real-world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.
[00:45:22] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.