Chatbot Test Automation with Christoph Boerner

Published on:

About This Episode:

Testing artificial intelligence is very different from testing conventional software like mobile apps or websites. In this episode Christoph Börner, creator of Botium, will share how testing a chatbot is fundamentally different than testing a website or smartphone app. Discover common causes for flaky chatbot testing, and how to handle tricky testing scenarios using Botium. Listen up!

Exclusive Sponsor

The Test Guild Automation Podcast is sponsored by the fantastic folks at Sauce Labs. Try it for free today!

About Christoph Börner

Christoph Börner

Serial entrepreneur, developer, tester, keynote speaker and drummer. Studied information technology at the Technical University of Vienna and worked in various fields of software engineering. Active member and organizer in the Austrian testing community. Deep interest in artificial intelligence, machine learning and bots from day one, long before these topics began to revolutionize the whole industry and were considered the next big thing. Deep friendship with Florian Treml, even after playing in the same rock band for several years :). Starting the Botium journey together in 2018 was just the next logical step. Today we count the biggest chatbot QA community and can be considered the first choice for testing conversational AI.

Connect with Christoph Börner

Full Transcript


Joe [00:01:40] Hi Christoph! Welcome to the show.

Christoph [00:01:43] Yeah. Hi, Joe. Hi, guys out there. Thank you very much. It's a pleasure to be part of your Guild today.

Joe [00:01:49] I'm really excited to have you on the show. I've been hearing about Botium for a while, so it's great to actually have the inventor of Botium joining us. Before we get into it though, Christoph, is there anything I missed in your bio that you want the Guild to know more about?

Christoph [00:01:58] Well, it was maybe the best bio I've heard about myself. Pretty cool. Thank you very much.

Joe [00:02:05] Cool. I guess before we get into it, how common do you think chatbot testing is nowadays? Is it a fad or is it something you see as a growing area that you think is missing testing and resources to help testers test it?

Christoph [00:02:16] So first of all, it's something new, so testers and test automation engineers have to adapt. In fact, all the big enterprises out there have conversational AI, meaning bots and virtual assistants, in their future strategies. So all this stuff is coming up more and more. But I still get questions in my inbox asking me what the value of testing bots is, and if you are a passionate tester out there, this really hurts. But I can compare it to earlier days. As you said in my bio, I was running a test consulting company before, and we were doing a lot of test automation for websites maybe 15 years ago, and then in the last five to 10 years mainly mobile test automation. And it was the same there the first time: we had to convince people that you need to test, and we see the same thing right now with bots. So it's a topic that is really getting a lot of attention, but people are still asking me what the value of testing chatbots is.

Joe [00:03:12] So what is the value of testing chatbots?

Christoph [00:03:16] I was not expecting this. Well, the value of testing is derived from quality, and quality means, at the end of the day, the satisfaction your user feels every time he talks to your chatbot or virtual assistant. This is, I would say, why we call our whole discipline quality assurance. It's all about quality, and from the company side it's, of course, also about confidence. You need a high confidence level to ship your products into production environments. And this is, I think, of utmost importance, especially in the Agile approach. And of course, when we're talking about value in Agile projects, then automation is the key enabler. Test automation usually leads to early identification of defects and of unintended behavior, and this is no different when we're talking about bots. And at the end of the day, it's all the stuff we know: time to market decreases as a result, costs of quality decrease, and user satisfaction goes up, all at the same time. This is how I would explain the value of testing bots.

Joe [00:04:23] So I wonder, with your experience over the past year with COVID and everything, I would think there was probably an uptick in people utilizing chatbots, because more people are going on websites and they're not physically going out to maybe a bank or a merchant, but they're going on virtually and trying to get their questions answered. Did you see that as a trend, that more companies were like, “Well, we didn't even think about how many people are going to start using a chatbot, and now we definitely need to put some resources into testing it”?

Christoph [00:04:49] Yes, definitely. So as bad as this whole pandemic thing is, for the topic of conversational AI, it was definitely a boost, to be honest, because starting with this pandemic, so many companies were knocking on our doors and telling us, “Well, we get 20 thousand calls in our support centers every day and it's impossible to answer them. Can you build a bot for us?” And then those days usually or the first reply was, “Well, we are delivering test automation tools for bots. We are not the bot developers.” But one year ago in March, we also helped a lot of companies develop some kind of corona bots. So I remember a lot of…especially when it comes to financial institutions and so on, people were very unsure of what is happening with my money and my bank account and so on. Will the ATMs work in the upcoming days? So there are so many questions coming in. And then on the other side, a lot of these bots were also used internally. So employees were asking these questions. Do we have to stay home? Will I be fired? How is this whole thing now going? And yes, it was definitely a boost for conversational AI.

Joe [00:06:01] How exposed could a chatbot leave a company? I'm just thinking of, you know, SQL injection attacks or other security attacks. Are the concerns the same as when you're testing a website, or what do you need to think about when you're testing a chatbot?

Christoph [00:06:14] Yes, definitely. So that's a very important point, and also one we are following here at Botium. What we are delivering is, let's call it, a holistic test approach. In this approach, all the topics we know from testing websites or mobile apps come up again. It's about different test types and approaches. It's about functional and nonfunctional testing. It's about mobile, regression, and acceptance testing, and it's, of course, also about security and performance. A very big one is also GDPR, for example, in Europe, and there we have a rate of eighty-nine point whatever percent of bots that fail our GDPR test on the first question. The first question is, “What are you doing with my data?” And eighty-nine percent of all the bots we ask cannot answer this question. In fact, security is a very important topic, and we have seen such cases when we test with Botium. Let's say the typical case you want to avoid is that your bot cannot resolve an intent and replies with something like, “Sorry, I don't understand.” As a user, you don't want to see this. But the worst case is when the bot breaks and exposes security-relevant data. This usually happens when bots break: sometimes you see which NLP engine is behind it, which bot-building framework is used, and so on. And this is already the first opening for an attacker. More than that, we are doing our own security testing. It runs in different stages, and the first one is testing on the API layer, which can be compared to API testing. When you're talking to a bot-building platform or chatbot, you usually start by running your tests against the API, because there we can really execute these enormous test sets. And this is what we see in testing chatbots: quantity really makes a difference.
When I was testing web and mobile, I would have said it's more about doing the right tests than doing an enormous number of tests. When it comes to conversational AI, this is different, because people can ask bots anything. So in theory, the test set is infinitely large, and the whole discussion about test coverage, how much can be reached, is even worse than what we had before. But coming back: we usually start on the API layer because there we can execute something like 100,000 conversations, a typical regression test for a chatbot, just like this (claps hands). It takes a few seconds. And in big teams with 30, 40, 50 developers, conversation designers, and so on, I see anywhere from a few hundred to even a few thousand builds happening in a sprint, meaning a few thousand smoke test runs. If your test automation is that fast, you can also run the whole regression 1,000 times in a sprint; nothing stops you. When we start on this API level, for security testing we use the Zed Attack Proxy, OWASP ZAP, which is pretty well known from the OWASP guys out there. Everything we fire at a chatbot and everything that comes back as a response we send through this proxy. People go crazy when we run something like a conversation where we say hello to a bot and check whether it detects the welcome intent, because this is a two-minute thing to automate. The result shows the conversation is green. But then we go to the security tab in the results, and it shows 30 different security vulnerabilities or so. So this can happen.
The cool thing here, and this is kind of our philosophy: I don't want to say we demystify test automation, but we are trying to make it fast and easy. Security testing with Botium Box means switching on the security testing switch, and then every request, all these conversations, all these 100,000 questions going to the bot and all the responses coming back, go through our Zed Attack Proxy and are tested for hundreds of possible vulnerabilities. Therefore, you don't need a lot of security knowledge in your team. This can be done by everyone. But to be fair, this is not the whole story. Especially when it comes to, I don't know, a financial institution, a bank or something, I wouldn't say that if you do this and you don't have any errors there anymore, you can ship to production and you're totally safe. This is not the whole truth. The whole truth is that this is taking the low-hanging fruit, let's call it that: the stuff you really can fix easily on your own. And when you have done this and you go to an external consultant for a real pen test or whatever, you can already tell them, “Guys, we have continuous security testing in place. We are doing one thousand builds in the sprint, and we run the security test set 1,000 times in the sprint because it's that fast. And these are the 300 security vulnerabilities we already identified and fixed.” And then an external audit, an external pen test (unintelligible), an external security test can usually start on a different level.
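To make the API-level regression idea concrete, here is a minimal Python sketch. It does not use Botium's API: `stub_bot`, `Convo`, `run_regression`, and the intent names are hypothetical stand-ins, and a real setup would replace `stub_bot` with a call to the chatbot's own API (optionally routed through an intercepting proxy such as OWASP ZAP).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Convo:
    """One scripted exchange: what the user says and the intent we expect."""
    utterance: str
    expected_intent: str


def stub_bot(utterance: str) -> Dict[str, object]:
    """Hypothetical stand-in for a chatbot API call (assumption, not a real API)."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if words & {"hello", "hi", "hey"}:
        return {"intent": "welcome", "confidence": 0.97}
    if "contract" in words:
        return {"intent": "cancel_contract", "confidence": 0.91}
    # Anything else is unresolved, the "Sorry, I don't understand" case.
    return {"intent": "fallback", "confidence": 0.20}


def run_regression(bot: Callable[[str], Dict[str, object]],
                   convos: List[Convo]) -> List[str]:
    """Send every utterance to the bot and collect intent mismatches."""
    failures = []
    for convo in convos:
        reply = bot(convo.utterance)
        if reply["intent"] != convo.expected_intent:
            failures.append(
                f"{convo.utterance!r}: expected {convo.expected_intent}, "
                f"got {reply['intent']}"
            )
    return failures


# A tiny regression set; real sets run to thousands of conversations.
REGRESSION_SET = [
    Convo("Hello!", "welcome"),
    Convo("How can I quit my contract?", "cancel_contract"),
    Convo("Is the new iPhone in the store available?", "product_availability"),
]

if __name__ == "__main__":
    for line in run_regression(stub_bot, REGRESSION_SET):
        print("FAIL", line)
```

Because each check is just a request and a comparison, a set like this can be rerun on every build, which is the property the speed argument above depends on.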

Joe [00:11:30] So there's a lot of info there. I love this approach. It sounds like we keep mentioning the conversational AI type of chatbot. It could be what you're trying to test. How do you know what to test? There seem like so many possibilities. Is it because you're using API, there's a schema you can return, and then you just test that schema? How do you approach testing an application that could have numerous types of paths or responses based on numerous criteria?

Christoph [00:11:58] So actually, what and how should we test? This is one of the most important questions for us. On a very low level, I would say on the left side we are testing the dialogs, meaning the conversation flows themselves that have been designed by conversation designers. And on the right side, we are testing the NLP performance, meaning more or less the health state of your AI engine. Testing the conversation flows is, I would say, pretty easy, because usually you have some list of intents defined somewhere in the planning phase: what should our bot be able to answer? For a telephone company, it would be something like, “How can I quit my contract?” or “Is the new iPhone in the store available?” What we see with our customers is that they usually have around a hundred or, for the bigger bots, a few hundred intents. And usually you have a few hundred user examples behind every intent. If you multiply those two values, you end up with these thousands or even a hundred thousand conversations. This is a good starting point for a regression test. But this is only the left part. This is your dialog: really checking that if I ask A, I get back B (unintelligible) as intended. The other thing is, as mentioned before, your whole NLP performance, your NLP score, talking about natural language processing. There, we are more or less getting into data science as testers. In general, conversational AI brings a new vocabulary: intents, utterances, AI models, NLP, NLU, F1 score, k-fold analyses, and so on. So it's a lot of new vocabulary and a lot of data science. What we are doing there is showing the confidence distribution of your intents.
So you see that for something like, “How can I quit my contract?” the NLP behind it, or if you want to say the AI or the chatbot, resolves to the intent with 98% or whatever. So you don't need to touch this intent. Everything is cool. But on the other side, going with the same example, we are highlighting that “Is the new iPhone in the store available?” maybe resolves with just 38% or something, and this is definitely an intent where you have to take action, where you might have to add more user examples, meaning more training data. You retrain the bot and test the whole thing again. And this is just the beginning. We have distributions on an utterance level and on an entity level. We are creating a whole confusion matrix (unintelligible), which is more or less a combination of all possible intents, and so on. And as mentioned before, even with all this new vocabulary, we're trying to make it fast and easy to use. As always in testing, we are using red and green. If you have these red boxes popping up, it's usually a good hint to click on them, and you will see, “Oh, here we're missing training data,” and so on. We even go to the next step: if we have a case where we cannot resolve to the intent with more than 30, 40, whatever percent, and you just have a few user examples behind it, then we are even doing test data augmentation. You just hit the button, and we use a paraphraser to generate test data and so on. So in theory, with Botium you can do codeless test automation.
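The two numbers in this answer, the size of the regression set and the per-intent confidence, can be illustrated with a short Python sketch. The function names, the 0.5 threshold, and the sample scores are assumptions made for illustration only; they are not Botium's implementation or its defaults.

```python
from statistics import mean
from typing import Dict, List, Tuple


def regression_set_size(intent_count: int, examples_per_intent: int) -> int:
    """Conversations to run when every user example of every intent is tested."""
    return intent_count * examples_per_intent


def flag_weak_intents(confidences: Dict[str, List[float]],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (intent, average confidence) pairs below the threshold, weakest first.

    `confidences` maps an intent name to the confidence scores the NLP engine
    returned for that intent's test utterances (hypothetical data shape).
    """
    flagged = []
    for intent, scores in confidences.items():
        if not scores:
            continue  # no test utterances for this intent, nothing to average
        average = mean(scores)
        if average < threshold:
            flagged.append((intent, average))
    return sorted(flagged, key=lambda item: item[1])


# Illustrative scores only, loosely following the 98% and 38% figures above.
SAMPLE_SCORES = {
    "cancel_contract": [0.98, 0.97, 0.99],
    "iphone_availability": [0.38, 0.41, 0.35],
}

if __name__ == "__main__":
    print("conversations:", regression_set_size(300, 300))
    for intent, average in flag_weak_intents(SAMPLE_SCORES):
        print(f"needs more training data: {intent} ({average:.0%})")
```

An intent that shows up in the flagged list is the "red box" case: a candidate for more user examples, retraining, and a retest.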

Joe [00:15:32] Very nice. So we mentioned Botium, and I want to get into Botium, but before that I just had a random thought. You talked about vocabulary, and you mentioned bots separately from chatbots a bunch of times. So can you use Botium to test other types of bots? For example, I know audio is becoming more and more popular. Can you use Botium to test audio-type bots, or is it just text-based chatbots?

Christoph [00:15:53] Definitely, yes. So we are going really fully end to end, and we are testing all aspects of performance, and end to end means voice-based testing too, of course. And even there it's really fully end to end, because voice-based could be, I don't know, the Alexa you are talking to on the one hand, but on the other hand, voice-based could also be an IVR system you're talking to, so interactive voice response. These are the typical hotlines that tell you, “If you want to talk to support, press one.” They are disappearing, they're going to be replaced, thank God. But they are there, and they need to be tested. With Botium you can do this, and still without any coding required.

Joe [00:16:31] So I guess I should have asked this earlier, but what is Botium then?

Christoph [00:16:35] Well, actually, with Botium everything started with an open-source test automation framework for bots that we called Botium Core. I really liked it in your introduction of my bio when you said something like “the next logical step after Selenium and Appium,” because this is exactly what our intention was. In 2017, Florian, my co-founder, and I were doing a consulting project, and the customer wanted to build a bot and wanted to have a whole bot development lifecycle set up and integrated into a CI/CD pipeline. We had requirements like a few hundred intents, ending up with thousands of conversations to test, and we knew we needed to test this. I think it was around 10,000 conversations. We knew we needed to test them permanently, a few times every day, so there was no way of doing this manually. So we started to look for what was out there, and we found nothing. At the end of 2017, being based in Vienna, Austria, which is in the heart of Europe, we entered a startup contest with Botium, immediately won it, went on to the same one for Europe, and also won the prize. And this was the beginning of Botium. I founded the company, and two months later we launched Botium Core, and it's still there. It's an open-source test automation framework, and for all the guys out there who are used to Selenium and Appium, it feels very familiar. Instead of a WebDriver you are creating your bot (unintelligible), you're setting your capabilities, you're writing tests. So this was our first intention. Of course, technically there is something different going on in the background than in Selenium and Appium. But for people who are used to automating with those technologies, Botium feels very familiar.
Yeah, as you mentioned, we then pretty soon became, at least in the open-source market, for sure the industry standard. But with an enormous community growing, we got a lot of requests. People were saying, “This is cool, testing the conversation flows. But what about this other piece, that we need to monitor the health of the AI? What about performance testing?” I did a demo this week for a customer who has 50,000 parallel users every minute in his chatbot, also an enormous number. So all this stuff, like I mentioned in the beginning, is a holistic test approach. You need to do load and stress testing. You need to do security, you need to test everything, you need some monitoring, you need a pipeline integration. Therefore we decided, well, Botium Core is cool, let's keep it open source for the community. But for all this stuff around it, we need something bigger, and this is what then became our second product, called Botium Box. This is more or less a very nice UI; at least we think it's nice, it depends on the people using it. But there you really work with a few clicks: you connect to your bot, you write your first tests, and you have them integrated into your platform. When I'm doing demos, I usually ask at the beginning, “Is there someone who has set up a test automation framework or project with Selenium or Appium?” Usually, people say yes. And I'm like, “So just think, somewhere in the back of your head, what it would mean to do the same thing with Selenium.” And then I show them: we export the credentials from the bot-building platform the customers are using to develop their bots. These are usually JSON files from (unintelligible) or whatever. We support more than 40 platforms. So this is a drag and drop into our quick start menu. Then you are connected to the bot forever. It takes one second, versus setting up the WebDriver, setting capabilities and stuff, and then you hit next.
And maybe the easiest way to compose a test is our live chat. You just say, “Well, I'll talk to the bot now.” So you open your microphone and you talk to the bot like I'm talking to you. On the other side, the bot does the speech to text, recognizes the intents, and replies. In the end, you hit save, the whole thing is in your test case, and that's it. The test project is finished. So after five minutes, we have a repeatable test project that we can execute 100 times a day if we want to, and full pipeline integration is there. It's just a webhook (??) to call, with full reporting. And maybe I'm the one most excited about it myself, because I was doing this for so many years. Especially in big companies, setting up something like end-to-end testing of a mobile app on Android and iOS in parallel, maybe on real physical devices with some device provider and so on, and integrating this into a pipeline so it triggers with every build, well, to be honest, this took me at least weeks in all these companies, and in some of them even months. The same thing we are doing here in five minutes, and it's codeless. At least for ourselves, it's exciting.

Joe [00:21:38] Yeah, it sounds like a really cool tech. That was going to be one of my questions. How is this different than someone just using Selenium and maybe REST Assured to test API? This is built specifically for bot testing. It sounds almost like not a unit test, but it's almost like you would test the bot away from a website per se, and then you still would use Selenium to do your browser-based automation. But this seems more like a solution targeted specifically for the bot functionality to get up and running quickly so you don't have all these other distractions.

Christoph [00:22:06] Exactly. That's it. And as mentioned, we're really trying to cover all aspects of testing bots. So take this easy example I just described, where we have a small conversation with the bot recorded, and you can execute it as many times as you want. The next step is to say, “Well, these tests are now running against the API of your bot; let's run them end to end.” And this just means selecting from the dropdowns: I want to run them now against the browser farm, against all possible operating system and browser combinations, or I want to run them on real physical devices like smartphones or the like. But the test set stays the same. So you have recorded the simple conversation that runs against the API, but it will also run in the web browser, and we take care of all this stuff in the background. So we take away the technical complexity. No one has to sit there with Selenium locators or something to find an input field where you can send text to the bot. Botium takes that away so you can really concentrate on the main purpose: defining your test, what should be tested, what the content should be, and so on. And as I mentioned before, you go to the next step and say, “Oh, let's test against an IVR solution.” No change, it's the same test; you're just telling Botium, “Well, now run against an IVR solution.”
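The idea that one recorded conversation can run unchanged against the API, a browser widget, or an IVR line can be sketched as a script plus interchangeable channel functions. Everything below is hypothetical: the channel functions only simulate what a real API client, Selenium-driven browser session, or speech pipeline would do, and none of it reflects how Botium is implemented internally.

```python
from typing import Callable, Dict, List, Tuple

# A script is a list of (user message, text the reply must contain).
Script = List[Tuple[str, str]]
Channel = Callable[[str], str]


def fake_bot_logic(text: str) -> str:
    """Hypothetical bot brain shared by all simulated channels."""
    if "contract" in text.lower():
        return "You can cancel your contract in the customer portal."
    return "Sorry, I don't understand."


def api_channel(text: str) -> str:
    """Simulates sending the message straight to the bot's API."""
    return fake_bot_logic(text)


def browser_channel(text: str) -> str:
    """Simulates typing into a chat widget; a real version would drive a browser."""
    return fake_bot_logic(text.strip())


def ivr_channel(text: str) -> str:
    """Simulates a voice line: speech recognition drops case and punctuation."""
    heard = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return fake_bot_logic(heard)


def run_everywhere(script: Script, channels: Dict[str, Channel]) -> Dict[str, bool]:
    """Run the same script on every channel; True means every step passed."""
    results = {}
    for name, send in channels.items():
        results[name] = all(expected in send(message) for message, expected in script)
    return results


SCRIPT: Script = [("How can I quit my contract?", "cancel your contract")]
CHANNELS: Dict[str, Channel] = {
    "api": api_channel,
    "browser": browser_channel,
    "ivr": ivr_channel,
}

if __name__ == "__main__":
    for channel, passed in run_everywhere(SCRIPT, CHANNELS).items():
        print(channel, "PASS" if passed else "FAIL")
```

The design point is that the script never mentions a channel, so adding a new target means adding one function rather than rewriting tests.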

Joe [00:23:29] So I know people are always going to ask this, do you ever have a test that Selenium interacts with Botium, or is it always separate? Is there ever an end-to-end test where you would have to leverage Selenium, Appium, and Botium together in one type of flow?

Christoph [00:23:42] Yeah, so definitely there is interaction. Botium Box is using Selenium and Appium in the background. So when we are talking to a bot through a website or mobile app, we're of course using Selenium and Appium. Those two are the industry standards. I have been working with them for 15 years, and they are perfect. So why should we reinvent this? Yes, we are using them. But as a user, you don't have to write Selenium code. I tend to call it the magic that happens in the background.

Joe [00:24:14] Right. So I guess what are the components that make up Botium then? What's the Botium stack like? What are people getting when they install Botium? Are there main areas or certain things that people need to know about?

Christoph [00:24:24] So there's the tech stack and there is what you get. On the user level, you get a web application: you have a web UI, you log in, and it's really easy, with quick start wizards. And when you want to start with the NLP testing, not with the conversation flow testing, we also have a very low entry level. Instead of thinking about how to test your NLP and how to check it, you can just let Botium connect directly to the AI model, download it, and automatically generate test sets and test cases based on the training data. Of course, when you're running these test sets, you are purely testing your training data, which is not a complete approach, but it is a very good starting point. We see this when customers have been developing for years and realize now, “Oh, we need testing,” and go for Botium Box. Then again, we have automatically generated test sets with hundreds of thousands of conversations, which would take a lot of time to produce manually. And then usually you just add your own specific tests on top. So this is one thing you get. On the other side, on the tech level, so for the operations or DevOps teams who would run Botium Box, it's a containerized application, so it comes more or less in a Docker container. If you are hosting it on-premise, the only thing you need is a standard server, something with four cores or so, I don't know, 16 GB of RAM, and Docker. It's one Docker command, and ten minutes later everything is installed. You point your browser to the URL and Botium Box is there. That's it. When we're talking about containerization, we are of course also supporting Kubernetes, OpenShift, and stuff like that. But from a tech stack perspective, that's it.
In the background, inside these Docker containers, Botium Box was mainly built with JavaScript, React, and Redux, and of course there are a lot of things working together inside, a lot of containers talking to each other. But once again, we're taking all this complexity away from users. We're even packing stuff like headless browsers inside this container, so that you have a starting point if you don't have a contract with a browser farm or something like that. We have a few headless browsers inside for a smoke test or something. It's really enough just to check if the bot is there and you can talk to it. For augmentation, of course, it's another topic. Or we are even packing in device emulators for Android and so on. So people just need to say, “Well, run the tests against this Google Pixel emulator,” and that's it.

Joe [00:27:02] So I think you mentioned it started off as an open-source project. And of course, to get all these other functionality takes time and effort. You have paid features for enterprise, for security performance testing. So is this still open source? I'm on your website. Is the mini free the same as open-source? Is that like a different offering?

Christoph [00:27:19] Yes, it is. So what is and what will always be open source is Botium Core. This is, as mentioned at the beginning, comparable to Selenium or Appium. And I really like that someone in the community called us “the Selenium for chatbots” in a blog post. I really like this tagline, because this is really what it is. You can do a lot of stuff with it, but you need all the scripting, you need the knowledge, and you can mainly test conversation flows. All the stuff around that is not possible. This is open source, and it will stay open source. What is there now, the Botium Box mini, as you saw, is the replacement of our community edition, the former free edition of Botium Box. There we saw that people were struggling a bit with the hosting on their side. So for their (unintelligible) you're talking about Docker and containers and so on. This is all easy peasy, but there are still people who want to try out something and who just want to register somewhere and try stuff out two minutes later. And I mean, I was in this situation a lot in big companies: you could download the community edition, but you don't have sufficient rights to install Docker and run these containers, and you're waiting for a week. We wanted to avoid this situation. So, therefore, we said, let's take more or less our free community edition, deprecate it, and just host it for all users for free. This is now the Botium Box mini. We are hosting it. We are even paying for the hosting costs up there in the cloud, and users can use it. It's eligible for company usage. Of course, from the feature set perspective there are differences to our paid plans: a different support level, a different feature set level. But to be honest, we see a lot of big companies using it at the beginning of a project because they are like, “Well, we need some kind of testing solution, but let's start somewhere. Let's try this out and you can try it out without any risk.”

Joe [00:29:17] Okay Christoph, before we go, is there one piece of actionable advice you can give to someone to help them with their bot automation testing? And what's the best way to find or contact you or learn more about Botium?

Christoph [00:29:27] So for the first question, my advice is, of course, to start testing using Botium. I mean, I'm the founder and CEO, but also without joking, there's absolutely no risk, because getting started with the mini is completely free. And as I mentioned before, testing and quality assurance really make the difference. They make the difference between high and low user satisfaction. And we all know that when it comes to stuff like chatbots, we are on the Internet, and it's mainly used by the millennials, Generation Z, and so on. These are people who are used to stuff that works, and if it doesn't work, exactly this generation is kicking your ass. (Unintelligible) Well, let me give you an example. We were testing the chatbot of an airline that changed their ticket booking system to a chatbot, and it was, to be honest, before corona. At the moment, I think not so many tickets are being bought. But the thing is, their chatbot couldn't understand the user request, “I want to book a ticket” or “I want to book a flight.” This was its main purpose, and they didn't realize this for two days or something. After two days, you can say, well, it's such a big airline that sells so many tickets, so you can put a number on the financial loss. This is not a big thing, but the loss of reputation was enormous. There were all these posts saying, “How can these guys maintain an Airbus or Boeing or whatever if they cannot produce a chatbot?” and so on. So reputation loss is a very big thing. Therefore, my very first advice is: guys, start testing your bots. And I don't have to say use Botium, because you will end up with us anyway. It's just the best option.

Joe [00:31:12] And the best way to find or contact you?

Christoph [00:31:15] Just Google us and hit our website. You can book a demo session there. This is actually still directly connected to my calendar, so anyone who wants to block my calendar and have a demo session can book one. There are a lot of ways: social media, we are everywhere. So if you search any of the search engines for us, you will find our company.

 

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
