Scaling Mobile Testing Pipelines with Anton Malinski

By Test Guild

About This Episode:

Scaling CI/CD for mobile apps is hard. Faster test runs often lead to more tests, more infrastructure, and more complexity. So how do you keep your pipelines healthy and reliable while still shipping at speed?

In this episode, we sit down with Anton Malinski of Marathon Labs to explore the real-world lessons he’s learned building and optimizing mobile CI/CD pipelines. You’ll discover:

  • How to scale mobile test automation without introducing friction
  • What “healthy CI growth” looks like in practice
  • Why real devices still matter, even with a massive emulator fleet
  • How backend mocking and dedicated mobile API gateways transform shift-left testing
  • Practical advice for teams evolving from weekly releases to on-every-commit confidence

Whether you’re a QA leader, automation engineer, or DevOps practitioner, this conversation gives you the insights and metrics you need to take your mobile testing pipelines to the next level.

Exclusive Sponsor

Discover TestGuild – a vibrant community of over 40k of the world's most innovative and dedicated Automation testers. This dynamic collective is at the forefront of the industry, curating and sharing the most effective tools, cutting-edge software, profound knowledge, and unparalleled services specifically for test automation.

We believe in collaboration and value the power of collective knowledge. If you're as passionate about automation testing as we are and have a solution, tool, or service that can enhance the skills of our members or address a critical problem, we want to hear from you.

Take the first step towards transforming your and our community's future. Check out our done-for-you awareness and lead-generation demand packages, and let's explore the awesome possibilities together now: https://testguild.com/mediakit

About Anton Malinski


I've worn many hats throughout my career in tech and enjoy roles focused on improving developer experience and raising the quality bar of products.

If I'm not working on something in CI/CD, you can probably find me tinkering with open-source projects or preparing a tech talk.

Connect with Anton Malinski

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:35] Hey, is your mobile CI/CD pipeline slowing you down? Well, if so, you're in for a special treat, because today's guest, Anton Malinski, co-founder of Marathon Labs, shares hard-earned lessons from scaling testing infrastructure and why healthy CI growth isn't just about speed. He covers a lot in this episode you don't want to miss. Check it out. Hey Anton, welcome to the Guild.

[00:00:57] Anton Malinski Thank you, Joe. Happy to be here.

[00:01:00] Joe Colantonio Great, great to have you. I've been speaking to a lot of people in the industry and hearing a lot about what you all are doing there. I guess before we get into it, Anton, in the bio you sent me, you've worn a lot of hats over your career. I'm just wondering, what led you to where you are now?

[00:01:17] Anton Malinski The initial vector, I guess, is that I started as a mathematician, specifically in the cryptography field, very scientific, very hardcore, and very deep, and I never expected to go into software engineering. But I started working on security solutions in general and started working around the software. I thought, I'm going to poke around a little bit, I'm going to take a stab at it. Working on it, I started to feel like maybe it's actually interesting, tinkering with this puzzle of how to build something. And it led me into the mobile area, where I started building prototypes for mobile phones, which were getting very popular. Everything had to be on mobile, and I spent a lot of time there. And you quickly realize, as soon as you get to a certain scale, that you can't just manually verify everything you've already built before. You inevitably end up in this space where you need to verify stuff. And I started building this thing for a growing team, where we had a growing number of use cases, features, and bug fixes, and all of this needed to be balanced. I was building the CI/CD story of: we need more hardware, and then we need to write better tests, and trying to balance all of this at the same time. And this is how I ended up in this field. I think previously it was called DevOps, now it's platform engineering, whatever you want to call it, but it's basically everything around the code that ends up in production that users are actually using.

[00:02:51] Joe Colantonio Love it. Love it. Why mobile testing? Why does it seem like CI for mobile apps has so much more friction compared to, say, a web app?

[00:03:01] Anton Malinski First, I guess where the pain starts is that the development cycle for web is usually very different from mobile. On web, you can deploy in several minutes. It's going to be deployed on your CDN, whatever you're using, Akamai, Cloudflare, whatever. But for mobile, every release really matters. It's hours and days of trying to see if your release is going to get rejected by Google or by Apple because something is wrong, so you really need to make sure everything is working perfectly. And ideally, you also don't want the team to stop working. You might have 10 people on mobile, you might have 50, and if all of them stop working for a release, that's not a good thing for the business. You want to continue working on the next features and bug fixes. So making sure that whatever you're doing in your code base is up to a certain level of quality is a growing pain. You either don't know about it yet, or you already know about it and you're on top of the game.

[00:04:01] Joe Colantonio Once again, in the pre-show, you mentioned something about Agoda, I think, as the company.

[00:04:08] Anton Malinski Yes. This is the company where I got most of the experience, I guess, with CI and testing, and specifically UI tests for mobile and building mobile farms. I joined the company at the end of 2015 and stopped working there around the middle of 2019. Over that roughly three-and-a-half-year journey, I started as a regular software engineer and transitioned into this infrastructure person and the manager for the core operations team. And this is where I built mobile farms with hundreds of devices for Android, roughly around 80-90 Mac minis running all of our workload, and lots and lots of regular Linux servers running everything from the emulated space, for example emulated Android devices, to all the microservices that are needed there.

[00:04:59] Joe Colantonio Yeah, and I think you mentioned something about having some sort of loop going on: faster test runs, more tests written, more infra needed, more optimization. Can you walk us through maybe what that loop is?

[00:05:10] Anton Malinski Well, in Agoda, you have to understand what the verification story looked like when I joined: everyone stops working as an engineer. There's an Excel spreadsheet with a list of scenarios that everyone shares, and you mark a kind of pass or fail. For several days the work stops while the release is cut: several days of no feature work, no bug fixes. Everyone just goes through these scenarios, basically like monkeys, and you want to optimize this. This is where we started. And I remember we had maybe a couple of hundred Appium tests that were run roughly once a month, but if you run them once a month, they're always half red, so you don't have any confidence in anything. Where we left off, and there's a whole journey in between, was every commit being tested in under 15 minutes, everything, including regression, for all the different configurations for Android and ..... areas. The loop of development started with: we have this pain, how do we solve it? It's an iterative process. This didn't happen in a month. It didn't happen in one year. It took us realistically three years to go from A to B. The first pains we tried to address were around getting dynamic CI runners. You need build machines to even start running tests, and if they're hard-coded, it doesn't really work out of the box: if you want 10 more, you have to go get 10 more. Once you have that, you start running your tests, usually using devices on those build agents. You spin up an emulator on the build runner, and that kind of works until probably 150-200 UI tests, when it just takes long enough that you don't really have good control over it anymore. So you want to run it faster. And we found that adding more hardware just didn't help. You get more build agents, and it doesn't really help. There is a nice trick around this that people usually use: sharding. For those who don't know, sharding is basically a concept where you parallelize the load into buckets. Let's say you have 200 UI tests and you want to run them in parallel on four different runners, so you get 50, 50, 50, 50: four different jobs, executed in parallel. But that way you don't really get more hardware, you get more build agents, and what you really need is more devices to execute tests on, since you're using the build runners as your hardware. At this point, I think we were roughly at around the one-hour mark, one hour and a half, for running our tests, and we added more hardware and hit this wall of around 55 minutes. We gave it twice the hardware, and it was still not faster than 55 minutes.
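To make the sharding idea Anton describes concrete, here is a minimal Kotlin sketch (not from the episode) of naive static sharding: the test list is split into equal-sized buckets, one per parallel CI job. The test names and shard count are illustrative, and the split deliberately ignores per-test duration and flakiness, which is exactly where this approach starts to break down at scale.

```kotlin
// Naive static sharding: split the test list into N equal buckets,
// one bucket per parallel CI job. Duration and flakiness are ignored.
fun shard(tests: List<String>, shardCount: Int): List<List<String>> =
    tests.withIndex()
        .groupBy({ it.index % shardCount }, { it.value })
        .values
        .toList()

fun main() {
    val tests = (1..200).map { "com.example.SuiteTest#case$it" }
    val shards = shard(tests, 4)
    shards.forEachIndexed { i, bucket ->
        println("Shard $i -> ${bucket.size} tests")   // 4 shards of 50 tests each
    }
}
```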
And we figured out that the problem was not in horizontal scaling at all; something else was broken there. We were concurrently figuring out how to write better tests and everything, but the main bottleneck was around the orchestration of running this, specifically the test runner space, which is a fancy way of saying scheduling algorithm: like a scheduler that tries to schedule tasks on your CPU for the operating system, the same concept, just scheduling tests for execution on lots of different devices, emulators, simulators, real devices. At that time, I think we experimented with Composer from ..., with Fork, with Spoon; we modified them for several months and it just didn't work for us. It still hit exactly the same limitation, and the nature of it is that everything in the UI testing space, and in the mobile testing space in general, is super flaky. Cables break, batteries die, there's some new update coming from the manufacturer with a pop-up screen, I remember this in our farm, and it just stops everything. You can't click on it through any software method; you have to physically go there and disable it, because the manufacturer obviously wants you to update to the new firmware, but you're actually running a 24/7 service for everyone. So, competing priorities. Then we tried to sketch out a prototype for something that could solve our problems. The two main things we were trying to address were better load balancing and parallelization, and also handling real failures: not hoping for everything to succeed magically. We just prepared for everything to fail: devices will fail, test execution will fail, your tests are going to be broken. We put it in the design; we have to work around this. We basically came up with statistical modeling and put it into the software, and I remember our first test run with it was already running everything in 15 minutes, down from the 55 minutes that we just couldn't break by adding more hardware; it was impossible. We found the spot where the bottleneck was for optimizing this particular problem. We released this, I remember it was 2018, as an open source project called Marathon, and we continued adoption of it internally. And we got to the state where developers were happy with the solution. Developers were like, oh, I can actually trust my tests, because every commit I'm running right now is executing all of those tests, and all of them are green, so I can trust them. And that means I can write more tests. And then in the next year, our tests skyrocketed from 200 to around 1,200 across both platforms. We used native frameworks, so XCUITest for iOS and Espresso with a little bit of UI Automator for Android. The number of tests skyrocketed, and you obviously find more issues and you solve them. But this is a phenomenon I never expected: optimizing the execution only puts more load on it, because you're going to have more test cases after that. And that's kind of the barrier, I think, in general: the 200-300 test cases where a lot of companies are starting out.
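As a rough illustration of the pull-based scheduling idea Anton contrasts with pre-planned shards, here is a hedged Kotlin sketch, not Marathon's actual implementation: free devices pull the next test from a shared queue, and a failing device simply retires while its test goes back on the queue, so the run never stops.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import kotlin.concurrent.thread

// Placeholder for the real instrumentation run (adb, xcodebuild, etc.).
fun execute(device: String, test: String) {
    println("[$device] running $test")
}

// Pull-based scheduling: a device takes the next test only when it is free,
// so a slow or dying device never blocks a pre-assigned bucket of tests.
fun runPullBased(tests: List<String>, devices: List<String>) {
    val queue = ConcurrentLinkedQueue(tests)
    val workers = devices.map { device ->
        thread {
            while (true) {
                val test = queue.poll() ?: break   // nothing left to run
                try {
                    execute(device, test)          // run on this device/emulator
                } catch (e: Exception) {
                    queue.add(test)                // re-queue the test and retire the device
                    break
                }
            }
        }
    }
    workers.forEach { it.join() }
}

fun main() {
    val tests = (1..12).map { "SuiteTest#case$it" }
    runPullBased(tests, listOf("emulator-5554", "emulator-5556", "pixel-7"))
}
```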

[00:11:43] Joe Colantonio You mentioned different frameworks. Does the framework you use matter? Like if you use Appium as opposed to native, does that impact how you're going to scale? Or is it irrelevant and you still hit these bottlenecks?

[00:11:54] Anton Malinski I'll start the answer from the technical side: Appium is nothing more than the same Espresso, UI Automator, and XCUITest. Appium is just an HTTP server that is built around exactly those same APIs. So whenever you're running Appium, you just have more layers of interfaces, but to exactly the same APIs as you would use in native testing. And a simple system is usually more reliable; the more complex the system, the more failure points you're going to have. So with Appium, from that perspective, it's usually much harder to achieve stable execution. It's not unheard of, I've seen companies that have successfully implemented thousands of Appium tests, but it is definitely a much harder effort compared to native tests. And even between the native test platforms for Android and iOS, the resource usage differs. Usually, in your test case, you're going to find an element by some ID or by text, do some assertion, click on something, interact with it. Generally, iOS and XCUITest is 50% slower than Android. At Agoda, I remember we standardized the format for all the test cases. We first had a specification, we used YAML for it, but it doesn't really matter, it can be anything. We first specified what the test case should be, and then engineers took it and implemented it in the actual native framework, using page objects and everything. But it turned out, from the hardware resource planning perspective, that exactly the same cases take about 50% more on iOS, which was a surprise to me. I expected both native frameworks to be exactly as performant. And then if you add Appium on top of all of this, it's just going to be slower and less reliable. That means you're going to spend more hardware on it, and more effort as well.
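For readers who haven't seen the native APIs Anton is comparing Appium against, here is a minimal Espresso-style Kotlin sketch of the find / interact / assert pattern he describes; Appium ultimately routes through these same instrumentation APIs, just with an HTTP layer in between. The view text, hint, and omitted activity launch are hypothetical placeholders, not code from the episode.

```kotlin
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withHint
import androidx.test.espresso.matcher.ViewMatchers.withText
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class LoginFlowTest {
    // Activity launch (e.g. via ActivityScenarioRule) is omitted here; the point
    // is the find / interact / assert pattern that Appium wraps over HTTP.
    @Test
    fun loginShowsWelcomeMessage() {
        onView(withHint("Username")).perform(typeText("guild_user")) // find by hint text
        onView(withText("Log in")).perform(click())                  // find by visible text
        onView(withText("Welcome")).check(matches(isDisplayed()))    // assert on the result
    }
}
```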

[00:13:53] Joe Colantonio You do have a background in mathematics. Are there any real-world metrics you rely on to let people know their CI growth is healthy, or that it's about to hit a bottleneck?

[00:14:06] Anton Malinski In terms of metrics, from a hardware and general infrastructure perspective, I would call the first important one saturation, from the four golden signals. In this specific case, saturation can be queue time and the duration of your verification or build: how long does it take to execute your tests, and how long are you waiting in the queue until your tests even start executing? It doesn't mean that your queue time always has to be zero. That's just impossible by nature, because no company has its load spread evenly around the clock. Usually, you clock in, you start writing some code, you push something, you test, then it's lunchtime. And then maybe at the end of the day you push your last commit and you actually want to test the final thing. So you have those spikes in the load, and that's where queue times start to be non-zero, 15 minutes, 20 minutes. Unless you have lots of money, you don't want to set a goal of queue time being zero; it should be something realistic, like 10 minutes. That's not the end of the world, but it shouldn't be like that throughout the whole day. You should have times where you really have resources for running something. And for testing duration, at least at Agoda and later on with Marathon as a project in general, through customer interviews, what I found is that 15 to 30 minutes is generally psychologically okay for engineers to spend waiting until their code change is verified. Within that, you can go for a coffee, come back, and still be in the context of whatever you were doing. If the green or red check in CI takes longer than 30 minutes, you're probably going to pick up some other task; realistically, you need to work on something. You're going to switch context, and by the time you actually get the result, you've already forgotten what you were working on. So being in this flow state is really important for an engineer. This time to feedback is where it starts to accumulate, especially if your checks take longer; I've seen teams where it takes two, three hours. You start to pick up not one additional task, you start to work on three of them and juggle all of them. And it's a pretty hard way to work, in my opinion.
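A small sketch of how the two signals Anton calls out, queue time and verification duration, might be computed from raw CI job records. The record shape and the p95 percentile choice are assumptions for illustration, not any specific CI vendor's API.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical CI job record: when it was queued, started, and finished.
data class CiJob(val queuedAt: Instant, val startedAt: Instant, val finishedAt: Instant)

fun percentile(values: List<Duration>, p: Double): Duration {
    val sorted = values.sorted()
    return sorted[((sorted.size - 1) * p).toInt()]
}

fun report(jobs: List<CiJob>) {
    val queueTimes = jobs.map { Duration.between(it.queuedAt, it.startedAt) }
    val durations  = jobs.map { Duration.between(it.startedAt, it.finishedAt) }

    // Saturation proxy: how long changes wait before a runner even picks them up.
    println("p95 queue time: ${percentile(queueTimes, 0.95).toMinutes()} min")
    // Feedback loop: how long an engineer waits for the green/red check.
    println("p95 verification duration: ${percentile(durations, 0.95).toMinutes()} min")
}

fun main() {
    val base = Instant.parse("2025-01-15T09:00:00Z")
    val jobs = (0 until 20).map { i ->
        CiJob(
            queuedAt = base.plusSeconds(i * 60L),
            startedAt = base.plusSeconds(i * 60L + 180),         // ~3 min in queue
            finishedAt = base.plusSeconds(i * 60L + 180 + 1500)  // ~25 min run
        )
    }
    report(jobs)
}
```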

[00:16:26] Joe Colantonio Oh, yeah, even with vibe coding, I can imagine now it's even worse. In 30 minutes, people are already writing three new features. A lot of times when we talk about development, we talk about shift left, and I assume it might be easier with web than with mobile, but is it realistic to shift left when you're doing mobile development?

[00:16:51] Anton Malinski Shift left, I guess the whole phenomenon here, can be described as going from production to staging environments, from staging environments to CI. And I guess we're focusing more on the CI-to-local-development loop. When building mobile infrastructure or a mobile platform, ideally, whenever you run your test cases in CI and something fails, you don't really want people to have to rerun it on very different devices. You want to build this thing as a proper service: something failed on a device with a specific configuration, let's say a five-inch screen, this specific DPI, this setting, this specific version of the operating system, and you have exactly the same API on your development machine. You can use exactly the same CLI tool, for example, or a web UI, and request exactly the same device. If it's a real device, you can get exactly the same device where it failed. And this way you can shift the signal left from CI to your local execution. At the end of the day, there should be no real difference between CI and your local executions. If I want to check whether something is working end to end, my ideal user experience is: I have a button, it just runs everything as fast as possible, without any flakiness, without anything unstable, and just gives me a signal. Does it still work or not? And in my opinion, that's definitely possible for 99.9% of cases. There are those things where you really need specific hardware, and what I mean by this is we obviously have to talk about emulated and simulated devices versus real ones. For emulated and simulated ones, you can spin up as many as you want. You can scale them. You can buy more servers for spinning up Android emulators, buy Apple Mac minis for running your simulators. That's very cheap. But if you want rare devices, let's say some Chinese device without Google Play services, you're not going to buy hundreds of them. And you're not going to buy them for scaled execution, so that everyone can run all of those tests in 15 minutes. What I mean by this: in our case at Agoda, for example, we were running around 24-25 hours of tests in 15 minutes, and just doing the simple math, how many devices do you need at minimum to even achieve this? It's around 50, right? Around 50 devices at minimum, and if you have multiple people doing this concurrently, 150 devices of the same model. It's impractical, and not just because of the one-off investment. The problem is that your users keep changing devices. Whatever was popular one year ago, let's say you bought 150 of those devices; you check now, and the top 15 devices your users are using are different. You need to buy new ones. And what are you going to do with the old ones? Are you going to recycle them? What happens in a year, are you going to redo it all again? It's impractical to use this at scale, in my opinion. The proper process here, I think, is when you see an issue, and usually you see issues on real devices, you first try to replicate it on a simulator or emulator. There are lots of APIs available now.
There's everything around network shaping, packet delays, throttling; you have full access to the camera, including faking video streams; you can mock GPS. There are so many things you can do with sensors, even an accelerometer, to try to reproduce the issue on an emulator or simulator. If it's not possible at all, and there are cases like this, of course, shift it to the real devices. And that's where the balancing between real devices and virtual devices comes in. How many things can you realistically reproduce on emulated devices? It really depends on your application. Some applications are just simple CRUD things around an API. They don't really use any specific sensors, they don't need low-level access to the hardware. Most of those can realistically be done on an emulator. But if you're doing something very, very interesting... I remember I was working on one app, a book reader called Bookmate, and we were partnering with a company that was building this very interesting phone with an E Ink screen on the back of it. The front face was LCD, and on the back you had an E Ink screen, and you can't emulate this. Whatever you do, it's just not possible. You're going to have to test on an actual device, and there's going to be some proprietary API, so there's only one way to develop this. You can try to abstract everything behind interfaces, but there are cases where using real devices really is just a requirement. But again, you want to strive for using emulated devices, because they're easier to maintain, they're cheaper to support, and they're updatable: whenever a new version of the operating system comes out, you can just update the package and deploy the new version of the simulator or emulator. But with real devices, whenever someone even says Samsung S25 or something, which specific version of that device do they mean? They vary even between different markets; they might have a very different silicon chip inside, a very different modem chip, a different firmware version. So which specific device are we talking about? And even in that case of updating firmware that I mentioned before: do you want to update your firmware? Because some of your users will update the firmware and some of your users are going to say no to it. Whenever you have a crash from a specific device, do you want to store all the different combinations of firmware just to have them at hand to reproduce the issues? So it's a very hard challenge.
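As a rough sketch of the device-count arithmetic Anton mentions above: the floor on fleet size is roughly the total accumulated test time divided by the target wall-clock time, before accounting for retries or flakiness. Assuming the 24-25 hours of tests splits roughly evenly across the two platforms (about 12.5 hours each), the numbers line up with the 50 and 150 figures he cites; that per-platform split is an assumption for illustration.

```kotlin
import kotlin.math.ceil

// Minimum devices ≈ total accumulated test minutes / target wall-clock minutes.
// Real fleets need headroom on top for retries, flaky reruns, and several
// engineers triggering runs at the same time.
fun minimumDevices(totalTestMinutes: Double, targetWallClockMinutes: Double): Int =
    ceil(totalTestMinutes / targetWallClockMinutes).toInt()

fun main() {
    // e.g. ~12.5 hours of tests per platform, 15-minute target per run
    println(minimumDevices(totalTestMinutes = 12.5 * 60, targetWallClockMinutes = 15.0)) // 50
    // several engineers running concurrently multiplies that floor
    println(3 * minimumDevices(12.5 * 60, 15.0)) // 150
}
```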

[00:22:54] Joe Colantonio As I say, it sounds almost impossible to plan; even if you do the best you can, once you're in the wild, all bets are off. But mathematically, there has to be some way you can kind of have confidence in knowing you at least have like-

[00:23:09] Anton Malinski Well, you try to do your best with what you have. You're always limited by resources. In this case, it's even just the basic budget: how much hardware can you buy, how much hardware can you afford? So realistically, that's the first constraint. And the second one is: how many problems do you want to fix for your users? You're never going to achieve perfect software. It's impossible, there is no such thing. If we had an algorithm to do this, everyone would already be using it. You want to do the best you can for the majority of your users. And this is where it's super important to collect data on what your users are actually using for your application. Maybe it's a very niche application for some market, and only a certain number of devices are popular there. And this is where finding a vendor who will give you those devices, rent them out, like BrowserStack or anything like that, or buying them if you want, comes into play, because you really want those specific devices. Otherwise you're testing your app on basically a minority of your users, and you want to make sure that at least the majority of users have a good experience.

[00:24:20] Joe Colantonio Absolutely. I wrote in my notes that I forgot to follow up with you. You mentioned Marathon open source. Maybe tell us a little bit more about what Marathon open source is, because I kind of skipped over it without following up.

[00:24:30] Anton Malinski That's all right. Marathon started as this orchestration and test runner at Agoda. We were trying to parallelize everything as efficiently as possible and handle flakiness as well. The Marathon open source project is basically this test runner. It doesn't have any fancy UI or anything; it's just one component in this very large puzzle of trying to solve testing at scale. Because the ideal user experience is: you have a button and you just get a result in a very short amount of time. So Marathon is just this test runner for Android and iOS that works with real devices and simulated ones. You give it your application, give it hardware, and there's lots of different configuration you can do in terms of the execution, but basically it tries to balance everything as much as possible. So if devices, God forbid, die, or maybe disconnect, it just works around this; the test run never stops. And it works in a very different way from conventional test runners, so to say, because most conventional test runners pre-plan everything. Let's say the test runner sees 5 devices; it's going to split all the tests into 5 buckets and just continue executing until it's done. Marathon does it very, very differently. It doesn't expect the devices to even be there. Whenever a device is free, it figures out what is the best statistically correct thing to do right now in terms of probabilities, and it encodes the things around tests failing. The two biggest things around a test run that are usually unaccounted for: the first is test duration. There is usually an assumption that all tests take exactly the same amount of time. When you shard, for example, into four different buckets, you assume that 50-50-50-50 is exactly the duration split. No, because some of your tests are going to take one second, checking one widget: maybe you create one widget, make a screenshot, do a screenshot comparison, next test. But some of your tests are going to be scenario tests. They're going to be very long: they're going to include OAuth, you're going to make some orders, go into your basket, check out. They're going to take minutes sometimes. You need to account for those things, and the test runner has to understand: this test takes a long time, so I'm going to schedule only one of these for this device, but these tests are very short, so I'm going to schedule them on a completely different device and batch them all together. That's the first thing it accounts for. And the second one is how flaky the tests are in general. If it sees, for example, a test that has been stable for the last, it's always configurable, let's say one week, it's not going to schedule additional retries. But if it sees that a test has always been flaky, yet it still passes, and most teams are very comfortable with the fact that if one of the retries passes, the team still considers the test as passing; we're in this mode where we're happy to pay additional cost for still having the test pass. It just calculates the probabilities and figures out: oh, I need three retries so that at least one of them is green. But there's also another mode: if you want to fix that flakiness, it still sends all the metrics to a metrics database, and you can see which tests are flaky.
For example, at Agoda we implemented a process where every week the team lead or dev manager received a report: your team has these statistics for the flakiness of your tests, and you probably want to fix those because they are really affecting everyone. The problem with flaky tests is that it's not only one specific team that's affected. If you're running tests at scale, one test can basically make a bad day for every single developer, because everyone is running everything. This is where having those statistics is very useful. You can schedule preventative retries rather than running the test, watching it fail, and then running another attempt. It just figures out, I need three of them so that one of them can actually pass, and schedules them concurrently. And then, if you have enough devices, you get the same result while paying roughly the same cost, but much, much faster. You can also implement runs for checking flakiness. When you fix flakiness, for example, how do you really know that you fixed it? I ran it locally and it's green; did I actually fix it? You really need the numbers for the probabilities. For example, the probability of passing was 0.7 and I want to get it to 0.9. That means I need to run this test a thousand times on different devices, hopefully concurrently, and then check the actual statistics for my fix: did it improve the statistics or not? And this is what the Marathon open source project also allows you to do. It's kind of this engine for scheduling test execution as efficiently as possible at scale, where real problems happen. There's always flakiness. It's always going to be there; it was there at the beginning of time, and I believe it's going to be there at the end. You can only manage it and try to do the best you can with as many resources as you can put in.
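The preventative-retry math Anton describes can be sketched as a simple probability calculation: if a single attempt passes with probability p, then at least one of n attempts passes with probability 1 - (1 - p)^n. The Kotlin below is a rough illustration of that arithmetic, not Marathon's actual model.

```kotlin
import kotlin.math.ceil
import kotlin.math.ln
import kotlin.math.pow

// P(at least one of n attempts passes) = 1 - (1 - p)^n
fun passProbability(p: Double, attempts: Int): Double = 1 - (1 - p).pow(attempts)

// Smallest number of concurrent attempts needed to hit a target pass probability.
fun attemptsNeeded(p: Double, target: Double): Int =
    ceil(ln(1 - target) / ln(1 - p)).toInt()

fun main() {
    // A test that passes 70% of the time:
    println(passProbability(0.7, 1))                // 0.7
    println(passProbability(0.7, 3))                // ~0.973 with three attempts
    println(attemptsNeeded(p = 0.7, target = 0.99)) // 4 scheduled attempts for 99%
}
```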

[00:29:50] Joe Colantonio I first heard about you all from, I think you know, Igor from ingenious.io. He raves about how fast this is. What's the other aspect of Marathon? I know you have the open source project, but I believe you probably have another business that offers more of a paid option.

[00:30:06] Anton Malinski Yeah, so the context there is that Marathon open source started around eight years ago. And for the past roughly three years, we've been building this cloud offering. The whole idea is that Marathon is just the engine. The thing that I wanted to build originally was this user experience of: I just send my application and as many tests as I want, and I receive the signal as fast as possible, the signal being what's actually failing. And I wanted the signal to be reliable. Unfortunately, it's not possible to do this with just the open source offering. You really need knowledge around infrastructure, how to tie these pieces together, and a process around infrastructure. There are always new updates coming from WWDC and they're always breaking stuff across this whole stack, unfortunately. And what we're building right now is basically this user experience where you just provide us the binaries and we give you the results in under 15 minutes. It involves much more than the open source offering, but I know the open source part is very popular in the community. Lots of teams are using it. I think the largest number of tests in one run I've heard of was around 9,500, which I was very surprised to hear. I didn't know teams were actually building such large test suites, but it depends on what you're building and what the business wants in terms of reliability.

[00:31:31] Joe Colantonio I hate to bring it up because a lot of people are sick of it, but AI. I hear about AI a lot with web. Are there any features or benefits of using some sort of AI or LLM to help with mobile testing at all?

[00:31:45] Anton Malinski It's very controversial, I guess. My personal opinion: I don't yet see any good place to put AI in mobile testing, or even in development. My reasoning is that if you're just starting out, it's much better to read the resources that have been prepared for everyone over the last several decades. They are targeted at people, written by people, handcrafted. AI is useful for getting something very quickly out the door, but you really need a certain kind of experience, maybe at least 10 years doing exactly the same thing. Then this thing can prototype something for you, and it's going to be much better than starting from scratch for sure, but you're still going to spend time on it. I guess for experienced people, yeah, I can see some value in it, and the first place that comes to mind is probably writing tests: instead of writing all of the setup for the test, setting up expectations for the mock server, setting expectations on preferences in your app, et cetera. But at the end of the day, the approach before AI was: I'm going to write my tests in a way where I can do this in one single line of code, put a JUnit rule here, have some specific functions to set up my database. Whatever you're putting into the AI as a query, whatever you want to achieve, was probably already done with some testing framework, realistically. I haven't yet seen anything that looks really interesting, to me at least.
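As an illustration of the "one line of setup via a JUnit rule" pattern Anton contrasts with AI-generated boilerplate, here is a hedged Kotlin sketch. The MockBackend and AppPreferences helpers are hypothetical stand-ins for a real mock server client and preferences store, not part of any library named in the episode.

```kotlin
import org.junit.Rule
import org.junit.Test
import org.junit.rules.TestWatcher
import org.junit.runner.Description

// Hypothetical helpers standing in for a real mock-server client and preferences store.
object MockBackend { fun stubDefaultResponses() = println("stubbed default responses") }
object AppPreferences { fun resetToDefaults() = println("preferences reset to defaults") }

// A rule that prepares backend stubs and app preferences before every test,
// so each test body stays a single readable scenario.
class SeededStateRule : TestWatcher() {
    override fun starting(description: Description) {
        MockBackend.stubDefaultResponses()
        AppPreferences.resetToDefaults()
    }
}

class CheckoutTest {
    @get:Rule
    val seededState = SeededStateRule()   // all the setup plumbing in one line

    @Test
    fun orderSummaryShowsTotal() {
        // The test body only describes the scenario; the rule handled the setup.
    }
}
```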

[00:33:17] Joe Colantonio Okay Anton, before we go, is there one piece of actionable advice you can give to someone today to help scale their mobile CI/CD pipelines? What would it be? And the second part is what's the best way to find or learn more about Marathon Labs?

[00:33:30] Anton Malinski I guess my advice here is, first, really set expectations. The verification and quality space is really problematic, because I believe the only way to really achieve this, and you've probably heard this from someone on one of the podcasts, is the Kaizen approach to quality, and all it boils down to is that everyone really needs to chip in for this to succeed. It's a long-term game. It doesn't take weeks, it doesn't take months; it takes years to get to the next level in quality for almost every team. And you really need buy-in and alignment from QA, from engineers, from product, from senior management, that the team is going to be slower because they're going to work on quality. So first things first, set expectations: this is going to take a long time. And if everyone buys in, then proceed with it. If someone doesn't really want to do this, I don't believe it's worth it; just try to continue where you are. Otherwise you're basically going to be pushing a large stone uphill. The second thing I want to share is: don't try to reinvent the wheel here. I've been that guy, and I created Marathon out of necessity; I didn't create it because I just wanted to create a new project. So try to reuse whatever is already on the market, and I don't mean Marathon here. Whatever you see as a solution, any product, try to work with the vendor, try to work with people who actually have financial alignment for the product to succeed, because internal platform teams usually do not have any direct effect on the business; their impact is transitive. They're only helping developers. The business is usually never interested in improving the platform the way, for example, AWS is. AWS sells their platform, and hence they're really motivated for the platform to be of very high quality. If you're going to do something internally, at best it's going to be some hacky prototype unless you spend years on it, and where are you going to get the budget? So try to reuse whatever is already on the market. For example, if you're doing mobile farms, there is the successor of the OpenSTF open source project, DeviceFarmer. Start with that, try to use native tooling, but God forbid, don't go down this rabbit hole. And also, use solutions that are appropriate for your team size and your project. If you're just starting out, if you have two developers and a very simple app, don't use any farms; just go and spin up an emulator or simulator on your build runner. It's going to serve you for the next year and a half. You don't need complex solutions in the beginning. Maybe you want to think about the process of how you're writing the right tests at that point, but you don't need complex solutions. You can think about them when your team is maybe 10 or 15 people. Use an appropriate solution for your scale. And regarding Marathon Labs, you can go to our main website, marathonlabs.io. Learn about our business, learn about our product, and there is also a link there to the open source version, which you can just try out; give us feedback, as always. I'm always around for the community, so go to my Twitter. Maybe we'll post a link somewhere; I don't even remember the link, to be fair. So reach out to me.

[00:36:56] Joe Colantonio We'll have links to all this awesomeness in the comments down below.

[00:37:00] Thanks again for your automation awesomeness. For links to everything of value we covered in this episode, head on over to testguild.com/a560. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.

[00:37:35] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com where you become part of our elite circle driving innovation, software testing, and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:38:18] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
