Testing ML Pipeline Best Practices to Scale with LakshmiThejaswi Narasannagari

By Test Guild

About This Episode:

In this episode of the TestGuild Automation Podcast, host Joe Colantonio chats with machine learning engineer LakshmiThejaswi Narasannagari, who has over fourteen years of experience working with tech giants like Intuit, InComm, and Poshmark.

Check out App Automate: https://testguild.me/appautomate2

Lakshmi delves into her fascinating career journey from Oracle developer to machine learning operations and test automation. The conversation covers best practices for ML pipelines, testing at scale, and how to integrate testing effectively in the AI/ML space.

Lakshmi shares insights on understanding machine learning models, navigating various roles within AI and machine learning, and setting up guardrails to ensure model performance and accuracy.

Please tune in to gain valuable knowledge from Lakshmi's expertise and learn how to approach testing in our AI-driven world.

About This Episode’s Sponsor: BrowserStack App Automate

In today’s fast-paced digital world, businesses must ensure a seamless user experience across mobile applications. However, comprehensive mobile testing often faces challenges like limited device availability, slow testing setups, and scalability issues. That’s where BrowserStack App Automate comes in.

BrowserStack App Automate is a powerful, cloud-based automation testing solution designed for both native and hybrid mobile apps. It enables teams to:

✅ Integrate test suites in minutes using the BrowserStack SDK, without any code changes
✅ Run thousands of tests in parallel on a real-device cloud of 20,000+ devices
✅ Seamlessly integrate with leading automation frameworks, CI/CD tools, and project management solutions
✅ Test internal environments such as staging or behind firewalls
✅ Validate advanced use cases, including biometric authentication, payments, and network testing
✅ Gain deep insights with an AI-powered automation dashboard featuring failure analysis and debugging tools

Trusted by 50,000+ customers, including Fortune 500 companies, App Automate takes mobile testing to the next level.

🔥 See it in action and support the show by checking out BrowserStack App Automate (https://testguild.me/appautomate2) today!

About LakshmiThejaswi Narasannagari


LakshmiThejaswi is a Machine Learning Engineer who previously worked at Intuit, InComm, and Poshmark. She has over 14.5 years of experience. Her focus is on end-to-end machine learning operations and test automation. She loves to talk about how to approach testing in the AI and ML world.

Connect with LakshmiThejaswi Narasannagari

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:02] In a land of testers, far and wide they journeyed. Seeking answers, seeking skills, seeking a better way. Through the hills they wandered, through treacherous terrain. But then they heard a tale, a podcast they had to obey. Oh, the Test Guild Automation Testing podcast. Guiding testers with automation awesomeness. From ancient realms to modern days, they lead the way. Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold Oh, the Test Guild Automation Testing podcast. Guiding testers with automation awesomeness. From ancient realms to modern days, they lead the way. Oh, the Test Guild Automation Testing podcast. Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.

[00:00:34] Joe Colantonio Hey, do you want to learn more about ML pipeline best practices and testing at scale? Then you don't want to miss this episode, because we have an expert joining us. Lakshmi is a machine learning engineer who previously worked at big companies like Intuit, InComm, and Poshmark, and she has over 14 years of experience. She's really an expert in this area. She focuses on end-to-end machine learning operations and test automation, and she loves to talk about how to approach testing in the AI/ML world that we all find ourselves in nowadays. So you don't want to miss this episode, check it out.

[00:01:06] Joe Colantonio Hey, before we get into it, as you know, we live in an era of applications and smartphones. Businesses are leaving no stone unturned to ensure a smooth user experience, but it is easier said than done. Businesses struggle to test comprehensively due to poor device coverage, limited device availability, underpowered testing setups that are slow and unreliable, and the limited scale of their automation setup. Introducing BrowserStack App Automate, this week's sponsor. It's a scalable, cloud-based automation testing solution for your native or hybrid mobile apps. It's easy to use and offers a plethora of features to ensure faster and more confident releases. With App Automate, you'll be able to integrate your test suites in minutes using the BrowserStack SDK without any code changes. Leverage the real device cloud to run thousands of tests in parallel across 20,000+ real devices. You can also integrate with leading automation frameworks, CI/CD, project management tools, and many, many more. You can also test your apps on internal development and staging environments or behind firewalls. You can even test advanced use cases like biometric authentication, payments, networks, etc. on their network of both iOS and Android real devices. App Automate is also powered by their all-new automation dashboard that delivers comprehensive insights into performance and issues, along with AI-driven failure analysis and advanced debugging tools. What are you waiting for? Join over 50,000 customers, including most Fortune 500 companies, that trust App Automate to take their mobile app testing to the next level. See it for yourself, though. Check it out. Support the show using the link down below.

[00:02:51] Hey, Lakshmi, welcome to the Guild.

[00:02:55] LakshmiThejaswi Narasannagari Thank you so much for the warm welcome and the warm introduction.

[00:02:58] Joe Colantonio Absolutely. I'm always curious to know. I know AI and machine learning is the hot topic nowadays, but I think you've been involved in it for a bit now. How did you get into AI and machine learning?

[00:03:09] LakshmiThejaswi Narasannagari I initially started my career as an Oracle developer, and since then it's been a rollercoaster. I slowly transitioned into machine learning engineering over a period of 14 years, I would say. In the middle of my career, my manager trusted me and said, okay, Thejaswi, I think you can do this. At that point in time, I was building automation frameworks for email marketing projects. That was the phase where machine learning was actively in development and the boom of predictive analytics models was on the rise, in 2018 or 2019, I would say. I got the opportunity and quickly jumped on it, slowly transitioned into the DevOps world, and then into machine learning operations. I worked on a bunch of models, building automations around the models and testing models. That's how I transitioned into an actual machine learning engineer. It was a happy journey, I would say.

[00:04:12] Joe Colantonio How did you do it? I mean, how did you learn then to get yourself up to speed? Was it something you were working on and just you were learning on the job then?

[00:04:19] LakshmiThejaswi Narasannagari Yeah, I'm actively learning. In the beginning I did not understand what a model is, because the world is so different in traditional software engineering versus the machine learning world, right? Working on automation pipelines for testing your software applications is very straightforward, because you know the expectation of how your application works. But in the machine learning world, your inputs are different. You don't have something hands-on; you cannot see them, and you cannot put an automation pipeline on them the way you generally write automation frameworks, whether on a web application UI or a mobile application across different devices, right? I was learning, and your podcast was one of the resources. I'm an active listener, and you were actively talking about automating using Playwright and things like that. I was slowly learning new technologies, and I was going through machinelearningmastery.com, which talks about how you build a model, how to understand a model, and how the inputs and outputs work. There are a lot of learning resources that I'm on, I actively listen to podcasts, and I make sure I read at least two or three research papers every now and then to understand how a machine learning model works. That was really important for me to put guardrails around how the model has to behave. I was trying to connect the automation world and the machine learning world, and that's how model observability evolved in my career. Model observability touches the model monitoring concept, where you're looking at the model and asking, hey, is it behaving as it's supposed to? Just like in the traditional software engineering world, is your application working as expected, right? Another part is understanding whether the model has any bias towards regions or demographics and things like that. And also thinking about what checks you can enforce for data integrity in your pipeline. So I was slowly figuring out how to put my automation knowledge into machine learning pipelines, and one by one I tried to implement that in my day-to-day work.

[00:06:53] Joe Colantonio Nice. For someone that's just listening, and they may be kind of new to this, what is a model before we dive into how we get into the pipeline and all that?

[00:07:02] LakshmiThejaswi Narasannagari Good question. I would say it's an entity, and you can think of this entity in different ways, right? Either you're asking a question of it, or you're asking it to generate certain responses for you in such a way that they look like human-generated responses. It's producing automated responses that move closer and closer to the kind of humanized responses you would get from a person, but not exactly. The more you move towards generative AI models, the closer the responses get to human responses, while traditional predictive analytics is more about categorizing. For example, let's say you write a text and ask, what is the tone of this text? In the email marketing world, we try to send a lot of messages toward a particular targeted user. Let's say you're writing a humorous message, or conveying a happy message, or trying to convey a serious, professional message. What the model does here is take in all that input and categorize the text, if it's a text-based model, right? If it's a summarization model, what it does is condense whatever you have given it into a smaller text version, so you don't have to do it explicitly.
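To make the categorization idea concrete, here is a minimal sketch of a text-classification call using the Hugging Face transformers pipeline API. The default model and the sample message are illustrative only; a real marketing-tone model would be trained on labeled campaign text rather than generic sentiment data.

```python
from transformers import pipeline

# Load a general-purpose sentiment/tone classifier (the default model is illustrative;
# a production email-marketing tone model would be fine-tuned on marketing copy).
classifier = pipeline("sentiment-analysis")

print(classifier("Thanks for being a loyal customer - enjoy 20% off this weekend!"))
# Example output shape: [{'label': 'POSITIVE', 'score': 0.99}]
```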

[00:08:37] Joe Colantonio Once again, a lot of times when I talk to testers, they're like, I'm not a mathematical wizard. I need to understand AI inside and out, machine learning inside and out. But I don't think that's true. Maybe, am I wrong? Like how much of AI does someone need to understand to really get involved with? Do they really need to get into the guts of it?

[00:08:56] LakshmiThejaswi Narasannagari It totally depends on the role. The roles are evolving these days. When you really get into an organization, let's say you're on the machine learning or AI development side, then you need to know the statistics, the machine learning algorithms, and things like that. I was fortunate to be on the operations side of it, so I'm not an active model development person; I'm more on the operations side, where I help with deploying models and creating machine learning pipelines. Here you don't need a lot of ML development knowledge, but to pass the interviews you should at least know how models are developed and how to play around with the inputs and outputs. There are different types of models, and I think it's really important to understand the landscape of the models. For example, there are reinforcement learning models, there are time series models, and there are simple, lightweight predictive models. When you're transitioning from a software engineering role to a machine learning or AI engineering role, you need to understand the overall landscape so you can excel in your career. You could be focused on just machine learning operations, but you still need to know what these models are, how you're playing with your data, and in what way you're interacting with your data. If it's a time-series model, how should my pipeline look? That's what's important for building the entire pipeline, I would say.

[00:10:44] Joe Colantonio We mentioned pipelines a lot. I think people are familiar with pipelines but people are not familiar though with the machine learning pipeline. Can you maybe walk us through what a machine learning pipeline looks like and maybe break down the different layers that makes up a pipeline for machine learning?

[00:10:58] LakshmiThejaswi Narasannagari Yes. Typically, a machine learning pipeline, at a high level, can be categorized into three layers. One is the feature pipeline, the second is the training pipeline, and the third is the inference pipeline. Whatever you do to take the model from building to production typically flows through feature, training, and inference, in that order. In the feature pipeline, you're playing around with the data: data preprocessing happens, experimentation happens, you're actively developing your model, and you're working with the feature stores to make sure the data is in the right format for the model to consume. In the training pipeline, the features are extracted from the feature pipeline and the model learns from them; depending on your model size, your training compute times and your training infrastructure will vary, right? Once the model is trained, the artifacts are stored, the inference pipeline kicks in, takes the artifacts, and actually serves the model in production. You integrate that into the front-end application, and now the model is live, I would say. The journey of the model begins in the feature pipeline, and in production it takes all the inputs and outputs and keeps performing, right?
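As a rough illustration of the feature / training / inference split Lakshmi describes, here is a minimal sketch in Python. The file paths, column names, and model choice are hypothetical, and a real pipeline would add a feature store, experiment tracking, and a proper serving layer.

```python
# Minimal sketch of the three pipeline layers; paths and columns are hypothetical.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def feature_pipeline(raw_path: str) -> pd.DataFrame:
    """Feature layer: load raw data, preprocess, and produce model-ready features."""
    df = pd.read_csv(raw_path)
    df = df.dropna()                                         # basic data cleaning
    df["amount_scaled"] = df["amount"] / df["amount"].max()  # example feature transform
    return df

def training_pipeline(features: pd.DataFrame, artifact_path: str) -> None:
    """Training layer: fit the model on the prepared features and store the artifact."""
    X, y = features.drop(columns=["label"]), features["label"]
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)
    joblib.dump(model, artifact_path)                        # artifact handed to inference

def inference_pipeline(artifact_path: str, new_features: pd.DataFrame):
    """Inference layer: load the stored artifact and serve predictions."""
    model = joblib.load(artifact_path)
    return model.predict(new_features)
```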

[00:12:35] Joe Colantonio I know with other kinds of testing, like performance testing, or even testing features and requirements for a normal piece of software, when it goes to production you're sometimes like, oh, I wasn't expecting that. When we talk about models, are there any ways you can experiment, or use evaluation practices, that help the model perform well in production? How do you know that what you're testing or building the pipeline for meets your expectations once it gets into production?

[00:13:04] LakshmiThejaswi Narasannagari There are a bunch of things you can do to make sure your model is performing as expected in production. First is understanding the requirements of how your model should behave in production, right? Once you have those requirements, you go back and do the R&D on your model: what is my data, where should I fetch it from, and what's the time range of the data? The first and foremost thing, once you have the requirements, is to make sure you do not overfit the model. For example, you know the model is going to see an evolution of incoming requests and changes in the concepts it handles. Then you have to ensure your training data covers almost all the demographics, all the information you could feed in, so the model can learn. Imagine you're putting a child to a test in the future, right? You make sure the child learns everything, and then you put the child in the real world, because the child has already learned everything and can now put that knowledge to use. I would think of it that way. There's a lot of offline R&D and analysis going on, and that's the experimentation loop: you make sure you have all the data and you take care of all the data quality checks there. For evaluation methods, if the data is going through the CI/CD pipeline, I would look at data constraint checks. There's a great library I have used called Great Expectations, which checks if your table is in the right format, if it has any null values, the basic things, and it also gives you the flexibility of adding thresholds to your data. A lot of machine learning models have sparse data, where the inputs and outputs are just zeros and ones, and overfitting happens if you have a lot of zeros and ones. I think it's really important to understand the type of data you're playing with and to put the guardrails around the data first in your pipeline.
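Here is a minimal sketch of the kind of data constraint checks described above, using Great Expectations' legacy pandas-style API (the exact calls vary by library version, and the column names, file path, and thresholds are purely illustrative).

```python
# Minimal sketch of data constraint checks with Great Expectations
# (legacy pandas-style API; columns and thresholds are illustrative).
import great_expectations as ge
import pandas as pd

raw = pd.read_csv("features.csv")                  # hypothetical feature table
df = ge.from_pandas(raw)

df.expect_column_to_exist("user_id")               # table structure check
df.expect_column_values_to_not_be_null("user_id")  # no nulls in key fields
df.expect_column_values_to_be_between(             # threshold check on a feature
    "purchase_amount", min_value=0, max_value=10000
)

results = df.validate()
if not results.success:  # may be results["success"] in older library versions
    raise ValueError("Data constraint checks failed; blocking this pipeline run")
```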

[00:15:23] Joe Colantonio How are you testing this? Do you create regression tests and test in-sprint? When the model is in your pipeline, before it goes to production, do you run automated tests that check different things, to make sure it meets the criteria you set?

[00:15:38] LakshmiThejaswi Narasannagari Yeah, we should do a bunch of things, I would say. Unit testing is hard in machine learning systems, but the way I have done it is by putting the data constraint checks in the pipelines themselves on pre-prod. I was not very successful doing it locally, because you really do not have the luxury of playing around with real data on your local machine because of GDPR constraints and things like that; we are not supposed to store the data on our local systems for more than 30 days, right? So pre-prod is my best friend for playing around with the data checks. Some types of unit testing can help too: if it's a SQL-based model running on SQL queries, then it's possible to put unit and integration tests in there. But if it's a text-based model, like text summarization models and things like that, then I would put post-evaluation checks in, meaning the model evaluation happens at the output layer. Once the model goes to the inference pipeline, you're storing all that input and you have the feedback loop happening, and the constraint checks happen on the feedback loop as well. Another thing you can do is look for concept drift on the model by integrating model monitoring and observability tools. You can look for data drift happening, and there's a whole new evolution happening on the interpretability side: you take the inputs and outputs of the model, offline, and you perform SHAP analysis and LIME analysis on them, and that actually tells you how your model is performing, right? Those are really good tools for understanding the model. In a nutshell, the evaluation should happen at all three layers. In training, once the experimentation happens, you look at the performance of the model through metrics like precision, recall, and AUROC scores, and you look at the latency as well. It's really important to understand how the model is performing. For example, if it's a text-based model, you expect the response in a fraction of a second. If it's a batch model and it's heavy on computation, you still need good latency. It all depends on how you have actually set up your ML pipeline. Latency is one big factor that I would put a check on, looking at the traffic.
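For the metrics Lakshmi mentions, here is a minimal, self-contained sketch of an offline evaluation step that reports precision, recall, AUROC, and a rough per-prediction latency. The synthetic data, model choice, and any thresholds you might attach are illustrative; in a real pipeline the features would come from the feature store and the thresholds from the model's requirements.

```python
# Minimal sketch of offline evaluation after training: precision, recall, AUROC,
# plus a rough latency estimate. Data and model here are synthetic stand-ins.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

y_scores = model.predict_proba(X_test)[:, 1]     # probability of the positive class
y_pred = (y_scores >= 0.5).astype(int)

print("precision:", precision_score(y_test, y_pred))
print("recall:   ", recall_score(y_test, y_pred))
print("AUROC:    ", roc_auc_score(y_test, y_scores))

# Rough per-request latency estimate for the serving check mentioned above.
start = time.perf_counter()
model.predict(X_test)
latency_ms = (time.perf_counter() - start) / len(X_test) * 1000
print(f"avg latency: {latency_ms:.2f} ms per prediction")
```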

[00:18:21] Joe Colantonio Now, I wasn't even thinking about performance, but that is probably huge, because people expect answers quickly nowadays. That's interesting for sure. I wrote down, I don't know why I wrote this down, because I think you mentioned it a few times: guardrails. What are guardrails, for people who are listening? How do you put in guardrails? What are guardrails for when we're talking about machine learning?

[00:18:40] LakshmiThejaswi Narasannagari The reason I say guardrails is because in the traditional software engineering world, you know the expectation of your software application. In the machine learning world, it's really difficult to constrain that expectation to a box, right? You never know; you can only know the threshold of it. You can only know, okay, 70% of the time my model behaves this way, and 30% I do not know. There could be a change in the incoming data which makes the model accuracy score go down, and then your expectations no longer hold. That's why I say guardrails. When I say guardrails, it's making sure that I keep a tab on my model through the entire pipeline and can say, okay, I am confident that my model has been fed the right data, I am not falling into any GDPR issues, and I have the right quality of data. Especially with generative AI models, it's really important to make sure you do not have any bias feeding into the models, right? Your incoming data has to be screened in such a way that you're playing with the best quality of data, and the feature pipeline is the ground for that. These days we're also thinking about automated retraining pipelines: you want to retrain your model even when you're not actively looking at it, so it keeps performing well and you increase the precision of your model. So when we think about retraining, we should be very careful about how the retraining happens, and put guardrails around there too. In the training pipeline, you have the typical 80-20 rule everybody knows about, where you train your model with 80% of the data and test with 20%. If the 80-20 split is working fine, everybody looks at the accuracy, but I also look at the AUROC scores, because accuracy by itself is not the right metric to rely on. Sometimes your accuracy can be 90% and the model still doesn't give the right responses. The AUROC scores give you the right measures, so I also check the AUROC scores to understand whether the model has the right context and is giving me the right responses, right? And in the evaluation pipeline, the model observability pipeline, that's where I have the right handle on whether the model is going through any drift or not. I keep a check on whether drift has happened in the last one or two weeks, whether the performance of the model has changed.
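A tiny illustration of the accuracy-versus-AUROC point: on imbalanced data, a degenerate model that always predicts the majority class can look highly accurate while its AUROC stays at chance level. The labels below are synthetic and the 10% positive rate is an arbitrary choice.

```python
# Why accuracy alone can mislead: always predicting the majority class on
# imbalanced data looks accurate but has a chance-level AUROC. Synthetic labels.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.10).astype(int)       # roughly 10% positive labels

y_pred = np.zeros_like(y_true)                       # "model" that always says negative
y_scores = np.full(len(y_true), 0.5)                 # uninformative scores

print("accuracy:", accuracy_score(y_true, y_pred))   # about 0.90, looks great
print("AUROC:   ", roc_auc_score(y_true, y_scores))  # 0.5, no better than chance
```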

[00:21:39] Joe Colantonio All right. Dumb question: what's drift?

[00:21:41] LakshmiThejaswi Narasannagari Okay. For drift, let's say, think of me as a model, and you expect me to answer questions as, let's say, a math tutoring model. You ask me all math questions and I'm performing very nicely; I give the right answers for you. Suddenly you ask me a very different question, something related to science. I might have some context about science in my training data, so now I could hallucinate and give you science-related responses, but I am not trained on science. I'm still giving you responses to those science requests; I could hallucinate, and my responses are changing. So when I see drift, there is a change in the requests: the questions used to be in the math format, and now you're also seeing science-related ones. Tomorrow you'll see social-studies-related questions, and the day after something totally different, like art-based questions. There is a change in the concept of your requests. That's one type of drift. In the same way, you will see drift in the responses as well. So it's really important to keep a tab on this and understand where the drift is happening.

[00:23:07] Joe Colantonio Yeah. Thoughts were drifting, I love it.

[00:23:10] LakshmiThejaswi Narasannagari Yeah, so you notice that the drift is happening in the requests, and you quickly retrain your model with more training data to make sure the model has enough knowledge to give you the right responses.
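One common way to make this kind of input-drift check concrete is a two-sample statistical test on a feature's distribution. Here is a minimal sketch using SciPy's Kolmogorov-Smirnov test; the synthetic data, window sizes, and the 0.05 significance threshold are illustrative choices, not a specific tool's defaults.

```python
# Minimal sketch of input-drift detection with a two-sample KS test (SciPy),
# comparing one feature's training distribution with recent live traffic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # reference window
live_feature = rng.normal(loc=0.6, scale=1.0, size=1000)      # recent requests, shifted

stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.05:
    print(f"Drift detected (KS={stat:.3f}, p={p_value:.4f}); consider retraining")
else:
    print("No significant drift in this feature window")
```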

[00:23:24] Joe Colantonio Gotcha! Would a guardrail be like, hey, I'm a math model, and you have a canned response like, I'm not trained on this, I'm not even going to answer it? Or is that a bad way?

[00:23:33] LakshmiThejaswi Narasannagari It's interesting that you asked this question. The guardrails in this case for a math model depend on whether it's an LLM-based model or a predictive analytics model, but the scenario sounds more like an LLM model. I would put checks in three or four places. For example, one way could be an if condition: if the incoming questions and requests are not related to computational questions, then respond with, I don't have enough context to answer your question, I would need more context. You make sure the model responds that way, right? Or you keep a tab on your outputs; it depends entirely on how you have set up the model. If there's a response score you're keeping a tab on, and the model starts giving responses about science, the score would come down for a math-based model. If it's seeing arts-related or science-related requests coming in and it doesn't have good responses, the accuracy score would come down. You put a guardrail on that accuracy score, and if it goes below 80% or 90%, an alert fires; how the alerts are set up depends entirely on your pipeline. The alert would kick in, or if it's a monitoring-based tool, you'd have an anomaly detected on your output layer. Then you quickly look at the details of why the score went down and what the requests were. You do some reverse engineering, a reverse lookup, to see why the score has gone down and how you fix it.
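Here is a minimal sketch of the output-layer guardrail described above: track a rolling response-quality score and raise an alert when it drops below a threshold. The scoring source, window size, threshold, and the send_alert hook are hypothetical placeholders rather than a specific monitoring tool's API.

```python
# Minimal sketch of an output-layer guardrail: rolling score with an alert threshold.
# The score source and send_alert hook are hypothetical placeholders.
from collections import deque

SCORE_THRESHOLD = 0.80
window = deque(maxlen=500)                 # rolling window of recent response scores

def send_alert(message: str) -> None:
    # Placeholder: in practice this would page on-call or post to a monitoring tool.
    print("ALERT:", message)

def record_response(score: float) -> None:
    """Record one response score and alert if the rolling average degrades."""
    window.append(score)
    rolling_avg = sum(window) / len(window)
    if len(window) == window.maxlen and rolling_avg < SCORE_THRESHOLD:
        send_alert(f"Model quality dropped: rolling average {rolling_avg:.2f}")
```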

[00:25:24] Joe Colantonio Alright, before we go, is there one piece of actionable advice you can give to someone to help them with their ML, AI, DevOps, or automation testing efforts, and what's the best way to find or contact you?

[00:25:34] LakshmiThejaswi Narasannagari I'm very active on LinkedIn. I also do Topmate sessions, if anybody is interested in transitioning their career into machine learning and AI. My Topmate sessions are free; I do free 30-minute sessions. And I have good resources on how to transition your career. I would be happy to send that link as well.

[00:25:58] Thanks again for your automation awesomeness. The links to everything we covered in this episode are at testguild.com/a538. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe. My mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.

[00:26:33] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com, where you become part of our elite circle driving innovation in software testing and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real-world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:27:16] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
Hugo Santos TestGuild DevOps Toolchain

How Google Improved Developer Experience with Hugo Santos

Posted on 03/12/2025

About this DevOps Toolchain Episode: Today, listen in and discover how Google transformed ...

Test-Guild-News-Show-Automation-DevOps

Autonomous Ethical Hacking, New Playwright, Robot Framework Certification TGNS150

Posted on 03/10/2025

About This Episode: Have you seen the new features of Playwright? What automation ...

Dalton Alexandre TestGuild Automation Feature

Swift Driver for Appium Automation Testing with Dalton Alexandre

Posted on 03/09/2025

About This Episode: In this episode, Joe Colantonio sits down with Dalton Alexandre, ...