AI-Powered Predictive Autoscaling for Kubernetes with Jennifer Rahmani

By Test Guild

About This Episode:

In this episode of the TestGuild DevOps Toolchain Podcast, host Joe Colantonio sits down with Jennifer Rahmani, Co-founder and COO of Thoras.ai, a company redefining how infrastructure scales with AI-driven predictive technology. Drawing from her years as a DevOps engineer in the defense tech sector, Jennifer shares how she and her twin sister turned real-world frustrations into a reliability-first platform that eliminates the guesswork from scaling.

 

We discuss how Thoras.ai integrates with Kubernetes to predict workload demand minutes—or even hours—in advance, allowing teams to maintain high availability without overspending. Jennifer explains why they use the right AI for the right use case, how their predictive autoscaling works in multi-cloud and hybrid environments, and how it helps SREs avoid downtime during unpredictable events like Black Friday or major product launches.

Whether you’re dealing with noisy data, high cloud bills, or sleepless nights worrying about reliability, this episode delivers practical insights for making smarter scaling decisions.

 

Listen in to learn:

Why reactive scaling is broken and how predictive autoscaling fixes it

The advantages of using machine learning (not just LLMs) for scaling decisions

Real-world SRE pain points Thoras.ai solves

How to balance cost savings with reliability in modern infrastructure

Exclusive Sponsor

SmartBear Insight Hub: Get real-time data on real-user experiences – really.

Latency is the silent killer of apps. It’s frustrating for the user, and under the radar for you. Plus, it’s easily overlooked by standard error monitoring alone.

Insight Hub gives you the frontend to backend visibility you need to detect and report your app’s performance in real time. Rapidly identify lags, get the context to fix them, and deliver great customer experiences.

Try out Insight Hub free for 14 days now: https://testguild.me/insighthub. No credit card required.

About Jennifer Rahmani

Photo: Jennifer Rahmani, Co-founder and COO of Thoras.ai.

Jennifer Rahmani is the Co-founder and COO of Thoras.ai, a company redefining how infrastructure scales with AI-driven predictive technology. Thoras is on a mission to eliminate the reactive firefighting that plagues SRE and DevOps teams by enabling systems to anticipate and adapt before issues arise.

Prior to founding Thoras, Jennifer spent a decade as a DevOps engineer in the Defense Tech space, where she specialized in architecting resilient cloud infrastructure and large-scale monitoring systems for mission-critical environments.

Connect with Jennifer Rahmani

Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.

[00:00:00] Get ready to discover some of the most actionable DevOps techniques and tooling, including performance and reliability for some of the world's smartest engineers. Hey, I'm Joe Colantonio, host of the DevOps Toolchain Podcast and my goal is to help you create DevOps toolchain awesomeness.

[00:00:19] Have you ever been jolted awake at 2 a.m. by a scaling emergency? If so, you're not alone. What do you do to avoid this? Well, today you're in luck, because joining us in this episode is Jennifer Rahmani, co-founder and COO of Thoras.ai, a company redefining how infrastructure scales with AI-driven predictive technology. Drawing from her years as a DevOps engineer in the defense tech sector, Jennifer shares how she and her twin sister turned real-world frustrations into a reliability-first platform that eliminates the guesswork from scaling. We discuss how Thoras.ai integrates with Kubernetes to predict workload demand minutes or even hours in advance, allowing teams to maintain high availability without overspending. Jennifer also explains why they use the right AI for the right use case, how their predictive autoscaling works in multi-cloud and hybrid environments, and how it helps SREs avoid downtime during unpredictable events like Black Friday or major product launches. Whether you're dealing with noisy data, high cloud bills, or sleepless nights worrying about reliability, this episode will help you learn how to make smarter scaling decisions. You don't want to miss it. Check it out.

[00:01:34] Hey, before we get into this episode, I want to quickly talk about the silent killer of most DevOps efforts: poor user experience. If your app is slow, it's worse than your typical bug. It's frustrating. And in my experience, and that of many others I've talked to on this podcast, frustrated users don't last long. But since slow performance is subtle, it's hard for standard error monitoring tools to catch. That's why I really dig SmartBear's Insight Hub. It's an all-in-one observability solution that offers front-end performance monitoring and distributed tracing. Your developers can easily detect, fix, and prevent performance bottlenecks before they affect your users. Sounds cool, right? Don't rely anymore on frustrated user feedback. As I always say, try it for yourself. Go to smartbear.com or use our special link down below and try it for free. No credit card required.

[00:02:31] Joe Colantonio Hey Jennifer, welcome to The Guild.

[00:02:35] Jennifer Rahmani Joe, thank you for having me.

[00:02:37] Joe Colantonio Excited about this interview. I saw an article on your company and your founding, and I thought, wow, this would be a great episode. Before we get into the meat of the episode, I'd like to learn about the guest. Maybe how you got into tech, how you got into DevOps and SRE?

[00:02:51] Jennifer Rahmani Yeah, I can definitely go into it. Before Thoras, I spent about 9 years as a DevOps engineer, and a lot of my focus was on deploying monitoring solutions for the defense tech world. During that time, I experienced a lot of my own frustrations with the tools that I had firsthand. I just found that my job required a lot of guesswork and intuition. I would have to set up scaling policies for infrastructure, figure out what the thresholds for alerts should be. And I just thought we were very reactive in the way we were doing all of this, using a lot of guesswork. Working with so much data, it became easier to just throw money at the problem. We would monitor and ingest anything and everything. We would over-provision and keep extra compute at hand. I also started to see the industry change, where cost started becoming very important, especially with the rise of GPUs. We had to start being more mindful of cost, but also make sure that performance and reliability weren't impacted. I thought to myself, there has to be a better, more proactive, less firefighting way to make data-driven decisions about how to scale and manage infrastructure without a trade-off between being more performant and reliable and being more cost-efficient. I teamed up with my co-founder and we started Thoras, and the whole idea was: let's develop this reliability-first, machine-learning-driven platform to empower engineers to run their infrastructure more efficiently, eliminate waste, better safeguard their systems, plan for growth as their environments evolve, and prevent downtime.

[00:04:30] Joe Colantonio Love it, love it. I guess there's a little more to your co-founder. Not only is she your sister, I believe she's your twin sister. I have three older sisters, and I can't imagine going into business with them, even if they were twins. I think it'd be even more difficult. Why her as your co-founder? Did it just happen by chance, or is this something you've always planned?

[00:04:47] Jennifer Rahmani Not anything we'd ever planned. It's actually pretty funny, Joe. We ended up working in similar fields, kind of accidentally. She was a site reliability engineer. She also worked in the defense tech world, but she transitioned more towards commercial. It's funny, dinner table conversation would become about work, and a lot of the problems that we were seeing. Venting to your family is very common, I think, during family dinner. And one thing we started to see is that we were seeing a lot of the same frustrations. We're both very much engineers, never really expected to become founders, never thought we would. But we realized that she and I have very complementary skills. We have a lot of similar experience, but in different domains. Mine was more towards monitoring, hers was more towards the infrastructure side. And she also has a bit of an ML background as well. And we decided, why not do this? Nothing like this exists, so let's build the tool that we wish we had.

[00:05:45] Joe Colantonio Love it, love it. I also would be scared. I mean, I got laid off and I just started my own thing. I didn't have a good gig, and I just started my own business. What drove you to that point, though? Was it an aha moment? I know you mentioned a bunch of different issues you had at work, but was there no tooling that existed that addressed all these issues you had?

[00:06:05] Jennifer Rahmani Yeah, no, it's a really good point. I think it came naturally from the pain points and the way that the industry was going. One thing, again, was the frustration of just firefighting, always having to be on call. That becomes very tiresome. A lot of weight gets put on the reliability engineer. It's very high stakes. And once outages happen, we're digging through our monitoring tools, we're trying to figure out the cause, and we're realizing that for a lot of the issues we're finding, maybe there was a more proactive way to get ahead of them. Naturally, as SREs, we love to find things to automate. A couple of other things contributed. One was the rise of Kubernetes; that ended up being our first wedge of focus. Kubernetes is a fantastic technology, but Joe, I'm sure you know it. It's very complicated in its own way, and you have to really work to make it more proactive. The way Kubernetes works is you have your real-time scalers, which are very reactive. And often you're kind of eyeballing your metrics and your data from your observability tools and reacting. As engineers, Nilo, my co-founder, and I were updating our scaling policies very manually. When we would get hit with a traffic surge, we would go in and quickly update the policy. And the problem with that is that applications often have long startup times. There would be latency already. The other problem, too, especially if you're starting to use GPUs or more compute-intensive servers, is you have to be able to go ahead and quickly get that compute. And oftentimes you try to get the GPU, wait for it to initialize, wait for your applications to load up. It's just too reactive. And we realized that, with the rise of AI, even though we're using more traditional machine learning for our tech, there is now more of an appetite for engineers to use AI in general.

We very much believe in using the right AI for the right use case, and that was one thing that I think helped with this. Okay, now is the right time: AI adoption is on the rise, the problem is getting worse, data is becoming more noisy, environments are becoming more compute-intensive. I think now is the time to do it. Who else is going to build it if we don't, right?

[00:08:27] Joe Colantonio Absolutely. You made a good point about the rise of AI, but like you mentioned, you're using machine learning. Does that kind of, I don't want to say taint, but give people the wrong impression, with everyone jumping on the bandwagon? Because it sounds like this is a great problem that machine learning was almost made for.

[00:08:45] Jennifer Rahmani Yeah, no, you're absolutely right. One thing we set out to do with Thoras was have this reliability-first platform that uses the right AI for the right use case, right? And the way we envision it is, we're starting with machine learning because it's a safe, easy way for infrastructure engineers to adopt AI, because this is very new for many of them. And we realized that there are different types of AI: there are LLMs, and you have traditional machine learning. And for the use case, it just makes sense to figure out what type of AI to use. For example, there's a lot of buzz right now around using large language models for parts of the SRE workflow. I think they're very good and useful for postmortems, going through logs, natural language. But for the type of scaling decisions we're focusing on with Thoras, it's more numerical, statistical data, and it's actually better to use more lightweight, performant machine learning models rather than an LLM. Why use an airplane when you can drive a car, right? Especially for a shorter distance, if it's cheaper and very efficient.
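To make the "lightweight model" point concrete, here is a minimal sketch of the kind of classical statistical forecaster that fits numerical scaling data: Holt's double exponential smoothing in plain Python. This is purely illustrative; it is not Thoras's actual model, and the smoothing parameters are arbitrary assumptions.

```python
# Illustrative only: a lightweight statistical forecaster of the kind
# suited to numerical scaling metrics (NOT Thoras's actual model).
def holt_forecast(series, alpha=0.5, beta=0.3, steps=3):
    """Holt's linear (double) exponential smoothing.

    Tracks a smoothed level and trend of a metric such as CPU usage
    or requests/sec, then extrapolates `steps` intervals ahead.
    """
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        last_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    return level + steps * trend

# Steadily rising load: the forecast continues the upward trend,
# so a scaler could add capacity before the demand arrives.
cpu = [40, 42, 45, 47, 50, 53, 55, 58]
print(round(holt_forecast(cpu, steps=3), 1))
```

A model like this trains in microseconds on a laptop, which is the "car versus airplane" trade-off in practice: per-workload forecasts can be refreshed constantly at negligible cost.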

[00:09:50] Joe Colantonio Weird question: why the name Thoras? I know Thor is the god of thunder. I don't know if Thoras is a derivative of that?

[00:09:57] Jennifer Rahmani Yeah, it actually is. So you got it right. It is thunder, but the ending makes it plural, and it's also the feminine version.

[00:10:07] Joe Colantonio Oh cool.

[00:10:08] Jennifer Rahmani Yeah, two female co-founders, so Thoras. And the whole idea is that we're building this platform that helps give the right actionable insights, which is very game-changing in the way that we manage infrastructure today.

[00:10:20] Joe Colantonio Love it, love it. Now, you've also gotten some investment as well. What's the pitch you've given that resonates with people looking to invest in companies? Do they know Kubernetes is a pain, and therefore this is a no-brainer? What was the hook that got people saying, oh, I want to be a part of this?

[00:10:39] Jennifer Rahmani Yeah, I think one big part of it, Joe, is the fact that Nilo and I have lived and breathed this on our own. We've worked firsthand for 9 years in the space. We've dealt with the frustrations. We know the pain points. I think that's very appealing for investors, but at the same time, we try not to introduce too much of our own bias. When we raised our first round, we were very big on talking to customers about their pain points and figuring out who the ICP is that we want to go for. And we started to realize that our customers are a group that cares a lot about cost savings. They don't want to overspend anymore. They also care about reliability. They want to make sure that their high availability and performance are never impacted; they want those to get better, if anything. They're sick of that trade-off. They don't always want to be on call for issues that could have been prevented in the first place. And they're also really curious about AI. But we also find, Joe, that our investors and our customers are a very AI-savvy group. With the AI hype that's going on, they know that, yes, AI is very promising, but let's make sure that the folks pioneering these solutions are actually using the right ML for the right use cases. I think it was that we have the expertise, we're very educated about the types of customers, we do a lot of our research, and also we're very big on using the right type of AI.

[00:12:05] Joe Colantonio Absolutely. As you mentioned, you both have deep expertise in SRE, so when you speak with customers, they can probably tell it's not BS. But you must have had a notion of what you wanted to create. Did it turn out to be different from what you expected once you went live and started talking to customers? Like, hey, they're actually using it for things that we didn't think about, or this really helps with an issue we weren't anticipating?

[00:12:28] Jennifer Rahmani Yeah, no, that's a really good question. When we first got into it, what was really interesting is, we knew we wanted to build for Kubernetes. To give you an idea of what we do with Thoras: we built a predictive autoscaler that integrates into your infrastructure, and what it'll essentially do is use multi-signal intelligence. It combines infrastructure metrics with real-world signals that tie into your infrastructure and better signal when you need to scale. So what we do is forecast what workload demand is going to be in 5 minutes, an hour, 6 hours, whatever, and we use that to scale your infrastructure in advance. We'll also take a look at real time to make sure you're covered in the meantime, if it's an unprecedented spike that our models haven't seen before. And we knew the industry didn't have this. The way scaling works today is very real-time, very reactive. What that causes engineering teams to do, and what we had to do, was operate at a low level of utilization. Most of our customers, before they start using us, are at 40 to 50% utilization. That means you're leaving 50 to 60% of your servers unused, sitting at hand, just in case. And we discovered that with this technology, if we can anticipate demand, we can bring them to 85% utilization or more, and we can also make sure that we scale before the usage spikes. We created this for that Black Friday use case. You get the unexpected product launch or the unexpected spike in traffic, and you don't really quite know what that's going to look like. So usually you over-provision, but you can still get it wrong. The whole idea with Thoras, right, is we can ingest whatever metrics are meaningful, help predict and get ahead, and make sure you have just enough capacity to stay highly available without massively over-provisioning.

That part we got right. I think what we didn't realize is how widely applicable the technology is. We got into it for customers that have very variable traffic patterns, but we also started getting customers who are just massively adopting Kubernetes. They want to be able to manage that with less overhead, less complexity. But also, sometimes they just want to stop eyeballing the dashboards and figuring out how to scale. They want better developer velocity, to push out code and not have to worry as much about that infrastructure piece and the scaling underneath.
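The utilization gain Jennifer describes can be sketched as back-of-the-envelope math: for the same forecast demand, running pods hotter needs far fewer of them. The function name and all numbers here are illustrative assumptions, not Thoras's algorithm.

```python
import math

# Hypothetical sizing math for the utilization gains described above.
def replicas_needed(forecast_demand_mcores, pod_capacity_mcores, target_util):
    """Pods needed so that each runs at `target_util` of its capacity."""
    return math.ceil(forecast_demand_mcores / (pod_capacity_mcores * target_util))

demand = 9000  # forecast demand: 9000 millicores of CPU
pod = 1000     # each pod requests 1 CPU (1000 millicores)

# Reactive scaling forces headroom (~45% utilization, "just in case");
# a trustworthy forecast lets you run at ~85%.
reactive = replicas_needed(demand, pod, 0.45)
predictive = replicas_needed(demand, pod, 0.85)
print(reactive, predictive)  # 20 vs 11 pods for the same demand
```

Same workload, roughly half the fleet, which is where the cost-savings-without-reliability-trade-off claim comes from: the forecast, not extra idle capacity, absorbs the spike.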

[00:14:59] Joe Colantonio Very cool. So I started my career as a performance testing engineer. I had to put on a load in staging to try to represent what I thought was going to happen in production. It was never the same, and it was always difficult. It sounds like this could almost fill that gap. Does it listen to your traffic and tell you, hey, with the forecasting, you should probably anticipate a load of XYZ in the month of blah, blah, blah? Or how close does it get to giving you leading indicators without you having to jump in? I don't know if that makes sense, but.

[00:15:30] Jennifer Rahmani Yeah, no, it's a very good question. How it works, right, is it integrates directly into your Kubernetes metrics. It installs in minutes via a single Helm chart command. And we run air-gapped today, so no data comes out of your environment, which is really fantastic, especially for our customers in more regulated spaces. We'll basically ingest whatever metrics you have, as long as you have data in your environment that is helpful for determining when to scale, whether it's in any of the major observability tools or somewhere in your cloud environment. We can grab that data, ingest it, and train our models to forecast what the pattern is going to be in an hour, 6 hours, whatever it's going to be. So we'll give you recommendations for right-sizing your pods, but also for scaling. You can run us in recommendation mode, but typically most of our customers will turn on autonomous mode, which means we will go ahead and take our metrics and our predictions, take a look at real time as well, and just start scaling your infrastructure as a result. What that does is actually very powerful. For one, it helps as you're pushing your code through to production; a lot of times you have very unknown situations. Think of the CrowdStrike outage, where someone pushed out a change, they weren't sure how it was going to affect production, and it led to a big outage. A situation like that could actually be prevented, because we can start simulating a lot of those changes in the lower environments and figure out what the impact is going to be, making sure that you're fully covered against the unexpected and the unknown as you're pushing to and running in production.
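The "predict ahead, but also watch real time" behavior she describes could be combined in a control loop that scales to whichever signal asks for more replicas. This is a hedged sketch with hypothetical names and thresholds, not Thoras's actual control loop.

```python
import math

# Sketch: blend a forecast with an HPA-style reactive rule so that
# unprecedented spikes (ones the model has never seen) stay covered.
def desired_replicas(predicted, current_util, current_replicas,
                     target_util=0.85, min_r=2, max_r=100):
    # Classic reactive rule: what a real-time scaler would do alone.
    reactive = math.ceil(current_replicas * current_util / target_util)
    # Never scale below what real-time load demands: the forecast
    # handles known patterns, the reactive term handles surprises.
    return max(min_r, min(max_r, max(predicted, reactive)))

# Forecast says 12 pods, but an unprecedented spike has 10 pods at
# 120% of capacity: the reactive term wins and asks for 15.
print(desired_replicas(predicted=12, current_util=1.2, current_replicas=10))  # → 15
```

Taking the max of the two signals is a simple way to get the benefit of prediction without ever being less safe than a purely reactive scaler.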

[00:17:13] Joe Colantonio Nice. Here's a dumb use case, very small. I run webinars, and we're using a new system. Sometimes when you run a webinar, you get like 200 concurrent users trying to log in at the same time. Using this technology, would you be able to say beforehand, hey, I'm having a webinar on this date, I anticipate this? Rather than have it guess, let it know: hey, this is something you should be looking at.

[00:17:35] Jennifer Rahmani Yeah, that's a very good question. Exactly, and that's the premise of it, right? You have a webinar, in this example, and you kind of have an idea of how many users you're going to have, but not really; you could be completely off. Typically, how this works without a solution like Thoras is you just guess and hope for the best. You think, I'm hopefully going to have this many users, so let's just provision it that way. And as you see more users logging on, you quickly make changes to be able to scale. But the problem is, if you get it wrong and you have to really scale your webinar, users are probably having issues getting on there and experiencing latency. With Thoras, since we're integrating with the metrics that are meaningful for determining how much traffic is going to hit that webinar, we'll actually anticipate it in advance. The whole idea is all your users can get on there no matter what the scale ends up being. We're reacting very quickly to manage that scaling, but you're also not overspending just in case and keeping that extra compute.

[00:18:40] Joe Colantonio And obviously it helps that it shuts down resources that it knows are not in use. So you don't have to always look, I guess, or have set times where you guess, like, nighttime is not going to be a peak. When you don't necessarily know, this automatically does that for you.

[00:18:55] Jennifer Rahmani Yep, that's absolutely correct. When you deploy us, we start learning the workload pattern, and we'll take care of that scaling decision. So we'll scale you up and increase your pod sizes in advance when you're expected to get more traffic, but we'll also scale you down and shut things down during those periods of low traffic. Think of Thoras as always having that eye out, so that you can sleep soundly just in case you get more traffic at an unexpected hour. We're also reacting very quickly and learning what those patterns are as they evolve over time.

[00:19:34] Joe Colantonio You did mention Black Friday. It always amazes me how many times huge companies get this wrong, and it's not too far away. Definitely, using a solution like this will help them out. Do you find that's an easy sell, that example, when you go to a company and say, hey, Black Friday is coming? Or do people not get it, and is it still a hard education you need to do?

[00:19:54] Jennifer Rahmani So the Black Friday use case: I like using it, Joe, because it's a very applicable, very understandable use case. Everyone pretty much knows what Black Friday is, so I think it really does help demonstrate. Now, I use it as a starting point because not everyone is affected by Black Friday, but it does help paint the analogy. And this is one thing I learned from working on Thoras as we continue to mature the platform for more customers: that Black Friday use case ends up lending itself to any unexpected event. And if you think about it, what is an unexpected event? Typically, as your environments change, as more code gets pushed out, your applications get heavier, your traffic gets heavier, you do product launches. All of that is similar to the Black Friday use case in the sense that, on a smaller or different scale, you don't know what your traffic patterns are going to be. You don't know what your scaling should look like. So the whole idea is that whenever your environments are changing in some unknown pattern, Thoras is able to better insulate you and take care of that scaling efficiently, but also more reliably, so you're covered no matter what.

[00:21:10] Joe Colantonio This may not be a use case, but can it be used for security? Say someone's trying to take down your website. Will it notify you, hey, you're getting some really odd behavior here based on the forecast, you might want your security team to jump in, or something like that?

[00:21:26] Jennifer Rahmani That is one piece, too. We believe in helping to take care of the pieces of automating infrastructure and scaling that engineers shouldn't have to manage, but also giving engineers the right alerts about what they need to know, so they can go ahead, take action, and be informed. So that is exactly how it works. We'll take care of the scaling, but we'll also let you know about those anomalies and weird types of issues in your environment: hey, you should take a look at this, here's where it's going on, and here's what else it might be affecting. But yeah, that's a great question. That's actually a very realistic use case that we help with.

[00:22:06] Joe Colantonio Very cool. I also know sometimes companies are really cost-conscious. Does this give you something like, hey, you saved X amount of dollars using Thoras this month, or anything like that?

[00:22:16] Jennifer Rahmani Yeah. When customers deploy us, we made sure that we baked the cost savings and ROI in upfront. We're constantly telling customers how much they're saving using Thoras, and how much they could save using Thoras. We're also giving them more of an idea, as they continue to grow, to help them plan for capacity better. Even though we take a reliability-first approach, and we believe reliability is the number one KPI, we know that cost is very important, so that is something we make sure is very clear in our product.

[00:22:53] Joe Colantonio Who's the perfect use case for this product? Is it, say, someone with a greenfield application? Or are you better off if you're already in production, because then you can work off real data and historic data? Or does it matter?

[00:23:08] Jennifer Rahmani We have customers across a whole broad range of industries. We have e-commerce, we have B2B, we have cybersecurity, ad tech, health tech. We find that as long as you have data in the cloud and you're using Kubernetes, we can help scale your applications better. We do see a whole range of use cases. There are a lot of migrations today, and we actually help really well with that too, because if you're migrating to Kubernetes or migrating between cloud platforms, it really does help to have a platform that looks at your evolving traffic patterns and workload needs and informs you with the right insights to make those decisions. It's very open-ended, right? We can support any use case, anyone that's running these workloads in the cloud. But it is interesting that we're seeing a lot of migrations, and we're able to help with those as well.

[00:23:55] Joe Colantonio Very cool. Besides migration, I know a lot of times companies are doing a multi-cloud type of environment. Does this work with that type of setup?

[00:24:04] Jennifer Rahmani It does, yeah. In fact, many of our customers are running these larger environments. They're multi-cloud, they're hybrid. And I think that is part of the reason they need a solution like this: there's just so much complexity, so much overhead. There are different engineering teams involved, and they really need a shared tool to help efficiently manage all of this.

[00:24:24] Joe Colantonio Engineers tend to be somewhat skeptical, in my experience. Do many feel confident using that autoscale feature without a human kind of in the loop?

[00:24:35] Jennifer Rahmani It is interesting, right? We do take a human-in-the-loop approach. We've made sure we champion a very safe, low-risk adoption path for Thoras. When you deploy us, you can run us in recommendation mode, where we just give you recommendations. That's something we built into the product, but we've actually found that all of our customers, and anyone that tries this out, switch to running us autonomously, because they immediately see everything upfront about the scaling decisions and the potential savings. That was one thing I was wondering when we were going into this: we know the appetite is there, but how quickly is this going to happen? And I was very pleasantly surprised to see that not only are we being put in production environments, where we're more useful because there are more variable traffic patterns, but we're getting put across all these different environments, and engineers are actually trying autonomous mode. I think part of it is there's trust in the platform, that we're not just a cost savings tool. Unlike many of the other tools out there that just scale based on cost, on what's the cheapest decision, we're taking more of a reliability-first engineering approach: making smarter scaling decisions that are more proactive, that make sure you're covered against unexpected events, but that also allow you to run at higher utilization. And I think that reliability-first method is what enables so much trust in the platform.

[00:26:01] Joe Colantonio And speaking of that, a lot of SREs really want to be hands-on, I think. A lot of solutions are kind of a black box. It's like, ooh, we're doing this magic here, you can't look at it. And sometimes people just want to look at it or be able to tweak it. Is that even an option, or something that you think about?

[00:26:18] Jennifer Rahmani I love that you used the term black box, because we always say this, especially when we're talking to customers. Especially with AI tools, there are always questions about what type of AI you're using, different questions about whether there's a black box and how all of this works. I'm a big believer that if anyone ever tells you it's a black box, we can't tell you anything, it's completely proprietary, it just takes care of everything behind the scenes, don't worry about it, then that's a red flag, right? I think things should be more visible. And that's one thing we've gotten really good at balancing in our platform with our customers. There's a balance between taking care of the right tasks, like scaling, and automating that low-hanging fruit that is very critical, while at the same time giving engineers the right insights about what to look at and what to worry about. And I think striking that balance is very, very important, especially for the human-in-the-loop, safe AI adoption that is needed for these tools today.

[00:27:16] Joe Colantonio Nice. I guess another question I would have is, a lot of companies I speak to have kind of cultural issues, or need certain things in place, in order to be successful with a given technology. Does a company need to have some basics in place in order to be successful with your product? Do they need OpenTelemetry? Does that help? Anything like that?

[00:27:37] Jennifer Rahmani The way we've built this is we've made it pretty easy to grab any data from anywhere. So we support all the major observability tools. We have integrations with all of them, and we can grab data from there. But even if you have data that's important that's not in one of those tools, we have a way of ingesting it. I think right now, with how much the platform has matured, as long as you have some data somewhere in your environment that is helpful for making these scaling decisions, we can work off of that. It is pretty open-ended in terms of what we can support. Today, we support Kubernetes. Of course, we would love to go beyond that. We do find that as long as you have the data, we can take care of it and help.
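The "grab data from anywhere" idea usually comes down to normalizing source-specific payloads into one common shape before the models see them. A minimal sketch, with the `"custom"` payload format invented for the example (the Prometheus range-query shape of `[[timestamp, "value"], ...]` pairs is real, but this adapter is not Thoras's implementation):

```python
def normalize(source: str, payload: dict) -> list:
    """Convert a source-specific payload into (timestamp, value) pairs."""
    if source == "prometheus":
        # Prometheus range queries return values as [[ts, "value"], ...]
        return [(ts, float(v)) for ts, v in payload["values"]]
    if source == "custom":
        # Hypothetical in-house format: a list of {"t": ..., "v": ...}
        return [(p["t"], p["v"]) for p in payload["points"]]
    raise ValueError(f"unsupported source: {source}")

series = normalize("prometheus", {"values": [[1700000000, "0.42"]]})
print(series)  # [(1700000000, 0.42)]
```

Once everything is in one shape, the scaling logic doesn't care which observability tool, or homegrown pipeline, the data came from.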

[00:28:26] Joe Colantonio Also, I know a lot of times SREs complain about a lot of false positives. How do you avoid false positives, or maybe even overscaling because of a prediction error?

[00:28:38] Jennifer Rahmani Yeah, that's a great question. This is another reason, by the way, that we're using traditional machine learning: statistical data like this is very well-suited for it. But it's funny, we sometimes get questions, right, like some folks ask about LLMs. There's a concern about hallucinations. Luckily, that is not a concern here, because we're using more traditional machine learning. But of course, there are always concerns about other types of issues, like false positives. With the way that we've built the product, you can see how well our predictions perform, and I think that visibility really helps. And again, these models are constantly training, learning your patterns, your workload needs. And I think that's also what's really helping them get better, especially as they see more use cases.
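The "traditional ML with visible accuracy" approach can be illustrated with a toy version: forecast near-term demand with a simple least-squares fit over recent samples, and keep a running record of prediction error so operators can judge the model instead of trusting a black box. This is a minimal sketch, not Thoras's actual model or metrics.

```python
def linear_forecast(history: list, steps_ahead: int = 1) -> float:
    """Least-squares line through recent samples, extrapolated forward."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(range(n), history)) / denom
    return y_mean + slope * ((n - 1 + steps_ahead) - x_mean)

errors: list = []

def record_accuracy(predicted: float, actual: float) -> float:
    """Log absolute percentage error so accuracy stays visible."""
    errors.append(abs(predicted - actual) / actual)
    return sum(errors) / len(errors)  # mean APE so far

demand = [100, 110, 120, 130]        # steadily rising requests/sec
pred = linear_forecast(demand)        # extrapolates the trend to 140.0
print(record_accuracy(pred, 138))     # small error, kept for review
```

Because every prediction is later compared against what actually happened, a drifting or overscaling model shows up in the error history rather than hiding behind its decisions.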

[00:29:27] Joe Colantonio Anything on your roadmap that you're excited about that you could share?

[00:29:30] Jennifer Rahmani Yeah, absolutely. So at its core today, as a wedge product, we're helping engineering teams scale smarter. But our vision actually goes beyond autoscaling. We wanna help engineers not only handle unpredictable traffic with ease, but also better plan for pushing out changes to production, making sure they're always prepared for these big moments, right? Like product launches, viral growth, Black Friday, all these different types of surges, but without scrambling, being reactive, and overspending. So ultimately the goal is to give developers and engineers the confidence and the data-driven decisions to move faster without breaking things, especially in today's day and age, where you have different tools that help you write code quicker and test better. We wanna make sure that we're that layer that helps Ops engineers, gives them the right insights and automation to help run their infrastructure more efficiently, so that as these applications are pushed out quicker, they're able to run more safely in production, and you're better balancing performance, cost, and reliability. And I think to do that goes beyond autoscaling. It's really essentially building different features and products that help utilize all the data that we have today and give engineers the right insights, so that they're always involved, human in the loop, to be able to run more efficiently and be more highly available.

[00:30:59] Joe Colantonio All right. So you were a hands-on SRE for many years. Is there anything where you thought, I wish I had this product back then? When you got a call and you were like, oh, if I only had this insight, I would have solved this issue beforehand?

[00:31:14] Jennifer Rahmani Absolutely. I think having been on call, there are so many incidents I can think of in the past where it was just such a wild goose chase to figure out what was the originating issue and what were the symptoms. And I believe had I had a tool like Thoras, I would have been able to take a look at the right insights, and I would've been alerted to the anomalous patterns that were introduced by any changes made in the environment. For example, I think once there was an incident where a developer pushed out an image that started consuming more memory, and that affected scaling and had some other downstream impacts. And it took a while to figure that out. But had I had a tool like Thoras, not only would it have scaled to make sure there were no performance impacts, but it also would have alerted me that, hey, this workload is affected, there was a change that was introduced that did this. It just would have been a lot quicker. And mean time to response is a very important metric. It's very critical that we cut that down in the SRE space, and that would have been much lower, I think, with a tool like Thoras.
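The kind of alert Jennifer describes, flagging a workload whose memory usage shifts abnormally after a deploy, can be sketched with a simple z-score against the pre-change baseline. The threshold and the statistic here are illustrative choices for the example, not Thoras's actual detection method.

```python
import statistics

def is_anomalous(baseline: list, current: float,
                 z_threshold: float = 3.0) -> bool:
    """True if `current` deviates strongly from the baseline samples."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(current - mean) / stdev > z_threshold

# Memory (MiB) before the deploy, then samples after the new image.
before = [200, 205, 198, 202, 201, 199]
print(is_anomalous(before, 340))  # True: the new image uses far more
print(is_anomalous(before, 203))  # False: within normal variation
```

Tying the anomaly back to the change that introduced it is what cuts the wild goose chase short: the alert points at the deploy, not just at the symptom.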

[00:32:33] Joe Colantonio All right, Jennifer, before we go, is there one piece of actionable advice you can give to someone to help them with their DevOps and SRE efforts? And what's the best way to find you, contact you, or learn more about Thoras?

[00:32:43] Jennifer Rahmani Yeah, absolutely. One thing I always tell folks is, with all the AI out there and all the demanding work that SREs have, figure out the right workflows and ways that you can use AI, and embrace it. I think there are a lot of opportunities with the work that we do. And find the right AI for the right use case. As for finding me, feel free to check out Thoras.ai. That's our website. You'll find our docs there. You can also request a demo, or even try out our product if you'd like and see how it works firsthand. You can connect with me through LinkedIn as well. My LinkedIn is linkedin.com/in/Jennifer-Rahmani. Feel free to add me, would love to connect.

[00:33:29] And we'll have links for all this awesomeness down below.

[00:33:32] And for links to everything of value we've covered in this DevOps Toolchain show, head on over to testguild.com/p199. So that's it for this episode of the DevOps Toolchain show. I'm Joe, and my mission is to help you succeed in creating end-to-end, full-stack DevOps toolchain awesomeness. As always, test everything and keep the good. Cheers.

[00:33:54] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com, where you become part of our elite circle driving innovation in software testing and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real-world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:34:37] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.
