Proactive Observability in Testing with Anam Hira

By Test Guild

About This Episode:

In today's episode, we're diving into proactive observability and testing with our special guest, Anam Hira, cofounder of Revyl.ai. Anam, who also has experience working at Uber AI, shares an intriguing journey in which he developed "DragonCrawl," an innovative project aimed at tackling the challenges Uber faced with its end-to-end testing across multiple cities.

We explore how Dragon Crawl utilized LLMs to enhance testing reliability, making tests less flaky across varied UIs.

Anam's journey didn't stop there. He co-founded Revyl, a platform that takes testing and observability to a new level by connecting end-to-end tests with telemetry data. This approach, termed proactive observability, makes it possible to detect bugs before they hit production, saving companies significant time and cost.

Join us as we explore the principles of proactive observability, how Revyl leverages telemetry for seamless integration, and its impact on testing efficiency. Whether you're a startup or an enterprise, if you're keen to ship faster without sacrificing quality, this is an episode you won't want to miss!

Exclusive Sponsor

Sponsored by BrowserStack

1.3 billion people live with disabilities, representing $6.9 trillion in disposable income. If your apps and websites aren’t accessible, you’re missing out—and risking legal trouble.

BrowserStack makes accessibility testing easy. Their platform automates ADA & EAA compliance with the Spectra™️ Rule Engine—catching 40% more critical issues than older tools.

Scan full user journeys, test with real screen readers, and get guided “Assisted Tests” even if your team’s new to accessibility.

No setup. No maintenance. Just inclusive experiences—at scale.

👉 Start building accessible apps with BrowserStack: https://testguild.me/accessiblity

About Anam Hira

Anam Hira

Cofounder of Revyl (proactive observability); ex-Uber AI. Proactive observability, using LLMs for end-to-end testing.


Rate and Review TestGuild

Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.


[00:00:35] What is proactive observability in testing? Don't know? Well, you're in for a treat, because joining us today is Anam Hira, co-founder of Revyl, a proactive observability platform. He is also an ex-Uber AI employee, so he knows a lot of stuff. He actually caught my attention when Uber released a paper on an awesome project he worked on, which I think will give you a lot of insight into what proactive observability in testing is and how it impacts end-to-end testing. You don't want to miss it. Check it out.

[00:01:02] Joe Colantonio 16% of the world's population, that's 1.3 billion people, live with disabilities and have $6.9 trillion in disposable income. If your websites aren't accessible, you're missing out on this massive market. Beyond loss of revenue, non-compliance with ADA or EAA puts your business at legal risk. But when you build with accessibility in mind, you enhance your UX, drive sales, and expand your customer base. The challenge? Manual audits are slow, free tools often miss critical issues and lack scalability, and developers struggle with prioritization. What do you do? No worries, BrowserStack solves this. Their all-in-one platform automates ADA and EAA compliance, powered by their Spectra Rule Engine, which detects 40% more critical issues than legacy tools. Workflow Analyzer scans multiple web pages in one go, and entire user journeys, without having to rerun scans for every page state. You can also validate accessibility using screen readers such as NVDA, VoiceOver, and TalkBack on real devices with zero setup. The best part: even if your team lacks expertise in accessibility testing, features like Assisted Tests let you audit websites and web apps by answering simple step-by-step prompts. Plug and play automated accessibility checks into your workflow, with no setup and no maintenance overhead. Expand your market, save time, and build truly inclusive experiences. Learn more and try it for yourself using the special link down below. Support the show. Check it out.

[00:02:33] Joe Colantonio Hey Anam, welcome to The Guild.

[00:02:37] Anam Hira Thank you for having me, Joe.

[00:02:39] Joe Colantonio Awesome to have you. I guess before we get into it, like I said, you sent me a link to something called DragonCrawl. I think it had something to do with challenges you had with Uber's end-to-end testing, and working on that led you to develop this thing called DragonCrawl. So what is DragonCrawl?

[00:02:54] Anam Hira Yeah, we can go over a quick overview. DragonCrawl was my intern project back in 2022. The problem Uber was having was that they operate in 200 different cities, and each city has a slightly different UI. So testing it becomes impossible with normal Playwright or Appium scripts, because every time there's a change, you also have to change the code. DragonCrawl used LLMs to semantically understand the app and the tests. You'd have a test like the core trip flow, which is ordering a ride and being picked up, and it tested that using LLMs, so the UI changes wouldn't break the flow.

[00:03:31] Joe Colantonio You did this as an intern? Did I hear that correctly?

[00:03:34] Anam Hira Yeah, it was originally my intern project and then the team grew a bit and then I joined back full time as a machine learning engineer.

[00:03:42] Joe Colantonio All right, so how did you come up with this idea then? Were you just messing around or did someone suggest it to you?

[00:03:48] Anam Hira Yeah, so the original project was just to use normal testing, but I was somewhat interested in LLMs at the time. It was a new field, so I suggested it to my manager, saying, hey, this LLM thing is kind of interesting, can we do this? My manager, who was really nice, was willing to take the risk of using LLMs. We ended up doing that, and it was pretty successful: it found 11 P0 bugs, which basically means people can't order a ride. That came to something like $25 million saved, so it ended up being a pretty big project within Uber.

[00:04:19] Joe Colantonio Thanks. How did you get there, though? Like, how did you achieve such an awesome result?

[00:04:23] Anam Hira Yeah, I think it was a lot of help from my manager and my mentors. One of my mentors, Juan Marcano, was so helpful with it. Another thing was really bounding the scope. The original project was just for ordering a ride, which was the highest-impact thing at Uber, and then we built really quick proofs of concept to get it into production quickly. I think that's what helped us move really fast.

[00:04:46] Joe Colantonio What was the initial issue? The tests were flaky, and using LLMs you were able to make them more reliable. Is that how it worked?

[00:04:53] Anam Hira Yeah. The problem they were having was that the tests were really flaky, because there'd be pop-ups, or A/B tests with super slight variations in the UI that would break the test because the selector ID was different, or there'd be a pop-up in New York for some regulation. The tests were super flaky, around 50% flakiness, and they just weren't getting any signal. When engineers see no signal from a test, they just want to push to production: these tests are always red, so I'm not trusting them. DragonCrawl, by contrast, had 99% reliability, so you got a lot more signal from the tests. When a test failed, you knew it was a true failure.

[00:05:32] Joe Colantonio And you mentioned those tests were Playwright tests. Or were they Selenium tests?

[00:05:36] Anam Hira It was a combination. It was pretty fragmented at Uber. Most teams were using Appium, and some people were using Playwright for web. There was also manual testing, just because the tests were so flaky that you had to test manually.

[00:05:50] Joe Colantonio Nice. So you talked about LLMs and how they enhance testing reliability. How did you then tap into the observability part of it?

[00:05:59] Anam Hira Yeah. The observability part came after we had the idea, actually from talking to, I think you know him, Jason Arbon. He was telling us how all these companies are using so much observability and spending so much money on it. We had the idea of connecting tests to traces: you run an end-to-end test and then connect it to the traces it generates. So if there's a bug, you can see exactly which microservice it happened in. That's what we're doing at Revyl: we connect the tests to the traces so you have visibility into where the bug happened. Because at Uber, even when DragonCrawl found a bug, you still had to spend so much time finding exactly where the error was. Now you can see which microservice the failure happened in.

[00:06:44] Joe Colantonio All right, so let's dive into that a little bit, what that looks like. You created DragonCrawl at Uber, and then you founded Revyl. How was Revyl built upon the principles of DragonCrawl? How did you build it? How do you get the observability, the tie-in with your platform?

[00:07:02] Anam Hira Yeah, so Revyl uses LLMs to run end-to-end tests written in natural language, on a similar principle to how it was done in DragonCrawl, but we connect them to the traces. Imagine you're Amazon running a checkout flow. You want to test that end-to-end, the experience of how a human would go through it, and then connect it to the traces that are emitted if the company is using OpenTelemetry or something along those lines. You can see exactly where the error happened from that, so triage time becomes a lot shorter.

[00:07:40] Joe Colantonio For folks who don't know what telemetry is, what does that mean? Does it actually show you where the issue is, in the function, in the SQL statement? What does that mean?

[00:07:48] Anam Hira Yeah, telemetry is what you get from tools like Datadog or Splunk. Generally, code is instrumented. Let's say you have a microservice: you instrument that code. And let's say there's a front-end microservice and a back-end microservice. Your telemetry lets you see what's called a span going from the front end to the back end; basically, you see every server that's hit.
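What ties the front-end span to the back-end span is context propagation: the caller sends its trace ID and span ID with the request, and the callee starts a child span in the same trace. OpenTelemetry's propagators normally inject and extract this for you, using the W3C Trace Context `traceparent` header by default; the hand-rolled sketch below only illustrates that header format, and the function names are hypothetical.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags (lowercase hex).
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Front end: build the header sent with an outgoing request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_span_from_header(header: str) -> dict[str, str]:
    """Back end: continue the same trace with a new span whose parent is
    the caller's span, so both hops appear in one trace."""
    match = TRACEPARENT.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, parent_span_id, _flags = match.groups()
    return {"trace_id": trace_id, "parent_id": parent_span_id,
            "span_id": secrets.token_hex(8)}  # 16 hex chars


trace_id = secrets.token_hex(16)  # 32 hex chars, shared by every span in the trace
frontend_span_id = secrets.token_hex(8)
header = make_traceparent(trace_id, frontend_span_id)
backend_span = child_span_from_header(header)
print(header)
print(backend_span)
```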

[00:08:12] Joe Colantonio Nice. Do you run these tests in production or is it in staging? Do you see developers using it early on to find issues quicker when they run their tests on check-in?

[00:08:22] Anam Hira Yeah, there's a large variance. I think smaller startups gain more value from running it in production, just because it's easier; most startups don't have a staging pipeline. Our enterprise customers definitely get more value from running it in staging, or shifting it further left, because the problem is that if you're testing in production and find a bug, that bug is already in production. That costs a lot, and you lose a lot of trust. At Uber, a P0 bug costs $2 million for every hour it's in production, so the costs are really high if a bug gets into production. So there's a large variance: startups run it further right, in production, and enterprises run it further left, in their staging environments or even on localhost.

[00:09:07] Joe Colantonio All right, so if I have an existing test suite and I want to get this telemetry data using your platform, what do I have to do?

[00:09:14] Anam Hira Yeah. The steps are: if you're a large enterprise, you probably already have OpenTelemetry instrumented. If you're a startup, it means instrumenting with OpenTelemetry. Tools like Sentry use OpenTelemetry, so if you're using something like Sentry, you already have OpenTelemetry set up, you can see the traces, and we can connect to them. So those are the steps: connect some kind of OpenTelemetry-based tool, something like Sentry or Datadog, to your code base, and then we can capture those traces.

[00:09:45] Joe Colantonio All right, so speaking of Jason, I actually had him on a webinar yesterday, and he was talking about agentic AI, which is the new term I've been hearing for the past few months or so. I think on your site you mention something about agentic flows, but maybe I'm wrong. If that's true, what are agentic flows, and how do they make your tests more resistant to UI changes?

[00:10:06] Anam Hira Yeah, what some people do for testing with LLMs is use the LLM to generate the test case, so you'd have a Playwright test generated. The delineation with agentic is that it doesn't generate the test beforehand; it tests the way a human would. You observe the state: let's say you're on a home page. You get the state, you get the DOM, parse it into a form the LLM can use, and send that state alongside the task you want to do. It's like how we test as humans: we see the state, and we have some task we want to do, say, log in with x email. The agent gets that state, does the task, and runs through it in a loop, rather than generating the test beforehand and running through it.
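The observe, decide, act loop described here can be sketched roughly as follows. `choose_action` is a hypothetical hook where an LLM call would go; a rule-based stub plays that role so the example runs on its own, and `ToyApp` is an invented app whose unexpected pop-up would trip up a script generated ahead of time.

```python
from typing import Callable

# Hypothetical state/action shapes; a real agent would serialize the DOM or
# accessibility tree and ask an LLM for the next action.
State = dict[str, object]
Action = tuple[str, str]  # (verb, target), e.g. ("tap", "Log in")


def run_agent(task: str,
              observe: Callable[[], State],
              choose_action: Callable[[str, State], Action],
              act: Callable[[Action], None],
              max_steps: int = 20) -> bool:
    """Observe -> decide -> act loop: each step is chosen from the screen the
    agent actually sees, not from a script generated beforehand."""
    for _ in range(max_steps):
        state = observe()
        verb, target = choose_action(task, state)
        if verb == "done":
            return True
        if verb == "fail":
            return False
        act((verb, target))
    return False  # step budget exhausted: treat as a failure


class ToyApp:
    def __init__(self) -> None:
        self.screen = "popup"  # e.g. a regional pop-up a fixed script didn't expect

    def observe(self) -> State:
        return {"screen": self.screen}

    def act(self, action: Action) -> None:
        _verb, target = action
        if self.screen == "popup" and target == "Dismiss":
            self.screen = "login"
        elif self.screen == "login" and target == "Log in":
            self.screen = "home"


def rule_policy(task: str, state: State) -> Action:
    # Stand-in for the LLM call: map what is on screen to the next step.
    screen = state["screen"]
    if screen == "popup":
        return ("tap", "Dismiss")
    if screen == "login":
        return ("tap", "Log in")
    if screen == "home":
        return ("done", "")
    return ("fail", "")


app = ToyApp()
print(run_agent("log in with x email", app.observe, rule_policy, app.act))  # True
```

The step budget matters in practice: without it, an agent that keeps choosing actions that change nothing would loop forever instead of reporting a failure.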

[00:10:58] Joe Colantonio How does this work? Is this just the observability piece? So you bring your own tests to the platform, and then the platform handles gluing the tests to the telemetry data?

[00:11:09] Anam Hira That's part of it. Our thought on this is that there are a lot of observability platforms and a lot of testing platforms. All of the companies we speak with want to invest in testing, but it's just so flaky. If you're getting 50% reliability on your tests, you're just not going to run them; you won't have any trust in them. That's why a lot of these companies use observability platforms, because they give you data and signal. The problem with observability platforms is that they're in production. Bugs cost a lot of money and cost users' trust in the product. We're unifying the two: running resilient agentic tests using LLMs, and connecting them to tracing and observability to know where the bug happened. It's a bit of both. That's why we call it proactive rather than reactive observability: finding the bugs before they reach production.

[00:12:04] Joe Colantonio Can you give an example of that? Of finding a bug before it hits production?

[00:12:08] Anam Hira Yeah. Let's say you're Uber and you release an A/B test for Australia where there's now a pop-up when ordering a ride, and it breaks things so you can't order a ride anymore. Revyl would run the ride-ordering test in Australia, emulating how a human would order a ride, and it would see that this pop-up prevented it from ordering a ride. Because it has the OpenTelemetry traces set up, it would be able to see exactly which A/B test caused the error. You as a developer catch the error before it goes to production and save the $2 million you would have lost in revenue if it had reached production. And you run these over and over again, just to have the confidence to ship to production.

[00:12:59] Joe Colantonio Do they get alerted, or does the build just deny the check-in?

[00:13:03] Anam Hira Yeah, the typical workflow is that you push the A/B test, or open a merge request or pull request for a feature, and it runs the test. Then you get alerted that the build failed because this test failed, and you should look into where the error is.
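The pull-request gate in that workflow boils down to "fail the build and say where to look." A minimal sketch, assuming each test run reports its name, pass/fail status, and the trace ID it produced; the `RunResult` type and the alert text are hypothetical, and the CI system is assumed to block the merge when the step exits non-zero, as most do.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    name: str
    passed: bool
    trace_id: str  # links the end-to-end run to the backend trace it produced


def gate(results: list[RunResult]) -> int:
    """Return a process exit code; non-zero makes the CI step (and the PR check) fail."""
    failures = [r for r in results if not r.passed]
    for r in failures:
        # Hypothetical alert text; a real setup might comment on the PR or post to chat.
        print(f"FAILED: {r.name} -> inspect trace {r.trace_id}")
    return 1 if failures else 0


# Hypothetical results for one pull request.
results = [
    RunResult("order a ride (Australia)", passed=False,
              trace_id="4bf92f3577b34da6a3ce929d0e0e4736"),
    RunResult("order a ride (New York)", passed=True,
              trace_id="0af7651916cd43dd8448eb211c80319c"),
]
exit_code = gate(results)  # prints one FAILED line, returns 1
# In the CI step itself you would finish with: raise SystemExit(exit_code)
```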

[00:13:20] Joe Colantonio How long has this platform been around? Because this is one of the first times I've heard of it. How new is it? Do you have any existing customers, and what are they saying about it?

[00:13:26] Anam Hira Yeah, we've been working on it for around five months. We've been working with some startups as our early customers, as well as some enterprises that we're starting to work with now.

[00:13:35] Joe Colantonio Nice. Are you in Y Combinator as well?

[00:13:38] Anam Hira Yeah, we're in the fall batch for YC. It just ended; we're wrapping up the fall batch.

[00:13:43] Joe Colantonio How does that work? Did you get a lot of feedback from it? Does it actually help you? I'm just curious myself about Y Combinator. What's the benefit of it?

[00:13:52] Anam Hira Yeah, I think it's super helpful, especially the group partners: we have Andrew Miklas, the co-founder of PagerDuty, and Dalton Caldwell. Being able to bounce ideas off them is super helpful. Especially Andrew, who worked in a very similar product area, selling to large enterprises. Getting feedback on how to do sales, how to develop, all that stuff is really invaluable, to be honest.

[00:14:19] Joe Colantonio Very nice. There's a big, big push for AI, agentic AI. Do you see this becoming a commodity soon? Where do you see the future of this? Do you think it's a must-have, or do you think only certain companies would benefit from it?

[00:14:33] Anam Hira I think right now only certain companies benefit from it: those that have fast dev cycles and a high impact from failures. But the problem a lot of the companies we talk to are facing is that they're using Cursor or some other codegen tool for all their code generation, and they don't really have anything validating that it works. They're trying to ship a lot faster now, and bugs are just getting into production. I think LLMs as a tool will be commoditized, in the sense that there are only a couple of major players, but for any vertical application, I don't see it being commoditized. And I do see it becoming a must-have as engineers want to ship faster using code generation tools and the speed of development increases.

[00:15:18] Joe Colantonio What's your vision for Revyl then? Do you have like a master plan, like you're going to take over this segment or this use case?

[00:15:25] Anam Hira Yeah, I think the master plan is really unifying testing and observability. A lot of these companies try to run tests, and they fail because there's low reliability and flakiness, or the app is nondeterministic because of some ML personalization they use. Then they also use observability tools. Observability tools are good, but the bug is already in production, bugs in production cost a lot, and you have to triage them, which takes a lot of time. We call it proactive observability because it's really an observability platform that catches bugs before they hit production. The master vision is to combine the two so companies only need one. We test across Android, iOS, and web: write one test for Android and it'll work across iOS and web as well. I think companies are tired of having a bunch of fragmented services. One of the companies we talked to has three observability platforms, and they're tired of paying so much for them. I think we can really unify it and give customers a better experience.

[00:16:30] Joe Colantonio Nice. People have a lot of manual tests and a lot of manual processes. Does this help at all if they want to transition to a more automated deployment process?

[00:16:38] Anam Hira Yeah, 100%. A lot of the startups we work with were doing completely manual testing before they onboarded to Revyl, and now they just run it on a per-diff basis. Every time they commit, they run the tests, so it has automated their manual testing.

[00:16:56] Joe Colantonio So I'm always curious: you developed this with something in mind from your time at Uber, but now it's in the wild. Do you see people using it in ways you didn't expect that made you think, huh, maybe this is a use case we need to expand on or add more features for?

[00:17:10] Anam Hira Yeah, some of the unexpected use cases were things like compliance. A lot of financial companies have strict compliance requirements for their end-to-end experience. For example, a company like Intuit needs a compliance pop-up to show up, and that's mandatory, or else they get sued. End-to-end testing that on every diff is a use case we didn't expect, and one they couldn't do before, because they use the LLMs for the semantic reasoning about the compliance, kind of like how a human would see a compliance pop-up. Another thing is that startups really want to test in production rather than staging. They want to ship really fast, they don't care about staging, and they just want to see that the end product works.

[00:17:56] Joe Colantonio How hard a sell is this? It seems like it'd be a no-brainer, but do you find any resistance? Is it a hard push to get people to understand what this platform is doing and how it can help them?

[00:18:07] Anam Hira I do think it's a harder sell for smaller companies. For larger enterprises, we find it to be a hair-on-fire problem: bugs are getting into production, testing is really hard, and they're paying a lot of manual testers to do it. For startups it's less of a problem, because if there's a bug, it's not the end of the world; they can just roll back. In terms of people knowing what observability and testing are, connecting your tests to tracing is a bit harder of a sell, just because it's a new thing that not many other companies are doing. But for enterprises, it's not too hard of a sell.

[00:18:44] Joe Colantonio What does it work with? I know enterprises have a bunch of different technology. We mentioned Playwright, Appium, Selenium. Does it matter if it's mobile or native? How does it work, and what does it work with?

[00:18:55] Anam Hira Yeah, it works across the board. We've created our own custom Playwright drivers and our own custom Android and iOS drivers for device automation. It's very easy to onboard because it's essentially just taking the state of the app or the web page and then performing an action on it, so it's agnostic to any platform.
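That "take the state, then act" design is what makes one test portable: if every driver exposes the same small interface, the test never touches platform details. A minimal sketch using `typing.Protocol`; the `Driver` interface and both fake drivers are hypothetical stand-ins, not Revyl's custom Playwright, Android, or iOS drivers.

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical minimal interface every platform driver implements."""
    def visible_labels(self) -> list[str]: ...
    def tap(self, label: str) -> None: ...


class FakeWebDriver:
    # Stands in for a browser driver reading the DOM.
    def __init__(self) -> None:
        self.dom = {"button#checkout": "Checkout"}
        self.tapped: list[str] = []

    def visible_labels(self) -> list[str]:
        return list(self.dom.values())

    def tap(self, label: str) -> None:
        self.tapped.append(label)


class FakeAndroidDriver:
    # Stands in for a device driver reading the view hierarchy.
    def __init__(self) -> None:
        self.views = [("com.shop:id/checkout_btn", "Checkout")]
        self.tapped: list[str] = []

    def visible_labels(self) -> list[str]:
        return [text for _view_id, text in self.views]

    def tap(self, label: str) -> None:
        self.tapped.append(label)


def checkout_test(driver: Driver) -> bool:
    """One test, written once, runs on any driver that satisfies the interface."""
    if "Checkout" not in driver.visible_labels():
        return False
    driver.tap("Checkout")
    return True


for d in (FakeWebDriver(), FakeAndroidDriver()):
    print(type(d).__name__, checkout_test(d))
```

`Protocol` uses structural typing, so the fake drivers satisfy `Driver` without inheriting from it; each platform's element model (CSS selectors, Android view IDs) stays hidden behind `visible_labels` and `tap`.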

[00:19:16] Joe Colantonio Okay, before we go, is there one piece of actionable advice you can give someone to help with their AI, tracing, and observability testing efforts? And what's the best way to find or contact you to learn more about Revyl?

[00:19:28] Anam Hira Yeah, the best way to contact us is at Revyl.ai or on LinkedIn and Jira. In terms of actionable advice for AI tracing and observability, I would say less is more. We find that people try to test everything, but testing just the most important parts of your app and making them bulletproof is way more valuable than testing everything, including the long tail, just because that's where most of the value is going to be.

[00:19:54] Thanks again for your automation awesomeness. For links to everything of value we covered in this episode, head on over to testguild.com/a455. And if the show has helped you in any way, why not rate and review it in iTunes? Reviews really help in the rankings of the show, and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, and my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.

[00:20:52] Joe Colantonio And if you're a tool provider or have a service looking to empower our Guild with solutions that elevate skills and tackle real-world challenges, we're excited to collaborate. Visit testguild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.

[00:21:12] Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.
