About this DevOps Toolchain Episode:
Today's episode explores observability within the software delivery lifecycle, particularly test-related events. Joining us is Rodrigo Martin, a principal software engineer and test architect at New Relic with an impressive two decades of experience in software testing.
Rodrigo sheds light on the often overlooked aspects of observability in testing environments, discussing how tracking test-related events can significantly enhance debugging capabilities, performance monitoring, and overall test suite reliability.
He shares actionable insights on getting started with event tracking, the key metrics you need to focus on, and the common challenges you might face. Whether you're dealing with flaky tests, trying to improve pipeline performance, or simply wanting to understand your testing processes' inner workings better, Rodrigo's expert advice has got you covered.
Don't miss this episode if you want to take your DevOps observability efforts to the next level. Grab your headphones and join us on this journey to make your CI/CD pipelines more efficient and your developer experience richer. Listen up!
About Rodrigo Martin
Rodrigo is a Principal Software Engineer and Test Architect at New Relic, leveraging his two decades of experience in software testing to enhance testing tools and processes. Outside of work, he balances his time between fitness, producing electronic music, and immersing himself in sci-fi literature.
Connect with Rodrigo Martin
- Company: www.newrelic.com
- LinkedIn: www.martinrodrigo
- Git: www.rodrigojmartin
Rate and Review TestGuild DevOps Toolchain Podcast
Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
[00:00:00] Get ready to discover some of the most actionable DevOps techniques and tooling, including performance and reliability for some of the world's smartest engineers. Hey, I'm Joe Colantonio, host of the DevOps Toolchain Podcast and my goal is to help you create DevOps toolchain awesomeness.
[00:00:18] Hey, do you want to know how to actually track test-related events? Or do you even know what that is? Well, if not, you're in for a special treat because we have Rodrigo Martin joining us, who is a principal software engineer and test architect at New Relic, where he leverages his two decades of experience in software testing to enhance testing tools and processes. I'm really excited about this episode, there's not a lot of information out there on this, so I think you're really going to learn a lot. You don't want to miss it. Check it out.
[00:00:44] Hey, Rodrigo, welcome to The Guild.
[00:00:46] Rodrigo Martin Olivera Hi, Joe. Thanks for having me.
[00:00:50] Joe Colantonio Great to have you. I guess before we get into it, it's a pretty short bio. Is there anything in your bio that I missed that you want people to know more about?
[00:00:57] Rodrigo Martin Olivera No, I think that's pretty much it. I think it covers pretty well my background.
[00:01:03] Joe Colantonio Awesome. Awesome. I guess before we dive in, I came across an article you wrote about how to actually track test-related events. At a high level, how do you even explain what it means to track test-related events?
[00:01:16] Rodrigo Martin Olivera Yes. Basically this lies in the world of observability. As you know, observability has been in the news for the last couple of years, a long couple of years. And usually when we talk about observability, we think of tracking stuff that happens in production, having observability in our production systems that are usually really complex, full of microservices, micro frontends, etc. But sometimes we neglect the other aspects of the software delivery lifecycle, and that includes testing. Basically the idea of this article is to let people know how they can track events related to their testing processes and tools, and that part of the pipeline, so they can have better observability and they can debug issues faster.
[00:02:04] Joe Colantonio Nice. Obviously, you need to have some sort of observability in place, I guess, for this to work. What needs to be in place in order to actually get started, though?
[00:02:11] Rodrigo Martin Olivera Yes. Basically, you need to have a place where you can store these events. In the case of my article, I work at New Relic, so I use New Relic, but you could use another tool where you can track and store your events related to testing or production, etc. You need to actually instrument your code to send those events, and define exactly what you would like to track: which types of tests you are most interested in, which metrics you would like to track, coverage, etc. So it's about defining what you want to track, having a place where you can send these events, and eventually querying that platform and getting your answers.
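As a rough illustration of that "send events somewhere" step, here is a minimal sketch against New Relic's public Event API. The account ID, ingest key, event type, and attributes are placeholders, not a prescribed schema:

```typescript
// Minimal sketch: send one custom test-related event to the New Relic Event API.
// NEW_RELIC_ACCOUNT_ID and NEW_RELIC_LICENSE_KEY are placeholders read from the
// environment; the "TestExecution" event type and its attributes are illustrative.
const ACCOUNT_ID = process.env.NEW_RELIC_ACCOUNT_ID ?? "";
const INGEST_KEY = process.env.NEW_RELIC_LICENSE_KEY ?? "";

export async function sendTestEvent(
  attributes: Record<string, string | number>
): Promise<void> {
  const response = await fetch(
    `https://insights-collector.newrelic.com/v1/accounts/${ACCOUNT_ID}/events`,
    {
      method: "POST",
      headers: { "Api-Key": INGEST_KEY, "Content-Type": "application/json" },
      // Every event needs an eventType; everything else is a custom attribute.
      body: JSON.stringify([{ eventType: "TestExecution", ...attributes }]),
    }
  );
  if (!response.ok) {
    throw new Error(`Event ingest failed with status ${response.status}`);
  }
}

// Example: report one finished end-to-end test.
await sendTestEvent({
  testName: "patient can be created",
  status: "passed",
  durationMs: 4321,
  environment: "staging",
});
```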
[00:02:55] Joe Colantonio Cool. I guess, what are some of the key challenges someone is trying to solve? Like, why use test-related events? What is the benefit to them?
[00:03:05] Rodrigo Martin Olivera Yes. One of the main benefits is that you will have a way to ask questions about the context of your system whenever those tests were executed. Usually, when you have hundreds of test cases, thousands of test cases or even more, it's really hard to pinpoint the exact issue or where those tests fail within the pipelines. Having this instrumented allows you to ask better questions and pinpoint exactly where you need to debug those failures. But not only that, you can also track the performance of your tests. That's really important because we want our pipelines to be as fast as possible. You can also track coverage or other metrics that are more related to governance, let's say if you're working with multiple teams and services and you would like to have a centralized way to see things. Those are probably the most important use cases.
[00:04:03] Joe Colantonio All right. Say I have a test suite of Selenium tests, a Playwright test, it doesn't matter, Cypress, whatever it is. Do I have to do anything to those tests, and can I run them in staging or development? And if a test fails, how do I correlate it with the test event?
[00:04:18] Rodrigo Martin Olivera Yeah. So basically you will need to have a way, via a plugin or via your test runner or your pipeline, to send the events that you are most interested in. Let's say you work with Playwright, that's what we work with, right? Every time you run your tests, there's usually an opportunity to use an after hook. So after each test is finished, you can collect your events, right? And send them to your observability platform. That's one way of doing it. Maybe in the pipeline you are interested in sending events on your test coverage reports; then you will instrument at that point in the pipeline. There are several points where you can actually instrument. And you mention environments: environment will also be one of the dimensions or metrics that you track, right? Because maybe you have tests that run in staging, in development, even in production, and that would be another dimension you can then filter on. What happened when I ran these tests in staging versus what happened when I ran the same one in development? Those are two different dimensions that you can track with instrumentation.
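Here is one possible shape of that after hook in a Playwright project, as a sketch. It assumes the hypothetical sendTestEvent helper from the earlier snippet, and the attribute names are only examples; the expected/unexpected/flaky outcome field Rodrigo mentions comes from Playwright's report rather than from inside the hook:

```typescript
import { test } from "@playwright/test";
// Hypothetical helper from the earlier Event API sketch.
import { sendTestEvent } from "./send-test-event";

// Runs after every test in this file and forwards one event per execution.
test.afterEach(async ({ browserName }, testInfo) => {
  await sendTestEvent({
    testName: testInfo.title,
    file: testInfo.file,
    browser: browserName,
    project: testInfo.project.name,
    status: testInfo.status ?? "unknown",    // passed, failed, timedOut, skipped
    expectedStatus: testInfo.expectedStatus, // compare to status to spot unexpected results
    retry: testInfo.retry,                   // a retry count above 0 hints at flakiness
    errorMessage: testInfo.error?.message ?? "",
    // Final duration and the expected/unexpected/flaky outcome are easiest to
    // take from the Playwright JSON report or a custom reporter after the run.
  });
});
```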
[00:05:24] Joe Colantonio All right. If this is running in the background, collecting these test-related events, say I'm running a functional test: I go in, I enter a patient's name, and I validate it's able to do something. That's the test. Will this uncover things that maybe the test isn't actually testing? I don't know if that makes sense. Like, say the test passed, but underneath it triggered some sort of error that I wouldn't have seen in that particular flow.
[00:05:48] Rodrigo Martin Olivera Yeah. So basically I think you have two ways of measuring that. One is measuring coverage. On the one hand, you can track not only the percentage of coverage, but also which lines you are covering and which lines are being missed, right? You have a way to correlate which files are being covered by those tests or not. But at the same time, for every test that you execute, one of the dimensions that can be stored is the pull request number, let's say, where this test was executed, and in which environment. And then you can start correlating: okay, this test was failing in this pull request build, and then with your platform you can start asking questions around what happened during that time. Maybe there was a change, there was a deployment, there was an issue in a database, let's say. You can start asking questions with your platform around that test execution.
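Environment and pull request number are just extra attributes on the same events. A hypothetical helper for gathering that CI context might look like this; the environment-variable names are GitHub Actions examples and will differ on other CI systems:

```typescript
// Hypothetical helper: collect CI context to attach as dimensions on every event.
// GITHUB_* variables are GitHub Actions examples; TEST_ENVIRONMENT and OWNING_TEAM
// are made-up conventions for this sketch, not standard variables.
export function ciDimensions(): Record<string, string | number> {
  const ref = process.env.GITHUB_REF ?? ""; // e.g. refs/pull/123/merge
  const prMatch = ref.match(/refs\/pull\/(\d+)\//);
  return {
    environment: process.env.TEST_ENVIRONMENT ?? "development",
    pullRequest: prMatch ? Number(prMatch[1]) : 0,
    commitSha: process.env.GITHUB_SHA ?? "local",
    buildId: process.env.GITHUB_RUN_ID ?? "local",
    owningTeam: process.env.OWNING_TEAM ?? "unassigned",
  };
}

// Usage: spread into every event, e.g.
// sendTestEvent({ ...ciDimensions(), testName: "patient can be created", status: "passed" });
```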
[00:06:44] Joe Colantonio So say tests have been running fine for a month. Someone pushes what they think is a small change, and the tests are failing. You can correlate to say the change you just pushed is the cause of this failure, and you could roll it back to test it?
[00:07:00] Rodrigo Martin Olivera Yeah. And of course, if this is a change that your own team is doing, this is just a matter of the regular CI/CD pipeline: this is my test, it's failing, that's good, right? But if something starts failing and it wasn't a change that you or your team introduced, if you have this data in your observability platform, like the latest deployment, deployment markers, etc., then you can start correlating those things and you will be able to debug those situations more easily.
[00:07:30] Joe Colantonio All right. Because I used to work with a team of eight sprint teams, all checking in code, and if something broke, you never knew which one actually broke the build. You didn't know which one to roll back, so it didn't make any sense. This way, it sounds like if I was in that situation, I could say, Team Sopranos, you checked in code, roll it back, that's what broke it.
[00:07:50] Rodrigo Martin Olivera Yeah, correct. Because you will have a time window as well, and you can ask, within this time window, what were the latest changes that were deployed to the environment from a service perspective. You could also even think about what infrastructure changes happened during that time frame. Maybe a new version of a database, a new certificate renewal, whatever. You can ask. You can do this with observability, because the idea is that you can investigate and understand, or try to understand, these unknown-unknown situations, right?
[00:08:23] Joe Colantonio All right. So that's a good point. So once again, performance for some reason spikes up. You don't know why. You could say, okay, what changed in the system? And it may be something you didn't even think about, a new hardware piece maybe, and you could trace it and say, well, ever since that hardware piece was added, the performance started spiking. That is most likely the cause?
[00:08:44] Rodrigo Martin Olivera Yeah, that would be one use case. And if we go one step beyond, if you have events on your testing tools and processes and executions, you can also have alerts. Even before a test starts failing, like actually failing, you can start seeing, okay, there's a degradation in test execution time, or you could set alerts on these tests consuming way more memory than before, maybe two x the memory. You can potentially act on those things before you even see the actual impact.
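As an illustration of that kind of alert, a NRQL query like the one below could back a duration-degradation alert condition. It assumes the custom TestExecution event and attribute names used in the sketches above, which are this article's examples rather than a built-in New Relic schema:

```typescript
// Illustrative NRQL for an alert on test duration degradation. The TestExecution
// event type and the durationMs/owningTeam attributes come from the example
// instrumentation in the earlier sketches, not from New Relic itself.
export const slowTestAlertQuery = `
  SELECT average(durationMs)
  FROM TestExecution
  WHERE owningTeam = 'checkout'
  FACET testName
`;
// In New Relic this query would typically back a NRQL alert condition with a
// threshold such as "average duration above 60000 ms for at least 10 minutes".
```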
[00:09:21] Joe Colantonio So can you run this in production as well, or is it just pre-production?
[00:09:24] Rodrigo Martin Olivera We run it in pre-production only, but potentially it can be executed in production with any sort of monitoring, synthetic monitoring, etc. It would be exactly the same concept. In the article we are talking about instrumenting tests that are executed usually pre-merge, but you can do exactly the same thing with your synthetic monitors, and yeah, that would work as well.
[00:09:47] Joe Colantonio Nice. I assume there are multiple test-related events you can track. Are there any key metrics or key events that you always add, or is it up to the user to understand which ones to use? What would you recommend?
[00:10:00] Rodrigo Martin Olivera Yeah. So again, I like to split this into two categories, the two main use cases that we are using right now. We're tracking code coverage, and basically this is pretty straightforward. There's a service; in the list of events, you will have the service name, you will have the release tag, that is the version of the release, the tag of the release that you are testing with this coverage. The percentage of coverage, measured in whatever way the team thinks is best. We are measuring by lines and by statements: the statements that were covered and which ones are missed. And also the owning team, which team is the owner of the service and of these tests. Basically, with only that information, we can start getting trends and create dashboards for teams. I work on a platform team, so we have templates of dashboards that we provide to teams so they can see at a glance their current coverage, not only per team but also per group, because we have groups that are formed of multiple teams, and those multiple teams have multiple services. You can see how you can drill down and get an aggregated set of coverage events. So that's on the coverage side. Now, on the test execution side, the most important things that we track, talking about Playwright, let's say, are: in which browser the tests were executed, in which environment, the pull request, the outcome. This is really important, because whenever you wrap up your test execution, Playwright provides an outcome field that basically says if the status was expected, unexpected or flaky. And that's really important, so we also track whether the test was flaky or not. Of course, the status, any error messages that the test produced, any error stack that we can get from the report as well, error values, and also the owning team and the environment. I don't remember if I mentioned that one, but those are the main dimensions.
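Written down as event payloads, the dimensions Rodrigo lists might look roughly like this. The field names are illustrative; only the event type is required by the platform:

```typescript
// Illustrative payload shapes for the two event categories described above.
export interface CoverageEvent {
  eventType: "TestCoverage";
  serviceName: string;
  releaseTag: string;       // version/tag of the release the coverage was measured against
  coveragePercent: number;  // measured however the team prefers (lines, statements, ...)
  statementsCovered: number;
  statementsMissed: number;
  owningTeam: string;
}

export interface TestExecutionEvent {
  eventType: "TestExecution";
  testName: string;
  browser: string;
  environment: string;      // staging, development, production, ...
  pullRequest: number;
  status: string;           // passed, failed, timedOut, skipped
  outcome: string;          // expected, unexpected or flaky, from the Playwright report
  flaky: boolean;
  errorMessage?: string;
  errorStack?: string;
  durationMs: number;
  owningTeam: string;
}
```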
[00:12:07] Joe Colantonio All right. You mentioned flakiness. Can you measure flakiness over time? A lot of times people will just rerun the test, it passes, but they never check on the outlier, like why does it fail every now and then? Can this actually uncover those types of things that drive you nuts? Like, look at the log files and actually say, okay, for some reason, every first of the month it's failing because we have this job running in the background.
[00:12:29] Rodrigo Martin Olivera Yes. Yes, totally. Because you can use whatever visualization you have to make graphs of this data. And basically, if for every test that you run, you know the status and whether it's flaky or not, you can create a sort of time series graph and say, okay, in the last month I can see a trend here. Everything was flat, but, as you mentioned, maybe every 29th of February things start blowing up. Then you can pinpoint these sorts of things. But not only issues. One thing that I like doing with this data is measuring whether the actions that the teams are taking to reduce flakiness are actually working, right? And you can see that really easily with the trend. If flakiness is slowly going down, then whatever actions the teams are taking are being effective. It's a way to measure those efforts.
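A flakiness trend like the one described here could come from a time-series NRQL query along these lines, again assuming the example TestExecution events and attribute names from the earlier sketches:

```typescript
// Illustrative NRQL: flaky-test percentage per team, day by day over the last month.
export const flakinessTrendQuery = `
  SELECT percentage(count(*), WHERE outcome = 'flaky') AS 'flaky %'
  FROM TestExecution
  FACET owningTeam
  SINCE 1 month ago
  TIMESERIES 1 day
`;
```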
[00:13:20] Joe Colantonio Well, that's interesting. You could measure, maybe a team's flakiness is going down: what are they doing differently? And then use that to educate the other teams: hey, you want to implement this, because this is what's helping this other team?
[00:13:31] Rodrigo Martin Olivera Yeah, totally. And I think it's really, really helpful to see these things at the team level. Of course, you have all the data, potentially, so you can make more high-level decisions on your testing processes and see averages and medians and so on. That's useful. But I think this is most useful for team-specific efforts, where you can pinpoint exactly your situation, your number of tests, your flakiness. You can even ask, what is my spec that is failing the most right now? Which one is the most flaky? You can be as curious as you want, and that usually gives you better results.
[00:14:12] Joe Colantonio How hard is it to interact with the data? Is it like a GenAI prompt where you just ask questions? How does that work?
[00:14:18] Rodrigo Martin Olivera Yeah. Basically, the events that we have at New Relic are being sent to the New Relic database, NRDB. And basically, you use NRQL, which is the New Relic Query Language, to make your queries on the events, and that would be it. We do have a beta on prompting with AI for NRQL, so you can use that as well. But it's pretty straightforward, and in the article you can see that it's pretty similar to SQL. If you have a basic understanding of SQL, you can start getting useful information.
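To give a feel for that SQL-like shape, a basic query over the example events might look like this (the event and attribute names are, again, this article's examples):

```typescript
// Illustrative NRQL: the ten tests that failed most often in the last day.
export const topFailingTestsQuery = `
  SELECT count(*)
  FROM TestExecution
  WHERE status = 'failed'
  FACET testName
  SINCE 1 day ago
  LIMIT 10
`;
```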
[00:14:54] Joe Colantonio Now you mentioned that it alerts the right team if something goes wrong. How does it know that? Is it through tagging of the actual test script or is it annotating something within your dashboard or the data?
[00:15:06] Rodrigo Martin Olivera Yeah. Basically, when you create your alerts, you create them based on whatever events you have in your platform. In this case, we are populating test-related events, so each team can create their own alerts based on those test-related events. And remember that we are also tracking the owning team on each test execution and on each coverage event. So that's really easy to do. You have your team ID there, and so you can start crafting alerts on, again, whatever you think is worth alerting on.
[00:15:43] Joe Colantonio I've got a follow-up. You mentioned code coverage. Like I said, I'm totally ignorant on this. Are you able to say, I ran this test suite and it only touched this amount of features, I have a gap of 60% or 70% of my application not being covered, or the high-risk areas are not being covered? Does it give that detail as well?
[00:16:03] Rodrigo Martin Olivera If you instrument it, you can get that detail. We are still not doing it right now. This is pretty straightforward: we are just measuring the percentage for a given merge into master. Like, what's the code coverage percentage from a unit testing perspective? What's the coverage from an end-to-end testing perspective? But you have that data available, right? Because, let's say in Playwright, if you see the report, you can see which files are being exercised by your tests, what's the coverage per file, let's say. You could potentially send those dimensions as well to your platform, and then you would have that information too.
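If a team did want per-file coverage as events, one possible sketch is to parse an Istanbul-style coverage-summary.json (the json-summary reporter format) and emit one event per file; the file path, event type, and attribute names here are assumptions:

```typescript
import { readFileSync } from "node:fs";

// Sketch: turn an Istanbul-style coverage-summary.json into per-file coverage events.
// The path, the "FileCoverage" event type, and the attribute names are illustrative.
interface CoverageEntry {
  lines: { total: number; covered: number; pct: number };
}

const summary: Record<string, CoverageEntry> = JSON.parse(
  readFileSync("coverage/coverage-summary.json", "utf8")
);

export const perFileCoverageEvents = Object.entries(summary)
  .filter(([file]) => file !== "total") // "total" holds the aggregate numbers
  .map(([file, entry]) => ({
    eventType: "FileCoverage",
    file,
    linesPct: entry.lines.pct,
    linesCovered: entry.lines.covered,
    linesTotal: entry.lines.total,
  }));
```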
[00:16:42] Joe Colantonio A lot of times, I worked at a company, they'd have a hotfix and you're like, we have 2,000 tests, I don't know what test covers this hotfix. So in theory, I could query it and say, does this touch the functions or the features that I know were checked in, based on coverage?
[00:16:58] Rodrigo Martin Olivera Yep, totally. Again, this all depends on you having the ability to read the report of the tool that you're using, extract the data that is important for you to track as events, and send it over to the platform.
[00:17:11] Joe Colantonio How hard is it to do that, then? Does that all depend on the tooling that you're using?
[00:17:15] Rodrigo Martin Olivera Yeah, I think it really depends. For instance, for Playwright, we are just parsing out the events in JavaScript and TypeScript, etc., and sending them over with our own CLI. But basically, the workflow is: you get the data for the events that you need, either by parsing reports, or via an API, or if this is something that can be handled by your test runner. There are multiple ways of getting that data. Then it's basically calling an API to send that event information to the platform that you're using, and that will be it. You need to send this payload in a compressed form, etc. Those are the technical details, but that's basically the workflow. If you work on a platform team, this is way easier, because potentially you own the tooling that everyone uses, so you can implement these things at the platform level and you don't need to reinvent the wheel for every team that needs to instrument the data. And that's basically how we do it, at the platform level.
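Put together, the parse-and-send workflow Rodrigo outlines can be sketched as a small batch sender. The endpoint and key handling mirror the earlier Event API sketch, and gzip stands in for the compressed payload he mentions:

```typescript
import { gzipSync } from "node:zlib";

// Sketch of the batch workflow: collect parsed events, compress, send one payload.
// Account ID and key are placeholders; the Event API accepts gzip-compressed bodies.
export async function sendEventBatch(
  events: Array<Record<string, string | number | boolean>>
): Promise<void> {
  const body = gzipSync(JSON.stringify(events));
  const response = await fetch(
    `https://insights-collector.newrelic.com/v1/accounts/${process.env.NEW_RELIC_ACCOUNT_ID}/events`,
    {
      method: "POST",
      headers: {
        "Api-Key": process.env.NEW_RELIC_LICENSE_KEY ?? "",
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
      },
      body,
    }
  );
  if (!response.ok) {
    throw new Error(`Batch ingest failed with status ${response.status}`);
  }
}
```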
[00:18:15] Joe Colantonio Cool. Could I use this as a final check or a final gate, so that if these test events occur, these errors, then don't put it into production?
[00:18:24] Rodrigo Martin Olivera Potentially you could, but we are not doing it right now. We are doing the plain old CI/CD thing: it fails or it doesn't fail. Even if the test is flaky, the test will pass. It's okay, it's flaky, it fails, but then it passes, and that's it. We are not using this as a gate, but rather as a debugging tool, debugging information, and for improving the performance of our pipelines.
[00:18:53] Joe Colantonio Gotcha. How do you know what's meaningful, though? Like security: you obviously have a security scanner and it flags like a thousand things. How do you know when this actually fails, or gives you data, what's really important?
[00:19:05] Rodrigo Martin Olivera How do you know what's really important to check or to debug?
[00:19:10] Joe Colantonio Yeah, like, does it ever give you multiple points of data? How do you know where to check, like where to focus in on?
[00:19:17] Rodrigo Martin Olivera Well, I think it depends on the question that you're trying to answer. That's always your indicator. Let's say, for instance, we started to have some reports that end-to-end tests were executing slower and slower. We as a platform team had that question in mind: okay, we have this data, how can we check exactly what's the reason for this sudden spike in performance-related issues? So we said, okay, let's go and check the machines where we are executing these tests. What's the average duration of the tests? I don't remember if I mentioned duration, but duration is another really important metric. You can have an average or a median per browser type, on which machine, and then you correlate that with the actual build execution duration and the duration of each of your pipeline stages. Basically, we did this, we had all this information, and we found out that the machines we were using to execute the tests were actually kind of old AWS machines, and we could upgrade, for the same cost, to machines with way better processors. We made that change, measured it again with the same events, measured the test execution duration, and we saw a 30% improvement in performance and were able to fix the situation. That's just an example. It depends on what type of problem you're trying to solve.
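The duration questions in that investigation map to queries like the one below, again over the example TestExecution events; a machine or runner attribute would also need to be sent with each event for the per-machine breakdown:

```typescript
// Illustrative NRQL: average and median end-to-end test duration per browser
// over the last week. Faceting by runner/host would require sending that
// attribute on each event as well.
export const durationByBrowserQuery = `
  SELECT average(durationMs), percentile(durationMs, 50)
  FROM TestExecution
  FACET browser
  SINCE 1 week ago
`;
```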
[00:20:52] Joe Colantonio I hate this term, ROI, but it sounds like you can actually use it to justify purchases. Like, hey, if we buy this new hardware, we'll get savings of this because the performance will improve, that type of deal.
[00:21:02] Rodrigo Martin Olivera Correct. And you can do it really easily, because you have all the information aggregated and you can query it right away and do all the time series, etc. Otherwise, you could potentially do the same ROI analysis, but it would take you way more time.
[00:21:19] Joe Colantonio Is there anything teams get wrong when implementing test event tracking? Are there any best practices for doing this, or things that you probably shouldn't use it for?
[00:21:30] Rodrigo Martin Olivera No, basically, I think the main things that we should track are the ones that we discussed, and then some others depending, again, on which type of metrics you are most interested in. The more context you add to these events, the better. It's kind of hard, because you kind of need to predict the future: what type of information will I need if I have a situation down the road? We discussed the most important generic ones, but, as you mentioned, maybe having in the events the files that are being exercised by your tests could potentially be something that can be implemented. There are no best practices other than tracking what is important for answering your questions.
[00:22:12] Joe Colantonio Gotcha. Is this a new feature of New Relic or has this been around for a while?
[00:22:16] Rodrigo Martin Olivera Actually, it's a new, let's say, use case of using New Relic. Because if you take this outside of testing, it's just sending events to New Relic and using the platform. It's basically a use case for something that is usually related to, or always focused on, production, but I've tried to give it more of a testing flavor.
[00:22:41] Joe Colantonio All right. So, New Relic. I assume everyone knows what it is, but just in case, for folks that don't know, what is New Relic?
[00:22:47] Rodrigo Martin Olivera Yeah, New Relic is a platform that enables observability for teams. Basically, it's a way to instrument your applications, your infrastructure, your cloud services, in a single platform that will help you with observability.
[00:23:04] Joe Colantonio Very cool. All right. Is this on the roadmap? You have all this data, I assume. Is there AI or machine learning in play, or things you could do to start predicting things without having to know what to even ask? Is that crazy?
[00:23:17] Rodrigo Martin Olivera It's not crazy. I don't have it right now on the roadmap for my group, for my team, but it's something that we are definitely researching, to see what's the best way to apply Gen AI to these kinds of processes.
[00:23:32] Joe Colantonio Okay, Rodrigo, before we go, is there one piece of actionable advice you can give to someone to help them with their DevOps observability efforts, and what's the best way to find or contact you?
[00:23:42] Rodrigo Martin Olivera Yeah, sure. So my piece of advice is: don't forget what happens before things are deployed to production. There are a lot of opportunities there to implement observability and make your CI/CD pipelines and your developer experience way richer and way better if you instrument observability there as well, if you use observability in those parts of the software delivery lifecycle. You can find me on LinkedIn; my profile is Rodrigo Martin there. I don't remember the exact profile handle, but I'm sure you will get a link afterwards in the show notes.
[00:24:20] I'll have a link for that down below. Definitely check it out.
[00:24:23] For links of everything of value we covered in this DevOps Toolchain Show, head on over to Testguild.com/p169. So that's it for this episode of the DevOps Toolchain Show. I'm Joe, my mission is to help you succeed in creating end-to-end full stack DevOps toolchain awesomeness. As always, test everything and keep the good. Cheers!
[00:24:46] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com where you become part of our elite circle driving innovation, software testing, and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.
[00:25:30] Oh, the Test Guild Automation Testing podcast. Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.