About this DevOps Toolchain Episode:
In this episode of the DevOps Toolchain podcast, Joe Colantonio sits down with Jacob Leverich, co-founder and Chief Product Officer at Observe, to explore how AI and cutting-edge data strategies are transforming the world of observability.
With a career spanning heavyweight roles from Splunk to Google and Kuro Labs, Jacob shares his journey from banging out Perl scripts as a Linux sysadmin to building scalable, data-driven solutions that address the complex realities of today’s digital infrastructure.
Tune in as Joe and Jacob explore why traditional monitoring approaches are struggling with massive data volumes, how knowledge graphs and data lakes are breaking down tool silos, and what engineering leaders often get wrong when scaling visibility across teams.
Whether you’re a tester, developer, SRE, or team lead, get ready to discover actionable insights on maximizing the value of your data, the true role of AI in troubleshooting, and practical tips for leading your organization into the future of DevOps observability. Don’t miss it!
Try out Insight Hub free for 14 days now: https://testguild.me/insighthub. No credit card required.
TestGuild DevOps Toolchain Exclusive Sponsor
SmartBear Insight Hub: Get real-time data on real-user experiences – really.
Latency is the silent killer of apps. It’s frustrating for the user, and under the radar for you. Plus, it’s easily overlooked by standard error monitoring alone.
Insight Hub gives you the frontend to backend visibility you need to detect and report your app’s performance in real time. Rapidly identify lags, get the context to fix them, and deliver great customer experiences.
Try out Insight Hub free for 14 days now: https://testguild.me/insighthub. No credit card required.
About Jacob Leverich
Jacob Leverich is a co-founder and Chief Product Officer at Observe, Inc. He has at different points in his career been an overworked system administrator, an academic researcher studying warehouse-scale datacenter workloads, and a software engineer working on large-scale distributed data processing platforms. He found his calling doing all three as co-founder of an observability startup. He fights for the users.
Connect with Jacob Leverich
- Company: www.observeinc.com
- LinkedIn: www.linkedin.com/in/jacob-leverich
Rate and Review TestGuild DevOps Toolchain Podcast
Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
[00:00:00] Get ready to discover some of the most actionable DevOps techniques and tooling, including performance and reliability for some of the world's smartest engineers. Hey, I'm Joe Colantonio, host of the DevOps Toolchain Podcast and my goal is to help you create DevOps toolchain awesomeness.
[00:00:18] Hey, what is AIOps? Well, you're in for a treat, because today we'll talk with Jacob all about AI and observability. If you don't know, Jacob is the co-founder and chief product officer at Observe. He previously directed engineering at Splunk and was a co-founder at Kuro Labs. I may have botched that, but I'll find out. Listen in to discover how AI and data lakes are reshaping observability, the role of knowledge graphs, the cost of tool silos, and what engineering leaders often get wrong when trying to scale visibility, and much more. Whether you're a tester, a developer, an SRE, or a team lead, this episode is for you. You don't want to miss it. Check it out.
[00:00:54] Hey, Jacob. Welcome to The Guild.
[00:00:58] Jacob Leverich Hey, thanks for the warm intro, man. Appreciate it.
[00:01:01] Joe Colantonio Yeah, it's great to have you. You've had quite a journey. I'm just curious to know how you got into this, how you got into DevOps and the whole story?
[00:01:08] Jacob Leverich Oh, right on, right on. Yeah, it actually goes back to the beginning of my career, and it connects to where I am today, with a lot of different stuff in the middle. Let me tell you about it a little bit. So I'm the CPO at Observe now, an observability startup. The product role is a newer one for me. I came into this as a co-founder and head of engineering, so very much on the engineering side of it. Prior to that, I worked at Splunk. I was an engineer on the core search engine, just banging out C++ code, scaling it out to bigger and better use cases. That was a really good experience; I learned a lot there. Before that, I spent time at big companies and failed startups, spent time at Google, spent time at HP Labs, spent time at IBM, did the whole grad school thing. None of it matters. When I started my career, I was a Linux system administrator, late '90s, early 2000s. That's how I cut my teeth, that's how I paid for college, and that's how I really got into professional life in technology. It was banging out mountains of Perl scripts to monitor and maintain small fleets of servers, managing servers in colo facilities, wearing a pager, a Motorola pager.
[00:02:06] Joe Colantonio Fun times, yep, yep.
[00:02:08] Jacob Leverich Waking up at two in the morning to troubleshoot fires. I started my career that way and really loved everything about it, the good and the bad. Then I went through all this other stuff in my career, but I ended up at Splunk and realized, holy crap, if I'd had this software when I was a practitioner, I would have been way better at my job. It's no longer just making homegrown tools to monitor and maintain these things and solve the challenges of bringing all this data together. If you have a tool that solves the hard technical challenges of bringing all the data together, that's cool, I don't have to do that myself anymore. But also, once you have it all in one place, having something that facilitates the business process, everyone being able to look at the same data, triage the same alert, and work together to solve a problem, that's actually very meaningful for businesses that depend upon their digital infrastructure. Long story short, I figured out, oh wow, I'm actually really passionate about this stuff. I used to do it, and it was my life. Then I went through this whole technical journey and learned a lot about building large-scale systems and big engineering teams and all that sort of stuff, but then realized, oh man, I want to get back into this, and I want to bring new technology to bear to solve the problems that people have doing this job. So, long story short, that's how I ended up starting this company. As for getting into the product side of things, the company is operating well enough now, and we have other people who are way better at engineering management than I am, so I focus on our customers, being a little more outward-facing, and making sure we're still going after problems that people care about. Anyway, that's the journey there.
[00:03:49] Joe Colantonio That's awesome. It's a pretty big leap, though. You've named some big companies you've worked for. Why become a co-founder? It seems like a lot of headaches, but it also seems like you were passionate about a problem. Did something draw you to Observe, like you wanted to solve this so badly that you had to be a co-founder?
[00:04:06] Jacob Leverich I think that had actually always been a bit of my predilection, the startup thing. Take the early 2000s, late '90s, we were going through the dot-com era, right? I lived in Austin, Texas, and I managed to get into small startup circles and work for small companies, and I always loved that part of it, where you're working with the owner of a business, and the owner cares about the business, they care about the customer, they care about the product. You actually have the autonomy to do a little bit of everything, you can have influence and impact, and you can see your ideas take flight pretty quickly, whether good or bad. I definitely went through some failed startup experiences around that time, right after I got out of grad school, a startup that didn't perhaps go anywhere. I spent some time at bigger companies, and that was good. I learned a lot about what big companies care about, what makes them tick, and how operations work at bigger companies. So it was good to have that exposure at larger companies, but I'm not sure I'm the kind of person who would really be satisfied working at a mega-corp for 10 years. I always want to learn, I always want to have impact, and I always want to connect the dots between what I'm doing and what the customer gets out of it. That insatiable need to connect with the person who gets value out of the thing I do, and also to explore everything I can about the technology, just the curiosity about it, is hard to satisfy outside of a startup environment. So I think it was always my destiny to escape the bigger companies and go back and bash my head against the wall at a startup, over and over again. It's just sort of what I'm good at.
[00:05:56] Joe Colantonio Nice, nice. It looks like Observe, and I could be wrong, really focuses on observability, obviously, by the name, and on scaling. Why scaling? It always seems to be a hard, complex problem. Why attack something that seems so difficult to do?
[00:06:12] Jacob Leverich No, great question. Observability is a pretty established space, there are tons of tools in the space, so why on earth would we possibly need another observability tool? The search is always for, hey, who doing this job is actually unsatisfied with their current solution, has unmet needs, and is actually having a hard time because of what existing solutions can't offer them. And I think the lived experience for a lot of people is: the site goes down at 2 a.m. and it matters. It's some part of the business that's actually important, customers use it, it generates revenue for the business, whatever, it matters, so people want to fix it urgently. But the types of systems we operate these days are pretty damn complicated. It's usually not just one thing that went wrong, it could be any of a number of things, perhaps all at the same time. And it's usually not just one signal that points to the root cause. You might need to sift through a lot of data to figure out where the problem really originated and what to do about it. The lived experience for a lot of people came down to a couple of things. One is, they'd have a bunch of different signals and different tools for those different signals, and it was always a struggle to figure out, okay, which one's going to show me where the problem is. And by the way, who actually knows how to use that one? I might need to get 10 people on a call to start to figure out where the problem is. In fact, we have a lot of clients where it's not just 10 people, it's hundreds of people on a war-room call trying to figure out who can see where the fire is coming from. That's one aspect of it, the preponderance of different signals out there. But then also, particularly at larger scale, just the quantity of data becomes inconvenient to manage. I think sometimes people don't realize how much data is being generated by the infrastructure in larger businesses. It can be not just tens but hundreds of terabytes of log data per day, or a petabyte per day. Just moving it around is hard, and certainly getting any value out of it is very hard. Often when you're trying to firefight these issues, you don't know what you need or where to look. So what people end up doing is throwing out most of that data, and then they're flying blind, just guessing: oh, let me turn this knob and see if it fixes it. It's hard to actually deal with these situations at large scale if you don't have the right data. So our perspective on this has always been that this is fundamentally a data problem. It's a data management problem, and it's not necessarily a new problem; large-scale data management is something people have dealt with in other industries and other contexts before. And the solutions tend to be along the lines of: we need to get it all in the same place, or close together, so that the right people have access to the data.
We need to correlate the data, or contextualize the data, so that someone knows how to go from one breadcrumb to another without having to use their brain to figure out where to look next; just organizing the data a little bit. And then also just confronting the scale challenge: there's going to be a lot of data, the data volume is growing, and if we aren't using the latest and greatest technology to cope with that, then we're going to have a hard time, we're going to have to throw away all the data, and we're going to be stuck at square one. Seeing all those different pieces, and also, for what it's worth, recognizing there's new technology available, there are disruptive ways to go about doing this, we started to see if we could match what people are doing on, let's say, the data engineering or BI side of the world. They're talking about data lakes and data warehouses and data lakehouses, and they're solving, in their way, the problems they have in business analytics. Perhaps that technology and those techniques could be of utility in the tech ops space. There are all sorts of ideas over there: hey, keep the raw data, but then make silver and gold versions of the data; use cloud object storage as the data store of record, so you have essentially infinite scalability, but it's also very economical. These things can actually be applied to this problem. We can think about the data problem in the DevOps, observability, and SRE space in similar ways, and if we could do that, what better outcomes could we achieve?
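To make the raw-to-curated tiering idea above concrete, here is a minimal Python sketch of the bronze/silver/gold pattern Jacob alludes to. The log format, field names, and the tiny dataset are invented for illustration, not Observe's actual pipeline.

```python
# A minimal, illustrative sketch of the "raw -> curated -> aggregated" tiering idea
# (often called bronze/silver/gold in the data-lake world). Everything here is a
# hypothetical example, not a specific product's schema.
import json
from collections import Counter

# Bronze tier: raw events exactly as they arrived, slop included.
raw_events = [
    '{"ts": "2024-01-01T02:00:01Z", "service": "checkout", "level": "ERROR", "msg": "payment timeout"}',
    '{"ts": "2024-01-01T02:00:02Z", "service": "checkout", "level": "INFO",  "msg": "cart loaded"}',
    'not even json -- slop we keep in the raw tier anyway',
]

# Silver tier: parse what we can into structured records, keep the rest for later.
silver = []
for line in raw_events:
    try:
        silver.append(json.loads(line))
    except json.JSONDecodeError:
        pass  # the raw copy is still retained in object storage, so nothing is lost

# Gold tier: a small, curated aggregate that humans can actually reason about.
errors_by_service = Counter(
    rec["service"] for rec in silver if rec.get("level") == "ERROR"
)
print(errors_by_service)  # Counter({'checkout': 1})
```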
[00:10:50] Joe Colantonio Absolutely.
[00:10:51] Jacob Leverich And it really always comes back to: hey, it's two in the morning, the world's on fire, I have 200 people on a call. How can I make that whole experience better?
[00:11:01] Joe Colantonio I started my career as a performance engineer. It's not exactly this problem, but it was similar. When I ran a performance test, I'd have all this data, metrics collected from the mainframe, from the network, from the database, and you'd see response times go up. And then you're like, how do I correlate all this? You had to get all these people on a call, the database person, to look at it while you ran the test. Sometimes people suffer from not having enough information, but it sounds like here you have too much information. How do you do that correlation then? Is Observe a platform that actually brings the teams together to find the gaps in visibility that they may miss?
[00:11:35] Jacob Leverich Yes, so a couple of different things. At the end of the day, some of this is actually pretty easy. We treat it almost as a relational data modeling exercise. We have all this different data, and within that data there tend to be correlation keys of some sort. If we're talking about structured logging and distributed tracing or something like that, then great, I have a trace ID and I can use that to correlate all the data. But even if you don't, you still have things like, which host did this run on, or which pod did this signal come from? That's a correlation ID, so that's one way you can correlate the data. And even more than that, the place where we find a little bit more joy is that often there's a user ID in there, or, back in the day with mail logs, a message ID, and you can stitch the flow of data through all these different systems using a correlation ID like that. So our trick is to take all this data in and find those IDs, and basically build a basic relational data model: hey, this is a key I can use as a foreign key to other data sets, or as a correlation key with other data. That's the gist of how you connect all this data and navigate around; that's just the way we think about it. But I want to go back to what you mentioned about performance engineering, doing all these tests and benchmarking, having all this different data, and wanting to correlate it all. Man, I identify with that so hard, in a couple of ways. When I was in grad school, I was doing computer architecture, hardcore low-level performance engineering and optimization, working on HPC workloads and things like that. My day-to-day job was basically doing cycle-accurate simulations of chip multiprocessors and memory systems, which generated profoundly large amounts of telemetry data, and also correlating that with benchmark runs on real-world systems to make sure the simulator did what we thought it did. It was the same kind of challenge: man, I have hundreds of terabytes of trace data from all these jobs, and I need to correlate all these signals and run linear regressions across all of it. But it basically boils down to very basic things: I need to correlate things in time, and I need to correlate them by whatever identifier is shared between these different signals. That was always the trick. For what it's worth, I was using tools like Hadoop to do that large-scale data analysis. When I ended up at Google, I did performance engineering on the MapReduce team, and it was basically: hey, I need to figure out how to make these jobs faster, or use fewer resources, to be more efficient. The way I went about doing my job, in addition to using profilers and that sort of thing, was to do analysis on the job logs. I'd spend all day long looking at the logs coming out of these large workloads and trying to figure out how to stitch together all the different pieces from this big distributed system.
How can I get all the relevant data together in one place, then ask analytic questions about it and figure out which properties of this job make it faster than these other jobs? Finding those signals. It ended up just being log analysis at the end of the day, large-scale event analysis, and fortunately we had a lot of tooling inside of Google that was incredible for this, namely Dremel, an internal tool that everyone used for large-scale log analysis. For what it's worth, that back end eventually became BigQuery; they commercialized it as a big analytics database. Later on, when I was thinking about what this industry needs, I realized I'd love to have everything I had at Google, but it didn't really exist commercially until the mid-2010s, with things like Snowflake and Databricks, and open-source tools coming out as well. They started to resemble the things you could use adequately for this type of data volume and this kind of workload. They're based on data sitting in object storage, but they're also columnar analytics databases that can be very efficient at processing large amounts of this data. They have a relational data model, so you can use things like foreign keys to relate data together with correlation IDs. And they're becoming very good at handling semi-structured data, because not everything comes in a nice table. It comes in as slop, just raw strings, and you've got to figure out how to get the IDs out of those. Or it's profound amounts of random JSON data, and everything's in different shapes with different attribute names, and it's all slop. But these engines are becoming very good at processing that kind of data, and doing it efficiently. So that was part of the idea here: wow, there's new stuff on the block, commercially available off the shelf, that's very good at doing this type of analysis and correlating all this disparate data of random shapes and sizes. We finally have the things needed to solve the really hard problems at scale that a lot of medium-to-large businesses toil with; they devote engineering teams to building their own internal data pipelines to do this kind of stuff. Now there's an opportunity to build a solution we can just hand to people on a platter. Sorry, it's a long way to answer, but you can see there was this interesting confluence between a very hard problem at medium to large scale, particularly around the diversity and volume of data, and, oh my God, we finally have the technology that might help with this. Now we need to figure out how to make these two play nice with each other. That's the idea.
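As a concrete illustration of the correlation-key idea, here is a toy Python sketch that stitches two differently shaped telemetry sources together on a shared ID, the same way a foreign-key join works in a relational model. The field names (trace_id, user_id, service) are hypothetical, not any specific product's schema.

```python
# A toy sketch of the correlation-key idea: two telemetry sources with different
# shapes, stitched together on a shared ID. Field names are illustrative assumptions.
frontend_logs = [
    {"trace_id": "t-1", "user_id": "frank", "event": "checkout_clicked"},
    {"trace_id": "t-2", "user_id": "alice", "event": "page_view"},
]
backend_logs = [
    {"trace_id": "t-1", "service": "payments", "status": 504},
    {"trace_id": "t-2", "service": "catalog",  "status": 200},
]

# Index one side by the correlation key, then "join" the other side against it,
# exactly like a foreign-key join in a relational model.
by_trace = {rec["trace_id"]: rec for rec in backend_logs}
for fe in frontend_logs:
    be = by_trace.get(fe["trace_id"], {})
    print(fe["user_id"], fe["event"], "->", be.get("service"), be.get("status"))
# frank checkout_clicked -> payments 504
# alice page_view -> catalog 200
```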
[00:17:24] Joe Colantonio Let's talk about those technologies. You mentioned data lakes. Is AI the secret sauce now that's helping accelerate all these things? I know everyone slaps AI on everything, though I think machine learning probably applies better to data than to a lot of other use cases.
[00:17:38] Jacob Leverich Yeah. So I think the truthful answer right now is no, it hasn't quite been the differentiator yet. For what it's worth, the last couple of years have been pretty frothy with all the LLM stuff, but we are starting to see really cool things happen with LLMs and what people are doing recently with AI. There's maybe a way I think about this that's a little bit different: how do we actually bridge the gap between where we are now and where we could optimally make use of these new AI technologies for this use case? The first observation is that if you just take all your log data, jam it into an LLM, and ask it what's going wrong, you're going to fail, and you're going to fail hard, because it can't fit into the context; it just doesn't make any sense. You don't solve the general troubleshooting and incident response problem by throwing all your data into an LLM. You need to, for lack of a better word, curate it. Figure out, of all the data, what is the useful data, and within the useful data, who are the interesting verbs and nouns. Taking all the data we have and figuring out a way to express it in simple terms that are easier for a human to understand actually gets us a long way toward what an LLM would be capable of taking advantage of. We always thought about this in the early days as: we're actually trying to make humans better at troubleshooting. One of the ideas we've been, let's say, stealing from the BI space and trying to bring to this domain is that, at the end of the day, I don't necessarily care about my raw logs or my metrics or my traces or any of this. It's just telemetry, the raw data. It's great, it's the raw factual material I might fall back to. But ultimately, I care about my user, like Frank, who just had a bad experience, who had a page that didn't load or a checkout that failed or whatever. I care about Frank, and I want to ask questions about Frank. So organizing and curating the data, and elevating the thing of interest, is one of the ways in which, even as a human, it helps me organize my thought process and communicate with other stakeholders in the business. Yes, this major customer had a problem, but here's everything we know about it, here's the metric that corresponds to that customer, and by the way, it's good now, that customer is green. Having a way to talk about the business, and what's fundamentally important about the business, is how you facilitate better troubleshooting and better communication with stakeholders, rather than dealing with the raw data. And likewise, this is actually very apropos for LLMs. If we can humanize all of this technical telemetry, and by the way, that also makes it a lot smaller, something we could plausibly fit into the context of one of these things, then yeah, we might actually be able to take advantage of it.
So that's where I think being thoughtful about the data management strategy matters: what's the purpose of bringing all this data together, and how do we need to distill it in ways that are useful for this thing? I feel like no one in the industry has cracked the AI nut yet, but we're starting to figure out the building blocks and the intermediate problems we need to solve before we can get the full realization of the power of these new AIs. And then one other thing, sorry, just stream of consciousness, but compounding on this idea of bringing more data together: the original opportunity for us was, well, we need to bring together the infrastructure data, the log data, all this event data, and we want to bring in business data too. We want to bring it all together into a data lake, some place where everyone can be together asking questions about it. But when you think about what I want at my fingertips when I'm in the middle of a firefight, I want more than just that telemetry data. I want post-mortem reports from previous incidents. I want to know, generally, what the Slack discussion is about this issue or this customer or this module. Man, I'd love to have runbooks at my fingertips. Where all of this ends up going is: solving the data management challenges is great, but ultimately the destination is that we can bring together all of this other business context. When I'm asking questions of my observability solution, ultimately I'm trying to figure out, is it healthy or not, or is this user having a good experience or not. If we can bring all of this contextual information in as well, then we can supercharge the ability of these tools to actually provide useful insights to me, to suggest: oh, I see that you're running this in EKS, I recommend you go look at this setting. If it knows that context, it can provide much more interesting suggestions.
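Here is a hedged sketch of the "curate before you prompt" idea: roll raw telemetry up into a small per-entity summary (per customer, in this example) that a human or an LLM could actually fit in context. The fields, thresholds, and status labels are invented for illustration.

```python
# A sketch of distilling raw events into a compact per-entity summary, rather than
# feeding raw logs to a human or an LLM. All fields and thresholds are hypothetical.
from collections import defaultdict

events = [
    {"customer": "frank", "ok": False, "latency_ms": 4100},
    {"customer": "frank", "ok": True,  "latency_ms": 120},
    {"customer": "acme",  "ok": True,  "latency_ms": 90},
]

summary = defaultdict(lambda: {"requests": 0, "failures": 0, "worst_latency_ms": 0})
for e in events:
    s = summary[e["customer"]]
    s["requests"] += 1
    s["failures"] += 0 if e["ok"] else 1
    s["worst_latency_ms"] = max(s["worst_latency_ms"], e["latency_ms"])

# A few hundred bytes per entity: small enough to put in front of a human or an LLM.
for customer, s in summary.items():
    status = "red" if s["failures"] else "green"
    print(f"{customer}: {status}, {s['failures']}/{s['requests']} failed, worst {s['worst_latency_ms']} ms")
```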
[00:22:38] Joe Colantonio Yeah, I love that line: making humans better at troubleshooting, not replacing them with AI, but actually enhancing them to do the job better. In the pre-show, we talked a little bit about knowledge graphs. How do they contribute to breaking down the silos you might see and provide maybe a more unified view of the observability data we've talked about?
[00:22:56] Jacob Leverich Yeah, well, in a nutshell it's what I've been talking about so far: I don't just care about my logs, I care about the things in the environment, or the users of my environment, whether it's infrastructure or people or just entities in general. I think about it very much the way people are familiar with Google. It used to be the case, decades ago, that you typed in a keyword and it gave you a bunch of search results. But then it became the case that you could ask a question about, say, the San Francisco 49ers, and it's not just going to give you a bunch of search results, it's actually going to tell you something about the 49ers. It's going to tell you, here's the upcoming game, or something like that. They used to call that the knowledge graph, and it was just the idea that there's richer, more contextualized information that may actually be relevant as an answer to this question. I think this concept very much applies to where the industry is going, where this technology is going. It's more than just the raw data. There are things about the business, things about the infrastructure, that are ultimately what I want to know about and ask questions about. And the knowledge graph is more than just the inventory of those things; it's also how they connect to one another. So I know that the 49ers are part of the NFL, and I can go to the NFL and find the other teams. That's a way to browse the data, so it makes the search space navigable when you're troubleshooting. Then, when you think about what an AI could do with that: first off, it has a context in which it can ask questions, but with an agentic flow it can also start to follow these links and figure out, okay, this system's down, what about the systems connected to it? Maybe I ought to go look at those. The human troubleshooting process there is very straightforward, and the troubleshooting for an agentic workflow ought to be similarly straightforward, so long as you can provide it with that context. So the knowledge graph, I wouldn't say it's a single technological thing, it's just this concept that we're talking about more than the raw telemetry: we're talking about things in the world, and they're connected to each other. In our solution, we just represent this as a relational data model with foreign keys between tables; it's no more complicated than that technologically. But the point is how you use it, both as a human and in terms of what it sets up for the future with these AI tools. That's why this concept matters, and it mattered even before the AI stuff.
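A minimal sketch of the knowledge-graph idea expressed as plain relational-style data: entity records, foreign-key-like links, and a traversal step that mimics the "this service is down, what is connected to it?" move a troubleshooter (or an agent) would make. The entity names and link types are made up for illustration.

```python
# A toy knowledge graph as plain relational-style data: an entity table plus a
# link table, with a traversal helper. Entities and links are hypothetical.
entities = {
    "checkout-svc": {"kind": "service"},
    "payments-db":  {"kind": "database"},
    "eks-node-7":   {"kind": "host"},
}
links = [
    ("checkout-svc", "depends_on", "payments-db"),
    ("checkout-svc", "runs_on", "eks-node-7"),
]

def neighbors(entity):
    """Follow links out of one entity, the way a troubleshooter browses the graph."""
    return [(rel, dst) for src, rel, dst in links if src == entity]

for rel, dst in neighbors("checkout-svc"):
    print(f"checkout-svc --{rel}--> {dst} ({entities[dst]['kind']})")
# checkout-svc --depends_on--> payments-db (database)
# checkout-svc --runs_on--> eks-node-7 (host)
```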
[00:25:29] Joe Colantonio Yeah, for sure. You mentioned it matters even before the AI stuff, and missing that sounds like a mistake many companies might make. What are some other common mistakes companies make when trying to implement observability, especially at scale? I always hear about people struggling with scale. Why is that? What are some common things that, once they get them in place, give them a better time scaling?
[00:25:50] Jacob Leverich Yeah, I mean, one of the mistakes, and it's near and dear to my heart, is that 10 or 15 years ago, logs were so expensive, and dealing with large-scale event management was so absurdly expensive, that naturally people said, this isn't going to work, I can't do that. So they went hardcore onto metrics. Makes perfect sense, particularly for a lot of real-time monitoring use cases: I don't need the raw strings anymore, I can have these aggregated signals to work off of when I'm doing my alerting. It makes perfect sense. But then I think people maybe swung the pendulum too hard, and now I can only see things at an aggregate level; I can no longer figure out what's going on at the individual user level, or the individual host level, or anything like that. People talk about the cardinality problem, but I think when people talk about the cardinality problem, it's because they swung the pendulum so hard towards metrics, because at their scale there was no event management system that could possibly serve their needs. I think the truth is that the right answer is somewhere in the middle. You want to use metrics for some things, but for the deep analysis, the deep troubleshooting, you actually want to look at things at a more granular level; you need something that can handle the events as well. That's something I've seen a bunch of times: folks have swung the pendulum over so hard and abandoned the more granular data because they assumed it was too expensive, but those dynamics have changed significantly in the past decade. So that's a big one. And then the other thing is that when groups and teams grow and lots of different job functions get involved, everyone brings in a different tool to do their thing, which makes perfect sense. But at some point it precipitates some of the problems I was talking about earlier: hey, the site's down, but there are 10 different tools we have to look at, and unfortunately I don't know how to use nine of them. So what do I do? I've got to page my friend, wake him up, and get him to help me out. And it's not just on the technical operations side of things. This is support and usage analytics, it could be FinOps, it could be security analytics. All these different parties have a vested interest in the operations of the system, they all have their own needs, and they all need to work with the data from monitoring these systems. So you end up duplicating the data, you end up making hard trade-offs and compromises, where one team is at the mercy of the other. People run into a lot of trouble with that kind of stuff. That's just a fact, right? It happens. I'd like to think that what we're building is the start of an answer to a problem like that.
If we can load all this data into a data lake, load it once, we can give a lot of different people access to that data. And for what it's worth, if it's in the data lake, hey, bring your tool to the data; ideally there should be an ecosystem around it. What I see long term is that this leads to somewhat of a disaggregation of this market. We've already seen this with data collection and instrumentation: I used Fluentd for years before I'd even heard of Splunk. And thank God OpenTelemetry has started to gain traction, because now everyone can see, oh, this is my way out of proprietary, vendor-specific instrumentation and data collection. We're seeing a commoditization of the data instrumentation and data collection layer, and that's great, because it means that on the data collection side I can actually send the data to two places, so if I have two different people who want to consume this data, that's fine. I think the next shoe to drop is on the data management side, because it's still very expensive to load this data into all these different systems, and that kind of sucks. I'd rather economize that and load it into one system, load it once. And by the way, I'd rather not load it into a silo. I'd prefer to load it someplace where I still actually own the data, where I can do whatever I want with it and apply my own governance policies on top of it. That seems to be what most people would want. So the last piece of the puzzle to fall into place is: okay, now we have open data collection, we have the data stored in data lakes, ideally in open data formats, and now it's in a place where all the point solutions, for engineering operations or customer support or security, can come to it and put their own personalized view on top of it, rather than only one person having access to that data. That's the way I see it unfolding over time.
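To illustrate the pendulum point about metrics versus events, here is a toy Python example that derives the aggregate metric from granular events while keeping those events around for drill-down. The field names and the tiny dataset are hypothetical.

```python
# A toy illustration of keeping granular events and deriving the aggregate metric
# from them, rather than keeping only the pre-aggregated metric and losing the
# ability to drill down. Field names are hypothetical.
from collections import Counter

events = [
    {"minute": "02:00", "user": "frank", "ok": False},
    {"minute": "02:00", "user": "alice", "ok": True},
    {"minute": "02:01", "user": "frank", "ok": False},
]

# Metric view: cheap, low-cardinality, fine for dashboards and alerting.
errors_per_minute = Counter(e["minute"] for e in events if not e["ok"])
print(errors_per_minute)  # Counter({'02:00': 1, '02:01': 1})

# Event view: the same data, still queryable at full cardinality when the
# aggregate says something is wrong and you need to know *who* is affected.
affected_users = {e["user"] for e in events if not e["ok"]}
print(affected_users)  # {'frank'}
```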
[00:30:50] Joe Colantonio Let me explain it back and see if I understand it correctly. It sounds like Observe consumes all this data that all these other tools can feed into it, and it's almost the source of truth that everyone can then grab any of the data from and find information in, just from that one source of truth. Do I understand that correctly?
[00:31:06] Jacob Leverich That's the idea.
[00:31:07] Joe Colantonio All right, nice. Cool.
[00:31:10] Jacob Leverich Just going back to my personal journey, when I was an early, budding system administrator, I was building all my own special tools to do a lot of this kind of stuff, my own tools to help people troubleshoot and analyze the particular part of the system I was responsible for. But when I got to Splunk, it was like, oh wow, this makes me better at my job, because not only does it solve all the technical challenges, it actually helps me share a hyperlink with my co-worker so we can all work from the same data, and that's great. But that was logs. Thinking more broadly about technical operations, particularly at scale, it's not just logs. I have lots of different data, and there's lots of different business context I want to bring together. So where is the common ground, the place someone can go, the trusted repository for this data? If I go looking in this one place, I know I'm going to find what I'm looking for, because everything is there. That helps people figure out: I'm trying to solve a problem, where do I go, where could I possibly find this answer? If you have a repository that people have consolidated onto, then you have a prayer of answering that question.
[00:32:25] Joe Colantonio Love it. All right, Jacob, before we go, is there one piece of actionable advice you can give to someone to help them with their AI and DevOps scaling efforts? And maybe what's the best way to learn more about Observe and contact you as well?
[00:32:36] Jacob Leverich Oh, sure. I guess the one piece of advice is, obviously we're building technology and all that, but never, ever forget that this is a people thing. I sleep with the SRE book by my side; there are just good principles and practices that have to come to bear to actually run a system that people would be happy to support. So don't forget your blameless postmortems. Don't forget to have a regular reliability review cadence. Don't forget to prioritize the work needed to make a system robust, all of those follow-up items you find in your postmortems, the ones where you say, oh, we ought to do this or that, or it'd be better if we could monitor this. It's one thing to say, yeah, we're going to spend 10 to 20% of engineering time on that stuff, but it's another thing to actually live it, and there's no piece of software that's going to solve that problem for you. You still have to do the responsible management of a practice like this. So maybe my primary piece of advice is: it is within your power to do these things, and there's just a little bit of leadership required to bring a team along. And for us, LinkedIn is a great way to find us, we're on LinkedIn. You can email me at Jacob@observeinc.com, and our website is observeinc.com.
[00:33:50] Joe Colantonio We'll have links to all this awesomeness down below.
[00:33:53] And for links to everything of value we've covered in this DevOps ToolChain show, head on over to testguild.com/p187. So that's it for this episode of the DevOps ToolChain Show. I'm Joe, and my mission is to help you succeed in creating end-to-end, full-stack DevOps toolchain awesomeness. As always, test everything and keep the good. Cheers.
[00:34:16] Hey, thank you for tuning in. It's incredible to connect with close to 400,000 followers across all our platforms and over 40,000 email subscribers who are at the forefront of automation, testing, and DevOps. If you haven't yet, join our vibrant community at TestGuild.com, where you become part of our elite circle driving innovation in software testing and automation. And if you're a tool provider or have a service looking to empower our guild with solutions that elevate skills and tackle real-world challenges, we're excited to collaborate. Visit TestGuild.info to explore how we can create transformative experiences together. Let's push the boundaries of what we can achieve.
[00:34:59] Oh, the Test Guild Automation Testing podcast. Oh, the Test Guild Automation Testing podcast. With lutes and lyres, the bards began their song. A tune of knowledge, a melody of code. Through the air it spread, like wildfire through the land. Guiding testers, showing them the secrets to behold.