How you map out and explore performance in a system

[00:00:00] Mark Tomlinson: So let's bring all of this together now into how you can explore performance. Exploring performance requires that you take all of your knowledge about data states, all of your knowledge about the four physical resources and the logical resources we talked about, and put together a mapping type of exercise. And the map has to help you do a few things. One, visualize the components of the end-to-end system. We talked about topologies around the world, we saw the Dynatrace example with a transaction flow topology, and we've even looked at the topology inside a computer, on the motherboard, right? [00:00:40][40.0]
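
Here's a minimal sketch of what such a map might look like as data, purely for illustration; the tier and component names below are assumptions, not taken from any particular tool:

```python
# A hypothetical end-to-end map: tiers going left to right, and within each
# tier the stack from logical (top) down to physical (bottom). The component
# names are illustrative, not from any particular tool.
topology = {
    "client": ["client app", "browser / platform", "operating system", "hardware"],
    "web":    ["web server config", "HTTP daemon / framework", "operating system", "hardware"],
    "app":    ["application config", "framework (thread pools)", "operating system", "hardware"],
    "data":   ["queries / data API", "database engine", "operating system", "storage (SAN)"],
}

# Every physical layer gets the same four resource checks.
physical_resources = ["CPU", "disk", "memory", "network"]

for tier, stack in topology.items():
    print(f"{tier}: " + " -> ".join(stack))
```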

[00:00:41] You also want to organize the sequence of your exploration over time. What I mean by that is you have to know what you're exploring now, what you explored before, what you might explore next, and what the dependencies or precursors to explore are. So you have to have a sequence for your exploration, sort of like a journal, a log book of how you've actually moved through the system. So how do you do that? How can you avoid repeated or redundant tests? I don't know how many times I've seen it: well, it didn't run for 50 users, so run it for 49 users, then run it for 48 users. Good boundary analysis and testing technique would say, well, what if we just ran it for ten users? Let's do something extreme and meet in the middle. [00:01:25][43.7]
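
One way to picture "do something extreme and meet in the middle" is a binary search over the user count, with a journal of every run so you never repeat one. This is only a sketch under assumptions: `run_load_test` is a hypothetical stand-in for whatever load tool you actually drive, and the "hidden limit" is invented.

```python
# Hypothetical sketch: binary-search the user count instead of stepping
# 50, 49, 48... run_load_test() is a stub standing in for whatever load tool
# you drive; it should return True if the run met its response-time goal.
def run_load_test(users: int) -> bool:
    # Stub: pretend 37 concurrent users is the hidden limit of the system.
    return users <= 37

def find_breaking_point(low: int, high: int, journal: list) -> int:
    """Largest user count that still passes, recorded run by run in the journal."""
    best = low
    while low <= high:
        mid = (low + high) // 2
        passed = run_load_test(mid)
        journal.append({"users": mid, "passed": passed})   # the exploration log
        if passed:
            best, low = mid, mid + 1
        else:
            high = mid - 1
    return best

journal = []
print(find_breaking_point(10, 50, journal))   # -> 37, in only a handful of runs
print(journal)
```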

[00:01:26] So how can you avoid that repeated work and, of course, keep yourself from getting completely lost in the deductive reasoning you need in order to move around the system? So in this example topology, we've got a presentation tier, the custom clients and some APIs on the frontend. We've got a server tier in the middle, and we've got a data tier on the backend with different operating systems, different sources of data and enterprise apps. Going from left to right you move across the distributed system, and within each tier you go from logical at the top down to physical at the bottom. [00:01:58][31.7]

[00:01:59] So at the very top of the stack you'll see the presentation tier, and at the bottom the physical tier. So for exploring performance, what I'm proposing is that you come up with a map, and the map looks something like this. On the distributed end, running on maybe a laptop or a mobile device, you have the client tier. At the very top of the client tier is the client app itself, and then maybe it goes through a browser. [00:02:30][31.1]

[00:02:30] Let's say for this example, at the base you have an operating system. As you troubleshoot slow client performance on that stack, you're asking: is it in the app? Is it in the platform or browser? Is it in the operating system, or way down at the bottom, is it the physical infrastructure? And if I can't find any problems in tier one, way over on the left, then I come over and say, well, what can you tell me in the middle tier, maybe the web tier, right? So I start at the client, I work down the stack into the operating system, I explore and see if it's the CPU, and no, the client looks really good. [00:03:11][40.8]

[00:03:12] Let's go to the next tier upstream. We look at that operating system in the web tier and check CPU, disk, memory, network, and boy, the OS is looking good. Then we work our way up and say, well, maybe we're logically constrained way up here in the web configuration for the application: maybe it can't use the resources, maybe it's out of threads, maybe the HTTP daemon isn't configured well. But actually, in this example, the web server looks great. So we work our way back down, and we can't find anything wrong in the web server, so we go to the app server, and the app server, guess what? It's fine too. We work our way up: the application has plenty of threads, no exhausted memory, no memory pressure at all. [00:03:56][44.5]
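
A rough sketch of that walk as a loop, assuming you can answer "does this layer look healthy?" from your monitoring for each tier and layer. The stub below simply pretends the problem is hiding in the data tier, which is where this example lands next; in reality the answer comes from your own data.

```python
# Hypothetical walk across the map: client -> web -> app -> data, and within
# each tier every layer from physical up to logical. The first layer that is
# not "green" becomes the next thing to explore.
tiers = ["client", "web", "app", "data"]
layers = ["hardware (CPU/disk/memory/network)", "operating system", "framework", "application"]

def looks_healthy(tier: str, layer: str) -> bool:
    # Stub: in reality this comes from your monitoring / APM data.
    return not (tier == "data" and layer == "application")

def find_first_suspect():
    for tier in tiers:
        for layer in layers:
            if not looks_healthy(tier, layer):
                return tier, layer
            print(f"{tier} / {layer}: green")
    return None

print("next thing to explore:", find_first_suspect())
```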

[00:03:57] The client, the web tier, the app tier all look great. I'm going to work my way all the way back into the data API and guess what? I'm going to find some slow query, or I'm going to find a CPU limitation in the operating system for the database. And this is basically how I map out and explore performance in a system; this is how I ended up mapping out performance in general, right? So again, we go from logical and work our way down to physical. Go back to that slide and look: physical is CPU, disk, memory, network, right? And at the top we have the app configuration, and maybe the framework configuration, way up here at the top. That's kind of hard to see, but that's fine. And the idea is that you work your way down from the logical, and the first observation is my end-to-end response time through this entire thing. Let's say it's 10 seconds. [00:05:02][64.5]
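
One way to make that 10-second observation actionable is to break it down per tier, so you can see who owns the time. The numbers below are invented purely to illustrate the idea; in practice they come from tracing or APM data like the transaction-flow view mentioned earlier.

```python
# Illustration only: a made-up breakdown of the 10-second end-to-end response
# time by tier, sorted so the biggest contributor stands out.
breakdown_seconds = {
    "client render": 0.4,
    "network hops":  0.3,
    "web tier":      0.5,
    "app tier":      0.8,
    "data tier":     8.0,   # the bulk of the wait: slow query / starved connections
}

total = sum(breakdown_seconds.values())
for tier, secs in sorted(breakdown_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{tier:14s} {secs:4.1f}s  ({secs / total:6.1%})")
print(f"{'end to end':14s} {total:4.1f}s")
```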

[00:05:03] So I have my first tier of analysis and I go through it from the logical down the layers: logical application, framework, operating system, hardware. Work your way down: logical application, no bottlenecks; framework, no bottlenecks; operating system, no bottlenecks; hardware, no bottlenecks. Now I'm going to rely on the fact that this hardware has a network connection, right? That's how computer A talks to the web server on computer B. So I build the next stack on computer B; I get there by the network, but again it's CPU, disk, memory, network, right? And the network is fast, just a ping between those two. Then I'm going to work my way up: from the operating system I go into the framework, and from the framework I go into the app, right? And I analyze each of those pieces. Hey, does the operating system show any swapping issues or memory limitations? Is the operating system using all of these resources efficiently? [00:06:24][80.8]
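
If you want to script those CPU, disk, memory, network and swap checks on each box rather than eyeballing top, something like the psutil library can pull the same counters. A rough sketch, assuming psutil is installed:

```python
# Rough sketch: the same CPU / disk / memory / network / swap picture that top,
# Perfmon or Activity Monitor shows, pulled programmatically on each box.
# Requires the third-party psutil package (`pip install psutil`).
import psutil

cpu_pct = psutil.cpu_percent(interval=1)    # % CPU over a one-second sample
mem     = psutil.virtual_memory()           # total / available / percent used
swap    = psutil.swap_memory()              # swapping issues show up here
disk_io = psutil.disk_io_counters()         # read / write counts and bytes
net_io  = psutil.net_io_counters()          # bytes and packets sent / received

print(f"CPU {cpu_pct}% | memory {mem.percent}% used | swap {swap.percent}% used")
print(f"disk: reads={disk_io.read_count} writes={disk_io.write_count}")
print(f"net:  sent={net_io.bytes_sent}B recv={net_io.bytes_recv}B")
```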

[00:06:25] Usually I don't see anything troublesome there, but these resources might be underutilized: you could have a logical bottleneck way up here in the app that's actually not using the resources at the bottom part of the stack efficiently, which is kind of interesting. That's the idea. So the next step would be: I didn't find anything wrong in the app, so that's green. I work my way back down and take the next network hop, right? Let's just say I go to the app tier and do the same thing: work my way up through the OS, up through the framework, up to the app, and again check CPU, disk, memory, network. It's the same concept as you walk through that framework. You get all the way back and you might find, hey, we have a three-lane highway coming in here, right? And we have a one-lane highway because of some weird limitation on CPU, and we have a two-lane highway. [00:07:27][62.3]
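
The highway picture boils down to simple arithmetic: end-to-end concurrency is capped by the narrowest stage, and every wider stage sits partly idle. A tiny worked example with made-up lane counts:

```python
# The three-lane / one-lane / two-lane picture as arithmetic: effective
# concurrency through the whole chain is the minimum of the stages.
# The lane counts are illustrative only.
lanes = {
    "frontend network connections": 3,
    "app tier (CPU-limited)":       1,
    "database connections":         2,
}

effective = min(lanes.values())
print(f"effective end-to-end concurrency: {effective}")
for stage, width in lanes.items():
    print(f"{stage}: {width} lane(s), {width - effective} sitting idle")
```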

[00:07:30] Remember this from the earlier example: the frontend here has three incoming network connections. This one has a CPU bottleneck, because the CPU can only process a limited number of connections. And this one has a limit on the number of things going to the database. And behind the database is a disk, right, a giant storage array back there on the SAN. So our goal is to say the first bottleneck is right here in the web and app tier, and we have some limitation on the number of threads, let's say the thread count in the framework for Java. And if we open up that number of threads, we can get three simultaneous calls happening between the frontend, the web server and the app server, right? I'm going to open that up at the web tier, and then we'll see, aha. [00:08:24][53.4]
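
The transcript is talking about a thread setting in a Java framework; as a hedged, language-neutral sketch of the same idea, here's Python's ThreadPoolExecutor showing how widening the pool lets the three incoming calls run at the same time instead of queuing behind one worker:

```python
# Sketch: with one worker the three calls serialize (~3s); with three workers
# they overlap (~1s). downstream_call() just simulates a one-second hop.
import time
from concurrent.futures import ThreadPoolExecutor

def downstream_call(i: int) -> int:
    time.sleep(1.0)   # stand-in for the call from the web server to the app server
    return i

def run_with(workers: int) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(downstream_call, range(3)))   # three simultaneous requests
    return time.perf_counter() - start

print("1 thread :", round(run_with(1), 1), "s")   # ~3.0 s
print("3 threads:", round(run_with(3), 1), "s")   # ~1.0 s
```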

[00:08:24] Now we've moved the bottleneck back here, because we don't have enough connections back to this last tier. This is the database here, right? So I need to open up my connections there to allow three simultaneous connections to make it all the way through. And then once you figure out, hey, we've got a handle on flowing data back and forth, this 10-second response time way over here suddenly starts to come down. Let's say it was limited down to one connection here, and we added two more connections. So we're going to go from ten seconds, we're going to shave off, let's say, half of that, down to five seconds, and then we're down to two and a half seconds. And then we notice that to get even more users, we had to open this up. [00:09:07][43.0]
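
The back-of-the-envelope math behind "ten seconds down to five, down to two and a half" is just the waiting time being split across more open lanes. A hedged illustration with invented numbers:

```python
# Illustration only: if the waiting portion of the response time splits evenly
# across the connections that are open end to end, widening the narrowest hop
# roughly divides that wait. Real systems rarely scale this cleanly.
base_wait = 10.0   # seconds spent waiting behind a single open connection

for connections in (1, 2, 4):
    print(f"{connections} connection(s): ~{base_wait / connections:.1f}s of waiting")
```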

[00:09:07] So now we've cut down the actual amount of time the request spent waiting, because we found the bottleneck here and then we found the bottleneck there. And if you need to scale even more, you know where to go: all right, well, guess what? We need to make this a four-lane freeway, have all four lanes of communication all the way to the disk, and then be able to bring that data back with more users at the same time. [00:09:33][25.4]

[00:09:34] And that's how I explore performance throughout the system, particularly keeping track of what I measure and observe at each of these tiers: the number of threads at the framework level within a web server or app server, the amount of memory in use down in the core operating system, and at the physical layer CPU, disk, memory, network, just like we did with top or with Perfmon or with Activity Monitor. The operating system itself can tell us things: can I use that memory? Do I have enough schedulers? The database disk, how do I access it? How many channels do I have? So this is the real challenge: if you can actually match the physical world down here with the logical world up there, you're going to be able to explore performance, you're going to be able to find all the bottlenecks you need to find, and tune the system appropriately. [00:09:34][0.0]
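
As a wrap-up, the things to track at each layer can live in a simple checklist you carry from tier to tier. This sketch just restates the examples from this section as data, so the logical numbers at the top can be matched against the physical counters at the bottom:

```python
# A checklist of what to observe at each layer, restating the examples above.
observations = {
    "application / framework": ["thread pool size", "connection pool size", "HTTP daemon settings"],
    "operating system":        ["memory in use", "swapping", "schedulers available"],
    "physical":                ["CPU", "disk", "memory", "network (top / Perfmon / Activity Monitor)"],
    "database / storage":      ["slow queries", "channels to the storage array / SAN"],
}

for layer, metrics in observations.items():
    print(f"{layer}: " + ", ".join(metrics))
```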
