About this Episode:
I think A/B testing is an under-utilized technique in SRE and performance engineering practices. In this episode, David Sweet, author of the book Tuning Up: From A/B Testing to Bayesian Optimization, shares practical, modern experimental methods that are super helpful to software engineers. Discover tips for designing, running, and analyzing an A/B test, identifying and avoiding the common pitfalls of experimentation, and much more. Listen up!
TestGuild Performance Exclusive Sponsor
SmartBear is dedicated to helping you release great software, faster, so they made two great tools. Automate your UI performance testing with LoadNinja and ensure your API performance with LoadUI Pro. Try them both today.
About David Sweet
David Sweet has a Ph.D. in physics and has worked as a quantitative trader and as a machine learning engineer at Instagram. He has experimentally tested and tuned trading systems and recommender systems. He is also an adjunct professor at Yeshiva University.
Connect with David Sweet
- Twitter: phinance99
- LinkedIn: dsweet99
Full Transcript David Sweet
Intro: [00:00:01] Welcome to the Test Guild Performance and Site Reliability podcast, where we all get together to learn more about performance testing with your host Joe Colantonio.
Joe Colantonio: [00:00:17] Hey, it's Joe, and welcome to another episode of the Test Guild Performance and Site Reliability podcast. Today, we're going to be talking with David Sweet all about his upcoming book from Manning, Tuning Up: From A/B Testing to Bayesian Optimization. David Sweet has a Ph.D. in physics and has worked as a quantitative trader and as a machine learning engineer at Instagram. He has experimentally tested and tuned trading systems and recommender systems, and is also an adjunct professor at Yeshiva University. David has a lot of experience in this area, and I think you'll have your eyes opened to maybe a new technique or a new process that you can actually add to your software development lifecycle to make your optimized software even better. You don't wanna miss this episode. Check it out.
Joe Colantonio: [00:01:01] This episode is brought to you by SmartBear. Listen, load testing is tough. Investing in the right tools to automate tests, identify bottlenecks, and resolve issues quickly could save your organization time and money. SmartBear offers a suite of performance tools like LoadNinja, which is a SaaS UI load testing tool, and LoadUI Pro an API load testing tool to help teams get full visibility into UI and API performance so you can release and recover faster than ever. Give it a shot. It's free and easy to try, head on over to SmartBear.com/solutions/performancetesting to learn more.
Joe Colantonio: [00:01:47] Hey, David, welcome to the Guild.
David Sweet: [00:01:50] Hi, Joe, thanks for having me.
Joe Colantonio: [00:01:52] Awesome to have you. So this book, like I mentioned in the preshow, is maybe a little bit outside my comfort zone, but I think the techniques you present in it could be really beneficial to folks who are software developers. So I'm just curious to know before we get into it, David, is there anything I missed in your bio that you want the Guild to know more about?
David Sweet: [00:02:08] Oh, I think the bio is complete and I thank you for that introduction.
Joe Colantonio: [00:02:11] Awesome. Awesome. All right. So why did you write this book? I always ask the authors because it takes a long time to write a book, especially when you have mathematics involved I'm sure it takes even longer. So why this book?
David Sweet: [00:02:21] Well, there are kind of two reasons behind it. One is I saw this connection between what ML engineers were doing at Instagram and what the quantitative high-frequency trading engineers were doing at other firms: there's this process of ideation, testing something offline, and then finally going online and either A/B testing it or doing some kind of parameter tuning. So, one, I thought, well, there's some audience for this kind of information. The other is just my experience over the years of learning these techniques kind of one by one. The way I learned them kind of followed the chronology of their development. You know, the development was over a hundred years; my learning was shorter, but it sort of follows that path. And I thought, well, all this stuff isn't collected in one place, but it's all addressing this same problem: once you get online, what do you do to sync up your model, your hypothesis about how this was supposed to work, with reality? How do you get those parameters to the right values? And if there are so many people doing this kind of thing, rather than have them struggle with a zillion resources as I did over a long period of time, maybe I could compact it into one book and make it a little bit easier for a bunch of people.
Joe Colantonio: [00:03:34] Awesome. So I guess the next question then is, who's the audience for the book? As I mentioned, this podcast's target is performance engineers and site reliability engineers. Who were you thinking of when you actually wrote this book?
David Sweet: [00:03:45] I had in mind three parties. One is quantitative traders. Another is machine learning engineers at social media companies: Instagram, Facebook, Spotify, Twitter, LinkedIn, that kind of thing. And then the other is software engineers building infrastructure for these kinds of systems who are sensitive to things like latency. For example, I know in high-frequency trading we want low latency, and we look at latency distributions. And when you're serving webpages or app updates to people, you're concerned about latency as well. People get bored very quickly. Sometimes in the hundreds-of-milliseconds range they can get frustrated.
Joe Colantonio: [00:04:22] Absolutely, and that's the exact target audience of this podcast. The performance engineers and site reliability engineers are going to get a lot out of this because they have to create performant software. And like you said, especially when you get towards mission-critical applications, they need to perform really quickly. And on Black Friday or the holidays, the systems need to keep up with the load and stress that are put on them. So I guess, you know, the book is broken up into eight main chapters. The first one is system tuning, so I'm curious to know how you'd explain what system tuning is at a high level before we dive in a little bit more.
David Sweet: [00:04:53] The way I think of it is by analogy. There's this idea that's crept into engineering and software of tuning, of calling parameters knobs. I've seen them referenced in code bases before. The analogy is to, like, an old FM radio where, if you wanted to find a station, you'd sweep the parameter, which was the frequency you're tuning into, by hand, by turning the knob left and right until you got the optimal signal-to-noise ratio, at least to your ear, for whatever you wanted to listen to. And so that's the analogy, and the problem we're solving in engineering is: whatever system you have will have some parameters to it. Maybe there's a threshold over which some signal triggers an action, or some weights on multiple signals when you combine them together. Or maybe it's parameters like the size of a cache, or how much memory you're going to allocate, how much you're going to buffer, various things like this. Systems always have lots of parameters like these. And we know qualitatively the purpose these parameters serve, but we don't know the exact values they should take until we actually try them in production. You can get some sense offline, you can simulate things, you can emulate certain processes, you can take production log data and kind of get a rough idea offline. But there's really no substitute for real production with the real thing.
Joe Colantonio: [00:06:08] Yeah, it seems to be a bigger trend nowadays, people actually testing in production. When I started off as a performance engineer, we'd build up the system in staging and we'd test it, then we'd go into production, and nothing was ever the same as what happened in production. So this definitely applies as well. So I guess along these lines, you start off the book talking about different workflows, and I'm curious to know where tuning up actually fits into those workflows. Can you talk a little bit about why you start off talking about workflows? I think you cover the software development workflow, the machine learning workflow, and the quantitative trading workflow.
David Sweet: [00:06:42] Yeah. So in all three cases, they follow this pattern as you've described. The first step is ideation. You've got an idea: a new feature or an improvement, something you think will improve your system. And the next step is you have to figure out whether you're right or not. Right. I think most of the ideas you come up with will probably not work, at least in a mature system, even if they're good ideas, not because we're all dumb and coming up with bad ideas, but because the systems are complicated, the world is complicated. And you kind of have to grow this thick skin and say, well, maybe 10 percent of my ideas are going to come to fruition, are going to mean something. So what do you do? The next step is to do some testing offline, where it's cheap, where it's fast, and weed out the worst stuff. You're probably not going to find the best stuff there, right, but you can weed out things that are obviously broken: if they're buggy, if they rely on some assumption that doesn't hold true, that kind of thing. And then the final step is the production testing, where you go run live and maybe do an A/B test or some other kind of check on this new idea that you've implemented. Now for a quant trader, the offline part, that middle step, is a backtest. For an ML engineer, the middle step is some kind of simulation or a data-science-type analysis of historical log data. And in software engineering, offline, you might run live data through the components and measure them. For example, if you want to measure time, you can measure timings offline and see how long each piece of the software took and ask, well, am I improving this? Am I making it faster? Am I making the distribution tighter so it's more reliable?
But in all three cases, once you go online, like you're saying, the world changes. The world is bigger and more complicated than what you're doing offline, because what you're doing offline is a model. Right. In all three cases, it's some model, some simulation or emulation of the real world. It's going to leave out a lot of the dimensions. It's going to focus on what you think is important. But there could always be more to it that you didn't even realize was important, and almost always there is.
Joe Colantonio: [00:08:38] Absolutely. And once again, it's another big trend in performance testing: they take a model that they get from production, they put it into the testing area, and still there are issues when they go live. As you said, there are a lot of different reasons, and maybe we can dive into why that is. But you also mentioned good ideas, and you've mentioned this a few times. I've read in the book, and I also saw in a live stream, where you said, look, at Amazon and these bigger companies, good ideas eventually turn out to be a very low percentage of the things they actually try that actually make the software better. Can you talk a little bit about why that is?
David Sweet: [00:09:10] Sure. This kind of started with me having an inkling, noticing this trend. I'd describe engineering to my friends as: you come up with ten good ideas, you test them all, and maybe one of them works. Right. And so I thought, well, how realistic is that? So I started taking a poll of colleagues here and there. I'd ask them, how often does a new idea that you've come up with make it through testing and be productive in the end? And every single person said one in ten, except for one guy. He was pretty cynical, generally speaking. He said one in a hundred. But everyone else said one in ten. And so I did some research on this for the book. And I found Netflix does A/B testing, they do production testing, and their report is that about one in ten of these tests lead to acceptance of a new feature or component into production. Amazon reports 50 percent, and Microsoft reported thirty-three percent, I think, of their A/B tests result in acceptance. What I attribute this to is that the world is complex and you cannot keep it all in your mind. Right? I feel there are two layers of models, really. The first one is in your head; it's where you're going through ideation. You test your ideas, you do a little simulation in your head: what if I ran it this way? What if I ran it that way? Could this cause a problem? And at some point you get to a stage where you feel the idea has passed your internal tests, your mental model, and then you're ready to go and spend time writing code. Right. You want to avoid the cost of writing code. Then offline, you have tests that you've written out that are bigger than you can hold in your head, and you want to pass those tests before you go online. But then online is just even bigger. It's even more complex.
Beyond the complexity, you've also got non-stationarity. Things change over time, so all of that data, all of that knowledge that you've stored in your mind about how things work, could be irrelevant the next day if things change. If you're a trader, the fees might change on the exchange. Your counterparties might change. If you're running a social media system, for example, that's used by a billion users, there might be trends. People respond to the news. They respond to changes in politics and so on. And that might change the way they interact with your product. And it's just not in the data.
Joe Colantonio: [00:11:27] So I guess if someone is skeptical, they might hear this and go, then why even try? I mean, it seems like not a high likelihood that this is even going to work anyway. So why even, why even try? What's the payoff? You know, so…
David Sweet: [00:11:40] That is such a great question. It gets to something so fundamental about decision-making. Right. You can ask yourself, what's the probability of something working? Or you can ask yourself, what is the expected value of all of these things? If you try 10 things and they take you some finite amount of time, nine of them don't work. So you have a one in 10 chance of something working. But that one thing, once you install it, once you incorporate it into your system, it just keeps on giving. It's the gift that keeps on giving, right? The next day it provides more value, and the day after, and so on and so on. And it'll last a long time. And while it's providing that extra value, whether it's more money or more user engagement or lower latency, whatever you get out of it, you can then go and work on the next idea. And so these ideas, these improvements, they compound over time. And even though there's a lot of detritus by the side of the road, all of the failed ideas, after working on something for six months or a year or ten years, you can craft something fantastic with incredible performance. Interestingly, it'll be complex enough that you won't know how it works. So now it's not just the environment that's complex, it's your system that's complex. And then you're into A/B testing all the more.
Joe Colantonio: [00:12:54] Right? It's almost like compound interest. You start off investing and it's like, oh, little by little, and in 20 years you look back like, oh my gosh. And I guess that's something Amazon, I heard, is really big on, this testing. Maybe they had a small percentage of wins, but they've been around for 20 years now, and the algorithm has advanced so much that it's exponentially growing, I guess, based on those tune-ups and the changes they made.
David Sweet: [00:13:16] Yes, yeah. If one out of ten of your new ideas works, and it makes a one percent improvement to your system, well, after several years all those one percent improvements could double the performance, triple the performance. It's really fantastic. It's hard, I think, for the human brain to capture that, and you have to sort of trust that it's going to work. Yeah, I mean, the best approach is, you know, put your head down and keep grinding out new ideas and testing them, and look back every once in a while and remind yourself that it works.
Joe Colantonio: [00:13:41] So is there a framework someone can use? You talk about experiments a lot in the book. So in context, how do experiments work when someone's trying to create a model or try to tune their systems?
David Sweet: [00:13:52] OK, so the most basic experiment type is an A/B test. It's also the most reliable, and I think the easiest to communicate with other people; the results are the easiest to communicate. Because what you do is you take two systems: your original system as it's running, call that A, and then some modified version, call that B. And you run them side by side, or you alternate them, or you randomly choose between them. But in the end, you have some measure of which one did better, which one gave better latency, better revenue, higher user engagement. And typically, from what I've seen, companies will build their own systems to do this; a larger company will still build its own systems. In the old days you had to, although now there are companies that offer A/B testing as commercial products. So they'll do the randomization, and they'll tell you about how much replication to do, and then they'll compute all the metrics and analysis at the end for you.
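To make this concrete, here's a minimal sketch of the kind of A/B comparison David describes, written in Python. The latency numbers, sample sizes, and the rough "t greater than 2" significance cutoff are illustrative assumptions, not from the book:

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b), allowing unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

random.seed(0)
# Simulated per-request latencies in ms; variant B is genuinely ~5 ms faster.
latency_a = [random.gauss(100, 10) for _ in range(1000)]
latency_b = [random.gauss(95, 10) for _ in range(1000)]

t = welch_t(latency_a, latency_b)  # positive when B's mean latency is lower
print(t > 2.0)  # crude "significant at roughly the 5% level" check
```

In practice you'd compute a proper p-value (e.g. with scipy), but the shape is the same: randomize exposure, collect the metric per variant, and compare means relative to the noise.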
Joe Colantonio: [00:14:46] So for A/B testing, is that the same as feature flags, where you release a small amount of your code to customers and then, based on how it performs, you roll it back? Or is it something different than that?
David Sweet: [00:14:56] That's exactly the kind of thing you'd want to do. Feature flags are a great way to implement this. That way you can affect the production system without lots of interaction; you just change a flag. And so what you do is you release it to the customers, and you can test for multiple things. You can test simply to see whether it works, but you can also measure very specifically the ways in which the customers appreciate or don't appreciate that new feature. Like, do they spend more time using the app? Do they buy more? Do they click through on an ad? Whatever your metrics are that you're looking to improve.
Joe Colantonio: [00:15:26] Awesome. And I know you cover this a little bit. We talked about how trying to measure one thing is hard, but you've mentioned a few times measuring multiple things. How do you know how to measure all these other things in context, to know what is actually moving the needle?
David Sweet: [00:15:38] There are two ways in which you wind up measuring multiple things. One is you have multiple ideas that you want to compare. Say you have a parameter that can run from one to ten, so that's ten possibilities. So maybe you need to run ten different tests, or maybe you slim it down and you test one, three, five, seven, and nine, for example, and see which one does best. The other dimension along which you could measure multiple things is you have multiple objectives to meet. So maybe you want to increase revenue, but you don't want to decrease the amount of time spent on the site. Or maybe you want to increase engagement, but maybe there are indicators of a negative kind of engagement: people bullying or otherwise insulting people, or passing around misinformation or other kinds of inappropriate information on a website. That could lead to more engagement, but it's not the kind of engagement that you want for your product and for your users. Right. So typically, when you run an A/B test, you're going to have maybe one most important metric, but you're going to want to look at all of your metrics all the time. In the idealized A/B test, you have one metric, you're going to run a t-test, you've picked a threshold beforehand for this t value, and you say, well, if it's statistically significant to such and such a level, then I'll be willing to accept it, and then you do, and you move on, and it's kind of automatic. In practice, I think each A/B test is more of a discussion, right? A discussion about multiple metrics: have we pulled back on some? Are we trading off improvements in one direction for regressions in another direction? And is that a good trade-off? Right. I know in quantitative trading, latency is an important thing to look at.
And you might say, well, we've lowered our median latency, but our tails have gotten larger. Can we accept that? Maybe that's not a good trade-off. Or maybe we brought the tails in, but the median has gone up a little bit. For some situations that would be terrible; for some situations that would be a good trade-off. It really depends, and you really need to look at the metrics. And typically you're also going to discuss them with multiple people. The more value that's on the line, the more stakeholders, so to speak, you want to have in a decision like that.
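A toy illustration of that median-versus-tail trade-off: the two latency distributions below are invented for the example, with B better at the median but worse in the tail, exactly the case where a single summary statistic would mislead you.

```python
import random
import statistics

def quantile(data, q):
    """Simple nearest-rank empirical quantile."""
    s = sorted(data)
    return s[min(len(s) - 1, int(q * len(s)))]

random.seed(3)
# Hypothetical latencies in ms: A is steady; B is usually faster
# but has a 3% chance of a very slow response.
a = [random.gauss(100, 5) for _ in range(10_000)]
b = [random.gauss(95, 5) if random.random() < 0.97 else random.gauss(200, 20)
     for _ in range(10_000)]

print(statistics.median(b) < statistics.median(a))  # B wins on the median...
print(quantile(b, 0.99) > quantile(a, 0.99))        # ...but loses badly at p99
```

Whether that trade-off is acceptable depends on the system, which is exactly why David describes each A/B test as a discussion rather than an automatic decision.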
Joe Colantonio: [00:17:43] Are there any best practices for A/B testing? I think you spoke about bias, how to create an unbiased design. I guess if you start off with a biased design to begin with, you're wasting a lot of time. So any best practices on how to avoid a biased design?
David Sweet: [00:17:56] There's an ideal situation where you know everything about your system and you can identify the biases and make sure that they're not represented. That's almost never possible, and it's very subjective, so the solution is randomization. Right. So the way to get rid of bias in your system is, let's say you have 100 users and you've got some new idea. What you could do is alphabetize the users, expose the first half to your current system, and expose the second half to your new idea. But how do you know what's being represented by being in the first or second half of the alphabet? Another easy way to roll something out is to expose your current system to one geography. Say the West Coast gets A and the East Coast gets B. But they might have different preferences, east and west, and this makes it complicated to understand whether any changes would apply globally, whether they're due to your new change or due to differences in the behaviors of the people on the East Coast and West Coast, and so on. So randomization just says flip a coin. Every time you're going to do an exposure, every time a user comes in and interacts with your application, expose them randomly to either A or B. And if that introduces too much noise, maybe you want to assign users permanently to A or permanently to B, but you'll do it randomly, so that you're not going to be fooled by their demographics or by specific users' persistent behaviors. You just break all those correlations with the randomization.
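One common way to get the persistent-but-random assignment David describes is to hash the user ID together with an experiment-specific salt. The salt name and the 50/50 split below are hypothetical, not from the book:

```python
import hashlib

def assign(user_id: str, salt: str = "homefeed_exp_1") -> str:
    """Deterministically but pseudo-randomly bucket a user into A or B.

    Hashing with an experiment-specific salt breaks any correlation with
    alphabet, geography, or signup date, while keeping each user's
    assignment stable across sessions.
    """
    digest = hashlib.sha256((salt + ":" + user_id).encode()).digest()
    return "A" if digest[0] < 128 else "B"

buckets = [assign(f"user{i}") for i in range(10_000)]
share_a = buckets.count("A") / len(buckets)
print(abs(share_a - 0.5) < 0.03)          # roughly a 50/50 split
print(assign("user7") == assign("user7"))  # stable across calls
```

Changing the salt for each experiment re-shuffles the buckets, so users who landed in B last time aren't systematically in B again.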
Joe Colantonio: [00:19:21] Now, this is all in chapter two, where you go into detail. You also talk about false negatives and false positives. Could you talk a little bit about that before we go on to chapter three?
David Sweet: [00:19:32] Sure. So a false positive is when you say something was good, but it actually isn't. A false negative is when you say something looks bad, but it actually isn't. Right. These are the two mistakes you can make. The one mistake is you run a test and you say, oh, B is better, I'm going to accept it, but actually you were wrong, just because of the noise. And the other is you say B wasn't better, but you were wrong. So in the first case, where you accept something new into the system that shouldn't be there, you're damaging the system. Right. And just because you didn't measure the damage in your test, which runs over a short period of time, doesn't mean it isn't there; over a long period of time the damage will show up. If you ran the test for long enough, eventually you would see that B was worse. But if you put it in production, it's like you're running a test for an infinite amount of time, for the rest of the lifetime of your system, so the damage will show up for sure. On the flip side, if you had a good idea, B, but you threw it out, you're not damaging the system at all, but you're missing an opportunity to improve it. And so, generally speaking, the false positives, where you're actively damaging the system, are considered worse, and stricter limits are put on your willingness to accept false positives than false negatives. Right. If you didn't make an extra dollar tomorrow, that wouldn't be as bad as if you made a dollar less tomorrow. So a higher weight is put on protecting you from false positives. Now, the question you might want to ask is, why not just say I'm not willing to accept any false positives or false negatives? And the problem is, of course, that it's probabilistic.
The best you can do is settle on, let's say, a five percent limit at most on false positives and, for example, twenty percent on false negatives. Those are typical limits.
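Those 5% and 20% limits translate directly into how much replication an experiment needs. A standard rule-of-thumb sample-size calculation, sketched here with made-up latency numbers (1.96 and 0.84 are the standard normal quantiles for a 5% two-sided false-positive rate and 80% power, i.e. a 20% false-negative rate):

```python
def samples_per_group(sigma, delta, z_alpha=1.96, z_beta=0.84):
    """Replications per variant so a true effect of size delta is detected
    about 80% of the time while keeping the false-positive rate near 5%.
    Assumes a roughly normal metric with per-observation noise sigma.
    """
    return round(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical numbers: 10 ms of latency noise, hoping to detect a 1 ms win.
print(samples_per_group(sigma=10.0, delta=1.0))  # -> 1568
```

Note how the required sample size grows with the square of the noise-to-effect ratio: detecting an effect half as large takes four times as many observations, which is exactly why short, efficient experiments matter.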
Joe Colantonio: [00:21:13] Very cool. So I guess once you have A/B testing in place, what's the next step? I think chapters three and four kind of dive into that: how to handle multiple changes to the system while maximizing a metric. So is that running multiple A/B tests at the same time, and how do you handle it?
David Sweet: [00:21:30] Yeah, well, the way I develop the book is that A/B testing is the foundation for everything, and that chapter teaches about randomization and replication, meaning running the experiment over and over again until you get nice tight error bars, so you get these probabilistic bounds on false positives and false negatives. Then everything else expands upon that and tries to make the whole process more efficient. Because as an experimenter, you want to run the experiment for as short a time as possible while still getting the information you need to make your decision, right? Testing is a cost, because in any A/B test you could be exposing your system to something that's worse. So the overall performance of the system will, on average, decrease, especially taking into account what we were saying before, that most of your ideas are probably not going to be good for the system. So your tests, really, like nine times out of ten, are going to cause harm to the system. So there's a cost, and there had better be a long-term payoff: that one idea that works pays off over the long term. But at the same time, you want to keep the testing time as short as possible, and that does two things. One is there's less harm to your system as you're testing the things that turn out not to work. But the other is you get a sequence of tests, and you talked about compounding before: the shorter the cycle of that sequence, the more cycles you can do per unit time, per month or per quarter, and the faster you make that compounding work for you. So, yeah, experimentation is a cost. We pay it because we want this compounding of improvements. So all of the other methods in the book look at ways to make experimentation faster and more efficient. Multi-armed bandits look at ways to say, well, while we're testing, maybe we can pay attention to the metrics we're trying to improve.
And if we see that B is starting to look better than A, maybe we'll shift a few of the users over to B, right? We'll be a little more dynamic, and we'll adapt to the data as it comes in. With A/B testing, we're just going to wait until the very end to make any kind of decision. The multi-armed bandits will kind of make partial decisions along the way. Think of it this way: an A/B test is kind of black and white, either A or B; a multi-armed bandit sees all the gray areas in between and is willing to slide users over slowly to B if it's doing well. And the contextual bandit builds upon that even further. They're really interesting: they're willing to look at multiple options, but they look at the decision of whether to use different versions of the system and say, what if we left both versions in the system, or multiple versions, and we could choose dynamically based on characteristics of the user? Whereas before we'd do randomization because we want to ignore the demographics of the user, contextual bandits take the opposite approach. They say, let me know about the user, let me know all the features that represent the user, and I'll choose whether it's going to be A or B, and you might have other versions, other options, C and D. And I'll choose those based on a model that I built of the user, like a regression that says, based on these user features, what's the best option to choose, A, B, C, or D? And I'll do that dynamically, on the fly, every time I see a new user. And with these systems you can tune hundreds, thousands, even millions of parameters in these models based on your logged data. It's a very special situation, though, where the metric has to be a short-term metric, something called the reward, and you get the reward immediately after you make the decision.
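A minimal sketch of the adaptive behavior David describes, using epsilon-greedy, one of the simplest multi-armed bandit strategies. The click-through rates and the 10% exploration rate here are assumptions for the example, not numbers from the book:

```python
import random

def epsilon_greedy(reward_fn, arms, steps=10_000, eps=0.1, seed=1):
    """Epsilon-greedy bandit: mostly exploit the best-looking arm, but
    explore a random arm eps of the time, so traffic shifts toward the
    better variant as evidence accumulates (unlike a fixed A/B split).
    """
    rng = random.Random(seed)
    counts = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    for _ in range(steps):
        if rng.random() < eps or 0 in counts.values():
            arm = rng.choice(arms)  # explore (or seed untried arms)
        else:
            arm = max(arms, key=lambda a: totals[a] / counts[a])  # exploit
        totals[arm] += reward_fn(arm, rng)
        counts[arm] += 1
    return counts, totals

# Hypothetical click-through rates: B is genuinely better than A.
ctr = {"A": 0.05, "B": 0.10}
counts, totals = epsilon_greedy(
    lambda arm, rng: 1.0 if rng.random() < ctr[arm] else 0.0, ["A", "B"]
)
print(counts["B"] > counts["A"])  # most traffic ended up on the better arm
```

A fixed A/B test would have sent half the traffic to the worse arm for the whole experiment; the bandit pays that cost only while it's still uncertain.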
So, for example, you might show an image to a user and see whether they click on it. Right. You know right away whether they clicked on it; that's the whole reward. Something you couldn't do quite as easily with a contextual bandit is, let's say, you show the user an ad and you want to know if sometime in the next month they purchase the product or not. That's a little bit harder. It gets even harder with aggregate metrics, like: if I change the way I show ads, do users in aggregate spend more or less time on the site? That's not the kind of thing that's great for a contextual bandit. So what you see is that these methods can become more powerful: a contextual bandit can tune many, many parameters and do a really good job of it, making lots of decisions precisely based on the user, but it becomes more specialized. Right. I also talk about something called response surface modeling, which is a little bit older. You can't do as many parameters, but it's really good for continuous parameters. Like I mentioned before, let's say you had some threshold that had to take a value between one and ten. Say it was continuous, so it could be one and a half, could be three point seven, or whatever. A response surface model works great with that kind of parameter. It'll build a model, and it'll allow you to find the optimal parameter even if you haven't measured it. Maybe you measure one, three, five, and seven, and it can tell you four and a half is the best answer, without you having to go measure four and a half or to hypothesize that exact number beforehand. So again, it specializes, but it becomes more powerful than A/B testing. You get these answers in a shorter period of time, but you specialize.
So this is sort of the theme of the book: each of these new methods builds upon the others, specializes a little bit, and gives you some efficiency, so you spend less time running experiments.
Joe Colantonio: [00:26:34] So I just want to talk a little bit about your time at Instagram. How much of this did you actually implement at Instagram? Was there a process already in place, or was it new, something where you had to get the team more involved? If someone's listening, how do companies like Instagram succeed with this, and how can my company implement it and succeed with it?
David Sweet: [00:26:53] The way it works at Instagram, which is actually part of Facebook, is that Facebook has a team, and it's been around for a long time, dedicated to building experimentation tools: A/B testing and Bayesian optimization tools. They've even released their internal software as open source, something called Ax. Ax combines the ideas of A/B testing, multi-armed bandits, and so on into Bayesian optimization. At its core it's a Bayesian optimizer, but you can use it to run an A/B test, and you can use it to run a multi-armed bandit. It's a powerful piece of software.
Joe Colantonio: [00:27:33] So I guess I would think that most companies, if they aren't doing this, will have to do this in order to be competitive, as machine learning is only going to make it easier to do these types of experiments and optimizations, I would assume?
David Sweet: [00:27:47] Yeah, I think that's true for all the large tech companies you could think of: Google, Apple, Twitter, LinkedIn. Most of them have published papers on the types of systems they use. Google produced something called Vizier, which is their version of a Bayesian optimizer. The paper they published was cool because they said they had used the algorithm to tune a recipe for chocolate chip cookies by serving the cookies in the cafeteria at Google HQ. Or maybe it was in New York, but they served the cookies, got feedback from the eaters of the cookies, and then were able to run Bayesian optimization and make better and better cookies over time.
Joe Colantonio: [00:28:29] So that's an experiment I definitely could participate in, for sure. Speaking of participating, if someone's listening to this episode and maybe they're not really strong in mathematics or statistics, what skills do they need in order to get involved in A/B testing in general?
David Sweet: [00:28:45] Yeah, I think this is something an engineer can participate in, and your level of mathematics will maybe determine at what level you participate. Certainly if you're at a big company, or even a smaller company that has these tools, typically they'll have been built by specialists. You can be a user of these tools without deeply understanding the mathematics behind them. That being said, what I tried to do with the book was write at a level for non-specialists, for users of these tools, so they understand what's going on well enough to be better users of them. Sometimes it's fine to just know superficially what's going on. But as an engineer, one, you're usually curious, and two, you're usually going to do a better job with every tool you have if you understand how it works under the hood, at least to some extent. So I wrote the book for someone who knows Python and has basically high school mathematics, which is most of us. Maybe the mathematics is rusty, but it's not gone. Right? So.
Joe Colantonio: [00:29:48] Absolutely. Okay, David, before we go, is there one piece of actionable advice you can give to someone to help them with their A/B testing tune-up experiments? And what's the best way to find you, contact you, and get our hands on your book?
David Sweet: [00:30:00] Oh, I'd say my advice is always: measure early, measure often. Only the real world knows what really works. And if you want to see the book, go to Manning's site. They've got it in their early access program, so there are five chapters online that you can read now, and the other three chapters will be out soon.
Joe Colantonio: [00:30:19] Thanks again for your performance testing awesomeness. If you missed anything of value we covered in this episode, head on over to testguildcom.kinsta.cloud/p68, and while there, make sure to click on the try them both today link under the exclusive sponsor's section to learn more about SmartBear's 2 awesome performance test tool solutions LoadNinja and LoadUI Pro. And if the show has helped you in any way, why not rate and review it on iTunes. Reviews really do matter in the rankings of the show and I read each and every one of them. So that's it for this episode of the Test Guild Performance & Site Reliability podcast. I'm Joe. My mission is to help you succeed with creating end-to-end, full-stack performance testing awesomeness. As always, test everything and keep the good. Cheers!
Outro: [00:31:04] Thanks for listening to the Test Guild Performance and Site Reliability podcast, head on over to TestGuild.com for full show notes, amazing blog articles, and online testing conferences. Don't forget to subscribe to the Guild to continue your testing journey.
Rate and Review TestGuild Performance Podcast
Thanks again for listening to the show. If it has helped you in any way, shape or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.