About This Episode:
Have you tried Selenium 4 yet? In this episode, the creator of Selenium WebDriver, Simon Stewart, talks all about the new features you need to know about in the latest Selenium release. Discover what WebDriver BiDi is, what changes to expect in Selenium 4, and why Selenium 4 will be the point where Simon steps away from the project :(
The Test Guild Automation Podcast is sponsored by the fantastic folks at Sauce Labs. Try it for free today!
About Simon Stewart
Simon lives in London with his family and his dog. He spends his working hours either being paid to hack on code, or working on Selenium. He is also the co-editor of the W3C WebDriver and WebDriver BiDi specs.
Full Transcript Simon Stewart
Joe [00:01:32] Hey, Simon! Welcome to the Guild.
Simon [00:01:36] Hi there. Thank you very much for inviting me. It's good to see you again.
Joe [00:01:39] Awesome to have you on the show. I don't think we've spoken in a few years now. I saw you at a conference a few years ago, I think it was SauceCon, and we talked a little. When we talked originally, we were really talking about WebDriver, the W3C standard that had just come out, and it seems like it's really evolving. I just saw a new thing called the WebDriver BiDi spec, and it's quite new. What is that?
Simon [00:02:04] Sure. It's WebDriver BiDi, so short for WebDriver bi-directional, because we're not very good at naming things on this project, right? So there's a good, clear, simple way of naming it, and that's what we're going with. It has evolved out of some of the work that's been going on as browser automation has moved forward. One of the things that is pretty clear is that the original WebDriver spec is designed to allow you to have the test running very remotely from the browser. So services like SauceLabs and BrowserStack are easy to do, but you can also set up an internal grid at work. And it turns out that the new generation of browser testing tools out there, things like Puppeteer, Cypress, and Playwright, assume that you're running locally, and what they do is hook into the browser debugging protocols. There are many differences between the browser debugging protocols and the WebDriver spec, but the major one is that the browser can send events to your test, right? So when people say, oh, my test using Puppeteer is so much faster, what's actually happening is an event is coming from the browser, and they're saving 100 or 200 milliseconds. Instead of polling for an update, they get the update coming straight to them. Now, the problem with depending on a browser debugging protocol is that these things are designed for debugging browsers; that's what the name is all about, right? And so they change with every single release of the browser, because there's no requirement for them to be a stable API that people write code against. And clearly, when you're doing browser automation, you want to support the latest Chrome, the next version of Chrome, the latest Firefox, the next version of Firefox, the latest Edge, da da da, blah, blah, blah.
You want to be able to take the same APIs and apply them consistently between browsers. And so what we're doing with WebDriver BiDi is taking the lessons learned from that event-driven model, where the browser can send events and you can send commands to the browser in a bi-directional way, and we're implementing that and standardizing it. The base we're going for is something that looks and feels a lot like the Chrome DevTools Protocol. In the same way that, when we did the W3C WebDriver spec, we took the existing JSON Wire Protocol from the Selenium project, tweaked it, and made it nice, we're taking the same kind of approach with the existing CDP, which is derived largely from WebKit's original implementation, all the way back in the mists of time when Chrome used to be based on WebKit. Then they forked it to become Blink, and the Safari folks stayed on WebKit. And so the underlying debugging protocol is very similar between most of the major browsers, and Mozilla has been putting something similar into Firefox. So there's a good basis there that has some commonality. What we're doing with the specification is standardizing the modules that people will need and the commands they need, and really clearly defining the behavior of things. Because right now it's completely up in the air: whichever version of the browser you get, you may or may not get the same behavior. And clearly, that's unacceptable when you're writing tests.
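To make the bi-directional idea concrete, here is a minimal Python sketch of how a client might route traffic on a BiDi-style connection: replies to commands are matched by `id`, while events the browser pushes go straight to subscribed callbacks instead of being polled for. The message shapes (`id`, `method`, `params`, and a `type` of `success`, `error`, or `event`) are assumed to follow the W3C WebDriver BiDi draft, `session.subscribe` and `log.entryAdded` are used only as sample names, and the `BiDiDispatcher` class is hypothetical. No WebSocket is opened, so treat this as a sketch under those assumptions, not a client for any real browser.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List


class BiDiDispatcher:
    """Hypothetical router for WebDriver BiDi-style JSON messages.

    Assumed shapes (based on the W3C WebDriver BiDi draft):
      command: {"id": 1, "method": "session.subscribe", "params": {...}}
      reply:   {"type": "success" | "error", "id": 1, ...}
      event:   {"type": "event", "method": "log.entryAdded", "params": {...}}
    """

    def __init__(self) -> None:
        self._next_id = 1
        self._pending: Dict[int, str] = {}   # command id -> method awaiting a reply
        self._replies: Dict[int, dict] = {}  # command id -> full reply message
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def command(self, method: str, params: dict) -> str:
        """Serialize a command; a real client would send this over a WebSocket."""
        cmd_id = self._next_id
        self._next_id += 1
        self._pending[cmd_id] = method
        return json.dumps({"id": cmd_id, "method": method, "params": params})

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a callback for an event name the browser may push."""
        self._handlers[event].append(handler)

    def receive(self, raw: str) -> None:
        """Handle one incoming message from the browser."""
        msg = json.loads(raw)
        if msg.get("type") == "event":
            # Pushed by the browser as soon as it happens: no polling loop.
            for handler in self._handlers.get(msg["method"], []):
                handler(msg.get("params", {}))
        elif msg.get("id") in self._pending:
            # Reply to an earlier command, matched by its id.
            self._pending.pop(msg["id"])
            self._replies[msg["id"]] = msg

    def reply(self, cmd_id: int) -> Any:
        """Return the stored reply for a command id, or None if none arrived."""
        return self._replies.get(cmd_id)


if __name__ == "__main__":
    bidi = BiDiDispatcher()
    bidi.on("log.entryAdded", lambda p: print("console:", p.get("text")))
    print(bidi.command("session.subscribe", {"events": ["log.entryAdded"]}))
    bidi.receive('{"type": "success", "id": 1, "result": {}}')
    bidi.receive('{"type": "event", "method": "log.entryAdded", "params": {"text": "hi"}}')
```

The point of the design is that the client never asks "has anything happened yet?": every incoming message is either a reply to a known `id` or an event for registered handlers, which is the effect Simon describes when tools save a poll interval per check.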
Joe [00:05:56] Absolutely. When we spoke a few years back, we were talking about Cypress and how it could help people because it had the Chrome debugging protocol built in as well. Was this spec part of your thinking of, okay, this is probably going to become a thing, and if I want to incorporate it into Selenium, we need a better way to do it?
Simon [00:06:17] Yeah, I mean, we always try to be focused on the needs of testers as we're writing Selenium, right? The reason we moved from the original Selenium API, the RC API, to WebDriver is that the Selenium RC API was really nice in that you could type selenium, hit the autocomplete button in your IDE, and then you'd have approximately, and I'm slightly exaggerating, 9,000 different options. And if you scroll through those and read them, it's a little bit overwhelming. So what we did with WebDriver is move to a more object-oriented model where you hit Ctrl+Space and, instead of having thousands of things, you had, I think we got it down to five or nine or something. It was a really small number when we launched. And then you got a WebElement out of that, and that gave you a set of options of things you could do with web elements. So being user-focused like that has always been something we wanted to do. And one of the things we hear from people is, ah, you know, it's really hard to do some kinds of testing using Selenium, and I can do that with Puppeteer, I can do that with Cypress, because the browser is sending the events and letting me know things. Well, you know, it would be nice if we could fit everything into the current model we have, but we can't. And so you go to where the people are. Having support for a bi-directional communication mechanism is clearly something that people want and need, and so it's something that we should provide.
Joe [00:07:50] Nice. So…
Simon [00:07:51] Does that make sense?
Joe [00:07:53] Yeah, it's almost like you're future-proofing Selenium as well. You're not adding cruft to WebDriver; you're adding another way to get to it, almost. So they're not… Does that make sense?
Simon [00:08:06] Yeah, the two specifications were written so that they play nicely together.
Joe [00:08:11] Right.
Simon [00:08:12] So you can find an element using the current W3C WebDriver spec. And you can pass the element into WebDriver BiDi. And similarly, you could take a reference from WebDriver BiDi and you could use that in standard W3C WebDriver.
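A minimal sketch of what passing an element between the two protocols could look like on the wire, assuming, as the current WebDriver BiDi draft does, that a classic element's id and the BiDi `sharedId` are the same string. `element-6066-11e4-a52e-4f735466cecf` is the web element identifier defined by the W3C WebDriver spec; the two helper functions are hypothetical and only reshape JSON, they do not talk to a browser.

```python
# Web element identifier defined by the W3C WebDriver (classic) spec.
WEB_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def classic_to_bidi(ref: dict) -> dict:
    """Turn a classic element reference into a BiDi shared reference.

    Assumption: the classic element id and the BiDi sharedId are the same string.
    """
    if WEB_ELEMENT_KEY not in ref:
        raise ValueError("not a W3C WebDriver element reference")
    return {"sharedId": ref[WEB_ELEMENT_KEY]}


def bidi_to_classic(ref: dict) -> dict:
    """Turn a BiDi shared reference back into a classic element reference."""
    if "sharedId" not in ref:
        raise ValueError("not a BiDi shared reference")
    return {WEB_ELEMENT_KEY: ref["sharedId"]}
```

Because both sides refer to the same underlying id, the round trip is lossless, which is what lets an element found with classic WebDriver be handed to a BiDi command and back.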
Joe [00:08:30] So I keep talking about the past, but now I want to go to the future. So W3C once again a few years ago was a big deal. It was really hard. You guys really made it happen. That's awesome. Is it everything you thought it would be?
Simon [00:08:43] In my mind, I always thought of the W3C spec as a sort of low-water mark. Once that spec was in place, at no point could browser automation regress behind that point. And I think that's really important because there are so many different browsers and so many different contexts. Admittedly, there are fewer browser engines than there used to be. Blink appears to be doing very well; the Chromium-based browsers are super popular. But WebKit is still there. Gecko is still there. Mozilla is still fighting the good fight. And there are smaller engines as well. All these people have different browsers, and they behave in slightly different ways because they're all forks from different versions at different points in time. But you want to ensure consistent behavior. And then there are new form factors appearing, like browsers on televisions and browsers on mobile phones. Browsers in cars, I would imagine, are coming at some point. That's going to happen. I can't imagine anything more lethal than sitting in a self-driving car, browsing the web, and not paying attention to the road ahead. So, yeah, I always thought of it as a sort of low-water mark, but clearly it's not enough to be happy with the lowest possible standard you could get away with. We do want to enhance capabilities and make things nicer.
Joe [00:10:08] Nice. So talking about making things nicer, you keep improving on Selenium. Is Selenium 4 officially released, or is it still in alpha?
Simon [00:10:18] We are just on the cusp of what should be the last alpha, and we have been for a while. Progress has been a bit slower than anyone would like, if you ask. The primary reason is that most of the work is done by volunteers, and the volunteers are busy with their own lives, sadly or very fortunately, depending on how you view it. When we started the project, I think we were all quite junior engineers, and now we're 14, 15 years further through our careers. We have a few more responsibilities, both at work and in real life, and so finding the time to contribute is harder than it used to be.
Joe [00:11:02] Absolutely. And this is one thing, once again, that we talked about before: leadership, and what happens when there are so few core contributors and someone leaves. What's the contingency plan? I think in one of the notes for this episode you mentioned leadership going forward as one of the things you wanted to highlight. What did you mean by leadership going forward? Is there something in the works for a contingency plan if one of the major contributors no longer has time to contribute?
Simon [00:11:33] Yes. I've stood on stage at the Selenium Conference and said that Selenium 4 will be the point where I step away from the project and follow the trail that Paul Hammant and Jason Huggins trod before me, where it's like, you know, it's been great, but now somebody else can steer the ship. And so over the past year, year and a half, we've worked really hard on the project to take the things that used to be implicit, where I could just make the choice, or a handful of people who were super well plugged into the project could make a choice, and try to open them up, make them clearer, and make them more accessible. If you go to the Selenium website, selenium.dev, you'll see that there is a project governance page. We've split the project up into various pieces to make it a bit easier to figure out: how can I start contributing? How do I contribute more? And what does that mean as we go through? As well as contributors, there's a Technical Leadership Committee, the TLC, which is made up of the language binding authors and key technical people on the project. And then there's the Project Leadership Committee, which has very little to do with actual leadership and everything to do with talking to the Software Freedom Conservancy, the SFC, which is the group that we're under. They handle complicated things like legal matters and paying bills when we've set up conferences and things like that. So the SFC allows us to focus on delivering open source. All these things that used to be implicit are becoming more explicit. We've written them down, and now we can have proper discussions about them. And if you take a look, you'll see that we've got new committers coming through, which is amazing, and we've had existing committers step up and accept more responsibility as the project moves forward, which I am super pleased to see.
Joe [00:13:36] That's awesome. I think one of the benefits for Selenium when it became a standard was that the browser makers would step up and contribute more, because now there's a standard. So does that take the load off the core committers, because the browsers are then responsible for their implementation of the spec?
Simon [00:13:57] Yeah. Having the browser vendors implement the specification probably saved the project and made everyone's life a lot easier. I say it probably saved the project because we were adding more and more functionality, and it was getting increasingly difficult to do that without making modifications to the browsers themselves. Jim Evans did amazing work getting Internet Explorer to do what it's meant to do, and we had so much code to try and get Firefox to do what it had to do. By having the browser vendors take on that work and do the implementation of the W3C spec, we could take a step back and start doing some of the more ancillary things that are absolutely vital to the user experience, things like the language bindings. Sneak peek for what's coming up later: we're reworking Selenium Grid for 4. Not being responsible for doing a browser implementation gave us the capacity to do things like re-architect Selenium Grid and various bits and pieces like that. So it allows us to focus more on the ecosystem rather than the browser automation itself.
Joe [00:15:10] Awesome. That's a good point. You mentioned Selenium Grid. When people think of Selenium, I think they think of three tools: Selenium IDE, Selenium WebDriver, and Selenium Grid. So when we talk about Selenium 4, does that apply to all three of those? I don't know what you call them, tools or areas.
Simon [00:15:27] Yes, absolutely. Selenium 4 is a nice point for us to reintroduce people to various bits and pieces. There's the new IDE, which Applitools have poured a huge amount of effort into to make it web component-based, and coming down the track there's an Electron version as well, which is going to be really nice. So there's a new Selenium IDE. There are the new WebDriver bindings, where we're bumping the version number because we're taking out the deprecated methods, the things that were less ideal, and we're cleaning up some of the internals of the bindings and APIs, with a focus on maintaining backward compatibility. So you should be able to drop Selenium 4 into your Selenium 3 tests and it should just work, with the caveat that if a method was deprecated, it's probably gone. We try to give people enough of a heads-up that, hey, you should try and do something different here, and it's been a year or two since we did a major release, so people have had time to look at those deprecation warnings and do something about them. So there's that. And then there is the new Selenium Grid, which is being almost completely rebuilt from the ground up to be more suitable for use in the modern world.
Joe [00:16:47] Nice. So you're saying when Selenium 4 is officially released (it's in alpha now), people will get a newer Selenium Grid. What's new about it? Is it all behind the scenes, or will people notice that it's different if they're just a regular user?
Simon [00:17:04] Yeah, if you're just running the Selenium standalone server, the Selenium standalone jar, you'll have the same sort of experience, which is kind of nice. But there are things that were bolted on by third-party projects that really should be part of the core Selenium project. With Selenium Grid 4, for example, you can fire it up and have it use a Docker container out of the box to run your browser instances, which was a feature that Selenoid had, along with some other pieces. Docker appears to be a staple of the modern web development world, so we should lean into that a little bit. You can still run Selenium Grid as a hub and node, that common model that people have. There are some changes to config files and things like that, but the underlying functionality is the same. But then we've taken it a step further: you can distribute this into a Kubernetes cluster or AWS or GCP or Azure, and you can scale a grid to absolutely gargantuan sizes if that's what you want to do, and if you want to pay the price of running a gargantuan grid on public infrastructure. It's designed to scale horizontally where it can. So we've taken out some of the pieces where it's like, this isn't very efficient, this isn't making the best use of the resources, and now things are designed to scale a bit more easily, which is actually a lesson we learned from the Selenoid people, Aerokube and company. They were talking, and it's like, oh, yeah, they're making some really good points, and as the Selenium project, we should probably be doing better. And so that's been a lot of fun.
Joe [00:19:00] Awesome. So we talked about taking some stuff away in Selenium 4 through deprecation, but I know there are a whole lot of new things as well, like new features. Is there anything baked into Selenium 4 that you think people are going to be really excited about, where they're going to be like, wow, I didn't know we'd be able to do that now?
Simon [00:19:18] If I were a user of Selenium, the thing I would be most looking forward to is what we call relative locators. I've been to the Selenium Conference, and I think you've probably been to similar conferences, where people give whole talks on here's how you find an element on a web page. And it's a very complicated CSS selector, a very complicated XPath, or stacking up loads of different locators and applying some heuristic. What you really want to be able to do is, oh, click on the button that's above this search box, or find the image that has the logo, and below that there's a login link, right? When people talk about things, they use this very human language, right? Where is it? Ah, look, it's above this, it's below that, it's kind of near this thing, right? And relative locators are an attempt to encapsulate that in code. So what you can say is find elements, and you give it what kind of element you're looking for: find me an input element above this, below that, or to the right of this. And it will apply all these filters and then go, okay, this is the element you want. The thing that we still need to do is order the returned elements by proximity, which we know we need to do; that's on the list. But it's super nice, because now you can go, okay, fine, I can use a more human way of describing things. So that's what I would probably look forward to most as a user.
Joe [00:20:54] Very cool.
Simon [00:20:56] I would like to say that we invented it ourselves, but once again we have taken a leaf out of somebody else's book. In this case, there's a project that started years and years ago called Sahi, by Narayan Raman, and they had relative locators for the longest time. Then you saw new frameworks like Taiko from ThoughtWorks coming through, and that has them as well. So, you know what? This is an idea whose time has come, and it's probably a good idea for us to put it into Selenium as well, because it's such a useful feature and it's so easy to pick up.
Joe [00:21:33] Absolutely. I'm old enough that when I was using QTP, they had this type of option where you could use a relative locator, so it's really cool to see it come into Selenium as open source. So how much effort do people need to put in? Do they need to understand the XPath and CSS of everything they want to be near, or is it just a matter of saying near the name of this button or the name of this link?
Simon [00:22:02] Actually, I don't know whether we've implemented near yet, but we will have the building blocks to do that once we have proximity sorted out. You need to understand the box model a little bit. When we say something is to the left of something, what we mean is that the element you're searching for is to the left of the leftmost edge of the other element. So say that edge is at, I don't know, position x = 50. Anything under 50 would be considered left. If you had something that overlapped a little bit, that isn't strictly to the left of it. It's as if you just drew lines across the page, and I think people might find that a bit of a rough edge. So as we go through the betas, I think we're going to find people coming up with perfectly reasonable use cases, and we'll be tweaking things to be a bit more appropriate. Hopefully most of the time it'll do exactly what you want, and when it doesn't, it'll be because it's doing exactly what you told it to, which is what computers are really good at but what humans really hate.
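The strict, box-model reading of "to the left of" described above, plus the proximity ordering that is still on the list, can be sketched in plain Python. This does not call Selenium (in the Selenium 4 bindings the entry points are `RelativeLocator.with(...)` in Java and `locate_with(...)` in Python); the `Rect` type, the predicate names, and centre-to-centre distance as the proximity measure are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical element box in page coordinates (y grows downward)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


# Strict "draw a line across the page" semantics: any overlap disqualifies.
def to_left_of(c: Rect, anchor: Rect) -> bool:
    return c.right <= anchor.x


def to_right_of(c: Rect, anchor: Rect) -> bool:
    return c.x >= anchor.right


def above(c: Rect, anchor: Rect) -> bool:
    return c.bottom <= anchor.y


def below(c: Rect, anchor: Rect) -> bool:
    return c.y >= anchor.bottom


Filter = Callable[[Rect, Rect], bool]


def find_relative(candidates: Sequence[Rect], anchor: Rect, *filters: Filter) -> List[Rect]:
    """Keep candidates passing every filter, nearest (centre to centre) first."""
    matches = [c for c in candidates if c != anchor and all(f(c, anchor) for f in filters)]
    return sorted(matches, key=lambda c: math.dist(c.center, anchor.center))


if __name__ == "__main__":
    search = Rect("search box", 50, 100, 200, 30)
    page = [
        search,
        Rect("label", 0, 100, 40, 30),          # ends at x=40: strictly left of x=50
        Rect("icon", 45, 100, 20, 30),          # crosses x=50: not "left of"
        Rect("login button", 60, 20, 80, 30),   # ends at y=50: above y=100
    ]
    print([r.name for r in find_relative(page, search, to_left_of)])
    print([r.name for r in find_relative(page, search, above)])
```

Note that `icon` is excluded even though most of it sits left of the search box; that is exactly the kind of rough edge users may hit with strict geometric filters.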
Joe [00:23:15] Absolutely. The problem isn't that it does what you tell it to; it's when you don't know why something failed. I believe there's some new functionality with… where was I reading? Something with exceptions. Are you doing work on exceptions so that when something does fail, it's easier to know why?
Simon [00:23:31] Not so much exceptions, but baked into the new Grid we have integrated a framework called OpenTelemetry, which allows you to do distributed tracing. If I were a sysadmin, the things I would be looking forward to with Selenium 4 are, first, hey, look, I can now deploy this on a Kubernetes cluster, which is very exciting, and second, that when it goes wrong, you can hook into anything that consumes the tracing output, have a look at what's going on in the grid, crack it open, and try to figure out why this is happening. Tools that support OpenTelemetry, things like Honeycomb and Jaeger and some of those big projects, I think Datadog as well, can consume the OpenTelemetry data. So you can just plug this into your existing infrastructure for figuring out what the heck is going on and get some insight into what's happening in the grid, which is lovely. Just that level of transparency.
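Distributed tracing works by stamping every hop of a request with a shared trace id, so a backend such as Jaeger or Honeycomb can stitch the individual spans back together. Below is a minimal, dependency-free sketch of that idea using the W3C Trace Context `traceparent` header format, which OpenTelemetry uses by default. Whether Selenium Grid's internal hops carry exactly this header, and the router/distributor/node hop sequence in the demo, are assumptions for illustration, not a description of the Grid's code.

```python
import re
import secrets
from typing import Tuple

# version 00 - 32-hex trace id - 16-hex parent span id - 2-hex flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: fresh trace id and fresh span id."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Tuple[str, str, str]:
    """Split a traceparent header into (trace_id, span_id, flags)."""
    m = _TRACEPARENT.match(header)
    if m is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero ids are invalid")
    return trace_id, span_id, flags


def child_traceparent(parent: str) -> str:
    """Next hop: keep the trace id and flags, mint a new span id."""
    trace_id, _, flags = parse_traceparent(parent)
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    # Hypothetical hops for one new-session request through a grid.
    hop = new_traceparent()
    for component in ("router", "distributor", "node"):
        print(f"{component:12} {hop}")
        hop = child_traceparent(hop)
```

Every printed line shares the same trace id while the span id changes per hop; that shared id is what lets a tracing backend show one request's path through the grid.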
Joe [00:24:35] That's cool. That is really cool.
Simon [00:24:38] The other thing we have in the grid is a new console we're working on, a new front end. In order to power that, we've exposed a GraphQL endpoint. So now you can run a GraphQL query against Selenium Grid, either distributed or running on your local machine, and extract a whole bunch of useful information, limited to just the information you actually care about, which is sort of the promise of GraphQL, right?
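As a sketch of what running such a query might look like from a script, the Python below POSTs a GraphQL document to a Grid's `/graphql` endpoint using only the standard library. The default address `http://localhost:4444` and the field names (`totalSlots`, `nodeCount`, `sessionCount`, `sessionQueueSize`) are assumptions based on the Selenium Grid 4 documentation and may differ between Grid versions, so check them against your Grid's schema.

```python
import json
import urllib.request

# Assumptions: Grid listening on its default port; field names may vary by version.
GRID_URL = "http://localhost:4444"
GRID_STATUS_QUERY = "{ grid { totalSlots, nodeCount, sessionCount, sessionQueueSize } }"


def build_grid_query(base_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for a Selenium Grid."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_grid_query(base_url: str, query: str, timeout: float = 5.0) -> dict:
    """Send the query and return the decoded JSON response body."""
    with urllib.request.urlopen(build_grid_query(base_url, query), timeout=timeout) as resp:
        return json.load(resp)
```

With a Grid running, `run_grid_query(GRID_URL, GRID_STATUS_QUERY)` sends the request; because GraphQL only returns the fields you ask for, trimming the query is how you limit the response to what you actually care about.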
Joe [00:25:07] That is awesome. I'm getting excited. I haven't been doing hands-on automation for a while, but I need to put it on my machine again and get into it. So another thing you've implemented, which we talked about a little bit, is support for the Chrome debugging protocol. Can you talk a little bit about what that means for folks who maybe haven't heard of it before, or who have heard of it with other tools?
Joe [00:28:36] And I don't think people realize how awesome that is. When you work for a health care company, you can't just upgrade or downgrade: if the FDA comes in, you have to run against the exact same environment you ran the testing in at that point in time. So a feature like that would definitely help, because it's not going to break at any time and you don't have to download anything else. That is awesome. You've thought of everything in this release.
Simon [00:29:00] Yeah. For a long time, we were busy doing that specification, and we took our eye off the ball a little bit on how people were using it. Once the spec was done and we had some space, it was, okay, how will people consume this? Obviously, once I've left the project, I think there's going to be some interesting stuff coming in Selenium 5, because we've done these nice user-facing things for the raw APIs, and the next thing that people want is things like, ah, can you download the browser for me? Can you download ChromeDriver for me, or GeckoDriver, or the NSIS(??) driver, or make sure SafariDriver on my system is configured properly? There are third-party libraries already doing this stuff, and so we tend to point people to them: hey, look, if you need this functionality for Java, for example, pick up WebDriverManager, and there are similar tools. Then there are the people who are wrapping the WebDriver APIs and doing nice things with them: in the Python world there's SeleniumBase, and in the Ruby world, obviously, there's Watir, which does some really nice things to try and make the user experience nicer. I think we've done some nice APIs in 4, and that sort of additional functionality would be a really interesting direction for whoever is leading Selenium 5 to take the project. But that won't be me. That will be someone else.
Joe [00:30:33] I think some people might begin freaking out now, because when I think of Selenium, I think of two people: I think of you, and I think of Jim Evans, right off the bat. I don't know why; it just pops into my head. When I recommend a tool or a solution, it's usually because of the leadership and the strength of the community, and obviously you have the community with Selenium. But are you confident that the leadership after you leave is able to handle it? I'm not saying you're a god or anything, but people want stability, and they want to make sure that it's not going to be a project that dies, that it's still going to have that kick behind it and the leadership.
Simon [00:31:11] Yeah, I am 100 percent confident in the people that we have in the project. You know, I'm not going to do what… do you remember why the Lucky Stiff from the Ruby community? Maybe you do, maybe you don't.
Joe [00:31:29] No.
Simon [00:31:29] Okay. So, why the Lucky Stiff did some fantastic things in the Ruby community and wrote some really nice guides. Why's (Poignant) Guide to Ruby was, I think, one of the popular things, and he did libraries that people liked, but he always liked to be anonymous. Then somebody figured out who he was and said, I'm going to tell the world who you are. And so he deleted everything: all his guides, all his libraries, everything, right? Clearly, when you're on the Internet, nothing is gone forever, but it took the Ruby community by surprise, and it was a bit of an amazing thing to do. I am not going to do that, right? I've been telling people for a year and a half, you know, behind the scenes, and ever since we shipped Selenium 3, that my plan is to step away after 4. And you see people in the project taking more responsibility. People like David Burns have been doing a lot of the heavy lifting, Diego Molina as well. Manoj Kumar has been doing stuff, and we're seeing renewed engagement from companies like SauceLabs and BrowserStack, who are contributing to the project. So, yeah, I've been around for a long time, but David has been with the project forever, Diego has been with the project forever, and then we've got the renewed energy coming from the investment that the companies are making, which is fantastic to see.
Joe [00:32:59] Wow. So I can't imagine what you're going to do with all your free time if you're not messing around with Selenium, right? So…
Simon [00:33:10] I love build tools. I realize there must be something wrong with me, right? Because it's like, the first thing that I really want to do is… You know, a long time ago, I used to tell a story that I wrote WebDriver by accident. What I wanted to do was learn Ruby on Rails. And so I thought, I'll write an end-to-end test; that's the first thing I do. And the browser automation API is rubbish, and it's like, okay, well, I'm joking, obviously, but I tell you what I'll do: I will write my own browser automation API, and next week I'll be able to start learning Ruby on Rails. And clearly what's happened is I've got far enough along, and it turns out building software is awful. So before I can run my end-to-end tests, I now need to have a decent build tool. And that's what I'll focus on.
Joe [00:34:05] Awesome. Okay, Simon, before we go, is there one piece of actionable advice you can give to someone to help them with their Selenium 4 and beyond automation testing efforts? And what's the best way to find and contact you going forward?
Simon [00:34:19] The best piece of advice I can give about automation efforts is to remember the test pyramid. If you have a lot of small tests, a few integration tests, and maybe one or two Selenium end-to-end tests, you're doing it right. If you have thousands of end-to-end tests, possibly using Selenium, and five unit tests, full stop, then you're in a whole world of pain. That's been true since before I started Selenium, and it's true to this day. If you want to find me, probably the best way is @shs96c on Twitter. That is my password, by the way. That's actually the Twitter handle.