What is Selenium 4? The Latest in Automated Browser Testing

27 April 2021, 09:35 AM

By Test Guild

You've probably heard that Selenium 4 Beta is now available.

After speaking with Simon Stewart, creator of the open-source project Selenium WebDriver API, I compiled some of his favorite new features of Selenium 4 in this article.

What are the Pieces that Make up Selenium 4?

First, before getting to what's new in Selenium 4, let's go over the basics.

When most folks hear “Selenium,” they automatically think of Selenium WebDriver.

Did you know, however, that two other significant pieces make up the Selenium project? Those pieces are Selenium IDE and Selenium Grid.

Simon believes that Selenium 4 is an excellent point at which to reintroduce people to the various bits and pieces that make up the Selenium Project.


To better understand the project, let me introduce you to each piece:

What is Selenium WebDriver? Selenium WebDriver is an open-source API that allows you to programmatically interact with a browser on an operating system the way a real user would. Selenium WebDriver was created exclusively for browser automation.
What is Selenium IDE? Selenium IDE is a record and playback automation tool.
What is Selenium Grid? Selenium Grid allows you to save time by spreading your tests across multiple machines, including virtual machines.

Let's dive into each to see what's new!

Selenium 4 WebDriver Updates

The team finally removed a bunch of deprecated methods you were previously warned about.

This clean-up effort was long overdue. Much work has gone into cleaning up many of the driver's interfaces, classes, and methods.

For example:

interactions.Keyboard has been replaced with Actions and KeyInput.

You can see a list directly in the official documentation.

No worries, though, since Simon also mentioned they did focus on maintaining backward compatibility with legacy versions.

That means you should be able to add the latest version, that is Selenium 4, into your older tests, and this latest version should just work; with the caveat, of course, that if a method was deprecated, it's probably gone.

If that’s the case for your test, know that they tried to give you enough of a heads up that you should have known better! 

Seriously. It's been a year or two since they did a major release, so you’ve had time to look at those deprecation warnings and do something about them. What else do you need to know about Selenium 4? Read on.


Relative Locators For Automation Scripts

I think the most exciting new functionality of this release is relative locators for test scripts.

Think about your tests.

The very first thing you normally do is to find an element. You're probably already familiar with the locators baked into Selenium for many years, finding by XPath, link text, CSS selectors, and so on.

The problem with all of these is they require you to know an awful lot about the structure of the page, how the DOM itself is arranged.

Sometimes wouldn't it be easier to be able to use more human language?

This is where relative locators come in!

This approach allows you to tell Selenium what specific web element to interact with, based solely on its position relative to other specific web elements using a very human-readable syntax.

These concepts are not new.

They’ve been part of the test automation space for a long time in vendor tools like Quick Test Professional.

They’ve also been part of other free, open-source automation frameworks like Sahi and Taiko.

It's finally part of Selenium 4 as well!

Why is this so cool?

If you've previously attended any automation conferences, you've probably heard some talks regarding approaches to locating elements, specifically finding tricky Web elements.

These locator-strategy techniques usually involve very complicated CSS, XPath, or stacking up loads of different locators.

What if instead you could, say, click on the button that's above this search box, or find the image with the logo and below that find a login link?

When people talk about these things, they use very human language.

“Where are you going? Ah, look. It's above. It's below. It's kind of near this thing, right?”

Example of the new relative locator syntax:

// All three snippets below rely on this static import:
import static org.openqa.selenium.support.locators.RelativeLocator.withTagName;

// Find the input element above the password field.
WebElement passwordField = driver.findElement(By.id("password"));
WebElement emailAddressField = driver.findElement(withTagName("input")
                                                  .above(passwordField));

// Find the input element below the email address field.
WebElement emailAddressField = driver.findElement(By.id("email"));
WebElement passwordField = driver.findElement(withTagName("input")
                                              .below(emailAddressField));

// Find the button to the right of the cancel button.
WebElement cancelButton = driver.findElement(By.id("cancel"));
WebElement submitButton = driver.findElement(withTagName("button")
                                             .toRightOf(cancelButton));

Relative locators are an attempt to encapsulate this approach in test code. You can say “find elements” and tell it what kind of element you're looking for.

For example, find an input element “above here,” “below here,” or “to the right of this,” and it will apply all those filters and then say, “Okay, this is the element you want.”

This is a really nice advantage to have because it gives you the ability to find specified elements using a more human way of describing things.

This doesn't come without some complications, however.

This approach can trip you up if you aren’t familiar with the box model, so be sure to become familiar with it before loading up your test case with these friendly locators.
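
To see why the box model matters, here is a minimal, self-contained sketch of how "above", "below", and "to the right of" could be decided purely from element bounding boxes. It is an illustration only and not Selenium's actual implementation: the Box class, the element coordinates, and the comparison rules are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: every element is reduced to its bounding box in page
// coordinates (y grows downward, as it does in the browser).
class BoxModelSketch {

    static final class Box {
        final String name;
        final int x, y, width, height;

        Box(String name, int x, int y, int width, int height) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int bottom() { return y + height; }
        int right() { return x + width; }
    }

    // Assumed rule: "above" means the candidate's bottom edge is at or
    // above the anchor's top edge. Horizontal position is ignored.
    static boolean isAbove(Box candidate, Box anchor) {
        return candidate.bottom() <= anchor.y;
    }

    // Assumed rule: "below" means the candidate's top edge is at or
    // below the anchor's bottom edge.
    static boolean isBelow(Box candidate, Box anchor) {
        return candidate.y >= anchor.bottom();
    }

    // Assumed rule: "right of" means the candidate's left edge is at or
    // past the anchor's right edge. Vertical position is ignored.
    static boolean isRightOf(Box candidate, Box anchor) {
        return candidate.x >= anchor.right();
    }

    // Returns the names of all candidates that sit above the anchor.
    static List<String> above(List<Box> candidates, Box anchor) {
        List<String> names = new ArrayList<>();
        for (Box candidate : candidates) {
            if (isAbove(candidate, anchor)) {
                names.add(candidate.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Box email = new Box("email", 100, 50, 200, 30);
        Box password = new Box("password", 100, 100, 200, 30);
        Box search = new Box("search", 600, 10, 150, 30); // far off in the page header

        // Prints [email, search]
        System.out.println(above(Arrays.asList(email, password, search), password));
    }
}
```

Notice that the search box in the page header also counts as "above" the password field under this rule, even though a person would never describe it that way. That is the kind of surprise the box model can produce.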

Simon said that hopefully, most of the time, WebDriver will do exactly what you want it to do— and when it doesn't, it'll be because it's doing exactly what you told it to, which is what computers are really good at but humans really hate.

Chrome Debugging with Selenium

Another feature of Selenium 4, and one that has had folks raving about tools like Cypress.io, is the ability to tap into the Chrome DevTools Protocol debugging info.

In fact, many of the new features are actually based on the current Chrome debugging protocol.

Selenium now exposes the Chrome DevTools debugging protocol to your tests, which can help you track down bugs.

This Chrome Debugging Protocol (CDP) is what debugging tools use to communicate with Chrome, and it’s super low-level.

Almost like machine code.

For example, would you want to have to write code like this just to get started?

package main

import (
  "context"
  "fmt"
  "io/ioutil"
  "log"
  "time"

  "github.com/mafredri/cdp"
  "github.com/mafredri/cdp/devtool"
  "github.com/mafredri/cdp/protocol/dom"
  "github.com/mafredri/cdp/protocol/page"
  "github.com/mafredri/cdp/rpcc"
)

func main() {
  err := run(5 * time.Second)
  if err != nil {
    log.Fatal(err)
  }
}

func run(timeout time.Duration) error {
  ctx, cancel := context.WithTimeout(context.Background(), timeout)
  defer cancel()

  // Use the DevTools HTTP/JSON API to manage targets (e.g. pages, webworkers).
  devt := devtool.New("http://127.0.0.1:9222")
  pt, err := devt.Get(ctx, devtool.Page)
  if err != nil {
    pt, err = devt.Create(ctx)
    if err != nil {
      return err
    }
  }
  // ...

Nobody would want to use this approach unless there's no other choice.

So popular libraries wrap the underlying protocol with a more user-facing syntax layer.

For instance, tools like Puppeteer don't make you use the raw Chrome Debugging Protocol (CDP).

Rather, they provide a user-friendly API layer on top of the raw CDP to make it easier to interact with.

All this is great, but the problem with Puppeteer is that it’s JavaScript only.

What happens if you have a different language of choice?

You want to be able to program in lots of different languages.

The other thing is that you don't ever really want to write code that uses the Chrome debugging protocol directly, because it's like writing machine code.

In the same way, you don't want to write code that uses the WebDriver protocol directly. That would be incredibly painful.

And so, what the Selenium contributors have done in Selenium 4 is take some of the common use cases and implement them on top of the CDP, only with friendly APIs that feel Selenium-ish and can be used with any of its language bindings.

Now compare this Selenium CDP geolocation code snippet to the raw CDP code you saw earlier in this post:

import java.util.HashMap;
import java.util.Map;

import org.openqa.selenium.chrome.ChromeDriver;

public void geoLocationTest() {
  ChromeDriver driver = new ChromeDriver();

  // Parameters for the CDP Emulation.setGeolocationOverride command.
  Map<String, Object> coordinates = new HashMap<>();
  coordinates.put("latitude", 50.2334);
  coordinates.put("longitude", 0.2334);
  coordinates.put("accuracy", 1);

  // Send the CDP command, then load the page under test.
  driver.executeCdpCommand("Emulation.setGeolocationOverride", coordinates);
  driver.get("<your site url>");
}

See how much easier this is to read and write than the previous CDP code!

Another example of how CDP comes in handy is that if you want to authenticate, you can now register and get some authentication credentials with Selenium.

And when you come to a page that asks, “Would you like to log in?” It will then use the credentials you've given it.

Similarly, you can intercept network traffic; if you want to have stubbed back ends in your tests or test environment, for example, you can.

So you can write code that intercepts a request the browser makes and returns a response that appears to the browser as if the server had sent it.

This approach gives you a straightforward, friendly and flexible API where you have an HTTP request, and you return the response, and you can do whatever you want in the middle.
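
The request-in, response-out shape described above can be sketched without any library at all. The following is a conceptual model only: Request, Response, and intercept are hypothetical names invented for this example and are not Selenium classes or methods. It simply shows a stub answering the requests that match a rule while everything else passes through to the real back end.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, library-free model of request interception.
class InterceptSketch {

    static final class Request {
        final String method;
        final String path;

        Request(String method, String path) {
            this.method = method;
            this.path = path;
        }
    }

    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Wraps the real back end: matching requests get the stub's response,
    // everything else is passed through untouched.
    static Function<Request, Response> intercept(
            Predicate<Request> matches,
            Function<Request, Response> stub,
            Function<Request, Response> realBackEnd) {
        return request -> matches.test(request)
                ? stub.apply(request)
                : realBackEnd.apply(request);
    }

    public static void main(String[] args) {
        // Stand-in for the real server.
        Function<Request, Response> realBackEnd =
                request -> new Response(200, "real response for " + request.path);

        // Stub every request under /api/ with a canned JSON body.
        Function<Request, Response> handler = intercept(
                request -> request.path.startsWith("/api/"),
                request -> new Response(200, "{\"stubbed\": true}"),
                realBackEnd);

        System.out.println(handler.apply(new Request("GET", "/api/users")).body);
        System.out.println(handler.apply(new Request("GET", "/index.html")).body);
    }
}
```

The useful property is that the test decides, per request, whether the browser ever talks to a real server.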

The original goal was to make the CDP more user-friendly, which I think they’ve achieved.

But, as they say in infomercials, that’s not all—there's more!

Another thing the contributors have done with Selenium 4 is to have it support multiple versions of the Chrome debugging protocol.

This means that with Selenium, you can run a test with major web browsers, such as the latest Chrome browser and the previous version of Chrome, and the latest Edge and the prior version of Edge, or other browser vendor versions.

This is great because your tests will work the way you expect them to without needing to download a completely new browser or a new version of the API.

You can dive into more details by looking at the class named DevTools.

This class contains all the methods, including predefined methods, you can use to interact with the developer options.

It also gives you access to other critical test information around performance and security.

Here is a list of some of the Chrome Development properties you can now capture in your tests:

Application Cache
Fetch
Network
Performance
Profiler
Resource Timing
Security
Target

Selenium IDE New Capabilities

Applitools and other contributors have completely redesigned the IDE, as I've written about in some of my past posts, including The Stunning Return of Selenium IDE.

Looking at the Selenium IDE runner playback tool, you can tell they’ve poured a tremendous amount of effort into making it into a Web-component-based application.

I don't yet have much info on this piece of Selenium, since the latest release at the time of writing is still version 3.17.0.

Simon did mention, though, that they are working on an upcoming standalone Electron app version of the tool as well.

How cool is that?

I will update you more about this record and playback tool as soon as the new major Selenium 4 is available.


Selenium Grid Updates

The Grid’s architecture has been completely rebuilt from the ground up to be more suitable for use in the modern software development world.

In fact, one of my most popular posts was on Zalenium, which was an attempt to build on capabilities the older Grid was lacking.

Zalenium has been discontinued, and when I asked its creator Diego Molina why, he said:

“I killed it because most of its features are now part of Grid 4. Video recording and running tests in docker containers is part of it.”

Selenium 4 Grid

Simon mentioned there were things bolted onto it by third-party projects that really should be part of the core Selenium project.

Selenium Grid 4, for example, now has Docker support.

You can fire it up and use a Docker container out of the box to run your browser instances, which was a feature that Zalenium had, along with some other functionality.

But then they’ve taken that a step further.

The new Selenium Grid now works in a Kubernetes cluster, AWS, GCP or Azure.

And, you can scale a grid to absolutely gargantuan sizes—that is, if you can afford the price of running a gargantuan grid on public infrastructure.

But it's designed to scale horizontally where it can.


Selenium’s Telemetry Feature

Baked into the new Grid, the contributors have also integrated a framework called OpenTelemetry, which allows you to do distributed tracing.

If anything goes wrong during a test run, you can use this capability to hook into anything that consumes tracing output. You can see what's going on in the Grid, and crack it open to try and figure out why it’s happening.

Other tools that support OpenTelemetry, such as Honeycomb, Jaeger, and Datadog, can consume the tracing data.

If you're a sysadmin-type geek, you’re going to love this, because you can just plug it into your existing infrastructure and gain some insight into what's going on in the Grid.

If you've been craving more transparency with your Selenium grid test runs, you now have it.

Selenium Grid and GraphQL

The Selenium Grid also has a new front-end console, which is powered by a GraphQL endpoint.

This allows you to run a GraphQL query against the Selenium grid, either distributed or running on your local machine, and you can extract a whole bunch of useful information.

For example, here is the syntax to query the status of each node in the Grid using the new GraphQL API:

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ grid { nodes { status } } }"}' -s <your Grid GraphQL endpoint URL>
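
If you would rather send that query from test code than from the command line, the same request can be put together with the HTTP client that ships with Java 11 and later. This is a minimal sketch under some assumptions: the class and method names are made up for this example, the default URL http://localhost:4444/graphql is only a guess at where a local Grid serves GraphQL, and the query text is copied from the curl example, so it may need adjusting to match the schema of your Grid version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class GridStatusQuery {

    // The query from the curl example, as a JSON request body.
    static final String QUERY_BODY = "{\"query\": \"{ grid { nodes { status } } }\"}";

    // Builds (but does not send) the POST request for the given endpoint.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(QUERY_BODY))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a Grid is running locally and serves GraphQL at this URL.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:4444/graphql";

        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildRequest(endpoint), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            // No Grid reachable at the endpoint; report it instead of crashing.
            System.err.println("Could not reach the Grid at " + endpoint + ": " + e);
        }
    }
}
```

Keeping the request construction separate from the sending makes it easy to check the request in a unit test without a Grid running.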

Leadership Going Forward for Selenium 5 and Beyond

All this Selenium 4 goodness does come with some sadness.

If you haven't already heard, Simon is leaving the Selenium project.

He’s announced it on stage at several recent conferences; Selenium 4 will be the point at which he steps away from the project.

But don't panic. He’s leaving Selenium 4 in the hands of some very capable contributors.

Over the past year and a half, he has worked hard to ensure the things that used to be implicit are clearer and more accessible.

If you go to the Selenium website, selenium.dev, you'll see there is a project governance page.

They’ve also split the project into various pieces to make it a bit easier to figure out things like how to start contributing, etc.

In addition to its contributors, there’s also a technical leadership committee, the TLC, which is made up of the language binding authors and key technical people on the project.

There's also the Project Leadership Committee, which has very little to do with actual leadership and everything to do with talking with the Software Freedom Conservancy.

There has been a huge push to make the project more open and understandable to newcomers so that it is easier to support and contribute to.

Selenium 5 Upcoming Features

Looking beyond Selenium 4, what can we expect from Selenium 5? If you've been following the development of Selenium over the past few years, you know there was a strong push to make its protocol a W3C standard.

This took a lot of time and effort.

Now that the W3C wire protocol spec is done, the team has been focusing on more user-facing functionality.

To my way of thinking, that means the best is yet to come for Selenium!

Keep your eye out for even more practical automation Selenium testing awesomeness in the future.

Parting Words of Wisdom from Simon about Selenium

The best piece of advice regarding your automation efforts is to remember the test pyramid.

If you have many small tests, a few integration tests, and maybe one or two end-to-end Selenium tests, you're doing it right.

If you have thousands of end-to-end tests (possibly using Selenium) and four or five unit tests, then you're in for a world of pain.

It has been true since before I started Selenium, and it's still true today.


Automation Guild 2021 Event

Don't miss Simon's Using Selenium 4 in Anger session at the 5th annual online Automation Guild conference. Register here


About Joe Colantonio

Joe Colantonio is the founder of TestGuild, an industry-leading platform for automation testing and software testing tools. With over 25 years of hands-on experience, he has worked with top enterprise companies, helped develop early test automation tools and frameworks, and runs the largest online automation testing conference, Automation Guild.

Joe is also the author of Automation Awesomeness: 260 Actionable Affirmations To Improve Your QA & Automation Testing Skills and the host of the TestGuild podcast, which he has released weekly since 2014, making it the longest-running podcast dedicated to automation testing. Over the years, he has interviewed top thought leaders in DevOps, AI-driven test automation, and software quality, shaping the conversation in the industry.

