What is Selenium 4? The latest in Automated Browser Testing

What you need to know about Selenium 4

You've probably heard that Selenium 4 Alpha is now available.

After speaking with Simon Stewart, creator of the open-source project Selenium WebDriver, I compiled some of his favorite new features in this article.

What are the Pieces that Make Up Selenium Version 4?

First, let's go over the basics.

When most folks hear “Selenium,” they automatically think of Selenium WebDriver.

Did you know, however, that two other significant pieces make up the project? They are Selenium IDE and Selenium Grid.

Simon believes that Selenium 4 is an excellent point at which to reintroduce people to the various bits and pieces that make up the Selenium Project.

Let me introduce you to:

  • What is Selenium WebDriver? Selenium WebDriver is an open-source API that allows you to programmatically interact with a browser on an operating system the way a real user would. It was created exclusively for browser automation.
  • What is Selenium IDE? Selenium IDE is a record and playback automation tool.
  • What is Selenium Grid? Selenium Grid allows you to save time by spreading your tests across multiple machines.

Let's dive into each to see what's new!

Selenium WebDriver Updates

The team finally removed a bunch of deprecated methods you were previously warned about.

This clean-up effort was long overdue. Much work has gone into cleaning up the interfaces, classes, and methods in the driver's code base.

For example:

interactions.Keyboard has been replaced with Actions and KeyInput.

You can see the full list in the official documentation.

No worries, though, since Simon also mentioned they did focus on maintaining backward compatibility with legacy versions.

That means you should be able to add the latest version into your older tests, and it should just work; with the caveat, of course, that if a method was deprecated, it's probably gone.

If that’s the case for your test, know that they tried to give you enough of a heads up that you should have known better! 

Seriously. It's been a year or two since they did a major release, so you’ve had time to look at those deprecation warnings and do something about them.

Relative Locators For Automation Scripts

I think the most exciting new functionality of this release is relative locators.

This approach allows you to tell Selenium what element to interact with based solely on its position relative to other Web elements using a very human-readable syntax.

These concepts are not new.

They’ve been part of the test automation space for a long time in vendor tools like Quick Test Professional.

They’ve also been part of other free, open-source automation frameworks like Sahi and Taiko.

Now they're finally part of Selenium as well!

Why is this so cool?

If you've previously attended any automation conferences, you've probably heard some talks regarding approaches to finding tricky Web elements.

These locator-strategy techniques usually involve very complicated CSS, XPath, or stacking up loads of different locators.

What if instead you could, say, click on the button that's above this search box, or find the image with the logo and below that find a login link?

When people talk about these things, they use very human language.

“Where are you going? Ah, look. It's above. It's below. It's kind of near this thing, right?”

Example of the new relative location syntax:


// Imports shared by the three snippets below.
import static org.openqa.selenium.support.locators.RelativeLocator.withTagName;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// 1. Find the input that sits above the password field.
WebElement passwordField = driver.findElement(By.id("password"));
WebElement emailAddressField = driver.findElement(withTagName("input")
                                                  .above(passwordField));

// 2. Find the input that sits below the email field.
WebElement emailAddressField = driver.findElement(By.id("email"));
WebElement passwordField = driver.findElement(withTagName("input")
                                              .below(emailAddressField));

// 3. Find the button to the right of the Cancel button.
WebElement cancelButton = driver.findElement(By.id("cancel"));
WebElement submitButton = driver.findElement(withTagName("button")
                                             .toRightOf(cancelButton));
 

Relative locators are attempts to encapsulate this approach in code. You can say “find elements,” and give it a “what kind” of element you're looking for.

For example, find an input element “above here,” and “below here,” or “to the right of this,” and it will apply all those filters, then say, “Okay, this is the element you want.”

This is a really nice advantage to have because it gives you the ability to find elements using a more human way of describing things.

This doesn't come without some complications, however.

This approach can trip you up if you aren’t familiar with the box model, so be sure to become familiar with it before loading up your tests with these friendly locators.
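
To make the box-model caveat concrete, here is a minimal sketch of how an “above” filter could behave when it only compares bounding boxes. It is a simplified model written for this article, not Selenium's actual implementation: the BoxModelSketch class, its Box type, and the rule used for “above” are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of relative positioning on bounding boxes.
// This is NOT Selenium's implementation; the names and rules are assumptions.
final class BoxModelSketch {

    // A bounding box in page coordinates: origin at the top-left, y grows downward.
    static final class Box {
        final String name;
        final int x;
        final int y;
        final int width;
        final int height;

        Box(String name, int x, int y, int width, int height) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int bottom() {
            return y + height;
        }
    }

    // Assumed rule: a candidate is "above" the anchor when its bottom edge
    // is at or above the anchor's top edge. Horizontal position is ignored.
    static boolean isAbove(Box candidate, Box anchor) {
        return candidate.bottom() <= anchor.y;
    }

    // Collect the names of every candidate that passes the "above" filter.
    static List<String> namesAbove(List<Box> candidates, Box anchor) {
        List<String> names = new ArrayList<>();
        for (Box candidate : candidates) {
            if (isAbove(candidate, anchor)) {
                names.add(candidate.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Box password = new Box("password", 100, 200, 200, 30);
        Box email = new Box("email", 100, 150, 200, 30);
        Box search = new Box("search", 600, 20, 150, 30); // header search box, far to the right
        Box submit = new Box("submit", 100, 250, 80, 30);

        // Prints [email, search]
        System.out.println(namesAbove(Arrays.asList(email, search, submit), password));
    }
}
```

In this model the header search box also counts as “above” the password field, even though a person would never describe it that way. That is the kind of surprise a purely geometric filter can produce, and why it pays to combine relative locators with a specific tag or another filter.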

Simon said that hopefully, most of the time, WebDriver will do exactly what you want it to do. When it doesn't, it'll be because it's doing exactly what you told it to, which is what computers are really good at but humans really hate.

Chrome Debugging Feature with Selenium

Another feature that has folks raving about tools like Cypress.io is the ability to tap into the Chrome DevTools protocol debugging info.

Selenium now has a new feature around the Chrome Browser DevTools debugging protocol.

This Chrome DevTools Protocol (CDP) is what debugging tools use to communicate with Chrome, and it’s super low level.

Almost like machine code.

For example, do you want to have to write code like this?

package main

import (
  "context"
  "fmt"
  "io/ioutil"
  "log"
  "time"

  "github.com/mafredri/cdp"
  "github.com/mafredri/cdp/devtool"
  "github.com/mafredri/cdp/protocol/dom"
  "github.com/mafredri/cdp/protocol/page"
  "github.com/mafredri/cdp/rpcc"
)

func main() {
  err := run(5 * time.Second)
  if err != nil {
    log.Fatal(err)
  }
}

func run(timeout time.Duration) error {
  ctx, cancel := context.WithTimeout(context.Background(), timeout)
  defer cancel()

  // Use the DevTools HTTP/JSON API to manage targets (e.g. pages, webworkers).
  devt := devtool.New("http://127.0.0.1:9222")
  pt, err := devt.Get(ctx, devtool.Page)
  if err != nil {
    pt, err = devt.Create(ctx)
    if err != nil {
      return err
    }
  }

  // ... (excerpt ends here; the rest of the program is omitted)

Nobody would want to use this approach unless there's no other choice.

So popular libraries wrap the underlying protocol with a more user-facing syntax layer.

For instance, tools like Puppeteer don't make you use the raw Chrome DevTools Protocol (CDP).

Rather, they provide a user-friendly API layer on top of the raw CDP to make it easier to interact with.

All this is great, but the problem with Puppeteer is that it’s JavaScript only.

What happens if you have a different language of choice?

You want to be able to program in lots of different languages.

The other thing is that you never really want to write code that uses the Chrome debugging protocol directly, because it's like writing machine code.

In the same way, you don't want to write code that uses the WebDriver protocol directly. That would be incredibly painful.

And so, what the Selenium contributors have done in version 4 is take some of the common use cases and implement them on top of the CDP, only with friendly APIs that feel Selenium-ish and can be used with any of its language bindings.

Now compare this Selenium CDP GeoLocation code snippet to the raw CDP syntax you saw earlier in this post:

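import java.util.HashMap;
import java.util.Map;

import org.openqa.selenium.chrome.ChromeDriver;

public void geoLocationTest() {
  ChromeDriver driver = new ChromeDriver();

  // Parameters for the CDP Emulation.setGeolocationOverride command.
  Map<String, Object> coordinates = new HashMap<>();
  coordinates.put("latitude", 50.2334);
  coordinates.put("longitude", 0.2334);
  coordinates.put("accuracy", 1);

  // Send the CDP command, then load the page under the overridden location.
  driver.executeCdpCommand("Emulation.setGeolocationOverride", coordinates);
  driver.get("<your site url>");
}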

Compared to the previous CDP code, see how much easier this is to read and write!
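
For a sense of what the friendlier API spares you, here is a rough sketch of the JSON message the raw protocol expects for the same geolocation command. CDP commands are JSON objects carrying an id, a method name, and a params object. The RawCdpMessage class and its helper are hypothetical names invented for this illustration, and the id value is arbitrary. The sketch only builds the text; opening the connection to the browser and matching replies to ids is left out.

```java
// Hypothetical helper, for illustration only: builds the JSON text of one
// raw CDP command by hand. It does not talk to a browser.
final class RawCdpMessage {

    // Assemble an Emulation.setGeolocationOverride command.
    // "id" is a caller-chosen number used to match the reply to the request.
    static String geolocationOverride(int id, double latitude, double longitude, int accuracy) {
        return "{\"id\":" + id
                + ",\"method\":\"Emulation.setGeolocationOverride\""
                + ",\"params\":{\"latitude\":" + latitude
                + ",\"longitude\":" + longitude
                + ",\"accuracy\":" + accuracy + "}}";
    }

    public static void main(String[] args) {
        // Prints the message for the same coordinates used in the Selenium snippet.
        System.out.println(geolocationOverride(1, 50.2334, 0.2334, 1));
    }
}
```

Writing, sending, and tracking messages like this for every command is the bookkeeping a wrapper such as executeCdpCommand takes off your hands.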

Another example of how CDP comes in handy is authentication: you can now register authentication credentials with Selenium.

Then, when you come to a page that asks, “Would you like to log in?”, it will use the credentials you've given it.

Similarly, you can intercept network traffic; if you want to have stubbed back ends in your tests, for example, you can.

So you can write code that takes a request the browser makes and returns a response that appears as if the server sent it to the browser.

This approach gives you a straightforward API where you have an HTTP request, and you return the response, and you can do whatever you want in the middle.

The original goal was to make the CDP more user friendly, which I think they’ve achieved.

But, as they say in infomercials, that’s not all—there's more!

Another thing the contributors have done with Selenium is to have it support multiple versions of the Chrome debugging protocol.

This means you can run a test with the latest Chrome and the previous version of Chrome, and the latest Edge and the prior version of Edge, or other browser vendor versions.

This is great because your tests will work the way you expect them to without needing to download a completely new browser or a new version of the API.

You can dive into more details by looking at the class named DevTools.

This class contains all the methods you can use to interact with the developer options.

It also gives you access to other critical test information around performance and security.

Here is a list of some of the CDP domains you can now tap into in your tests:

  • Application Cache
  • Fetch
  • Network
  • Performance
  • Profiler
  • Resource Timing
  • Security
  • Target

Selenium IDE New Capabilities

Applitools and other contributors have completely redesigned the IDE, as I've written about in some of my past posts, including The Stunning Return of Selenium IDE.

Looking at the IDE runner playback tool, you can tell they’ve poured a tremendous amount of effort into making it into a Web-component-based application.

I don't yet have much info on this, since the latest release at the time of writing is still version 3.17.0.

Simon did mention, though, that they are also working on an upcoming standalone version of the tool, built as an Electron app.

How cool is that?

I will update you as soon as the new major version 4 is available.

Selenium Grid Updates

The Grid’s architecture has been completely rebuilt from the ground up to be more suitable for use in the modern software development world.

In fact, one of my most popular posts was on Zalenium, which was an attempt to build on capabilities the older Grid was lacking.

Zalenium has been discontinued, and when I asked its creator Diego Molina why, he said:

“I killed it because most of its features are now part of Grid 4. Video recording and running tests in docker containers is part of it.”

Selenium 4 Grid

Simon himself mentioned there were things bolted onto the Grid by third-party projects that really should be part of the core Selenium project.

Selenium Grid 4, for example, now has Docker support.

You can fire it up and use a Docker container out of the box to run your browser instances, which was a feature that Zalenium had, along with some other functionality.

But then they’ve taken that a step further.

The new Selenium Grid now works in a Kubernetes cluster, AWS, GCP or Azure.

And, you can scale a grid to absolutely gargantuan sizes—that is, if you can afford the price of running a gargantuan grid on public infrastructure.

But it's designed to scale horizontally where it can.

Telemetry Feature of Selenium

The contributors have also baked a framework called OpenTelemetry into the new Grid, which allows you to do distributed tracing.

If anything goes wrong during a test run, using this capability, you can hook into anything that consumes tracing outputs. You can see what's going on in the Grid, and crack it open to try and figure out why it’s happening.

Tools that support OpenTelemetry (things like Honeycomb, Jaeger, Datadog, and others) can consume the traces the Grid emits.

If you're a sysadmin-type geek, you’re going to love this, because you can plug it into your existing infrastructure and gain some insight into what the heck is going on in the Grid.

If you've been craving more transparency with your Selenium grid test runs, you now have it.

Selenium Grid and GraphQL

The Grid also has a new, front-end console, which is powered by a GraphQL endpoint.

This allows you to run a GraphQL query against the Selenium grid, either distributed or running on your local machine, and you can extract a whole bunch of useful information.

For example, here is the syntax to query the status of each node in the grid using the new GraphQL API:

curl -X POST -H "Content-Type: application/json" --data '{"query": "{ grid { nodes { status } } }"}' -s 
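
The same query can be sent from test code. The following is a minimal sketch using the HTTP client that ships with Java 11 and later. The GridStatusQuery class and its method are hypothetical names, and because the curl command above does not show the endpoint, the URL is left as a parameter that you supply for your own Grid.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, for illustration only: sends the node-status query
// to a GraphQL endpoint whose URL is provided by the caller.
final class GridStatusQuery {

    // The same GraphQL document as the curl example, wrapped in a JSON body.
    static final String QUERY_BODY = "{\"query\": \"{ grid { nodes { status } } }\"}";

    // Build (but do not send) the POST request for the given endpoint URL.
    static HttpRequest buildRequest(String graphqlEndpoint) {
        return HttpRequest.newBuilder(URI.create(graphqlEndpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(QUERY_BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java GridStatusQuery <graphql-endpoint-url>");
            return;
        }
        // Send the query and print the raw JSON response from the Grid.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(args[0]), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Keeping request construction separate from sending makes it easy to check the payload in a unit test without a running Grid.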

Leadership Going Forward for Selenium 5 and Beyond

All this Selenium 4 alpha goodness does come with some sadness.

If you haven't already heard, Simon is leaving the Selenium project.

He’s announced it on stage at several recent conferences; Selenium 4 will be the point at which he steps away from the project.

But don't panic. He’s leaving Selenium in the hands of some very capable contributors.

Over the past year and a half, he has worked hard to ensure the things that used to be implicit are clearer and more accessible.

If you go to the Selenium HQ website, selenium.dev, you'll see there is a project governance page.

They’ve also split the project into various pieces to make it a bit easier to figure out things like how to start contributing, etc.

In addition to its contributors, there’s also a technical leadership committee, the TLC, which is made up of the language binding authors and key technical people on the project.

There's also the Project Leadership Committee, which has very little to do with actual leadership and everything to do with talking with the Software Freedom Conservancy.

There has been a huge push to make the project more open and understandable to newcomers so that it is easier to support and contribute to.

Selenium 5 Upcoming Features

If you've been following the development of Selenium over the past few years, you'll know there was a strong push to make it a W3C standard.

This took a lot of time and effort.

Now that the W3C WebDriver protocol spec is done, the team has been focusing on more user-facing functionality.

To my way of thinking, that means the best is yet to come!

Keep your eye out for even more practical automation testing awesomeness in the future.

Parting Words of Wisdom from Simon about Selenium

The best piece of advice regarding your automation efforts is to remember the test pyramid.

If you have many small tests, a few integration tests, and maybe one or two end-to-end Selenium tests, you're doing it right.

If you have thousands of end-to-end tests (possibly using Selenium) and only four or five unit tests, then you're in for a world of pain.

It has been true since before I started Selenium, and it's still true today.

Automation Guild 2021 Event

Don't miss Simon's “Using Selenium 4 in Anger” session at the 5th annual online Automation Guild conference.
