About This Episode:
Welcome to this week's episode of the Test Guild Automation podcast. Our featured expert is Simon Hofmann, the creator of nut.js, a Node.js Desktop Automation Testing framework that allows you to program your mouse and keyboard with JavaScript or TypeScript.
Have you ever been bogged down with repetitive tasks you wish to automate? Nut.js may be the solution for you! Its open-source Node.js cross-platform desktop automation can automate mouse movement and give you full control over your cursor. You can move, click, or drag it wherever you need, saving time and effort.
But that's not all! One of the critical components of visual testing or automation is the ability to search for images and text on-screen. Nut.js provides plugins to do just that, allowing you to perform on-screen searches efficiently.
Are you interested in improving your tests or workflows? With nut.js, you can retrieve information about open windows, giving you the power to make your tests more effective. And if that's not enough, nut.js also allows you to automate keyboard input to press single keys or type pages of text quickly. Whether you're using JavaScript or TypeScript, nut.js can handle it all.
So if you're looking for a desktop automation framework that can save you time and effort, look no further than nut.js. Tune in to this episode to learn more about its features and how it can benefit you!
This week's sponsor is TestGuild.
Let's talk if you're in the DevOps automation/software testing space and want to offer real-world value/solutions that can improve the skills or solve a problem for the Guild community. Discover how to reach your ideal customer now: testguild.info
Exclusive Sponsor
The Test Guild Automation Podcast is sponsored by the fantastic folks at Sauce Labs. Try it for free today!
About Simon Hofmann
Simon is a half self-taught, half trained software engineer who likes to dive deep into technical topics, cares about semantic versioning, testing and API design.
He's working on nut-tree/nut.js and its ecosystem to make JS desktop automation enjoyable!
Connect with Simon Hofmann
- Company: www.nutjs.dev
- Blog: www.nutjs.dev/blog
- LinkedIn: simon-hofmann-959230119
- Twitter: s1hofmann
- YouTube: s1hofmann
Rate and Review TestGuild
Thanks again for listening to the show. If it has helped you in any way, shape, or form, please share it using the social media buttons you see on the page. Additionally, reviews for the podcast on iTunes are extremely helpful and greatly appreciated! They do matter in the rankings of the show and I read each and every one of them.
[00:00:04] Get ready to discover the most actionable end-to-end automation advice from some of the smartest testers on the planet. Hey, I'm Joe Colantonio, host of the Test Guild Automation Podcast, and my goal is to help you succeed with creating automation awesomeness.
[00:00:25] Joe Colantonio Hey, it's Joe. Welcome to another episode of the Test Guild Automation Podcast. Today, we'll be talking with Simon all about desktop automation testing frameworks for Node.js. Simon is a half self-taught, half trained software engineer who likes to dive deep into technical topics and cares about semantic versioning, testing, and API design. There's a really interesting history behind what he's done and a really cool solution I think you're really going to enjoy. He's been working on nut-tree/nut.js and its ecosystem to make JS desktop automation more enjoyable. So I'm really excited to have him on the show. You don't want to miss this. Check it out.
[00:01:01] This episode of the Test Guild Automation Podcast is sponsored by the Test Guild. Test Guild offers amazing partnership plans that cater to your brand awareness, lead generation, and thought leadership goals, and get your products and services in front of your ideal target audience. Our satisfied clients rave about the results they've seen from partnering with us, from boosted event attendance to impressive ROI. Visit our website and let's talk about how Test Guild could take your brand to the next level. Head on over to TestGuild.info and let's talk.
[00:01:33] Joe Colantonio Hey, Simon. Welcome to the Guild.
[00:01:39] Simon Hofmann Hey, Joe. Thanks for having me.
[00:01:40] Joe Colantonio Great to have you. So is there anything else you want the Guild to know more about? I looked a little bit at your background, and it seems like you've done a lot of different things, so I'm just curious if you can maybe expand on the bio that I gave.
[00:01:52] Simon Hofmann So basically the bio sums it up pretty nicely. Before I dove into the topic of computer science, I used to work as a mechanical engineer, but I have been interested in computers throughout my whole life. And once I finished my basic training as a mechanical engineer, I decided that I should continue to go to school, and that ultimately led me to end up with a CS degree.
[00:02:24] Joe Colantonio Nice. How did you get involved in testing, though? Because it seems like you have more of a background in development.
[00:02:29] Simon Hofmann First and foremost, I'm a software developer, yes, but in one of my previous gigs, at one of my previous companies, I was on a development team that was specifically tasked with developing an end-to-end testing framework. So that somehow relates, and development is also pretty closely related to testing. Whatever you develop, you'll want to have tested.
[00:02:53] Joe Colantonio Absolutely. I just find sometimes developers don't tend to do testing. Do you have any recommendations for developers, maybe tools that actually work better for a development type of environment, to get them more involved in testing their code more frequently?
[00:03:07] Simon Hofmann Once you get started developing a greenfield project, a proper testing setup should be the first thing you consider when you get going. So I wouldn't start a project without having a proper test setup at hand. And I mean, essentially it only gives you benefits. So let's say you're developing an application, let's say a backend, and your backend has a whole suite of reliable and meaningful integration tests, for example. Your future colleagues will be really grateful, I can tell you, because having a meaningful set of integration tests allows you to get started developing and evolving your application without worrying about breaking something. Because if you have your business logic, the stuff that provides business value to your company, properly tested, you can be sure that you're not breaking something and losing money.
[00:04:05] Joe Colantonio Absolutely. I think when people think of automation though, for developers, they think of unit testing. So do you do more than unit testing? Seems like you're also more involved in integration and end-to-end testing.
[00:04:15] Simon Hofmann I don't know who coined that sentence, but there is a saying that you should write integration tests, quite a few of them, because integration tests give you, in my opinion, more confidence than a unit test. And they also allow you to move quicker when you are refactoring something. I mean, there are use cases where a unit test is definitely the right solution. But when thinking about complex systems that interact with other distributed systems, you should definitely keep an eye out for integration tests. And let's say you add a front end to your whole application stack, then you should definitely also dive into end-to-end testing. nut.js, for example, basically combines all three of them, because there are unit tests on the code level, but there are also integration tests when it comes to, let's say, loading third-party plugins, which are connected via different interfaces. And there's also a whole suite of full end-to-end tests that basically evaluate the whole function of the framework.
[00:05:17] Joe Colantonio Absolutely. So I want to dive a little bit more into nut.js, now that you mentioned it. I guess at a high level, what is nut.js, for people who don't know and have never heard of it?
[00:05:25] Simon Hofmann So nut.js is a Node.js framework that is written in TypeScript, and it's a framework that allows you to automate your desktop. And by automating your desktop, I am talking about simulating keyboard input, simulating mouse input, and accessing clipboard content. You also have the possibility to add plugins to the core framework which will allow you to perform on-screen image search, for example, or OCR. So you can essentially work with whatever you see on your screen.
[00:05:56] Joe Colantonio Cool. So I think when people think of front-end automation, especially in this day and age, they think of a browser application. But it seems like this was made specifically for desktops. So why did you create a tool for the desktop? Did you find that some tests required more than a browser?
[00:06:11] Simon Hofmann Not everything happens inside a browser these days, and especially if you think about an end-to-end testing scenario where you have, let's say, a web application, let's say a web shop, and you want to create an end-to-end test that verifies that full ordering process. So let's say a customer accesses your web shop, puts something into the basket, performs a checkout, and then let's say he receives an invoice, but the invoice is a PDF file and you want to validate that what ends up at the customer's site is actually a valid PDF with the proper content. Then that would happen outside of a browser.
[00:06:53] Joe Colantonio Absolutely. I know you've been developing this for a while. And also, I think it has a lot of stars on GitHub, like 1.5K or more. So what use cases do people usually use it for? I know there are a lot of choices out there, not that many for desktops though, so that's why I was really excited to see it. Have you seen people use this, and how long have you been developing it?
[00:07:11] Simon Hofmann So I've been developing it since August of 2018, so that's almost five years. As for use cases, there are quite a few. Actually, a lot of people are using it for game automation. But what made me really happy is that I noticed there is a free online school that teaches JavaScript or Node.js development, and they are using it to build a WebSocket server with their students. So that's essentially a remote control: you can remote-control your desktop via a WebSocket connection. But there are quite a few other things that nut.js is used for. I know of one example that uses it in a video conferencing tool with screen share, where you can access your peer's mouse, for example, and the underlying technology that is used to perform the remote execution or the remote mouse control is nut.js. And maybe you know Script Kit? The underlying technology that performs keyboard actions in Script Kit is nut.js.
[00:08:20] Joe Colantonio Nice, I love this. It seems like you set it up to be extendable as well, like you mentioned. So there are different kinds of plugins, and one of them that caught my attention was visual automation and testing. So who creates the plugins? Are you creating all these plugins on top of it? Who maintains them? And maybe talk a little bit more about the different plugins and the different types of things it can do.
[00:08:38] Simon Hofmann So when I started out developing nut.js, there was no plugin system. Back in the day, it consisted of essentially what is nowadays the core of the framework. And that core also contained all the code that was used to perform on-screen image search. After a while I realized that this was a bit of a problem, because most users were not that interested in on-screen image search in the first place. When they started out, they were mostly looking for mouse and keyboard automation. And back in the day, all the code that was used for on-screen image search was limited in which Node versions it supported. So naturally, it was always a problem because I was always running behind on updates. I had to recompile binaries and all that stuff. People continuously asked whether it would be possible to use nut.js without all the image matching stuff, because without the image matching it was usable with all the recent Node versions. That led me to think about the whole architecture of the project, and I started to think that it would be cool to have a plugin system that allows you to use the core on its own. But if you wanted, for example, to use on-screen image search, then you would just have to install a plugin. It would self-register and you would have all the functionality in place. So that led me into refactoring the old framework and moving the image matching code into its own plugin. At the moment, I think I am the only person that's actively publishing new plugins. But there are also forks of existing plugins from myself that, for example, are updated to recent versions or add support for other Node versions and stuff like that. All of the plugins that are listed on the nut.js website are from myself.
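The "install a plugin and it self-registers" idea Simon describes can be sketched roughly as follows. This is a minimal illustration, not the nut.js source: `ProviderRegistry`, `ImageFinder`, `FakeTemplateMatcher`, and `loadImagePlugin` are hypothetical names chosen for this example, and the "matcher" returns a canned position instead of doing real image search.

```typescript
// Hypothetical sketch of a self-registering plugin; names are illustrative only.

// Contract the core defines for anything that can locate an image on screen.
interface ImageFinder {
  findMatch(needle: string): { x: number; y: number } | null;
}

// The core owns a registry and knows nothing about concrete implementations.
class ProviderRegistry {
  private imageFinder?: ImageFinder;

  registerImageFinder(finder: ImageFinder): void {
    this.imageFinder = finder;
  }

  hasImageFinder(): boolean {
    return this.imageFinder !== undefined;
  }

  getImageFinder(): ImageFinder {
    if (!this.imageFinder) {
      // Mouse and keyboard automation would still work; only image search is missing.
      throw new Error("No image finder registered. Install an image matching plugin first.");
    }
    return this.imageFinder;
  }
}

// Single shared registry instance exported by the (hypothetical) core package.
const providerRegistry = new ProviderRegistry();

// Stand-in for a plugin's native image matcher; returns a canned result.
class FakeTemplateMatcher implements ImageFinder {
  findMatch(needle: string): { x: number; y: number } | null {
    return needle === "button.png" ? { x: 10, y: 20 } : null;
  }
}

// What a plugin module would run once when it is imported: it registers itself.
function loadImagePlugin(): void {
  providerRegistry.registerImageFinder(new FakeTemplateMatcher());
}
```

With a design along these lines, the core package carries no native image matching code, so users who only need mouse and keyboard automation are not tied to whatever Node versions the image matching binaries support.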
[00:10:45] Joe Colantonio Nice. I also like that it's cross-platform. So if I'm correct, you can run against Windows, Mac, Linux, all the platforms, which is pretty cool.
[00:10:55] Simon Hofmann That was also one of the main goals I tried to achieve when I started developing it. One of the key motivations why I actually started developing my own tool was that all of the existing frameworks were not that actively maintained, or more likely unmaintained, at that time. I also wasn't happy about the current state of these frameworks, because all these desktop automation frameworks are essentially relying on binaries. You have a lot of C++ code that accesses operating system level APIs to perform all the automation for, let's say ... movement, and all of the existing frameworks did not provide a proper way to easily install all these tools. You would have to recompile all the native code on your machine once you installed it. And that would be a big problem if you wanted to integrate the package in any kind of application you want to distribute to users, because you would be on your own when it comes to how to manage package distribution and how to cross-compile stuff for different platforms. So I had to think about how I could manage to have just a single package I can install via npm install, the default way in the Node.js world, and it just works out of the box.
[00:12:24] Joe Colantonio So yeah, that's another problem. Sometimes some of these tools require a lot of setup, and this seems like it's just one install command from the command line, and there it is. Is there anything anyone needs to know before they use it? Do they need to have anything else installed on the machine in order for it to work? Or, like you said, is it truly self-contained?
[00:12:43] Simon Hofmann On Linux, there are two requirements for X11 libraries to be installed, but on both Mac and Windows it's really just an npm install and everything runs out of the box. And it's also the same for the image matching and the OCR plugins, because these plugins are essentially also binaries. Most frameworks that also do on-screen image search are using OpenCV in the background, and nut.js does as well. But for the most part, from what I have seen of other frameworks, they are relying on users to install OpenCV binaries on their system. And I can tell from experience that installing OpenCV on Windows is a different story than installing it on a Linux box, for example. And then you would also have to make sure that you have installed the right version. That was also something I really didn't want my users to care about. The image matching plugins I provide are fully self-contained. They ship all the required libraries, and you can just install them via npm, import them in your code, and they are ready to go.
[00:13:56] Joe Colantonio Why did you use Node.js or JavaScript rather than Python? Because I looked at your history, and it seemed like you started off using Python. Is there a benefit, or is there a reason why you chose this tech stack to write this in?
[00:14:08] Simon Hofmann I used to write a lot of Python when I was still in university, but over time my focus really shifted towards the whole Node.js and JavaScript ecosystem. And nowadays I hardly write Python anymore. So you could say I'm all in on the JavaScript and TypeScript ecosystem.
[00:14:31] Joe Colantonio Nice. So talking about that, I think this also integrates with popular libraries like Jest. So how does it plug in to Jest, or how would you use this with Jest?
[00:14:38] Simon Hofmann Jest, that was actually an experiment that I started out of curiosity, because I started thinking: okay, I'm now able to search for something on my screen, but how would I use that when actually writing a test? All the tests I have in nut.js are written with Jest, so I started thinking that maybe I could integrate it with the existing framework I'm already using. I started reading up on the Jest documentation and I noticed that it's pretty straightforward, pretty easy, to write your own extensions for the framework. So I gave it a try and added my own matchers. Now let's say you want to, for example, verify that your screen is showing an image. Then you could write a typical Jest expectation in the form of expect(screen), so you pass the nut.js screen object, and then you formulate your expectation in the form of toShow and pass it an image file you have on your system. So the Jest integration basically came out of the same motivation I had with the overall API design, because I think it's a really nice way to structure your expectations the way it is done in Jest or in Mocha, by writing expect, whatever test object you have, and in my example, to show an image on my screen or not to show it. It makes it really easy to read and understand what the actual expectation of your test case is.
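To make the matcher idea concrete, here is a minimal sketch of what a custom `toShow` matcher could look like. It relies only on the contract Jest documents for custom matchers (a function that receives the value passed to `expect` and returns, or resolves to, an object with `pass` and `message`). `ScreenLike` and `fakeScreen` are hypothetical stand-ins so the example runs without a display, and the body is an assumption for illustration, not the matcher nut.js actually ships.

```typescript
// Result shape Jest expects from a custom matcher.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

// Hypothetical stand-in for a screen object whose find() rejects when nothing matches.
interface ScreenLike {
  find(imagePath: string): Promise<{ left: number; top: number }>;
}

// Sketch of a matcher: passes when the image can be located on the given screen.
async function toShow(received: ScreenLike, imagePath: string): Promise<MatcherResult> {
  try {
    const region = await received.find(imagePath);
    return {
      pass: true,
      // Jest shows this message when the assertion is negated with .not
      message: () =>
        `Expected screen not to show ${imagePath}, but found it at (${region.left}, ${region.top})`,
    };
  } catch (err) {
    return {
      pass: false,
      message: () => `Expected screen to show ${imagePath}, but search failed: ${String(err)}`,
    };
  }
}

// Fake screen for illustration: it only "sees" logo.png.
const fakeScreen: ScreenLike = {
  async find(imagePath) {
    if (imagePath === "logo.png") {
      return { left: 100, top: 50 };
    }
    throw new Error(`no match for ${imagePath}`);
  },
};

// In a Jest test file one would register and use it roughly like this:
//   expect.extend({ toShow });
//   await expect(fakeScreen).toShow("logo.png");
//   await expect(fakeScreen).not.toShow("missing.png");
```

Returning both a positive and a negated message is what lets the same matcher read naturally as "to show" or "not to show", which is the readability benefit Simon points to.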
[00:16:15] Joe Colantonio I think I also saw something about Sakuli and Selenium. So does this integrate at all with Sakuli and Selenium, or is it a replacement for solutions like Sakuli and Selenium? Or are there any use cases where you would use this with Selenium, anything like that?
[00:16:28] Simon Hofmann So what is the connection between nut.js and Sakuli? Well, I started developing nut.js in my free time, but, well, I was one of the lead developers of Sakuli. Back in the day, when I took over the development and maintenance of Sakuli, it was still in version 3.1 and was written in Java. So it was a mixture of Java code in combination with a JavaScript engine that was integrated in the JDK or the JVM. And once I had the prototype of nut.js in place, which validated that it is possible to perform desktop automation with JavaScript, I pitched a rewrite of Sakuli V1 with Node.js.
[00:17:17] Joe Colantonio So is Sakuli still an open, maintained project?
[00:17:22] Simon Hofmann I'm no longer with my previous employer and I don't think that it's currently maintained.
[00:17:28] Joe Colantonio Gotcha. So you also mentioned in your bio that modern software testing relies on containerization, Kubernetes, Docker. Stupid question: is there any way you would use nut.js to help with maybe maintenance of environments or, I don't know, setting up environments for you automatically?
[00:17:44] Simon Hofmann So I wouldn't recommend it if there is a way to do it via, let's say, an automation tool like Ansible or Terraform or something like that. But if you have a setup that requires user interface interactions, then you could definitely do that. And one thing I'm considering for the future of nut.js is a remote plugin that allows you to basically write and run your code on one machine while the actual interaction with your mouse and keyboard happens on a remote system.
[00:18:20] Joe Colantonio Oh, that's neat. Now, why would you do that? Just for remote provisioning of things or setting things up remotely, obviously?
[00:18:27] Simon Hofmann Exactly. That could be one use case. Yep. If you distribute it or if you distribute the remote calls over a bunch of machines, you can automate setting up whatever applications.
[00:18:40] Joe Colantonio Awesome. So I noticed on the website that you just released version three. What's new in version three that adds to your already existing code base?
[00:18:51] Simon Hofmann So with version three, I restructured the plugin interface. With version three, it is now possible to provide arbitrary plugin data. If you write your own plugin for nut.js, you can define an interface for the optional data your plugin needs or is able to process, and that gives plugin developers a lot more flexibility in how they can add additional functionality. One example is from the second image-matching plugin I created: that plugin allows users to specify the amount of scaling to apply when searching for an image. This is something that is not contained in the default interface for plugins, but with this new arbitrary data you can pass along to a plugin, it is now possible to configure that anyway. Along with the release of V3, I also released the OCR plugin I mentioned earlier, which gives you the possibility to search for text on your screen. So you could say screen.find, single word "hello", and it would locate the text on your screen so you could move your mouse towards it. That gives you a really nice way of interacting with your UI, because you can restrict the search area for the text to, let's say, the area of the currently active window. In the case of, let's say, a pop-up you have to confirm, you could limit the search area to the area of that window and still say screen.find, single word "confirm". This way you could move your mouse cursor to the button that says Confirm, click it, close the pop-up, stuff like that.
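The "arbitrary plugin data" idea can be sketched with a generic options type: the core defines the common search options, and each plugin supplies its own type for the extra data it understands. Everything below is a hypothetical illustration of that pattern (the names `SearchOptions`, `ScalingData`, `scaleSteps`, and `plannedSearches` are assumptions), not the actual nut.js plugin interface.

```typescript
// Rectangular area on screen, e.g. the bounds of the active window.
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Options the (hypothetical) core understands, generic over plugin-specific data.
interface SearchOptions<ProviderData = unknown> {
  searchRegion?: Region;       // restrict the search, e.g. to a pop-up window
  providerData?: ProviderData; // opaque to the core, passed through to the plugin
}

// Data only one particular image matching plugin knows how to interpret.
interface ScalingData {
  scaleSteps: number[]; // scale factors to try when searching for the image
}

// Stand-in for the plugin side: lists the searches it would attempt.
function plannedSearches(needle: string, options: SearchOptions<ScalingData> = {}): string[] {
  // Fall back to a single unscaled search when no plugin data is given.
  const steps = options.providerData?.scaleSteps ?? [1];
  const area = options.searchRegion
    ? `${options.searchRegion.width}x${options.searchRegion.height}`
    : "full screen";
  return steps.map((scale) => `${needle} at scale ${scale} in ${area}`);
}

// Usage: limit the search to a 400x300 pop-up and try two scales.
const popup: Region = { left: 200, top: 150, width: 400, height: 300 };
const plan = plannedSearches("confirm.png", {
  searchRegion: popup,
  providerData: { scaleSteps: [1, 0.5] },
});
```

Because the core only forwards `providerData` without inspecting it, a new plugin can introduce its own settings without any change to the shared interface.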
[00:20:48] Joe Colantonio Nice. How accurate is that? Do you have to worry about pixel differences on different machines if you develop on one and run on another? That's just old-school thinking that I have.
[00:20:58] Simon Hofmann So all the image-based stuff is currently still a bit brittle. If you switch from one machine to another with a different screen resolution, it probably will break, because the algorithm that performs the image search is not that flexible when it comes to different resolutions. That's something I still want to improve, but I haven't found the time to do so so far. As for the OCR, I would consider it pretty stable, because most modern desktops have a high enough resolution for accurate OCR results. My guess would be that OCR should be stable between different machines.
[00:21:44] Joe Colantonio Just a random question. I was just looking through the release file and I saw something about a Jimp security advisory. What is Jimp?
[00:21:52] Simon Hofmann Jimp is an npm package that nut.js uses internally. A while ago it had a CVE that was not yet addressed in an official package release. So what this advisory did is basically give nut.js users instructions on how to force transitive packages to be resolved in a way that the CVE did not affect them.
[00:22:21] Joe Colantonio Gotcha. Cool. I guess if someone's listening to this and is like, whoa, I need to learn more, I'd love to get this implemented but I need a little help. Do you offer consulting services?
[00:22:28] Simon Hofmann If you ask me, I'm happy to help.
[00:22:32] Joe Colantonio Awesome. All right, so before we go, is there one piece of actionable advice you can give to someone to help them with their automation testing efforts? And what's the best way to find or contact you?
[00:22:43] Simon Hofmann The easiest way to contact me is probably via email, so you can either reach out via the contact that is listed on the nut.js website or hit one of my websites like s1h.org. You can also find me on Twitter and GitHub. I'm also running a nut.js Discord server, and I'm happy to onboard new users there. I actually started the Discord in the hope of catering to some kind of community around the framework. I'm always happy if new users or new people join.
[00:23:18] Thanks again for your automation awesomeness. For links to everything of value we covered in this episode, head on over to testguild.com/a442. And if the show has helped you in any way, why not rate it and review it in iTunes? Reviews really help in the rankings of the show and I read each and every one of them. So that's it for this episode of the Test Guild Automation Podcast. I'm Joe, and my mission is to help you succeed with creating end-to-end, full-stack automation awesomeness. As always, test everything and keep the good. Cheers.
[00:23:55] Hey, thanks again for listening. If you're not already part of our awesome community of 27,000 of the smartest testers, DevOps, and automation professionals in the world, we'd love to have you join the FAM at Testguild.com. And if you're in the DevOps automation software testing space, or you're a test tool provider and want to offer real-world value that can improve the skills or solve a problem for the Guild community, I'd love to hear from you. Head on over to testguild.info and let's make it happen.