Testing Garage: Selenium


Thursday, September 22, 2022

WebDriver: Tracing the Interface WebDriver - Part 2

In the previous post of this WebDriver series, I shared the gist of what WebDriver does and how. In this blog post, Part 2 of the series, I'm sharing a few more details on WebDriver and RemoteWebDriver.

From there, we will see how AppiumDriver is related to WebDriver -- which extends the interface SearchContext.

This blog post is written as part of 21Days21Tips from The Test Chat. The tip shared in this post is to know more about WebDriver internals and how it associates with RemoteWebDriver and AppiumDriver.

This should help in understanding the Selenium APIs better and where they come from. It also helps in building a better mental model of Selenium WebDriver and of how we want to structure the instructions in the tests and utilities we write.

SearchContext and WebDriver

Picture: Representation of SearchContext and hierarchy of WebDriver

The SearchContext is the parent interface in the WebDriver hierarchy

The subinterfaces of SearchContext are

WebDriver
WebElement

This SearchContext defines two methods

findElement(By by)

Return type: WebElement
It finds the first WebElement using the given method

findElements(By by)

Return type: java.util.List<WebElement>
It finds all elements within the current context using the given mechanism

Note: I'm referring to Java APIs of Selenium in this blog post
More details of this can be found here.

Note: Selenium's Ruby client describes the Interface SearchContext as this.
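To make the hierarchy concrete, here is a minimal sketch of the same shape in plain Java. The nested types only borrow Selenium's names as stand-ins; they are not the real org.openqa.selenium interfaces, and the in-memory page, Node, PageDriver, and samplePage() are hypothetical helpers made up for illustration. It shows that a driver and an element share the two SearchContext methods and differ only in where the search starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Stand-in types that mirror the shape described above; NOT the Selenium library.
class SearchContextSketch {

    /** Stand-in locator: decides whether a node matches. */
    interface By {
        boolean matches(Node node);
        static By id(String id) { return node -> id.equals(node.id); }
        static By tagName(String tag) { return node -> tag.equals(node.tag); }
    }

    /** Parent interface: anything that can be searched. */
    interface SearchContext {
        WebElement findElement(By by);        // first match, or throws
        List<WebElement> findElements(By by); // all matches, possibly empty
    }

    interface WebDriver extends SearchContext { }

    interface WebElement extends SearchContext { String getText(); }

    /** Hypothetical in-memory element; it searches its descendants only. */
    static final class Node implements WebElement {
        final String tag;
        final String id;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String tag, String id, String text) {
            this.tag = tag;
            this.id = id;
            this.text = text;
        }

        Node add(Node child) { children.add(child); return this; }

        public String getText() { return text; }

        public List<WebElement> findElements(By by) {
            List<WebElement> found = new ArrayList<>();
            collect(this, by, found);
            return found;
        }

        public WebElement findElement(By by) {
            List<WebElement> found = findElements(by);
            // java.util's exception stands in for Selenium's own exception type.
            if (found.isEmpty()) throw new NoSuchElementException("no match");
            return found.get(0);
        }

        private static void collect(Node parent, By by, List<WebElement> out) {
            for (Node child : parent.children) {
                if (by.matches(child)) out.add(child);
                collect(child, by, out);
            }
        }
    }

    /** Hypothetical driver: its search context is the whole page. */
    static final class PageDriver implements WebDriver {
        private final Node root;
        PageDriver(Node root) { this.root = root; }
        public WebElement findElement(By by) { return root.findElement(by); }
        public List<WebElement> findElements(By by) { return root.findElements(by); }
    }

    static PageDriver samplePage() {
        Node menu = new Node("ul", "menu", "")
                .add(new Node("li", "home", "Home"))
                .add(new Node("li", "blog", "Blog"));
        Node footer = new Node("ul", "footer", "")
                .add(new Node("li", "contact", "Contact"));
        return new PageDriver(new Node("body", "page", "").add(menu).add(footer));
    }

    public static void main(String[] args) {
        WebDriver driver = samplePage();
        // Search the whole page through the driver: prints 3
        System.out.println(driver.findElements(By.tagName("li")).size());
        // Search within a containing element: prints 2
        WebElement menu = driver.findElement(By.id("menu"));
        System.out.println(menu.findElements(By.tagName("li")).size());
    }
}
```

The same By locator gives different results depending on which SearchContext receives it, which is the decision the two subinterfaces leave to the test author.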

The WebDriver provides the below methods:

close()
findElement(By by)
findElements(By by)
get(java.lang.String url)
getCurrentUrl()
getPageSource()
getTitle()
getWindowHandle()
getWindowHandles()
manage()
navigate()
quit()
switchTo()

More details of these methods can be found here.

RemoteWebDriver and AppiumDriver

Further, we see the class RemoteWebDriver implements the interface WebDriver. Today, the WebDriver and RemoteWebDriver communicate using standard W3C specifications.

That way, all modern browsers that adhere to the W3C specification should not have (much) trouble when WebDriver and RemoteWebDriver are used to mimic user actions on them. We see ChromiumDriver, ChromeDriver, FirefoxDriver, EdgeDriver, SafariDriver, and OperaDriver extending RemoteWebDriver (directly or through ChromiumDriver).

This prompts us to learn:

Why we declare the WebDriver first
And then instantiate the browser's driver
And later, how we use that WebDriver instance to drive actions (mimic user actions) on the browser through the respective browser's driver
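A minimal sketch of that idea, assuming simplified stand-in classes rather than the real Selenium ones: the variable is declared as the interface, the object is the browser's driver, and the test steps only ever see the interface. The command log, sentCommands(), and openBlog() are hypothetical and exist only to make the flow visible. In the real library, ChromeDriver reaches RemoteWebDriver through ChromiumDriver, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types that borrow Selenium's names; NOT the Selenium library.
class DriverDeclarationSketch {

    interface WebDriver {
        void get(String url);
        String getCurrentUrl();
        void quit();
    }

    /** Stand-in base class: records each command it would send to a browser's driver. */
    static class RemoteWebDriver implements WebDriver {
        private final String browserName;
        private final List<String> sentCommands = new ArrayList<>();
        private String currentUrl = "about:blank";

        RemoteWebDriver(String browserName) { this.browserName = browserName; }

        public void get(String url) { send("get " + url); currentUrl = url; }
        public String getCurrentUrl() { send("getCurrentUrl"); return currentUrl; }
        public void quit() { send("quit"); }

        private void send(String command) { sentCommands.add(browserName + ": " + command); }

        List<String> sentCommands() { return sentCommands; }
    }

    // Browser-specific drivers inherit the shared behaviour (hierarchy simplified).
    static class ChromeDriver extends RemoteWebDriver { ChromeDriver() { super("chrome"); } }
    static class FirefoxDriver extends RemoteWebDriver { FirefoxDriver() { super("firefox"); } }

    /** Test steps depend only on the WebDriver interface, not on a browser. */
    static String openBlog(WebDriver driver) {
        driver.get("https://testingGarage.blogspot.com");
        String url = driver.getCurrentUrl();
        driver.quit();
        return url;
    }

    public static void main(String[] args) {
        // Declared as the interface, instantiated as the browser's driver.
        WebDriver driver = new ChromeDriver();
        System.out.println(openBlog(driver));
        // The same steps run unchanged against another browser's driver.
        System.out.println(openBlog(new FirefoxDriver()));
    }
}
```

Because openBlog() accepts a WebDriver, switching browsers changes only the line that creates the object.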

When we want to automate using Selenium Grid, we make use of RemoteWebDriver to drive the action between the client and server.

The class AppiumDriver extends RemoteWebDriver from the Selenium project. WebElement is an interface, so AppiumDriver works with WebElement objects rather than extending it. Further, AppiumDriver has its own methods to interact with mobile elements. More details about the Java Client of AppiumDriver can be found here.

The subclasses of AppiumDriver are:

AndroidDriver
IOSDriver
WindowsDriver

21 Days 21 Tips -- #day17

Here are my pointers to fellow test engineers

Interface SearchContext is at the top of the hierarchy of the WebDriver interface
Interface SearchContext lets me decide where to search

To search for the element in the whole page

use the WebDriver object

Or, to search within a containing element

use the WebElement object

Its methods return the type WebElement (or a list of WebElement)

RemoteWebDriver implements the interface WebDriver
The modern browsers' drivers extend the class RemoteWebDriver
AppiumDriver extends the class RemoteWebDriver and works with WebElement objects

AndroidDriver extends AppiumDriver
IOSDriver extends AppiumDriver
WindowsDriver extends AppiumDriver

For a better understanding of SearchContext and WebDriver, refer to the below Git repository of SeleniumHQ:

SeleniumHQ Repository -- https://github.com/SeleniumHQ/selenium
Selenium Java Client

SearchContext

https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/SearchContext.java

WebDriver

https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/WebDriver.java

RemoteWebDriver

https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/remote/RemoteWebDriver.java

ChromiumDriver

https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/chromium/ChromiumDriver.java

Understanding the below should give a mental model of how the call happens in Selenium's library:

WebDriver and browser's driver instantiation
The order in which they are instantiated and used in programming to automate actions on the browser

If you notice, the automation we do is more about programming than about Selenium's library. We use and build on the Selenium library in our programs to mimic user actions on browsers and mobile apps.

Friday, April 1, 2022

WebDriver: Clarifying the Confusion on Why and What is the WebDriver - Part 1

I had a question: "What is WebDriver and why should I use it to automate on a browser?" I tried to understand it and relate its presence to code written using Selenium. I see this question among test engineers who are starting the practice of automation on browsers.

And most of us get confused between WebDriver, WebDriverManager, and WebdriverIO. These are not the same, but they all work in the same space: automation on the web and mobile.

By the way, I learned that understanding WebDriver is fundamental to the practice of automation on web browsers. The same idea carries over to the automation of mobile apps using Appium.

I'm sharing what I learned as part of 21Days21Tips, the initiative from The Test Chat community. The tip here is to provide clarity around WebDriver and why we use it in automation on a browser.

What is WebDriver?

The WebDriver is part of the Selenium library, and we use it every time we interact with a browser. It is also available as language bindings, which help us write the browser-controlling code. For example, if I pick Selenium's Java WebDriver,

it provides the APIs that I consume to control the actions on the web page displayed on a browser
likewise, if I pick Selenium's Python WebDriver it provides me the APIs that I consume to automate my actions on a browser

I code here using Python

That said, the WebDriver is a set of APIs; to be precise, it is an object-oriented API adhering to the W3C standards. As a result, the WebDriver drives browsers effectively today, as all popular browsers adhere to the W3C standards. HTTP is used as the transport protocol.

Understanding the WebDriver

At a high level, this is what WebDriver does:

The tests we write make use of WebDriver API
This WebDriver API carries the commands (written in the test) to interact with the browser's driver
On receiving the commands, the browser's driver communicates natively with the browser, translating each command into an action emulated on the browser.
The browser returns the response to its driver
The browser's driver will transfer information to the WebDriver
Then, WebDriver shows the information to a user who is running the test

Examples of browser's driver are:

chromedriver of Chrome
geckodriver of Firefox

Representation of Selenium WebDriver's Communication

The instructions (commands) that I pass via WebDriver's object are translated to stateless information. That is, there is no state maintained between the client and the browser's driver.

Representation of Selenium's WebDriver SPI & Browser Interaction

When the code reaches the Stateless Programming Interface (SPI), each call is broken down into what the element is, by its unique identification, and the command to run on it. For example, let us look at the below statements to understand what the code looks like at the SPI:

Code written using WebDriver API:

WebElement greetBox = driver.findElement(By.id("greeting_textbox"));
greetBox.sendKeys("Welcome to Testing Garage's Blog");

SPI:

findElement(using="id", value="greeting_textbox")
sendKeys(element="greetBox", value="Welcome to Testing Garage's Blog");

Note: The findElement and sendKeys are the commands provided by Selenium's WebDriver API to find the web element on the web page and enter the text into the web element. The browser's driver receives these commands and data, then emulates the command (a user action) on the browser, and carries back the response to WebDriver.
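Since HTTP is the transport, the two commands above can also be pictured as self-contained requests. The sketch below is an illustration only, not Selenium's implementation: it assumes the W3C WebDriver endpoints for finding an element and sending keys, and it assumes the id locator travels as a CSS selector because the W3C locator strategies do not include "id". The Command class, the quote() helper, and the session and element ids are hypothetical.

```java
// A sketch of how two API calls could look as stateless HTTP requests.
class StatelessCommandSketch {

    /** One self-contained request: HTTP method, path, and JSON body. */
    static final class Command {
        final String method;
        final String path;
        final String body;

        Command(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() { return method + " " + path + " " + body; }
    }

    /** Minimal JSON string quoting, enough for this sketch only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Find by id, expressed as the CSS selector "#id" (CSS escaping is skipped here). */
    static Command findElementById(String sessionId, String id) {
        String body = "{\"using\":\"css selector\",\"value\":" + quote("#" + id) + "}";
        return new Command("POST", "/session/" + sessionId + "/element", body);
    }

    /** Type text into an element that an earlier find command returned. */
    static Command sendKeys(String sessionId, String elementId, String text) {
        String body = "{\"text\":" + quote(text) + "}";
        return new Command("POST",
                "/session/" + sessionId + "/element/" + elementId + "/value", body);
    }

    public static void main(String[] args) {
        String sessionId = "session-1"; // hypothetical; the browser's driver issues the real one
        String elementId = "element-1"; // hypothetical; comes back in the find response
        System.out.println(findElementById(sessionId, "greeting_textbox"));
        System.out.println(sendKeys(sessionId, elementId, "Welcome to Testing Garage's Blog"));
    }
}
```

Each request names its session and element, so the browser's driver never has to remember what the client asked for previously.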

21 Days 21 Tips -- #day13

Here are my pointers to fellow test engineers who are confused about WebDriver

WebDriverManager and WebdriverIO are not WebDriver

But all of these are around automation of the web and mobile

WebDriver interface helps in

Control of the browser
Identification and selection of web elements on the web page
Assistance in debugging

Browser Level API

driver.manage().window().maximize();
driver.get("https://testingGarage.blogspot.com");
driver.navigate().back();
driver.navigate().forward();
driver.getWindowHandle();
driver.getWindowHandles();

A Few Page Level APIs

driver.findElement(By by)
driver.findElements(By by)
driver.getCurrentUrl();
driver.getTitle();
driver.getPageSource();

If you notice, we use these APIs to automate the browser

The tests we write use these APIs of Selenium WebDriver along with the assertion

Why are we using "driver" in the above commands?

This is another question and confusion among fellow test engineers starting to practice automation
I will share this in the next tip :)

This understanding of WebDriver, and the why and how it is instantiated (in the next post) will help you to be comfortable in starting to read the test code written using Selenium.