Friday, April 1, 2022

WebDriver: Clarifying the Confusion on Why and What is the WebDriver - Part 1

 

I had a question: "What is WebDriver, and why should I use it to automate a browser?" I tried to understand it and to relate it to where it appears in code written using Selenium. I see the same question among test engineers who are starting to practice browser automation.

Many of us also get confused by WebDriver, WebDriverManager, and WebdriverIO. They are not the same thing, but all of them work in the same space: automation on the web and mobile.

Along the way, I learned that understanding WebDriver is fundamental to automating web browsers. The same idea carries over to automating mobile apps with Appium.

I'm sharing what I learned as part of 21Days21Tips, an initiative from The Test Chat community. This tip aims to bring clarity on what WebDriver is and why we use it to automate a browser.


What is WebDriver?

WebDriver is part of the Selenium library, and we use it every time we interact with a browser. Selenium provides WebDriver through language bindings, which let us write browser-controlling code in the language we choose. For example, if I pick Selenium's Java WebDriver,

  • it provides the APIs that I consume to control the actions on the web page displayed on a browser
  • likewise, if I pick Selenium's Python WebDriver, it provides the APIs that I consume to automate my actions on a browser
    • I code here using Python
That said, WebDriver is a set of APIs; to be precise, it is an object-oriented API that follows the W3C WebDriver standard. Because all popular browsers now implement this standard, WebDriver can drive them reliably. HTTP is used as the transport protocol.
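To make the HTTP transport concrete, here is a minimal sketch, using only the JDK's java.net.http package, of the request a client sends to start a browser session under the W3C WebDriver protocol (the "New Session" command). It assumes chromedriver is listening on its default port 9515, and it only builds the request without sending it. The class and method names are hypothetical, not part of Selenium.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Minimal sketch: builds (but does not send) the W3C "New Session" request.
// Assumes a chromedriver listening on its default port 9515; names are hypothetical.
final class NewSessionRequest {

    static final String DRIVER_URL = "http://localhost:9515";

    // JSON body asking the driver for a session with the given browser.
    static String capabilitiesJson(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\""
                + browserName + "\"}}}";
    }

    // W3C New Session command: POST /session with the capabilities as the body.
    static HttpRequest build(String browserName) {
        return HttpRequest.newBuilder(URI.create(DRIVER_URL + "/session"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(capabilitiesJson(browserName)))
                .build();
    }
}
```

Sent with an HttpClient to a running chromedriver, this request would return a JSON object whose value holds the new session ID; every later command carries that ID in its URL.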


Understanding the WebDriver

At a high level, this is what WebDriver does:

  1. The tests we write use the WebDriver API
  2. The WebDriver API carries the commands (written in the test) to the browser's driver
  3. On receiving the commands, the browser's driver communicates natively with the browser, translating the commands so that the browser emulates the action
  4. The browser returns the response to its driver
  5. The browser's driver passes that information back to WebDriver
  6. Then, WebDriver returns the information to the test, and so to the user who is running it
Examples of browser drivers are:
  • chromedriver of Chrome
  • geckodriver of Firefox
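The six steps above can be modelled end to end with nothing but the JDK. In the sketch below, a hypothetical fake driver built on com.sun.net.httpserver.HttpServer stands in for chromedriver (there is no real browser behind it), and the test side sends the W3C "Get Title" command over HTTP and receives the JSON reply. It is a toy for seeing the request/response shape, not how Selenium is implemented internally.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Toy model of the round trip: the "driver" is a FAKE in-process HTTP server,
// not chromedriver, and there is no browser behind it. Names are hypothetical.
final class FakeDriverRoundTrip {

    // Starts the fake driver on a free local port. It answers every /session/...
    // request with a fixed W3C-style reply of the form {"value": ...}.
    static HttpServer startFakeDriver() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/session/", exchange -> {
            // A real driver would ask the browser for its title at this point.
            byte[] reply = "{\"value\":\"Testing Garage\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    // Test side: sends the W3C "Get Title" command (GET /session/{id}/title).
    static String getTitle(int port, String sessionId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://127.0.0.1:" + port + "/session/" + sessionId + "/title"))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Starts the fake driver, sends one command, stops the driver, returns the reply body.
    static String roundTrip() {
        HttpServer server = null;
        try {
            server = startFakeDriver();
            return getTitle(server.getAddress().getPort(), "fake-session-1");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

In a real run, chromedriver or geckodriver sits where the fake server is, and the browser produces the value that comes back.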


Representation of Selenium WebDriver's Communication


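The instructions (commands) that I pass via WebDriver's object are translated into stateless requests. That is, no connection state is kept between the client and the browser's driver: each command is a self-contained request that carries the identifiers it needs, such as the session ID.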
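One way to see what "stateless" means in practice: in the W3C WebDriver protocol, each command path carries the session ID, and element commands also carry the element ID, so the driver can work out the target from the request alone. The toy parser below (hypothetical names, not part of any driver) extracts both from a path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy parser (hypothetical names) for W3C-style command paths. It shows that each
// request names its own target: the session ID and, for element commands, the element ID.
// It ignores special cases such as /session/{id}/element/active.
final class CommandPath {

    private static final Pattern PATH =
            Pattern.compile("^/session/([^/]+)(?:/element/([^/]+))?(/.*)?$");

    final String sessionId;
    final String elementId; // null when the command does not target an element
    final String command;   // remaining path segment, e.g. "/value" or "/title"

    private CommandPath(String sessionId, String elementId, String command) {
        this.sessionId = sessionId;
        this.elementId = elementId;
        this.command = command;
    }

    static CommandPath parse(String path) {
        Matcher m = PATH.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a session command: " + path);
        }
        String rest = m.group(3) == null ? "" : m.group(3);
        return new CommandPath(m.group(1), m.group(2), rest);
    }
}
```

For example, parsing /session/abc123/element/el-42/value yields session abc123, element el-42, and command /value, with nothing remembered from earlier requests.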



Representation of Selenium's WebDriver SPI & Browser Interaction

When the code reaches the Stateless Programming Interface (SPI), each call is broken down into a command that identifies the element by its unique identifier and then performs the action on it. For example, let us look at the statements below to see what the code looks like at the SPI level:


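Code written using the WebDriver API:

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
WebElement greetBox = driver.findElement(By.id("greeting_textbox")); // 'driver' is an existing WebDriver instance
greetBox.sendKeys("Welcome to Testing Garage's Blog");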

SPI:

findElement(using="id", value="greeting_textbox")
sendKeys(element=<element ID returned by findElement>, value="Welcome to Testing Garage's Blog")


Note: findElement and sendKeys are commands provided by Selenium's WebDriver API to find a web element on the page and to type text into it. The browser's driver receives these commands and their data, emulates the command (a user action) on the browser, and carries the response back to WebDriver.
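The SPI statements above map onto two HTTP commands in the W3C WebDriver protocol: Find Element (POST /session/{session id}/element) and Element Send Keys (POST /session/{session id}/element/{element id}/value). The sketch below builds those paths and JSON bodies by hand and pulls the element ID out of a Find Element reply. Two assumptions are worth flagging: the session and element IDs are placeholders, and sending By.id as a CSS selector reflects how Selenium's Java client typically does it, since the W3C standard defines no "id" locator strategy. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of roughly what the findElement/sendKeys pair becomes on the wire under
// the W3C WebDriver protocol. IDs are placeholders; JSON is hand-built to avoid a
// dependency (real clients use a JSON library and full escaping).
final class SpiTranslation {

    // Key the W3C standard uses to mark a JSON object as a web element reference.
    static final String ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";

    private static final Pattern ELEMENT_REF = Pattern.compile(
            Pattern.quote("\"" + ELEMENT_KEY + "\"") + "\\s*:\\s*\"([^\"]+)\"");

    // Find Element: POST /session/{sessionId}/element
    static String findElementPath(String sessionId) {
        return "/session/" + sessionId + "/element";
    }

    // W3C defines no "id" strategy; By.id("x") is assumed to travel as CSS "#x".
    // No CSS escaping here, which is fine for simple ids like greeting_textbox.
    static String findElementBody(String id) {
        return "{\"using\":\"css selector\",\"value\":\"#" + id + "\"}";
    }

    // Reply shape: {"value":{"element-6066-...":"<element id>"}}; returns the element id.
    static String elementIdFrom(String findElementReply) {
        Matcher m = ELEMENT_REF.matcher(findElementReply);
        if (!m.find()) {
            throw new IllegalArgumentException("no element reference in reply");
        }
        return m.group(1);
    }

    // Element Send Keys: POST /session/{sessionId}/element/{elementId}/value
    static String sendKeysPath(String sessionId, String elementId) {
        return "/session/" + sessionId + "/element/" + elementId + "/value";
    }

    // W3C expects the text under "text"; only quotes and backslashes are escaped here.
    static String sendKeysBody(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"text\":\"" + escaped + "\"}";
    }
}
```

Chaining the calls mirrors the SPI lines: the element ID taken from the Find Element reply is what identifies greetBox in the Send Keys path.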


21 Days 21 Tips -- #day13

Here are my pointers for fellow test engineers who are confused about WebDriver:
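  1. WebDriverManager and WebdriverIO are not WebDriver
    • WebDriverManager is a library that downloads and sets up browser drivers such as chromedriver; WebdriverIO is a JavaScript (Node.js) automation framework that can drive browsers over the WebDriver protocol
    • But all of them are about automation of the web and mobile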
  2. WebDriver interface helps in
    1. Control of the browser
    2. Identification and selection of web elements on the web page
    3. Assistance with debugging
  3. Browser-level APIs
    1. driver.manage().window().maximize();
    2. driver.get("https://testingGarage.blogspot.com");
    3. driver.navigate().back();
    4. driver.navigate().forward();
    5. driver.getWindowHandle();
    6. driver.getWindowHandles();
  4. A few page-level APIs
    1. driver.findElement(By by)
    2. driver.findElements(By by)
    3. driver.getCurrentUrl();
    4. driver.getTitle();
    5. driver.getPageSource();
  5. As you may notice, we use these APIs to automate the browser
    • The tests we write use these Selenium WebDriver APIs along with assertions
  6. Why are we using "driver" in the above commands?
    • This is another common question, and source of confusion, among test engineers starting to practice automation
    • I will share this in the next tip :)
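For reference, each browser-level and page-level call listed above is sent to the browser's driver as one W3C WebDriver command. The map below pairs those Selenium Java calls with their W3C endpoints, where {sid} is a placeholder for the session ID. The class and method names are only for illustration and are not part of Selenium.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative lookup table (hypothetical names): Selenium Java call -> W3C endpoint.
// {sid} stands for the session ID returned when the session was created.
final class ApiToEndpoint {

    static final Map<String, String> ENDPOINTS = new LinkedHashMap<>();

    static {
        // Browser-level APIs
        ENDPOINTS.put("manage().window().maximize()", "POST /session/{sid}/window/maximize");
        ENDPOINTS.put("get(url)", "POST /session/{sid}/url");
        ENDPOINTS.put("navigate().back()", "POST /session/{sid}/back");
        ENDPOINTS.put("navigate().forward()", "POST /session/{sid}/forward");
        ENDPOINTS.put("getWindowHandle()", "GET /session/{sid}/window");
        ENDPOINTS.put("getWindowHandles()", "GET /session/{sid}/window/handles");
        // Page-level APIs
        ENDPOINTS.put("findElement(by)", "POST /session/{sid}/element");
        ENDPOINTS.put("findElements(by)", "POST /session/{sid}/elements");
        ENDPOINTS.put("getCurrentUrl()", "GET /session/{sid}/url");
        ENDPOINTS.put("getTitle()", "GET /session/{sid}/title");
        ENDPOINTS.put("getPageSource()", "GET /session/{sid}/source");
    }

    // Returns the concrete endpoint for a call, filling in the session ID.
    static String endpointFor(String call, String sessionId) {
        String template = ENDPOINTS.get(call);
        if (template == null) {
            throw new IllegalArgumentException("unmapped call: " + call);
        }
        return template.replace("{sid}", sessionId);
    }
}
```

Note that get(url) and getCurrentUrl() share the same path and differ only in the HTTP method (POST to navigate, GET to read).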
Understanding WebDriver, along with why and how it is instantiated (covered in the next post), will help you feel comfortable reading test code written using Selenium.