Showing posts with label Selenium. Show all posts
Showing posts with label Selenium. Show all posts

Thursday, September 22, 2022

WebDriver: Tracing the Interface WebDriver - Part 2

 

In the previous post of this WebDriver series, I shared a gist about what WebDriver does and how.  In this blog post as Part 2 of this series, I'm sharing a bit more details on WebDriver and RemoteWebDriver.  

From there, we will see how AppiumDriver is related to WebDriver -- which extends the interface SearchContext.

This blog post is written as part of 21Days21Tips from The Test Chat.  The tip shared in this post is to know more about WebDriver internals and how it associates with RemoteWebDriver and AppiumDriver.

This should help in understanding the Selenium APIs better and from where it comes.  This helps in having a better mental model of the Selenium WebDriver and how we want to structure the instructions in the tests and utilities we write


SearchContext and WebDriver



Picture: Representation of SearchContext and hierarchy of WebDriver


  • The SearchContext is the parent interface in the WebDriver hierarchy
    • The subinterfaces of SearchContext are
      • WebDriver
      • WebElement
  • This SearchContext defines two methods
    • findElement(By by)
      • Modifier and Type is: WebElement
      • It finds the first WebElement using the given method
    • findElements(By by)
      • Modifier and Type is: java.util.List<WebElement>
      • It finds all elements within the current context using the given mechanism
        • NoteI'm referring to Java APIs of Selenium in this blog post
        • More details of this can be found here.

Note: Selenium's Ruby client describes the Interface SearchContext as this.  


The WebDriver provides the below methods:
  • close()
  • findElement(By by)
  • findElements(By by)
  • get(java.lang.String url)
  • getCurrentUrl()
  • getPageSource()
  • getTitle()
  • getWindowHandle()
  • getWindowHandles()
  • manage()
  • navigate()
  • quit()
  • switchTo()

More details of these methods can be found here.


RemoteWebDriver and AppiumDriver


Further, we see the class RemoteWebDriver implements the interface WebDriver.  Today, the WebDriver and RemoteWebDriver communicate using standard W3C specifications.

That way, all the modern browser which adheres to W3C specification should not have (much) trouble when using WebDriver and RemoteWebDriver to mimic the user action on them.  We see the ChromiumDriver(), ChromeDriver(), FirefoxDriver(), Edgedriver, SafariDriver(), and OperaDriver() extending the RemoteWebDriver.

This hints us to know and learn:
  1. Why do we initiate the WebDriver for first
  2. And, then we instantiate the browser's driver
  3. Later how we use WebDriver's instantiation to drive action (mimic the user action) on the browser using the respective browser's driver
When we want to automate using Selenium Grid, we make use of RemoteWebDriver to drive the action between the client and server.

The class AppiumDriver extends the WebElement and RemoteWebDriver from the project Selenium.  And further, it has its own methods to interact with the mobile elements.  More details about the Java Client of AppiumDriver can be found here.

The subclasses of AppiumDrivers are:

  • AndroidDriver
  • iOSDriver
  • WindowsDriver


21 Days 21 Tips -- #day17

Here are my pointers to fellow test engineers

  1. Interface SearchContext is top in the hierarchy of the WebDriver interface
  2. Interface SearchContext defines
    • Should I want to search for the element in the whole page
      • using WebDriver object
    • Or, should I search within a containing element
      • using WebElement object
        • We can notice methods returning the type WebElement
  3. RemoteWebDriver implements the interface WebDriver
  4. The modern browsers drivers extends the class RemoteWebDriver
  5. AppiumDriver extends the class RemoteWebDriver and interface WebElement
For more understanding of the SearchContext and WebDriver, refer to below git repository of SeleniumHQ:

The below understanding should give a mental model of how the call happens in Selenium's library:
  • WebDriver and browser's driver instantiation
  • The order in which it is instantiated and used in programming to automate actions on the  browser

If noticed, the automation we do is more of programming and not of Selenium's library.  We extend and implement the Selenium library in our programming to mimic the action on the browsers and mobile apps.


Friday, April 1, 2022

WebDriver: Clarifying the Confusion on Why and What is the WebDriver - Part 1

 

I had a question "What is WebDriver and why should I use it to automate on a browser?"  I tried to understand it and relate its presence in code written using Selenium.  I see this question in the test engineers who are starting the practice of automation on browsers. 

And, most of us get confused with WebDriver, WebDriverManager, and WebdriverIO.  All of these are not the same but all these work around the same space that is automation on the web and mobile.

Between, I learn understanding of WebDriver is fundamental to the practice of automation on web browsers. The same idea is taken to the automation of mobile apps using Appium. 

I'm sharing this learning of me as a part of 21Days21Tips the initiative from The Test Chat community.  The tip here is to assist by providing clarity around the WebDriver and why we use it in automation on a browser.


What is WebDriver?

The WebDriver is part of the Selenium library and we use it every time when we are trying to do any interaction with and upon a browser.  It is also a language binding and helps to write the browser controlling code.  For example, if I pick Selenium's Java WebDriver,

  • it provides the APIs that I consume to control the actions on the web page displayed on a browser
  • likewise, if I pick Selenium's Python WebDriver it provides me the APIs that I consume to automate my actions on a browser
    • I code here using Python
That said, the WebDriver is a set of APIs and to be precise it is an object-oriented API adhering to the W3C standards.  As a result, the WebDriver drives the browsers effectively today as all popular browsers to the W3C standards.  The HTTP is used as the transport protocol.


Understanding the WebDriver

On a higher level, this is what WebDrier does:

  1. The tests we write make use of WebDriver API 
  2. This WebDriver API carries the commands (written in the test) to interact with the browser's driver
  3. On receiving the commands, the browser's driver and the browser will have native communication, where the driver will translate the commands to the browser to emulate the action on a browser.
  4. The browser returns the response to its driver
  5. The browser's driver will transfer information to the WebDriver
  6. Then, WebDriver shows the information to a user who is running the test
Examples of browser's driver are:
  • chromedriver of Chrome
  • geckodriver of Firefox


Representation of Selenium WebDriver's Communication


The instructions (commands) that I pass via WebDriver's object are translated to stateless information.  That is, there is no state maintained between the client and the browser's driver.



Representation of Selenium's WebDriver SPI & Browser Interaction

When the code enters into Stateless Programming Interface (SPI), it is called into a process that breaks down what the element is, by using the unique identification and then calling the command.  For example, let us look into the below statements to understand what the code looks like at SPI:


Code written using WebDriver API:

WebElement greetBox = driver.findElement(By.id("greeting_textbox"));
greetBox.sendKeys("Welcome to Testing Garage's Blog");

 SPI:

findElement(using="id", value="greeting_textbox")
sendKeys(element="greetBox", value="Welcome to Testing Garage's Blog");


Note: The findElement and sendKeys are the commands provided by Selenium's WebDriver API to find the web element on the web page and enter the text into the web element. The browser's driver receives these commands and data, then emulates the command (a user action) on the browser, and carries back the response to WebDriver.


21 Days 21 Tips -- #day13

Here are my pointers to fellow test engineers who are confused about WebDriver
  1. WebDriverManager and WebdriverIO are not WebDriver
    • But all of these are around automation of the web and mobile
  2. WebDriver interface helps in
    1. Control of the browser
    2. Identification and selection of web elements on the web page
    3. Provides assistance to debug
  3. Browser Level API
    1. driver.manage().window().maximize();
    2. driver.get("https://testingGarage.blogspot.com");
    3. driver.navigate().back();
    4. driver.navigate().forward();
    5. driver.getWindowHandle();
    6. driver.getWindowHandles();
  4. Few Page Level API
    1. driver.findElement(By by)
    2. driver.findElements(By by)
    3. driver.getCurrentURL();
    4. driver.getTitle();
    5. driver.getPageSource();
  5. If you notice, we use these APIs to automate the browser
    • The tests we write use these APIs of Selenium WebDriver along with the assertion
  6. Why are we using "driver" in the above commands?
    • This is another question and confusion among fellow test engineers starting to practice automation
    • I will share this in the next tip :)
This understanding of WebDriver, and the why and how it is instantiated (in the next post) will help you to be comfortable in starting to read the test code written using Selenium.