Tuesday, March 11, 2025

AMYQ: Approaching the Solution to Test Data Challenges of Shrini Kulkarni

 

In this post, I briefly reason through the questions shared by Shrini Kulkarni in this AMYQ.  Here are the challenges/questions shared by Shrini,

  • #1 challenge - setting up data in upstream systems to suit test cases that need to be run. There is AUT and there are upstream systems. In a corp setup -- individual teams are set up for each application. Hence getting another team to set some data in other system often encounters lots of manual effort and red-tapism.
  • #2 Reserving test data created in AUT or upstream systems for specific team's use so that other teams do not change it.


My Understanding of These Challenges

  1. AUT is in place and it has upstream systems.
  2. There are multiple teams for each AUT in an org.
    • Say teams A, B, and C for the sake of learning here.
  3. Team A has difficulty creating Test Data in the space of Team B, and vice versa.
    • The difficulties for Team A can be,
      • Not permitted to create data
      • No awareness and understanding of Team B's system
      • Cannot make progress in testing unless there is data created for Team A
      • The Test Data created by Team B is not shared with Team A for multiple reasons, such as data pollution and corruption, which disturb Team B's test cycles
      • And, more ...
  4. How to reserve Test Data created in an upstream system for the use of a specific team?
    • How to make sure another team does not use it, edit it, or delete it?
This is my understanding of Shrini's challenges.



Conway's Law and My Experiences

Conway's Law says,

The architecture of a system will be determined by the communication and organization structures of the company.

The inverse of Conway's Law says,
The organizational structure of a company is determined by the architecture of its product.

Both laws stated above hold good for microservices teams and the teams that consume their services.

I have experienced,
  • Each microservices team creates its own set of test data.
  • This test data is not shared with any other team.
  • The other teams are not allowed to create data in their space.
  • No team is aware of what others are doing in tests and what they are using to test.
  • The teams work in silos.
This leads to friction, aggression and unhealthy communication between teams.  Now, where is the possibility of having Test Data in an upstream system that every other team can consume to run tests?

Do you see the statements of Conway's Law and the inverse of Conway's Law here, in how these teams are set up, orchestrated, and communicate while building one product?



What's the Problem?

Here is my understanding of the Problem Statement from the challenges shared by Shrini,

How to set up Test Data in upstream systems to suit the test cases that need to be run?  And, how to ensure Test Data created by one team is not modified by other teams?

If you observe, the outcome described in the previous section is not a technical problem.

It is a collaboration and communication problem that comes from a presumption: if other teams use one's test data, it affects that team's work and delivery.

Yes, it can have a bad impact if the creation, editing and management of the test data is not done attentively by teams when everyone shares and consumes it.  So the respective teams' engineering managers and directors resist letting others create data in their team's space, as it impacts their pipeline and delivery.


How I Solved It?

In the contexts where I work, I have different upstream systems with which I'm supposed to interact and get the data.  Then, I use this data to test the service offered by the product.

But, creating test data in another team's space is not allowed!

I have approached and solved this problem in multiple ways in different organizations.  The approach described below worked with most clients, and so I share it here.  It is like a Design Pattern; I can use its structure in multiple teams to solve similar problems.

I came up with an approach to create a suite having the endpoints of all these upstream systems.  And, I need not say how difficult and painful it was to get the endpoints and their details from the teams.  It was a marathon circus for me to get them!

Here is how I approached the solution,

  1. I understood my teams were comfortable with Postman.  
  2. For solving this problem, I saw that Postman was simpler and quicker than writing an automation project with libraries like RestAssured and request.
    • Team can run the Postman Collection from their system in quick time.
  3. I created a Test Data Inventory
    • The inventory has the Test Data which the test teams engineered for their testing and automation
    • This data is not just for functional testing; it has Test Data for security, performance and accessibility too.
  4. I built the Postman Collection having the endpoints of these upstream systems.
    • The collection is version-controlled like any other code in the organization.
    • Meaningful commits are made and pushed as and when the need comes up.
  5. I crafted an environment file and wrote scripts that set a variable holding a stamp
    • This stamp tells that the test data was created by an automation run, for this version of the regression cycle, and by this team.
    • The stamp is a maximum of 10 characters, and it is dynamic, based on that run, to avoid duplicate data.
      • I said it is dynamic, not random!
      • These dynamic characters have a meaning and an intent.
      • Note, when I say dynamic, it is created for that iteration of Test Data.
  6. The team then uses this created test data, without touching the test data of other teams.
This let us move further with our tests and automation in the pipeline.   Note that, I have multiple upstream systems to which the AUT speaks.  Not disturbing them is a challenge!
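
The stamp idea from step 5 can be sketched as below.  This is a minimal sketch under assumptions and not the actual script from my Postman environment: the layout (one marker character, a two-character team code, three characters for the regression cycle and four for the iteration) and the names make_stamp and parse_stamp are hypothetical, chosen only to show a stamp that is dynamic, carries meaning, and fits in 10 characters.  It is written in Python for illustration; inside Postman the same logic would sit in a JavaScript pre-request script.

```python
import string

# Base-36 digits: 0-9 followed by A-Z.
ALPHABET = string.digits + string.ascii_uppercase


def to_base36(value: int, width: int) -> str:
    """Encode a non-negative integer as zero-padded base 36 of a fixed width."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = ""
    while value:
        value, remainder = divmod(value, 36)
        digits = ALPHABET[remainder] + digits
    digits = digits.rjust(width, "0")
    if len(digits) > width:
        raise ValueError("value does not fit in the requested width")
    return digits


def make_stamp(team: str, cycle: int, iteration: int) -> str:
    """Build a 10-character stamp (hypothetical layout, see the text above).

    'A' marks data created by an automation run, followed by the team code,
    the regression cycle and the iteration of this Test Data.
    """
    team_code = team.upper()
    if len(team_code) != 2 or not team_code.isalnum():
        raise ValueError("team code must be exactly two letters or digits")
    return "A" + team_code + to_base36(cycle, 3) + to_base36(iteration, 4)


def parse_stamp(stamp: str) -> dict:
    """Read the meaning back out of a stamp built by make_stamp."""
    if len(stamp) != 10 or stamp[0] != "A":
        raise ValueError("not an automation stamp")
    return {
        "team": stamp[1:3],
        "cycle": int(stamp[3:6], 36),
        "iteration": int(stamp[6:], 36),
    }


stamp = make_stamp("qa", cycle=42, iteration=7)
print(stamp, parse_stamp(stamp))
```

Because every part of the stamp is derived from the run and not from a random generator, anyone who finds a stamped record in an upstream system can tell which team, cycle and iteration created it.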

I maintain the inventory of test data having versions of test data being crafted sprint-on-sprint.  We pull the test data of different versions from the inventory as and when it is needed for the intent of test and automation.
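
A versioned inventory of this kind can be sketched as below.  It is a minimal in-memory sketch under assumptions, not the inventory I maintain: the class name DataInventory, keying versions by sprint number, and the sample records are all hypothetical.  A real inventory would live in version control or a store the teams share.

```python
from dataclasses import dataclass, field


@dataclass
class DataInventory:
    """In-memory inventory: dataset name -> sprint number -> records."""

    _versions: dict = field(default_factory=dict)

    def add(self, name: str, sprint: int, records: list) -> None:
        """Store a copy of the records as the version crafted in this sprint."""
        self._versions.setdefault(name, {})[sprint] = [dict(r) for r in records]

    def get(self, name: str, sprint=None) -> list:
        """Return a copy of one version; the latest sprint when none is given."""
        versions = self._versions[name]
        key = max(versions) if sprint is None else sprint
        return [dict(r) for r in versions[key]]


# Made-up sample data, only to show the calls.
inventory = DataInventory()
inventory.add("customers", sprint=11, records=[{"name": "Asha", "purpose": "functional"}])
inventory.add("customers", sprint=12, records=[{"name": "Ravi", "purpose": "security"}])

latest = inventory.get("customers")      # version crafted in sprint 12
older = inventory.get("customers", 11)   # version crafted in sprint 11
```

Returning copies keeps a test from changing the stored version by accident, which is the same concern as one team editing another team's test data.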

All that said, the teams continue to work in silos without much collaboration.  Who has to solve this?  Though I can solve this, my role and pay grade do not allow me to do so effectively.  I have solved it within my teams and not at the organization level.  It is an engineering and culture problem which has to be addressed at the organization level.



To summarize, 
  1. Test data creation, and managing its inventory with versions of test data, is part of the engineering culture of an organization.
    • It has to be adopted consciously so that it helps the teams move swiftly in building resilient and usable systems.
  2. Test Data and its engineering is a project within an engineering project.
    • It will be no wonder if LLMs offer a business solution solely for Test Data in the coming days!
    • Test the data offered by such LLMs before consuming it.
  3. Practice in the space of test data and testing the test data.
  4. There is Data Coverage in engineering, just as there is Test Coverage.
    • What's your Data Coverage?



Monday, March 10, 2025

AMYQ: Ask Me Your Questions on Test Data -- Session 1


I made an opportunity for myself in this format -- Ask Me Your Questions (AMYQ).  I want to keep it a live interaction as much as possible, and I chose YouTube as an aid.  I'm experimenting with it; I will improve and upskill as I move ahead with it.

It is not an AMA.  It is the AMYQ format, with a topic that I come up with by listening to the community.  In this format, I collaborate and interact with the community, listening to the challenges and problems in their practice and work, and working on a solution approach for their context.

I asked the software engineering community for the questions around Test Data here.  I have received a few on LinkedIn and a couple of them in person.  We will be going through them.  

I will share my perspectives and approaches to deal with Test Data on Ask Me Your Questions on Test Data, while I listen to you.  Please join here.


Details of this AMYQ Session -- 10th March 2025

  • Title: Ask Me Your Questions on Test Data
  • Date and Time: 10th March 2025, 8:30 PM IST
  • Duration: 30 minutes + 10 minutes
  • Interaction: Live
  • YouTube: https://www.youtube.com/live/cKS71LgwPM0


Questions Received


From Shrini Kulkarni
  1. #1 challenge - setting up data in upstream systems to suit test cases that need to be run. There is AUT and there are upstream systems. In a corp setup -- individual teams are set up for each application. Hence getting another team to set some data in other system often encounters lots of manual effort and red-tapism.
  2. #2 Reserving test data created in AUT or upstream systems for specific team's use so that other teams do not change it.

From Avanti Gada

  1. Creating/finding a data set to test features built on LLMs. How to test AI tools which were built using LLMs.
    • The applications are internal to the organisation. To take a generic example, say there is a college finder: when a student searches with certain inputs, it looks at the Internet and gets all possible options in the results. How to ensure the data fetched by the LLM is right?

From Sukanya Santhanakrishnan
  1. What one should keep in mind when the test data is confidential like passwords/person details? How should the system handle this in terms of security?
  2. In the AI era, do we rely on LLM generated test data and how much can we believe those? What are the additional steps we need to take after getting LLM's data? 
  3. What are the considerations when the applications handle large datasets under high memory usage? 
  4. What is your go-to checklist when you start preparing test data?



Thursday, March 6, 2025

When Does the Community Choose Not to Respond to Questions?


I read the testing, automation and test engineering related questions posted on social media and the web.  I try to understand the question and the problem expressed, and collaborate to assist.  I keep 46 minutes a day for this activity.

Sometimes, it is hard to assist after reading the question.  I feel like not giving up; but then, I do not see active communication from the other end.


What Makes It Hard To Assist?


The questions asked,
  • They lack context.
  • They do not tell who the person is and what she or he is trying to accomplish.
  • They do not have information on
    • the environment.
    • the challenge he or she is facing.
    • what she or he tried so far.
  • They do not have minimal data such as
    • a screenshot, the exception stack trace, the complete error message and the details following it, the data being used, a code outline, and other details that add to the context.

The above is the minimal data needed, shared after removing any sensitive information.  Know and understand what is sensitive information for your context when you are sharing.



Then, what will the question have?
  • Most will have a phrase, or two or three sentences, on what they are doing and what is seen on the screen, followed by a question asking what to do about it.


Why is it hard to assist with vague details?

People who want to assist won't have any context about you -- who you are, what you are doing, or why you are doing it that way.  They won't understand your purpose or what you are actually experiencing based on the limited and vague information in the questions.

Without clear details, it is hard to connect the dots or pinpoint the problem.  Instead, people who want to assist will be guessing, making assumptions, and probably considering multiple possibilities.  Is this a way to use the time in community?

When questions follow a similar pattern but vary in challenges and context, it becomes demotivating to decode them.  Over time, people will lose interest in reading unclear questions, leading to fewer responses and missed opportunities for meaningful discussions.


What's happening at the other end?

People who want to collaborate and assist will take the time to read the question.  But, when a question is too vague to understand, they often give up.  And, they feel bad for doing so.  The question remains unanswered.

When a solution is provided, there's often no update on whether it worked or not.  Those who contributed keep checking back, only to find no response.  Is that fair to those who invested their time to help?  Would you feel good if you were in their position?

Remember, in the community one can't buy someone's time to listen or solve a problem.  It has to be earned!  And, it has to be earned every single time.




How to Frame and Post the Question?


There is no one excellent way of doing it.  Then, what should I do to post my questions?

  1. Know the community.
  2. Read through the questions shared earlier in that community.
  3. Look at the questions that have found resolutions and were acknowledged and accepted.
    1. Observe how the question or problem experienced is described.
    2. Look at the details shared and how they are shared.
    3. Look at the context details shared and how they are shared.
    4. Observe how the interactions and conversation are taken forward from the two sides.
      1. Closely notice the words, how the energy is kept high at both ends, and how each side pushes for it.
      2. Importantly, look at the time taken to respond from both ends.

I do not want to share an example or reference saying this is the way to do it.

Figure it out for your problem and its context.  Share the minimal information described above and ask for help.  Consistently improve how you ask, share and describe the problem.