
Tuesday, December 16, 2025

Payment Transaction Declined With Code 201, Why?


I read a LinkedIn post yesterday.  The post asks for an appropriate contextual message that tells the user what happened and how to continue from what is shown on the screen.  I see that as a fair and straightforward expectation.  I expect the same.  

Thanks Liudas Jankauskas for writing this LinkedIn post and sharing your experience.

What surprised me is that the discussion was not taken forward in the post.  As a result, we did not seed the mindset and attitude of Test Engineering and Prevention in the discussion.  Testing does not prevent; but the outcome of a test will persuade the prevention efforts and culture.

Okay!  If this post is too long for you to read, here it is in short -- this is not an API problem.  The API has worked seamlessly and it has done what it is supposed to do.  The code 201 returned is right and it is supposed to be there.  Here, the 201 is not an HTTP Status Code.  It is a 3DS error code.

Then, whose problem is it?  To know that, read the entire blog post.  You will thank me and yourself!


I Say This, Before I Start

  1. I'm writing this post as an interpretation and analysis of the JSON shared in the LinkedIn post.
  2. There is no intention of pointing at anyone.
    1. The intention is to share how to analyze, and which perspectives to use, when I [a test engineer] experience such an incident.
    2. This post can become a reference for someone who is serious about Test Engineering and Prevention, and Testing.
  3. The intention of this post is to let you know,
    1. How to analyze such incidents in the payment context?
    2. Is this an API problem?  What behaviors should be classified as an API problem?
    3. Which component and layer is supposed to handle it?  How and why?
    4. Which other components and layers are supposed to assist in handling it?
    5. How to see and interpret HTTP and its status codes in narrowed cases?
    6. It is not always an API problem!
  4. I'm open to correcting myself, unlearning, learning and updating this blog post, when you share why what I have written here does not make sense.
    1. I will be thankful and humble to you for helping me to correct myself. 🙏
    2. Be comfortable; do connect and help me if you see I need it.


The Payment Failed; Could Not Buy Ferry Ticket

The user wanted to buy ferry tickets by making an online payment.  The payment did not go through, and the user could not buy the ferry ticket.

Instead, the user sees the JSON on the UI. 

The JSON has an error description which reads as -- "TdsServerTransID is not received in Cres."

To understand what TdsServerTransID and Cres are, it is required to understand the payment gateway flow.  Continue reading the next sections to know them.


How It Is Layered and Works?


Before I get into the analysis and say where the problem is, this context requires an understanding of the payment gateway flow.  This understanding helps to know how it could have been handled and prevented.

Refer to the pic below.  It gives the sequence of interactions in the payment transaction.  Note that the pic is not complete.  It is kept to what is needed for the context of this blog post.


Transaction between the 3DS Server and the Bank for the Challenge through a POST request.  It is shown as CReq and CRes.


The Sequence of Interactions
  1. I initiate the payment on the merchant's website or app to buy the ferry ticket.
  2. The merchant creates a transaction id for tracking.  Then, it hands over the rest to the payment gateway.
  3. The payment gateway uses the 3DS way to carry out this transaction.
  4. The 3DS server initiates the Authentication Request (AReq), which passes through the Directory Server.
    • The Directory Server reads the data in the request and forwards it to the right bank, that is, the bank which issued the card [or account] used in the payment.
  5. The bank receives the AReq.  
    • The bank decides whether it should give a challenge to the user to make the payment or just agree with the data passed in the AReq.
  6. Now, in this case the bank has decided to give the challenge to the user.  
    • The bank lets the 3DS server know through the response ARes.
  7. The challenge usually is to enter the OTP received over an SMS and authenticate.  I presume the same in this case.
  8. The user enters the OTP and POSTs the request.  Let us call this request CReq.  The CReq is sent from the user's browser or app to the bank.  The user interface on the browser or app at this point is handled by the payment gateway and not by the merchant's web or app where the ticket is being bought.
    • This CReq will have,
      1. tdsServerTransactionId
      2. messageVersion
      3. messageType
      4. challengeWindowSize
      5. acsTransactionId
    • Note that the CReq has the tdsServerTransactionId, which is generated by the payment gateway.  The tdsServerTransactionId is required in the CReq and expected by the bank.
  9. On receiving the CReq, the bank processes the request.  The bank responds back to the 3DS server; let us call this response CRes.
  10. The CRes will have,
    1. tdsServerTransactionId
    2. acsTransactionId
    3. challengeCompletionId
    4. transStatus
    5. messageType
    6. messageVersion
  11. If all goes as expected, the authorization will be given for the payment.  
    • If not, the bank responds to the payment gateway (3DS) that there is a problem, along with a message.
      • It is the payment gateway which has to read this problem and message.
      • The payment gateway has to give better contextual information through its interface to the merchant who is consuming the service.
      • This is key!
      • Hope you found the spot here!


The Code 201 


Maybe we test engineers presume the error code 201 returned is an HTTP Status Code.  Ask your team what this code is and what the purpose of returning it in the JSON is.

Back to 201, here.  It is not an HTTP Status Code.  Nor is it a custom error code developed by the payment gateway or by the merchant who is selling the ferry ticket.

This is a payment domain specific code used in payment technology.  Yeah!  It is the code that tells the payment gateway and the merchant what happened with the initiated payment transaction.  

To be more specific to the context, the 201 here is an error code returned by the 3DS APIs.


The 3DS Error Code and its Description.
The credit of this image is to developer.ravelin.com

The above image is shared here so that you know such error codes are available in the payment context.  I'm aware of the 3DS protocol; hence I could relate to and understand the error code 201.  Refer to the References section at the end of this blog post.

The payment gateway has to read this error code.  Then, it has to provide the right contextual message and instruction to the merchant saying what to do in case of a payment failure.

Here is the catch.  Not all merchants develop a mechanism to handle this.  Instead, it is left to the payment gateway service provider and they depend on it.  

Does the merchant bother about handling it at their end? 
  • As a merchant, I can switch to another payment gateway tomorrow.  
  • Why should I invest in building, developing and maintaining this as a part of my business when I'm paying another business to do it for me?  
  • Isn't that a sensible business question or decision?
Now, you tell me who is expected to handle this 201 error?  Who should let me know on the merchant's web user interface what to do?



The Case and My Initial Observations


On submitting the challenge given by the bank, the below details are shown to the user.  


Pic used from the post of Liudas.
It is the CRes returned from Bank to 3DS Server at Payment Gateway.

My Observations
  1. Looks like the ticket was being bought, and the payment made, in Turkey.
  2. The payment gateway involved in completing the payment transaction is Payten.
  3. The user is trying to buy the ferry tickets and is making an online payment.  This error is seen on submitting the challenge given by the bank.
  4. Looks like a card is used for the payment.
  5. In the LinkedIn post, I read that multiple attempts were made to make the payment, and the same message was seen.
  6. The Error Code returned is 201.
    • Which means a mandatory field is missing and it is a violation.
  7. Error Component is S, which probably means 3DS server.
    • The component that raised the error.
  8. There is an error which is not handled.
    • The error reads as Unexpected error.
    • There is an id to track it.  That is, for auditing and tracking by the payment gateway.
    • This id does not have anything to do with the bank or the merchant who is selling the ferry tickets.
  9. The message has a version that reads as 2.1.0
    • This means the 3DS protocol used is of version 2.1.0
    • This is communicated by the 3DS server to the Directory Server and Bank Server, so that they use the same version in the contract.
      • If the same version is not available at the bank server side, the lower version will be used.
  10. Error description reads, the TdsServerTransID is not received in the CRes.
    • The 3DS component is raising this error.
    • TdsServerTransID means ThreeDS Server Transaction Id
    • 3DS means Three-Domain Secure, which is a security protocol used for online card transactions.
  11. This means the response from the bank is that it cannot fulfil the payment request.
    • Because the request from the 3DS layer does not have the TdsServerTransID.
    • Wait!
      • Do I actually know if the CReq had the TdsServerTransID in the payload?
      • This is critical to know.
    • This is important to know.
      • The AReq (Authentication Request) will have the TdsServerTransID in its payload.
      • That means, this was present earlier.  So, the CReq is fired from the 3DS server on receiving the ARes.
      • The bank received TdsServerTransID and said it is giving a challenge  before proceeding to authorization for payment.
        • This response (ARes) from the bank will have TdsServerTransID.
          • Along with this, the below are also present as a correlation ID to track this transaction.
            • dsTransID
              • The Directory Server ID to track this transaction
            • acsTransID
              • The bank server ID to track this transaction
      • From here, the CReq payload requires the TdsServerTransID and acsTransID
        • Where did the TdsServerTransID go missing now?  How?  Why?
        • This needs to be investigated through tests.  I have shared possible causes in the next section.

The TransactionID of the merchant who is selling the ferry ticket is different from the TdsServerTransID.
    • The TdsServerTransID is needed for the bank and the Payment Gateway to complete the transaction.
    • Without this TdsServerTransID, the bank cannot proceed to authorize the payment.

All this is happening between the payment gateway [3DS server] and the Bank.  The user and the merchant website are not in the picture here.
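
To make these observations repeatable, here is a minimal Python sketch that reads such an error JSON into a structure a test engineer can reason about.  The key names (`errorCode`, `errorComponent`, `errorDescription`, `messageVersion`) and the component letters are my assumptions based on my reading of the 3DS protocol; the exact keys in Payten's JSON may differ, and the sample payload is only shaped after the observations in this post.

```python
from dataclasses import dataclass

# Assumption: component letters as I understand them from the 3DS protocol.
# Verify them against the documentation of the payment gateway in use.
COMPONENT_NAMES = {
    "C": "3DS SDK",
    "S": "3DS Server",
    "D": "Directory Server",
    "A": "ACS (bank)",
}


@dataclass(frozen=True)
class ThreeDSError:
    code: str
    component: str
    description: str
    message_version: str


def parse_3ds_error(payload: dict) -> ThreeDSError:
    """Read the fields of a 3DS error message; unknown values stay visible as 'unknown' or ''."""
    return ThreeDSError(
        code=str(payload.get("errorCode", "")),
        component=COMPONENT_NAMES.get(payload.get("errorComponent", ""), "unknown"),
        description=payload.get("errorDescription", ""),
        message_version=payload.get("messageVersion", ""),
    )


# Hypothetical payload, shaped after the observations in this post.
sample = {
    "errorCode": "201",
    "errorComponent": "S",
    "errorDescription": "TdsServerTransID is not received in Cres.",
    "messageVersion": "2.1.0",
}

error = parse_3ds_error(sample)
print(f"{error.code} raised by {error.component}: {error.description}")
```

Reading the component next to the code is what tells me which layer raised the error, before I start guessing which layer caused it.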



Why Did It Happen?


It happened because the CReq from the 3DS Server of the payment gateway did not have the TdsServerTransID.  That is what the CRes from the bank says.  

Note that I presume the bank's systems and services were functioning and serving its customers at that point in time.  This is an analyzed assumption, and I'm aware that I'm making it.

What made the CReq miss the TdsServerTransID?
  1. I read that the user made multiple attempts to initiate and make the payment.
  2. I have multiple perspectives and interpretations, with hypotheses, on what could have led to the missing TdsServerTransID.
    • Talking about each hypothesis in detail is not in the scope of this blog post.
    • But, here are some of the possibilities that could have led to this situation.
      • A glitch at payment gateway at that point in time.
      • Someone is breaching in the middle and tweaking the request and response over the network.
      • Device and hardware
      • Geo location
      • Network and traffic
      • Timeouts
      • Latency
      • Storage running out
      • Configuration gone wrong
      • Caching and missing -- not persisting
      • Intermediate and dependency services clogged
      • A new release deployed and updated
      • Downtime in the bank's services
      • Intermittent availability of the bank's services
      • Mismatching 3DS protocol versions that are not supported and accepted at either end
      • And, more!
In simple terms, it was handled well by the bank.  I assume the amount is not debited from the user's account.  This is equally important, and the LinkedIn post does not share about it.

I see the bank's service has worked well in this context and done what is expected of it.  The rule of thumb is, when there is a discrepancy between the data expected and the data received, do not authorize the payment request; abort it.

Can the service do much more if the TdsServerTransID is missing in the payload of the AReq?

As a test engineer in the payment gateway engineering team, I will test for the below at a minimum and as a must,
  1. Fire an AReq with no TdsServerTransID and observe what happens!
  2. Does the 3DS server still fire an AReq to the bank with no TdsServerTransID?
    1. If yes, then that is a problem!  It should be handled here.
    2. This problem can be prevented!
    3. I will ask my team why we are encouraging such requests.
      • This increases our customer support cost and operations time in responding to our clients who use our payment gateway.
      • And, the merchant can lose business because of our payment gateway.
  3. What should the 3DS server do when the initiated AReq is missing the TdsServerTransID?
    1. What other data in the transaction and session should be retained and kept intact?
    2. Will this data change when a fresh TdsServerTransID is created?
  4. I will explore and figure out what factors caused the 3DS server to lose the user's authentication.
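
The first two test ideas above can be sketched in Python as below.  This is a minimal sketch and not the gateway's code: the key name `threeDSServerTransID`, the sample id value, and the `may_fire` check are my assumptions, where `may_fire` is a hypothetical stand-in for whatever pre-send check the 3DS server has.

```python
import copy

ID_KEY = "threeDSServerTransID"  # assumed key name; the JSON in this case reads TdsServerTransID


def negative_variants(valid_payload: dict) -> dict:
    """Return named copies of the payload in which the transaction id is unusable."""
    variants = {}

    missing = copy.deepcopy(valid_payload)
    missing.pop(ID_KEY, None)
    variants["key_missing"] = missing

    for name, value in (("null", None), ("empty", ""), ("blank", "   ")):
        broken = copy.deepcopy(valid_payload)
        broken[ID_KEY] = value
        variants[name] = broken

    return variants


def may_fire(payload: dict) -> bool:
    """Hypothetical stand-in for the 3DS server's check before it fires a request."""
    value = payload.get(ID_KEY)
    return isinstance(value, str) and value.strip() != ""


# Illustrative payload; the id value is made up for this sketch.
valid = {
    ID_KEY: "8a880dc0-d2d2-4067-bcb1-b08d1690b26e",
    "messageType": "AReq",
    "messageVersion": "2.1.0",
}

for name, payload in negative_variants(valid).items():
    assert not may_fire(payload), f"{name}: this request must not be fired"
assert may_fire(valid)
```

In a real test, `may_fire` is replaced by a call to the service under test, and the assertion is on whether a request reached the bank's endpoint or its mock.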


Who Created The Problem?

  1. From the error shared, I see,
    •  It is the 3DS server that created this problem, presuming the bank and its response are right.  


Is This An API Problem?

  1. No, it is not an API problem.
  2. If we call it an API problem, we have not understood what an API is.


Whose Problem It Is?

  1. It is the problem of the service that fired the CReq.
    • Because it fired the CReq without the TdsServerTransID, which is a mandatory key-value in the payload.  
    • The TdsServerTransID value can be neither null nor empty in the JSON.
It appears to be a business logic problem of the service at the 3DS server side, for firing the CReq with no TdsServerTransID.



How It Can Be Prevented?

  1. By making sure the CReq will always have a distinct TdsServerTransID.
  2. If there is no distinct TdsServerTransID in a fresh CReq being fired, abort the request.
    • Create the distinct TdsServerTransID and then construct the request payload before firing the CReq.
Make sure the payment gateway [3DS server] preserves the TdsServerTransID, dsTransID, and acsTransID of the session.  If any one of these does not match at any of the components during the transaction, aborting the transaction is the best and right action to take.  
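
The "abort when the ids do not match" rule can be sketched in Python as below.  It is a minimal sketch under my assumptions: the key names, the `TransactionAborted` exception, and the id values are hypothetical, and which ids a given message carries must be taken from the 3DS specification the gateway follows.

```python
class TransactionAborted(Exception):
    """Raised when a message does not belong to the transaction held in the session."""


def ensure_same_transaction(session: dict, message: dict, keys: tuple) -> None:
    """Abort unless every listed id is present in the session and identical in the message."""
    for key in keys:
        expected = session.get(key)
        received = message.get(key)
        if not expected or expected != received:
            raise TransactionAborted(
                f"{key}: expected {expected!r}, received {received!r}"
            )


# Hypothetical ids held by the payment gateway for one order.
session = {
    "threeDSServerTransID": "tds-001",
    "dsTransID": "ds-001",
    "acsTransID": "acs-001",
}

# A CRes that lost its 3DS server transaction id, as in the case above.
cres = {"acsTransID": "acs-001", "messageType": "CRes"}

try:
    ensure_same_transaction(session, cres, ("threeDSServerTransID", "acsTransID"))
except TransactionAborted as reason:
    print("aborting:", reason)
```

The same check can run as an assertion in the daily automation, once for each message type in the flow.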

Can you recall from your daily life being told not to refresh, or click the browser's back button, when you have initiated a payment transaction on web or mobile app?  This is the reason!  It is to preserve the data of the transaction in the session between these systems -- merchant web or app, payment gateway [3DS server], Directory Server and Bank Server.

This is an automation candidate.  It has to be part of the daily test runs in the automation.



Who Should Be Fixing It?

  1. In my opinion as of today, based on the analysis and assumptions I have made, it should be fixed by the payment gateway.
  2. The payment gateway has to interpret the 3DS error code returned by the bank.  
    • And, then initiate the appropriate action as the payment transaction has failed. 
  3. A new transaction between the 3DS server and the bank has to be started for that merchant's order.  
    • If it cannot happen, the payment gateway should abort all the currently open sessions tied to that merchant's order.
    • And, let the user know what is happening, and then direct the user.
  4. The payment gateway can read the ARes and CRes.
    • That means the payment gateway can read the HTTP Status Code of the ARes and CRes.
    • If there is an error code in the ARes and CRes,
      • Then, the gateway should assert for the 3DS error code along with the HTTP Status Code.
      • For example, 
        • Say, the HTTP Status Code is 400 and the 3DS Error Code is 201 in the CRes.
          • Assert for these two and respond with the appropriate contextual message and direction.
          • In general, this is how custom developed error codes are handled by the client on receiving them from the services.
Note that it is an Error Code and not a Response Code.  The two are different.  And, the HTTP Status Code is different from these two.
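
Here is a minimal Python sketch of asserting for the HTTP Status Code and the 3DS error code together.  The `errorCode` key, the `Action` names, and the decision rules are my assumptions for illustration; a real gateway will have its own rules and many more codes.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    RESTART_AUTHENTICATION = "restart_authentication"
    SHOW_GENERIC_FAILURE = "show_generic_failure"


def decide_action(http_status: int, body: dict) -> Action:
    """Decide the next step from the HTTP status and the 3DS error code together."""
    error_code = body.get("errorCode")  # assumed key name

    # No 3DS error and a successful HTTP exchange: carry on with the flow.
    if error_code is None and 200 <= http_status < 300:
        return Action.CONTINUE

    # 201 in this context: a mandatory data element is missing, so this
    # authentication cannot be trusted; start a fresh one for the order.
    if str(error_code) == "201":
        return Action.RESTART_AUTHENTICATION

    # Anything else: fail with a contextual message instead of raw JSON.
    return Action.SHOW_GENERIC_FAILURE


print(decide_action(400, {"errorCode": "201"}))  # Action.RESTART_AUTHENTICATION
print(decide_action(200, {}))                    # Action.CONTINUE
```

Note the order of the checks.  A body carrying the 3DS error code is treated as a failure even if the HTTP Status Code is 200, because the two codes speak about different layers.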



Why It Is Not An API Problem?


An API is an interface which exposes the available services to the consumer.

It is the service which collects the data and builds the request payload, and which expects the payload in order to process it. 

This request will pass through an interface which is opened to the consumer.  This interface is called an API -- Application Programming Interface.

Analogy,
  • The car has gear stick to switch the gear.
    • This is an interface to the car's gear system.
    • The driver will use the car's gear system through this gear stick.
  • The gear box in the car is a service.
    • The gear box adjusts and responds by switching to the gear per the driver's input.
    • The different operations [business logic] provided by the gear box services are,
      • Reverse
      • Parking
      • Neutral
      • Switching the gear and assisting other components to speed up or speed down the car.

Can the interface have a problem?  Yes, it can.  In that case, the service will not be available to serve or to be discovered.  
  • Like, if the gear stick has a problem, I cannot use the gears of the car; but the gear box can be fully functional and in working condition.  
  • It is just the interface [gear stick] having the problem, and as a result the consumer [driver] is unable to use the service [gear box].

In this case, the payment gateway could fire the CReq to the bank.  That is, the API exposing this service is functioning.  The problem is in a service.  It is the service that has failed to ensure and mandate the presence of the TdsServerTransID -- a business logic problem.

In simple terms, we Software Engineers [including me] use the term API vaguely and with no sense of what it means.  This is my observation.  Further, Software Testers have tossed this term API around in all possible ways and are learning it in incorrect ways.  I have no doubt when I say this.

Next time, when you say it is an API problem, rethink what you are saying.  Is it the API, or the service(s) accessed through that API?  This distinction is useful when we are describing the behavior of the system and its layers.  

I cannot say it is a gear box problem when the gear stick is not working [usable], and vice versa.  You see that?




To stop here,
  1. Testing skills and testing will help when combined with awareness and skills of the tech stack used in building the software system.
  2. Testing skills, programming and tech skills are not enough!
    • Domain skills and awareness are essential and critical.
    • If one is aware of the domain where the problem is observed, then the 201 will not be read as an HTTP status code.
  3. Building domain skills and maintaining the knowledge base as a GitHub project helps in the longer run.
  4. Interfaces and their gateways should not be overloaded with the additional responsibility of making sure the mandatory key-value is present.
    • If it is done so, one should learn why.  Because it is an anti-pattern and not a suggested software engineering practice.
  5. It is the services that have problems most times.  
    • The interfaces and their gateways will have discovery, orchestration and traffic problems, along with security risks.


References:
  1. https://httpstatuses.io/
  2. https://developer.elavon.com/products/3dsecure2/v1/api-reference
  3. https://developer.ravelin.com/psp/api/endpoints/3d-secure/errors/
  4. https://developer.elavon.com/products/3dsecure2/v1/3ds-error-codes
  5. threeDSServerTransID  (in our case the JSON reads as TdsServerTransID)
    • 3DS Server Transaction ID
    • Universally unique transaction identifier assigned by the 3DS server to identify a single transaction.
    • The 3DS server auto-populates and appends this field value in the authentication request (AReq) it sends to the bank, in addition to the data you send.
    • This value can also be found by a lookup in the response received by this service -- /3ds2/lookup


Wednesday, June 18, 2025

Monetary Value Round Off -- A Legal and Business Problem

 

I read a blog post from Gaurav Khurana suggesting that a number having decimals be rounded off.  This is the post.  What made me curious is that number and what it meant.  It was a discount amount having decimals.  Here, the number of decimals was not restricted to two.

Gaurav said it is a better experience if that discount amount is restricted to two decimals and rounded up.  Initially, I saw this as a fair expectation.

Here is the pic of a bill which Gaurav has shared.  I'm using it here with his permission.


I woke up all of a sudden at 2:18 AM today, and I looked again at the post and that value whose decimals are not restricted.  I said to myself, there is something here which cannot be rounded up or rounded down, and I could interpret it for a context.


Interpretation of the Value and its Meaning

  1. The "Bank Discount" showed the below amount
    • - 450.90999998999996
  2. The amount is in Indian Rupees
  3. This amount tells, it is the discount offered by the Bank on that transaction
  4. On round up, it will be,
    • - 450.91
  5. If I use the round up value, then the "Balance Paid" amount will be,
    • 4,058.18
  6. If I do not use the round up value, then the "Balance Paid" amount is still,
    • 4,058.18
  7. If I do not use the decimal values in "Bank Discount", then Balance Paid is,
    • 4,059.09
      • That is, I will have to pay 0.91 rupee (91 paisa) more.
    • In this case, someone is at loss.  Who?
      • Business?
      • Customer?
        • In the context of this bill, it is, customer who is paying.
        • Customer will pay 91 paisa, which is, almost one rupee more.
          • Now, imagine, the profit which business makes from all its customers for not having that decimal values in the amount!
        • Is this right?
The meaning and value of Balance Paid is not affected, in both cases, that is,
  • When I use Bank Discount value as is with decimals.
  • Or, When rounded up with two decimals.
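
The three cases above can be checked with a short Python sketch using the `decimal` module.  A value like 450.90999998999996 often hints at binary floating-point arithmetic somewhere in the billing code, though I cannot confirm that for this bill.  The pre-discount amount of 4509.09 is my assumption, derived as 4,058.18 + 450.91.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

# Assumption: the amount before the bank discount, derived as 4,058.18 + 450.91.
AMOUNT_BEFORE_DISCOUNT = Decimal("4509.09")


def balance_paid(bank_discount: Decimal) -> Decimal:
    """Subtract the discount and present the balance with two decimals."""
    balance = AMOUNT_BEFORE_DISCOUNT - bank_discount
    return balance.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


as_shown = balance_paid(Decimal("450.90999998999996"))  # discount as printed on the bill
two_decimals = balance_paid(Decimal("450.91"))          # discount restricted to two decimals
no_decimals = balance_paid(Decimal("450"))              # decimals dropped from the discount

print(as_shown, two_decimals, no_decimals)  # 4058.18 4058.18 4059.09
print(no_decimals - two_decimals)           # 0.91
```

The amounts are built from strings and not from floats, so that the sketch itself does not introduce the long decimal tail it is examining.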

My question is,
We should have a check on the decimals shown and used.  But, should we round up the number when it is of monetary significance and value?


Why I have this question?

  • Refer to the below examples.
    • 450.35
      • On round up, it will be 450.5
    • 450.55
      • On round up, it will be 451
If observed, on round up, the amount discounted in the first case is in excess by 0.15 rupee, that is, 15 paisa; and, in the second case, it is in excess by 0.45 rupee, that is, 45 paisa.

If the bank loses these paisa on each transaction of the same customer and of other customers, isn't the bank shelling out money?  Is this good?

Likewise, if I have to pay 15 paisa and 45 paisa more, am I not paying more?  Then, imagine the big amount that a business is making by rounding up a monetary number and making its customers pay in excess.
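
The excess in the two examples can be worked out with a small Python sketch.  I assume here that "round up" in these examples means rounding up to the next 0.50 step, since 450.35 goes to 450.5 and 450.55 goes to 451; the function name and the step are mine, for illustration.

```python
from decimal import Decimal, ROUND_CEILING


def round_up_to_step(amount: Decimal, step: Decimal = Decimal("0.50")) -> Decimal:
    """Round the amount up to the next multiple of the step."""
    steps = (amount / step).to_integral_value(rounding=ROUND_CEILING)
    return steps * step


for raw in (Decimal("450.35"), Decimal("450.55")):
    rounded = round_up_to_step(raw)
    print(raw, "->", rounded, "excess", rounded - raw)
# 450.35 -> 450.50 excess 0.15
# 450.55 -> 451.00 excess 0.45
```

Summing this excess over all the transactions in a day gives the number that belongs in the report to the stakeholders.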


My Thoughts on Monetary Value Round Off

  1. The monetary value should not be rounded up or rounded down.  
    1. But, it can have decimals, limited to two or to what the business sees as manageable.
      • As Gaurav said, restricting decimals to two digits makes life easier for all, is my understanding.
    2. The customer needs to see this monetary value.
  2. The generated bill or transaction should have this detail.
  3. When making a payment, the billing system can show a rounded off value.
    1. And, the business should decide whether the amount should be rounded up or rounded down.
Maybe to avoid such arguments and discussions at the billing counter, today, the bill amount is a whole number and not a number with decimals.

Such round up instances can turn into legal problems for the business.

I have noticed businesses offering a round down on a monetary value, at times.  Also, I have noticed certain consumer mobile apps showing a rounded up number without breaking down the bill details.



Whatever the case, as Test Engineers, you and I have to create a report [ticket] for the product's reference, and keep the stakeholders informed of such behavior in the system by explaining the impact to the business in monetary and legal aspects.





Tuesday, March 11, 2025

AMYQ: Approaching the Solution to Test Data Challenges of Shrini Kulkarni

 

In this post, I'm trying to reason in brief about the questions shared by Shrini Kulkarni in this AMYQ.  Here are the challenges/questions shared by Shrini,

  • #1 challenge - setting up data in upstream systems to suite test cases that need to be run. There is AUT and there are upstream systems. In a corp setup -- individual teams are setup for each application. Hence getting another team to set some data in other system often encounters lots of manual effort and red-tapism
  • #2 Reserving test data created in AUT or upstream systems for specific team's use so that other teams do not change it.


My Understanding of These Challenges

  1. AUT is in place and it has upstream systems.
  2. There are multiple teams for each AUT in an org.
    • Say teams A, B, and C for sake of learning here.
  3. Team A has difficulty when wanting to create the Test Data in the space of Team B, and vice versa.
    • The difficulty for Team A can be,
      • Not permitted to create data
      • No awareness and understanding of Team B's system
      • Cannot make progress in testing unless there is data created for Team A
      • The Test Data created by Team B is not shared with Team A for multiple reasons, such as data pollution and corruption which disturb their test cycles
      • And, more ...
  4. How to reserve Test Data created in an Upstream system for the use of a specific team?
    • How to make sure other teams do not use it, edit it or delete it?
This is my understanding of Shrini's challenges.



Conway's Law and My Experiences

Conway's Law says,

The architecture of a system will be determined by the communication and organization structures of the company.

The Inverse of Conway's Law says,
The organizational structure of a company is determined by the architecture of its product.

Both the laws stated above hold good for Microservices teams and the teams which consume their services.  

I have experienced,
  • Each microservices team creating its own set of test data.
  • This test data is not shared with any other team.
  • The other teams are not allowed to create data in their space.
  • No team is aware of what others are doing in tests and what they are using to test.
  • The teams work in silos.
This leads to friction, aggression and unhealthy communication between teams.  Now, where is the possibility of having Test Data in an upstream system which can be consumed to run tests by every other team?

Do you see the statements of Conway's Law and the Inverse of Conway's Law here, in how these teams are set up, orchestrated and communicate in building one product?



What's the Problem?

Here is my understanding of the Problem Statement from the challenges shared by Shrini,

How to set up Test Data in an upstream system, to suit the test cases that need to be run?  And, how to ensure Test Data created by one team is not modified by other teams?

If observed, the outcome described in the previous section is not a technical problem.  

It is a collaboration and communication problem that comes from a presumption -- that other teams using one's test data affects one's own team's work and delivery.

Yes, it can impact badly if the creation, editing and managing of the test data is not done attentively by the teams when everyone shares and consumes it.  So, the respective teams' engineering managers and directors resist the creation of data in their team's space, as it impacts their pipeline and delivery.


How I Solved It?

In the contexts where I work, I have different upstream systems with which I'm supposed to interact and get the data.  Then, I use this data to test the service offered by the product.

But, creating test data in another team's space is not allowed!

I have approached and solved this problem with multiple solutions in different organizations.  The approach described below worked with most clients, and so I share it here.  It is like a Design Pattern; I can use its structure in multiple teams to solve similar problems.

I came up with an approach to create a suite having the endpoints of all these upstream systems.  And, I need not say how difficult and painful it was to get the endpoints and their details from the teams.  It was a marathon circus for me to get them!

Here is how I approached the solution,

  1. I understood my teams were comfortable with Postman.  
  2. For solving this problem, I saw Postman as simpler and quicker than writing an automation project with libraries like RestAssured and request.
    • The team can run the Postman Collection from their systems in quick time.
  3. I created a Test Data Inventory
    • The inventory has the Test Data which the test teams engineered for their testing and automation
    • This data is not just for functional testing; it has Test Data for security, performance and accessibility too.
  4. I built the Postman Collection having the endpoints of these upstream systems.
    • The collection is version maintained like any other code in the organization.
    • Meaningful commits are made and pushed, as and when the need comes up.
  5. I crafted an environment file and wrote the scripts which place a variable having the stamp
    • This stamp tells that it is test data created by an automation run, for this version of the regression cycle, and from this team.
    • The stamp is a maximum of 10 characters, and it is dynamic based on that run to avoid duplicate data.
      • I said it is dynamic and not random!
      • These dynamic characters have a meaning and an intent.
      • Note, when I say dynamic, it is created for that iteration of Test Data
  6. The test data created this way is used by the team, eventually not touching the test data of other teams.
This let us move further with our tests and automation in the pipeline.   Note that I have multiple upstream systems to which the AUT speaks.  Not disturbing them is a challenge!
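
To give an idea of such a stamp, here is a minimal sketch.  It is written in Python for illustration and is not the Postman script I used; the layout of the stamp (two letters for the team, `C` with the regression cycle, `I` with the iteration) is a hypothetical format chosen only to show a stamp that is dynamic, meaningful, and within 10 characters.

```python
MAX_STAMP_LENGTH = 10


def build_stamp(team: str, cycle: int, iteration: int) -> str:
    """Build a stamp that tells the team, the regression cycle and the iteration."""
    team_code = team.strip()[:2].upper()
    if not team_code:
        raise ValueError("team must not be blank")
    if not (0 <= cycle <= 999 and 0 <= iteration <= 999):
        raise ValueError("cycle and iteration must be between 0 and 999")

    # Hypothetical layout: <team, 2 chars> C <cycle, 3 digits> I <iteration, 3 digits>
    stamp = f"{team_code}C{cycle:03d}I{iteration:03d}"
    if len(stamp) > MAX_STAMP_LENGTH:
        raise ValueError(f"stamp {stamp!r} is longer than {MAX_STAMP_LENGTH} characters")
    return stamp


print(build_stamp("payments", 12, 3))  # PAC012I003
```

Because the stamp is derived from the run and not from a random generator, anyone reading the data in an upstream system can tell who created it and for which cycle.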

I maintain the inventory of test data, having versions of the test data crafted sprint-on-sprint.  We pull the test data of different versions from the inventory as and when it is needed for the intent of the test and automation.

All that said, the teams continue to work in silos with not much collaboration.  Who has to solve this?  Though I can solve this, my role and pay grade do not allow me to do so effectively.  I have solved it within my teams and not at the organization level.  It is an engineering and culture problem which has to be addressed at the organization level.



To summarize, 
  1. Test data creation, and its inventory management with versions of test data, is part of the Engineering culture of an organization.
    • It has to be taken in consciously, so that it helps the teams move swiftly in building resilient and usable systems.
  2. Test Data and its engineering is a project within an engineering project.
    • No wonder if any LLM offers a business solution solely on Test Data in the coming days!
    • Test the data offered by such LLMs before consuming it.
  3. Practice in the space of test data and testing the test data.
  4. There is Data Coverage in engineering, just as there is Test Coverage.
    • What's your Data Coverage?



Thursday, March 6, 2025

When Does the Community Choose Not to Respond to Questions?


I read the testing, automation and test engineering related questions posted on social media and the web.  I try to understand the question and the problem expressed, and collaborate to assist.  I keep 46 minutes in a day for this activity.  

Sometimes, it is hard to assist after reading the question.  I do not feel like giving up; but then I do not see active communication from the other end.


What Makes It Hard To Assist?


The questions asked often,
  • Lack the context.
  • Do not tell who the person is and what she or he is trying to accomplish.
  • Do not have information on
    • the environment.
    • the challenge he or she is facing.
    • what she or he has tried so far.
  • Do not have minimal data such as
    • a screenshot, exception stack trace, the complete error message and the details following it, the data being used, a code outline, and more details on the context.

The above is the minimal data needed, shared after removing any sensitive information.  Know and understand what is sensitive information in your context when you are sharing.



Then, what will the question have?
  • Most will have a phrase, or two or three sentences, on what they are doing and what is seen on the screen, followed by asking what to do about it.


Why is it hard to assist with vague details?

People who want to assist won't have any context about you -- who you are, what you are doing, or why you are doing it that way.  They won't understand your purpose or what you are actually experiencing based on the limited and vague information in the questions.

Without clear details, it is hard to connect the dots or pinpoint the problem.  Instead, people who want to assist will be guessing, making assumptions, and probably considering multiple possibilities.  Is this a good way to use the time in a community?

When questions follow a similar pattern but vary in challenges and context, it becomes demotivating to decode them.  Over time, people will lose interest in reading unclear questions, leading to fewer responses and missed opportunities for meaningful discussions.


What's happening at the other end?

People who want to collaborate and assist will take the time to read the question.  But, when a question is too vague to understand, they often give up.  And, they feel bad for doing so.  The question remains unanswered.

When a solution is provided, there's often no update on whether it worked or not.  Those who contributed keep checking back, only to find no response.  Is that fair to those who invested their time to help?  Would you feel good if you were in their position?

Remember, in the community one can't buy someone's time to listen or solve a problem.  It has to be earned!  And, it has to be earned every single time.




How to Frame and Post the Question?


There is no one excellent way of doing it.  Then, what should I do to post my questions?

  1. Know the community.
  2. Read through the questions shared earlier in that community.
  3. Look at the questions that have found resolutions which were acknowledged and accepted.
    1. Observe how the question or problem experienced is described.
    2. Look at the details shared and how they are shared.
    3. Look at the context details shared and how they are shared.
    4. Observe how the interactions and conversation are taken forward from the two sides.
      1. Closely notice the words, how the energy is kept high at both ends, and how each side is pushing for it.
      2. Importantly, look at the time taken to respond at both ends.

I do not want to share an example or reference saying this is the way to do it.

Figure it out for your problem and its context.  Share the minimal information mentioned above and ask for help.  Consistently improve how you ask, share and describe the problem.




Thursday, November 7, 2024

Functional Testing Is Must In Performance And Security Testing

 

I'm sharing how I missed testing for functionality while I was immersed in testing the performance of a Stored Procedure.  I was unhappy for a couple of days, as I missed something that I have practiced for years.  

I'm glad to reinforce this learning, with much more awareness, in my testing's MVT and MVQT now.


Context of Testing


A Stored Procedure (SP) was optimized for better execution time, with no intended change in the functionality.  This part of the system had not been touched, and its functionality had not changed, for a long time (years?).  The time taken by the SP was of concern.  I was asked to test the optimization.

The complicated area here is the test data to use.  It took me days to identify and build the test data to test this optimization by mimicking the production incidents, use cases, and data.

When I got the test data ready, it was the fourth day of my testing this change.



Where Did I Go Blind By Being Focused?


The test data that I prepared was solely for the evaluation of the execution time.  This test data could help to test functionality as well.  But my focus was on evaluating performance, not functionality, with this test data.

The change in the SP did impact the functionality.  I was supposed to use a large data range to test the functionality of this feature, which includes two SPs.  But the task assigned was to test just the one SP which was optimized.  I got blind here!  

Are you asking, what is the impact of this functional problem?
  • In the one complete business work flow, this functional problem added the same data into different sets in the subsequent iterations.  Redundant Data -- This is not an expected behavior.

I spoke only of performance, traces, data I/O and execution time, because that was the pressing problem.  Why?  That was the objective given to me.  

My testing mission fell short in redefining this objective.  If I had redefined it, I would have added functionality at a better scale.

If I had redefined it, I would have pulled the other SP, which is also part of this feature's work flow, into functional testing.  These two SPs are expected to handle the data by eliminating the redundancy.

It was a simple test, but I did not have it in my testing mission that day.



Why Did I Go Blind?


The performance test blinded me to functionality, as the basic functional flow looked to be functioning.  But the data count was going wrong when a bigger data range was used in the context.  

See here, how stupid I was in my testing!  I was testing an SP that had a change as part of its optimization for execution time.  I never brought the functional testing in.  Why?  I focused on the testing objective.

I just looked into the one SP that was optimized.  I did not look at the other SP, which has to work along with this SP later to complete the functional flow of the feature.  Why?  How is that even possible?  I was asking myself this.  I see this is okay from the perspective of the testing objective I had.  But it is not okay from the perspective of a test engineer who is supposed to think about the impact and prevent the problems.

My immersed and concentrated focus on performance and its related activities on an SP for four long days did not let me see this.  



What Am I Saying Here?


While I have tested for DBs and ETL systems for years, I did not use my learning here.  What is that learning?
When there is a change in any part of the ETL, SP or DB of a system, testing for the functionality for the business workflow is equally important.  Vary the data dimensions and evaluate the counts.
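
A minimal sketch in Python of what "vary the data dimensions and evaluate the counts" could look like.  The function names and the shape of the data are hypothetical, and it assumes that each input key is expected to land exactly once across the sets produced by the workflow.

```python
from collections import Counter


def find_redundant_rows(result_sets):
    """Return rows that occur more than once across all result sets, with counts."""
    counts = Counter(row for result_set in result_sets for row in result_set)
    return {row: n for row, n in counts.items() if n > 1}


def check_workflow(input_keys, result_sets):
    """Collect functional findings for one complete run of the workflow."""
    findings = []
    produced = [row for result_set in result_sets for row in result_set]
    if len(produced) != len(input_keys):
        findings.append(
            f"count mismatch: expected {len(input_keys)}, got {len(produced)}"
        )
    redundant = find_redundant_rows(result_sets)
    if redundant:
        findings.append(f"redundant rows: {sorted(redundant)}")
    return findings


# A small data range can pass while a larger range exposes the problem.
small = check_workflow([1, 2], [[1], [2]])
large = check_workflow([1, 2, 3, 4], [[1, 2], [2, 3], [3, 4]])
print(small)  # []
print(large)  # ['count mismatch: expected 4, got 6', 'redundant rows: [2, 3]']
```

The point of the sketch is the second call: the same check that stays silent on a small range reports both the wrong count and the redundant rows once the range grows.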

I was completely hooked on the execution time and the test data while switching between the environments for four days.  The chaos in data between environments is something that misleads easily.  I fell for it this time.

I say to myself, whether it is a fix for performance optimization or for security [or any quality criteria], testing for functionality is as important, and as much a priority, as running the tests for performance or security.

When a DB layer is picked for fixing and optimization, testing for functionality at an equal scale is a must.  There is a change in the code and/or infrastructure, and it has to be noted with additional attention.

To add to this, this time I did not go through and analyze the SP.  I took this call from the test team.  This call of mine cost me, and it played a major part in my not thinking of functionality.

My fellow colleague ran a test with varying data sizes by completing the business workflow, observed the problem, and informed me.  I give the credit to Sandeep.

If I had brought this performance test under automation, I would not have missed this.  Why?  I would evaluate and assert on the data returned for the different sizes.  I did not automate here, and there was no need for it in this context.

Redefine the testing objective that you have got; it helps when you see the model of a system and test.


Respect every fix and suspect every fix.  This helps in the long run!  



Monday, January 22, 2024

RAAMA: My Test Discovery Model

 

RAAMA -- I Look at You Everyday!


I have tried to put up one of my Test Discovery models in a conceptual way here, with the name RAAMA - Refer to, Arrange, Action, Monitor, and Assert.

Maybe this model helps you and your test engineering team as it is helping me.  Use it in your context, with additions or subtractions for what you are seeking.

I refer to this RAAMA of mine every day and whenever I'm testing.  Every day I find new learning and realizations that I was unaware of earlier.  My understanding of RAAMA is not the same as what I had on the previous day.

My understanding of this RAAMA is incomplete and I have made PeACE with it by accepting it.  My understanding is growing and getting better every day.  I will share a better version of it as I experience it.

Each time I look up to RAAMA and refer to it, I see a new dimension to RAAMA.  The awareness, exposure, and questions are getting better, giving a better realization of what I was ignorant and unaware of.  RAAMA is exposing me to be a better test engineer today than I was earlier.








RAAMA - One of my evolving models for Test Discovery


Note: I have not explained in detail what I mean for each node and its sub-nodes.  I can talk and discuss it with you if you look for it; I'm just one email away to get started.



Tuesday, November 28, 2023

Behind the Every Test Data, There is a ?!

 

Read this blog post to have a perspective about the Test Data and Test Data Management.  The point is, if I'm not aware of a test and what it tells me to explore, I cannot think of its Test Data.

That said, if I know what I should be evaluating as part of performance, and why, when and how, it helps me to identify the tests and their test data. 

The ninth question from season two of 100 Days of Skilled Testing is:

What role does data management play in performance testing, and how do you ensure the availability of suitable test data sets?


Testing and "Ensure"

We test, and have tests in testing, because there is no "sure" or "ensure" in software.  But we presume on a rational basis, as in "if, these are this", in a given context when the software processes.

Now, ask yourself, how can we ensure the availability of suitable test data sets?

In my opinion, the Test Data is often misunderstood.  This is the primary problem and should be the first problem, when asked "what are the challenges in creating the test data?".

When you read the concluding lines of this blog post, you will learn why I say this.


Test Data and Immunity

In my opinion and experience in practicing the Test Engineering, I see, the Test Data should be a viral strain and it should have its variants.  When this test data is used to test [experiment, test investigate, and debug], how do the software and its ecosystem respond?

  • Are the software and its ecosystem immune to this test data?
    • Do they exhibit any risks and problems?
      • If yes, then, is the purpose of my testing and automation accomplished with this test data?
This takes me back to the question: what is the purpose [intent] of my test?  It drives me to derive the test data which helps me to know -- what am I supposed to learn, and on priority?  With this, I get an idea of what kind of test data I should be creating, knowing its pattern.

If the system is immune to the Test Data and is not revealing anything new in the context, I classify this pattern of test data as "Immune" to the context.

In my practice and research work in Test Engineering and Software Testing, to start, I categorize Test Data into two areas.
  1. Immune
  2. Not Immune
Further, I have categories, under these two, where I classify the Test Data deterministically for the context.   Get in touch if you want to learn more about this.  I'm just one ping away!

The tests should help me to evaluate for immunity and also non-immunity; both are essential and necessary.  

The credit is to me for such classification of Test Data.  It is my research work out of my practice.

Note that, Test Data is not just the input [characters or files] entered or given to a system.  Test Data has its association to tech stacks, infrastructure, ecosystem, business workflows and people.  To craft such Test Data, one has to have the understanding of the system and its internals, and, the problem it solves by knowing how it solves.



Performance Testing and Test Data

  1. What is that I'm testing as part of performance?
  2. What do I want to evaluate in the name of performance?
  3. What part of the system is evaluated for its performance?
    • Should I evaluate this in isolation or as a wholeness of the system?
  4. What domain knowledge and information should I have when testing for performance?
  5. What architecture and internal details of the system should I understand and be aware of to test for performance?
  6. Is this the first delivery?  Or, do we have this system running in the production?
    • If it is first delivery,
      • How will I create the test data to suit the consumers of this application?
      • What are the key workflows of business that we should be evaluating for its performance?
      • Do all workflows and sub-systems need the evaluation for performance, and on priority?
      • How do I map the fragmentation of users and their data [with its patterns]?
      • What are the infrastructure and ecosystem characteristics that should be part of the test data identified?
      • Does caching have any effect if the same pattern of data is used?
    • If it is a running version in production
      • Can I refer to the DB to figure out the pattern for the particular workflow that I'm evaluating?
      • How can I match the test data to have the production data's characteristics and attributes?
  7. What is the backup strategy for the Test Data?
    • How do I version control the Test Data?
    • Which version of the Test Data should I be using?
  8. What is the threshold I'm targeting with Test Data?
    • What should be the size of the data in DB when I make the IO and RW operations?
    • What should be the network capability when I make the IO and RW operations?
    • What should be the hardware capability when I make the IO and RW operations?
    • What should be the geographical traffic and its pattern when I make the IO and RW operations?
    • More of such factors will be considered when identifying and deriving the test data.
  9. What is the client error yielding Test Data that I should have for the workflow?
  10. What is the server error yielding Test Data that I should have for the workflow?
  11. What is the redirection yielding Test Data that I should have for the workflow?
  12. What is the no-response and no-change Test Data that I should have for the workflow?
And, more.  It is simple; get in touch to discuss and know beyond the listed.



To conclude and stop here, all these questions do not ensure or assure or make sure that I will have test data for evaluating a characteristic of performance.
  • They help me to know:
    • What are the tests I should be doing?
    • What kind of preparation should I have in my practice to create the Test Data for these tests?

The Test Data should challenge the available Testability and its limits.  If it does not, then we do have test data, no doubt about it; but it is shallow.  Shallow!?

One has to ask oneself, "Is this sufficient and effective Test Data for the system [and workflow] I'm testing?"

The Test Data should drive the engineering team to add more layers of Testability into the system.




Sunday, November 19, 2023

MVQT: The Testing and Tests with a MVP's Perspectives


I was leading multiple teams and their deliveries in a testing service company.  Then, I came up with this thought -- like an MVP, I also have the MVT (Minimum Viable Tests) for an MVP.

Further, I expanded this thought in my day-to-day practice by tailoring it to different contexts.  I observe that it applies well to the different contexts when I tailor it to them.  After experimenting with it for 10 years, I'm sharing this as a blog post.


What is an MVP?

I take this from Eric Ries.  It looks simple and precise to me.

The Minimum Viable Product is that version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort.

I see that this technique [and concept] can be applied to anything I'm developing.  As a test engineer, I mostly develop tests and test code as part of my testing.  On applying the idea of MVP to my testing and deliveries, I see the value and the result.

Read this blog post of mine to know who a developer is.


Testing, Tests, MVP and MVQT

In software test engineering, I see the MVP as Minimum Viable Questioning Tests.


The Minimum Viable 'Q' Tests (MVQT) for a focused area of a feature [or to a feature]

  • Helps me to identify the priority tests that should be executed first
  • Allows me to learn, on priority, the information which matters critically to the product and stakeholders
    • So that an informed decision can be made.


The Q in MVQT stands for "questioning".  I read it as Minimum Viable Questioning Tests.  I see the "Q" as a placeholder for the Quality Criteria.  That is, MVFT means Minimum Viable Questioning Functional Tests to a feature or a workflow.




The MVQT are key to know:

  • Have I identified and designed the priority tests?  How do I know that I have got them?
  • Did stakeholders get the information which they wanted to know on priority?
  • Did MVQT help me to
    • Explore and know what I wanted to know about a feature or a workflow?
      • How fast was I here to know and learn this?
      • How did I develop my tests incrementally?  Did I?  If not, then, is it an MVQT?
  • Did MVQT help me to know
    • Am I aligned and in sync with expectations of my stakeholders and customers who are using the software product I'm testing and automating?
  • Did the MVQT help me 
    • In collecting the critical information in a given context for the scope of testing and automation?
    • Do the learning and outcome from this MVQT help to reinforce the validated learning of customer?
  • Do the MVQT results support the outcome of the Unit Testing results?

The tests in MVQT have to be consistently revised and evaluated to keep them as MVQT.  Note this: not all tests are MVQT.  If the number of MVQT is growing for a part of a feature or for a feature, it is time to think about what MVQT is for you.

The "minimum" tests are highly effective and they help me learn and test better, technically and socially.



MVQT and Testing

  • Sanity or Smoke Tests
    • The set of MVQT which helps me learn whether the build can be taken for further testing
  • MVFT - Minimum Viable Questioning Functional Tests
    • Apply this to a feature or a workflow or to that part which can be evaluated with minimum tests for its functionality
      • To learn whether this aligns with the validated learning of the customer [stakeholders]
  • MVPT - Minimum Viable Questioning Performance Tests
  • MVUT - Minimum Viable Questioning Usability Tests
  • MVAT - Minimum Viable Questioning Accessibility Tests
  • MVTxT - Minimum Viable Questioning Tester's Experience Tests
  • MVST - Minimum Viable Questioning Security Tests
  • MVAF - Minimum Viable Questioning Automation to a Feature
  • MVLT - Minimum Viable Questioning Localization Tests
  • MVUIT - Minimum Viable Questioning UI Tests

You add more of this to your list and context.

In a way, MVQT should ask and look for the testability, automatability and observability.  If this is not happening, then there is no possibility of saying I have got my MVQT.

More importantly, in the CI-CD ecosystem, MVQT plays a major role.  If I should have my tests in the CI-CD pipeline, then MVQT is the way, as it focuses on a targeted area to evaluate.  Else, it is hard, impractical and not possible to test in the CI-CD ecosystem while delivering continuously.
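
One way to picture this in a pipeline is to tag the tests and let a stage pull only the MVQT for the area it targets.  The sketch below is in Python and is only an illustration; the names `TaggedTest` and `select_mvqt`, the tags, and the sample tests are hypothetical and do not come from any test framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedTest:
    name: str
    criterion: str  # quality criterion, e.g. "functional", "performance"
    area: str       # the feature or workflow the test targets
    mvqt: bool      # True while the team counts this test as an MVQT


def select_mvqt(tests, criterion, area):
    """Return the names of the MVQT for one quality criterion and one targeted area."""
    return [
        t.name
        for t in tests
        if t.mvqt and t.criterion == criterion and t.area == area
    ]


suite = [
    TaggedTest("login_with_valid_user", "functional", "login", True),
    TaggedTest("login_locale_labels", "functional", "login", False),
    TaggedTest("login_under_step_load", "performance", "login", True),
    TaggedTest("checkout_totals", "functional", "checkout", True),
]

print(select_mvqt(suite, "functional", "login"))  # ['login_with_valid_user']
```

Keeping the `mvqt` flag as data makes it easy to revise: when the list for an area keeps growing, that is the signal to ask again what the MVQT is.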


Ask and Review for MVQT

Ask for MVQT, when you review these:

  • test strategy, test framing, test design, test ideas, test cases, test plan, test architecture, test engineering, testing center of excellence, and test code

For example,

  • What are the minimum viable questioning performance tests that you have got to test this feature?
  • What are the minimum viable questioning performance tests that you have got to test this workflow?
  • What are the minimum viable questioning security tests that you have got to test this feature?
  • What are the minimum viable questioning GUI tests that you have got to test this feature?
  • What are the minimum viable questioning contract tests that you have got to test this end point?
Likewise, what are the minimum viable questioning automation tests that you have got to test this feature?

Ask how these tests qualify as MVQT in this context of testing and automation.

This should help you to see how effective the test strategy is in a given context.

Importantly, the MVQT and its effectiveness is a testability to test your tests.



The Credit is to Me

I'm not sure if the idea I'm describing in this blog post is practiced by other test engineers.  I have not seen it being discussed in a public forum.  I have not come across it within my awareness and the exposure I have put myself to.

Hence, I will take this credit for myself.  Giving credit honestly is not a common sight and practice.  I have not got my due credit when the ideas, thoughts and work that I have come up with were used.

So, I make this an open letter and call out that the credit for this idea, thought, concept, and practice will be to me when you listen to, use and practice it.



Saturday, October 21, 2023

The "Bottleneck" in a Test Engineer's Eyes

 

Preference to Bottle Over Jar! Why?


Have you ever heard of a Jar Neck when describing a problem or solution?

  • I have heard Bottleneck often and consistently; but, not Jar Neck.  Why? 
  • Be it in Software Engineering or day-to-day life problem solving description,
    • The Bottleneck is referenced and not a Jar Neck.

Looks like people want Bottle but not the Bottleneck speed and benefits.  Bottle without its neck is a jar?!



Bottleneck exists for better controllability.

  • In a bottle, the bottleneck is a solution!  It is not a problem!
    • It is to mitigate any risk and problem that arises from the flow of content in the bottle.

Yet we describe, learn and communicate the neck of a bottle as an analogy to a problem.  


Are you aware of Gateway in the software system?

  • The Gateway can be seen as a neck of a bottle which controls the incoming requests and outgoing response.
  • Gateway is a necessity.
    • We need the Gateway to be adaptable in the size of its neck, based on the traffic volume it is handling.  Here, the gateway's neck size should adapt and scale contextually.
      • When describing a problem, we are talking about a bottleneck size which is not adaptable to the context.
      • That adaptability has to be built in by engineering, to scale in any dimension and magnitude.
        • When this is not done, we equate the software system's problem to a bottleneck as an analogy, which is incorrect!  The bottle has got its size and its neck size fixed for a purpose and as a solution.
          • The context of a bottle and that of any of today's systems are different.
            • It is good to draw similarities from General Systems Thinking and observations.
            • But the solution cannot be generic to all systems; it has to be contextual.  The software system has to have its contextual solution.
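
To picture a neck that resizes with the context, here is a small sketch in Python.  It is not how any particular gateway product works; the class `AdaptiveNeck`, the 20% headroom, and the bounds are assumptions chosen only to illustrate a limit that follows the traffic instead of staying fixed like the neck of a bottle.

```python
class AdaptiveNeck:
    """A gateway limit that resizes with observed traffic, within fixed bounds."""

    def __init__(self, min_limit: int, max_limit: int, headroom_pct: int = 20):
        if not 0 < min_limit <= max_limit:
            raise ValueError("need 0 < min_limit <= max_limit")
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.headroom_pct = headroom_pct  # hypothetical safety margin
        self.limit = min_limit

    def resize(self, observed_rps: int) -> int:
        """Set the limit to observed traffic plus headroom, clamped to the bounds."""
        headroom = (observed_rps * self.headroom_pct + 99) // 100  # rounded up
        wanted = observed_rps + headroom
        self.limit = max(self.min_limit, min(self.max_limit, wanted))
        return self.limit

    def admits(self, in_flight: int) -> bool:
        """The neck still controls the flow: it lets through only up to the limit."""
        return in_flight < self.limit


neck = AdaptiveNeck(min_limit=50, max_limit=1000)
print(neck.resize(100), neck.resize(10), neck.resize(5000))  # 120 50 1000
```

Note that the neck never disappears: even at its widest it still admits only up to `max_limit`, which is the controllability the bottleneck exists for.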


So, next time someone in your team or network talks about a bottleneck, do share with them that a bottleneck is for better controllability.  Having a contextually resizable and adaptable bottleneck is the need in Software Engineering; not the elimination of the bottleneck.

In fact, a software system should have, and will have, a bottleneck at some point.  And this bottleneck will be adaptable to the context in what it should let through and process.

Is the runway of an airport a bottleneck when it is compared to the sky?  Is that a solution or a problem?  Likewise, a ship has a defined route and does not sail without one.  Is this a bottleneck to the ship and its business?  An elevator can accommodate a defined number of people or kilograms, and does not move beyond that.  Is that a bottleneck?  The esophagus in the human body has a size which medical science observes as normal and acceptable; for any deviation from that size, medical science investigates it as a risk and problem.  Why?  Is the circumference and length of the esophagus a bottleneck to the human anatomy and physiology?

An engineering solution will and should have a bottleneck at some point.  Having a bottleneck that adapts to the context is what one tries to accomplish in a software system's scalability and operability.


Please, do not equate solving a "bottleneck" situation with Agile practice.  Does it look like a joke?  I will not be surprised if someone says the bottleneck problem is solved by practicing Agile.


Wednesday, October 11, 2023

Prioritizing Performance & Its Requirements - The Two Engineering Tasks

  

How do I gather and prioritize the performance requirements of a student from schools, colleges, universities and society? 

Note that I said performance.  What does performance mean in schools, colleges, universities and society?

  • Have you ever asked yourself this question?
  • If you are living with children, did this question cross your mind, no matter which class the children are studying in?

This is not a question which can be ignored.  Also, this question is not precise and not tied to a context.  It is a question that resonates but has no acceptable rational base in whatever context it arises from.  It is the same when it comes to the performance requirements of a software system.

If you observe closely, the system in which we live pushes towards performance, for what it thinks of as performance.  Doesn't it?


The third question in season two of 100 Days of Skilled Testing is:

How do you gather and prioritize performance requirements from stakeholders and project documentation?



Prioritizing Performance & Its Requirements


If you read attentively, the title of this section says - prioritizing performance and its requirements.  I did not say, prioritize performance requirements.

That said, what is performance for Netflix is not the same as for the Aadhaar system.  But both have prioritized performance and look to be aggressive in knowing the requirements for the same.  Don't you think so?

We're in the timeline of Shift Left.

  • How to shift left with performance? 
  • What all in the system should focus on the performance in the Shift Left?


MVP & Performance Engineering Story


When we are going to take an MVP to market as early as possible, there is a tradeoff.  What is placed in the subsequent priorities, to be compromised on negotiation by engineering and business?

The context matters when prioritizing!  Be it for performance or functionality or security or any quality criteria.

I will interpret the question asked from the point of view of an MVP's deployment or publishing.

  • Are you asking why I have picked MVP?
    • Performance is contextual and it is based on multiple touchpoints, boundaries and interfaces of a system.
    • I cannot talk about all of those in this blog post.
    • Nor do I want to talk about the KPIs and metrics.  
  • I want to share what you can pick, consume, and apply in your work.


Now, we have prioritized performance for an MVP.  Haven't we?  Prioritized means it matters, it concerns us, and we are okay to compromise on a few things for it.  Let us jump to the Left with the MVP in our hands to identify the requirements of the business.

As a business, we will have a rough idea of how we are pitching and selling our services, and to whom.  As a test engineer, you can sense what the key transaction [business work flow] in the MVP is.  You will know the touchpoints, interfaces and boundaries in the architecture that communicate and work together to keep the MVP delivering the value.  Don't you?

Say the business wants the MVP to support and serve 500 requests per second.  I should know about the 500 requests here.

Is it,

  • Concurrent Requests?
  • Active Concurrent Requests?

This matters!  The two are not the same.  Have you asked this question?  It is a requirement we miss capturing.



Capturing Performance Questions for a MVP

It is about the awareness first!  How much am I opening myself up to the awareness?  This brings an energy and it is contagious.

How do I bring the performance awareness into my team so that it is engineered into the system we develop?  This is a culture drive for an organization.

Now, I know the MVP has to serve 500 concurrent active users in a second, per the business's expectation, to meet its reach and target.  If I do not know this, I have to capture this data first.  How do you capture it?



A Use Case to Ponder

One use case which would trigger the spark of thinking is - How should Disney+ Hotstar's services perform to live stream the India vs Pakistan Cricket World Cup 2023 on 14th Oct 2023?

  • How should it capture the performance and its requirement for this day?
    • How should this system scale to crores of viewers streaming the live video of the match?
    • How should this system scale for the gamification - emoji, chat and other viewers engagement during the crores of viewers making requests from client interface?
Try to play the past 30 minutes of this live video.  What did you see?  Why?  That is part of the performance engineering strategy!

This use case opens up different topics of Architecture and Performance Engineering.  Be aware of them and explore them.  This is not what we want to talk about now.  We want to talk about an MVP and how to capture its requirements for a better performance experience.



Step Up by 5% Heuristic


Having a test environment which is close to the production context, and test data that looks realistic, I get started.

I framed this "Step Up by 5% Heuristic" a few months after starting the practice of Performance Engineering and Testing.  I failed, and I learned.  I'm learning.

I know the expectation is 500 active concurrent users per second.

I will start to evaluate the integrated systems of the MVP for 5 percent of 500.
  • What is 5% of 500?
    • I will start with 25 concurrent active user requests for the MVP.
      • I will observe the emotional experience of using the MVP during this time.
      • I will monitor and record the KPIs, and other needed data.
    • Does it fail to serve 5% of concurrent active users?
      • If failed, I know what to do now.  It helped me.
        • This helps me to draw the requirements better and rationally for the existing system's architecture.
      • If it succeeds, it helped me partially in knowing what actually I wanted to know.
        • I will raise the active concurrent users to 10%
        • That is, increasing it by 5%
          • I repeat these tests until the MVP architecture lets me know about the requirement it needs for the performance of serving 500 active concurrent users in  a second
            • Read the above sentence, again
            • The tests on MVP will let me learn what are its performance requirements for serving active concurrent 500 users in a given architecture, infrastructure, and tech stacks
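
The stepping described above can be sketched in a few lines of Python. This is only a minimal illustration of the arithmetic and the stop condition, not a load testing tool. The names `step_up_levels`, `run_step_up`, and the `serves_load` callback are hypothetical; in practice `serves_load` would wrap a real load test run plus the review of the KPIs and the experience observed at that level.

```python
from typing import Callable, List, Optional, Tuple


def step_up_levels(target_users: int, step_percent: int = 5) -> List[int]:
    """Load levels to try: 5%, 10%, ... up to 100% of the target."""
    return [
        target_users * percent // 100
        for percent in range(step_percent, 101, step_percent)
    ]


def run_step_up(
    target_users: int,
    serves_load: Callable[[int], bool],  # hypothetical: runs one test at this level
    step_percent: int = 5,
) -> Tuple[int, Optional[int]]:
    """Step up level by level and stop at the first level that is not served.

    Returns (last level served, first level that failed or None).
    """
    last_served = 0
    for users in step_up_levels(target_users, step_percent):
        if not serves_load(users):
            return last_served, users
        last_served = users
    return last_served, None


if __name__ == "__main__":
    print(step_up_levels(500)[:4])  # [25, 50, 75, 100]

    # Assumption for illustration only: a pretend MVP that copes with up to 130 users.
    pretend_mvp = lambda users: users <= 130
    print(run_step_up(500, pretend_mvp))  # (125, 150)
```

The pair returned by `run_step_up` is the interesting part: the gap between the last level served and the first level that failed is where the requirement analysis of the current architecture begins.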


Beyond by 37% Heuristic


I framed this "Beyond by 37% Heuristic" after I failed in framing the tests for performance.  Discussing the rationale of this heuristic is not the purpose of this blog post.  If you are curious and interested, let us catch up and discuss it.

Does a salary hike of 30% indicate high performance?  I don't know!  But a 30% hike is not commonly given to everyone, from what I have seen in my career so far.

That said, this 37% has worked for me in the contexts I'm testing.  37% of 500 is 185.  Did the MVP serve 685 (500 + 185) active concurrent users in a second?  It helps me to draw a requirement analysis of the MVP system for this volume of concurrent active users.

Now, I will step up by another 37% of concurrent active users. That is, 870 (685 + 185) active concurrent users in a second.
  • If seen, I have about 1.74x (870 / 500) of the expected traffic now.  Did it serve?
    • If yes, how many active concurrent users were served in a second?  
    • I will correlate the KPIs of the other integrated systems of the MVP.
      •  With the captured data and emotions
        • This will tell us what we should expect, irrespective of what the business expects
        • This difference will let us know "the requirements"
          • How to gather information on -- what has to be optimized, changed, reorchestrated, eliminated, included, and more.
          • We start technically establishing and framing the Performance SLAs and SLOs between the tech team and the business.
          • Now the performance and its requirements will appear in the dots that are,
            • Being connected
            • To be connected
            • To be disconnected
            • Not existing yet
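
The numbers above can be worked out with a small Python sketch. It assumes, as in the text, that each step adds 37% of the original expectation (185 users) rather than compounding. The function names and the sample "served" figure are hypothetical and only for illustration.

```python
from typing import List


def beyond_levels(target_users: int, beyond_percent: int = 37, steps: int = 2) -> List[int]:
    """Load levels beyond the target, each adding beyond_percent of the target."""
    increment = target_users * beyond_percent // 100  # 185 for a target of 500
    return [target_users + increment * step for step in range(1, steps + 1)]


def traffic_multiple(users: int, target_users: int) -> float:
    """How many times the expected traffic a given level represents."""
    return users / target_users


def unserved_users(offered_users: int, served_users: int) -> int:
    """Difference between the load offered and the load actually served."""
    return offered_users - served_users


if __name__ == "__main__":
    levels = beyond_levels(500)
    print(levels)  # [685, 870]
    print(round(traffic_multiple(levels[-1], 500), 2))  # 1.74

    # Assumption for illustration only: the MVP served 800 of the 870 users offered.
    print(unserved_users(870, 800))  # 70
```

The difference reported by `unserved_users`, read together with the KPIs of the integrated systems, is one input to the conversation on Performance SLAs and SLOs between the tech team and the business.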


To conclude, wherever you shift, take performance engineering along! Revive its requirements to be healthier!



Note: You should read these blog posts if you have not:
  1. Performance Testing: Unspoken KPIs and The Missing Correlation
  2. Architecture: The Common Shared Understanding -- Part 1
  3. Architecture: Its Aid in Performance Engineering -- Part 2