Monday, October 16, 2023

Performance & Tests: Getting Started and Data Analysis

 

On running tests,

  • We will have data (information) as one of the byproducts.
  • Analyzing the data of the integrated sub-systems in isolation and in correlation,
    • It will lead us to a technical analysis of each integrated system.
In the report, we draft this analysis along with actions to be taken.

Note: When I say sub-systems, do not ignore or skip the client or consumer; the system does not comprise just the server.


No Golden Rule

There is no one way to do testing.  Likewise, there is no one way or golden rule to test for performance.  It is contextual and depends on what I want to learn.

In fact, in a few contexts, we can have a value-adding performance test with just one request.  I just need to be well aware of what it is that I want to know and learn from this test.

That said, there are multiple interfaces where we can observe, analyze and learn from the performance data collected.

The fourth question from season two of 100 Days of Skilled Testing is:

What are your favorite hacks to analyze performance testing results and find anomalies?

Well, this question does not mention explicitly whether it is for the server, client, database, caching, messaging, or some other interface of a system.  It is a question; but to me it looks too generic, and at a point it looks vague.  Having said this, that is how the learning journey and curve start!


Result vs Report

What is a result?

  • Is it an evaluation made after the data [information] is put to scrutiny?
  • Or, is the result the data that is collected and not yet interpreted?

It depends on the individual or team and how it is being practiced.

The result is different from a report.


Getting Started and Data Analysis


I should know how the system architecture is designed and orchestrated with its boundaries and interfaces.  This helps a lot.  What kind of architecture is this?  Is it a monolith?  If it is a monolith, my approach to testing for performance differs.

If I'm asked to start the analysis of data for a system that I'm not aware of,
  • I will start by analyzing the indicators below, once I know the architecture and the orchestration of the sub-systems for the critical business workflows
    1. CPU usage
    2. RAM usage
    3. Data I/O
    4. Network usage
    5. The heat and sound dissipated from the hardware which holds and binds
      • CPU, RAM, Data I/O, Network and tech stacks installed and configured

It hints at me to look further and investigate with tests when I observe:
  • Having a steady consumption
    • What is steady consumption in this context?
  • Having a low consumption
    • What is low consumption in this context?
  • Having an unusual consumption spike and the fall after it
    • I follow the pattern to study further
    • What is considered a knee, a spike and a fall in this context?
  • Having a zero consumption
  • Having a maximum consumption
    • What is maximum consumption in this context?
High consumption doesn't necessarily mean a problem.  Likewise, low consumption does not mean all is well.  I have to uncover them to learn what they mean in the given context.
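As a rough illustration of these observations, here is a minimal Python sketch that labels a series of utilization samples (percentages) as zero, low, maximum, spike or steady.  The function name `describe_consumption` and every threshold in it are my assumptions for illustration only; what counts as low, steady or maximum has to be decided per context, and a label is only a hint to investigate, not a verdict.

```python
from statistics import mean, pstdev


def describe_consumption(samples, low=20.0, high=90.0,
                         spike_ratio=2.0, steady_spread=5.0):
    """Return a set of labels for a series of utilization percentages.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if not samples:
        raise ValueError("need at least one sample")

    peak = max(samples)
    if peak == 0:
        # Nothing was consumed during the whole observation window.
        return {"zero"}

    avg = mean(samples)
    labels = set()
    if avg <= low:
        labels.add("low")
    if peak >= high:
        labels.add("maximum")
    if peak >= spike_ratio * avg:
        # The peak stands far above the average level of the series.
        labels.add("spike")
    if pstdev(samples) <= steady_spread:
        # Samples stay within a narrow band around the average.
        labels.add("steady")
    return labels


if __name__ == "__main__":
    cpu = [10, 10, 10, 80, 10]
    print(sorted(describe_consumption(cpu)))
```

The labels only tell me where to look next; I still have to ask what each one means for the sub-system in front of me.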

In each of these, there will be a pattern.  I will learn them.  I will correlate with other sub-systems and learn what they were doing in the said timeline.

Do you recollect this line -- "the architecture should provide the Testability"?
  • I wrote about it in one of the blog posts on Performance Engineering.

I refer to the below by traversing along the timeline,
  • The logs, by asking for them
  • Data recorded
  • Any APMs that are in place
I correlate all of these with the above-said indicators.
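A minimal sketch of this timeline correlation in Python could look like the one below.  It assumes the indicator samples and the log entries are already available as (timestamp, value) and (timestamp, message) pairs; the function name `logs_near_high_usage`, the threshold and the 30-second window are hypothetical choices of mine, not something a particular APM or logging tool provides.

```python
from datetime import datetime, timedelta


def logs_near_high_usage(metrics, logs, threshold, window_seconds=30):
    """Map each high-usage sample time to the log messages recorded near it.

    metrics: iterable of (datetime, float) indicator samples
    logs:    iterable of (datetime, str) log entries
    """
    window = timedelta(seconds=window_seconds)
    logs = list(logs)
    correlated = {}
    for sample_time, value in metrics:
        if value >= threshold:
            correlated[sample_time] = [
                message
                for log_time, message in logs
                if abs(log_time - sample_time) <= window
            ]
    return correlated


if __name__ == "__main__":
    start = datetime(2023, 10, 16, 10, 0, 0)
    cpu = [(start + timedelta(minutes=i), v) for i, v in enumerate([35, 40, 97, 38])]
    app_logs = [
        (start + timedelta(minutes=2, seconds=10), "cache rebuild started"),
        (start + timedelta(minutes=3, seconds=50), "health check ok"),
    ]
    for when, messages in logs_near_high_usage(cpu, app_logs, threshold=90).items():
        print(when.isoformat(), messages)
```

The same idea extends to the other sub-systems: put their records on the same timeline and see what each was doing when an indicator changed.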

This gives me a start.  It is one of the easiest starts I can have to get going with the analysis.


Well, this is to analyze at the server end.  What about the client [consumer] end?  It is simpler, and I will share it in the coming blog posts.



Do you want to know more on this and other strategies that can be used contextually?  Let us get connected and converse.  I'm happy to share and to learn by listening to you.  It is fun and it builds awareness!



Wednesday, October 11, 2023

Prioritizing Performance & Its Requirements - The Two Engineering Tasks

  

How do I gather and prioritize the performance requirements of a student from schools, colleges, universities and society? 

Note that I said performance.  What does performance mean in schools, colleges, universities and society?

  • Have you ever asked yourself this question?
  • If you are living with children, did this question cross your mind, no matter in what class the children are studying?

This is not a question that can be ignored.  Also, this question is not precise or specific to a context.  It is a question that resonates but has no acceptable rational basis in whatever context it arises from.  The same holds when it comes to the performance requirements of a software system.

If you observe closely, the system in which we live pushes us towards performance, for what it thinks of as performance.  Doesn't it?


The third question in season two of 100 Days of Skilled Testing is:

How do you gather and prioritize performance requirements from stakeholders and project documentation?



Prioritizing Performance & Its Requirements


If you read attentively, the title of this section says - prioritizing performance and its requirements.  I did not say prioritize performance requirements.

That said, what performance is for Netflix is not the same as for the Aadhaar system.  But both have prioritized performance and look to be aggressive in knowing the requirements for it.  Don't you think so?

We're in the timeline of Shift Left.

  • How do we shift performance to the left?
  • What in the system should focus on performance in the Shift Left?


MVP & Performance Engineering Story


When we are going to take an MVP to market as early as possible, there is a tradeoff.  What is pushed to the subsequent priorities and compromised on in the negotiation between engineering and business?

The context matters when prioritizing!  Be it for performance or functionality or security or any quality criteria.

I will interpret the question asked from the point of view of an MVP's deployment or publishing.

  • Are you asking why I have picked the MVP?
    • Performance is contextual, and it is based on the multiple touchpoints, boundaries and interfaces of a system.
    • I cannot talk about all of those in this blog post.
    • Nor do I want to talk about the KPIs and metrics.
  • I want to share what you can pick, consume, and apply in your work.


Now, we have prioritized performance for an MVP.  Haven't we?  Prioritized means it matters, it concerns us, and we are okay to compromise on a few things for it.  Let us jump to the Left with the MVP in our hands to identify the requirements of the business.

As a business, we will have a rough idea of how we are pitching and selling our services and to whom.  As a test engineer, you can sense what the key transaction [business workflow] in the MVP is.  You will know the touchpoints, interfaces and boundaries in the architecture that communicate and work together to keep the MVP delivering value.  Don't you?

Say the business wants the MVP to support and serve 500 requests per second.  I should know about the 500 requests here.

Is it,

  • Concurrent Requests?
  • Active Concurrent Requests?

This matters!  The two are not the same.  Have you asked this question?  It is a requirement we often miss capturing.
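One way to see why the distinction matters is Little's Law: in a stable system, the average number of requests in flight equals the arrival rate multiplied by the average response time.  The sketch below assumes that "active concurrent" means requests that are in flight at the same instant; that is my reading for illustration, and the response times used are made-up numbers, not measurements.

```python
def average_in_flight(arrival_rate_per_s, mean_response_time_s):
    """Little's Law: average requests in flight = arrival rate x response time."""
    if arrival_rate_per_s < 0 or mean_response_time_s < 0:
        raise ValueError("rate and response time must be non-negative")
    return arrival_rate_per_s * mean_response_time_s


if __name__ == "__main__":
    # The same 500 requests per second means very different concurrency
    # depending on how long each request takes to be served.
    for response_time in (0.05, 0.2, 1.0, 2.0):
        in_flight = average_in_flight(500, response_time)
        print(f"{response_time:>4} s per request -> about {in_flight:.0f} in flight")
```

So "500 requests per second" alone does not tell me how many requests the system must hold at once; I need the business to say which one it means.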



Capturing Performance Questions for a MVP

It is about awareness first!  How much am I opening myself up to awareness?  This brings an energy, and it is contagious.

How do I bring performance awareness into my team so that it is engineered into the system we develop?  This is a culture drive for an organization.

Now, I know the MVP has to serve 500 concurrent active users in a second, per the business's expectation, to meet its reach and target.  If I do not know this, I have to capture this data first.  How do you capture it?



A Use Case to Ponder

One use case which would trigger the spark of thinking is - How should Disney+ Hotstar's services perform to live stream the India vs Pakistan Cricket World Cup 2023 on 14th Oct 2023?

  • How should it capture the performance and its requirement for this day?
    • How should this system scale to crores of viewers streaming the live video of the match?
    • How should this system scale for the gamification (emoji, chat and other viewer engagement) while crores of viewers are making requests from the client interface?
Try to play the past 30 minutes of this live video.  What did you see?  Why?  That is part of the performance engineering strategy!

This use case opens up different topics of Architecture and Performance Engineering.  Be aware of them and explore them.  This is not what we want to talk about now.  We want to talk about an MVP and how to capture its requirements for a better performance experience.



Step Up by 5% Heuristic


Once I have a test environment that is close to the production context and test data that looks realistic, I get started.

I framed this "Step Up by 5% Heuristic" a few months after starting the practice of Performance Engineering and Testing.  I failed, and I learned.  I'm learning.

I know the expectation is 500 active concurrent users per second.

I will start to evaluate the integrated systems of the MVP at 5 percent of 500.
  • What is 5% of 500?
    • I will start with the requests of 25 concurrent active users for the MVP.
      • I will observe the emotional experience of using the MVP during this time.
      • I will monitor and record the KPIs and other needed data.
    • Does it fail to serve 5% of the concurrent active users?
      • If it fails, I know what to do now.  It has helped me.
        • This helps me to draw the requirements better and rationally for the existing system's architecture.
      • If it succeeds, it has helped me partially in knowing what I actually wanted to know.
        • I will raise the active concurrent users to 10%
        • That is, increasing it by 5%
          • I repeat these tests until the MVP architecture lets me know the requirements it has for serving 500 active concurrent users in a second
            • Read the above sentence again
            • The tests on the MVP will let me learn what its performance requirements are for serving 500 active concurrent users in a given architecture, infrastructure, and tech stack
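The steps above can be sketched in Python as a list of load levels and a loop that stops at the first level the system fails to serve.  The names `step_up_levels` and `first_failing_level` are hypothetical, and `can_serve` stands in for whatever actually runs the test and judges it (load tool, KPIs, observed experience); none of that is implemented here.

```python
def step_up_levels(target_users, step_percent=5):
    """Load levels from one step up to the target, e.g. 25, 50, ..., 500."""
    if target_users <= 0 or not 0 < step_percent <= 100:
        raise ValueError("target must be positive and step_percent in (0, 100]")
    step = max(1, round(target_users * step_percent / 100))
    levels = list(range(step, target_users + 1, step))
    if levels[-1] != target_users:
        # Make sure the final level is the target itself.
        levels.append(target_users)
    return levels


def first_failing_level(levels, can_serve):
    """Return the first level where can_serve(level) is False, else None.

    can_serve is a placeholder callable supplied by the tester.
    """
    for level in levels:
        if not can_serve(level):
            return level
    return None


if __name__ == "__main__":
    levels = step_up_levels(500)
    print(levels[:4], "...", levels[-1])
    # Pretend the MVP copes with up to 300 active concurrent users.
    print(first_failing_level(levels, lambda users: users <= 300))
```

Each level that fails or passes is a data point about what the given architecture, infrastructure and tech stack need; the loop is only the scaffolding around that learning.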


Beyond by 37% Heuristic


I framed this "Beyond by 37% Heuristic" after I failed in framing the tests for performance.  Discussing the rationale of this heuristic is not the purpose of this blog post.  Let us catch up if you are curious and interested, and we will discuss it.

Does a salary hike of 30% indicate high performance?  I don't know!  But a 30% hike is not something commonly given to all, from what I have seen in my career so far.

That said, this 37% has worked for me in the contexts I'm testing.  Did it serve 685 (500 + 185) active concurrent users in a second?  It helps me to draw a requirement analysis of the MVP system for this volume of concurrent active users.

Now, I will step up by another 37% of concurrent active users. That is, 870 (685 + 185) active concurrent users in a second.
  • If you look at it, I now have about 1.74x the expected traffic (870 against 500).  Did the MVP serve it?
    • If yes, how many active concurrent users were served in a second?
    • I will correlate the KPIs of the other integrated systems of the MVP.
      • With the captured data and emotions
        • This will tell us what we should expect, regardless of what the business expects
        • This difference will let us know "the requirements"
          • It shows how to gather information on what has to be optimized, changed, reorchestrated, eliminated, included, and more.
          • We start, technically, to establish and frame the performance SLAs and SLOs between the tech team and the business.
          • Now the performance & its requirements will appear in the dots that are,
            • Being connected
            • To be connected
            • To be disconnected
            • Not yet in existence
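The load levels used in this heuristic are simple arithmetic, sketched below in Python.  The function name `beyond_levels` is hypothetical, and the sketch assumes, as in the numbers above, that each step adds 37% of the original target (185 users for a target of 500) rather than 37% of the previous level.

```python
def beyond_levels(target_users, beyond_percent=37, steps=2):
    """Load levels above the target, each adding a fixed share of the target."""
    if target_users <= 0 or beyond_percent <= 0 or steps < 1:
        raise ValueError("target, percent and steps must be positive")
    increment = round(target_users * beyond_percent / 100)
    return [target_users + increment * i for i in range(1, steps + 1)]


if __name__ == "__main__":
    target = 500
    for level in beyond_levels(target):
        print(f"{level} users is {level / target:.2f}x the expected {target}")
```

For a target of 500 this gives 685 and 870 active concurrent users, which are 1.37 and 1.74 times the expectation from the business.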


To conclude, wherever you shift, take performance engineering along!  Revive its requirements to be healthier!



Note: You should read these blog posts if you have not:
  1. Performance Testing: Unspoken KPIs and The Missing Correlation
  2. Architecture: The Common Shared Understanding -- Part 1
  3. Architecture: Its Aid in Performance Engineering -- Part 2