Friday, October 27, 2023

The Actual Performance Bottleneck is a Test Engineer's Awareness & Practice

 

Bottlenecks are a necessity. We cross bottlenecks every day; we just do not notice them. If you have not read my thoughts on the bottleneck, read them here.

The seventh question from season two of 100 Days of Skilled Testing is:

What are the common performance bottlenecks, and how do you go about pinpointing and resolving them?


I have a different opinion to share on this question. It may look odd, but I see this as the reality.

Let me share what happens to a practicing Test Engineer who hits a bottleneck while practicing a subject to upskill.

  • The time taken to accomplish tasks and their milestones looks unusually high, at the other extreme end.
    • Some attempts do not even start once we enter the bottleneck, for several reasons.
      • One of the reasons is fear of what my peers will think if I fail to accomplish it.
        • I lose focus.
  • It leads to loss of interest, unhealthy confidence and inconsistency.
    • Procrastination kicks in.
    • Without self-motivation, determination and consistency, we give up, clog, and remain in the bottleneck. One fails to make use of the bottleneck to scale oneself.
      • One neither goes back nor comes out of it.
        • We start to blame the bottlenecks.
        • But we fail to understand that bottlenecks are part of the system.


What makes one have a hard time with an existing bottleneck is:

    • Ignoring the bottleneck.
    • Not being aware of the benefits and impact of the bottleneck's existence.
      • Its effects are adverse in the longer run.
      • Likewise, its benefits are immense in the longer run.

    Seen this way, a bottleneck's existence has two extreme ends: good and not so good. Bottlenecks cannot be eliminated or eradicated; they can only be managed with awareness and by knowing how to adapt and scale through them.



    Me, The Actual Performance Bottleneck

    This may appear to be a critical self-evaluation. But it is not. In my experience and practice, I am consistently learning that:

    • If there is any difficulty in communicating about the tests and identifying the information from my tests,
      • It has to do with me, first.
        • I have to communicate it, and why it is so.
    • If I'm not exposing myself to a subject's area and its practice by putting the subject to test,
      • I'm the bottleneck.
    • Further, the other bottlenecks I encounter from all other systems easily compound the problem and its impact.

    Did you see that the bottleneck is a necessity? It drives me to scale and be operable. Without a bottleneck, maybe I would not expose myself to the awareness needed to identify and learn the information.

    Name any kind of testing: I will first be the bottleneck to myself and to my testing if I am not aware of it and not practicing it.

    For example, I am asked and taught to do functional testing at the GUI layer. Why am I not asked and taught to do performance and security tests at the GUI layer? Or, why do I not think of it? Should someone tell me to think of it, practice it, and test for it? If the floor and the industry do not push me to do so, I have to open myself up to create that opportunity and awareness. Did you see that this is my performance bottleneck?

    To test, evaluate and identify [pinpoint] the performance bottleneck in a software system, I have to practice Performance Engineering & Testing. Only then will I be able to identify its existence precisely before pinpointing it. If I do not practice by building and refining the awareness, I cannot explain what I am pinpointing.

    If I am to identify wisely what the performance bottleneck is, I pinpoint myself [the bottleneck] first. If I fix my bottlenecks, I will be able to identify bottlenecks in the system that I test.

    Let me practice performance engineering and help my team and community practice it. If not, I will not be in a position to name a bottleneck, let alone pinpoint it. I will just follow what others say and think it is right.


    Note: I share the above from my personal experience in testing practice, and also from reading [& answering] the performance testing questions posted in the communities' social spaces. I see fixing this and practicing better here as more important, first, than identifying and solving the bottlenecks in the software system.


    Saturday, October 21, 2023

    The "Bottleneck" in a Test Engineer's Eyes

     

    Preference for Bottle Over Jar! Why?


    Have you ever heard "Jar Neck" used when describing a problem or solution?

    • I have heard Bottleneck often and consistently, but not Jar Neck. Why?
    • Be it in Software Engineering or in describing day-to-day problem solving,
      • Bottleneck is referenced, not Jar Neck.

    It looks like people want the bottle but not the bottleneck's speed and benefits. Is a bottle without its neck a jar?!



    Bottleneck Exists for Better Controllability

    • In a bottle, the bottleneck is a solution!  It is not a problem!
      • It is to mitigate any risk and problem that arises from the flow of content in the bottle.

    Yet we describe, learn and communicate the neck of a bottle as an analogy for a problem.


    Are you aware of the Gateway in a software system?

    • The Gateway can be seen as the neck of a bottle, which controls the incoming requests and outgoing responses.
    • Gateway is a necessity.
      • We need the Gateway's neck to adapt in size based on the traffic volume it is handling. Here, the gateway's neck size should adapt and scale contextually.
        • When describing a problem, we are really talking about a bottleneck whose size does not adapt to the context.
        • That adaptability has to be engineered in, to scale in any dimension and magnitude.
          • When this is not done, we equate the software system's problem with a bottleneck as an analogy, which is incorrect! A bottle has its size and its neck size fixed for a purpose and as a solution.
            • The context of a bottle and that of today's systems are different.
              • It is good to draw similarities from General Systems Thinking and observations.
              • But the solution cannot be generic to all systems; it has to be contextual.  The software system has to have its contextual solution.
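To make the "adaptable neck" idea concrete, here is a minimal sketch in Python of a token-bucket limiter whose rate and burst size can be resized at runtime. It is an illustration under assumptions, not how any particular gateway product works: `AdaptiveTokenBucket`, its `resize()` method and the `FakeClock` helper are hypothetical names for this example, and deciding when to resize (which traffic signal, which thresholds) is left to the context.

```python
import time
from typing import Callable


class AdaptiveTokenBucket:
    """A resizable 'neck' for a gateway (hypothetical sketch, not a real gateway API).

    Tokens refill at `rate` per second up to `capacity`; each admitted
    request consumes one token, and requests arriving with no token are rejected.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def allow(self) -> bool:
        """Admit one request if a token is available."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def resize(self, rate: float, capacity: float) -> None:
        """Widen or narrow the neck, e.g. when the observed traffic volume changes."""
        self._refill()                  # settle tokens earned at the old rate first
        self.rate = rate
        self.capacity = capacity
        self.tokens = min(self.tokens, capacity)


class FakeClock:
    """Manually advanced clock so the behaviour is deterministic in a demo."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    clock = FakeClock()
    gate = AdaptiveTokenBucket(rate=2, capacity=2, clock=clock)
    print([gate.allow() for _ in range(3)])   # narrow neck: the third request is rejected
    gate.resize(rate=10, capacity=10)         # traffic grew: widen the neck
    clock.t = 1.0
    print(sum(gate.allow() for _ in range(12)))
```

The neck is never removed here; only its size changes, which is the point the section makes about controllability.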


    So, next time someone in your team or network talks about a bottleneck, do share with them that a bottleneck is for better controllability. Having a contextually resizable and adaptable bottleneck is the need for Software Engineering, not the elimination of the bottleneck.

    In fact, a software system should have, and will have, a bottleneck at some point. And this bottleneck should adapt to the context, letting through and processing what it should.

    Is the runway of an airport a bottleneck when compared to the sky? Is that a solution or a problem? Likewise, a ship has a defined route and does not sail without one. Is this a bottleneck to the ship and its business? An elevator can move only the defined number of people or kilograms allowed, and not beyond that. Is that a bottleneck? The esophagus in the human body has a size which medical science observes as normal and acceptable; for any deviation from that size, medical tests investigate it as a risk and problem. Why? Are the circumference and length of the esophagus a bottleneck to the human anatomy and physiology system?

    An engineering solution will, and should, have a bottleneck at some point. Having a bottleneck that adapts to the context is what one tries to accomplish for a software system's scalability and operability.


    Please do not equate solving a "bottleneck" situation with Agile practice. Does that sound like a joke? I will not be surprised if someone says the bottleneck problem is solved by practicing Agile.


    The Internal Metrics of High Performance Mobile App

     

    Do you have the question: what are KPIs and Metrics, and how do they differ? Then read this blog post. It will help you know the differences, and how each helps the other while remaining distinct.

    The sixth question from season two of 100 Days of Skilled Testing is:

    How do you determine the essential performance metrics and Key Performance Indicators (KPIs) when assessing the performance of a mobile native app?


    Why Do I Focus on Metrics Here?

    In this blog post, I will focus on Metrics and not KPIs. If you have read the blog post on KPIs and Metrics referenced above, you will have learned:

    • KPIs are objectives defined by the business.
      • Knowing this is important.
      • But to accomplish a KPI's objective, one has to extract the contextually suitable metrics from the system and evaluate them.

    As a test engineer, I evaluate from a technical perspective with a business orientation.

    • So, identifying and defining the metrics makes more sense for my role.
      • Hence, I pick the Metrics.

    A KPI example for a mobile app,

    • The set percentage [what percentage?] of five-star ratings in the store with positive emotions in the review.

    Whatever the KPIs may be, I have to correlate them with metrics so that we work towards accomplishing the KPIs set.



    What is Performance for a Native Mobile App?


    The Native App

    A native mobile app is an app developed for a particular mobile platform, such as Android or iOS. Native apps developed for one platform cannot run on another. That is, an Android app cannot run on an iPhone, and vice versa.

    Today, Android apps are typically written in Kotlin, and iOS apps in Swift. Native apps can be developed to run in both offline and online modes, based on their business objectives.

    An easily understood example of a native app is:

    • WhatsApp for Android and iOS devices


    Performance and Native App

    First, my mobile device is not my customer's device. Android device fragmentation makes this much more challenging, given the varied computing power that OEMs offer in their devices.

    Though iOS has its own fragmentation in OS versions and device computing power, it is not huge when compared with the fragmentation of Android devices. This is a challenge for Android mobile apps in all aspects, and especially in performance.

    Performance will indicate different advantages, risks and problems for a native app on a platform and its devices. Performance for native mobile apps can be classified into different areas.

    It can range from deep technical areas to the experience a user expresses while using the app in a given context. Everything is performance here!

    Hence, talking about performance in the mobile app space is not easy. When it is such an ambiguous and hard subject, how can one pull the metrics for performance here?

    From a technical perspective, the word "performance" is not specific; it is vague. If one says the mobile app is performing well, what does it mean?

    • Is it fast?
    • Is it consuming little battery power?
    • Is it consuming less memory?
    • Is it not consuming much network data?
    • Is it smooth to interact with, responsive, and free of jank?
    • Is it intuitive and secure?
    • Is it the number of crashes?
    • Is it getting consistent updates and bug fixes?

    What do you say?

    All these lead to a question: what is a high-performance app? Ask this question of yourself, and of your tech team, business team and consumers. Figure out what a high-performance app is to them. This helps!


    The Common Metrics - Android and iOS


    Below are some of the common metrics for which a Test Engineer extracts data and evaluates it via testing and automation.

    1. App Install Time
    2. App Launch Time
      1. Cold Start
      2. Warm Start
      3. Hot Start
    3. Private Memory Size of App on available Heap
    4. Number of Views
    5. Garbage Collection - Frequency
    6. Data Residual Size and Sharing
    7. Network Payload Size
    8. Energy Consumption in Workflows
    9. Wakelocks and its Impact
    10. Frames Skipped
    11. Open Threads and Processes
    12. Storage & Data Size
    13. App Size

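As one example of extracting data for App Launch Time, the sketch below measures an Android cold start from a test machine. It assumes `adb` is on the PATH with one device connected, and that the app's package and launcher activity are the placeholders `com.example.app` and `com.example.app/.MainActivity` (hypothetical; substitute your own). `am force-stop` kills the app process so the next `am start -W` is a cold start, and the `TotalTime` line that `am start -W` prints is parsed; check the exact output on the Android versions you test before relying on it.

```python
import re
import subprocess
from typing import Optional

# Hypothetical identifiers: replace with your app's package and launcher activity.
PACKAGE = "com.example.app"
COMPONENT = "com.example.app/.MainActivity"


def parse_total_time_ms(am_output: str) -> Optional[int]:
    """Return the 'TotalTime: <ms>' value from `am start -W` output, if present."""
    match = re.search(r"^TotalTime:\s*(\d+)", am_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def measure_cold_start_ms(package: str = PACKAGE,
                          component: str = COMPONENT) -> Optional[int]:
    """Kill the app process so the next launch is cold, then launch and wait for it."""
    subprocess.run(["adb", "shell", "am", "force-stop", package], check=True)
    result = subprocess.run(
        ["adb", "shell", "am", "start", "-W", "-n", component],
        check=True, capture_output=True, text=True,
    )
    return parse_total_time_ms(result.stdout)


if __name__ == "__main__":
    # Repeat a few times: one launch is a sample, not a measurement.
    samples = [measure_cold_start_ms() for _ in range(5)]
    print("cold start TotalTime (ms):", samples)
```

Warm and hot starts need different preconditions (for example, backgrounding the app instead of killing it), so treat this as a starting point for one sub-metric only.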
    And more. Each of these is a deep area of its own for analyzing and tuning performance.

    I have been practicing this for the last 11 years in my testing. It is one of my research areas in Software Testing & Engineering.

    These days, instrumentation also offers different metrics for mobile apps, which can be correlated with the KPIs set for native mobile apps.



    To conclude, start understanding the internals of Android and iOS. It opens you up to performance and the practice of building high-performance apps.

    I'm happy to share if you are interested in practicing these subjects and testing.  Let us connect and converse!


    Wednesday, October 18, 2023

    What are KPIs and Metrics?

     

    I used to have this question: are KPIs and Metrics the same? Especially when I started to learn and practice Performance Testing, this question came back to me often.

    Do you have this question?


    The Use of KPIs and Metrics

    When the business and stakeholders talk about them so much, there should be value in them. What is that value? Why is it important to identify and capture KPIs and Metrics?

    KPIs and Metrics are derived from the data we collect. This data is processed, extracted and normalized so that it is in the state the consumer expects for making a decision.

    The stakeholders will make decisions and take actions referring to KPIs and Metrics. For example,

    • The number of Daily Active Users (DAU) is a KPI and also a metric.
      • But not every metric is a KPI.
    • How many of these DAU closed a transaction within five seconds using a wallet?
      • It is a metric; not a KPI.

    Another example:
    • KPI
      • How many users installed the latest version of the mobile app and have signed in?
        • If there are no active users on the latest version, that indicates a risk and problems for the business.
      • Reopened Tickets in Customer Care
        • This indicates that something is going wrong.
    • Metric
      • What is the average time taken to see the streaming screen for users on a 4G data network?
        • If this is not captured, there is no data for the business to establish a relationship with the KPI set.
      • Average Reopened Tickets in customer service
        • The distribution and time trending towards a lower number
        • The distribution and time trending towards a higher number
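To show how the same collected data yields a KPI and separate metrics, here is a sketch over a hypothetical session log. The `Session` fields and the function names are assumptions for illustration, not any analytics tool's schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical analytics record; real event schemas will differ."""
    user_id: str
    day: str                                  # e.g. "2023-10-18"
    paid_with_wallet: bool
    checkout_seconds: Optional[float]         # None if no transaction was closed
    network: str                              # e.g. "4G", "WiFi"
    time_to_stream_seconds: Optional[float]   # None if streaming never started


def daily_active_users(sessions: List[Session], day: str) -> int:
    """KPI (and also a metric): distinct users active on the day."""
    return len({s.user_id for s in sessions if s.day == day})


def dau_closing_wallet_txn_within(sessions: List[Session], day: str,
                                  limit_s: float = 5.0) -> int:
    """Metric: distinct users that day who closed a wallet transaction within limit_s."""
    return len({
        s.user_id for s in sessions
        if s.day == day and s.paid_with_wallet
        and s.checkout_seconds is not None and s.checkout_seconds <= limit_s
    })


def avg_time_to_stream(sessions: List[Session],
                       network: str = "4G") -> Optional[float]:
    """Metric: average time to see the streaming screen on a given network."""
    samples = [s.time_to_stream_seconds for s in sessions
               if s.network == network and s.time_to_stream_seconds is not None]
    return mean(samples) if samples else None
```

The KPI tells the business whether the goal is being met; the metrics are what a test engineer extracts to explain why, and to correlate with that KPI.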

    KPIs and Metrics are not the same, though both are quantitative measurements. Identifying and knowing the difference between them in your context is important.

    They go hand in hand when setting a direction and taking action, so that the business and stakeholders realign to the defined goals and objectives.



    KPIs vs Metrics






    To conclude this post, investigate your metrics and question why those KPIs were chosen. It will help you design your Test Models and identify the tests in the given context.




    Tuesday, October 17, 2023

    Software Engineering - The Unquestioned Understanding of Client in Testing


    Client - The Unquestioned Understanding

    When we hear "client" or "client-side", most of us assume it means:

    • Web page displayed in the browser
    • Mobile apps - Android and iOS apps
    • Desktop applications

    I see this as one of the unquestioned understandings and assumptions in Software Engineering. While it is not wrong, the client does not always mean a web page displayed in a browser, a mobile app, a desktop application, a terminal, etc.

    The client is one of the subjects we have not tried hard enough to understand in Software Testing & Engineering.

    A client is one who consumes the service in the form of a response, and then does what it has to do.



    Client - The Contextual Entity


    The entity that takes the client's place [role] depends entirely on the context. A web page displayed in a browser is a client in some models. So are mobile apps. A service looking up data from Redis is a client too.

    The client can be within the backend system, requesting another entity to process its request and awaiting the response. This client is not always a mobile app, a web page in a browser, or a terminal where I'm working with commands. Do you see that?
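A small sketch of the idea that "client" is a role: the same timing wrapper records latency whether the caller is a mobile app or a backend service reading from a cache. `ClientTimer` is a hypothetical helper, and the dictionary stands in for Redis so the example runs without a server; with a real Redis client you would pass the actual lookup as the callable.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class ClientTimer:
    """Records latency as observed by whichever entity is playing the client role."""

    def __init__(self, client_name: str) -> None:
        self.client_name = client_name
        self.samples_ms: Dict[str, List[float]] = {}

    def call(self, dependency: str, fn: Callable[[], T]) -> T:
        """Invoke `fn` (the request to `dependency`) and record how long the client waited."""
        start = time.perf_counter()
        try:
            return fn()
        finally:
            # Recorded even if the call raises: a failed wait is still a wait.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.samples_ms.setdefault(dependency, []).append(elapsed_ms)


if __name__ == "__main__":
    # Hypothetical backend service acting as a client of a cache; the dict stands in for Redis.
    fake_cache = {"user:42": "Asha"}
    order_service = ClientTimer("order-service")
    name = order_service.call("cache", lambda: fake_cache.get("user:42"))
    print(order_service.client_name, name, order_service.samples_ms["cache"])
```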

    Next time you hear the word client, ask for the context and find out which client is being discussed.



    Client's Awareness in Performance Tests


    By now, you should be breaking your assumptions and the myths around the word "client".

    In testing for Performance, it is critical to be aware of who the client is, when, and how. Evaluating this client will not be like evaluating a web page in a browser or a mobile app.

    The fifth question from season two of 100 Days of Skilled Testing is:

    How does client-side performance testing contrast with server-side performance testing, particularly in their objectives and area of emphasis?

    I hope you will now ask, when someone says client-side performance:

    • What client are we talking here?
    • Where does this client sit in the system's architecture?

    Based on this information,

    • How the tests for performance are approached and executed for a client differs.
    • What is collected and observed to evaluate the performance of a client differs.



    This blog post is not meant to illustrate the different clients and how to evaluate them for performance. When a question talks about a particular client in a context, I will share the approach and how to evaluate it.

    To end, have we explored the clientless interface?  How to test this interface?

      

    Monday, October 16, 2023

    Performance & Tests: Getting Started and Data Analysis

     

    On running tests,

    • We will have data (information) as one of the byproducts.
    • Analyzing the data of the integrated sub-systems, in isolation and in correlation,
      • leads us to a technical analysis of each integrated system.
    In the report, we draft this analysis along with actions to be taken.

    Note: When I say sub-systems, do not ignore or skip the client or consumer; the system does not comprise just the server.


    No Golden Rule

    There is no one way to do testing. Likewise, there is no one way, or golden rule, to test for performance. It is contextual and depends on what I want to learn.

    In fact, in a few contexts, we can have a value-adding performance test with just one request. I just need to be well aware of what I want to know and learn from this test.
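As a sketch of such a one-request test, the Python function below sends a single GET using only the standard library and reports what the client saw: the status, the time until the status line and headers arrived, and the time until the full body arrived. The URL in the usage line is a placeholder, and bypassing environment proxies is my assumption about what you want to measure. Whether one request is enough depends entirely on what you want to learn from it.

```python
import time
import urllib.request


def single_request_timing(url: str, timeout: float = 10.0) -> dict:
    """Send one GET request and report what the client observed.

    Note: 4xx/5xx responses raise urllib.error.HTTPError instead of returning.
    """
    # Bypass any proxy from the environment so the timing reflects the direct path.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as response:
        headers_ms = (time.perf_counter() - start) * 1000   # status line + headers received
        body = response.read()
        total_ms = (time.perf_counter() - start) * 1000     # full body received
        status = response.status
    return {"status": status, "headers_ms": headers_ms,
            "total_ms": total_ms, "bytes": len(body)}


if __name__ == "__main__":
    print(single_request_timing("http://localhost:8080/health"))  # hypothetical endpoint
```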

    That said, there are multiple interfaces where we can observe, analyze and learn from the performance data collected.

    The fourth question from season two of 100 Days of Skilled Testing is:

    What are your favorite hacks to analyze performance testing results and find anomalies?

    Well, this question does not explicitly mention whether it is for the server, client, database, caching, messaging, or some other interface of a system. It is a question, but to me it looks too generic, and at a point vague. Having said this, that is how the learning journey and curve start!


    Result vs Report

    What is a result?

    • Is it an evaluation after data [information] is put to scrutiny?
    • Or, is the result the data that is collected and not yet interpreted?

    It depends on the individual or team and how it is being practiced.

    The result is different from a report.
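One way to see the distinction, assuming "result" means the raw samples collected during a run: the sketch below reduces raw response times to a few summary numbers. The function name and the chosen percentiles are illustrative assumptions. Even this summary is not yet a report; the report is where those numbers get interpreted against the context and turned into actions.

```python
from statistics import mean, quantiles
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Reduce raw response-time samples (the result) to a few numbers for a report."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points: p1..p99
    return {
        "count": len(samples_ms),
        "mean": mean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }
```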


    Getting Started and Data Analysis


    I should know how the system architecture is designed and orchestrated, with its boundaries and interfaces. This helps a lot. What kind of architecture is this? Is it a monolith? If it is a monolith, my approach to testing for performance differs.

    If I'm asked to start the analysis of data for a system that I'm not aware of,
    • Having learned the architecture and the orchestration of the sub-systems for critical business workflows, I will start by analyzing the indicators below:
      1. CPU usage
      2. RAM usage
      3. Data I/O
      4. Network usage
      5. The heat and sound dissipated from the hardware that holds and binds
        • CPU, RAM, Data I/O, Network and tech stacks installed and configured

    It hints me to look further and investigate with tests when I observe:
    • Having a steady consumption
      • What is steady consumption in this context?
    • Having a low consumption
      • What is low consumption in this context?
    • Having an unusual consumption spike and its fall
      • I follow the pattern to study further
      • What is considered a knee, spike and fall in this context?
    • Having a zero consumption
    • Having a maximum consumption
      • What is maximum consumption in this context?
    High consumption doesn't necessarily mean a problem. Likewise, low consumption does not mean all is well. I have to uncover them to learn what they mean in the given context.
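The observations above can be sketched as a function that flags patterns in a resource-usage series (for example, CPU percent per interval). Every threshold is a parameter on purpose, since what counts as low, maximum or a spike is decided by the context; the function name and the median-based spike rule are assumptions for illustration, and the output is a list of leads to investigate, not a verdict.

```python
from statistics import median
from typing import List, Sequence


def consumption_observations(samples: Sequence[float], *, max_capacity: float,
                             spike_factor: float, low_threshold: float,
                             steady_band: float) -> List[str]:
    """Flag patterns worth investigating in a resource-usage series."""
    if not samples:
        return []
    if all(s == 0 for s in samples):
        return ["zero consumption"]
    observations: List[str] = []
    baseline = median(samples)
    if max(samples) >= max_capacity:
        observations.append("maximum consumption reached")
    # A spike here means a sample well above the series' own median.
    spikes = [i for i, s in enumerate(samples)
              if baseline > 0 and s > baseline * spike_factor]
    if spikes:
        observations.append(f"spike at sample index {spikes}")
    if max(samples) <= low_threshold:
        observations.append("low consumption")
    if max(samples) - min(samples) <= steady_band:
        observations.append("steady consumption")
    return observations
```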

    In each of these, there will be a pattern. I will learn it. I will correlate it with other sub-systems and learn what they were doing in the same timeline.

    Do you recollect this line -- "the architecture should provide the Testability"?
    • I wrote about it in one of the blog posts of Performance Engineering.

    I refer to the following, traversing the timeline:
    • The logs, by asking for them
    • Data recorded
    • Any APMs that are in place
    I correlate all these with the indicators above.
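Traversing the timeline can be sketched as a window join: for each timestamp where an indicator looked unusual, pull the log lines recorded within a chosen window around it. The function name, the `(timestamp, message)` event shape and the window size are assumptions; real logs and APM exports need parsing into this shape first, and the clocks of the systems involved must be in sync for the correlation to mean anything.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import Dict, List, Sequence, Tuple

LogEvent = Tuple[datetime, str]


def correlate(flagged: Sequence[datetime], events: Sequence[LogEvent],
              window: timedelta) -> Dict[datetime, List[str]]:
    """For each flagged metric timestamp, collect log messages within +/- window."""
    ordered = sorted(events)                 # sort by timestamp (then message)
    times = [t for t, _ in ordered]
    result: Dict[datetime, List[str]] = {}
    for ts in flagged:
        lo = bisect_left(times, ts - window)
        hi = bisect_right(times, ts + window)
        result[ts] = [msg for _, msg in ordered[lo:hi]]
    return result
```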

    This gives me a start. It is one of the easiest ways I have to get started with analysis.


    Well, this is for analysis at the server end. What about the client [consumer] end? It is simpler, and I will share it in the coming blog posts.



    Do you want to know more about this and other strategies that can be used contextually? Let us get connected and converse. I'm happy to share, and to learn by listening to you. It is fun and awareness!



    Wednesday, October 11, 2023

    Prioritizing Performance & Its Requirements - The Two Engineering Tasks

      

    How do I gather and prioritize the performance requirements of a student from schools, colleges, universities and society? 

    Note that I said performance. What does performance mean in schools, colleges, universities and society?

    • Have you ever asked yourself this question?
    • If you are living with children, did this question cross your mind, no matter in what class the children are studying?

    This is not a question that can be ignored. Also, this question is not precise or specific to a context. It is a question that resonates but has no acceptable rational basis for whatever context it arises from. The same holds for the performance requirements of a software system.

    If you observe closely, the system in which we live pushes towards what it thinks of as performance. Doesn't it?


    The third question in season two of 100 Days of Skilled Testing is:

    How do you gather and prioritize performance requirements from stakeholders and project documentation?



    Prioritizing Performance & Its Requirements


    If you read attentively, the title of this section says "prioritizing performance and its requirements". I did not say "prioritizing performance requirements".

    That said, what performance means for Netflix is not the same as for the Aadhaar system. But both have prioritized performance and appear aggressive in knowing its requirements. Don't you think so?

    We are in the era of Shift Left.

    • How do we shift left with performance?
    • What in the system should focus on performance in the Shift Left?


    MVP & Performance Engineering Story


    When we take an MVP to market as early as possible, there is a tradeoff. What is considered a subsequent priority, to be compromised on through negotiation between engineering and business?

    The context matters when prioritizing!  Be it for performance or functionality or security or any quality criteria.

    I will interpret the question from the point of view of an MVP's deployment or publishing.

    • Are you asking why I have picked MVP?
      • Performance is contextual, and it depends on multiple touchpoints, boundaries and interfaces of a system.
      • I cannot talk about all of those in this blog post.
      • Nor do I want to talk about the KPIs and metrics.
    • I want to share something you can pick up, consume, and apply in your work.


    Now, we have prioritized performance for an MVP, haven't we? Prioritized means it matters, it concerns us, and we are okay to compromise on a few other things for it. Let us jump Left with the MVP in our hands to identify the business's requirements.

    As a business, we will have a rough idea of how we are pitching and selling our services, and to whom. As a test engineer, you can sense the key transaction [business workflow] in the MVP. You will know the touchpoints, interfaces and boundaries in the architecture that communicate and work together to keep the MVP delivering value. Don't you?

    Say the business wants the MVP to support and serve 500 requests per second. I should know what the 500 requests mean here.

    Is it,

    • Concurrent Requests?
    • Active Concurrent Requests?

    This matters! The two are not the same. Have you asked this question? It is a requirement we often fail to capture.
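One way to see why "500 requests per second" is ambiguous is Little's Law (L = λ × W) and the interactive response time law (N = X × (R + Z)) from queueing theory. Treating "active concurrent requests" as requests in flight, and "concurrent users" as users in session including their think time, is my assumption for this sketch; the figures in it are hypothetical.

```python
def in_flight_requests(throughput_rps: float, avg_response_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in the system at once."""
    return throughput_rps * avg_response_s


def concurrent_users(throughput_rps: float, avg_response_s: float,
                     think_time_s: float) -> float:
    """Interactive response time law, N = X * (R + Z): users in session, busy or thinking."""
    return throughput_rps * (avg_response_s + think_time_s)


if __name__ == "__main__":
    # Hypothetical figures: 500 requests/s, 200 ms average response, 5 s think time.
    print(in_flight_requests(500, 0.2))
    print(concurrent_users(500, 0.2, 5.0))
```

With these hypothetical figures, the same 500 requests per second means roughly 100 requests in flight but about 2,600 users in session, which is why the question is worth asking.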



    Capturing Performance Questions for a MVP

    It is about awareness, first! How much am I opening myself up to awareness? This brings energy, and it is contagious.

    How do I bring performance awareness into my team so that it is engineered into the system we develop? This is a cultural drive for an organization.

    Now I know the MVP has to serve 500 concurrent active users in a second, per the business's expectation, to meet its reach and target. If I do not know this, I have to capture this data first. How do you capture it?



    A Use Case to Ponder

    One use case that could spark thinking is: how should Disney+ Hotstar's services perform to live stream the India vs Pakistan match of the Cricket World Cup 2023 on 14th Oct 2023?

    • How should it capture the performance and its requirements for this day?
      • How should this system scale to crores of viewers streaming the live video of the match?
      • How should this system scale for gamification (emoji, chat and other viewer engagement) while crores of viewers make requests from the client interface?
    Try to play the past 30 minutes of this live video. What did you see? Why? That is part of the performance engineering strategy!

    This use case opens up different topics of Architecture and Performance Engineering. Be aware of them and explore them. This is not what we want to talk about now. We want to talk about an MVP and how to capture its requirements for a better performance experience.



    Step Up by 5% Heuristic


    Once I have a test environment close to the production context, and test data that looks realistic, I get started.

    I framed this "Step Up by 5% Heuristic" a few months after starting the practice of Performance Engineering and Testing. I failed, and I learned. I'm learning.

    I know the expectation is 500 active concurrent users per second.

    I will start to evaluate the integrated systems of the MVP at 5 percent of 500.
    • What is 5% of 500?
      • I will start with requests from 25 concurrent active users for the MVP.
        • I will observe the emotional experience of using the MVP during this time.
        • I will monitor and record the KPIs and other needed data.
      • Does it fail to serve 5% of the concurrent active users?
        • If it fails, I know what to do now. It helped me.
          • This helps me draw the requirements better and more rationally for the existing system's architecture.
        • If it succeeds, it has partially helped me know what I actually wanted to know.
          • I will raise the active concurrent users to 10%
          • That is, increasing it by another 5% of 500
            • I repeat these tests until the MVP architecture lets me know the requirements it has for the performance of serving 500 active concurrent users in a second
              • Read the above sentence again
              • The tests on the MVP will let me learn what its performance requirements are for serving 500 active concurrent users in a given architecture, infrastructure, and tech stack
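The step-up loop above can be sketched as follows. `run_level` is a hypothetical callable you supply: it drives the given number of active concurrent users against the MVP and returns whether the level was served acceptably, by whatever criteria your context sets. The sketch stops at the first level that is not served, since that is where the requirement-gathering starts.

```python
from typing import Callable, List, Optional, Tuple


def step_up_levels(target: int, step_pct: int = 5) -> List[int]:
    """Load levels at step_pct%, 2*step_pct%, ... up to 100% of the target."""
    # If step_pct does not divide 100 evenly, the last level stays below the target.
    return [target * pct // 100 for pct in range(step_pct, 101, step_pct)]


def step_up(target: int, run_level: Callable[[int], bool],
            step_pct: int = 5) -> Tuple[List[int], Optional[int]]:
    """Run each level in order and stop at the first one that is not served.

    Returns the levels that were served and the first failing level (or None).
    """
    served: List[int] = []
    for users in step_up_levels(target, step_pct):
        if not run_level(users):       # hypothetical: your load test for this level
            return served, users
        served.append(users)
    return served, None


if __name__ == "__main__":
    print(step_up_levels(500)[:4])     # [25, 50, 75, 100] active concurrent users
```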


    Beyond by 37% Heuristic


    I framed this "Beyond by 37% Heuristic" after I failed in framing the tests for performance. Talking about the rationale of this heuristic is not the purpose of this blog post. Let us catch up if you are curious and interested; we will discuss it.

    Does a salary hike of 30% indicate high performance? I don't know! But a 30% hike is not commonly given to all, from what I have seen in my career so far.

    That said, this 37% has worked for me in the contexts I'm testing. Did it serve 685 (500 + 185) active concurrent users in a second? It helps me draw a requirement analysis of the MVP system for this volume of concurrent active users.

    Now, I will step up by another 37% of the expected 500 concurrent active users. That is, 870 (685 + 185) active concurrent users in a second.
    • Seen this way, I now have about 1.74x the expected traffic (870 / 500). Did it serve?
      • If yes, how many active concurrent users were served in a second?
      • I will correlate the KPIs of the other integrated systems of the MVP
        • With the captured data and emotions
          • This will tell what we should expect, regardless of what the business expects
          • This difference will let us know "the requirements"
            • How to gather information on what has to be optimized, changed, reorchestrated, eliminated, included, and more.
            • We technically start establishing and framing the Performance SLAs and SLOs between the tech team and the business.
            • Now the performance & its requirements will appear in the dots that are,
              • Being connected
              • To be connected
              • To be disconnected
              • That does not exist
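The arithmetic of the heuristic as described here adds the same 37% of the original 500 (that is, 185) at each step, rather than compounding on the previous level. A small sketch, with a hypothetical function name and parameters:

```python
from typing import List


def beyond_levels(target: int, pct: int = 37, steps: int = 2) -> List[int]:
    """Levels beyond the target, each adding pct% of the ORIGINAL target (not compounded)."""
    increment = target * pct // 100    # integer users; 37% of 500 is exactly 185
    return [target + increment * k for k in range(1, steps + 1)]


if __name__ == "__main__":
    target = 500
    for level in beyond_levels(target):
        print(level, f"({level / target:.2f}x the target)")
```

For a target of 500 this gives 685 and 870 active concurrent users, that is, 1.37x and 1.74x the target. For targets where 37% is not a whole number, the floor division rounds the increment down.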


    To conclude: shift wherever you need to, but take performance engineering along, together! Revive its requirements to be healthier!



    Note: You should read these blog posts if you have not:
    1. Performance Testing: Unspoken KPIs and The Missing Correlation
    2. Architecture: The Common Shared Understanding -- Part 1
    3. Architecture: Its Aid in Performance Engineering -- Part 2


    Thursday, October 5, 2023

    Architecture: The Common Shared Understanding -- Part 1

     

    When we are developing a software system, a stakeholder does not state the requirement as 'Fast', 'Scalable' or 'Responsive'. But the stakeholder needs it and expects it. Seen in the larger picture, software system development and maintenance is a job of balancing, too.


    When a Software Architect [Technical Architect and Test Architect] works on architecting the software system and testing for the same,

    • It is about balancing the technical aspects with the business's requirements from stakeholders.  
      • Do you see that?


    Knowing the architecture of a software system, and testing it, is one of the primary tasks for engineers on the project, because we software engineers have to balance it well. Balance what? Balancing the technical aspects together with the business's requirements from stakeholders.


    This blog post is part of the 100 Days of Skilled Testing series.  The second question posted for season 2 is,

    How important is the understanding of application architecture to do performance testing better?

     

    What is an Architecture?

    In the context of Software Development & Engineering, "architecture" is one of the most ambiguous words among the teams in a project and across an organization.

    As a test engineer,

    • Have you ever had a discussion, argument, or debate with a programmer and an architect?
    • I have had such discussions, and I continue to have them today in the projects I work on.
      • They help me know and understand,
        • What should I be doing, first as an engineer and then as a Test Architect in my role?


    The outcome of these discussions showed me,

    • We did not all have a common understanding of it.
      • We did not share,
        • "What do I understand by architecture, and by this architecture?"

    The primary goal of a Software System's Architecture is,
    • All of us engineers on a project have the same understanding of it, in the aspects it exists for.
    • This understanding is arrived at after we have put our thoughts under scrutiny and decided to stick to it, so that,
      • We can balance well between the technical requirements and the business expectations.
    Are you with me, so far?



    A Software System's Architecture is,
    1. A common shared understanding that we all have of,
      • What are we developing, testing, and about to maintain?
        • And, Why? Who? When? Where? How?
    2. Represents the boundaries and interfaces of what matters,
      • What is [to be] orchestrated, designed, and implemented, how it communicates, and what it will and will not have.
      • It can also show how the teams are structured and how the organization is organized.
        • For example, in a Service Oriented Architecture, the teams are built and structured around the services they offer.
      • It is a model that is better than other models in a given context of technical requirements and business needs
    3. The context and awareness for,
      • Why it is the way it is?
        • The cost and value of being so.
      • What to do when it has to be changed? Where to change?
        • How simple and quick is it to change?
          • What are the cost and value of being so?
      • How can I monitor and observe all these consistently?
    4. The Gateway of Testability: it tells what Testability is available for,
      • Knowing what is critical and a priority to test
      • Where, How, When, and What tests can be framed, designed, and executed? To what extent?
        • Why these tests?
        • If an architecture does not talk about Testability and does not provide it, we have a serious problem!  This has to be fixed first, on priority.
          • An architecture has to provide the Testability and Programmability scope and opportunities to develop a software system that is of value!

    For today, this is my understanding of "Architecture".

    I'm a Test Architect in my role, and I expect myself to be a hands-on engineer first.  It is a necessity for an architect to be a hands-on engineer.




    Note: Read Part 2 of this blog post here.


    Architecture: Its Aid in Performance Engineering -- Part 2

     

    I hope you and I have a common shared understanding of the word "architecture".  If not, read this blog post and then come back here to learn about the dots.


    Do I Know the Dots?

    Before connecting the dots, I should know,

    • What are the dots, and how, where, and when do I identify them?
    • Who can help me in doing so?

    In Software Security & Engineering, we use a Threat Model to,
    • Identify the risks, surface area, and tests, and to develop the payloads.
    • A software system's architecture helps in developing and improving the Threat Model consistently.

    I see the same for a software system's Performance Engineering & Testing.  To test better for performance,
    • I need to identify the dots, risks, surface area, tests, payloads, monitoring aids, and the correlation of all these.
    • Understanding the software system's Architecture is a necessity to do so.
      • But, what are the dots here?

    The dots can be identified when I know how to use the Testability provided by the architecture.  This leads me to evaluate the performance of the boundaries and interfaces in isolation and as a whole, and then correlate them.
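    One way to picture "in isolation and as a whole, and then correlate" is to put the per-boundary timings next to the end-to-end timing and look at the gap.  This is a minimal sketch under stated assumptions: the boundary names and millisecond values are hypothetical, and the boundaries are assumed to be called one after another (parallel calls would make the gap negative).

```python
def boundary_breakdown(stage_ms: dict[str, float], end_to_end_ms: float) -> dict[str, float]:
    """Return each boundary's share of the end-to-end time, plus the unaccounted gap.

    stage_ms: time measured for each boundary/interface in isolation (hypothetical names).
    end_to_end_ms: time measured for the same flow as a whole.
    The "unaccounted" share is time spent between boundaries (network hops,
    queues, serialization) that no isolated measurement captured.
    """
    if end_to_end_ms <= 0:
        raise ValueError("end_to_end_ms must be positive")
    shares = {name: ms / end_to_end_ms for name, ms in stage_ms.items()}
    shares["unaccounted"] = (end_to_end_ms - sum(stage_ms.values())) / end_to_end_ms
    return shares


# Illustrative numbers only: three boundaries measured alone, then the whole flow.
isolated = {"api_gateway": 20.0, "order_service": 50.0, "database": 80.0}
print(boundary_breakdown(isolated, end_to_end_ms=200.0))
```

    A large `unaccounted` share is itself a dot to connect: time is being spent between the boundaries, in places none of the isolated tests looked at.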


    This puts a question to me: What is the performance of this architecture?
    • I did not say the performance of the software system; I said, of the architecture.
    • There should be some characteristics to identify and evaluate the performance engineering models offered by the architecture. 
      • What are they?  

    The architecture's characteristics help,
    • To identify and distinguish the dots with ease, and to test better.
      • How can I test for the performance aspects of a software system using the architecture's characteristics?
      • How do I identify these characteristics in the architecture of software system?



    The Characteristics of an Architecture


    Last year I read an article from ByteByteGo System Alliance.  This article flashed into my mind as I read the question,
    How important is the understanding of application architecture to do performance testing better?
    This is one of the articles I refer to when identifying the performance characteristics of an architecture.  I use the cheatsheet shared in this article as my reference.

    In a few projects and organizations that I have worked in, most of these characteristics were put into practice in the production environment.  I monitored them under usual traffic and unexpected traffic.  The feeling is something I cannot describe in words; it has to be experienced.



    Performance Engineering Aided by Architecture


    Which characteristics of the architecture are associated with which boundaries and interfaces of a software system?  Knowing this helps you and me think: What has to be tested for performance at this interface in this boundary?

    I want to share my work experience here.  But I feel that sharing something we can all relate to will help more in knowing: Why is it important to know the architecture to do performance testing?

    Below is a recent use case from the software industry for the same.
    • I will not explain it in detail; instead, I will bring out the key points relevant to the context of this blog post.


    Amazon Prime's Audio-Video Monitoring Service Moving to Monolith Architecture

    What You and I Should Know:
    • The complete Amazon Prime system did not move to Monolith.
      • The Prime Video's Audio-Video Monitoring Service moved to Monolith.
        • Why?
          • This monitoring system, which was orchestrated with a microservice architecture, did not scale beyond a limit.
          • Problem Statement:
            • Say the Prime Audio-Video Monitoring Service expected a load of 100 active concurrent users streaming the movie Kantara in Kannada audio.
            • After the 6th user started streaming the same video [in the same or a different audio language], this monitoring system did not scale to include the remaining 95% of concurrent users.
            • As a result, the Prime Audio-Video Monitoring Service slows down [or stops], and eventually video streaming to the active concurrent users takes a hit.
            • This monitoring service is important so that each user gets video and audio of the agreed-upon quality and streaming.
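    In testing terms, the problem statement above is a scaling ceiling, found by stepping up the load.  A minimal sketch, assuming a step-load test that records how many of the offered concurrent users the monitoring service actually covered at each step; the function names are hypothetical, and the simulated data only mirrors the 100-user illustration, not Prime Video's measurements.

```python
def scaling_ceiling(served_at_load: dict[int, int]) -> int:
    """Walk the load steps upward; return the last load at which every user was served."""
    ceiling = 0
    for load in sorted(served_at_load):
        if served_at_load[load] < load:
            break  # first step where some users were left out
        ceiling = load
    return ceiling


def unserved_fraction(served_at_load: dict[int, int], load: int) -> float:
    """Share of the offered users at this load step that were not served."""
    return (load - served_at_load[load]) / load


# Simulated step-load results for the illustration: coverage stops at 5 users.
served = {users: min(users, 5) for users in range(1, 101)}
print(scaling_ceiling(served))         # 5
print(unserved_fraction(served, 100))  # 0.95
```

    Walking the steps in order, rather than taking the maximum fully served load, stops at the first drop, which is where the performance test should start asking "why here?".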
    What I understand is,
    • This monitoring service continued in production while the team looked for a better-performing solution within the given architecture.
    • Meanwhile, the cost of this architecture was high whenever it had to scale up.
    • From what I see, the Prime Video business bore this cost for some time.
    But an online streaming business cannot settle for paying such a high cost, especially while it is planning to stream live sports action in the coming days.  This created a need to look into the performance characteristics of the architecture of the monitoring service in use.

    The team eventually re-orchestrated the existing components into a new architecture.
    • It moved from a distributed microservice system to a monolith, where the error detectors (earlier orchestrated with AWS Step Functions) were scaled vertically.
    • Along with this, the monitoring service was re-architected so that most of its components run in one process.
    • This eliminated the S3 bucket as intermediate storage for video frames (as images) and audio files.
    • This architecture improved the creational, behavioral, structural, and functional characteristics of Prime Video's Audio-Video Monitoring service.
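    The effect of dropping intermediate storage can be reasoned about by counting storage calls.  The sketch below is a hypothetical model, not Prime Video's code: `CountingStore` stands in for an object store such as S3 and only counts calls, and the detectors are placeholder functions.  In the distributed shape, every frame is written once and read back by each detector; in the single-process shape, frames are handed over in memory.

```python
from typing import Callable, Iterable


class CountingStore:
    """Hypothetical stand-in for an object store (e.g. S3) that counts calls."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.puts = 0
        self.gets = 0

    def put(self, key: str, data: bytes) -> None:
        self.puts += 1
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        self.gets += 1
        return self.objects[key]


Detector = Callable[[bytes], object]


def run_distributed(frames: Iterable[bytes], detectors: list[Detector],
                    store: CountingStore) -> list[object]:
    results = []
    for i, frame in enumerate(frames):
        key = f"frame-{i}"
        store.put(key, frame)                       # producer writes the frame out
        for detect in detectors:
            results.append(detect(store.get(key)))  # each detector reads it back
    return results


def run_in_process(frames: Iterable[bytes], detectors: list[Detector]) -> list[object]:
    # Same work, but frames are passed in memory: no storage calls at all.
    return [detect(frame) for frame in frames for detect in detectors]


frames = [b"frame-a", b"frame-bb", b"frame-ccc"]
detectors: list[Detector] = [len, lambda f: f.endswith(b"c")]
store = CountingStore()
assert run_distributed(frames, detectors, store) == run_in_process(frames, detectors)
print(store.puts, store.gets)  # 3 6
```

    The results are identical; only the number of storage round trips differs, and that number grows with frames × detectors.  This is the kind of cost a performance test can surface once the architecture shows where the data travels.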

    Prime Video says that, after testing for performance and moving to a monolith by rearranging the existing components,
    • It saved 90% of the cost.
    • What would 90% of this cost for Amazon be in Indian Rupees?
    • How much would a tester get paid if just 1.5% of this 90% were paid as a monthly salary?
      • I will leave this to your thoughts and calculation.
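    For readers who want to try the calculation, here is a minimal sketch using Python's `decimal` module, so that money is not distorted by binary floating point.  The post gives no figure for Amazon's original cost, so `baseline_cost_inr` is a placeholder you supply (already converted to rupees at the day's exchange rate); only the 90% and the 1.5% come from the text.

```python
from decimal import Decimal, ROUND_HALF_UP


def savings_and_salary(
    baseline_cost_inr: Decimal,
    saved_fraction: Decimal = Decimal("0.90"),    # "It saved 90% of the cost"
    salary_fraction: Decimal = Decimal("0.015"),  # "1.5% of this 90%"
) -> tuple[Decimal, Decimal]:
    """Return (savings, monthly salary) in rupees, rounded to paise."""
    paise = Decimal("0.01")
    savings = (baseline_cost_inr * saved_fraction).quantize(paise, rounding=ROUND_HALF_UP)
    salary = (savings * salary_fraction).quantize(paise, rounding=ROUND_HALF_UP)
    return savings, salary


# Placeholder baseline, NOT Amazon's real number: 10 lakh rupees.
savings, salary = savings_and_salary(Decimal("1000000"))
print(f"Savings: Rs. {savings}, monthly salary: Rs. {salary}")
```

    Swap in your own estimate for the baseline; the rest of the arithmetic stays the same.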

    Knowing the architecture, and where to look for which characteristics, helps me think of the right performance tests for the context.

    Hotstar's introduction of emojis during live cricket matches in 2018, and its continuous improvement in processing them for performance, is another good use case for the question: Why is it important to know about the architecture to do performance testing?




    To conclude, architecture cannot be ignored in Testing.  It plays a critical role in aiding and identifying the testability and the BCFS (behavioral, creational, functional, and structural) characteristics of a software system.


    Wednesday, October 4, 2023

    Performance Testing: Unspoken KPIs and The Missing Correlation

     

    I love Performance Engineering!  In a nutshell, performance is all about capability within a given capacity, in a context.  The context is a critical element in Test Engineering.  In the common language of a workplace, performance is about the productivity expected and delivered.

    This blog post is part of the 100 Days of Skilled Testing series.  The first question posted for season 2 is,

    What key performance metrics or KPIs (Key Performance Indicators) do you consider when defining performance requirements?  Do you use the same metrics for client and server-side performance testing?  If not, these differ in what way?


    KPI Classification

    At a high level, irrespective of the boundary and interface of a software system, Performance KPIs can be classified into two:

    1. Service Oriented
      • It helps to learn by correlating,
        • How well a system is providing the service to the intended users in a given context?
        • Where is a system failing to provide the service to the intended users in a given context?
        • For example, Response Time, Availability, etc.
    2. Efficiency Oriented 
      • It helps to learn by correlating,
        • How well an application uses its features and resources in a given context?
        • Where is an application having trouble making use of its features and resources in a given context?
        • For example, Utilization of Resources, Throughput with the resources available in the context, etc.
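    The two classes can be made concrete with a small calculation over the results of one test window.  This is a sketch under stated assumptions: the `Request` record, the CPU-busy-seconds input, and the function names are hypothetical, and real tools report these numbers in their own formats.  Service-oriented KPIs look at what users received; efficiency-oriented KPIs look at what the same work cost in resources.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Request:
    latency_ms: float
    ok: bool


def service_kpis(requests: list[Request]) -> dict[str, float]:
    """Service oriented: how well (or not) the users were served."""
    served = [r for r in requests if r.ok]
    return {
        "availability": len(served) / len(requests),
        # statistics.mean raises StatisticsError if nothing was served.
        "mean_response_ms": mean(r.latency_ms for r in served),
    }


def efficiency_kpis(requests: list[Request], window_s: float,
                    cpu_busy_s: float, logical_cores: int) -> dict[str, float]:
    """Efficiency oriented: how well the resources in the context were used."""
    throughput = sum(r.ok for r in requests) / window_s
    return {
        "throughput_rps": throughput,
        # Busy CPU-seconds over the CPU-seconds available in the window.
        "cpu_utilization": cpu_busy_s / (window_s * logical_cores),
        "throughput_per_core": throughput / logical_cores,
    }


reqs = [Request(100, True), Request(200, True), Request(300, False), Request(400, True)]
print(service_kpis(reqs))
print(efficiency_kpis(reqs, window_s=2.0, cpu_busy_s=3.0, logical_cores=4))
```

    The same `reqs` list feeds both functions; correlating the two outputs is what tells you whether a good response time was bought with an unhealthy amount of CPU.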


    Performance Engineering & Metrics

    You will be aware of the other commonly spoken or written metrics that a performance tool offers or has in its glossary.  I want to focus on the KPIs we are not aware of, or not exposed to, in communication or in the Performance Engineering & Testing Report.

    The metrics in Performance Engineering & Testing depend on what in the system I am putting under evaluation.

    For example, we often do not consider how a feature is implemented and how many threads it can spawn and use at a given point in time.  This further leads to the CPU and its logical cores, RAM, Disk I/O, and the types of servers or machines used.  Another important piece of data is how the OS is set up and configured where the different tiers of the system are deployed or installed.  These metrics apply whether I am facing the client end or the server end, yet they are missed.
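    Capturing this context can be automated so that every report carries it.  A minimal sketch using only the Python standard library, assuming the test harness runs on the machine being described; the `environment_snapshot` name is hypothetical.  Total RAM is read through POSIX `sysconf`, which is unavailable on Windows, so it falls back to `None`; Disk I/O counters and the server type need other sources (for example, the third-party `psutil` package or the cloud provider's instance metadata).

```python
import os
import platform
import threading
from typing import Optional


def total_ram_bytes() -> Optional[int]:
    """Physical memory via POSIX sysconf; None where that is not available."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def environment_snapshot() -> dict[str, object]:
    """Context to attach to a performance report next to the usual KPIs."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "logical_cores": os.cpu_count(),             # None if undeterminable
        "total_ram_bytes": total_ram_bytes(),
        "active_threads": threading.active_count(),  # threads alive in this process now
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in environment_snapshot().items():
        print(f"{key}: {value}")
```

    Taking the snapshot at the start and end of a run, and storing it beside the response times, gives the technical reader something to correlate against.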

    If you notice, these metrics are hardly spoken of or heard.  But they play a crucial role.  A compiled performance report should have this data so that technical people can correlate it with the other commonly heard metrics.

    Being aware of the KPIs commonly seen in a Performance Report is a must.  But knowing what we are unaware of and/or missing, beyond the known, is a necessity!  This is what goes missing in the correlation when evaluating a system from a performance perspective.


    The Missing Correlation

    Identify what in your report needs to be correlated.  The statistics or numbers are a representation, not a derivation.  They are of no use until I derive something from them through correlation.

    I have to derive the correlation both by including and by eliminating representations.  To do so, the missing representation in the correlation is a must, so that the KPIs look rational, technical, logical, and analytical.
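    Deriving, rather than just reading, can start with something as plain as a correlation coefficient across test runs.  The sketch below is illustrative only: the column names (`p95_ms`, `worker_threads`, `gc_pauses`) and their values are made up, Pearson's r is one choice among several, and a strong coefficient shows that two representations move together, not that one causes the other.  The `exclude` parameter mirrors deriving by eliminating representations.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no correlation")
    return cov / (sx * sy)


def correlate(report: dict[str, list[float]], target: str,
              exclude: frozenset[str] = frozenset()) -> dict[str, float]:
    """Correlate every other column in the report with the target KPI."""
    return {
        name: pearson(values, report[target])
        for name, values in report.items()
        if name != target and name not in exclude
    }


# Made-up results of four test runs, one value per run.
report = {
    "p95_ms": [100, 150, 200, 250],
    "worker_threads": [8, 16, 24, 32],
    "gc_pauses": [4, 3, 2, 1],
}
print(correlate(report, "p95_ms"))
print(correlate(report, "p95_ms", exclude=frozenset({"gc_pauses"})))
```

    If a column such as `worker_threads` was never captured, it cannot appear here at all; that is the missing representation the correlation depends on.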

    Uncover what is missing in your correlation so that the KPIs' representation makes sense!