Testing Garage: 2024

Monday, December 30, 2024

Testing Debt -- It Exists and Hits Every Day in All Environments

As an engineer on the team, I see discussions and short conversations on Technical Debt. No matter what, there will be Technical Debt in the software system we build and deliver.

Likewise, when there is Technical Debt, there will be Testing Debt for sure. Identifying and learning the magnitude and impact of the Testing Debt is part of my job.

My Understanding of Technical Debt

What is Technical Debt?

It is the cost of additional rework caused by choosing the quickest solution rather than the effective solution.
Making decisions based on speed above all else is one factor that leads to Technical Debt.
Technical Debt has pros and cons.
Unintended Technical Debt does not come to notice immediately.

This is one of the challenges we have to deal with.

That is because intentional Technical Debt is what we are aware of from the decisions made.
The impact of unintended Technical Debt has to be managed by every team involved in the software development.

Starting With The Testing Debt

I don't know if the term Testing Debt exists in the industry!

I have been using the term "Testing Debt" for the last 9 years in my practice.

I use this term to share and keep stakeholders informed about the rework which we have to do as a result of Technical Debt.
On reading what Technical Debt is, can we relate it to what Testing Debt is?
Most times, the testability, automatability and observability characteristics of a system will be affected as a consequence of Technical Debt.
Further, to compound the cascading effect, the Testing Debt will come in.

One of the major impacts of the Testing Debt is the change in the deterministic capability seeded into a test.

Reworking this deterministic capability is not simple and straightforward in all cases, given the changes introduced in the business operations, tech layer and infrastructure.

When I say there is Technical Debt arising from what we are doing and delivering, it also means there is Testing Debt created as a consequence.

To speak in terms of engineering, testing is part of the technical activity.
It looks funny to me when one portrays the testing team as non-technical and labels it with terms.

For example, manual, repeatable, repetitive, etc.

It is a cycle, and it is repeatable to an extent. Repeatability is one part of the engineering cycle and process.

Do You Know Your Testing Debt?

To remind you and me, testing is sampling! How do you sample when you have growing Technical Debt and Testing Debt?

I will share one common Testing Debt which, I assume, every one of us will have.

It is the ask for in-sprint automation and to automate everything. Is this a fair and practical ask? That is not the line to discuss here. But I will tell you how we, the testing team, will be hit by the Testing Debt in delivering on this expectation set by business and stakeholders.

Technical Debt leads to rework in engineering sooner or later, and it will lead to major rework in Test Engineering. Doesn't it?

Say you have worked to build the testing infrastructure, regression suite and automation suite, and integrated them with the pipeline. Now, there is rework and change in the system as a fix for a few Technical Debts.

Does this affect the testing infrastructure, regression suite and automation suite that you have in place?

Do we have [or are we given] enough time and resources for the Testing Team to work on and fix this and keep the pipeline running seamlessly?

This is not just about time and resources!

For the change in tech stack and engineering of the system, the Test Engineering has to adapt to it and challenge it.

If Test Engineering does not challenge the engineering of the software system, we are not in a position to see the perspectives of risks and problems.
Eventually, we will not even be in a position to learn what the no-risk perspectives and aspects of the system are.

How do I manage this so that I can turn around quickly to the context and keep the pipeline smooth?

This is a challenge to every Test Engineer who faces the effect of Technical Debt.

We Test Engineers witness and experience Technical Debt, which also includes Testing Debt, in different forms and intensities.

Wait! As you read this blog post, did you try to think of the intended Testing Debt in your project?

What is the unintended Testing Debt and its impact?

This is not easy to identify and learn, while we can identify and learn the unintended Technical Debt to some extent.

This will exist; it will impact and hit all the environments, every day!

Testing Debt and Test Engineer

Note this: building Test Engineering solutions that withstand the effects and changes from Technical Debt is a skill. It is possible.

For this, the Technical Debt should not damage the deterministic attribute which is seeded into a test.
Picking and integrating such a deterministic layer within the layers of testability, automatability and observability is a mark of a skilled Test Engineer and of Test Engineering.
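
One way I picture a deterministic attribute seeded into a test is a test that fixes the inputs which normally vary, such as time and randomness. The Python sketch below is only an illustration under that assumption; `make_order_id`, the fixed clock and the seed value are hypothetical names of mine and do not come from any system discussed here.

```python
import random
from datetime import datetime, timezone


def make_order_id(now, rng):
    """Hypothetical function under test: builds an ID from a date and a random suffix."""
    suffix = rng.randint(1000, 9999)
    return f"ORD-{now:%Y%m%d}-{suffix}"


def deterministic_order_id():
    # The test seeds both sources of variation, so every run sees the same inputs.
    fixed_now = datetime(2024, 12, 30, tzinfo=timezone.utc)  # assumed fixed clock
    seeded_rng = random.Random(42)                           # assumed fixed seed
    return make_order_id(fixed_now, seeded_rng)


# Two runs give the same ID because nothing is left to the environment.
first = deterministic_order_id()
second = deterministic_order_id()
print(first == second)
```

If a change made to pay off Technical Debt moves the clock or the random source inside the function, the test loses this seeding and has to be reworked. That rework is what I call Testing Debt.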

Technical Debt and Testing Debt are not the same. But, the Testing Debt is the outcome of a Technical Debt and has a relation to it.

Testing does not drive the change in how the system is implemented; or, I have not experienced it so far. Wait! The outcome of the testing can and has changed how the system is implemented. These two sentences are not the same.

To end here, how do I manage myself with all these debts? We are also in the job where we have to grow together by talking, negotiating and solving the debts that are the outcome of decisions from business and stakeholders.

Monday, December 23, 2024

The Odds and Otherside of Mentoring & Community Work

The space of mentoring and association is not easy to understand in the beginning days, especially when a mentor does not come from a job with a big designation and a social media following. I got hit by these waves. Now, I know how to balance my swimming with these waves. I swim smoothly without losing my energy and focus.

I try to understand what could be going on in the mind at the other end.

Here in this blog post, I'm sharing what I experience, and what I'm told by a few who want to pair up and practice.

The intention of this write-up is to share the minimum I look for; there is no intention of hurting or talking bad of anyone.

Do Not Disclose My Name and My Practice With You

I was approached by a few fellow engineers from the software testing community. After a few sessions of pair practicing and learning together, I was asked not to mention their names or talk about the practice's accomplishments in the community. I respect and acknowledge this ask. But then, I asked why so! I did not see any appropriate reason or concern expressed for doing so. I have not shared anything about our practices and accomplishments so far.

That said, I see these mentees and a few testing communities had no concerns in tagging and getting associated with others on social media and in the community spaces.

The curious me tried to uncover what could be the reason for this. I learn these are the first few reasons:

No big job title and role mentioned on my LinkedIn profile
Not having a major social media following
Not speaking often in conferences, and not being given an opportunity for not having a title or big company name in my LinkedIn profile
Not being a panelist in any conferences
Not being a panel member in the discussion
Not being an AMA person in social media and community space
Not working in an organization whose brand name brings a crowd to conferences or benefits to one
Not getting into unnecessary discussions or spaces that highlight the exchange of words [not the thoughts]
For not accepting unequal gestures and treatment
For practicing deep
For offering assistance when it is not asked

This thing, I stopped!
I learned my lessons

Helping and assisting with no expectation in return

When you do not ask for any returns, the value is not recognized
The same people pay thousands elsewhere and say it did not help them

And, what I see is that I'm too good and humble; this does not work in the long run in any business

And, being so makes me not practical

Today, I'm saying NO to those who approach me and then ask me not to tell anyone in the community, as it impacts them and their relationships with others.

I'm asked to share and give my work and artefacts, but they do not want to step up and give a mention or credit in public! I don't get how!

I have been saying NO to such learning associations and mentorship connections for the last two years. I see this is the best that I can do.

The practice and learning need courage, and the openness to receive and give back. Without this, we cannot experience the learning, practice and growth.

It is not that I make it public by tagging and bragging. I don't make enough time to brag by tagging unnecessarily. I write it meaningfully when I see our accomplishment adds value and benefits to more people, and to the mentee and me.

The point is, those who seek assistance do not have enough courage to stand up and say, we are practicing. But the same people will associate in every other corner, tagging and giving credits to others.

So, where is the problem? If it is associating together with me, I want to say no to those who see that problem.

This is not just with individuals. I see the same with a few Software Testing Communities. At the end of the day, it is a business for the software testing communities today. While I get a lot of learning from the communities, such things need to be ignored, and I do so. If we do not make a way to support and sustain the community business, there will be no space to make meaningful and sensible noise and exchange the learning. Today, I watch how I contribute to certain testing communities. Giving back to the community is a must when I get so much from the community.

So, Why This Blog Post?

I want to share this blog post first with whoever approaches me for practicing together or for community work.

I do not want to partner and associate in mentorship and community work if a person or group is

Lacking the courage
Not wanting to give the due credits
Not mentioning whatever we lose, learn and gain in the pair practice we do

If the receiver is not confident and happy about the learning, gain and loss we make together, then I want the person to make use of her/his time with another mentor and skilled engineer. I refer them to other mentors [and skilled engineers]. That way, the person or group can feel proud of the learning and talk about it in public by mentioning the other mentor's [and engineer's] name.

I'm not being paid or making money when I work with a mentee or for a software testing community. I expect the recognition, mention and due credit to be put out in public when the mentee or a community makes a loss, benefit and gain. I see this as a fair expectation!

I do not want to be used with no respect and keep asking for the same. If I remain so, I will not set a better example for myself, my teams and the fellow people in the community.

Be courteous to those who give to you; you do not know what that person is going through.

Saturday, December 21, 2024

Does Adding Cores Improve the System's Performance?

Hardware and Programming Language

If I have no understanding of the system's architecture, and of how the system is designed and implemented technically for its business purpose, then I cannot test for performance rationally and technically.

In this context, awareness and understanding of the below is necessary and important.

The infrastructure where the system and its services are deployed and consumed
The hardware on which the system is deployed

The understanding of the hardware along with its limitation

The hardware on which the service of the deployed system is consumed

The limitation of consumer's hardware

Understanding the CPU and its Cores

How are we programming the threads to execute on these Cores?

The programming language used to implement the system; this plays a vital role

Does it allow the threads to run on two or more different cores at a time?
Or, does it confine the threads to run on one Core?

The way in which we have programmed the system for its instruction execution at a thread (subroutine) level

How are the threads implemented and how do they execute on a CPU and its Cores?

I should be aware of the above information when building a highly performant system and when testing the performance of a system.

The value and benefits of this information are not limited to performance testing alone. It also helps in testing for security and functionality. The awareness of this information will give you an edge in testing for performance.

CPU Cores and Software System's Performance

I hear this often, especially during the business season:

"We will add more CPUs with more cores. This will improve the execution time and the performance. The business will not be impacted."

Does this make sense technically? What's your thought on the above statement as a Test Engineer testing the system for different quality criteria?

If I do not have awareness of the information I listed above, I will not be in a position to test and advocate better for performance.

Do Added Cores Reduce the Execution Time?

Just adding more cores to a CPU does not always speed up a program's execution time.

I learn that if a program designed to run on multiple cores has threads that must run on one core, then this limits the maximum speedup [reduction in execution time] which we can achieve by adding more cores.
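
This limit is commonly described by Amdahl's law. The Python sketch below assumes a program where a fraction `parallel_fraction` of the work can be spread across cores and the rest must run on one core. The function name and the sample values are mine, for illustration only, and are not measurements from any system.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when only part of the work can use more cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 60% of the work is parallelizable, 40% must stay on one core.
for cores in (1, 2, 4, 8, 64):
    print(cores, round(amdahl_speedup(0.6, cores), 2))
```

With these assumed numbers the speedup can never pass 1 / 0.4 = 2.5, no matter how many cores are added.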

Also, the programming language used will have a role. If the language's interpreter uses a Global Interpreter Lock (GIL), then only one thread of a process created out of it can execute interpreter instructions at a time, regardless of the cores the process is currently using. That is, though the process has access to multiple cores at a given time, its threads cannot run in parallel on multiple cores. The instructions will be running on just one thread at any point in time. Eventually, for CPU-bound work, this leads to higher execution time for a process of the program.
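
A small experiment can make this visible. The sketch below is in Python and assumes the CPython interpreter, which uses a GIL by default; `count_down` and the work size are arbitrary choices of mine. It runs the same CPU-bound work once sequentially and once on two threads, and prints both timings. On a GIL build I would expect the two timings to be close to each other. On a free-threaded build or another interpreter they may differ, so I do not state an expected output.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop; it does no I/O, so threads cannot overlap on waiting."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the work twice, one after the other, and return the elapsed seconds."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_two_threads(n):
    """Run the same two pieces of work on two threads and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


work = 2_000_000  # arbitrary size; raise it for steadier timings
print(f"sequential:  {time_sequential(work):.3f}s")
print(f"two threads: {time_two_threads(work):.3f}s")
```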

Should I say high execution time is high performance and a highly performant system? That is contextual! What do you say? What should a consumer of the service (business) say about this performance?

To summarize, adding multiple cores to a CPU does not necessarily speed up the execution time of a program. It depends on how we have designed and written the program and on the programming language used.

To know more about GIL

Global Interpreter Lock -- https://en.wikipedia.org/wiki/Global_interpreter_lock
https://langdev.stackexchange.com/questions/1873/what-is-a-global-interpreter-lock-and-why-would-an-interpreter-have-it

Thursday, November 7, 2024

Functional Testing Is a Must in Performance and Security Testing

I'm sharing how I missed testing for functionality while I was immersed in testing the performance of a Stored Procedure (SP). I was unhappy for a couple of days as I missed something that I have practiced for years.

I'm glad to reinforce this learning, with much more awareness, into my testing's MVT and MVQT now.

Context of Testing

A Stored Procedure was optimized for better execution time, with no change in functionality. This part of the system had not been touched for a long time (years?), and there had been no change in functionality here in that time. The time taken by the SP was the concern. I was asked to test the optimization.

The complicated area here is the test data to use. It took me days to identify and build the test data to test this optimization by mimicking the production incidents, use cases, and data.

When I got the test data ready, it was the fourth day of my testing this change.

Where Did I Go Blind By Being Focused?

The test data that I prepared was solely for the evaluation of the execution time. This test data could help to test functionality as well. But my focus was on evaluating performance, not functionality, with this test data.

The change in the SP did impact the functionality. I was supposed to use a large data range to test the functionality of this feature, which includes two SPs. But the task assigned was to test just the one SP which was optimized. I went blind here!

Are you asking, what is the impact of this functional problem?

In one complete business workflow, this functional problem added the same data into different sets in the subsequent iterations. Redundant data -- this is not an expected behavior.

I just spoke of performance, traces, data I/O and execution time, because that was the pressing problem. Why? That was the objective given to me.

My testing mission fell short in redefining this objective. If I had redefined it, I would have added functionality at a better scale.

If I had redefined it, I would have pulled the other SP, which is also part of this feature's workflow, into functional testing. These two SPs are expected to handle the data by eliminating the redundancy.

It was a simple test, but I did not have it in my testing mission that day.

Why Did I Go Blind?

The performance test blinded me to functionality, as the basic functional flow looked to be functioning. But the data count was going wrong when a bigger data range was used in the context.

See here how stupid I was in my testing! I'm testing an SP that has a change as part of its optimization for execution time. I never brought the functional testing in. Why? I focused on the testing objective.

I just looked into the one SP that was optimized. I did not look at the other SP, which has to work along with this SP later to complete the functional flow of the feature. Why? How is that even possible? I was asking myself this. I see this is okay from the perspective of the testing objective I had. But it is not okay from the perspective of a test engineer who is supposed to think about the impact and prevent the problems.

My immersed and concentrated focus on performance and its related activities on an SP for four long days did not let me see this.

What Am I Saying Here?

While I have tested DBs and ETL systems for years, I did not use my learning here. What is that learning?

When there is a change in any part of the ETL, SP or DB of a system, testing the functionality of the business workflow is equally important. Vary the data dimensions and evaluate the counts.
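
As a sketch of this learning in Python: run the workflow with different data sizes and assert on the counts, including a check that a later iteration does not add the same rows again. `run_workflow` below is a hypothetical stand-in for the two SPs working together, not the real procedures, and the row shape and sizes are my assumptions.

```python
def run_workflow(target, source_rows):
    """Hypothetical stand-in for the feature's two SPs: load rows, skipping ones already present."""
    existing = set(target)
    for row in source_rows:
        if row not in existing:
            target.append(row)
            existing.add(row)
    return target


def check_counts(size, iterations=3):
    """Run the workflow repeatedly on the same source data and verify there is no redundancy."""
    source = [("customer", i) for i in range(size)]
    target = []
    for _ in range(iterations):
        run_workflow(target, source)
        # Functional assertions alongside any timing: counts and uniqueness.
        assert len(target) == size, f"expected {size} rows, got {len(target)}"
        assert len(set(target)) == len(target), "redundant rows found"
    return len(target)


# Vary the data dimensions instead of checking only one basic flow.
for size in (1, 100, 10_000):
    print(size, check_counts(size))
```

A check like this would sit next to the execution time measurement, so that a faster SP that starts duplicating data fails the test instead of passing it.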

I was completely hooked into the execution time and the test data while switching between the environments for four days. The chaos in data between environments is something that misleads easily. I fell for it this time.

I say to myself, whether it is a fix for performance optimization or security [or any quality criterion], testing for functionality is equally important and of the same priority as running the tests for performance or security.

When a DB layer is picked for fixing and optimization, testing for functionality on an equal scale is a must. There is a change in the code and/or infrastructure, and it has to be noted with additional attention.

To add to this, this time I did not go through and analyze the SP. I took this call from the test team. This call of mine cost me, and it had a major part in my not thinking of functionality.

My fellow colleague ran a test with varying data size by completing the business workflow and observed the problem, and informed me. I give the credit to Sandeep.

If I had brought this performance test under automation, I would not have missed this. Why? I would evaluate and assert on the data returned for different sizes. I did not automate here, and there was no need for it in this context.

Redefine the testing objective that you have got; it helps when you see the model of a system and test.

Respect every fix and suspect every fix. This helps in the long run!

Sunday, November 3, 2024

Logarithm and the Expression of an Algorithm

I got to know about Logarithms in my High School. But I never knew they would be part of an engineer's life. As a Software Test Engineer, I often encounter discussions that involve Logarithms when testing an algorithm that is designed and implemented to solve a problem.

What is a Logarithm?

I found this definition on Math is Fun, and I see it is simple and straight.

How many of one number multiply together to make another number?

For example, how many of 4s make 64? 4 x 4 x 4 = 64.

That is, when I multiply three 4s together, I get 64. Therefore, the logarithm is 3.

It is represented as below and said as "the logarithm of 64 with base 4 is 3".

log₄(64) = 3

The same can be written or expressed as an exponent in Math. That is, 4³ = 64. Exponents and Logarithms are related. The logarithm tells us what the exponent is.

The logarithm answers the question -- What exponent do we need for the chosen base to get the given number?
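
The "how many multiply together" idea can be tried out in a few lines of Python. This is a minimal sketch for exact powers only; `count_multiplications` is a name I made up for this illustration.

```python
def count_multiplications(base, number):
    """Return how many times `base` is multiplied together to reach `number` exactly."""
    if base < 2 or number < 1:
        raise ValueError("this sketch expects base >= 2 and number >= 1")
    exponent, product = 0, 1
    while product < number:
        product *= base
        exponent += 1
    if product != number:
        raise ValueError(f"{number} is not an exact power of {base}")
    return exponent


print(count_multiplications(4, 64))  # prints 3, since 4 x 4 x 4 = 64
print(4 ** 3 == 64)                  # prints True: the exponent form of the same fact
```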

John Napier introduced Logarithms in 1614 as a means to simplify calculations.

Which Logarithm?

In Computer Science, log written without a base most often means base 2, the Binary Logarithm. The Common Logarithm has base 10 and is also denoted using log. In my practice so far, when talking about algorithms, I have been using the logarithm denoted using log.

The other type of Logarithm is the Natural Logarithm, which has base e. This is denoted using ln.

What is the Base?

One of the questions asked in the discussion of an algorithm's analysis is, what is the base? This leads to the question, what is the base that we should consider in Computer Science?

In my experience so far, I see it depends on the context of the algorithm under evaluation, and on whether its time complexity and growth rate are logarithmic in nature.

In the context of Computer Science, I see the base does not matter much when equating the right-hand side to the left-hand side of the calculation. But one can pick the base that suits the context when one needs to express the relationship of an algorithm under evaluation. The base need not always be 2 here.

Note: I understand the below from peer discussions on the logarithmic base in Computer Science:

The asymptotic notation focuses on growth rate and ignores constant factors. Logarithms to any two constant bases differ only by a constant factor, so the base makes no difference to the growth rate. Hence, I do not see the base specified for a logarithm when using asymptotic notation. That said, when I have a logarithm with a constant base, it is okay not to specify the base.
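
The constant factor can be seen with a quick check in Python. The sketch below compares base 2 and base 10 for a few sample values of n that I picked arbitrarily; `base_ratio` is my own helper name.

```python
import math


def base_ratio(n):
    """Ratio between the base-2 and base-10 logarithms of n (n must not be 1)."""
    return math.log2(n) / math.log10(n)


# The ratio does not depend on n; it is always log2(10).
for n in (10, 1_000, 1_000_000):
    print(n, round(base_ratio(n), 6), round(math.log2(10), 6))
```

Since the ratio stays the same for every n, switching the base only multiplies the expression by a constant, which asymptotic notation ignores.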

When the base of the logarithm depends on a parameter (an input or any external configuration) of the algorithm and is not constant, it is a better practice to mention the base.

In my testing experience for an algorithm's functionality, performance, complexity and growth rate, I see the engineers keep the input parameters configurable. That is, a parameter is kept as a constant which can be changed on a need basis for the context. So the base is not mentioned in the logarithmic expression for an algorithm most times.

If you see that my understanding is not appropriate in the context of logarithmic asymptotic notation, do share your thoughts as comments to this blog post. I will be happy to introspect and learn.

References:

https://www.mathsisfun.com/algebra/logarithms.html

Tuesday, October 8, 2024

Does Your Bug Report Annoy You and Fellow Testers?

I often read quotes or thoughts about the code being written. Like, write the code for the other programmer and not just for you, so that the other programmer can pick it up with ease and work from there. You would have come across similar thoughts on code.

Have you ever come across thought[s] that speak about the bug report being written?

The bug report you write, is it for you alone? Or, is it for the audience to whom you wrote? Or, is it for someone who picks it up and works upon it later?

How good do the audience of your bug report feel on reading it? How did fellow testers feel reading your bug report? How easy was it for you to read your own bug report and work on it later? How smooth was it for another tester to understand your bug report and test the fix?

I experience this:

A bug report with precise and helpful technical details did not serve the audience and fellow testers
A bug report with no precise and helpful technical details did not serve the audience and fellow testers
A bug report with plain English and attachments did not serve

While I say that, I see this helped most of the audience sometimes:

A bug report in plain English giving the context, little or no technical information and associated details

It has happened that I have rewritten my bug reports on reading them after an hour. And, I have rewritten the bug reports of others as well after testing the fix. In both cases, I "prevent" the pain which I and others go through to some extent. At least, I hope so!

To end, I recall this quote from Martin Fowler:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

I see this holds good for a bug report as well. Any of us can write a bug report. A skilled engineer [and test engineer] can write a bug report which does not bring unwanted pain to her or his audience.

Did you ever read your own bug report 3 months after writing it? How deep was the pain and annoyance to know what it was all about? Give the same bug report to your fellow tester or programmer or product owner; ask what they understood from it.

Friday, July 26, 2024

My First Hand Analysis of CrowdStrike Falcon Update Incident

I attempted to analyze the process dump of CrowdStrike shared by my friend. He said there could be an attack which was leading to the crash of Windows OS globally. This made me curious to look into the dump and learn.

I did not have much context around it, but the test engineer in me did not sit quiet. I started to analyze the dump information. Here is the first hand analysis that I made on 19th July 2024 after 10:30 AM IST.

What Did I See?

It is a Windows OS process dump.
It looks like a C or C++ application, going by how the memory offsets read in the dump.
It started to read a memory offset.
Then the process witnessed an exception.

Here the program could not read further
Why could it not read further from this offset?

My little experience of testing drivers on Windows OS for a card printer machine came back to me, and I recalled what I had witnessed when testing.

Scratching and Striking My Mind

I started to ask myself these questions while asking what could have gone wrong. I could not stop here as I was curious about what led Windows machines to crash. I referred to the web and learned there was an update by CrowdStrike, and then this incident.

Bugs exist in every software, no matter the level and depth of testing, automation and engineering excellence. All software crashes, and the OS is not an exception. But what made the update crash the Windows OS? Pointing at and blaming CrowdStrike or Microsoft is not the way of a practicing test engineer. If these two organizations are serving their huge customer bases, they have something working and reliable. Engineering does not eliminate problems.

By now, I had a thought that it is not an attack. It is a software bug! Where is the bug? What is the bug? Was it not experienced in the pipeline?

The Open Ended Questions

I had these questions as I analyzed and spoke to my friend.

What is Falcon?
What was this update to Falcon?
How frequently are the updates rolled out?
How are the updates rolled out globally?
What pipeline do they have in testing?
Who is impacted the most in business? Is it Microsoft or CrowdStrike? Impacted in what way?
What is CrowdStrike? What do they do? Who are the customers?
Where does CrowdStrike's Falcon sit in the OS and what does it do?
How does CrowdStrike work on the machines and what does it offer?
What does the dump say? Relook into it with different perspectives.
How could this have been prevented?
How will I prevent this if I join this team knowing this incident?

With these questions, I started to analyze the process dump which was shared.

I had more such questions, but these were the first few that crossed my mind as I started.

Analysis of Process Dump

My interpretation tells me the below for today:

Accept that it is an incident like any other incident which I witness in a production environment.
Do not fall for the speculation happening around. Remain calm and focus on interpreting and understanding your exploration.
I see, if it can start to read from an offset and then end up at a non-existent or invalid offset, is it a NULL Pointer?

What is NULL Pointer?

A NULL Pointer is a pointer that does NOT point to any memory location and hence does not hold the address of any variable.
If I do not initialize and assign, a local pointer does not get NULL as its value; its value is indeterminate.
For example, int *test;

When I want to access what the pointer test is pointing to (a location in memory), I will not be sure what is in the pointer when I read it.

I may or may not set it later.
In this case, the code cannot tell if the pointer is valid or pointing to garbage memory

But, if I declare it like int *test = NULL;

I can check if it was set and initialized

It is a better practice to assign a NULL value to a pointer during initialization so that we can check if it is NULL or has an address assigned to it.

This understanding of pointers makes me think: is it due to not initializing a pointer, and so the error code c0000005 on reading a memory that is not valid?
When we assign a NULL value to a pointer, it is a null pointer in C++

We assign a null value for testing and asserting

If the memory is allocated to a pointer or not
If it has a return address and is a valid one or not
If a pointer is not initialized, assigning null to it prevents problems to a certain extent

With this understanding, I also read that it started to read from an offset 0x9c and then failed.

What is 0x9c?

In Octal it is 234. In Decimal it is 156.
Can there be such an address in a computer's memory? I don't know.
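
The conversions above can be checked with a few lines of Python. This only confirms the number bases; it says nothing about whether the address is mapped in memory.

```python
offset = int("9c", 16)      # parse the hexadecimal digits seen in the dump

print(offset)               # 156 in decimal
print(oct(offset))          # 0o234 in octal
print(hex(offset))          # 0x9c, back to hexadecimal
```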

If it is an access violation, then is it a memory region which is protected by the OS?

If so, the OS can terminate the program or process which is trying to access it.
Is this killing the process and aborting the operation of Falcon's IPC, and eventually bringing Windows to a BSOD?

This tells me it is not a NULL Pointer in the first case, but a pointer not initialized to NULL.

I infer, if the pointer was assigned to NULL, that is, initialized, there could have been some hint in the state and event when accessing the memory.

This is my analysis; but I have not seen the test code, nor am I aware of the product. All this inference is based on the process dump and my experience of testing drivers.

It got something in between from the update (a config or pattern?) which it cannot find and read in the memory? Why?

This indicates to me it could be a bug, that is, a logical problem. This is my hunch for today!

Data in the dump

Exception Address
Read from Address 0x9c
Exception Code: c0000005 (Access violation)

Testing my Interpretations

CrowdStrike as an org when it caters its SAAS to such a customer base, won't it have a testing pipeline

It will have, I have no doubt in it. They test and roll out the updates, I believe in it.

Did they witness any such incidents earlier?

I searched on web for it and I did not find something similar on the Windows, earlier.

Is this a NULL Pointer? Are you sure?

No, I'm not sure. But, there is something that is leading it to address which does not exist or which is invalid? I will have to wait for their RCA to know technically what caused this. But this is my understanding reading the dump.

How do you think it is a memory access problem?

The error code 0xc0000005 says that.
I referred to driver easy website for the information because my experience of testing the drivers for Windows OS and experiencing such incidents led me there. This is what I learn:

https://www.drivereasy.com/knowledge/solved-how-to-fix-0xc0000005-error/

Do you think the programmer would not have handled the obvious Pointer and NULL initialization?

I believe there will be a check for Pointer and what it is pointing to. But is it due to no initialization? Technically this has to be analyzed which I cannot do. I will have to wait for CrowdStrike team to share the tech details.

Is this a driver problem that killed the Windows kernel?

I don't know. But, as per my learning, the .sys file will not have the driver. It will have information about the drivers and any configurations.
This incident is a problem which impacted both CrowdStrike and Microsoft. Maybe both will have their areas to look into and fix, if they see so. But, in this context, CrowdStrike can fix it quicker and that is much better -- is what I understand.
I have been a Windows user for a long time. I see Windows has worked well in all my contexts so far. The engineers of Windows OS know better than me here. I'm not as well aware and informed as they are.
CrowdStrike's engineering team is skilled and they roll out updates several times a day. They would have a good pipeline for this to be done.

But, the question I have is, how did this happen?
No one lets such a problem into production when they are aware of it. Do you?
There is something that has not come to their observation and experience. What is that?
Knowing this will help to prevent this and similar incidents happening in future.

I'm waiting to know what did not come to their experience and led to this incident.

What could be in the .sys file of CrowdStrike?

I don't know! I want to learn that.
But, from my testing of .sys files and drivers on Windows OS, I learn there could be configuration details with a certain pattern or information to capture at run time, which help the installed software to run. This is my learning and awareness from my testing.
That said, testing at the OS level and testing Anti Virus engines is not obvious. Testing of drivers is like walking through risky mines. What is sufficient and good enough in test coverage? It needs expertise at the OS internals level.
With Windows OS having such fragmentation in its versions, updates and patches, it is a battlefield with mines for engineers building such solutions, for sure!
I learn the Windows OS stopped when an application tried to access an invalid region or non-existent memory.

The update which was rolled out, did it have a configuration or a pattern that showed a logical problem when processing it?
I have such questions and thoughts that are striking my mind as I think and build a problem model for the same.

Is this a race condition incident?

I see it is not a race condition incident, as users across the globe experienced it.

Is this specific to a Falcon version, OS version and hardware?

Not all host machines would be on the latest version of Falcon, is my presumption.
At least n-1 and n-2 versions would also be on the host machines which experienced this behavior.

So it is not specific to a Falcon version, I see.

It looks to me that it is not specific to the Windows OS version or hardware configuration.

It is an application software problem which occurred at the driver level, is what I see.

This is an IPC communication and process is my understanding.

The driver can receive the IPC communication in continuous mode.
At times, this can get queued based on the application and what it does.

Where is the Problem?

Well, I'm pulling from my visualization by relating it to my experience of testing drivers on Windows OS. I don't know the exact reason, nor am I close enough to tell what could have gone wrong.

Reading the process dump, it says the program accessed memory that does not exist or is corrupted. One high possibility is that the starting offset is seen, but it does not help when reading.

For example, Ravi has the address of India's Prime Minister's house.

But he does not know from where to start, despite having the address.
He is void and null in knowing where to start and what to do, as he is not initialized with the start location to begin the travel to the Prime Minister's house.
In short, he does not know where the address is pointing to and what it has, though he is given an address to start.

Can he access the Prime Minister's house premises without any access granted and authorized to do so?
If not, won't he be stopped or arrested by police or other security forces?

Do I Know the Precise Problem?

I don't know! I do not know the CrowdStrike product and platform. I'm waiting to read the technical details from CrowdStrike.

I see it comes down to the data, state and event. I would focus on how to prevent it by learning which data, state and event led to this behavior. I think of figuring out the Test Design and Strategy that can help me to identify such use cases. I focus here and see if it can be brought into the automation so that it gets exercised and regressed consistently.

If it is due to the memory access that had a problem, I did such tests when testing a driver for a hardware machine on Windows OS. I will share the tests that I did in an upcoming blog.

I wrote the technical analysis from the process dump to CrowdStrike and Microsoft. I did not get a response. Anyway, I'm sharing the overall information in a non-technical way so that it is consumable to most readers here.

Note: Here are other threads where I share my thoughts on the same:

1. https://x.com/testingGarage/status/1814215089525821763?t=XSFdx69ElL0ZmBOcEFrTjg&s=19

2. https://www.linkedin.com/posts/ravisuriya_%3F%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F-%3F-activity-7221156949445206017-oeRa

Sunday, March 3, 2024

Performance Test Report: Between the Effective and Ineffective Reports

In this post, I'm picking the thirteenth question from season two of 100 Days of Skilled Testing.

What is an effective way of reporting performance test results and mention some tools you have used in test execution, analysis, and reporting?

I see two questions. I see it is not a wise attempt to learn these two questions combined as one. In my opinion, the second question, when added, dilutes the first and makes the whole question vague.

What Should a Report Do?

The report should be contextual, compelling, influencing, and targeted to the intended audience so that they can act upon it in making a decision. The software testing report is not an exception to this.

The performance testing report should know

Who is its audience?
How do they read, relate to and understand the information?

If this is ignored, the report will not serve the purpose of the commissioned testing. The effectiveness of a report cannot be determined solely by how the stakeholders respond to it.

On understanding the risks and problems in the system's current capability, mentioned in the report, the stakeholder might not respond with an action to tune the performance aspects. This could be for multiple factors including that of business.

Note that a skilled, problem-solving engineer understands the business and what drives it. Just being technically skilled will not help an engineer to grow in the long run in her or his career. The system's performance tuning decisions, most times, will be driven by business.

Did the report persuade the stakeholders with an awareness, mutual understanding and agreement? The report should drive this conversation. If not, we have a problem.

On reading the performance testing report, do the stakeholders get an informed awareness on what happened during testing in the present capability of a system's performance criteria? Do the stakeholders understand and mutually acknowledge how it benefits and costs the business?

This is the foremost value expected from a testing report. If it is not served, I look at how the data and story are presented in the report.

The bottom line is, did we mutually acknowledge, agree and understand on current capability and consequences? If not, the basic purpose of the report is not met.

Performance Testing Report

Software testing is a highly technical activity. Whether you agree with this or not, this is the reality.

Testing for performance is a technical investigation activity. It includes the orchestrated study, in correlation, of

hardware, operating system, network, tech stacks & software used in SDLC, architecture, designs, certain decisions, people, business and you - the test engineer.

The fundamental in-depth awareness and knowledge of these areas is a necessity to analyze a performance aspect. The performance testing report will show this trait of you as a test engineer.

We have stakeholders who work in technical areas and in non-technical areas. How to compile an effective performance testing report for them?

There is no one way or defined way of writing an effective performance testing report. Figure out what works in the context of your testing to have an effective report, knowing - What Should a Report Do?

Outline of Persuading Performance Test Report

It is technical storytelling in a non-technical way with data, pictures, comparison by relating, metaphors, and contemporary history. I compose the performance testing reports in line with the business targets and objectives set. I provide metaphors to relate to and know the value and cost.

At times, I will have two reports. I share it with respective stakeholders.

One with non-technical summary and conclusion
The other with technical details, analysis and data from investigation

Sometimes, I include the above two reports in one report based on the context.

Overall, this will be the minimum in my performance testing report to start.

What part of the system is being tested?
Why is that part of the system being tested?
Mentioning the vague performance requirements gathered from stakeholders.
Refining the performance requirements to be precise, specific, contextual and deterministic.
Who are the stakeholders of this report?

Which sections should the respective stakeholders refer to for the analysis and outcome?

Problem statement of the performance testing
Brief summary of performance testing outcome. [TLDR]

What aspect of system's performance is evaluated and why?
Brief summary of performance test carried out and outcome.

Detailed Report with Technical Details

Analysis and Technical Investigation
Representation of data which is analyzed
Identification of bottlenecks, risks, problems and its symptoms
Summary of the test's outcome

You don't have to stick to one format or template. Figure out what works well in your case so that the intent of your tests and the outcome is understood by stakeholders. Give a structure to your report!

The performance testing reports will have metrics, graphs, numbers and proposals. The presence of metrics, graphs, numbers and the rest does not make the report effective. Then, what makes it effective? When do you call it effective? When do you accept it is not effective? Only you can figure it out for your context. I can assist you here; pull me in.

There is no good [effective] report or not good [ineffective] report. The report is either

From a team that is skilled, experienced, trained, and practicing
From a team which is not trained and not practicing

Sunday, February 25, 2024

Backtracking of Testing, Security and Tools

When I started my software testing career in 2006, I was in this thought -- What tools should I use, so that,

I can do the testing that is sought after
I can test for performance
I can test for security

Moving from a search for tools to building the mindset and attitude is a journey! It took me time to see this journey. I hopped on to this journey in 2011. I see this is not a journey that ends, though I know where I should go and reach. I'm on this journey.

I had no mentors. I had no seniors in software testing to guide me and discuss my thought process. I had developers (programmers) who had little or no interest in testing; so it did not matter to them. But they have helped me to be a better tester. I'm grateful to them. Back then, the community was not as connected and organized, and did not share knowledge, as it does in 2024. Software testing was not considered or seen as a technical activity then. I have stood, fought, demonstrated and delivered my testing as a technical activity. I'm continuing it.

Today, on 24th Feb 2024, I read the below question in a community's social space and decided to write this blog post.

Hey, everyone .... Can anyone please suggest a good tool for API security testing?

This question resonates among test engineers. Most of us test engineers still look and ask for tools when it comes to security testing. To test engineers, performance and security testing are still a conception and activity with tools alone. In reality, it is not! If you are in such a thought or you come across such a question to answer, this blog post is for you.

Backtracking the Problem Identification

In programming, we have an approach by the name Backtracking. It is about exploring possible ways to find possible solutions for a problem, and abandoning a way when it cannot lead to a solution. Then, the best solution which works in the context is picked.

What's the problem here? Testing, Security and Tools. Are you with me so far? Let us backtrack this problem.

Note: I see a difference between the words 'possible' and 'all'. Hence, I use the words "possible ways" and "possible solutions" and not "all ways and all solutions".
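For readers who have not met the programming technique named above, here is a minimal sketch of backtracking in Python. The problem (finding subsets of positive numbers that add up to a target) and the function name are my own choice for illustration; they are not part of the security discussion. The sketch makes one choice at a time, abandons a path as soon as it cannot work, and undoes the last choice to try another.

```python
from typing import List


def subsets_with_sum(numbers: List[int], target: int) -> List[List[int]]:
    """Return the subsets of positive integers in `numbers` that add up to `target`."""
    ordered = sorted(numbers)
    found: List[List[int]] = []

    def explore(start: int, chosen: List[int], remaining: int) -> None:
        if remaining == 0:
            found.append(list(chosen))  # a possible solution
            return
        for i in range(start, len(ordered)):
            value = ordered[i]
            if value > remaining:
                break  # dead end: this and every later value is too large
            chosen.append(value)                       # make a choice
            explore(i + 1, chosen, remaining - value)  # explore further
            chosen.pop()                               # undo it (backtrack)

    explore(0, [], target)
    return found


print(subsets_with_sum([3, 5, 2, 8], 10))
```

Which of the found subsets is the "best" depends on the context, which is the same point the post makes about picking a solution.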

Bounties and Entry

There are reputed bug bounties for security testing. To get into these bounties, one has to showcase her/his discoveries and skills with a recognized portfolio.

The tools are accessible to all. Community edition and licensed edition tools are available. We use both these editions of tools.

But why can't all of us with tools get into such invited security bug bounties?

You will answer this question if you ask yourself. I hope this backtracking has helped by now!

The Security Engineering is a vast practice area in Software Engineering. There are dedicated security engineers in role. But, we test engineers can take up the testing for the security of software systems which the team is programming and building.

I advise, a practicing test engineer

To start with building an interest for security engineering.
Consistently hone and build the mindset, attitude and skills needed for testing the security aspects.
Pick simple problems and solve them. Do it consistently, while you explore the layers.

While this is done consistently, it is time to find mentors in Security Testing. The mentors will assist you in practicing how to test effectively for security, making use of simple, contextual and necessary tools. Also, a mentor will let you know how to test for security without tools, to an extent. A tool is effective when I know how to use it. The tools help immensely only if I can test for security.

To backtrack from a different perspective, did any tool that you use find a P1 security problem [or risk] by itself in its scan? Did your programmers acknowledge that risk or problem? I will pause with these two questions to you.

Today, my testing for security is confined to systems that I test. I test for web application, mobile apps, web APIs, and database. I can assist here, if you do the home work and ping me.

Saturday, February 3, 2024

Database: Finding the Tables Having Specified Column Name

In today's pair testing session with a mentee, we were testing for Database I/O. We were on PostgreSQL. One of the questions a mentee had is,

How can I figure out the tables having this column name?

Running through every table and exploring whether the column being looked for is present or not is time consuming. It is not an approach to take either.

I went through this when I started the ETL testing practice in 2011.

Here is the query that works on PostgreSQL to find the names of tables which have the specified column name.

Query:

-- List the tables (and matching columns) whose column name contains the given text.
select table_name, column_name
from information_schema.columns
where table_catalog = 'database_name' and column_name like '%column_name%';

It is a better approach to know the precise column name and use the condition as -- column_name='EmployeeId'.

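This query should also work on MySQL and MSSQL Server, since both provide information_schema.columns. If it does not, look into the FROM and WHERE clauses for vendor-specific differences; on MySQL, for example, the database name is held in table_schema rather than table_catalog.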
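When this lookup is run from a test script rather than a SQL console, the query can be built with parameters instead of pasted-in strings. The sketch below is hedged: the function names are hypothetical, the `%s` placeholder style is the one used by PostgreSQL drivers such as psycopg2, and the `!` escape character is my own choice. It also escapes `%` and `_`, because inside LIKE an underscore matches any single character.

```python
from typing import Tuple


def escape_like(text: str, escape_char: str = "!") -> str:
    """Escape LIKE wildcards so that % and _ are matched literally."""
    for special in (escape_char, "%", "_"):
        text = text.replace(special, escape_char + special)
    return text


def build_column_search(database: str, column: str,
                        exact: bool = True) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) for finding tables that have a given column."""
    base = (
        "SELECT table_schema, table_name, column_name "
        "FROM information_schema.columns "
        "WHERE table_catalog = %s AND column_name "
    )
    if exact:
        return base + "= %s", (database, column)
    pattern = "%" + escape_like(column) + "%"
    return base + "LIKE %s ESCAPE '!'", (database, pattern)


sql, params = build_column_search("hr", "employee_id", exact=False)
print(sql)
print(params)
```

The returned pair would typically be handed to a driver call such as `cursor.execute(sql, params)`. One thing to keep in mind with the exact match: PostgreSQL folds unquoted identifiers to lowercase, so a mixed-case name like 'EmployeeId' matches only if the column was created with a quoted name.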

Performance Testing - What to Know Before User Behavior and Traffic Pattern?

This blog post is in the series of 100 Days of Skilled Testing. I see I do not have to pick every question asked in this series. I pick and share those to which I see I can add value.

The twelfth question from the season two of 100 Days of Skilled Testing, is:

What strategies do you use to simulate realistic user behavior and traffic patterns when conducting performance tests?

The twelfth question asked is vague and it needs to be refined for preciseness to pick it up and continue.

The Question and the Gap

I see the below are missing in the above asked question:

What aspect of performance is under evaluation?
What is the system that is being evaluated for a performance's aspect?
What part of the system is being evaluated for a performance's aspect?

Queuing? Messaging? Database I/O? Memory? Space? CPU? Client Performance? Functional Module?

Who are the users? What are their personas?
How and where are the users accessing the system?
What is the context of users accessing this system?
What is the geo location of users who are accessing this system?
How long are these users connected when accessing this system?
Are there any differences among these users in their roles and privileges in accessing this system?
Can the user access the system through multiple interfaces?
Are you assuming the user is on a web browser and mobile apps to access this system?
Is the system you are referring to a software system? Or is it any other system that is a controlled environment, like access doors, elevators, etc.?
You are asking to simulate the user behavior and traffic pattern. Should I assume that you and I know or agree on a volume of users? And that all these users are here for the same purpose when accessing the system?
Are you considering any time or a particular time when talking about the traffic pattern?
Are there any unrealistic users who are accessing your system? You say 'realistic user'.

Do you see that bots and non-humans are also allowed as users in your traffic?

Have you evaluated this earlier in your system?

If yes, do you have the history and data for user behavior and traffic pattern?
If you don't have it, do you allow the use of, or have, your competitor's user behavior and traffic pattern data?

What is the tech stack of your system?

What part of your tech stack do you want to evaluate with this user behavior and traffic pattern?

What is the architecture of your system?
What part of your system and its architecture is being evaluated with this user behavior and traffic pattern?
Are you running this exercise for the first time? If not, where can I refer to previous exercises?
How are the interactions and events handled from start to completion?

What all is needed to complete the transaction in the work flow?
How can this transaction go invalid for lack of, or incorrect, data, state and action?

What are the spike, drop, saturation, expected, unexpected, and average numbers in the traffic coming in?
What do you understand by traffic? Do you mean the number of requests coming in?

Do you mean the I/O operations being committed?
Do you mean the response received at the other end?
What is the definition of 'traffic' in this context?

What is that you want to study and evaluate by the User Behavior and Traffic Pattern information gathered in this context?

Using the above questions, I will get an idea to proceed.

I will build a model from the information I collect using the above questions. This model will be used further in testing for a performance aspect. The value added to the performance test depends on this model as well. To get a better model in context, it is useful to address the gaps. From here, I start to think further.
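As one small illustration of the spike and average questions asked above, here is a sketch that summarizes a series of requests-per-minute counts. Everything in it is an assumption made for illustration: the sample counts, the function name, and the rule that a "spike" is any minute above twice the median. In a real context, that definition is exactly what has to be agreed with the stakeholders first.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_traffic(requests_per_minute: List[int],
                      spike_factor: float = 2.0) -> Dict[str, object]:
    """Summarize traffic counts; a spike is a minute above spike_factor x median."""
    typical = median(requests_per_minute)
    spikes = [minute for minute, count in enumerate(requests_per_minute)
              if count > spike_factor * typical]
    return {
        "average": mean(requests_per_minute),
        "median": typical,
        "peak": max(requests_per_minute),
        "spike_minutes": spikes,
    }


# Hypothetical counts for six consecutive minutes.
summary = summarize_traffic([100, 110, 90, 105, 400, 95])
print(summary)
```

Here the average (150) is pulled up by a single minute, while the median (102.5) stays close to the usual load. Which of these numbers goes into the model depends on the answers to the questions above.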

What do you ask and look for when building a model for User Behavior and Traffic Pattern?

Performance Testing - The Unusual Ignorance in Practice & Culture

I'm continuing to share my experiences and learning for the 100 Days of Skilled Testing series. I want to keep these short, as mini blog posts. If you see that detailed insights and conversations are needed, let us get in touch.

The ninth question from season two of 100 Days of Skilled Testing is

What are some common mistakes you see people making while doing performance testing? How do they avoid it?

Mistakes or Ignorance?

It is a mistake when I do an action though I'm aware that it is not right in the context.

I do not want to label what I share in this blog post as mistakes. I call it ignorance, with or without having the awareness and the experience.

The ignorance said here is not just tied to the SDLC. It is also tied to the organization's practice and culture that can create problems.

To this blog post's context, I categorize the ignorance in these categories -- Practitioner and Organization.

Practitioner's ignorance

Not understanding the performance, performance engineering, and performance testing

When performance testing is mentioned, taking it as - "It is load testing"
No awareness of what performance and performance engineering are

Going to the tools immediately to solve the problem while not knowing what the performance problem statement is

Be it web, API, mobile or anything,

Going to one tool or tools and running tests

Not much thinking on how to design the tests in the performance testing being done
Ignoring Math and Statistics, and its importance in Performance analysis
No idea on the system's architecture, and how it works

Why is it the way it is?

The idea of end-to-end is extended and used in testing for performance, leading to a hard time understanding and interpreting the performance data

How many end-to-end flows have your tests identified?
Can we test for performance across all these identified and unidentified end-to-end flows?

Relying on resources/content on the internet and applying or using them in one's context without understanding them
No idea on the tech stack and how to utilize the testability offered by it in evaluating the performance
Not using or asking for testability
Getting hung up on the 2 or 3 most spoken about and discussed tools on the internet
Applying tools and calling it performance testing
Not attempting to understand the infrastructure and resources

How it impacts and influences the performance evaluation and its data

Idea on Saturation of resources

Thinking it as a problem
Thinking it as not a problem

Not working to identify where will be the next bottleneck when solving a current bottleneck
What to measure?
How to measure?
When to measure?
What to look when measuring?
Not understanding the OS, Hardware resources, Tech Stacks, Libraries, Frameworks, Programming Language, CPU & Cores, Network, Orchestration, and more
Not knowing the tool and what it offers

I learn the tool every day; today, it is not the same to me compared to yesterday

I discover something new in what it offers that I was not aware existed
I learn new ways of using the tool in different approaches

No story in the report with information/image that is self-descriptive to most who read it
And, more; but the above said resonates with most of us
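To illustrate the Math and Statistics point in the list above, here is a small sketch that compares the average of some response times with their percentiles. The sample times are made up for illustration, and the nearest-rank method is only one of several common ways to compute a percentile.

```python
import math
from statistics import mean
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds; one request was very slow.
response_times_ms = [120, 130, 125, 118, 122, 127, 121, 124, 119, 2400]

average = mean(response_times_ms)
p50 = percentile(response_times_ms, 50)
p90 = percentile(response_times_ms, 90)
p99 = percentile(response_times_ms, 99)

print(average, p50, p90, p99)
```

The average is about 350 ms, although nine of the ten requests finished in 130 ms or less. Reading only the average hides both the usual experience and the one slow request; the percentiles show both.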

Organization's ignorance

At the org level, first and to start with, it is ignorance of Performance Engineering

Ignoring the practice of performance engineering in what is built and deployed
Thinking and advocating that increasing the hardware resources will increase and better the performance

In fact, it will deteriorate over a period of time no matter how much the resources are scaled up and added

Ignoring the performance evaluation and its presence in CI-CD pipeline
The performance tests on the CI-CD pipeline should not take beyond a few minutes

What is that "few minutes"?

Not prioritizing the importance of having the requirements for Performance Engineering

Recently, I was asked a question - How to evaluate the login performance of a mobile app using a tool "x"?

In another case, I see, a controller having all HTTP requests made when using web browser. Running these requests and trying to learn the numbers using a tool.

I do not say this is a wrong way of doing it. That is a start.

But, we should NOT stay here thinking this is performance engineering and that this is how to run tests for learning performance aspect[s].

To end, performance is not just - how [why, when, what, where] fast or slow? If that is your definition, you are not wrong! That is a start and good for a start; but do not stick to it alone and call it performance. It is capability. It is about getting what I want in the way I have been promised and I expect; this is contextual, subjective and relative. The capability leads to an experience. What is that experience experienced?

Sometimes, serving the requests by what you call as slow, is a performance. What is slow, here?

The words fast and slow are subjective, contextual and relative. It is one small part of performance engineering.

That said, let me know, what have you been ignoring and unaware of in the practice of Performance Engineering & Testing?

Friday, February 2, 2024

Deep Link and its Testing via Automation

I get these questions consistently from my fellow testers and the community.

How to automate the mobile apps and web applications using Deep Links?
How to automate the business flows using Deep Links?
How to achieve end-to-end business flows testing on using Deep Links?
How to automate scenarios in mobile apps using Deep Links?
What is the best approach to automate the mobile apps using Deep Links?
What is the best practice to automate using the Deep Links?

And, more questions on same pitch.

No Deep Dive into - What is Deep Link?

A hyperlink in HTML is a kind of deep link within a website or to another website.

Deep Link is known by different names for web, Android app and iOS app. All these names have the same understanding and intent at some point.

Deep Links are URIs that take me directly to a specific part (activity or fragment) of the app that I'm using or testing. A Deep Link will have an intent which tells where I will be taken on using it.
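Since a Deep Link is a URI, its parts can be read with a standard URI parser. The sketch below uses Python's urllib.parse on a made-up link; the scheme `exampleapp`, the path and the parameter names are hypothetical, and how an app maps these parts to a screen is decided by the app itself.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical deep link for illustration only.
link = "exampleapp://orders/details?orderId=42&tab=tracking"

parts = urlparse(link)
params = parse_qs(parts.query)

print(parts.scheme)   # which app (or handler) the link is meant for
print(parts.netloc)   # first part of the target inside the app
print(parts.path)     # remaining part of the target
print(params)         # data carried to the target
```

Reading a link this way helps in seeing what a test has to check: the place the link points to and the data it carries.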

When we converse on diving deep technically into testing and automation of Deep Link, I will share more insights into its internals.

Deep Link and Challenges

This question is discussed with me often:

How to do end-to-end testing using the Deep Link?

Automation of a mobile app using Deep Link poses a challenge which is not experienced in a web application.

One such challenge is, say you have not installed the mobile app. [This is solvable!]

On using a Deep Link, I should be taken to the Apple App Store or Play Store based on the app.
I have to install the app.

After this, in traditional automation, I should start traversing the business work flows via GUI.
Is this adding to the flakiness aspect of automation via GUI?

When we talk so much about flakiness and how to avoid (not prevent) it, should we exercise business workflows when automating using Deep Link? What are you thinking? Let me know!

Scoping of Automation Using Deep Link

Back to the fundamentals.

We have to automate, no escape from it. Let us automate what must be automated!
Let us not fall into the trap of "Automate everything!"

For today, I'm in this mindset and attitude,

What we automate depends on the objective or goal that we want to accomplish.

Each test should have a precise and deterministic goal.

A test via automation is not an exception to it.
A test defined in automation should be precise, deterministic and have a single objective - Single Responsibility Principle.

What is the objective of my testing via automation for the Deep Link? This defines the scope and extent of my automation. This will minimize the number of checks that I do using Deep Link.

The purpose of Deep Link is to take me to a specific part of the mobile app.

Should end-to-end testing or exercising the workflow be included in the Deep Link tests?

If included, am I not complicating the testing via automation?

Automation using Deep Link

I ask this question to myself and to my team.

What is the goal of testing via automation using Deep Link?

This question helps me to pick minimal and necessary flow actions. It has led and leads me to define minimal tests for Deep Link based on what we want to learn from automation of the same.

To me, the purpose of Deep Link is not end-to-end testing. Its purpose is,

Am I taken to the intended state and data when I use the Deep Link?

I have kept the test intent to this.

With this, I have come up with tests that have the minimal, must-have evaluation and assertion to learn if the app is responding or not to the Deep Link. This is what the business wants when the Deep Links are created.

The app usage and workflow function is not a problem statement of Deep Link in a general context.

Deep Link is not for end-to-end. It is to take you from one point to another point, that's it.
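A single-objective Deep Link check, as described above, can be sketched like this. It is not my actual automation: `FakeApp` is a hypothetical stand-in for the app under test, and the link and screen names are made up. A real test would open the link through the automation tool in use and read the current screen and its data from the app.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qs, urlparse


@dataclass
class FakeApp:
    """Hypothetical stand-in for the app under test."""
    current_screen: str = "home"
    screen_data: Dict[str, str] = field(default_factory=dict)

    def open_deep_link(self, uri: str) -> None:
        parts = urlparse(uri)
        self.current_screen = (parts.netloc + parts.path).strip("/")
        self.screen_data = {key: values[0]
                            for key, values in parse_qs(parts.query).items()}


def lands_on(app: FakeApp, uri: str, expected_screen: str,
             expected_data: Dict[str, str]) -> bool:
    """The only question asked: am I taken to the intended state and data?"""
    app.open_deep_link(uri)
    return (app.current_screen == expected_screen
            and app.screen_data == expected_data)


result = lands_on(FakeApp(), "exampleapp://orders/details?orderId=42",
                  "orders/details", {"orderId": "42"})
print(result)
```

The check stops once the landing state and data are confirmed. It does not continue into the business workflow, which keeps the test to one responsibility.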

Are you automating using Deep Link?

Monday, January 22, 2024

RAAMA: My Test Discovery Model

RAAMA -- I Look at You Everyday!

I have tried to put up one of my Test Discovery models in a conceptual way here with the name RAAMA - Refer to, Arrange, Action, Monitor, and Assert.

Maybe this model helps you and your test engineering team as it is helping me. Use this to your context with addition or subtraction for what you are seeking.

I refer to this RAAMA of mine every day and when I'm testing. Every day, I find new learning and realization that I was unaware of earlier. My understanding of RAAMA is not the same as what I had on the previous day.

My understanding of this RAAMA is incomplete and I have made PeACE with it by accepting it. My understanding is growing and getting better every day. I will share a better version of it as I experience it.

Each time I look up to RAAMA and refer to it, I see a new dimension to RAAMA. The awareness, exposure, and the questions are getting better, giving a better realization of what I was ignorant and unaware of. RAAMA is exposing me to be a better test engineer today than what I was earlier.


RAAMA - One of my evolving models for Test Discovery

Note: I have not explained in detail what I mean for each node and its sub-nodes. I can talk and discuss it with you if you look for it; I'm just one email away to get started.
