Showing posts with label Cost. Show all posts

Tuesday, January 7, 2025

Is Software Testing a Cost to Business?


The sooner one becomes aware of the cost and value in the trade, the more it helps.  I consistently ask myself the questions below in my practice.

  • What are the Values added from my work to the business?
    • How do I benefit from this?
  • What are the Values removed from my work to the business?
    • What does it bring to me?
  • Who is not getting benefited from the Values I'm adding and why?
    • What are the losses and their impact?
    • What are the benefits and gains we are losing?
  • Who is getting benefitted from the Values I'm adding and why?
    • What are the benefits and gains we are making?
  • What are the Costs added from my work to the business?
    • What are the impacts?
    • How does it impact me?
    • How does it benefit me?
  • What are the Costs removed from my work to the business?
    • How does it benefit me and what do I gain?
    • How does it benefit the business and others?  What do they gain?
  • Who is experiencing the Costs from my work and why?
    • What are the pains from these costs?
  • Who is not bearing any Costs from my work and why?
    • Do they need anything from my work and its value add?

This exercise helps me to know the expectations of the stakeholders and the business.  As I exercise these questions, I align myself to deliver on those expectations.

The cost and value of my software test engineering work influence my growth and the benefits that I make.



Is Software Testing a Cost to Business?


I witness discussions which say testing is a cost in the software engineering business.  But I do not hear the same statement about development.  This made me think: why?

Here is my understanding from thinking it over, for now.
  1. Software development is not about just programming.
    1. When we say development, it includes every team and their work.
    2. Programming and Testing are parallel activities in software engineering which help each other.
    3. If testing is a cost, then for sure, programming is also a cost!  It is an associated activity.
  2. In business, every activity and the investment in it is a cost in multiple ways.
    1. These are evaluated costs, taken on for a purpose in expectation of returns.
  3. Have I come across anyone saying Automation in Testing is a cost, like I hear for testing?
    1. It is seen as investment and necessity. Why?
      1. Maybe because of a belief that, [once] automated, it is sufficient and goes on by itself without human intervention.
      2. If this is the thought, do you think we are doing it wrong?
  4. Have I seen discussions and statements which conclude that testing for performance and security is a cost?
    1. If yes, how?
    2. If no, why are these two not seen as a cost?
  5. A serving and well-maintained engineering system is a trade-off
    1. I see that we engineers and the business choose what to trade off
      1. This decision can be on logical and non-logical basis; but, there will be a trade-off
      2. Each trade-off has costs and values
      3. What should I trade off in software test engineering and why?
      4. What should I trade off in the software engineering that we are doing as a business?  Why?
  6. When one is given an offer letter by a business, there is a CTC mentioned.
    1. CTC means Cost to Company, that is, the cost of the employee to the business
      1. Programmer has a CTC
      2. Tester has a CTC
      3. Everyone else in the different teams has a CTC, including the CxOs
      4. We all [and our jobs] are a cost to a business; we can add value and bring high returns


To end for now, I learn that everything and anything a business picks up is a cost, because there is an investment in it.  If there is no investment, is anything happening, progressing or changing?

I do not want to get into discussions which say software testing is a cost.  I see such a discussion as lacking an understanding of software engineering and its intrinsics, for a start.  Software development is not all about programming; programming is one small part of it.

Different teams work together in collaboration to develop and consistently deliver usable software systems.  In this process, everything in the software engineering business is an invested and evaluated cost in the interest of the returns and values that are expected.

It is time to ask "What way of testing and its outcome is a cost?" rather than saying and listening to 'testing is a cost', which is nonsensical.  This holds good for any activity in software engineering.

Any serving engineering marvel is a calculated and evaluated trade-off.

Friday, July 26, 2024

My First Hand Analysis of CrowdStrike Falcon Update Incident


I attempted to analyze the process dump of CrowdStrike shared by my friend.  He said there could be an attack leading to the crash of Windows OS globally.  This made me curious to look into the dump and learn.

I did not have much context around it, but the test engineer in me did not sit quiet.  I started to analyze the dump information.  Here is the first-hand analysis that I made on 19th July 2024, after 10:30 AM IST.


What I Saw?

  • It is a Windows OS's process dump.
  • It looks like a C or C++ application, going by how the memory offsets appear in the dump.
  • It started to read a memory offset.
  • Then the process witnessed an exception.
    • Here the program could not read further
    • Why could it not read further from this offset?
      • My little experience of testing drivers on Windows OS for a card printer machine came back to me, and I recalled what I had witnessed when testing.


Scratching and Striking My Mind


I started to ask myself these questions while I asked what could have gone wrong.  I could not stop here, as I was curious about what led the Windows machines to crash.  I referred to the web and learned there was an update by CrowdStrike, and then this incident.

Bugs exist in every software, no matter the level and depth of testing, automation and engineering excellence.  All software crashes, and an OS is no exception.  But what made the update crash the Windows OS?  Pointing at and blaming CrowdStrike or Microsoft is not the way for a practicing test engineer.  If these two organizations are serving their huge customer bases, they have something working and reliable.  Engineering does not eliminate problems.

By now, I had a thought that it is not an attack.  It is a software bug!  Where is the bug?  What is the bug?  Was it not experienced in the pipeline?


The Open Ended Questions


I had these questions as I analyzed and spoke to my friend.
  • What is Falcon?
  • What was this update to Falcon?
  • How frequently are the updates rolled out?
  • How are the updates rolled out globally?
  • What pipeline do they have in testing?
  • Who is impacted the most in business? Is it Microsoft or CrowdStrike?  Impacted in what way?
  • What is CrowdStrike?  What do they do?  Who are the customers?
  • Where does CrowdStrike's Falcon sit in the OS and what does it do?
  • How does CrowdStrike work on the machines and what does it offer?
  • What does the dump say?  Look into it again from different perspectives.
  • How could this have been prevented?
  • How will I prevent this if I join this team knowing this incident?
With these questions, I started to analyze the process dump which was shared.

I had more such questions, but these were the first few that I came across as I started.



Analysis of Process Dump


My interpretation tells me the below, for today:
  1. Accept that it is an incident like any other incident which I witness in a production environment.
  2. Do not fall for the speculation happening around.  Remain calm and focus on interpreting and understanding your exploration.
  3. I see that it starts to read from an offset and then ends up at a non-existent or invalid offset; is it a NULL Pointer?
    • What is NULL Pointer?
      • A NULL Pointer is a pointer that does NOT point to any memory location and hence does not hold the address of any variable.
      • If I do not initialize or assign a local pointer, its value is indeterminate (garbage); it is not guaranteed to be NULL.
      • For example, int *test;
        • When I want to access what the pointer test is pointing to (a location in memory), I cannot be sure what is in the pointer when I read it.
          • I may or may not set it later.
          • In this case, the code cannot tell whether the pointer is valid or pointing to garbage memory.
        • But, if I declare it like int *test = NULL;
          • I can check whether it was set and initialized.
        • It is a better practice to assign NULL to a pointer during initialization so that we can check whether it is NULL or has an address assigned to it.
      • This understanding of pointers makes me think: is it due to not initializing a pointer, and hence the error code c0000005 on reading memory that is not valid?
      • When we assign a NULL value to a pointer, it is a null pointer in C++
        • We assign the null value for testing and asserting
          • Whether memory is allocated to the pointer or not
          • Whether it has a returned address and that is a valid one or not
          • If a pointer is not otherwise initialized, assigning null to it prevents problems to a certain extent
    • With this understanding, I also read that it started to read from offset 0x9c and then failed.
      • What is 0x9c?
        • In Octal it is 234. In Decimal it is 156.
        • Can there be such an address in a computer's memory? I don't know.
        • If it is an access violation, then is it memory which the OS protects and keeps to itself?
          • If so, the OS can terminate the program or process which is trying to access it.
          • Is this killing the process and aborting the operation of Falcon's IPC, and eventually bringing Windows to a BSOD?
      • This tells me it is not a NULL Pointer in the first place, but a pointer that was not initialized to NULL.
        • I infer that if the pointer had been initialized to NULL, there could have been some hint in the state and event when accessing the memory.
          • This is my analysis; but I have not seen the test code, nor am I aware of the product.  All this inference is based on the process dump and my experience of testing drivers.
      • Did it get something in between from the update (a config or pattern?) which it cannot find and read in memory?  Why?
        • This indicates to me that it could be a bug, that is, a logical problem.  This is my hunch for today!
  4. Data in the dump
    • Exception Address
    • Read from Address 0x9c
    • Exception Code: c0000005 (Access violation)
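
To make the pointer reasoning above concrete, here is a minimal C sketch.  It is an illustration only and not CrowdStrike's code: the names sample_record, read_checked and field_address are hypothetical, and the layout is invented so that one field lands at offset 0x9c (9 × 16 + 12 = 156).  It shows two things: a pointer initialized to NULL can be checked before it is read, and a base address of zero plus a field offset of 0x9c gives the small address 0x9c.  Whether anything like this happened in the incident cannot be told from the dump alone.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* NULL, offsetof */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical record layout, invented for this illustration only. */
struct sample_record {
    char padding[0x9c];  /* 156 bytes placed before the field */
    int  field;          /* so the field sits at offset 0x9c */
};

/* Reads *p into *out only when p is set.
 * Returns 1 on success and 0 when p is NULL. */
int read_checked(const int *p, int *out)
{
    assert(out != NULL);      /* the caller must supply storage for the result */
    if (p == NULL) {
        return 0;             /* nothing to read; the caller can report it */
    }
    *out = *p;
    return 1;
}

/* Computes the address a field access would touch for a given base
 * address.  It only does arithmetic and never dereferences anything. */
uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct sample_record, field);
}
```

With int *test = NULL; a call to read_checked(test, &value) returns 0 without touching memory, and once test points to a real int it returns 1 and copies the value.  field_address(0) gives 0x9c on common platforms, which is one general way such a low address can appear in a dump.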

Testing my Interpretations


CrowdStrike, as an org catering its SaaS to such a customer base, won't it have a testing pipeline?
  • It will have one; I have no doubt about it.  They test and roll out the updates; I believe that.

Did they witness any such incidents earlier?
  • I searched the web for it and did not find anything similar on Windows earlier.

Is this a NULL Pointer?  Are you sure?
  • No, I'm not sure.  But there is something that is leading it to an address which does not exist or which is invalid.  I will have to wait for their RCA to know technically what caused this.  But this is my understanding from reading the dump.

How do you think it is a memory access problem?
  • The error code 0xc0000005 says that.
  • I referred to the Driver Easy website for the information, because my experience of testing drivers for Windows OS and experiencing such incidents led me there.  This is what I learned:
    • https://www.drivereasy.com/knowledge/solved-how-to-fix-0xc0000005-error/

Do you think the programmer would not have handled the obvious Pointer and NULL initialization?
  • I believe there will be a check on the pointer and what it is pointing to.  But is it due to no initialization?  Technically this has to be analyzed, which I cannot do.  I will have to wait for the CrowdStrike team to share the technical details.

Is this a driver problem that killed the Windows kernel?
  • I don't know.  But, as per my learning, the .sys file will not have the driver.  It will have information about the drivers and any configurations.
  • This incident is a problem which impacted both CrowdStrike and Microsoft.  Maybe both will have their areas to look into and fix, if they see so.  But, in this context, CrowdStrike can fix it quicker and that is much better, is what I understand.
  • I have been a Windows user for a long time.  I see that Windows has worked well in all my contexts so far.  The engineers of Windows OS know better than me here.  I'm not as well aware and informed as they are.
  • CrowdStrike's engineering team is skilled and they roll out updates many times a day.  They must have a good pipeline when this is being done.
    • But, the question I have is, how did this happen?
    • No one lets such problem into production when they are aware of it.  Do you?
    • There is something that has not come to their observation and experience.  What is that?
    • Knowing this will help to prevent this and similar incidents happening in future.
      • I'm waiting to know what did not come to their experience and led to this incident.

What could be in the .sys file of CrowdStrike?
  • I don't know!  I want to learn that.
  • But, from my testing of .sys files and drivers on Windows OS, I learn there could be configuration details with a certain pattern, or information to capture at run time, that help the installed software to run.  This is my learning and awareness from my testing.
  • That said, testing at the OS level and of antivirus engines is not obvious.  Testing drivers is like walking through risky mines.  What is sufficient and good enough in test coverage?  It needs expertise at the OS internals level.
  • With Windows OS having such fragmentation in its versions, updates and patches, it is surely a battlefield and a minefield for engineers building such solutions!
  • I learn that the Windows OS stopped when an application tried to access an invalid region or non-existent memory.
    • The update which was rolled out, did it have a configuration or a pattern that showed a logical problem when processing it?
    • I have such questions and thoughts that are striking my mind as I think and build a problem model for the same.

Is this a race condition incident?
  • I see it is not a race condition incident, as users across the globe experienced it.

Is this specific to a Falcon version, OS version and hardware?
  • Not all host machines would be on latest version of Falcon, is my presumption.
  • At least the n-1 and n-2 versions would have been on host machines which experienced this behavior.
    • So it is not specific to a Falcon version, I see.
  • It looks to me as it is not specific to the Windows OS version and hardware configuration.
    • It is an application software problem which occurred at driver level is what I see.
      • This is an IPC communication and process is my understanding.
        • The driver can receive the IPC communication in continuous mode.
        • At times, this can get queued based on the application and what it does.


Where is the Problem?


Well, I'm looking and pulling from my visualization by relating it to my experience of testing drivers on Windows OS.  I don't know the exact reason, or anything close enough, to tell what could have gone wrong.

Reading the process dump, it says memory that does not exist or is corrupted was accessed.  One high possibility is that the starting offset is seen, but it does not help when reading.
  • For example, Ravi has the address of India's Prime Minister house.
    • But, he does not know from where to start despite having the address.
    • He is void and null in knowing where to start and what to do, when he is not initialized with the start location to begin the travel to the Prime Minister's house.
    • In short, he does not know where the address is pointing to and what it has, though he is given an address to start.
      • Can he access the Prime Minister's house premises without any access granted and authorized to do so?
      • If not, won't the police or other security forces stop and arrest him?

Do I Know the Precise Problem?


I don't know!  I do not know the CrowdStrike product and platform.  I'm waiting to read the technical details from CrowdStrike.

I see it comes down to the data, state and event.  I would focus on how to prevent it by learning which data, state and event led to this behavior.  I think of figuring out the test design and strategy that can help me identify such use cases.  I focus here and see whether it can be brought into automation so that it gets exercised and regressed consistently.
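
One way to turn the data, state and event thinking above into a repeatable check is a small boundary test around a reader that validates before it reads.  This is a sketch under assumptions of my own: content_record and get_field are hypothetical names, and nothing here describes Falcon's code or CrowdStrike's pipeline.  It only shows the kind of negative case (an index beyond what was supplied, or missing storage) that automation can exercise on every run.

```c
#include <stddef.h>   /* NULL, size_t */

/* Hypothetical content record: how many fields were supplied, and where they are. */
struct content_record {
    size_t       field_count;  /* number of entries available in fields */
    const char **fields;       /* may be NULL if nothing was loaded */
};

/* Returns the field at index, or NULL when the record, its storage,
 * or the index is not valid.  No memory is read before the checks pass. */
const char *get_field(const struct content_record *rec, size_t index)
{
    if (rec == NULL || rec->fields == NULL) {
        return NULL;           /* state problem: nothing to read from */
    }
    if (index >= rec->field_count) {
        return NULL;           /* data problem: asks for more than was supplied */
    }
    return rec->fields[index];
}
```

A regression suite built on this would keep one test per failing combination, for example a record with two fields asked for a third, so the behavior stays checked after every change.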

If it is due to a memory access that had a problem, I did such tests when testing a driver for a hardware machine on Windows OS.  I will share the tests that I did in an upcoming blog post.

I wrote up the technical analysis from the process dump and sent it to CrowdStrike and Microsoft.  I did not get a response.  Anyway, I'm sharing the overall information in a non-technical way so that it is consumable to most readers here.



Note: Here are other threads where I share my thoughts on the same:
1. https://x.com/testingGarage/status/1814215089525821763?t=XSFdx69ElL0ZmBOcEFrTjg&s=19
2. https://www.linkedin.com/posts/ravisuriya_%3F%3F%3F%3F%3F%3F%3F-%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F-%3F-activity-7221156949445206017-oeRa




Friday, November 4, 2022

My Work, My Fit, and Company's Goals

 

I, My Role and Expectations


At least once a day, it is useful to think about oneself.  I started doing this late in my life and career. I started doing it in recent years.  If I do not think about myself, I will be lost very soon.  This is not selfishness; it is self-care, which is what I'm learning.

It is essential for me to think about myself, because:

  • It helps me to see what I am
  • It helps me to see where I moved today
    • Does this move help me personally?
    • Does this move help me professionally?
    • What benefits does it bring me?
    • What benefits does it bring to those with whom I associate and work together?
    • Does it keep me in sound mental and physical health?
    • Did I learn today?
      • Something new?
      • Anything I refined and unlearned?
    • Does it bring any costs and cons to me?
  • Am I fit?
    • Fit to where?
    • Fit to what?
    • Fit? How?
    • Fit? Why?
    • Fit? When?

In all the roles I take in my personal and professional life, I'm evaluated at some point in time.  I will be judged for:
  • Did I fit?
  • Did I do my role?
  • Did I meet the expectation?

The problem is not that I'm evaluated.

The problem is I'm evaluated without saying what makes me fit to be in the association and how I will have to meet the expectation.  Some associations can remove us while some cannot.

When I say this, I want to say this -- the word family is often misunderstood; not all associations can be a family although the word family is used often in associations.  This is reality and not a fact!

Does family eliminate me if I do not fit in?  I don't know!  At least the hope is, family is where I can be myself; without the thought of me being judged and evaluated for what I take and bring to the family.  My family as well have expectations from me in the different roles I live in with them.

When I can see this in my family, why don't I see this in the place where I work together with other people?

Do I fit in here for what I make out of this place (company) and take it to my family, home, and my life?

I wish my home and school had helped me learn this question early in my life!  I expect it now because I realize the "value of fit", now, that is, after I graduated and started to work with people in the organizations.

I consistently learn that every one of us is replaceable in any association, be it family or a workplace.  And, it moves on; it does not stop.  If not replaceable, it is manageable to continue and move on.

When we are in the association, how fit we are so that it is hard to replace?  Maybe that's a price (value) tag and a necessity of one!



The Response, That I Should Evaluate


As a responsible colleague and team member, I promote the discussion of this question at least once a month.  I ask this question to whom I report at work.

I will have this question in every one-on-one catch-up that I will have with my reporting manager.  And, I expect a response to this question and want it recorded for future reference.

What is that question?

How does my work fit in with the company's goals?

Evaluating the response:
  • How do I evaluate the response to this question?
  • What should I do on the evaluation of the response?
  • Why should I evaluate the response?
  • What should be my next course of action?
  • After all, what is my response to this question and how do I evaluate it?

To get promoted to the next roles,
  • I need to be solving the problem of my higher (or next) role
  • I need to have the capability (skills) to solve problems of my higher role

But this is not a question of promotion.  It is the question of being fit for the company's goals.

While I get promoted, or am about to be promoted, my work may still not fit with the company's goals.  Identifying this early helps.

I have learned, sometimes the promotion does not necessarily come with the fit for the company's goals.  But then eventually the fit will be evaluated at one stage by someone in the company together with a promotion given.

This has led me to ask this question consistently and then evaluate the response with the business, political and rational mindsets.

I say the same to my team, that is, ask this question for yourself and to the reporting manager.  Evaluate the response that comes to you.

Should you ask this question to your reporting manager in each month's one-on-one catch-up?



The Fit Equation Changes


In the team and company, we believe:
  • We are contributing
  • We are a value-adding fit type

We keep saying to ourselves how we make a great fit and difference.  Isn't it?

This "fit equation" keeps changing every day or quarter or year or appraisal cycle.  I learn, this "fit equation" keeps changing rather than evolving.

Adapting to this consistent change and delivering is evolving.  This is my understanding for today.



Biases, Communication, and Problem Solving


We all are in biased mindsets and perceptions at any point in time.  The people in the company need help to break these mindsets so that one's fit equation is questioned and assessed regularly.  In my opinion, this is a great assistance that a manager and a leader can give to her/his people.


I expect the managers and leaders to ask the company:
  • What does the company want from the people?

We people in the company and in the team, let us ask the manager and leaders:
  • What does the company expect from me?
  • What is my fit equation?
  • Does the current work that I'm delivering fit the company's goals?

Most times I have heard people saying, "I was told that I did not fit with the company's goal".


How will one know what the company's goals are, and how can one align with them, unless they are communicated and recorded professionally?  I see that, to start, it needs communication, clarity, and affirmation first from both ends.

Does this solve the problem?  No!  It gives an onset to understand the problem and the differences to fix.  With this, the manager and leader can help the team, and vice versa, in solving the problem.  Thereby contributing to the company's goals by aligning with them.

If you are a manager or a leader, make sure you have this as an essential practice in the monthly catch-up: assess this fit equation and let your people know.  I love seeing this initiative from managers and leaders.

This is part of a leader's fitness to be in the role of assisting people and the company.  By doing so, we will help the company, business, employees, investors, and customers.

To reset this post's intent equation:
  • How does the work expected from me fit in with the company's goals?
  • How does the work I'm doing fit in with the company's goals?


Wednesday, September 18, 2019

Communication has explicit and implicit messages! Have you got them?



Besides all the technical work the testing team and a tester do, there are times when the tester and her/his testing take a hit.  Here is one such hit and how I solved it.  When working in an organization which is building a product, usually there will be multiple teams involved.  Likewise, multiple people influence and make decisions about the product, its development and shipment.  With any one miscommunication here and a slip in time, the testing team gets its time squeezed and cut, most times, doesn't it?  I have experienced it.


When I looked into what the problem was here and how to solve it, I learnt these:
  1. When decision makers communicate, two types of communication can be conveyed to teams.
  2. One is explicit communication - which is verbal and written.
  3. The other is implicit communication - which is neither verbal nor written; but it is expected that it has to be understood by teams.

If the testing team doesn't catch the implicit communication, what could be the impact on the time given to testing?  It depends on the magnitude of the problem!  Most times, the explicit communication made in a context also gets misinterpreted by the teams.  If you were not part of the teams which misinterpreted, and you got it right each time, probably you had a better and upstanding leader in your team who solved the communication problem.

A simple heuristic here for the testing team and others involved in this context: How can one misinterpret what I said?  Did I misinterpret what was said?  One person may not be that sharp in communication and interpretation, say.  But are that many team members, with varied experience levels, the same in that skill?  If yes, then we have to help the team; if not, then where is the problem?  Here the question is about solving and proceeding further so that team members can be of help to each other.

I led the projects and testing delivery while I was working at Moolya Software Testing Pvt. Ltd.  Having led 55+ projects for different customers and their deliveries, in this role I had to communicate with external teams (programmers and testers), internal teams (i.e. teams within Moolya), the sales team, the recruitment team, human resources for skill development programmes, and the management teams (at the customer's place and in Moolya).  Working in startup projects especially teaches these lessons very well; at least I have seen it for myself.  Note here: if I misinterpreted one communication from the customer and passed it on to my teams in Moolya and to management, how big would the impact be?  Is it a big cost?  I did misinterpret in the initial stages; but when I observed it, I made sure to minimize it and keep it to zero if possible.  I worked on it and assisted the teams I was leading to practice it.  For example, say the customer explicitly stated a feature and release date and I misinterpreted it.  Further, the customer too assumed I got the message.  Were there any implicit messages here which the customer assumed I had got?  And did I assume that I had got them?  Once I found that I was stuck with this problem in a project, I fixed it in myself first.  Note, my Moolya is always close to the tester in me!

Here is how I worked on it and continue to work on it.  Despite this practice, at times I still fail; I know I'm human and so are others.  All I check is the cost and value of what I did here!

Whenever I'm certain that I fully and completely understand the other person and teams without the benefit of clarifying questions, I ask myself this question: If I weren’t absolutely, positively certain that I fully and completely understand, what would I ask?  This question helps as a trigger heuristic to know more about my understanding.  But will I do this each time?  No, I pick it in contexts which hint to me that something is missing.  For example, when confusion is arising in me or in the teams, or at a time when major leads are being weighed for a decision being made.  If none of the teams have confusion or questions, I still revisit this and ask the teams to say what has been understood.  How I ask team members is another matter.  I will not get into those details; you see, that's another challenge and problem to solve.  Let me keep one problem here and focus on solving it.  If you ask who will do this when there is no time for the tasks already assigned, fair enough.  In my role, I will have to do that until I get the confidence that this team or these team members can handle themselves in the situation by questioning and asking for clarification.  My role is not just to lead and deliver the projects.  I'm responsible and accountable for my team members' skill development too, which is an implicit expectation of an organization though it is not communicated in the offer letter, the promotion letter or the promoted role.  If you don't agree, or you say I should not be doing this, fine!  One day, when it comes and hits you or your team and organization, probably that is the whiteboard day.

I keep in mind that it’s in situations of absolute certainty, in which I'm sure I understand and say so to myself, that I'm most likely to misinterpret.  I make it my responsibility to clarify my own terminology to ensure that the other person or team members understand me.  As I provide clarification, I ask myself these questions:

  1. What assumptions might I be making about their meaning?
  2. What assumptions might they be making about my meaning?
  3. How confident am I that I’ve exposed the most damaging misinterpretations?

I see people sometimes getting annoyed!  But my testing career experience has shown me that decisions, most times, are not conveyed well enough explicitly.  Later it is said, doesn't that mean the same, or no change, or there is a change, i.e. it is implicit.  Later, when I took something as implicit and proceeded, I was asked why I did that.  Now should I say, wasn't that implicit, as last time?  Then the answer is NO.  I decided that I should avoid these mistakes and I started to practice interpreting and getting it clarified, though a few feel annoyed.  I see value here, and the cost of something catastrophic happening is high if one goes on assumptions about implicit meanings that are not literally communicated.

The best solution for me is simply to try to heighten my awareness of the potential for misinterpretation.  I probably can’t catch all communication problems.  But if I do the best I can, and don’t allow myself to feel too rushed or too intimidated to ask for clarification, I should find that I'm in sync with the other person(s) or team(s).  If I'm not, it’s better to find out early on, rather than later when the consequences could be catastrophic.

I make sure I ask questions and get things clarified.  I insist on communicating, verbally and in writing, the explicit and implicit parts of the communication and clarification.  I hope this will be helpful to testers and others who work in a team and with teams to deliver the shipment on time.

To end this post: see, this is not part of any technology stack, automation, testing, or role.  It is about how we solve the problem that blocks the core problems which we are solving.  If you happen to see value in it, do have a word about it with your team members.  A change in a team member or in you can help your organization, your team, your product, your customer, your promotion and your salary.