Sunday, June 28, 2015

Log Writing, Helps!

It was in the initial months of my career, after I took up Software Testing as my profession by choice. I remember I was in the 2nd week of this job. In the first two days, a Project Manager asked me to read the book by William E Perry and identified a few chapters for me. After reading, I was supposed to tell him what I had understood and answer the questions he asked. He came to my desk at 3.45 PM sharp every day with a mug of coffee. He was strict, but he helped me a lot to practice. I believe he is the one who gave me a hike of 3000 INR; it was my first hike and my salary came to 9000 INR. To add to this, my parents had told me, "There will be seniors and big people in office; don't talk too much or ask back questions. Respect them and make sure you earn a good name."

Another tester named Kanthraja MP and I worked together in the lab. There was a consultant from another company whose contract was about to end, and I was asked to take over the automation task. When I came in the next day, I learned that his last day had been yesterday. The lab had server slots, huge hardware systems, distributed computing systems, and Windows, Linux and OS/2 machines integrated with centralized hardware. The automation had to start that night and continue for the whole night.

I looked at the tool used for automation. The company had bought a license for it. Those were the days of QTP (initial version), WinRunner and LoadRunner. I had not heard of this tool at all, even after asking my friends.

The automation also ran in the US lab, and it was controlled from the India lab. I ran the script once on one client machine and it failed. I went to the Test Lead and said I would write the program again and use that for the overnight automation run. I asked for one day and said I would resume the automation from the next day. I wrote the program, and in the evening I configured the systems with it. But the programming team was debugging an Out Of Memory problem and I could not get a machine to run the program. I asked my friend Kanthraja MP to run the script and wait for 30 minutes, and to call me if he saw any trouble with the automation in the lab. I had to collect a BMTC bus pass at the Shivajinagara bus stand, so I rushed out that evening.

I got a call from him saying the program failed to run. I searched for an internet browsing center and asked him to mail the error log to the company email-id. Looking at the logs, I found it difficult to know what was happening. I called him again and said I was coming back to the office. He was waiting for me, and I saw a tense face.

It was difficult to know what was going wrong. I wanted to know, but I did not know how to learn it or investigate it. Then I got the idea to print each variable value, each state value and the point in the flow where the program was. I recompiled and started the automation again. I got to know the problems, and they were fixed in an hour. I initiated the automation run in both labs and we monitored it for an hour. It was going smoothly and we left for the day. This helped us to figure out problems in the messaging implemented via CORBA, memory problems and the elements causing them, and many other problems.
The printing of details in the log helped us a lot.

I remember it even today. For every buggy program I write today, I still use verbose prints to the maximum extent possible in the context. It helps me to investigate and learn systems. Moreover, it helps me to learn my buggy program better each time.

A point to remember: the log also consumes space on the system. If it is consuming too much, then tuning the log writing is necessary. A format and prefix content that give context, added to each line, come in handy to differentiate the entries. With this, I see that both the testability and the learnability of the log get better.
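The two ideas above, printing every variable and state with a prefix on each line and keeping the log from eating the disk, can be sketched with Python's standard `logging` module. This is only a minimal sketch and not the tool or program from the story; the names `build_logger` and `log_state`, the prefix layout and the size limits are assumptions made for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(client, log_path, max_bytes=1_000_000, backup_count=3, verbose=True):
    """Return a logger that prefixes every line and caps the space the log uses.

    All names and default sizes here are hypothetical example choices.
    """
    logger = logging.getLogger(f"automation.{client}")
    # Verbose mode keeps DEBUG lines; tuning it off keeps only INFO and above.
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        # Rotation: once the file reaches max_bytes it is rolled over, and only
        # backup_count old files are kept, so disk usage stays bounded.
        handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backup_count)
        # The same prefix on every line: time, which client, and severity.
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_state(logger, step, **values):
    """Write one line with the current step in the flow and each variable's value."""
    details = " ".join(f"{name}={value!r}" for name, value in sorted(values.items()))
    logger.debug("step=%s %s", step, details)
```

A call such as `log_state(logger, "connect", host="lab-a", retries=2)` writes one prefixed line that says where the flow is and what the values were, which is the kind of detail that made the overnight failure readable.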

Sunday, May 3, 2015

Fragmentations: Heuristic and COP FLUNG GUN -- Part 3

Back in an earlier post, I listed a very few perspectives of the 'Communication' factor. But it goes beyond what I see or have seen in programming and testing. There is a point which is usually not considered in this Communication factor: device fragmentation. It is a buzzword, and it mostly stays one because Software Testing seldom attempts to understand what it actually is and what its outcome is. Is device fragmentation alone the source of the problem?

Interestingly, the behaviors noticed on a device will often not be because of fragmentation in the first place. But the buzzword makes it feel as if they are because of device fragmentation. Going back a bit, is the device fragmented or is the OS fragmented? I see both, and every other factor of COP FLUNG GUN will be influenced by fragmentation, as will anything that comes in the near future. This indicates that fragmentation will never come down unless the users' needs from technology come down. Wait, what are fragmentation and defragmentation? It is essential to understand them, because the understanding of the COP FLUNG GUN factors gets better each time this is understood.

In an Operating System, everything is a file, which in turn is instructions and data. In a file system, the data being processed is represented as a file, and the file is broken into blocks. These fragmented blocks may or may not be placed sequentially. If they are not sequential, the problem starts: disk read, disk write and wait time all increase in the Operating Context. To improve the condition caused by fragmentation in the Operating Context, we initiate an activity called Disk Defragmentation. Defragmentation is a process of grouping the blocks of the same logical file so that they sit close together; as a result, file operations by the Operating System become easier, and the disk space, memory and processor can be used better for other needs. If observed, Disk Defragmentation in an Operating System actually helps to overcome the impact of fragmentation in the Operating Context. Then why can't we defragment on the mobile device?
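The disk picture above can be played with in a few lines. This is a toy model under my own assumptions, not how any real file system works: the disk is a list of blocks, each block holds the name of the file that owns it (or None when free), and `count_fragments` and `defragment` are hypothetical helper names.

```python
def count_fragments(disk, file_name):
    """Count the separate contiguous runs of blocks that belong to one file."""
    fragments = 0
    previous = None
    for owner in disk:
        # A new fragment starts whenever we enter a run of this file's blocks.
        if owner == file_name and previous != file_name:
            fragments += 1
        previous = owner
    return fragments


def defragment(disk):
    """Group each file's blocks together and move the free blocks to the end."""
    order = []
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)
    packed = [name for name in order for _ in range(disk.count(name))]
    free = [None] * (len(disk) - len(packed))
    return packed + free


# Hypothetical disk layout: files A, B and C scattered over nine blocks.
disk = ["A", "B", "A", None, "C", "B", "A", None, "C"]
tidy = defragment(disk)
for name in ("A", "B", "C"):
    print(name, count_fragments(disk, name), "->", count_fragments(tidy, name))
```

Every extra fragment is one more jump for a read or write, which is where the added wait time comes from. On the toy disk the fix is easy because nothing else interferes while the blocks are moved; the novel-in-the-city example that follows is about what happens when that is not true.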

It is like this: I tear up a novel and throw its pieces of paper across the streets of a city, randomly. Say I can recognize these pieces of paper on the city roads. This is fragmentation from one perspective. Now how will I defragment it, that is, collect the pieces, place them sequentially and keep them connected logically and by content? There are many other factors in the city's environment (sun, dust, heat, rain, wind, cleanliness of the city and so on) which will disturb me as I collect the pieces to make the novel again, and the pieces are spread over the whole city, not just the street which I use to commute. The same happens with the mobile device and its mobile operating system.

The hardware of mobile devices differs from one device to another, yet the same app has to run on all of them. Likewise, the Operating System gets customized for each of these devices. To relate, the novel's cover, torn into different shapes, is the mobile device, and the pages of the torn novel are the customized mobile OS. I have to make my points on each of these torn devices and OS pieces. How? As these torn pieces miss my context and are under the influence of environmental and operating context factors, it is more likely that my points will fail or show more problems, which I cannot guess or anticipate easily.

Few Fragmentation Factors in Mobile Device and OS
  1. Hardware Specifications -- Input; Processing and Storage; Output; Collaborating Unit and Capability
  2. Software Specifications -- Customized OS and its versions; Other software & its versions; Software updates - device software; component software; Anything that is bound to be updated or receive the updates
  3. User's Context -- Usage pattern; Environmental factors; User's choice and Locale Settings; Rooting and Jail Breaking
  4. Environmental Factors -- Infrastructure; Service Provider; Network
  5. Regulations -- Policies of devices; Policies of user location; Policies of software
  6. Application installed -- Its internal way of functioning on above identified factors
  7. Other unidentified factors -- which always exists as unknown until identified
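To get a feel for how quickly the factors listed above multiply, here is a small sketch that builds every combination of a few of them. The factor values are hypothetical examples I made up for illustration, and `build_contexts` is not part of any tool; real lists would be far longer, and item 7 (the unidentified factors) cannot be listed at all.

```python
from itertools import product

# Hypothetical example values; none of them come from a real device list.
factors = {
    "hardware": ["low-memory phone", "flagship phone", "tablet"],
    "os_build": ["vendor build A", "vendor build B"],
    "locale": ["en-IN", "de-DE"],
    "network": ["wifi", "3g"],
}


def build_contexts(factors):
    """Return one dict per combination of the given factor values."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


contexts = build_contexts(factors)
print(len(contexts), "contexts from", len(factors), "factors")
```

Adding one more factor with only two values doubles the count, so testing every combination stops being practical quickly and the contexts have to be picked by risk.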

Ecosystem Fragmentation, Not Just Device Fragmentation

Now it is obvious that fragmentation is not just about the device. It also includes the user who uses the device and the app, along with unidentified factors. How do we defragment these fragmentations? This is the challenge for mobile technology, which keeps changing constantly as the next version of the app, mobile software or hardware gets updated or rolled out. As a result, does it get outdated quickly, leaving little for the installed applications and for the previous models (and versions) of hardware and software? Fragmented!

Fragmentation and COP FLUNG GUN

While I learn that fragmentation is everywhere here, I see it is also observed in this heuristic -- COP FLUNG GUN. Identifying it up front is not possible as it is highly contextual, while a few of its traits remain generic in most cases.

Further in this blog post series, I will learn and mention a few generic as well as context-specific fragmentation factors in this heuristic. It will be interesting to see what the fragmentation of Communication looks like, as it was not mentioned in the previous post. Did I tell myself about the fragmentation of files on mobile OS and devices while I was describing the external factors? That has a role too, contributing alongside the external factors.

Now, let me allow myself to defragment what I have got here.  It is fragmented and goes unrecognized!

Note: I keep the technical stuff in simple relatable terms and examples so I can learn it anytime and share it with others who wish to know about it. Being technical or non-technical means expressing things simply and to anyone; that is what I understand for now.

Sunday, April 26, 2015

Ignored OS Component Shows Itself

It was a new feature that was pushed into production in a short time: two days. I designed the tests for it and tested it. In parallel, fellow testers tested it too, as I wanted their minds to see the shortfalls of my tests and the risks. All of a sudden, close to one month after the release, one noon the test team received a High Priority message indicating a production problem, and it was in this feature.

Fellow testers took it up, switched to the production environment and noticed it. I asked if it was reproducible consistently, and it was. The next step was to switch the client machine and environment, boot into the production environment and see if it was still reproducible.

Now it was not reproducible on a few client machines but was observed on a few others. In the meantime, I observed the server logs for around an hour and did not see any change from what I had been seeing for the last one month. I pushed the production branch to the test environment, ran the sequences and noticed the problem on the staging environment as well.
This was a major clue for me. It indicated that there was a change on the client machines, those with users as well as those with programmers and testers, which made some show the problem and others not. And it was the same code.
A few people walked to my desk and asked, "Why was this missed? We told the testing team it is a major change and it should be tested thoroughly." I had to say, "Wait, I have no clue for now other than the first clue I have." The testers seated around me were silent. The Technical Architect came to the desk immediately and said to all that this was a well tested feature, that he had tested each line of code of this feature, and that he was very confident in it. I thanked him on behalf of the testers. Then why the problem now? That was the question, to which I had to say, "Wait for a day, I'm looking into it. It is a problem, it has to be fixed, and there is no other way to live with it for now."

I'm in no mood or state of mind to tell people what testing is and what quality is, because I can invest the same time in testing and do something useful for the product, the team and myself. I selectively pick the stage where I have to stand up and talk, and when I have to ignore and continue the work.

Below are the questions I asked the Technical Architect
  1. Did we release code other than what we tested? I heard, "No."
  2. Are you sure that no other code commit was made in the RC that went to production? I heard, "The tested code."
  3. Are you sure that no other packages and utilities were changed in this release? I heard, "The changes are the same ones the test team was informed of in the labels."
  4. As said, I see no change or difference on the server, and the suspect is the client now. Any thoughts on it? I heard, "The same changes on the client interface and no other changes there."
I looked at the code commit label of the production release build and everything looked the same, which indicated no code changes in the build that was pushed to production. I confirmed the same to the teams. Then, studying the client machines, I started looking at which machines had differences and which did not.

There was no minor or major update information for the OS and its associated information on the client machines. The next thing to explore was whether the problem was reproducible on client machines having a lower version of the OS and on the latest, yet to be released, version. I went to the Technical Architect and we had a discussion about a suspect. We fell silent, and we decided to start working on it right away.
During Test Design, a risk had been indicated that could come to the product from one component which the product makes use of on the client machine. This component is part of the client machine's OS and controls most of the products that get installed on the client machine. At the time of test design, I read the beta development changes and bugs of this component, and the product was also tested by installing the latest beta; but it was still a beta, and changes could come in until its code and architecture design were frozen.
One of the major changes for this component was in its architecture and design: how it saves data on the client machine. But the product could not wait until this component got frozen, and it went out, setting aside the words of the Test Design, on the claim that there would be no impact from this on our product. The Test Design's risk list clearly mentioned this, but it was called an 'acceptable risk' whose cost was known and agreed to.
The test team communicated a release plan designed around this risk to the product, but the business could not wait for it. The result was observed in one month's time.
The cost had been bearable until then, but not anymore, because the user would not be able to use the client interface; that meant no service from the server to the user, and the user was blocked. I continued exploring; 5 hours had already passed, and I had leads but had not yet isolated the cause.

Monitoring closely with many more tests, it was isolated that a component of the client machine's OS was the cause. This had been ignored earlier as acceptable, though it was mentioned in the risk list of the Test Design. But the question I got was why not all users were facing this problem, only a part of them.
The client OS vendor was pushing the change of this component to users in batches across the globe and not in one go. Users in the batches whose component had been updated were getting blocked in the product when they did that sequence of operations. And the question is, will all users do that? Yes and no; it depends on the mindset of the user. It was left to the business to make an informed decision now: if the business sees the user as important, then it decides what should be done.
Learning of the change in the client OS's component was not easy, nor was identifying it as the source of the problem, because it was pushed as an update to the client OS. The OS did not show any update changes, but it had changed. Test Design happens every time we test; it does not have to happen only at the beginning. It is an ongoing activity. In this case, I took a bit of time for it, since I knew the architecture of the product and the client OS communication process, and since the component's future changes and their impact were still in beta. Moreover, it was a major release, and one slightly incorrect move could create trouble for users and the business.
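Why only a part of the users saw the problem becomes clearer with a small model of an update pushed in batches. I do not know how the OS vendor actually chose its batches; this sketch assumes one common approach, where a hash of a machine identifier decides the bucket, and the names `rollout_bucket`, `has_update` and `split_by_update` are hypothetical.

```python
import hashlib


def rollout_bucket(machine_id, bucket_count=100):
    """Map a machine to a stable bucket number from 0 to bucket_count - 1."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % bucket_count


def has_update(machine_id, rollout_percent, bucket_count=100):
    """A machine has the update once the rollout percentage covers its bucket."""
    return rollout_bucket(machine_id, bucket_count) < rollout_percent * bucket_count / 100


def split_by_update(machine_ids, rollout_percent):
    """Split machines into those already updated and those still pending."""
    updated = [m for m in machine_ids if has_update(m, rollout_percent)]
    pending = [m for m in machine_ids if not has_update(m, rollout_percent)]
    return updated, pending


# Hypothetical client machines; the identifiers are made up.
machines = [f"client-{n}" for n in range(200)]
updated, pending = split_by_update(machines, rollout_percent=30)
print(len(updated), "updated,", len(pending), "still on the old component")
```

With the same product code on every machine, only the updated group can hit the problem, and the group grows as the percentage is raised. That matches what we saw: reproducible on some client machines, not on others, with no code change on our side.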

I reconfirmed the cause: the client OS component's architecture and the changes it makes to the client interface. This was handled with a fix in the product, and it went to production in a day.

The learning I shared with the teams and my fellow testers from this Test Investigation
  • We always design tests when we test; doing it while also learning the context of changes happening outside the product always helps.
  • There is no single good and right time to design tests. Every time is a time for designing tests. In this context, it needed a chunk of time as it was a major change to the product in a short time. Hence I set time aside separately for it.
  • We test to learn the change and its impact as well. Learning changes around our product is always useful.
  • Knowing the system architecture and supporting platform architecture is an advantage in testing.
  • Maintaining a record of changes in the system and associated systems is useful, no matter whether the software is simple or complex.
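The last point, keeping a record of changes in the system and associated systems, can start as simply as keeping a snapshot of component versions per machine and comparing a machine that works with one that fails. This is only a sketch under my own assumptions: the component names and versions below are made up, and `diff_snapshots` is a hypothetical helper, not something we used in the investigation.

```python
def diff_snapshots(working, failing):
    """Return {component: (working_version, failing_version)} for every difference."""
    changes = {}
    for component in sorted(set(working) | set(failing)):
        before = working.get(component)
        after = failing.get(component)
        if before != after:
            changes[component] = (before, after)
    return changes


# Hypothetical snapshots of two client machines running the same product build.
working_machine = {"product": "4.2.0", "os": "10.1", "storage-component": "2.3"}
failing_machine = {"product": "4.2.0", "os": "10.1", "storage-component": "2.4"}

for component, (before, after) in diff_snapshots(working_machine, failing_machine).items():
    print(f"{component}: {before} -> {after}")
```

In this story the OS itself reported no update, so a snapshot only helps if it records the component level and not just the OS version shown to the user.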

Saturday, April 25, 2015

Blog Framing the Isolation Tests and Test Investigation

Unusual behavior in software and in its associated systems is easy to identify at times. But the source which brought in that unusual behavior will not be straightforward to learn and isolate. When this happens at a critical time of a release, or after a critical release, imagining the uneasy condition of the stakeholders and of the user base using that software product can be scary for the business. No stakeholder wants to get here. Testing helps to isolate such behavior when it is supported.

Last night I joined my friends who have built a product in their startup and wanted the Release Candidate tested for their critical release. While I was testing along with the testers, I was asked, "There will be no problems with this release, Ravi?" I could understand my friend's concern and I said, "Problems will be there; let us keep calm to receive them and fix them. I don't know what problems can come up outside the context we have tested now, or within the context we have tested as well." We learned of a problem and moved on to test for it before they pushed the patch which had the fix: the zipping of the content was not happening when data was sent from server to client.
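A check for the problem above, content arriving unzipped, can be small. I am assuming here that "zipping" means gzip compression of the response body, which the product may or may not have used; `is_gzip` and `check_response_body` are hypothetical names, and the check works on bytes already captured from the server, however they were captured.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(payload):
    """Tell whether the payload starts with the gzip header bytes."""
    return payload[:2] == GZIP_MAGIC


def check_response_body(body, original):
    """Return a list of problems found in a body that should be gzip-compressed."""
    problems = []
    if not is_gzip(body):
        problems.append("body is not gzip-compressed")
        return problems
    if gzip.decompress(body) != original:
        problems.append("decompressed body differs from original content")
    return problems


# Hypothetical content sent from server to client.
content = b"report line\n" * 100
print(check_response_body(gzip.compress(content), content))
print(check_response_body(content, content))
```

The second check matters as much as the first: a patch that turns compression on is only a fix if what comes out after unzipping is still the content the server meant to send.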

After the test sessions, I went along with the team for food. I asked myself, "Why should I not work with the testing team here and tell them what I tested?" Most times, I work individually on tasks though I'm part of a team. I keep updating the team on what testing I'm doing and about the tests. This helps me to see how I could have done it better and whether I have missed any test that actually mattered in the context.

I started sharing with the testers what I did while we had food. During this talk, I thought, let me write about such tests that isolate the cause of unusual behavior. Keeping confidential information out of the blog post as much as the context allows, I will write about such cases from now on. From the next post, such testing stories will be filed under the 'Investigation' label of my blog.

Thursday, January 1, 2015

An existence way before I saw it!

I want to write about the technical practices which I'm practicing and failing at, and the practices which I want to do for the betterment of my testing. Sometimes I will get into writing like this as well, because I want to challenge myself with what I'm seeing and how I interpret it.

The "Year in Review" app from Facebook gained inertia among FB users in the last two weeks of Dec 2014, but it did not gain momentum in the field of Engineering, in learning what it is, technically and for the people who use it. This is not wrong; it is right, because that is the way it is used. Until another way of using it or accepting it is figured out, the same way will more likely continue to be used.

I do say 'yes' to Eric Meyer's words in the way he sees it. He is very right from that perspective. I do say 'yes' to the words of those who called "Year in Review" spam. They are right because they happened to see it as spam. So, what's wrong in it? Nothing! I do say "Oh, yes!" to those who said it is a bug, because it looks like a bug in that particular context for them. So it is very logical at that instance of time.

While I did all this, I did not happen to learn what it is at all. For Eric Meyer, it was about the design and how the algorithm picked a photo irrespective of the context in which the photo was posted, and sewed it in as one of the happy moments of the year.
But did it have an option, at that time, to customize, remove the content and place self-picked and self-written content? That's the question I want to learn about. If it didn't, then it is a bigger problem, one that any designer would want to advocate for. If there was an option to customize, remove and then add a photo with content, then I see that it provided an option. Parsing technology data is easier, and FB does that. But parsing human emotion is not easy, because we humans don't understand the emotions of others. Then how can an algorithm written by a human?
Where is the problem here, if it exists at all? It existed way before FB released this app and its users started to see it, while a few started using it, unhappily or happily.

I read of this being addressed as spam. Yes, it is spam if one sees it that way. But there is a lot of other spam with which one lives every day. It goes unnoticed, so why did this get so much attention? Did a few accept it because someone said it, and should it be accepted because someone said it? I believe the UX team in FB would have exercised this app, and a survey would have gone out for analysis, way before it went live. If not, this is something FB has to consider, having an in-house UX team. Recently, when I met the VP, UX Practice for FB, we discussed and I learned how such things get exercised there.
Okay, what share of its users considered it spam, while the others saw it as a way to express their joy with others? This is the number which might give a different story for the word 'Spam' attached to this app. Whatever it is, it existed way before anyone considered it spam, because there were many other things in FB with the same attributes that were used without being labeled 'Spam'. Isn't it? A common life example: "Hi", "Hey!", "Good Morning", "What's up?", "Take Care!", "Good Day" and "Good Night" should be considered spam now, because they come from whomever we meet, whether we love or care for them or not. Isn't that logical? Then why did this app get this label for appearing on the screen? I need to learn this to help technology be better, so that I can be happy. This also hints that, though users have been on FB for years, one might not have explored the FB Settings of one's account. And it is not logical to ask that either, because socially FB is not a technical product; it is social media. So, can I ignore it? That's the question, at another level, on which I want to educate myself and which FB might have to think about. Hopefully FB will not get into this, though they do send out mail on updates of their policies and technical changes to the policies.

It's a bug, an algorithm bug, which did not understand what to add for saying "wonderful year of me". Can a friend or spouse who is living with me say whether I had a beautiful life on that day or in this year? Maybe or maybe not. If yes, how consistently? Expecting consistency from an algorithm is reasonable, but is it reasonable to expect an algorithm to understand something I did not anticipate myself? That's the limitation of software and technologies. And of humans too, but only at that point of time. Further on, she or he will learn and update self, the technologies and the way of using the product of technology, until the next bug is figured out.
In this view, it can be classified as a bug. But that does not help me to learn the technology. This is the question which I ask myself when I see or suspect something as a bug -- "When is this not a bug? Pick out the cases with claims technically, socially and contextually." I ask the same of the fellow testers with whom I practice. Saying it is a bug is right, because it is annoying her/him. But that is not the solution, nor does it solve the problem. If I want to say I'm a tester, I want to learn "why it cannot be a bug as well, in the first place, before going with the information and saying this is a bug under this context."
If it is a bug in a context, what was the design doing to avoid the problem or annoyance the user can face? That's the question I get, along with how well the users have learned to see this in the app. This takes an altogether different survey, for which FB has to spend dollars, or maybe millions of dollars. It is up to them. And it is up to the one who says it is a bug for me, or that there is no harm in it. Harm comes when both do not understand how to overcome this.
It all existed way before one saw it and said it of the "Year in Review" app. It continues to be there as it is, because I see the same pattern of jumping to conclusions (from practicing practitioners and practitioners who have started to practice) before learning what it is. I feel strongly that one should be triggered to ask why it is a bug and, more importantly, when it is not a bug from a technical perspective. This helps to solve problems at a quicker pace, which is highly ignored in the technical stream and studies. Maybe this is why the subject of Operations Research exists, but still we do not apply it effectively in Engineering Operations or in any other operations.

To close it here, it is an existence from way before I saw the comments calling it spam, bug and design flaw. In contexts, all of these fit and also none of these fit. I learn why it does not fit, because it will help me to learn quickly, before thinking of solving it. I request the below of my fellow team members and practitioners.
"If you or your team member or friend said it is a bug or problem or anything that is troubling, think whether adding one more question to it helps or doesn't. And the question is, when is this not a bug or problem or annoyance? This is a wonderful gift as well, which can be gifted if asked so."