Sunday, May 3, 2015

Fragmentations: Heuristic and COP FLUNG GUN -- Part 3


Back in this post, I have listed very few perspectives of 'Communication' factor. But it is beyond what I see or what I have seen in view of programming and testing. There is a point which is usually not considered in this Communication factor. It is device fragmentation.  It is a buzz word and it lies there most times as it is not attempted in Software Testing to understand what actually it is and what is the outcome of it. Is is just the device fragmentation the problem source?

Interestingly the behaviors noticed on a device will not be because of fragmentation in first place. But the buzz words makes it to feel it is because of device fragmentation. Going back a bit, is device fragmented or the OS is fragmented? I see both and every other factors of COP FLUNG GUN will have the influence of fragmentation and anything that comes in near future will undergo the influence of fragmentation. It indicates, this will never come down unless the user needs from technology come down. Wait, what is fragmentation and defragmentation? It is very much essential to understand because the understanding of COP FLUNG GUN factor gets better each time if this is understood each time.

In an Operating System, everything is file which in turn will be instructions and data. If this is a file system, then processing of data which will be represented as a file and will be broken into block of files. This fragmented files might be sequentially placed or may not. If not sequential then problem starts -- where disk read, disk write and wait time, all increases in Operating Context.  To improve the condition here in Operating Context due to fragmentation we initiate activity called Disk Defragmentation. Defragementation is a process of grouping the same logical content file so that they are connected in close groups and as a result the file operations by an Operating System becomes easiest and space on disk/memory and processor can be used better for other needs. If observed, the Disk Defragmentation in Operating System actually helps to overcome the impact of fragmentation in Operating Context environment. Then why cant we defragment on the mobile device?

It is like this, I tear the novel and throw its paper pieces across the streets of a city, randomly. Say, I can recognize these piece of papers on the city roads. This is fragmentation in a perspective. Now how will I defragment it by collecting and place sequentially and keep it connected logically and content wise? There are lot other factors in the city's environment -- sun, dust, heat, rain, wind, cleanliness of city etc which will disturb me to collect them together and make a novel again and not just the street which I use to commute.  The same happens in the mobile device its mobile operating system. 

The mobile device's hardware is different from each other where the same app has to run on it. Likewise the Operating System gets customized for each of this device. To relate, the novel cover which was torn into different shapes is the mobile device and the pages of torn novel are customized mobile OS.  I have to make points on each of this torn devices and OS. How? As these torn pieces miss my context and in the influence of environmental and operating context factors, it is more likely that  my points might fail or show more problems now which I cannot guess nor anticipate easily.


Few Fragmentation Factors in Mobile Device and OS
  1. Hardware Specifications -- Input; Processing and Storage; Output; Collaborating Unit and Capability
  2. Software Specifications -- Customized OS and its versions; Other software & its versions; Software updates - device software; component software; Anything that is bound to be updated or receive the updates
  3. User's Context -- Usage pattern; Environmental factors; User's choice and Locale Settings; Rooting and Jail Breaking
  4. Environmental Factors -- Infrastructure; Service Provider; Network
  5. Regulations -- Policies of devices; Polices of user location; Polices of software
  6. Application installed -- It's internal way of functioning on above identified factors
  7. Other unidentified factors -- which always exists as unknown until identified

Ecosystem Fragmentation Not just Device Fragmentation

Now it is obvious that fragmentation is not just the device. Not just the device fragmentation is all. It is also including the user who uses the device and app as well along with unidentified factors. How to defragment these fragmentations? This is the challenge to the mobile technology which keeps changing consistently with the next version of app or mobile software or hardware as it gets updated or rolled out. As a result, does it get outdated quickly leaving the little to applications installed and to the previous model (and versions) of hardware and software? Fragmented!


Fragmentation and COP FLUNG GUN

While I learn fragmentation is everywhere here, I see it is also observed in this heuristic -- COP FLUNG GUN.  Identifying it in first hand is not possible as it is highly contextual while few identity remains generic in most cases.

Further in this blog post series I will learn and mention few generic and as well the context specific fragmentation factors in this heuristic. Interesting will be how the fragmenation of  Communication is while it is not mentioned in previous post. Did I tell myself about the fragmentation of files on mobile OS and devices while I was saying how it is from external factors? That has a role too in contribution to the external factors.

Now, let me allow myself to defragment what I have got here.  It is fragmented and goes unrecognized!



Note: I keep the technical stuffs in simple relativity and examples so I can learn it anytime and share it to others who wishes to know about it. Being technical and non-technical is expressing in simple and to anyone is what I understand for now.


Sunday, April 26, 2015

Ignored OS Component Shows Itself


It was a new feature that was pushed into a production in short time - two days. I designed the tests for this and tested the same. And in parallel fellow testers tested it as I wanted their mind to see the short falls of my tests and the risks. All of sudden close to one month after a release, in one noon the test team received High Priority message indicating production problem and it was this feature.

Fellow testers took it up and switched to production environment and noticed it. I asked if it is reproducible consistently. And it was reproducible consistently. The next question was switch the machine and environment of client and boot to production environment and see if it is reproducible.

Now it was not reproducible in few client machines but it is observed in few client machines. In mean time, I observed the logs of servers for around an hour and did not see any change that I'm seeing from last one month.  I pushed the production branch to test environment and ran the sequences and noticed the problem on staging environment as well.  
This was a major clue for me. It indicated, there is a change on client machine which is with users and as well with programmers & testers which shows and don't show this problem. And, the same code.
Few walked to my desk and asked, "Why is this missed and we said to testing team it is a major change and it should be tested thoroughly." I had to say, "Wait, I have no clue for now other than first clue what I have." Testers seated around me were into silence. The, Technical Architect came in to desk immediately and said to all, this is well tested feature and he has tested each line of code of this feature and I'm very confident in it. I thanked him on behalf of testers. Then why the problem now was the question for which I had to say, "Wait for a day, I'm looking into it. It is a problem, it has to be fixed and no other way to live with it for now."

I'm in no mood or state of mind, to tell what is testing and what is quality to people because I can invest the same time in testing and do the useful for product, team and myself. I pick the stage selectively where I have to stand up and talk and when I have to ignore and continue the work.

Below are the questions I asked for the, Technical Architect
  1. Did we release other code than what we tested? I heard, "No"
  2. Are you sure that no other code commit is done in RC that went to product. I heard, "The tested code."
  3. Are you sure that no other packages and utilities are changed in this release. I heard, "Change are the same which test team is informed in labels."
  4. As said, I see no change or differences in server and the suspect is on client now. Any thoughts on it? I heard, "The same changes on client interface and no changes there."
I looked into the production release build's code commit label and all looked same and that indicated no code changes in that build which was pushed to production. I confirmed the same to teams. Now, studying the client machines, I started looking at machines which had differences and which did not have then.

There were no minor or major update information in OS and its associated information in client machine. The next information to explore was, to look the problem reproducible in client machine having lower version OS and the latest yet to release.  I went to the Technical Architect, and we had discussion about a suspect. We had silence and we decided to start working on it now.
During the Test Design time, the risk was indicated that can come to product from one component which the product makes use of in the client machine. And this component is part of the client's machine's OS which controls most of the products that will be installed on the client machine.  Reading the beta development changes and bugs of it at time of test design, it was also tested by installing the latest beta but it was still a beta and changes could come in until it's code and architecture design are frozen.
One of the major change for this component was in its architecture and design how it saves data on Client machine. But here, the product could not wait till these component get frozen and it went out by skipping the words of Test Design saying there will be no impact from this to our product.  Test Design's risk list clearly mentioned this but it was said 'acceptable risk' and the cost is known to us and we are agreed to it.
Test team communicated the design of release plan for the same seeing the risk to product but business could not wait for this. The result was observed in one month of time.
Now, the cost was still bearable but not anymore here because the user will not be able use the Client interface anymore that means no service served from server to user and user is blocked.  I continued exploring and it was already 5 hours had passed and I had lead sources but not yet isolated the cause.

Monitoring closely with much more tests, it was isolated that, component of the Client's OS machine is the cause. This was ignored saying it is acceptable earlier though it was mentioned in risk list of Test Design. But the question I got was why not all user is facing this problem and only part of user.
The Client OS vendor is pushing the change of this component to users in batch across the globe and not in one go. That batch of people whose component got updated in their OS, they doing that sequence of operation are getting blocked with the product. And the question is, will all the user do that? Of course, yes and not, it depends on mindset of the user. It was left to business to make informed decision now, ff business sees that user is important, then you make decision what you should be doing.
Learning the change in Client OS's component was not easy and identifying that is the source of problem when it was pushed as an update to Client OS.  The OS did not show any update changes but it was changed. The Test Design happens at every time when testing and it is not that it should happen at beginning. It is an ongoing activity. In this case, I took bit of time since I knew the architecture of product and Client OS communication process as the component future changes were in beta and the impact. Moreover it was a major release and one slight incorrect move can create trouble to users and business.

Reconfirmed the cause. The Client OS component's architecture and the changes it makes to Client interface, is the cause.  This was fixed in product to handle it and it went to production in a day.

The learning I shared to teams and my fellow testers from this Test Investigation
  • We design tests always when we test but also doing it by learning the context of changes happening outside the product, it always helps.
  • There is no good and right time to design test. Every time is time for designing the tests. In this context, it needed a chunk of time as it was a major change to product in short time. Hence I took time separately out for it.
  • We test to learn the change and its impact as well. Learning changes around our product is always useful.
  • Knowing the system architecture and supporting platform architecture is an advantage in testing.
  • Maintaining the record list of changes in system and associated system is useful no matter if it is a simple or complex software.