vyatta

open thought and learning

Kübler-Ross model – Tailored for operations teams


Unless a direct alert fires for an application, the default assumption is that the problem doesn't lie with that application. Most of the time this happens when alerts trigger on upstream tiers in the application stack. I first came across this model while watching House and reading some articles online; around the world it's fairly well known as the 'five stages of grief'.

From our perspective, here's how the application team's responses change, in alignment with the Kübler-Ross model:

Denial: “Nothing’s wrong with our tier. Why did you even call us?”, “This is a *false alarm*”.

This is only a temporary defense for the application team; it is generally replaced by a dawning sense of the kind of impact this incident might have.

Anger: “How can my application fail?!”, “Not a single alert fired!”, “Check the freakin’ network!!”

Once in the second stage, the team recognizes that denial cannot continue and starts pulling other teams onto the line; it cannot be the only one responsible.

Bargaining: “This isn’t really a user-facing problem!”, “This is actually a dis-satisfaction report, not an incident, come on!”

The third stage involves the hope that the team can somehow postpone the impact, or the creation of an 'incident'. Usually, the negotiation to ignore the incident is made with Tier 2, in exchange for improved alerting, network-bashing, and other personal favors.

Depression: “[TIER2] XYZ APPTEAM, are you looking at it?…[TIER2] Ping? …..[TIER2] You there?….[APPTEAM] Still Looking ….. [TIER2] Any update?”

During the fourth stage, the application team begins to understand the certainty of an incident. Because of this, the team representative may become silent, refuse to be disturbed, and spend more time looking at application counters, Cacti graphs, etc., determining what went wrong with their beloved application. Didn't they love it enough? Did it catch 'the bug'?

Acceptance: “Yes, we appear to be losing X dollars per 100 page hits”, “Can’t fix the code, might as well fail over traffic and mitigate impact.”

In this last stage, the application team begins to come to terms with the ‘mortality’ of the feature and understands that mitigation needs to be done.

Written by mohitsuley

April 16, 2011 at 12:49 am

Posted in sysadmin

Turn off Facebook Mobile Texts on new cellphone numbers


After I got a new number, the first thing that happened to me was a daily barrage of SMSes from 32-665 (FBOOK) for some unknown, hacked account; and being in North America, those text messages were going to be billed to me. Here's what I tried initially:

1. Called up my cellphone provider. They weren't able to stop the messages; the suggestions offered ranged from changing my number, to blocking texts altogether, to forwarding each such message to their SPAM number to remove it from billing (tedious, naturally).

2. Tried sending OFF back to the number to stop it. Facebook replied saying, yes, we'll stop, but the messages kept coming.

The only way the Facebook texts stopped on my new cellphone number was by adding that very number to my own account and confirming it. It appears FB allows one cellphone number per account, and whoever confirms the number last gets to 'own' it somehow.

It worked for me and I have been spam-free on my cellphone for about 30 hours now!

Written by mohitsuley

December 8, 2010 at 4:52 am

Posted in internet


Standard Deviation and Degrees of Freedom


To brush up on some elementary statistics, I decided to read more about standard deviation as a measure of variability. What I couldn't understand initially was the difference between

σₙ = √( Σ(xᵢ − x̄)² / n )

as the 'standard deviation of a sample', and

s = √( Σ(xᵢ − x̄)² / (n − 1) )

which is the 'sample standard deviation'. The difference here lies in the denominator: n versus n − 1.

The Wikipedia article on SD calls this Bessel's correction, and the wiki entry on that is equally impenetrable, accessible only to qualified mathematicians. Here's a more plausible explanation (in my own words) of the reason behind reducing n to n − 1, which I read in the book 'How to think about statistics':

Enter Degrees of Freedom.

They are essentially values of data, or scores, that are free to vary. Consider a set of 5 numbers. If you were asked to guess those numbers, you could theoretically be free to think of any number. Let’s say that the first number is 3. What’s the second number? Again, you could think of anything. Let’s say you come up with 4 (let’s keep it simple); and so on (3, 4, 5 and 7). Now, assuming you know the mean (5 for our example) of the scores, what could be the missing value? There’s only one possible number – 6.

So, if you know the mean of a set of scores with a single value missing, you are no longer free to select that value; the only way to determine the 5th score is from the remaining 4. That's the n − 1 right there.

When you calculate the mean of a set of scores, every score was free to take whatever value it did. So you divide by n and get the mean for that set. Remember: when you calculate the mean, your degrees of freedom remain n.

When you calculate standard deviation, you are not using only those independent values, but the mean as well. And once the mean is fixed, only n − 1 of the scores are truly free to vary; the last one is determined, just like the missing 5th score above. So, to ensure the standard deviation isn't biased and is based on 'truly independent' scores, you have to correct for the loss of a single degree of freedom.

Which is why an 'accurate' formula for sample standard deviation has n − 1 and not n. The correction is usually ignored because, for a large set of scores, the effect is tiny: with 100 scores, dividing by 99 instead of 100 changes the denominator by only 1%.
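The worked example above is easy to check with a quick awk sketch: the scores 3, 4, 5, 6, 7 have mean 5, so the squared deviations sum to 10, and the two denominators give visibly different results.

```shell
echo "3 4 5 6 7" | awk '{
    n = NF
    for (i = 1; i <= n; i++) sum += $i
    mean = sum / n                                  # (3+4+5+6+7)/5 = 5
    for (i = 1; i <= n; i++) ss += ($i - mean)^2    # 4+1+0+1+4 = 10
    printf "population SD (n):   %.4f\n", sqrt(ss / n)        # sqrt(10/5) = 1.4142
    printf "sample SD (n - 1):   %.4f\n", sqrt(ss / (n - 1))  # sqrt(10/4) = 1.5811
}'
```

Dividing by n − 1 gives the slightly larger, corrected estimate, exactly as the degrees-of-freedom argument says it should.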

I know I sound like a complete novice at this – I am. Knowing mathematics and actually writing about it are two completely different things. Let’s see if I can learn both (wink).

Written by mohitsuley

September 30, 2009 at 5:54 am

Posted in mathematics


Session persistence and OneConnect on the BigIP LTM


It is standard practice to use the Insert Cookie mechanism to enable persistence on a virtual server on a BigIP LTM. However, that can be an expensive task for the load balancer, since inserting cookies costs CPU. Another way is to use cookie hashing on an existing cookie that already marks your session. One example is the JSESSIONID cookie set by application servers such as Tomcat.

All you need to do is:

1. Go to Profiles -> Persistence -> Create New…
2. Using the cookie profile as the base, create one and mark it as Custom
3. Change the type to Cookie Hash and put in the cookie name
4. You can put in values for offset and length (for example, 1 and 32)
5. Go to Virtual Servers -> Your Virtual Server
6. Make sure the virtual server has an HTTP profile assigned
7. Under the Resources tab change the default persistence mechanism to the one you just created.

Voilà!

But you'll soon notice it doesn't really work as expected, especially when your clients are behind a proxy. Why? If client requests arrive from the same source address and there's already an open TCP connection, the LTM will forward the request over that connection without even looking at the cookie; this is done to help performance. Since you naturally need to accommodate proxies and shared connections like these, make sure a OneConnect profile is enabled on your virtual server alongside the HTTP profile. That will ensure each request gets inspected and forwarded based on the cookie value, and not the source IP address.
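On LTM versions that ship tmsh, the same setup can be sketched from the CLI. The profile and virtual-server names below are illustrative, and attribute names may differ between versions, so treat this as a rough outline rather than copy-paste commands:

```
# create a custom cookie-hash persistence profile keyed on JSESSIONID
create ltm persistence cookie jsessionid_hash \
    method hash cookie-name JSESSIONID hash-offset 1 hash-length 32

# attach an HTTP profile, a OneConnect profile, and the new
# persistence profile to the virtual server
modify ltm virtual my_vs profiles add { http oneconnect }
modify ltm virtual my_vs persist replace-all-with { jsessionid_hash }
```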

This is well documented in the manual, but doesn't seem to come up when I google/bing around. And for trigger-happy folks like me who don't RTFM and start experimenting, this just might save 5 or 10 minutes.

Written by mohitsuley

July 23, 2009 at 6:38 am

Posted in networks


TCP/IP Drinking Game


After a long hiatus from my activity here, I have decided on two things:

1. Blog entries can be short

2. They need to be more frequent – for my own sanity as well.

I plan to read the TCP/IP Drinking Game, take cues from it, and learn the nuances of the protocol a bit more. I've already learnt some interesting things, which I'll write about later.

That’s it for now. Ciao!

Written by mohitsuley

April 14, 2009 at 6:52 pm

Posted in networks, sysadmin

disown and nohup


This was the first time I started a file transfer and then, in hindsight, smacked my head and said "Wish I'd started this under nohup or screen..."; I could have left for home on time instead of having this laptop tag along with me.

What I didn't know *then* (now I know) is that there's a beautiful bash built-in called disown, which removes jobs from the shell's job table (or, with -h, marks them so they won't be sent SIGHUP), letting them survive the shell and end up under the init process. Yay!


[root@linux-test data]# scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/
suleym@192.168.1.1's password:
RHEL4-U5-i386-AS-disc4.iso    0%  1016KB   1.0MB/s   06:21 ETA  ^Z
[1]+  Stopped                 scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/
[root@linux-test data]# bg
[1]+ scp RHEL4-U5-i386-AS-disc4.iso suleym@192.168.1.1:/appl/RHEL-AS4/ &
[root@linux-test data]# disown -h
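For a repeatable version of the same trick, here's a minimal sketch with a stand-in long-running job in place of the scp:

```shell
# start a long-running job in the background (stand-in for the scp above)
sleep 30 &

# disown -h keeps the job in the shell's job table but marks it so the
# shell won't send it SIGHUP on exit; plain disown drops it from the
# table entirely
disown -h %1

# either way the job outlives the shell and gets reparented to init
jobs
```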

More about disown in man bash.

I'm writing this post so that people searching for nohup also come across disown, without having to reverse-engineer Google searches.

Written by mohitsuley

August 21, 2008 at 10:09 pm

Posted in linux, sysadmin


Semaphore problems on Apache


I came across a simple but intriguing problem: apachectl restart would work and restart the Apache processes, and in my case restart the CA/Netegrity SiteMinder agent as well. However, the server wouldn't respond, and there were no messages in the error log either. The SM logs said the agent initialized successfully.

When I removed mod_sm.so and restarted Apache (after removing the SM-related environment variables), everything worked just fine. I naturally assumed that the problem was with the module I had just removed.

It turned out that the problem was a particular semaphore which hadn't been released for about 24 hours and was somehow linked to the SiteMinder agent module. After I did an ipcrm -s ID, everything worked fine as before.
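The cleanup generalizes to a small sketch. The 'apache' username here is illustrative; check the owner column of ipcs -s for whoever runs your httpd:

```shell
# list current semaphore arrays; columns are key, semid, owner, perms, nsems
ipcs -s

# remove every semaphore array owned by the web-server user, i.e. run
# ipcrm -s <semid> for each stale id instead of picking them off by hand
for id in $(ipcs -s | awk '$3 == "apache" {print $2}'); do
    ipcrm -s "$id"
done
```

Only do this while Apache is stopped, or you may yank a semaphore out from under a live process.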

I always thought that semaphores or shared memory segments not freeing up would result in Apache failing to restart. This was the first time Apache didn't complain on a restart, no logs displayed any errors, 'removing' a module rectified the error, and putting it back actually made the issue recur!

I need to learn more about semaphore allocation in Linux.

Written by mohitsuley

August 16, 2008 at 2:08 am

Posted in linux, sysadmin
