Management Perception of System Administration
Jim Hickstein
jxh at jxh.com
Fri Feb 22 13:13:19 PST 2002
One thing struck me about the technology risks: someone says "we want five
nines", but then sets up a business that cannot tolerate any service
interruption beyond that. 0.99999 is a _probability_, not a certainty,
and an average one at that. Some days will be below average.
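To put a number on it, here is a rough sketch of how much downtime 0.99999 actually permits. It assumes a 365.25-day year. It also assumes a hypothetical single outage that uses the entire yearly allowance, and shows what that outage does to one day's availability. The constant and function names are illustrative only.

```python
# Rough availability arithmetic; assumes a 365.25-day year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes
MINUTES_PER_DAY = 24 * 60            # 1,440 minutes


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Minutes of downtime allowed over a period at a given availability."""
    return (1.0 - availability) * period_minutes


# Five nines over a year leaves a little over five minutes of downtime.
yearly_budget = downtime_budget_minutes(0.99999, MINUTES_PER_YEAR)

# Hypothetical case: the whole yearly budget is spent in one outage on one day.
worst_day_availability = 1.0 - yearly_budget / MINUTES_PER_DAY

print(f"five nines allows {yearly_budget:.2f} minutes of downtime per year")
print(f"availability on the day of the outage: {worst_day_availability:.5f}")
```

The year still meets five nines (about 5.26 minutes of downtime), but the day of the outage comes in at roughly 99.6%. The figure is an average over whatever window you pick, and the business has to survive the bad window, not the average one.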
I was thinking of systems where this would seem to matter more, and seem to
achieve better certainty: airline reservation systems, and better yet,
air-traffic control systems. Yet, in the latter case anyway, they _do_
have major system failures, and they _do_ have major service interruptions.
But they also have manual procedures. When the radar goes black, you talk
on the radio; when the radio falls silent, you look at your pieces of paper
and start talking on the telephone (to other radio operators). The
airplanes have procedures for clearing the airspace around such an
emergency, and they don't all fall out of the sky. This happens
_routinely_. (P.S. Don't tell the passengers.)
I didn't see a failsafe system when Paul was describing the totes going
round the distribution center. Partly this may be because they didn't set out
to design one. But that, IMO, is a business failure, not a technological
one. Some service interruptions, at some level, are _inevitable_, period.
(And the harder you try, and succeed, to reduce the small disasters,
the larger the average disaster becomes.) If you set up a business that
won't survive one, and don't have the humility to admit that Plan B should
exist, that's not the technology's fault.
You can buy a certain number of nines these days. But so can your
competitors. The next couple of nines are harder, and they consist of
putting systems in place to help people avoid making mistakes. I've
achieved some modest success at this in my operations career. It's the
most interesting part, to me.
More information about the Baylisa mailing list