Dedicated server hosting? -- sanity checks
Chuck Yerkes
chuck at snew.com
Wed May 28 13:48:48 PDT 2003
You've got to start using whole sentences with verbs and nouns and
articles. I can understand people with strong Chinese and Indian
accents, but I can't read half of your posts...
Quoting Alvin Oga (alvin at maggie.linux-consulting.com):
...
> - ask to talk to their tech support guy
> ( pretend scenario ... we're looking at colo'ing with you,
> ( here's our "interview test"
>
> - my machine just crashed, it wont boot, the screen says
> "LI" .. what do you do ??
>
> - how many different ways can you boot a linux/freebsd box ??
>
> - once its back ... the machine has tons of "hd seek errors"
> what do you do ??
>
> - how do you know a fan died ?? or not running at full speed
>
> - remember, if the colo folks have to get involved, that machine
> is probably seriously hosed ...
Your test scenario also covers well why I tend to fear using most
Intel boxes and often Linux in a colo.
I have a box at a colo 3000 miles away. I've never seen or touched
it. A raging SPARC 10. It was passed to the ISP owner by a friend
with 2 disks inside and no OS. I booted it via serial port over
the net (boot net) and installed from a nearby SPARC 2.
In 4 years, the (old) hard drive died. I replaced it by FedExing a
new one to a guy there who swapped it in.
I have no BIOS to fight, and I have an "oh shit" backup of net booting
if I really need it.
PCWeasel or Compaq's "Lights Out" stuff is imperative for remote
management.
Serial console access is imperative.
The "the screen says..." question is silly. Where would you have a screen?
Why? If you can't tell it to boot from disk 2 (where you keep a spare
root partition for that moment), then you'll be driving a fair bit.
You ssh to your terminal server and YOU see "LI" on the serial port.
Another port on the TS may handle power cycling your box.
I'm moving some stuff local because I have a guy down the street
with fractional T3 access (via 802.11a and a big freaking antenna
to a colo 7 miles away in Oakland). But my house terminal server
is still the main access method (heaven forfend that I can't handle
it while on a trip or even just upstairs).
I don't really want NOC people to be touching my machines. I
CLEARLY mark the power and ethernet. I'll tape over unused ports
(parallel, USB, etc). If it can be plugged in wrong, it will be.
A tape drive may be useful. Easy to swap drives IS useful.
Mirroring (for cheap) or real external hardware RAID is compulsory
(internal RAID cards can be a hazard for remote production machines).
BANDWIDTH:
As for where, I have friends doing low bandwidth and sharing a rack
at HE.net in Hayward. Near enough to get to, cheap enough for
them. You need a true 5 Mb/s? That's a real colo demand and will
cost (at an office or at a colo). One presumes that you earn more
from it than bandwidth costs.
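
For a sense of scale, here is a rough back-of-the-envelope sketch (in
Python) of what a sustained rate adds up to over a month. It reads the
figure above as 5 megabits per second and assumes a 30-day month and
decimal gigabytes; the function name is made up for illustration.

```python
def monthly_transfer_gb(mbps, days=30):
    """Data moved in `days` at a sustained rate of `mbps` megabits/second."""
    bits = mbps * 1_000_000 * 86_400 * days  # 86,400 seconds per day
    return bits / 8 / 1_000_000_000          # bits -> bytes -> decimal GB


print(monthly_transfer_gb(5))  # sustained 5 Mb/s for 30 days
```

At a sustained 5 Mb/s that works out to about 1,620 GB (roughly 1.6 TB)
in 30 days.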
The win of a colo is bursts of HUGE bandwidth - like a store at a
mall gets bursts of infinite parking spaces - but high average
usage will still cost. It's just that now the colo is a ready drop
point for one of several providers.
DIVERSITY:
Previous work used Level3 because we could hook in in SF and NYC
and in Europe. Our "intraoffice" packets went over their network
and could have bandwidth guarantees. We didn't have, but should have
had, machines in geographically diverse places. The Bay moves. That's
gonna suck one day. My little SPARC 10 hosts DNS for 200 domains
and backup MX for a bunch of domains which I trade with another
guy who's got good connectivity in the Bay.
A big lightning storm in New Jersey took it down for 4 hrs (until
a *working* generator was applied to the UPSs in the racks). My
mail happily queued in Boston and San Francisco for that time.
PRESUME the machine will crash. Presume that the NOC staff just
barely graduated 8th grade and were turned down at the McTacoKing.
Presume that moving parts will fail (fans, disks). Presume it
will happen during an earthquake whose only damage is to wreck
your car.
-How much does 3 days down cost you?
-Can you change DNS from another place to get packets routed to
a low-bandwidth desperate recovery site (even just a page that
says "down for maintenance, back on Monday")?
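
The DNS question above can be sketched roughly as follows, in Python:
probe the main site over TCP and, if it does not answer, pick the
address of the recovery site instead. The function names, the port and
the timeout are all assumptions for illustration; actually pushing the
chosen address into DNS depends on your setup and is left out.

```python
import socket


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False


def pick_address(primary, fallback, check=is_reachable):
    """Return the primary address if it answers the check, else the fallback."""
    return primary if check(primary) else fallback
```

Calling pick_address() with the real site's address and the recovery
site's address returns whichever one DNS should point at right now.
Run it from somewhere other than the colo, or it tells you nothing
about what the outside world sees.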
Now build your box.