network thruput
Alvin Oga
alvin at Mail.Linux-Consulting.com
Wed Jun 1 14:21:47 PDT 2005
hi ya michael
On Wed, 1 Jun 2005, Michael T. Halligan wrote:
> One quick suggestion. quadruple your # of switches. I've built beowulf
> clusters where each server has
changing hw is not an option ..
esp since reconfiguring the network or switches is not occurring
either
> 4-8 nics, each connected to a different switch, and then bonded,
> creating a 400 or 800mb (in your case
> 4gb or 8gb) network.
yes.. the suggestion was to channel bond to go faster.. but
it's not much of an improvement, since not all machines
are bonded
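for reference, channel bonding on linux of that era was usually set up through the bonding kernel module plus ifenslave .. a minimal sketch ( interface names, addresses, mode, and file path here are illustrative assumptions, not taken from the cluster in question ):

```
# /etc/modprobe.conf -- load the bonding driver (path varies by distro)
alias bond0 bonding
# mode=802.3ad (LACP) needs switch support; balance-rr does not
options bonding mode=balance-rr miimon=100

# bring up the bond and enslave two gigE NICs (run as root)
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1
```

and as noted above, bonding only helps when both endpoints of a copy are bonded .. a single-nic peer still caps the flow at one link's worth of bandwidth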
> If you have the spare hardware,
no spares ... they run a 24x7 environment .. no test machines, no
hot-swap live backup servers either .. yikes
> When you say they're getting a paltry 5-10MB/s at best, are you saying
> all of the servers are at the
> same time, or at any given time?
that's the average thruput at any random given time between the
supposedly high-performance cluster nodes
- measured by, say, copying 100MB or 500MB files between
random nodes at random times
- to get rid of disk latency issues, we copied node1:/dev/loop
into nodexx:/dev/loop and the thruput is the same ... which
means the ultra-360 disks are fast enough to keep up on the
gigE lan
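a quick way to baseline raw tcp thruput with no disk in the path at all is a memory-to-memory socket copy .. the sketch below ( a hypothetical helper, not from this thread ) measures MB/s over loopback, and pointing the client at another node's address turns it into a wire test .. gigE tops out around 125 MB/s theoretical, so a sustained 5-10 MB/s points at the network, not the disks:

```python
import socket
import threading
import time

CHUNK = 1 << 20          # 1 MiB per send
TOTAL = 64 * CHUNK       # 64 MiB transferred in total

def receiver(srv, result):
    """Accept one connection, drain TOTAL bytes, record MB/s."""
    conn, _ = srv.accept()
    received = 0
    start = time.perf_counter()
    while received < TOTAL:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    elapsed = time.perf_counter() - start
    result["mb_per_s"] = received / (1 << 20) / elapsed
    conn.close()

def measure(host="127.0.0.1"):
    """Push TOTAL bytes through a TCP socket and return observed MB/s."""
    srv = socket.socket()
    srv.bind((host, 0))          # port 0 = let the kernel pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    result = {}
    t = threading.Thread(target=receiver, args=(srv, result))
    t.start()
    cli = socket.socket()
    cli.connect((host, port))
    buf = b"\x00" * CHUNK
    sent = 0
    while sent < TOTAL:
        cli.sendall(buf)
        sent += CHUNK
    cli.close()
    t.join()
    srv.close()
    return result["mb_per_s"]

if __name__ == "__main__":
    print(f"throughput: {measure():.1f} MB/s")
```

run the receiver half on one node and the sender half on another to test an actual cluster link instead of loopback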
> Beyond that, the stacked switch setup could be bad if that means switch
> 10 has to traverse all of the
> other switches in order to get to switch 9.
that's the stack i am trying to break up ... to get rid of all the netbios
packets from the cluster ( there is nothing a windoze box needs to do
on the cluster )
- netbios packets are about 90% of all packets on the wire
> Another thing I'd do is collect some good stats to show to the PHB's ..
> Setup NTOP for a week and
> show them that it's windows chatter eating up all the bandwidth. If
already showed the traffic pattern ... but to no avail ... :-)
hard to convince PhDs with managerial authority that they're not
quite up to snuff on network design and topology issues
- push too hard, and one is out on the street ya know
> they're manageable switches,
> setup cacti to graph them via snmp.
cacti seems too complicated for me ... :-)
i like something simple like ... to show what is clogging the network
90% netbios packets
5% tcpip data ( not dns, arp, http, smtp, etc. )
5% misc
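that kind of simple breakdown is easy to produce from a capture: dump port numbers with tcpdump ( or any sniffer ) and bucket them .. a hypothetical sketch, where the port sets and category names are my own assumptions mirroring the percentages above:

```python
from collections import Counter

# NetBIOS name/datagram/session services (the windows chatter)
NETBIOS_PORTS = {137, 138, 139}
# infrastructure protocols reported separately from bulk data
INFRA_PORTS = {53, 80, 25}  # dns, http, smtp

def classify(port):
    """Bucket a destination port into a traffic category."""
    if port in NETBIOS_PORTS:
        return "netbios"
    if port in INFRA_PORTS:
        return "infra"
    return "data/misc"

def summarize(ports):
    """Return the percentage of packets per category."""
    counts = Counter(classify(p) for p in ports)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}

if __name__ == "__main__":
    # toy capture: 9 netbios packets for every 1 data packet
    sample = [137] * 9 + [22]
    for cat, pct in summarize(sample).items():
        print(f"{pct:5.1f}%  {cat}")
```

feed it real destination ports parsed out of `tcpdump -nn` output ( or ntop's export ) to get the one-liner summary for the managers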
> Might also be worth digging in to
> see if you're having any
> type of arp or broadcast storms, perhaps a screwed up vlan.
i was hoping to see dns/arp issues but thats not the case here ..
> $150k? Ouch.
they're very proud to own that $150K tape library...
that i will not touch ... not even for $500/hr... no way ...
tapes are a disaster waiting to happen in my book and i'd rather
not be restoring from tape or making tape backups, and besides,
they have someone else to take care of that for them
> For $20k nowadays you can get a 40 tape lto2 library that
> has 200GB (uncompressed)
we're looking at 3TB of data .. still pretty small systems actually
> I'm starting to give up on Tape to be honest.
:-) congrats .. :-)
i think after one or a few full restores from tapes that someone
else did, one will no longer be "tape happy" and will prefer
a more reliable way to restore from full backups ( bare metal restore )
where you have to restore in 5 seconds because the whole company
is shut down until it is back up and online ...
- i will always prefer to have live warm-swap backup systems
even if i have to bring in my own 2GB - 5GB of disks
for those that are willing to pay my fees w/o discounts
> The value of tape and disk always goes back and forth,
yes... depending on the situation
fun stuff...
c ya
alvin
More information about the Baylisa mailing list