"ssh | dd" pipeline breaks for cron-initiated job (Solaris 2.6)
David Wolfskill
david at catwhisker.org
Sun Jul 16 19:41:02 PDT 2006
[Caveat: long message: about 16 kB -- dhw]
OK; this is a little off-the-wall, and I realize that Solaris 2.6 is not
exactly cutting-edge technology. Trust me: the SS5/170 on which it
runs is also not cutting-edge technology. :-) And this is at home,
rather than ${paying_job}; I'm sure we're all familiar with adages about
"shoemaker's children" and all that, so there's little to be gained by
rehashing that sort of thing (IMO).
In the process of chasing this problem down, I have discovered a thing
or two I found not-so-intuitive, and sharing them here is about the
only way I have to thank those with the patience to wade through
the below prose. :-}
Summary: local "dd" process in an "ssh ${host} ... | dd ... >/dev/rmt/0n"
pipeline:
* (rarely, but) sometimes works
* sometimes hangs (and may thus be killed)
* often locks up, using >99% CPU so effectively that it is unkillable
and "reboot" doesn't reboot it; I need to resort to forcibly power-cycling
the machine to stop it and allow normal work to take place
on my SPARCstation 5/170 running Solaris 2.6 -- but *only* when the
process is invoked via "cron."
Whenever I invoke it "by hand," it works flawlessly.
I'd appreciate some help fixing the problem.
Yes, I am aware that "cron" runs things with a rather restricted (and
often different) environment; also, different shells tend to behave
differently. More on this below.
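For what it's worth, a quick way to see exactly what cron hands a job --
for comparison with an interactive login -- is a throwaway crontab entry
along these lines (the schedule and file names here are made up for
illustration):

    # Hypothetical one-off crontab entry for the "backup" user: dump the
    # environment cron provides, for later comparison.
    0 3 * * * /usr/bin/env | /usr/bin/sort > /tmp/cron.env 2>&1

    # Then, from an interactive "sudo su - backup" session:
    env | sort > /tmp/login.env
    diff /tmp/cron.env /tmp/login.env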
Further, this was never a problem until I migrated the OpenSSH running
on the SS5 from 2.5.1p2 to 4.3p2. Yes, I've checked for reported
problems with pipelines hanging under OpenSSH-4.3p2, and I've tried
applying a couple of the patches attached to the PR I found;
neither appears to have had the slightest effect on the observed
symptoms.
Historical background: The subject process is a Perl script (that
originally dates from late 1994) that runs in the wee small hours of the
morning to create and store backup images and log what happens. Its
design (such as it is) involves 3 categories of machine:
* 1 "controller" -- this is the machine where the Perl script runs,
kicked off by cron, and where logs are written.
* 1 "tape host" -- this is the machine where the tape drive is. In the
current scenario, the above are the same machine -- the SS5 running
Solaris 2.6.
* >0 "client hosts" -- the machines that house the file systems to be
backed up. In the current scenario, the SS5 is one of these, as are a
couple of x86-arch boxen running FreeBSD (4.11 & 6.1).
The original version of the script used "rsh" for accessing the "remote"
hosts; I didn't migrate to using OpenSSH until early 2002. At that
time, the version of OpenSSH I installed on the SS5 was 2.5.1p2. I
didn't know a huge amount about SSH at the time, and allowed nearly
everything to use default values -- so protocol 1 was the protocol used.
I did set things up to use PK ("public key") authentication (without
a passphrase), since the exercise was intended for unattended
operation.
It has always used a non-root user for its execution; the name of that
user in this environment is "backup."
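(For context, the unattended-operation setup is just the usual
passphrase-less key arrangement -- with protocol 2, roughly the following,
run as "backup"; the exact commands aren't the point here:)

    # Generate a key with an empty passphrase for unattended use:
    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
    # ...then append $HOME/.ssh/id_rsa.pub to ~backup/.ssh/authorized_keys
    # on each client host.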
The normal mode of operation is to create level 0 "{,ufs}dump" images
early Monday morning; the other days of the week, it creates level
5 "{,ufs}dump" images (based on the level 0 images).
In operation, the script takes a list of hosts; for each host, in the
specified order, it:
* Queries the host (via "uname -s -r") as to what it's running.
* Based on the results of the above, determines:
* A program to use to display the current capacity of each mounted
file system, selecting only those file systems that are backed
by disk partitions (vs. swap space, for example).
* A program to use to create a backup image at the desired level,
as well as any flags to use when invoking it. (FreeBSD 5.x and
above, for UFS2 file systems, support "snapshots," and the -L
flag for dump in such systems causes a snapshot to be created and
the dump to be run from the snapshot. This is useful for file
systems that are currently mounted as writable.)
* For full backups, determines some commands (e.g. "sysinfo
-level all") to run in order to document the hardware configuration,
to the extent that that is feasible.
Note that the script is generally written to use complete pathnames for
the above commands, as well as most others (possibly all; I haven't
checked thoroughly recently).
* Queries the host to find the currently-mounted, disk-partition-backed
file systems.
* For each such file system found:
* Create a backup image (using SSH to access the host unless the
current host happens to be the "controller" host), directing the
resulting stream of bytes to standard output (thus, incidentally,
avoiding a lot of problems with {,ufs}dump guessing the capacity of
the tape).
* Pipe the standard output of the above to a "dd" process on the tape
host, the output of the dd process being directed to the (no rewind
on close) tape device, specifying a blocksize of 32 kB. (More on
this below; a sketch of the pipeline's shape appears just after this
list.)
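For reference, the shape of that per-filesystem pipeline is roughly as
follows (a sketch; $host, $level, and $device stand in for values the
Perl script fills in, and the real command lines appear in the log
excerpts further down):

    # Per-filesystem pipeline for a "remote" client host (the dd block-size
    # operands are the subject of much of what follows):
    ssh $host "/sbin/dump -${level}uf - $device" \
        | /bin/dd of=/dev/rmt/0n bs=32k
    # When the client host is also the controller/tape host, the dump runs
    # locally and is piped straight into dd, with no ssh involved.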
There are the usual reality checks on the results, with provisions for
re-doing things, possibly after switching tapes.
One of the "client hosts" was the box to which SSH connection requests
from the Internet were being forwarded; since (at the time) I was still
allowing password authentication, I restricted the allowed sources of
SSH connection requests to "believed to be friendly" IP ranges on the
packet filter.
A few months ago, my wife drove up into Oregon & Idaho; I used this as a
catalyst to suggest to her that she might be able to check her email
from where she was if she were to migrate from using password
authentication to using PK authentication. (Each of us reads mail by
logging in to one of the boxen either directly or via SSH.) After
some discussion, she agreed to do that.
Once that appeared to work, I set about:
* disallowing password authentication (finally!), permitting only PK
authentication;
* removing the packet filter restriction on SSH requests;
* upgrading the OpenSSH-2.5.1p2 on the SS5 to 4.3p2;
* ensuring that the OpenSSH configuration on each machine used SSH
protocol 2 as a server, and preferred protocol 2 as a client (a
configuration sketch follows this list).
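The relevant settings amount to roughly this (a sketch, not a verbatim
copy of my configuration files):

    # sshd_config (server side) on each machine:
    Protocol 2
    PasswordAuthentication no
    PubkeyAuthentication yes

    # ssh_config (client side), preferring protocol 2:
    Protocol 2,1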
My (interactive) tests all appeared to work as expected; I was
reasonably pleased with the outcome at this point.
The following morning, I found that the backup of the first file system
on the first-specified remote host had failed with a
write: Invalid argument
and
select: Bad file number
Huh??!? This script has -- modulo a few minor tweaks here & there --
been running Just Fine for over a flaming DECADE! What could cause such
breakage?
I spent a couple of weeks "banging my head against" that wall, trying
all kinds of evasive maneuvers -- to no avail (of course).
I couldn't even determine with any semblance of assurance which program
was issuing those messages, or on which machine.
I finally(!) re-re-reviewed the Solaris 2.6 man page for "dd," and found
something rather counter-intuitive:
When dd reads from a pipe, using the ibs=X and obs=Y
operands, the output will always be blocked in chunks of
size Y. When bs=Z is used, the output blocks will be
whatever was available to be read from the pipe at the time.
Now, the reason I noticed this is that up to the change I made after
reading that passage, the "dd" invocation had used "bs=32k" with the
intent to specify that both input and output block sizes were to be 32
kB.
And I now realized that the whines about "write: Invalid argument" were
from dd on the SS5, complaining (bitterly) when told to write to the
tape device with an oddball number of bytes.
Accordingly, I changed the "dd" invocation to specify "ibs=32k obs=32k,"
and ensured that "{,ufs}dump" invocations wrote 32 kB records.
As a result, those errors stopped occurring.
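To spell out the difference as I now understand it (a sketch, not lifted
verbatim from the script):

    # With bs=, dd writes whatever each read() from the pipe happened to
    # return, so the tape device can be handed oddly-sized blocks:
    ssh $host "/sbin/dump -5uf - -b32 $device" \
        | /bin/dd of=/dev/rmt/0n bs=32k            # output block sizes vary

    # With ibs=/obs=, dd re-blocks its output into obs-sized chunks:
    ssh $host "/sbin/dump -5uf - -b32 $device" \
        | /bin/dd of=/dev/rmt/0n ibs=32k obs=32k   # output blocks are 32 kB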
The problems that then surfaced, however, gave me reason to wax
nostalgic for the above-described problem. Here's a cut/paste of an RCS
log entry for the next salient change I made to the script; I think it
expresses things fairly well:
----------------------------
revision 1.29
date: 2006/06/02 11:40:37; author: david; state: Exp; lines: +1 -1
Change the PATH to be exactly the same directories, in exactly the same
order, as we get via "sudo su - backup; exec tcsh; echo $path" -- because
once I do that, running the script from the command line always works,
while the script (as run from cron) always fails on the first or second
"remote" file system. Usually, the mode of failure involves a "dd"
process on the tape host consuming CPU like there's no tomorrow, and
being unkillable -- to the point that while I can issue commands (VERY
slowly), a "reboot" has no effect after waiting 15 minutes.
I have been resorting to physically powering off the machine, then
relying on the fsck to clean things up -- and there's no polite way
to express just how undesirable that is.
I think I'm getting desperate.
----------------------------
That didn't help. Neither did tweaking the input block size for dd.
Nor did "conv=sync."
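For the record, since Solaris cron does not (as far as I can tell) allow
setting environment variables in the crontab itself, doing the equivalent
from outside the script would look something like this hypothetical
wrapper (the actual change was made inside the Perl script):

    #!/bin/sh
    # Hypothetical cron wrapper: pin PATH to the interactive value, then
    # exec the real backup script.
    PATH=/usr/local/sbin:/usr/local/bin:/usr/ucb:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/etc/backup
    export PATH
    exec /usr/local/etc/backup/script -h /usr/local/etc/backup/hostslist  # other flags elided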
I recently augmented the "debugging" flag's output to include a dump
of the %ENV hash; here's what it reports when the script is invoked
via cron:
Environment:
HOME=/usr/local/etc/backup
LOGNAME=backup
PATH=/usr/local/sbin:/usr/local/bin:/usr/ucb:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/etc/backup
SHELL=/usr/bin/sh
TZ=US/Pacific
Here's what it reports when I invoke the script interactively from the
Bourne shell (which is the login shell for the user in question), via
"sudo su - backup":
Environment:
HOME=/usr/local/etc/backup
LOGNAME=backup
PATH=/usr/local/sbin:/usr/local/bin:/usr/ucb:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/etc/backup
SHELL=/bin/sh
TERM=xterm
USER=backup
Finally, here's what it reports when I invoke the script interactively
from tcsh, which is the way I normally invoke it for a restart (via
"sudo su - backup" followed by "exec tcsh"):
Environment:
GROUP=nogroup
HOME=/usr/local/etc/backup
HOST=pogo
HOSTTYPE=sun4
LOGNAME=backup
MACHTYPE=sparc
OSTYPE=solaris
PATH=/usr/local/sbin:/usr/local/bin:/usr/ucb:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/etc/backup
PWD=/usr/local/etc/backup
REMOTEHOST=bunrab.catwhisker.org
SHELL=/bin/sh
SHLVL=1
TERM=screen
USER=backup
VENDOR=sun
At this stage, I have been unable to re-create the failure symptoms if I
invoke the script interactively. I always get a failure -- though not
always on the first remote file system; sometimes it's the 2nd -- when
the script is invoked via cron. There does not appear to be an issue of
interactive use allowing reporting & recovery from an error condition;
it Just Works.
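For completeness, one way to approximate cron's environment from an
interactive session, using the values from the cron %ENV dump above (a
sketch; it still won't reproduce cron's lack of a controlling tty):

    # Run from an interactive "sudo su - backup" session; "env -" clears
    # the environment and substitutes exactly what cron provides.  Other
    # script flags elided.
    env - HOME=/usr/local/etc/backup LOGNAME=backup \
        PATH=/usr/local/sbin:/usr/local/bin:/usr/ucb:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/etc/backup \
        SHELL=/usr/bin/sh TZ=US/Pacific \
        /usr/local/etc/backup/script -h /usr/local/etc/backup/hostslist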
Here's a sample script output from a failing attempt:
===========================================================
##### # # # # ##### ## #####
# # # # ## # # # # # # #
##### # # # # # # # # # #####
# # # # # # # ##### ###### # #
# # # # # ## # # # # # #
##### #### # # # # # # #####
bunrab runs "FreeBSD 4.11-STABLE"
Using "/bin/df -kt ufs" for filesystem query on bunrab
Using "/sbin/dump -" to perform the dump(s) for bunrab
Backups for bunrab on Sun Jul 16 03:08:21 PDT 2006
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s2a 161262 43006 105356 29% /
/dev/ad0s2e 1903014 747254 1003520 43% /usr
/dev/ad0s3e 2064302 292154 1607004 15% /var
/dev/ad0s3f 25286020 2633974 20629166 11% /common
Dump level for bunrab: 5
/dev/ad0s2a 161262 43006 105356 29% / ...
dev=/dev/ad0s2a; kb=161262; used=43006; avail=105356; cap=29%; mount=/; fudged=51607.2
Starting dump of /dev/ad0s2a at Sun Jul 16 03:08:33 PDT 2006
Vendor 'ARCHIVE ' Product '4586XX 28887-XX' tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 95 block no= 0
Dumping /dev/ad0s2a on tape 1, file 95 (bunrab:/)
Trying to issue "ssh bunrab /bin/df -kt ufs /"
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s2a 161262 43006 105356 29% /
Issuing command "ssh bunrab /sbin/dump -5uf - -b32 /dev/ad0s2a | /bin/dd of=/dev/rmt/0n conv=sync ibs=32k obs=32k"
DUMP: Date of this level 5 dump: Sun Jul 16 03:08:46 2006
DUMP: Date of last level 0 dump: Mon Jul 10 04:31:12 2006
DUMP: Dumping /dev/ad0s2a (/) to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 162 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: DUMP: 190 tape blocks
DUMP: finished in less than a second
DUMP: level 5 dump on Sun Jul 16 03:08:46 2006
DUMP: DUMP IS DONE
0+18 records in
18+0 records out
[And at this point, the "dd" process on the control/tape host hung, but
was killable.]
I then interactively re-started the backup:
Backup script restarted as
/usr/local/etc/backup/script -h /usr/local/etc/backup/hostslist -vxjl /var/log/dumplog.incremental -r bunrab:/
on Sun Jul 16 04:27:54 PDT 2006
Capacity=7500000; density=54000; length=36000; add_fudge=0; mult_fudge=1.2
Current tape position is now 96
Saved file number is 95; current tape position is 96
Issuing /bin/mt -f /dev/rmt/0n bsf 2
Current tape position is now 95
^L===========================================================
##### # # # # ##### ## #####
# # # # ## # # # # # # #
##### # # # # # # # # # #####
# # # # # # # ##### ###### # #
# # # # # ## # # # # # #
##### #### # # # # # # #####
bunrab runs "FreeBSD 4.11-STABLE"
Using "/bin/df -kt ufs" for filesystem query on bunrab
Using "/sbin/dump -" to perform the dump(s) for bunrab
Dump level for bunrab: 5
/dev/ad0s2a 161262 43006 105356 29% / ...
dev=/dev/ad0s2a; kb=161262; used=43006; avail=105356; cap=29%; mount=/; fudged=51607.2
Starting dump of /dev/ad0s2a at Sun Jul 16 04:28:27 PDT 2006
Vendor 'ARCHIVE ' Product '4586XX 28887-XX' tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 95 block no= 0
Dumping /dev/ad0s2a on tape 1, file 95 (bunrab:/)
Trying to issue "ssh bunrab /bin/df -kt ufs /"
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s2a 161262 43006 105356 29% /
Issuing command "ssh bunrab /sbin/dump -5uf - -b32 /dev/ad0s2a | /bin/dd of=/dev/rmt/0n conv=sync ibs=32k obs=32k"
DUMP: Date of this level 5 dump: Sun Jul 16 04:28:37 2006
DUMP: Date of last level 0 dump: Mon Jul 10 04:31:12 2006
DUMP: Dumping /dev/ad0s2a (/) to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 162 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: DUMP: 190 tape blocks
DUMP: finished in 1 seconds, throughput 190 KBytes/sec
DUMP: level 5 dump on Sun Jul 16 04:28:37 2006
DUMP: DUMP IS DONE
0+17 records in
17+0 records out
Finished dump of bunrab:/dev/ad0s2a at Sun Jul 16 04:28:41 PDT 2006 (bunrab:/) [tape 1, file 95]
Current tape position is now 96
Vendor 'ARCHIVE ' Product '4586XX 28887-XX' tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 96 block no= 0
...
[and so the re-started backup completed normally....]
Insights for diagnosing -- or even better, resolving -- this problem
would be most welcome. Requests for additional information should
probably be directed to me, rather than the list; I've set Reply-To
accordingly, and promise to follow up with anything useful or
interesting that's relevant.
Thanks!
Peace,
david
--
David H. Wolfskill david at catwhisker.org
Doing business with spammers only encourages them. Please boycott spammers.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.