From grog_ERASE@lemis.de Sun Jul 14 07:45:04 1996
From: grog_ERASE@lemis.de (Greg Lehey)
Subject: Re: 8 * 0xFF bytes at intermittent multiples of 0x1000
To: jhs@
Date: Sun, 14 Jul 1996 16:23:13 +0200 (MET DST)
Cc: scsi@freebsd.org

In early June 1996, Julian H. Stacey wrote:
>
> To scsi@freebsd.org
> Cc Adaptec 1542A SCSI Adapter People, Julian Elischer.
>
> [ I last posted to +1542A owners + bugs@ ,
> but scsi@ now seems more appropriate than bugs@
> I & some other 1542A people are most probably not on scsi@ list,
> so please be careful if trimming CC line.
> ]
>
> I (Julian H. Stacey ) did a load more hardware changes & tests,
> including swapping my Adaptec 1542A for a 1542B, & swapping sd0 & sd1,
> & eventually deduced it was not my 1542A that was mis-behaving,
> (returning 8 * 0xFF bytes at intermittent multiples of 0x1000),
> but was one of 2 HP 97548S SCSI 1 633MB disks.
>
> Either the disk is faulty, or maybe the scsi code might not be
> allowing for some strange sequence, or some such.
>
> __HOWEVER__
> We can't dismiss it as an isolated equipment fault, as
> - tomppa_ERASE@fidata.fi detects similar data corruptions,
> - scott_ERASE@relay.forest.com seems to be having similar problems,
> but with a 1542B,
> - perhaps other people are suffering similar corruption
> without realising it.
>
> Partial Conclusion:
> 1542A people can `relax', to the extent that 1542B seems to be
> able to trigger the fault too (I don't have a1542C or 2940 etc)

I've just run into this same problem, but I can't confirm your
findings. I'm putting together a machine out of old junk parts.
Currently it has a 486/66 with 16 MB and two full-height 5¼" drives:

(aha0:0:0): "CDC 94161-9 6226" type 0 fixed SCSI 1
sd0(aha0:0:0): Direct-Access 148MB (304605 512 byte sectors)
(aha0:1:0): "CDC 94171-9 5836" type 0 fixed SCSI 1
sd1(aha0:1:0): Direct-Access 308MB (631017 512 byte sectors)

Although these drives both claim to be CDC, the second one has a
Seagate label on it.

I installed 2.1-RELEASE on the machine from CD-ROM, and immediately
after booting lots of programs SIGSEGVed. I compared them with the
original and found almost exactly the same symptoms you describe:
here's the result of comparing /usr/bin at a later time:

/usr/bin/cu bin/cu differ: char 40961, line 131
/usr/bin/uucp bin/uucp differ: char 32769, line 97
/usr/bin/uupick bin/uupick differ: char 32769, line 102
/usr/bin/uustat bin/uustat differ: char 32769, line 111
/usr/bin/as bin/as differ: char 81921, line 185
/usr/bin/awk bin/awk differ: char 32769, line 83
/usr/bin/bc bin/bc differ: char 32769, line 134
/usr/bin/cvs bin/cvs differ: char 212993, line 725
/usr/bin/gdb bin/gdb differ: char 475137, line 5209
/usr/bin/grep bin/grep differ: char 32771, line 107
/usr/bin/egrep bin/egrep differ: char 32771, line 107
/usr/bin/fgrep bin/fgrep differ: char 32771, line 107
(many more)

It's interesting to note how many come immediately after the first
32 KB. In the cases I looked at, a number of bytes had been replaced
by 0xff; the total size of the executable didn't change. In most
other cases, too, the corruption was at or immediately after the
beginning of a memory page.

Another point: I've only seen this corruption on the second disk.
Considering that they're almost identical, that's interesting. I
don't know how to explain it, except that maybe it's a coincidence.

The big difference from your experience is that I replaced the 1542A
with a 1542B, and the problems completely disappeared.

Let's look at the other responders:

>> Date: Tue, 11 Jun 1996 16:56:50 -0400
>> From: Scott Kelly
>> To: jhs@
>> Subject: Adaptec 1542A Users (from 12 Apr 1996)
>>
>>
>> I seem to be having similar problems, but with a 1542B... Do you know if there
>> has been a driver update since April?

Are you sure that these are the exact problems? What other hardware
are you running?
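
As an aside on the cmp listing above: cmp numbers bytes from 1, so
"char 32769" is file offset 0x8000. The sketch below is only an
illustration of how the offsets can be checked against the 0x1000
boundary from the subject line. The function name page_offsets, the
regular expression and the 4 KB page size are assumptions made for this
example; the sample lines are copied from the listing.

```python
import re

PAGE_SIZE = 0x1000  # assumed page size: 4 KB, the multiple in the subject line

# A few lines copied from the cmp listing in this mail.
CMP_OUTPUT = """\
/usr/bin/cu bin/cu differ: char 40961, line 131
/usr/bin/uucp bin/uucp differ: char 32769, line 97
/usr/bin/cvs bin/cvs differ: char 212993, line 725
/usr/bin/gdb bin/gdb differ: char 475137, line 5209
/usr/bin/grep bin/grep differ: char 32771, line 107
"""

# Matches "<file1> <file2> differ: char N, line M" as printed by cmp.
CMP_LINE = re.compile(r"^(\S+) \S+ differ: char (\d+), line \d+$")


def page_offsets(cmp_output, page_size=PAGE_SIZE):
    """Return (file, page number, offset within page) for each cmp line."""
    results = []
    for line in cmp_output.splitlines():
        match = CMP_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a cmp "differ" line
        # cmp counts bytes from 1; subtract 1 to get a zero-based offset.
        offset = int(match.group(2)) - 1
        results.append((match.group(1), offset // page_size, offset % page_size))
    return results


if __name__ == "__main__":
    for name, page, within in page_offsets(CMP_OUTPUT):
        print(f"{name}: page {page}, byte {within} of the page")
```

For these sample lines, every first mismatch except grep's falls on
byte 0 of a 4 KB page; grep's is at byte 2 of page 8.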

> For reference, I'll append parts of my last mail:

>> Tomi Vainio
>> Has confirmed he sees the same Adaptec 1542A SCSI adapter bug that I do.
>>
>> > I connected sd1 to my 1542A and here are results:
>> >
>> > 1. No problems if testblock is only one that generates disk activity.
>> > 2. I launched couple find processes to sd0 and at same time I
>> > run testblock. Testblock failed only 1/10 of test runs.
>> > 3. I copied files with cp to sd1 when running testblock on
>> > sd1. Testblock failed on every time.

Yes, I had a vague feeling that it was related to the amount of disk
activity.

>> So it looks like a generic bug in FreeBSD code:
>> With a 1542A (& not a 1542B, which seems OK),
>> In simultaneous multiple task write mode to sd1 (or 2 or 3 or 4),
>> At random multiples of 0x1000 bytes,
>> The first 8 bytes of a block get forced to 0xFF.
>> (Of course it may well be that FreeBSD code is not `in error' but merely
>> doesnt allow for some wart in the 1542A, that's fixed in the 1542B,
>> but whatever, we need a fix).
>
> As above in this mail, I think I'm wrong there, it's not 1542A sepcific,
> I get it with 2 different 1542B's as well

Do you have 1542Bs with which you don't get it?

When I get a bit of time, I intend to install BSD/OS on the same
configuration and see if it has the same problems.

Greg