
Seagate Barracuda 7200.11 Troubles



I wouldn't touch anything for at least 3 or 4 days after the next release and follow Seagate's forums to see if the update is actually working.

I attempted one of my 4 drives. It didn't brick it, but as soon as it failed, I stopped and started reading... then I found that SD1A was bricking drives.. then I sat back and had a good chuckle. (at Seagate of course!)

So when the new SD1A comes out, should I re-flash to it again? I flashed three drives (individually of course or they would not be working...) with the second release of it on Monday evening and all three drives work fine. But should I be concerned since they pulled it? My drives are 1TB.


The Slashdot Seagate guy said that when the internal drive log exceeds 320 entries and one powers down and then up, the drive errors out on init and won't boot properly, to the point that it won't even report its information to the BIOS.

Thus, is one able to dump the contents of this log using the RS-232 port or SATA, i.e. smartdrive, to see what the log entries are and how many there are in it? I'd like to see if I'm close to the 320 limit.

The guy said "IT is a rare condition". Hmmm, one would think that anything, no matter how rare, will happen ("Murphy's Law"). Thus, the firmware should have been designed to handle any error condition and not just crash due to log (buffer) overflow. This is just bad engineering IMHO, and RMA'ing the drive is not the solution either. That's just a copout. What a "loser" would do!

The conditions have to be just right: you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. This is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc. It might happen every few days on a computer in a non-RAID home use situation. If that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. It'll only stop responding IF the drive is powered up with log file #320 being the latest one written... a perfect storm situation. IF this is the case, then Seagate is trying to put in place a procedure where you can simply ship them the drive, they hook it up to a serial controller, and re-flash it with the fixed firmware. That's all it takes to restore the drive to operation!

As for buying new drives, that's up to you. None of the CC firmware drives were affected, only the SD firmware drives. If you were to have flashed the drives with the 'bad' firmware, it would disable any read/write functions to the drive, but the drive would still be accessible in BIOS, and there's a very good chance that flashing it back to a previous SD firmware (or up to the yet-to-be-released proven firmware) would make it all better.
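To make the "perfect storm" easier to picture, here is a minimal Python sketch of the failure window as described above. It assumes the log behaves like a 320-slot circular buffer and that power-up fails only when slot #320 was the last one written. The names (`LOG_SLOTS`, `next_position`, `boots_ok`, `simulate`) are made up for illustration; this is a toy model of the description, not Seagate's firmware logic.

```python
LOG_SLOTS = 320  # figure quoted in the thread, not taken from any firmware source


def next_position(position: int) -> int:
    """Advance the log write position, rolling over to slot 1 after slot 320."""
    return 1 if position >= LOG_SLOTS else position + 1


def boots_ok(position: int) -> bool:
    """Assumed rule from the post: power-up fails only if entry #320 is the latest."""
    return position != LOG_SLOTS


def simulate(events_before_power_cycle: int, start: int = 0) -> bool:
    """Write some log entries, then power cycle; return True if the drive comes back."""
    position = start
    for _ in range(events_before_power_cycle):
        position = next_position(position)
    return boots_ok(position)
```

Under these assumptions, `simulate(319)` and `simulate(321)` both come back fine, while `simulate(320)` does not, which matches the "one more log write and there's no issue" claim.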

As far as I know, if your drive has the CC1G, CC1H, CC1J or any of the CC firmwares really, it is completely unaffected by this issue. However, it may need an update if you experience 'stuttering' (the drive pausing for more than a few seconds during data transfer). The CC1H and CC1J firmwares are *fine* and will absolutely not brick your drive.

The update script checks two things.. to make sure it's a BRINKS or a MOOSE drive, and to check the model number. If you get the firmware from the torrents (it's out there) and tear it apart with uniextract, you can see the batch file and what it checks for. It's a program that was built back in the 90's and used ever since! You remove those 2 checks, and it'll happily flash that IBM or Western Digital drive with the seagate firmware as well.
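For anyone wondering what such a pre-flash check amounts to, here is a rough Python sketch of the two tests described above: family name, then model number. The real updater is a batch file, which is not reproduced here; the model numbers below are placeholders and the function name `may_flash` is invented for the example.

```python
# Families named in the post; the allowed models are placeholders, not the real list.
ALLOWED_FAMILIES = {"BRINKS", "MOOSE"}
ALLOWED_MODELS = {"MODEL-A", "MODEL-B"}  # hypothetical values for illustration only


def may_flash(family: str, model: str) -> bool:
    """Return True only if both the drive family and the model number are accepted."""
    family_ok = family.strip().upper() in ALLOWED_FAMILIES
    model_ok = model.strip().upper() in ALLOWED_MODELS
    return family_ok and model_ok
```

The point of having both checks is that each one alone is too loose: a family name can cover several models, and a model number alone says nothing about the internal revision.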

The 1.5TB drives both stutter and are at risk of bricking due to the journal issue. The stuttering issue is fairly recent and mostly shows up in the 1.5TB drives, but the journal issue is older and exists across many 7200.11 drives, ES2 drives and DiamondMax drives.

SD1A fixes both of these problems in the 1.5TB drives.

My suggestion to Seagate or any hardware manufacturer is to do what Intel does.

Since most people have Microsoft Windows installed, Intel provides microcode updates to Microsoft. Microsoft then provides these microcode updates as product quality/reliability enhancements, which are downloaded (I assume automatically) and then (perhaps during a bootup) reflash the microcode in the Intel processor upon doing a check for which Intel processor you have. The check is probably based on Intel's S-spec number, which identifies the chip family, step increment and so on, before flashing it.

Thus, the hardware manufacturers need to work with Microsoft to do the same thing for their hardware, whether that be ethernet board flashing, hard disk flashing, BIOS, video and graphics, sound etc.

This will eliminate people having to call vendor support to beg/plead to get these and spend a lot of time searching the internet for these patches, and it will eliminate the user creating problems via bad firmware flashing too!

As far as "BRINKS", "MOOSE", "GALAXY" etc. are concerned, they are pretty much the internal development names of the drive family. There can be overlap, but most "BRINKS" drives are 7200.11, I believe, while "MOOSE" drives are almost all 7200.10, and "GALAXY" drives are 7200.9. Generally, those names don't make it out into public, but if you were to tear into the SD1A firmware, you'd notice that it looks for the "BRINKS" drive before it flashes the firmware to the drive. There can be different internal names for different revisions of the drive itself, but generally they stick to one revision per family; a new internal name would only be used for a MAJOR revision of the drive.

SMART characteristics can vary from part number to part number - or even sometimes drive-to-drive; so what is 'out of tolerances' for one part number could be just fine for a different p/n (even though they are the same model number).

The BIOS bootup and LBA 0 GB explanation

The known issue manifests itself where the drive spins up fine but either reports no data to the drive controller (or BIOS if applicable) or shows up with zero capacity.

Here's a person who worked in disk drives explaining ROMware vs. Diskware and how flashing the HDD really works. I'm not sure if this is the case for all HDD manufacturers though.

The manufacturer can generally reload the firmware from scratch through a serial or diag port. After all that's what they do in manufacturing. When I worked with disk drives, we had ROMware, firmware (in flash) and Diskware. The ROM is mask programmed and has only boot code that can program the flash ROM, the flash ROM can be reloaded via the disk interface or a serial port (and can't do much more than load a track from disk), and the disk contains the actual code.
Then we got rid of the flash ROM and things became a little more exciting because the code in ROM had to be able to read and write a few sectors reliably - for the entire lifetime of the product [line], including cost reductions.

Seagate's Refurb Process

Thing is, I know Seagate really does try to push for high manufacturing standards (for example, did you know that every last Refurb drive *must* go through the full new-drive qualification before it's sent out? - something only a percentage of actual new drives have to go through because it's time consuming).

Here's Seagate tech support explaining what happens to drives that are sent back for refurb/reflash.

If you have confidential data on the drive, you have two options:

a) If you send it in for a reflash, there will be a tech who flashes the drive using a serial interface and then verifies good reads/writes to the data. But he's likely unbricking a hundred drives a week, and doesn't care about what's on the drive unless he happens to notice a folder, when he does his read/write test, labeled "OMG HUGE AMOUNT OF CHILD PORNOGRAPHY". I can't even say that a person will be doing the R/W test, but there is that chance.

or b) RMA your drive. The first thing that happens once the drive passes a visual inspection (verifying that the warranty is still valid and the drive hasn't been physically damaged by the user) is that the drive is thrown on a test machine. If the drive passes the physical tests, then its firmware is flashed and the diag machine goes through a 7-pass zero-random-zero-random cycle that destroys any and all data on the drive. This not only ensures a data wipe, but also helps diagnose any read/write errors on the drive. If you RMA the drive, it's not even hooked up to a human-accessible 'computer' (just diag equipment) until the next customer, who receives the drive as a refurb, puts it in their computer, at which point it should be so blank that not even the government could recover data from it using the most advanced tech that we know about.
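As an illustration of what a "7-pass zero-random-zero-random cycle" could look like, here is a small Python sketch that overwrites an in-memory buffer standing in for the drive. It assumes the cycle starts with a zero pass and strictly alternates; the function names are invented, and the actual diag equipment surely works at a much lower level than this.

```python
import os
from typing import List


def wipe_passes(n_passes: int = 7) -> List[str]:
    """Return the assumed pass order: zero, random, zero, random, ..."""
    return ["zero" if i % 2 == 0 else "random" for i in range(n_passes)]


def wipe(buffer: bytearray, n_passes: int = 7) -> List[str]:
    """Overwrite the whole buffer once per pass and return the passes performed."""
    performed = []
    for kind in wipe_passes(n_passes):
        if kind == "zero":
            buffer[:] = bytes(len(buffer))       # all 0x00
        else:
            buffer[:] = os.urandom(len(buffer))  # fresh random bytes
        performed.append(kind)
    return performed
```

With seven alternating passes that start on zero, the last pass is also a zero pass, so the buffer ends up all zeros, which fits the "so blank" description.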

Edited by mikesw

I completely disagree when they say the 320 log thing is "rare".

It is NOT: 2 out of 4 (750GB) failed, and the other two were reported as "not affected" on Seagate's site (by s/n check).

So in my case it was 50%. Super high! :wacko:

One failed as soon as I turned on the computer; the drive wasn't there anymore (BSY error).

I double checked cables and all, and same thing, so I said "Holy! How come? This thing is NEW!". I had used them for less than 5 months, and very little. Of course, a manufacturing defect.

Going forward (I lost almost 700GB of hard work after all), I started with 3x750, one of them with Vista. After an M$ update, I was forced to reboot. OK, let's do it, I always do that anyway after a critical fix, and guess what? Another 750GB died from just a reboot (another BSY)!

How can this thing be rare? It isn't rare at all! They're completely wrong with the numbers. Their math is faulty.

Gradius


So when the new SD1A comes out, should I re-flash to it again? I flashed three drives (individually of course or they would not be working...) with the second release of it on Monday evening and all three drives work fine. But should I be concerned since they pulled it? My drives are 1TB.

Hi,

That won't represent any problem.

Serial FLASH can support 100,000 writes/erases (yes 100k). We're very very safe here.
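A quick back-of-the-envelope check of that claim, as a Python sketch. It takes the 100,000 write/erase figure from the post at face value (it is not a datasheet value) and assumes each firmware update costs one cycle on the affected flash cells; both function names are made up for the example.

```python
RATED_CYCLES = 100_000  # figure quoted in the post, not a datasheet value


def endurance_used(flashes: int, rated_cycles: int = RATED_CYCLES) -> float:
    """Fraction of the rated write/erase cycles consumed by `flashes` updates."""
    return flashes / rated_cycles


def flashes_remaining(flashes: int, rated_cycles: int = RATED_CYCLES) -> int:
    """Number of further updates before the rated figure is reached."""
    return max(rated_cycles - flashes, 0)
```

So flashing SD1A twice would use about 0.002% of the quoted endurance under these assumptions, leaving 99,998 cycles.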

Gradius


I found an interesting article in German:

http://news.magnus.de/artikel/87236

It states that three conditions must be met for the drives to fail:

1. The counter-log-thing must reach 320

2. The drive is powered off at that moment

3. In manufacturing, the drive must have been connected to a (then defective) test machine!

Not all drives are being subjected to the test during manufacture.

While the guy at slashdot already mentioned the first two conditions, the third one is news to me, as it would suggest that not all drives are bound to fail. (even though labeled as "affected" by the online-check as mine is)


As far as I know, if your drive has the CC1G, CC1H, CC1J or any of the CC firmwares really, it is completely unaffected by this issue. However, it may need an update if you experience 'stuttering' (the drive pausing for more than a few seconds during data transfer). The CC1H and CC1J firmwares are *fine* and will absolutely not brick your drive.

Well, I have two CC1F 1TB drives here, and I'm worried about them.

Here's a person who worked in disk drives explaining ROMware vs. Diskware and how flashing the HDD really works. I'm not sure if this is the case for all HDD manufacturers though.

The manufacturer can generally reload the firmware from scratch through a serial or diag port. After all that's what they do in manufacturing. When I worked with disk drives, we had ROMware, firmware (in flash) and Diskware. The ROM is mask programmed and has only boot code that can program the flash ROM, the flash ROM can be reloaded via the disk interface or a serial port (and can't do much more than load a track from disk), and the disk contains the actual code.
Then we got rid of the flash ROM and things became a little more exciting because the code in ROM had to be able to read and write a few sectors reliably - for the entire lifetime of the product [line], including cost reductions.

In the case of those 7200.11 drives, there's a very tiny IC (8 pins) on the PCB, which is a serial flash. It costs around $1 per million, and they make at least 10 million 3.5" and 4 million 2.5" HDDs per month.

Btw, they won't explain why we cannot see Buffer (cache) by using progs like HD Tune. And this only happens on 7200.11 !

Gradius

Edited by Gradius2

I found an interesting article in German:

http://news.magnus.de/artikel/87236

It states that three conditions must be met for the drives to fail:

1. The counter-log-thing must reach 320

2. The drive is powered off at that moment

3. In manufacturing, the drive must have been connected to a (then defective) test machine!

Not all drives are being subjected to the test during manufacture.

While the guy at slashdot already mentioned the first two conditions, the third one is news to me, as it would suggest that not all drives are bound to fail. (even though labeled as "affected" by the online-check as mine is)

About 1 and 2, this isn't correct. My drive died (BSY error) after a reboot. When you do a reboot, the system records all the necessary information, and only after that does it reboot (calling the POST process in the BIOS). As soon as I rebooted, the drive died, and the power wasn't cut at any moment. And it was a normal reboot (no hang, crash, or anything).

It doesn't only happen when you turn off (or power off); it just needs to meet this condition: a reboot (POST) with the log at 320.

Gradius


Btw, they won't explain why we cannot see Buffer (cache) by using progs like HD Tune. And this only happens on 7200.11 !

Gradius

I'd think that HD Tune can't read this info from the disk drive directly. I think what it does is read the drive model number, look up the other drive characteristics in a local database, and display them to the user. Thus HD Tune needs to have its database updated with the newer drive info by the developers of HD Tune. This includes info such as ATA/ATAPI-????? and the UDMA info.

So in the case of my ST31000333AS having 32MB of cache, the HD Tune developers would modify the database so that a lookup of the drive model displays that 32MB info to the user.
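To show what I mean by a local lookup, here is a tiny Python sketch. It only illustrates the guess above and says nothing about how HD Tune is really implemented; the table, its single entry (taken from my own drive) and the function name `cache_size_mb` are all assumptions for the example.

```python
from typing import Optional

# Hypothetical local table: drive model number -> cache size in MB.
DRIVE_DB = {
    "ST31000333AS": 32,  # the drive mentioned in this post
}


def cache_size_mb(model: str) -> Optional[int]:
    """Return the cache size for a known model, or None if the table lacks it."""
    return DRIVE_DB.get(model.strip().upper())
```

If the tool worked this way, any model missing from the table would simply show no buffer size until the table was updated.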

Edited by mikesw

Btw, they won't explain why we cannot see Buffer (cache) by using progs like HD Tune. And this only happens on 7200.11 !

Gradius

I'd think that HD Tune can't read this info from the disk drive directly. I think what it does is read the drive model number, look up the other drive characteristics in a local database, and display them to the user. Thus HD Tune needs to have its database updated with the newer drive info by the developers of HD Tune. This includes info such as ATA/ATAPI-????? and the UDMA info.

So in the case of my ST31000333AS having 32MB of cache, the HD Tune developers would modify the database so that a lookup of the drive model displays that 32MB info to the user.

Isn't that issue what this article describes?

http://www.seagate.com/www/en-us/support/d...wnloads/cuda-fw


Well, I re-read the article, it does say power off, but I don't care.

The interesting thing is the third point, because it suggests all of a sudden that it's not the fault of the firmware, but of a set of misconfigured test machines.

It explicitly states that all three conditions must be met in order for failure to occur.

Don't blame me, I didn't write the article.

Edited by gestalt

Seagate stated that the Diamond Max 21 line needs a fix too. However, they don't state whether it is the SATA or ATA/IDE/PATA line or both.

I bought the following Maxtor 500 GB Diamond Max 21 drive about two years ago in a retail kit, STM305004N1AAA-RK: L01Y500 7200.1, PN: 9DP0A6-591. The kit actually contained this drive model: STM3500630A, PN: 9DP046-326.

I also bought two Seagate 750GB retail drives: ST3750640A-RK, PN 98J048-305, and the same model but PN 9BJ748-550.

In all three drives above, the firmware is 3.AAE, regardless of whether it is Maxtor or Seagate. This probably means that, since my Maxtor is a Diamond Max 21, the Seagate 750 with the same firmware version can really be thought of as a Maxtor Diamond Max 21 too, although Seagate's website doesn't mention these drive sizes, i.e. 500 GB.

Conclusion: I'll probably need a firmware fix for the current line of problems in Seagate drives.

Does anybody out there know what the latest firmware version is for these? I searched Google and people seemed to have problems with 3.AAK. This was supposed to fix speed problems, but made things worse.....

:unsure:

Edited by mikesw

Btw, they won't explain why we cannot see Buffer (cache) by using progs like HD Tune. And this only happens on 7200.11 !

Gradius

I'd think that HD Tune can't read this info from the disk drive directly. I think what it does is read the drive model number, look up the other drive characteristics in a local database, and display them to the user. Thus HD Tune needs to have its database updated with the newer drive info by the developers of HD Tune. This includes info such as ATA/ATAPI-????? and the UDMA info.

So in the case of my ST31000333AS having 32MB of cache, the HD Tune developers would modify the database so that a lookup of the drive model displays that 32MB info to the user.

Ouch, if it's like that, then HD Tune is very limited.


So when the new SD1A comes out, should I re-flash to it again? I flashed three drives (individually of course or they would not be working...) with the second release of it on Monday evening and all three drives work fine. But should I be concerned since they pulled it? My drives are 1TB.

Hi,

That won't represent any problem.

Serial FLASH can support 100,000 writes/erases (yes 100k). We're very very safe here.

Gradius

The number of re-writes isn't what concerns me. They pulled the firmware for "validation" so I'm wondering if I should re-download and re-flash my drive after they validate the firmware even though the SD1A update on Monday worked for me...


Well, I re-read the article, it does say power off, but I don't care.

The interesting thing is the third point, because it suggests all of a sudden that it's not the fault of the firmware, but of a set of misconfigured test machines.

It explicitly states that all three conditions must be met in order for failure to occur.

Don't blame me, I didn't write the article.

I'm not blaming you. :hello:

Well, the part about power off is wrong, because I saw the whole thing happen before my own eyes.

About the third point, yes, this is normal. Imagine testing 10 million HDDs per month: testing them on a one-by-one basis is a nightmare, too expensive and time consuming. If HDDs were expensive then yes (perhaps), but since 1.5TB is $130 and 1TB is $99 in the U.S., it just doesn't make sense (to them) anymore.

