
Seagate Barracuda 7200.11 Troubles



http://www.santools.com is my company, and I wrote the vast majority of the code.

Yep! He's the one who wrote the article on storagesecrets.com

David A. Lethe, of SANtools, Inc.

Perhaps one should ask what the software/firmware bug count is for each product you have written.

I hate buggy software and firmware, and yes, I've written software and firmware too.

Seagate has admitted they've known about this problem since the early 1990s. It's unacceptable to release, and keep releasing, products with a known problem. Yes, Seagate tech support explained that it was a race condition in the firmware.

If you advocate that a few bugs are OK and part of doing business, please list all the commercial software and hardware products you and your company perform services for, so that I and the rest of us can avoid your products. Seagate and any other manufacturer that delivers a known faulty product should be liable for negligence. It's not too hard to sue to get the source code and have an outside person analyze the bugs that the vendors hide. See the cases on the breath analyzer and voting machines, where the courts have now required the code to be released regardless of trade secret or proprietary claims. Be aware that in some countries US law doesn't apply, and if faulty products are sold there I'm sure they can prevent the further sale of these products. I hope the company's bottom line survives. Numerous manufacturers hope the warranty expires before a hardware or software bug shows itself. This attitude is unacceptable.

Look at the lemon law for automobiles. There should be one for software and hardware manufacturers.

RMA'ing the drive is a cop-out. And no, Seagate only replaces the drive; they don't fix it and send you back your original drive. If hard drives were designed properly, data recovery companies such as yours wouldn't be needed. This will be the case once solid state devices take over from disk drive manufacturers and the amount of hardware and firmware is smaller, reducing the chance of failure.

Unless you personally wrote the firmware for these Seagate drives, you have no leg to stand on except speculation.

BTW, if your company's products had as many problems as Seagate's, what would your company's policy be to rectify the situation? If it is to ignore the problem, then you will have issues. I and just about everyone else will have no problem informing the technical publishing companies about your products and services. In the end, the one who gets damaged will be you and your company.

Please provide known information on your faulty products so that I can start contacting the tech websites about it right away and let everyone else know!!!!

:realmad:

Edited by mikesw
Link to comment
Share on other sites


Come on, I told you all you need to know except the diagnostic bit pattern, and the offset for the 16- or 32-bit number that must be saved at that offset at the moment the drive is powered down to create the bricking effect. Do you want that too? (Nobody has that part of the info, for obvious reasons, but if you search the web you can find references to the magic number. Somebody at the TH site posted it yesterday, so if you want to do some detective work, go for it.)

The most you got from anybody else is a convoluted problem with firmware, which made it look as if everybody is vulnerable. Why do you think Seagate wants serial numbers to see if the disk is a candidate for the problem? If the firmware rev were the only factor, then all you would need is the date code to see if you have a vulnerable disk. Seagate would also be able to write an ActiveX plug-in to let people test their disk drives online. Instead you have to run a Windows EXE and give them your serial number. Common sense says that if the problem were easy to identify, they wouldn't go to all the effort of requiring users to run an executable. It would take all of an hour to write a plug-in to ID the disks.

Use some common sense here, factor in how many 'cudas that Seagate ships in a year, and tell me how many millions of disk drives SHOULD be failing if this is a firmware bug that affects all disks running this particular firmware. Seagate is on a 5-year run rate to ship 1,000,000,000 disk drives ANNUALLY by 2014. If the drive problem was as big as you say it is, then they would have caught it in QC. The problem is a purple squirrel (sorry about the yankee slang -- it means incredibly rare).

If the bricking issue were as big as you claim it to be, then EMC, Dell, IBM, HP, Arrow, Apple, and all the others would have issued press releases a long time ago saying they were signing with Fujitsu. Where are these press releases?

Now here is the dirty little secret. High-volume direct customers get bug reports in advance of firmware releases. Draw your own conclusions as to whether consumer-oriented companies such as Apple are more interested in keeping Seagate than their devoted customer base. Would Seagate dare to NOT tell Apple, Dell, and HP what is going on? Unlikely. Would Apple take such a risk? Less likely. Would Apple and the others assess the risk to their customer base knowing full details and determine that the number of affected disks is statistically insignificant? You tell me.

I am sorry that I have not told you all that I know. Frankly, I have better things to do. It is just that this conspiracy nonsense has gone too far, and somebody has to set the record straight. Off to work, I have a company to run now.

Link to comment
Share on other sites

Just to add a bit of fuel :ph34r: , wouldn't this statement:

http://storagesecrets.org/2009/01/contact-...overy-services/

In my opinion, your disk died due to a mechanical failure, and not any firmware bug.

be, to say the least, unsupported by the evidence of the number of reported successes? :whistle:

Or am I misunderstanding its meaning? :unsure:

jaclaz

Link to comment
Share on other sites

Come on, I told you all you need to know except the diagnostic bit pattern, and the offset for the 16 or 32-bit number that must be saved in the offset at the moment the drive is powered down that creates the bricking effect. Do you want that too? (Nobody has part of this info, for obvious reasons, but if you search the web, you can find reference to the magic number...)

320

Link to comment
Share on other sites

320

Not actually the "offset".

The "320" is not at all "very hidden"; it has been on Slashdot for a few days:

http://it.slashdot.org/article.pl?sid=09/01/21/0052236

Now that the problem, one way or the other, has come out into the open, mostly thanks to Gradius2 and a few other members of the board :thumbup, and we can see the light at the end of the tunnel :), notwithstanding the poor way Seagate managed the issue, I would like to spend a few words against the argument of "few cases out of millions of drives produced/sold":

There are actually only TWO possibilities:

  1. it's actually a case of few tens or hundreds out of millions, and if this is the case providing recovery would cost proportionally next to nothing to Seagate and should have been offered since day one
  2. there are more drives affected than those listed here on MSFN, in which case a public announcement and recall campaign would have been more than justified

Though of course in this particular case there is no danger to health or risk of injury, something like this would have been justified, as I see it:

http://www.cpsc.gov/CPSCPUB/PREREL/PRHTML97/97175.html

Because of 3 (THREE) incidents reported in the US, and probably a few more in other countries, the firm recalled some 120,000 Juice Extractors.

The recall was published in newspapers all around the world and, far from damaging the firm, it actually improved its public image.

Even if the reason for the (few? :unsure:) failures was still under investigation, simply instructing people at technical support, forum and/or call center to reply something like:

We know of this problem, our engineers are developing a solution for it, but you will have to wait a few days, please leave us an e-mail address, you will be notified as soon as such a fix is available, with instructions on how to apply it, by yourself or through our support.

would have been more than enough to keep a number of enraged customers calm for the time needed.

And again, if the problem is about a few hundred drives, Seagate could have afforded to send a technician in a limousine to the customer's home or office to apply the fix.

Still generally speaking there are two ways to avoid "panic" and keep customers satisfied:

  1. deny the evidence
  2. tell the truth, provide remedies and offer apologies

In this era of communication, choice 1. above is, besides immoral, unrealistic.

jaclaz

Link to comment
Share on other sites

And don't forget that medical offices and the military use hard disks, to name a few.

Sending the drive back for an RMA is not realistic. Small doctors' offices don't always back up YOUR medical records. "Sorry, we lost your medical records" wouldn't be acceptable.

The military does use PCs in war zones, although they are ruggedized for abuse and the environment they are in. However, no contractor, sysadmin, or user can design a ruggedized system against the failure rates Seagate had with this firmware bug. Commanding officer: "Give me the coordinates of the enemy on the map." "I can't, the computer won't recognize the disk drive that was working before lunch!!!!" I'd have to do backups every minute, and even then there is no guarantee, and I'd also have to have duplicate computer equipment to move the backup to, just in case. The officer: "I need it now!" "Sorry, it'll take a few hours to restore."

So Seagate and hardware/software manufacturers, how many lives were lost because of your defective products?

:ph34r::blink:

Edited by mikesw
Link to comment
Share on other sites

In the old days (back in 2000) I used to hack firmware for Pioneer burners (DVR-Axx family), as you can see here:

http://gradius.rpc1.org

Those old days remind me of "conversions" through firmware patches (e.g. Liteon SOHW-812S to Liteon 832S). That makes me wonder if it's possible to convert/flash a ST3500320AS to a ST3500320NS (Enterprise) using firmware SN06C (or a ST31000340AS to a ST31000340NS). <_<

It's perfectly possible to do such a thing (ASM makes the impossible possible); however, IF the hardware isn't the same (and I hope it isn't), then you're just putting a new name on your HDD.

Since they call those drives "Enterprise", they must be built from better components and parts; otherwise the company would be playing a "dirty underground game", asking more for the very same thing except for the name (label and firmware).

Link to comment
Share on other sites

So Seagate and hardware/software manufacturers, how many lives were lost because of your defective products?

:ph34r::blink:

On the bright side, how many "Autovelox", "Speed Cameras" or similar speed-checking devices of the last generation went berserk, lowering the chances of you getting fined? :unsure::)

Another way ;):

http://gizmodo.com/5069422/the-muppets-ani...ng-police-crazy

:P

jaclaz

Link to comment
Share on other sites

So Seagate and hardware/software manufacturers, how many lives were lost because of your defective products?

:ph34r::blink:

On the bright side, how many "Autovelox", "Speed Cameras" or similar speed-checking devices of the last generation went berserk, lowering the chances of you getting fined? :unsure::)

Another way ;):

http://gizmodo.com/5069422/the-muppets-ani...ng-police-crazy

:P

jaclaz

LOL! Very funny, nice find!

Link to comment
Share on other sites

I can't believe Seagate... They closed my case on a BIOS boot issue with a ST31000340AS drive after a week, with a single email notice stating the following. What I wanted to know was where, and to whom, I should send my drive to get the data recovered, as I had heard Seagate was offering the service. So I needed prices and guarantees.

This is pathetic!! I have also spent hours waiting on the phone...

>>>>>

Thank you for contacting Seagate Support.

A firmware issue has been identified that affects a small number of Seagate Barracuda 7200.11 hard drive models which may result in data becoming inaccessible after a power-off/on operation. The affected products are Barracuda 7200.11, Barracuda ES.2 SATA, and DiamondMax 22.

Based on the low risk as determined by an analysis of actual field return data, Seagate believes that the affected drives can be used as is.

However, as part of our commitment to customer satisfaction, Seagate is offering a free firmware upgrade.

Please follow this link

(http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931)

to enter the Knowledge Base article(s) detailing the steps to update your drive.

In the unlikely event your drive is affected and you cannot access your data, the data still resides on the drive and there is no data loss associated with this issue. If your drive is no longer accessible, contact us directly for further assistance at http://www.seagate.com/www/en-us/about/contact_us/.

NOTE: If you have contacted Seagate Support regarding a separate issue or about another product, please visit http://www.seagate.com/www/en-us/about/contact_us/ to submit an email.

Thank you.

<<<<<<<<<<<<<<<<

Link to comment
Share on other sites

All you will get from Seagate support via email is a blanket email (if any response at all). I thought this was old news. The only way to get any sort of interactive feedback is via telephone support (and in my experience they are usually not much use either until you ask to talk to a supervisor).

Link to comment
Share on other sites

I agree that not every Seagate drive (even if it's a 7200.11) with a failure at BIOS detection can be linked to the 'boot of death' issue, and an attempt to repair the wrong problem may result in data loss or even worse damage. But the thread is specific: it has many users with frozen drives that match the basic requirements (model, firmware version and serial number), and some are willing to take the risk...

It seems there has been some tension in the air because of the terminal procedure posted here, but rtos has already been available on the net for many years (almost since the current firmware evolved from Conner), and it's no big deal since the risks were clearly stated and only some will take their chances (a few may end up frying a PCB or losing data if something goes wrong, e.g. a bad connection). Most will prefer sending their drives in for the free recovery/repair option now being offered by Seagate.

I don't think full knowledge of the 'boot-of-death' details would give virus writers a blueprint, as that has already been possible for years with the flash code, and most of today's malware favors information and networks over the old destructive payload.

Whether it was a defective test machine (read: someone who has just lost his job :whistle: ) or a firmware bug does not matter to the end users who were caught by surprise. As for the percentage of affected drives, that's something only Seagate or an external audit may know for sure...

Media coverage is much more damaging than any obscure info or firmware bug, and Seagate took too much time to act. The result was overloaded staff and angry customers with downtime; their serial number tool failed, the firmware correction messed up the internal validation process, and I'm sure that some customers who paid a premium fee for third-party recovery services feel betrayed.

Seagate general support (chat, toll-free, RMA process, etc.) is far from perfect (i.e. canned responses), but it's good compared to other manufacturers. The 5-year warranty for desktop items (non-enterprise) WAS a plus (for new desktop drives after Jan/03/2009 it has been changed to 3 years), and in some cases they will replace a failed drive with a better/bigger refurbished one to avoid losing a customer.

I hope they make the firmware open source so it can be improved and the bug tracking process becomes more flexible/reliable. SSDs will catch on in the next few years: no moving parts and lower manufacturing cost (but they are not failure-free). First in mobile devices (where frequent standby/parking cycles are a big problem regardless of drive brand) and later in mainstream desktop/enterprise.

Cheers David. SANtools SMARTMon-UX is a great tool, and some people here (many had their drives affected) got p***ed off when you minimized the problem and teased everyone by saying that you know the root cause details of the failure but are not going to tell because it's a dark secret under NDA... ;)

The obvious question comes to mind .. how do you know your disk suffers from boot-of-death, and not something like a circuit board failure or massive head crash?
Edited by sieve-x
Link to comment
Share on other sites

Finally, here are the "secret" failure root cause details (no NDAs were hurt in the process :D).

Customer update :

Seagate has isolated a potential firmware issue in certain products, including some Barracuda 7200.11 hard drives and related drive families based on their product platform*, manufactured through December 2008. In some circumstances, the data on the hard drives may become inaccessible to the user when the host system is powered on. Retail products potentially affected include the Seagate FreeAgent® Desk and Maxtor OneTouch® 4 storage solutions.

As part of our commitment to customer satisfaction, we are offering a free firmware upgrade to those with affected products. To determine whether your product is affected, please visit the Seagate Support web site at http://seagate.custkb.com/seagate/cr...p?DocId=207931.

Support is also available through Seagate's call center: 1-800-SEAGATE (1-800-732-4283)

Customers can expedite assistance by sending an email to Seagate (discsupport*seagate.com). Please include the following disk drive information: model number, serial number and current firmware revision. We will respond, promptly, to your email request with appropriate instructions.

For a list of international telephone numbers to Seagate Support and alternative methods of contact, please access

http://www.seagate.com/www/en-us/about/contact_us/

*There is no safety issue with these products.

Description

An issue exists that may cause some Seagate hard drives to become inoperable immediately after a power-on operation. Once this condition has occurred, the drive cannot be restored to normal operation without intervention from Seagate. Data on the drive will be unaffected and can be accessed once normal drive operation has been restored. This is caused by a firmware issue coupled with a specific manufacturing test process.

Root Cause

This condition was introduced by a firmware issue that sets the drive event log to an invalid location causing the drive to become inaccessible.

The firmware issue is that the end boundary of the event log circular buffer (320) was set incorrectly. During Event Log initialization, the boundary condition that defines the end of the Event Log is off by one.

During power up, if the Event Log counter is at entry 320, or at 320 plus a multiple of 256 (320 + x*256), and if a particular data pattern (dependent on the type of tester used during the drive manufacturing test process) had been present in the reserved-area system tracks when the drive's reserved-area file system was created during manufacturing, firmware will increment the Event Log pointer past the end of the event log data structure. This error is detected and results in an "Assert Failure", which causes the drive to hang as a failsafe measure. When the drive enters failsafe, further updates to the counter become impossible and the condition will remain through subsequent power cycles. The problem only arises if a power cycle initialization occurs when the Event Log is at 320 or some multiple of 256 thereafter. Once a drive is in this state, there is no path to resolve/recover existing failed drives without Seagate technical intervention.

For a drive to be susceptible to this issue, it must have both the firmware that contains the issue and have been tested through the specific manufacturing process.
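As a rough illustration of the root cause text above, the critical counter values can be enumerated with a few lines of Python. This is only a sketch of the stated condition (counter at 320, or 256 entries thereafter, plus both preconditions), not Seagate's firmware logic; the function and parameter names are made up for the example.

```python
# Sketch of the condition described in the Seagate note. All names here are
# hypothetical; only the numbers 320 and 256 come from the note itself.
LOG_END = 320   # end boundary of the event log circular buffer
WRAP = 256      # the condition recurs every 256 entries after 320


def counter_is_critical(counter: int) -> bool:
    """True if the Event Log counter equals 320 + x*256 for some x >= 0."""
    return counter >= LOG_END and (counter - LOG_END) % WRAP == 0


def would_hang_on_power_up(counter: int,
                           has_affected_firmware: bool,
                           has_tester_pattern: bool) -> bool:
    """All three conditions from the note must hold at power-up."""
    return (has_affected_firmware
            and has_tester_pattern
            and counter_is_critical(counter))


critical_values = [n for n in range(1200) if counter_is_critical(n)]
print(critical_values)  # [320, 576, 832, 1088]
```

Under this reading a drive is only at risk when a power cycle happens to land exactly on one of those counter values, which is consistent with the note's statement that always-on drives do not hit the problem until they are power cycled.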

Corrective Action

Seagate has implemented a containment action to ensure that all manufacturing test processes write the same "benign" fill pattern. This change is a permanent part of the test process. All drives with a date of manufacture January 12, 2009 and later are not affected by this issue as they have been through the corrected test process.

Recommendation

Seagate strongly recommends customers proactively update all affected drives to the latest firmware. If you have experienced a problem, or have an affected drive exhibiting this behavior, please contact your appropriate Seagate representative. If you are unable to access your data due to this issue, Seagate will provide free data recovery services. Seagate will work with you to expedite a remedy to minimize any disruption to you or your business.

FREQUENTLY ASKED QUESTIONS (FAQ)

Q: What Seagate drives are affected by this "drive hang after power cycle" issue?

A: The following product types may be affected by this problem:

Barracuda 7200.11, Barracuda ES.2 (SATA), DiamondMax 22, FreeAgent Desk, Maxtor OneTouch 4, Pipeline HD, Pipeline HD Pro, SV35.3, and SV35.4. While only some percentage of the drives will be susceptible to this issue, Seagate recommends that all drives in these families be updated to the latest firmware!

Q: What should I do if I think I have a Seagate drive affected by this issue?

A: Since only some drives have this problem, there is a high likelihood your drive is working and will continue to work perfectly. However, Seagate recommends that all drives in the affected families be updated to the latest firmware as soon as possible. Seagate realizes this recommendation may present challenges for some customers, particularly those with large distributed installed bases. Seagate will work with customers to correct this problem, but requests customers take the following initial actions depending on what type of customer they are. For individual end-users, please contact Seagate Technical Support via web, phone or email.

http://seagate.custkb.com/seagate/cr...p?DocId=207931 or 1-800-SEAGATE (1 800 732-4283), or discsupport*seagate.com. If emailing, please include the following disk drive information: model number, serial number and current firmware revision.

Q. If my drives are always on, could I see this issue?

A. No, this can only occur after a power cycle, however Seagate still recommends that you upgrade your firmware due to unforeseen power events such as power loss.

Q: How will Seagate help me if I lost data on this drive?

A. There is no data loss in this situation. The data still resides on the drive and is inaccessible to the end user. If you are unable to access your data due to this issue, Seagate will provide free data recovery services. Seagate will work with you to expedite a remedy to minimize any disruption to you or your business.

Q. Does this affect all drives manufactured through January 2008?

A. No, this only affects products that were manufactured through a specific test process in combination with a specific firmware issue.

Q. Why has it taken so long for Seagate to find this issue on Barracuda ES.2 and SV35?

A. In typical nearline and surveillance operating environments, drives are not power cycled and so are not as likely to experience this issue.

Q. Does this affect the Barracuda ES.2 SAS drive?

A. No, the SATA and SAS drives have different firmware.

Q. How will my RAID-set be affected?

A. If the error occurs, the drive will drop offline after a power cycle. The RAID will go into the defined host specific recovery actions which will result in the RAID operating in a degraded mode or initiating a rebuild if a hot spare is available. If you are unsure how your host will respond to a dropped drive and have not yet experienced this issue, avoid unnecessary power cycles and refer to manufacturer or support for the appropriate instructions.

Q. Is there a way to upgrade the firmware to my drives if they are in a large RAID-set, or do I need to take the solution offline?

A. The ability to upgrade firmware in a RAID array is system dependent. Refer to your system manufacturer for upgrade instructions.

Q. How can I tell which Barracuda ES/SV35 drives are affected?

A. 1) Check the "Drive model #" against the list of affected models below, or

2) check the PN of the drive against the PN list below, or

3) call Seagate Technology support services at 1-800-SEAGATE (1 800 732-4283), or email discsupport*seagate.com

If it is a SV35 SATA drive and it is affected, new firmware will be available 1/23/09

Edited by sieve-x
Link to comment
Share on other sites

Q: What Seagate drives are affected by this "drive hang after power cycle" issue?

A: The following product types may be affected by this problem:

Barracuda 7200.11, Barracuda ES.2 (SATA), DiamondMax 22, FreeAgent Desk, Maxtor OneTouch 4, Pipeline HD, Pipeline HD Pro, SV35.3, and SV35.4. While only some percentage of the drives will be susceptible to this issue, Seagate recommends that all drives in these families be updated to the latest firmware!

Now an English grammar question (provided that the answer was written by a native English-speaking executive, possibly with some legal background besides technical and mathematical knowledge).

How much is "only some percentage"?

Let's say there were a few hundred in total, say 800, i.e. 7 or 8 times the number of serials published on MSFN.

Let's also assume that the affected drive models are 1/3 of past year production.

According to dlethe:

Use some common sense here, factor in how many 'cudas that Seagate ships in a year, and tell me how many millions of disk drives SHOULD be failing if this is a firmware bug that affects all disks running this particular firmware. Seagate is on a 5-year run rate to ship 1,000,000,000 disk drives ANNUALLY by 2014.

Seagate is aiming for 1,000,000,000 disk drives annually; I guess that estimating 2008 production at 1/3 of that number would be prudent.

So let's try the math:

1/3*1,000,000,000= 333,333,333 drives produced in 2008 :unsure:

1/3*333,333,333=111,111,111 drives of the said to be affected models in 2008 :unsure:

Let's round this number down to 100,000,000.

Now, let's say that 0.2% is the minimum that can be called "some percentage" (if it were less than this, anyone in his right mind would have used a phrase like "a fraction of a percent" or "less than a single percentage point" or something "diminutive" like that).

Now:

0.002*100,000,000=200,000

OK, the figures above may be exaggerated/wrong, so let's introduce a 10 times "safety factor":

200,000/10= 20,000

Anything between 20,000 and 200,000 appears "reasonable".

Even taking the lowest of the "range", 20,000 is several times bigger than the initially assumed 800. :w00t:
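For anyone who wants to rerun the back-of-envelope estimate with different guesses, here it is as a short Python sketch. Every input is an assumption taken from the post above (the 1/3 shares, the 0.2% floor, the 10x safety factor, the 800 starting guess); none of it is Seagate data, and the variable names are made up for the example.

```python
# Back-of-envelope estimate from the post; all inputs are the poster's guesses.
TARGET_ANNUAL_SHIPMENTS = 1_000_000_000  # run-rate figure quoted from dlethe
ASSUMED_INITIAL_COUNT = 800              # "a few hundred" starting guess

produced_2008 = TARGET_ANNUAL_SHIPMENTS // 3   # assume 1/3 of target in 2008
affected_models = produced_2008 // 3           # assume 1/3 are affected models
pool = 100_000_000                             # rounded down, as in the post

upper = pool * 2 // 1000   # 0.2% taken as the floor for "some percentage"
lower = upper // 10        # 10x "safety factor"

print(f"{produced_2008:,} produced, {affected_models:,} of affected models")
print(f"range: {lower:,} to {upper:,}")
print(f"low end is {lower // ASSUMED_INITIAL_COUNT}x the assumed 800")
```

Integer arithmetic is used so the figures match the ones written out in the post exactly; change any of the constants to see how sensitive the range is to each guess.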

Is it one of those days where my understanding of English is failing AND my math skills lack any kind of precision? :ph34r::blink:

jaclaz

Edited by jaclaz
Link to comment
Share on other sites
