Seagate Barracuda 7200.11 Troubles



Very nice arguments; this recalls my old days from the CBBS/BBS age back in 1985. :)

Seagate boot-of-death analysis - nothing but overhyped FUD

Of course that statement above is a BIG bad joke from Seagate or whatever the source is.

To put things simply, the best proof is looking at how many viewers we have on the topics here related to Seagate's problems (aka the 7200.11 syndrome).

Now let's just do a simple Google search, entering:

"Seagate 7200.11 failing": I got 72,100 links

"Seagate 7200.11 fail": 98,100 links

"Seagate 7200.11-failing": 371,000 links

The bad thing is I don't know how many of those links might point to the same site, so I'll divide everything by 4:

If we assume at least four drives are necessary for someone to write about this issue on the web (hence divided by 4), that at least 10 people will read each report (because they have the same issue), and that they will each have an average of 2 drives with problems, then we would have from 72,100/4*10*2 = 360,500 defective drives up to 371,000/4*10*2 = 1,855,000 drives (in rough math).

Now, let's look at the searches from those who know the actual errors these drives report:

"Seagate 7200.11 bsy+error": 11,100 links

"Seagate 7200.11 0+lba": 4,980 links

That's 16,080 links. Unfortunately we cannot apply the same "math" as above, since this is a bit different: few people know the problem well enough to try to fix it themselves, I would estimate as low as 1% of them. So in the best-case scenario (for Seagate) the real number is off by a factor of 10, and in the worst case by a factor of 100. That gives from 16,080 * 10 = 160,800 up to 16,080 * 100 = 1,608,000.

In both cases the upper estimate passes the 1 million mark. Coincidence?
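For anyone who wants to redo the arithmetic, here is a minimal Python sketch that only repeats the assumptions above (4 drives per writer, 10 readers per link, 2 bad drives per reader, and the x10/x100 factor for the error-specific searches), nothing measured:

def estimate(links, drives_per_writer=4, readers_per_link=10, drives_per_reader=2):
    # the same back-of-envelope multipliers as in the post above
    return links / drives_per_writer * readers_per_link * drives_per_reader

print(estimate(72_100))     # 360,500  ("failing" search, lower estimate)
print(estimate(371_000))    # 1,855,000 (upper estimate)

for factor in (10, 100):
    print(16_080 * factor)  # 160,800 and 1,608,000 ("bsy" / "0 lba" searches)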

Overhyped FUD, they said? :crazy: LAUGH! :w00t:

Edited by Gradius2


Now let's just do a simple Google search, entering:

"Seagate 7200.11 failing": I got 72,100 links

"Seagate 7200.11 fail": 98,100 links

"Seagate 7200.11-failing": 371,000 links

Sorry to say so :(, but that's not really a "valid" argument, as I (and some other people) see it ;):

http://homepages.tesco.net/J.deBoynePollar...ess-metric.html

jaclaz


Yep :), and we don't even have a clear idea on WHICH events are logged and HOW MANY such events take place in an "average powered on hour".

True, but we don't have to know. The probability of a drive failing is the same as long as at least one event is logged per power cycle.

If, as it has been hinted/reported somewhere on the threads, a S.M.A.R.T. query raises an event that is actually logged, we will soon fall in the paradox that the more you check your hardware status the more prone it is to fail.....:w00t:

No, the chance of a drive failing due to this condition is zero unless it is powered off.

All that matters is that the event counter changes at all from power-on to power-off. It does not matter whether it increases by 1, or by 50 or by any other value as long as such values are equally probable.

But the events are hardly equally probable. It's much more likely that you're going to get a very small number each power cycle. The chances of dozens or hundreds of entries each power cycle are almost non-existent unless your drive is hosed to begin with.

And consider this: if the log incremented by EXACTLY one each power cycle (I don't know if that's even possible), what's the probability an (affected) drive will fail? It's 100%. It will fail with certainty because it WILL occur on the 320th power cycle. It will take just under a year or so for this to happen for a lot of home users, assuming a power cycle per day. Just an example of course. We have to consider that a lot of drives from the list can be seen failing after around 60 - 100 days. Would this be something roughly like 60 - 100 power cycles for those drives? So maybe for the first 'batch' of bad drives, you're seeing something like 3 - 5 log entries on average per power cycle.

My point is that the probability of an affected drive failing may be as high as something like 3, 4 or 5:1. We have probably not seen the bulk of failures yet - it's too early! And the lower the average number of log entries per power cycle, the higher the probability eventually becomes of hitting the initial 320th entry or each 256th entry after that. It will take longer, i.e., more power cycles, but there's a better chance of hitting the bad entry on each complete pass. Even if the average number of entries is very low, like .5 per power cycle, there is an extremely high chance of the drive failing - eventually. It's just going to take around 640 power cycles, but you are unlikely to skip over ending exactly on entry 320 (or every 256 entries thereafter).

Figuring out the probability of failure on any single power cycle isn't really useful. The question most 7200.11 owners have is: What are the chances my drive will fail AT ALL in the next year or two?
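To put rough numbers on this, here is a crude Monte Carlo sketch in Python of the scenario described above. The assumptions are mine and deliberately simple: the drive bricks if it is powered off with the log pointer exactly on entry 320 or any 256th entry after that, the starting position is unknown (uniform over 0..255), each power cycle adds a uniformly random number of entries around the chosen average, and there is one power cycle per day:

import random

def fails_within(days, avg_entries, start_max=256):
    pos = random.randrange(start_max)                    # unknown log position at purchase
    for _ in range(days):
        pos += random.randint(1, 2 * avg_entries - 1)    # roughly avg_entries per power cycle
        if pos == 320 or (pos > 320 and (pos - 320) % 256 == 0):
            return True                                  # powered off on a bad entry
    return False

trials = 10_000
for avg in (1, 3, 5):
    p = sum(fails_within(365, avg) for _ in range(trials)) / trials
    print(f"~{avg} entries/cycle: about {p:.0%} chance of a brick within a year")

With exactly one entry per cycle it comes out at 100%, as argued above; with 3 - 5 entries per cycle it lands around 80%, i.e. in the same ballpark as the 3, 4 or 5:1 odds mentioned.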

Edited by Gibby

Seagate modified these commands in the new firmware:

SD15:

Level T 'i': Rev 0001.0000, Overlay, InitDefectList, i[DefectListSelect],[saveListOpt],[ValidKey]

Level T 'm': Rev 0001.0000, Flash, FormatPartition, m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey]

SD1A:

Level T 'i': Rev 0011.0000, Overlay, InitDefectList, i[DefectListSelect],[saveListOpt],[ValidKey]

Level T 'm': Rev 0012.0000, Flash, FormatPartition, m[Partition],[FormatOpts],[DefectListOpts],[MaxWrRetryCnt],[MaxRdRetryCnt],[MaxEccTLevel],[MaxCertifyTrkRewrites],[ValidKey],[DataPattern]

Questions:

What is [DataPattern] in Level T 'm'?

Can SD1A bricks be repaired with the new command table?

Edited by pichi

Questions:

What is [DataPattern] in Level T 'm'?

Can SD1A bricks be repaired with the new command table?

They should work as long as you were dealing with the same issue, but SD1A fixes that, so any 'bricking' after it would have a different cause/solution. About [DataPattern], I would guess the name says what it does (create/fill a data pattern).

Updated my previous post #1045 to shed some light :unsure: on the root cause and S.M.A.R.T.

Edited by sieve-x

I have developed programs to automate the repair process, to make it easier.

Some people have tried these programs and they work.

I am collaborating with Fatlip to provide a worldwide low-cost solution (adapter and Torx), since there are people who cannot find adapters.

A soldering station isn't necessary, and neither is electronics knowledge.

The hard work is behind us, and thanks to a Lithuanian webpage we have the solution:

http://yura.projektas.lt/files/seagate/720011_ES2.html

Because of some people who only know how to copy and paste, and later request donations ... I am still deciding whether I will release the programs or not. :angry:
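Not those programs, of course, but for the curious, here is a minimal Python sketch of the very first step such an automation tool would have to perform: opening the drive's diagnostic terminal over a TTL serial adapter, using the settings described in the linked guide (38400 baud, 8N1, Ctrl+Z to reach the F3 T> prompt). The port name 'COM3' is only an example, and the actual repair command sequence from the guide is deliberately not reproduced here:

import serial   # pyserial

# assumption: a TTL-level RS232 adapter wired to the drive's diagnostic pins
with serial.Serial('COM3', 38400, timeout=2) as ser:   # 8N1 is pyserial's default
    ser.write(b'\x1a')                                  # Ctrl+Z wakes the terminal
    banner = ser.read(64).decode(errors='replace')
    if 'F3 T>' in banner:
        print("Diagnostic prompt reached - follow the guide from here.")
    else:
        print("No prompt; check the wiring and that the drive is powered.")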


I have developed programs to automate the repair process, to make it easier.

That would be a great thing. :)

I have a few PMs from people who don't know English very well, so I'm trying to find the time to translate the existing guide (into Italian), but I am a bit reluctant, as this "kind" of people also tends to be not particularly "tech savvy", the procedure is fairly complex for a newbie, and the risk of somehow "frying" the drive by mistake is high.

Having something along the lines of what I hinted here:

http://www.msfn.org/board/index.php?showto...28807&st=48

tested and working, could make the difference. :thumbup

About the other point, of course you are free to choose your way, but:

We make a living by what we get, but we make a life by what we give.

;)

jaclaz


All that matters is that the event counter changes at all from power-on to power-off. It does not matter whether it increases by 1, or by 50 or by any other value as long as such values are equally probable.

But the events are hardly equally probable. It's much more likely that you're going to get a very small number each power cycle. The chances of dozens or hundreds of entries each power cycle are almost non-existent unless your drive is hosed to begin with.

You are right. My statement was an oversimplification.

And consider this: if the log incremented by EXACTLY one each power cycle (I don't know if that's even possible), what's the probability an (affected) drive will fail? It's 100%. It will fail with certainty because it WILL occur on the 320th power cycle.

If you assume that the log is initially empty (event counter at 0), that's certainly true. To be more precise, the probability of the drive failing would be 0% on the first 319 days, jumping to 100% on day 320.

Figuring out the probability of failure on any single power cycle isn't really useful. The question most 7200.11 owners have is: What are the chances my drive will fail AT ALL in the next year or two?

Absolutely. That's in line with what I intended to point out. While the probability of anything failing is 100% in an infinite number of years, these drives are very likely to fail in their first year of service. My calculation estimates a 76% chance of failure within a year. Real numbers might be even worse.

I've quickly calculated the chances of drive failure given a certain average number of log entries per power cycle. Again I've ignored the initial 320 boundary, as the log might not be empty when a drive ships. So for 5 entries on average we have about 50 days of 0% failure probability (as the log fills up to its 256 boundary), then a 20% chance of failure on day 51. The chances of a drive being still alive after that are thus 80%. On day 102 there is also a 20% chance of failure, making a total chance of 64% of a drive being alive on day 102 (it has two 20% chances to die until then). And so on. Given 5 log entries on average, over one year the failure probability would be 79%. For 3 log entries on average, over one year the failure probability would be 80.2%.
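The same numbers fall out of a few lines of Python under exactly those simplifications (ignore the initial 320 boundary, one power cycle per day, a 1-in-average chance of parking on each 256-entry boundary):

def failure_probability(avg_entries_per_cycle, days, period=256):
    passes = int(days * avg_entries_per_cycle / period)   # complete boundary passes
    hit_per_pass = 1.0 / avg_entries_per_cycle            # chance of parking exactly on one
    return 1.0 - (1.0 - hit_per_pass) ** passes

for avg in (3, 5):
    print(f"{avg} entries/cycle: {failure_probability(avg, 365):.1%} chance within a year")
# prints 80.2% for 3 entries/cycle and 79.0% for 5, matching the figures above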

I'm wondering whether Seagate can really say with confidence that "Based on the low risk as determined by an analysis of actual field return data, Seagate believes that the affected drives can be used as is." (see KB article).

Edited by Oliver.HH

@Gibby

@Oliver.HH

What I was trying to say here:

http://www.msfn.org/board/index.php?showto...092&st=1048

was that if certain events are logged "in pairs" or in "triplets", the actual probabilities would lessen a bit.

Take as an example the "normal" XP Event Log: when booting NT-based systems you usually get a:

6009 - Microsoft ® Windows ® 5.01. 2600 Service Pack 2 Multiprocessor Free.

6005 - Event log service was started

And a:

6006 - Event log service was stopped

when switching off.

In "normal" power cycle of a (highly ;)) hypothetical install where no errors, no warnings, nor other notifications are logged, the entries would be always in triplets.

To get to exactly 320 in such a system, the "initial" address x would have to satisfy x + 3*n = 320 for some number n of power cycles, i.e. only starting values x = 320 - 3*n can ever hit the boundary. Counting down from it:

317 (as opposed to 319, 318 and 317 in the single-event case)

314 (as opposed to 316, 315 and 314)

311 (as opposed to 313, 312 and 311)

....

thus reducing the probabilities to 1/3 of what was calculated for the "single event" addition.
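A two-line Python check of that 1/3 reduction, assuming exactly three entries per power cycle and the 320 boundary used in the example:

# starting log positions (0..319) that can ever land exactly on entry 320
can_hit = [x for x in range(320) if (320 - x) % 3 == 0]
print(len(can_hit), "of 320 starting positions")   # -> 106 of 320, roughly 1/3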

About this:

Absolutely. That's in line with what I intended to point out. While the probability of anything failing is 100% in an infinite number of years, these drives are very likely to fail in their first year of service. My calculation estimates a 76% chance of failure within a year. Real numbers might be even worse.

Here is a simple graph of the chance of failure over the first 12 months (months assumed to be 30 days each; counting only working days would of course flatten the curve):

[Graph: cumulative chance of failure over the first 12 months - chanceseo9.jpg]

I see it more like gambling on a coin throw:

Every time you switch the thing on, you throw a coin. On average, for the first six months you will win. But besides gambling being not at all what one is supposed to do with data, gambling beyond six months, where the percentage gets to 50.6%, is betting money on an "unfair" game. ;)
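For what it's worth, the curve can be reproduced (my assumption about how it was drawn, not a statement of fact) with the simplest possible model: a flat 1-in-256 chance per power cycle of parking on a bad log entry, one power cycle per day, 30-day months. That model gives 50.6% at six months and about 76% at twelve, matching the figures quoted in this thread:

for month in range(1, 13):
    p = 1 - (255 / 256) ** (month * 30)   # chance of having hit a bad entry at least once
    print(f"month {month:2d}: {p:.1%}")
# month 6 prints 50.6%, month 12 prints 75.6%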

jaclaz


To get to exactly 320 in such a system, the "initial" address x would have to satisfy x + 3*n = 320 for some number n of power cycles [...] thus reducing the probabilities to 1/3 of what was calculated for the "single event" addition.

That's certainly true for the chance of hitting when you are near the boundary. But if you consider that you are approaching the boundary at three times the speed and there is always a next boundary to hit (initially at 320, then every 256 entries after that), you are getting close to those boundaries three times as often.

In the long run, that amounts to 1/3 (chance of hitting when near) * 3 (frequency of being near the boundary) = 1, so the overall probability would be the same.
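A back-of-envelope check of that cancellation, under the same simplified model (a boundary every 256 entries, a 1/k chance of parking exactly on it per pass, one power cycle per day, k being the average number of entries per cycle):

def expected_boundary_hits(days, k, period=256):
    passes = days * k / period              # how often a boundary is reached in `days` cycles
    hit_chance_per_pass = 1 / k             # chance of parking exactly on it during one pass
    return passes * hit_chance_per_pass     # k cancels out: days / period

for k in (1, 3, 5):
    print(f"{k} entries/cycle -> {expected_boundary_hits(365, k):.2f} expected hits per year")
# every value of k prints ~1.43: the long-run exposure is the same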

Thanks for the graph! Should help people decide whether to participate in the game ;-).


Now let's just do a simple Google search, entering:

"Seagate 7200.11 failing": I got 72,100 links

"Seagate 7200.11 fail": 98,100 links

"Seagate 7200.11-failing": 371,000 links

Sorry to say so :(, but that's not really a "valid" argument, as I (and some other people) see it ;):

http://homepages.tesco.net/J.deBoynePollar...ess-metric.html

jaclaz

That study is based on 2005~2007 data; if Google wants to stay at the top among search engines, they should keep making better filters to avoid repetitive links.

Perhaps they're already doing better than in the past (they should be), but I know I cannot trust such an engine as a metric; it was just a simple and speculative example. :blushing:


Here is a simple graph of the chance of failure over the first 12 months [...] I see it more like gambling on a coin throw: every time you switch the thing on, you throw a coin. On average, for the first six months you will win. But besides gambling being not at all what one is supposed to do with data, gambling beyond six months, where the percentage gets to 50.6%, is betting money on an "unfair" game. ;)

jaclaz

It seems to explain why there was a rash of failures beginning a few months ago during the summer and then more recently: people who powered their system twice a day (in the morning and evening) went through power cycles twice as fast as those who, like myself, tended to power on once a day and leave it on, getting almost twice the usage out of each power cycle.

