Oliver.HH

Member
  • Posts: 7
  • Country: Germany
  1. I became aware of the problem through the media and found that Seagate's support website offered no remedy for OEM drives, while the Dell website had no information about the problem at all. I contacted Dell and Seagate support to find out whether the DE12 firmware was affected. Seagate provided only an automated answer not related to my case, while Dell support was not aware of the issue at that time. Then Dell's fix quietly appeared on their website while German support staff still did not have any further information. However, I do have a support contact who tries to help as much as possible. He is now attempting to find out whether the published Dell fix is really the correct one (I'm not sure, as there might be a BRINKS/MOOSE confusion).

     I guess what we're seeing here are huge delays in corporate information pipelines. My impression is that many OEM customers aren't even aware that they've got a Seagate drive in their PCs, let alone that there's a firmware bug out there. While drive failures happen, the reason may not be diagnosed correctly for some time, so it may just be too early for these huge support organisations to discover an unusual pattern of failures.

     Initially, I could not tell whether my drive was affected, so my first question was whether Dell's DE12 firmware was based on Seagate's SD15 (or another affected version). Dell support did not have that information. So I read the manufacturing date from my drive's label (September) and compared that to the manufacturing dates of drives which had already failed (from the fail/fine thread in this forum). My impression was that my firmware had a high probability of being derived from the buggy ones, and this turned out to be true. In contrast, I've read statements on the web where people simply compare the firmware version DE12 to the versions confirmed by Seagate as buggy and then incorrectly deduce that their firmware is OK (try googling "7200.11 +DE12").

     By the way, Dell has another fix on their site for ST3750630AS and ST31000340AS drives.
  2. I'd say some of us made attempts at judging the affected drive population. They put all their numbers and assumptions on the table for further discussion. That's not a published number, right? Are you saying this simply to disparage any other attempt at estimating that number?

     Now you're making wild guesses without any factual basis. You don't know the percentage of test stations writing the "trigger code". That's a number Seagate hasn't dared to publish so far, and that might be for a reason.

     Pure speculation. You claim to be a technical expert, but you're making assumptions based on corporate psychology. In addition, you're ignoring two little facts: (1) OEMs may be legally responsible for damages incurred by their customers. (2) There are not so many disk drive manufacturers around that a large-scale buyer would light-heartedly agree to reduce the number of competitors.

     That's just the only way you can imagine it. We might or might not agree. Anyway, it's probably just too early to tell.

     You still believe this even though it took Seagate several attempts to publish a working online serial number check?

     You are misstating the facts. Seagate simply states the usage patterns employed for its AFR and MTBF calculations (2400 power-on hours per year, 10,000 start/stop cycles). That does not mean at all that desktop drives have a higher probability of failure when used 4800 hours per year, or any other number for that matter. You cannot tell; Seagate didn't publish data for alternative usage patterns. So you're the one spreading FUD here.

     BTW, in some respects server disk drives operating 24/7 can have a weaker design than desktop, let alone notebook, drives: they don't need to withstand a high number of start/stop cycles. So a higher price point doesn't necessarily mean a more robust design for every usage scenario.
  3. OK, I hope I've got it now! You're saying that there are certain logging patterns which might decrease the probability of hitting the critical counter values, right? If so, I'd say it's entirely possible, though unlikely, that such patterns exist. But certainly, we cannot know for sure. If there is some variance in the number of log entries written per power cycle, the probability of drive failure should be along the curves already presented: the lower the average number of log entries, the higher the chance of failure, but the overall magnitude does not change much (see the quick simulation at the end of this post).

     On another topic: I happen to own a Dell OEM drive (ST3500620AS). It currently runs Dell's OEM firmware DE12. Dell has issued an updated version DE13 to address the Barracuda bug, but the update's batch file calls the updater's executable with the parameter "-m BRINKS". In contrast, Seagate's own SD1A firmware update for that drive is called "MOOSE...". What happens if the Dell folks inadvertently published the wrong version and I incorrectly apply a BRINKS firmware to a MOOSE drive? Will it just stop working, or will it get even worse (silently and slowly altering data, for example)?
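     A quick way to check the "variance" point is a small Monte Carlo simulation. This is only an illustrative sketch under my own assumptions (a random log state at shipment, a uniformly distributed number of entries per power cycle, one power cycle per day, and failure when the drive is powered off exactly on a multiple of 256):

        import random

        def one_year_failure_rate(avg_entries, days=365, trials=10_000):
            failures = 0
            for _ in range(trials):
                counter = random.randrange(256)          # unknown log state when the drive ships
                for _ in range(days):
                    # entries logged during one power cycle, uniform on 1..2*avg-1 (mean = avg_entries)
                    counter += random.randint(1, 2 * avg_entries - 1)
                    if counter % 256 == 0:               # powered off exactly on a 256-entry boundary
                        failures += 1
                        break
            return failures / trials

        for avg in (2, 3, 5):
            print(f"avg {avg} entries/cycle: {one_year_failure_rate(avg):.0%} failure within a year")

     The yearly failure probabilities this produces stay in the same ballpark as the figures discussed elsewhere in this thread, with lower averages coming out slightly worse.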
  4. "...thus reducing the probabilities to 1/3 of what [was] calculated for the 'single event' addition."

     That's certainly true for the chance of hitting when you are near the boundary. But if you consider that you are approaching the boundary with three times the speed, and that there is always a next boundary to hit (initially at 320, then at every multiple of 256), you are getting close to those boundaries three times as often. In the long run, that amounts to 1/3 (chance of hitting when near) * 3 (frequency of being near a boundary) = 1, so the overall probability would be the same (a quick check of this follows below).

     Thanks for the graph! Should help people decide whether to participate in the game ;-).
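     A tiny Monte Carlo illustration of the "1/3 chance * 3 times as often" argument, under my own simplifying assumptions (a random counter value at shipment, a fixed number of entries per power cycle, one power cycle per day). It counts how many power-offs per year land exactly on a multiple of 256; a real drive would of course already be dead after the first one, but the count shows that the frequencies match:

        import random

        def boundary_landings_per_year(step, days=365, trials=10_000):
            total = 0
            for _ in range(trials):
                counter = random.randrange(256)          # unknown log state when the drive ships
                for _ in range(days):
                    counter += step                      # fixed number of entries per power cycle
                    if counter % 256 == 0:               # this power-off lands exactly on a boundary
                        total += 1
            return total / trials

        for step in (1, 3):
            print(f"step {step}: {boundary_landings_per_year(step):.2f} risky power-offs per year on average")

     Both step sizes give roughly 365/256, about 1.4 risky power-offs per year, which is the "1/3 * 3 = 1" effect described above.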
  5. "But the events are hardly equally probable. It's much more likely that you're going to get a very small number each power cycle. The chances of dozens or hundreds of entries each power cycle are almost non-existent unless your drive is hosed to begin with."

     You are right; my statement was an oversimplification.

     "If you assume that the log is initially empty (event counter at 0), that's certainly true. To be more precise, the probability of the drive failing would be 0% on the first 319 days, jumping to 100% on day 320."

     Absolutely. That's in line with what I intended to point out. While the probability of anything failing is 100% over an infinite number of years, these drives are very likely to fail in their first year of service. My calculation estimates a 76% chance of failure within a year. Real numbers might be even worse.

     I've quickly calculated the chances of drive failure given a certain average number of log entries per power cycle. Again I've ignored the initial 320 boundary, as the log might not be empty when a drive ships. So for 5 entries on average we have about 50 days of 0% failure probability (as the log fills up to its next 256 boundary), then a 20% chance of failure on day 51. The chance of the drive still being alive after that is thus 80%. On day 102 there is again a 20% chance of failure, making a total chance of 64% of the drive being alive on day 102 (it has had two 20% chances to die by then). And so on. Given 5 log entries on average, the failure probability over one year would be 79%. For 3 log entries on average, it would be 80.2% (a small sketch of this calculation follows below).

     I'm wondering whether Seagate can really say with confidence that "Based on the low risk as determined by an analysis of actual field return data, Seagate believes that the affected drives can be used as is" (see the KB article).
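     The calculation above can be reproduced with a few lines of code. This is only a sketch of how I read the reasoning in the post: floor(365 * k / 256) boundary crossings per year, each with a 1/k chance of the drive being powered off exactly on the boundary.

        import math

        def yearly_failure_probability(entries_per_cycle, days=365):
            # number of 256-entry boundaries crossed in a year of daily power cycles
            crossings = math.floor(days * entries_per_cycle / 256)
            # each crossing carries a 1/k chance of powering off exactly on the boundary
            survival = (1 - 1 / entries_per_cycle) ** crossings
            return 1 - survival

        for k in (3, 5):
            print(f"{k} entries/cycle: {yearly_failure_probability(k):.1%} failure within a year")

     This reproduces the 80.2% (3 entries) and 79% (5 entries) figures quoted above.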
  6. True, but we don't have to know. The probability of a drive failing is the same as long as at least one event is logged per power cycle.

     No, the chance of a drive failing due to this condition is zero unless it is powered off. All that matters is that the event counter changes at all between power-on and power-off. It does not matter whether it increases by 1, by 50, or by any other value, as long as such values are equally probable.
  7. Another attempt to estimate the probability of a drive failing... Given the "root cause" document posted here by sieve-x, this is what we know: a drive is affected by the bug if it contains the defective firmware and has been tested on certain test stations. An affected drive will fail if it is turned off after exactly 320 internal events have been logged initially, or any multiple of 256 thereafter.

     We don't have the details on how often exactly the event log is written to. Someone mentioned that it's written to when the drive initializes on power-up (though I don't remember the source). If that's true, we would have one event per power cycle plus an unknown and possibly varying number in between. Given that, the probability of an affected drive being alive after one power cycle is 255/256. After two power cycles it's 255/256 * 255/256. After three power cycles it's (255/256)^3. And so on. While the isolated probability of the drive failing on a single power-up is just 0.4%, the numbers go up when you calculate the probability of a drive failing over time.

     Let's assume a desktop drive is power cycled once a day. The probability of an affected drive failing then is:

       0.4% after 1 day
       11.1% over 30 days
       29.7% over 90 days
       76.0% over 365 days

     Obviously, I'm ignoring the fact that initially a higher number of events (320) must be logged to trigger the failure. Anyway, this would not change the numbers substantially, and the remaining distance to that first boundary might even be lower than 256, depending on the number of events logged during the manufacturing process. I'm also ignoring the number of events written while the drive is powered on, as it should not affect the overall probability.
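     For reference, a couple of lines of code reproduce these numbers directly from the 255/256-per-power-cycle model above:

        # survival after n daily power cycles is (255/256)**n, given a 1/256 chance
        # per power-off of sitting exactly on a critical boundary
        for days in (1, 30, 90, 365):
            p_fail = 1 - (255 / 256) ** days
            print(f"{days:3d} days: {p_fail:5.1%} chance of failure")

     This gives 0.4%, 11.1%, 29.7%, and 76.0%, matching the figures above.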