
Updated Article on Free Hard Drive Testing Programs


Monroe



12 minutes ago, NotHereToPlayGames said:

Nobody wins if everybody strives to get the "last word".

Right! So what do you want to win?

16 minutes ago, NotHereToPlayGames said:

Just rambling...  Carry on...

Agreed!


17 hours ago, jaclaz said:

The programs that can read S.M.A.R.T. values usually offer a (highly unreliable/unproved) "grading" of these values, providing warnings if certain thresholds are reached, but it is the underlying technology which is not capable of providing (meaningful) predictions:

I'm sorry, I have to disagree. Hard Disk Sentinel warns ahead of any failure, especially if we talk about Seagate (which one shouldn't purchase anyway, due to the high failure rate). From my observations, a Seagate will fail within several days or weeks after HDD Sentinel shows its warning. One thing to point out: this software does not always recognize the newest HDD models.


4 hours ago, Dixel said:

I'm sorry, I have to disagree. Hard Disk Sentinel warns ahead of any failure, especially if we talk about Seagate (which one shouldn't purchase anyway, due to the high failure rate). From my observations, a Seagate will fail within several days or weeks after HDD Sentinel shows its warning. One thing to point out: this software does not always recognize the newest HDD models.

You are very welcome to disagree, no need to be sorry.

The point I am trying to make is that even if S.M.A.R.T. were a reliable predictor (and it isn't, at least in large numbers), it is a "vague" one, just like "Seagate" is: each model/series of a given brand may present different failure modes, and while some of them may cause a change in S.M.A.R.T. parameters that leads to a valid warning from the monitoring software, many others won't.

This, and the lack of proper data (about the way some of the S.M.A.R.T. parameters are implemented by the manufacturer), leads to a situation that can be described as "when it works it works, when it doesn't, it doesn't", so that the (intentionally provocative) reference I often make to flippism being equivalent is not at all unjustified.

The issue I have with your statement revolves around the "any" in "ahead of any failure": what happens in practice is ahead of some failures, where, additionally, the "ahead" could mean hours, days, weeks or even months.

Better than nothing, but still nothing to rely upon.

jaclaz

 


3 hours ago, jaclaz said:

You are very welcome to disagree, no need to be sorry.

The point I am trying to make is that even if S.M.A.R.T. were a reliable predictor (and it isn't, at least in large numbers), it is a "vague" one, just like "Seagate" is: each model/series of a given brand may present different failure modes, and while some of them may cause a change in S.M.A.R.T. parameters that leads to a valid warning from the monitoring software, many others won't.

This, and the lack of proper data (about the way some of the S.M.A.R.T. parameters are implemented by the manufacturer), leads to a situation that can be described as "when it works it works, when it doesn't, it doesn't", so that the (intentionally provocative) reference I often make to flippism being equivalent is not at all unjustified.

The issue I have with your statement revolves around the "any" in "ahead of any failure": what happens in practice is ahead of some failures, where, additionally, the "ahead" could mean hours, days, weeks or even months.

Better than nothing, but still nothing to rely upon.

jaclaz

 

By "any" I mean the below, with the bold ones being the most ominous, meaning the drive will fail soon, and from my personal observations with many disks, it dies within days/weeks (not months).

Raw Read Error Rate (may live longer with it, in contrast to write error rate)

Spin Up Time

Start/Stop Count

Reallocated Sectors Count

Seek Error Rate (always huge in Seagate)

Power On Time Count

Spin Retry Count

Drive Calibration Retry Count

Drive Power Cycle Count

Power off Retract Cycle Count

Load/Unload Cycle Count

Disk Temperature

Reallocation Event Count

Current Pending Sector Count

Off-Line Uncorrectable Sector Count

Ultra ATA CRC Error Count (some small chance it could be the cable one bought for cheap)

Write Error Rate

And I'm not sorry anymore, if you insist.


With all due respect to your personal experience, unless you are (and have been for a long time) an IT technician working in a very large organization (one that actually monitors S.M.A.R.T. parameters on a large number of disks), I doubt that you have enough data points to "cover" 17 different parameters and highlight 8 of them (or, if you prefer, your "many disks" may be too few to draw conclusions from).

Backblaze, monitoring some 70,000 disk drives, found only 5 (maybe 6) parameters correlated with impending drive failure:

https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/

SMART 5     Reallocated Sectors Count
SMART 187   Reported Uncorrectable Errors
SMART 188   Command Timeout
SMART 197   Current Pending Sector Count
SMART 198   Uncorrectable Sector Count

and possibly:

SMART 189   High Fly Writes

not fully overlapping with your list and with some caveats about the need of additionally interpreting the actual values depending on the timeframe they occurred/increased count.

A known BIG unknown (cited in the article) is whether the power cycle count (S.M.A.R.T. 12) has any correlation: the drives they monitored belong to server farms, so they go through very, very few power cycles, unlike the disk drives used by many organizations and most end users, which are usually powered on and off daily. It is reasonable to add it to the list, making a total of 7 parameters to look for.
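
As a rough illustration of the short list above, here is a minimal Python sketch that checks a set of raw S.M.A.R.T. values against those attribute IDs. The rule "a raw value above zero deserves a look" is an assumption made for this sketch, as are the function name `flag_attributes` and the sample numbers; raw values are encoded differently by different manufacturers, so this is not a failure predictor. S.M.A.R.T. 12 is left out on purpose, since every drive that has been powered on has a non-zero count.

```python
# Attribute IDs and names as given in the list above.
CORE_ATTRS = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}
# Listed only as "possibly" correlated.
TENTATIVE_ATTRS = {
    189: "High Fly Writes",
}


def flag_attributes(raw_values):
    """Split raw S.M.A.R.T. values into flagged, tentative and missing.

    raw_values maps attribute ID -> raw value (integers).
    Assumption for this sketch: a raw value above zero is worth a look.
    """
    flagged, tentative, missing = [], [], []
    for attr_id, name in CORE_ATTRS.items():
        if attr_id not in raw_values:
            # Not every drive reports every attribute.
            missing.append(attr_id)
        elif raw_values[attr_id] > 0:
            flagged.append((attr_id, name, raw_values[attr_id]))
    for attr_id, name in TENTATIVE_ATTRS.items():
        value = raw_values.get(attr_id)
        if value is not None and value > 0:
            tentative.append((attr_id, name, value))
    return flagged, tentative, missing


# Hypothetical readings, for illustration only.
sample = {5: 0, 187: 2, 197: 8, 198: 0, 189: 1104}
flagged, tentative, missing = flag_attributes(sample)
for attr_id, name, value in flagged:
    print(f"SMART {attr_id} ({name}): raw value {value}")
```

Attributes a drive does not report end up in `missing` rather than being treated as zero, so "no data" is not mistaken for "no problem".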

jaclaz


14 hours ago, jaclaz said:

an IT technician working in a very large organization (one that actually monitors S.M.A.R.T. parameters on a large number of disks)

@jaclaz You compared a Sacramento data center, which runs expensive enterprise hard drives 24/7, with the consumer-grade hard disks MSFN members use, really?

 

14 hours ago, jaclaz said:

SMART 189 High Fly Writes

My consumer-grade hard drives don't even have that parameter, SMART 189 High Fly Writes.

 

What's the point of such a comparison? Please enlighten me.


I didn't compare anything.

I plainly stated that I doubt that Dixel had enough data points, unless he actually deals (and has dealt for years) with many, many (hundreds, thousands) hard disks, while actually keeping a log of periodical reads, failure modes, etc., and that as such the conclusions he draws are - no matter if right or wrong - anecdotal data based on relatively little experience, and anyway very likely spread over too small a sample for each make/model of hard disk.

Then - separately - I briefly summed up the results of the Backblaze report.

jaclaz

 


Yes, once sectors start to be reallocated, the drive will likely fail even if the sectors later seem fixed. Ultra DMA errors can be caused by a bad connection, and can also be generated in software: an old version of HDAT2 would try to access the disk in a DMA mode that didn't work, and rapidly generated errors.

High Fly Writes seem to be slowly accumulating on my VX000 drive without notable events, currently at 1104, with the manufacturer's rating at 1%, really bad. The drive works. This parameter is also present on VN000, where it is at 50, and on VX007, where it remains at 0.
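
A counter that creeps up like this is easier to judge when readings are logged with dates, so the rate of increase can be compared over time. Below is a minimal Python sketch of that idea; the function name `growth_per_day`, the dates and the values are all hypothetical, and it assumes the raw value is a plain cumulative counter, which may not hold for every attribute or manufacturer.

```python
from datetime import date


def growth_per_day(readings):
    """Average increase per day between the earliest and latest reading.

    readings is a list of (date, raw_value) tuples in any order.
    Returns None when there are fewer than two readings or when the
    earliest and latest readings fall on the same day.
    """
    if len(readings) < 2:
        return None
    ordered = sorted(readings)  # tuples sort by date first
    first_day, first_value = ordered[0]
    last_day, last_value = ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        return None
    return (last_value - first_value) / days


# Hypothetical log of one attribute's raw value, for illustration only.
log = [
    (date(2021, 1, 1), 1000),
    (date(2021, 1, 11), 1100),
]
print(growth_per_day(log))
```

A steady rate and a sudden jump mean different things for the same absolute count, which is why the timeframe matters and not just the current number.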


@j7n, I'm sorry to inform you that what you say won't pass @jaclaz's certification, since he will obviously doubt that you had enough data points, unless you actually deal (and have dealt for years) with many, many (hundreds, thousands) hard disks, while actually keeping a log of periodical reads, failure modes, etc.


4 hours ago, j7n said:

Maybe we can draw some conclusions from the collective experience of several members, if more than one reports similar findings.

Agreed! Starting a petition would make sense then. For that, you would still need @jaclaz's blessings though.


@Cocodile

There are no such things as jaclaz's certifications, let alone blessings.

@j7n

In theory yes, in practice, unfortunately, in the best case you will collect a number of vague, inaccurate reports creating the typical GIGO (Garbage In Garbage Out) situation.

jaclaz
