
What happens to the truncated part of a data file?



Suppose you have created a file with 100 words, saved it, and reopened it to delete half of the words.

What happens physically to the deleted words? Are they still detectable on the hard disk?

If not: would it be a good idea to shred files by opening them in Notepad / Editor, deleting the content and saving the cleared file?

Thanks for any reply,   B.

 

Edited by Burundi


It is an interesting question, and one you have already answered yourself, but unfortunately "wrongly".

If you delete the contents and save the cleared file you are not shredding anything.

Please follow me.

Let's say that your file was originally (for the sake of the example) any size bigger than 1024 bytes (i.e. more than 2 sectors) and smaller than 4 kB (i.e. the common size of an NTFS filesystem cluster); assume 1200 bytes.
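As a quick check of the arithmetic in this example, here is a minimal Python sketch. The 512-byte sector and 4096-byte cluster are the sizes assumed in this post, the function name `allocation` is made up for illustration, and the sketch ignores very small files that NTFS keeps inside the $MFT.

```python
import math

SECTOR_SIZE = 512     # bytes per sector (assumed, as in the example)
CLUSTER_SIZE = 4096   # bytes per cluster (common NTFS default)

def allocation(file_size: int) -> dict:
    """Return how many sectors hold data, how many clusters are
    allocated, and how many allocated bytes are unused (slack)."""
    clusters = math.ceil(file_size / CLUSTER_SIZE)
    return {
        "sectors_with_data": math.ceil(file_size / SECTOR_SIZE),
        "clusters_allocated": clusters,
        "slack_bytes": clusters * CLUSTER_SIZE - file_size,
    }

print(allocation(1200))
```

For the 1200-byte file this gives 3 sectors holding data, 1 allocated cluster and 2896 bytes of slack inside that cluster.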

At filesystem level the smallest addressable unit is the cluster, i.e. your file is addressed (more or less) with this information:
1) cluster #123456 is occupied (all 4096 bytes in it are allocated)
2) it is occupied by a file called "mynicefile.txt"
3) the length of this file is 1200 bytes

When you ask Notepad (or similar) to open that file, the instructions the OS and filesystem driver perform are roughly:

1) get to the beginning of cluster #123456
2) read 1200 bytes (the length of the file) from that start position

When you simply Save (in Notepad or similar) the file without modifications the instructions the OS and filesystem driver perform are roughly:

1) get to the beginning of cluster #123456
2) write 1200 bytes (the current length of the file) from that start position

Once you open the file in Notepad, select all, delete, then save, the instructions the OS and filesystem driver perform are roughly:

1) get to the beginning of cluster #123456
2) write 0 bytes (the current length of the file) from that start position

So, no shredding of sorts happens, the whole contents of the file are still on cluster #123456, only they are not anymore addressed/indexed (as the length of the file is now 0 bytes).
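The simplified model above can be sketched as a toy "disk" in Python. Everything here (`ToyDisk`, `FileRecord`, the small cluster number) is hypothetical and only mirrors the three pieces of information listed earlier; it is not how a real NTFS driver is implemented.

```python
from dataclasses import dataclass

CLUSTER_SIZE = 4096  # assumed cluster size, as in the example

@dataclass
class FileRecord:
    name: str      # file name
    cluster: int   # where the data starts
    length: int    # how many bytes belong to the file

class ToyDisk:
    def __init__(self, clusters: int = 8):
        self.raw = bytearray(clusters * CLUSTER_SIZE)

    def write(self, rec: FileRecord, data: bytes) -> None:
        # "get to the beginning of the cluster, write len(data) bytes"
        start = rec.cluster * CLUSTER_SIZE
        self.raw[start:start + len(data)] = data
        rec.length = len(data)

    def read(self, rec: FileRecord) -> bytes:
        # what Notepad sees: only rec.length bytes
        start = rec.cluster * CLUSTER_SIZE
        return bytes(self.raw[start:start + rec.length])

    def read_raw(self, cluster: int, count: int) -> bytes:
        # what a hex editor sees: the cluster itself, ignoring the length
        start = cluster * CLUSTER_SIZE
        return bytes(self.raw[start:start + count])

disk = ToyDisk()
rec = FileRecord("mynicefile.txt", cluster=3, length=0)
disk.write(rec, b"secret words " * 10)
disk.write(rec, b"")              # select all, delete, save
print(disk.read(rec))             # the file now looks empty
print(disk.read_raw(3, 13))       # but the old bytes are still in the cluster
```

In this toy model, saving the emptied file writes 0 bytes and only changes the recorded length, so the raw read still returns the old text.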

Now, if you have a 1200 bytes file, you open it, and replace each character with a random one (or with a fixed one, whatever) and save, as long as the file after the edit is 1200 bytes, then the whole length of the previously saved file will be overwritten, and you will have sort of "shredded" the file contents.
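A minimal sketch of that same-length overwrite in Python follows. The function name `overwrite_in_place` and the fill byte are assumptions made for illustration. Note that it only asks the OS to rewrite the file's bytes in place; whether the same physical sectors really get overwritten depends on the filesystem and the device (SSDs and copy-on-write filesystems may put the new data elsewhere), so take it as an illustration rather than a secure-erase tool.

```python
import os

def overwrite_in_place(path: str, fill: bytes = b"X") -> int:
    """Replace every byte of the file with `fill`, keeping its length.

    Opening with "r+b" avoids truncating the file first, so the write
    covers the same logical byte range as the old contents.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write((fill * size)[:size])  # exactly `size` bytes
        f.flush()
        os.fsync(f.fileno())           # ask the OS to push the data to the device
    return size
```

For example, `overwrite_in_place("mynicefile.txt")` on the 1200-byte file would leave a 1200-byte file consisting only of `X` characters.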

An even more interesting question (that you didn't ask) is another one:

What happens if I write a .txt file around 700 bytes in length, save it, then continue typing until I reach 1200 bytes, save again, then decide to re-open it and "shred" it by replacing every character in it with something else and save again?

Surprisingly, on NTFS (on a common 512 bytes/sector device) the answer is that the original (roughly 700 bytes in length) file is "carved in stone" (in the $MFT) and can still be recovered in full.

On newer disks with 4096 bytes/sector that size grows to around 3700 bytes.

https://www.forensicfocus.com/forums/general/mft-resident-data/

 https://www.forensicfocus.com/forums/general/mft-resident-data/#post-6565939

https://www.forensicfocus.com/forums/general/delete-file-in-safe-way/#post-6587693

jaclaz

 


Dear jaclaz

Thank you very much for the detailed answer.

You have clarified very precisely that no data is "destroyed" when the file is rewritten without content.
I can read them (the data), since you brought me on the right track, in the HEX editor (if they are words).

On 6/7/2021 at 2:03 PM, jaclaz said:

So, no shredding of sorts happens, the whole contents of the file are still on cluster #123456, only they are not anymore addressed/indexed (as the length of the file is now 0 bytes).

You call them "only they are not anymore addressed/indexed".

Is there a way to recover them easily? The word "only" suggests this.

I tried with a program that is designed for data recovery, but it failed. IMHO they are gone for good.

Best regards   B.

Edited by Burundi

Well, a file address is essentially:

1) an offset (actually a cluster number)
2) a length
3) a file name (+ other metadata, such as creation date/time, last modified date/time, etc.)

If you have #1 and #3 but #2 is 0, you can change #2 to (say) 500, then 700, then 1000, then 1200 (i.e. until you have the whole file).
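The "raise the length until you have the whole file" idea can be sketched on a toy in-memory disk. This is a simplified model with made-up names (`read_file`, `record`), not what a real recovery tool does to filesystem metadata.

```python
CLUSTER_SIZE = 4096  # assumed cluster size, as in the example

def read_file(disk: bytes, record: dict) -> bytes:
    """Read record['length'] bytes starting at the record's cluster."""
    start = record["cluster"] * CLUSTER_SIZE
    return bytes(disk[start:start + record["length"]])

# Toy disk: the old contents are still in cluster 2, but the length is 0.
disk = bytearray(8 * CLUSTER_SIZE)
original = b"The quick brown fox jumps over the lazy dog. " * 20  # 900 bytes
disk[2 * CLUSTER_SIZE:2 * CLUSTER_SIZE + len(original)] = original
record = {"name": "mynicefile.txt", "cluster": 2, "length": 0}

# Raise the length step by step and see how much real data comes back.
for guess in (500, 700, 1000, 1200):
    record["length"] = guess
    data = read_file(disk, record)
    print(guess, len(data.rstrip(b"\x00")))
```

Once the guessed length passes the real one, the extra bytes are just whatever else sits in the cluster (zeros in this toy disk), so the amount of recovered text stops growing.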

If you have none of #1, #2 and #3 you can still "carve" the disk directly until you find the beginning of the file and then copy 1200 bytes starting from there (you will have lost the file name and other metadata but you will still be able to recover the contents).

Apart from .txt files (which by definition are "headerless" and "footerless"; they are simple, plain, raw "text"), most other file formats have a recognizable header, a recognizable footer, or both, so searching the disk for these recognizable patterns will give you results. Additionally, some file formats also have internal structures from which you can derive the size and other metadata.

This works as long as the file is not fragmented, i.e. it is contiguous.
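Carving by header and footer can be sketched in a few lines of Python. The example uses the JPEG start marker (FF D8 FF) and end marker (FF D9) as the recognizable patterns; the function name `carve` is made up, and the sketch assumes contiguous files in a raw image already loaded into memory. Real carving tools are more careful, since for instance a JPEG with an embedded thumbnail contains an earlier FF D9 that would cut the file short here.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image marker plus next marker byte
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker

def carve(image: bytes, header: bytes = JPEG_HEADER,
          footer: bytes = JPEG_FOOTER) -> list:
    """Return (offset, data) pairs for every header...footer run found.

    Only works for contiguous (non-fragmented) files.
    """
    found = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break
        end = image.find(footer, start + len(header))
        if end == -1:
            break
        end += len(footer)
        found.append((start, image[start:end]))
        pos = end  # continue searching after this file
    return found

# Toy "disk image": two fake files surrounded by unrelated bytes.
fake1 = JPEG_HEADER + b"payload-1" + JPEG_FOOTER
fake2 = JPEG_HEADER + b"payload-2" + JPEG_FOOTER
image = b"\x00" * 100 + fake1 + b"\x00" * 50 + fake2 + b"\x00" * 10

for offset, data in carve(image):
    print(offset, len(data))
```

The file names and timestamps are not recovered this way, only the contents found between the two patterns.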

So this is an excellent reason to keep files as contiguous as possible (i.e. run - within limits - defrag or similar often): data recovery will be much more likely to succeed.

jaclaz

