
How can I read and write (overwrite, not append) to the same file in Perl?


Recommended Posts

Hello,

I'm having difficulty reading from and writing to the same file. I'm not sure what I'm doing wrong. It works fine if I'm in read/write append mode, but that doesn't accomplish what I want.

I'm trying to read the file in, do some processing on it, then overwrite the file. I honestly don't understand the purpose of the "+>" mode. "+<" works fine, but it appends to the file. "+>" deletes the contents of the file immediately after I open it. Is "+>" simply not the right tool for reading from and (over)writing to the same file? I saw a Perl module called Tie::File, but I'm having difficulty using it and would appreciate some help with it. Otherwise, if I can't get "+>" working for this problem, I'll just read from the file, close it, and then write to the file.

Thank you.

Here's my code:

	
#!/usr/local/bin/perl

use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek); # import LOCK_* and SEEK_* constants

local $/ = undef; # slurp mode: read the whole file in one go

my $file = "test.txt"; # file to read and rewrite

# Read/write with overwrite -- beware: "+>" truncates the file on open!
open(my $fh, "+>", $file) or die "Cannot open $file: $!";
flock($fh, LOCK_EX) or die "Cannot lock $file: $!";
seek($fh, 0, SEEK_SET);

my $file_data = <$fh>; # always empty: "+>" already wiped the file

# Do some processing on $file_data here

# (Over)write to the same file
print $fh $file_data;
close($fh);




Of course "+>" is not right. If you open a file while truncating its contents, you won't get much data when reading from it. The original purpose of this mode is to write to the file first (after erasing its content), then read back what you wrote. For what you're trying to do, "+<" should work.


I honestly don't understand the purpose of the "+>" mode. "+<" works fine, but it appends to the file. "+>" deletes the contents of the file immediately after I open it.

"+>" truncates the file (makes it a 0-byte file), so there's not much to read from it, for sure. It will also create the file if it doesn't exist already (either way, you end up with that 0-byte file).

"+<" is read/write. However, after you're done reading the file, your "position" is at the end of the file, so if you start writing, that's where you'll be writing from -- essentially appending (much like it would in any other language in this specific scenario). If you want to write from the beginning, you have to seek to the beginning first.

Note that I wouldn't do it this way unless you're 100% certain that the content you'll be writing will never be smaller by *any* amount (even a single byte), because then you'd have garbage left over at the end of your new file. Your best bet (again, for any language -- so long as the files aren't huge) is to first open the file, read its contents into some sort of variable, then close it. Then you do whatever processing it is you wanted to do. Then you finally reopen it, this time for writing, *truncating* the old file, write the new content to it, and close it one last time. Or you can also rename the old file as a backup (if you want one) and create the new file. That's much more fool-proof in most cases.
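
To make that concrete, here is a minimal Perl sketch of the open/read/close, process, open/write/close sequence just described. The file name, sample contents, and regex substitution are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name; the setup below just makes the sketch standalone.
my $file = 'test.txt';
open(my $out, '>', $file) or die "Cannot create $file: $!";
print $out "hello world\n";
close($out);

# 1) Open for reading, slurp the contents, close.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my $file_data = do { local $/; <$in> };
close($in);

# 2) Process entirely in memory (a trivial substitution as a stand-in).
$file_data =~ s/hello/goodbye/;

# 3) Reopen for writing ('>' truncates the old file), write, close.
open($out, '>', $file) or die "Cannot write $file: $!";
print $out $file_data;
close($out);
```

Because the write pass truncates first, the new file can safely be shorter than the old one.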


Yes, thank you, I see now that "+>" is not the right tool for the job. However, I was looking at overwriting, as opposed to appending, because I didn't want to add data to the end of the file.

FYI -- I should've mentioned this in the original post -- I am running Strawberry Perl 5.12.2.0 on Windows XP SP3 32-bit, the file I'm opening is a text file, and some of the processing involves regexes.

After processing, I rewrite the entire file, instead of just a portion of it. The problem is that much of the rewrite starts at the beginning of the file. Again, my logic was: instead of picking and choosing which parts of the file to rewrite, why not just read the entire contents of the text file into Perl (it's not a huge file), do the processing on it, then rewrite the entire file?

I looked at the seek docs and did some more research on append and seek, and at least on Unix/Linux, it is not possible to sysseek or seek for writing when a file is opened for append. I had tried it, too, and it would only append to the end of the file.

SOURCE: http://www.justlinux.com/forum/showthread.php?t=131467

MY CODE for read/append ("+>>")


#!/usr/local/bin/perl

use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek); # import LOCK_* and SEEK_* constants

local $/ = undef; # slurp mode: read the whole file in one go

open(my $fh, "+>>", "test.txt") or die "Cannot open test.txt: $!";
flock($fh, LOCK_EX) or die "Cannot lock test.txt: $!";
seek($fh, 0, SEEK_SET); # repositions the read pointer only
my $file_data = <$fh>;
print $file_data;
print $fh "xxx"; # in append mode, this still lands at the end of the file

close($fh);

print "\n\n-------\n\n";

Beforehand, I opened the file in read mode, copied its contents, closed the file, opened it again in write/overwrite mode, wrote to it, and finally closed it. But I thought, why open and close the same file twice when I may be able to do it all in one shot? It would be more efficient, with less code, making it potentially easier to maintain and debug, and less of a performance hit. With one file, it's no big deal. But if I'm processing many, many files (e.g. reading through a directory), I could see a performance hit. So this is why I posted in the first place: to learn if there's a better way.

I like the suggestion about the backup, thank you. I suppose I'll just open the file twice, unless I can get the Tie::File module to work correctly.
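
For reference, a minimal Tie::File sketch might look like the following (Tie::File ships with Perl; the file name, sample contents, and substitution are only placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

# Hypothetical file name; the setup below just makes the sketch standalone.
my $file = 'test.txt';
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print $fh "alpha\nbeta\ngamma\n";
close($fh);

# Tie the file to an array: each element is one line (without the newline).
tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";

# Edits to the array are written back to the file for you.
s/beta/BETA/ for @lines;

untie @lines; # flush any pending changes and release the file
```

The appeal is that the module handles the seeking and shrinking/growing of records behind the scenes, at some cost in speed for large files.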

Thank you both again.


I do not know if this will help or not, but here is a VBS script:

1\ It opens the text file and reads its contents into one variable called V1.

2\ Then it uses the V1 variable to rewrite the text file, adding "Add 1" and "Add 2" at the front of line 4 and line 7,

then closes the text file with the changes saved.


Const ForReading = 1, ForWriting = 2, ForAppending = 8
'-> Objects For Script
Dim Fso : Set Fso = CreateObject("Scripting.FileSystemObject")
'-> Variables For Use
Dim C1, File, Ts, V1, V2
'-> Check To Make Sure The File Is Present
File = "Test_Text.txt"
If Fso.FileExists(File) Then
    '-> Read The Whole Text File Into Variable V1
    Set Ts = Fso.OpenTextFile(File, ForReading, True)
    V1 = Ts.ReadAll
    Ts.Close
    '-> Rewrite The File, Adding "Add 1" And "Add 2" At Lines 4 And 7
    Set Ts = Fso.OpenTextFile(File, ForWriting, True)
    For Each V2 In Split(V1, vbCrLf)
        C1 = C1 + 1
        '-> Add To Line 4 And Line 7
        If C1 = 4 Then
            Ts.WriteLine "Add 1 " & V2
        ElseIf C1 = 7 Then
            Ts.WriteLine "Add 2 " & V2
        ElseIf V2 = "" Then
            '-> Do Nothing, It's A Blank Line
        Else
            '-> Write The Unchanged Line Back To The File
            Ts.WriteLine V2
        End If
    Next
    Ts.Close
Else
    MsgBox "Missing This Text : " & File
End If
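
Since the thread is about Perl, here is a rough Perl equivalent of that read-all/rewrite-all logic. The sample-file setup at the top is only there so the sketch runs standalone; in real use you'd start from the existing Test_Text.txt:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'Test_Text.txt';

# Create a sample ten-line file so this sketch runs standalone.
open(my $fh, '>', $file) or die "Cannot create $file: $!";
print $fh map { sprintf("Line %02d\n", $_) } 1 .. 10;
close($fh);

# Read all lines into memory.
open($fh, '<', $file) or die "Cannot read $file: $!";
my @lines = <$fh>;
close($fh);

# Prefix lines 4 and 7 (1-based), like the VBS script does.
$lines[3] = "Add 1 " . $lines[3] if @lines >= 4;
$lines[6] = "Add 2 " . $lines[6] if @lines >= 7;

# Rewrite the whole file.
open($fh, '>', $file) or die "Cannot write $file: $!";
print $fh @lines;
close($fh);
```

Like the VBS version, this rewrites every line, so the file can grow or shrink freely between runs.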


Quick question, gsm: would I need to add to line four and line six in order to add to lines four and seven respectively?

If I add to line four, it would mean that old line four became new line five, old line five became new line six, and old line six became new line seven! I'd suggest the term "append to line" instead.


If I add to line four, it would mean that old line four became new line five,

No, the script only adds text to the front of the line; line 4 remains line 4 after the change.

V2 would be line 4 from the variable V1 after it has been split on vbCrLf:


If C1 = 4 Then
Ts.WriteLine "Add 1 " & V2

Contents Of Test_Text.txt before script runs


Line 01
Line 02
Line 03
Line 04
Line 05
Line 06
Line 07
Line 08
Line 09
Line 10

After script ran once


Line 01
Line 02
Line 03
Add 1 Line 04
Line 05
Line 06
Add 2 Line 07
Line 08
Line 09
Line 10

If you ran the script twice, it would be:


Line 01
Line 02
Line 03
Add 1 Add 1 Line 04
Line 05
Line 06
Add 2 Add 2 Line 07
Line 08
Line 09
Line 10

Link to comment
Share on other sites

would I need to add to line four and line six to add to lines four and seven respectively?

Doesn't really matter. The OP is doing something completely different in the first place, as he explained in post #4 (using regular expressions). And then again, he already had a working solution that did the file open/read/close, the processing, then the file open/write/close separately.

His only problem was opening the file just once with R/W access, reading from it, seeking back (which he wasn't doing, so it was effectively appending) and then writing again -- which, as-is, was a bad idea in the first place: if your new content is shorter than the old one, you end up with junk (contents from the old file) tacked on at the end of it (I tried to explain before that he had to seek back, and that this alone often wouldn't be enough to solve the problem either).

So changing the language without any real technical merit or benefit, not using regular expressions, and adding specific logic that's completely irrelevant to the problem? OK, whatever... But this doesn't actually address his actual problem in any way: reading from and writing to the same file by opening it just once. Not mentioned (because he hadn't discovered the next problem that would arise once he got this working): it must also be able to "shrink" the file if necessary.

There is a way to do exactly what he's asking for (and in Perl, still using regexps and all), not that it really offers any actual benefit vs. opening the file twice IMO:

  • open the file with R/W access (using "+<")
  • read its contents into some variable
  • store the size/length of those "old" contents in another variable
  • do the processing on it just like before
  • seek to the beginning of the file: seek(FILE, 0, SEEK_SET); (what he wasn't doing after reading from the file, thus making it append)
  • write the new content
  • if the size of the "new" content is less than the stored size of the old contents, call truncate(FILE, newSizeHere); to discard the extraneous bytes
  • close the file
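
The steps above can be sketched as follows; the file name, sample contents, and the shrinking substitution are placeholders, and the setup block only makes the sketch self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock :seek); # LOCK_* and SEEK_* constants
use IO::Handle;             # for $fh->flush

# Hypothetical file name; create sample contents so the sketch runs standalone.
my $file = 'test.txt';
open(my $setup, '>', $file) or die "Cannot create $file: $!";
print $setup "aaaa bbbb cccc\n";
close($setup);

# Open once with read/write access -- "+<" does not truncate on open.
open(my $fh, '+<', $file) or die "Cannot open $file: $!";
flock($fh, LOCK_EX) or die "Cannot lock $file: $!";

# Slurp the old contents and remember their length.
my $old = do { local $/; <$fh> };
my $old_len = length $old;

# Process in memory (a shrinking substitution, to exercise truncate).
(my $new = $old) =~ s/bbbb //;

# Seek back to the start before writing, or the write would append.
seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
print $fh $new;
$fh->flush; # make sure the buffered write hits the file before truncating

# If the new contents are shorter, chop off the leftover old bytes.
if (length($new) < $old_len) {
    truncate($fh, length $new) or die "Cannot truncate: $!";
}

close($fh);
```

Without the truncate call, the tail of the old contents would survive past the end of the new, shorter contents.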

Not that it's any better than his current/old solution IMO. He's seemingly trying to do this for performance, but saving one file open/close operation (just getting a file handle) vs. the added seek & truncate operations... there's basically going to be no measurable difference between the two (way less than 1 ms*). I'd much sooner use the code that's more solid (proper error handling, for starters), better written and documented, easier to understand, more versatile/reusable, better tested (e.g. has good unit tests), easier to use, better supported in the future, etc.

Either way, I think this is completely pointless in the first place (and this is why I haven't bothered spending the five minutes to write code that does exactly this). This particular problem (replacing text using regular expressions) was already solved 35 years ago by AWK (using sub or gsub). He's just reinventing the wheel, and poorly at that.

* Edit: allen2 sure has a good point there too (see post below). I mean, if this executes once in a while, it's pointless to spend hours of coding to shave off a few microseconds of execution time. But if you're going to use this in a situation where it actually matters (like running it a billion times in a loop), then a scripting language probably isn't the best tool for the job in the first place (you'd want something compiled for sure -- and you'd probably make the tool iterate through the files instead of running it a bazillion times). Then again, sometimes regular expressions are also overkill (or not the best pick) for the job, and something like a Boyer-Moore search might be faster at finding the parts that need replacing. I don't personally bother much with optimization (assuming the code is already half-decently written) until it actually becomes a problem (then you profile, see what needs to be optimized -- the file I/O, the time spent on string ops, the time spent spawning the same process repeatedly, etc. -- and address that particular problem).

