    Files and Filehandles

    Date Published: 2000-01-15

    While many people know what files and filehandles are in Perl, there are many times that people forget exactly how they should be used. Too often people open files incorrectly, causing data to be lost, or they expect a file to exist, when it really doesn't, and their program lacks the proper error reporting to alert them. Let's fix these problems up first.

    Email comments to japhy@pobox.com

    The Basics

    The open() Function

    The most common way of opening files in Perl is to use the open() function. Here are the most common ways files are opened:

        open FH, "filename";      # for reading
        open FH, "<filename";     # also for reading
        open FH, ">filename";     # create, for writing
        open FH, ">>filename";    # for appending

    Let's be sure we have some terms straight. "Reading" means that you can get the contents of the file. "Writing" means that you are placing content in the file. "Appending" means that you are writing, starting at the end of the file; if the file does not exist, Perl attempts to create it for you. "Create" means the file is brought into existence if it does not exist, and clobbered (the contents are erased) if it does.
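
    The four modes above can be tried out on a throwaway file. This is a minimal sketch, not part of the original article: the scratch directory and the name modes.txt are assumptions, and File::Temp is used only so that nothing real gets clobbered.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file inside a temporary directory (removed at exit).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/modes.txt";

# ">" creates the file, or clobbers it if it already exists.
open FH, ">$file" or die "can't create $file: $!";
print FH "first\n";
close FH or die "can't close $file: $!";

# ">>" appends: writing starts at the end of the file.
open FH, ">>$file" or die "can't append to $file: $!";
print FH "second\n";
close FH or die "can't close $file: $!";

# "<" (or no symbol at all) opens the file for reading.
open FH, "<$file" or die "can't read $file: $!";
my @lines = <FH>;
close FH;

# Opening with ">" a second time erases what was there.
open FH, ">$file" or die "can't create $file: $!";
close FH or die "can't close $file: $!";
my $size_after_clobber = (stat $file)[7];

print "read ", scalar(@lines), " lines; size after clobber: $size_after_clobber\n";
```

    The append step adds to what the create step wrote, and the final ">" leaves an empty file behind.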

    If you have a filename stored in a variable, and you're opening the file for reading, it isn't necessary to put the variable in quotes:

        open FOO, $file or die "can't open $file: $!";

    And, while it isn't common practice, you can include the symbols at the beginning of the value in the variable:

        $file = ">>/tmp/foo";
        open FOO, $file or die "can't append to $file: $!";

    This isn't suggested because the symbols help you know exactly what you're doing: if you have 100 lines of code between the line where you set the variable and the line where you use it in the open() function, you might not remember how the file is being opened. In addition, it's best to leave a variable holding a filename to hold JUST a filename, or you'd have to make adjustments every time you need to read the filename out of the variable. Therefore, it is good practice to keep the symbols and the filename separate:

        open FOO, $foo;        # you can tell instantly
        open FOO, ">$foo";     # how the filename in $foo
        open FOO, ">>$foo";    # is being used here
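
    To see the kind of "adjustment" a symbol-laden variable forces on you, here is a small sketch. The variable names $spec and $foo and the scratch directory are assumptions made for illustration; the -e file test simply asks whether a file with that exact name exists.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);    # scratch directory, removed at exit

# Discouraged: the append symbols are baked into the variable.
my $spec = ">>$dir/foo.txt";
open FOO, $spec or die "can't append to $spec: $!";
print FOO "la dee da\n";
close FOO or die "can't close file: $!";

# $spec is not a usable filename: no file is literally named ">>...".
my $spec_exists = -e $spec ? 1 : 0;

# The adjustment needed every time the bare name is wanted:
(my $foo = $spec) =~ s/^>+//;
my $foo_exists = -e $foo ? 1 : 0;

# Preferred: $foo holds JUST the filename, and the symbol sits in the open().
open FOO, ">>$foo" or die "can't append to $foo: $!";
print FOO "la dee da again\n";
close FOO or die "can't close $foo: $!";

print "spec exists: $spec_exists, foo exists: $foo_exists\n";
```

    Only the bare filename works with the file test, which is the practical reason to keep the mode out of the variable.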

    "Open this file or die!"

    Error handling is very important when using files. You can never be too sure; did you open the file successfully? If not, why? Perl's die() function and the $! variable can answer your questions:

        $file = "/tmp/resutls.txt";    # NOTE the misspelling

        open RESULTS, $file;
        $result = <RESULTS>;
        # you aren't sure if the file opened correctly
        # unless you're using the -w switch to Perl:
        # it will warn about reading from an unopened or closed filehandle

        open RESULTS, $file or die "can't open $file: $!";
        $result = <RESULTS>;
        # now you know, because Perl will complain
        # with "can't open /tmp/resutls.txt: No such file or directory..."

    This remark on Perl's part will let you know that something is wrong with your request to open a file. The general rule is that you should always check the return value of system calls, and open() performs one. The $! variable holds the value of the latest system error, and comes in handy when die()ing.
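
    The failure case is easy to reproduce safely. In this sketch the misspelled file lives in an empty scratch directory (an assumption, so the open is guaranteed to fail), and eval {} is used only to trap the die() so the message can be looked at instead of ending the program.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # empty scratch directory
my $file = "$dir/resutls.txt";        # misspelled on purpose; never created

# eval {} traps the die() so this sketch can inspect the message.
my $ok = eval {
    open RESULTS, $file or die "can't open $file: $!\n";
    close RESULTS;
    1;
};
my $error = $ok ? "" : $@;

# The wording after the last colon comes from $! and depends on the system.
print $error;
```

    Ending the die() message with a newline, as here, stops Perl from appending the script name and line number to it.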

    Closing a File

    To close a file, you simply use the close() function on the filehandle that you used to open the file.

        close FH;
        close FH or die "can't close file: $!";

    Calling close() is a system call as well, and it can't hurt to ensure that a file was closed properly.

    Reading

    The <FH> Operator

    It is usually not smart to slurp the contents of a file into an array; for large files, this can use a large amount of memory. It is much more sound to iterate on the contents of the file, using a while loop:

        open FH, "file" or die $!;
        @contents = <FH>;
        close FH;

        foreach $element (@contents) {
          # line is held in $element
          chomp $element;  # remove ending newline
        }

        # ...much better if written as...

        open FH, "file" or die $!;
        while (<FH>) {  # until end of file...
          # line is held in $_
          chomp;  # if you don't want the ending newline
        }
        close FH;

    Perl does not automatically remove the ending newline from a line when you get it; more specifically, it does not remove the ending $/ at the end of a line -- look below to learn about this variable and its usefulness. Use the chomp() function to safely get rid of this ending sequence; while you had to use chop() in Perl 4, Perl 5 has added this safer function.

    A common mistake when using a while loop is skipping lines in the file, like so:

        while (<FH>) {     # this stores the line in $_
          $line = <FH>;    # this puts the NEXT line in $line
        }

    If you only use the $line variable there, you'll end up missing every other line. What was meant here was one of the following:

        while (defined($line = <FH>)) { ... }
        while (<FH>) { $line = $_; }
        while (!eof(FH)) { $line = <FH>; }

    The first example there shows how while (<FH>) actually works: it is the same as while (defined($_ = <FH>)). This is only true when <FH> is the ONLY thing in the while loop's condition. The reason defined() is required here is to ensure that a last line consisting of a 0 and nothing else, not even a newline (a rare case, but hey...), is still considered a line; on its own, the string "0" is false. The third example uses the eof() function; this function has three forms, but we will only discuss the eof(FH) usage here (the rest will be explained in a future column, and you can read about it on your own in the perlfunc documentation (see the Resources section at the end of the article)).
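
    The line-skipping mistake and the "0" corner case can both be seen with a four-line test file. This is a sketch under assumptions: the file name lines.txt and its contents are made up, and the last line is deliberately a lone 0 with no newline after it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/lines.txt";

# Four lines; the last is a lone "0" with no trailing newline.
open FH, ">$file" or die "can't create $file: $!";
print FH "a\nb\nc\n0";
close FH or die "can't close $file: $!";

# Mistake: the condition reads one line into $_, the body reads another.
my @skipped;
open FH, $file or die "can't open $file: $!";
while (<FH>) {
    my $line = <FH>;
    last unless defined $line;
    chomp $line;
    push @skipped, $line;
}
close FH;

# Correct: one read per iteration, tested with defined().
my @all;
open FH, $file or die "can't open $file: $!";
while (defined(my $line = <FH>)) {
    chomp $line;
    push @all, $line;
}
close FH;

print "skipping loop saw: @skipped\n";
print "correct loop saw:  @all\n";
```

    The first loop only ever sees every other line, while the second sees all four, including the final 0.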

    The <FH> notation returns either a single line of the file, if used in scalar context, or the remaining lines in the file, if used in list context:

        $first = <FH>;
        $second = <FH>;
        @rest = <FH>;

        ($first,$second,@rest) = <FH>;

    That final line does the same as the first three; because there's a list on the left hand side, <FH> is called in list context. <FH> returns false (specifically, undef if called in scalar context, and an empty list in list context) upon reaching the end of the file. Reading again does not start over from the beginning of the file; to read the file a second time, reopen it or rewind the filehandle with seek(). Because @rest = <FH> is in list context, @rest does not have a final element of undef.
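
    Here is a short sketch of the two contexts on a four-line file (the file name and contents are assumptions). It uses seek() to rewind the filehandle explicitly before reading the file a second time.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/context.txt";

open FH, ">$file" or die "can't create $file: $!";
print FH "one\ntwo\nthree\nfour\n";
close FH or die "can't close $file: $!";

open FH, $file or die "can't open $file: $!";
my $first  = <FH>;    # scalar context: a single line
my @rest   = <FH>;    # list context: every line that is left
my $at_eof = <FH>;    # nothing left to read: undef

# Rewind explicitly, then let a list assignment split the lines up.
seek FH, 0, 0 or die "can't seek in $file: $!";
my ($line1, $line2, @others) = <FH>;
close FH;

print "first: $first";
print "rest has ", scalar(@rest), " lines\n";
print "read at end of file is ", (defined $at_eof ? "defined" : "undef"), "\n";
print "after rewinding: $line1$line2", scalar(@others), " more\n";
```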

    The "End of Line" Variable, $/

    When you read from a file using <FH>, you get the content from your current position to the end of the "line"... but what denotes the end of a line? The $/ variable, which defaults to \n, is what Perl uses to determine if it's reached the end of a line. If you change the value, Perl changes its definition of a line. Here's an example:

        {
          local $/ = "\n%%\n";  # why use local?
          chomp($line = <FH>);
        }

    The "end of line" string \n%%\n is a common one used for signature file quotes, as well as for the fortunes for the popular fortune program found on many Unix boxes. Why do we use local() here, instead of my()? The short answer is that we have to, because $/ is a special Perl variable. Enclosing the code in a pair of braces as shown is a way of ensuring $/ gets its original value back. Also, we can use chomp() to remove the value of $/ from the end of a string.
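
    As a sketch of the same idea applied to a whole file, the following reads a small fortune-style file one quote at a time. The file name quotes.txt and the quotes themselves are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/quotes.txt";    # hypothetical fortune-style file

open FH, ">$file" or die "can't create $file: $!";
print FH "first quote\nspans two lines\n%%\nsecond quote\n%%\n";
close FH or die "can't close $file: $!";

my @quotes;
open FH, $file or die "can't open $file: $!";
{
    local $/ = "\n%%\n";    # a "line" now ends at a %% separator
    while (defined(my $quote = <FH>)) {
        chomp $quote;       # removes the trailing "\n%%\n"
        push @quotes, $quote;
    }
}
close FH;

# Outside the braces, $/ has its original value again.
my $restored = $/ eq "\n" ? 1 : 0;

print scalar(@quotes), " quotes read; \$/ restored: $restored\n";
```

    Note that the first quote keeps its internal newline: only the full separator string counts as the end of a "line".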

    There are two special values $/ can be set to: undef, and "". They are not the same value, mind you. Setting it to undef means that there is no "end of line" marker, so using the <FH> operator will return the entire file as one long string. This is not as inefficient as you may think it would be; it is a fast and effective way to get the entire contents of a file into a string. The other value, "", turns on "paragraph mode", meaning a "line" will be any series of characters ended by two or more newline characters. In this special case, chomp() removes all newlines at the end of the string. Please note, however, that $/ is a string, and not a regular expression. Setting it to "\n+" will make a line a string of characters ending in a newline followed by a plus sign.
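
    Both special values can be tried on one small file. In this sketch the file name and contents are assumptions; the two paragraphs are separated by several blank lines on purpose.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/paras.txt";

open FH, ">$file" or die "can't create $file: $!";
print FH "para one\nstill one\n\n\n\npara two\n";
close FH or die "can't close $file: $!";

# undef: no end-of-line marker, so one read returns the whole file.
open FH, $file or die "can't open $file: $!";
my $whole = do { local $/; <FH> };    # local $/ with no value sets it to undef
close FH;

# "": paragraph mode, where a run of blank lines ends each "line".
my @paras;
open FH, $file or die "can't open $file: $!";
{
    local $/ = "";
    while (defined(my $para = <FH>)) {
        chomp $para;    # in paragraph mode this strips every trailing newline
        push @paras, $para;
    }
}
close FH;

print "whole file is ", length($whole), " characters\n";
print "paragraphs: ", scalar(@paras), "\n";
```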

    Writing

    print() and select()

    The print() function is a rather simple one to use; the syntax is (says the perlfunc manpage):

        print LIST
        print FILEHANDLE LIST
        print
        print FILEHANDLE

    FILEHANDLE can either be a filehandle (FOO), or a variable containing a reference to a filehandle, or a string containing the name of a filehandle (that will be discussed in the second article on files and filehandles). LIST is a regular list. If the LIST is omitted, print() uses $_. If the FILEHANDLE is omitted, print() defaults to STDOUT, or the filehandle currently select()ed.

    The one-argument version of the select(FH) function makes the given filehandle the default one; Perl programs start out as though you had said select(STDOUT). This function returns the filehandle that was select()ed before the call, so you can switch back to it later:

        print "This goes to STDOUT\n";
        $oldfh = select(NEWFH);
        print "This goes to NEWFH\n";
        select($oldfh);
        print "This goes to STDOUT\n";

    Note: this example shows the use of a scalar in place of a filehandle. This "magic" is explained in the next article on this topic, which will describe more advanced file and filehandle operations.
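
    Here is a runnable sketch that exercises the print() forms and select() together and then reads the file back to check where each line went. NEWFH is opened on a scratch file whose name is an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/new.txt";

open NEWFH, ">$file" or die "can't create $file: $!";

print NEWFH "explicit filehandle\n";          # print FILEHANDLE LIST
for ("default variable\n") { print NEWFH; }   # print FILEHANDLE: LIST omitted, so $_ is printed

print "This goes to STDOUT\n";
my $oldfh = select(NEWFH);       # NEWFH becomes the default; remember the old one
print "This goes to NEWFH\n";
select($oldfh);                  # put the previous default back
print "This goes to STDOUT again\n";

close NEWFH or die "can't close $file: $!";

# Read the file back to see what actually landed in it.
open NEWFH, $file or die "can't open $file: $!";
my @written = <NEWFH>;
close NEWFH;

print "NEWFH received ", scalar(@written), " lines\n";
```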

    Here-docs

    As a programmer who's looked over other people's code, I must say one of the ugliest things I've seen is the overuse of print statements. I see gunk like:

        print "<a href=\"foo.html\">Click Here!</a>\n";
        print "<br>\n";
        print "<h1 align=\"center\">Other Links</h1>\n";
        # etc...

    There are a couple of things I find unfavorable: the need to backslash " everywhere, the multiple statements when ONE will do, and sometimes, the programmer doesn't put any \n's in at all, and the output is very messy to the eye. Since we know that print() can take a list, we could say:

        print "Come, listen to a story\n",
              "About a man named Jed.\n",
              "etc.\n";

    But if we want to include quotes in there, as well as variables, single quotes around the lines won't help: the \n's won't be interpolated, and neither will the variables. We could use the qq() operator, which allows for a different symbol than " to be used to delimit quoted text:

        print qq!You can't use a \! in here without putting a backslash in front of it\n!;

    But just like regular quotes, you need to backslash the quote character. To get around this, we could use paired delimiters, like {}:

        print qq{You can nest { these things } safely\n};

    And as the example shows, pairs can be nested; the number of left and right units of the pair must match. To make a hanging } or { you'd need to backslash it. The final workaround is one I highly suggest, the here-doc. Borrowed from sh, they have a rather simple syntax:

        print <<THIS;
        is double-quoted context
        THIS

        print << 'HERE WE HAVE';
        single-quoted context
        HERE WE HAVE

        print << "AND THIS IS";
        double-quoted
        AND THIS IS

    Note the semicolon after the label on the print() statement! You can also use backticks around the label, but that is seldom done. A very important rule is that if you do not use quotes around the label, it must immediately follow the <<. Another one is that the closing label must be reproduced exactly as shown in the print statement, on its own line, and that there must then be a newline after the closing label:

        print FH << "  two leading spaces";
        la dee da
        two leading spaces
        that line above was NOT a valid close to this here-doc
          two leading spaces

    If you get an error like "Can't find string terminator "END TEXT" anywhere before EOF at filename line nnn." then be sure you typed the label the same way in the beginning and the ending. If they are the same, and your ending label is on the last line of your file, be sure there is actually a newline after that last line.
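
    The difference between the quoting styles is easiest to see by storing each result in a variable. This is a small sketch; the variable $name and the label names are made up for the example, and the closing labels sit at the very start of their lines.

```perl
use strict;
use warnings;

my $name = "Jed";    # hypothetical value to interpolate

# qq{} behaves like double quotes but lets " appear without a backslash.
my $link = qq{<a href="foo.html">$name</a>\n};

# Double-quoted label: variables are interpolated.
my $double = <<"END DOUBLE";
Come, listen to a story
About a man named $name.
END DOUBLE

# Single-quoted label: the text is taken literally, $name and \n included.
my $single = <<'END SINGLE';
About a man named $name.\n
END SINGLE

print $link, $double, $single;
```

    Only the single-quoted here-doc leaves $name and the backslash-n exactly as typed.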

    You can have multiple here-docs in one statement:

        print HTML << "end header", << "end body";
        <html>
        <head><title>"I can use quotes!"</title>
        </head>
        end header
        <body>This is now in the 'end body' section.</body>
        end body

    As an aside, here-docs can be used when passing arguments to functions, etc.:

        makeHTML(<< "end of body", $title);
        <body>
        blah blah blah
        </body>
        end of body

        $text = << 'EOF';
        this is a multi-line string
        placed into $text. and since
        pressing enter makes a real newline,
        I can make newlines while using single quotes!
        EOF
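
    A runnable version of these two uses might look like the sketch below. The makeHTML() defined here is a hypothetical stand-in (the article never shows its body), and join() is used only so the two here-docs end up in one string that can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for the article's makeHTML():
# it wraps a body in <html> tags and adds a <title>.
sub makeHTML {
    my ($body, $title) = @_;
    return "<html><head><title>$title</title></head>\n$body</html>\n";
}

my $title = "Here-docs";

# The here-doc text starts on the line AFTER the statement that names it.
my $page = makeHTML(<<"end of body", $title);
<body>
blah blah blah
</body>
end of body

# Two here-docs in one statement are read in the order they are named.
my $both = join "", <<"end header", <<"end body";
<h1>"I can use quotes!"</h1>
end header
<p>This is now in the 'end body' section.</p>
end body

print $page, $both;
```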

    Resources

    To read more on opening files, read perlopentut, available at http://language.perl.com/newdocs/pod/perlopentut.html. The documentation on the functions mentioned here is all available in the perlfunc section of the docs, or by typing perldoc -f NAME at your command prompt. $/ is documented in perlvar. Here-docs are discussed in perldata. All this documentation is also found online at http://language.perl.com/.

    © 2000, Jeff "japhy" Pinyan. Jeff has 2.5 years of Perl experience, has written 3 CPAN modules, and has written for TPJ. He is currently writing a Perl book entitled "The Art of Perl: Elegant Perl Style".
    Originally published on PerlMonth, Issue 8, January 2000
    Republished with permission.

     
     

