Date Published: 2000-05-01
Written for the Perl Archive by Turk Scripts
This article collects several Perl tips that may be helpful for beginner and
intermediate Perl programmers. Please feel free to send an email to
turkscripts@hotelspectra.com if you have any questions or
corrections.
Reading the whole file into a variable in one step:
Instead of reading a file line by line, you might want to read the whole file into a variable in one step.
This is especially useful when you are reading HTML files. If you open a file and read from it into a
scalar variable, Perl reads only up to and including the first newline character and stops. The reason for
this behavior is that the default "input record separator" in Perl is the newline character. This separator
is stored in the special variable $/. By default $/ is
equal to "\n". If you undefine this variable using
undef, a single read returns the whole file. Keep in mind that $/ is a global variable, so the change
affects every later read in the program until $/ is set back to "\n".
Example:
undef $/;    # no record separator: the next read returns the whole file
open(FILE, "data.htm") or die "Cannot open data.htm: $!";
$html = <FILE>;
close(FILE);
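Because $/ is global, a common variation is to undefine it only for the duration of one read. The following is a minimal sketch of that idea, not the method used above: slurp_file is a hypothetical helper name, and the temporary file stands in for data.htm so that the sketch runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical helper name, not something from the article.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;               # $/ is undef only until this sub returns
    my $content = <$fh>;    # a single read now returns the whole file
    close($fh);
    return $content;
}

# Demonstration on a temporary file (an assumption made for this sketch).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "<html>\n<body>\n</body>\n</html>\n";
close($tmp_fh);

my $html = slurp_file($tmp_name);
print length($html), " characters read\n";
```

With local, the old value of $/ comes back automatically when the sub returns, so line-by-line reads elsewhere in the program keep working.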
Using qq{} for printing strings:
If you look at most CGI scripts, you will probably see a line like:
print "<a href=\"http://yahoo.com\">Yahoo is $property</a>";
If you use double quotes to print strings, you have to escape every double quote that appears
in your string. An easier way is to use the qq{} operator. It is a replacement
for double quotes, so variables are still interpolated, but double quotes inside it need no escaping.
The only characters to watch are the braces themselves: a "{" or "}" has to be escaped
unless it is part of a balanced "{...}" pair. The same line can be written as:
print qq{<a href="http://yahoo.com">Yahoo is $property</a>};
Another benefit you might be interested in is that you can put multiple lines of data inside
qq{} without escaping any of the double quotes in them.
Example:
print qq{
<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>New Page 1</title>
</head>
<body>
<p>Some html page</p>
</body>
</html>
};
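The brace and interpolation rules of qq{} are easier to see in a few small cases. This is only a sketch: the variable names and sample values are made up for illustration.

```perl
use strict;
use warnings;

my $property = "a portal";
my @tags     = ("news", "mail");

# Balanced braces inside qq{} need no escaping.
my $css = qq{<style>p { color: red; }</style>};

# A lone closing brace must be escaped, or qq{} would end early.
my $lone = qq{brace: \}};

# Variables still interpolate, so a literal @ (or $) needs a backslash.
my $mail = qq{Write to user\@example.com about "$property" (@tags)};

# Other delimiters work too, for example qq().
my $link = qq(<a href="http://yahoo.com">Yahoo is $property</a>);

print "$css\n$lone\n$mail\n$link\n";
```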
Regular expressions to search a variable:
From now on I assume that you are familiar with the substitution operator in Perl:
s///. A basic example:
s/apple/orange/;
would replace the first occurrence of "apple" in $_ with "orange". The separator "/" we
used in this example can be replaced with almost any other non-alphanumeric, non-whitespace character. The catch is that you
have to escape the separator character wherever it appears inside your regular expression or replacement. So it is a better
idea to use a separator that is less common than "/". I prefer using "#" as a separator,
because it is less common in strings and visually it is a good separator. So the same substitution could be
written as:
s#apple#orange#;
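The difference matters most when the text itself is full of slashes. Here is a small sketch that runs the same substitution with three delimiters; the path and the variable names are invented for the example.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With "/" as the delimiter every slash in the pattern must be escaped.
(my $slashes = $path) =~ s/\/usr\/local/\/opt/;

# With "#" as the delimiter the same substitution needs no escaping.
(my $hashes = $path) =~ s#/usr/local#/opt#;

# Bracketing delimiters are written as two pairs.
(my $braces = $path) =~ s{/usr/local}{/opt};

print "$slashes\n$hashes\n$braces\n";
```

All three forms produce the same string; only the amount of escaping differs.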
A common mistake when using regular expressions is to interpolate a variable into the pattern
without escaping it.
Example:
$data =~ s#$url#http://yahoo.com#;
This is going to work properly most of the time. But sometimes it won't behave as expected, or you will
experience occasional run-time errors. For example, if your
$url is equal to
http://yahoo.com/do.cgi?action=go++&tell=poetry, the substitution operator may
fail and exit with an error message like:
"/http://yahoo.com/do.cgi?action=go++&tell=poetry/: nested *?+ in regex..."
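The reason for the failure is that "+" is a quantifier in regular expressions, and the Perl versions that print
this message reject "++" as a nested quantifier. Newer Perl versions (5.10 and later) accept "++" as a possessive
quantifier, so the error may not appear, but the pattern still does not match the literal URL, because ".", "?" and "+"
keep their special meanings. You have to escape them. The variable might contain several special characters, which all
have to be escaped properly. The correct way to implement this substitution is: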
$temp = quotemeta($url);
$data =~ s#$temp#http://yahoo.com#;
quotemeta() is a standard Perl function that puts a backslash in front of every character
in your variable other than ASCII letters, digits and underscore.
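Perl also offers the \Q...\E escape, which applies quotemeta() to interpolated text directly inside the pattern. The sketch below compares both forms; the sample $data string is an assumption made for the example.

```perl
use strict;
use warnings;

my $url  = 'http://yahoo.com/do.cgi?action=go++&tell=poetry';
my $data = qq{<a href="$url">Go</a>};

# \Q...\E quotes the interpolated text inside the pattern itself.
(my $inline = $data) =~ s#\Q$url\E#http://yahoo.com#;

# The same substitution with an explicit quotemeta() call.
my $temp = quotemeta($url);
(my $explicit = $data) =~ s#$temp#http://yahoo.com#;

print "$inline\n$explicit\n";
```

Both forms treat the whole URL as literal text, so the "?", "." and "++" in it no longer act as regular expression operators.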
Using eval for clever substitutions:
If you have used regular expressions in Perl, you have probably used the substitution operator frequently. Most of the time
a simple substitution is satisfactory.
Example:
$html =~ s#\bdogs\b#cats#ig;
In this example all the occurrences of the word "dogs" are replaced by the word "cats". What if we want to
replace "dogs" with a value calculated in our program rather than a fixed text?
Example:
$html =~ s#\bdogs\b#join(', ' , @animals)#ige;
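A small runnable sketch of the substitution above, assuming sample values for @animals and $html (both made up for illustration). The second substitution, with the hypothetical %price table, shows a replacement computed from the captured text.

```perl
use strict;
use warnings;

my @animals = ("cats", "birds", "fish");
my $html    = "<p>Dogs are pets. I like dogs.</p>";

# The right side is run as Perl code for every match.
(my $joined = $html) =~ s#\bdogs\b#join(', ', @animals)#ige;

# The replacement can also be calculated from the captured groups.
my %price = (apple => 3, orange => 5);
(my $total = "2 apple") =~ s#(\d+) (\w+)#$1 * $price{$2}#e;

print "$joined\n$total\n";
```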
In this example we used the "e" switch, which lets us use the result of an
expression as the replacement.
"e" means: evaluate the right side as an expression.
If you want to do a more complicated replacement using a chunk of code, you can wrap the code in an
eval block with curly brackets. The value of the last statement in the block becomes the replacement.
Example: If the target of a link in an HTML page is "_top", then
we replace that link with a link to http://yahoo.com.
$html =~ s#<a href="([^"]*)" target="([^"]*)"#
    eval {
        my $string;
        if ($2 eq "_top") {
            $string = qq{<a href="http://yahoo.com" target="_top"};
        }
        else {
            $string = qq{<a href="$1" target="$2"};
        }
        $string;
    }
#iges;
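When the replacement logic grows, one alternative is to move it into a named sub and call that sub from the "e" replacement. This is a sketch of that variation, not the approach shown above: rewrite_link is a hypothetical helper name and the sample $html is invented for the example.

```perl
use strict;
use warnings;

# rewrite_link is a hypothetical helper name used only for this sketch.
sub rewrite_link {
    my ($href, $target) = @_;
    # Links that open in "_top" are pointed at http://yahoo.com instead.
    $href = "http://yahoo.com" if $target eq "_top";
    return qq{<a href="$href" target="$target"};
}

my $html = qq{<a href="a.html" target="_top">A</a> <a href="b.html" target="_blank">B</a>};

# The captured href and target are passed to the helper for each match.
$html =~ s#<a href="([^"]*)" target="([^"]*)"#rewrite_link($1, $2)#ige;

print "$html\n";
```

Keeping the logic in a sub also makes it possible to put comments next to it, which is awkward inside a replacement that uses "#" as its separator.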
Company Info
Turk Scripts is a Turkish company specializing in high-performance, bug-free CGI and Perl scripts.
Current projects focus on fetching web pages on the fly, parsing databases, personalization,
spidering, and information processing and retrieval. Please
visit our web site for more information.