|
Date Published: 2002-08-28
No matter how much we try to persuade people that Perl is a multi-purpose programming language, we'd be deluding ourselves if we didn't admit that the majority of programmers first come into contact with Perl through their experience of CGI programs. People have a small web site and one day they decide that they need a guest book, a form mail script or a hit counter. Because these people aren't programmers, they go out onto the web to see what pre-written scripts they can find.
And there are plenty to choose from. Try searching on ``CGI scripts'' at Google. I found about two million hits. The first two were those well-known sites - Matt's Script Archive and the CGI Resource Index. Our web site owner will visit one of these sites, find the required scripts and install them on his web site. What could be simpler? See, the web is as easy as people make it out to be.
In this article, I'll take a closer look at this scenario and show that all is not as rosy as I've portrayed it above.
Finding CGI Scripts
Dave Cross <dave@dave.org.uk>
An important factor that Google takes into account when displaying search
results is the number of links to a given site. Google assumes that if there
are a large number of links to a given web page, then it must be a well known
page and that Google's visitors will want to visit that site first.
Notice that I said ``well known'' in that previous paragraph. Not ``useful'' or
``valuable''. Think about this for a second. The kinds of people that I
described in the introduction are not programmers. They certainly aren't
Perl programmers. Therefore they are in no position to make value judgements
on the Perl code that they download from the Internet.
This means that the ``most popular'' site becomes a self-fulfilling prophecy.
The best known site is listed first on the search engines. More people
download scripts from that site, assuming that the most popular site must
have the highest quality scripts and the popular sites end up becoming more
popular.
At no point does any kind of quality control enter into the process.
OK, so that's not strictly true. If the scripts from a particular site just
didn't work at all then word would soon get out and that site's scripts
would become very unpopular. But what if the problems were more subtle and
didn't manifest themselves on all sites. Here is a list of some potential
problems:
-
Not checking the results of an
open call. This will work fine if the
expected file exists and has the right permissions. But what happens when
the file doesn't exist? Or it exists but the CGI process doesn't have
permissions to read from it or write to it?
-
Bad CGI parameter parsing code. CGI parameter parsing is one of those things
that is easy to do badly and hard to do well. It's simple enough to write
a parser function that handles most cases, but does it handle both GET and
POST requests? What about keys with multiple associated values? And does it
process file uploads correctly?
-
Lack of security. Installing a CGI program allows anyone with an internet
connection to run a program on your server. That's quite a scary thing to
allow. You'd better be well aware of the security implications. Of course
if people only ever run the script from your HTML form, everything will
probably be fine, but a cracker won't do that. He'll fire ``interesting''
sets of parameters at your script in an attempt to find its weaknesses.
Suddenly a form mail script is being used to send copies of vital system
files to the cracker.
It's also worth bearing in mind that because these scripts are available
on the web, crackers can easily get hold of the source code. They can then
work out any insecurities in the scripts and exploit them. Recently a
friend's web site can under attack from crackers and amongst the traces
left in the access log were a large number of calls to well-known CGI
scripts.
For this reason, it is even more important that you are very careful about
security when writing CGI scripts that are intended to be used by novice
webmasters.
The fact is, unfortunately, that these kinds of problems are commonplace in
the scripts that you can download from many popular CGI script archives.
That's not to say that the authors of these scripts are deliberately trying
to give crackers access to your servers. It's simply evidence that Perl
has moved on a great deal since the introduction of Perl 5 in 1994 and many
of the CGI script authors haven't kept their scripts up to date with
current practices. In other cases, the authors know only too well how out
of date their scripts are and have produced newer, improved versions, but
other people are still dis
tributing the older versions.
Although the people who are downloading these scripts aren't usually
programmers, there often comes a time when they want to start changing the
way a program works and perhaps even writing their own CGI programs. When
this time comes they will go to the scripts they already have for examples
of how to write them. If the original script contained bad programming
practices then these will be copied in the new scripts. This is the way
that many bad programming practices have become so common amongst Perl
scripts. I therefore think that it's a good idea for any publicly distributed
programs to follow best programming practices as far as it is possible.
So now we have an obvious problem. I said before that the people who are
downloading and installing these scripts aren't qualified to make judgements
on the quality of the code. Given that there are some problematic scripts
out there, how are they supposed to know whether they should be using a
particular script that they find on the web?
It's a difficult question to answer, but there are some clues that you can
look for that give a idea of how well-written a script is. Here's a
brief checklist:
-
Does the script use
-w and use strict? The vast majority of Perl
experts recommend using these tools when writing Perl programs of any level
of complexity. They make any Perl program more robust. Anyone distributing
Perl programs without them probably doesn't know as much Perl as they think
they do.
-
Does the script use Perl's taint mode? Accepting external data from a web
browser is a dangerous business. You can never be sure what you'll get. If
you add
-T to a program's shebang line, Perl goes into taint mode. In
this mode Perl distrusts any data that it gets from external sources. You
need to explicitly check this data before using it. Using -T is a sign
that the author is at least thinking about CGI security issues.
-
Does the script use CGI.pm? Since Perl 5.004, CGI.pm has been a part of the
standard Perl distribution. This module contains a number of functions for
handling various parts of the CGI protocol. The most important one is
probably
param which deals with the parsing of the query string to extract
the CGI parameters. Many CGI scripts write their own CGI parameter parsing
routine which is missing features or has bugs. The one in CGI.pm has been
well-tested over many years in thousands of scripts - why attempt to
reinvent it?
-
How often is the script updated? One reason for a script not to use CGI.pm
might be that it hasn't been updated since the module was added to the Perl
distribution. This is generally a bad sign. You should look for scripts that
are kept up to date. If there hasn't been been a new version of the script
for several years, then you should probably avoid it.
-
How good is the support? Any program is of limited use if it's unsupported.
How do you get support for the program? Is there an email address for the
author? Or is there a support mailing list? Try dropping an email to either
the author or the mailing list and see how quickly you get a response.
Of course these rules will have exceptions, but if a script scores badly
on most of them then you might have second thoughts on whether you should
be using the script.
Having spent most of this article being quite negative about existing CGI
program archives, let's now get a bit more positive. In the summer of 2001,
a group of London Perl Mongers started to wonder what would be involved in
writing a set of new CGI programs that could act as replacements for the ones
in common use. After some discussion, the nms project was born. The name
nms originally stood for a disparaging remark about one of the existing
archives, but we decided that we didn't want the kind of negativity in the
name. By that time, however, the abbreviated name was in common usage so we
decided to keep it - but it no longer stands for anything.
The objectives for nms were quite simple. We wanted to provide a set
of CGI programs which fulfilled the following:
-
As easy (or easier) to use as existing CGI scripts.
-
Use best programming practices
-
Secure
-
Bug-free (or, at least, well supported)
We decided that we would base our programs on the ones found in Matt's Script
Archive. This wasn't because Matt Wright's scripts were the worst out there,
but simply that they were the most commonly used. We made a rule that our
scripts would be drop-in replacements for Matt's scripts. That meant that
anyone who had existing data from using one of Matt's scripts would be able
to take our replacement and simply put it in place of the old script. This,
of course, meant that we had to become very familiar with the inner workings
of Matt's scripts. This actually turned out not to be a hard as I expected.
The majority of Matt's scripts are very simple. It's only really formmail,
guestbook and wwwboard which are at all complex.
Sometimes our objectives contradicted each other. We decided early on that
part of making the scripts as easy to use as possible meant not relying on
any CPAN modules. We forced ourselves to only use only modules that came as
part of the standard Perl distribution. The reason for this is that our
target audience probably doesn't know anything about CPAN modules and wouldn't
find it easy to install them. A large part of our audience are probably
running a web site on a hosted server where they may not be able to install
new modules and in many cases won't have telnet access to their server. We
felt that asking them to install extra modules would make them far less likely
to use our programs. This, of course, goes against our objective of using
best programming practices as in many cases there is a CPAN module that
implements functionality that we use. The best example of this is in
formmail where we resort to sending emails by talking directly to sendmail
rather than using one of the email modules. In these cases we decided that
getting people to use the scripts (by not relying on CPAN) was more important
to us than following best practices.
nms is a SourceForge project. You can get the latest released versions of
the scripts from <http://nms-cgi.sourceforge.net> or, if you're
feeling braver, you can get the bleading edge versions from CVS at the
project page at <http://sourceforge.net/projects/nms-cgi/. Both
of those pages also have links to the nms mailing lists. We have two lists,
one for developers and one for support questions. There is also a FAQ which
will hopefully answer any further questions that you have about the project.
Here is a list of the scripts currently available from nms
-
Countdown Count down the time to a certain date
-
Free For All Links A simple web link database
-
Formmail Send emails from web forms
-
Guestbook A simple guest book script
-
Random Image Display a random image
-
Random Links Display a link chosen randomly from a list
-
Random Text Display a randomly chosen piece of text
-
Simple Search Simple web site search engine
-
SSI Random Image Display a random image using SSI
-
Text Clock Display the time
-
Text Counter Text counter
I should point out that this is very much a ``work in progress''. Whilst we're
happy with the way that they work we can always use more people looking at
the code. The one advantage that Matt's scripts have over ours is that they've
had many years of testing on a large number of web sites.
So now we have a source of well-written CGI programs that we can point users
at. What more needs to be done? Well, the whole point of writing this article
was to ask more people to help out. There's always more work to do :-)
-
Peer review. We think we've done a pretty good job on the scripts, but
we're not interested in resting on our laurels. The more people that look
at the scripts the more chance that we'll catch more bugs and insecurities.
Please download the scripts and take a look at them. Pass any bugs on to
the developers mailing.
-
Testing. We test the scripts on as many platforms with as many different
configurations as we can, but we'll always miss one or two. Please try to
install the scripts on your systems and let us know about any problems that
you have.
-
Documentation. Our documentation isn't any worse that the documentation
for the existing archives, but we think it could be a lot better. If you'd
like to help out with this, please get in touch with us.
-
Advocacy. This is the most important one. Please tell everyone that you
know about nms. Everywhere that you see people using other CGI scripts,
please explain to them the potential problems and show them where to get the
nms scripts. Having written these scripts, we feel it's important that
they get as wide exposure as possible. If you have any ideas for promoting
nms, please let us know.
Whilst I don't pretend for a minute that these are the only well-written and
secure CGI programs available, I do think that the Perl community needs a
well-known and trusted set of CGI programs that we can point people to. With
your help, that's what I want nms to become.
|