    Why use CGI.pm

    Date Published: 2002-08-27

    One of Perl's greatest strengths is the CPAN, an archive of programs, scripts, snippets, and modules. These are all made available to other programmers worldwide, usually under the same terms as Perl itself. Just about anything that can be done with Perl is on the CPAN, or will be there shortly. (Some writers even find inspiration for columns and articles by watching the list of recent uploads.)

    Because of (or contributing to) Perl's popularity as a language for web development and CGI programming, several CPAN modules handle everything from HTML formatting to CGI parameter processing. The grande dame is CGI.pm. Written by Lincoln Stein, it has the potential to make your CGI scripts shorter, more secure, more valid, and much easier to write. Even better, the CGI module has shipped in the core Perl distribution for several years. Any web host worth using will have it installed.

    Unfortunately, many coders are not aware of the module's existence. Others don't see the need, as it's possible to write CGI programs in Perl without CGI.pm. Doing so, however, is similar to reading webpages through telnet instead of using a web browser. This may be a good learning experience, but it's fragile and very difficult to debug.

    Two widely used "alternatives" exist. One is cgi-lib.pl, an ancient Perl 4 library. The other is a copied-and-pasted snippet of code that originated either in a web programming book or in a free script. Both date back to the early days of the CGI standard. While there are good alternatives to CGI.pm, these two are not among them. They appear simple and effective, especially if they're familiar, but subtle and unsubtle bugs lurk underneath. Except in very specific cases, all new CGI programs written in Perl should use CGI.pm. This article explains three areas in which the module is superior to the two common approaches.


    Author

    chromatic is a Perl hacker, author, and frequent contributor to several popular websites (including Slashdot and Perlmonks). He is the co-author of O'Reilly's "Running Weblogs with Slash", and occasionally annoys people by improving Perl's core test suite. He may be the only Perl 5 porter to have written Perl while riding a camel.


    Security Issues

    Unfortunately, a website open to the world around the clock is also open to a small but dedicated group of mischief makers -- and worse.

    • Resource exhaustion
      All sites and programs have finite resources, including bandwidth, disk space, processing time, memory, and the number of files that can be open at once. Running out of any of these can make the site unavailable to visitors and, worse, can cause strange behavior in any running program. Well-written programs may degrade gracefully, but handling these situations correctly takes knowledge and experience. Consequently, common attacks seek to exploit artificial or real limits.

      For example, some attackers send huge amounts of data to websites. Large requests eat up bandwidth and disk space and waste processor time and memory that could serve other users. Other attacks announce a large upload and then send far less, leaving programs waiting for data that will never arrive.

      If your form processing reads from STDIN without checking the CONTENT_LENGTH environment variable, you're probably susceptible to both attacks. CGI.pm can limit the allowed size of POSTed content (including file uploads) to any size you like. Doing so is as simple as assigning to a variable, and will be demonstrated in a future article.
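      A minimal sketch of that variable assignment, using two documented CGI.pm package variables: $CGI::POST_MAX (the largest POST body accepted, in bytes) and $CGI::DISABLE_UPLOADS. The 100 KB limit is an arbitrary example value, and the %ENV assignments only simulate the oversized request a web server would describe; in a real script the server sets them.

```perl
use strict;
use warnings;
use CGI;

# Refuse any POST body larger than 100 KB (an arbitrary example limit)
# and reject file uploads entirely. Set these before creating the object.
$CGI::POST_MAX        = 100 * 1024;
$CGI::DISABLE_UPLOADS = 1;

# Simulate an oversized POST; a real web server sets these itself.
$ENV{REQUEST_METHOD} = 'POST';
$ENV{CONTENT_TYPE}   = 'application/x-www-form-urlencoded';
$ENV{CONTENT_LENGTH} = 10 * 1024 * 1024;

my $q = CGI->new;

# CGI.pm compares CONTENT_LENGTH with POST_MAX before reading the body,
# discards the oversized request, and records an error instead.
my $error = $q->cgi_error;
if ($error) {
    print $q->header(-status => $error), "Request rejected: $error\n";
}
```

      Because the size check happens before any of the body is read, an oversized request costs the program very little.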

    • File uploads
      Does your parsing routine handle file uploads? Decoding GET parameters can be done (though badly) in five lines, but files are uploaded by POSTing multipart form data. Parsing that is more difficult, especially given the diverse behavior of popular web browsers. Good luck doing it by hand.

      Even if your parser works, does it handle files securely? Does it store them in a world-accessible directory, even temporarily? Can someone upload a program and then have the server execute it? Even if uploads are stored outside the web directory, could a local user secretly replace them or steal sensitive information from them?

      Again, CGI.pm handles these situations fairly sanely.
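      Those storage questions still apply once CGI.pm has parsed the upload. A minimal sketch, assuming the script copies each upload into a private directory outside the web root: store_upload is a hypothetical helper, and in a real script its input handle would come from CGI.pm's upload() method. File::Temp's tempfile() creates the file exclusively, under an unguessable name, with owner-only permissions, which addresses the world-readable and swapped-file concerns.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Hypothetical helper: copy an uploaded file into $dir, which should be
# a private directory outside the web server's document root.
sub store_upload {
    my ($in_fh, $dir) = @_;

    # tempfile() picks an unpredictable name and creates the file
    # exclusively (nobody can pre-plant it) with mode 0600.
    my ($out_fh, $path) = tempfile('uploadXXXXXXXX', DIR => $dir, UNLINK => 0);
    binmode $in_fh;
    binmode $out_fh;

    # Copy in fixed-size chunks rather than slurping the whole upload.
    my $buffer;
    while (read($in_fh, $buffer, 8192)) {
        print {$out_fh} $buffer or die "Cannot write $path: $!";
    }
    close $out_fh or die "Cannot close $path: $!";
    return $path;
}

# In a CGI script the handle would come from CGI.pm, for example:
#   my $in_fh = $q->upload('photo');
# Here an in-memory handle stands in for the uploaded data.
my $private_dir = tempdir(CLEANUP => 1);
my $data = "pretend this is an image\n";
open my $in_fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $stored = store_upload($in_fh, $private_dir);
print "Stored upload at $stored\n";
```

      Keeping the directory outside the document root means the web server will never serve or execute the stored file directly.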


    Validity Issues

    Security is important, but with luck it will never be tested. Data validity will always be tested. If one in ten client browsers interprets the specifications in a way your program did not foresee, you could produce corrupt data or turn away valuable users.

    • Multi-valued fields
      If you've ever written a program that lets users select more than one item, you may have wondered how to process multiple selections. For example, consider a list of employees to assign to a project:
              <input type="checkbox" name="dev1" value="sunny" />Sunny
              <input type="checkbox" name="dev2" value="kam" />Kam
              <input type="checkbox" name="dev3" value="hannah" />Hannah
              <input type="checkbox" name="dev4" value="ann" />Ann
              <input type="checkbox" name="dev5" value="amanda" />Amanda

      Note that each checkbox has a unique name. Finding out whether Sunny will work on this project means examining the dev1 parameter. Finding all employees assigned to the project means looping through every potential devX parameter. The annoyance grows with the number of potential items. (chromatic industries isn't large, but it does have an attractive and highly intelligent workforce.)

      The HTML and CGI specifications do allow one slight trick to make our lives easier, though. Parameter names can be repeated. It's legal to write:

              <input type="checkbox" name="dev" value="sunny" />Sunny
              <input type="checkbox" name="dev" value="kam" />Kam
              <input type="checkbox" name="dev" value="hannah" />Hannah
              <input type="checkbox" name="dev" value="ann" />Ann
              <input type="checkbox" name="dev" value="amanda" />Amanda

      Checking "Sunny" and "Kam" produces a request similar to:

              dev=sunny
              dev=kam

      If the parameter parsing code expects only one value for each parameter name, the second dev will overwrite the first. Poor Sunny will have nothing to do.

      The venerable (read: "moldy oldy") cgi-lib.pl code, written in the days before references, created an artificial C-style array by joining repeated values into a single string separated by null characters (\0). Most handwritten parsers don't even do that. Behold the magic of CGI.pm:

              use CGI qw( :standard );
              my $developer  = param('dev');  # scalar context: the first value, 'sunny'
              my @developers = param('dev');  # list context: every value, ( 'sunny', 'kam' )

      This beats grepping through potential parameter patterns, testing for existence and definedness.
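      To see the difference concretely, here is a hand-rolled hash parser of the kind described above next to CGI.pm. naive_parse is a hypothetical reconstruction of a typical copied-and-pasted snippet, not code from any particular book. Newer CGI.pm releases (4.08 and later) warn when param() is called in list context and provide multi_param() for exactly this case, so the sketch uses that.

```perl
use strict;
use warnings;
use CGI;

# Hypothetical hand-rolled parser of the kind found in old scripts:
# one hash slot per name, so later values silently overwrite earlier ones.
# (It also skips URL decoding, another common omission.)
sub naive_parse {
    my ($query_string) = @_;
    my %param;
    for my $pair (split /[&;]/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        $param{$name} = $value;
    }
    return %param;
}

my $request = 'dev=sunny&dev=kam';

my %naive = naive_parse($request);
print "naive parser sees: $naive{dev}\n";        # only 'kam' survives

# CGI.pm keeps every value for a repeated name.
my $q          = CGI->new($request);
my $first      = $q->param('dev');               # scalar context: 'sunny'
my @developers = $q->multi_param('dev');         # ('sunny', 'kam')
print "CGI.pm sees: @developers\n";
```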

    • Validating HTML
      Moving away from input issues, all HTML generated with CGI.pm will validate against the official World Wide Web Consortium standards. This is very important: it lets all compliant clients see the same information. It also protects against silly syntax typos; if you've ever spent hours debugging a missing table tag in Netscape, you'll appreciate this. CGI.pm's built-in shortcuts save you from remembering the gory details of HTTP headers or nested tags, freeing you to focus on programming rather than HTML syntax. (The code needed to build the checkbox group from the last example with CGI.pm is substantially shorter than the hand-written HTML.)
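      For comparison, a minimal sketch that generates the five repeated-name checkboxes from the earlier example with CGI.pm's checkbox_group(). The exact markup (spacing, <label> wrappers) varies between CGI.pm versions, so treat the printed output as illustrative.

```perl
use strict;
use warnings;
use CGI;

# An empty parameter set: no boxes are pre-checked (see "Sticky Widgets"
# for what happens when 'dev' values are present).
my $q = CGI->new({});

my @ids    = qw(sunny kam hannah ann amanda);
my %labels = (
    sunny  => 'Sunny',
    kam    => 'Kam',
    hannah => 'Hannah',
    ann    => 'Ann',
    amanda => 'Amanda',
);

# One call emits all five checkboxes, each named 'dev',
# with attribute values quoted and escaped consistently.
my $html = $q->checkbox_group(
    -name   => 'dev',
    -values => \@ids,
    -labels => \%labels,
);
print "$html\n";
```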

    • RFC-compliant encoding
      Speaking of standards, which character should separate parameters in a query string within a link? If you said "the ampersand", you're partially correct (but are you encoding it properly?). If you said "the semicolon", you're even more correct. Because the ampersand also begins character entities, it must be written as &amp; inside an HTML attribute, and the HTML 4.0 recommendation asks servers to accept the semicolon in its place. (See the standard itself, if you're curious.)
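      A minimal sketch of letting CGI.pm build the query string. It assumes the -newstyle_urls import pragma, which tells query_string() and self_url() to join pairs with semicolons; it is stated explicitly so the example doesn't depend on the installed version's default. Names and values are URL-escaped on the way out, and because no ampersands remain, the string can go straight into an href attribute. The /cgi-bin/employees.pl path is hypothetical.

```perl
use strict;
use warnings;
use CGI qw(-newstyle_urls);

# Two developers plus a note containing characters that must be escaped
# (%20 is a space and %26 an ampersand in the incoming query).
my $q = CGI->new('dev=sunny;dev=kam;note=Ann%20%26%20Amanda');

# query_string() re-encodes every name/value pair and joins them with ';'.
my $qs = $q->query_string;

# Hypothetical link target; no &amp; entities are needed.
print qq{<a href="/cgi-bin/employees.pl?$qs">Reassign</a>\n};
```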

      Having mentioned character entities, are you forming them correctly? Have you escaped all characters that are special in URIs? Does your program produce valid HTTP headers, including the correct media type (say, "text/html" or "text/xml")? While most popular web browsers will silently correct even bad HTML, what happens when that breaks? If you don't have time to learn CGI.pm now, will you have time to fix things in the future?
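      A minimal sketch of both chores with CGI.pm: escapeHTML() turns characters that have special meaning in HTML into entities, and header() emits a correctly terminated HTTP header for the media type you name. The user-supplied string is invented for illustration.

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new({});

# Untrusted text, e.g. a project name typed into a form.
my $project = q{Q3 <script> & "cleanup"};

# Entities for &, <, > and " keep the text from being read as markup.
my $safe = $q->escapeHTML($project);

# header() writes the Content-Type line (with a charset for text/*)
# followed by the blank line that ends the header block.
my $header = $q->header(-type => 'text/html', -charset => 'UTF-8');

print $header, "<p>Project: $safe</p>\n";
```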

      How do you handle extensions such as cookies, if at all? There are snippets of varying quality for getting and setting cookies, but they often have similar security and validity issues. CGI.pm, however, is regularly updated with the latest features; a recent release even added support for P3P cookies. You may not have heard of them and you may never use them, but if you need them, they're available immediately. That's more than can be said for the alternative, form-parsing code circa 1996.
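      Setting and reading a cookie through CGI.pm takes one call each. A minimal sketch: the cookie name, value, and one-hour lifetime are arbitrary, and the HTTP_COOKIE assignment merely simulates what a browser would send back on its next request.

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new({});

# Build a cookie; CGI.pm formats the expiry date and attributes.
my $cookie = $q->cookie(
    -name    => 'team',
    -value   => 'sunny',
    -expires => '+1h',
    -path    => '/',
);

# Attach it to the response as a Set-Cookie header line.
my $header = $q->header(-type => 'text/html', -cookie => $cookie);
print $header;

# On a later request the browser returns it in HTTP_COOKIE, and
# calling cookie() with just a name reads the value back.
$ENV{HTTP_COOKIE} = 'team=sunny';
my $next = CGI->new({});
my $team = $next->cookie('team');
print "Team cookie: $team\n";
```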


    Pragmatic Concerns

    Given the security and reliability benefits, the case for CGI.pm is very strong. The module has several additional features, and two of them stand out for easing specific programming tasks.

    • Sticky Widgets
      Some CGI programs display the same form multiple times during a session. Stickiness means that selected values persist across submissions. In the employee-selection example, this means the manager could enter her details just once in certain form widgets before creating and submitting several different jobs.

      This is already possible manually: you just have to provide default values for your form widgets, or save state information on the server and pass a unique identifier back and forth with the client. CGI.pm handles it automatically if you use its form-widget-generating functions. If you say:

              print textfield(-name => 'manager');

      then when a manager parameter has been read from the request (or assigned via param()), the text field takes that value as its own. This is often handy, and it is the default behavior. (It can be disabled with the -nosticky pragma.)
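      A minimal sketch of stickiness with the object-oriented interface, where the initializer string stands in for a submitted form (the manager's name is invented). The second widget shows -override, which lets a single field ignore the submitted value and use its -default instead.

```perl
use strict;
use warnings;
use CGI;

# Pretend the form was just submitted with manager=Hannah.
my $q = CGI->new('manager=Hannah');

# The widget picks up the submitted value automatically.
my $sticky = $q->textfield(-name => 'manager');

# With -override, the -default value wins over the submitted one.
my $fresh = $q->textfield(
    -name     => 'manager',
    -default  => '',
    -override => 1,
);

print "$sticky\n$fresh\n";
```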

    • Easier debugging
      Perl's rapid development cycle is a useful feature. Instead of a compile-link-test-change loop, it's test-change. You can run a program from the command line and immediately make changes in your editor.

      Web programs can be harder to debug, especially if you lack a web server on your development box. (It's easy to install one and possible to write one, but that's not the point.) Besides that, web servers often shunt program errors to hidden or obscure log files.

      CGI.pm has several handy debugging features. First, it allows programs to run from the command line as well as under a web server, and it is smart enough to tell the difference. Instead of reading the request from standard input or from $ENV{QUERY_STRING}, it reads name=value pairs from the command line. To test the employee program with two employee names, run it as:

              ./employees.pl dev=sunny dev=kam

      The param() function will work as expected.
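      A minimal sketch of what employees.pl might contain; the script itself is not shown here, so this body is hypothetical. With no web server present, CGI->new reads name=value pairs from the command line, so the same file works both as a CGI program and as a command-line test. It uses multi_param(), the list-returning form in current CGI.pm releases.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: the developers chosen on the form.
sub assigned_developers {
    my ($q) = @_;
    return $q->multi_param('dev');
}

# Without a web server, CGI->new reads name=value pairs from @ARGV,
# e.g.  ./employees.pl dev=sunny dev=kam
my $q    = CGI->new;
my @devs = assigned_developers($q);

print $q->header('text/plain');
print @devs ? "Assigned: @devs\n" : "Nobody assigned yet.\n";
```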

      If you're running on a web server but lack access to its error log, the CGI::Carp module (included with CGI.pm) may save you time. It can intercept fatal errors and send them to the web browser, where you can read them immediately. It can do much more, but it is most often enabled with one line of code:

              use CGI::Carp qw( fatalsToBrowser );

      Everything else will work as you expect. (It's advisable to comment out this line when you've finished debugging, as it can reveal things about your setup best left hidden.)

      A final nicety of CGI.pm is the ability to save parameters to a file. That is, at any point in the program, you can save the exact data a user has submitted. This is a quick and easy way to gather data for command-line debugging or to log attempts to break your program. (It can also be used to implement user persistence, but a real database is better for that.) The necessary function is save_parameters(), and it takes a filehandle:

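              use CGI qw( :standard );

              # Write every submitted parameter to debug.txt.
              open(my $output, '>', 'debug.txt') or die "Cannot open debug.txt: $!";
              save_parameters($output);
              close($output) or die "Cannot close debug.txt: $!";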

      The resulting file can be edited in any text editor.
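      Reading such a file back is equally short: passing an open filehandle to CGI->new restores the saved parameters, which is how a captured request can be replayed offline. A minimal round-trip sketch, using a temporary file and the object-oriented save() method (the counterpart of save_parameters()); the request contents are invented.

```perl
use strict;
use warnings;
use CGI;
use File::Temp qw(tempfile);

# A request as it might arrive from a browser.
my $q = CGI->new('dev=sunny;dev=kam;manager=Hannah');

# Save it: one escaped name=value line per value, ended by a lone '='.
my ($out, $path) = tempfile(UNLINK => 1);
$q->save($out);
close $out or die "Cannot close $path: $!";

# Later, perhaps on another machine: rebuild the same request.
open my $in, '<', $path or die "Cannot open $path: $!";
my $replay = CGI->new($in);
close $in;

my @devs    = $replay->multi_param('dev');
my $manager = $replay->param('manager');
print "Replayed: @devs (manager $manager)\n";
```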


    Common Criticisms of CGI.pm

    Given the fervor with which experienced CGI.pm advocates promote the module, it's no surprise that there are popular arguments against it. By far the most common reasons not to use this module are "I didn't know it existed" and "I don't know how." Both are honest and fair (but will shortly be no excuse :). Other popular objections follow, along with rebuttals.

    It's big.
    "CGI.pm is a big module, and it does many things."

    This is true. Version 2.78 contains 6671 lines. Around half (3266 lines) are documentation. That seems excessive, compared to the common ten-liner program, but this includes HTML generation, file uploading, and the persistence mechanisms in addition to several other features not yet discussed. It's big because it does many things. It's big because it has many bugfixes and workarounds for weird servers and browsers. It's big because it's robust.

    It's bloated and slow.
    "CGI.pm is a big module and it takes forever to load and wastes resources."

    Larger modules do take longer to load and take up more memory. CGI uses some clever (or obtuse) tricks to work around this. Instead of compiling everything as it loads, the module waits until a function is first used. This way, users only pay for what they use. This does complicate things, but can be changed as necessary.
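    The deferred-compilation trick can be sketched in a few lines of plain Perl. This is a hypothetical, stripped-down illustration of the AUTOLOAD technique (the LazyDemo package and its functions are invented), not CGI.pm's actual code: function bodies are kept as strings and compiled only when first called, so a script pays only for the functions it uses.

```perl
use strict;
use warnings;

package LazyDemo;

our $AUTOLOAD;
our $compiled = 0;    # how many functions have actually been compiled

# Source code for each function, stored as text until needed.
my %SUBS = (
    shout   => 'sub shout   { return uc $_[0] }',
    whisper => 'sub whisper { return lc $_[0] }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package name
    return if $name eq 'DESTROY';       # never autoload destructors
    my $code = $SUBS{$name} or die "Undefined subroutine LazyDemo::$name\n";
    eval "package LazyDemo; $code; 1" or die $@;   # compile on first use
    $compiled++;
    no strict 'refs';
    goto &{"LazyDemo::$name"};          # re-dispatch to the new sub
}

package main;

my $loud  = LazyDemo::shout('hello');   # compiles shout() only
my $again = LazyDemo::shout('again');   # already compiled, no AUTOLOAD
```

    In long-running environments such as mod_perl or FastCGI the trade-off reverses, and CGI.pm's -compile import option (for example, use CGI qw(-compile :standard);) compiles the named functions up front instead.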

    As for speed, it's rarely a problem. Many of the programs that avoid CGI.pm have other bottlenecks, such as running external programs to do something Perl can do much faster. Besides that, CGI.pm works with technologies such as mod_perl and FastCGI, which keep the program loaded between requests and so save far more time than compiling the autoloaded code costs.

    It's too complicated.
    "It doesn't make sense, all the functions to use. Using a hash is much easier."

    There are two kinds of complexity involved here: to learn something and to use it. Any process for retrieving CGI parameters has a learning curve. With simple parameters, a hash is easy to use. For anything more complex (multi-valued fields, sticky widgets, names without values), the simple model breaks down. There are exceptions upon exceptions. Though CGI.pm has a longer learning curve, it is consistent, and much easier to use.

    If you plan to spend years using something, a few hours invested to learn it is well worth the trouble.
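    For code that genuinely wants a hash, CGI.pm offers one without losing data: Vars() returns the parameters as a hash, packing multiple values for the same name into one string separated by the null character (\0). A minimal sketch (the request string is invented):

```perl
use strict;
use warnings;
use CGI;

my $q = CGI->new('dev=sunny;dev=kam;manager=Hannah');

# A plain hash copy of the parameters.
my %param = $q->Vars;

my $manager    = $param{manager};           # single value: 'Hannah'
my @developers = split /\0/, $param{dev};   # multiple values, unpacked
print "Manager $manager assigned: @developers\n";
```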

    I don't use anything I can't understand.
    "I don't want to use anything I couldn't write myself."

    This is false hubris. Though it's a good learning experience to create your own tools, you're denying yourself the benefit of well-designed, well-debugged, and well-tested code. The kinds of people who make this claim often make the same mistakes, rarely learning from them, and almost never improve their code. Somehow, they don't extend this argument to Perl itself.

    Good programmers could reinvent the wheel (and some do, to great effect), but they know it's usually better to build on the work of others.


    Final Thoughts

    Some tools are clearly the best of their breed. If you don't have a hammer, a big rock might do. Given the choice, the hammer is obviously better. Similarly, it's possible to write CGI programs in Perl without CGI.pm, but good practice recommends against it.

    In a future article, we'll explain how to use the module for common tasks. It's easy.

     
     

