Date Published: 2002-09-14
CGI stands for "Common Gateway Interface". It is the interface between a web server and the programs it runs on behalf of a browser, which is what allows a browser to submit a dynamic request. CGI is not a language; specifically, it is not Perl. Most languages have some means of dealing with CGI requests, whether it be Perl, C, C++, sh, Tcl, Java, or Python. The generally accepted term for a program that deals with CGI is "CGI program" or "CGI script" -- boasting of having written "a CGI" is a faux pas (unless, of course, you rewrote the protocol).
Depending on your server, CGI programs do not have to end in
.cgi; in fact, they don't need an extension at all. If your
server allows it, a CGI program can even have the extension .html, or
.jpg, or any other extension you wish. Servers like Apache can be easily configured to
run CGI programs on a per-extension or per-directory basis. CGI programs need to
be executable by the user that the HTTP server runs as when it handles a request; on many
systems, this user is called "nobody", and for security reasons has minimal
read, write, and execute permissions. If a CGI program is written in an
interpreted language such as Perl or Python or sh, it must also be readable by
this user. In Unix terms, the standard permissions mode for a CGI program is
0755, which means that the owner of the program (you, presumably) can
read, write, and execute the file; that other users in your group can read and
execute it; and that anyone else (the "nobody" user falls into this category)
can read and execute it.
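As a minimal sketch of the permissions described above, the following Python snippet sets mode 0755 on a file and reports the resulting permission string. The helper name make_cgi_executable is hypothetical; it assumes a Unix-like system and uses only os.chmod and the stat module from the standard library.

```python
import os
import stat


def make_cgi_executable(path):
    """Set mode 0755 on path and return its permission string."""
    # 0o755: owner may read/write/execute; group and others may read/execute
    os.chmod(path, 0o755)
    mode = os.stat(path).st_mode
    # stat.filemode renders the mode the way "ls -l" would
    return stat.filemode(mode)
```

Called on a regular file, the function should return a string in the same form that "ls -l" prints, which makes it easy to confirm that the "nobody" user will be able to read and execute the program.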
1.1 HTTP - Hypertext Transfer Protocol
When a CGI program is called from
the browser, an HTTP request is made to the server, and the server sends
the needed information to the program, by way of the environment and
perhaps the command line, or by sending data to the program's standard input.
The environment is determined from a list of HTTP headers, which precede the
content of the request (if any). The program returns an HTTP response,
which contains HTTP headers again, and then some optional content. This process
is done with all requests to the server, not just to CGI programs -- an
HTTP daemon, or server, catches all URL requests, and deals with them as
needed.
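To make the hand-off from server to program concrete, here is a rough Python sketch of a CGI program collecting its input. The function name read_request is hypothetical, and the environment variable names REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH come from the CGI specification rather than from this tutorial. It assumes the content is text and that CONTENT_LENGTH, when present, is a number.

```python
import os
import sys


def read_request(environ=os.environ, stdin=sys.stdin):
    """Collect what the server handed to a CGI program."""
    # The server describes the request through environment variables
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # CONTENT_LENGTH tells us how much content to read from standard input
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length > 0 else ""
    return {"method": method, "query": query, "body": body}
```

Passing the environment and input stream as parameters is only a convenience for trying the function outside a server; a real CGI program would normally rely on the defaults.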
Understanding HTTP headers is integral to writing CGI programs.
A program that doesn't return headers, or returns invalid headers, will break
and give the user the infamous "500 Internal Server Error" message. There is
only one set of HTTP headers that can be returned for a response, and after the
headers have been ended (with a blank line), you can't write more. A common
problem I have come across is people trying to set cookies (which requires the
"Set-cookie" header) after they've printed their headers and blank line.
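The cookie mistake described above can be illustrated with a small Python sketch. The Response class here is hypothetical and is not part of any library; it simply refuses to accept a header once the blank line has been written, which is the rule the server and browser enforce in practice. The cookie value seen=1 is a made-up example.

```python
class Response:
    """Collect headers first, then content, separated by a blank line."""

    def __init__(self):
        self.lines = []
        self.headers_done = False

    def header(self, name, value):
        # Once the blank line is out, no more headers can be sent
        if self.headers_done:
            raise RuntimeError("headers already ended; cannot add " + name)
        self.lines.append(name + ": " + value)

    def write(self, text):
        if not self.headers_done:
            self.lines.append("")  # the blank line that ends the headers
            self.headers_done = True
        self.lines.append(text)

    def render(self):
        return "\n".join(self.lines)


response = Response()
response.header("Content-type", "text/html")
response.header("Set-cookie", "seen=1")  # must come before any content
response.write("<p>Hello</p>")
print(response.render())
```

Calling response.header(...) again after write() raises an error in this sketch, whereas a real CGI program would silently send the late "header" as part of the content.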
1.2 HTTP Headers
Both requests and responses have two sections -- the
headers and the content. The content is optional in some cases, but the headers
aren't (unless your HTTP server knows how to deal with such a case). Between
these two sections, there's a blank line. After that line has been printed,
there's no going back. (This is a small lie -- you can create a document that
has multiple parts, but that is beyond the scope of this tutorial.)
An HTTP request
    Header: value
    Other-header: value
    [blank line]
    Content can be 0 or many lines
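The layout above can be taken apart with a few lines of Python. This is only a sketch: split_message is a hypothetical helper, it assumes lines end in a plain newline (real HTTP messages use a carriage return plus newline), and it ignores repeated headers.

```python
def split_message(raw):
    """Split a raw message into (headers dict, content string)."""
    # Everything before the first blank line is headers
    head, _, content = raw.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        # Each header line has the form "Name: value"
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, content
```

For the message shown above, the first element of the result would hold the two headers and the second would hold whatever content follows the blank line, possibly an empty string.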
The most common headers that you have to return to the browser yourself are
Content-type, Location, Status, and
Set-cookie. Here is a brief description of how each is used:
- Content-type
- The MIME type of the content displayed. Most MIME types are easy to
understand and come up with:
text/html -- HTML content
text/plain -- plain text
image/jpeg -- JPEG image data
audio/mpeg -- MP3 audio data
- Location
- Where to redirect the browser. This is often used with a Status
code of 301, 303, or 307.
http://www.other.com/ -- Absolute URL
/otherpage.html -- Relative URL (full path)
../blank.jpg -- Relative URL (relative path)
- Set-cookie
- Creates an HTTP cookie (client-willing).
(Cookies are complex, so there
will be a section devoted to them)
- Status
- A 3-digit response number and string describing the type of response. You
only need to use this if you aren't planning on sending a 200
response.
200 OK -- request succeeded
204 No Content -- like 200, but no content returned
307 Temporary Redirect -- passes the request on to another location
504 Gateway Timeout -- request took too long
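As an illustration of how Status and Location work together, here is a short Python sketch that builds the header block for a redirect. The function name redirect and the REASONS table are hypothetical; the reason phrases are the standard ones for the three redirect codes mentioned under Location.

```python
# Standard reason phrases for the redirect codes mentioned above
REASONS = {
    301: "Moved Permanently",
    303: "See Other",
    307: "Temporary Redirect",
}


def redirect(url, code=307):
    """Return the header block for a redirect, ending with a blank line."""
    status = str(code) + " " + REASONS[code]
    # No content follows, but the blank line must still end the headers
    return "Status: " + status + "\n" + "Location: " + url + "\n\n"
```

A CGI program that only redirects could print the returned string and exit, since no content is required after the blank line.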
1.3 Requests and Responses
The next section will discuss one specific type of request, the CGI query. I'll show
how to parse such queries using module-driven code. Responses will be discussed in a later
section, along with generating requests.
Jeff "japhy" Pinyan
has 2.5 years of Perl experience, has written 3 CPAN modules, and has written for TPJ. He is
currently writing a Perl book entitled "The Art of Perl: Elegant
Perl Style".