Date Published: 2002-09-14
CGI queries generally come in two types, GET and POST.
2.1. GET
In a GET query, the content is visible in the URL. It is sent either
as a +-delimited list of keywords:
Example 2.1: A keyword query
http://www.server.com/cgi-bin/prog?Perl+CGI+help
or
as a string of key-value pairs, where pairs are separated by an &
or ; character, and the key is separated from the value by an
= character:
Example 2.2: A key-value query
http://www.server.com/cgi-bin/prog?topic=Perl%26CGI&demand=help%21
or
http://www.server.com/cgi-bin/prog?topic=Perl%26CGI;demand=help%21
Either form can be typed directly into a browser's address bar, so a GET
query can be repeated at any time by entering the same URL again. The
environment variable REQUEST_METHOD is set to "GET", and the query data
is held in the environment variable QUERY_STRING. The MIME type for a GET
query is application/x-www-form-urlencoded (which I will abbreviate as
a/xwfu) [1].
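As a rough illustration of what a program does with QUERY_STRING, here is a minimal Python sketch (the function name parse_query is hypothetical) that splits a key-value query on either & or ; and decodes each key and value with the standard library's urllib.parse.unquote_plus. It assumes it runs as a CGI program, so the server has set REQUEST_METHOD and QUERY_STRING.

```python
import os
from urllib.parse import unquote_plus


def parse_query(query_string):
    """Split an a/xwfu query string into (key, value) pairs.

    Pairs may be separated by either & or ; (hypothetical helper).
    Each key and value is decoded: + becomes a space, %XX a character.
    """
    pairs = []
    # A raw ; can only be a separator (a literal ; would arrive as %3B),
    # so it is safe to treat ; and & alike.
    for chunk in query_string.replace(";", "&").split("&"):
        if not chunk:
            continue  # skip empty chunks, e.g. from "a=1&&b=2"
        key, _, value = chunk.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


if __name__ == "__main__":
    # Under a web server, these environment variables are set for us.
    if os.environ.get("REQUEST_METHOD") == "GET":
        print("Content-Type: text/plain")
        print()
        print(parse_query(os.environ.get("QUERY_STRING", "")))
```

Applied to the query in Example 2.2, either separator yields the pairs ("topic", "Perl&CGI") and ("demand", "help!").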
Within a field name or value in the query string, the following
characters may be left raw, that is, not encoded:
Table 2.1: Key/value safe characters
a-z A-Z 0-9
. _ - ! ~ * ' ( )
Your mileage may vary, since software differs in which of these
characters it leaves raw, so for simplicity you might find it best to
encode every character that is not an alphanumeric or one of . _ -. To
encode a character, write its ASCII value as two hexadecimal digits and
precede them with a % character; for example, & (ASCII 38, hexadecimal
26) becomes %26. The space character is a special case: it is normally
encoded as +, but a decoder should accept either + or %20. In Example
2.2, the GET query's values, "Perl%26CGI" and "help%21", decode to
"Perl&CGI" and "help!". The query string is appended to the URL,
preceded by a ? character (see Examples 2.1 and 2.2).
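The encoding rules above can be sketched by hand in Python so that each rule is visible. This sketch assumes the conservative safe set (alphanumerics and . _ -), and, as an assumption beyond the ASCII-only discussion here, it encodes non-ASCII text as UTF-8 bytes. The names url_encode and url_decode are hypothetical.

```python
# Conservative safe set: alphanumerics plus . _ -
SAFE = set("abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "0123456789._-")
HEX = set("0123456789abcdefABCDEF")


def url_encode(text):
    """Encode text for use as a query-string field name or value."""
    out = []
    # Assumption: non-ASCII characters are encoded as their UTF-8 bytes.
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        elif ch == " ":
            out.append("+")  # space is normally encoded as +
        else:
            out.append("%%%02X" % byte)  # e.g. & -> %26
    return "".join(out)


def url_decode(text):
    """Reverse url_encode, accepting a space as either + or %20."""
    raw = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            raw.append(0x20)
            i += 1
        elif (ch == "%" and i + 2 < len(text)
              and text[i + 1] in HEX and text[i + 2] in HEX):
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Anything else, including a stray %, is kept as-is.
            raw.extend(ch.encode("utf-8"))
            i += 1
    return raw.decode("utf-8", errors="replace")
```

With these helpers, url_encode("Perl&CGI") produces "Perl%26CGI", and url_decode turns "help%21" back into "help!".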
Since =, &, and ; are used in the query string to separate field names
from values and pairs from one another, it is imperative that these
characters be encoded when they appear in a field name or value.
GET queries cannot be used to upload files.
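To see why the separators must be encoded, this Python sketch builds the query from Example 2.2 twice: once with the standard library's urllib.parse.quote_plus applied to every key and value, and once with no encoding at all. The helper name build_query is hypothetical.

```python
from urllib.parse import quote_plus


def build_query(pairs, separator="&"):
    """Join (key, value) pairs into a query string, encoding each part."""
    # quote_plus encodes =, & and ; (and turns spaces into +).
    return separator.join(
        quote_plus(key) + "=" + quote_plus(value) for key, value in pairs
    )


pairs = [("topic", "Perl&CGI"), ("demand", "help!")]
good = build_query(pairs)
bad = "&".join(key + "=" + value for key, value in pairs)  # no encoding

print(good)  # topic=Perl%26CGI&demand=help%21
print(bad)   # topic=Perl&CGI&demand=help!
# Splitting the unencoded string on & gives three chunks instead of two,
# so "CGI" is misread as a field name with no value.
print(bad.split("&"))
```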
2.1.1. ISINDEX
If a GET query does not contain an = character, then the program's
argument list is populated with the keywords from the query. This is
called an ISINDEX query because the same effect can be achieved with the
<ISINDEX> tag in an HTML document. In the query string, the keywords are
separated by the + character. The argument list holds the decoded
keywords, with certain special characters escaped in the shell manner,
by placing a \ in front of the character. Some systems allow the &
character to be used in an ISINDEX query as a separator between
keywords. The QUERY_STRING will still have the & encoded as %26.
Some servers treat a run of several + or & characters as a single
separator; that is, they never produce empty arguments.
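The keyword splitting a server performs for an ISINDEX query can be sketched in Python as follows. The function name isindex_keywords and its collapse flag are hypothetical. The flag imitates servers that drop empty arguments, and the sketch does not reproduce the shell-style \ escaping described above.

```python
from urllib.parse import unquote


def isindex_keywords(query_string, collapse=True):
    """Split an ISINDEX-style query (one with no =) into decoded keywords.

    collapse=True imitates servers that treat a run of + characters as a
    single separator, so no empty keywords are produced. Shell-style
    backslash escaping of special characters is NOT reproduced here.
    """
    if "=" in query_string:
        return None  # a key-value query, not an ISINDEX query
    # Split on the raw + first, then decode; a literal + arrives as %2B.
    words = [unquote(word) for word in query_string.split("+")]
    if collapse:
        words = [word for word in words if word]
    return words
```

For Example 2.1, isindex_keywords("Perl+CGI+help") yields the three keywords "Perl", "CGI", and "help".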
2.2. POST
POST queries are sent through a socket connection, and the program reads
them from standard input. The number of bytes to be read from standard
input is held in the CONTENT_LENGTH environment variable. A POST sends a
great deal more than just the query data: the full set of HTTP headers
is sent as well, including Content-length and Content-type, whose values
the server passes to the program as CONTENT_LENGTH and CONTENT_TYPE,
among other variables.
Example 2.3: A complete POST query via
multipart/form-data (Lynx)
POST / HTTP/1.0
Host: www.crusoe.net:5005
Accept: text/html, text/plain, audio/mod, image/*, [SNIPPED]
Accept-Encoding: gzip, compress
Accept-Language: en
Pragma: no-cache
Cache-Control: no-cache
User-Agent: Lynx/2.8.3dev.18 libwww-FM/2.14
Referer: http://www.crusoe.net/~jeffp/tmp/foo.cgi
Content-type: multipart/form-data; boundary=xnyLAaB03X
Content-length: 246

--xnyLAaB03X
Content-Disposition: form-data; name=feature
Content-Type: text/plain; charset=iso-8859-1

on
--xnyLAaB03X
Content-Disposition: form-data; name=other
Content-Type: text/plain; charset=iso-8859-1

what's up?
--xnyLAaB03X--
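On the program's side, a POST body has to be read from standard input, taking exactly CONTENT_LENGTH bytes. The Python sketch below (read_post_body is a hypothetical name) loops because a single read may return fewer bytes than requested. When it is run outside a web server, it simulates a small a/xwfu POST with an in-memory stream so that it can be tried from the command line.

```python
import io
import os
import sys


def read_post_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of POST data from stream."""
    if environ.get("REQUEST_METHOD") != "POST":
        return b""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = b""
    # One read() call may return fewer bytes than asked for, so loop.
    while len(data) < length:
        chunk = stream.read(length - len(data))
        if not chunk:
            break  # the client closed the connection early
        data += chunk
    return data


if __name__ == "__main__":
    if os.environ.get("REQUEST_METHOD") == "POST":
        body = read_post_body(os.environ, sys.stdin.buffer)
    else:
        # Outside a web server: simulate a small a/xwfu POST.
        fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "10"}
        body = read_post_body(fake_env, io.BytesIO(b"feature=on"))
    print("Content-Type: text/plain")
    print()
    print(body)
```

Reading by CONTENT_LENGTH rather than until end-of-file avoids waiting on a connection the client may hold open.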
If a POST query is
sent via a/xwfu encoding, the encoded query looks like the query string
in a GET query. The length of the encoded data is held in
CONTENT_LENGTH. A POST query also sets CONTENT_TYPE,
which, in this case, would be "application/x-www-form-urlencoded".
Example 2.4: A POST query via
application/x-www-form-urlencoded
feature=on&other=what%27s%20up%3F
If a POST query is encoded via multipart/form-data (which I will
abbreviate as m/fd), the data on standard input is considerably longer.
This encoding type is used primarily for file uploads. The browser
chooses a boundary string to place between the query data, whether each
datum is a key-value pair or the name and content of a file being
uploaded. The boundary can be retrieved from the boundary parameter of
the CONTENT_TYPE environment variable (boundary=xnyLAaB03X in Example
2.3).
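Pulling the boundary out of CONTENT_TYPE takes only a little string handling. This Python sketch uses a hypothetical get_boundary helper and assumes a simple "type; name=value" parameter layout, with the boundary value optionally enclosed in double quotes.

```python
def get_boundary(content_type):
    """Return the boundary parameter of a multipart/form-data CONTENT_TYPE.

    Returns None if the type is not multipart/form-data or if no
    boundary parameter is present (hypothetical helper).
    """
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "multipart/form-data":
        return None
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "boundary":
            return value.strip('"')  # the boundary may be quoted
    return None
```

For the Lynx request in Example 2.3, get_boundary("multipart/form-data; boundary=xnyLAaB03X") returns "xnyLAaB03X".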
Example 2.5: A POST query via multipart/form-data
(Netscape)
-----------------------------171631639015018
Content-Disposition: form-data; name="feature"

on
-----------------------------171631639015018
Content-Disposition: form-data; name="other"

what's up?
-----------------------------171631639015018--
Each datum has its own mini-set of headers, and the most important of
these is the Content-Disposition header. It gives the disposition type
(form-data) followed by the name of the query field, such as "feature"
or "other"; for an uploaded file it also carries the file's name in a
filename parameter.
The boundary is an arbitrary string of characters. Each time it appears
in the query it is preceded by two hyphens (--), and its final
appearance, at the very end of the query, is also followed by two
hyphens. The boundary for the previous query was
"---------------------------171631639015018". The same POST
query made using the Lynx browser produces:
Example 2.6: A POST query via multipart/form-data (Lynx)
--xnyLAaB03X
Content-Disposition: form-data; name=feature
Content-Type: text/plain; charset=iso-8859-1

on
--xnyLAaB03X
Content-Disposition: form-data; name=other
Content-Type: text/plain; charset=iso-8859-1

what's up?
--xnyLAaB03X--
As you can see, different browsers send different sets of headers (Lynx
adds a Content-Type header to each part, while Netscape does not), and
they do not use the same boundary-naming scheme. Note that the field
values are not URL-encoded; they are sent as plain text.
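The multipart layout in Examples 2.5 and 2.6 can be split apart with a short Python sketch. It assumes well-formed input with CRLF line endings, as HTTP requires, and it ignores complications such as nested multipart parts. The function name parse_multipart is hypothetical. Values are left as bytes because an uploaded file need not be text, and the name regex accepts both Netscape's quoted form and Lynx's unquoted form.

```python
import re


def parse_multipart(body, boundary):
    """Split a multipart/form-data body into (name, value) pairs.

    body is the raw bytes read from standard input; boundary comes from
    the boundary parameter of CONTENT_TYPE. Each part begins with
    "--" + boundary, and the body ends with "--" + boundary + "--".
    """
    delimiter = b"--" + boundary.encode("ascii")
    fields = []
    for part in body.split(delimiter)[1:]:
        if part.startswith(b"--"):
            break  # closing delimiter: end of the query
        # Each part holds its own headers, a blank line, then the value.
        head, _, value = part.partition(b"\r\n\r\n")
        if value.endswith(b"\r\n"):
            value = value[:-2]  # CRLF that precedes the next delimiter
        name = None
        for line in head.decode("latin-1").split("\r\n"):
            if line.lower().startswith("content-disposition:"):
                # Netscape sends name="feature"; Lynx sends name=feature.
                # \b keeps "filename=" from matching as "name=".
                match = re.search(r'\bname="?([^";]*)"?', line)
                if match:
                    name = match.group(1)
        if name is not None:
            fields.append((name, value))
    return fields
```

Fed the Lynx body from Example 2.6 with boundary "xnyLAaB03X", this sketch returns ("feature", b"on") and ("other", b"what's up?"). The Netscape body from Example 2.5 gives the same pairs.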
References
1. RFC 1738, section 2.2 - URI encoding (ftp://ftp.isi.edu/in-notes/rfc1738.txt)
2. RFC 1867 - file uploads (ftp://ftp.isi.edu/in-notes/rfc1867.txt)
Jeff "japhy" Pinyan has 2.5 years of Perl experience, has written 3 CPAN
modules, and has written for TPJ. He is currently writing a Perl book
entitled "The Art of Perl: Elegant Perl Style".