    Demystifying CGI: Part 2. Types of Queries

    Date Published: 2002-09-14

    CGI queries generally come in two types, GET and POST.

    2.1. GET

    In a GET query, the content is visible in the URL. It is either sent as a + delimited list of keywords:

    Example 2.1: A keyword query
    http://www.server.com/cgi-bin/prog?Perl+CGI+help

    or as a string of key-value pairs, where pairs are separated by an & or ; character, and the key is separated from the value by an = character:

    Example 2.2: A key-value query
    http://www.server.com/cgi-bin/prog?topic=Perl%26CGI&demand=help%21
    or
    http://www.server.com/cgi-bin/prog?topic=Perl%26CGI;demand=help%21

    Either form can be hand-entered into the URL. GET queries can be recalled at any time by entering the query into the URL. The environment variable REQUEST_METHOD is set to "GET".

    The query data is in the environment variable QUERY_STRING. The MIME type for a GET query is application/x-www-form-urlencoded (which I will abbreviate as a/xwfu) [1]. Inside the key or value elements of the query string, the following characters may be left raw -- that is, not encoded:

    Table 2.1: Key/value safe characters
    a-z A-Z 0-9
    . _ - ! ~ * ' ( )

    Your mileage may vary, and you might find it best to encode all characters that are not alphanumerics or . _ - for simplicity. A character is encoded by taking its ASCII value as two hexadecimal digits and preceding it with a % character. The space character is a special case; it is normally encoded as +, but can be decoded from + or %20. In Example 2.2, the GET query's values, "Perl%26CGI" and "help%21", are decoded to "Perl&CGI" and "help!". The query string is appended to a URL, with a ? preceding the string (see Examples 2.1 and 2.2).
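
    The encoding rule above can be sketched in a few lines. This is only an illustration, written in Python for brevity; the names encode_component and decode_component are hypothetical, and it assumes every character fits in one byte (ISO-8859-1), which the article does not state.

```python
import string

# Characters Table 2.1 allows to stay raw inside a key or value.
SAFE = set(string.ascii_letters + string.digits + "._-!~*'()")
HEX = set(string.hexdigits)


def encode_component(text):
    """Encode one key or value: space becomes '+', unsafe bytes become %XX."""
    out = []
    for byte in text.encode("latin-1"):  # assumption: one byte per character
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        elif ch == " ":
            out.append("+")
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def decode_component(text):
    """Reverse the encoding: '+' becomes a space, %XX becomes that byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        escape = text[i + 1:i + 3]
        if text[i] == "+":
            out.append(0x20)
            i += 1
        elif text[i] == "%" and len(escape) == 2 and set(escape) <= HEX:
            out.append(int(escape, 16))
            i += 3
        else:
            # Raw character (or a stray '%'); keep it as-is.
            out.extend(text[i].encode("latin-1"))
            i += 1
    return out.decode("latin-1")
```

    Note that ! is in the safe list, so this encoder leaves "help!" alone, while the decoder still accepts the "help%21" form used in Example 2.2.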

    Since =, &, and ; are used in the query string to denote the separation between field names and values, it is imperative these be encoded if they are in the field name or value.

    GET queries are incapable of file upload.
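
    Putting the pieces of this section together, a program handling a key-value GET query might look roughly like the sketch below. It is written in Python for illustration only; parse_get_query is a hypothetical name, and the environ argument stands in for the process environment.

```python
import re
from urllib.parse import unquote_plus


def parse_get_query(environ):
    """Return a list of (key, value) pairs from a GET query's QUERY_STRING."""
    if environ.get("REQUEST_METHOD") != "GET":
        raise ValueError("not a GET query")
    query = environ.get("QUERY_STRING", "")
    pairs = []
    # Split on the raw separators first; decoding comes afterwards, so an
    # encoded %26 or %3B inside a value can never be mistaken for one.
    for chunk in re.split(r"[&;]", query):
        if not chunk:
            continue
        key, _, value = chunk.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs
```

    In a real CGI program the argument would be os.environ. A list of pairs is returned rather than a dictionary so that repeated keys are not lost.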

    2.1.1. ISINDEX

    If a GET query does not have an = in it, then the program's argument list is populated with the keywords from the query. This is called an ISINDEX query because the effect can be achieved by using the <ISINDEX> tag in an HTML document. The keywords are separated by the + character in the query string. The argument list will have the decoded keywords, with certain special characters escaped via the shell method of placing a \ in front of the character. Some systems allow the & character to be used in an ISINDEX query as a separator between keywords. The QUERY_STRING will still have the & encoded as %26.

    Some servers treat multiple + or & characters as one -- that is, there aren't any blank arguments.
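
    The keyword splitting described above can be approximated as follows. This is a sketch in Python with a hypothetical function name; it mimics servers that collapse repeated + characters, and it leaves out the shell-style backslash escaping that the server applies to the real argument list.

```python
from urllib.parse import unquote


def isindex_keywords(query_string):
    """Return the decoded keywords, or None for a key-value query."""
    if "=" in query_string:
        return None  # an = means this is not an ISINDEX query
    # Split on the raw '+' first, then decode each keyword; empty pieces
    # produced by repeated '+' characters are dropped.
    return [unquote(piece) for piece in query_string.split("+") if piece]
```

    For the query in Example 2.1 this yields the three keywords Perl, CGI and help.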

    2.2. POST

    POST queries are sent through a socket connection, and are read from standard input. The number of bytes to be read from standard input is held in the CONTENT_LENGTH environment variable. Because a POST is a socket connection, a great deal more than just the query data is sent. The full set of HTTP headers is sent as well, including Content-Length and Content-Type, which the server makes available to the program as the CONTENT_LENGTH and CONTENT_TYPE environment variables, among others.
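
    Reading the POST data therefore comes down to reading exactly CONTENT_LENGTH bytes. A minimal sketch, in Python for illustration; read_post_body is a hypothetical name, and the stream argument stands in for standard input opened in binary mode.

```python
import io


def read_post_body(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes of POST data from a binary stream."""
    if environ.get("REQUEST_METHOD") != "POST":
        raise ValueError("not a POST query")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length)
    if len(body) != length:
        raise ValueError("body is shorter than CONTENT_LENGTH")
    return body


# Demo: an in-memory stream plays the role of standard input.
demo_body = b"feature=on&other=what%27s%20up%3F"
demo_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(demo_body))}
print(read_post_body(demo_env, io.BytesIO(demo_body)))
```

    In a real CGI program the arguments would be os.environ and sys.stdin.buffer. Reading by length, rather than until end-of-file, matters because the server is not required to close standard input after the data.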

    Example 2.3: A complete POST query via multipart/form-data (Lynx)
    POST / HTTP/1.0
    Host: www.crusoe.net:5005
    Accept: text/html, text/plain, audio/mod, image/*, [SNIPPED]
    Accept-Encoding: gzip, compress
    Accept-Language: en
    Pragma: no-cache
    Cache-Control: no-cache
    User-Agent: Lynx/2.8.3dev.18 libwww-FM/2.14
    Referer: http://www.crusoe.net/~jeffp/tmp/foo.cgi
    Content-type: multipart/form-data; boundary=xnyLAaB03X
    Content-length: 246
    
    --xnyLAaB03X
    Content-Disposition: form-data; name=feature
    Content-Type: text/plain; charset=iso-8859-1
    
    on
    --xnyLAaB03X
    Content-Disposition: form-data; name=other
    Content-Type: text/plain; charset=iso-8859-1
    
    what's up?
    --xnyLAaB03X--

    If a POST query is sent via a/xwfu encoding, the encoded query looks like the query string in a GET query. The length of the encoded data is held in CONTENT_LENGTH. A POST query also sets CONTENT_TYPE, which, in this case, would be "application/x-www-form-urlencoded".

    Example 2.4: A POST query via application/x-www-form-urlencoded
    feature=on&other=what%27s%20up%3F
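
    Since the body uses the same encoding as a GET query string, it can be decoded the same way. The sketch below uses parse_qs from Python's standard library; decode_urlencoded_body is a hypothetical wrapper, and treating the body as ASCII is an assumption.

```python
from urllib.parse import parse_qs


def decode_urlencoded_body(body, content_type):
    """Decode an a/xwfu POST body (bytes) into {key: [values]}."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/x-www-form-urlencoded":
        raise ValueError("unexpected CONTENT_TYPE: " + content_type)
    # keep_blank_values keeps fields that were submitted empty.
    return parse_qs(body.decode("ascii"), keep_blank_values=True)
```

    Applied to Example 2.4, this gives the value "on" for feature and "what's up?" for other. Each value is a list because a key may appear more than once.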

    If a POST query is encoded via multipart/form-data (which I will abbreviate as m/fd), the data on standard input is considerably longer. This encoding type is used primarily for file uploading. The browser will determine a boundary to place between each query datum, whether it be a key-value pair or a filename to be uploaded and the content of the file. The boundary can be retrieved from the CONTENT_TYPE environment variable.
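
    Pulling the boundary out of CONTENT_TYPE is a matter of parsing the header's parameters. One way to sketch it, using the header parser in Python's standard email package (multipart_boundary is a hypothetical name):

```python
from email.message import Message


def multipart_boundary(content_type):
    """Return the boundary from a CONTENT_TYPE value, or None if not m/fd."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "multipart/form-data":
        return None
    # get_param also strips surrounding quotes if the browser added them.
    return msg.get_param("boundary")
```

    For the Content-type header in Example 2.3 this returns "xnyLAaB03X".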

    Example 2.5: A POST query via multipart/form-data (Netscape)
    -----------------------------171631639015018
    Content-Disposition: form-data; name="feature"
    
    on
    -----------------------------171631639015018
    Content-Disposition: form-data; name="other"
    
    what's up?
    -----------------------------171631639015018--

    Each datum has its own mini-set of HTTP headers, and the header of most importance is the Content-Disposition header. It specifies the disposition type of the datum (form-data), and then the name of the query field, such as "feature" or "other".

    The boundary is a group of characters. Each time it appears it is preceded by two hyphens (--), and at the very end of the query it is also followed by two hyphens. The boundary for the previous query was "---------------------------171631639015018". The same POST query made using the Lynx browser produces:

    Example 2.6: A POST query via multipart/form-data (Lynx)
    --xnyLAaB03X
    Content-Disposition: form-data; name=feature
    Content-Type: text/plain; charset=iso-8859-1
    
    on
    --xnyLAaB03X
    Content-Disposition: form-data; name=other
    Content-Type: text/plain; charset=iso-8859-1
    
    what's up?
    --xnyLAaB03X--

    As you can see, different browsers send different numbers of headers, and the boundary-naming scheme is not the same. Note that the value for the field is not encoded, but is plain text.
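
    Both browsers' output can be handled by the same splitting logic. The following is a simplified sketch in Python, not a full RFC 1867 parser: parse_multipart is a hypothetical name, it handles plain text fields only (no file uploads), it normalizes line endings, and it assumes ISO-8859-1 values as in the Lynx example.

```python
import re

# Finds the field name in a Content-Disposition header, quoted or not.
NAME_RE = re.compile(
    rb'^content-disposition:[^\n]*?\bname="?([^";\n]+)',
    re.IGNORECASE | re.MULTILINE,
)


def parse_multipart(body, boundary):
    """Return {field name: value} from a multipart/form-data body (bytes)."""
    delimiter = b"--" + boundary.encode("ascii")
    fields = {}
    # Whatever precedes the first delimiter is not a datum; skip it.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # boundary followed by two hyphens: end of the query
        chunk = chunk.replace(b"\r\n", b"\n")  # simplification: text only
        # Drop the line break after the delimiter and the one before the next.
        if chunk.startswith(b"\n"):
            chunk = chunk[1:]
        if chunk.endswith(b"\n"):
            chunk = chunk[:-1]
        # A blank line separates the datum's headers from its value.
        headers, _, value = chunk.partition(b"\n\n")
        match = NAME_RE.search(headers)
        if match:
            name = match.group(1).decode("latin-1")
            fields[name] = value.decode("latin-1")
    return fields
```

    Given the bodies in Examples 2.5 and 2.6 and their respective boundaries, both calls produce the same two fields, feature and other, even though Lynx sends an extra Content-Type header per datum and leaves the field names unquoted.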


    References

    1. RFC 1738, section 2.2 - URI encoding (ftp://ftp.isi.edu/in-notes/rfc1738.txt)
    2. RFC 1867 - file uploads (ftp://ftp.isi.edu/in-notes/rfc1867.txt)
     


    Jeff "japhy" Pinyan has 2.5 years of Perl experience, has written 3 CPAN modules, and has written for TPJ. He is currently writing a Perl book entitled "The Art of Perl: Elegant Perl Style".

     
     

