HTTP::Promise::Parser - Fast HTTP Request & Response Parser
    use HTTP::Promise::Parser;
    my $p = HTTP::Promise::Parser->new || die( HTTP::Promise::Parser->error, "\n" );
    # Parse from a file path
    my $ent = $p->parse( '/some/where/http_request.txt' ) || die( $p->error );
    # or from an opened filehandle
    $ent = $p->parse( $file_handle ) || die( $p->error );
    # or from a reference to a string holding the raw HTTP message
    $ent = $p->parse( \$string ) || die( $p->error );
v0.1.0
This is an HTTP request and response parser that uses XS modules whenever possible for speed, and that is mindful of memory consumption.
As rfc7230 states in its section 3:
"The normal procedure for parsing an HTTP message is to read the start-line into a structure, read each header field into a hash table by field name until the empty line, and then use the parsed data to determine if a message body is expected. If a message body has been indicated, then it is read as a stream until an amount of octets equal to the message body length is read or the connection is closed."
Thus, the approach of HTTP::Promise is to read the data, whether an HTTP request or response (a.k.a. an HTTP message), from a filehandle, possibly chunked. It first reads and parses the message headers, then stores the message body in memory if it is under a specified threshold, or in a file otherwise. If the size is unknown, the body is first read into memory and automatically switched to a file when it reaches the threshold.
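The memory-then-file strategy described above can be sketched in plain Perl. This is a minimal illustration under stated assumptions, not the module's actual code. The helper name store_body, the 512-byte chunk size and the use of File::Temp are all hypothetical choices made for this example.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (not part of HTTP::Promise): reads $fh in 512-byte
# chunks and keeps the data in memory while it stays under $max bytes.
# Once the threshold is exceeded, everything is spilled to a temporary file.
# Returns either { data => $scalar } or { file => $path }.
sub store_body
{
    my( $fh, $max ) = @_;
    my $buf = '';
    my $tmp;
    while( read( $fh, my $chunk, 512 ) )
    {
        if( defined( $tmp ) )
        {
            # Already switched to disk: append directly to the file
            print( {$tmp} $chunk ) or die( "Write failed: $!" );
            next;
        }
        $buf .= $chunk;
        if( length( $buf ) > $max )
        {
            # Threshold exceeded: move what we have so far to a file
            $tmp = File::Temp->new( UNLINK => 0 );
            binmode( $tmp, ':raw' );
            print( {$tmp} $buf ) or die( "Write failed: $!" );
            $buf = '';
        }
    }
    if( defined( $tmp ) )
    {
        close( $tmp ) or die( "Close failed: $!" );
        return( +{ file => $tmp->filename } );
    }
    return( +{ data => $buf } );
}
```

Reading in fixed-size chunks rather than lines keeps memory bounded, even for bodies that contain no newline at all.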
Once the overall message body is stored, and if it is of a multipart type, this class reads each of its parts into memory or into a separate file, depending on its size, until there are no more parts. It does so using the stream reader, which reads in chunks of bytes rather than in lines. If the message body is a single part, it is saved to memory or to a file depending on its size. Each part saved to a file uses a file extension related to its MIME type. Each part is then accessible as an HTTP body object via the "parts" method in HTTP::Promise::Entity.
Note, however, that among multipart types only multipart/form-data is recognised; anything else is treated as data.
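To illustrate what splitting a multipart/form-data body involves, here is a deliberately simplified pure-Perl sketch. It splits an in-memory body on its boundary and separates each part's headers from its content. Unlike the module, which uses a streaming reader, this assumes that the whole body fits in memory and uses CRLF line endings. The function name split_form_data is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Promise): splits an in-memory
# multipart/form-data body into parts. Each part is returned as
# { headers => { lower-cased name => value }, body => $data }.
sub split_form_data
{
    my( $body, $boundary ) = @_;
    # Per rfc2046, each delimiter is CRLF, "--" and the boundary; prepending
    # a CRLF lets the very first delimiter match the same pattern.
    my @chunks = split( /\r\n--\Q$boundary\E/, "\r\n" . $body );
    # The first element is the (usually empty) preamble
    shift( @chunks );
    my @parts;
    for my $chunk ( @chunks )
    {
        # "--" right after the boundary marks the closing delimiter
        last if( substr( $chunk, 0, 2 ) eq '--' );
        # Drop optional transport padding and the CRLF ending the delimiter line
        $chunk =~ s/\A[ \t]*\r\n//;
        my( $head, $data ) = split( /\r\n\r\n/, $chunk, 2 );
        $head //= '';
        my %headers;
        for my $line ( split( /\r\n/, $head ) )
        {
            my( $name, $value ) = split( /:[ \t]*/, $line, 2 );
            $headers{ lc( $name ) } = $value;
        }
        push( @parts, +{ headers => \%headers, body => $data // '' } );
    }
    return( @parts );
}
```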
The overall HTTP message is returned as an HTTP::Promise::Entity object.
If an error occurs, this module does not die, at least not voluntarily, but instead sets an error and returns undef, so always make sure to check the returned value from method calls.
new
This instantiates a new HTTP::Promise::Parser object.
It takes the following options:
decode_body
Boolean. If enabled, the entity body is automatically decoded upon parsing. Default is true.
decode_headers
Boolean. If enabled, this will decode headers, which is used for decoding the filename value in Content-Disposition. Default is false.
ignore_filename
Boolean. Whether the filename provided in a Content-Disposition header should be ignored. This defaults to false but, in practice, it is not used: the filename specified in a Content-Disposition header field is never used. So, this is a no-op and should be removed.
max_body_in_memory_size
Integer. When set to a true value, this is the threshold, in bytes, beyond which an entity body initially loaded into memory is switched to a file on the local filesystem.
By default, this has the value set by the class variable $MAX_BODY_IN_MEMORY_SIZE, which is 102400 bytes (100K).
max_headers_size
Integer. This is the threshold size in bytes beyond which HTTP headers will trigger an error. This defaults to the class variable $MAX_HEADERS_SIZE, which itself is set by default to 8192 bytes (8K).
max_read_buffer
Integer. This is the read buffer size, in bytes, used by HTTP::Promise::IO. It defaults to 2048 bytes (2K).
output_dir
The path of the directory used to save entity bodies, when applicable.
tmp_dir
Set the directory to use when creating temporary files.
tmp_to_core
Boolean. When true, this will set the temporary file to an in-memory space.
looks_like_request
Provided with a string or a scalar reference, this returns a hash reference containing details of the request line attributes if it is indeed a request, or an empty string if it is not a request.
It sets an error and returns undef upon error.
The following attributes are available:
http_version
The HTTP protocol version used. For example, in HTTP/1.1, this would be 1.1, and in HTTP/2, this would be 2.
http_vers_major
The HTTP protocol major version used. For example, in HTTP/1.0, this would be 1, and in HTTP/2, this would be 2.
http_vers_minor
The HTTP protocol minor version used. For example, in HTTP/1.0, this would be 0, and in HTTP/2, this would be undef.
method
The HTTP request method used. For example, in GET / HTTP/1.1, this would be GET. This follows the rfc7231 semantics, which means any token, even a non-standard one, would match.
protocol
The HTTP protocol used, e.g. HTTP/1.0, HTTP/1.1, HTTP/2, etc...
uri
The request URI. For example in GET / HTTP/1.1, this would be /
    use feature 'say';
    my $ref = $p->looks_like_request( \$str );
    # or
    # my $ref = $p->looks_like_request( $str );
    die( $p->error ) if( !defined( $ref ) );
    if( $ref )
    {
        say "Request method $ref->{method}, uri $ref->{uri}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
    }
    else
    {
        say "This is not an HTTP request.";
    }
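For readers curious about what such a check involves, the following sketch recognises a request line with a regular expression and returns the same attribute names as looks_like_request. It is an approximation written for illustration only, not the module's implementation. The helper name request_line is hypothetical. The method is matched as an rfc7230 token, and the request target is simply taken to be any run of non-space characters.

```perl
use strict;
use warnings;

# Hypothetical helper approximating looks_like_request(): returns a hash
# reference for a request line, or an empty string otherwise.
sub request_line
{
    my $str = shift( @_ );
    # Accept either a string or a scalar reference, like the documented method
    $str = $$str if( ref( $str ) eq 'SCALAR' );
    # A method is a token made of tchar characters (rfc7230 section 3.2.6)
    my $token = qr/[!#\$%&'*+\-.^_`|~0-9A-Za-z]+/;
    if( $str =~ m{\A($token)[ ](\S+)[ ](HTTP/(\d+)(?:\.(\d+))?)(?:\r?\n|\z)} )
    {
        return( +{
            method          => $1,
            uri             => $2,
            protocol        => $3,
            http_vers_major => $4,
            http_vers_minor => $5,
            http_version    => defined( $5 ) ? "$4.$5" : $4,
        });
    }
    return( '' );
}
```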
looks_like_response
Provided with a string or a scalar reference, this returns a hash reference containing details of the response line attributes if it is indeed a response, or an empty string if it is not a response.
code
The 3-digit HTTP response code. For example, in HTTP/1.1 200 OK, this would be 200.
status
The response status text. For example in HTTP/1.1 200 OK, this would be OK.
    use feature 'say';
    my $ref = $p->looks_like_response( \$str );
    # or
    # my $ref = $p->looks_like_response( $str );
    die( $p->error ) if( !defined( $ref ) );
    if( $ref )
    {
        say "Response code $ref->{code}, status $ref->{status}, protocol $ref->{protocol}, version major $ref->{http_vers_major}, version minor $ref->{http_vers_minor}";
    }
    else
    {
        say "This is not an HTTP response.";
    }
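The status line can be recognised in a similar way. The hypothetical status_line helper below approximates looks_like_response, using the status-line grammar of rfc7230 section 3.1.2. The reason phrase is made optional here because HTTP/2 (rfc7540) drops it. Again, this is a sketch rather than the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper approximating looks_like_response(): returns a hash
# reference for a status line, or an empty string otherwise.
sub status_line
{
    my $str = shift( @_ );
    $str = $$str if( ref( $str ) eq 'SCALAR' );
    # HTTP-version SP status-code [SP reason-phrase]
    if( $str =~ m{\A(HTTP/(\d+)(?:\.(\d+))?)[ ](\d{3})(?:[ ]([^\r\n]*))?(?:\r?\n|\z)} )
    {
        return( +{
            protocol        => $1,
            http_vers_major => $2,
            http_vers_minor => $3,
            code            => $4,
            status          => $5 // '',
        });
    }
    return( '' );
}
```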
looks_like_what
Provided with a string or a scalar reference, this returns a hash reference containing details of the HTTP message first-line attributes if it is indeed an HTTP message.
The attributes available depend on the type of HTTP message determined and are described in detail in "looks_like_request" and "looks_like_response". In addition to those, it also returns the attribute type, which is a string representing the type of HTTP message this is, i.e. either request or response.
If this does not match either an HTTP request or HTTP response, it returns an empty string.
    use feature 'say';
    my $ref = $p->looks_like_what( \$str );
    die( $p->error ) if( !defined( $ref ) );
    say "This is a ", ( $ref ? $ref->{type} : 'unknown' ), " HTTP message.";

    # or, in more detail:
    $ref = $p->looks_like_what( \$str );
    die( $p->error ) if( !defined( $ref ) );
    if( !$ref )
    {
        say "This is unknown.";
    }
    else
    {
        say "This is an HTTP $ref->{type} with protocol version $ref->{http_version}";
    }
Creates a new temporary file. If tmp_to_core is set to true, this creates a new in-memory file using a scalar object; otherwise, it creates a new temporary file under the directory set with the object parameter tmp_dir. The filehandle binmode is set to raw.
It returns a filehandle upon success, or sets an error and returns undef upon error.
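The two kinds of temporary storage described above map onto standard Perl facilities: an in-memory filehandle opened on a scalar reference, or a file created with File::Temp. The sketch below shows that distinction. The helper name new_tmp_fh is hypothetical, and the module may well create its files differently.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper (not part of HTTP::Promise): returns a read/write
# filehandle in raw mode. It is backed by a scalar when $to_core is true, or
# otherwise by a temporary file (in $dir if given, else the system temp directory).
sub new_tmp_fh
{
    my( $to_core, $dir ) = @_;
    my $fh;
    if( $to_core )
    {
        # In-memory file: the handle keeps $buffer alive after this block ends
        my $buffer = '';
        open( $fh, '+<', \$buffer ) or return;
    }
    else
    {
        $fh = File::Temp->new( defined( $dir ) ? ( DIR => $dir ) : () );
    }
    binmode( $fh, ':raw' ) or return;
    return( $fh );
}
```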
The filepath to the output directory. This is used when saving entity bodies on the filesystem.
parse
This takes a scalar reference of data, a glob or a file path, and parses the HTTP request or response by calling "parse_fh", passing it whatever options it received.
It returns an entity object upon success, or sets an error and returns undef upon error.
This takes a string or a scalar reference and returns an entity object upon success, or sets an error and returns undef upon error.
parse_fh
This takes a filehandle, parses the HTTP request or response, and returns an entity object upon success, or sets an error and returns undef upon error.
It also takes a hash or hash reference of the following options:
reader
An HTTP::Promise::IO object. If this is not provided, a new one will be created. Note that data will be read using this reader.
request
Boolean. Set this to true to indicate the data is an HTTP request. If neither request nor response is provided, the parser will attempt to guess it.
response
Boolean. Set this to true to indicate the data is an HTTP response. If neither request nor response is provided, the parser will attempt to guess it.
parse_headers
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the headers found in the given string, if any.
It returns a hash reference with the same property names and values as those returned by "parse_headers_xs".
This method uses pure perl.
Supported options are:
convert_dash
Boolean. If true, this will convert - in header fields to _. Default is false.
no_headers_ok
Boolean. If set to true, this will not trigger an error if there are no headers.
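The general idea behind a pure-Perl header parser can be sketched as follows. The start-line and header fields are split off at the first empty line, and each field is collected into a hash, with "-" optionally converted to "_" as convert_dash does. This is a simplified, hypothetical parse_header_block helper. Unlike parse_headers, it returns a plain hash reference rather than an HTTP::Promise::Headers object, lower-cases field names, and does not handle obsolete line folding.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Promise). Returns undef when the
# header section is incomplete, otherwise a hash reference with start_line,
# headers and length (bytes up to and including the empty line).
sub parse_header_block
{
    my( $str, %opts ) = @_;
    # Headers end at the first empty line; without one they are incomplete
    return unless( $str =~ /\A(.*?)\r?\n\r?\n/s );
    my( $head, $length ) = ( $1, $+[0] );
    my( $start_line, @lines ) = split( /\r?\n/, $head );
    my %headers;
    for my $line ( @lines )
    {
        my( $name, $value ) = split( /:[ \t]*/, $line, 2 );
        die( "Malformed header line: $line\n" ) unless( defined( $value ) );
        $name = lc( $name );
        $name =~ tr/-/_/ if( $opts{convert_dash} );
        # Repeated fields are joined with a comma (rfc7230 section 3.2.2)
        $headers{ $name } = exists( $headers{ $name } ) ? "$headers{ $name }, $value" : $value;
    }
    return( +{ start_line => $start_line, headers => \%headers, length => $length } );
}
```

Set-Cookie is a known exception to the comma-joining rule. A real parser would keep such fields separate.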
parse_headers_xs
    my $def = $p->parse_headers_xs( $http_request_or_response );
    # or, passing options:
    $def = $p->parse_headers_xs( $http_request_or_response, $options_hash_ref );
It returns a dictionary as a hash reference upon success; upon error, it sets an error with an HTTP error code and returns undef.
Supported options are:
request
Boolean. If true, this will parse the string assuming it is a request header.
response
Boolean. If true, this will parse the string assuming it is a response header.
The properties returned in the dictionary depend on whether request or response was enabled.
For request:
headers
An HTTP::Promise::Headers object.
length
The length in bytes of the headers parsed.
The HTTP method, such as GET, HEAD, POST, etc.
String, such as HTTP/1.1 or HTTP/2
String, the request URI, such as /
version
This is a version object and contains a value such as 1.1, so you can do something like:
    if( $def->{version} >= version->parse( '1.1' ) )
    {
        # Do something
    }
For response:
The HTTP status code, such as 200
The length in bytes of the headers parsed. This is useful so you can then remove it from the string you provided:
    my $resp = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nHello world!\n";
    my $def = $p->parse_headers_xs( \$resp, response => 1 ) || die( $p->error );
    # Remove the headers from the string, then any remaining empty line
    substr( $resp, 0, $def->{length} ) = '';
    $resp =~ s/^\r?\n//;
    # $resp now contains the body, i.e.: "Hello world!\n"
String, the HTTP status, i.e. something like OK
String, such as HTTP/1.1
If not enough data was provided to parse the headers, this sets an error object with its code set to 425 (Too Early) and returns undef.
If the headers are incomplete and their cumulative size exceeds the value set with "max_headers_size", the error code is set to 413 (Request Entity Too Large).
If there are other issues with the headers, the error code is set to 400 (Bad Request). For any other error, the error object has no code.
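The three failure modes above can be summarised with a small, hypothetical classifier, header_status. It only reproduces the decision logic described here: 425 when the empty line ending the headers has not arrived yet, 413 when the incomplete headers already exceed the limit, and 400 for a malformed field line. It is not how parse_headers_xs is implemented. It assumes CRLF line endings, and its field-line check is a simplification.

```perl
use strict;
use warnings;

# Hypothetical classifier (not part of HTTP::Promise): returns 0 if the
# header section in $buf is complete and well-formed, otherwise the HTTP
# status code a parser could use as its error code.
sub header_status
{
    my( $buf, $max ) = @_;
    my $end = index( $buf, "\r\n\r\n" );
    if( $end < 0 )
    {
        # Incomplete: either wait for more data, or give up if already too large
        return( length( $buf ) > $max ? 413 : 425 );
    }
    # Skip the start-line and check each field line
    my( undef, @fields ) = split( /\r\n/, substr( $buf, 0, $end ) );
    for my $field ( @fields )
    {
        # A field line must start with "token:"
        return( 400 ) unless( $field =~ /\A[!#\$%&'*+\-.^_`|~0-9A-Za-z]+:/ );
    }
    return( 0 );
}
```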
This takes a hash or hash reference of options and parses an HTTP multipart portion of the HTTP request or response.
It returns an entity object upon success; upon error, it sets an error object and returns undef.
entity
The HTTP::Promise::Entity object to which this multipart belongs.
reader
The HTTP::Promise::IO object used for reading the data chunks from the filehandle.
Provided with a file path, this opens it in read mode, parses it and returns an entity object.
If there is an error, this returns undef and you can retrieve the error by calling "error" in Module::Generic which is inherited by this module.
parse_request
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the request found in the given string, including the header and the body.
The properties returned are the same as the ones returned for a request by "parse_headers_xs", and also sets the content property containing the body data of the request.
This works well for simple requests, i.e. non-multipart ones; otherwise, the entire body, whatever it is, is stored in content.
This is an alias and is equivalent to calling "parse_headers_xs" and setting the request option.
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, parses the request line, and returns a hash reference containing 4 properties: method, path, protocol and version.
This is the same as "parse_request", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
parse_response
This takes a string or a scalar reference, including a scalar object such as Module::Generic::Scalar, and an optional hash or hash reference of parameters, and parses the response found in the given string, including the header and the body.
The properties returned are the same as the ones returned for a response by "parse_headers_xs", and also sets the content property containing the body data of the response.
This is an alias and is equivalent to calling "parse_headers_xs" and setting the response option.
This is the same as "parse_response", except it uses the pure perl method "parse_headers" to parse the headers instead of the XS one.
Provided with a hash or hash reference of options, this parses a simple entity body.
read_until
A string or a regular expression that indicates the string up to which to read data from the filehandle.
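As a rough illustration of what reading "up to" a delimiter means, the hypothetical read_until_delim helper below reads a filehandle in fixed-size chunks until a string or compiled regular expression matches. It returns the data before the match and keeps any remainder for the next call. The module itself relies on HTTP::Promise::IO for reading, so treat this only as a sketch of the idea. The 2048-byte chunk size mirrors the documented max_read_buffer default.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Promise): reads from $fh until
# $delim (a string or a qr//) is found. $buf is a scalar reference holding
# data already read but not yet consumed. Returns the data preceding the
# delimiter and consumes the delimiter, or returns undef if EOF comes first.
sub read_until_delim
{
    my( $fh, $delim, $buf ) = @_;
    my $re = ref( $delim ) eq 'Regexp' ? $delim : qr/\Q$delim\E/;
    while( 1 )
    {
        if( $$buf =~ $re )
        {
            my $data = substr( $$buf, 0, $-[0] );
            # Drop the returned data and the delimiter itself from the buffer
            substr( $$buf, 0, $+[0] ) = '';
            return( $data );
        }
        my $n = read( $fh, my $chunk, 2048 );
        return unless( $n );
        $$buf .= $chunk;
    }
}
```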
This takes an HTTP version string, such as HTTP/1.1 or HTTP/2, and returns its major and minor versions as a 2-element list in list context, or just the version object in scalar context.
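The context-dependent return value described above can be sketched with wantarray and the core version module. The helper name http_version is hypothetical, and this is an illustration of the documented behaviour rather than the module's code. As with http_vers_minor, the minor part is undef for a version such as HTTP/2.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper (not part of HTTP::Promise): returns (major, minor)
# in list context, or a version object in scalar context.
# Returns an empty list or undef if the string is not an HTTP version.
sub http_version
{
    my $str = shift( @_ );
    my( $major, $minor ) = $str =~ m{\AHTTP/(\d+)(?:\.(\d+))?\z}
        or return;
    return( wantarray()
        ? ( $major, $minor )
        : version->parse( defined( $minor ) ? "$major.$minor" : $major ) );
}
```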
Sets or gets the temporary directory to use when creating temporary files.
When set, this returns a file object.
Boolean. When set to true, this will store data in memory rather than in a file on the filesystem.
Jacques Deguest <jack@deguest.jp>
rfc6266 on Content-Disposition, rfc7230 on Message Syntax and Routing, rfc7231 on Semantics and Content, rfc7232 on Conditional Requests, rfc7233 on Range Requests, rfc7234 on Caching, rfc7235 on Authentication, rfc7578 on multipart/form-data, rfc7540 on HTTP/2.0
Mozilla documentation on HTTP protocol
Mozilla documentation on HTTP messages
Mozilla documentation
HTTP::Promise, HTTP::Promise::Request, HTTP::Promise::Response, HTTP::Promise::Message, HTTP::Promise::Entity, HTTP::Promise::Headers, HTTP::Promise::Body, HTTP::Promise::Body::Form, HTTP::Promise::Body::Form::Data, HTTP::Promise::Body::Form::Field, HTTP::Promise::Status, HTTP::Promise::MIME, HTTP::Promise::Parser, HTTP::Promise::IO, HTTP::Promise::Stream, HTTP::Promise::Exception
Copyright(c) 2022 DEGUEST Pte. Ltd.
All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install HTTP::Promise, copy and paste the appropriate command into your terminal.
cpanm
cpanm HTTP::Promise
CPAN shell
    perl -MCPAN -e shell
    install HTTP::Promise
For more information on module installation, please visit the detailed CPAN module installation guide.