rfc2045 — RFC 2045 (MIME) parsing library
#include <rfc822.h>
#include <rfc2045.h>
cc ... -lrfc2045 -lrfc822
The rfc2045 library parses MIME-formatted messages. The rfc2045 library is used to:
1) Parse the structure of a MIME formatted message
2) Examine the contents of each MIME section
3) Optionally rewrite and reformat the message.
#include <rfc2045.h>
struct rfc2045 *ptr=rfc2045_alloc();
void rfc2045_parse(struct rfc2045 *ptr, const char *txt, size_t cnt);
struct rfc2045 *ptr=rfc2045_fromfd(int fd);
struct rfc2045 *ptr=rfc2045_fromfp(FILE *fp);
void rfc2045_free(struct rfc2045 *ptr);
void rfc2045_error(const char *errmsg)
{
    perror(errmsg);
    exit(1);
}
The rfc2045 structure is created from an existing message. The function rfc2045_alloc() allocates the structure, then rfc2045_parse() is called to initialize the structure based on the contents of a message. txt points to the contents of the message, and cnt contains the number of bytes in the message. Large messages are parsed by calling rfc2045_parse() multiple times, each time passing a portion of the overall message. There is no need to call a separate function after the entire message is parsed: the rfc2045 structure is built dynamically, on the fly, as each portion is parsed.
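The chunked parsing described above can be sketched as follows, assuming the message arrives on a stdio stream. To keep the sketch compilable without the library, the parser call is passed in as a function pointer. feed_fn, feed_stream, count_bytes and CHUNK_SIZE are illustrative names, not part of the rfc2045 API; in a real program the feed function would be a thin wrapper that calls rfc2045_parse(ptr, txt, cnt).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: receives one portion of the message.
   A real wrapper would call rfc2045_parse(ctx, txt, cnt). */
typedef void (*feed_fn)(void *ctx, const char *txt, size_t cnt);

enum { CHUNK_SIZE = 4096 };   /* arbitrary portion size for this sketch */

/* Read fp to end-of-file and hand each portion to feed().
   Returns 0 on success, -1 if a read error occurred. */
int feed_stream(FILE *fp, feed_fn feed, void *ctx)
{
    char buf[CHUNK_SIZE];
    size_t n;

    assert(fp != NULL && feed != NULL);

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        feed(ctx, buf, n);

    return ferror(fp) ? -1 : 0;
}

/* Stand-in for the rfc2045_parse() wrapper: it only counts bytes. */
void count_bytes(void *ctx, const char *txt, size_t cnt)
{
    (void)txt;
    *(size_t *)ctx += cnt;
}
```

The portion size has no bearing on the result, since the text states that the structure is built up as each portion is parsed.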
rfc2045_alloc() returns NULL if there was insufficient memory to allocate the structure. rfc2045_parse() also allocates memory internally; however, no error indication is returned in the event of a memory allocation failure. Instead, the function rfc2045_error() is called, with errmsg set to "Out of memory". rfc2045_alloc() also calls rfc2045_error() before returning a NULL pointer.
The rfc2045_error() function is not included in the rfc2045 library; it must be defined by the application to report the error in some appropriate way. All functions below use rfc2045_error() to report an error condition (currently only insufficient memory is reported), in addition to returning an error indicator where they have one. Some functions do not return an error indicator, so rfc2045_error() is the only reliable way to detect a failure.
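As an alternative to exiting, an application might record the failure and check for it after each library call. The following is a minimal sketch of that idea; parse_failed, parse_errmsg and take_parse_failure are illustrative names, not part of the library. It assumes the library copes with the handler returning, which the text implies for rfc2045_alloc() (it returns NULL afterwards) but does not state for every function. Exiting, as in the synopsis, remains the conservative choice.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative state, not part of the library. */
static int  parse_failed = 0;
static char parse_errmsg[128] = "";

/* Application-supplied hook that the library calls on failure. */
void rfc2045_error(const char *errmsg)
{
    assert(errmsg != NULL);
    parse_failed = 1;
    snprintf(parse_errmsg, sizeof(parse_errmsg), "%s", errmsg);
    fprintf(stderr, "rfc2045: %s\n", parse_errmsg);
}

/* Report and clear the failure flag; meant to be called after each
   library call that has no error indicator of its own. */
int take_parse_failure(void)
{
    int failed = parse_failed;
    parse_failed = 0;
    return failed;
}
```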
The rfc2045_fromfd() function initializes an rfc2045 structure from a file descriptor. It is equivalent to calling rfc2045_alloc(), then reading the contents of the given file descriptor, and calling rfc2045_parse(). The rfc2045_fromfp() function initializes an rfc2045 structure from a FILE.
After the rfc2045 structure is initialized, the functions described below may be used to access and work with the contents of the structure. When the rfc2045 structure is no longer needed, the function rfc2045_free() deallocates and destroys the structure.
struct rfc2045 {
    struct rfc2045 *parent;
    struct rfc2045 *firstpart;
    struct rfc2045 *next;
    int isdummy;
    int rfcviolation;
} ;
The rfc2045 structure has many fields; only some are publicly documented. A MIME message is represented by a recursive tree of linked rfc2045 structures. Each instance of the rfc2045 structure represents a single MIME section of a MIME-formatted message.
The top-level structure that represents the entire message is created by the rfc2045_alloc() function. The remaining structures are created dynamically by rfc2045_parse(). Any rfc2045 structure, except ones whose isdummy flag is set, may be used as an argument to any function described in the following chapters.
The rfcviolation field in the top-level rfc2045 structure indicates any errors found while parsing the MIME message. rfcviolation is a bitmask of the following flags:
RFC2045_ERR8BITHEADER: Illegal 8-bit characters in MIME headers.
RFC2045_ERR8BITCONTENT: Illegal 8-bit contents of a MIME section that declared a 7bit transfer encoding.
RFC2045_ERR2COMPLEX: The message has too many MIME sections; this is a potential denial-of-service attack.
RFC2045_ERRBADBOUNDARY: Ambiguous nested multipart MIME boundary strings (nested MIME boundary strings where one string is a prefix of another string).
In each rfc2045 structure that represents a multipart MIME section (or one that contains message/rfc822 content) the firstpart pointer points to the first MIME section in the multipart MIME section (or the included "message/rfc822" MIME section). If there is more than one MIME section in a multipart MIME section, firstpart->next gets you the second MIME section, firstpart->next->next gets you the third MIME section, and so on. parent points to the parent MIME section, and is NULL for the top-level MIME section.
Not all MIME sections are created equal. In a multipart MIME section, there is an initial, unused, "filler" section before the first MIME delimiter (see RFC 2045 for more information). This filler section typically contains a terse message saying that this is a MIME-formatted message. It is not considered to be a "real" MIME section, and all MIME-aware software must ignore it. These filler sections are designated by setting the isdummy field to a non-zero value. All rfc2045 structures that have isdummy set should be ignored, and skipped over, when traversing the rfc2045 tree.
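A traversal that follows firstpart and next while skipping filler sections might look like the sketch below. The struct shown is a stand-in that holds only the documented fields, so that the example compiles on its own; a real program would include <rfc2045.h> instead. count_sections is an illustrative name, not a library function.

```c
#include <stddef.h>

/* Stand-in holding only the documented fields; a real program
   would include <rfc2045.h> instead of defining this. */
struct rfc2045 {
    struct rfc2045 *parent;
    struct rfc2045 *firstpart;
    struct rfc2045 *next;
    int isdummy;
    int rfcviolation;
};

/* Count the real MIME sections in the subtree rooted at p,
   including p itself, skipping filler (isdummy) sections. */
size_t count_sections(const struct rfc2045 *p)
{
    const struct rfc2045 *child;
    size_t n;

    if (p == NULL || p->isdummy)
        return 0;

    n = 1;
    for (child = p->firstpart; child != NULL; child = child->next)
        n += count_sections(child);
    return n;
}
```

For a multipart message with a filler section and two real parts, the count is three: the top-level section plus the two parts.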
const char *content_type, *content_transfer_encoding, *content_character_set;
void rfc2045_mimeinfo(const struct rfc2045 *ptr, &content_type, &content_transfer_encoding, &content_character_set);
off_t start_pos, end_pos, start_body, nlines, nbodylines;
void rfc2045_mimepos(const struct rfc2045 *ptr, &start_pos, &end_pos, &start_body, &nlines, &nbodylines);
The rfc2045_mimeinfo() function returns the MIME content type, encoding method, and the character set of the given MIME section. Where the MIME section does not specify a property, rfc2045_mimeinfo() automatically supplies a default value. The character set is only meaningful for MIME sections with a text content type; however, it is still defaulted for other sections. It is not permissible to supply a NULL pointer for any argument to rfc2045_mimeinfo().
The rfc2045_mimepos() function locates the position of the given MIME section in the original message. It is not permissible to supply a NULL pointer for any argument to rfc2045_mimepos(). All arguments must be used. start_pos and end_pos are set to the starting and the ending offset, from the beginning of the message, of this MIME section. nlines is initialized to the number of lines of text in this MIME section. start_pos is the start of MIME headers for this MIME section. start_body is the start of the actual content of this MIME section (after all the MIME headers, and the delimiting blank line), and nbodylines is the number of lines of actual content in this MIME section.
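When the whole message is held in memory, the offsets can be turned into a pointer and a length. The sketch below assumes that end_pos is the offset just past the last byte of the section, which the text does not state explicitly; body_span and section_body are illustrative names, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* off_t */

/* Illustrative view of one section's body within a message buffer. */
struct body_span {
    const char *data;   /* first byte of the body */
    size_t      len;    /* number of body bytes   */
};

/* Locate the body of a section inside msg (msg_len bytes long), given
   the start_body and end_pos values reported by rfc2045_mimepos().
   Assumes end_pos is the offset just past the section's last byte. */
struct body_span section_body(const char *msg, size_t msg_len,
                              off_t start_body, off_t end_pos)
{
    struct body_span span;

    assert(msg != NULL);
    assert(start_body >= 0 && start_body <= end_pos);
    assert((size_t)end_pos <= msg_len);

    span.data = msg + start_body;
    span.len  = (size_t)(end_pos - start_body);
    return span;
}
```

The same arithmetic with start_pos in place of start_body would cover the section's headers as well.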
const char *id=rfc2045_content_id(const struct rfc2045 *ptr);
const char *desc=rfc2045_content_description(const struct rfc2045 *ptr);
const char *lang=rfc2045_content_language(const struct rfc2045 *ptr);
const char *md5=rfc2045_content_md5(const struct rfc2045 *ptr);
These functions return the contents of the corresponding MIME headers. If these headers do not exist, these functions return an empty string, "", NOT a null pointer.
char *id=rfc2045_related_start(const struct rfc2045 *ptr);
This function returns the start attribute of the Content-Type: header, which is used by multipart/related MIME content. This function returns a dynamically-allocated buffer, which must be free(3)-ed after use (a null pointer is returned if there was insufficient memory for the buffer, and rfc2045_error() is called).
const struct rfc2045 *ptr;
const char *disposition=ptr->content_disposition;
char *charset;
char *language;
char *value;
int error;
error=rfc2231_decodeType(ptr, "name", &charset, &language, &value);
error=rfc2231_decodeDisposition(ptr, "name", &charset, &language, &value);
These functions and structures provide a mechanism for reading the MIME attributes in the Content-Type: and Content-Disposition: headers. The MIME content type is returned by rfc2045_mimeinfo(). The MIME content disposition can be read directly from the content_disposition field (which may be NULL if the Content-Disposition: header was not specified). rfc2231_decodeType() reads MIME attributes from the Content-Type: header, and rfc2231_decodeDisposition() reads MIME attributes from the Content-Disposition: header. These functions understand MIME attributes that are encoded according to RFC 2231.
These functions initialize the charset, language, and value parameters, allocating memory automatically. It is the caller's responsibility to use free() to return the allocated memory. A NULL may be provided in place of a parameter, indicating that the caller does not require the corresponding information. charset and language will be set to an empty string (not NULL) if the MIME parameter does not exist, or is not encoded according to RFC 2231, or does not specify its character set and/or language. value will be set to an empty string if the MIME parameter does not exist.
char *url=rfc2045_content_base(struct rfc2045 *ptr);
char *url=rfc2045_append_url(const char *base, const char *url);
These functions are used to work with multipart/related MIME content. rfc2045_content_base() returns the contents of either the Content-Base: or the Content-Location: header. If both are present, they are logically combined. rfc2045_append_url() combines two URLs, base and url, and returns the absolute URL that results from the combination.
Both functions return a pointer to a dynamically-allocated buffer that must be free(3)-ed after it is no longer needed. Both functions return NULL if there was insufficient memory to allocate the buffer. rfc2045_content_base() returns an empty string in the event that there are no Content-Base: or Content-Location: headers. Either argument to rfc2045_append_url() may be NULL, or an empty string.
void rfc2045_cdecode_start(struct rfc2045 *ptr, int (*callback_func)(const char *, size_t, void *), void *callback_arg);
int rfc2045_cdecode(struct rfc2045 *ptr, const char *stuff, size_t nstuff);
int rfc2045_cdecode_end(struct rfc2045 *ptr);
These functions are used to return the raw contents of the given MIME section, transparently decoding quoted-printable or base64-encoded content. Because the rfc2045 library does not require the message to be read from a file (it can be stored in a memory buffer), the application is responsible for reading the contents of the message and calling rfc2045_cdecode().
The rfc2045_cdecode_start() function begins the process of decoding the given MIME section. After calling rfc2045_cdecode_start(), the application must then call rfc2045_cdecode() with the contents of the MIME message between the offsets given by the start_body and end_pos return values from rfc2045_mimepos(). The rfc2045_cdecode() function can be called repeatedly, if necessary, for successive portions of the MIME section. After the last call to rfc2045_cdecode(), call rfc2045_cdecode_end() to finish up (rfc2045_cdecode() may have saved some undecoded content internally, and rfc2045_cdecode_end() flushes it out).
rfc2045_cdecode() and rfc2045_cdecode_end() repeatedly call callback_func(), passing it the decoded contents of the MIME section. The first argument to callback_func() is a pointer to a portion of the decoded content, the second argument is the number of bytes in this portion. The third argument is callback_arg. callback_func() is required to return zero, to continue decoding. If callback_func() returns non-zero, the decoding immediately stops and rfc2045_cdecode() or rfc2045_cdecode_end() terminates with callback_func's return code.
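A callback that follows this contract might collect the decoded content into a growable buffer, returning non-zero only when memory runs out. This is a minimal sketch; struct decoded and collect_decoded are illustrative names, not part of the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative growable buffer; passed to the library as callback_arg. */
struct decoded {
    char  *data;
    size_t len;
    size_t cap;
};

/* Matches the callback_func signature: append one decoded portion.
   Returns 0 to continue, -1 to stop decoding when memory runs out. */
int collect_decoded(const char *chunk, size_t n, void *arg)
{
    struct decoded *out = arg;

    assert(out != NULL);

    if (n > out->cap - out->len) {
        size_t newcap = out->cap ? out->cap : 256;
        char *p;

        while (newcap - out->len < n)
            newcap *= 2;
        p = realloc(out->data, newcap);
        if (p == NULL)
            return -1;
        out->data = p;
        out->cap = newcap;
    }
    if (n > 0)
        memcpy(out->data + out->len, chunk, n);
    out->len += n;
    return 0;
}
```

With the library, the callback would be registered with rfc2045_cdecode_start(ptr, collect_decoded, &out), where out is a zero-initialized struct decoded. A non-zero return from rfc2045_cdecode() or rfc2045_cdecode_end() then indicates that the callback gave up.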
This library contains functions that can be used to rewrite a MIME message in order to convert 8-bit-encoded data to 7-bit encoding, or to convert 7-bit encoded data to full 8-bit data, if possible.
struct rfc2045 *ptr=rfc2045_alloc_ac();
int necessary=rfc2045_ac_check(struct rfc2045 *ptr, int mode);
int error=rfc2045_rewrite(struct rfc2045 *ptr, int fdin, int fdout, const char *appname);
int rfc2045_rewrite_func(struct rfc2045 *p, int fdin, int (*funcout)(const char *, int, void *), void *funcout_arg, const char *appname);
When rewriting will be used, the rfc2045_alloc_ac() function must be used to create the initial rfc2045 structure. This function allocates some additional structures that are used in rewriting. Use rfc2045_parse() to parse the message, as usual. Use rfc2045_free() in the normal way to destroy the rfc2045 structure when it is no longer needed.
The rfc2045_ac_check() function must be called to determine whether rewriting is necessary. mode must be set to one of the following values:
RFC2045_RW_7BIT: We want to generate 7-bit content. If the original message contains any 8-bit content it will be converted to 7-bit content using quoted-printable encoding.
RFC2045_RW_8BIT: We want to generate 8-bit content. If the original message contains any 7-bit quoted-printable content it should be rewritten as 8-bit content.
The rfc2045_ac_check() function returns non-zero if there's any content in the MIME message that should be converted, OR if there are any missing MIME headers. rfc2045_ac_check() returns zero if there's no need to rewrite the message. However, it might still be worthwhile to rewrite the message anyway. There are some instances where it is desirable to provide defaults for some missing MIME headers, but they are too trivial to require the message to be rewritten. One such case would be a missing Content-Transfer-Encoding: header for a multipart section.
Either the rfc2045_rewrite() or the rfc2045_rewrite_func() function is used to rewrite the message. The only difference is that rfc2045_rewrite() writes the new message to a given file descriptor, fdout, while rfc2045_rewrite_func() repeatedly calls the funcout function. Both functions read the original message from fdin. funcout receives a pointer to a portion of the MIME message, the number of bytes in the specified portion, and funcout_arg. When either function rewrites a MIME section, an informational header gets appended, noting that the message was converted by appname.
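A funcout function that sends the rewritten message to a stdio stream could be sketched as below. Note that, per the synopsis, the byte count is an int here rather than a size_t. The text does not say how the return value of funcout is interpreted; the sketch assumes that, as with the decoding callback, zero means "continue" and non-zero means "stop". write_portion is an illustrative name, and funcout_arg is assumed to carry the FILE pointer.

```c
#include <assert.h>
#include <stdio.h>

/* Matches the funcout signature (note the int byte count).
   funcout_arg is assumed to be the FILE * to write to. */
int write_portion(const char *txt, int cnt, void *arg)
{
    FILE *out = arg;

    assert(out != NULL && cnt >= 0);

    if (cnt > 0 && fwrite(txt, 1, (size_t)cnt, out) != (size_t)cnt)
        return -1;   /* assumed to mean "stop"; the text does not say */
    return 0;
}
```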