Document Interchange

Revision as of 14:54, 1 May 2018 by Adminko

The PDF specification defines several mechanisms for including higher-level information about the content structure in a document. These mechanisms simplify custom document processing and improve accessibility without affecting the appearance of the PDF document. Each mechanism is described below in its own section.

Metadata

A PDF document may include general information such as its title, author, and creation and modification dates. This information about the document is called the document's metadata. Starting with PDF 1.4, metadata can also be attached to individual objects in a document; this is called object-level metadata.

Metadata can be stored in a PDF document in two ways:

  • For document-level and object-level metadata: in a metadata stream associated with the document or with a component of the document (see section 14.3.2, "Metadata streams", of the specification). Metadata streams are the preferred method in PDF 2.0. The document's metadata can be set using the following code:
// documentStream is a stream containing the PDF document to update
using (FixedDocument document = new FixedDocument(documentStream))
{
    // [metadata stream] is a placeholder for the stream holding the metadata
    document.SetMetadata([metadata stream]);
    document.Save();
}

  • For document metadata only: in the document information dictionary associated with the document (see section 14.3.3, "Document information dictionary"). PDF 2.0 deprecates the use of the document information dictionary for document metadata, except for the CreationDate and ModDate entries. Apitron PDF Kit sets this information automatically.
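
According to the PDF specification, the content of a metadata stream is an XMP packet, that is, an XML document wrapped in xpacket processing instructions. The following is a minimal sketch of how such a packet could be assembled with the standard System.Xml.Linq types. The XmpPacketSketch class and its Build method are hypothetical helpers introduced here for illustration and are not part of Apitron PDF Kit; whether SetMetadata accepts a stream in exactly this form is an assumption that should be checked against the product documentation.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, not part of Apitron PDF Kit.
public static class XmpPacketSketch
{
    // Namespaces defined by the XMP, RDF and Dublin Core specifications.
    private static readonly XNamespace X = "adobe:ns:meta/";
    private static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";
    private static readonly XNamespace Xmp = "http://ns.adobe.com/xap/1.0/";

    // Builds a minimal XMP packet with a title, an author and a creation date.
    // createdUtc is assumed to already be expressed in UTC.
    public static MemoryStream Build(string title, string author, DateTime createdUtc)
    {
        string created = createdUtc.ToString(
            "yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

        var xmpmeta = new XElement(X + "xmpmeta",
            new XAttribute(XNamespace.Xmlns + "x", X),
            new XElement(Rdf + "RDF",
                new XAttribute(XNamespace.Xmlns + "rdf", Rdf),
                new XElement(Rdf + "Description",
                    new XAttribute(Rdf + "about", ""),
                    new XAttribute(XNamespace.Xmlns + "dc", Dc),
                    new XAttribute(XNamespace.Xmlns + "xmp", Xmp),
                    // dc:title is a language alternative (rdf:Alt).
                    new XElement(Dc + "title",
                        new XElement(Rdf + "Alt",
                            new XElement(Rdf + "li",
                                new XAttribute(XNamespace.Xml + "lang", "x-default"),
                                title))),
                    // dc:creator is an ordered list of names (rdf:Seq).
                    new XElement(Dc + "creator",
                        new XElement(Rdf + "Seq",
                            new XElement(Rdf + "li", author))),
                    new XElement(Xmp + "CreateDate", created))));

        // Standard xpacket wrapper; end="w" marks the packet as writable.
        string header = "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n";
        string trailer = "\n<?xpacket end=\"w\"?>";

        // Encode as UTF-8 without an extra byte order mark at the stream start.
        byte[] bytes = new UTF8Encoding(false).GetBytes(header + xmpmeta.ToString() + trailer);
        return new MemoryStream(bytes);
    }
}
```

A stream produced by XmpPacketSketch.Build("Annual report", "Jane Doe", DateTime.UtcNow) could then be tried in place of the [metadata stream] placeholder shown above, subject to the assumption stated in the lead-in.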

Marked Content

Marked content is a mechanism for incorporating markup that serves the needs of a particular PDF processor; it uses tags to attach additional information to portions of a document's content. With Apitron PDF Kit and its Fixed layout API, you can use a special type called MarkedContent, which derives from the ClippedContent class and implements support for tagging and logical structure elements.
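
At the level of the PDF content stream, a marked-content sequence is opened by the BMC operator (tag only) or the BDC operator (tag plus a property list) and closed by EMC. The sketch below only illustrates that raw syntax by building it as text. MarkedContentSketch and Wrap are hypothetical names and are not part of Apitron PDF Kit; with the library, the MarkedContent type would be used instead of writing operators by hand, and the exact output it produces is not shown here.

```csharp
using System;
using System.Text;

// Hypothetical helper that illustrates raw PDF syntax only.
public static class MarkedContentSketch
{
    // Wraps content-stream operators in a marked-content sequence.
    // With a marked-content identifier (MCID), BDC and a property list
    // are used; without one, the simpler BMC form is used.
    public static string Wrap(string tag, string operators, int? mcid = null)
    {
        var sb = new StringBuilder();
        if (mcid.HasValue)
        {
            sb.Append('/').Append(tag)
              .Append(" <</MCID ").Append(mcid.Value).Append(">> BDC\n");
        }
        else
        {
            sb.Append('/').Append(tag).Append(" BMC\n");
        }
        sb.Append(operators).Append('\n');
        sb.Append("EMC\n");
        return sb.ToString();
    }
}
```

For example, MarkedContentSketch.Wrap("P", "BT /F1 12 Tf 72 720 Td (Hello) Tj ET", 0) returns the text drawing operators enclosed between "/P <</MCID 0>> BDC" and "EMC", which is the form that the logical structure of a document refers to through the MCID value.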

Logical Structure

Tagged PDF