FixedDocument

Overview

The FixedDocument class represents a PDF document model and contains the properties a typical PDF document has. It's the foundation for the Fixed layout API and this is where almost all the work with documents starts.

Document level API usage examples

Enumerate document pages and extract text

Existing document pages can be accessed through the FixedDocument.Pages collection, which contains a set of Page objects with the actual content.

Consider the following code:

public void EnumerateAllPages()
{
    // open existing PDF file
    using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read))
    {
        using(FixedDocument document = new FixedDocument(inputStream))
        {     
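            // enumerate all pages and print the text extracted from each one
            foreach (Page page in document.Pages)
            {
                // print the text directly: passing it to string.Format as a format string
                // would throw a FormatException on pages whose text contains braces
                Console.WriteLine(page.ExtractText());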
            }
        }
    }
}
 

This code opens the PDF file and prints the text from each page by enumerating the document's page collection.

Add new page to PDF document

Let’s open the document and add a new empty Letter-sized page to it:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{        
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // add new page sized to Letter paper format into the document 
        // and save modified document to the output stream
        document.Pages.Add(new Page(Boundaries.Letter));         
        document.Save(outputStream);
    }
}
 

You can also choose a different page size by passing one of the other predefined format values, or create a custom-sized page using a constructor overload.

Remove page from PDF document

Let’s open the PDF document and delete its first page:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // remove the first page and save
        document.Pages.RemoveAt(0);      
        document.Save(outputStream);
    }
}
 

You can remove a page either by passing its index to the RemoveAt() method or by passing the Page object itself to the Remove() method; both are defined in the page collection class.

Move pages within PDF document

Let’s open the PDF document and move its third page to the 11th position in the original document:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // get the 3rd page
        Page page = document.Pages[2];

        // remove 3rd page from the document
        document.Pages.Remove(page);

        // insert 3rd page as 11th page
        document.Pages.Insert(10, page); 

        document.Save(outputStream);
    }
}
 

The task is completed by removing the page from the document first and then inserting it at the desired position. Note that the index passed to Insert() refers to the page collection as it looks after the removal.
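
The remove-then-insert index arithmetic can be checked without any PDF file. The sketch below is only an illustration: it uses a plain List<string> in place of the page collection and assumes that Pages follows ordinary list semantics for RemoveAt() and Insert(), which this article does not state explicitly. PageOrderSketch and Move are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the PDF library: models the page
// collection as an ordinary list to show how the indexes behave.
public static class PageOrderSketch
{
    // Moves the item at fromIndex so that it ends up at toIndex
    // in the resulting list.
    public static void Move<T>(IList<T> items, int fromIndex, int toIndex)
    {
        T item = items[fromIndex];

        // after this call every item behind fromIndex shifts down by one
        items.RemoveAt(fromIndex);

        // toIndex is interpreted against the list as it looks after the removal
        items.Insert(toIndex, item);
    }
}
```

Calling PageOrderSketch.Move(pages, 2, 10) on a list holding p1 to p12 leaves p3 at index 10, that is, as the 11th item, with p12 still last.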

Copy pages from one PDF document to another

Pages can be copied between documents, as the code below shows:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // create new document that will get the copied page
        using(FixedDocument outDocument = new FixedDocument())
        {
            // export page from source document and insert to the destination
            outDocument.Pages.Insert(0, Page.Export(outDocument, document.Pages[0]));

            outDocument.Save(outputStream);
        }
    }
}
 

To copy a page from one document to another, the page has to be exported first. This is done by calling the static Page.Export() method, which accepts the target document and the page to be copied. Exporting is needed because the page can reference resources defined in the source document, and these have to be transferred as well.
Newer versions offer an alternative approach, described in the article “Copy pages from one PDF document to another”.

Swap pages in PDF document

Pages can be swapped by combining the remove and insert operations demonstrated in the preceding sections:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
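        // get the 1st and 5th pages (indexes 0 and 4)
        Page page = document.Pages[0];
        Page page2 = document.Pages[4];

        // remove both pages from the document
        document.Pages.Remove(page);
        document.Pages.Remove(page2);

        // insert at the lower index first, so that index 4 refers to the final page order
        document.Pages.Insert(0, page2);
        document.Pages.Insert(4, page);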

        // save to output stream
        document.Save(outputStream);
    } 
}
 

We get the 1st and 5th pages (indexes 0 and 4) and then swap them by removing both from their original positions and inserting each one at the other's position.
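
The order of the two insertions matters, because every removal and insertion shifts the indexes of the items behind it. A minimal sketch of a general swap follows. It again models the page collection as a plain list and assumes ordinary list semantics, which this article does not confirm for the Pages collection. PageSwapSketch and Swap are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the PDF library: swaps two items
// using only remove and insert operations.
public static class PageSwapSketch
{
    public static void Swap<T>(IList<T> items, int first, int second)
    {
        if (first == second)
        {
            return;
        }

        int low = Math.Min(first, second);
        int high = Math.Max(first, second);
        T lowItem = items[low];
        T highItem = items[high];

        // remove the higher index first so the lower index is unaffected
        items.RemoveAt(high);
        items.RemoveAt(low);

        // insert at the lower index first; only then does `high`
        // refer to the correct position in the final order
        items.Insert(low, highItem);
        items.Insert(high, lowItem);
    }
}
```

For a list a, b, c, d, e, f, calling PageSwapSketch.Swap(items, 0, 4) gives e, b, c, d, a, f. Inserting at the higher index first would instead leave the first item one position too far back, or fail when the list holds exactly five items.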

Setting PDF page parameters

A page has its own properties, which can affect both its behavior and its content, e.g. the initial transformation that changes the content placement. The code below changes the size of the first page, the initial rotation of the second page, and the content transformation matrix of the third page.

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // Resize first page
        Page page = document.Pages[0];
        page.Resize(new PageBoundary(Boundaries.A3));
        page.Transform(1, 0, 0, 1, 100, 100);

        // Rotate second page
        Page page2 = document.Pages[1];
        page2.Rotate = PageRotate.Rotate90;

        // Transform content on third page: a 0.5 scale matrix and a (100, 100) translation matrix
        Page page3 = document.Pages[2];
        page3.Transform(0.5, 0, 0, 0.5, 0, 0);
        page3.Transform(1, 0, 0, 1, 100, 100);

        // save to output stream
        document.Save(outputStream);
    }
}
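
The six numbers passed to Transform() read like the a, b, c, d, e, f components of a PDF transformation matrix, which maps a point as x' = a·x + c·y + e and y' = b·x + d·y + f. The sketch below works through that arithmetic for the scale and translation values used above. It is independent of the library: Matrix2D is a hypothetical type, and the order in which the library combines successive Transform() calls is not stated in this article, so both orders are shown.

```csharp
using System;

// Hypothetical type, not part of the PDF library: a PDF-style
// transformation matrix [a b c d e f].
public sealed class Matrix2D
{
    public double A { get; }
    public double B { get; }
    public double C { get; }
    public double D { get; }
    public double E { get; }
    public double F { get; }

    public Matrix2D(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    // Maps a point: x' = a*x + c*y + e, y' = b*x + d*y + f.
    public double[] Apply(double x, double y)
    {
        return new[] { A * x + C * y + E, B * x + D * y + F };
    }

    // Returns the matrix that applies this matrix first and `next` second.
    public Matrix2D Then(Matrix2D next)
    {
        return new Matrix2D(
            A * next.A + B * next.C,
            A * next.B + B * next.D,
            C * next.A + D * next.C,
            C * next.B + D * next.D,
            E * next.A + F * next.C + next.E,
            E * next.B + F * next.D + next.F);
    }
}
```

With scale = new Matrix2D(0.5, 0, 0, 0.5, 0, 0) and shift = new Matrix2D(1, 0, 0, 1, 100, 100), scale.Then(shift) maps the point (200, 200) to (200, 200), while shift.Then(scale) maps it to (150, 150). Checking the output of a small test document is therefore the safest way to find out which order applies.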
 

Setting initial viewer settings and page layout

A PDF document may carry settings that describe the preferred viewer state and page layout to use when the document is opened. A conforming reader (Adobe Acrobat Reader, for example) may respect these settings and react accordingly.
Viewer settings are controlled by setting flags on the FixedDocument.ViewerPreferences property, and the initial page mode by setting the FixedDocument.PageMode property. For example, the HideToolbar setting controls whether the reader's toolbars are hidden while the document is active.
The desired page layout can be set using the FixedDocument.PageLayout property, so you can choose the layout that fits your document best and ask the reader to apply it by default. The code below shows how to use these settings:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // change viewer preferences
        document.PageLayout = PageLayout.TwoPageLeft;
        document.PageMode = PageMode.UseThumbs;
        document.ViewerPreferences.HideMenubar = true;
        document.ViewerPreferences.HideToolbar = true;

        // save to output stream
        document.Save(outputStream);  
    }  
}
 

Working with PDF fields

PDF defines a field as part of interactive forms and annotations; it often acts as a backing store for objects such as text boxes, checkboxes and buttons. Each field has an entry in the AcroForm dictionary of the document catalog, which is exposed here as the AcroForm property of the FixedDocument class. Section 12.7 “Interactive Forms” of the PDF specification has a complete description of fields and interactive forms.

While the subject is quite complex, it’s easier to think of a document field as a property or attribute attached to the document. Its content can be used for further processing by automation tools as part of a workflow, or for any other purpose where such an attribute is meaningful.

Fields can have visual representations placed on a PDF page, called widget annotations, but this is not required. These widgets can be attached to fields, and their appearance can be affected by field values. It’s also possible to attach multiple widgets to a single field, so that a single value affects them all.

The code below shows how to work with fields: it enumerates the existing fields and then adds a new text field to the document. Fields and widgets are also discussed in the article describing interactive forms.

// open existing PDF file and add text field
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // print names of the existing fields
        foreach (Field field in document.AcroForm.Fields)
        {
            Console.WriteLine(field.FieldName);
        }

        // add text field
        TextField textField = new TextField("tx_field", "Text field", "This is text field");
        document.AcroForm.Fields.Add(textField);

        // save to output stream
        document.Save(outputStream);   
    } 
}
 

How to manage attachments in PDF documents

It may be surprising, but a PDF document can have multiple file attachments, and these files can be linked from the document's content. This turns a PDF document into a self-contained unit that can be stored or transferred as a single object. Section 7.11.4 “Embedded File Streams” of the PDF specification describes this functionality in detail.
To attach a file to a PDF document, use the FixedDocument.Attachments property, which returns the EmbeddedFileCollection that belongs to the given document.

Consider the code below:

// open existing PDF file
using (FileStream inputStream = new FileStream("testfile.pdf", FileMode.Open, FileAccess.Read),
    outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{             
    using(FixedDocument document = new FixedDocument(inputStream))
    {
        // get the list of attachments
        Console.WriteLine("The count of attachments is : " + document.Attachments.Count);
        Console.WriteLine("List of attachments : ");
        foreach (var attachment in document.Attachments)
        {
            Console.WriteLine(attachment.Key);
        }
        // add attachment
        document.Attachments.Add("Attachment # 1", new EmbeddedFile(@"attachment.pdf", "type/pdf"));
        // save to output stream
        document.Save(outputStream);    
    }
}
 

We enumerate the existing attachments first and then add another PDF file as a new attachment. Attachments can be removed from the document using the FixedDocument.Attachments.RemoveAttachment() method.

Content flattening

If you don't want the original vector content to be present in your PDF documents, you can flatten it, that is, convert it to an image that replaces the old content, and save the result as a new document.

Incrementally update PDF documents

Section 7.5.6 “Incremental Updates” of the PDF specification says: "The contents of a PDF file can be updated incrementally without rewriting the entire file. When updating a PDF file incrementally, changes shall be appended to the end of the file, leaving its original contents intact."

    NOTE: The main advantage in updating a file this way is that small changes to a large document can be
    saved quickly. There are additional advantages: In certain contexts, such as when editing a document
    across HTTP connection or using OLE embedding (a Windows-specific technology), a conforming writer
    cannot overwrite the contents of the original file. Incremental updates may be used to save changes to
    documents in these contexts.
 

The Apitron PDF Kit can perform incremental updates by writing the update data to the same stream the FixedDocument instance was created from (the stream must support writing).

The following code shows how to update an existing PDF document using this technique:

// open file for updating; the stream must be writable
using (Stream inputStream = File.Open("file_to_update.pdf", FileMode.Open, FileAccess.ReadWrite))
{
    // load the existing document from the read/write stream
    using (FixedDocument fixedDocument = new FixedDocument(inputStream))
    {
        // create a 100x100 rectangular path (the PDF Kit Path class, not System.IO.Path)
        Path path = new Path();
        path.AppendRectangle(0, 0, 100, 100);

        // set a red fill color and fill the path on the first page
        fixedDocument.Pages[0].Content.SetDeviceNonStrokingColor(new double[] { 1, 0, 0 });
        fixedDocument.Pages[0].Content.FillPath(path);

        // save changes as incremental update to the original document
        fixedDocument.Save();
    }
}
 

Notice the Save() call without any arguments after the changes have been made to the document. It performs an incremental update using the same stream the FixedDocument object was created from. The resulting file will have a red rectangle drawn at the lower left corner of its first page.
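
Because an incremental update appends to the file and leaves the original contents intact, the updated file should begin with exactly the bytes of the original one. The sketch below is a simple byte-level sanity check built on that statement from the specification. It is not part of the library, and IncrementalUpdateCheck and IsAppendOnly are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the PDF library: checks that one
// byte sequence is another one with extra data appended.
public static class IncrementalUpdateCheck
{
    // Returns true when `updated` is longer than `original`
    // and starts with exactly the same bytes.
    public static bool IsAppendOnly(byte[] original, byte[] updated)
    {
        if (updated.Length <= original.Length)
        {
            return false;
        }

        for (int i = 0; i < original.Length; i++)
        {
            if (original[i] != updated[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

To use it, read the file with File.ReadAllBytes() before running the update, read it again afterwards, and pass both arrays to IsAppendOnly().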