Difference between revisions of "Text in PDF"

Revision as of 14:36, 13 February 2018

Introduction

The Apitron PDF Kit and Apitron PDF Rasterizer libraries implement all text features described in the PDF specification. They also automatically handle bi-directional text, which is often used in Arabic and Asian cultures.

Any text in PDF has the following key attributes:

  • Font, can be one of the standard fonts, externally linked or embedded
  • Text positioning and showing operators, describing the text transformation and state
  • Stroking and non-stroking colors

The subtopics below cover each of these properties and show how to use them in practice.

Fonts in PDF

The PDF specification defines several font types and describes them in terms of font file format, encodings, character maps and other usual font characteristics. We will discuss fonts from a different point of view, because in most cases you won’t care whether your font is stored in TrueType, OpenType, CFF or another font file format. What matters most is whether the font will be available to the viewer of the prepared document and how it will affect the resulting PDF file.

So far there are three font types you have to deal with:

Standard fonts

These are fonts that the PDF specification requires every conforming PDF reader to support, so documents that use them should always be viewable. They don’t require any font data to be written into the resulting PDF file and therefore don’t affect its size. The standard fonts are: Times-Roman, Helvetica, Courier, Symbol, Times-Bold, Helvetica-Bold, Courier-Bold, ZapfDingbats, Times-Italic, Helvetica-Oblique, Courier-Oblique, Times-BoldItalic, Helvetica-BoldOblique, Courier-BoldOblique.

Apitron PDF Kit defines a StandardFonts enum that maps one-to-one to this set.

See section 9.6.2.2 “Standard Type 1 Fonts (Standard 14 Fonts)” of the PDF specification for details. The sample code below shows how to use a standard font for a text object:

Usage in Fixed layout API

// create text object based on standard Type1 font
TextObject text = new TextObject(StandardFonts.TimesBold, 12);
 

Usage in Flow layout API

// set the font using inline style property
TextBlock text = new TextBlock("Hello world!") { Font = new Font(StandardFonts.HelveticaBold, 16) };
 

External fonts

These are fonts assumed to be installed in the default system fonts folder, e.g. one of the fonts from “C:\Windows\Fonts”, or shipped with the reader application. They can be loaded when the document is viewed in any conforming reader. These fonts also don’t affect the document's size because their data is not included in the resulting file. If the requested font is not found during generation or rendering of the document, a fallback or substitution font will be used. It's also possible to specify font substitutions for both the Apitron PDF Kit and Apitron PDF Rasterizer libraries.

Usage in Fixed layout API

// create text object based on external font that should exist in the target system
TextObject text = new TextObject("Arial", 12);
 

Usage in Flow layout API

// set the external font using inline style property
TextBlock text = new TextBlock("Hello world!") { Font = new Font("Arial", 16) };
 

Embedded fonts

As their name suggests, these fonts are included in the PDF file, making it self-contained and viewable on all systems where a conforming reader exists. They also increase the resulting file size. It’s possible to embed only the data needed to display the text actually contained in a given PDF file, and this is what Apitron PDF Kit does when it has to embed font data. This technique is called font subsetting: only the glyphs used in the document's text, along with the accompanying data needed to describe the new font subset, are embedded into the resulting PDF file as a reduced-size font file. Apitron PDF Rasterizer fully supports embedded font programs and doesn't need any special handling for them. Both the Fixed layout API and the Flow layout API handle embedding fully automatically.
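To make the idea of font subsetting more concrete, here is a minimal sketch of its first step: finding out which characters a document actually uses. It is plain .NET code and does not show how Apitron PDF Kit implements subsetting internally; the class and method names (SubsetSketch, CollectCodePoints) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SubsetSketch
{
    // Collects the distinct Unicode code points used by the given text runs.
    // A real subsetter would then map these code points to glyph IDs through
    // the font's character map and copy only those glyphs (plus any glyphs
    // they depend on) into the embedded font program.
    // Assumes the input strings are well-formed UTF-16.
    public static SortedSet<int> CollectCodePoints(IEnumerable<string> textRuns)
    {
        var used = new SortedSet<int>();
        foreach (string run in textRuns)
        {
            for (int i = 0; i < run.Length; i++)
            {
                // ConvertToUtf32 combines a surrogate pair into one code point
                int codePoint = char.ConvertToUtf32(run, i);
                if (char.IsHighSurrogate(run[i]))
                {
                    i++; // skip the low surrogate that was just consumed
                }
                used.Add(codePoint);
            }
        }
        return used;
    }

    public static void Main()
    {
        SortedSet<int> used = CollectCodePoints(new[] { "Hello world!", "Hello again" });
        Console.WriteLine(used.Count + " distinct characters need glyphs");
    }
}
```

The fewer distinct characters a document uses, the smaller the embedded subset can be, which is why subsetting usually keeps the file size increase modest.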

Text operators

A PDF text object consists of operators that may show text strings, move the text position, and set text state and certain other parameters. A text object represents a sequence of text commands, so the result depends on their order. The code sample below, created using the Fixed layout API, shows how to place text on a document's page.

// create output PDF file stream
using (FileStream outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{
    // create new document
    using(FixedDocument document = new FixedDocument())
    {
        // add blank first page
        document.Pages.Add(new Page(Boundaries.A4));

        // create text object and append text to it
        TextObject textObject = new TextObject(StandardFonts.Helvetica,12);                

        // apply identity matrix, that doesn't change default appearance
        textObject.SetTextMatrix(1,0,0,1,0,0);
        textObject.AppendText("Hello world using Apitron PDF Kit!");

        // set current transformation matrix so text will be added to the top of the page,
        // PDF coordinate system has Y-axis directed from bottom to top.
        document.Pages[0].Content.Translate(10, 820);

        // add text object to page content, it will automatically create text showing operators                                
        document.Pages[0].Content.AppendText(textObject);

        // save to output stream
        document.Save(outputStream);
    }
}
 

The output can be found below:

Text object usage

We created an empty document, added a new page to it and appended an instance of the TextObject class to its content. For this text object we also set a text matrix and indicated that we’d like to use one of the standard fonts. This example, based on the Fixed layout API, hides many low-level details behind the scenes, e.g. the creation of the necessary operators, and gives you a clean and straightforward way to get the job done. Other text options can be set using the text object instance, e.g. leading, character and word spacing, text rise, rendering mode etc. See section 9.4 “Text Objects” of the PDF specification for the complete list.
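For readers curious about the hidden operators, the sketch below builds the kind of content stream fragment that such a text object corresponds to at the PDF level. The operators themselves (cm, BT, Tf, Tm, Tj, ET) are defined by the PDF specification, but the exact output Apitron PDF Kit writes is not shown here and may differ. The helper names (ContentStreamSketch, BuildTextObject, EscapeLiteral) and the font resource name F1 are hypothetical, and the sketch only handles text that fits the font's single-byte encoding.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class ContentStreamSketch
{
    // Escapes the characters that are special inside a PDF literal string.
    public static string EscapeLiteral(string s) =>
        s.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)");

    // Builds a content stream fragment: translate the coordinate system,
    // then show one string with the given font resource and size.
    public static string BuildTextObject(string fontResource, double size,
                                         double tx, double ty, string text)
    {
        CultureInfo ic = CultureInfo.InvariantCulture;
        var sb = new StringBuilder();
        sb.Append(string.Format(ic, "1 0 0 1 {0} {1} cm\n", tx, ty)); // modify the CTM
        sb.Append("BT\n");                                             // begin text object
        sb.Append(string.Format(ic, "/{0} {1} Tf\n", fontResource, size)); // font and size
        sb.Append("1 0 0 1 0 0 Tm\n");                                 // identity text matrix
        sb.Append("(" + EscapeLiteral(text) + ") Tj\n");               // show the string
        sb.Append("ET\n");                                             // end text object
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(BuildTextObject("F1", 12, 10, 820,
            "Hello world using Apitron PDF Kit!"));
    }
}
```

Because the operators are executed in sequence, swapping, say, the Tf and Tj lines would produce an invalid or differently rendered text object, which is the order dependence mentioned earlier.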

Another thing to notice is how we positioned the text on the page. It was done by altering the current transformation matrix of the page's content, to which our text object was subsequently added. This transformation set the initial position of the text object's coordinate space.

// set current transformation matrix, so the text will be added to the top of the page,
// PDF coordinate system has Y-axis pointing from bottom to top.
document.Pages[0].Content.Translate(10, 820);
 

So we’ve added a 10pt offset to the X coordinate and an 820pt offset to the Y coordinate of the page's initial transformation, moving our objects to the right and towards the top. This way our text object was placed at the top of the page. If another text object were added after the first one, it would need an additional transformation applied so that it doesn't overlap with the first one.
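The arithmetic behind this positioning can be sketched with a small matrix type in PDF notation [a b c d e f], where a point (x, y) maps to (a·x + c·y + e, b·x + d·y + f). The code below is plain .NET and independent of Apitron PDF Kit; the PdfMatrix type is hypothetical, and the 20pt line offset for the second text object is an arbitrary value chosen for illustration. It also assumes that translations accumulate on the current transformation matrix, as they do for the PDF cm operator. For reference, an A4 page is roughly 595 × 842pt, so a Y offset of 820pt puts the text baseline about 22pt below the top edge.

```csharp
using System;

// Minimal 2D affine matrix in PDF notation [a b c d e f].
public readonly struct PdfMatrix
{
    public readonly double A, B, C, D, E, F;

    public PdfMatrix(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    public static PdfMatrix Translation(double tx, double ty) =>
        new PdfMatrix(1, 0, 0, 1, tx, ty);

    // Returns this × m, using the row-vector convention of the PDF specification.
    public PdfMatrix Multiply(PdfMatrix m) => new PdfMatrix(
        A * m.A + B * m.C,
        A * m.B + B * m.D,
        C * m.A + D * m.C,
        C * m.B + D * m.D,
        E * m.A + F * m.C + m.E,
        E * m.B + F * m.D + m.F);

    // Maps a point from the transformed space into page space.
    public (double X, double Y) Transform(double x, double y) =>
        (A * x + C * y + E, B * x + D * y + F);
}

public static class TransformSketch
{
    public static void Main()
    {
        // Equivalent of Translate(10, 820) applied to the default page space
        PdfMatrix ctm = PdfMatrix.Translation(10, 820);
        var first = ctm.Transform(0, 0);

        // A further translation moves the next text object 20pt lower
        ctm = PdfMatrix.Translation(0, -20).Multiply(ctm);
        var second = ctm.Transform(0, 0);

        Console.WriteLine("first origin:  " + first.X + ", " + first.Y);
        Console.WriteLine("second origin: " + second.X + ", " + second.Y);
    }
}
```

Here the first text object starts at (10, 820) and the second at (10, 800) in page coordinates, so the two lines no longer overlap.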