Text in PDF


Overview

The Apitron PDF Kit and Apitron PDF Rasterizer libraries support all text features described in the PDF specification. Note that they also automatically handle bidirectional text, which is common in right-to-left scripts such as Arabic and Hebrew.

Any text in PDF has the following key attributes:

The subtopics below cover each of these properties and show how to use them in practice.

Fonts in PDF

The PDF specification defines several font types and describes them in terms of font file format, encodings, character maps and other font characteristics. We will look at fonts from a different angle, because in most cases you won’t care whether your font is stored in TrueType, OpenType, CFF or another font file format. What matters most is whether the font will be available to the viewer of the finished document and how it will affect the resulting PDF file.

Simple fonts

These fonts share a few key properties, namely:

  • Glyphs in the font are selected by single-byte character codes, so only 256 glyphs can be addressed. Using different encodings (character code to glyph mappings) can work around this limitation to a certain extent
  • Each glyph has a single set of metrics, including its width (horizontal displacement), so these fonts support only the horizontal writing mode
  • With a few exceptions, the font descriptor (the PDF object describing the font used for drawing particular text) contains a font-wide set of font attributes

Standard fonts

These are the fonts that the PDF specification requires every conforming PDF reader to support, so documents created with them should always be viewable. They don’t require any font data to be written into the resulting PDF file and don’t affect its size. These fonts are: Times-Roman, Helvetica, Courier, Symbol, Times-Bold, Helvetica-Bold, Courier-Bold, ZapfDingbats, Times-Italic, Helvetica-Oblique, Courier-Oblique, Times-BoldItalic, Helvetica-BoldOblique, Courier-BoldOblique.

Apitron PDF Kit defines a StandardFonts enum that maps one-to-one to this set.

See section 9.6.2.2 “Standard Type 1 Fonts (Standard 14 Fonts)” of the PDF specification for details. The sample code below shows how to use a standard font for a text object:

Usage in Fixed layout API
// create text object based on standard Type1 font
TextObject text = new TextObject(StandardFonts.TimesBold, 12);
 
Usage in Flow layout API
// set the font using inline style property
TextBlock text = new TextBlock("Hello world!") { Font = new Font(StandardFonts.HelveticaBold, 16) };
 


Type 1 fonts

A Type 1 font program is a PostScript language program that describes glyph shapes. It uses a compact encoding for glyph descriptions and includes hint information that enables high-quality rendering at different sizes and resolutions. In fact, the standard 14 fonts are Type 1 fonts.

TrueType fonts

TrueType is perhaps the most widely used font file format. It was initially developed by Apple and later evolved into OpenType, developed jointly by Microsoft and Adobe. PDF supports both formats, and TrueType or OpenType font programs can be embedded directly into the document as PDF streams.

Type 3 fonts

Type 3 fonts differ significantly from the other font formats supported by PDF. They contain all the data necessary for painting their glyphs, which are constructed using PDF drawing operators. A special encoding maps character codes to glyph names, which are associated with the glyphs' graphical descriptions. A separate article explains this feature in detail.

Composite fonts

Type 0 fonts

Fonts of this type are the so-called composite fonts, built on CID-keyed font data (CIDFonts). They support multibyte character encodings for selecting glyphs and are often used to represent text in writing systems with large character sets, such as Chinese, Japanese, and Korean (CJK). You don't need to do anything special to use them when generating PDF documents: the Apitron PDF Kit library performs all the necessary operations automatically, and using them doesn't differ from using TrueType or Type 1 fonts.

Embedded font programs

As their name suggests, these fonts are embedded into the PDF file, making it self-contained and viewable on any system with a conforming reader. They also increase the resulting file size, since the document itself carries all the necessary fonts. It’s possible to embed only the data (glyphs, metrics, etc.) needed to display the text contained in a particular PDF file, and that is what Apitron PDF Kit does when it has to embed font data. This technique is called font subsetting: only the glyphs actually used in the document's text, along with the data needed to describe this new font subset, are embedded into the resulting PDF file as a reduced-size font file. PDF documents may include subsets of the following font types: Type 1, TrueType, or OpenType. Our PDF rendering solution, Apitron PDF Rasterizer, also fully supports embedded font programs and doesn't require any special handling to process them. Both the Fixed layout API and the Flow layout API handle this case automatically.
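The core idea behind subsetting can be sketched in a few lines of C#. The SubsetSketch class below is a hypothetical illustration, not Apitron PDF Kit's implementation: it only collects the distinct characters that the document's text uses, which is the set of glyphs a subset has to keep. A real subsetter would also map characters to glyph IDs, keep glyphs referenced by composite glyphs, and rewrite the font tables.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of the first step of font subsetting; this is NOT
// Apitron PDF Kit's implementation. It collects the distinct characters used by
// the document's text, i.e. the characters whose glyphs a subset must keep.
public static class SubsetSketch
{
    // Returns the distinct UTF-16 code units found in all text runs, sorted.
    // A real subsetter works with Unicode code points and glyph IDs instead.
    public static SortedSet<char> UsedCharacters(IEnumerable<string> textRuns)
    {
        var used = new SortedSet<char>();
        foreach (string run in textRuns)
        {
            foreach (char c in run)
            {
                used.Add(c);
            }
        }
        return used;
    }
}
```

For example, UsedCharacters(new[] { "Hello world" }) returns 8 distinct characters (H, e, l, o, space, w, r, d), so a subset for that text needs only those glyphs plus the mandatory .notdef glyph.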

External fonts

By external fonts we mean font files in the formats supported by the PDF specification and described above that are installed in the default system fonts folder, e.g. one of the fonts from “C:\Windows\Fonts”, or included with the reader application. They are referenced by font name and loaded when the document is viewed in a conforming reader. These fonts don’t affect the document's size because their data is not included in the resulting file. If the requested font is not found while the document is being generated or rendered, a fallback or substitution font is used. It's also possible to specify font substitutions explicitly for both the Apitron PDF Kit and Apitron PDF Rasterizer libraries.

Usage in Fixed layout API

// create text object based on external font that should exist in the target system
TextObject text = new TextObject("Arial", 12);
 

Usage in Flow layout API

// set the external font using inline style property
TextBlock text = new TextBlock("Hello world!") { Font = new Font("Arial", 16) };
 

Text state parameters and operators

A text state is a subset of the graphics state that controls the parameters used for painting text. A PDF text object encloses a sequence of text operators and associated parameters that may show text strings, change the text position, and modify other text-related parameters. Because a text object is a sequence of text commands, the result depends on their order. In the Fixed layout API, a PDF text object is represented by the TextObject class, which acts as a container for both the text state and text operators and provides access to all the functionality of its counterpart defined in the PDF specification.

Font resource and font size

The text font parameter controls which font resource is selected to paint text, while the text font size parameter sets the font size, given in user space units. In the default user coordinate system, 1 unit equals 1/72 inch, commonly called a point (this can be changed by setting the UserUnit entry of the PDF page), so you specify the desired font size as a scale factor applied to this base unit.

The TextObject class provides the SetFont instance method for this; see the code sample below:

// create text object based on standard font
TextObject text = new TextObject(StandardFonts.Helvetica, 16);
text.AppendTextLine("Helvetica");

// set font
text.SetFont(StandardFonts.TimesBoldItalic, 16);
text.AppendTextLine("TimesBoldItalic");
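Since the font size is just a number of user space units, converting it to physical dimensions is simple arithmetic. The sketch below assumes the page's UserUnit is 1.0 unless another value is passed; UserSpaceUnits and its methods are hypothetical helpers written for this illustration, not part of the library.

```csharp
// Hypothetical helper (not part of Apitron PDF Kit) converting user space units,
// such as a font size, to physical units. In the default user space 1 unit is
// 1/72 inch; a page's UserUnit value multiplies this base unit.
public static class UserSpaceUnits
{
    public const double UnitsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    public static double ToInches(double units, double userUnit = 1.0) =>
        units * userUnit / UnitsPerInch;

    public static double ToMillimeters(double units, double userUnit = 1.0) =>
        ToInches(units, userUnit) * MillimetersPerInch;
}
```

With the defaults, the 16-unit font size used above corresponds to an em size of 16/72 ≈ 0.222 inch, or about 5.64 mm.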
 

Character spacing

Character spacing adjusts the distance between glyphs. It is specified in unscaled text space units and is subject to horizontal scaling when that parameter is also applied. In horizontal writing, positive values increase the spacing; in vertical writing, negative values have the same effect.

The TextObject class provides the SetCharacterSpacing instance method for this; see the code sample below:

// create text object based on standard font and set character spacing
TextObject text = new TextObject(StandardFonts.HelveticaBold, 11);
text.SetCharacterSpacing(1.33);
 

Word spacing

Word spacing works the same way as character spacing but applies only to the single-byte character code 32 (the ASCII SPACE character, 20h), so it affects the distance between words.

The TextObject class provides the SetWordSpacing instance method for this; see the code sample below:

// create text object based on standard font and set word spacing
TextObject text = new TextObject(StandardFonts.HelveticaBold, 11);
text.SetWordSpacing(5.0);
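To see why word spacing only changes the gaps between words, the hypothetical helper below counts the single-byte code 32 values in a run of character codes and computes the extra advance they receive. It assumes horizontal writing and a simple (single-byte) font, and it is a sketch of the rule from the PDF specification, not Apitron PDF Kit code.

```csharp
// Hypothetical sketch of the word spacing rule from the PDF specification:
// Tw is added only for the single-byte character code 32 (0x20).
// Assumes horizontal writing and a simple (single-byte) font.
public static class WordSpacingSketch
{
    // Extra horizontal advance contributed by word spacing for a run of character codes.
    public static double ExtraAdvance(byte[] codes, double wordSpacing,
                                      double horizontalScalingPercent = 100)
    {
        int spaces = 0;
        foreach (byte code in codes)
        {
            if (code == 0x20)
            {
                spaces++;
            }
        }
        // Word spacing is in unscaled text space units, so horizontal scaling applies to it.
        return spaces * wordSpacing * (horizontalScalingPercent / 100.0);
    }
}
```

For the string "one two three" with a word spacing of 5.0, the two spaces add 10 units in total, or 20 units at 200% horizontal scaling.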
 

Horizontal scaling

The horizontal scaling parameter scales glyph widths by stretching or compressing glyphs horizontally. Its value is a percentage of the normal glyph width, where 100% is the normal width. This scaling applies to the horizontal coordinate in text space, independently of the writing mode (horizontal or vertical). It affects the glyph’s shape and its horizontal displacement. In horizontal writing mode, it also affects the character spacing and word spacing parameters, as well as any glyph positioning adjustments.

The TextObject class provides the SetHorizontalScaling instance method for this; see the code sample below:

// create text object based on standard font and set horizontal scaling
TextObject text = new TextObject(StandardFonts.HelveticaBold, 11);
text.SetHorizontalScaling(200);
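Character spacing, word spacing and horizontal scaling all meet in one formula. Section 9.4.4 of the PDF specification defines the horizontal displacement of a glyph as tx = ((w0 - Tj/1000) × Tfs + Tc + Tw) × Th, where w0 is the glyph width, Tj an optional TJ position adjustment, Tfs the font size, Tc the character spacing, Tw the word spacing and Th the horizontal scaling. The hypothetical GlyphDisplacement helper below simply evaluates that formula; it assumes a font whose glyph space has 1000 units per text space unit, which holds for all font types except Type 3.

```csharp
// Hypothetical helper evaluating the horizontal glyph displacement formula from
// section 9.4.4 of the PDF specification:
//   tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th
// Assumes 1000 glyph space units per text space unit (true for all fonts except Type 3).
public static class GlyphDisplacement
{
    public static double Tx(
        double glyphWidth,               // width from the font metrics, in 1/1000 text space units
        double fontSize,                 // Tfs
        double charSpacing,              // Tc
        double wordSpacing,              // Tw
        bool isSingleByteSpace,          // Tw applies only to single-byte code 32
        double horizontalScalingPercent, // Th as a percentage, 100 = normal width
        double tjAdjustment = 0)         // optional TJ position adjustment
    {
        double w0 = glyphWidth / 1000.0;
        double tw = isSingleByteSpace ? wordSpacing : 0.0;
        double th = horizontalScalingPercent / 100.0;
        return ((w0 - tjAdjustment / 1000.0) * fontSize + charSpacing + tw) * th;
    }
}
```

For example, a glyph 667 units wide (the width of "A" in the Helvetica metrics) at 11 points, with a character spacing of 1.33 and 200% scaling, advances by (0.667 × 11 + 1.33) × 2 ≈ 17.33 units.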
 

Leading

The text leading parameter specifies the vertical distance between the baselines of adjacent lines of text and is independent of writing mode.

The TextObject class provides the SetTextLeading instance method for this; see the code sample below:

// create text object based on standard font and set text leading
TextObject text = new TextObject("Helvetica", 11);
text.SetTextLeading(16);
text.AppendText("1st line");
text.AppendTextLine("2nd line");
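Leading is easy to reason about with plain arithmetic: each move to the next line starts the new line TL units below the start of the previous one. The helper below is a hypothetical sketch that assumes an identity text matrix (so text space and user space units coincide) and computes the baseline positions for a given number of lines.

```csharp
// Hypothetical sketch: baseline y positions of consecutive lines for a given leading.
// Moving to the next line with leading TL is equivalent to offsetting the line start
// by (0, -TL). Assumes an identity text matrix, so text and user space units coincide.
public static class LeadingSketch
{
    public static double[] BaselineYs(double firstBaselineY, double leading, int lineCount)
    {
        var ys = new double[lineCount];
        for (int i = 0; i < lineCount; i++)
        {
            ys[i] = firstBaselineY - i * leading;
        }
        return ys;
    }
}
```

With a first baseline at y = 800 and the leading of 16 used above, three lines sit at 800, 784 and 768.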
 

Text rendering mode

This parameter sets the text rendering mode, which determines whether showing text causes glyph outlines to be stroked, filled, used as a clipping boundary, or some combination of the three.

The TextObject class provides the SetTextRenderingMode instance method for this; see the code sample below:

TextObject text = new TextObject("MyFont", 11.5);
text.SetTextRenderingMode(RenderingMode.FillAndStrokeText);
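Under the hood, the rendering mode is a single integer operand of the PDF Tr operator, with eight values defined in section 9.3.6 of the PDF specification. The enum below is a hypothetical illustration of those values; it is not Apitron's RenderingMode enum, whose member names (such as FillAndStrokeText) differ.

```csharp
// Hypothetical enum listing the operand values of the PDF Tr (text rendering mode)
// operator, per section 9.3.6 of the PDF specification. It is NOT Apitron's
// RenderingMode enum; use that enum with SetTextRenderingMode.
public enum TextRenderMode
{
    Fill = 0,            // fill glyph outlines
    Stroke = 1,          // stroke glyph outlines
    FillStroke = 2,      // fill, then stroke
    Invisible = 3,       // neither fill nor stroke
    FillClip = 4,        // fill and add to the clipping path
    StrokeClip = 5,      // stroke and add to the clipping path
    FillStrokeClip = 6,  // fill, stroke and add to the clipping path
    Clip = 7,            // add to the clipping path only
}

public static class TextRenderModeInfo
{
    // Modes 4-7 add the glyph outlines to the clipping path.
    public static bool AddsToClip(TextRenderMode mode) => (int)mode >= 4;
}
```

For the clipping modes, the glyph outlines accumulated in the text object become part of the clipping path at the end of the text object, so they affect the painting operations that follow it.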
 

Text rise

This parameter controls text rise: the distance to move the baseline up or down from its default location. Positive values move the baseline up and negative values move it down. Text rise applies only to the vertical coordinate in text space, regardless of the writing mode.

The TextObject class provides the SetTextRise instance method for this; see the code sample below:

TextObject text = new TextObject(StandardFonts.CourierBold, 11);
text.AppendText("1st line");
text.AppendTextLine("2nd line");
text.SetTextRenderingMode(RenderingMode.FillAndStrokeText);
text.SetTextMatrix(1, 0, 0, 1, 10, (Boundaries.A4).Height - 60);
text.SetTextRise(2.333);
text.AppendText("First");
text.SetTextRise(-2.333);
text.AppendText("Second");
text.SetTextRise(3.333);
text.AppendText("Third");
text.SetTextRise(-3.333);
text.AppendText("Fourth");
 

Text objects

Text positioning operators

There are several text positioning operators, and all of them are implemented as instance members of the TextObject class. The first one is SetTextMatrix, which sets the mapping from text space to user space. In addition, methods such as SetTranslation, SetRotation, and SetSkew simplify this task; they essentially set the current text matrix from the given parameters. A newly set matrix replaces the previous one rather than being multiplied with the current text matrix. See the code sample below:

TextObject text = new TextObject(StandardFonts.CourierBold, 20);
text.SetTextMatrix(1, 0, 0, 1, 10, 750);
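The six numbers passed to SetTextMatrix are the components [a b c d e f] of the text matrix. A point (x, y) in text space maps to user space as x' = a·x + c·y + e and y' = b·x + d·y + f. The hypothetical TextMatrixSketch class below evaluates this mapping and builds a rotation matrix; it only illustrates the math and is not a replacement for SetRotation, whose exact parameters may differ.

```csharp
using System;

// Hypothetical sketch (not Apitron PDF Kit code) of how a text matrix [a b c d e f],
// the six values given to SetTextMatrix, maps text space points to user space:
//   x' = a*x + c*y + e,  y' = b*x + d*y + f
public static class TextMatrixSketch
{
    public static (double X, double Y) Transform(double a, double b, double c, double d,
                                                 double e, double f, double x, double y)
        => (a * x + c * y + e, b * x + d * y + f);

    // Builds [cos sin -sin cos tx ty]: a counterclockwise rotation (in degrees)
    // followed by a translation to (tx, ty).
    public static double[] Rotation(double degrees, double tx, double ty)
    {
        double r = degrees * Math.PI / 180.0;
        return new[] { Math.Cos(r), Math.Sin(r), -Math.Sin(r), Math.Cos(r), tx, ty };
    }
}
```

With the matrix (1, 0, 0, 1, 10, 750) from the sample above, the text space origin lands at (10, 750) in user space.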
 

The other methods are variations of MoveToNextLine, such as an overload accepting horizontal and vertical offsets, and MoveToNextLineAndSetLeading, which also changes the leading parameter used by the TextObject.

See the sample code below:

TextObject text = new TextObject(StandardFonts.CourierBold, 20);
text.SetTextRenderingMode(RenderingMode.FillAndStrokeText);
text.SetTextMatrix(1, 0, 0, 1, 10, (Boundaries.A4).Height - 50);
text.AppendText("A B C");
// to obtain the 3D effect we will add the text above the same text line with minimal offset
text.MoveToNextLine(-1, 1); 
text.AppendText("A B C");
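MoveToNextLine(tx, ty) corresponds to the Td operator, which offsets the start of the current line rather than the point where the last string ended. Following the PDF specification, the new line matrix is [1 0 0 1 tx ty] × Tlm. The hypothetical sketch below computes that product; for the numbers in the usage note it assumes that Boundaries.A4 has a height of 842 points, so the sample's initial matrix is (1, 0, 0, 1, 10, 792).

```csharp
// Hypothetical sketch (not Apitron PDF Kit code) of the Td operator behind
// MoveToNextLine(tx, ty): Tlm' = [1 0 0 1 tx ty] x Tlm, and Tm is reset to Tlm'.
// The offset is applied to the start of the current line, not to the position
// reached after the text already shown.
public static class MoveToNextLineSketch
{
    // m = {a, b, c, d, e, f} is the current text line matrix; returns the new one.
    public static double[] Td(double[] m, double tx, double ty)
    {
        return new[]
        {
            m[0], m[1], m[2], m[3],
            tx * m[0] + ty * m[2] + m[4],
            tx * m[1] + ty * m[3] + m[5],
        };
    }
}
```

Under that assumption, after MoveToNextLine(-1, 1) the second "A B C" starts at (9, 793), one unit to the left of and one unit above the first, which is what produces the 3D effect.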
 

It produces the following result:

TextObject and MoveToNextLine() call effect

Text showing operators

These operators are represented by the following set of members:

AppendText() - appends a text string to the text object and paints it using all parameters set at the moment of the call; see the sample code:

FixedDocument document = new FixedDocument();
TextObject text = new TextObject("Helvetica", 14);
text.SetTextRenderingMode(RenderingMode.FillText);
text.AppendText("Hello World!");
Page page = new Page(new PageBoundary(Boundaries.A4));
document.Pages.Add(page);
document.Pages[0].Content.AppendText(text);
 

AppendTextLine() and its overloads - append the text on a new line, positioned using the current leading, and paint it; see the sample code:

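FixedDocument document = new FixedDocument();
// add a page so that document.Pages[0] exists
document.Pages.Add(new Page(new PageBoundary(Boundaries.A4)));
TextObject text = new TextObject("Helvetica", 14);
text.SetTextRenderingMode(RenderingMode.FillText);
text.SetTextLeading(20);
text.AppendText("Hello World!");
// append text on a new line, positioned using the current leading
text.AppendTextLine("Text on the new line.");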
document.Pages[0].Content.AppendText(text);
 

AppendTextWithPositions() and its overloads - append and paint a set of strings with position adjustments, allowing individual glyph positioning; see the sample code:

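FixedDocument document = new FixedDocument();
document.Pages.Add(new Page(new PageBoundary(Boundaries.A4)));
TextObject text = new TextObject(StandardFonts.CourierBold, 20);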
text.SetTextRenderingMode(RenderingMode.FillAndStrokeText);
text.SetTextMatrix(1, 0, 0, 1, 10, (Boundaries.A4).Height - 50);

string[] words = new string[4] {"Apitron's", "world", "of", "PDF"};
double[] tjs = new double[4] {-5000, -3000, -1000, 0};
text.AppendTextWithPositions(words, tjs);
document.Pages[0].Content.AppendText(text);
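The numbers passed alongside the strings are TJ position adjustments expressed in thousandths of a text space unit. In PDF, each one is subtracted from the current horizontal position before the next string is shown, so negative values move the following text to the right. The hypothetical helper below converts an adjustment into a text space offset, assuming horizontal writing; it also assumes, as the usage note does, that AppendTextWithPositions places each number right after the string with the same index.

```csharp
// Hypothetical sketch (not Apitron PDF Kit code): horizontal offset produced by a
// number in a TJ array. Values are in thousandths of a text space unit and are
// subtracted from the current position, so negative values move the next glyph
// to the right in horizontal writing.
public static class TjAdjustmentSketch
{
    public static double Offset(double tj, double fontSize, double horizontalScalingPercent = 100)
        => -tj / 1000.0 * fontSize * (horizontalScalingPercent / 100.0);
}
```

With the font size of 20 used above, -5000 pushes "world" 100 units to the right of where it would otherwise start, -3000 adds 60 units before "of", -1000 adds 20 units before "PDF", and the final 0 has no visible effect.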
 

Code samples

The code samples below, created using the Fixed layout API, show how to use text objects and the supported operators to place text on a document's page.

Add text using one of the standard fonts

// create output PDF file stream
using (FileStream outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{
    // create new document
    using(FixedDocument document = new FixedDocument())
    {
        // add blank first page
        document.Pages.Add(new Page(Boundaries.A4));

        // create text object and append text to it
        TextObject textObject = new TextObject(StandardFonts.Helvetica,12);                

        // apply identity matrix, that doesn't change default appearance
        textObject.SetTextMatrix(1,0,0,1,0,0);
        textObject.AppendText("Hello world using Apitron PDF Kit!");

        // set current transformation matrix so text will be added to the top of the page,
        // PDF coordinate system has Y-axis directed from bottom to top.
        document.Pages[0].Content.Translate(10, 820);

        // add text object to page content, it will automatically create text showing operators                                
        document.Pages[0].Content.AppendText(textObject);

        // save to output stream
        document.Save(outputStream);
    }
}
 

The output can be found below:

Text object usage

We created an empty document, added a new page to it and appended an instance of the TextObject class to its content. For this text object we also set a text matrix and specified that we’d like to use one of the standard fonts. This clean example based on the Fixed layout API hides many low-level details behind the scenes, e.g. the creation of the necessary operators, giving you a straightforward way to get the job done. Other text options can be set using the text object instance, e.g. leading, character and word spacing, text rise, rendering mode, etc. See sections 9.3 “Text State Parameters and Operators” and 9.4 “Text Objects” of the PDF specification for the complete list.

Another thing to notice is how we positioned the text on the page. We did it by altering the current transformation matrix of the page content to which our text object was subsequently added. This transformation sets the initial position of the text object's coordinate space.

// set current transformation matrix, so the text will be added to the top of the page,
// PDF coordinate system has Y-axis pointing from bottom to top.
document.Pages[0].Content.Translate(10, 820);
 

So we added a 10 pt offset along the X axis and an 820 pt offset along the Y axis to the page's initial transformation, moving our objects to the right and up. This placed our text object at the top of the page. If another text object were added after the first one, it would need an additional transformation so that it doesn't overlap the first one.
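Text ends up where it does because the text matrix is combined with the page's current transformation matrix: a text space point is mapped first by Tm and then by the CTM, i.e. by the product Tm × CTM. The hypothetical MatrixSketch.Multiply helper below multiplies two matrices written as [a b c d e f] arrays, following the row-vector convention used by the PDF specification.

```csharp
// Hypothetical sketch (not Apitron PDF Kit code): product of two PDF transformation
// matrices [a b c d e f], using the row-vector convention of the PDF specification.
// Multiply(tm, ctm) maps a point by tm first and then by ctm.
public static class MatrixSketch
{
    public static double[] Multiply(double[] m1, double[] m2) => new[]
    {
        m1[0] * m2[0] + m1[1] * m2[2],
        m1[0] * m2[1] + m1[1] * m2[3],
        m1[2] * m2[0] + m1[3] * m2[2],
        m1[2] * m2[1] + m1[3] * m2[3],
        m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
        m1[4] * m2[1] + m1[5] * m2[3] + m2[5],
    };
}
```

With the identity text matrix and Translate(10, 820), the product is (1, 0, 0, 1, 10, 820), so the text starts at (10, 820). The second line in the next sample uses the text matrix (1, 0, 0, 2.5, 0, -40), which yields (1, 0, 0, 2.5, 10, 780): the same horizontal position, 40 units lower, and stretched vertically.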

Add text using one of the external fonts, set its color and other properties

// create output PDF file
using (FileStream outputStream = new FileStream("outfile.pdf", FileMode.Create, FileAccess.Write))
{
    // create new document
    using(FixedDocument document = new FixedDocument())
    {
        // add blank first page
        document.Pages.Add(new Page(Boundaries.A4));

        // create text object and append text to it
        TextObject textObject = new TextObject("Arial", 14);

        // apply identity matrix, that doesn't change default appearance
        textObject.SetTextMatrix(1, 0, 0, 1, 0, 0);
        textObject.AppendText("Hello world using Apitron PDF Kit!");
                                
        textObject.SetFont("ArialItalic",14);
        // apply vertical scaling and offset
        textObject.SetTextMatrix(1, 0, 0, 2.5, 0, -40);

        // set mode to stroke only
        textObject.SetTextRenderingMode(RenderingMode.StrokeText);
        textObject.AppendText("Hello world using Apitron PDF Kit!");

        // set current stroking and non-stroking color
        document.Pages[0].Content.SetDeviceStrokingColor(new double[]{1,0,0});
        document.Pages[0].Content.SetDeviceNonStrokingColor(new double[]{1,0,0});

        // set current transformation                    
        document.Pages[0].Content.Translate(10, 820);
        // add text object to page content, it will automatically create text showing operators                                
        document.Pages[0].Content.AppendText(textObject);
                   
        // save to output stream
        document.Save(outputStream);
    }
}
 

This code produces the following results:

Add text to PDF file using external font and set text properties

These two lines of text were added using a single text object. As mentioned earlier, a text object is a sequence of text commands, so it can contain several text sequences with different appearances.

You may also note that we changed the text color, using this code:

// set current stroking and non-stroking color
document.Pages[0].Content.SetDeviceStrokingColor(new double[]{1,0,0});
document.Pages[0].Content.SetDeviceNonStrokingColor(new double[]{1,0,0});
 

These calls set both the stroking and non-stroking colors to red by specifying the RGB value in a device color space. The color space is detected automatically from the number of components and set to DeviceRGB (see section 8.6.4.3 “DeviceRGB Colour Space” of the PDF specification). All subsequent drawing commands that fill or stroke use this color; since our text was added after these calls, it became red.
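DeviceRGB components range from 0.0 to 1.0, so a color given as 8-bit values (0-255), as in most design tools, has to be scaled first. DeviceRgb.FromBytes below is a hypothetical helper for that conversion; its result is the kind of double array passed to SetDeviceStrokingColor and SetDeviceNonStrokingColor in the sample above.

```csharp
// Hypothetical helper (not part of Apitron PDF Kit): converts 8-bit RGB channels
// to the 0.0-1.0 component values expected in the DeviceRGB color space.
public static class DeviceRgb
{
    public static double[] FromBytes(byte r, byte g, byte b) =>
        new[] { r / 255.0, g / 255.0, b / 255.0 };
}
```

FromBytes(255, 0, 0) returns { 1, 0, 0 }, the red used in the sample.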

Right to left and bidirectional text

The Apitron PDF Kit supports right-to-left and bidirectional text automatically. The sample below demonstrates how to add it.

// create output file
using (Stream outputStream = File.Create("rtl_and_bidi.pdf"))
{
    // create document and add one page to it
    using(FixedDocument fixedDocument = new FixedDocument())
    {
        fixedDocument.Pages.Add(new Page());

        ClippedContent content = fixedDocument.Pages[0].Content;
        content.Translate(10, 820);

        // add text using a regular text object and a system font
        TextObject textObject = new TextObject("Traditional Arabic", 12);
        textObject.AppendText("Bi-Directional test: Hello world ! مرحبا بالعالم! End of text.");              
        content.AppendText(textObject);
        // save document
        fixedDocument.Save(outputStream);
    }
}
 

The image below shows the resulting document:

Right to left and bi-directional text

All you need is a font that contains glyphs for the characters you use. It can be either one of the external fonts or a font loaded from an explicit font path.