MoEML’s PDF Developer Documentation

This document is currently in draft. When it has been reviewed and proofed, it will be published on the site.

Please note that it is not of publishable quality yet.

XSLT and Ant files responsible for creating MoEML PDFs:

  • pdf_globals.xsl
  • createPdfs.xml (ant application)
  • add_special_styles_to_fo_master.xsl
  • get_attribute_sets_names.xsl
  • list_classes_for_pdf.xsl
  • list_fonts_for_pdf.xsl
  • list_images_for_pdf.xsl
  • xhtml5_to_fo_master.xsl
  • xhtml5_to_fo_styles_module.xsl
This documentation goes through the XSL and XML files responsible for building the PDFs.

pdf_globals.xsl

pdf_globals.xsl is an XSL file that contains the variables, parameters, and functions needed to create every PDF. The variable $docId holds the xml:id of the document, which we use to build that particular file and all of its required components. The variable $attSetDoc holds the docId-specific styling module together with the master styling module; these are used later to assign attribute sets according to the classes found in the document. Two parameters give the locations of the necessary folders: the FO folder and the output folder. Two other parameters hold the title of the document at hand: one for born digital files and one for primary sources.
pdf_globals.xsl also contains a function, getAtts, that adds the attributes from a class’s corresponding attribute sets, in addition to the attributes that are unique to the class, which we retrieve from the style element in the original HTML source document. The function first finds the attributes for the class in the master styles module, and then finds those in the document-specific attribute sets. An attribute that is not yet present is added; one that is already present is overwritten by the document-specific value. The same function also adjusts some attributes and their values to accommodate FO restrictions: rem units are replaced with pt, and the small-caps font variant is given a font-family selected specifically for that purpose.
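The merge order described for getAtts can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual function: the sk: namespace, both function signatures, the 1rem = 12pt factor, and the idea that both modules store styles as xsl:attribute-set elements named after the class are all hypothetical. The small-caps font substitution is not modelled, and only simple single-value rem lengths are converted.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sk="http://example.org/ns/sketch"
    exclude-result-prefixes="xs sk">

  <!-- Assumed conversion factor: 1rem = 12pt (hypothetical value). -->
  <xsl:variable name="remInPt" as="xs:decimal" select="12"/>

  <!-- Converts a single value such as "1.5rem" to points;
       every other value is returned unchanged. -->
  <xsl:function name="sk:fixValue" as="xs:string">
    <xsl:param name="value" as="xs:string"/>
    <xsl:sequence select="
      if (matches($value, '^\d+(\.\d+)?rem$'))
      then concat(string(xs:decimal(substring-before($value, 'rem')) * $remInPt), 'pt')
      else $value"/>
  </xsl:function>

  <!-- Returns the attributes for one class: master attributes first,
       except those the document-specific module overrides, followed by
       all document-specific attributes. -->
  <xsl:function name="sk:getAtts" as="attribute()*">
    <xsl:param name="className" as="xs:string"/>
    <xsl:param name="master" as="document-node()"/>
    <xsl:param name="special" as="document-node()"/>
    <xsl:variable name="masterAtts"
        select="$master//xsl:attribute-set[@name = $className]/xsl:attribute"/>
    <xsl:variable name="specialAtts"
        select="$special//xsl:attribute-set[@name = $className]/xsl:attribute"/>
    <xsl:for-each select="($masterAtts[not(@name = $specialAtts/@name)], $specialAtts)">
      <xsl:attribute name="{@name}" select="sk:fixValue(normalize-space(.))"/>
    </xsl:for-each>
  </xsl:function>

</xsl:stylesheet>
```

A caller would place the result at the start of a result element, for example with an xsl:sequence that selects sk:getAtts(...) as the first child of an fo:block, before any content is written.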

createPdfs.xml

createPdfs.xml is the Ant application that builds the selected PDFs. It is run from the terminal. It first checks for FOP: if FOP is not available, it is downloaded; if FOP is already present, the application checks whether it is up to date and updates it if need be. The images and fonts folders are defined in properties, which are used later to retrieve the appropriate material. The listFiles property lists the ids of the documents that need to be built into PDFs.
The Ant application creates a list of the images used in the source documents and then downloads those images. The same process is applied to the fonts. In addition, the Ant application runs the following targets:
  • createSpecialStylesModule
    Creates a special styles module, particular to the document that is being processed. This special module contains the formatting from the <html>/<head>/<style> element.
  • addSpecialModulesToMaster
    Adds the special module to other XSLTs responsible for the creation of the PDFs.
  • ValidateFo
    Validates the resulting FO document.
  • getSourceFileFromJenkins
    Is responsible for retrieving the source HTML file from our server.
  • processOneFile
    Is the main (though not the default) target that processes the file (docId) in the following order: gets the file from Jenkins, creates the special styles module, adds the special module to the master module, creates the images list, copies the images, creates the fonts list, copies the fonts, creates the FO file, creates the PDF, and finally gets the .png of the title (cover) page (which is used in the creation of ePubs).
  • createFO
    Creates the XSL:FO file from the HTML source file, as per the docId_xhtml5_to_fo transformation.
  • createPdf
    Creates the PDF file from the docID.fo file using the FOP application.
  • getTitlePage
    Gets the PDF’s title page in .png format, so we can use it for the ePubs. This is done by executing pdftoppm.
  • buildFiles
    Is the default target; it runs processOneFile to build all the documents listed in the listFiles property.
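The order of operations in processOneFile can be sketched as an Ant build file. This is only an outline under stated assumptions: the property values, the output paths, and the idea that fop and pdftoppm are available on the PATH are hypothetical, most targets are reduced to echo stubs, and the image and font steps are left out because their target names are not given above. Iterating over several ids in listFiles is also not shown.

```xml
<project name="createPdfsSketch" default="buildFiles" basedir=".">

  <!-- Hypothetical values; the real file defines its own properties. -->
  <property name="listFiles" value="exampleDoc"/>
  <property name="outputDir" location="output"/>

  <!-- Default target: builds a single document id in this sketch. -->
  <target name="buildFiles">
    <antcall target="processOneFile">
      <param name="docId" value="${listFiles}"/>
    </antcall>
  </target>

  <!-- Runs the steps in the order given in the description. -->
  <target name="processOneFile">
    <antcall target="getSourceFileFromJenkins"/>
    <antcall target="createSpecialStylesModule"/>
    <antcall target="addSpecialModulesToMaster"/>
    <!-- Image and font listing and copying would run here. -->
    <antcall target="createFO"/>
    <antcall target="createPdf"/>
    <antcall target="getTitlePage"/>
  </target>

  <!-- Stubs standing in for the real work. -->
  <target name="getSourceFileFromJenkins">
    <echo message="Fetching source for ${docId}"/>
  </target>
  <target name="createSpecialStylesModule">
    <echo message="Creating styles module for ${docId}"/>
  </target>
  <target name="addSpecialModulesToMaster">
    <echo message="Adding styles module for ${docId} to the master"/>
  </target>
  <target name="createFO">
    <echo message="Transforming ${docId} to XSL:FO"/>
  </target>

  <!-- Assumes a fop launcher on the PATH. -->
  <target name="createPdf">
    <exec executable="fop" failonerror="true">
      <arg value="-fo"/>
      <arg file="${outputDir}/${docId}.fo"/>
      <arg value="-pdf"/>
      <arg file="${outputDir}/${docId}.pdf"/>
    </exec>
  </target>

  <!-- Renders only page 1 of the PDF to a single PNG file. -->
  <target name="getTitlePage">
    <exec executable="pdftoppm" failonerror="true">
      <arg value="-png"/>
      <arg value="-f"/>
      <arg value="1"/>
      <arg value="-l"/>
      <arg value="1"/>
      <arg value="-singlefile"/>
      <arg file="${outputDir}/${docId}.pdf"/>
      <arg file="${outputDir}/${docId}_title"/>
    </exec>
  </target>

</project>
```

Because antcall passes the caller's properties on by default, the docId parameter set in buildFiles is visible in every nested target.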

add_special_styles_to_fo_master.xsl

This XSL transformation is designed to include a special XSL styles module in the xhtml5_to_fo_master. It does so by creating a variable that gets the attributes from the special styles module. The file has templates that match the following elements: <div>, <span>, and <img>. These templates find the classes and their corresponding attribute sets, add the necessary attributes and their values, and apply other templates as required.

get_attribute_sets_names.xsl

This stylesheet is a rather simple one that reads the special styles module (docId_styles_module.xsl) and gets the attribute sets that need to be used. It stores them in an output file listOfSets.txt which gets used in xhtml5_to_fo_master.xsl.
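A stylesheet of this kind can be very short. The sketch below assumes that it is run with the special styles module as its input document and that the names are written one per line; the actual layout of listOfSets.txt is not documented here, and writing the output to that file is left to the caller.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Input: a styles module. Output: each attribute-set name once,
       separated by newlines. -->
  <xsl:template match="/">
    <xsl:value-of select="distinct-values(//xsl:attribute-set/@name)"
        separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```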

list_classes_for_pdf.xsl

This XSL transformation is designed to read the <style> element in the HTML file and list the styling classes needed for the particular PDF being built. It starts by parsing the <style> element as a string, stored in a variable ($style). The variable $parsedClasses stores the names of the classes so that we can use them later to name the attribute sets. The variable $attribute holds the attribute names as well as their values.
The root template of this transformation creates the attribute sets corresponding to classes mentioned in the style element of the source file. It results in a document saved in db/data/static/xsl called $docId_styles_module.xsl. The template recreates the following variables: parsedClasses, and parsedStyle.
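One way to turn simple CSS class rules into attribute sets is sketched below. It is an assumption-laden illustration rather than the actual transformation: the out: alias prefix and the regular expression are hypothetical, only plain rules of the form .className { property: value; } are handled, and no attempt is made to drop properties that FO does not support.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="http://example.org/ns/xsl-alias"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Elements written with the out: prefix are emitted as xsl: elements. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- The whole <style> content as one string. -->
    <xsl:variable name="style" select="string-join(//head/style, ' ')"/>
    <out:stylesheet version="2.0">
      <!-- Group 1: class name. Group 2: the declarations between the braces.
           Braces are doubled because the regex attribute is a value template. -->
      <xsl:analyze-string select="$style"
          regex="\.([A-Za-z][\w\-]*)\s*\{{([^\}}]*)\}}">
        <xsl:matching-substring>
          <out:attribute-set name="{regex-group(1)}">
            <!-- One xsl:attribute per "property: value" declaration. -->
            <xsl:for-each select="tokenize(regex-group(2), ';')[contains(., ':')]">
              <out:attribute name="{normalize-space(substring-before(., ':'))}">
                <xsl:value-of select="normalize-space(substring-after(., ':'))"/>
              </out:attribute>
            </xsl:for-each>
          </out:attribute-set>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </out:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

The namespace alias is what allows a stylesheet to write another stylesheet without the processor trying to execute the generated xsl:attribute-set elements.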

list_fonts_for_pdf.xsl

This XSLT is designed to read the CSS files and list the fonts needed for the particular PDF being built. It is adapted from the transformation designed for ePubs. The root (main) template creates a list of the fonts mentioned in the corresponding CSS files, which the Ant file then uses to copy these fonts into the PDF container folder. It first creates a variable that lists the CSS files. The files are then tokenized in $tokenizedCss and normalized in $allCss. The variable $parsedCssFonts results in a document listOfFonts.txt that contains $distinctSiteFonts, which gets the distinct values of the $parsedCssFonts variable.

list_images_for_pdf.xsl

This XSLT reads the XHTML and CSS files and lists the images needed for the particular PDF being built. It is very similar in structure to list_fonts_for_pdf.xsl: the root template creates a list of the images mentioned in the corresponding CSS files as well as in the XHTML file, which the Ant file then uses to copy these images into the PDF container folder. This template first creates a variable that lists the CSS files. The files are then tokenized in $tokenizedCss and normalized in $allCss. The variable $parsedCssImages results in a document listOfImages.txt that contains $distinctSiteImages, which gets the distinct values of the $parsedCssImages variable. This transformation also has a few hard-coded images, given that they are not present in the CSS or XHTML files.
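The general approach of collecting image references from both sources can be sketched as follows. The cssUris parameter, the regular expression, and the list of image extensions are assumptions made for illustration; the variable names echo those mentioned above but the bodies are not taken from the real stylesheet, and the hard-coded images are omitted.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: absolute URIs of the CSS files to scan. -->
  <xsl:param name="cssUris" as="xs:string*" select="()"/>

  <xsl:template match="/">
    <!-- All CSS text joined and whitespace-normalized. -->
    <xsl:variable name="allCss" select="normalize-space(
        string-join(for $u in $cssUris return unparsed-text($u), ' '))"/>

    <!-- Every url(...) target found in the CSS. -->
    <xsl:variable name="parsedCssImages" as="xs:string*">
      <xsl:analyze-string select="$allCss"
          regex="url\(\s*[&quot;']?([^&quot;'\)]+)[&quot;']?\s*\)">
        <xsl:matching-substring>
          <xsl:sequence select="regex-group(1)"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>

    <!-- Images referenced directly in the XHTML input document. -->
    <xsl:variable name="htmlImages" as="xs:string*" select="//img/@src/string()"/>

    <!-- Keep image-like file names only, once each, one per line. -->
    <xsl:variable name="distinctSiteImages" select="distinct-values(
        ($parsedCssImages, $htmlImages)[matches(., '\.(png|jpe?g|gif|svg)$', 'i')])"/>
    <xsl:value-of select="$distinctSiteImages" separator="&#10;"/>
  </xsl:template>

</xsl:stylesheet>
```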

xhtml5_to_fo_master.xsl

This XSLT is designed to convert a MoEML XHTML5 static site document into a PDF. It converts the document to XSL:FO, the FO is validated, and then the calling Ant script uses FOP to generate a PDF. This file includes the styles module xhtml5_to_fo_styles_module.xsl (discussed below) and pdf_globals.xsl. The root template sets up the FO basics: four simple page masters (title page, first page, recto page, and verso page), followed by the sequence of pages. The main page-sequence contains the footers (title page, recto, and verso) and headers (recto and verso). The code for the headers differs between born digital and primary source documents, because their titles are structured differently. The template that matches on the <html> element applies templates (both named and unnamed, as discussed below).
  • CreateTitlePage
    This template processes the metadata in the page header to get the key information. The aesthetic and stylistic components include: a background image, two flower logos (top and bottom), decorative lines (top and bottom) between which the title of the document sits, and a snippet of the Agas map. The size of the title and the authors depends on the length of the title and the number of authors listed. The title page also contains the edition information (that is, the release version at the time of the build).
  • CreateHybridTitlePage
    This named template creates another title page that contains hybrid metadata, which includes the title, authors, compilers, and editors, in addition to the publication information. This template contains two conditions: one is when the document at hand is a born digital file, and the other is when it is a primary source file.
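The page setup performed by the root template can be pictured with the following sketch. The four master names follow the description above, but everything else is assumed for illustration: page size, margins, region names, the placeholder header and footer text, and the choice to give the title page its own page-sequence, which may differ from how the master stylesheet arranges its sequences.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <!-- Page size and margins are placeholder values. -->
        <fo:simple-page-master master-name="titlePage"
            page-width="8.5in" page-height="11in" margin="1in">
          <fo:region-body/>
        </fo:simple-page-master>
        <fo:simple-page-master master-name="firstPage"
            page-width="8.5in" page-height="11in" margin="1in">
          <fo:region-body/>
        </fo:simple-page-master>
        <!-- Recto and verso mirror their inner margins to allow for binding. -->
        <fo:simple-page-master master-name="recto"
            page-width="8.5in" page-height="11in"
            margin-top="0.5in" margin-bottom="0.5in"
            margin-left="1.25in" margin-right="1in">
          <fo:region-body margin-top="0.5in" margin-bottom="0.5in"/>
          <fo:region-before region-name="header-recto" extent="0.4in"/>
          <fo:region-after region-name="footer-recto" extent="0.4in"/>
        </fo:simple-page-master>
        <fo:simple-page-master master-name="verso"
            page-width="8.5in" page-height="11in"
            margin-top="0.5in" margin-bottom="0.5in"
            margin-left="1in" margin-right="1.25in">
          <fo:region-body margin-top="0.5in" margin-bottom="0.5in"/>
          <fo:region-before region-name="header-verso" extent="0.4in"/>
          <fo:region-after region-name="footer-verso" extent="0.4in"/>
        </fo:simple-page-master>
        <!-- First page, then alternating odd (recto) and even (verso) pages. -->
        <fo:page-sequence-master master-name="mainPages">
          <fo:repeatable-page-master-alternatives>
            <fo:conditional-page-master-reference
                master-reference="firstPage" page-position="first"/>
            <fo:conditional-page-master-reference
                master-reference="recto" odd-or-even="odd"/>
            <fo:conditional-page-master-reference
                master-reference="verso" odd-or-even="even"/>
          </fo:repeatable-page-master-alternatives>
        </fo:page-sequence-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="titlePage">
        <fo:flow flow-name="xsl-region-body">
          <fo:block>Title page content</fo:block>
        </fo:flow>
      </fo:page-sequence>

      <fo:page-sequence master-reference="mainPages">
        <fo:static-content flow-name="header-recto">
          <fo:block>Recto header</fo:block>
        </fo:static-content>
        <fo:static-content flow-name="header-verso">
          <fo:block>Verso header</fo:block>
        </fo:static-content>
        <fo:static-content flow-name="footer-recto">
          <fo:block text-align="end"><fo:page-number/></fo:block>
        </fo:static-content>
        <fo:static-content flow-name="footer-verso">
          <fo:block><fo:page-number/></fo:block>
        </fo:static-content>
        <fo:flow flow-name="xsl-region-body">
          <fo:block><xsl:apply-templates/></fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>
```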

Unnamed Templates

  • Matches <div>
    <div>s are transformed into <fo:block> elements. Their ids are replicated into @id attributes. Depending on the class attribute of the <div>, the <fo:block> element gets assigned the appropriate attribute set(s). <div>s that are descendants of the appendix <div> get special attribute sets that correspond to the appendix styling of the PDF.
  • Matches <nav>
    <nav> elements are transformed into <fo:block> elements, and their ids are copied into corresponding @id attributes.
  • Matches <ul>
    <ul> elements are transformed into <fo:list-block> elements. Their ids are replicated into corresponding @id attributes. <ul>s that are descendants of the appendix get special styling to correspond with the appendix styles.
  • Matches <ol>
    <ol> elements are transformed into <fo:list-block> elements. Their ids are replicated into corresponding @id attributes. <ol>s that are descendants of the appendix get special styling to correspond with the appendix styles.
  • Matches <li>
    <li> elements are transformed into <fo:list-item> elements. Every <fo:list-item> contains an <fo:list-item-label> and an <fo:list-item-body>. Depending on their level in the list structure, <fo:list-item-label> and <fo:list-item-body> elements get the following attribute sets, respectively: list-item-label and list-item-body, list-item-label-descendant and list-item-body-descendant, list-item-label-level3 and list-item-body-level3, list-item-label-appendix and list-item-body-appendix. In tables, <fo:list-item-label> and <fo:list-item-body> get attribute sets list-item-label-table and list-item-body-table, respectively. The page menu <fo:list-item> gets the attribute set pageMenu.
  • Matches <table>
    <table>s are transformed into <fo:table>. When the table has a class attribute contentTable, the <fo:table> element gets the contentTable attribute set.
  • Matches <thead>
    <thead>s are transformed into <fo:table-header> elements, with attribute set table-head-td.
  • Matches <tbody>
    <tbody> elements are transformed into <fo:table-body> elements.
  • Matches <tr>
    <tr> elements are transformed into <fo:table-row> elements.
  • Matches <td>
    <td> elements are transformed into <fo:table-cell> elements. <td> elements with parents or ancestor <thead> elements acquire the attribute set table-head-td; if the <td> element has an ancestor <table> that has attribute class contentTable, the <fo:table-cell> gets the attribute set contentTable-td; otherwise, <fo:table-cell> elements get the table-cell attribute set.
  • Matches <a>
    <a> elements are transformed into <fo:basic-link> elements, with the following attributes: @id (if applicable), @external-destination or @internal-destination for the value of the @href attribute of the <a> element, and color. If @href ends with .htm, or contains http, jpg, or mp3, the <fo:basic-link> gets an @external-destination attribute. The <fo:basic-link> also gets an @external-destination attribute and other appropriate styling attributes if the <a> element has a @class attribute that is not noteMarker, returnFromNote, or local. Otherwise, <fo:basic-link> gets an @internal-destination attribute.
  • Matches <a>[@href[starts-with(., '#')]][not(@class='pilcrow')]
    This template is responsible for links that refer to ids, mostly internal references. We use the variable $thisid to identify the @id of the current <a> element. If $thisid does not occur anywhere else in the document, then the <fo:basic-link> will have an @external-destination pointing to https://mapoflondon.uvic.ca/$thisid.htm. Otherwise, the <fo:basic-link> will have an @internal-destination, without the #, and with the appropriate attribute sets as per the class attributes.
  • Matches <p>
    <p> elements become <fo:block> elements.
  • Matches <span>
    <span> elements become <fo:inline> elements.
  • Matches <strong>
    <strong> elements become <fo:block> elements when they have a parent element <li>, and <fo:inline> elements otherwise.
  • Matches <pre>
    <pre> elements become <fo:block> elements when they have a parent element <li>, and <fo:inline> elements otherwise.
  • Matches <q>
    <q> elements become <fo:block> elements when they have a parent element <li>, and <fo:inline> elements otherwise.
  • Matches <blockquote>
    <blockquote> elements become <fo:block> elements with the blockquote attribute set.
  • Matches <code>
    <code> elements are transformed into <fo:inline> elements with the code attribute set.
  • Matches <img>
    <img> elements are transformed into <fo:external-graphic> elements. They all have the images attribute set, and when appropriate, they have an additional attribute set that corresponds to their appropriate class, including acknowledgementImg and socialMediaImg.
  • Matches <figure>
    <figure> elements are transformed into <fo:block> elements.
  • Matches <figcaption>
    <figcaption> elements are transformed into <fo:block> elements. When the <figcaption> element contains the strings horizontal rule or Printer’s ornament, the <fo:block> element gets the attribute set figcaption_special; otherwise it gets the attribute set figcaption.
  • Matches <h1>
    <h1> elements are transformed into <fo:block> elements. When <h1> has a child <span> that has a @class attribute titlePart, it gets both attribute sets h1 and h1TitlePart; otherwise it only gets the attribute set h1.
  • Matches <h2>
    <h2> elements are transformed into <fo:block> elements. When they are appendix headers, they get the attribute set appendixH2, otherwise they get the attribute set h2.
  • Matches <h3>
    <h3> elements are transformed into <fo:block> elements. When they are appendix headers, they get the attribute set appendixH3; when they are appendix list headers, they get the attribute set appendixListH3, otherwise they get the attribute set h3.
  • Matches <h4>
    <h4> elements are transformed into <fo:block> elements. When they are appendix headers, they get the attribute set appendixH4, otherwise they get the attribute set h4.
  • Matches <br>
    <br> elements are transformed into <fo:block> elements.
  • Matches <hr>
    [Primary Source Element] <hr> elements are transformed into <fo:leader> elements, with @id attributes when appropriate.
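Two of the decisions described in the list above, the choice of attribute set for <td> and the choice of destination for links beginning with #, can be sketched as templates. The attribute values inside the three attribute sets are placeholders, $thisid is read here as the id that the @href points to, and wrapping the external address in url(...) is an assumption of this sketch, as is the omission of class-based styling.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Placeholder definitions so that the sketch compiles on its own. -->
  <xsl:attribute-set name="table-head-td">
    <xsl:attribute name="font-weight">bold</xsl:attribute>
  </xsl:attribute-set>
  <xsl:attribute-set name="contentTable-td">
    <xsl:attribute name="padding">2pt</xsl:attribute>
  </xsl:attribute-set>
  <xsl:attribute-set name="table-cell">
    <xsl:attribute name="padding">4pt</xsl:attribute>
  </xsl:attribute-set>

  <!-- Header cells first, then cells inside a contentTable, then the default. -->
  <xsl:template match="td">
    <xsl:choose>
      <xsl:when test="ancestor::thead">
        <fo:table-cell xsl:use-attribute-sets="table-head-td">
          <fo:block><xsl:apply-templates/></fo:block>
        </fo:table-cell>
      </xsl:when>
      <xsl:when test="ancestor::table[tokenize(@class, '\s+') = 'contentTable']">
        <fo:table-cell xsl:use-attribute-sets="contentTable-td">
          <fo:block><xsl:apply-templates/></fo:block>
        </fo:table-cell>
      </xsl:when>
      <xsl:otherwise>
        <fo:table-cell xsl:use-attribute-sets="table-cell">
          <fo:block><xsl:apply-templates/></fo:block>
        </fo:table-cell>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Fragment links: internal if the target id exists in this document,
       otherwise a link to the corresponding page on the site. -->
  <xsl:template match="a[starts-with(@href, '#')][not(@class = 'pilcrow')]">
    <xsl:variable name="thisid" select="substring-after(@href, '#')"/>
    <fo:basic-link>
      <xsl:choose>
        <xsl:when test="exists(//*[@id = $thisid])">
          <xsl:attribute name="internal-destination" select="$thisid"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:attribute name="external-destination"
              select="concat('url(https://mapoflondon.uvic.ca/', $thisid, '.htm)')"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:apply-templates/>
    </fo:basic-link>
  </xsl:template>

</xsl:stylesheet>
```

Because use-attribute-sets is resolved when the stylesheet is compiled, each branch names its attribute set literally; a set chosen at run time would have to be applied by other means, such as a function that returns attribute nodes.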
This transformation is also responsible for removing the following components from the document: the top banner, See XML, More Info, blackletter typeface and toggle, script elements, Send Feedback, the footer menu, the info popup, document mentions, person’s contributions, person’s mentions, the citation header, facsimile figures, links to agas.css and agas_embedded.css from the header, pilcrow (¶) links, and social media logos. It also replaces lightbox.css with nav.css, rewrites some links as necessary, renames Personography into Contributors, rearranges appendix lists (historical persons and variant spellings), and sorts the personography alphabetically.
Note that there are other removals that happen through add_special_styles_to_fo_master.xsl.
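Removals of this kind are usually written as empty templates. The sketch below covers only three of the items listed above (script elements, pilcrow links, and the links to agas.css and agas_embedded.css) plus the lightbox.css replacement. It is written as a standalone identity-based pass for clarity, which is an assumption and may not be how the master stylesheet organises this work.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Copy everything that is not handled by a more specific template. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: matched nodes produce no output. -->
  <xsl:template match="script
                     | a[@class = 'pilcrow']
                     | link[ends-with(@href, 'agas.css')
                            or ends-with(@href, 'agas_embedded.css')]"/>

  <!-- Point links to lightbox.css at nav.css instead. -->
  <xsl:template match="link/@href[ends-with(., 'lightbox.css')]">
    <xsl:attribute name="href" select="replace(., 'lightbox\.css$', 'nav.css')"/>
  </xsl:template>

</xsl:stylesheet>
```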

xhtml5_to_fo_styles_module.xsl

This XSLT module contains the styling and layout data for the XHTML5 to XSL:FO transformation, which turns MoEML XHTML5 static site pages into PDFs. The pages are initially set up so that recto and verso have slightly different margins, to allow for binding along the long edge. We may decide to eliminate this distinction at some point. This module works with the special module created for the particular file being transformed. It contains all the master attribute sets needed, most of which are derived from the various MoEML site CSS files. The attribute sets in this document do not include attributes or values that are incompatible with XSL:FO or FOP. The structure is straightforward: the <xsl:stylesheet> element contains all the <xsl:attribute-set> elements, each of which must have a @name attribute. These elements in turn contain <xsl:attribute> elements, which must also have a @name attribute and a value.
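The structure described above looks like this in practice. The set names h2 and appendixH2 are taken from the template descriptions earlier in this document, but the attributes and values shown are placeholders, and having appendixH2 build on h2 through use-attribute-sets is an assumption made to show how one set can extend another.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Each attribute set has a @name; each attribute has a @name and a value. -->
  <xsl:attribute-set name="h2">
    <xsl:attribute name="font-size">16pt</xsl:attribute>
    <xsl:attribute name="font-weight">bold</xsl:attribute>
    <xsl:attribute name="space-before">12pt</xsl:attribute>
  </xsl:attribute-set>

  <!-- Inherits everything from h2 and overrides the font size. -->
  <xsl:attribute-set name="appendixH2" use-attribute-sets="h2">
    <xsl:attribute name="font-size">14pt</xsl:attribute>
  </xsl:attribute-set>

</xsl:stylesheet>
```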

Cite this page

MLA citation

El Hajj, Tracey. MoEML’s PDF Developer Documentation. The Map of Early Modern London, Edition 6.6, edited by Janelle Jenstad, U of Victoria, 30 Jun. 2021, mapoflondon.uvic.ca/edition/6.6/pdfDev_about.htm. Draft.

Chicago citation

El Hajj, Tracey. MoEML’s PDF Developer Documentation. The Map of Early Modern London, Edition 6.6. Ed. Janelle Jenstad. Victoria: University of Victoria. Accessed June 30, 2021. mapoflondon.uvic.ca/edition/6.6/pdfDev_about.htm. Draft.

APA citation

El Hajj, T. 2021. MoEML’s PDF Developer Documentation. In J. Jenstad (Ed), The Map of Early Modern London (Edition 6.6). Victoria: University of Victoria. Retrieved from https://mapoflondon.uvic.ca/editions/6.6/pdfDev_about.htm. Draft.
