Information for MoEML Programmers

This document is currently in draft. When it has been reviewed and proofed, it will be published on the site.

Please note that it is not of publishable quality yet.

Introduction

This documentation provides basic information for programmers needing to work on the MoEML infrastructure and build process. It covers programming languages used, code organization, software requirements, the static build process, and tips and tricks for working more efficiently on the code.
The MoEML project consists of source data—a large collection of TEI-encoded XML files along with other resources such as images, stylesheets and scripts—along with a substantial codebase whose job is to check, validate and diagnose problems with the source data, and eventually to build it into a complete static website for deployment. The codebase is currently organized in a rather haphazard way, largely for historical reasons, and still includes many components from previous incarnations of the project which are no longer relevant. Figuring out which bits of the repository are responsible for which types of functionality can be difficult. This document should help to clarify some of these issues, pending a proper purge and reorganization of the codebase.

Organization of the SVN Repository

As mentioned above, the organization of the software repository is somewhat confusing, for two reasons. First, the repository is large, and we don’t want to force regular encoders to download the whole thing. This means the data component of the repo needs to be self-sufficient in some ways, so it includes programming code and resources which, in a perfect world, would be better placed elsewhere. Second, the project has gone through many phases in which different components were used (Cocoon, eXist, even PHP), so there are remnants which really should be reorganized.
These are the important areas for programmers:
  • backup_schemas contains copies of schemas which the build process would normally download from the web, to ensure we have the latest versions. When the download fails, these are used in order to allow the build to proceed.
  • db is confusingly named, because it was once part of an eXist XML db folder structure. It contains the following components:
    • agas contains the image files used in the Agas Map page.
    • data contains all the TEI XML and related files (images, binary documents) which form the intellectual content of the project. Because this is where encoders work, this folder also includes the project schemas (in rng, although the schema constraints are created in ODD and Schematron, and then built into RNG). This folder also includes a utilities folder with some important build components, in particular the diagnostics code and the schema build code.
    • redirects currently contains only one file, an XML file called redirects.xml where we specify how to handle ids (which are tied to URLs) which need to be retired. We specify where each retired id should be redirected to, so that pages do not simply disappear from the site when new versions are released.
    • site contains site components and resources that are edited by designers and programmers, used in building the website.
  • ise contains some old versions of Internet Shakespeare Edition plays which were part of an experiment to link between the two projects. The long-term status of this experiment is undecided; ignore this folder for the moment.
  • jenkins contains two components related to the build process for the project which runs on our Jenkins CI server. config.xml is the configuration file for that build; this should be updated whenever a change is made to the build configuration on the server. The other file, moeml_log_parse_rules.txt, is a set of rules used by Jenkins to determine whether a build has failed or succeeded. In the course of a normal build process, words such as error or warning may appear in the output from the process; normally these would cause the build to fail, but in some cases they are accidental (for instance, a filename may contain the word error), so this ruleset is used to refine the process to make sure it only fails when something is actually wrong.
  • obsolete is what you would expect: a place where we stash data and code files which are no longer needed.
  • presentations contains the materials for presentations made by project members that relate directly to MoEML.
  • static is the folder which contains all of the code used to build the current version of the site.
    • css has all the various CSS files used in the current XHTML5 version of the site.
    • exist contains code related to the version of the site which was hosted in the eXist XML database; this was used for version 6.3, but from version 6.4 onwards we have moved to a completely static version which does not require a backend database, so this code will eventually be moved to the obsolete folder.
    • externals is a folder which is configured to bring in some XQuery code from other repositories, used in the eXist version of the site. This will eventually be removed.
    • fonts contains all the web fonts used in the current version of the site, and in the PDF versions of the Mayoral Shows.
    • fopConfig contains a configuration file for the FOP PDF processor which is used for generating the PDF versions of Mayoral Shows.
    • js contains a variety of JavaScript libraries used in the static website.
    • ssExtras contains two files which are used as part of the staticSearch component of the site build (described at length below).
    • xsl contains all the XSLT code used in our current build processes, to create the website and the Mayoral Show PDFs. This will be described in detail below.
    This folder also contains a number of Ant build files which control the various build processes. You may also notice other files and folders inside static which are not part of the svn repository; examples are site and staticSearch. These are created during the build process.
  • utilities contains a range of libraries and code modules, some of which are essential for everything (e.g. the Saxon XSLT processor) and some of which are one-off transformations used to fix problems. Many of these files are obsolete, and a cleanup of this folder is long overdue.
  • workshops contains materials used for teaching workshops for RAs on specific topics such as regular expressions and XPath.
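The filtering role described above for moeml_log_parse_rules.txt can be illustrated with a small sketch. It is not the project's actual ruleset and does not reproduce the rule syntax that Jenkins reads; the patterns and the function names classify_line and build_failed are hypothetical. It only shows the idea of ordered rules, where a harmless match (such as a filename containing the word error) is recognised before the general error pattern is tried.

```python
import re

# Hypothetical ordered rules: the first pattern that matches decides the outcome.
# The harmless special case (a filename containing "error") comes before the
# general "error" rule, so it is never reported as a failure.
RULES = [
    ("ok", re.compile(r"\b[\w.-]*error[\w.-]*\.(xml|xsl|txt)\b", re.IGNORECASE)),
    ("error", re.compile(r"\berror\b", re.IGNORECASE)),
    ("warning", re.compile(r"\bwarning\b", re.IGNORECASE)),
]


def classify_line(line):
    """Return 'ok', 'error' or 'warning' for a single line of build output."""
    for outcome, pattern in RULES:
        if pattern.search(line):
            return outcome
    return "ok"  # lines matching no rule are treated as harmless


def build_failed(log_lines):
    """A build counts as failed if any line is classified as an error."""
    return any(classify_line(line) == "error" for line in log_lines)
```

Because the first matching rule wins, the order of the rules matters: moving the general error rule to the top would make the filename case fail the build again.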

Software Requirements

This is a list of software that is required for running the various build processes. Some of it is actually stored in the repository, and some must be installed on the machine doing the build.

Software Included in the Repository

The following software is stored in the SVN repository, so does not need to be installed locally:
  • Saxon XSLT processor (saxon-he-10.jar)
  • Schematron library for Ant (ant-schematron-2010-04-14.jar)
  • The W3C HTML validator (vnu.jar)
  • The Jing RELAX NG validator (jing.jar)

Software to be Installed Locally

To run the various MoEML build processes, you will need the following software to be installed on your machine. At present most of the build processes have to be run on *NIX systems because they depend on command-line utilities. If you are forced to use Windows, you’ll probably have to install the Windows Subsystem for Linux. For running specific components of the build, you may not need all of these applications or libs.
  • Java
  • Ant
  • ant-contrib
  • linkchecker
  • jsonlint
  • xmllint
  • svn
  • git
  • zip
  • pdftoppm
  • sensible-browser
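Before running a build on a new machine, it can be useful to confirm that these tools are available. The following is a minimal sketch, not part of the MoEML codebase. It assumes that each tool is invoked under the command name shown in the list above, and the function name missing_commands is hypothetical. ant-contrib is left out because it is an Ant library rather than an executable, so it cannot be found by searching the PATH.

```python
import shutil

# Command names assumed from the list above. ant-contrib is omitted because
# it is a library loaded by Ant, not a command on the PATH.
REQUIRED_COMMANDS = [
    "java", "ant", "linkchecker", "jsonlint", "xmllint",
    "svn", "git", "zip", "pdftoppm", "sensible-browser",
]


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which):
    """Return the commands from the list that cannot be found on the PATH."""
    return [name for name in commands if which(name) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required commands were found.")
```

The lookup function is passed in as a parameter only so that the check can be exercised without depending on what is installed on a particular machine.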

MoEML’s Build Processes

The project has two distinct build processes:
  • The extended validation build (run by build.xml in the project root folder)
  • The static site build (run by build.xml in the static folder)

The Extended Validation Build

The extended validation build runs a range of extra checks before any attempt is made to build the website. It is controlled by the Ant build.xml file in the project root folder. It checks that:
  • all XML documents are valid (with Schematron and RELAX NG)
  • TEI code in all egXMLs in praxis is valid
  • all inline CSS is valid
  • all internal links point to something real
  • there are no duplicate ids
It also runs the project diagnostics, to find problems which are not build-breaking but will require attention.
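As an illustration of one of these checks, the sketch below looks for duplicate ids. It is not the project's implementation, which is driven from the Ant build file. It assumes that ids are carried in xml:id attributes and must be unique across the whole collection, and the function name find_duplicate_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_duplicate_ids(documents):
    """Map each id used more than once to the names of the files using it.

    `documents` maps a file name to the XML text of that file. Assumption:
    ids must be unique across all documents, not just within each one.
    """
    seen = defaultdict(list)
    for name, text in documents.items():
        root = ET.fromstring(text)
        for element in root.iter():  # includes the root element itself
            value = element.get(XML_ID)
            if value is not None:
                seen[value].append(name)
    return {value: names for value, names in seen.items() if len(names) > 1}
```

An empty result means no id was used twice. Any entry in the result names the files that would need attention before the site build is attempted.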

The Static Site Build

If all checks in the Extended Validation Build have completed successfully, then Jenkins will run the static site build. This is controlled by the Ant build.xml file in the static subfolder.
This is a complex process that takes a long time to complete. Programmers working on the project need to understand it well so that they can run subcomponents of the build process in order to reproduce build errors rapidly and fix them efficiently.
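The sequencing of the two builds can be sketched as follows. This is an illustration only, since on the server the ordering is handled by Jenkins. It assumes that the script is run from the project root, that ant is on the PATH, and that each build file's default target is the one wanted. The function names run_ant and run_builds are hypothetical.

```python
import subprocess


def run_ant(build_file):
    """Run the default target of the given Ant build file; return the exit code."""
    return subprocess.run(["ant", "-f", build_file]).returncode


def run_builds(runner=run_ant,
               validation_build="build.xml",
               static_build="static/build.xml"):
    """Run the validation build, then the static build only if validation passed."""
    if runner(validation_build) != 0:
        return "validation failed"  # the static build is never started
    if runner(static_build) != 0:
        return "static build failed"
    return "ok"
```

Stopping at the first failure mirrors the behaviour described above: the lengthy site build is not attempted until the validation checks have passed.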

Cite this page

MLA citation

Holmes, Martin D. Information for MoEML Programmers. The Map of Early Modern London, edited by Janelle Jenstad, U of Victoria, 26 Jun. 2020, mapoflondon.uvic.ca/for_programmers.htm.

Chicago citation

Holmes, Martin D. Information for MoEML Programmers. The Map of Early Modern London. Ed. Janelle Jenstad. Victoria: University of Victoria. Accessed June 26, 2020. https://mapoflondon.uvic.ca/for_programmers.htm.

APA citation

Holmes, M. D. (2020). Information for MoEML Programmers. In J. Jenstad (Ed.), The Map of Early Modern London. Victoria: University of Victoria. Retrieved from https://mapoflondon.uvic.ca/for_programmers.htm.
