This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Provider: University of Victoria
Database: The Map of Early Modern London
TY - ELEC
A1 - Holmes, Martin
ED - Jenstad, Janelle
T1 - Program with MoEML
T2 - The Map of Early Modern London
ET - 7.0
PY - 2022
DA - 2022/05/05
CY - Victoria
PB - University of Victoria
LA - English
UR - https://mapoflondon.uvic.ca/edition/7.0/for_programmers.htm
UR - https://mapoflondon.uvic.ca/edition/7.0/xml/standalone/for_programmers.xml
ER -
Programmer, 2018-present. Junior Programmer, 2015-2017. Research Assistant, 2014-2017. Joey Takeda was a graduate student at the University of British Columbia in the Department of English (Science and Technology research stream). He completed his BA honours in English (with a minor in Women’s Studies) at the University of Victoria in 2016. His primary research interests included diasporic and indigenous Canadian and American literature, critical theory, cultural studies, and the digital humanities.
Data Manager, 2015-2016. Research Assistant, 2013-2015. Tye completed his undergraduate honours degree in English at the University of Victoria in 2015.
Research Assistant, 2012–2013. Cameron Butt completed his undergraduate honours degree in English at the University of Victoria in 2013. He minored in French and has a keen interest in Shakespeare, film, media studies, popular culture, and the geohumanities.
Research Assistant, 2012-2014. MoEML Research Affiliate. Sarah Milligan completed her MA at the University of Victoria in 2012 on the invalid persona in Elizabeth Barrett Browning’s
Director of Pedagogy and Outreach, 2015–2020. Associate Project Director, 2015. Assistant Project Director, 2013-2014. MoEML Research Fellow, 2013. Kim McLean-Fiander comes to
Janelle Jenstad is Associate Professor of English at the University of Victoria, Director of
Programmer at the University of Victoria Humanities Computing and Media Centre (HCMC). Martin ported the MOL project from its original PHP incarnation to a pure eXist database implementation in the fall of 2011. Since then, he has been lead programmer on the project and has also been responsible for maintaining the project schemas. He was a co-applicant on MoEML’s 2012 SSHRC Insight Grant.
Our editorial and encoding practices are documented in detail in the Praxis section of our website.
This documentation provides basic information for programmers needing to work on the MoEML infrastructure and build process. It covers programming languages used, code organization, software requirements, the static build process, and tips and tricks for working more efficiently on the code.
The MoEML project consists of source data—a large collection of TEI-encoded XML files along with other resources such as images, stylesheets and scripts—along with a substantial codebase whose job is to check, validate and diagnose problems with the source data, and eventually to build it into a complete static website for deployment. The codebase is currently organized in a rather haphazard way, largely for historical reasons, and still includes many components from previous incarnations of the project which are no longer relevant. Figuring out which bits of the repository are responsible for which types of functionality can be difficult. This document should help to clarify some of these issues, pending a proper purge and reorganization of the codebase.
As mentioned above, the organization of the software repository is somewhat confusing for two reasons: first, the repo size is large, and we don’t want to force regular encoders to download the whole thing. This means the data component of the repo needs to be self-sufficient in some ways, so programming code and resources are included there which would be better placed elsewhere in a perfect world. Secondly, the project has gone through many phases in which different components were used (Cocoon, eXist, even PHP) so there are remnants of stuff which really should be reorganized.
These are the important areas for programmers:
backup_schemas contains copies of schemas which the build process would normally download from the web, to ensure we have the latest versions. When the download fails, these are used in order to allow the build to proceed.
db is confusingly named, because it was once part of an eXist XML db folder structure. It contains the following components:
    agas contains the image files used in the Agas Map page.
    data contains all the TEI XML and related files (images, binary documents) which form the intellectual content of the project. Because this is where encoders work, this folder also includes the project schemas (in rng, although the schema constraints are created in ODD and Schematron, and then built into RNG). This folder also includes a utilities folder with some important build components, in particular the diagnostics code and the schema build code.
    redirects currently contains only one file, an XML file called redirects.xml, where we specify how to handle ids (which are tied to URLs) which need to be retired. We specify where each retired id should be redirected to, so that pages do not simply disappear from the site when new versions are released.
    site contains site components and resources that are edited by designers and programmers, used in building the website.
ise contains some old versions of Internet Shakespeare Edition plays which were part of an experiment to link between the two projects. The long-term status of this experiment is undecided; ignore this folder for the moment.
jenkins contains two components related to the build process for the project which runs on our Jenkins CI server. config.xml is the configuration file for that build; this should be updated whenever a change is made to the build configuration on the server. The other file, moeml_log_parse_rules.txt, is a set of rules which is used by Jenkins to determine whether a build has failed or succeeded. In the course of a normal build process, words such as […]
obsolete is what you would expect: a place where we stash data and code files which are no longer needed.
presentations contains the materials for presentations made by project members that relate directly to MoEML.
static is the folder which contains all of the code used to build the current version of the site.
    css has all the various CSS files used in the current XHTML5 version of the site.
    exist contains code related to the version of the site which was hosted in the eXist XML database; this was used for version 6.3, but from version 6.4 onwards we have moved to a completely static version which does not require a backend database, so this code will eventually be moved to the obsolete folder.
    externals is a folder which is configured to bring in some XQuery code from other repositories, used in the eXist version of the site. This will eventually be removed.
    fonts contains all the web fonts used in the current version of the site, and in the PDF versions of the Mayoral Shows.
    fopConfig contains a configuration file for the FOP PDF processor which is used for generating the PDF versions of Mayoral Shows.
    js contains a variety of JavaScript libraries used in the static website.
    ssExtras contains two files which are used as part of the staticSearch component of the site build (described at length below).
    xsl contains all the XSLT code used in our current build processes, to create the website and the Mayoral Show PDFs. This will be described in detail below.
    Other folders may appear in static which are not part of the svn repository; examples are site and staticSearch. These are created during the build process.
utilities contains a range of libraries and code modules, some of which are essential for everything (e.g. the Saxon XSLT processor) and some of which are one-off transformations used to fix problems. Many of these files are obsolete and a cleanup of this folder is long overdue.
workshops contains materials used for teaching workshops for RAs on specific topics such as regular expressions and XPath.

This is a list of software that is required for running the various build processes. Some of it is actually stored in the repository, and some must be installed on the machine doing the build.
The following software is stored in the SVN repository, so does not need to be installed locally:
To run the various MoEML build processes, you will need the following software to be installed on your machine. At present most of the build processes have to be run on *NIX systems because they depend on command-line utilities. If you are forced to use Windows, you’ll probably have to install the Windows Subsystem for Linux. For running specific components of the build, you may not need all of these applications or libs.
The project has two distinct build processes:
The extended validation build (controlled by build.xml in the project root folder)
The static site build (controlled by build.xml in the static folder)
The extended validation build is designed to provide a range of extra checks to be carried out before bothering to build the website. It is controlled by the Ant build.xml file in the project root folder. It checks that:
It also runs the project diagnostics, to find problems which are not build-breaking but will require attention.
RELAXNG and Schematron validation are vital components of MoEML’s quality control process, but they aren’t sufficient to find all of the issues we need to avoid. The project diagnostics provide a second level of checking and testing. See Holmes and Takeda’s 2019 article
Running the diagnostics is simple. In the root directory of the MoEML project, type:
[…] products/diagnostics, which you can open in your browser. This contains the results of all the tests and checks performed. Our Jenkins CI server runs the diagnostics as part of every build, and serves the results for everyone to use. If you are working through problems raised in the diagnostics, and you want to check whether your fixes have been successful, you can run a local build of the diagnostics as specified above to get quicker feedback than waiting for the whole build process to complete on Jenkins.
If all checks in the Extended Validation Build have completed successfully, then Jenkins will run the static site build. This is controlled by the Ant build.xml file in the static subfolder.
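The gating between the two builds can be pictured with a minimal Python sketch. This is an illustration only, not the Jenkins configuration itself: it assumes that running a bare ant command in each folder invokes the build you want (the default targets of the two build.xml files are not documented here), and the function names run_command and run_gated_builds are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes a command and a working directory and returns an exit code.
Runner = Callable[[Sequence[str], Path], int]


def run_command(cmd: Sequence[str], cwd: Path) -> int:
    """Run a command in the given directory and return its exit code."""
    return subprocess.run(list(cmd), cwd=cwd).returncode


def run_gated_builds(project_root: Path, runner: Runner = run_command) -> str:
    """Run the validation build, then the static build only if it passed."""
    # Stage 1: build.xml in the project root (extended validation build).
    # Assumption: a bare "ant" runs the appropriate default target.
    if runner(["ant"], project_root) != 0:
        return "validation failed"
    # Stage 2: build.xml in the static subfolder (static site build).
    if runner(["ant"], project_root / "static") != 0:
        return "static build failed"
    return "ok"
```

Passing the runner in as a parameter makes it possible to try the logic without Ant installed, by substituting a function that simply returns an exit code.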
This is a long and complex process which takes hours to complete. Programmers working on the project need to understand it well so that they can run subcomponents of the build process in order to reproduce build errors rapidly and fix them efficiently.
This is the list of tasks that run, in sequence, as part of the static build (warning: may change; check build.xml to get the precise details).
clean: Delete products and by-products of previous builds.
getSvnInfo: Get the latest svn version to use in footers etc.
getStaticSearchCode: Download the latest version of the staticSearch codebase from its GitHub repository.
createXslCaptions: Process the boilerplate.xml file to create an XSLT resource containing the captions, to be used when building the site.
createBinaryDocList: Create a text file listing all binary documents (PDFs etc.) from the repository which are actually linked on the site, so that we copy only those documents to the output.
createImageLists: Create a text file listing all images from the repository which are actually used on the site.
copySiteAncillaryFiles: Copy CSS, JavaScript and other static files from the static/ folder to the output site/ folder.
extractSchematron: Extract the Schematron ruleset from the tei_all RelaxNG schema, so that it can be used for validation.
copyBinaryDocs: Copy required binary documents to the output folder.
copyImages: Copy required images to the output folder.
createImportXsl: Do some preprocessing to handle cases where MoEML uses its own custom mol-import processing instruction to create composite documents.
applyImportXsl: Finish processing the mol-import cases started in the preceding step.
createOriginalXml: Transform the source XML in db/data to create the more normalized and standardized version we publish as […]
createGeneratedContent: Create a set of additional TEI XML files mechanically constructed from existing data (document category lists, etc.), and some JSON.
validateOriginalXml: Validate the […]
createStandaloneXml: Transform the […]
resolveStyleSelectors: Process any […]
rationalizeStyleAttributes: Process all inline style attributes to make them into […]
validateStandaloneXml: Validate the […]
createAjaxFragments: Create versions of core entities (people, places, etc.) in the form of XHTML5 div elements which can be retrieved by AJAX when clicking on a site link. These fragments are also used later in the build process to generate the XHTML pages for entities from BIBL1, PERS1, and ORGS1.
createStandardXml: Process the […]
createSimpleXml: Process the […]
createLiteXml: Process the […]
createXhtmlDocs: Process all the […]
copyAgasMapTiles: Copy the collection of […]
createAgasMapXhtml: Create the actual page for the Agas Map in the site folder.
validateXhtmlDocs: Validate the entire constructed site using the W3C VNU validator (which also checks CSS).
buildStaticSearch: Run the staticSearch build/indexing process to create the search page and the JSON and other resource files which support it.
createTxtList: Create a list of primary source published documents that will be converted into text files to enable users to do text-analysis.
createTxtFiles: Process the list of files from the previous step to create plain text versions.
You can see the full set of tasks and subtasks that are available by typing ant and pressing the tab key twice in the static folder.
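To make the idea behind createImageLists and copyImages more concrete (copy only what the site actually uses), here is a minimal Python sketch. It is not the project's implementation, which lives in the Ant and XSLT code. The regular expression, the set of file extensions, and the function names referenced_images and write_image_list are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List, Set

# Assumption: images are referenced as quoted attribute values ending in a
# common image extension, e.g. url="images/foo.png".
IMAGE_REF = re.compile(r'["\']([^"\']+\.(?:png|jpe?g|gif|svg))["\']', re.IGNORECASE)


def referenced_images(xml_dir: Path) -> Set[str]:
    """Collect the file names of all images referenced in the XML files."""
    names: Set[str] = set()
    for xml_file in xml_dir.rglob("*.xml"):
        text = xml_file.read_text(encoding="utf-8")
        for ref in IMAGE_REF.findall(text):
            # Keep only the file name, dropping any leading path.
            names.add(ref.rsplit("/", 1)[-1])
    return names


def write_image_list(xml_dir: Path, image_dir: Path, list_file: Path) -> List[str]:
    """Write a text file naming the images that exist and are referenced."""
    wanted = referenced_images(xml_dir)
    used = sorted(
        p.name for p in image_dir.rglob("*") if p.is_file() and p.name in wanted
    )
    list_file.write_text("".join(name + "\n" for name in used), encoding="utf-8")
    return used
```

A later copy step can then read the list file and copy only those images, which keeps unused repository images out of the built site.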
The complete static build process takes hours. If you’re working on fixing a build problem and you need to test your changes, it is obviously not practical to run the entire build process and wait to see the results. However, in most cases, you don’t need to. Here are a number of examples of how you can run only a small component of the build process to test specific changes.
IMPORTANT NOTE: In most cases, you must have an existing completed build in
place before you can successfully run partial builds. That means that once in
a while, you will need to run a complete local build for yourself. You can of
course do that over lunch or overnight. Another alternative is to run this:
ant -f getBuiltSiteFromJenkins.xml
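Before attempting a partial build, it can help to confirm that built output is actually present. The following Python sketch shows one way to do that. It is a heuristic under stated assumptions: it treats a non-empty static/site folder as a sign that a build exists, which suggests but does not prove that the build completed. The function name partial_build_advice is hypothetical.

```python
from pathlib import Path


def partial_build_advice(static_dir: Path) -> str:
    """Suggest a next step depending on whether built output seems to exist."""
    site_dir = static_dir / "site"
    # Heuristic only: a non-empty site folder suggests, but does not prove,
    # that a complete build has been run locally or fetched from Jenkins.
    if site_dir.is_dir() and any(site_dir.iterdir()):
        return "Built output found; partial builds can probably be run."
    return (
        "No built output found; run a complete build first, "
        "or try: ant -f getBuiltSiteFromJenkins.xml"
    )
```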
Once you have a full completed build available locally, you can start running only the part of the build that you are interested in. For example, if you are trying to work on a problem that relates to the generation of the […] site/xml/original folder.
If you’re working on something more substantial that requires several steps, you can just chain them together as appropriate. Make sure you run them in the order they’re shown in the long list above, because each process may depend on the output from a preceding process.
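As a sketch of how chaining in the right order might be automated, the Python helper below sorts any requested targets into the sequence given in the task list and produces an ant command line. The target names are copied from that list, which may change, so build.xml remains the authority. The helper name ant_command is hypothetical.

```python
from typing import Iterable, List

# Target order as given in the task list in this document (may change;
# check build.xml in the static folder for the precise details).
BUILD_ORDER = [
    "clean", "getSvnInfo", "getStaticSearchCode", "createXslCaptions",
    "createBinaryDocList", "createImageLists", "copySiteAncillaryFiles",
    "extractSchematron", "copyBinaryDocs", "copyImages", "createImportXsl",
    "applyImportXsl", "createOriginalXml", "createGeneratedContent",
    "validateOriginalXml", "createStandaloneXml", "resolveStyleSelectors",
    "rationalizeStyleAttributes", "validateStandaloneXml",
    "createAjaxFragments", "createStandardXml", "createSimpleXml",
    "createLiteXml", "createXhtmlDocs", "copyAgasMapTiles",
    "createAgasMapXhtml", "validateXhtmlDocs", "buildStaticSearch",
    "createTxtList", "createTxtFiles",
]


def ant_command(targets: Iterable[str]) -> List[str]:
    """Return an ant command with the targets sorted into build order."""
    wanted = set(targets)
    unknown = wanted - set(BUILD_ORDER)
    if unknown:
        raise ValueError("Unknown target(s): " + ", ".join(sorted(unknown)))
    return ["ant"] + [name for name in BUILD_ORDER if name in wanted]
```

For example, ant_command(["validateOriginalXml", "createOriginalXml"]) yields a command that runs createOriginalXml before validateOriginalXml. Ant executes targets named on the command line in the order given, so the resulting list can be run from the static folder.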
Another useful approach to rapid building is to process only a
specific subset of documents. For example, imagine that you are
dealing with an HTML problem that affects lots of documents, but you know
that one particular document (ABCH1) exemplifies the issue, and can
be used as a test. You can run this:
Finally, there is a specific target named […] static/site folder before this will work properly.
You can even do the same with entities such as people, places and
bibliography items, but you will need to build both the item itself
and the containing XML file. So to rebuild the Clothworkers’ Company
(CLOT2) page, you could run:
The strategies described above let a programmer solve a specific problem or add a specific feature without having to wait for long periods to see the results of changes. If you triage the issue you’re working on carefully, you’ll be able to break it down into small steps and identify a specific subset of documents which can be used for testing. You can then develop and test your changes locally, so that when you do commit changes to the repository, it’s much less likely that the full build will fail because of something you did.