Copyright held by
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Further details of licences are available from our Licences page. For more information, contact the project director,
Born digital.
Most MoEML documents, or significant fragments with mol: prefix and accessed through the web application with their id + .xml.
The molagas prefix points to the shape representation of a location on MoEML’s OpenLayers3-based rendering of the Agas Map.
Links to page-images in the Chadwyck-Healey
Links to page-images in the
The mdt (MoEML Document Type) prefix used on
The mdtlist (MoEML Document Type listing) prefix used in linking attributes points to a listings page constructed from a category in the central MDT taxonomy in the includes file. There are two variants, one with the plain _subcategories, meaning all subcategories of the category.
The molgls (MoEML gloss) prefix used on
This molvariant prefix is used on
This molajax prefix is used on
The molstow prefix is used on
The molshows prefix is used on
The sb prefix is used on
Our editorial and encoding practices are documented in detail in the Praxis section of our website.
Martin Holmes, Lead Programmer on MoEML, is deeply committed to open-access projects and open documentation of those projects. He has led the way in making MoEML’s documentation and tagging freely available in a variety of XML forms (including TEI Lite XML). His CodeSharing Service takes open documentation to a new level. Now, MoEML users can search our complete project and see every instance of every TEI element, attribute, and value that we have added to MoEML texts. He presented a formal paper at the TEI Conference at Northwestern University in October 2014. With his permission, we share the complete abstract of this paper here (republished from the TEI 2014 site and lightly edited). His paper concludes with an invitation to comment on the tool. We hope that many other projects will adopt this tool, thus making visible the usually invisible labour and critical decisions entailed in tagging.
Although the TEI Guidelines are full of helpful examples, and other initiatives such as self-taught or learned by doing, and Dee (2014) reports that users need a source for a compendium of examples suitable for inductive learning. Many projects now share their XML code, but that in itself is only marginally helpful. It can take substantial time to sift through the XML code in a large project to find what you’re looking for.
This talk presents a simple specification for an Application Programming Interface, along with a sample implementation written in XQuery and designed for the eXist XML database, providing straightforward access both for applications and end-users to sample code from any TEI project. The API is modelled on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a mechanism designed to allow archival search tools to ingest metadata from repositories. The CodeSharing protocol and the sample implementation were first presented at the Digital.Humanities@Oxford Summer School in July 2013, and a number of improvements have been made to the code and specification based on feedback from that presentation.
The CodeSharing proposal arises out of two separate but intersecting needs: those of novice encoders and project managers who are not really TEI experts, and those of people doing research into encoding practices on a large scale across multiple projects.
At the time of writing,
The MoEML encoders do have access to a lot of the existing codebase, but doing text searches of this is often ineffective. They are not normally familiar with XPath, XQuery, or regular expressions, and most will never learn them, so in searching for e.g. a <birth notBefore-custom
, and therefore miss all the
To serve these needs, we began to think about writing a simple search interface which would form part of our MoEML web application, and which would provide access to lots of examples of individual tags and attributes. This straightforward form-based interface enables our encoders to retrieve examples of encoding quickly and easily from across our text collection.
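As a rough illustration of why plain-text searching misses so much, here is a minimal Python sketch (standard library only) that compares a substring search with an XML-aware search for elements carrying a given attribute. The sample markup and the helper names substring_hits and structural_hits are hypothetical; this is not the MoEML implementation, which is written in XQuery.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: the same attribute, written with different ordering and line breaks.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p><birth notBefore-custom="1550">born after 1550</birth></p>
    <p><birth
          when-custom="1560" notBefore-custom="1555">born c. 1560</birth></p>
  </body></text>
</TEI>"""

def substring_hits(xml_text: str, needle: str) -> int:
    """Count naive text matches, as a plain editor search would."""
    return xml_text.count(needle)

def structural_hits(xml_text: str, element: str, attribute: str) -> list:
    """Return every element with the given name that carries the attribute,
    regardless of attribute order or line breaks inside the start tag."""
    root = ET.fromstring(xml_text)
    return [el for el in root.iter(f"{{{TEI_NS}}}{element}") if attribute in el.attrib]

print(substring_hits(SAMPLE, "<birth notBefore-custom"))          # 1
print(len(structural_hits(SAMPLE, "birth", "notBefore-custom")))  # 2
```

A structural search is indifferent to attribute order and whitespace inside a tag, which is exactly the variation an encoder typing a search string cannot anticipate.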
As I worked on the interface above, I also began to think about broader possibilities. In our work on the TEI Council, we frequently find ourselves asking:
We typically resort to posting questions to the TEI-L mailing list to ask for examples from the community, but this is a rather slow, hit-and-miss method of gathering data. If we could retrieve large numbers of sample usages of particular elements or attributes and analyse them, this would be very helpful in the development of the
To answer these needs, I designed a web service that could be provided by large- and medium-scale encoding projects, enabling anyone to gather examples of their encoding practice directly from their data. I modelled my protocol on an existing, well-tested system: the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH, which I had previously implemented for another project.
OAI-PMH is commendably simple and well designed. A participating repository may be a data provider, which exposes structured metadata through a web service implementing the OAI-PMH API; or a service provider, which gathers that metadata through requests to the data providers. The service providers can then act as meta-repositories, or federated archives, providing search functionality that encompasses the collections of all the data providers who have exposed their metadata for harvesting. An example is OCLC’s WorldCat, which aggregates data from a large number of repositories and makes them searchable from a single interface. The OAI-PMH API is based on HTTP requests using GET or POST. It is designed to allow a harvester to find out what kinds of resources a repository has, and to gather full metadata records. All responses are in XML, and conform to a standard schema.
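Because OAI-PMH requests are ordinary HTTP GET requests, building one is just a matter of adding a verb and its arguments to a repository’s base URL. The Python sketch below uses only the standard library; the helper name oai_request_url is hypothetical, while the six verb names and the metadataPrefix argument come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH 2.0 specification.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListIdentifiers",
             "ListSets", "ListRecords", "GetRecord"}

def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL from a base URL, a verb, and its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb goes first; further arguments follow in the order given.
    return f"{base_url}?{urlencode({'verb': verb, **args})}"

print(oai_request_url("http://bcgenesis.uvic.ca/oai.xq", "Identify"))
# http://bcgenesis.uvic.ca/oai.xq?verb=Identify
print(oai_request_url("http://bcgenesis.uvic.ca/oai.xq", "ListRecords", metadataPrefix="oai_dc"))
# http://bcgenesis.uvic.ca/oai.xq?verb=ListRecords&metadataPrefix=oai_dc
```

Verb names are case-sensitive in OAI-PMH, which is why the check is an exact set lookup.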
OAI-PMH is based around six core verbs:
Identify
ListMetadataFormats
ListIdentifiers
ListSets
ListRecords
GetRecord
A request like this one (made to the OAI-PMH interface provided by the
http://bcgenesis.uvic.ca/oai.xq?verb=Identify
The ListRecords verb is used to retrieve individual records, which are typically in the form of Dublin Core elements embedded in a larger structure in the OAI namespace.
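A harvester reading a ListRecords response has to deal with three namespaces: the OAI-PMH envelope, the oai_dc container, and the Dublin Core elements themselves. This Python sketch extracts each record’s identifier and titles. The response is a hypothetical, heavily trimmed example (real responses also carry responseDate and request elements), and dc_titles is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed ListRecords response containing a single record.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:doc1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample despatch</dc:title>
          <dc:date>1858</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def dc_titles(xml_text: str) -> dict:
    """Map each record identifier to the list of its Dublin Core titles."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[ident] = [t.text for t in record.iterfind(".//dc:title", NS)]
    return result

print(dc_titles(RESPONSE))  # {'oai:example.org:doc1': ['A sample despatch']}
```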
Most repositories will have thousands of records, and retrieving them all at once would place an unacceptable burden on the server and network infrastructure, so OAI-PMH has a built-in staging system. Records are supplied in batches, and each batch ends with a
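In OAI-PMH 2.0 the end-of-batch marker is a resumptionToken element: the harvester sends it back with the same verb to ask for the next batch, and an empty or missing token signals the last one. A minimal Python sketch of that loop follows. The network call is replaced by an injected fetch function so the loop can run offline; harvest and fake_fetch are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(fetch: Callable[[dict], str], metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> across a ListRecords sequence, following resumptionTokens.

    `fetch` takes the request parameters and returns the response XML text."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # An absent or empty token marks the final batch.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

def fake_fetch(params: dict) -> str:
    """Hypothetical two-batch repository used only for illustration."""
    ns = 'xmlns="http://www.openarchives.org/OAI/2.0/"'
    if "resumptionToken" not in params:
        return (f"<OAI-PMH {ns}><ListRecords><record/><record/>"
                "<resumptionToken>batch2</resumptionToken></ListRecords></OAI-PMH>")
    return f"<OAI-PMH {ns}><ListRecords><record/><resumptionToken/></ListRecords></OAI-PMH>"

print(sum(1 for _ in harvest(fake_fetch)))  # 3
```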
The complete specification for the CodeSharing API is available at http://mapoflondon.uvic.ca/codesharing_protocol.xhtml, as part of the sample implementation on the MoEML site; in what follows I cover only some key aspects of it.
CodeSharing is an XML-based API provided over HTTP, just like OAI-PMH. On the model of OAI-PMH, it’s also based on a verb parameter, which can take one of five values:
identify
listElements
listAttributes
listNamespaces
getExamples
The first four are designed to discover the nature of the repository, and what elements, attributes, and namespaces occur in its document collections. The final value is a request for actual example encodings; when that value is used, other key-value pairs provide details about what has been requested:
elementName
attributeName
attributeValue
Each parameter is used to specify what examples are being requested. These can obviously be combined, so a request for:
elementName=hi&attributeName=rend
would return hi elements which have a rend attribute, and a request for attributeName=rend&attributeValue=italic would return elements whose rend attribute has the value italic.
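The same key-value pairs can be assembled into request URLs programmatically. The Python sketch below checks the verb and parameter names against those listed in this abstract (the full protocol document may define more) and uses a hypothetical endpoint URL; codesharing_url is likewise an illustrative name, not part of the published implementation.

```python
from urllib.parse import urlencode

# Verb values and parameter names as listed in this abstract.
VERBS = {"identify", "listElements", "listAttributes", "listNamespaces", "getExamples"}
PARAMS = {"elementName", "attributeName", "attributeValue",
          "namespace", "wrapped", "maxItemsPerPage"}

def codesharing_url(base_url: str, verb: str, **params: str) -> str:
    """Build a CodeSharing GET request URL, rejecting unknown verbs and parameters."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    unknown = set(params) - PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return f"{base_url}?{urlencode({'verb': verb, **params})}"

BASE = "https://example.org/codesharing.xql"  # hypothetical endpoint
print(codesharing_url(BASE, "getExamples", elementName="hi", attributeName="rend"))
# https://example.org/codesharing.xql?verb=getExamples&elementName=hi&attributeName=rend
print(codesharing_url(BASE, "getExamples", attributeName="rend", attributeValue="italic"))
# https://example.org/codesharing.xql?verb=getExamples&attributeName=rend&attributeValue=italic
```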
Two further parameters are available:
namespace (the namespace for requested elements, defaulting to the TEI namespace)
wrapped (whether or not to return the parent containing the target element)
So a request for:
elementName=hi&wrapped=true
would return hi elements together with the parent element containing each one.
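On the provider side, these parameters amount to a filter over each document plus an optional step up to the parent element. The following Python sketch shows one way a provider might apply them to a single parsed TEI document; it is an illustration under assumptions, not a port of the XQuery implementation. In particular, treating attributeValue on its own as "any attribute with this value", matching values exactly rather than token by token, and collapsing duplicate parents are choices made only for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

TEI = "http://www.tei-c.org/ns/1.0"

def select_examples(root: ET.Element,
                    element_name: Optional[str] = None,
                    attribute_name: Optional[str] = None,
                    attribute_value: Optional[str] = None,
                    namespace: str = TEI,
                    wrapped: bool = False) -> List[ET.Element]:
    """Return elements matching the filters, or their parents when wrapped is True."""
    # ElementTree keeps no parent pointers, so map each child to its parent first.
    parents = {child: parent for parent in root.iter() for child in parent}
    # With no element name, every element is a candidate (namespace is then ignored).
    tag = f"{{{namespace}}}{element_name}" if element_name else None
    hits: List[ET.Element] = []
    for el in root.iter(tag):
        if attribute_name is not None:
            if attribute_name not in el.attrib:
                continue
            # Exact match on the whole value; real TEI values may hold several tokens.
            if attribute_value is not None and el.get(attribute_name) != attribute_value:
                continue
        elif attribute_value is not None and attribute_value not in el.attrib.values():
            continue
        target = parents.get(el, el) if wrapped else el
        if target not in hits:  # elements compare by identity, so repeated parents are dropped
            hits.append(target)
    return hits

DOC = ET.fromstring(f"""<TEI xmlns="{TEI}"><text><body>
  <p>An <hi rend="italic">italic</hi> and a <hi rend="bold">bold</hi> word.</p>
  <p>A plain <hi>highlight</hi>.</p>
</body></text></TEI>""")

print(len(select_examples(DOC, "hi")))                                             # 3
print(len(select_examples(DOC, "hi", attribute_name="rend")))                      # 2
print(len(select_examples(DOC, attribute_name="rend", attribute_value="italic")))  # 1
print([p.tag for p in select_examples(DOC, "hi", "rend", wrapped=True)])
# ['{http://www.tei-c.org/ns/1.0}p']
```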
Finally, we have to consider flow control, as in the case of OAI-PMH. It would be disastrous to attempt to honour a request for all of the
maxItemsPerPage (a positive integer)
The provider service is not required to honour this request; it may decide to send fewer items. But it may be sensitive enough to know, for instance, that when the request is for a relatively small element such as
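A provider’s handling of maxItemsPerPage can be sketched as a simple clamp: honour the request up to some server-side cap, and report where the next page would start. In this Python sketch the default and cap values and the start offset are hypothetical; the abstract does not say how a client asks for subsequent pages. The cap argument is where a provider could impose a smaller limit for elements whose examples tend to be large.

```python
from typing import List, Optional, Tuple

DEFAULT_PAGE = 10  # hypothetical page size when maxItemsPerPage is absent
HARD_CAP = 50      # hypothetical ceiling the provider will never exceed

def page_size(requested: Optional[int], cap: int = HARD_CAP) -> int:
    """Honour maxItemsPerPage up to the provider's cap, or fall back to a default."""
    if requested is None:
        return min(DEFAULT_PAGE, cap)
    if requested < 1:
        raise ValueError("maxItemsPerPage must be a positive integer")
    return min(requested, cap)

def paginate(items: List, requested: Optional[int], start: int = 0,
             cap: int = HARD_CAP) -> Tuple[List, Optional[int]]:
    """Return one page of results and the offset of the next page, or None at the end."""
    size = page_size(requested, cap)
    page = items[start:start + size]
    next_start = start + size if start + size < len(items) else None
    return page, next_start

results = list(range(120))          # stand-in for 120 matching examples
page, nxt = paginate(results, 500)  # the client asks for far more than the cap
print(len(page), nxt)               # 50 50
page, nxt = paginate(results, 500, start=100)
print(len(page), nxt)               # 20 None
```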
What form should the server’s response take? The obvious answer is that it should be XML, and in fact that it should be TEI P5 XML. The exact format of the response document is only loosely specified, although some parts of it must follow certain rules. If the value of the verb parameter is listElements, for instance, then the body of the document must contain the list of all elements appearing in the collection as a list.
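A client consuming a listElements response only needs to locate that list in the body. Because the surrounding format is loosely specified, the Python sketch below assumes a hypothetical response in which each item holds a TEI gi element; element_names is an illustrative helper, not part of the protocol.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical listElements response; the markup inside each item is an assumption.
RESPONSE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>listElements</title></titleStmt>
    <publicationStmt><p/></publicationStmt><sourceDesc><p/></sourceDesc></fileDesc></teiHeader>
  <text><body>
    <list>
      <item><gi>birth</gi></item>
      <item><gi>hi</gi></item>
      <item><gi>placeName</gi></item>
    </list>
  </body></text>
</TEI>"""

def element_names(xml_text: str) -> list:
    """Collect the text of each item in the list in the response body."""
    root = ET.fromstring(xml_text)
    lst = root.find("tei:text/tei:body/tei:list", TEI)
    if lst is None:
        raise ValueError("response body contains no list")
    return ["".join(item.itertext()).strip() for item in lst.iterfind("tei:item", TEI)]

print(element_names(RESPONSE))  # ['birth', 'hi', 'placeName']
```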
For returning actual examples, CodeSharing makes use of the
http://www.tei-c.org/ns/Examples
, and all the elements that are children of it, in the example code, are also by default in that namespace. This is useful, because it means that we can easily distinguish example code from other parts of the TEI file. (It also means we can use the API to retrieve examples of code
In addition to the results of the query, the protocol specification also requires that the parameters of the original request be returned to the requestor; this means that the result document is a complete and self-contained record of the query and results. Full details are available in the protocol documentation.
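Because example markup lives in the TEI Examples namespace, a client can separate it from the rest of a response purely by namespace. In the Python sketch below the response is hypothetical, egXML (the TEI element defined in that namespace) is assumed to be the wrapper, and example_elements is an illustrative name.

```python
import xml.etree.ElementTree as ET

EG_NS = "http://www.tei-c.org/ns/Examples"

# Hypothetical getExamples response: each example sits inside an egXML wrapper
# whose default namespace is inherited by the example markup.
RESPONSE = f"""<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <p>Request: elementName=hi</p>
  <egXML xmlns="{EG_NS}"><hi rend="italic">Londinium</hi></egXML>
  <egXML xmlns="{EG_NS}"><hi rend="bold">Cheapside</hi></egXML>
</body></text></TEI>"""

def example_elements(xml_text: str) -> list:
    """Return (local name, attributes, text) for every element inside an egXML wrapper."""
    root = ET.fromstring(xml_text)
    found = []
    for eg in root.iter(f"{{{EG_NS}}}egXML"):
        for el in eg.iter():
            if el is eg:
                continue  # skip the wrapper itself
            local = el.tag.split("}", 1)[1]
            found.append((local, dict(el.attrib), el.text))
    return found

print(example_elements(RESPONSE))
# [('hi', {'rend': 'italic'}, 'Londinium'), ('hi', {'rend': 'bold'}, 'Cheapside')]
# A TEI-namespace search of the same response does not see the example markup:
print(len(ET.fromstring(RESPONSE).findall(".//{http://www.tei-c.org/ns/1.0}hi")))  # 0
```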
A sample implementation of the CodeSharing protocol, including an HTML front-end, as shown in Figure 1, is available at http://mapoflondon.uvic.ca/codesharing.htm. It is written in XQuery 3.0 and runs in the eXist XML database which hosts the MoEML web application. The open-source code is available on SourceForge at https://sourceforge.net/projects/codesharing/, and includes these files:
codesharing.xql (the XQuery implementation providing responses to queries in XML)
codesharing_config.xql (a simple settings file that tailors the service to your own project)
codesharing.xsl (a transformation which produces the HTML search page you see on the MoEML site)
codesharing_protocol.xhtml (a semi-formal description of the API)
codesharing.odd (an ODD file from which a schema can be generated to validate CodeSharing API responses)
I would welcome any contributions in the form of improvements, suggestions, and implementations in other languages.