General Encoding Practices
Author: Tye Landels
Author: Martin Holmes
Author: Cameron Butt
Copy Editor: Cameron Butt
Encoder: Cameron Butt
Encoder: Tye Landels
Data Manager: Tye Landels
Junior Programmer: Joey Takeda
Programmer: Martin Holmes
Associate Project Director: Kim McLean-Fiander
Project Director: Janelle Jenstad
The Map of Early Modern London
http://mapoflondon.uvic.ca/includes.xml
Victoria, BC, Canada
Department of English, P.O.Box 3070 STNC CSC, University of Victoria, Victoria, BC, Canada, V8W 3W1
2016
University of Victoria
978-1-55058-519-3
Janelle Jenstad
london@uvic.ca
Copyright held by
The Map of Early Modern London on behalf of the contributors.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Further details of licences are available from our
Licences page. For specific information on the availability and
licensing of content found in files on this site, contact the
project director, Janelle Jenstad.
Born digital.
Most MoEML documents, or significant fragments with xml:id attributes, can
be addressed using the mol: prefix and accessed through the web application
with their id + .xml.
The molagas prefix points to the shape representation of a location on
MoEML’s OpenLayers3-based
rendering of the Agas Map.
Links to page-images in the Chadwyck-Healey
Early English Books Online (EEBO)
repository. Note that this is a subscription service, and may not be accessible to those
accessing it from locations outside member institutions.
Links to page-images in the
English Broadside Ballad Archive (EBBA).
The mdt (MoEML Document Type) prefix used on catRef/target points
to a central taxonomy in the includes file.
The mdtlist (MoEML Document Type listing) prefix used in linking attributes points to a listings page constructed from a category in the central MDT taxonomy in the includes file. There are two variants, one with the plain xml:id of the category, meaning all documents in the specified category, and one with the suffix _subcategories, meaning all subcategories of the category.
The molgls (MoEML gloss) prefix used on term/corresp points
to a glossary entry in the GLOSS1.xml file.
This molvariant prefix is used on ref/target attributes during automated
generation of gazetteer index files. It points to an element in the generated variant spellings
listing file which lists all documents which contain a particular spelling variant for a
location.
This molajax prefix is used on ref/target attributes during the static build
process, to specify links which point to MoEML resources which should not be loaded into the source
page during standalone processing; instead, these should be turned into links to the XML source
documents, and at HTML page load time, these should be turned into AJAX calls. This is to handle
the scenario in which a page such as an A-Z index of the whole site would end up containing
virtually the whole site inside itself.
The molstow prefix is used on facs attributes to link to the HCMC version of the Stow facsimiles.
Usually the first group is the year (1633) and the last is the image number (0001).
The molshows prefix is used on facs attributes to link to the copies of page-images
from mayoral shows stored in the london account on the HCMC server.
The first group is the year (1633), the second is the source repository, and the last is the image
file name.
The sb prefix is used on ref/target attributes to link to
Stow’s Books URLs at UToronto.
Our editorial and encoding practices are documented in detail in the Praxis section of our website.
An abbreviated form of “the”. This character takes the form of a small latin letter y with a reversed hook above. The closest Unicode character we have to represent this is a small latin letter y with a combining left half ring above. This character appears only twice in the text, which is in black letter gothic.
y͑, ye, þe, the
Removed info/links for CodeSharing, not supported from version 6.4 onwards.
Changed s/he and his/her to they and their respectively.
Added sourceDesc information for born-digital documents.
Added section on indexing praxis documents.
Standardized respStmts for JENS1, MCFI1, and HOLM3 and added TAKE1 as Junior Programmer.
Added note about putting ref tags outside of descriptive tags.
Added section on encoding special characters.
Added clarification regarding line-breaks and split-tags.
Added XInclude for listPrefixDef in the header.
Re-wrote and encoded section on tagging text styles. Removed section on CSS (now included in section on tagging text styles).
Added section on hidden div elements and removed content on adding xml:ids to div elements.
Added section on encoding tables.
Added section on encoding non-standard characters.
Added section on encoding spaces truthfully.
Added global publicationStmt through XInclude.
Removed section on using CodeSharing service (moved to praxis.xml) and added section on encoding decorative daisy.
Put change elements inside revisionDesc into the correct (latest first) order.
Added profileDesc containing document type information expressed in catRef elements.
Added link to encode_mayoral.xml in the More Encoding Practices section. Added xml:ids for the last two div elements.
Added section on using rendition.
Converted @rend to @style, through XSLT transformation.
Created page and added content from xml_encoding.xml.
General Encoding Practices
This manual contains TEI instructions for encoding situations that are common to both
born-digital documents and transcriptions of primary sources. The encoding instructions
for primary sources link to this manual when relevant. When in
doubt, always check with a senior member of the MoEML team.
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, document sections
Use the div Element
Whether encoding a primary source document or authoring a born-digital document, we
follow TEI practice by using div elements to divide distinct sections and
subsections of text from one another. In the case of born-digital documents, it is
important to assign an xml:id for each div element so that the
rendering system can automatically generate a table of contents for the document. The
xml:id assigned for a section or subsection div should take the form
of the xml:id for the whole TEI document, followed by an underscore (_) and
then a descriptive word for the section or subsection.
The born-digital document linking.xml serves as an
example of how to use div elements. This TEI document, which has been assigned
an xml:id of linking, consists of six sections, the last of which
contains three subsections. The author of this document uses div elements as
follows:
Introduction
Section content.
Link to External Web Pages
Section content.
Link to Other MoEML Pages
Section content.
Link to Youtube Videos
Section content.
Graphics
Section content.
Markup (Tag) and Pull Data from Databases
Section content.
Linking to Toponyms (Location Files)
Subsection content.
Linking to People in PERS1.xml
Subsection content.
Linking to Reference Material in BIBL1.xml
Subsection content.
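The markup for the outline above is not reproduced here, so the following is only a sketch of how such a structure might be encoded. The headings come from the outline; the descriptive suffixes after linking_ in each xml:id are hypothetical, chosen to illustrate the naming pattern described earlier (document xml:id, underscore, descriptive word).

```xml
<!-- Sketch only: every xml:id suffix after "linking_" is a hypothetical example. -->
<body>
  <div xml:id="linking_intro">
    <head>Introduction</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_external">
    <head>Link to External Web Pages</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_internal">
    <head>Link to Other MoEML Pages</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_youtube">
    <head>Link to Youtube Videos</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_graphics">
    <head>Graphics</head>
    <p>Section content.</p>
  </div>
  <div xml:id="linking_markup">
    <head>Markup (Tag) and Pull Data from Databases</head>
    <p>Section content.</p>
    <!-- Subsections are div elements nested inside the parent section's div. -->
    <div xml:id="linking_toponyms">
      <head>Linking to Toponyms (Location Files)</head>
      <p>Subsection content.</p>
    </div>
    <div xml:id="linking_people">
      <head>Linking to People in PERS1.xml</head>
      <p>Subsection content.</p>
    </div>
    <div xml:id="linking_bibliography">
      <head>Linking to Reference Material in BIBL1.xml</head>
      <p>Subsection content.</p>
    </div>
  </div>
</body>
```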
Because the author has assigned xml:id attributes for each
div element in this example document, the rendering system will automatically
generate a table of contents for this document when it appears on the website. The table
of contents for linking.xml appears as follows when it is
rendered on the live website:
[Figure: table of contents for linking.xml as rendered on the live website]
Note the similarity between the structural hierarchy of div elements in
the XML code and the structural hierarchy of the table of contents.
Add Draft Content to a Published Page
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, publication statuses, statuses of documents, draft content
To add draft content to a published page, tag the draft content using the div element with a rend value of hidden. For example,
This is published content that is visible to the user.
This is draft content that is invisible to the user.
This is published content that is visible to the user.
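As a rough sketch of how the three passages above might be marked up, the example below uses three sibling div elements, the second of which carries the rend value described in this section. The sibling arrangement is an assumption; the original markup is not shown here.

```xml
<!-- Sketch only: the arrangement into three sibling div elements is assumed. -->
<div>
  <p>This is published content that is visible to the user.</p>
</div>
<div rend="hidden">
  <p>This is draft content that is invisible to the user.</p>
</div>
<div>
  <p>This is published content that is visible to the user.</p>
</div>
```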
Content tagged using the div element with a rend value of hidden appears neither on the rendered site nor in the document contents. It is, however, possible to see the hidden div on the rendered site by adding ?showDraft=true to the webpage’s URL. Note that the hidden div will not appear in the document’s table of contents until the document is properly published.
For more information about document statuses, see documentation on revision descriptions.
It is important that encoders do not add extra spaces inside TEI tags. Note the extra space at the beginning of the ref tag in the following example:
the Hall of St. Helens Priory,
This line of XML code claims that the name of the priory begins with a space. It does not, of course, any more than it includes the trailing comma. Furthermore, should this code be uploaded to the site, it would output a hyperlink that begins with a space.
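A hedged illustration of the difference, using a hypothetical target value (mol:EXAMPLE1 is a placeholder, not the real xml:id of the priory):

```xml
<!-- Sketch only: mol:EXAMPLE1 is a placeholder target. -->

<!-- Incorrect: the space before "Hall" sits inside the ref tag. -->
<p>the<ref target="mol:EXAMPLE1"> Hall of St. Helens Priory</ref>,</p>

<!-- Correct: the space and the trailing comma both stay outside the ref tag. -->
<p>the <ref target="mol:EXAMPLE1">Hall of St. Helens Priory</ref>,</p>
```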
We define editorial notes as notes written by MoEML authors, editors,
and contributors. These are encoded using the note element, with
type=editorial. They are rendered as clickable footnote
numbers in the text, which cause a popup containing the note to appear; the
notes themselves are also rendered as a numbered list at the foot of the document.
Use the resp attribute to assign responsibility for the note using the
person’s xml:id. Make sure the person’s entry in the personography has an abbr
element inside persName containing their initials; these initials will then be appended
to the note. (Example note: This is an example note written by Martin Holmes, HOLM3,
initials MDH.) For example,
an ingenious
Say-Maister,I.e., assay-master.
with his Furnaces
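The passage above might be encoded along the following lines. This is a sketch: the form of the resp value (here the mol: prefix followed by the person’s xml:id) is an assumption, and HOLM3 is used purely for illustration.

```xml
<!-- Sketch only: the resp value format and the choice of HOLM3 are assumptions. -->
<p>an ingenious Say-Maister,<note type="editorial" resp="mol:HOLM3">I.e., assay-master.</note>
with his Furnaces</p>
```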
Notes and marginal fragments that form part of the original text of a primary source document
are encoded slightly differently. In many cases, they are not in fact notes at all but marginal
labels that serve as finding aids for the reader. See Use the rendition Element and rendition Attribute to learn how to encode marginal notes.
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, foreign text
Encode Words and Phrases in a Language other than English
Mark foreign-language strings with the foreign element and add the attribute
xml:lang=XX(X), where XX(X) is the two- or three-character code for the
language. Note that the content of the foreign tag must contain only a text string without mark-up (e.g., no p, title, or other tags).
MoEML follows the Internet Engineering Task Force guidelines,
whose Language
Subtag Registry is constructed based on the recommendations in BCP 47. In most cases, this means
that where the ISO Standard 639-1 provides a two-letter language code, that code is used,
but in the absence of a two-letter code, a three-letter code is chosen from ISO 639-2
(this conforms to the current practice outlined in the TEI Guidelines).
For example,
In the Gréeke a Cittie is tearmed ϖόλις.
CIties and well peopled places bee called Oppida, in
Latine
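A sketch of how the two passages above might be tagged, assuming that the Greek and Latin words are the strings marked as foreign (the language codes are taken from the list of frequently used codes in this section):

```xml
<!-- Sketch only: which words carry the foreign tag is assumed from context. -->
<p>In the Gréeke a Cittie is tearmed <foreign xml:lang="grc">ϖόλις</foreign>.</p>
<p>CIties and well peopled places bee called <foreign xml:lang="la">Oppida</foreign>, in
Latine</p>
```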
The following language codes occur frequently in MoEML’s early
modern texts:
Old English or Anglo-Saxon (ca. 450–1100): ang
Latin: la
Ancient Greek (–1453): grc
Modern Greek (1453–): el
Middle English (1100–1500): enm
French: fr
Middle French (ca. 1400–1600): frm
Italian: it
Spanish: es
Encode Special Characters
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, special characters, ampersands (&), lesser-than characters (<), greater-than characters (>), straight quotation marks ("), unicode
Some Unicode characters that are integral to XML code cannot be used directly in a text string. Our project uses only four such characters: &, <, >, and ". To use one of these special characters in a text string, you must encode it with the specific code given in the following table. Note that these characters are prohibited by the MoEML Style Guide and therefore should be used only in primary source transcriptions or when otherwise absolutely necessary. Double quotation marks (") are generated at rendering time by a variety of elements (title (with level=a), soCalled, quote) and thus should never be used explicitly, other than for demonstration purposes or in primary-source transcriptions.
Character | Symbol | Code | Example
Ampersand | & | &amp; | Janelle Jenstad, Kim McLean-Fiander, &amp; Martin Holmes are MoEML’s project directors.
Lesser-than Character | < | &lt; | The cost of a bible in early modern London was &lt; twenty pennies.
Greater-than Character | > | &gt; | The cost of a bible in early modern London was &gt; five pennies.
Straight Quotation Mark | " | &quot; | Tye said &quot;this is how to encode straight quotation marks.&quot;
The TEI Consortium defines non-standard characters as characters not represented in the published repertoire of available characters [in Unicode] (5. Non-standard Characters and Glyph). Therefore, before encoding a non-standard character, always check to ensure that the Unicode Consortium has not already published encoding standards for the character.
The set of practices used to encode a non-standard character may be divided into two parts:
1. Adding a non-standard character metadata entry to the teiHeader of the document in which the non-standard character appears.
2. Tagging a non-standard character in the text of the document and thereby linking the instance of the non-standard character to the character’s metadata entry in the teiHeader.
Declare Non-standard Characters in the teiHeader
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, non-standard characters, unicode, metadata
To encode a non-standard character, nest a charDecl element within the encodingDesc element in the teiHeader of the document. Next, nest a char element with an xml:id attribute inside the charDecl element. The value of the xml:id attribute should begin with the document’s xml:id followed by an underscore (_) and a simplified representation of the character being encoded. For example:
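A minimal sketch of the nesting just described. The document xml:id EXAMPLE1 and the descriptor yhook are hypothetical placeholders, and the rest of the header is omitted.

```xml
<!-- Sketch only: EXAMPLE1 and yhook are hypothetical; other header content is omitted. -->
<teiHeader>
  <encodingDesc>
    <charDecl>
      <char xml:id="EXAMPLE1_yhook">
        <!-- desc, localProp, and mapping elements are nested here -->
      </char>
    </charDecl>
  </encodingDesc>
</teiHeader>
```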
Each non-standard character in the document should correspond with an individual char element; if there are five non-standard characters in the document, there should be five individual char elements inside the charDecl element.
Within the char element, nest the following three elements:
desc
localProp
mapping
Use the localProp element to tag the name of the character, borrowing form and terminology from the Unicode character database. For example:
Use the desc element to tag an extended description of the character. Your description should include the history of the form, variant forms of the glyph, and its relationship with similar typographical features or characters. For example:
An abbreviated form of “the”. This character takes the form of a small latin letter y with a reversed hook above.
The closest Unicode character we have to represent this is a small latin letter y with a combining left half ring above.
This character appears only twice in the text, which is in black letter gothic.
Note that, because there is very little published scholarship on early modern non-standard characters and glyphs, you should consult with the Project Director before writing an extended description of the character.
The localProp element (with name=name) encodes the non-standard character’s name (i.e., what contemporary typographers call it). Then use another localProp element with name=entity to provide the entity value. For example:
Use mapping elements to tag and label the various forms in which the non-standard character may appear / has appeared. Each mapping element should have a corresponding type attribute with one of the following values:
Value | Explanation
standard | the character as it appears in the document being encoded
simplified | the simplified form of the standard character, without accents or ornamentation
medieval | the medieval equivalent of the standard character
modern | the modern equivalent of the standard character
The following series of mapping elements serves as an example:
standard: y͑
simplified: ye
medieval: þe
modern: the
Combined, the code for a non-standard character (char) entry looks like this:
Description: An abbreviated form of “the”. This character takes the form of a small latin letter y with a reversed hook
above. The closest Unicode character we have to represent this is a small latin letter y with a combining left half ring above.
This character appears only twice in the text, which is in black letter gothic.
Mappings: y͑ (standard), ye (simplified), þe (medieval), the (modern)
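Pulling the pieces of this section together, a complete entry might look something like the sketch below. The xml:id, the character name, and the entity value are hypothetical placeholders. The element order follows the desc, localProp, mapping sequence listed earlier, and the mapping types follow the table of values above.

```xml
<!-- Sketch only: the xml:id and both localProp values are hypothetical. -->
<charDecl>
  <char xml:id="EXAMPLE1_yhook">
    <desc>An abbreviated form of “the”. This character takes the form of a small
      latin letter y with a reversed hook above. The closest Unicode character we
      have to represent this is a small latin letter y with a combining left half
      ring above. This character appears only twice in the text, which is in black
      letter gothic.</desc>
    <localProp name="name" value="LATIN SMALL LETTER Y WITH REVERSED HOOK ABOVE"/>
    <localProp name="entity" value="yhook"/>
    <mapping type="standard">y͑</mapping>
    <mapping type="simplified">ye</mapping>
    <mapping type="medieval">þe</mapping>
    <mapping type="modern">the</mapping>
  </char>
</charDecl>
```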
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, non-standard characters, body content
Tag Non-standard Characters in the text
Use the g element to tag the non-standard character in the document text. Add a ref attribute to the g element pointing to the xml:id of the character, as defined by the char element in the teiHeader. For example:
y͑
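In markup, the character above might be tagged as in this sketch. The pointer assumes a char element with the hypothetical xml:id EXAMPLE1_yhook declared in the same document’s teiHeader.

```xml
<!-- Sketch only: #EXAMPLE1_yhook is a hypothetical pointer to a char in the teiHeader. -->
<g ref="#EXAMPLE1_yhook">y͑</g>
```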
In some cases, a non-standard character functions as an abbreviation (e.g., characters involving a breve [˘]). Mark up such instances using the g element as described above, but also include the choice and abbr elements per the instructions for encoding abbreviations. For example:
Londŏ (abbreviated form), London (expanded form)
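One way the abbreviation above might be encoded is sketched below. The xml:id in the pointer is hypothetical, and the use of expan for the expanded form is an assumption; the instructions for encoding abbreviations are authoritative on that point.

```xml
<!-- Sketch only: #EXAMPLE1_obreve is hypothetical, and expan is an assumed choice of element. -->
<choice>
  <abbr>Lond<g ref="#EXAMPLE1_obreve">ŏ</g></abbr>
  <expan>London</expan>
</choice>
```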
Encode Roman Numerals
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, roman numerals
To tag a roman numeral, use the num element with a type value of roman and a value attribute giving the arabic equivalent of the tagged roman numeral. For example: Henry VIII
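A short sketch of the Henry VIII example with the num element applied as described:

```xml
<!-- The value attribute carries the arabic equivalent of the roman numeral. -->
<p>Henry <num type="roman" value="8">VIII</num></p>
```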
Encode a Table
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, tables, columns, rows in a table
A table may be nested inside most elements in a born-digital document. To encode a table, use the table element with the rows and cols attributes. The value of the rows attribute specifies how many rows are in the table you are encoding. Likewise, the value of the cols attribute specifies how many columns are in the table you are encoding. For example, a table with five rows and two columns would be encoded thus:
Next, nest row elements inside the table element. The number of row elements should correspond with the number of rows in your table (specified by the rows attribute attached to the table element). Each row element should have a role attribute with a value of either label or data. Use the role value of label to indicate that a row functions as a header (i.e., that its contents do not function as data but rather as descriptive labels of the data in other rows). Normally, the first row of a table will function as a header, so the first row element nested in a table element will have a role value of label. For example, a table with five rows, the first of which is a header, and two columns would be encoded thus:
Finally, nest cell elements inside each row element. The number of cell elements should correspond with the number of columns in the table (specified by the cols attribute attached to the table element). Therefore, if a table has two columns, there should be two cell elements inside each row element. Like the row element, each cell element must also have a role attribute with a value of either data or label. As a rule, the role value for a cell element should match the role value of its parent row element. For example, a table with five rows, the first of which is a header, and two columns would be further encoded thus:
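The skeleton described in the preceding steps (five rows, the first a header, and two columns) can be sketched as follows, with the cells left empty:

```xml
<!-- Sketch of the empty structure: one label row, four data rows, two cells per row. -->
<table rows="5" cols="2">
  <row role="label">
    <cell role="label"></cell>
    <cell role="label"></cell>
  </row>
  <row role="data">
    <cell role="data"></cell>
    <cell role="data"></cell>
  </row>
  <row role="data">
    <cell role="data"></cell>
    <cell role="data"></cell>
  </row>
  <row role="data">
    <cell role="data"></cell>
    <cell role="data"></cell>
  </row>
  <row role="data">
    <cell role="data"></cell>
    <cell role="data"></cell>
  </row>
</table>
```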
Insert text content inside each cell element. You may mark up the text content of each cell using most XML tags, such as ref and name. The text content will render in table form in accordance with the code structure inside the table element. You may also place a head element before the first row element. Use the head element to tag a text string that functions as a title or other description for your table. Consider the following table:
Example Table
Label A | Label B
Data Point 1A | Data Point 1B
Data Point 2A | Data Point 2B
Data Point 3A | Data Point 3B
Data Point 3A | Data Point 3B
This table has been encoded thus:
Example Table
Label A | Label B
Data Point 1A | Data Point 1B
Data Point 2A | Data Point 2B
Data Point 3A | Data Point 3B
Data Point 3A | Data Point 3B
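With the markup restored, the encoding of the example table would look roughly like the sketch below. It follows the rules given in this section and reproduces the cell contents exactly as they appear in the table above.

```xml
<!-- Sketch built from the rules in this section; cell text copied from the example table. -->
<table rows="5" cols="2">
  <head>Example Table</head>
  <row role="label">
    <cell role="label">Label A</cell>
    <cell role="label">Label B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 1A</cell>
    <cell role="data">Data Point 1B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 2A</cell>
    <cell role="data">Data Point 2B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 3A</cell>
    <cell role="data">Data Point 3B</cell>
  </row>
  <row role="data">
    <cell role="data">Data Point 3A</cell>
    <cell role="data">Data Point 3B</cell>
  </row>
</table>
```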
Note that MoEML’s style sheet does not support more complex tables at this time (e.g., tables with vertical header labels or tables with vertical and horizontal header labels). In most instances, you should be able to display data in a simple table form with a single row of header labels. If you must display data in a more complex table form, consult with the project’s lead programmer.
Use Split Tags to Represent Overlapping Hierarchies
Index terms: encoding instructions, primary source documents, general encoding, split tags, overlapping hierarchies, interrupted referring strings, page breaks
Occasionally, it may be necessary to split an interrupted referring string
into two or more tags. For example, it is possible for a page break, along with running
headers or footers, to interrupt a name tag in a transcribed text, as follows
(encoding simplified here from TRIU2.xml):
Margaret, eldeſt daughter to king Henrie [formwork and page break: the re-vnited Britannia] the ſeauenth, to Iames the fourth king of Scotland
Here, the name Henrie the ſeauenth is
divided by the formwork and page break encoding. To include the formwork and page break
information as part of the name tag would be untruthful, so the tag must be
split. However, to tag Henrie and the
ſeauenth as two name tags would be equally misleading since it
erroneously suggests that each name part is actually a separate mention. The solution
shown above is to assign each name tag an xml:id in order to link them
with the next and prev attributes, which indicate that the two
separate tags are related.
In each name tag, add an xml:id attribute with a value of the
following form: [xml:id of document]_[xml:id of person
mentioned]_[lowest possible unique integer (i.e., unique to the document)].
Then, add a next attribute to the
first name with a value that uses a pound sign (#) to point to the
xml:id of the second (next) name element. Finally, add a
prev attribute to the second name element with a value that
points to the first (previous) name element. These attributes link the two
separate tags as one.
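The split described above might be encoded along the lines of the following sketch. Several details are assumptions: HENR1 is a hypothetical person xml:id, the ref attribute on name and its mol: value are illustrative, and the division of the formwork into a catchword and a running header (with those type values) is a guess at the original layout. Line breaks are added for readability only.

```xml
<!-- Sketch only: HENR1, the ref values, and the fw types are assumptions. -->
<p>Margaret, eldeſt daughter to king
<name ref="mol:HENR1" xml:id="TRIU2_HENR1_1" next="#TRIU2_HENR1_2">Henrie </name><fw type="catch">the</fw>
<pb/>
<fw type="header">re-vnited Britannia</fw>
<name ref="mol:HENR1" xml:id="TRIU2_HENR1_2" prev="#TRIU2_HENR1_1">the ſeauenth</name>, to Iames the
fourth king of Scotland</p>
```

The first name element keeps its trailing space inside the tag, since the split falls at a genuine space in the referring string.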
When the split occurs at a genuine space in the referring string (as in the above example),
include the space in the tag (either at the end of the tag before the split or the beginning of the tag after the split).
Note that this is an exception to the ordinary rule that ref, name, and hi elements do not end or begin with spaces.
That the spaces are included in the case of split tags is especially important when tagging toponyms with ref because
without the manually entered space, the processor has no way of knowing that the toponym identified by the ref includes a space
at the split and therefore generates an erroneous variant toponym with no space in our gazetteer, such as LondonBridge for London Bridge.
Note that there is no reason to use split tags when a reference occurs across a line break, because a single ref tag can contain one or more self-closing lb element(s).
Index Praxis Documentation
Index terms: encoding instructions, index for praxis, praxis index
When adding new documentation to praxis, always encode a list of index terms associated with the new documentation. To do this, insert an index element below the heading (head) for each new div. Add an indexName attribute with a value of documentation_manual to the index element. Nest a series of terms tagged with the term element inside the index element. For example,
New Praxis Documentation
Term 1, Term 2, Term 3, Term 4
Documentation text.
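A sketch of the markup for the example above, following the steps in this section. The xml:id on the div is a hypothetical placeholder.

```xml
<!-- Sketch only: the div xml:id is hypothetical. -->
<div xml:id="praxis_new_documentation">
  <head>New Praxis Documentation</head>
  <index indexName="documentation_manual">
    <term>Term 1</term>
    <term>Term 2</term>
    <term>Term 3</term>
    <term>Term 4</term>
  </index>
  <p>Documentation text.</p>
</div>
```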
Your list of index terms should be consistent with terms already used in the index, although it will likely be necessary to use new terms as well. All new terms should be lowercase and plural.
See
Applications for Encoders for information on using and regenerating the index file.
Tag an Interesting Snippet
Index terms: encoding instructions, primary source documents, born-digital documents, general encoding, interesting snippets, homepage
MoEML’s v.6 website now displays interesting snippets on its homepage. Interesting snippets are short one- or two-sentence passages from MoEML library texts or encyclopedia articles that are in some way provocative, compelling, or humorous. Should you come across such a passage in your work as a MoEML encoder, we encourage you to tag it using the seg element with a type value of interestingSnippet and a unique xml:id. The following interesting snippet from
The Shoemaker’s Holiday by Thomas Dekker (SHOE1.xml) serves as an example:
The argument of the play I will set down in this epistle: Sir Hugh Lacy, Earl of Lincoln, had a young gentleman of his own name, his near kinsman, that loved the Lord Mayor’s daughter of London;
Note that the xml:id should be the document’s xml:id followed by an underscore (_) and a unique descriptor. Moreover, the text string inside the seg tag must be under 400 characters and be contained by a single block-level element such as a p.
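The snippet quoted above might be tagged as in this sketch. The descriptor argument in the xml:id is a hypothetical choice, and the seg sits inside a single p as required.

```xml
<!-- Sketch only: "argument" is a hypothetical descriptor for the xml:id. -->
<p><seg type="interestingSnippet" xml:id="SHOE1_argument">The argument of the play I will set
down in this epistle: Sir Hugh Lacy, Earl of Lincoln, had a young gentleman of his own name,
his near kinsman, that loved the Lord Mayor’s daughter of London;</seg></p>
```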
Add the MoEML Decorative Daisy as a Block Element
Index terms: encoding instructions, born-digital documents, general encoding, ornament, decorative daisy, MoEML daisy, red daisy
It is possible to add the MoEML decorative daisy as a block element in between paragraphs as follows:
Note that we should be judicious in our use of the decorative daisy (i.e., only use it in born-digital, front-end pages).