Quickstart: Introduction to Markup

Introduction

Unlike past generations of editors, we are producing texts that must be readable by machines before they are rendered and made readable by humans. Therefore, virtually every editorial choice must be tagged in such a way that a computer can both interpret it and display it (render it) in an interface. We call this level of machine-readable information markup.
Markup has the additional advantage of allowing us to process a marked-up text in many different ways:
  • We can change how we render it.
  • We can give readers options (e.g., to turn features on or off, to change the font, to display the long s or convert every long s to a short s).
  • We can transform the marked-up text into many different types of output (e.g., HTML pages for display on a website, PDFs, and ePubs).
  • We can index it, link to it, generate concordances from it, count things in it (words, lines, etc.), search it, and store it for long-term digital archiving.
The effort you put into markup makes your text extraordinarily valuable for a wide variety of users because it can be used for many diverse purposes.
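To make the list above concrete, here is a minimal sketch of a computer counting, indexing, and transforming one marked-up sentence. It uses Python's standard-library XML parser, which is an assumption for illustration only and not the tooling MoEML uses to build its site. The <p> wrapper and the helper name to_html are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# A sample sentence; the <p> wrapper is a hypothetical container for this sketch.
SAMPLE = "<p>Stow is the author of the 1598 <title>Survey of London</title>.</p>"

root = ET.fromstring(SAMPLE)

# Count things: gather all the text inside the element and split it on whitespace.
plain_text = "".join(root.itertext())
word_count = len(plain_text.split())

# Index or search: collect the text of every tagged title.
titles = [t.text for t in root.iter("title")]


def to_html(elem):
    """Transform: build an HTML string in which <title> is rendered in italics."""
    out = escape(elem.text or "")
    for child in elem:
        inner = to_html(child)
        out += f"<i>{inner}</i>" if child.tag == "title" else inner
        out += escape(child.tail or "")
    return out


html_output = f"<p>{to_html(root)}</p>"

print(word_count)
print(titles)
print(html_output)
```

The same marked-up sentence feeds all three results. Changing how titles are rendered only means changing one line of to_html, not the text itself.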

What is Markup?

Markup is information added to a text in order to say something about the text. As a skilled reader of texts, you already have an incipient understanding of textual markup. White space, paragraph breaks, italicization, punctuation, capitalization, square brackets, and other features of a printed text are all forms of markup that signal something to the reader. For example, we sometimes recognize poetry in early modern texts because it is (often) italicized. The early modern printer set poetry in italics to say something about the text. Are the italics part of the text or are they saying something about the text? That’s where print markup gets murky—computers need much greater clarity than we need as human readers.

Terminology

Tagging, marking up, and encoding are interchangeable gerunds.
The information added to a text is markup.
When we add markup to a text, we mark up the text.
You will also see markup spelled mark-up or mark up.

MoEML’s Markup Language

MoEML uses a markup language known as TEI-XML. It is a dialect of XML devised by the Text Encoding Initiative (thus, the acronym TEI), a consortium of people who came together to devise a markup language specifically for text-bearing objects (manuscripts, books, documents). XML stands for eXtensible Markup Language. XML is not a single language, but a set of rules for writing markup languages. The World Wide Web Consortium published the first working draft of the standard in 1996 and the XML 1.0 Recommendation in 1998. XML was designed as a simpler subset of SGML (Standard Generalized Markup Language).

Elements, Attributes, and Values

What does markup look like? Let’s start with an example using italics. Italics can indicate many different things:
  1. Do you really want to know the truth?
  2. Do you know what the word palimpsest means?
  3. Stow is the author of the 1598 Survey of London.
  4. This streete is also a part of Limestreete warde.
  5. In the anno mundi calendar, the year 1 is the year the world was created.
A human reader can read ambiguous markup. When a human sees italics, they can infer their meaning through contextual clues. A computer, however, cannot. As encoders, we must specify what italics mean in each given scenario:
  1. Emphasis: Do you <emph>really</emph> want to know the truth?
  2. Words as words: Do you know what the word <term>palimpsest</term> means?
  3. Titles: Stow is the author of the 1598 <title>Survey of London</title>.
  4. Names: This streete is also a part of <placeName>Limestreete warde</placeName>.
  5. Foreign words: In the <foreign>anno mundi</foreign> calendar, the year 1 is the year the world was created.
These are all examples of descriptive markup.
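As a rough illustration of why this matters to a machine, the sketch below reads the five tagged sentences and reports what each italicized phrase means. The <examples> and <s> wrappers, the MEANINGS table, and the use of Python's standard-library parser are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The five tagged sentences, wrapped in hypothetical <examples> and <s> containers.
SAMPLE = """<examples>
  <s>Do you <emph>really</emph> want to know the truth?</s>
  <s>Do you know what the word <term>palimpsest</term> means?</s>
  <s>Stow is the author of the 1598 <title>Survey of London</title>.</s>
  <s>This streete is also a part of <placeName>Limestreete warde</placeName>.</s>
  <s>In the <foreign>anno mundi</foreign> calendar, the year 1 is the year the world was created.</s>
</examples>"""

# A hypothetical lookup table from element name to the meaning of the italics.
MEANINGS = {
    "emph": "emphasis",
    "term": "a word as a word",
    "title": "a title",
    "placeName": "a place name",
    "foreign": "a foreign phrase",
}

root = ET.fromstring(SAMPLE)

# For each sentence, record the element name and text of every tagged phrase.
report = [(child.tag, child.text) for s in root.iter("s") for child in s]

for tag, text in report:
    print(f"{text!r} is italic because it is {MEANINGS[tag]}")
```

Without the tags, all five phrases would simply be italic text, and the computer would have no way to tell them apart.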
An element is the tag that wraps an item in the text:
<title>Survey of London</title>
You can think of an element as a noun because it describes what something is. Here, Survey of London is the text node. The text node is the thing you add markup to. When you mark up a text node, you must wrap it in both an opening tag (<title>) and a closing tag (</title>).
But what if you want to specify what kind of title you have? You can add an attribute and value to your element tag:
<title level="m">Survey of London</title>
In this example, the attribute is @level and the value is "m". You can think about attributes as big categories—they are not specific until you add a value. In this example, @level asks what type of title is Survey of London? and "m" answers, it’s a monograph!
To refresh:
  1. The element describes what the text node is (i.e., <title>).
  2. The attribute is a category on the element (i.e., @level).
  3. The value specifies the attribute (i.e., "m").
In Oxygen, elements, attributes, and values are displayed in different colours. Note that attributes and values are only added to the opening tag; the closing tag does not repeat them. It is also important to note that elements can have more than one attribute:
<title level="m" when="1603">Survey of London</title>
In some cases, an element can be self-closing, meaning that a single tag ending in a slash both opens and closes it. A common example is the element <lb/> (line beginning), which is explained in depth here.
While particular elements, attributes, and values can vary depending on the XML language, the structure of an XML element is always the same.
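If it helps to see how a program takes an element apart, here is a small sketch using Python's standard-library XML parser (an assumption for illustration; you will not need Python to encode for MoEML). It reads the element name, the attributes and values, and the text node from the example above, then looks at a self-closing <lb/>. The <l> wrapper around the line break is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two-attribute example from above.
title = ET.fromstring('<title level="m" when="1603">Survey of London</title>')

element_name = title.tag      # the element: what the text node is
attributes = title.attrib     # a dictionary of attribute names and their values
text_node = title.text        # the text wrapped by the opening and closing tags

print(element_name, attributes, text_node)

# A self-closing element has no text node of its own.
# The <l> wrapper here is a hypothetical container for this sketch.
line = ET.fromstring("<l>first line<lb/>second line</l>")
lb = line.find("lb")

print(lb.text)   # None, because nothing sits between an opening and closing tag
print(lb.tail)   # the text that follows the self-closing tag
```

Whatever the XML language, the parser always finds the same three parts: a name, zero or more attribute and value pairs, and the content.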

Exercises in Oxygen

Now that you have an idea of how elements, attributes, and values can be used to mark up a text, it is time to open Oxygen and try some encoding. If you have not downloaded Oxygen, go here.
Once in Oxygen, follow these steps:
  1. Click File in the top toolbar and choose New
  2. Choose XML Document and click Create
Now you should have an empty XML document. The first thing you will need to do is create an element that all of your text will be nested within. Our documents at MoEML all use the element <TEI>. In your empty document, type <TEI>. Notice that once you type the opening tag, Oxygen automatically creates the closing tag:
<TEI>
</TEI>
Now let’s nest a <name> element in the <TEI> element:
<TEI>
  <name></name>
</TEI>
What happens when you delete the closing </name> tag? The squiggly red line tells you that there is an error in your encoding. Notice that Oxygen also tells us that the element type <name> must be terminated by the matching end-tag </name>.
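Oxygen is not being fussy here: any XML parser will refuse a document with a missing or mismatched closing tag. The sketch below shows the same check with Python's standard-library parser, which is an assumption for illustration and produces different error messages from Oxygen's. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


complete = "<TEI><name></name></TEI>"
missing_close = "<TEI><name></TEI>"

# XML is case-sensitive, so a closing tag must match the opening tag exactly.
mismatched_case = "<placeName>Limestreete warde</placename>"

print(is_well_formed(complete))
print(is_well_formed(missing_close))
print(is_well_formed(mismatched_case))
```

Only the first string is accepted. The other two fail for the same reason: an opening tag never meets its exact matching closing tag.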
Once the closing </name> has been restored, type your full name within the <name> element:
<TEI>
  <name>Kathryn Reese LeBere</name>
</TEI>
What if you wanted to tag each part of your name? If you highlight some text (e.g., Kathryn) and press Ctrl+E/Command+E, a textbox will appear. If you type the tag name in the textbox and click OK or press Enter, Oxygen will automatically create the element around the highlighted text:
<TEI>
  <name>
    <firstName>Kathryn</firstName>
    <middleName>Reese</middleName>
    <lastName>LeBere</lastName>
  </name>
</TEI>
The Ctrl+E/Command+E shortcut will save you a lot of time when encoding.
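Nesting is what lets a program treat the name as a whole or reach any single part of it. Here is a minimal sketch, again assuming Python's standard-library parser purely for illustration, that reads the nested <name> element from the exercise.

```python
import xml.etree.ElementTree as ET

# The nested example from the exercise above.
SAMPLE = """<TEI>
  <name>
    <firstName>Kathryn</firstName>
    <middleName>Reese</middleName>
    <lastName>LeBere</lastName>
  </name>
</TEI>"""

root = ET.fromstring(SAMPLE)
name = root.find("name")

# Each child element of <name> becomes an entry: element name -> text node.
parts = {child.tag: child.text for child in name}

# The parts can be recombined in document order to rebuild the full name.
full_name = " ".join(child.text for child in name)

print(parts["lastName"])
print(full_name)
```

Because each part is tagged, a program could just as easily sort by last name or display initials without anyone retyping the text.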
Before adding attributes and values, write a sentence and tag its parts with different elements of your choosing:
<TEI> The <adjective>new</adjective> <noun>Research Assistant</noun> <verb>encoded</verb> the <noun>text</noun>. </TEI>
Now try specifying an element with an attribute and value. Remember that attributes are like categories and values specify the categories:
<TEI> The <adjective>new</adjective> <noun type="person">Research Assistant</noun> <verb>encoded</verb> the <noun type="thing">text</noun>. </TEI>
Keep experimenting until you feel comfortable! The more you encode, the more natural it will become.
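If you are curious what attributes and values buy you, the sketch below searches the practice sentence from this exercise. It first finds every <noun>, then only the nouns whose @type is "person". Python's standard-library parser is assumed here for illustration only.

```python
import xml.etree.ElementTree as ET

# The practice sentence with attributes and values added.
SAMPLE = (
    '<TEI> The <adjective>new</adjective> '
    '<noun type="person">Research Assistant</noun> '
    '<verb>encoded</verb> the <noun type="thing">text</noun>. </TEI>'
)

root = ET.fromstring(SAMPLE)

# The element alone finds every noun.
all_nouns = [n.text for n in root.iter("noun")]

# The attribute and value narrow the search to one category of noun.
people = [n.text for n in root.findall("noun[@type='person']")]

print(all_nouns)
print(people)
```

The element answers "what is this?" and the attribute and value answer "what kind?", which is why the second search returns a shorter list.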

Cite this page

MLA citation

LeBere, Kate. Quickstart: Introduction to Markup. The Map of Early Modern London, Edition 6.6, edited by Janelle Jenstad, U of Victoria, 30 Jun. 2021, mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm.

Chicago citation

LeBere, Kate. Quickstart: Introduction to Markup. The Map of Early Modern London, Edition 6.6. Ed. Janelle Jenstad. Victoria: University of Victoria. Accessed June 30, 2021. mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm.

APA citation

LeBere, K. 2021. Quickstart: Introduction to Markup. In J. Jenstad (Ed.), The Map of Early Modern London (Edition 6.6). Victoria: University of Victoria. Retrieved from https://mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm.

RIS file (for RefMan, RefWorks, EndNote etc.)

Provider: University of Victoria
Database: The Map of Early Modern London
Content: text/plain; charset="utf-8"

TY  - ELEC
A1  - LeBere, Kate
ED  - Jenstad, Janelle
T1  - Quickstart: Introduction to Markup
T2  - The Map of Early Modern London
ET  - 6.6
PY  - 2021
DA  - 2021/06/30
CY  - Victoria
PB  - University of Victoria
LA  - English
UR  - https://mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm
UR  - https://mapoflondon.uvic.ca/edition/6.6/xml/standalone/quickstart_markup.xml
ER  - 

TEI citation

<bibl type="mla"><author><name ref="#LEBE1"><surname>LeBere</surname>, <forename>Kate</forename></name></author>. <title level="a">Quickstart: Introduction to Markup</title>. <title level="m">The Map of Early Modern London</title>, Edition <edition>6.6</edition>, edited by <editor><name ref="#JENS1"><forename>Janelle</forename> <surname>Jenstad</surname></name></editor>, <publisher>U of Victoria</publisher>, <date when="2021-06-30">30 Jun. 2021</date>, <ref target="https://mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm">mapoflondon.uvic.ca/edition/6.6/quickstart_markup.htm</ref>.</bibl>

Personography