Overlapping structures in verse

Because of the way verse structures are encoded, verse presents many more opportunities than prose for inconvenient overlap with other kinds of encoding, so learning to handle this overlap gracefully is an important component of verse markup. In prose, line breaks are marked with the empty element lb, which cannot overlap with anything; verse lines, by contrast, are contained within l elements. Hence any encoded feature that does not fit entirely within a single verse line is liable to overlap with the line structure.

The most frequent source of overlap is quoted speech, which often appears within verse (particularly narrative verse) and rarely fits neatly into the verse line structure. For instance:

<l>“Thy gold,” he cried, “the conqueror scorns,</l>
<l>He claims thy forfeit life,</l>
<l>Thy precious gems, and jewels rare,</l>
<l>He gives thy beauteous wife.”</l>

The quoted speech here is divided into two parts, the second of which spans three and a half lines. However, because of overlap we cannot simply place a start-tag at the beginning of each piece of the quotation and an end-tag at the end, since the verse lines will not nest completely inside the quotation (nor vice versa). Instead, we need to divide the quotation into several pieces:

<l><q rend="pre(&ldquo;)post(&rdquo;)">Thy gold,</q> he cried,
   <q rend="pre(&ldquo;)">the conqueror scorns,</q></l>
<l><q>He claims thy forfeit life,</q></l>
<l><q>Thy precious gems, and jewels rare,</q></l>
<l><q rend="post(&rdquo;)">He gives thy beauteous wife.</q></l>

Since l may nest inside q, we could also encode the passage a little more compactly, wrapping the last three lines in a single q:

<l><q rend="pre(&ldquo;)post(&rdquo;)">Thy gold,</q> he cried, 
   <q rend="pre(&ldquo;)">the conqueror scorns,</q></l>
<q rend="post(&rdquo;)"><l>He claims thy forfeit life,</l>
<l>Thy precious gems, and jewels rare,</l>
<l>He gives thy beauteous wife.</l></q> 

Ideally, we’d like to indicate that all of these individual q elements are part of the same quotation, and we can do this using the part attribute. The TEI by default does not provide a part attribute on q, but it is simple to add one, and the extended DTD that accompanies this Guide allows part on q and quote as well as on l and lg. Values for part are I, M, and F (for initial, medial, and final). There must be at least one initial and one final part for any fragmented element; there may be as many medial parts as necessary, or none at all. This allows for the following encoding:

<l><q rend="pre(&ldquo;)post(&rdquo;)" part="I">Thy gold,</q> he cried, 
    <q rend="pre(&ldquo;)" part="M">the conqueror scorns,</q>
</l>
<q rend="post(&rdquo;)" part="F">
  <l>He claims thy forfeit life,</l>
  <l>Thy precious gems, and jewels rare,</l>
  <l>He gives thy beauteous wife.</l>
</q>
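
For readers who want to see how software might put such fragments back together, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It is an illustration only, not part of the TEI or of the DTD that accompanies this Guide. The function name join_parts and the enclosing lg wrapper are assumptions made for the example, and the rend attributes are left out so that the sample parses without entity declarations.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the example above. The rend attributes are omitted and
# the fragment is wrapped in an <lg> (an assumption) so it parses as one document.
SAMPLE = """<lg>
<l><q part="I">Thy gold,</q> he cried,
    <q part="M">the conqueror scorns,</q>
</l>
<q part="F">
  <l>He claims thy forfeit life,</l>
  <l>Thy precious gems, and jewels rare,</l>
  <l>He gives thy beauteous wife.</l>
</q>
</lg>"""


def join_parts(root, tag="q"):
    """Rebuild whole quotations from fragments marked part="I", "M", "F"."""
    quotations = []
    current = None  # text of the sequence currently open, if any
    for el in root.iter(tag):  # iter() visits elements in document order
        part = el.get("part")
        if part is None:
            continue  # unfragmented element, nothing to join
        # Collapse line breaks and indentation into single spaces.
        text = " ".join("".join(el.itertext()).split())
        if part == "I":
            if current is not None:
                raise ValueError("initial part inside an open sequence")
            current = [text]
        elif part in ("M", "F"):
            if current is None:
                raise ValueError(f"part={part!r} with no initial part before it")
            current.append(text)
            if part == "F":
                quotations.append(" ".join(current))
                current = None
        else:
            raise ValueError(f"unknown part value {part!r}")
    if current is not None:
        raise ValueError("sequence has no final part")
    return quotations


for quotation in join_parts(ET.fromstring(SAMPLE)):
    print(quotation)
```

The sketch enforces the rule stated above (one initial part, one final part, any number of medial parts between them) and reports an error when a second initial part appears before the first sequence has closed, since the part values alone give it no way to decide which fragments belong together.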

Quotations are by far the most frequent sources of overlap, but you may encounter others depending on what kinds of poetry you’re encoding, and what other structures you are encoding within the poetry. Personal or place names might well run across a line break, as might foreign-language phrases, syntactic structures, and other verbal phenomena. Since the part attribute is not allowed by default on most elements in TEI, you may need to extend the TEI to add part to the elements you need to fragment.

You may sometimes encounter more complex situations where one fragmented quotation is nested inside another. In these cases, you can’t use the part attribute, because there is no way to tell which parts go together; processing software has no way of knowing which final part completes a given initial part. Instead, you need to use a method which specifically identifies each sequence of fragments as a sequence. The solution is the next and prev attributes, which are available on all elements when the additional tagset for linking, segmentation, and alignment is included. The DTD that accompanies this Guide includes these attributes.

The next and prev attributes are slightly more complex to use, but the basic idea is simple. Each fragment of the element you are encoding is given a unique identifier, using the id attribute. Each fragment also carries a next attribute, which holds the id value of the next fragment in the sequence, and a prev attribute which holds the id value of the previous fragment in the sequence. So for instance (a simplified version of the example above, showing just the quotation):

<q id="q01" next="q02">Thy gold,</q> he cried,
<q id="q02" prev="q01" next="q03">the conqueror scorns,</q>
<q id="q03" prev="q02">He claims thy forfeit life...</q>

The first fragment has no prev attribute because there is no previous fragment in the sequence; by the same logic, the last fragment has no next attribute. The id values must be unique within the entire encoded file. This method is a bit more involved than using the part attribute, but it is also far less ambiguous, because each fragment names its neighbors explicitly. And since the next and prev attributes are available as part of the unmodified TEI DTD, this method doesn’t require any DTD extensions.
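
To make the linking concrete, the following Python sketch walks each sequence from its first fragment to its last by following next, and checks along the way that every prev value points back correctly. It is one possible approach under stated assumptions, not a description of any particular TEI tool: the function name follow_chains and the enclosing lg wrapper are invented for the example.

```python
import xml.etree.ElementTree as ET

# The simplified example above, wrapped in an <lg> (an assumption) so that
# it parses as a single document.
SAMPLE = """<lg>
<q id="q01" next="q02">Thy gold,</q> he cried,
<q id="q02" prev="q01" next="q03">the conqueror scorns,</q>
<q id="q03" prev="q02">He claims thy forfeit life...</q>
</lg>"""


def follow_chains(root):
    """Return one list of id values per fragment sequence, in reading order."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    chains = []
    for start in by_id.values():
        # A sequence starts at a fragment that has next but no prev.
        if start.get("prev") is not None or start.get("next") is None:
            continue
        el = start
        chain = [el.get("id")]
        while el.get("next") is not None:
            target = el.get("next")
            if target not in by_id:
                raise ValueError(f"next={target!r} matches no id in the file")
            if target in chain:
                raise ValueError(f"sequence loops back to {target!r}")
            following = by_id[target]
            if following.get("prev") != el.get("id"):
                raise ValueError(
                    f"{target!r} does not point back to {el.get('id')!r}"
                )
            chain.append(target)
            el = following
        chains.append(chain)
    return chains


print(follow_chains(ET.fromstring(SAMPLE)))
```

Because every link is stated explicitly, the same walk works unchanged when two fragmented quotations are nested or interleaved: each sequence is recovered from its own id values, regardless of what other fragments lie between its members.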

If you’re using an extended DTD which allows part on q and quote, and hence have both methods available to you, a good rule of thumb is to use part wherever possible (i.e. everywhere except where two q or quote elements are nested and both need to be fragmented), and to use next and prev only for those more difficult cases. Use a uniform system of id values with a common prefix (such as q or quote) and number them consecutively to make it easier to spot errors.
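
A consistent numbering scheme also lends itself to mechanical checking. The short Python sketch below reports id values that occur more than once and numbers that are skipped within a given prefix. The function name check_ids and the sample input are invented for illustration, and the check assumes ids of the form prefix plus digits, as in q01, q02.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def check_ids(root, prefix="q"):
    """Return (duplicate ids, missing numbers) for ids like q01, q02, ..."""
    ids = [el.get("id") for el in root.iter() if el.get("id") is not None]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)

    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbers = set()
    for value in ids:
        match = pattern.fullmatch(value)
        if match:
            numbers.add(int(match.group(1)))

    gaps = []
    if numbers:
        # Any number between the lowest and highest that was never used.
        gaps = [n for n in range(min(numbers), max(numbers) + 1)
                if n not in numbers]
    return duplicates, gaps


# Invented input with two deliberate mistakes: q02 is used twice, q03 is skipped.
root = ET.fromstring(
    '<lg><q id="q01"/><q id="q02"/><q id="q02"/><q id="q04"/></lg>'
)
print(check_ids(root))
```

A gap is not necessarily an error, since a fragment may simply have been deleted during revision, but it is a useful prompt to look again at the next and prev values nearby.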