.. |tr| replace:: Text Repository Content Coordinate System ========================= Caveat lector: `HC SVNT DRACONES`__ __ https://en.wikipedia.org/wiki/Here_be_dragons Addressing Content ------------------ Suppose a document (e.g., a ``txt`` file) is in the |tr|, how can we then refer to portions of the text inside the document? Let's try some URIs. Suppose that:: curl $prefix/documents/$id will get you all of the document's ``txt`` contents. It would be nice if we could address a `character range`, e.g., from characters ``10-15`` in a manner such as:: curl $prefix/documents/$id/text/chars?start=10&end=15 or:: curl $prefix/documents/$id/text/chars?start=10&length=6 Similarly we could be interested in `lines` ``2..4`` at address:: curl $prefix/documents/$id/text/lines?start=2&end=4 The idea being that the ``text`` part in ``$prefix/documents/$id/text/chars?start=10&length=6`` signifies that we are interested in interpreting the document from a ``txt`` perspective, looking at lines, words, characters, etc. (as opposed to, e.g., an XML / TEI context where we could be interested in getting the author from the metadata, ...) Things get interesting once we challenge ourselves to the idea that we could be interested in viewing a document from this ``text`` perspective, irrespective of the actual format that was used to upload the document. So, even if a ``TEI`` (or ``PageXML``, ``hOCR``, ...) document is uploaded, we still want to address its `textual content` via ``/text/chars/...`` and we consider it a |tr| responsibility to be able to (in this case) yield the requested `character range` (lines, words, ...) from the ``TEI`` document. Perspectives ------------ In ``$prefix/documents/$id/text/{chars,lines,words,...}`` we will call ``text`` the `perspective`. **TODO**: `conjure up terminology for the` ``{chars,lines,words}`` `part`, perhaps ``selector``? ``$prefix/documents/$id//?params`` Sidestepping to current |tr| implementation, ``perspective`` could be a Resource class in the WebApp stack, translating the URI / addressing scheme. Then a separate `Perspective` class hierarchy is responsible for mapping various file formats to text (flatten the tree and yield the characters for a generic XML file, do something more intelligent for a TEI file, use "Rutger's" implementation for PageXML/hOCR, etc.). Then a `Selector` class hierarchy works in the ``txt`` domain and selects the requested fragment. **TODO**: diagrams :-)