Canonical Text Infrastructure (CTI)

Pillar 1: Canonical Text Service

Public Text Inventories
(!) = still being edited

Namespace | Text inventories (capped at 10k) | Content | Source
dsb (!) | 1 | Lower Sorbian text corpus | Serbski Institute
folgershakespeare | 1 | All of Shakespeare's works | Folger Shakespeare Library
pbc | 1 | 20 copyright-free multilingual parallel Bible translations | Parallel Bible Corpus
pcp | 1 | Chrétien de Troyes's Le Chevalier de la Charrette (Lancelot, ca. 1180) | The Princeton Charrette Project
tg | 1 2 3 4 5 6 7 | Textgrid | The Digital Library in Textgrid
tgap | 1 | Thomas Gray Archive poems | Thomas Gray Archive
voth | 1 | David Boder: Voices of the Holocaust | David Boder: Voices of the Holocaust
Online Tools: Namespace Resolver (provides endpoint URLs based on URN namespaces) | CTS Explorer (provides a meta overview of the available CTS instances) | E-Book Style Reader
Resources and Source Code: Source Code Repositories (Git hosted via Bitbucket.org) | Python API (work in progress)
Suggested Citation: Tiepmar, J. (). Canonical Text Infrastructure. https://urncts.eu
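The Namespace Resolver maps the namespace part of a CTS URN to the endpoint that serves it. The following is a minimal sketch of that idea, assuming a local lookup table; the endpoint URLs and the work identifier used below are hypothetical placeholders, and the real resolver's interface is not described here.

```python
# Hypothetical namespace -> endpoint table. The URLs are placeholders, not
# the endpoints actually returned by the Namespace Resolver.
NAMESPACE_ENDPOINTS = {
    "folgershakespeare": "https://cts.example.org/folgershakespeare/",
    "pbc": "https://cts.example.org/pbc/",
    "voth": "https://cts.example.org/voth/",
}


def resolve_endpoint(urn: str, table: dict = NAMESPACE_ENDPOINTS) -> str:
    """Return the endpoint URL responsible for the namespace of a CTS URN."""
    parts = urn.split(":")
    # A CTS URN has the form "urn:cts:<namespace>:<work>[:<passage>]".
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace = parts[2]
    if namespace not in table:
        raise LookupError(f"no endpoint known for namespace {namespace!r}")
    return table[namespace]


if __name__ == "__main__":
    # "example.work" is a made-up work identifier for illustration.
    print(resolve_endpoint("urn:cts:pbc:example.work:1.1"))
```

Keying the lookup on the namespace alone means a client only needs the URN to find the right server, which is what lets texts from different providers be cited uniformly.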

Selected Collections in E-Book Style

Author: Hans Christian Andersen | Wilhelm Busch | Johann Wolfgang von Goethe | Grimm's Fairy Tales | Friedrich Schiller | Shakespeare
Time Frame: Age of Enlightenment (German) (1650-1800)

Pillar 2: Canonical Text Miner

Text Mining Instances: Folger Shakespeare
Source Code Resources: Source Code and Installation (Git hosted via Bitbucket.org)

Impressum and Data Protection Policy

This is a non-commercial academic research and data webservice.

Impressum
Dr. Jochen Tiepmar, c/o IP-Management #48412, Ludwig-Erhard-Str. 18, 20459 Hamburg, Germany.
Contact: preferably by email (tiepilab at gmx.de) or via the usual academic communication channels.

Data Protection Policy
No user data is collected apart from the IP access logs stored by the Apache server software. These access logs are deleted automatically. Data sets are provided according to their public license or prior individual agreements. Some tools include third-party software under public licenses (namely plotly.js and cytoscape.js).

FAQ

What is a Canonical Text Service?

The Canonical Text Services protocol defines the interaction between a client and a server that identifies texts and retrieves canonically cited passages of those texts. The official specification was written by David Neel Smith and Christopher Blackwell. Put simply, CTS serves text passages that are addressed by URN-like references, and it is specified so that a CTS URN can be created for any possible text passage in a document. Data is requested via HTTP GET requests whose parameters are encoded in the URL. Each request must contain the parameter "request", which specifies the CTS function to use. Function-specific parameters, such as the URN, are added as additional GET parameters.
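The request scheme described above can be sketched as follows. This is a minimal illustration, not the project's Python API: the endpoint URL is a hypothetical placeholder, and the example URN follows the common CTS pattern (namespace, then textgroup.work.version, then passage) without claiming that this particular text is served by this infrastructure. GetPassage and GetCapabilities are function names from the CTS specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; real endpoint URLs depend on
# the namespace of the requested text.
ENDPOINT = "https://cts.example.org/api/cts/"


def cts_request_url(endpoint: str, function: str, **params: str) -> str:
    """Build a CTS GET request URL.

    The CTS function name goes into the mandatory "request" parameter;
    function-specific parameters such as "urn" are appended as further
    GET parameters (URL-encoded by urlencode).
    """
    query = {"request": function, **params}
    return endpoint + "?" + urlencode(query)


def split_cts_urn(urn: str) -> dict:
    """Split a CTS URN into namespace, work and (optional) passage parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    return {
        "namespace": parts[2],
        # Work component: textgroup.work[.version[.exemplar]]
        "work": parts[3].split("."),
        # A URN without a passage component refers to the whole text.
        "passage": parts[4] if len(parts) == 5 and parts[4] else None,
    }


if __name__ == "__main__":
    urn = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1"
    print(split_cts_urn(urn))
    print(cts_request_url(ENDPOINT, "GetPassage", urn=urn))
```

With the example URN, the second call produces `https://cts.example.org/api/cts/?request=GetPassage&urn=urn%3Acts%3AgreekLit%3Atlg0012.tlg001.perseus-grc2%3A1.1`; the colons inside the URN are percent-encoded so that they survive as part of a single GET parameter value.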

Is the implementation feature-complete?

Not yet. Subpassage notation, GetPassagePlus and error messages are still missing but will be implemented soon, along with many additional features that extend the CTS protocol (e.g. license management at the level of individual passage requests). See this dissertation for more information about what is planned.

How about data persistency and versioning? Can I reliably cite text passages via URNs or can the text content change?

CTS URNs are meant to be persistent references. However, mistakes get corrected, improvements are made, and the structural markup can change while documents are still being edited. There is no clear solution for this problem, but some form of versioning (e.g. numbered updates) will be implemented. Text corpora that are still being edited are marked with (!) in the table above. In general, CTS URNs can be considered safe for citation purposes.

How reliable is this service? Will you monetize it once people depend on it?

The server is privately financed, and I use these web services for my own programming work and research. The software is open source and can be set up by anyone. CTS Cloning is planned, which will allow decentralized, distributed backups of texts once they have been "CTSified"; this will remove any dependency on individual servers because anyone will be able to mix and host their own data instances. Monetizing this service will not be necessary and would be counterproductive for me personally, because it would undermine the reliability of my research output.