Text Repository

Backend repository to store and index corpora with metadata and versions.


  • Store texts of a corpus in a uniform domain model:
    • Keep track of file versions
    • Link all file types to the same source document
    • Store metadata about documents, files and versions
  • Use a REST API to create, read, update and delete texts
  • Search files using stock and custom Elasticsearch indices
  • Explore the API with Concordion and Swagger


Schematic overview of the Text Repository: a) search in corpus; b+c) manage file types, versions and metadata of corpus; d+e) build custom indices that are automatically kept in sync with corpus; f+g) store all data in a unified database model.


Prerequisites: docker-compose.
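
Before cloning, it can help to confirm that the prerequisite is actually on your PATH. The snippet below is a minimal sketch, not part of the Text Repository itself; the function names `have_cmd` and `check_prerequisites` are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: checks the one prerequisite named in this README.
check_prerequisites() {
  if have_cmd docker-compose; then
    echo "docker-compose found"
  else
    echo "docker-compose not found; install it before continuing" >&2
    return 1
  fi
}
```

Calling `check_prerequisites` after defining these functions prints a short status line and returns a non-zero status when docker-compose is missing, so it can guard the setup commands in a script.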

To run the Text Repository locally, run the following commands in a new, empty directory (the clone targets the current directory):

git clone https://github.com/knaw-huc/textrepo .
cd examples/production
curl -o scripts/wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh
chmod +x scripts/wait-for-it.sh

Read more on basic usage


If you are having issues, please let us know at:



Want to improve the Text Repository? Submit a pull request at:



Copyright 2019-2021 Koninklijke Nederlandse Akademie van Wetenschappen

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.