Text Repository

Backend repository to store and index corpora with metadata and versions.

Features

  • Store texts of a corpus in a uniform domain model:
    • Keep track of file versions
    • Link all file types to the same source document
    • Store metadata about documents, files and versions
  • Use a REST API to create, read, update and delete texts
  • Search files using stock and custom Elasticsearch indices
  • Explore the API with Concordion and Swagger
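
As a rough illustration of the create, read and delete operations listed above, the sketch below wraps curl in a few shell functions. The base URL, the /rest/documents path and the externalId field are assumptions made for this example, not confirmed parts of the API; check the Swagger page of your own deployment for the actual endpoints and payloads.

```shell
#!/bin/sh
# Assumption: base URL of a locally running instance; override as needed.
TEXTREPO_URL="${TEXTREPO_URL:-http://localhost:8080/textrepo}"

# Build the URL of a resource collection, or of one resource when an id is given.
# $1 = collection name (e.g. documents), $2 = optional resource id
resource_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/rest/%s/%s\n' "$TEXTREPO_URL" "$1" "$2"
  else
    printf '%s/rest/%s\n' "$TEXTREPO_URL" "$1"
  fi
}

# Create a document; the JSON field name is an assumption. $1 = external id
create_document() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "{\"externalId\": \"$1\"}" "$(resource_url documents)"
}

# Read a document. $1 = document id
read_document() {
  curl -sS "$(resource_url documents "$1")"
}

# Delete a document. $1 = document id
delete_document() {
  curl -sS -X DELETE "$(resource_url documents "$1")"
}
```

Keeping the URL construction in one function means only resource_url needs to change if the real paths differ from the ones assumed here.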

[Figure: textrepo-overview.png]

Schematic overview of the Text Repository: a) search in corpus; b+c) manage file types, versions and metadata of corpus; d+e) build custom indices that are automatically kept in sync with corpus; f+g) store all data in a unified database model.

Installation

Prerequisites: docker-compose.
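
The installation commands also rely on git and curl being available. A minimal pre-flight check could look like the sketch below; the function name missing_cmds is made up for this example and is not part of the Text Repository.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# Returns 0 when all commands are available, 1 otherwise.
missing_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage before installing:
#   missing_cmds docker-compose git curl || echo "Install the tools listed above first."
```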

To run the Text Repository locally, run the following commands in a new, empty directory:

git clone https://github.com/knaw-huc/textrepo .
cd examples/production
curl -o scripts/wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh
chmod +x scripts/wait-for-it.sh
./start-prod.sh

Read more on basic usage

Support

If you are having issues, please let us know at:

https://github.com/knaw-huc/textrepo/issues

Contribute

Want to improve the Text Repository? Submit a pull request at:

https://github.com/knaw-huc/textrepo

License

Copyright 2019-2021 Koninklijke Nederlandse Akademie van Wetenschappen

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.