Digital Corpora

of Latin Texts

For some time now I have been looking for digital text corpora and wordlists of Latin. I would like to build wordlists for use with the EML-spellchecker (currently in development). I am especially interested in chronologically tagged data, since later iterations of the software should be able to report on the chronological stratification of an EML text’s vocabulary.
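To make "chronological stratification" concrete, here is a minimal sketch, not the spellchecker's actual design. It assumes a hypothetical wordlist that maps each word form to the period in which it is first attested. The sample forms and period labels (`WORDLIST`, "classical", "medieval", "early modern") are invented for illustration. The sketch counts how many tokens of a text fall into each period, and tokens missing from the wordlist are counted as "unknown".

```python
import re
from collections import Counter

# Hypothetical chronologically tagged wordlist: word form -> period of first
# attestation. Entries and labels are illustrative assumptions, not real data.
WORDLIST = {
    "et": "classical",
    "res": "classical",
    "publica": "classical",
    "quidditas": "medieval",
    "typographus": "early modern",
}

def tokenize(text):
    # Lowercase, then keep runs of Unicode letters (no digits or underscores),
    # so forms with characters such as æ or œ are not split apart.
    return re.findall(r"[^\W\d_]+", text.lower())

def stratify(text, wordlist):
    """Count the tokens of `text` per period; unlisted forms count as 'unknown'."""
    counts = Counter()
    for token in tokenize(text):
        counts[wordlist.get(token, "unknown")] += 1
    return counts

if __name__ == "__main__":
    print(stratify("Res publica et typographus.", WORDLIST))
```

A real EML wordlist would also need orthographic normalization (for example u/v and i/j), which this sketch deliberately leaves out.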


ocr4all

OCR for Incunables

Probably the most significant step forward in a long time for quantitative (really, any kind of text-oriented) research in Early Modern Latin (EML) is ocr4all, an OCR tool developed at the University of Würzburg (github.com/OCR4all) that reliably converts scans of early printed books into machine-readable (and human-researchable) text. High-quality scans of early printed books have been abundant for some time, but so far this has not translated into greater availability of texts.

