EML-txt2txt is a series of three scripts (o4a-solver, EML-spellchecker, EML-normalizer) under a common GUI. It converts the output of OCR4all into texts in which (1) abbreviations are resolved (OCR4all outputs them as Unicode characters), (2) scanning mistakes are corrected, and finally (3) orthography is normalized.
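To make the three stages concrete, here is a minimal sketch of such a pipeline in Python. It is not the actual code of o4a-solver, EML-spellchecker, or EML-normalizer: the function names, the abbreviation table, the toy wordlist, and the normalization rules are all assumptions chosen for illustration, and the spellchecking step only handles one typical scanning mistake (long s read as f).

```python
import re

# Hypothetical abbreviation table (illustrative only; a macron can stand
# for different omitted letters in real texts).
ABBREVIATIONS = {
    "\ua751": "per",  # p with stroke through descender
    "\u016b": "um",   # u with macron
    "\u0113": "em",   # e with macron
}

# Hypothetical toy wordlist; a real one would be far larger.
WORDLIST = {"semper", "est", "verum", "eius"}

# Hypothetical normalization rules.
NORMALIZATION = {
    "\u017f": "s",   # long s
    "j": "i",
    "\u00e6": "ae",  # ae ligature
}


def solve_abbreviations(text):
    """Stage 1: replace each abbreviation sign with its expansion."""
    for sign, expansion in ABBREVIATIONS.items():
        text = text.replace(sign, expansion)
    return text


def correct_word(word):
    """Try replacing a single 'f' with 's' if the word is unknown."""
    if word in WORDLIST:
        return word
    for i, ch in enumerate(word):
        if ch == "f":
            candidate = word[:i] + "s" + word[i + 1:]
            if candidate in WORDLIST:
                return candidate
    return word  # leave unknown words untouched


def spellcheck(text):
    """Stage 2: correct scanning mistakes word by word."""
    return re.sub(r"\w+", lambda m: correct_word(m.group(0)), text)


def normalize(text):
    """Stage 3: apply the orthographic normalization rules."""
    for old, new in NORMALIZATION.items():
        text = text.replace(old, new)
    return text


def txt2txt(text):
    """Run the three stages in the order given above."""
    return normalize(spellcheck(solve_abbreviations(text)))


print(txt2txt("fem\ua751 eft ver\u016b ejus"))  # semper est verum eius
```

Because spellchecking runs before normalization in this sketch, the wordlist is matched against unnormalized spellings; that is why "ejus" passes through stage 2 unchanged and only becomes "eius" in stage 3.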
For some time now I have been looking for digital text corpora and wordlists of Latin. I would like to construct wordlists that can be used with the EML-spellchecker (currently in development). I am especially interested in data that are chronologically tagged, since later iterations of the software should be able to produce information about the chronological stratification of an EML text’s vocabulary.
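As a rough illustration of what chronological stratification could look like, the following sketch counts the tokens of a text by the period attached to each word in a tagged wordlist. The data structure, the period labels, and the function name `stratify` are assumptions made for this example, not a description of the planned software.

```python
import re
from collections import Counter

# Hypothetical chronologically tagged wordlist: word -> period label.
# The entries and labels are illustrative only.
TAGGED_WORDLIST = {
    "arma": "classical",
    "virum": "classical",
    "cano": "classical",
    "telescopium": "early modern",
}


def stratify(text):
    """Count tokens per period; words missing from the list are 'unknown'."""
    counts = Counter()
    for token in re.findall(r"\w+", text.lower()):
        counts[TAGGED_WORDLIST.get(token, "unknown")] += 1
    return dict(counts)


print(stratify("Arma virumque cano, telescopium"))
```

In this toy run "virumque" lands in the "unknown" bucket because the enclitic -que is not split off, which hints at the kind of preprocessing a real wordlist lookup would probably need.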
Probably the most significant step forward in a long time for quantitative (or indeed any kind of text-oriented) research in Early Modern Latin (EML) is OCR4all, an OCR tool developed at the University of Würzburg (github.com/OCR4all) that reliably converts scans of early printed books into machine-readable (and human-researchable) text. High-quality scans of early printed books have been abundant for some time now; so far, however, that has not translated into an increased availability of texts.