A Critical Examination of the Intertextual Phrase Matching Module in the Thesaurus Linguae Graecae and Its Relevance for Biblical and Patristic Studies
The wide availability of ancient Greek texts in the Thesaurus Linguae Graecae (TLG) has opened up new avenues of research for Biblical and Patristic scholars. A particularly useful feature of the TLG's website is its n-gram module, Intertextual Phrase Matching, which can be used to trace quotations. However, little is known about how well this module performs or what scholars can expect from its results. The core of this article is therefore devoted to a critical examination of the algorithm and its results. To this end, the Gospel according to John was compared with the Paedagogus of Clement of Alexandria, with the Gospel according to Matthew, and with the complete works of Plutarch. The algorithm performs well on longer quotations with few or no interpolations. Short quotations, however, are missed, and interpolated or adapted quotations are poorly handled. It is suggested that the algorithm might perform better if the TLG team were to revisit its decision to ignore stopwords and if the algorithm were to allow foreign words in its n-grams. Finally, greater transparency about how the algorithm works, together with an option to tune its parameters manually, could broaden its usefulness to scholars.
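Why ignoring stopwords can cause short quotations to be missed can be illustrated with a generic n-gram matcher. The Python sketch below is not the TLG's code. The trigram size (`n=3`), the accent-stripping normalization, and the `STOPWORDS` list are all hypothetical choices made for illustration only. It shows how removing stopwords can shrink a short quotation below the n-gram length. When that happens, no shared n-gram (and therefore no match) can be produced.

```python
import re
import unicodedata

# Hypothetical stopword list, for illustration only; it is NOT the list used
# by the TLG's Intertextual Phrase Matching module.
STOPWORDS = {"ο", "η", "το", "και", "εν", "δε", "εγω", "ειμι", "προς", "τον"}


def normalize(text: str) -> list[str]:
    """Lower-case, strip accents/breathings/iota subscripts, split into words."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", bare)


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-token sequences; empty if there are fewer than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(source: str, target: str, n: int = 3,
                  drop_stopwords: bool = True) -> set[tuple[str, ...]]:
    """n-grams occurring in both texts, optionally after removing stopwords."""
    a, b = normalize(source), normalize(target)
    if drop_stopwords:
        a = [t for t in a if t not in STOPWORDS]
        b = [t for t in b if t not in STOPWORDS]
    return ngrams(a, n) & ngrams(b, n)


john_14_6 = "ἐγώ εἰμι ἡ ὁδὸς καὶ ἡ ἀλήθεια καὶ ἡ ζωή"
allusion = "ἐγώ εἰμι ἡ ὁδός"

print(len(shared_ngrams(john_14_6, allusion)))                        # 0
print(len(shared_ngrams(john_14_6, allusion, drop_stopwords=False)))  # 2
```

With the hypothetical stopword list, the four-word allusion to John 14:6 is reduced to the single token οδος, so it cannot form a trigram at all. When stopwords are kept, it shares two trigrams with the source. Accents are stripped so that ὁδὸς in running text and ὁδός at the end of a phrase normalize to the same token.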