Problem-oriented Corpus Annotation and the Hebrew Bible

  • Johan de Joode, KU Leuven
Keywords: corpus linguistics, quality assurance, corpus annotation, Hebrew

Abstract

In this contribution, I argue that the exegetical and stylistic study of the Hebrew Bible would benefit from the creation and storage of qualitative and quantitative annotations using problem-oriented corpus annotation (de Haan 1984). Within biblical studies, exegetes are used to static interfaces that allow them to retrieve information but not to enhance it with anything more elaborate than user notes. I present a roadmap for the development of an annotation tool tailored to the Hebrew Bible, with the sole objective of enriching the data already present in open-source datasets such as that of the ETCBC. Based on my experience with a dataset for annotating conceptual metaphors in the book of Job (bibliametaphorica.com) and on the literature on corpus annotation (Sinclair 2004; Leech 2005; Fort et al. 2012), I argue that data creation and enrichment are challenging yet rewarding endeavours. They are challenging because the process is circular: labels are informed by the data and are hence difficult to define a priori. Furthermore, it is difficult to be consistent, and the actual, manual labelling of the text requires interpretative choices that cause editorial fatigue. Fort et al. (2012) suggest that annotation campaigns have differing degrees of difficulty, which can be mitigated not just by inter-annotator rating but also by conscious decisions to lower the annotation complexity. A user interface is needed that limits annotation complexity and allows researchers to annotate the text with minimal effort. The end result is, for instance, an XML document that contains both the text and the annotations, in a format that can, but need not, be merged back into the original database. The existence of a tool for the manual annotation of open data will increase the replicability of research as well as its democratisation, as students worldwide can create and share their data.

Published
2019-11-20
How to Cite
de Joode, Johan. 2019. “Problem-Oriented Corpus Annotation and the Hebrew Bible”. HIPHIL Novum 5 (2), 6-12. http://hiphil.org/index.php/hiphil/article/view/22.
Section
Articles