GBI Treebanks as a Resource for New Applications

  • Andi Wu, Global Bible Initiative
Keywords: treebanks, dependency trees

Abstract

Global Bible Initiative (GBI) has developed syntactic treebanks for the Hebrew OT and the Greek NT. The treebanks were first generated by a parser using computerized Hebrew and Greek grammars and then proofread verse by verse by Hebrew and Greek scholars. All the corrections made by the scholars were kept as disambiguation data.

The phrase structures in the trees have been used to build interlinears, concordances, and translation memories that operate not only at the word level but also at the phrase and clause levels. The syntactic relations (dependencies) in the trees have also been used for smart search, which finds texts that differ in form but are similar in meaning.
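
One way to picture dependency-based search is to match on sets of head-relation-dependent triples rather than on surface word order. The sketch below is only an illustration under stated assumptions: the `Dep` type, the relation labels, the toy entries, and the `search` function are all hypothetical and do not reflect the actual GBI data format or search implementation.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency: a head lemma, a relation label, and a dependent lemma."""
    head: str
    relation: str   # hypothetical label set, e.g. "subject", "object"
    dependent: str


# Toy data for illustration only; not taken from any GBI treebank.
TREES = {
    "example-1": {Dep("love", "subject", "God"), Dep("love", "object", "world")},
    "example-2": {
        Dep("love", "object", "world"),
        Dep("love", "subject", "God"),
        Dep("love", "manner", "greatly"),
    },
    "example-3": {Dep("create", "subject", "God"), Dep("create", "object", "world")},
}


def search(trees, query):
    """Return the ids of trees that contain every dependency in the query."""
    return sorted(tree_id for tree_id, deps in trees.items() if query <= deps)


# Both matching entries share the same relations even though their
# surface forms (word order, extra modifiers) could differ.
hits = search(TREES, {Dep("love", "subject", "God"), Dep("love", "object", "world")})
print(hits)
```

Because the query is compared against relations between lemmas, two clauses with different word order or additional modifiers can still match, which is the sense in which such a search looks past form.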

Recently, we have also used the trees to improve the accuracy of automatic word alignment and to explore tree-based interactive machine translation of the Bible. The auto aligner can be used to align the Hebrew and Greek texts to translations in various languages. Interactive machine translation will speed up Bible translation without compromising quality by providing real-time suggestions and checking.

We have already contributed two sets of Greek trees to Creative Commons: the Nestle 1904 version and the SBLGNT version. We also have trees for NA27 and NA28, but we do not own the texts. The Hebrew OT treebank we developed was owned by the Groves Center. Given a morphologically tagged text, we can also create new treebanks with the parser, grammar, and disambiguation data we own.

Published
2019-11-20
How to Cite
Wu, Andi. 2019. “GBI Treebanks As a Resource for New Applications”. HIPHIL Novum 5 (2), 97-101. http://hiphil.org/index.php/hiphil/article/view/48.
Section
Conference paper