Publication

Engineering an aligned gold-standard corpus of human to machine oriented Controlled Natural Language

Hazem Safwat
Brian Davis
Manel Zarrouk
Citation
Safwat, Hazem, Davis, Brian, & Zarrouk, Manel. (2018). Engineering an aligned gold-standard corpus of human to machine oriented Controlled Natural Language. Paper presented at the IEEE/WIC/ACM International Conference on Web Intelligence (WI2018), Santiago, Chile, 03-06 December, doi: 10.1109/WI.2018.00-58
Abstract
Knowledge base creation and population form an essential formal backbone for a variety of intelligent applications, decision support and expert systems, and intelligent search. While the abundance of unstructured text helps to ease the knowledge acquisition gap, the ambiguous nature of language tends to reduce accuracy in more complex semantic analysis. Controlled Natural Languages (CNLs) are subsets of natural language whose grammar is restricted in order to reduce or eliminate ambiguity, either for machine processability or for unambiguous human communication within a domain or industry context, as with Simplified English. This human-oriented type of CNL is under-researched despite having found favor within industry over many years. We describe a novel dataset that aligns a representative sample of Simplified English Wikipedia sentences with a well-known machine-oriented CNL. This linguistic resource is both human-readable and semantically machine-interpretable, and can benefit a variety of NLP and knowledge-based applications.
Publisher
IEEE
Publisher DOI
10.1109/WI.2018.00-58
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland