CCLS Research Scientist Nizar Habash Receives QNRF Award
CCLS has received its first international grant award. The grant comes from the Qatar National Research Fund (QNRF) under its National Priorities Research Program (NPRP). Out of 1,400 letters of intent submitted to the current NPRP cycle, 631 proposals were considered for review, and 145 projects were awarded (for a total of $121M).
The awarded CCLS project is led by Dr. Nizar Habash in collaboration with researchers at Carnegie Mellon University in Qatar (CMUQ). The project's total budget is $980K over three years, of which the Columbia portion is $340K.
QNRF Newsletter: http://www.qnrfnewsletter.org/issue6/news5.php
Project title: Automatic Correction of Standard Arabic Text: Resource and System Development
Modern Standard Arabic is a morphologically and syntactically complex language, the understanding of which challenges both humans and machines. We propose to study the problems encountered in automatically correcting Arabic text, covering errors in spelling, lexical choice, and grammar (morphology and syntax). Our approach is twofold. First, we will build a large corpus (~2M words) of human-corrected Arabic text produced by native speakers, non-native speakers, and machines. The QALB corpus (Qatar Arabic Language Bank) will provide a resource for training and testing automatic-correction systems. QALB's annotations will also support several other Arabic NLP efforts. Second, we will build a general system (ACLE) for automatically correcting Arabic-language errors. In developing ACLE, we will investigate and compare various methods for detecting and correcting errors, including methods that rely on differing degrees of training-data availability. To compensate for the sparsity of error-correction data, our methods will incorporate models that reflect Arabic's complex morphology and orthography. This project will be the first to study automatic Arabic error correction using large-scale, manually annotated data, and it will form the basis for a shared-task workshop on Arabic automatic correction.