Research in Natural Language Processing (NLP) has rapidly advanced in
recent years, resulting in exciting algorithms for sophisticated
processing of text and speech in various languages. Much of this work
focuses on English; in this book we address another group of interesting
and challenging languages for NLP research: the Semitic languages. The
Semitic group of languages includes Arabic (206 million native
speakers), Amharic (27 million), Hebrew (7 million), Tigrinya (6.7
million), Syriac (1 million) and Maltese (419 thousand). Semitic
languages exhibit unique morphological processes, challenging syntactic
constructions and various other phenomena that are less prevalent in
other natural languages. These challenges call for unique solutions,
many of which are described in this book.

The 13 chapters presented in this book bring together leading scientists
from several universities and research institutes worldwide. While this
book devotes some attention to cutting-edge algorithms and techniques,
its primary purpose is a thorough explication of best practices in the
field. Furthermore, every chapter describes how the techniques discussed
apply to Semitic languages. The book covers both statistical approaches
to NLP, which are dominant across various applications nowadays, and the
more traditional rule-based approaches, which have proven useful for
several other application domains. We hope that this book will provide a
"one-stop shop" for all the requisite background and practical advice
needed to build NLP applications for Semitic languages.