Current language technology is dominated by approaches that either
enumerate a large set of rules or rely on a large amount of manually
labelled data. Creating either is time-consuming and expensive, which is
commonly thought to be the reason why automated natural language
understanding has still not made its way into "real-life" applications.
This book sets an ambitious goal: to shift the development of language
processing systems to a much more automated setting than in previous work.
A new approach is defined: what if computers analysed large samples of
language data on their own, identifying structural regularities and
performing the necessary abstractions and generalisations in order to
better understand language in the process?
After defining the framework of Structure Discovery and shedding light
on the nature and the graph structure of natural language data,
several procedures are described that do exactly this: let the computer
discover structures without supervision in order to boost the
performance of language technology applications. Here, multilingual
documents are sorted by language, word classes are identified, and
semantic ambiguities are discovered and resolved without using a
dictionary or other explicit human input. The book concludes with an
outlook on the possibilities implied by this paradigm and puts the
methods into perspective with respect to human-computer interaction.
The target audience is academics at all levels (undergraduate and
graduate students, lecturers and professors) working in the fields of
natural language processing and computational linguistics, as well as
natural language engineers who are seeking to improve their systems.