This book explains how to create information extraction (IE)
applications that are able to tap the vast amount of relevant
information available in natural language sources: Internet pages,
official documents such as laws and regulations, books and newspapers,
and the social web. Readers are introduced to the problem of IE and its
current challenges and limitations, supported with examples. The book
discusses the need to bridge the gap between documents, data, and people,
and provides a broad overview of the technology supporting IE. The
authors present a generic architecture for developing systems that are
able to learn how to extract relevant information from natural language
documents, and illustrate how to implement working systems using
state-of-the-art and freely available software tools. The book also
discusses concrete applications that illustrate uses of IE.
- Provides an overview of state-of-the-art technology in information
extraction (IE), discussing achievements and limitations for the
software developer and providing references to specialized literature
in the area
- Presents a comprehensive list of freely available, high-quality
software for several subtasks of IE and for several natural languages
- Describes a generic architecture that can learn how to extract
information for a given application domain