This book addresses several knowledge discovery problems on
multi-sourced data, drawing together theories, techniques, and methods
from data cleaning, data mining, and natural language processing.
It focuses mainly on three data models: the multi-sourced
isomorphic data, the multi-sourced heterogeneous data, and the text
data. On the basis of these three data models, this book studies
knowledge discovery problems on multi-sourced data, including truth
discovery and fact discovery, with respect to four important properties:
relevance, inconsistency, sparseness, and heterogeneity. The book is useful
for specialists as well as graduate students. Data, even when describing the same
object or event, can come from a variety of sources such as crowd
workers and social media users. However, noisy pieces of data or
information are unavoidable. Given the daunting scale of the data, it is
unrealistic to expect humans to "label" the data or tell which data source is
more reliable. Hence, it is crucial to identify trustworthy information
from multiple noisy information sources, a task referred to as
knowledge discovery. At present, knowledge discovery research on
multi-sourced data mainly faces two challenges. On the structural level,
it is essential to consider the different characteristics of data
composition and application scenarios and to define the knowledge discovery
problem appropriately for each setting. On the algorithm level, the knowledge
discovery task needs to account for different levels of information
conflict and to design efficient algorithms that mine more valuable
information using multiple clues. Existing knowledge discovery methods
have shortcomings on both the structural level and the algorithm level,
leaving the knowledge discovery problem far from fully solved.