The first truly interdisciplinary text on data mining, blending the
contributions of information science, computer science, and
statistics.
The growing interest in data mining is motivated by a common problem
across disciplines: how does one store, access, model, and ultimately
describe and understand very large data sets? Historically, different
aspects of data mining have been addressed independently by different
disciplines. This book brings those perspectives together, drawing on
information science, computer science, and statistics in a single
treatment.
The book consists of three sections. The first, foundations, provides a
tutorial overview of the principles underlying data mining algorithms
and their application. The presentation emphasizes intuition rather than
rigor. The second section, data mining algorithms, shows how algorithms
are constructed to solve specific problems in a principled manner. The
algorithms covered include trees and rules for classification and
regression, association rules, belief networks, classical statistical
models, nonlinear models such as neural networks, and local memory-based
models. The third section shows how all of the preceding analysis fits
together when applied to real-world data mining problems. Topics include
the role of metadata, how to handle missing data, and data
preprocessing.