This text examines the goals of data analysis with respect to enhancing
knowledge, and identifies data summarization and correlation analysis as
the core issues. Data summarization, both quantitative and categorical,
is treated within the encoder-decoder paradigm, which brings forward a
number of mathematically supported insights into the methods and the
relations between them. Two chapters describe methods for categorical
summarization (partitioning, divisive clustering, and separate cluster
finding), and another explains the methods for quantitative
summarization, Principal Component Analysis and PageRank.
Features:
- An in-depth presentation of K-means partitioning including a
corresponding Pythagorean decomposition of the data scatter.
- Advice regarding such issues as clustering of categorical and mixed
scale data, similarity and network data, interpretation aids, anomalous
clusters, the number of clusters, etc.
- Thorough attention to data-driven modelling, with a number of
mathematically stated relations between statistical and geometrical
concepts, among them the relations between goodness-of-fit criteria for
decision trees and data standardization, similarity and consensus
clustering, modularity clustering and uniform partitioning.
New edition highlights:
- Inclusion of ranking issues such as Google PageRank, linear
stratification and tied rankings median, consensus clustering,
semi-average clustering, and one-cluster clustering
- Restructured to make the logic more straightforward and sections
self-contained
Core Data Analysis: Summarization, Correlation and Visualization is
aimed at those who are eager to participate in developing the field, and
it will also appeal to novices and practitioners.