In many predictive modeling tasks, one has a fixed set of observations
from which a vast, or even infinite, set of potentially predictive
features can be computed. Often only a small number of these features
are expected to be useful in a predictive model. Models that use the
entire set of features will almost certainly overfit the training data
and predict poorly on future data sets. This book presents streamwise
feature selection, which interleaves the process of generating new
features with that of testing them.
Streamwise feature selection scales well to large feature sets. The book
also describes how to use streamwise feature selection in multivariate
regression. It includes a review of traditional feature selection methods,
cast in a general framework based on information theory, and compares
these methods with streamwise feature selection on a variety of real and
synthetic data sets. The book is intended for researchers in machine
learning, data mining, and knowledge discovery.
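As a rough illustration of how feature generation and feature testing can be interleaved, the sketch below consumes candidate features one at a time from a Python generator and decides on each immediately, keeping it in a linear regression model or discarding it for good. The function name `streamwise_select` and the BIC-style acceptance threshold are assumptions made for this example only; they stand in for, and are not, the specific penalties and tests the book develops.

```python
import math
import numpy as np


def streamwise_select(feature_stream, y):
    """Sketch of streamwise feature selection for linear regression.

    feature_stream yields (name, column) pairs one at a time, so features
    can be generated lazily; each one is tested once, when it arrives, and
    is either added to the model or discarded permanently.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    design = np.ones((n, 1))  # start from an intercept-only model

    def rss(X):
        # Residual sum of squares of the least-squares fit of y on X.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return max(float(resid @ resid), 1e-12)  # guard against log(0)

    current_rss = rss(design)
    selected = []
    for name, column in feature_stream:
        candidate = np.column_stack([design, np.asarray(column, dtype=float)])
        new_rss = rss(candidate)
        # Assumed acceptance rule (BIC-style, for illustration only): the
        # drop in n*log(RSS) must exceed a log(n) penalty for the one
        # extra coefficient the candidate feature would add.
        gain = n * math.log(current_rss / new_rss)
        if gain > math.log(n):
            design, current_rss = candidate, new_rss
            selected.append(name)
    return selected
```

Because the stream is consumed lazily, a generator that computes, say, interaction terms on demand could feed this loop without ever materializing the full feature matrix, which is what lets the streamwise approach cope with very large or unbounded feature sets.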