Revision with unchanged content. In many predictive modeling tasks, one
has a fixed set of observations from which a vast, or even infinite, set
of potentially predictive features can be computed. Of these features,
often only a small number are expected to be useful in a predictive
model. Models that use the entire set of features will almost certainly
overfit the training data and generalize poorly to future data sets. The
book presents streamwise feature
selection, which interleaves the process of generating new features with
that of feature testing. Streamwise feature selection scales well to
large feature sets. The book also describes how to use streamwise
feature selection in multivariate regressions. It includes a review of
traditional feature selection methods in a general framework based on
information theory, and compares these methods with streamwise feature
selection on various real and synthetic data sets. This book is intended
to be used by researchers in machine learning, data mining, and
knowledge discovery.
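
The interleaving of feature generation and feature testing described above can be illustrated with a minimal sketch. It is not the book's algorithm: the acceptance rule here (keep a feature if it explains at least a fixed fraction `min_gain` of the remaining residual variance) is an assumed stand-in for the book's statistical test, and the names `streamwise_select`, `power_features`, and `min_gain` are hypothetical. Features are produced lazily by a generator, so the candidate set never has to be materialized in full.

```python
from typing import Iterable, Iterator, List, Tuple


def streamwise_select(
    y: List[float],
    feature_stream: Iterable[Tuple[str, List[float]]],
    min_gain: float = 0.05,  # assumed threshold, not taken from the book
) -> List[str]:
    """Consider each feature once, in arrival order; keep it only if it helps."""
    n = len(y)
    mean_y = sum(y) / n
    residual = [v - mean_y for v in y]  # start from the intercept-only model
    selected: List[str] = []
    for name, x in feature_stream:
        mean_x = sum(x) / n
        xc = [v - mean_x for v in x]  # centre the candidate feature
        sxx = sum(v * v for v in xc)
        if sxx == 0.0:
            continue  # a constant feature carries no signal
        rss = sum(r * r for r in residual)
        if rss == 0.0:
            break  # nothing left to explain
        # Least-squares fit of the current residual on this one feature.
        beta = sum(a * r for a, r in zip(xc, residual)) / sxx
        gain = beta * beta * sxx / rss  # fraction of residual variance explained
        if gain >= min_gain:
            selected.append(name)
            residual = [r - beta * a for r, a in zip(residual, xc)]
    return selected


def power_features(
    x: List[float], max_degree: int
) -> Iterator[Tuple[str, List[float]]]:
    """Generate candidate features on demand (hypothetical feature generator)."""
    for degree in range(1, max_degree + 1):
        yield f"x^{degree}", [v ** degree for v in x]


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    y = [v * v for v in x]  # the response depends only on x^2
    print(streamwise_select(y, power_features(x, max_degree=4)))
```

Each candidate is examined exactly once and then either kept or discarded, so the cost grows with the number of features actually generated rather than with the size of the full candidate set. Fitting each feature against the current residual is a simplification; a fuller treatment would refit the whole model when a feature is added.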