This book provides a general and comprehensible overview of imbalanced
learning. It contains a formal description of the problem and focuses on
its main features and the most relevant proposed solutions.
Additionally, it considers the different scenarios in Data Science in
which imbalanced classification poses a real challenge.
This book stresses the gap with standard classification tasks by
reviewing the case studies and ad-hoc performance metrics that are
applied in this area. It also covers the different approaches that have
traditionally been applied to address skewed binary class
distributions. Specifically, it reviews cost-sensitive learning,
data-level preprocessing methods, and algorithm-level solutions, also
taking into account those ensemble-learning solutions that embed any of
the former alternatives. Furthermore, it focuses on the extension of the
problem to multi-class settings, where the classical methods above can
no longer be applied in a straightforward way.
This book also focuses on the intrinsic data characteristics that,
together with the uneven class distribution, are the main causes that
truly hinder the performance of classification algorithms in this scenario.
Then, some notes on data reduction are provided in order to understand
the advantages of using this type of approach.
Finally, this book introduces some novel areas of study that are
drawing increasing attention to the imbalanced data issue. Specifically,
it considers the classification of data streams, non-classical
classification problems, and the scalability related to Big Data.
Examples of software libraries and modules to address imbalanced
classification are provided.
This book is highly suitable for technical professionals and for senior
undergraduate and graduate students in the areas of data science,
computer science and engineering. It will also be useful for scientists
and researchers to gain insight into the current developments in this area
of study, as well as future research directions.