A concise introduction to the emerging field of data science,
explaining its evolution, relation to machine learning, current uses,
data infrastructure issues, and ethical challenges.
The goal of data science is to improve decision making through the
analysis of data. Today, data science determines the ads we see online,
the books and movies that are recommended to us, which emails are
filtered into our spam folders, and even how much we pay for health
insurance. This volume in the MIT Press Essential Knowledge series
offers a concise introduction to the emerging field of data science,
explaining its evolution, current uses, data infrastructure issues, and
ethical challenges.
It has never been easier for organizations to gather, store, and process
data. The use of data science is driven by the rise of big data and social
media, the development of high-performance computing, and the emergence
of powerful methods for data analysis and modeling, such as deep
learning. Data science encompasses a set of principles, problem
definitions, algorithms, and processes for extracting non-obvious and
useful patterns from large datasets. It is closely related to the fields
of data mining and machine learning, but broader in scope. This book
offers a brief history of the field, introduces fundamental data
concepts, and describes the stages in a data science project. It
considers data infrastructure and the challenges posed by integrating
data from multiple sources, introduces the basics of machine learning,
and discusses how to link machine learning expertise with real-world
problems. The book also reviews ethical and legal issues, developments
in data regulation, and computational approaches to preserving privacy.
Finally, it considers the future impact of data science and offers
principles for success in data science projects.