The aim of this book is to give the reader a detailed introduction to
the different approaches to generating multiply imputed synthetic
datasets. It describes all approaches that have been developed so far,
provides a brief history of synthetic datasets, and gives useful hints
on how to deal with real data problems like nonresponse, skip patterns,
or logical constraints.
Each chapter is dedicated to one approach. It first describes the general
concept and then presents a detailed application to a real dataset, with
useful guidelines on how to implement the theory in practice.
The discussed multiple imputation approaches include imputation for
nonresponse, generating fully synthetic datasets, generating partially
synthetic datasets, generating synthetic datasets when the original data
is subject to nonresponse, and a two-stage imputation approach that
helps to better address the omnipresent trade-off between analytical
validity and the risk of disclosure.
The book concludes with a glimpse into the future of synthetic datasets,
discussing the potential benefits and possible obstacles of the approach,
as well as ways to address the concerns of data users and their
understandable discomfort with using data that does not consist solely of
the originally collected values.
The book is intended for researchers and practitioners alike. It offers
the researcher a summary of the state of the art in synthetic data in a
single volume, with full references to all relevant papers on the topic.
It is equally useful for the practitioner at a statistical agency who is
considering the synthetic data approach for future data dissemination and
wants to become familiar with the topic.