Data synthesis based on generative adversarial networks

Noseong Park, Mahmoud Mohammadi, Kshitij Gorde, Sushil Jajodia, Hong‐Kyu Park, Youngmin Kim

Type: Article

Publication Date: 2018-06-01

Citations: 153

DOI: https://doi.org/10.14778/3231751.3231757

Abstract

Privacy is an important concern for our society where sharing data with partners or releasing data to the public is a frequent occurrence. Some of the techniques that are being used to achieve privacy are to remove identifiers, alter quasi-identifiers, and perturb values. Unfortunately, these approaches suffer from two limitations. First, it has been shown that private information can still be leaked if attackers possess some background knowledge or other information sources. Second, they do not take into account the adverse impact these methods will have on the utility of the released data. In this paper, we propose a method that meets both requirements. Our method, called table-GAN, uses generative adversarial networks (GANs) to synthesize fake tables that are statistically similar to the original table yet do not incur information leakage. We show that the machine learning models trained using our synthetic tables exhibit performance that is similar to that of models trained using the original table for unknown testing cases. We call this property model compatibility. We believe that anonymization/perturbation/synthesis methods without model compatibility are of little value. We used four real-world datasets from four different domains for our experiments and conducted in-depth comparisons with state-of-the-art anonymization, perturbation, and generation techniques. Throughout our experiments, only our method consistently shows a balance between privacy level and model compatibility.
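
The model compatibility property described in the abstract can be illustrated with a small sketch: train one model on the original table and one on a synthetic table, then compare their accuracy on the same held-out rows. Everything below is an assumption made for illustration. The toy two-class table, the nearest-centroid classifier, and the function names (`make_table`, `stand_in_synthesizer`, `model_compatibility_gap`) are hypothetical, and the synthesizer is a per-class Gaussian sampler standing in for a generative model. It is not table-GAN and does not reproduce the paper's experiments.

```python
import random
import statistics


def make_table(rng, n):
    """Toy two-class table; each row is (x1, x2, label). Hypothetical data."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        rows.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0), label))
    return rows


def stand_in_synthesizer(rng, table, n):
    """Placeholder generator (NOT table-GAN): fits an independent Gaussian to
    each feature column per class and samples new rows from those fits."""
    by_label = {}
    for row in table:
        by_label.setdefault(row[-1], []).append(row[:-1])
    labels = sorted(by_label)
    stats = {
        label: [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    synthetic = []
    for _ in range(n):
        label = rng.choice(labels)
        features = tuple(rng.gauss(mean, std) for mean, std in stats[label])
        synthetic.append(features + (label,))
    return synthetic


def fit_centroids(table):
    """Nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for label in {row[-1] for row in table}:
        rows = [row[:-1] for row in table if row[-1] == label]
        centroids[label] = tuple(statistics.mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, features):
    """Return the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((f - m) ** 2 for f, m in zip(features, centroids[c])),
    )


def accuracy(centroids, table):
    hits = sum(predict(centroids, row[:-1]) == row[-1] for row in table)
    return hits / len(table)


def model_compatibility_gap(seed=0):
    """Accuracy on held-out rows for a model trained on the original table
    versus one trained on the synthetic table, plus the absolute gap."""
    rng = random.Random(seed)
    original = make_table(rng, 500)
    held_out = make_table(rng, 500)  # the "unknown testing cases"
    synthetic = stand_in_synthesizer(rng, original, 500)
    acc_original = accuracy(fit_centroids(original), held_out)
    acc_synthetic = accuracy(fit_centroids(synthetic), held_out)
    return acc_original, acc_synthetic, abs(acc_original - acc_synthetic)


if __name__ == "__main__":
    acc_o, acc_s, gap = model_compatibility_gap()
    print(f"original={acc_o:.3f} synthetic={acc_s:.3f} gap={gap:.3f}")
```

A small gap suggests that, for this model and this test set, the synthetic table is roughly interchangeable with the original one. The sketch says nothing about privacy, which the abstract treats as a separate requirement to be balanced against model compatibility.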

Locations

Proceedings of the VLDB Endowment
arXiv (Cornell University)
DataCite API

Similar Works

Title	Year	Authors
Effective and Privacy preserving Tabular Data Synthesizing	2021	Aditya Kunar
TableGAN-MCA: Evaluating Membership Collisions of GAN-Synthesized Tabular Data Releasing	2021	Aoting Hu Renjie Xie Zhigang Lü Aiqun Hu Minhui Xue
Synthetic Data -- Anonymisation Groundhog Day	2020	Theresa Stadler Bristena Oprisanu Carmela Troncoso
Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration	2020	Ju Fan Tongyu Liu Guoliang Li Junyou Chen Yuwei Shen Xiaoyong Du
A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data	2023	Meenatchi Sundaram Muthu Selva Annamalai Andrea Gadotti Luc Rocher
Generating tabular datasets under differential privacy	2023	Gianluca Truda
Privacy Re-identification Attacks on Tabular GANs	2024	Abdallah Alshantti Adil Rasheed Frank Westad
Invertible Tabular GANs: Killing Two Birds with One Stone for Tabular Data Synthesis	2022	Jaehoon Lee Jihyeon Hyeong Jinsung Jeon Noseong Park Ji‐Hoon Cho
SynDiffix: More accurate synthetic structured data	2023	Paul Francis Cristian Berneanu Edon Gashi
Synthetic Data -- A Privacy Mirage	2020	Theresa Stadler Bristena Oprisanu Carmela Troncoso
Protecting Sensitive Attributes via Generative Adversarial Networks	2018	Aria Rezaei Chaowei Xiao Jie Gao Bo Li
Synthetic Data Privacy Metrics	2025	Amy Steier Lakshmish Ramaswamy Andre Manoel Alexa Haushalter
Application-driven Privacy-preserving Data Publishing with Correlated Attributes	2018	Aria Rezaei Chaowei Xiao Jie Gao Bo Li Sirajum Munir
KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation	2024	Anantaa Kotal Anupam Joshi
Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control	2022	Tânia Carvalho Nuno Moniz Pedro Faria Luís Antunes Nitesh V. Chawla
Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis	2023	Conor Hassan Robert Salomone Kerrie Mengersen
Generate synthetic samples from tabular data	2022	David Banh Alan Huang

Works That Cite This (60)

Title	Year	Authors
Evaluating the Clinical Realism of Synthetic Chest X-Rays Generated Using Progressively Growing GANs	2021	Bradley Segal David M. Rubin Grace Rubin Adam Pantanowitz
A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective	2019	Yuji Roh Geon Heo Steven Euijong Whang
AutoML: A survey of the state-of-the-art	2020	Xin He Kaiyong Zhao Xiaowen Chu
Differentially Private Synthetic Data: Applied Evaluations and Enhancements	2020	Lucas Rosenblatt Xiaoyan Liu Samira Pouyanfar Eduardo de Leon Anuj S. Desai Joshua E. Allen
GAN-based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and Solutions	2022	Mohammadali Fallahian Mohsen Dorodchi Kyle Kreth
DTGAN: Differential Private Training for Tabular GANs	2021	Aditya Kunar Robert Birke Zilong Zhao Lydia Y. Chen
Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data	2021	Zilong Zhao Robert Birke Aditya Kunar Lydia Y. Chen
Data synthesis based on generative adversarial networks	2018	Noseong Park Mahmoud Mohammadi Kshitij Gorde Sushil Jajodia Hong‐Kyu Park Youngmin Kim
Effective and Privacy preserving Tabular Data Synthesizing	2021	Aditya Kunar
Holdout-Based Fidelity and Privacy Assessment of Mixed-Type Synthetic Data	2021	Michael Platzer Thomas Reutterer

Works Cited by This (16)

Title	Year	Authors
API design for machine learning software: experiences from the scikit-learn project	2013	Lars Buitinck Gilles Louppe Mathieu Blondel Fabián Pedregosa Andreas Mueller Olivier Grisel Vlad Niculae Peter Prettenhofer Alexandre Gramfort Jaques Grobler
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift	2015	Sergey Ioffe Christian Szegedy
On the design and quantification of privacy preserving data mining algorithms	2001	Dakshi Agrawal Charu C. Aggarwal
Scikit-learn: Machine Learning in Python	2012	Fabián Pedregosa Gaël Varoquaux Alexandre Gramfort Vincent Michel Bertrand Thirion Olivier Grisel Mathieu Blondel Peter Prettenhofer Ron J. Weiss Vincent Dubourg
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks	2015	Alec Radford Luke Metz Soumith Chintala
Membership Inference Attacks Against Machine Learning Models	2017	Reza Shokri Marco Stronati Congzheng Song Vitaly Shmatikov
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks	2017	Edward Choi Siddharth Biswal Bradley Malin Jon Duke Walter F. Stewart Jimeng Sun
Generating Multi-label Discrete Electronic Health Records using Generative Adversarial Networks	2017	Edward Choi Siddharth Biswal Bradley Malin Jon Duke Walter F. Stewart Jimeng Sun
Progressive Growing of GANs for Improved Quality, Stability, and Variation	2017	Tero Karras Timo Aila Samuli Laine Jaakko Lehtinen
Data synthesis based on generative adversarial networks	2018	Noseong Park Mahmoud Mohammadi Kshitij Gorde Sushil Jajodia Hong‐Kyu Park Youngmin Kim