Data synthesis based on generative adversarial networks

Type: Article

Publication Date: 2018-06-01

Citations: 153

DOI: https://doi.org/10.14778/3231751.3231757

Abstract

Privacy is an important concern in a society where sharing data with partners or releasing it to the public is a frequent occurrence. Common techniques for achieving privacy include removing identifiers, altering quasi-identifiers, and perturbing values. Unfortunately, these approaches suffer from two limitations. First, it has been shown that private information can still leak if attackers possess background knowledge or other information sources. Second, they do not account for the adverse impact these methods have on the utility of the released data. In this paper, we propose a method that addresses both limitations. Our method, called table-GAN, uses generative adversarial networks (GANs) to synthesize fake tables that are statistically similar to the original table yet do not incur information leakage. We show that machine learning models trained on our synthetic tables perform similarly, on unseen test cases, to models trained on the original table. We call this property model compatibility. We believe that anonymization, perturbation, and synthesis methods without model compatibility are of little value. Our experiments use four real-world datasets from four different domains and include in-depth comparisons with state-of-the-art anonymization, perturbation, and generation techniques. Across all experiments, only our method consistently balances privacy level and model compatibility.
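The abstract defines model compatibility only informally. The following is a minimal sketch of one way to measure it. It assumes a labeled table split into a real training set, a synthetic table generated from that training set, and a held-out real test set. The same model is trained once on each training table, both are scored on the real test rows, and the accuracies are compared. The nearest-centroid classifier and the function names (`fit_centroids`, `accuracy`, `model_compatibility`) are hypothetical stand-ins chosen to keep the example dependency-free. They are not the models or code used in the paper.

```python
from statistics import fmean
from typing import Sequence

# Hypothetical row format: (numeric feature vector, integer class label).
Row = tuple[list[float], int]


def fit_centroids(rows: Sequence[Row]) -> dict[int, list[float]]:
    """Toy classifier (stand-in for any real model): one mean vector per label."""
    by_label: dict[int, list[list[float]]] = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    # zip(*xs) transposes the rows so fmean runs column by column.
    return {y: [fmean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def predict(centroids: dict[int, list[float]], x: list[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids: dict[int, list[float]], rows: Sequence[Row]) -> float:
    """Fraction of rows whose label is predicted correctly."""
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


def model_compatibility(real_train: Sequence[Row],
                        synthetic: Sequence[Row],
                        real_test: Sequence[Row]) -> tuple[float, float, float]:
    """Train on real vs. synthetic data, test both on unseen real rows.

    Returns (accuracy_real, accuracy_synthetic, gap); a gap near 0 is the goal.
    """
    acc_real = accuracy(fit_centroids(real_train), real_test)
    acc_synth = accuracy(fit_centroids(synthetic), real_test)
    return acc_real, acc_synth, acc_real - acc_synth


if __name__ == "__main__":
    real_train = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 0.9], 1), ([0.9, 1.1], 1)]
    synthetic = [([0.1, 0.05], 0), ([0.15, 0.1], 0), ([0.95, 1.0], 1), ([1.05, 0.95], 1)]
    real_test = [([0.05, 0.0], 0), ([1.0, 1.0], 1), ([0.1, 0.2], 0), ([0.8, 0.9], 1)]
    print(model_compatibility(real_train, synthetic, real_test))  # → (1.0, 1.0, 0.0)
```

A gap near zero is what model compatibility asks for. A synthetic table that scrambles the relationship between features and labels would score well below the real-data baseline, even though its feature and label distributions taken separately could look identical to the original.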

Locations

  • Proceedings of the VLDB Endowment
  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Effective and Privacy preserving Tabular Data Synthesizing (2021). Aditya Kunar
  • TableGAN-MCA: Evaluating Membership Collisions of GAN-Synthesized Tabular Data Releasing (2021). Aoting Hu, Renjie Xie, Zhigang Lü, Aiqun Hu, Minhui Xue
  • Synthetic Data -- Anonymisation Groundhog Day (2020). Theresa Stadler, Bristena Oprisanu, Carmela Troncoso
  • Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration (2020). Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, Xiaoyong Du
  • A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data (2023). Meenatchi Sundaram Muthu Selva Annamalai, Andrea Gadotti, Luc Rocher
  • Generating tabular datasets under differential privacy (2023). Gianluca Truda
  • Privacy Re-identification Attacks on Tabular GANs (2024). Abdallah Alshantti, Adil Rasheed, Frank Westad
  • Invertible Tabular GANs: Killing Two Birds with One Stone for Tabular Data Synthesis (2022). Jaehoon Lee, Jihyeon Hyeong, Jinsung Jeon, Noseong Park, Ji‐Hoon Cho
  • SynDiffix: More accurate synthetic structured data (2023). Paul Francis, Cristian Berneanu, Edon Gashi
  • Synthetic Data -- A Privacy Mirage (2020). Theresa Stadler, Bristena Oprisanu, Carmela Troncoso
  • Protecting Sensitive Attributes via Generative Adversarial Networks (2018). Aria Rezaei, Chaowei Xiao, Jie Gao, Bo Li
  • Synthetic Data Privacy Metrics (2025). Amy Steier, Lakshmish Ramaswamy, Andre Manoel, Alexa Haushalter
  • Application-driven Privacy-preserving Data Publishing with Correlated Attributes (2018). Aria Rezaei, Chaowei Xiao, Jie Gao, Bo Li, Sirajum Munir
  • KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation (2024). Anantaa Kotal, Anupam Joshi
  • Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control (2022). Tânia Carvalho, Nuno Moniz, Pedro Faria, Luís Antunes, Nitesh V. Chawla
  • Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis (2023). Conor Hassan, Robert Salomone, Kerrie Mengersen
  • Generate synthetic samples from tabular data (2022). David Banh, Alan Huang

Works Cited by This (16)

  • API design for machine learning software: experiences from the scikit-learn project (2013). Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabián Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015). Sergey Ioffe, Christian Szegedy
  • On the design and quantification of privacy preserving data mining algorithms (2001). Dakshi Agrawal, Charu C. Aggarwal
  • Scikit-learn: Machine Learning in Python (2012). Fabián Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron J. Weiss, Vincent Dubourg
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2015). Alec Radford, Luke Metz, Soumith Chintala
  • Membership Inference Attacks Against Machine Learning Models (2017). Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov
  • Generating Multi-label Discrete Patient Records using Generative Adversarial Networks (2017). Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
  • Generating Multi-label Discrete Electronic Health Records using Generative Adversarial Networks (2017). Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation (2017). Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
  • Data synthesis based on generative adversarial networks (2018). Noseong Park, Mahmoud Mohammadi, Kshitij Gorde, Sushil Jajodia, Hong‐Kyu Park, Youngmin Kim