We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
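To make the shadow-model idea above concrete, here is a minimal sketch (not the paper's code): train shadow models on data drawn from the same distribution as the target's training data, label their prediction vectors as member or non-member, and fit a binary attack classifier on those vectors. All names and model choices are illustrative, and the single attack model is a simplification of the per-class attack models used in the paper.

```python
# Minimal sketch of a shadow-model membership inference attack (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

def attack_features(model, X):
    """Attack features: the model's confidence vector, sorted in descending order."""
    probs = model.predict_proba(X)
    return np.sort(probs, axis=1)[:, ::-1]

def train_attack_model(shadow_datasets):
    """shadow_datasets: list of (X_in, y_in, X_out) tuples from the target's data distribution.
    Assumes every shadow training set contains all classes."""
    feats, labels = [], []
    for X_in, y_in, X_out in shadow_datasets:
        shadow = RandomForestClassifier(n_estimators=50).fit(X_in, y_in)
        feats.append(attack_features(shadow, X_in))
        labels.append(np.ones(len(X_in)))           # members of the shadow training set
        feats.append(attack_features(shadow, X_out))
        labels.append(np.zeros(len(X_out)))         # non-members
    attack = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    return attack.fit(np.vstack(feats), np.concatenate(labels))

def infer_membership(attack, target_model, X_query):
    """Probability that each query record was in the target model's training set."""
    return attack.predict_proba(attack_features(target_model, X_query))[:, 1]
```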
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We design white-box inference attacks to perform a comprehensive privacy analysis of deep learning models. We measure the privacy leakage through parameters of fully trained models as well as the parameter updates of models during training. We design inference algorithms for both centralized and federated learning, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We evaluate our novel white-box membership inference attacks against deep learning algorithms to trace their training data records. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, which is the algorithm used to train deep neural networks. We investigate the reasons why deep learning models may leak information about their training data. We then show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants, in the federated learning setting, can successfully run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
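One of the white-box signals the abstract points to can be sketched as follows: the gradient of the loss with respect to the trained model's parameters, computed per example, tends to behave differently on members than on non-members. The PyTorch snippet below only computes a gradient-norm score with a simple threshold rule; names are illustrative, and the paper's full attack instead feeds per-layer gradients into a learned attack network.

```python
# Sketch of a white-box membership signal: the norm of the per-example gradient of
# the loss w.r.t. the trained model's parameters.
import torch
import torch.nn.functional as F

def gradient_norm_score(model, x, y):
    """At convergence, a lower gradient norm is (typically) evidence of membership."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.sqrt(sum(g.pow(2).sum() for g in grads)).item()

def is_member(model, x, y, threshold):
    """Trivial threshold attack; the threshold would be calibrated on shadow/reference data."""
    return gradient_norm_score(model, x, y) < threshold
```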
Machine learning models leak a significant amount of information about their training sets through their predictions. This is a serious privacy concern for the users of machine learning as a service. To address this concern, in this paper, we focus on mitigating the risks of black-box inference attacks against machine learning models. We introduce a mechanism to train models with membership privacy, which ensures indistinguishability between the predictions of a model on its training data and on other data points (from the same distribution). This requires minimizing the accuracy of the best black-box membership inference attack against the model. We formalize this as a min-max game, and design an adversarial training algorithm that minimizes the prediction loss of the model as well as the maximum gain of the inference attacks. This strategy, which can guarantee membership privacy (as prediction indistinguishability), also acts as a strong regularizer and helps the model generalize. We evaluate the practical feasibility of our privacy mechanism on training deep neural networks using benchmark datasets. We show that the min-max strategy can mitigate the risks of membership inference attacks (reducing their accuracy to near random guessing), and can achieve this with a negligible drop in the model's prediction accuracy (less than 4%).
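A condensed sketch of the min-max training loop described above, assuming PyTorch-style modules: `classifier` maps inputs to logits, and `attack` maps a prediction vector (plus label) to a membership probability in (0, 1). Alternating a single attack step with a single classifier step, and the specific loss forms, are illustrative simplifications rather than the paper's exact procedure.

```python
# Condensed sketch of adversarial (min-max) training for membership privacy.
import torch
import torch.nn.functional as F

def train_step(classifier, attack, opt_c, opt_a, batch_train, batch_ref, lam=1.0):
    (x_tr, y_tr), (x_rf, y_rf) = batch_train, batch_ref

    # --- max step: improve the inference attack on the current classifier ---
    with torch.no_grad():
        p_tr = F.softmax(classifier(x_tr), dim=1)
        p_rf = F.softmax(classifier(x_rf), dim=1)
    a_loss = -(torch.log(attack(p_tr, y_tr) + 1e-8).mean()
               + torch.log(1 - attack(p_rf, y_rf) + 1e-8).mean())
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()

    # --- min step: task loss plus the attack's gain on training members ---
    logits = classifier(x_tr)
    gain = torch.log(attack(F.softmax(logits, dim=1), y_tr) + 1e-8).mean()
    c_loss = F.cross_entropy(logits, y_tr) + lam * gain
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    return c_loss.item(), a_loss.item()
```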
The arms race between attacks and defenses for machine learning models has come to the forefront in recent years, in both the security community and the privacy community. However, one major limitation of previous research is that the security domain and the privacy domain have typically been considered separately. It is thus unclear whether the defense methods in one domain will have any unexpected impact on the other domain. In this paper, we take a step towards resolving this limitation by combining the two domains. In particular, we measure the success of membership inference attacks against six state-of-the-art defense methods that mitigate the risk of adversarial examples (i.e., evasion attacks). Membership inference attacks determine whether or not an individual data record has been part of a model's training set. The accuracy of such attacks reflects the information leakage of training algorithms about individual members of the training set. Defenses against adversarial examples influence the model's decision boundaries such that model predictions remain unchanged in a small area around each input. However, this objective is optimized on the training data. Thus, individual data records in the training set have a significant influence on robust models, which makes these models more vulnerable to inference attacks. To perform the membership inference attacks, we leverage existing inference methods that exploit model predictions. We also propose two new inference methods that exploit structural properties of robust models on adversarially perturbed data. Our experimental evaluation demonstrates that, compared with natural (undefended) training, adversarial defense methods can indeed increase the target model's risk against membership inference attacks.
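As an illustration of the kind of structural signal such inference methods can exploit, the sketch below scores a point by the model's loss on an adversarially perturbed copy of it, using a single FGSM step as an illustrative stand-in (the paper's attacks are more elaborate). Robust training keeps the perturbed loss low mainly around training points, so a low score is evidence of membership.

```python
# Sketch of a membership score for adversarially trained (robust) models:
# the loss on a one-step FGSM perturbation of the input (illustrative stand-in).
import torch
import torch.nn.functional as F

def perturbed_loss_score(model, x, y, eps=8 / 255):
    x = x.detach().clone().unsqueeze(0).requires_grad_(True)
    y = y.unsqueeze(0)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()   # one FGSM step
    with torch.no_grad():
        return F.cross_entropy(model(x_adv), y).item()      # low score -> likely a member

# Membership decision: threshold the score, calibrated on shadow/reference data.
```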
Major cloud operators offer machine learning (ML) as a service, enabling customers who have the data but not the ML expertise or infrastructure to train predictive models on this data. Existing ML-as-a-service platforms require users to reveal all training data to the service operator. We design, implement, and evaluate Chiron, a system for privacy-preserving machine learning as a service. First, Chiron conceals the training data from the service operator. Second, in keeping with how many existing ML-as-a-service platforms work, Chiron reveals neither the training algorithm nor the model structure to the user, providing only black-box access to the trained model. Chiron is implemented using SGX enclaves, but SGX alone does not achieve the dual goals of data privacy and model confidentiality. Chiron runs the standard ML training toolchain (including the popular Theano framework and the C compiler) in an enclave, but the untrusted model-creation code from the service operator is further confined in a Ryoan sandbox to prevent it from leaking the training data outside the enclave. To support distributed training, Chiron executes multiple concurrent enclaves that exchange model parameters via a parameter server. We evaluate Chiron on popular deep learning models, focusing on benchmark image classification tasks such as CIFAR and ImageNet, and show that its training performance and the accuracy of the resulting models are practical for common uses of ML-as-a-service.
Consider users who share their data (e.g., location) with an untrusted service provider to obtain a personalized (e.g., location-based) service. Data obfuscation is a prevalent user-centric approach to protecting users' privacy in such systems: the untrusted entity only receives a noisy version of the user's data. Perturbing data before sharing it, however, comes at the price of the user's utility (service quality), which is an inseparable design factor of obfuscation mechanisms. The entanglement of the utility loss and the privacy guarantee, in addition to the lack of a comprehensive notion of privacy, has led to the design of obfuscation mechanisms that are either suboptimal in terms of their utility loss, ignore the user's past information leakage, or are limited to very specific notions of privacy that, for example, do not protect against adaptive inference attacks or adversaries with arbitrary background knowledge. In this paper, we design user-centric obfuscation mechanisms that impose the minimum utility loss for guaranteeing the user's privacy. We optimize utility subject to a joint guarantee of differential privacy (indistinguishability) and distortion privacy (inference error). This double shield of protection limits the information leakage through the obfuscation mechanism as well as through posterior inference. We show that the privacy achieved through joint differential-distortion mechanisms against optimal attacks is as large as the maximum privacy that can be achieved by either of these mechanisms separately. Their utility cost is also not larger than what either the differential or the distortion mechanism imposes. We model the optimization problem as a leader-follower game between the designer of the obfuscation mechanism and the potential adversary, and design adaptive mechanisms that anticipate and protect against optimal inference algorithms. Thus, the obfuscation mechanism is optimal against any inference algorithm.
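Schematically, and in our own notation rather than the paper's, the design problem can be written as minimizing expected quality loss over obfuscation mechanisms subject to the two privacy constraints:

```latex
% Schematic form of the joint design problem (our notation, not necessarily the paper's):
% \pi is the prior over the user's data x, p(x'|x) the obfuscation mechanism,
% d_q a quality-loss metric, d_p the adversary's error metric, and \hat{x} the attack.
\begin{align*}
\min_{p(\cdot\mid\cdot)} \quad
  & \sum_{x,\,x'} \pi(x)\, p(x'\mid x)\, d_q(x', x)
  && \text{(expected utility loss)} \\
\text{s.t.} \quad
  & p(x'\mid x) \;\le\; e^{\epsilon\, d(x,\tilde{x})}\, p(x'\mid \tilde{x})
  \quad \forall\, x, \tilde{x}, x'
  && \text{(differential privacy / indistinguishability)} \\
  & \min_{\hat{x}(\cdot)} \sum_{x,\,x'} \pi(x)\, p(x'\mid x)\, d_p\big(\hat{x}(x'), x\big) \;\ge\; D
  && \text{(distortion privacy: inference error of the best attack)}
\end{align*}
```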
Releasing full data records is one of the most challenging problems in data privacy. On the one hand, many of the popular techniques such as data de-identification are problematic because of their dependence on the background knowledge of adversaries. On the other hand, rigorous methods such as the exponential mechanism for differential privacy are often computationally impractical for releasing high-dimensional data, or cannot preserve high utility of the original data due to their extensive data perturbation. This paper presents a criterion called plausible deniability that provides a formal privacy guarantee, notably for releasing sensitive datasets: an output record can be released only if a certain number of input records are indistinguishable, up to a privacy parameter. This notion does not depend on the background knowledge of an adversary. Also, it can be checked efficiently by privacy tests. We present mechanisms to generate synthetic datasets with similar statistical properties to the input data and the same format. We study this technique both theoretically and experimentally. A key theoretical result shows that, with proper randomization, the plausible deniability mechanism generates differentially private synthetic data. We demonstrate the efficiency of this generative technique on a large dataset; it is shown to preserve the utility of the original data with respect to various statistical analyses and machine learning measures.
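The releasability criterion can be paraphrased roughly as follows (our rendering; the paper's exact parameterization may differ):

```latex
% Rough paraphrase of (k, \gamma)-plausible deniability for a generative mechanism M:
% an output y generated from input record d_1 may be released only if there exist
% at least k - 1 other input records d_2, \dots, d_k in the dataset such that
\[
  \gamma^{-1} \;\le\; \frac{\Pr[\, M(d_i) = y \,]}{\Pr[\, M(d_j) = y \,]} \;\le\; \gamma
  \qquad \text{for all } i, j \in \{1, \dots, k\},
\]
% i.e., each of the k records is a plausible seed for y, up to the privacy parameter \gamma.
```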
Deep neural networks are susceptible to various inference attacks as they remember information about their training data. We perform a comprehensive analysis of white-box privacy inference attacks on deep learning models. We measure the privacy leakage by leveraging the final model parameters as well as the parameter updates during the training and fine-tuning processes. We design the attacks in the stand-alone and federated settings, with respect to passive and active inference attackers, and assuming different adversary prior knowledge. We design and evaluate our novel white-box membership inference attacks against deep learning algorithms to measure their training data membership leakage. We show that a straightforward extension of the known black-box attacks to the white-box setting (through analyzing the outputs of activation functions) is ineffective. We therefore design new algorithms tailored to the white-box setting by exploiting the privacy vulnerabilities of the stochastic gradient descent algorithm, widely used to train deep neural networks. We show that even well-generalized models are significantly susceptible to white-box membership inference attacks, by analyzing state-of-the-art pre-trained and publicly available models for the CIFAR dataset. We also show how adversarial participants of a federated learning setting can run active membership inference attacks against other participants, even when the global model achieves high prediction accuracies.
Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. An adversary can exploit this tendency to extract training data. Depending on the nature of the content and the context in which this data was collected, this could violate expectations of privacy. Thus, there is a growing interest in techniques for training language models that preserve privacy. In this paper, we discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy), and the broadness of natural language and of privacy as a social norm. We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models. We conclude that language models should be trained on text data which was explicitly produced for public use.
Collaborative (federated) learning enables multiple parties to train a model without sharing their private data, but through repeated sharing of the parameters of their local models. Despite its advantages, this approach has many known privacy and security weaknesses and incurs performance overhead, in addition to being limited to models with homogeneous architectures. Shared parameters leak a significant amount of information about the local (and supposedly private) datasets. Moreover, federated learning is severely vulnerable to poisoning attacks, where some participants can adversarially influence the aggregate parameters. Large models, with high-dimensional parameter vectors, are particularly susceptible to privacy and security attacks: the curse of dimensionality in federated learning. We argue that sharing parameters is the most naive form of information exchange in collaborative learning, as it exposes the entire internal state of the model to inference attacks and maximizes the model's malleability to stealthy poisoning attacks. We propose Cronus, a robust collaborative machine learning framework. The simple yet effective idea behind designing Cronus is to control, unify, and significantly reduce the dimensions of the information exchanged between parties, through robust knowledge transfer between their black-box local models. We evaluate all existing federated learning algorithms against poisoning attacks, and we show that Cronus is the only secure method, due to its tight robustness guarantee. Treating local models as black boxes reduces the information leakage through the models, and enables us to use existing privacy-preserving algorithms that mitigate the risk of information leakage through the model's output (predictions). Cronus also has a significantly lower sample complexity, compared to federated learning, and does not tie its security to the number of participants.
We demonstrate that modern image recognition methods based on artificial neural networks can recover hidden information from images protected by various forms of obfuscation. The obfuscation techniques considered in this paper are mosaicing (also known as pixelation), blurring (as used by YouTube), and P3, a recently proposed system for privacy-preserving photo sharing that encrypts the significant JPEG coefficients to make images unrecognizable by humans. We empirically show how to train artificial neural networks to successfully identify faces and recognize objects and handwritten digits even if the images are protected using any of the above obfuscation techniques.
Contact tracing is an essential tool in containing infectious diseases such as COVID-19. Many countries and research groups have launched or announced mobile apps to facilitate contact tracing by recording contacts between users with some privacy considerations. Most of the focus has been on using random tokens, which are exchanged during encounters and stored locally on users' phones. Prior systems allow users to search over released tokens in order to learn if they have recently been in the proximity of a user that has since been diagnosed with the disease. However, prior approaches do not provide end-to-end privacy in the collection and querying of tokens. In particular, these approaches are vulnerable to either linkage attacks by users using token metadata, linkage attacks by the server, or false reporting by users. In this work, we introduce Epione, a lightweight system for contact tracing with strong privacy protections. Epione alerts users directly if any of their contacts have been diagnosed with the disease, while protecting the privacy of users' contacts from both central services and other users, and provides protection against false reporting. As a key building block, we present a new cryptographic tool for secure two-party private set intersection cardinality (PSI-CA), which allows two parties, each holding a set of items, to learn the intersection size of two private sets without revealing intersection items. We specifically tailor it to the case of large-scale contact tracing where clients have small input sets and the server's database of tokens is much larger.
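For intuition only, the classical commutative-encryption (DDH-style) construction of PSI cardinality can be sketched as below; Epione's actual PSI-CA protocol is tailored to the unbalanced client/server setting and is not this construction. The group, hashing, and randomness choices here are toy illustrations, not a secure implementation.

```python
# Toy sketch of PSI cardinality via commutative blinding (double exponentiation mod a prime).
# Illustrates the PSI-CA building block only; NOT Epione's protocol and NOT secure as written.
import hashlib, random

P = 2**255 - 19           # prime modulus for the toy group (illustrative choice)

def h(item: bytes) -> int:
    """Hash an item into the group (squaring maps it into the quadratic residues)."""
    return pow(int.from_bytes(hashlib.sha256(item).digest(), "big") % P, 2, P)

def psi_cardinality(client_items, server_items):
    a = random.randrange(2, P - 1)            # client's secret exponent
    b = random.randrange(2, P - 1)            # server's secret exponent

    # Round 1: client blinds its items and sends them.
    client_blinded = [pow(h(x), a, P) for x in client_items]
    # Round 2: server re-blinds the client's items (and would shuffle them), and blinds its own.
    double_blinded = {pow(c, b, P) for c in client_blinded}
    server_blinded = [pow(h(y), b, P) for y in server_items]
    # Round 3: client re-blinds the server's items and counts collisions (h(x)^{ab} == h(y)^{ba}).
    return sum(pow(s, a, P) in double_blinded for s in server_blinded)

# Example: the intersection of the two sets has size 2.
print(psi_cardinality([b"alice", b"bob", b"carol"], [b"bob", b"carol", b"dave"]))
```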
Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input samples. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so that the model behaves according to the adversary's objective if the input contains the backdoor features, referred to as the backdoor trigger (e.g., a stamp on an image). The poisoned model's behavior on clean data, however, remains unchanged. Many detection algorithms are designed to detect backdoors in input samples or model parameters, by exploiting the statistical difference between the latent representations of adversarial and clean inputs in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass existing detection algorithms, including state-of-the-art techniques. We design an adaptive adversarial training algorithm that optimizes the original loss function of the model while also maximizing the indistinguishability of the hidden representations of poisoned and clean data. This work calls for designing adversary-aware defense mechanisms for backdoor detection.
How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive hypothesis testing framework that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain why different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a differential analysis between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the Privacy Meter software tool.
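A stripped-down instance of a reference-model attack in this style (our simplification, using a Gaussian approximation of the reference-loss distribution; the framework in the paper covers several other variants and calibration strategies):

```python
# Sketch of a reference-model membership inference test: compare the target model's
# loss on a point against losses from reference models trained WITHOUT that point.
import numpy as np
from scipy.stats import norm

def membership_pvalue(target_loss, reference_losses):
    """Probability that a model NOT trained on the point would assign a loss this small.
    A small value is evidence of membership."""
    mu = np.mean(reference_losses)
    sigma = np.std(reference_losses) + 1e-8
    return norm.cdf(target_loss, loc=mu, scale=sigma)

def attack(target_losses, reference_losses_per_point, alpha=0.01):
    """Flag points as members at (approximately) false positive rate alpha."""
    pvals = np.array([membership_pvalue(t, r)
                      for t, r in zip(target_losses, reference_losses_per_point)])
    return pvals < alpha            # True = predicted member
```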
Privacy and transparency are two key foundations of trustworthy machine learning. Model explanations offer insights into a model's decisions on input data, whereas privacy is primarily concerned with protecting information about the training data. We analyze connections between model explanations and the leakage of sensitive information about the model's training set. We investigate the privacy risks of feature-based model explanations using membership inference attacks: quantifying how much model predictions plus their explanations leak information about the presence of a datapoint in the training set of a model. We extensively evaluate membership inference attacks based on feature-based model explanations, over a variety of datasets. We show that backpropagation-based explanations can leak a significant amount of information about individual training datapoints, because they reveal statistical information about the decision boundary of the model around an input, which can reveal its membership. We also empirically investigate the trade-off between privacy and explanation quality, by studying perturbation-based model explanations.
Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to equalize their behavior across different groups. This can subsequently change the influence of training data points on the fair model, in a disproportionate way. We study how this can change the information leakage of the model about its training data. We analyze the privacy risks of group fairness (e.g., equalized odds) through the lens of membership inference attacks: inferring whether a data point is used for training a model. We show that fairness comes at the cost of privacy, and this cost is not distributed equally: the information leakage of fair models increases significantly on the unprivileged subgroups, which are the ones for whom we need fair learning. We show that the more biased the training data is, the higher the privacy cost of achieving fairness for the unprivileged subgroups will be. We provide comprehensive empirical analysis for general machine learning algorithms.
Optimizing prediction accuracy can come at the expense of fairness. Towards minimizing discrimination against a group, fair machine learning algorithms strive to equalize the behavior of a model across different groups by imposing a fairness constraint on models. However, we show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in the training data, can be in conflict with robustness. We analyze data poisoning attacks against group-based fair machine learning, with a focus on equalized odds. An adversary who can control sampling or labeling for a fraction of the training data can reduce test accuracy significantly beyond what they could achieve on unconstrained models. Adversarial sampling and adversarial labeling attacks can also worsen the model's fairness gap on test data, even though the model satisfies the fairness constraint on training data. We analyze the robustness of fair machine learning through an empirical evaluation of attacks on multiple algorithms and benchmark datasets.
The wide adoption and application of masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities. Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack solely on the MLM's model score. We devise a stronger membership inference attack based on likelihood ratio hypothesis testing that involves an additional reference MLM to more accurately quantify the privacy risks of memorization in MLMs. We show that masked language models are indeed susceptible to likelihood ratio membership inference attacks: our empirical results, on models trained on medical notes, show that our attack improves the AUC of prior membership inference attacks from 0.66 to an alarmingly high 0.90 level.
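The attack statistic reduces to a likelihood ratio between the target and reference MLMs. In the sketch below, `pseudo_log_likelihood` is a hypothetical helper (e.g., an average of token log-probabilities under masking), not the API of any particular library, and the threshold calibration is only illustrative.

```python
# Sketch of the likelihood-ratio membership statistic for masked language models.
import numpy as np

def lr_statistic(target_mlm, reference_mlm, text, pseudo_log_likelihood):
    """Higher when the target model assigns anomalously high likelihood to the text,
    relative to a reference model trained on different data."""
    return pseudo_log_likelihood(target_mlm, text) - pseudo_log_likelihood(reference_mlm, text)

def lr_attack(target_mlm, reference_mlm, texts, pseudo_log_likelihood, threshold):
    """Predict membership when the likelihood ratio exceeds a threshold calibrated
    on texts known to be non-members."""
    scores = np.array([lr_statistic(target_mlm, reference_mlm, t, pseudo_log_likelihood)
                       for t in texts])
    return scores > threshold
```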
When building machine learning models using sensitive data, organizations should ensure that the data processed in such systems is adequately protected. For projects involving machine learning on personal data, Article 35 of the GDPR mandates performing a Data Protection Impact Assessment (DPIA). In addition to the threat of illegitimate access to data through security breaches, machine learning models pose an additional privacy risk to the data by indirectly revealing information about it through model predictions and parameters. Guidance released by the Information Commissioner's Office (UK) and the National Institute of Standards and Technology (US) emphasizes this threat to data from models, and recommends that organizations account for and estimate these risks in order to comply with data protection regulations. Hence, there is an immediate need for a tool that can quantify the privacy risk to data from models. In this paper, we focus on this indirect leakage about training data from machine learning models. We present ML Privacy Meter, a tool that can quantify the privacy risk to data from models through state-of-the-art membership inference attack techniques. We discuss how this tool can help practitioners comply with data protection regulations when deploying machine learning models.
Can an adversary exploit model explanations to infer sensitive information about the model's training set? To investigate this question, we first focus on membership inference attacks: given a data point and a model explanation, the attacker's goal is to decide whether or not the point belongs to the training data. We study this problem for two popular transparency methods: gradient-based attribution methods and record-based influence measures. We develop membership inference attacks based on these model explanations, and extensively test them on a variety of datasets. For gradient-based methods, we show that the explanations can leak a significant amount of information about the individual data points in the training set, much beyond what is leaked through the predicted labels. We also show that record-based measures can be effectively, and even more significantly, exploited for membership inference attacks. More importantly, we design reconstruction attacks against this class of model explanations, and demonstrate that they can be exploited to recover significant parts of the training set. Finally, our results indicate that minorities and outliers are more vulnerable to these types of attacks than the rest of the population. Thus, there is a significant disparity in the privacy risks of model explanations across different groups.
The privacy risks of machine learning models are a major concern when training them on sensitive and personal data. We discuss the tradeoffs between data privacy and the remaining goals of trustworthy machine learning (notably, fairness, robustness, and explainability).
Machine learning models leak information about the datasets on which they are trained. An adversary can build an algorithm to trace the individual members of a model's training dataset. This fundamental inference attack aims to distinguish between data points that were part of the model's training set and any other data points from the same distribution, and is known as the tracing (or membership inference) attack. In this paper, we focus on such attacks against black-box models, where the adversary can only observe the output of the model, but not its parameters. This is the current setting of machine learning as a service on the Internet.
We introduce a privacy mechanism to train machine learning models that provably achieve membership privacy: the model's predictions on its training data are indistinguishable from its predictions on other data points from the same distribution. We design a strategic mechanism in which the privacy mechanism anticipates the membership inference attacks. The objective is to train a model that not only has the minimum prediction error (high utility), but is also the most robust model against its corresponding strongest inference attack (high privacy). We formalize this as a min-max game optimization problem, and design an adversarial training algorithm that minimizes the classification loss of the model as well as the maximum gain of the membership inference attack against it. This strategy, which guarantees membership privacy (as prediction indistinguishability), also acts as a strong regularizer and significantly improves the generalization of the model.
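In symbols (our notation, not necessarily the paper's), the training objective is the following min-max problem:

```latex
% Min-max membership-privacy objective (our notation): f_\theta is the classifier, h the
% inference model, D the training set, D' a reference set from the same distribution,
% and \lambda the privacy-utility trade-off parameter.
\[
  \min_{\theta} \;\Bigg[
    \mathbb{E}_{(x,y) \sim D}\, \ell\big(f_\theta(x), y\big)
    \;+\; \lambda \max_{h} \Big(
      \mathbb{E}_{(x,y) \sim D} \log h\big(f_\theta(x), y\big)
      + \mathbb{E}_{(x,y) \sim D'} \log\!\big(1 - h\big(f_\theta(x), y\big)\big)
    \Big)
  \Bigg]
\]
```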
We evaluate our privacy mechanism on deep neural networks using different benchmark datasets. We show that our min-max strategy can mitigate the risk of membership inference attacks (reducing their accuracy to close to random guessing), with a negligible cost in terms of the classification error.
Deep learning models leak significant amounts of information about their training datasets. Previous work has investigated training models with differential privacy (DP) guarantees by adding DP noise to the gradients. However, such solutions (specifically, DPSGD) result in large degradations in the accuracy of the trained models. In this paper, we aim at training deep learning models with DP guarantees while preserving model accuracy much better than previous works. Our key technique is to encode gradients to map them to a smaller vector space, thereby enabling us to obtain DP guarantees for different noise distributions. This allows us to investigate and choose the noise distributions that best preserve model accuracy for a target privacy budget. We also take advantage of the post-processing property of differential privacy by introducing the idea of denoising, which further improves the utility of the trained models without degrading their DP guarantees. We show that our mechanism outperforms the state-of-the-art DPSGD; for instance, for the same model accuracy of $96.1\%$ on MNIST, our technique results in a privacy bound of $\epsilon=3.2$ compared to $\epsilon=6$ for DPSGD, which is a significant improvement.
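For reference, here is a minimal sketch of the baseline DPSGD step that the paper compares against (clip each per-example gradient, average, add Gaussian noise). The paper's gradient encoding and denoising steps are not reproduced here, and the flat-vector treatment of parameters is a simplification.

```python
# Minimal sketch of a baseline DP-SGD update on a flat parameter vector.
import torch

def dpsgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, sigma=1.0):
    n = len(per_example_grads)
    clipped = []
    for g in per_example_grads:                       # g: flat gradient of one example
        clipped.append(g * min(1.0, clip_norm / (g.norm() + 1e-12)))
    # Sum clipped gradients, add Gaussian noise calibrated to the clipping norm, then average.
    noisy_mean = (torch.stack(clipped).sum(dim=0)
                  + sigma * clip_norm * torch.randn_like(clipped[0])) / n
    return params - lr * noisy_mean                   # gradient descent update
```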
Using machine learning models in critical decision-making processes has given rise to a call for algorithmic transparency. Model explanations, however, might leak information about the sensitive data used to train and explain the model, undermining data privacy. We focus on black-box feature-based model explanations, which locally approximate the model around the point of interest, using potentially sensitive data. We design differentially private local approximation mechanisms, and evaluate their effect on explanation quality. To protect training data, we use existing differentially private learning algorithms. However, to protect the privacy of data which is used during the local approximation, we design an adaptive differentially private algorithm, which finds the minimal privacy budget required to produce accurate explanations. Both empirically and analytically, we evaluate the impact of the randomness needed in differential privacy algorithms on the fidelity of model explanations.
We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on.
We empirically evaluate our inference techniques on classification models trained by commercial machine learning as a service providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of machine learning training data. Our attacks are effective across membership inference, attribute inference, and data extraction. For example, our targeted attacks can poison <0.1% of the training dataset to boost the performance of inference attacks by 1 to 2 orders of magnitude. Further, an adversary who controls a significant fraction of the training data (e.g., 50%) can launch untargeted attacks that enable 8x more precise inference on all other users' otherwise-private data points. Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty computation protocols for machine learning, if parties can arbitrarily select their share of training data.
In this paper, we design user-centric obfuscation mechanisms that impose the minimum utility loss for guaranteeing the user's privacy. We optimize utility subject to a joint guarantee of differential privacy (indistinguishability) and distortion privacy (inference error). This double shield of protection limits the information leakage through the obfuscation mechanism as well as through posterior inference. We show that the privacy achieved through joint differential-distortion mechanisms against optimal attacks is as large as the maximum privacy that can be achieved by either of these mechanisms separately. Their utility cost is also not larger than what either the differential or the distortion mechanism imposes. We model the optimization problem as a leader-follower game between the designer of the obfuscation mechanism and the potential adversary, and design adaptive mechanisms that anticipate and protect against optimal inference algorithms. Thus, the obfuscation mechanism is optimal against any inference algorithm.
What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of each specific training epoch to the information leakage through the released model? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of R\'enyi differential privacy loss throughout the training process. Our analysis traces a provably \emph{tight} bound on the R\'enyi divergence between the pair of probability distributions over parameters of models trained on neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems (which over-estimate the privacy loss by upper-bounding its total value over all intermediate gradient computations). For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility with a small gradient complexity for noisy gradient descent algorithms.
Human mobility is highly predictable. Individuals tend to only visit a few locations with high frequency, and to move among them in a certain sequence reflecting their habits and daily routine. This predictability has to be taken into account in the design of location privacy preserving mechanisms (LPPMs) in order to effectively protect users when they continuously expose their position to location-based services (LBSs). In this paper, we describe a method for creating LPPMs that are customized for a user's mobility profile taking into account privacy and quality of service requirements. By construction, our LPPMs take into account the sequential correlation across the user's exposed locations, providing the maximum possible trajectory privacy, i.e., privacy for the user's present location, as well as past and expected future locations. Moreover, our LPPMs are optimal against a strategic adversary, i.e., an attacker that implements the strongest inference attack knowing both the LPPM operation and the user's mobility profile. The optimality of the LPPMs in the context of trajectory privacy is a novel contribution, and it is achieved by formulating the LPPM design problem as a Bayesian Stackelberg game between the user and the adversary. An additional benefit of our formal approach is that the design parameters of the LPPM are chosen by the optimization algorithm.
ML-as-a-service is gaining popularity, with a cloud server hosting a trained model and offering a prediction (inference) service to users. In this setting, our objective is to protect the confidentiality of both the users' input queries and the model parameters at the server, with modest computation and communication overhead. Prior solutions primarily propose fine-tuning cryptographic methods to make them efficient for known, fixed model architectures. The drawback of this line of approach is that the model itself is never designed to operate with existing efficient cryptographic computations. We observe that the network architecture, internal functions, and parameters of a model, which are all chosen during training, significantly influence the computation and communication overhead of a cryptographic method during inference. Based on this observation, we propose SOTERIA -- a training method to construct model architectures that are, by design, efficient for private inference. We use neural architecture search algorithms with the dual objective of optimizing the accuracy of the model and the overhead of using cryptographic primitives for secure inference. Given the flexibility of modifying a model during training, we find accurate models that are also efficient for private computation. We select garbled circuits as our underlying cryptographic primitive, due to their expressiveness and efficiency, but this approach can be extended to hybrid multi-party computation settings. We empirically evaluate SOTERIA on the MNIST and CIFAR10 datasets to compare with prior work. Our results confirm that SOTERIA is indeed effective in balancing performance and accuracy.
With the advent of fluent generative language models that can produce convincing utterances very similar to those written by humans, distinguishing whether a piece of text is machine-generated or human-written becomes more challenging and more important, as such models could be used to spread misinformation, fake news, and fake reviews, and to mimic certain authors and figures. To this end, a slew of methods have been proposed to detect machine-generated text. Most of these methods need access to the logits of the target model or need the ability to sample from it. One such black-box detection method relies on the observation that generated text is locally optimal under the likelihood function of the generator, while human-written text is not. We find that, overall, smaller and partially trained models are better universal text detectors: they can more precisely detect text generated from both small and larger models. Interestingly, we find that whether the detector and generator were trained on the same data is not critically important to the detection success. For instance, the OPT-125M model has an AUC of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT family, GPTJ-6B, has an AUC of 0.45.
Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating the risk of this information leakage, using state-of-the-art differentially private algorithms, also does not come for free. Randomized mechanisms can prevent models from converging even on the useful representation functions, especially if there is more disagreement between local models on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model, with differential privacy guarantees, while separately allowing sufficient freedom for local personalization (without releasing it). We prove that in the linear representation setting, while the objective is non-convex, our proposed new algorithm \DPFEDREP\ converges to a ball centered around the \emph{global optimal} solution at a linear rate, and the radius of the ball is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. We empirically evaluate our method on the image classification task on CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over prior work under the same small privacy budget. The code can be found at this link: https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.
In this paper, we design user-centric obfuscation mechanisms that impose the minimum utility loss for guaranteeing user's privacy. We optimize utility subject to a joint guarantee of differential privacy (indistinguishability) …
In this paper, we design user-centric obfuscation mechanisms that impose the minimum utility loss for guaranteeing user's privacy. We optimize utility subject to a joint guarantee of differential privacy (indistinguishability) and distortion privacy (inference error). This double shield of protection limits the information leakage through obfuscation mechanism as well as the posterior inference. We show that the privacy achieved through joint differential-distortion mechanisms against optimal attacks is as large as the maximum privacy that can be achieved by either of these mechanisms separately. Their utility cost is also not larger than what either of the differential or distortion mechanisms imposes. We model the optimization problem as a leader-follower game between the designer of obfuscation mechanism and the potential adversary, and design adaptive mechanisms that anticipate and protect against optimal inference algorithms. Thus, the obfuscation mechanism is optimal against any inference algorithm.
Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to …
Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to equalize their behavior across different groups. This can subsequently change the influence of training data points on the fair model, in a disproportionate way. We study how this can change the information leakage of the model about its training data. We analyze the privacy risks of group fairness (e.g., equalized odds) through the lens of membership inference attacks: inferring whether a data point is used for training a model. We show that fairness comes at the cost of privacy, and this cost is not distributed equally: the information leakage of fair models increases significantly on the unprivileged subgroups, which are the ones for whom we need fair learning. We show that the more biased the training data is, the higher the privacy cost of achieving fairness for the unprivileged subgroups will be. We provide a comprehensive empirical analysis for general machine learning algorithms.
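Such an audit can be reproduced at a small scale even with a very simple attack. The sketch below (my illustration, using hypothetical synthetic losses and group labels rather than the paper's experimental setup) reports the balanced accuracy of a loss-threshold membership inference attack separately for each subgroup.

```python
# Illustrative sketch: per-subgroup membership inference with a loss-threshold attack.
import numpy as np

def subgroup_mia_accuracy(member_loss, nonmember_loss, member_group, nonmember_group):
    """Balanced attack accuracy of a loss-threshold MIA, reported per subgroup."""
    threshold = np.median(np.concatenate([member_loss, nonmember_loss]))
    results = {}
    for g in np.unique(np.concatenate([member_group, nonmember_group])):
        tpr = np.mean(member_loss[member_group == g] < threshold)       # members flagged
        tnr = np.mean(nonmember_loss[nonmember_group == g] >= threshold)  # non-members rejected
        results[int(g)] = 0.5 * (tpr + tnr)
    return results

# Hypothetical losses: the unprivileged group (1) is memorized more by the fair model.
rng = np.random.default_rng(0)
member_loss = np.concatenate([rng.normal(0.6, 0.2, 500), rng.normal(0.2, 0.1, 100)])
nonmember_loss = np.concatenate([rng.normal(0.8, 0.2, 500), rng.normal(0.9, 0.2, 100)])
member_group = np.array([0] * 500 + [1] * 100)
nonmember_group = np.array([0] * 500 + [1] * 100)
print(subgroup_mia_accuracy(member_loss, nonmember_loss, member_group, nonmember_group))
```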
Prior work on differential privacy analysis of randomized SGD algorithms relies on composition theorems, where the implicit (unrealistic) assumption is that the internal state of the iterative algorithm is revealed …
Prior work on differential privacy analysis of randomized SGD algorithms relies on composition theorems, where the implicit (unrealistic) assumption is that the internal state of the iterative algorithm is revealed to the adversary. As a result, the R\'enyi DP bounds derived by such composition-based analyses grow linearly with the number of training epochs. When the internal state of the algorithm is hidden, we prove a converging privacy bound for noisy stochastic gradient descent (on strongly convex smooth loss functions). We show how to take advantage of privacy amplification by sub-sampling and randomized post-processing, and prove the dynamics of the privacy bound for "shuffle and partition" and "sample without replacement" stochastic mini-batch gradient descent schemes. We prove that, in these settings, our privacy bound converges exponentially fast and is substantially smaller than the composition bounds, notably after a few training epochs. Thus, unless the DP algorithm converges fast, our privacy analysis shows that hidden state analysis can significantly amplify differential privacy.
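For concreteness, one step of the kind of clipped, Gaussian-perturbed mini-batch gradient descent analyzed here can be sketched as follows; this is a generic illustration with made-up values, not the paper's algorithm or its privacy bound.

```python
# Illustrative sketch: one noisy (clipped + Gaussian-perturbed) gradient descent step.
import numpy as np

def noisy_gd_step(theta, grads, lr=0.1, clip=1.0, sigma=1.0, rng=np.random.default_rng(0)):
    """Clip per-example gradients, average, add Gaussian noise, and update the parameters."""
    clipped = [g / max(1.0, np.linalg.norm(g) / clip) for g in grads]
    noisy_mean = np.mean(clipped, axis=0) + (sigma * clip / len(grads)) * rng.normal(size=theta.shape)
    return theta - lr * noisy_mean

theta = np.zeros(3)                                   # toy parameter vector
grads = [np.array([1.0, -2.0, 0.5]), np.array([0.2, 0.1, -0.3])]  # hypothetical per-example gradients
print(noisy_gd_step(theta, grads))
```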
Models leak information about their training data. This enables attackers to infer sensitive information about their training sets, notably determine if a data sample was part of the model's training …
Models leak information about their training data. This enables attackers to infer sensitive information about their training sets, notably to determine whether a data sample was part of the model's training set. Existing works empirically show the possibility of these tracing (membership inference) attacks against complex models with a large number of parameters. However, the attack results are dependent on the specific training data, can be obtained only after the tedious process of training the model and performing the attack, and lack any measure of the confidence and unused potential power of the attack. A model designer is interested in identifying which model structures leak more information, how adding new parameters to the model increases its privacy risk, and what the gain of adding new data points is in decreasing the overall information leakage. The privacy analysis should also enable designing the most powerful inference attack.
In this paper, we design a theoretical framework to analyze the maximum power of tracing attacks against high-dimensional models, with a focus on probabilistic graphical models. We provide a tight upper bound on the power (true positive rate) of these attacks, with respect to their error (false positive rate). The bound, as it should be, is independent of the knowledge and algorithm of any specific attack, as well as of the values of particular samples in the training set. It provides a measure of the potential leakage of a model given its structure, as a function of the structure complexity and the size of the training set.
Camouflaging data by generating fake information is a well-known obfuscation technique for protecting data privacy. In this paper, we focus on a very sensitive and increasingly exposed type of data: …
Camouflaging data by generating fake information is a well-known obfuscation technique for protecting data privacy. In this paper, we focus on a very sensitive and increasingly exposed type of data: location data. There are two main scenarios in which fake traces are of extreme value to preserve location privacy: publishing datasets of location trajectories, and using location-based services. Despite advances in protecting (location) data privacy, there is no quantitative method to evaluate how realistic a synthetic trace is, and how much utility and privacy it provides in each scenario. There is also an evident lack of a methodology to generate privacy-preserving fake traces. We fill this gap and propose the first statistical metric and model to generate fake location traces such that both the utility of data and the privacy of users are preserved. We build upon the fact that, although people visit geographically distinct locations, their mobility patterns are strongly semantically similar; for example, their transition patterns across activities (e.g., working, driving, staying at home) are similar. We define a statistical metric and propose an algorithm that automatically discovers the hidden semantic similarities between locations from a bag of real location traces used as seeds, without requiring any initial semantic annotations. We guarantee that fake traces are geographically dissimilar to their seeds, so they do not leak sensitive location information. We also protect contributors to seed traces against membership attacks. Interleaving fake traces with mobile users' traces is a prominent location privacy defense mechanism. We quantitatively show the effectiveness of our methodology in protecting against localization inference attacks while preserving the utility of sharing/publishing traces.
Black-box machine learning models are used in critical decision-making domains, giving rise to several calls for more algorithmic transparency. The drawback is that model explanations can leak information about the …
Black-box machine learning models are used in critical decision-making domains, giving rise to several calls for more algorithmic transparency. The drawback is that model explanations can leak information about the training data and the explanation data used to generate them, thus undermining data privacy. To address this issue, we propose differentially private algorithms to construct feature-based model explanations. We design an adaptive differentially private gradient descent algorithm that finds the minimal privacy budget required to produce accurate explanations. It reduces the overall privacy loss on explanation data by adaptively reusing past differentially private explanations. It also amplifies the privacy guarantees with respect to the training data. We evaluate the implications of differentially private models and our privacy mechanisms on the quality of model explanations.
We introduce a new analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a …
We introduce a new analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD). This problem is key to measuring data **memorization** and **information leakage** in machine learning, and the **influence** of training data points on model predictions. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of information leakage using membership inference attacks. Our theoretical framework enables us to investigate the causes of information leakage and where the leakage is high. For example, we analyze the influence of activation functions on data memorization. Additionally, our method allows us to optimize queries that disclose the most significant information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate **reconstruction** of training data.
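A toy version of this computation, assuming a simple RBF-kernel Gaussian process surrogate and a single held-out point (my illustration, not the paper's implementation), measures the leave-one-out distinguishability at a query point as the KL divergence between the two predictive Gaussians obtained with and without that point:

```python
# Illustrative sketch: leave-one-out distinguishability with a GP surrogate.
import numpy as np

def rbf(a, b, ls=1.0):
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_predict(X, y, Xq, noise=1e-2):
    """Posterior mean and variance of a GP regression at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Kq = rbf(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq) - Kq @ np.linalg.solve(K, Kq.T)
    return mean, np.diag(cov) + noise

def kl_gauss(m1, v1, m2, v2):
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)
query = X[:1]                                    # query at the held-out point itself
m_in, v_in = gp_predict(X, y, query)             # training point included
m_out, v_out = gp_predict(X[1:], y[1:], query)   # training point left out
print("LOOD (KL) at query:", kl_gauss(m_in[0], v_in[0], m_out[0], v_out[0]))
```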
The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information …
The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
Machine learning as a service (MLaaS), and algorithm marketplaces are on a rise. Data holders can easily train complex models on their data using third party provided learning codes. Training …
Machine learning as a service (MLaaS) and algorithm marketplaces are on the rise. Data holders can easily train complex models on their data using learning code provided by third parties. Training accurate ML models requires massive labeled data and advanced learning algorithms. The resulting models are considered the intellectual property of the model owners, and their copyright should be protected. Also, MLaaS needs to be trusted not to embed secret information about the training data into the model, such that it could be later retrieved when the model is deployed. In this paper, we present \emph{membership encoding} for training deep neural networks and encoding the membership information, i.e., whether a data point is used for training, for a subset of training data. Membership encoding has several applications in different scenarios, including robust watermarking for model copyright protection, as well as the risk analysis of stealthy data-embedding privacy attacks. Our encoding algorithm can determine the membership of significantly redacted data points, and is also robust to model compression and fine-tuning. It also enables encoding a significant fraction of the training set, with a negligible drop in the model's prediction accuracy.
The wide adoption and application of Masked language models~(MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities -- to what extent do …
The wide adoption and application of Masked language models~(MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities -- to what extent do MLMs leak information about their training data? Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying the potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack solely on the MLM's model score. We devise a stronger membership inference attack based on likelihood ratio hypothesis testing that involves an additional reference MLM to more accurately quantify the privacy risks of memorization in MLMs. We show that masked language models are extremely susceptible to likelihood ratio membership inference attacks: Our empirical results, on models trained on medical notes, show that our attack improves the AUC of prior membership inference attacks from 0.66 to an alarmingly high 0.90 level, with a significant improvement in the low-error region: at 1% false positive rate, our attack is 51X more powerful than prior work.
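The core of such a likelihood-ratio test is simple to sketch: the membership signal is the gap between a sample's loss under a reference model and under the target model. The example below is a schematic illustration with synthetic per-example losses (e.g., pseudo-log-likelihoods for an MLM), not the attack as implemented in the paper.

```python
# Illustrative sketch: likelihood-ratio membership scoring with a reference model.
import numpy as np
from sklearn.metrics import roc_auc_score

def lr_scores(target_loss, reference_loss):
    # log-likelihood ratio ~ log p_target(x) - log p_ref(x) = ref_loss - target_loss
    return np.asarray(reference_loss) - np.asarray(target_loss)

# Hypothetical per-example losses: members have a lower loss under the target model.
rng = np.random.default_rng(2)
member_t, member_r = rng.normal(1.0, 0.3, 1000), rng.normal(2.0, 0.3, 1000)
nonmember_t, nonmember_r = rng.normal(2.0, 0.3, 1000), rng.normal(2.0, 0.3, 1000)
scores = np.concatenate([lr_scores(member_t, member_r), lr_scores(nonmember_t, nonmember_r)])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])
print("attack AUC:", roc_auc_score(labels, scores))
```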
Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity …
Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life. Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets. An adversary can exploit this tendency to extract training data. Depending on the nature of the content and the context in which this data was collected, this could violate expectations of privacy. Thus there is a growing interest in techniques for training language models that preserve privacy. In this paper, we discuss the mismatch between the narrow assumptions made by popular data protection techniques (data sanitization and differential privacy), and the broadness of natural language and of privacy as a social norm. We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models. We conclude that language models should be trained on text data which was explicitly produced for public use.
We show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few parties against under-represented groups (identified by sensitive attributes such as …
We show that participating in federated learning can be detrimental to group fairness. In fact, the bias of a few parties against under-represented groups (identified by sensitive attributes such as gender or race) can propagate through the network to all the parties in the network. We analyze and explain bias propagation in federated learning on naturally partitioned real-world datasets. Our analysis reveals that biased parties unintentionally yet stealthily encode their bias in a small number of model parameters, and throughout the training, they steadily increase the dependence of the global model on sensitive attributes. What is important to highlight is that the experienced bias in federated learning is higher than what parties would otherwise encounter in centralized training with a model trained on the union of all their data. This indicates that the bias is due to the algorithm. Our work calls for auditing group fairness in federated learning and designing learning algorithms that are robust to bias propagation.
How effectively can LLM-based AI assistants utilize their memory (context) to perform various tasks? Traditional data benchmarks, which are often manually crafted, suffer from several limitations: they are static, susceptible …
How effectively can LLM-based AI assistants utilize their memory (context) to perform various tasks? Traditional data benchmarks, which are often manually crafted, suffer from several limitations: they are static, susceptible to overfitting, difficult to interpret, and lack actionable insights--failing to pinpoint the specific capabilities a model lacks when it does not pass a test. In this paper, we present a framework for automatically generating a comprehensive set of tests to evaluate models' abilities to use their memory effectively. Our framework extends the range of capability tests beyond the commonly explored (passkey, key-value, needle in the haystack) search, a dominant focus in the literature. Specifically, we evaluate models on atomic tasks such as searching, recalling, editing, matching, comparing information in context memory, and performing basic operations when inputs are structured into distinct blocks, simulating real-world data. Additionally, we design composite tests to investigate the models' ability to maintain state while operating on memory. Our benchmark enables an interpretable, detailed assessment of memory capabilities of LLMs.
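To give a flavor of automatic test generation for block-structured context memory (a hypothetical generator, not the benchmark's code), the sketch below builds an atomic recall test and its expected answer:

```python
# Illustrative sketch: generating an atomic key-value recall test over context blocks.
import random, string

def make_recall_test(num_blocks=3, pairs_per_block=4, seed=0):
    rng = random.Random(seed)
    blocks, answer_key, answer_val = [], None, None
    for b in range(num_blocks):
        pairs = {"".join(rng.choices(string.ascii_lowercase, k=5)): rng.randint(0, 999)
                 for _ in range(pairs_per_block)}
        blocks.append(f"Block {b}: " + ", ".join(f"{k}={v}" for k, v in pairs.items()))
        if b == num_blocks // 2:                       # query a key from a middle block
            answer_key, answer_val = rng.choice(list(pairs.items()))
    prompt = "\n".join(blocks) + f"\nQuestion: what is the value of {answer_key}?"
    return prompt, str(answer_val)

prompt, expected = make_recall_test()
print(prompt, "\nExpected:", expected)
```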
Future LLM agents are likely to communicate on behalf of users with other entity-representing agents on tasks that entail long-horizon plans with interdependent goals. Current work does not focus on …
Future LLM agents are likely to communicate on behalf of users with other entity-representing agents on tasks that entail long-horizon plans with interdependent goals. Current work does not focus on such agentic networks, nor does it address their challenges. Thus, we first identify the required properties of agents' communication, which should be proactive and adaptable. It needs to satisfy 1) privacy: agents should not share more than what is needed for the task, and 2) security: the communication must preserve integrity and maintain utility against selfish entities. We design a use case (travel planning) as a testbed that exemplifies these requirements, and we show examples of how this can go wrong. Next, we propose a practical design, inspired by established network security principles, for constrained LLM agentic networks that balance adaptability, security, and privacy. Our framework automatically constructs and updates task-specific rules from prior simulations to build firewalls. We offer layers of defense to 1) convert free-form input to a task-specific protocol, 2) dynamically abstract users' data to a task-specific degree of permissiveness, and 3) self-correct the agents' trajectory.
Prior Membership Inference Attacks (MIAs) on pre-trained Large Language Models (LLMs), adapted from classification model attacks, fail due to ignoring the generative process of LLMs across token sequences. In this …
Prior Membership Inference Attacks (MIAs) on pre-trained Large Language Models (LLMs), adapted from classification model attacks, fail due to ignoring the generative process of LLMs across token sequences. In this paper, we present a novel attack that adapts MIA statistical tests to the perplexity dynamics of subsequences within a data point. Our method significantly outperforms prior loss-based approaches, revealing context-dependent memorization patterns in pre-trained LLMs.
Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They …
Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They only check if a given data point \textit{exactly} matches a training point, neglecting the potential of similar or partially overlapping data revealing the same private information. To address this issue, we introduce the class of range membership inference attacks (RaMIAs), testing if the model was trained on any data in a specified range (defined based on the semantics of privacy). We formulate the RaMIAs game and design a principled statistical test for its complex hypotheses. We show that RaMIAs can capture privacy loss more accurately and comprehensively than MIAs on various types of data, such as tabular, image, and language. RaMIA paves the way for a more comprehensive and meaningful privacy auditing of machine learning algorithms.
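A schematic version of the range test (my reading of the high-level idea, not the paper's statistical procedure) aggregates per-sample membership scores over samples drawn from the query range and compares the aggregate to a calibrated threshold:

```python
# Illustrative sketch: aggregating membership scores over samples from a query range.
import numpy as np

def range_membership_score(point_scores, trim=0.25):
    """Trimmed-mean aggregation of per-sample membership scores over the range."""
    s = np.sort(np.asarray(point_scores))
    k = int(len(s) * trim)
    return s[k:len(s) - k].mean() if len(s) - 2 * k > 0 else s.mean()

# Hypothetical per-sample MIA scores for samples (e.g., augmentations) in two ranges.
rng = np.random.default_rng(4)
in_range_scores = rng.normal(0.7, 0.2, 50)    # range overlaps training data
out_range_scores = rng.normal(0.45, 0.2, 50)  # range does not
print(range_membership_score(in_range_scores), range_membership_score(out_range_scores))
```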
Watermarking is a technique used to embed a hidden signal in the probability distribution of text generated by large language models (LLMs), enabling attribution of the text to the originating …
Watermarking is a technique used to embed a hidden signal in the probability distribution of text generated by large language models (LLMs), enabling attribution of the text to the originating model. We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text. An adversary can use weaker language models to smooth out the distribution perturbations caused by watermarks without significantly compromising the quality of the generated text. The modified text resulting from the smoothing attack remains close to the distribution of text that the original model (without watermark) would have produced. Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
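A toy rendering of the smoothing idea (an illustration, not the paper's attack algorithm) re-samples each token from a mixture of the watermarked model's next-token distribution and a weaker model's distribution, which dampens the per-token watermark bias:

```python
# Illustrative sketch: smoothing a watermarked next-token distribution with a weaker model.
import numpy as np

def smooth_next_token(p_watermarked, p_weak, alpha=0.5, rng=np.random.default_rng(0)):
    """Sample from an alpha-mixture of two next-token distributions over the vocabulary."""
    mix = (1 - alpha) * np.asarray(p_watermarked) + alpha * np.asarray(p_weak)
    mix = mix / mix.sum()
    return rng.choice(len(mix), p=mix)

# Hypothetical 5-token vocabulary: the watermark boosts a "green list" token (index 2).
p_wm = np.array([0.10, 0.15, 0.45, 0.15, 0.15])
p_weak = np.array([0.20, 0.20, 0.20, 0.20, 0.20])
print(smooth_next_token(p_wm, p_weak))
```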
The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design …
The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for approaches that account for multiple facets of real-world privacy risks.
Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which …
Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence …
We analytically investigate how over-parameterization of models in randomized machine learning algorithms impacts the information leakage about their training data. Specifically, we prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets, and explore its dependence on the initialization, width, and depth of fully connected neural networks. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training. Notably, for the special setting of a linearized network, our analysis indicates that the squared gradient norm (and therefore the escalation of privacy loss) is tied directly to the per-layer variance of the initialization distribution. Using this analysis, we demonstrate that the privacy bound improves with increasing depth under certain initializations (LeCun and Xavier), while it degrades with increasing depth under others (He and NTK). Our work reveals a complex interplay between privacy and depth that depends on the chosen initialization distribution. We further prove excess empirical risk bounds under a fixed KL privacy budget, and show that the interplay between the privacy-utility trade-off and depth is similarly affected by the initialization.
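Since the bound is driven by the expected squared gradient norm, one can probe the effect of initialization directly; the sketch below (illustrative only, not the paper's derivation) compares that norm at initialization for two schemes on a toy fully connected network.

```python
# Illustrative sketch: expected squared gradient norm under different initializations.
import torch
import torch.nn as nn

def expected_sq_grad_norm(init_fn, depth=4, width=256, trials=20):
    total = 0.0
    for _ in range(trials):
        layers = []
        for _ in range(depth):
            lin = nn.Linear(width, width)
            init_fn(lin.weight); nn.init.zeros_(lin.bias)
            layers += [lin, nn.Tanh()]
        net = nn.Sequential(*layers, nn.Linear(width, 1))
        x, y = torch.randn(32, width), torch.randn(32, 1)       # random toy batch
        loss = nn.functional.mse_loss(net(x), y)
        grads = torch.autograd.grad(loss, net.parameters())
        total += sum(g.pow(2).sum().item() for g in grads)
    return total / trials

print("LeCun:", expected_sq_grad_norm(lambda w: nn.init.normal_(w, std=(1.0 / w.shape[1]) ** 0.5)))
print("He   :", expected_sq_grad_norm(lambda w: nn.init.kaiming_normal_(w)))
```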
Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when …
Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initializations using $f$-DP. We derive a closed-form expression of the trade-off function for shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effects of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper. This inequality implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.
Membership inference attacks (MIA) aim to detect if a particular data point was used in training a machine learning model. Recent strong attacks have high computational costs and inconsistent performance …
Membership inference attacks (MIA) aim to detect if a particular data point was used in training a machine learning model. Recent strong attacks have high computational costs and inconsistent performance under varying conditions, rendering them unreliable for practical privacy risk assessment. We design a novel, efficient, and robust membership inference attack (RMIA) which accurately differentiates between population data and training data of a model, with minimal computational overhead. We achieve this by a more accurate modeling of the null hypothesis setting in our likelihood ratio tests, and effectively leveraging both reference models and reference data samples from the population. Our algorithm exhibits superior test power (true-positive rate) compared to prior methods, throughout the TPR-FPR curve including at extremely low false-positive rates (as low as 0). Under computation constraints, where only a limited number of pre-trained reference models (as few as 1) are available, and also when we vary other elements of the attack, our method performs exceptionally well, unlike some prior attacks that approach random guessing. RMIA outperforms the prior work in all configurations of the attack setup. RMIA lays the algorithmic groundwork for practical yet accurate and reliable privacy risk analysis in machine learning.
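At a high level, the score compares the target point against random population points; the sketch below is my simplified reading of that idea with placeholder probability estimates, not the authors' implementation.

```python
# Illustrative sketch: population-relative likelihood-ratio membership scoring.
import numpy as np

def membership_score(p_x_theta, p_x, p_z_theta, p_z):
    """Fraction of population points z whose likelihood ratio is dominated by x."""
    ratio_x = p_x_theta / p_x
    ratio_z = np.asarray(p_z_theta) / np.asarray(p_z)
    return np.mean(ratio_x > ratio_z)

# Hypothetical values: p_x_theta from the target model, p_x averaged over reference models,
# and the same quantities for random population samples z.
rng = np.random.default_rng(3)
score = membership_score(p_x_theta=0.9, p_x=0.4,
                         p_z_theta=rng.uniform(0.1, 0.9, 1000),
                         p_z=rng.uniform(0.1, 0.9, 1000))
print("membership score (to be compared against a threshold):", score)
```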
How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, …
How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive hypothesis testing framework that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain why different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a differential analysis between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the Privacy Meter software tool.
The privacy risks of machine learning models is a major concern when training them on sensitive and personal data. We discuss the tradeoffs between data privacy and the remaining goals …
The privacy risks of machine learning models are a major concern when training them on sensitive and personal data. We discuss the tradeoffs between data privacy and the remaining goals of trustworthy machine learning (notably, fairness, robustness, and explainability).
Using machine learning models in critical decision-making processes has given rise to a call for algorithmic transparency. Model explanations, however, might leak information about the sensitive data used to train …
Using machine learning models in critical decision-making processes has given rise to a call for algorithmic transparency. Model explanations, however, might leak information about the sensitive data used to train and explain the model, undermining data privacy. We focus on black-box feature-based model explanations, which locally approximate the model around the point of interest, using potentially sensitive data. We design differentially private local approximation mechanisms, and evaluate their effect on explanation quality. To protect training data, we use existing differentially private learning algorithms. However, to protect the privacy of data which is used during the local approximation, we design an adaptive differentially private algorithm, which finds the minimal privacy budget required to produce accurate explanations. Both empirically and analytically, we evaluate the impact of the randomness needed in differential privacy algorithms on the fidelity of model explanations.
We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to …
We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of machine learning training data. Our attacks are effective across membership inference, attribute inference, and data extraction. For example, our targeted attacks can poison <0.1% of the training dataset to boost the performance of inference attacks by 1 to 2 orders of magnitude. Further, an adversary who controls a significant fraction of the training data (e.g., 50%) can launch untargeted attacks that enable 8x more precise inference on all other users' otherwise-private data points. Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty computation protocols for machine learning, if parties can arbitrarily select their share of training data.
Unlearning algorithms aim to remove deleted data's influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in literature are flawed and don't protect …
Unlearning algorithms aim to remove deleted data's influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in the literature are flawed and do not protect the privacy of deleted records. We show that when users delete their data as a function of published models, records in a database become interdependent. So, even retraining a fresh model after deletion of a record does not ensure its privacy. Secondly, unlearning algorithms that cache partial computations to speed up processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these issues, we propose a sound deletion guarantee and show that the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an accurate, computationally efficient, and secure machine unlearning algorithm based on noisy gradient descent.
The wide adoption and application of Masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities. Prior attempts at measuring …
The wide adoption and application of Masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities. Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack solely on the MLM’s model score. We devise a stronger membership inference attack based on likelihood ratio hypothesis testing that involves an additional reference MLM to more accurately quantify the privacy risks of memorization in MLMs. We show that masked language models are indeed susceptible to likelihood ratio membership inference attacks: Our empirical results, on models trained on medical notes, show that our attack improves the AUC of prior membership inference attacks from 0.66 to an alarmingly high 0.90 level.
What is the information leakage of an iterative learning algorithm about its training data, when the internal state of the algorithm is \emph{not} observable? How much is the contribution of …
What is the information leakage of an iterative learning algorithm about its training data, when the internal state of the algorithm is \emph{not} observable? How much is the contribution of each specific training epoch to the final leakage? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of Renyi differential privacy loss throughout the training process. Our analysis traces a provably tight bound on the Renyi divergence between the pair of probability distributions over parameters of models with neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems. For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility for differential privacy algorithms with a small gradient complexity.
Privacy and transparency are two key foundations of trustworthy machine learning. Model explanations offer insights into a model's decisions on input data, whereas privacy is primarily concerned with protecting information …
Privacy and transparency are two key foundations of trustworthy machine learning. Model explanations offer insights into a model's decisions on input data, whereas privacy is primarily concerned with protecting information about the training data. We analyze connections between model explanations and the leakage of sensitive information about the model's training set. We investigate the privacy risks of feature-based model explanations using membership inference attacks: quantifying how much model predictions plus their explanations leak information about the presence of a datapoint in the training set of a model. We extensively evaluate membership inference attacks based on feature-based model explanations, over a variety of datasets. We show that backpropagation-based explanations can leak a significant amount of information about individual training datapoints. This is because they reveal statistical information about the decision boundaries of the model around an input, which can reveal its membership. We also empirically investigate the trade-off between privacy and explanation quality by studying perturbation-based model explanations.
What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of …
What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of each specific training epoch to the information leakage through the released model? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of R\'enyi differential privacy loss throughout the training process. Our analysis traces a provably \emph{tight} bound on the R\'enyi divergence between the pair of probability distributions over parameters of models trained on neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems (which over-estimate the privacy loss by upper-bounding its total value over all intermediate gradient computations). For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility with a small gradient complexity for noisy gradient descent algorithms.
Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input sample. In particular, an adversary can modify the training data and model parameters to …
Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input samples. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so the model behaves according to the adversary's objective if the input contains the backdoor features, referred to as the backdoor trigger (e.g., a stamp on an image). The poisoned model's behavior on clean data, however, remains unchanged. Many detection algorithms are designed to detect backdoors on input samples or model parameters, through the statistical difference between the latent representations of adversarial and clean input samples in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass the existing detection algorithms, including the state-of-the-art techniques. We design an adaptive adversarial training algorithm that optimizes the original loss function of the model, and also maximizes the indistinguishability of the hidden representations of poisoned data and clean data. This work calls for designing adversary-aware defense mechanisms for backdoor detection.
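The stated idea can be sketched as an extra loss term that pulls the mean latent representations of clean and backdoored batches together during training (an illustration with random placeholder batches, not the authors' code):

```python
# Illustrative sketch: one training step with a latent-indistinguishability penalty.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
        self.head = nn.Linear(128, 10)
    def forward(self, x):
        z = self.features(x)
        return self.head(z), z

model, ce, lam = Net(), nn.CrossEntropyLoss(), 1.0
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical batches: clean data, and backdoored data relabeled to a target class (7).
x_clean, y_clean = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
x_bd, y_bd = torch.randn(32, 1, 28, 28), torch.full((32,), 7)

logits_c, z_c = model(x_clean)
logits_b, z_b = model(x_bd)
loss = ce(logits_c, y_clean) + ce(logits_b, y_bd) \
       + lam * (z_c.mean(0) - z_b.mean(0)).pow(2).sum()   # latent indistinguishability term
loss.backward(); opt.step()
```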
Contact tracing is an essential tool in containing infectious diseases such as COVID-19. Many countries and research groups have launched or announced mobile apps to facilitate contact tracing by recording …
Contact tracing is an essential tool in containing infectious diseases such as COVID-19. Many countries and research groups have launched or announced mobile apps to facilitate contact tracing by recording contacts between users with some privacy considerations. Most of the focus has been on using random tokens, which are exchanged during encounters and stored locally on users' phones. Prior systems allow users to search over released tokens in order to learn if they have recently been in the proximity of a user that has since been diagnosed with the disease. However, prior approaches do not provide end-to-end privacy in the collection and querying of tokens. In particular, these approaches are vulnerable to either linkage attacks by users using token metadata, linkage attacks by the server, or false reporting by users. In this work, we introduce Epione, a lightweight system for contact tracing with strong privacy protections. Epione alerts users directly if any of their contacts have been diagnosed with the disease, while protecting the privacy of users' contacts from both central services and other users, and provides protection against false reporting. As a key building block, we present a new cryptographic tool for secure two-party private set intersection cardinality (PSI-CA), which allows two parties, each holding a set of items, to learn the intersection size of two private sets without revealing intersection items. We specifically tailor it to the case of large-scale contact tracing where clients have small input sets and the server's database of tokens is much larger.
Optimizing prediction accuracy can come at the expense of fairness. Towards minimizing discrimination against a group, fair machine learning algorithms strive to equalize the behavior of a model across different groups, by imposing a fairness constraint on models. However, we show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness. We analyze data poisoning attacks against group-based fair machine learning, with a focus on equalized odds. An adversary who can control sampling or labeling for a fraction of the training data can reduce test accuracy significantly beyond what is achievable against unconstrained models. Adversarial sampling and adversarial labeling attacks can also worsen the model's fairness gap on test data, even though the model satisfies the fairness constraint on training data. We analyze the robustness of fair machine learning through an empirical evaluation of attacks on multiple algorithms and benchmark datasets.
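A toy version of the adversarial labeling surface described above might look like the sketch below: the adversary controls the labels of a random fraction of the training set and flips those belonging to the smaller group, and the fairness metric is the equalized-odds gap. The heuristic and parameter names are illustrative assumptions, not the paper's attack.

import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    # Max gap in true/false positive rates between two groups (binary labels and groups).
    gaps = []
    for y0 in (0, 1):
        rates = [y_pred[(group == g) & (y_true == y0)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

def adversarial_labeling(X, y, group, frac=0.1, rng=None):
    # The adversary controls the labels of a random `frac` of training points and flips
    # those in the smaller (unprivileged) group, amplifying the bias the constraint must fight.
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(y)
    controlled = rng.choice(n, size=int(frac * n), replace=False)
    minority = 0 if (group == 0).sum() < (group == 1).sum() else 1
    y_poisoned = y.copy()
    flip = [i for i in controlled if group[i] == minority]
    y_poisoned[flip] = 1 - y_poisoned[flip]
    return y_poisoned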
When building machine learning models using sensitive data, organizations should ensure that the data processed in such systems is adequately protected. For projects involving machine learning on personal data, Article 35 of the GDPR mandates a Data Protection Impact Assessment (DPIA). In addition to the threat of illegitimate access to data through security breaches, machine learning models pose an additional privacy risk to the data by indirectly revealing information about it through the model's predictions and parameters. Guidance released by the Information Commissioner's Office (UK) and the National Institute of Standards and Technology (US) emphasizes this threat to data from models and recommends that organizations account for and estimate these risks in order to comply with data protection regulations. Hence, there is an immediate need for a tool that can quantify the privacy risk to data from models. In this paper, we focus on this indirect leakage about training data from machine learning models. We present ML Privacy Meter, a tool that can quantify the privacy risk to data from models through state-of-the-art membership inference attack techniques. We discuss how this tool can help practitioners comply with data protection regulations when deploying machine learning models.
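The kind of analysis such a tool automates can be sketched with a minimal loss-threshold membership inference audit like the one below. This is an illustrative stand-in, not ML Privacy Meter's actual API or attack suite.

import numpy as np

def loss_threshold_audit(member_losses, nonmember_losses):
    # Sweep a loss threshold and report the best attack accuracy as a rough leakage measure.
    losses = np.concatenate([np.asarray(member_losses), np.asarray(nonmember_losses)])
    labels = np.concatenate([np.ones(len(member_losses)), np.zeros(len(nonmember_losses))])
    best_acc = 0.0
    for t in np.quantile(losses, np.linspace(0.0, 1.0, 101)):
        preds = (losses <= t).astype(float)      # low loss -> predicted member
        best_acc = max(best_acc, float((preds == labels).mean()))
    return best_acc   # ~0.5 means random guessing; higher values indicate leakage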
ML-as-a-service is gaining popularity, where a cloud server hosts a trained model and offers a prediction (inference) service to users. In this setting, our objective is to protect the confidentiality of both the users' input queries and the model parameters at the server, with modest computation and communication overhead. Prior solutions primarily propose fine-tuning cryptographic methods to make them efficient for known, fixed model architectures. The drawback of this line of approach is that the model itself is never designed to operate with existing efficient cryptographic computations. We observe that the network architecture, internal functions, and parameters of a model, which are all chosen during training, significantly influence the computation and communication overhead of a cryptographic method during inference. Based on this observation, we propose SOTERIA, a training method to construct model architectures that are by-design efficient for private inference. We use neural architecture search algorithms with the dual objective of optimizing the accuracy of the model and the overhead of using cryptographic primitives for secure inference. Given the flexibility of modifying a model during training, we find accurate models that are also efficient for private computation. We select garbled circuits as our underlying cryptographic primitive, due to their expressiveness and efficiency, but this approach can be extended to hybrid multi-party computation settings. We empirically evaluate SOTERIA on the MNIST and CIFAR10 datasets, and compare it with prior work. Our results confirm that SOTERIA is indeed effective in balancing performance and accuracy.
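The dual objective can be sketched as a simple scored search over candidate architectures, trading validation accuracy against an estimated cryptographic cost (e.g., a count of non-linear gates). The random-search loop, cost model, and weight lam below are illustrative assumptions; the paper uses a proper neural architecture search algorithm.

import random

def search(candidate_archs, train_and_eval, gc_cost_estimate, lam=1e-9, budget=50):
    # Toy random search under the dual objective: maximize accuracy while penalizing the
    # estimated garbled-circuit cost of evaluating the architecture obliviously.
    best, best_score = None, float("-inf")
    for arch in random.sample(candidate_archs, min(budget, len(candidate_archs))):
        acc = train_and_eval(arch)                      # accuracy on a validation split
        score = acc - lam * gc_cost_estimate(arch)      # penalize cryptographic overhead
        if score > best_score:
            best, best_score = arch, score
    return best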
Deep learning models leak significant amounts of information about their training datasets. Previous work has investigated training models with differential privacy (DP) guarantees by adding DP noise to the gradients. However, such solutions (specifically, DPSGD) result in large degradations in the accuracy of the trained models. In this paper, we aim to train deep learning models with DP guarantees while preserving model accuracy much better than previous work. Our key technique is to encode gradients to map them to a smaller vector space, thereby enabling us to obtain DP guarantees for different noise distributions. This allows us to investigate and choose the noise distributions that best preserve model accuracy for a target privacy budget. We also take advantage of the post-processing property of differential privacy by introducing the idea of denoising, which further improves the utility of the trained models without degrading their DP guarantees. We show that our mechanism outperforms the state-of-the-art DPSGD; for instance, for the same model accuracy of $96.1\%$ on MNIST, our technique results in a privacy bound of $\epsilon=3.2$ compared to $\epsilon=6$ for DPSGD, which is a significant improvement.
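A minimal sketch of the encode-noise-denoise idea, under the assumption of a fixed random projection as the encoder and soft-thresholding as the post-processing denoiser; this is not the paper's exact mechanism or noise distribution.

import numpy as np

def encode_noise_denoise(grad, proj, clip=1.0, sigma=1.0, shrink=0.0, rng=None):
    # (1) encode the gradient into a smaller space via a fixed random projection,
    # (2) clip and add noise there, (3) decode and optionally denoise; denoising is
    # post-processing, so it does not change the DP guarantee.
    rng = np.random.default_rng(0) if rng is None else rng
    z = proj @ grad                                   # encode: R^d -> R^k, k << d
    z = z * min(1.0, clip / max(np.linalg.norm(z), 1e-12))   # bound sensitivity
    z = z + rng.normal(0.0, sigma * clip, size=z.shape)
    g_hat = proj.T @ z                                # decode back to R^d
    return np.sign(g_hat) * np.maximum(np.abs(g_hat) - shrink, 0.0)   # soft-threshold denoise

# Usage: fix one projection for the whole training run.
d, k = 10_000, 256
proj = np.random.default_rng(1).normal(size=(k, d)) / np.sqrt(k)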
Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to equalize their behavior across different groups. This can, in turn, change the influence of individual training data points on the fair model in a disproportionate way. We study how this changes the information leakage of the model about its training data. We analyze the privacy risks of group fairness (e.g., equalized odds) through the lens of membership inference attacks: inferring whether a data point was used to train a model. We show that fairness comes at the cost of privacy, and this cost is not distributed equally: the information leakage of fair models increases significantly on the unprivileged subgroups, which are exactly the ones for whom we need fair learning. We show that the more biased the training data is, the higher the privacy cost of achieving fairness for the unprivileged subgroups will be. We provide a comprehensive empirical analysis for general machine learning algorithms.
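A per-subgroup leakage measurement of the kind used in this analysis can be sketched as below, with a simple loss-threshold attack standing in for the stronger attacks evaluated in the paper; the per-group median threshold is an illustrative choice.

import numpy as np

def subgroup_leakage(losses, is_member, group):
    # Measure membership leakage separately per subgroup; all arguments are NumPy arrays
    # of equal length (per-example losses, true membership bits, and subgroup labels).
    report = {}
    for g in np.unique(group):
        m = group == g
        t = np.median(losses[m])                      # a crude per-group threshold
        guess = losses[m] <= t                        # low loss -> guess 'member'
        report[g] = float((guess == is_member[m]).mean())
    return report   # accuracies near 0.5 mean little leakage for that subgroup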
Black-box machine learning models are used in critical decision-making domains, giving rise to several calls for more algorithmic transparency. The drawback is that model explanations can leak information about the training data and the explanation data used to generate them, thus undermining data privacy. To address this issue, we propose differentially private algorithms to construct feature-based model explanations. We design an adaptive differentially private gradient descent algorithm that finds the minimal privacy budget required to produce accurate explanations. It reduces the overall privacy loss on explanation data by adaptively reusing past differentially private explanations. It also amplifies the privacy guarantees with respect to the training data. We evaluate the implications of differentially private models and our privacy mechanisms on the quality of model explanations.
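For concreteness, the sketch below shows a Gaussian-mechanism release of a gradient (saliency) explanation, together with a hypothetical adaptive schedule that spends more privacy budget only when the cheapest release is unstable. The stability test and noise schedule are assumptions, not the paper's algorithm.

import numpy as np

def dp_gradient_explanation(grad_fn, x, sigma, sensitivity, rng):
    # Gaussian-mechanism release of a gradient explanation; assumes the true input
    # gradient has bounded L2 sensitivity with respect to the protected data.
    g = grad_fn(x)                                    # d(model output) / d(input features)
    return g + rng.normal(0.0, sigma * sensitivity, size=g.shape)

def adaptive_explanation(grad_fn, x, sensitivity, sigmas=(4.0, 2.0, 1.0), k=5, seed=0):
    # Hypothetical schedule: try the cheapest (noisiest) release first, and only spend
    # more budget if the top-k features disagree across two independent draws.
    rng = np.random.default_rng(seed)
    for sigma in sigmas:                              # decreasing noise = increasing budget
        e1 = dp_gradient_explanation(grad_fn, x, sigma, sensitivity, rng)
        e2 = dp_gradient_explanation(grad_fn, x, sigma, sensitivity, rng)
        if set(np.argsort(-np.abs(e1))[:k]) == set(np.argsort(-np.abs(e2))[:k]):
            return e1                                 # stable enough at this budget
    return e1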
The arms race between attacks and defenses for machine learning models has come to the forefront in recent years, in both the security community and the privacy community. However, one big limitation of previous research is that the security domain and the privacy domain have typically been considered separately. It is thus unclear whether the defense methods in one domain will have any unexpected impact on the other domain. In this paper, we take a step towards resolving this limitation by combining the two domains. In particular, we measure the success of membership inference attacks against six state-of-the-art defense methods that mitigate the risk of adversarial examples (i.e., evasion attacks). Membership inference attacks determine whether or not an individual data record has been part of a model's training set. The accuracy of such attacks reflects the information leakage of training algorithms about individual members of the training set. Defense methods against adversarial examples influence the model's decision boundaries such that model predictions remain unchanged in a small area around each input. However, this objective is optimized on the training data, so individual data records in the training set have a significant influence on robust models, which makes the models more vulnerable to inference attacks. To perform the membership inference attacks, we leverage existing inference methods that exploit model predictions. We also propose two new inference methods that exploit structural properties of robust models on adversarially perturbed data. Our experimental evaluation demonstrates that, compared with the natural (undefended) training approach, adversarial defense methods can indeed increase the target model's risk against membership inference attacks.
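One attack feature in the spirit of the structural attacks described above can be sketched as follows: robust models tend to keep their training points correctly and confidently classified inside the perturbation ball, so the confidence on an FGSM-perturbed copy of the input is a membership signal. The perturbation budget and the use of FGSM here are illustrative assumptions, not the paper's exact attack.

import torch
import torch.nn.functional as F

def perturbation_attack_score(model, x, y, eps=8/255):
    # Higher confidence on the true label under perturbation -> more likely a member.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()   # one-step FGSM perturbation
    with torch.no_grad():
        probs = F.softmax(model(x_adv), dim=1)
        conf_adv = probs.gather(1, y.view(-1, 1)).squeeze(1)
    return conf_adv   # a threshold calibrated on shadow data would turn this into a decision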
Machine learning as a service (MLaaS) and algorithm marketplaces are on the rise. Data holders can easily train complex models on their data using third-party learning code. Training accurate ML models requires massive labeled data and advanced learning algorithms. The resulting models are considered the intellectual property of the model owners, and their copyright should be protected. Also, MLaaS needs to be trusted not to embed secret information about the training data into the model that could later be retrieved once the model is deployed. In this paper, we present \emph{membership encoding} for training deep neural networks, which encodes the membership information, i.e., whether a data point is used for training, for a subset of the training data. Membership encoding has several applications in different scenarios, including robust watermarking for model copyright protection, as well as risk analysis of stealthy data-embedding privacy attacks. Our encoding algorithm can determine the membership of significantly redacted data points, and is also robust to model compression and fine-tuning. It also enables encoding a significant fraction of the training set, with a negligible drop in the model's prediction accuracy.
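A hypothetical joint training step for such an encoding could look like the sketch below, where a small decoder learns to read a membership bit from the model's own outputs on marked points; the decoder interface and weighting are assumptions, not the paper's encoding algorithm.

import torch
import torch.nn.functional as F

def encoding_step(model, decoder, opt, x, y, marked, lam=0.5):
    # `opt` is assumed to cover both the model's and the decoder's parameters;
    # `marked` is a 0/1 tensor flagging the points whose membership is being encoded.
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    bit_logits = decoder(F.softmax(logits, dim=1))        # decoder sees only model outputs
    enc_loss = F.binary_cross_entropy_with_logits(bit_logits.squeeze(1), marked.float())
    loss = task_loss + lam * enc_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()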
Can an adversary exploit model explanations to infer sensitive information about the model's training set? To investigate this question, we first focus on membership inference attacks: given a data point and a model explanation, the attacker's goal is to decide whether or not the point belongs to the training data. We study this problem for two popular transparency methods: gradient-based attribution methods and record-based influence measures. We develop membership inference attacks based on these model explanations, and extensively test them on a variety of datasets. For gradient-based methods, we show that the explanations can leak a significant amount of information about the individual data points in the training set, much beyond what is leaked through the predicted labels. We also show that record-based measures can be effectively, and even more significantly, exploited for membership inference attacks. More importantly, we design reconstruction attacks against this class of model explanations. We demonstrate that they can be exploited to recover significant parts of the training set. Finally, our results indicate that minorities and outliers are more vulnerable to these types of attacks than the rest of the population. Thus, there is a significant disparity in the privacy risks of model explanations across different groups.
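For the gradient-based case, one simple attack of the kind evaluated above uses a statistic of the attribution vector as the membership signal, with a threshold calibrated on shadow data; the choice of statistic (variance) and the calibration step are illustrative assumptions.

import numpy as np

def explanation_mia_scores(attributions):
    # Membership score per example from gradient-based explanations: the variance of each
    # attribution vector is the signal; a threshold calibrated on shadow data (not shown)
    # turns these scores into member / non-member decisions.
    return np.var(np.asarray(attributions), axis=1)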
Privacy and transparency are two key foundations of trustworthy machine learning. Model explanations offer insights into a model's decisions on input data, whereas privacy is primarily concerned with protecting information about the training data. We analyze the connections between model explanations and the leakage of sensitive information about the model's training set. We investigate the privacy risks of feature-based model explanations using membership inference attacks: quantifying how much a model's predictions, together with their explanations, leak information about the presence of a data point in the model's training set. We extensively evaluate membership inference attacks based on feature-based model explanations over a variety of datasets. We show that backpropagation-based explanations can leak a significant amount of information about individual training data points. This is because they reveal statistical information about the decision boundaries of the model around an input, which can reveal its membership. We also empirically investigate the trade-off between privacy and explanation quality by studying perturbation-based model explanations.
Models leak information about their training data. This enables attackers to infer sensitive information about their training sets, notably to determine if a data sample was part of the model's training set. Existing work empirically shows the possibility of these tracing (membership inference) attacks against complex models with a large number of parameters. However, the attack results depend on the specific training data, can be obtained only after the tedious process of training the model and performing the attack, and lack any measure of the confidence and unused potential power of the attack. A model designer is interested in identifying which model structures leak more information, how adding new parameters to the model increases its privacy risk, and what can be gained by adding new data points to decrease the overall information leakage. The privacy analysis should also enable designing the most powerful inference attack.
In this paper, we design a theoretical framework to analyze the maximum power of tracing attacks against high-dimensional models, with a focus on probabilistic graphical models. We provide a tight upper bound on the power (true positive rate) of these attacks, with respect to their error (false positive rate). The bound, as it should be, is independent of the knowledge and algorithm of any specific attack, as well as of the values of particular samples in the training set. It provides a measure of the potential leakage of a model given its structure, as a function of the structure's complexity and the size of the training set.
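The general hypothesis-testing formulation underlying such bounds can be written as follows (this is the standard setup, not the paper's specific bound): the attacker tests whether the released model parameters $\theta$ were produced with or without the target record $z$,
\[
H_0:\ \theta \sim \Pr(\theta \mid D), \qquad H_1:\ \theta \sim \Pr(\theta \mid D \cup \{z\}),
\]
and the maximum power achievable at false positive rate $\alpha$ is
\[
\beta(\alpha) \;=\; \sup_{S \,:\, \Pr_{H_0}[\theta \in S] \le \alpha} \ \Pr_{H_1}[\theta \in S].
\]
By the Neyman-Pearson lemma, the optimal attack thresholds the likelihood ratio $\Pr(\theta \mid D \cup \{z\}) / \Pr(\theta \mid D)$, so any bound on how much a single record can shift this ratio translates into an upper bound on $\beta(\alpha)$.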
Models leak information about their training data. This enables attackers to infer sensitive information about their training sets, notably to determine if a data sample was part of the model's training set. Existing work empirically shows the possibility of these membership inference (tracing) attacks against complex deep learning models. However, the attack results depend on the specific training data, can be obtained only after the tedious process of training the model and performing the attack, and lack any measure of the confidence and unused potential power of the attack.
In this paper, we theoretically analyze the maximum power of tracing attacks against high-dimensional graphical models, with the focus on Bayesian networks. We provide a tight upper bound on the power (true positive rate) of these attacks, with respect to their error (false positive rate), for a given model structure even before learning its parameters. As it should be, the bound is independent of the knowledge and algorithm of any specific attack. It can help in identifying which model structures leak more information, how adding new parameters to the model increases its privacy risk, and what can be gained by adding new data points to decrease the overall information leakage. It provides a measure of the potential leakage of a model given its structure, as a function of the model complexity and the size of the training set.
Collaborative (federated) learning enables multiple parties to train a model without sharing their private data, but through repeated sharing of the parameters of their local models. Despite its advantages, this approach has many known privacy and security weaknesses and a significant performance overhead, in addition to being limited to models with homogeneous architectures. Shared parameters leak a significant amount of information about the local (and supposedly private) datasets. Besides, federated learning is severely vulnerable to poisoning attacks, where some participants can adversarially influence the aggregate parameters. Large models, with high-dimensional parameter vectors, are particularly susceptible to privacy and security attacks: the curse of dimensionality in federated learning. We argue that sharing parameters is the most naive way of exchanging information in collaborative learning, as it exposes the entire internal state of the model to inference attacks and maximizes the model's malleability to stealthy poisoning attacks. We propose Cronus, a robust collaborative machine learning framework. The simple yet effective idea behind Cronus is to control, unify, and significantly reduce the dimensions of the information exchanged between parties, through robust knowledge transfer between their black-box local models. We evaluate all existing federated learning algorithms against poisoning attacks, and show that Cronus is the only secure method, due to its tight robustness guarantee. Treating local models as black boxes reduces the information leakage through the models, and enables us to use existing privacy-preserving algorithms that mitigate the risk of information leakage through the model's output (predictions). Cronus also has a significantly lower sample complexity than federated learning, and it does not tie its security to the number of participants.
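As a sketch of the knowledge-transfer round, each party sends only its predictions on a public dataset and the server aggregates them robustly; the coordinate-wise median below is a stand-in for the framework's robust mean estimator, and the array shapes are illustrative assumptions.

import numpy as np

def robust_aggregate(predictions):
    # `predictions` has shape (n_parties, n_public, n_classes): each party's soft labels
    # on the shared public data. Aggregate with a coordinate-wise median so that a
    # minority of poisoned parties cannot arbitrarily shift the result.
    agg = np.median(predictions, axis=0)
    return agg / agg.sum(axis=1, keepdims=True)       # renormalize to a distribution

# Each round: parties send model(x_public) only -- never parameters -- and then continue
# local training by distilling on the aggregated soft labels returned by the server.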