Estimation in exponential family regression based on linked data contaminated by mismatch error

Type: Article

Publication Date: 2023-01-01

Citations: 1

DOI: https://doi.org/10.4310/22-sii726

Abstract

Identification of matching records in multiple files can be a challenging and error-prone task. Linkage error can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis, with the response variable in one file and the covariates in a second file, from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we extend this line of research to exponential family responses under the assumption of a small to moderate number of mismatches. We propose a method that combines observation-specific offsets, which account for potential mismatches, with $\ell_1$-penalization, and we discuss its statistical properties. We also present sufficient conditions for recovering the correct correspondence between covariates and responses when the regression parameter is known. The proposed approach is compared to established baselines, namely the methods of Lahiri and Larsen and of Chambers, both theoretically and empirically using synthetic and real data. The results indicate that substantial improvements over those methods can be achieved even if only limited information about the linkage process is available.
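The abstract describes the estimator only at a high level: observation-specific offsets absorb potential mismatches, and an $\ell_1$ penalty keeps most offsets at zero. The Python sketch below illustrates that idea under assumptions that are not taken from the paper: a Poisson response (one member of the exponential family), the objective $\sum_i [\exp(\eta_i) - y_i\eta_i] + \lambda\|\mu\|_1$ with $\eta_i = x_i^\top\beta + \mu_i$ and no rescaling of the offsets, an unpenalized $\beta$, and a proximal-gradient (ISTA-style) solver with backtracking. The function names `soft_threshold`, `poisson_loss` and `fit_poisson_offsets` and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch, not the authors' code. Assumptions: Poisson response,
# unscaled offsets mu, unpenalized beta, proximal gradient with backtracking.


def soft_threshold(v, thresh):
    # Proximal operator of thresh * ||v||_1: shrinks each entry toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def poisson_loss(X, y, beta, mu):
    # Negative Poisson log-likelihood (constants dropped), eta = X beta + mu.
    eta = X @ beta + mu
    return np.sum(np.exp(eta) - y * eta)


def fit_poisson_offsets(X, y, lam, max_iter=20000, tol=1e-10):
    """Minimise poisson_loss(X, y, beta, mu) + lam * ||mu||_1.

    Records with mu_i != 0 are the ones the fit treats as suspected mismatches.
    """
    n, p = X.shape
    beta, mu = np.zeros(p), np.zeros(n)
    step = 1.0
    for _ in range(max_iter):
        f0 = poisson_loss(X, y, beta, mu)
        resid = np.exp(X @ beta + mu) - y  # gradient of the loss w.r.t. eta
        g_beta, g_mu = X.T @ resid, resid
        while True:
            # Plain gradient step on beta, proximal (soft-threshold) step on mu.
            beta_new = beta - step * g_beta
            mu_new = soft_threshold(mu - step * g_mu, step * lam)
            d_beta, d_mu = beta_new - beta, mu_new - mu
            with np.errstate(over="ignore"):
                f_new = poisson_loss(X, y, beta_new, mu_new)
            # Accept once the quadratic upper bound holds; the tiny slack
            # guards against floating-point round-off near convergence.
            bound = (f0 + g_beta @ d_beta + g_mu @ d_mu
                     + (d_beta @ d_beta + d_mu @ d_mu) / (2.0 * step))
            if f_new <= bound + 1e-12 * (1.0 + abs(f0)):
                break
            step *= 0.5  # otherwise shrink the step and retry
        beta, mu = beta_new, mu_new
        if max(np.max(np.abs(d_beta)), np.max(np.abs(d_mu))) < tol:
            break
    return beta, mu


# Hypothetical demo data: noiseless Poisson means, with the response of
# record 10 replaced by a much larger value to mimic a mismatched pair.
x = np.linspace(-1.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
beta_true = np.array([1.0, 0.5])
y = np.exp(X @ beta_true)
y[10] += 50.0

beta_hat, mu_hat = fit_poisson_offsets(X, y, lam=5.0)
print("beta_hat:", np.round(beta_hat, 3))
print("records with nonzero offsets:", np.flatnonzero(mu_hat))
```

In this sketch the offset of record $i$ stays exactly at zero when $|\exp(x_i^\top\hat\beta) - y_i| \le \lambda$ at the solution, so $\lambda$ acts as a threshold on raw count residuals; the paper's own scaling of the offsets and choice of tuning parameter may differ.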

Locations

  • Statistics and Its Interface
  • arXiv (Cornell University)

Similar Works

  • Estimation in exponential family regression based on linked data contaminated by mismatch error (2020). Zhenbang Wang, Emanuel Ben‐David, Martin Slawski
  • A General Framework for Regression with Mismatched Data Based on Mixture Modeling (2023). Martin Slawski, Brady T. West, Priyanjali Bukke, Guoqing Diao, Zhenbang Wang, Emanuel Ben‐David
  • Regression with linked datasets subject to linkage error (2021). Zhenbang Wang, Emanuel Ben‐David, Guoqing Diao, Martin Slawski
  • Linear Regression With Nested Errors Using Probability-Linked Data (2014). Klairung Samart, Ray Chambers
  • Analysis of Linked Files: A Missing Data Perspective (2024). Gauri Kamat, Roee Gutman
  • A general framework for regression with mismatched data based on mixture modelling (2024). Martin Slawski, Brady T. West, Priyanjali Bukke, Zhenbang Wang, Guoqing Diao, Emanuel Ben‐David
  • Regression analysis for longitudinally linked data (2010). Gunky Kim, Ray Chambers
  • Pairwise Estimating Equations for the Analysis of Linked Data (2018). Abel Dasylva
  • Domain estimation under informative linkage (2019). Ray Chambers, Nicola Salvati, Enrico Fabrizi, Andréa Diniz da Silva
  • Analysis of probabilistically linked data (2011). Klairung Samart
  • Analysis of Correlated Data with Measurement Error in Responses or Covariates (2010). Zhijian Chen
  • Linkage-Data Linear Regression (2020). Li‐Chun Zhang, Tiziana Tuoto
  • Evaluating latent class models with conditional dependence in record linkage (2014). Joanne Daggy, Huiping Xu, Siu Hui, Shaun J. Grannis
  • Consistent estimation of linear regression models using matched data (2018). Masayuki Hirukawa, Artem Prokhorov
  • pldamixture: Post-Linkage Data Analysis Based on Mixture Modelling (2024). Priyanjali Bukke, Zhenbang Wang, Martin Slawski, Brady T. West, Emanuel Ben‐David, Guoqing Diao
  • On secondary analysis of datasets that cannot be linked without errors (2019). Li‐Chun Zhang
  • Regression Modeling and File Matching Using Possibly Erroneous Matching Variables (2016). Nicole M. Dalzell, Jerome P. Reiter
  • Regression Modeling and File Matching Using Possibly Erroneous Matching Variables (2018). Nicole M. Dalzell, Jerome P. Reiter
