Furio Camillo
University of Bologna
Keywords: surveys, CAWI, CATI, causal inference, self-selection bias, data mining
The survey method used may have influenced the answers given by respondents in a social or market research study. In other words, because the information was collected through different survey tools (CAWI and CATI), those tools may have introduced distortions that are not random. For example, the presence or absence of interviewers is an important determinant of the quality of the
information collected, since they may give explanations or supply further information during
the interview.
Given the complexity of the research topic, it has become important
to determine whether there are significant differences between the answers given by those who filled
in an online questionnaire and those who answered during a conventional telephone interview.
A new data-driven method for evaluating the error deriving from a differentiated treatment (CATI or
CAWI) was developed, following an approach based on the standard
notions of so-called “causal inference”. The aim is to determine whether the kind of
treatment received, T, significantly influences the target variable Y.
More specifically, the treatment T has a causal effect on the target variable Y for the i-th
individual if the outcome under treatment (T1) differs from the outcome in the absence of
treatment (or under a different treatment, T0).
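In symbols, this definition can be written as follows. The notation Y_i(·) for the outcome of individual i under a given treatment and the symbol τ_i for the individual effect are assumptions introduced here for illustration; they do not appear in the original text.

```latex
\tau_i = Y_i(T_1) - Y_i(T_0), \qquad
\text{$T$ has a causal effect on $Y$ for individual $i$} \iff \tau_i \neq 0 .
```

Since each respondent is interviewed with only one technique, only one of the two outcomes is observed for any individual, which is why the comparison has to be made between groups of comparable units.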
Many telephone and web studies have proposed the propensity score approach for the
post-survey impact evaluation of different interview techniques. In particular, the propensity score
has been used in post-survey weighting procedures to decrease biases (Lee and Valliant, 2008).
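To give an idea of what propensity score weighting looks like in practice, here is a minimal Python sketch. It is not the procedure of Lee and Valliant (2008): it is a generic inverse-probability weighting, assuming a single categorical covariate, a binary indicator (1 = CAWI, 0 = CATI) and a propensity estimated as the share of CAWI respondents within each covariate cell. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def propensity_by_cell(covariates, treated):
    """Estimate P(T = 1 | X = x) as the share of treated units in each cell x."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [number treated, number total]
    for x, t in zip(covariates, treated):
        counts[x][0] += t
        counts[x][1] += 1
    return {cell: n_t / n for cell, (n_t, n) in counts.items()}


def ipw_weights(covariates, treated):
    """Inverse-probability weights: 1/e(x) for treated units, 1/(1 - e(x)) otherwise."""
    e = propensity_by_cell(covariates, treated)
    weights = []
    for x, t in zip(covariates, treated):
        # Probability of the treatment the unit actually received; it is never
        # zero, because the unit itself was observed in that cell and group.
        p = e[x] if t == 1 else 1.0 - e[x]
        weights.append(1.0 / p)
    return weights


# Toy data (assumed): age class of six respondents, 1 = CAWI, 0 = CATI.
age = ["young", "young", "young", "old", "old", "old"]
cawi = [1, 1, 0, 1, 0, 0]
weights = ipw_weights(age, cawi)
```

Respondents who used the technique that is rare in their cell (for example, the one young CATI respondent) receive a larger weight, so that the two weighted groups resemble each other with respect to the covariate.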
We implemented a data-driven system for monitoring the selection bias due to different data
collection techniques. The system uses an innovative approach (Camillo and
D’Attoma, 2008) based on a data transformation that makes it possible to measure and test the
presence of selection bias in an automatic and multivariate way. Specifically, it involves the
construction of a multi-dimensional conditional space of the X matrix in which the bias
associated with the treatment assignment has been eliminated. To do this, we propose the
use of a partial dependence analysis of the X-space as a tool for investigating the dependence
relationship between a set of observable pre-treatment categorical covariates X and a
treatment indicator variable T, in order to obtain a measure of bias according to their
dependence structure. The measure of selection bias is then expressed in terms of the inertia, due
to the dependence between X and T, that has been eliminated. Given the measure of selection
bias, Camillo and D’Attoma propose a multivariate test of imbalance to check whether the
detected bias is significant, using the asymptotic distribution of the inertia due to T
(Estadella et al., 2005) and preserving the multivariate nature of the data. Further, in our
strategy we propose the use of a clustering procedure as a tool to find groups of comparable
units on which to estimate local causal effects, and the use of the multivariate test of imbalance
as a stopping rule for choosing the best cluster solution. The method is non-parametric: it
does not call for modeling the data on the basis of some underlying theory or assumption about the
selection process, but instead calls for using the existing variability within the data and
letting the data speak.
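As an illustration of the imbalance measure described above, the following Python sketch computes a between-group inertia for a set of categorical covariates X and a treatment indicator T. The formula used here (for each covariate, the sum over treatment groups and categories of the squared cell count divided by the product of group size and category size; then the average over covariates, minus one) is an assumption about the form of the measure, as are the function name and the toy data. The significance test based on the asymptotic distribution of the inertia is not reproduced.

```python
from collections import Counter


def between_group_inertia(covariates, treatment):
    """Inertia of categorical covariates X explained by the treatment groups T.

    covariates: list of rows, one per unit, each a list of category labels
                (one label per covariate).
    treatment:  list of treatment labels, one per unit.
    Returns 0 when every covariate has the same distribution in all groups.
    """
    q = len(covariates[0])              # number of covariates
    group_size = Counter(treatment)     # units per treatment group
    total = 0.0
    for col in range(q):
        # Units per category of this covariate.
        cat_size = Counter(row[col] for row in covariates)
        # Units per (treatment group, category) cell.
        cell = Counter((t, row[col]) for row, t in zip(covariates, treatment))
        for (t, c), b in cell.items():
            total += b * b / (group_size[t] * cat_size[c])
    return total / q - 1.0


# Toy data (assumed): sex is balanced across techniques, age is not.
X = [["m", "young"], ["f", "young"], ["m", "old"], ["f", "old"]]
T = ["CATI", "CATI", "CAWI", "CAWI"]
imbalance = between_group_inertia(X, T)
```

In this toy example sex contributes nothing to the measure, while age, which is perfectly separated between CATI and CAWI respondents, contributes its maximum; the result is the average of the two contributions. Within a clustering strategy, the same quantity could be recomputed inside each cluster to check whether the units in that cluster are comparable.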