A Proposal for a Method to Proactively Complete Missing Data

Objective:

The objective of deliberately generating survey data with missing values is to keep the interviewee focused and cooperative, especially when surveys are long and exhausting. An alert, cooperative respondent provides more reliable and less biased answers. Because the missing data are created at random, the design reduces the cost of keeping interviewees alert and responsive. The savings come from questionnaires that are up to twenty-five percent shorter, while the overall quality of the data is preserved. The methodology is intended for lengthy surveys in which the quality of the answers gradually declines as the interview proceeds.
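As a rough illustration of where a saving of this size can come from, suppose a questionnaire has k anchor questions that every respondent answers and m other questions, each shown independently with probability p (the anchor design is described under The Process). The expected length is then k + p·m out of k + m questions, so the expected saving is (1 − p)·m / (k + m). The function name `expected_saving` and the numbers below are hypothetical: with 10 anchor questions, 30 other questions and p = 2/3, the saving is exactly one quarter.

```python
from fractions import Fraction


def expected_saving(n_anchor, n_optional, p_show):
    """Expected share of questions dropped from each questionnaire.

    Hypothetical helper: anchor questions are always asked, and each optional
    question is asked with probability p_show, so the expected length is
    n_anchor + p_show * n_optional out of n_anchor + n_optional questions.
    """
    full_length = n_anchor + n_optional
    expected_length = n_anchor + p_show * n_optional
    return 1 - expected_length / full_length


# Hypothetical design: 10 anchor questions, 30 optional ones, each shown 2 times in 3.
saving = expected_saving(10, 30, Fraction(2, 3))
print(saving)  # 1/4
```

Using `Fraction` keeps the arithmetic exact; plain floats work as well but print a value such as 0.25000000000000006.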

The Process:

To preserve the data by means of multiple imputation, the questionnaire must be designed for it in advance. Questions appear or do not appear at random across the different versions of the questionnaire. When the responses are coded into a dataset, the questions that were not presented appear as missing data. These missing values are then imputed by random draws from the distribution observed in the non-missing data, so that the distribution of the data is preserved. In other words, the completion retains the natural dispersion of the data instead of concentrating the imputed values at a single point.

The proposed design is built on anchor questions for each topic. These questions appear in every questionnaire and must be answered, while the other questions appear at random. Note that this planned missingness does not include items left unanswered because the respondent refused or did not know the answer. There are techniques for handling such missing values as well, but they are applied after the data have been collected. Conceptually, then, the key questions of each topic are at the center of the questionnaire, while the other questions are presented only at random. The anchor questions also offer an additional measure for evaluating the quality and efficiency of the methodology.
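The design and the random completion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the item names, the inclusion probability `p_show` and the helper functions (`assign_forms`, `code_dataset`, `impute_once`, `pooled_mean`) are hypothetical. Each missing answer is filled with a random draw from the observed answers to the same item (a hot-deck draw). Before drawing, the observed values are resampled with replacement (an approximate Bayesian bootstrap), so that repeated imputations also reflect uncertainty about the distribution. The completed datasets are then combined with Rubin's rules, the standard way of pooling an estimate across multiple imputations.

```python
import random
import statistics


def assign_forms(n_respondents, anchor_items, optional_items, p_show, rng):
    """Decide which items each respondent sees: anchors always, others at random."""
    forms = []
    for _ in range(n_respondents):
        shown = set(anchor_items)
        # Each optional item is shown independently with probability p_show.
        shown.update(q for q in optional_items if rng.random() < p_show)
        forms.append(shown)
    return forms


def code_dataset(answers, forms):
    """Code items that were not shown as None, i.e. missing by design."""
    return [
        {item: (value if item in shown else None) for item, value in row.items()}
        for row, shown in zip(answers, forms)
    ]


def impute_once(data, rng):
    """Build one completed dataset by hot-deck draws from each item's observed values."""
    completed = [dict(row) for row in data]
    for item in data[0]:
        observed = [row[item] for row in data if row[item] is not None]
        # Approximate Bayesian bootstrap: resample the donors first, so that
        # separate imputations differ as the item's distribution is uncertain.
        donors = [rng.choice(observed) for _ in observed]
        for row in completed:
            if row[item] is None:
                row[item] = rng.choice(donors)
    return completed


def pooled_mean(completed_sets, item):
    """Pool the mean of one item across M completed datasets with Rubin's rules."""
    m = len(completed_sets)
    estimates, within = [], []
    for ds in completed_sets:
        values = [row[item] for row in ds]
        estimates.append(statistics.fmean(values))
        within.append(statistics.variance(values) / len(values))  # squared standard error
    q_bar = statistics.fmean(estimates)  # pooled estimate
    u_bar = statistics.fmean(within)  # average within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance (needs M >= 2)
    return q_bar, u_bar + (1 + 1 / m) * b  # estimate and total variance


# Usage with simulated answers on a 1-5 scale (hypothetical item names).
rng = random.Random(2024)
anchors, optional = ["a1", "a2"], ["q1", "q2", "q3", "q4"]
answers = [{q: rng.randint(1, 5) for q in anchors + optional} for _ in range(500)]
data = code_dataset(answers, assign_forms(500, anchors, optional, 0.6, rng))
completed_sets = [impute_once(data, rng) for _ in range(5)]
estimate, total_var = pooled_mean(completed_sets, "q1")
```

Drawing unconditionally from each item's observed values preserves that item's distribution but not its association with other items; conditioning the draws on the anchor questions would be one possible refinement.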

Advantages and Disadvantages:

In general, the imputation of missing data has made considerable progress in recent years and has gained support from leading statisticians around the world. Not only does this methodology retain all of the collected data; the earlier practice of discarding incomplete records distorts the intended sample design. The main argument here is that reports are based on means and variances, not on individual responses. Imputation therefore does not harm these statistical moments or further analytical modeling. With sufficiently large samples, the methodology yields stable and reliable parameter estimates, although the variance of the completed data may be slightly lower. The mechanism by which missing data are replaced is a powerful simulation tool, and parameters that measure the performance of the imputation and the randomness of the completed data are available.
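The claim that randomly imputed values leave the mean and variance largely intact can be checked with a small simulation. The sketch below uses hypothetical, normally distributed answers and hides about a quarter of them completely at random. It then compares two ways of filling the gaps: replacing every missing value with the observed mean, and drawing a random observed value, in the spirit of the random completion described under The Process. The helper names `mean_impute` and `random_draw_impute` are illustrative. Both methods keep the mean close to its complete-data value, but only the random draws keep the variance close to it. Mean imputation shrinks the variance by roughly the share of missing values, here about a quarter.

```python
import random
import statistics


def mean_impute(values):
    """Fill each missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def random_draw_impute(values, rng):
    """Fill each missing entry with a randomly chosen observed value."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


rng = random.Random(7)
# Hypothetical answers on a continuous scale (mean 50, standard deviation 10).
complete = [rng.gauss(50, 10) for _ in range(20_000)]
# Hide about 25% of the answers completely at random.
with_gaps = [None if rng.random() < 0.25 else v for v in complete]

results = {
    "complete": complete,
    "mean imputation": mean_impute(with_gaps),
    "random draws": random_draw_impute(with_gaps, rng),
}
moments = {name: (statistics.fmean(v), statistics.variance(v)) for name, v in results.items()}
for name, (mean, var) in moments.items():
    print(f"{name:16s} mean={mean:6.2f} variance={var:7.2f}")
```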

Dr. Gabriel Liberman – Data-Graph Statistical Consulting