Published in International Journal of Advanced Research in Computer Science Engineering and Information Technology
ISSN: 2321-3337, Impact Factor: 1.521, Volume: 4, Issue: 3, Date: 01 April 2016, Pages: 767-774
Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall because they only infer missing values from the complete part of the data set. Retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. The TRIP approach retrieves 20% of the missing values and achieves a high imputation recall. TRIP faces the challenge of selecting the smallest number of missing values to retrieve so as to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in Deterministic Data Imputation (DDI), and the optimality of the generated scheme is theoretically analyzed with proofs. We also show with an example that the optimal scheme cannot feasibly be achieved in constrained Stochastic Data Imputation, but our proposed solution still identifies an expected-optimal scheme.
TRIP (inTeractive Retrieving-Inferring data imPutation), recall, inferable values.
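To make the retrieve-then-infer interplay described in the abstract concrete, the following is a minimal sketch of the deterministic setting. It is not the optimal scheduling scheme of the paper: it uses a simple greedy heuristic, and the rule representation (`Rules`), the function names `infer_closure` and `greedy_schedule`, and the cell labels are all hypothetical, introduced only for illustration. Each missing cell is assumed to be inferable once every cell in at least one of its premise sets is known.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: cell -> list of premise sets.
# A cell becomes inferable when all cells of any one premise set are known.
Rules = Dict[str, List[FrozenSet[str]]]


def infer_closure(known: Set[str], rules: Rules) -> Set[str]:
    """Apply the inference rules repeatedly until no new cell can be filled."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for cell, premise_sets in rules.items():
            if cell not in known and any(p <= known for p in premise_sets):
                known.add(cell)
                changed = True
    return known


def greedy_schedule(
    missing: Set[str], known: Set[str], rules: Rules
) -> Tuple[List[str], Set[str]]:
    """Alternate inferring and retrieving (greedy heuristic, an assumption).

    Returns the cells retrieved, in order, and the cells filled by inference.
    """
    known = infer_closure(known, rules)
    retrieved: List[str] = []
    while not missing <= known:
        remaining = sorted(missing - known)
        # Retrieve the cell whose value unlocks the most inferences.
        best = max(remaining, key=lambda c: len(infer_closure(known | {c}, rules)))
        retrieved.append(best)
        known = infer_closure(known | {best}, rules)
    return retrieved, missing - set(retrieved)


# Toy data set: "k" is a complete cell, "a".."e" are missing.
rules: Rules = {
    "b": [frozenset({"a"})],
    "c": [frozenset({"b", "k"})],
    "d": [frozenset({"e"})],
}
retrieved, inferred = greedy_schedule({"a", "b", "c", "d", "e"}, {"k"}, rules)
print(retrieved)         # ['a', 'e']
print(sorted(inferred))  # ['b', 'c', 'd']
```

In this toy example two retrievals are enough to fill five missing cells, because each retrieved value unlocks a chain of inferences. A greedy choice like this one is not guaranteed to minimize the number of retrievals in general.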