Abstract
Approximate query processing (AQP) aims to quickly provide approximate answers to time-consuming search queries on large datasets. It brings substantial benefits in data science when query execution efficiency matters more than accuracy. However, assessing the accuracy of an approximate answer from AQP remains understudied, and existing work usually relies on strict dataset assumptions that real-world datasets often do not satisfy. In this work, we employ a non-parametric statistical method, bootstrap sampling, to assess the errors of an AQP system for selection queries (σ-AQP). We implement a prototype AQP system integrated with a bootstrap sampling engine that estimates the standard deviation and produces confidence intervals for selection query estimates. Extensive experiments with the prototype system demonstrate that the generated confidence intervals cover the ground-truth query results with high accuracy and at low computing cost. In addition, we introduce optimization strategies for bootstrap sampling that improve the overall computing efficiency of the prototype AQP system.
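As an illustration of the general idea only, and not of the prototype system described in the paper, the following minimal Python sketch shows how bootstrap resampling can attach a standard error and a confidence interval to a selection-query estimate. It assumes a uniform random sample of the data, a COUNT-style aggregate, and a percentile confidence interval. The function name `bootstrap_selection_count`, its parameters, and the synthetic data are all hypothetical choices made for this example.

```python
import random
import statistics


def bootstrap_selection_count(sample, predicate, population_size,
                              n_boot=1000, alpha=0.05, seed=0):
    """Estimate COUNT(*) WHERE predicate from a uniform sample.

    Returns (estimate, bootstrap standard error, percentile interval).
    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    n = len(sample)
    scale = population_size / n  # scale sample counts up to the population

    # Indicator per sampled row: 1 if the row satisfies the selection predicate.
    hits = [1 if predicate(row) else 0 for row in sample]
    estimate = scale * sum(hits)

    # Resample the indicators with replacement and recompute the estimate.
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(hits, k=n)
        replicates.append(scale * sum(resample))
    replicates.sort()

    std_err = statistics.stdev(replicates)
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return estimate, std_err, (lower, upper)


# Synthetic data (an assumption): 100,000 integers drawn uniformly from 0..99.
data_rng = random.Random(42)
population = [data_rng.randint(0, 99) for _ in range(100_000)]
sample = data_rng.sample(population, 1_000)

# Selection query: how many values are below 20?
est, se, (lo, hi) = bootstrap_selection_count(
    sample, lambda v: v < 20, len(population)
)
truth = sum(1 for v in population if v < 20)
print(f"estimate={est:.0f} std_err={se:.0f} ci=({lo:.0f}, {hi:.0f}) truth={truth}")
```

The percentile interval is used here because it needs no distributional assumption about the data, which matches the non-parametric motivation stated in the abstract. Each replicate only touches the sample, never the full dataset, so the cost of the error estimate grows with the sample size and the number of replicates rather than with the dataset size.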
| Original language | English |
|---|---|
| Pages (from-to) | 38-47 |
| Number of pages | 10 |
| Journal | International Journal of Computers and their Applications |
| Volume | 29 |
| Issue number | 1 |
| State | Published - Mar 1 2022 |
Keywords
- Approximate query processing
- bootstrap sampling
- error estimation
- non-parametric method