# Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

@article{Kang2021ProofAA, title={Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates}, author={Daniel Kang and John Guibas and Peter Bailis and Tatsunori B. Hashimoto and Yi Sun and Matei A. Zaharia}, journal={ArXiv}, year={2021}, volume={abs/2107.12525} }

Given a dataset D, we are interested in computing the mean of a subset of D which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget N . In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate O(N 1 +N −1 2 +N 1/2 1 N −3/2 2 ), where N = K ·N1 +N2 for some integer constant K and K ·N1 and N2 represent the number of samples used in Stage 1 and Stage 2 of ABae… Expand

#### Tables from this paper

#### 2 Citations

Accelerating Approximate AggregationQueries with Expensive Predicates

- 2021

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep… Expand

Accelerating Approximate Aggregation Queries with Expensive Predicates

- Computer Science
- Proc. VLDB Endow.
- 2021

A query processing algorithm that leverages proxies (ABae) is developed and analyzed that converges at an optimal rate in a novel analysis of stratified sampling with draws that may not satisfy the predicate and outperforms on baselines on six real-world datasets. Expand

#### References

SHOWING 1-3 OF 3 REFERENCES

Connected Components in Random Graphs with Given Expected Degree Sequences

- Mathematics
- 2002

Abstract. We consider a family of random graphs with a given expected degree sequence. Each edge is chosen independently with probability proportional to the product of the expected degrees of its… Expand

U-statistics Notes for Statistics 200c, Spring 2005

- 2005

1. Definitions. The basic theory of U-statistics was developed by W. Hoeffding (1948a). Detailed expositions of the general topic may be found in M. Denker (1985) and A. J. Lee (1990). See also… Expand

Advanced statistical theory lecture

- September 24
- 2018