# Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

Given a dataset D, we are interested in computing the mean of the subset of D that matches a predicate. ABae leverages stratified sampling and proxy models to compute this statistic efficiently under a sampling budget N. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2} N_2^{-3/2})$, where $N = K \cdot N_1 + N_2$ for some integer constant $K$, and $K \cdot N_1$ and $N_2$ are the numbers of samples used in Stage 1 and Stage 2 of ABae…
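To make the two-stage procedure concrete, here is a minimal Python sketch. Everything in it beyond "stratify with a proxy, sample in two stages, estimate the mean over matching records" is an assumption for illustration, not a transcription of ABae's implementation: the hypothetical function `abae_sketch`, cutting proxy-sorted records into near-equal strata, sampling with replacement, pooling Stage 1 and Stage 2 draws, and the Stage 2 allocation rule (draws proportional to $\sqrt{p_k}\,\sigma_k$, where $p_k$ is a stratum's estimated match rate and $\sigma_k$ the standard deviation of the statistic among its matches).

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def abae_sketch(
    records: Sequence[T],
    proxy: Callable[[T], float],       # cheap proxy score used only for stratification
    predicate: Callable[[T], bool],    # expensive predicate (e.g. a DNN call)
    statistic: Callable[[T], float],   # value averaged over matching records
    n_strata: int,                     # plays the role of K
    n1: int,                           # Stage 1 draws per stratum (N1)
    n2: int,                           # Stage 2 draws in total (N2)
    seed: int = 0,
) -> float:
    """Hypothetical two-stage stratified estimate; assumes len(records) >= n_strata, n1 >= 1."""
    rng = random.Random(seed)

    # Stratify: sort by proxy score and cut into contiguous, near-equal strata.
    ordered = sorted(records, key=proxy)
    bounds = [i * len(ordered) // n_strata for i in range(n_strata + 1)]
    strata = [ordered[bounds[i]:bounds[i + 1]] for i in range(n_strata)]

    def draw(stratum: Sequence[T], n: int) -> List[Tuple[bool, float]]:
        # Sample with replacement; the predicate runs only on sampled records,
        # and the statistic is read only for records that match.
        out: List[Tuple[bool, float]] = []
        for r in rng.choices(stratum, k=n):
            matched = predicate(r)
            out.append((matched, statistic(r) if matched else 0.0))
        return out

    # Stage 1: uniform pilot sample of n1 draws per stratum (K * N1 draws in total).
    samples = [draw(s, n1) for s in strata]

    # Per-stratum estimates: match rate p_k and std sigma_k among matches.
    weights: List[float] = []
    for draws in samples:
        vals = [v for matched, v in draws if matched]
        p_k = len(vals) / len(draws)
        sigma_k = statistics.pstdev(vals) if len(vals) >= 2 else 0.0
        weights.append(math.sqrt(p_k) * sigma_k)

    # Stage 2: split N2 draws in proportion to sqrt(p_k) * sigma_k (assumed rule),
    # falling back to a uniform split if every weight is zero.
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * n_strata, float(n_strata)
    alloc = [int(n2 * w / total) for w in weights]
    alloc[max(range(n_strata), key=lambda k: weights[k])] += n2 - sum(alloc)
    for k, n in enumerate(alloc):
        samples[k].extend(draw(strata[k], n))

    # Combine: weight each stratum's mean among matches by its estimated match count.
    num = den = 0.0
    for stratum, draws in zip(strata, samples):
        vals = [v for matched, v in draws if matched]
        if vals:
            est_matches = len(stratum) * len(vals) / len(draws)
            num += est_matches * statistics.fmean(vals)
            den += est_matches
    return num / den if den > 0 else math.nan
```

With `n_strata`, `n1` and `n2` standing in for K, N1 and N2, the sketch evaluates the predicate exactly K·N1 + N2 times. Sorting by the proxy groups records with similar match likelihood, so under the assumed rule, strata with higher estimated match rates and more spread among matches receive more of the Stage 2 budget.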

- 2021

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep…

ABae, a query processing algorithm that leverages proxies, is developed and analyzed. A novel analysis of stratified sampling with draws that may not satisfy the predicate shows that ABae converges at an optimal rate, and ABae outperforms baselines on six real-world datasets.
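As a worked check of what the MSE bound $O(N_1^{-1} + N_2^{-1} + N_1^{1/2} N_2^{-3/2})$ implies for the total budget $N = K \cdot N_1 + N_2$, assume (for illustration only) that each stage receives a fixed fraction of the budget, $N_1 = \alpha N / K$ and $N_2 = (1 - \alpha) N$ for some constant $0 < \alpha < 1$. Every term then scales as $1/N$:

$$
N_1^{-1} = \frac{K}{\alpha N}, \qquad
N_2^{-1} = \frac{1}{(1-\alpha) N}, \qquad
N_1^{1/2} N_2^{-3/2} = \left(\frac{\alpha N}{K}\right)^{1/2} \big((1-\alpha) N\big)^{-3/2} = \frac{\alpha^{1/2}}{K^{1/2} (1-\alpha)^{3/2}} \cdot \frac{1}{N}
$$

so under this split the MSE is $O(1/N)$, the familiar rate of a sample-mean estimator with $N$ draws.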

