ABSTRACT
We give a 1-pass Õ(m1-2⁄k)-space algorithm for computing the k-th frequency moment of a data stream for any real k > 2. Together with the lower bounds of [1, 2, 4], this resolves the main problem left open by Alon et al in 1996 [1]. Our algorithm also works for streams with deletions and thus gives an Õ(m 1-2⁄p) space algorithm for the Lp difference problem for any p > 2. This essentially matches the known Ω(m1-2⁄p-o(1)) lower bound of [12, 2]. Finally the update time of our algorithms is Õ(1).
- N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the 28th Annual ACM Symposum on the Theory of Computing (STOC), pages 20-29, 1996. Also; J. Comp. Sys. Sci. 58, pages 137--147, 1999. Google ScholarDigital Library
- Z. Bar Yossef, T.S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In Proceedings of the 43rd Symposium on Foundations of Computer Science (FOCS), pages 209--218, 2002. Google ScholarDigital Library
- Z. Bar Yossef, T.S. Jayram, R. Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. RANDOM 2002, 6th. International Workshop on Randomization and Approximation Techniques in Computer Science, p. 1--10, 2002. Google ScholarDigital Library
- A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multiparty communication complexity of set-disjointness. In Proceedings of the 18th IEEE Conference on Computational Complexity (CCC), pages 107--117, 2003.Google Scholar
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Theor. Comput. Sci. 312(1), pages 3--15, 2004 (Extended Abstract appeared in Proceedings of the 29th International Colloquium on Automata Languages and Programming (ICALP), 2002). Google ScholarDigital Library
- D. Coppersmith and R. Kumar, An improved data stream algorithm for frequency moments. In Proc of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 151--156, 2004. Google ScholarDigital Library
- J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An Approximate L1-Difference Algorithm for Massive Data Streams. In SIAM Journal on Computing 32, pages 131-151, 2002 (Extended Abstract appeared in Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 1999). Google ScholarDigital Library
- S. Ganguly. Estimating Frequency Moments of Data Streams using Random Linear Combinations. In RANDOM, 2004.Google Scholar
- S. Ganguly. A Hybrid Technique for Estimating Frequency Moments over Data Streams. Unpublished Manuscript, 2004.Google Scholar
- P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Journal of the ACM, to appear. Google ScholarDigital Library
- Preliminary version in Proceedings of the 41st Symposium on Foundations of Computer Science (FOCS), pages 187-197, 2000.Google Scholar
- P. Indyk. A Small Approximately Min-Wise Independent Family of Hash Functions. In Journal of Algorithms 38, 2001. Google ScholarDigital Library
- M. Saks and X. Sun. Space lower bounds for distance approximation in the data stream model. In Proc of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 360--369, 2002. Google ScholarDigital Library
- M. Thorup and Y. Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In Proc of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 615--624, 2004. Google ScholarDigital Library
Index Terms
- Optimal approximations of the frequency moments of data streams
Recommendations
Fast moment estimation in data streams in optimal space
STOC '11: Proceedings of the forty-third annual ACM symposium on Theory of computingWe give a space-optimal streaming algorithm with update time O(log2(1/ε)loglog(1/ε)) for approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream up to a factor of 1 +/- ε. This provides a nearly exponential ...
The Value of Multiple Read/Write Streams for Approximating Frequency Moments
We consider the read/write streams model, an extension of the standard data stream model in which an algorithm can create and manipulate multiple read/write streams in addition to its input data stream. Like the data stream model, the most important ...
Estimating hybrid frequency moments of data streams
We consider the problem of estimating hybrid frequency moments of two dimensional data streams. In this model, data is viewed to be organized in a matrix form ( A i , j )1 i , j , n . The entries A i , j are updated coordinate-wise, in ...
Comments