skip to main content
10.1145/1060590.1060621acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
Article

Optimal approximations of the frequency moments of data streams

Published:22 May 2005Publication History

ABSTRACT

We give a 1-pass Õ(m1-2⁄k)-space algorithm for computing the k-th frequency moment of a data stream for any real k > 2. Together with the lower bounds of [1, 2, 4], this resolves the main problem left open by Alon et al in 1996 [1]. Our algorithm also works for streams with deletions and thus gives an Õ(m 1-2⁄p) space algorithm for the Lp difference problem for any p > 2. This essentially matches the known Ω(m1-2⁄p-o(1)) lower bound of [12, 2]. Finally the update time of our algorithms is Õ(1).

References

  1. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the 28th Annual ACM Symposum on the Theory of Computing (STOC), pages 20-29, 1996. Also; J. Comp. Sys. Sci. 58, pages 137--147, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Z. Bar Yossef, T.S. Jayram, R. Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In Proceedings of the 43rd Symposium on Foundations of Computer Science (FOCS), pages 209--218, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Z. Bar Yossef, T.S. Jayram, R. Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. RANDOM 2002, 6th. International Workshop on Randomization and Approximation Techniques in Computer Science, p. 1--10, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Chakrabarti, S. Khot, and X. Sun. Near-optimal lower bounds on the multiparty communication complexity of set-disjointness. In Proceedings of the 18th IEEE Conference on Computational Complexity (CCC), pages 107--117, 2003.Google ScholarGoogle Scholar
  5. M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Theor. Comput. Sci. 312(1), pages 3--15, 2004 (Extended Abstract appeared in Proceedings of the 29th International Colloquium on Automata Languages and Programming (ICALP), 2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Coppersmith and R. Kumar, An improved data stream algorithm for frequency moments. In Proc of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 151--156, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An Approximate L1-Difference Algorithm for Massive Data Streams. In SIAM Journal on Computing 32, pages 131-151, 2002 (Extended Abstract appeared in Proceedings of IEEE Symposium on Foundations of Computer Science (FOCS), 1999). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Ganguly. Estimating Frequency Moments of Data Streams using Random Linear Combinations. In RANDOM, 2004.Google ScholarGoogle Scholar
  9. S. Ganguly. A Hybrid Technique for Estimating Frequency Moments over Data Streams. Unpublished Manuscript, 2004.Google ScholarGoogle Scholar
  10. P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Journal of the ACM, to appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Preliminary version in Proceedings of the 41st Symposium on Foundations of Computer Science (FOCS), pages 187-197, 2000.Google ScholarGoogle Scholar
  12. P. Indyk. A Small Approximately Min-Wise Independent Family of Hash Functions. In Journal of Algorithms 38, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Saks and X. Sun. Space lower bounds for distance approximation in the data stream model. In Proc of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 360--369, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Thorup and Y. Zhang. Tabulation based 4-universal hashing with applications to second moment estimation. In Proc of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 615--624, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimal approximations of the frequency moments of data streams

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      STOC '05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
      May 2005
      778 pages
      ISBN:1581139608
      DOI:10.1145/1060590

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 May 2005

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate1,469of4,586submissions,32%

      Upcoming Conference

      STOC '24
      56th Annual ACM Symposium on Theory of Computing (STOC 2024)
      June 24 - 28, 2024
      Vancouver , BC , Canada

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader