HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm
Source
Evernote/Papers/HyperLogLog in Practice Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.md
Summary
이 논문은 데이터베이스 시스템에서 중요한 카디널리티 추정 알고리즘인 HyperLogLog(HLL)의 실용적 개선을 다룹니다. 기존 HLL 알고리즘을 수정하여 메모리 요구량을 줄이고, 특정 카디널리티 범위에서 정확도를 크게 향상시켰습니다. 이 개선된 알고리즘은 Google 시스템에 구현되어 평가되었으며, 원본 HLL과 동일하게 완벽한 병렬 처리와 단일 패스(single-pass) 추정이 가능합니다.
Key Points
- 기존 HyperLogLog 알고리즘의 메모리 효율성 및 정확도 개선
- 중요한 카디널리티 범위에서 성능 향상
- Google 시스템에서의 실제 구현 및 실증 평가
- 원본 알고리즘과 동일한 완벽한 병렬화 및 단일 패스 처리 유지
Related
-
Robust and Energy Efficient Multimedia Systems via Likelihood Processing
-
Efficient Estimation of Word Representations in Vector Space
-
Read-Performance Optimization for Deduplication-Based Storage Systems in the Cloud
-
Knowledge-Based Approaches to Information Management in Coalition Environments
-
Optimizing Google’s Warehouse Scale Computers: The NUMA Experience
-
Knowledge-Based Approaches to Information Management in Coalition Environments
-
Nonlinear Latent Factorization by Embedding Multiple User Interests
-
Beyond Text QA: Multimedia Answer Generation by Harvesting Web Information
-
Stock Selection Model Based on Machine Learning with Wisdom of Experts and Crowds
-
Web Service Compositions with Fuzzy Preferences: A Graded Dominance Relationship-Based Approach
-
Optimizing Google’s Warehouse Scale Computers: The NUMA Experience
-
검색 광고의 예산 제약 최적화 (Optimizing Budget Constrained Spend in Search Advertising)
-
Patent Query Formulation by Synthesizing Multiple Sources of Relevance Evidence
-
Weakly Supervised Learning of Object Segmentations from Web-Scale Video
-
A term-based inverted index partitioning model for efficient distributed query processing
-
Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features
-
Algebraic Optimization for Processing Graph Pattern Queries in the Cloud
-
A Model for Context in the Design of Open Production Communities
-
Joint consideration of energy-efficiency and coverage-preservation in microsensor networks
-
Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays
-
Eureka: Edge-Based Discovery of Training Data for Machine Learning