Author: 康凯森
Date: 2021-12-19
Category: OLAP
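In my nearly two years at StarRocks, our query team built from scratch a vectorized execution engine, a CBO (cost-based) query optimizer, and a Pipeline parallel query engine, setting a new performance milestone for Chinese-developed OLAP databases. This post collects the database materials I regularly consult and study. I hope it is helpful, and everyone is welcome to join the StarRocks open source community.
I will do my best to keep this article updated, and contributions and corrections are welcome.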
TODO:
How does a relational database work
AnalyticDB: Real-time OLAP Database System at Alibaba Cloud
Apache Arrow: In Theory, In Practice
Introducing AresDB: Uber’s GPU-Powered Open Source, Real-time Analytics Engine
Ceph: A Scalable, High-Performance Distributed File System
CFS: A Distributed File System for Large Scale Container Platforms
The Secrets of ClickHouse Performance Optimizations
CockroachDB: The Resilient Geo-Distributed SQL Database
DB2 with BLU Acceleration: So Much More than Just a Column Store
Using Apache Arrow, Calcite and Parquet to build a Relational Cache
Druid: A Real-time Analytical Data Store
F1 Query: Declarative Querying at Scale
Apache Flink™: Stream and Batch Processing in a Single Engine
FoundationDB: A Distributed Unbundled Transactional Key Value Store
Dremel: Interactive Analysis of Web-Scale Datasets
Procella: Unifying serving and analytical data at YouTube
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google
Why Greenplum is the best compared with others
HAWQ: A Massively Parallel Processing SQL Engine in Hadoop
Apache Hudi Design And Architecture
Impala: A Modern, Open-Source SQL Engine for Hadoop
Kudu: Storage for Fast Analytics on Fast Data
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Pinot: Realtime OLAP for 530 Million Users
Amazon Redshift and the Case for Simpler Data Warehouses
Data Warehousing in the Cloud: Amazon Redshift vs Microsoft Azure SQL
SAP HANA: A Data Platform for Enterprise Applications Purpose Built for Modern Hardware ★★★★★
SAP HANA: A Data Platform for Enterprise Applications Purpose Built for Modern Hardware (video)
Spanner: Google’s Globally-Distributed Database
SnappyData: A Unified Cluster for Streaming, Transactions, and Interactive Analytics
The Snowflake Elastic Data Warehouse
Building An Elastic Query Engine on Disaggregated Storage
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Spark SQL: Relational Data Processing in Spark
Splice Machine – An HTAP DB at Scale
TiDB: A Raft-based HTAP Database
The Vertica Analytic Database: C-Store 7 Years Later
X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing
Alibaba Hologres: A Cloud-Native Service for Hybrid Serving/Analytical Processing
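A programming language is a tool, and the value of a tool lies in using it. Learning a programming language is therefore like learning English: the fastest way to improve is to use it and practice often. There is no need to wait until you are fully familiar with or proficient in a language before you start using it. When something is unclear in practice, work through the unclear points one at a time.
To be completed.
If you have studied some piece of theory but do not know how to put it into practice, you are welcome to join the StarRocks open source community. All of the theory above has a place to be practiced in StarRocks, and your Stars, Issues, and PRs are welcome.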