Welcome
Welcome to the Databend documentation!
Databend is an open-source, elastic, and workload-aware cloud data warehouse.
This welcome page guides you through the features, architecture, and other important details about Databend.
Why Databend?
Performance
- Blazing-fast data analytics on object storage.
- Leverages data-level parallelism and instruction-level parallelism technologies.
- No indexes to build, no manual tuning, and no need to figure out partitions or shard data.
Data Manipulation
- Supports atomic operations such as SELECT, INSERT, DELETE, UPDATE, REPLACE, COPY, and MERGE.
- Provides advanced features such as Time Travel and Multi Catalog (Apache Hive / Apache Iceberg).
- Supports data in various formats like CSV, JSON, and Parquet.
- Supports semi-structured data types.
- Supports Git-like MVCC storage for easy querying, cloning, and restoration of historical data.
Object Storage
- Supports various object storage platforms.
- Allows instant elasticity, enabling users to scale up or down based on their application needs.
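The Time Travel and Git-like MVCC items above can be pictured with a small Python sketch. It is a toy model under stated assumptions, not Databend's storage engine: the `VersionedTable` class and its methods are hypothetical names, and snapshots are kept as in-memory tuples purely for illustration.

```python
class VersionedTable:
    """Toy multi-version table: every write creates a new immutable snapshot."""

    def __init__(self):
        # Snapshot 0 is the empty table; later snapshots are appended, never edited.
        self._snapshots = [()]

    @property
    def latest(self):
        """Version number of the most recent snapshot."""
        return len(self._snapshots) - 1

    def insert(self, *rows):
        # Build a new snapshot from the previous one plus the new rows.
        self._snapshots.append(self._snapshots[-1] + tuple(rows))
        return self.latest

    def delete(self, predicate):
        # Older snapshots still contain the deleted rows.
        kept = tuple(r for r in self._snapshots[-1] if not predicate(r))
        self._snapshots.append(kept)
        return self.latest

    def select(self, at=None):
        # Read the latest snapshot, or "time travel" to an older version.
        version = self.latest if at is None else at
        return list(self._snapshots[version])

    def restore(self, at):
        # Restoring re-publishes an old snapshot as the newest version.
        self._snapshots.append(self._snapshots[at])
        return self.latest


table = VersionedTable()
v1 = table.insert({"id": 1}, {"id": 2})
v2 = table.delete(lambda row: row["id"] == 1)
current_rows = table.select()          # state after the delete
historical_rows = table.select(at=v1)  # state as of version v1
v3 = table.restore(v1)
restored_rows = table.select()         # state after restoring v1
```

Because writes never modify an existing snapshot, reading or restoring an older version in this model is just a lookup by version number.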
Databend Architecture
Databend's high-level architecture is composed of a meta-service layer, a query layer, and a storage layer.
Databend efficiently supports multiple tenants through its meta-service layer, which plays a crucial role in the system:
- Metadata Management: Handles metadata for databases, tables, clusters, transactions, and more.
- Security: Manages user authentication and authorization for a secure environment.
The query layer in Databend handles query computations and is composed of multiple clusters, each containing several nodes. Each node, a core unit in the query layer, consists of:
- Planner: Develops execution plans for SQL statements, incorporating operators like Projection, Filter, and Limit.
- Optimizer: A rule-based optimizer applies predefined rules, such as "predicate pushdown" and "pruning of unused columns", for optimal query execution.
- Processors: Constructs a query execution pipeline based on planner instructions, following a Pull&Push approach. Processors are interconnected, forming a pipeline that can be distributed across nodes for enhanced performance.
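The roles of the planner, optimizer, and processors can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Databend's implementation: the node classes, `push_down_predicates`, and `execute` are hypothetical names, only a single "predicate pushdown" rule is shown, and execution is pull-only (generators) rather than the combined Pull&Push approach described above.

```python
from dataclasses import dataclass, replace
from itertools import islice
from typing import Callable, Optional


@dataclass
class Scan:
    rows: list
    predicate: Optional[Callable[[dict], bool]] = None  # filter applied while reading


@dataclass
class Filter:
    child: object
    predicate: Callable[[dict], bool]


@dataclass
class Projection:
    child: object
    columns: list


@dataclass
class Limit:
    child: object
    count: int


def push_down_predicates(node):
    """Rule: fold a Filter that sits directly on a Scan into the Scan itself."""
    if isinstance(node, Scan):
        return node
    child = push_down_predicates(node.child)
    if isinstance(node, Filter) and isinstance(child, Scan) and child.predicate is None:
        return Scan(child.rows, node.predicate)
    return replace(node, child=child)


def execute(node):
    """Pull-style execution: each operator lazily pulls rows from its child."""
    if isinstance(node, Scan):
        for row in node.rows:
            if node.predicate is None or node.predicate(row):
                yield row
    elif isinstance(node, Filter):
        for row in execute(node.child):
            if node.predicate(row):
                yield row
    elif isinstance(node, Projection):
        for row in execute(node.child):
            yield {name: row[name] for name in node.columns}
    elif isinstance(node, Limit):
        yield from islice(execute(node.child), node.count)
    else:
        raise TypeError(f"unknown plan node: {node!r}")


rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 75},
    {"id": 3, "name": "c", "score": 90},
    {"id": 4, "name": "d", "score": 80},
]
# Roughly: SELECT name FROM rows WHERE score > 50 LIMIT 2
plan = Limit(Projection(Filter(Scan(rows), lambda row: row["score"] > 50), ["name"]), 2)
optimized = push_down_predicates(plan)
result = list(execute(optimized))
```

The optimized plan returns the same rows as the original one; the difference is that the filter now runs inside the scan, so non-matching rows never reach the operators above it.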
In its storage layer, Databend employs Parquet, an open-source columnar format, and introduces its own table format to boost query performance. Key features include:
- Secondary Indexes: Speed up data location and access across various analysis dimensions.
- Complex Data Type Indexes: Accelerate data processing and analysis for intricate types such as semi-structured data.
- Segments: Databend organizes data into segments, enhancing data management and retrieval efficiency.
- Clustering: Employs user-defined clustering keys within segments to streamline data scanning.
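To see why segments and clustering keys can reduce scanning, consider the following Python sketch. It rests on assumptions that the text above does not confirm: each segment is assumed to record the minimum and maximum of the clustering key, and `Segment`, `build_segments`, and `scan_range` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    rows: list
    min_key: int  # smallest clustering-key value in this segment (assumed statistic)
    max_key: int  # largest clustering-key value in this segment (assumed statistic)


def build_segments(rows, key, segment_size):
    """Sort rows by the clustering key, then cut them into fixed-size segments."""
    ordered = sorted(rows, key=lambda row: row[key])
    segments = []
    for start in range(0, len(ordered), segment_size):
        chunk = ordered[start:start + segment_size]
        segments.append(Segment(chunk, chunk[0][key], chunk[-1][key]))
    return segments


def scan_range(segments, key, low, high):
    """Return rows with low <= key <= high and the number of segments read."""
    matches = []
    segments_read = 0
    for segment in segments:
        # Skip segments whose key range cannot overlap the requested range.
        if segment.max_key < low or segment.min_key > high:
            continue
        segments_read += 1
        matches.extend(row for row in segment.rows if low <= row[key] <= high)
    return matches, segments_read


rows = [{"ts": t, "value": t * 10} for t in [7, 1, 9, 3, 5, 2, 8, 4, 6]]
segments = build_segments(rows, key="ts", segment_size=3)
matches, segments_read = scan_range(segments, key="ts", low=4, high=5)
```

With the data sorted on `ts`, the range query touches only one of the three segments; without clustering, matching rows could be spread across all of them.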
Community
The Databend community is open to data professionals, students, and anyone who has a passion for cloud data warehouses. Join in on any of the following platforms:
- Slack
- GitHub
- Twitter
- LinkedIn
- YouTube