Multiple Catalog
- RFC PR: Open in the new tab
- Tracking Issue: Open in the new tab
Summary
Allow users to maintain multiple catalogs for the databend.
Motivation
Databend organize data in three layers:
catalog -> database -> table
catalog
: the biggest layer for databend, contains all databases and tables, provided byOpen in the new tabdatabase
: the container of tables, provided byOpen in the new tabtable
: the smallest unit in databend, provided byOpen in the new tab
By default, all databases and tables will be stored in default
catalog (powered by metasrv
).
Databend supports multiple catalogs now, but only in a static way.
To allow accessing the hive
catalog, users need to configure hive
inside databend-query.toml
in this way:
[catalog]
meta_store_address = "127.0.0.1:9083"
protocol = "binary"
Users can't add/alter/remove the catalogs during runtime.
By allowing users to maintain multiple catalogs for the databend, we can integrate more catalogs like iceberg
more quickly.
Guide-level explanation
After this RFC has been implemented, users can create new catalogs like:
CREATE CATALOG my_hive
TYPE=HIVE
CONNECTION = (URL='<hive-meta-store>' THRIFT_PROTOCOL=BINARY);
SELECT * FROM my_hive.DB.table;
Besides, users can alter or drop a catalog:
DROP CATALOG my_hive;
Users can add more catalogs like:
CREATE CATALOG my_iceberg
TYPE=ICEBERG
CONNECTION = (URL='s3://my_bucket/path/to/iceberg');
SELECT * FROM my_iceberg.DB.table;
With this feature, users can join data from different catalogs now:
select
my_iceberg.DB.purchase_records.Client_ID,
my_iceberg.DB.purchase_records.Item,
my_iceberg.DB.purchase_records.QTY
from my_hive.DB.vip_info
inner join my_iceberg.DB.purchase_records
on my_hive.DB.vip_info.Client_ID = my_iceberg.DB.purchase_records.Client_ID;
Reference-level explanation
Databend has a framework for multiple catalogs now. The only change for us is to store catalog-related information in metasrv instead.
To make it possible to start a query without metasrv
, we will also support configuring catalogs in config like:
[catalogs.my_hive]
meta_store_address = "127.0.0.1:9083"
protocol = "binary"
[catalogs.my_iceberg]
URL = "s3://bucket"
Static catalogs will always be loaded from configs and can't be altered or dropped.
Drawbacks
None.
Rationale and alternatives
None.
Prior art
Presto
Take iceberg as an example:
connector.name=iceberg
hive.metastore.uri=hostname:port
iceberg.catalog.type=hive
While using:
USE iceberg.tpch;
CREATE TABLE IF NOT EXISTS ctas_nation AS (SELECT * FROM nation);
DESCRIBE ctas_nation;
Unresolved questions
None.
Future possibilities
Iceberg Catalog
Discussed in RFC
Delta Sharing Catalog
Discussed in