Data Loading Overview

Cloudberry Database loads data mainly by exposing external data as external tables (or foreign tables) through a loading tool. It then reads from these external tables, or writes to them, to move the external data.

Loading process

The general process of loading external data into Cloudberry Database is as follows:

1. Assess the data loading scenario (such as data source location, data type, and data volume) and select an appropriate loading tool.
2. Set up and enable the loading tool.
3. Create an external table. In the CREATE EXTERNAL TABLE statement, specify the protocol of the loading tool, the data source address, and the data format.
4. Once the external table is created, query its data directly with a SELECT statement, or import the data into a regular table with INSERT INTO ... SELECT.
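
The last two steps of the process can be sketched as follows. This is a minimal illustration, assuming `gpfdist` was chosen as the loading tool and is already serving pipe-delimited text files. The host name `etlhost-1`, the port `8081`, the file pattern, and all table and column names are hypothetical placeholders.

```sql
-- Hypothetical regular table that will hold the loaded rows.
CREATE TABLE expenses (
    name         text,
    expense_date date,
    amount       float4,
    category     text
) DISTRIBUTED BY (name);

-- External table: the LOCATION clause carries the protocol (gpfdist) and
-- the data source address; the FORMAT clause describes the data format.
-- Assumes a gpfdist instance is already running on etlhost-1:8081.
CREATE EXTERNAL TABLE ext_expenses (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION ('gpfdist://etlhost-1:8081/*.txt')
FORMAT 'TEXT' (DELIMITER '|');

-- Query the external data directly...
SELECT * FROM ext_expenses LIMIT 10;

-- ...or import it into the regular table.
INSERT INTO expenses SELECT * FROM ext_expenses;
```

Keeping the external table's column list identical to the target table's keeps the final INSERT a plain `SELECT *`. If the source files need cleanup, the SELECT can cast or filter columns instead.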

Loading methods and scenarios

Cloudberry Database offers several data loading methods. Choose one according to your data source, the data format, and whether you need parallel loading.

Loading method	Data source	Data format	Parallel
`COPY`	Local file system: coordinator node host (for a single file) or segment node hosts (for multiple files)	TXT, CSV, binary	No
`file://` protocol	Local file system (local segment host, accessible only by superuser)	TXT, CSV	Yes
`gpfdist`	Local host files or files accessible via internal network	TXT, CSV, any delimited text format supported by the `FORMAT` clause, XML and JSON (require conversion to text format via a YAML configuration file)	Yes
Batch loading using `gpload` (with `gpfdist` as the underlying worker)	Local host files or files accessible via internal network	TXT, CSV, any delimited text format supported by the `FORMAT` clause, XML and JSON (require conversion to text format via a YAML configuration file)	Yes
External web tables	Data pulled from network services or from any source accessible from the command line	TXT, CSV	Yes
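
To make the differences between some of these methods concrete, the sketch below shows what a `COPY` load, a `file://` external table, and an external web table might look like. It assumes a hypothetical target table `expenses` already exists. Every path, host name, script, and table name here is an illustrative placeholder, not a value from this guide.

```sql
-- COPY: non-parallel load of a single file located on the coordinator host.
COPY expenses FROM '/data/expenses.csv' WITH (FORMAT csv, HEADER true);

-- file:// protocol: the file lives on a segment host (seghost1 is hypothetical).
-- Creating this kind of external table requires superuser privileges.
CREATE EXTERNAL TABLE ext_expenses_file (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
LOCATION ('file://seghost1/data/external/expenses.csv')
FORMAT 'CSV';

-- External web table: rows come from the output of a command.
-- The script path is hypothetical and must emit pipe-delimited text.
CREATE EXTERNAL WEB TABLE ext_expenses_web (
    name         text,
    expense_date date,
    amount       float4,
    category     text
)
EXECUTE '/var/load_scripts/get_expenses.sh'
FORMAT 'TEXT' (DELIMITER '|');
```

The two external tables can then be queried or used in INSERT INTO ... SELECT like any other external table.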
