Data Loading Overview

Cloudberry Database loads data mainly by using loading tools to expose external data as external tables (or foreign tables). It then reads data from these external tables, or writes data to them, to complete the load.

Loading process

The general process of loading external data into Cloudberry Database is as follows:

  1. Assess the data loading scenario (such as data source location, data type, and data volume) and select an appropriate loading tool.
  2. Set up and enable the loading tool.
  3. Create an external table, specifying information such as the protocol of the loading tool, the data source address, and the data format in the CREATE EXTERNAL TABLE statement.
  4. Once the external table is created, query its data directly with a SELECT statement, or import the data into a database table with INSERT INTO SELECT.
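
The last two steps can be sketched in SQL as follows. This is a minimal sketch, assuming gpfdist was chosen as the loading tool and is already serving headerless CSV files. The host `etl-host`, the port `8081`, the file pattern, and all table and column names are hypothetical placeholders.

```sql
-- Target table for the import (hypothetical name and columns).
CREATE TABLE sales (
    sale_id  int,
    sold_on  date,
    amount   numeric(10,2)
) DISTRIBUTED BY (sale_id);

-- Step 3: define an external table. The LOCATION clause carries the
-- protocol and data source address; the FORMAT clause carries the data format.
-- Assumption: a gpfdist instance is listening on etl-host:8081.
CREATE EXTERNAL TABLE ext_sales (
    sale_id  int,
    sold_on  date,
    amount   numeric(10,2)
)
LOCATION ('gpfdist://etl-host:8081/sales_*.csv')
FORMAT 'CSV';

-- Step 4, option 1: query the external data in place.
SELECT count(*) FROM ext_sales;

-- Step 4, option 2: import the external data into the database table.
INSERT INTO sales SELECT * FROM ext_sales;
```

Querying the external table first is a cheap way to confirm that the location and format are correct before running the import.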

Loading methods and scenarios

Cloudberry Database offers multiple data loading methods. Select the method that matches your data source.

| Loading method | Data source | Data format | Parallel or not |
| --- | --- | --- | --- |
| copy | Local file system: coordinator node host (for a single file); segment node host (for multiple files) | TXT; CSV; Binary | No |
| file:// protocol | Local file system (local segment host, accessible only by superuser) | TXT; CSV | Yes |
| gpfdist | Local host files or files accessible via internal network | TXT; CSV; any delimited text format supported by the FORMAT clause; XML and JSON (require conversion to text format via YAML configuration file) | Yes |
| Batch loading using gpload (with gpfdist as the underlying worker) | Local host files or files accessible via internal network | TXT; CSV; any delimited text format supported by the FORMAT clause; XML and JSON (require conversion to text format via YAML configuration file) | Yes |
| Creating external web tables | Data pulled from network services or from any source accessible by command lines | TXT; CSV | Yes |
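
To make two of these methods concrete, the following sketch contrasts a non-parallel copy load with an external web table. It is illustrative only: the file path `/data/expenses.csv`, the URL `http://intranet.example.com/expenses.csv`, and the table and column names are hypothetical, and both sources are assumed to contain headerless CSV data.

```sql
-- Target table (hypothetical name and columns).
CREATE TABLE expenses (
    item      text,
    spent_on  date,
    amount    numeric(10,2)
) DISTRIBUTED BY (item);

-- copy: load a single file from the coordinator host's local file system.
-- Assumption: /data/expenses.csv exists on the coordinator host.
COPY expenses FROM '/data/expenses.csv' WITH (FORMAT csv);

-- External web table: pull CSV data from a network service.
-- Assumption: the URL below is reachable from the segment hosts.
CREATE EXTERNAL WEB TABLE ext_expenses_web (
    item      text,
    spent_on  date,
    amount    numeric(10,2)
)
LOCATION ('http://intranet.example.com/expenses.csv')
FORMAT 'CSV';

-- Import the web data the same way as from any other external table.
INSERT INTO expenses SELECT * FROM ext_expenses_web;
```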