Transparent Data Encryption

To protect user data, Cloudberry Database supports Transparent Data Encryption (TDE).

TDE encrypts the database's data files:

  • "Data" refers to the data stored in the database.
  • "Encryption" means that files are stored as ciphertext on disk and processed as plaintext in memory. Because TDE protects static data (data at rest), it is also known as static data encryption.
  • "Transparent" means that users do not need to change how they work. TDE handles encryption and decryption automatically, without user or application intervention.

Introduction to encryption algorithms

Basic concepts

  • DEK (Data Encryption Key): The key used to encrypt data. It is generated by the database and held in memory.
  • DEK plaintext: The DEK itself in unencrypted form. It can be held only in memory.
  • Master key: The key used to encrypt the DEK.
  • DEK ciphertext: The DEK encrypted with the master key. This is the form that is stored persistently.

Key management module

The key management module is the core component of TDE. It implements a two-tier key structure consisting of a master key and a DEK. The master key encrypts the DEK and is stored outside the database; the DEK encrypts the database data and is stored in the database as ciphertext.
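As a rough illustration of the two-tier structure, the following sketch uses the OpenSSL command-line tool to wrap a randomly generated DEK with a master key and then unwrap it again. It is not Cloudberry Database's internal key format or wrapping scheme. The variable names, the AES-256-CBC cipher, and the password-based key derivation (-pbkdf2, which assumes OpenSSL 1.1.1 or later) are choices made only for this example.

```shell
# Illustration only: a two-tier key hierarchy built with the OpenSSL CLI.
# This does NOT reproduce Cloudberry Database's internal key handling.

# Master key: in a real deployment it is stored outside the database.
master_key=$(openssl rand -hex 32)

# DEK plaintext: the key that would encrypt the data; held in memory only.
dek_plain=$(openssl rand -hex 32)

# DEK ciphertext: the DEK encrypted ("wrapped") with the master key.
# Only this form would be written to persistent storage.
dek_wrapped=$(printf '%s' "$dek_plain" |
  openssl enc -aes-256-cbc -pbkdf2 -a -pass "pass:$master_key")

# At startup, the DEK is recovered by unwrapping it with the master key.
dek_recovered=$(printf '%s\n' "$dek_wrapped" |
  openssl enc -d -aes-256-cbc -pbkdf2 -a -pass "pass:$master_key")

if [ "$dek_recovered" = "$dek_plain" ]; then
  echo "DEK recovered with the master key"
fi
```

Passing a key on the command line is acceptable only in a demonstration, because other users on the same host can see process arguments.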

Algorithm classification

Encryption algorithms are divided into the following types:

  • Symmetric encryption: The same key is used for both encryption and decryption.
  • Asymmetric encryption: A public key is used for encryption and a private key for decryption, which suits one-to-many and many-to-one scenarios.

Among symmetric algorithms, block ciphers are the mainstream choice, and they are much faster than asymmetric encryption. Cloudberry Database supports two block encryption algorithms: AES and SM4.

AES encryption algorithm

AES is an internationally standardized block encryption algorithm that supports 128-, 192-, and 256-bit keys. Common encryption modes include:

  • ECB: Electronic Codebook mode
  • CBC: Cipher Block Chaining mode
  • CFB: Cipher Feedback mode
  • OFB: Output Feedback mode
  • CTR: Counter mode
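The choice of mode affects the ciphertext even when the key is the same. The sketch below, which assumes the OpenSSL command-line tool and uses a fixed demo key and an all-zero IV purely for illustration, encrypts two identical 16-byte plaintext blocks with AES-128 in ECB and CBC mode. In ECB mode, identical plaintext blocks produce identical ciphertext blocks; in CBC mode, chaining makes them differ.

```shell
# Illustration only: fixed demo key and IV; never reuse these values.
key=000102030405060708090a0b0c0d0e0f      # 128-bit key in hex
iv=$(printf '%032d' 0)                    # 128-bit all-zero IV in hex (CBC only)
plain=$(printf '%32s' '' | tr ' ' 'A')    # 32 bytes: two identical 16-byte blocks

# Print stdin as one continuous hex string. The -v flag stops od from
# collapsing repeated output lines into "*".
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

ecb_hex=$(printf '%s' "$plain" |
  openssl enc -aes-128-ecb -K "$key" -nopad | to_hex)
cbc_hex=$(printf '%s' "$plain" |
  openssl enc -aes-128-cbc -K "$key" -iv "$iv" -nopad | to_hex)

# Each 16-byte block is 32 hex characters.
ecb_b1=$(printf '%s' "$ecb_hex" | cut -c1-32)
ecb_b2=$(printf '%s' "$ecb_hex" | cut -c33-64)
cbc_b1=$(printf '%s' "$cbc_hex" | cut -c1-32)
cbc_b2=$(printf '%s' "$cbc_hex" | cut -c33-64)

echo "ECB block 1: $ecb_b1"
echo "ECB block 2: $ecb_b2"
echo "CBC block 1: $cbc_b1"
echo "CBC block 2: $cbc_b2"
```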

More ISO/IEC encryption algorithms

Other algorithms standardized by ISO/IEC include:

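  • ISO/IEC 14888-3/AMD1 (i.e., SM2): An asymmetric algorithm based on elliptic-curve cryptography (ECC). At a comparable security level, it uses shorter keys and is generally faster than RSA.
  • ISO/IEC 10118-3:2018 (i.e., SM3): A message digest (hash) algorithm that plays the same role as MD5 or SHA-256 and produces a 256-bit digest.
  • ISO/IEC 18033-3:2010/AMD1:2021 (i.e., SM4): A symmetric block cipher originally specified for wireless LAN standards. It uses a 128-bit key and a 128-bit block size.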

User instructions

Before using the TDE feature, ensure the following conditions are met:

  • Install OpenSSL: OpenSSL must be installed on the Cloudberry Database node. Most Linux distributions ship with OpenSSL pre-installed.
  • Cloudberry Database version: Use Cloudberry Database v1.6.0 or later. TDE support was introduced in v1.6.0.
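The two prerequisites can be checked with a small script. The following is a minimal sketch: the function names version_ge and check_tde_prereqs are hypothetical, the version comparison assumes a sort command that supports -V (as GNU coreutils does), and the Cloudberry Database version string is passed in as an argument because how you obtain it depends on your installation.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
  # sort -V orders version strings; A >= B when B sorts first (or they are equal).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tde_prereqs VERSION: VERSION is the Cloudberry Database version
# without the leading "v", for example 1.6.0.
check_tde_prereqs() {
  if ! command -v openssl >/dev/null 2>&1; then
    echo "openssl not found on PATH" >&2
    return 1
  fi
  if ! version_ge "$1" "1.6.0"; then
    echo "Cloudberry Database $1 is older than 1.6.0" >&2
    return 1
  fi
  echo "prerequisites look satisfied"
}
```

For example, check_tde_prereqs 1.6.0 succeeds on a host with OpenSSL installed, while check_tde_prereqs 1.5.4 reports that the version is too old.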

You enable TDE when you deploy Cloudberry Database; after that, all data encryption operations are completely transparent to users. To enable TDE during database initialization, run the gpinitsystem command with the -T parameter followed by the encryption algorithm. The following examples enable TDE with each of the two supported algorithms, AES and SM4:

  • Using the AES256 encryption algorithm:

    gpinitsystem -c gpinitsystem_config -T AES256
  • Using the SM4 encryption algorithm:

    gpinitsystem -c gpinitsystem_config -T SM4

Verify TDE effectiveness

TDE is invisible to users: enabling or disabling it does not change how read and write operations behave. To verify that encryption is in effect, you can simulate the loss of the key files and confirm that the database cannot start without them, as described in the following steps.

The key file is located on the Coordinator node. To locate the key file, first find the data directory of the Coordinator node. For example:

COORDINATOR_DATA_DIRECTORY=/home/gpadmin/work/data0/master/gpseg-1

Then, find the key files:

$ pwd
/home/gpadmin/work/data0/master/gpseg-1

$ ls -l pg_cryptokeys/live/
total 8
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 relation.wkey
-rw------- 1 gpadmin gpadmin 48 Apr 12 10:26 wal.wkey

The relation.wkey file holds the key used to encrypt data files, and the wal.wkey file holds the key intended for encrypting the write-ahead log (WAL). Currently, only relation.wkey is active; the WAL is not yet encrypted.
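Before and after the verification, it can be useful to confirm that both key files are in place. The following is a minimal sketch; check_key_files is a hypothetical helper name, and it only checks that the two files listed above exist and are non-empty under the given data directory.

```shell
# check_key_files DATA_DIR: succeed only if relation.wkey and wal.wkey
# exist and are non-empty under DATA_DIR/pg_cryptokeys/live.
check_key_files() {
  key_dir="$1/pg_cryptokeys/live"
  for f in relation.wkey wal.wkey; do
    if [ ! -s "$key_dir/$f" ]; then
      echo "missing or empty key file: $key_dir/$f" >&2
      return 1
    fi
  done
  echo "key files present in $key_dir"
}
```

With the data directory from the example above, the call would be check_key_files /home/gpadmin/work/data0/master/gpseg-1.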

Verification process

  1. Create a table and insert data.

    • Create an append-only (AO) table and insert data:

      postgres=# create table ao2 (id int) with(appendonly=true);
      postgres=# insert into ao2 select generate_series(1,10);
    • Ensure the data has been successfully inserted.

  2. Stop the database.

    gpstop -a
  3. Simulate key file loss.

    • Switch to the directory where the key files are stored:

      cd /home/gpadmin/work/data0/master/gpseg-1/pg_cryptokeys/
    • Move the key files to another directory (to simulate key file loss):

      mv live backup
  4. Attempt to start the database.

    • Start the database using the gpstart command:

      gpstart -a

      The database will fail to start because of the missing key files. You will see an error in the database logs on the Coordinator node, similar to the following:

      FATAL: cluster has no data encryption keys

      This confirms that the database cannot start without the key files, ensuring data security.

  5. Restore the key files by moving the previously backed-up key files back to the original directory:

    mv backup live
  6. Restart the database and verify the data.

    1. Start the database again using the gpstart command:

      gpstart -a
    2. Once the database has successfully started, query the ao2 table to verify the data:

      postgres=# select * from ao2 order by id;
       id
      ----
        1
        2
        3
        4
        5
        6
        7
        8
        9
       10
      (10 rows)

These steps confirm that TDE is in effect: the database cannot start without the key files, so the data at rest remains protected.
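The two mv steps in the procedure can be wrapped in helpers that refuse to run when the target already exists. This is a minimal sketch for a test cluster only; simulate_key_loss and restore_keys are hypothetical names, and the guards are an addition for safety rather than part of the documented procedure.

```shell
# simulate_key_loss DATA_DIR: move pg_cryptokeys/live to pg_cryptokeys/backup.
# Refuses to run if "backup" already exists, because mv would otherwise
# place "live" inside the existing directory instead of renaming it.
simulate_key_loss() {
  keys="$1/pg_cryptokeys"
  [ -d "$keys/live" ] || { echo "no live key directory in $keys" >&2; return 1; }
  [ ! -e "$keys/backup" ] || { echo "$keys/backup already exists" >&2; return 1; }
  mv "$keys/live" "$keys/backup"
}

# restore_keys DATA_DIR: move pg_cryptokeys/backup back to pg_cryptokeys/live.
restore_keys() {
  keys="$1/pg_cryptokeys"
  [ -d "$keys/backup" ] || { echo "no backup directory in $keys" >&2; return 1; }
  [ ! -e "$keys/live" ] || { echo "$keys/live already exists" >&2; return 1; }
  mv "$keys/backup" "$keys/live"
}
```

On a test cluster, the sequence would be: gpstop -a, then simulate_key_loss with the Coordinator data directory, then gpstart -a (expected to fail), then restore_keys with the same directory, then gpstart -a again.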