PostgreSQL Data Privacy Features [OpenSource & Commercial]

Written by Amit Ashbel, Senior Marketing and Strategy Manager | Jun 29, 2021 9:28:14 AM

This article is the second in a series on database privacy features, providing a look at a few popular database engines and how each helps you secure and protect your data, including the sensitive personal profiles that require the highest degree of care. This time we focus on the capabilities of PostgreSQL, or Postgres for short.

The first article, MySQL Data Privacy Features for Enterprise and Community Editions, framed the task as protecting the data bloodstream of your business from hackers determined to steal sensitive and valuable data. Industry and government regulations set a high standard for doing this, and that standard must be enforceable across the myriad workflows and devices that make up your business practices.

Conceptual Overview

Similar to MySQL, Postgres offers ways to cover the three main aspects of data privacy that we’re tracking in this series:

  • Data Encryption: Data is vulnerable to determined hackers both “at rest” (stored in tables on disk) and “in flight” (in memory or flowing across the network). Encryption scrambles the data so that only an actor with the right credentials and key can access it.
  • Data Masking: When you have special category data, data masking is an effective way to protect sensitive information. Some examples include race, ethnicity, religious preference, and other biographical details of your customers, employees, and users that it’s important to safeguard. Data masking creates a version of the data that can be used like the original, but which hides or “masks” the actual information.
  • Data De-Identification: When you have personal data, de-identification can guard people’s identities. This includes names, telephone numbers, addresses, and account numbers. De-identification removes this information from all processes except those authorized to use it, to avoid accidental disclosure or exploitation by hackers.

Postgres is an open-source database engine with robust offerings out of the box. Since it is open source, various commercial vendors have embedded it into their platforms as a standard and familiar database option, including Amazon RDS for PostgreSQL and Azure Database for PostgreSQL. This allows them to integrate Postgres with their platforms’ unique capabilities, providing some interesting data privacy options.

Let’s sample what Postgres has to offer to support these crucial functions.

Postgres Data Encryption

The Postgres engine supports several approaches to at-rest encryption, as well as the standard SSL-based approach to in-flight encryption. Like other engines, it can also rely on the older standby option of disk encryption.

Open-Source PostgreSQL Data Encryption

Postgres comes with a flexible set of options out of the box to encrypt data. The at-rest options include the following:

  • Data Partition Encryption: Postgres supports encryption at the file system level or the block level, using facilities that are common to most operating systems, including Linux, FreeBSD, and Windows.
  • Encryption for Specific Columns: The pgcrypto module can be used to encrypt specific columns in a table if only part of the data is sensitive. The client supplies the key to decrypt the data as part of the SELECT query.
  • Function-Based Encryption: The pgcrypto module also offers functions that work with public/private key pairs to encrypt data going into the database and decrypt it coming out. Postgres’ function-based encryption supports both public-key and symmetric-key encryption and aligns with the OpenPGP (RFC 4880) standard.
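
As a rough illustration of the column-level option, the sketch below builds parameterized SQL that calls pgcrypto’s pgp_sym_encrypt and pgp_sym_decrypt functions. The customers table, its ssn column (assumed to be of type bytea), and the helper names are hypothetical, and the %s placeholders assume a DB-API driver such as psycopg. It also assumes the extension has been enabled with CREATE EXTENSION pgcrypto.

```python
# Minimal sketch: build parameterized statements for pgcrypto column encryption.
# Assumptions (not from the article): a table customers(id, ssn bytea) and a
# DB-API driver that uses %s placeholders, such as psycopg.


def encrypt_insert(customer_id, ssn, key):
    """Return (sql, params) that stores ssn encrypted with a symmetric key."""
    sql = (
        "INSERT INTO customers (id, ssn) "
        "VALUES (%s, pgp_sym_encrypt(%s, %s))"
    )
    return sql, (customer_id, ssn, key)


def decrypt_select(customer_id, key):
    """Return (sql, params) that decrypts ssn; the client supplies the key."""
    sql = (
        "SELECT pgp_sym_decrypt(ssn, %s) AS ssn "
        "FROM customers WHERE id = %s"
    )
    return sql, (key, customer_id)


if __name__ == "__main__":
    # Print the statements instead of running them, so no database is needed.
    for sql, params in (encrypt_insert(1, "123-45-6789", "example-key"),
                        decrypt_select(1, "example-key")):
        print(sql, params)
```

Passing the key as a query parameter rather than concatenating it into the SQL text keeps it out of the statement string itself. The key still travels to the server with the query, which is one reason to pair this approach with an SSL connection.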

Postgres supports in-flight encryption via SSL with a few options:

  • The SSL implementation protects all data sent over the network, including the queries being made, the answers returned, and the passwords in use.
  • System administrators can crank up the security with options such as only allowing SSL connections and requiring SSL certificates at both the client and server side of the transactions.
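
On the client side, these options map onto standard libpq connection parameters. The sketch below assembles a connection string that uses sslmode=verify-full along with optional client certificate settings. The helper name, host name, and file paths are placeholders chosen for illustration, and the code assumes none of the values contain spaces.

```python
# Minimal sketch: build a libpq-style connection string that enforces SSL.
# The helper name, host, and file paths are illustrative placeholders.
# Assumes no value contains spaces (libpq would require quoting otherwise).


def ssl_conninfo(host, dbname, user, root_cert,
                 client_cert=None, client_key=None):
    """Return a keyword/value connection string with strict SSL settings."""
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        # verify-full: encrypt the connection, validate the server
        # certificate chain, and check the host name against the certificate.
        "sslmode": "verify-full",
        "sslrootcert": root_cert,
    }
    # A client certificate is only needed when the server asks for one.
    if client_cert and client_key:
        params["sslcert"] = client_cert
        params["sslkey"] = client_key
    return " ".join(f"{key}={value}" for key, value in params.items())


if __name__ == "__main__":
    print(ssl_conninfo("db.example.com", "appdb", "app_user",
                       "/etc/ssl/certs/root.crt",
                       "/etc/ssl/certs/client.crt",
                       "/etc/ssl/private/client.key"))
```

On the server side, the corresponding controls are typically ssl = on in postgresql.conf and hostssl entries in pg_hba.conf, which reject connections that do not use SSL.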

You can read more about Postgres encryption options in the documentation.

Commercial Version Data Encryption

As mentioned above, Postgres has been adopted as a part of numerous commercial offerings from different vendors. As one commercial example, Azure embeds Postgres into its platform. The product, Azure Database for PostgreSQL, integrates with Azure Data Encryption to encrypt data at rest. The use of Microsoft-managed keys makes this similar to the Transparent Data Encryption (TDE) offered by other platforms.

Other vendors provide similar offerings that tie into their platforms, such as Amazon RDS for PostgreSQL.

Postgres Data Masking

Like other database engines, Postgres supports data masking with the use of SQL extension functions. Out of the box, this functionality has many options, including the ability to create your own extensions.

Open-Source PostgreSQL Data Masking

A summary of Postgres’ data masking capabilities:

  • Destruction: Completely exclude the stored data and replace it with text like “CONFIDENTIAL.”
  • Adding Noise: Alter the numeric data by a randomized percentage to hide actual values but retain the ability to use this data for meaningful testing.
  • Randomization: Replace specific types of data, such as phone numbers, dates, and numeric values, with generated data that has the same type and structure.
  • Faking: Supply random but plausible values for identity-related fields like first name, last name, and email address.
  • Advanced Faking: Get more rigorous with faking by making use of PostgreSQL Faker, an extension based on the well-known Faker library for Python.
  • Pseudonymization: Similar to faking, but these functions always return the same value for a given input, seed, and salt, which helps when you need predictable data patterns for testing.
  • Generic Hashing: Approach hashing carefully as it is an advanced topic when used for data masking. There are several approaches that can be used depending on your requirements.
  • Partial Scrambling: Leaves out parts of the original data but lets the rest show through.
  • Generalization: Replaces the data—and the structure—with a more generic set of information. For instance, instead of a birthday, generalization would provide a date range. This may not be a useful choice for testing that requires a single date.
  • Write Your Own: Postgres lets developers write additional functions that apply either destructive or randomizing techniques.
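
To make a few of these techniques concrete, here is a minimal sketch in plain Python. It does not call any Postgres masking function. The function names, parameters, and sample values are all hypothetical and only illustrate the idea behind destruction, adding noise, partial scrambling, and generalization.

```python
# Minimal sketch of four masking ideas in plain Python.
# All names and signatures here are hypothetical, for illustration only.
import random


def destroy(_value, replacement="CONFIDENTIAL"):
    """Destruction: discard the value entirely."""
    return replacement


def add_noise(value, ratio, rng=random):
    """Adding noise: shift a number by up to +/- ratio (0.1 means 10%)."""
    return value * (1 + rng.uniform(-ratio, ratio))


def partial(value, keep_start, padding, keep_end):
    """Partial scrambling: keep the ends of a string and hide the middle."""
    return value[:keep_start] + padding + value[len(value) - keep_end:]


def generalize_year(year, width=10):
    """Generalization: replace a year with the range that contains it."""
    low = year - year % width
    return (low, low + width - 1)


if __name__ == "__main__":
    print(destroy("Jane Doe"))                    # CONFIDENTIAL
    print(add_noise(52000, 0.1))                  # somewhere near 52000
    print(partial("0612345678", 2, "******", 2))  # 06******78
    print(generalize_year(1987))                  # (1980, 1989)
```

The noisy salary stays close enough to the original to be useful in tests, while the generalized year shows the trade-off noted above: a range is safer to share but no longer works where a single date is required.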

You can read more about Postgres’ data masking functions here to see the various options available.

Commercial Version Data Masking

The extensible nature of Postgres and the requirements for data masking have created a market of third-party vendors that offer function libraries that you can plug in and use. Evaluating these is beyond the scope of this article.

Postgres Data De-Identification

As with MySQL, Postgres supports de-identification with a subset of its data masking functions.

Open-Source Data De-Identification

Pseudonymization serves a different purpose than anonymization, and it is anonymization that data de-identification requires. When regulations such as GDPR are involved, make sure you comply by using a data masking approach that yields true anonymization.
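
The difference can be sketched in a few lines of plain Python. The pseudonymize helper below is a hypothetical, HMAC-based stand-in rather than a Postgres function. It maps the same input and salt to the same fake name, so records remain linkable, whereas the destructive anonymize helper leaves nothing to link.

```python
# Minimal sketch: pseudonymization is repeatable, anonymization is not linkable.
# Both helpers and the name list are hypothetical, for illustration only.
import hashlib
import hmac

FAKE_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery", "Finley"]


def pseudonymize(value, salt):
    """Deterministic: the same value and salt always give the same fake name."""
    digest = hmac.new(salt.encode(), value.encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def anonymize(_value):
    """Destructive: nothing about the original value survives."""
    return "CONFIDENTIAL"


if __name__ == "__main__":
    # The same person gets the same pseudonym in every row...
    print(pseudonymize("Alice Martin", "salt-1") ==
          pseudonymize("Alice Martin", "salt-1"))   # True
    # ...while anonymized rows cannot be told apart at all.
    print(anonymize("Alice Martin") == anonymize("Bob Stone"))  # True
```

Because anyone holding the salt can recompute the mapping, and repeated pseudonyms still tie rows together, pseudonymized data is generally still treated as personal data under GDPR.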

Commercial Version Data De-Identification

The requirements for de-identification have created a market for tooling to do it properly. For example, Amazon RDS offers de-identification capabilities that you can plug into if you use the AWS platform. Demand and competition will ensure that other such tool sets become available.

Conclusion

Database administrators and system builders have a lot to think about when it comes to securing data. The Postgres plug-in architecture enables a wide range of community offerings that can help in your efforts to secure data as well. Performance depends on the features used and how they are configured.

NetApp Cloud Data Sense now supports Postgres, as well as a number of other popular databases, including MySQL, MSSQL, Oracle, SAP HANA, and MongoDB. Cloud Data Sense gives database deployments an additional utility for data privacy: AI-driven data mapping that can identify the sensitive private data stored in your database so you can pinpoint and report on that data to stay in compliance with GDPR, CCPA, and the host of new data privacy regulations that have been enacted around the globe.