August 4, 2021
Topics: Cloud Data Sense, Database | Advanced | 9 minute read
MongoDB is one of the most popular alternatives to the traditional relational database. It's highly flexible, easy to use and perfectly adapted to modern application development. But, above all, it's designed to store huge amounts of data with relatively limited impact on performance.
As a result, MongoDB has become one of the key drivers behind today's data-oriented landscape. Wide-ranging data protection regulations across the globe mean that companies that store and process large data sets with personal information in MongoDB need to be able to identify and manage personal information within those databases.
Database technologies such as MongoDB have responded by providing IT and data protection teams with a range of tools and features to meet their data governance obligations. With NetApp Cloud Data Sense recently extending support to MongoDB deployments on-prem and in the cloud, users now have even more capabilities to manage personal information within MongoDB databases.
In this post, we run through the key compliance challenges of database management with MongoDB and how you can meet them.
What Is MongoDB?
MongoDB is an open-source, document-oriented database technology, which uses a different approach to structuring data than conventional relational databases.
It belongs to a class of non-relational databases known as NoSQL, which provide an alternative way of interacting with data to the traditional SQL method used by relational databases.
By contrast with the relational model, where each record is represented by a row in a table, MongoDB uses the concept of a document to represent an individual record.
In both types of database, each row or document consists of fields that store the items of data which belong to that specific record. In a relational database, however, each row is organized into a series of columns—one for each field. By contrast, a document-oriented database is far less rigidly structured and simply represents each item of data as a field-value pair.
A logical grouping of documents is known as a collection, which equates to the traditional relational database concept of a table.
An Example MongoDB Document. Source: MongoDB
The document-based model of organizing data is far better suited to the way in which developers code their applications. And the more flexible structure of MongoDB and other databases like it will often allow you to store all the information about a data subject in a single document rather than spreading it across several relational database tables.
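To make the single-document idea concrete, here is a minimal sketch of how one customer record might look as a document, written as a Python dictionary. The field names and values are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical customer record: every field name and value is illustrative.
customer = {
    "_id": "c-1001",
    "name": {"first": "Ana", "last": "Silva"},
    "email": "ana.silva@example.com",
    # Nested arrays keep related data inside the same document.
    "addresses": [
        {"type": "home", "city": "Lisbon", "country": "PT"},
        {"type": "work", "city": "Porto", "country": "PT"},
    ],
    "orders": [
        {"order_id": 1, "total": 42.50},
        {"order_id": 2, "total": 19.99},
    ],
}

# In a relational schema the same data would typically be spread across
# three tables (customers, addresses, orders) joined on a customer key.
print(json.dumps(customer, indent=2))
```

Because everything about the data subject sits in one place, locating or deleting that person's information can often be done with a single lookup.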
What's more, you can distribute a MongoDB database across a series of servers, which makes it particularly well adapted to deployment in the cloud.
MongoDB is available as a self-managed offering, which you can host either on-premises or in the public cloud. Alternatively, you can deploy it using MongoDB Atlas—a fully managed cloud-based service which provides an easy way to host a seamless database cluster across AWS, Microsoft Azure, and Google Cloud Platform as part of a multicloud strategy.
AWS and Microsoft Azure offer similar database-as-a-service (DBaaS) solutions, Amazon DocumentDB and Azure Cosmos DB, both of which are also capable of supporting MongoDB workloads.
MongoDB Data Security
Encryption is an excellent defense against any kind of unauthorized access to data. The same applies to sensitive data that falls under data privacy regulation.
By and large, when an organization needs to encrypt data, it will encrypt the entire disk that hosts the database. However, this all-or-nothing approach is potentially open to abuse by database administrators or any third party that has authorized access to your database.
To address this security issue, MongoDB data security offers more granular client-side field-level encryption—a feature that allows a developer to selectively encrypt specific fields within a document before sending it to the server. MongoDB encryption therefore not only prevents anyone hosting or managing the database from accessing confidential information, but it also prevents capture of data over an unprotected network. Moreover, it also allows you to disclose different types of data to different users based on their legitimate needs.
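As a rough illustration of selective encryption, the sketch below builds the kind of schema map that MongoDB's automatic client-side field-level encryption uses to mark fields. The namespace, field names, and the `build_schema_map` helper are hypothetical, and the key ID is a stand-in: a real deployment would use the UUID of a data key created in its key vault and pass the map to the driver's encryption options.

```python
import uuid

# Algorithm identifiers used by client-side field-level encryption.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def build_schema_map(namespace, key_id, encrypted_fields):
    """Return a schema map marking selected fields for encryption.

    encrypted_fields maps a field name to a (bson_type, algorithm) pair.
    This helper is hypothetical; only the resulting structure matters.
    """
    properties = {
        field: {"encrypt": {"bsonType": bson_type, "algorithm": algorithm}}
        for field, (bson_type, algorithm) in encrypted_fields.items()
    }
    return {
        namespace: {
            "bsonType": "object",
            "encryptMetadata": {"keyId": [key_id]},
            "properties": properties,
        }
    }


# Assumption: a random UUID stands in for a real data key ID here.
placeholder_key_id = uuid.uuid4()
schema_map = build_schema_map(
    "crm.customers",
    placeholder_key_id,
    {
        "national_id": ("string", DETERMINISTIC),  # still supports equality queries
        "medical_notes": ("string", RANDOM),       # stronger, but not queryable
    },
)
print(schema_map)
```

Fields that are not listed in the map, such as a customer's order history, would be stored unencrypted and remain fully queryable.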
MongoDB data security also includes pseudonymization capabilities that filter out or mask data depending on the access privileges of the user, without the need for coding changes at the application level.
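One way to approximate this kind of masking is a read-only view defined over the source collection. The sketch below assumes a `db` object that behaves like a pymongo `Database`; the collection, view, and field names are hypothetical.

```python
def masked_view_pipeline(hidden_fields, masked_fields, mask="REDACTED"):
    """Build an aggregation pipeline that overwrites some fields and drops others."""
    pipeline = []
    if masked_fields:
        # $set replaces each listed field with a constant placeholder value.
        pipeline.append({"$set": {field: mask for field in masked_fields}})
    if hidden_fields:
        # $unset removes the listed fields from every returned document.
        pipeline.append({"$unset": list(hidden_fields)})
    return pipeline


def create_masked_view(db, view_name, source, pipeline):
    """Create a read-only view on `source`; `db` is assumed to be pymongo-like."""
    return db.command("create", view_name, viewOn=source, pipeline=pipeline)


pipeline = masked_view_pipeline(
    hidden_fields=["national_id"], masked_fields=["email"]
)
print(pipeline)
```

Users who are granted access only to the view see the masked version of each document, while the application code that reads the underlying collection stays unchanged.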
Besides encryption, MongoDB supports a range of other features to help you meet security and compliance requirements. These include:
- Role-based access control (RBAC): You can configure your MongoDB security settings to use the LDAP authentication protocol, which is widely supported by access management systems such as Active Directory. This provides centralized access control to database resources and operations, where the level of access is based on the processing and administrative requirements of each user.
- Data resilience and availability: MongoDB natively maintains a set of replicas to ensure fault tolerance and high availability of your data, providing automatic failover should a server in your cluster go down.
As a cloud-based offering, MongoDB Atlas provides for replication across availability zones, cloud regions and even different cloud providers. The service also includes operational tooling that can automate your snapshots, supported by queryable backups and point-in-time recovery.
- Database activity monitoring (DAM): MongoDB Atlas provides you with a metrics dashboard for monitoring your clusters, helping you to spot the signs of malicious database activity. However, the free tier covers only a limited number of operational metrics, whereas the premium service gives you real-time access to a much wider range.
You can also monitor database activity in the self-managed version of MongoDB. But you'll first need to enable it on your MongoDB server.
Right to Erasure
If you use field-level encryption, MongoDB can also help you simplify the process of serving right-to-erasure requests. In essence, you can delete an individual's personal data simply by destroying the encryption keys associated with their record, thereby rendering their information unreadable and irrecoverable to anyone.
Under data privacy laws such as the General Data Protection Regulation (GDPR) and its Brazilian counterpart, Lei Geral de Proteção de Dados Pessoais (LGPD), many companies maintain records retention policies designed to ensure that personal data is stored only for defined periods.
A retention policy will not only help ensure you meet legal requirements but will also contribute to reducing security risk, storage costs, and the number of right-to-erasure requests you'll need to perform.
However, policy enforcement can come with a high database overhead. This is because you'll need to periodically scan your data at application level for records that are due for deletion. MongoDB can relieve this burden through a built-in automation feature, which can remove individual pieces of data from the database after a predefined time period.
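In MongoDB this kind of automatic expiry is typically configured with a TTL index on a date field. The following is a minimal sketch, assuming a `collection` object that behaves like a pymongo `Collection` and a hypothetical `created_at` field that holds a date value.

```python
from datetime import timedelta


def ensure_ttl_index(collection, date_field, retention):
    """Create a TTL index so documents expire `retention` after `date_field`.

    `collection` is assumed to expose pymongo's create_index method.
    """
    seconds = int(retention.total_seconds())
    # A TTL index is a single-field index carrying expireAfterSeconds.
    return collection.create_index([(date_field, 1)], expireAfterSeconds=seconds)


# Example retention period (an assumption for illustration): 30 days.
RETENTION = timedelta(days=30)
print(int(RETENTION.total_seconds()))
```

Once the index exists, a background task on the server removes expired documents, so the application no longer has to scan for records that are due for deletion.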
Traditional relational databases, such as MySQL, are designed to be single-node deployments. In order to partition them, you'd need to overcome significant technical challenges.
This means that, when you use a SQL database, you'll be hosting all your data in a single location. This can be problematic if you need to simultaneously comply with varying data governance requirements. On the other hand, as MongoDB can be distributed, you have far more flexibility to store your personal data where you need to.
With MongoDB Atlas multicloud support, you can choose from more than 70 cloud regions to host your data. So you can satisfy data residency requirements by mapping documents to specific cloud regions based on geographical location.
In the case of self-managed MongoDB deployments, you can set up zone-based sharding to confine personal data to specific locations.
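As a sketch of what zone-based sharding involves, the helper below builds the two kinds of admin commands used to pin a range of shard key values to a zone. It assumes a hypothetical collection sharded on `{country: 1, user_id: 1}`; the shard and zone names are made up, and the string placeholders stand in for the BSON `MinKey` and `MaxKey` values a real deployment would pass.

```python
def zone_commands(namespace, zone, shards, country, min_key, max_key):
    """Return admin commands that confine one country's documents to a zone.

    min_key and max_key should be BSON MinKey/MaxKey values in real use.
    """
    # Step 1: associate each shard with the zone.
    commands = [{"addShardToZone": shard, "zone": zone} for shard in shards]
    # Step 2: map the shard key range for this country to the zone.
    commands.append(
        {
            "updateZoneKeyRange": namespace,
            "min": {"country": country, "user_id": min_key},
            "max": {"country": country, "user_id": max_key},
            "zone": zone,
        }
    )
    return commands


# Placeholders only: substitute real MinKey/MaxKey objects before use.
commands = zone_commands(
    "crm.customers", "EU", ["shard-eu-1"], "DE", "<MinKey>", "<MaxKey>"
)
for command in commands:
    print(command)
```

Each command would be run against the admin database through a router connection, after which the balancer keeps documents in that range on the shards assigned to the zone.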
Visibility into your data is essential to compliance because, without clear insight into your data, you cannot be certain you're giving it the protection it needs.
You may also find it cumbersome to comply with other privacy requirements. For example, your response to any right-of-access request must give a full account of all the personal information you store about the data subject. Doing that manually requires significant investments of time and resources.
And you may be unaware of other risky practices, such as excessive data collection, which, if unnoticed, could pose substantial business and legal risks.
MongoDB uses the MongoDB Query Language (MQL) to query data. It is strongly geared towards use by developers and very different from SQL. However, it shares similar concepts and is equally powerful and straightforward to learn. A number of tools are also available that allow you to query MongoDB using traditional SQL statements.
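To show how the two query styles compare, here is a small sketch that builds an MQL filter and projection as Python dictionaries, with a roughly equivalent SQL statement in the comments. The collection layout and field names are hypothetical.

```python
# Roughly equivalent SQL, assuming a flat "customers" table:
#   SELECT name, email FROM customers
#   WHERE country = 'DE' AND age >= 18;


def build_query(country, min_age):
    """Return an MQL (filter, projection) pair for adults in one country."""
    # Dot notation reaches into the nested "address" sub-document.
    query_filter = {"address.country": country, "age": {"$gte": min_age}}
    # Include only name and email; suppress the default _id field.
    projection = {"_id": 0, "name": 1, "email": 1}
    return query_filter, projection


query_filter, projection = build_query("DE", 18)
# With a pymongo collection this pair would be passed as:
#   collection.find(query_filter, projection)
print(query_filter, projection)
```

The filter plays the role of the WHERE clause and the projection plays the role of the column list, which is why the concepts carry over despite the different syntax.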
In-House Visibility Tools
MongoDB offers an in-house data visibility tool, MongoDB Compass, which is designed for data discovery and data management. It provides useful insights into the types and frequency of data in your documents and collections.
One of the key merits of Compass is ease of use, as it doesn't require any knowledge of query languages. However, the service is more of an aggregate query and schema visualization tool and provides relatively few features to aid compliance.
Both MQL queries and MongoDB Compass are highly effective tools for identifying types of data that conform to very precise structures, naming conventions and patterns.
But, in modern storage environments, many forms of personal information are far less structured. So they're much more difficult to identify without a human understanding of the data.
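The gap between pattern-based and contextual detection can be seen in a small sketch. The function below walks a document and reports fields whose values look like email addresses; the regular expression is a simplified assumption, and the sample document is hypothetical.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_fields(document, prefix=""):
    """Yield dotted paths of string values containing an email-like pattern."""
    if isinstance(document, dict):
        for key, value in document.items():
            yield from find_email_fields(value, f"{prefix}{key}.")
    elif isinstance(document, list):
        for item in document:
            yield from find_email_fields(item, prefix)
    elif isinstance(document, str) and EMAIL.search(document):
        yield prefix.rstrip(".")


record = {
    "contact": {"email": "ana@example.com"},
    "notes": "Customer mentioned a recent hospital stay.",
    "tags": ["vip", "bob@example.org"],
}
matches = list(find_email_fields(record))
print(matches)
```

The scan flags `contact.email` and `tags`, but it passes over the `notes` field even though a reference to a hospital stay is health-related information. No fixed pattern describes that kind of content, which is the limitation described above.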
This problem has led to the evolution of new AI-based tools that can understand the contextual meaning of database content and therefore provide accurate detection of personal data.
These solutions can not only recognize everyday personal information such as email addresses and credit card numbers, but also special category data, such as details about someone's sex life, political opinions or ethnic origin, as defined by privacy laws such as the GDPR.
But, above all, they're designed for compliance professionals whatever their technical expertise. So they can carry out their duties without relying on complex database queries to identify and protect sensitive information. This is what Cloud Data Sense can offer MongoDB.
A New Level of Support for MongoDB: Cloud Data Sense
While the MongoDB tools for data security offer a lot of help for keeping confidential data safe, there are limits to how effectively personal data can be managed with them. NetApp Cloud Data Sense is an AI-driven data mapping and reporting technology that takes data privacy compliance one step further. It can be used on any on-prem or cloud-based MongoDB deployment.
Cloud Data Sense's contextually aware search identifies, maps, and reports on privacy-relevant data stored in your MongoDB database. Its dashboards give users full visibility of where sensitive personal data is stored within the database, even if it spans multiple repositories. Users can also create instant reports on that data to respond to data subject access requests (DSARs) or to demonstrate compliance awareness, two critical components of major data privacy legislation.
Sign up now to try Cloud Data Sense free for up to 1 TB of data.