BlueXP Blog

How to Deploy Cloud Data Sense in Your On-Premises Data Center

Written by Shahar Livschitz, Product Manager | Jun 27, 2023 4:48:36 AM

Enterprise storage systems are growing more complex, with large amounts of data distributed across a diverse range of technologies in hybrid-cloud and multicloud environments. Cloud Data Sense gives you a way to stay on top of all your data assets, no matter where they’re located.

In a previous post, we showed you how to deploy Cloud Data Sense in the cloud. But you can also deploy it to a Linux host in your on-premises data center. This quick-start guide shows you how.

What Is Cloud Data Sense?

As part of BlueXP classification, Data Sense automatically maps and classifies your data across your entire data estate, identifying data by sensitivity level, data type, usage, and many other metrics to give you more data governance and control.

Data Sense Deployment

You can deploy Data Sense in a number of different ways. The following procedure is specifically for the on-premises installation that requires an Internet connection. Note that the installation method for on-premises deployment without Internet connectivity differs from the steps shown below.

This implementation will consist of the components and connections shown below.

Data Sense on-prem installation components

Once you've familiarized yourself with the setup shown, you'll be ready to deploy the software to your on-premises Linux host, as outlined in the following steps.

How to Install Data Sense On-Prem: Step-by-Step Instructions

Installation Requirements

Before you begin the installation process, you'll need to set up an environment that meets certain requirements. The following is a quick-start list of the main prerequisites for deploying and using Data Sense.

  • BlueXP Connector
    First, you'll need a BlueXP Connector installed in your on-premises network. If you haven't set this up yet, check out our documentation on how to set up the Connector.
  • Linux Host System
    You'll also need to prepare a virtual machine (VM) to host the Data Sense software. Depending on the size of the data set you intend to scan and the time you have available for scanning, provision a VM in one of the following three sizes.

| System size | # of CPUs | RAM (ensure swap memory is disabled) | Disk |
|---|---|---|---|
| Large | 16 | 64 GB | Either 500 GB SSD on /, or 100 GB available on /opt, 395 GB available on /var, and 5 GB on /tmp |
| Medium | 8 | 32 GB | Either 200 GB SSD on /, or 50 GB available on /opt, 145 GB available on /var, and 5 GB on /tmp |
| Small | 8 | 16 GB | Either 100 GB SSD on /, or 50 GB available on /opt, 45 GB available on /var, and 5 GB on /tmp |
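As a rough illustration of how the sizing table could be applied, the sketch below picks the largest size whose CPU, RAM, and disk figures a host meets. It is not part of the Data Sense installer: the names `SizeTier`, `largest_matching_tier`, and `swap_total_kb` are hypothetical, the table values are treated as minimums (an assumption), and only the single-disk "SSD on /" option is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SizeTier:
    name: str
    min_cpus: int
    min_ram_gb: int
    min_root_disk_gb: int  # models only the single-disk "SSD on /" option


# Figures copied from the sizing table, ordered largest first.
TIERS = (
    SizeTier("Large", 16, 64, 500),
    SizeTier("Medium", 8, 32, 200),
    SizeTier("Small", 8, 16, 100),
)


def largest_matching_tier(cpus: int, ram_gb: float, root_disk_gb: float) -> Optional[str]:
    """Return the largest tier whose figures the host meets, or None."""
    for tier in TIERS:
        if (
            cpus >= tier.min_cpus
            and ram_gb >= tier.min_ram_gb
            and root_disk_gb >= tier.min_root_disk_gb
        ):
            return tier.name
    return None


def swap_total_kb(meminfo_text: str) -> int:
    """Extract the SwapTotal value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo text")
```

On the VM itself, you could pass the contents of `/proc/meminfo` to `swap_total_kb`; a result of 0 suggests that no swap space is active.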

The host environment must also meet the following specifications:

Operating system

  • Red Hat Enterprise Linux: versions 7.8 and 7.9 (provided the Linux kernel version is 4.0 or later)
  • CentOS: versions 7.8 and 7.9 (provided the Linux kernel version is 4.0 or later)
  • Ubuntu: version 22.04 (requires BlueXP classification version 1.23 or later)

Required software dependencies

  • Docker Engine
  • Python 3: version 3.6 or later

The following directories also need minimum permissions:

  • /tmp needs to have at least “rwxrwxrwt”
  • /opt needs to have at least “rwxr-xr-x”
  • /var/lib/docker needs to have at least “rwx------”
  • /usr/lib/systemd/system needs to have at least “rwxr-xr-x”

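The directory permission requirements above can be checked before you run the installer. The following is a minimal sketch, not the installer's own test: it assumes the directories are the standard lowercase Linux paths, reads "at least" as "every listed permission bit is set", and uses hypothetical helper names (`symbolic_to_mode`, `has_at_least`, `failing_directories`).

```python
import os
import stat

# Minimum permissions from the requirements list (paths assumed lowercase).
REQUIRED_PERMISSIONS = {
    "/tmp": "rwxrwxrwt",
    "/opt": "rwxr-xr-x",
    "/var/lib/docker": "rwx------",
    "/usr/lib/systemd/system": "rwxr-xr-x",
}

_BITS = (
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
)


def symbolic_to_mode(symbolic: str) -> int:
    """Convert a 9-character string such as 'rwxr-xr-x' to mode bits."""
    if len(symbolic) != 9:
        raise ValueError("expected 9 permission characters")
    mode = 0
    for index, (char, expected) in enumerate(zip(symbolic, "rwxrwxrwx")):
        if char == expected:
            mode |= _BITS[index]
        elif char == "t" and index == 8:
            # 't' means execute-for-others plus the sticky bit.
            mode |= _BITS[index] | stat.S_ISVTX
        elif char != "-":
            raise ValueError(f"unexpected character {char!r} at position {index}")
    return mode


def has_at_least(path: str, symbolic: str) -> bool:
    """True if every required permission bit is set on the path."""
    actual = stat.S_IMODE(os.stat(path).st_mode)
    required = symbolic_to_mode(symbolic)
    return actual & required == required


def failing_directories(required=REQUIRED_PERMISSIONS):
    """Return the paths that are missing or lack the required bits."""
    failures = []
    for path, symbolic in required.items():
        try:
            ok = has_at_least(path, symbolic)
        except OSError:  # path missing, or not allowed to stat it
            ok = False
        if not ok:
            failures.append(path)
    return failures
```

Calling `failing_directories()` on the prepared VM returns an empty list when all four directories pass. Note that `/var/lib/docker` exists only once Docker Engine is installed.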
  • Outbound Internet Access
    You'll also need to ensure your Data Sense deployment has outbound access to the Internet so it can reach the following endpoints.

| Endpoint | Purpose |
|---|---|
| https://api.bluexp.netapp.com | Allows communication for BlueXP and NetApp accounts. |
| https://netapp-cloud-account.auth0.com, https://auth0.com | For user authentication via the BlueXP website. |
| https://support.compliance.api.bluexp.netapp.com/, https://hub.docker.com, https://auth.docker.io, https://registry-1.docker.io, https://index.docker.io/, https://dseasb33srnrn.cloudfront.net/, https://production.cloudflare.docker.com/ | Gives access to manifests, templates, software images, and enables sending logs and metrics. |
| https://support.compliance.api.bluexp.netapp.com/ | Allows data to be streamed by NetApp from audit records. |
| https://github.com/docker, https://download.docker.com, http://mirror.centos.org, http://mirrorlist.centos.org, http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-3.el7.noarch.rpm | For packages required prior to installation. |
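One way to sanity-check outbound access to the endpoints listed above is a plain TCP connection test from the Data Sense host. This is a hedged sketch rather than an official check: the helper names are hypothetical, the default ports (443 for https, 80 otherwise) are an assumption, and a successful TCP connection does not prove that TLS, authentication, or a proxy path works. If your host reaches the Internet only through a proxy, a direct connection test like this one will report failures even when the installer would succeed.

```python
import socket
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Split a URL into (hostname, port), defaulting to 443 or 80."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def unreachable_endpoints(urls: Iterable[str], timeout: float = 5.0) -> List[str]:
    """Return the URLs whose host and port could not be reached."""
    return [
        url for url in urls
        if not can_connect(*host_and_port(url), timeout=timeout)
    ]
```

For example, `unreachable_endpoints(["https://api.bluexp.netapp.com", "https://hub.docker.com"])` returns an empty list when both hosts accept connections.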

  • Open Ports
    You need to make sure the relevant ports are open so that your system components can communicate with each other, as listed below:

Connection type: Connector <> Data Sense
Ports: 8080 (TCP), 443 (TCP), and 80

  • Port 443 must be set to allow inbound and outbound traffic in your routing rules or firewall rules for the Connector and Data Sense to communicate.
  • Port 8080 must be open in order to view the installation progress in BlueXP.

Connection type: Connector <> ONTAP cluster (NAS)
Ports: 443 (TCP)

ONTAP clusters are discovered by BlueXP using HTTPS. Ensure any custom firewall policies adhere to the following:

  • The Connector host’s port 443 needs to allow outbound HTTPS access. For cloud-based Connectors, predefined firewall or routing rules will allow all outbound communication.
  • The ONTAP cluster’s port 443 must allow inbound HTTPS access. By default, the "mgmt" firewall lets all IP addresses have inbound HTTPS access. If that default has been altered, or if you are using a custom firewall policy, make sure to associate that policy with HTTPS and allow access from the Connector host.

Connection type: Data Sense <> ONTAP cluster
Ports:

  • For NFS: 111 (TCP/UDP) and 2049 (TCP/UDP)
  • For CIFS: 139 (TCP/UDP) and 445 (TCP/UDP)

Network connectivity between Data Sense and each Cloud Volumes ONTAP subnet or on-prem ONTAP system is required. Inbound connections from Data Sense must be allowed by your Cloud Volumes ONTAP firewall/routing rules.

Data Sense needs the following ports open:

  • Ports 111 and 2049 for NFS
  • Ports 139 and 445 for SMB/CIFS

Access from Data Sense is required in your NFS volume export policies.

Connection type: Data Sense <> Active Directory
Ports: 389 (TCP & UDP), 636 (TCP), 3268 (TCP), and 3269 (TCP)

Active Directory must be set up for your users. Data Sense also requires Active Directory credentials in order to scan SMB/CIFS volumes.

Make sure to have the following Active Directory information on hand:

  • The address(es) of your DNS server
  • Your server username and password
  • Your AD name/domain name
  • Whether you are using LDAP or LDAPS (secure LDAP)
  • The LDAP server port (usually port 636 for LDAPS, port 389 for LDAP)
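The port requirements above lend themselves to a simple pre-installation probe. The sketch below is an illustration under stated assumptions, not NetApp tooling: `TCP_PORTS`, `closed_tcp_ports`, and `ldap_port` are hypothetical names, only TCP is tested (UDP cannot be probed reliably this way), and port 80 is treated as TCP because the source lists it without a protocol.

```python
import socket
from typing import Iterable, List

# TCP ports per connection type, taken from the requirements above.
TCP_PORTS = {
    "Connector <> Data Sense": (8080, 443, 80),  # port 80 assumed to be TCP
    "Connector <> ONTAP cluster (NAS)": (443,),
    "Data Sense <> ONTAP cluster (NFS)": (111, 2049),
    "Data Sense <> ONTAP cluster (SMB/CIFS)": (139, 445),
    "Data Sense <> Active Directory": (389, 636, 3268, 3269),
}


def closed_tcp_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> List[int]:
    """Return the ports on host that did not accept a TCP connection."""
    closed = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection opened, so the port is reachable
        except OSError:  # refused, filtered, timed out, or DNS failure
            closed.append(port)
    return closed


def ldap_port(use_ldaps: bool) -> int:
    """Usual LDAP server port: 636 for LDAPS, 389 for plain LDAP."""
    return 636 if use_ldaps else 389
```

Run the probe from the host that initiates the connection. For instance, from the Data Sense VM you might call `closed_tcp_ports("ontap.example.internal", TCP_PORTS["Data Sense <> ONTAP cluster (NFS)"])`, where the hostname is a placeholder for your own ONTAP system.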

Running the Data Sense Installation

Once your prerequisites are all in place and you’ve installed the Connector, follow these steps to install Data Sense in your on-prem system.

  1. Begin by downloading the installation software from the NetApp Support Site.
  2. Connect via SSH to your prepared virtual machine (VM). Copy the downloaded installation software to the VM and unzip it.
  3. Now navigate to the BlueXP home page and click the “Go to Console” button to open the BlueXP console.
  4. In the BlueXP console, select “Governance” in the left-side menu, then click “Classification.”
  5. On the following screen, click the “Activate Data Sense” button, then select the “On Prem deployment” option.

    A dialogue box will then appear with further information about the deployment.

  6. Copy the installation command provided.
  7. Paste the installation command into the command prompt in your prepared VM.
  8. Next, enter the parameters required for your installation at the following series of prompts.

    These include the IP address of your BlueXP Connector and, where necessary, the proxy settings of your system.

  9. The installer then performs a series of tests to check whether your configuration meets the system and network requirements for successful installation.

    Results of the test will be displayed in red, amber, or green to indicate whether a setting is unsuitable, suitable but with scan speed limitations, or good to go, respectively.
    In the example below, the user is able to install Data Sense with limitations and decides to proceed.

  10. The installation process then takes 10–20 minutes to complete. While you're waiting, you can go back to BlueXP to confirm the installation is in progress.
  11. Once it has finished, you can then proceed to the Configuration page in BlueXP and start adding data sources you want to scan.

A Setup to Suit You

This post walked you through all the key steps involved in a basic Data Sense on-premises installation. However, you can choose from a number of alternative implementations and customized setups to suit your own specific needs.

For example, you can deploy Data Sense to an instance on any of the three leading cloud platforms: AWS, Microsoft Azure, and Google Cloud Platform. Customers whose security requirements rule out Internet access can install the BlueXP UI locally, without an Internet connection.

Users with very large systems and petabytes of data to scan can install Data Sense on multiple hosts to increase processing power.

Whichever installation method is right for you, Data Sense will work the same.