What is Google Cloud Analytics?
Google Cloud provides a suite of intelligent analytics services, most of which are built into, or can be embedded in, Google tools. This flexibility lets organizations adopt Google Cloud analytics quickly and with minimal effort.
The Google Cloud Platform (GCP) offers a wide range of analytics tools, all built with unique capabilities for data analytics and management. Google’s artificial intelligence (AI) and machine learning (ML) solutions, for example, can be integrated into existing tooling to provide real-time intelligence.
Related content: read our guide to Google Cloud Database.
In this article, you will learn:
- Google Cloud Big Data Architecture
- Google Cloud Analytics Services and Solutions in Detail
- Google Cloud Analytics with NetApp Cloud Volumes ONTAP
Google Cloud Big Data Architecture
In the past, storing all data often created the dreaded “data swamps”, which were quite difficult to use for analytics. Modern big data architectures let organizations store massive amounts of data, both structured and unstructured, while maintaining metadata and other mechanisms to make it easy to query and analyze. The architecture below leverages Google Cloud tools to economically store petabytes or even exabytes of data.
Data lakes need to ingest different volumes of data from multiple sources, such as website clickstream activity, data generated by online transaction processing (OLTP), on-premises data, and data generated by Internet of Things (IoT) sensors. You can support this by using the Cloud Storage API to integrate Google Cloud Storage with other data pipelines.
Processing and Analytics
Once data is ingested and stored, it needs to be made available for analysis. To make your data subsets widely available, you need to create focused data marts. A simple solution is storing this data in a highly organized schema right after it is ingested. This can simplify in-place querying.
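One way to picture a "highly organized schema" is a predictable object-naming layout applied at ingestion time. The sketch below is a minimal illustration in plain Python, not part of any Google Cloud API. The `source=.../dt=.../` prefix convention, the function names, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(source: str, event_time: datetime) -> str:
    """Return a Hive-style prefix such as 'source=iot/dt=2024-01-31/'."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"source={source}/dt={day}/"


def group_by_partition(records):
    """Group raw records by prefix so each group can be stored together."""
    groups = defaultdict(list)
    for record in records:
        # 'ts' is assumed to be a Unix timestamp in seconds (hypothetical field).
        ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        groups[partition_prefix(record["source"], ts)].append(record)
    return dict(groups)


# Hypothetical sample records from two sources.
records = [
    {"source": "clickstream", "ts": 1706659200, "page": "/home"},
    {"source": "iot", "ts": 1706659260, "temp_c": 21.5},
    {"source": "clickstream", "ts": 1706745600, "page": "/pricing"},
]
layout = group_by_partition(records)
```

With a layout like this, a query engine or a data mart job can read only the prefixes it needs, for example a single source for a single day, instead of scanning everything.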
Source: Google Cloud
Design and Deploy Workflows
To keep your data marts updated and relevant, you can use an orchestrated data pipeline. Ideally, this pipeline ingests raw data and then transforms it into a format supported by downstream consumers. Orchestrated data pipelines vary with the type of analytics they serve, but a few standard patterns apply in many cases.
Here are popular analytics workflows you can implement on Google Cloud:
- Combine ETL and SQL: you can use extract, transform, and load (ETL) processes to ingest data into BigQuery warehouses. You can then use SQL to query the data.
- Use Hadoop for batch analytics: you can store transformed data in Cloud Storage, and then use Dataproc to run queries against it. This works with Spark, Spark SQL, Hive, and other tools.
- Use BigQuery for real-time analytics: you can build a SQL-based stream processing pipeline that uses Pub/Sub to ingest events, Dataflow with Apache Beam to process them, and BigQuery to query the results.
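The first workflow, ETL followed by SQL, can be sketched in a few lines. This is a minimal illustration only: Python's built-in `sqlite3` module stands in for BigQuery so the example runs anywhere, and the `orders` table, its columns, and the sample CSV are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; row 3 has no amount and row 4 has inconsistent casing.
RAW_CSV = """order_id,country,amount
1,us,19.99
2,de,5.00
3,us,
4,US,30.01
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize types and casing."""
    return [
        (int(r["order_id"]), r["country"].upper(), float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn, rows):
    """Load: write cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse, not BigQuery itself
load(conn, transform(extract(RAW_CSV)))

# Once the data is loaded, analysis is plain SQL.
totals = conn.execute(
    "SELECT country, COUNT(*), ROUND(SUM(amount), 2) FROM orders "
    "GROUP BY country ORDER BY country"
).fetchall()
```

The point of the pattern is the separation of concerns: cleaning happens once, before loading, so every later query can assume consistent types and values.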
Related content: read our guide to Google Cloud SQL.
Google Cloud Analytics Services and Solutions in Detail
BigQuery is an enterprise-grade data warehouse that uses Google's infrastructure to enable fast SQL queries. You can move massive datasets to BigQuery and the platform handles the load. BigQuery customers keep control over access to their data and projects, and can grant or restrict user access according to business needs.
BigQuery ML lets you use standard SQL to build and deploy machine learning models directly in BigQuery. Data analysts and data scientists can use BigQuery ML to quickly build ML models on structured or semi-structured data, even at very large scale. You can export your models for use outside BigQuery, or use them for online prediction with Cloud AI Platform.
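To make "standard SQL to build models" concrete, the sketch below assembles the two statements a simple BigQuery ML workflow typically needs: a `CREATE MODEL` statement and an `ML.PREDICT` query. The Python helpers only build SQL strings and do not call BigQuery. The dataset, model, table, and column names are hypothetical.

```python
def create_model_sql(model: str, label: str, training_query: str) -> str:
    """Return a BigQuery ML CREATE MODEL statement for logistic regression."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"{training_query}"
    )


def predict_sql(model: str, input_query: str) -> str:
    """Return a query that scores rows with ML.PREDICT."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, ({input_query}))"


# Hypothetical names, used for illustration only.
train_sql = create_model_sql(
    model="mydataset.churn_model",
    label="churned",
    training_query="SELECT tenure_months, monthly_spend, churned FROM `mydataset.customers`",
)
score_sql = predict_sql(
    model="mydataset.churn_model",
    input_query="SELECT tenure_months, monthly_spend FROM `mydataset.new_customers`",
)

print(train_sql)
print(score_sql)
```

Both statements can be run wherever you normally run BigQuery SQL, which is why analysts who already know SQL do not need a separate ML toolchain for simple models.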
BigQuery BI Engine
BigQuery BI Engine is a fast, in-memory analysis service for BigQuery. It lets users interactively analyze large and complex data sets, with sub-second query response times and a high level of concurrency.
Connected Sheets lets you analyze millions or even billions of rows of live BigQuery data without using SQL. The data is available in Google Sheets, where you can use tools you already know, like charts, formulas, and pivot tables. This makes it easy to draw insights from big data.
Data QnA provides a natural language interface built for petabyte-scale analytics performed on BigQuery and federated data sources. You can integrate Data QnA with existing tools, including chatbots, Google Sheets, existing applications, and BI solutions. Once integrated, users with varying levels of knowledge and skill can query data in natural language. The tool is often used to improve productivity and broaden access to data.
The BigQuery Omni solution provides fully managed, flexible multi-cloud analytics. Omni lets users analyze data across different cloud environments. You can use SQL in BigQuery's interface to answer questions quickly and share the results across datasets.
Related content: read our guide to Google Cloud BigQuery.
Dataflow provides fully-managed streaming analysis. The service uses batch processing and autoscaling to reduce costs, latency, and processing time. Here are key features:
- Streaming Engine: use this to improve data latency and autoscaling. The engine separates compute from state storage, moving parts of pipeline execution out of the worker VMs and into the Dataflow service backend.
- Dataflow SQL: this feature lets you use SQL in BigQuery's web UI to build streaming Dataflow pipelines. You can then use Google Sheets and other BI solutions to build real-time dashboards on the results.
- Autoscaling: you can use this feature to automate scaling. Once enabled, Dataflow automatically chooses the number of worker instances needed to run each job.
Cloud Dataprep was developed by Trifacta. It is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for reporting, machine learning, and analysis. Cloud Dataprep is serverless and scalable, so there is no need to deploy or manage infrastructure. The service predicts and suggests data transformations for each UI input, so there is no need to write code.
Dataproc is a fully managed cloud service that simplifies deploying Apache Hadoop and Apache Spark clusters. With Dataproc, you can choose the resources required for each cluster node and use autoscaling to reduce costs, optimize your clusters, and ensure high availability.
5) Stream Analytics
Ingest, process, and analyze event streams in real time. Google Cloud's stream analytics solutions make data more organized, useful, and accessible from the instant it’s generated. They let you ingest and analyze hundreds of millions of events per second from applications or devices, or stream millions of events per second directly into your data warehouse.
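A common building block of stream analytics is grouping events into fixed time windows and aggregating within each window. The sketch below shows that idea in plain Python. It is a conceptual illustration, not the Dataflow or Apache Beam API, and the event format of `(timestamp_seconds, key)` pairs is an assumption made for the example.

```python
from collections import Counter


def count_per_window(events, window_seconds=60):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sensor events: (seconds since stream start, device id).
events = [
    (0, "sensor-a"),
    (15, "sensor-b"),
    (59, "sensor-a"),
    (60, "sensor-a"),
    (125, "sensor-b"),
]
windowed = count_per_window(events)
```

A managed streaming service applies the same idea continuously and at scale, while also handling concerns this sketch ignores, such as late or out-of-order events.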
6) Marketing Analytics
This service lets you apply Google Cloud’s machine learning to all your data. You can gain a complete picture of customer behavior, map entire customer journeys, and predict business and marketing outcomes. You can also use the insights to create personalized experiences for your customers.
7) Data Catalog
This is a fully managed metadata management solution that you can scale according to business needs. Data Catalog is serverless and provides a simple user interface with advanced structured search. The tool comes with built-in Cloud DLP integration, which simplifies data governance.
Google Cloud Analytics with NetApp Cloud Volumes ONTAP
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure, and Google Cloud. Cloud Volumes ONTAP supports a capacity of up to 368TB and a variety of use cases, such as file services, databases, DevOps, or any other enterprise workload. It offers a strong set of features, including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
In particular, Cloud Volumes ONTAP helps address database workload challenges in the cloud, and fills the gap between your cloud-based database capabilities and the public cloud resources they run on.