Every year, there is a shortlist of events that define the world of technology. Without any doubt, one of those events is AWS re:Invent. The yearly Amazon conference, which has been taking place since 2012, is a hallmark of innovation and announcements from both AWS and its partners about the latest breakthroughs in cloud technology.
The conference showcases the growth and expansion of cloud across technology areas, industries and geographies. One example from AWS re:Invent 2022 was the announcement that new AWS Local Zones had reached general availability (GA).
In this article we will cover the key highlights from AWS re:Invent 2022 in the data area.
The highlights are organized into the following sections:
- Amazon DataZone
- Amazon Security Lake
- Amazon OpenSearch Serverless
- Data Exchange and Clean Rooms
- Towards a Zero-ETL Future
- AWS re:Invent 2022 Summary
Amazon DataZone
With data volumes growing exponentially across organizations, data governance and management has become a challenge in every industry. To address it, AWS announced Amazon DataZone, a new service that helps organizations discover, share, and access data across organizational boundaries.
With Amazon DataZone, an organization gains a 360-degree view of its data, and the ability to manage it, regardless of where it is stored: AWS, on-premises, or third-party providers. This enables AWS customers to create a business data catalog that uses their own industry terms, so that information can be searched, shared, and accessed across the entire organization with little effort.
A comprehensive organizational data catalog brings value in two ways. It streamlines workflows by letting teams collaborate more efficiently and access data in a self-service manner. It also simplifies access to analytics and business intelligence, because information can be easily discovered, analyzed, and visualized by anyone in the company.
Amazon Security Lake
With the transition from on-premises, centralized IT systems to the cloud, security teams have had to adapt to a new reality in which security data is spread across multiple environments: on-premises systems, SaaS applications, and several cloud providers. To allow security data to be centralized regardless of its source, AWS announced Amazon Security Lake, a purpose-built data lake that provides a comprehensive view of the organization's security information.
Amazon Security Lake can be used to automatically centralize an organization's security data by leveraging open standards such as the Open Cybersecurity Schema Framework (OCSF) and the built-in AWS service integrations. The native OCSF support also enables data from both AWS built-in services and third-party enterprise security data sources to be normalized into a common format.
Amazon Security Lake lowers the complexity and time required to build a security-oriented data lake, by bringing together multiple AWS built-in components and features while still allowing a lot of flexibility for security teams to use their preferred analytics, threat intelligence and incident management tools without having to duplicate data.
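To make the idea of normalization more concrete, here is a minimal Python sketch of mapping events from two differently shaped sources into one common record. Everything in it is an assumption made for illustration: the `NormalizedEvent` fields, the two raw log formats, and the severity scale are hypothetical and are not the real OCSF schema or a Security Lake API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedEvent:
    """Simplified common record (hypothetical, NOT the real OCSF schema)."""
    time: datetime   # event time, always in UTC
    source: str      # which system produced the event
    actor: str       # user or principal involved
    action: str      # what happened
    severity: int    # hypothetical scale: 1 (informational) to 5 (critical)


def from_cloud_audit(raw: dict) -> NormalizedEvent:
    # Hypothetical cloud audit log: ISO-8601 timestamp, nested identity.
    return NormalizedEvent(
        time=datetime.fromisoformat(raw["eventTime"]).astimezone(timezone.utc),
        source="cloud_audit",
        actor=raw["identity"]["user"],
        action=raw["eventName"],
        severity=1,
    )


def from_firewall(raw: dict) -> NormalizedEvent:
    # Hypothetical firewall log: epoch seconds, textual severity level.
    levels = {"low": 2, "medium": 3, "high": 4}
    return NormalizedEvent(
        time=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        source="firewall",
        actor=raw["src_user"],
        action=raw["verdict"],
        severity=levels[raw["level"]],
    )


# One normalizer per source; adding a source means adding one entry here.
NORMALIZERS = {"cloud_audit": from_cloud_audit, "firewall": from_firewall}


def normalize(tagged_events):
    """Convert (source_name, raw_dict) pairs into NormalizedEvent records."""
    return [NORMALIZERS[source](raw) for source, raw in tagged_events]


events = normalize([
    ("cloud_audit", {
        "eventTime": "2022-11-29T10:15:00+00:00",
        "identity": {"user": "alice"},
        "eventName": "CreateBucket",
    }),
    ("firewall", {
        "ts": 1669716900,
        "src_user": "bob",
        "verdict": "blocked",
        "level": "high",
    }),
])
```

Once every source is expressed in the same shape, downstream analytics and incident tooling can query a single structure instead of handling each vendor format separately, which is the benefit a shared schema is meant to provide.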
Amazon OpenSearch Serverless
Amazon and Elasticsearch share a long history, which changed significantly in 2021 when AWS launched OpenSearch, a new open source project derived from Elasticsearch after its license terms became more restrictive.
Since then, the OpenSearch project has been adopted by several companies, and the managed Amazon OpenSearch Service continues to be quite successful. However, the complexity and challenges of deploying, managing, and operating Elasticsearch clusters have carried over to OpenSearch.
In this AWS re:Invent edition, Amazon announced a few new database and analytics capabilities, reiterating the growing importance of these services in their product offering, and among them Amazon OpenSearch Serverless.
Amazon OpenSearch Serverless significantly lowers the operational overhead of managing and operating OpenSearch by automatically provisioning and scaling the underlying resources. The service decouples compute from storage, separates the indexing and search components, and uses Amazon S3 as the primary storage for indexes. This architecture allows sudden bursts of data to be ingested without impacting search queries and their response times.
While the service follows the expected pay-as-you-go pricing model, a minimum number of OpenSearch Compute Units (OCUs) is billed per hour regardless of usage, and storage is billed per month. These baseline charges bring the cost of a single OpenSearch collection to around $600 per month. That breaks away from the “scale-to-zero” or “pay-only-for-what-you-use” serverless paradigm, potentially making the service less appealing for experimental or development use cases. Nonetheless, Amazon OpenSearch Serverless is a welcome new option that will certainly evolve further.
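The effect of a billing floor can be sketched with a few lines of arithmetic. The Python below is only an illustration of the pricing shape described above: the `Pricing` values are placeholders chosen for round numbers, not AWS list prices, and the 730-hour month is an assumed average. Actual minimums and rates should be taken from the AWS pricing page.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average month (365 * 24 / 12)


@dataclass(frozen=True)
class Pricing:
    """Hypothetical pricing inputs; not actual AWS prices."""
    min_ocus: float          # OCUs billed even when the collection is idle
    usd_per_ocu_hour: float  # compute rate
    usd_per_gb_month: float  # storage rate


def monthly_cost(pricing: Pricing, avg_ocus_used: float, storage_gb: float) -> float:
    """Monthly cost when compute is billed at max(usage, minimum)."""
    billed_ocus = max(avg_ocus_used, pricing.min_ocus)
    compute = billed_ocus * pricing.usd_per_ocu_hour * HOURS_PER_MONTH
    storage = storage_gb * pricing.usd_per_gb_month
    return compute + storage


# Placeholder values for illustration only.
example = Pricing(min_ocus=2, usd_per_ocu_hour=0.5, usd_per_gb_month=0.02)

idle = monthly_cost(example, avg_ocus_used=0, storage_gb=0)   # no traffic at all
light = monthly_cost(example, avg_ocus_used=1, storage_gb=0)  # below the floor
busy = monthly_cost(example, avg_ocus_used=4, storage_gb=0)   # above the floor
```

In this model `idle` and `light` are identical and greater than zero, because both fall under the minimum; only once usage exceeds the floor, as in `busy`, does the bill start to track consumption. A true scale-to-zero service would instead charge nothing for compute in the idle case.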
Data Exchange and Clean Rooms
Organizations increasingly need to exchange data with one another and with trusted individuals. Allowing data to be accessed, and often duplicated, outside organizational boundaries is a challenge from both a compliance and a governance point of view.
At this re:Invent, Amazon announced several new Data Exchange features for services including Amazon S3, Lake Formation, and Redshift. In addition to making it easier to share data across organizations, this opens the door for organizations to make their data sets available in the AWS Marketplace, with the potential for customers to adopt a subscription model for data.
Another important announcement was AWS Clean Rooms, a new analytics service built on the premise that companies need to share data with each other. AWS Clean Rooms enables organizations to create virtual data clean rooms in minutes, where they can analyze and collaborate on combined data sets without actually sharing the underlying data. With Clean Rooms, customers get built-in data access controls and granular control over which queries can be run, in order to protect sensitive data.
Having the ability to collaborate with information, with or without sharing the underlying raw data, is an important step forward that got well-deserved attention in this re:Invent edition.
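The general clean-room idea, answering aggregate questions over combined data while keeping row-level records private, can be illustrated with a small Python sketch. This is a conceptual toy and not the AWS Clean Rooms API: the function names, the shared salt, the sample IDs, and the `MIN_GROUP_SIZE` threshold are all hypothetical, and salted hashing on its own is not a strong privacy guarantee.

```python
import hashlib
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical threshold: smaller groups are suppressed


def pseudonymize(customer_id: str, shared_salt: str) -> str:
    """Replace an ID with a salted SHA-256 digest so raw IDs are not compared directly."""
    return hashlib.sha256((shared_salt + customer_id).encode("utf-8")).hexdigest()


def overlap_by_segment(advertiser_rows, publisher_ids, shared_salt,
                       min_group_size=MIN_GROUP_SIZE):
    """Count matched customers per segment, returning only aggregates.

    advertiser_rows: iterable of (customer_id, segment) pairs.
    publisher_ids:   iterable of customer_id strings.
    Segments with fewer than min_group_size matches are dropped, so the
    result never describes a very small group of individuals.
    """
    publisher_keys = {pseudonymize(cid, shared_salt) for cid in publisher_ids}
    counts = Counter(
        segment
        for cid, segment in advertiser_rows
        if pseudonymize(cid, shared_salt) in publisher_keys
    )
    return {segment: n for segment, n in counts.items() if n >= min_group_size}


# Made-up sample data for two collaborating parties.
advertiser = [("a1", "sports"), ("a2", "sports"), ("a3", "sports"),
              ("a4", "sports"), ("a5", "sports"), ("a6", "travel")]
publisher = ["a1", "a2", "a3", "a6", "z9"]

result = overlap_by_segment(advertiser, publisher, shared_salt="demo-salt")
```

Here three "sports" customers appear on both sides, so that count is returned, while the single "travel" match falls below the threshold and is suppressed. Neither party sees the other's row-level records, only the aggregated and filtered answer.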
Towards a Zero-ETL Future
It is well understood that proper data-driven decision making requires good quality data. For several years, the extract, transform, and load (ETL) process has been an important cornerstone for building data lakes and making raw data usable for decision making.
There are signs that this status quo is about to undergo a big change. One announcement at re:Invent was version 4.0 of AWS Glue, the key building block for integrations and ETL in the Amazon ecosystem. This version adds support for additional engines and data formats, underscoring the continued importance of transforming and enriching data. In parallel, Amazon also announced two new capabilities at re:Invent that shift the company's direction towards a zero-ETL future:
- Amazon Aurora and Amazon Redshift now integrate to make it possible to analyze data in close to real time, and
- Apache Spark and Amazon Redshift can now integrate, giving Apache Spark access to a range of AWS machine learning and analytics services.
These announcements might at first seem to conflict, but in fact they complement each other well. The new zero-ETL capabilities in Amazon Redshift, the built-in AWS data warehouse solution, will significantly decrease the need for customers to perform data transformations between different AWS data storage services.
The data movement between Amazon Redshift and Aurora or Apache Spark can now happen under the hood, letting customers focus on analyzing the data without having to duplicate datasets, while still allowing as much enrichment and transformation with AWS Glue as needed.
AWS re:Invent 2022 Summary
As usual, the AWS re:Invent conference put innovation in the spotlight and will shape the direction of cloud technologies for the upcoming months. The announcements feel evolutionary rather than revolutionary, which is a good sign of the maturity and adoption that cloud has reached today.
The overall direction of AWS in the data area seems clear: there is a strong focus on managed services for databases, analytics, and data science, with an increased shift towards serverless options in the built-in service portfolio. Together with deep integrations with partner product offerings, these services will enable customers to collaborate on, and potentially monetize, their data.
- How much does it cost to go to AWS re:Invent?
The premier conference around AWS is a must if you’re serious about cloud technology, but it has a considerable price tag: this year’s event cost $1,799 for the full pass, which gives access to all the events at the conference. Of course, attendees also have to factor travel, food, and accommodation into the total cost of the event.
- How many people attend re:Invent?
As the highest profile AWS event of every year, re:Invent sees huge numbers of attendees. In 2021, the total number of conference-goers was over 60,000.
- Where does AWS re:Invent take place?
Every year AWS re:Invent is held in Las Vegas, Nevada. It spans several hotel locations, as the sprawling event takes over the city. Attendees should be prepared to move around quite regularly between locations.