Data Lakes

Modern Data Architectures | Data Mesh, Data Fabric, & Data Lakehouse

For years, companies have viewed data the wrong way. They see it as the byproduct of a business interaction, and it often ends up collecting dust in centralized silos governed by data teams who lack the expertise to understand its true value. Cloudera is ushering in a new era of data architecture by allowing experts to organize and manage their own data at the source. Data mesh brings all your domains together so every team can benefit from the others’ data.

Partnering with AWS on Amazon HealthLake to Speed Insights

Gaps in patient healthcare, ranging from access and affordability to those specific to race, gender, age, and beyond, are widening across the US and leading to a variety of detrimental results for people, the healthcare system, and the economy itself. Such ongoing disparities are slowing the country’s ability to achieve population health and account for billions of dollars in unnecessary healthcare spending annually.

From Data Lake to Data Mesh: How Data Mesh Benefits Businesses

Data architecture is going through a revolution. Enterprises are starting to shift away from the monolithic data lake towards something less centralized: data mesh. Data mesh is a relatively new concept, first coined in 2019, that addresses potential issues with data warehouses and data lakes that can cause businesses to be slow or unresponsive, or even to suffer from data silos. Data mesh can provide a wealth of advantages to your business.

10 Keys to a Secure Cloud Data Lakehouse

Enabling data and analytics in the cloud allows you to have infinite scale and unlimited possibilities to gain faster insights and make better decisions with data. The data lakehouse is gaining popularity because it provides a single platform for all your enterprise data with the flexibility to run any analytic and machine learning (ML) use case. Cloud data lakehouses provide significant scaling, agility, and cost advantages compared to cloud data lakes and cloud data warehouses.

Transformation for Analysis of Unintegrated Data: A Software Tautology

What, pray tell, is a tautology? A tautology is something that, under all conditions, is true. It is kind of like gravity. You can throw a ball in the air and, for a few seconds, it seems to be suspended. But soon gravity takes hold, and the ball falls back to earth.

Integrating Observability into Your Security Data Lake Workflows

Today’s enterprise networks are complex. Potential attackers have a wide variety of access points, particularly in cloud-based or multi-cloud environments. Modern threat hunters have the challenge of wading through vast amounts of data in an effort to separate the signal from the noise. That’s where a security data lake can come into play.

Diving Deep Into a Data Lake

The term data lake refers to a massive amount of data stored in structured, semi-structured, unstructured, or raw form. The purpose is to consolidate data into one destination and make it usable for data science and analytics algorithms. This data is used for observational, computational, and scientific purposes. The data lake has made it easier for AI models to gather data from various sources and to support systems that make informed decisions.

Data Lakes: The Achilles Heel of the Big Data Movement

Big Data started as a replacement for data warehouses. The Big Data vendors are loath to mention this fact today. But if you were around in the early days of Big Data, one of the central topics discussed was this: if you have Big Data, do you need a data warehouse? From a marketing standpoint, Big Data was sold as a replacement for a data warehouse. With Big Data, you were free from all that messy stuff that data warehouse architects were doing.

Cloudera's Open Data Lakehouse Supercharged with dbt Core™

dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD).