
July 2020

How Data Exchanges Enable Secure Data Collaboration

In a global economy, real-time data analysis is closely related to business success. Without data-driven insights, organizations find it challenging to remain competitive, improve company performance, and deliver strong user experiences, regardless of their industry. To match the pace of business, companies require transparent, data-driven relationships.

Faster Analytics with Cloudera Data Warehouse (CDW) Demo Highlight

The cloud-led journey to digital transformation requires organizations to become significantly more data-driven, yet traditional data warehouses have difficulty with new data volumes, new data types, and a variety of use cases. In this session, we will show you how Cloudera Data Warehouse guides your cloud journey with a modern hybrid cloud solution that scales to unprecedented levels and delivers insight to every part of your organization faster while saving costs.

Meeting Medical Device Data Privacy, Governance, and Security Challenges

Medical devices have become increasingly complex as technology evolves, and the sheer number of these devices now being worn or implanted has grown exponentially over the past few years. There are currently over 500,000 different types of smart, connected medical devices in use that have the ability to collect, share, or store private patient data and protected health information (PHI)(1).

Snowflake Helps Finnair Improve Customer Experience with Cloud Data Analytics

Coronavirus has impacted the travel industry, but as it adapts, there is one factor airlines have always worked hard to minimize: delayed flights. Arriving late or missing a connection can severely impact the customer experience, which is why airlines work hard to maintain high rates of on-time performance (OTP). To that end, pilots may have to use extra fuel to make up for a delayed departure or to reach a destination early, even if it means circling the airport before landing.

Introduction to Yellowfin Embedded Analytics for Product Teams

As a recognized leader in embedded analytics, Yellowfin has been designed and built to enable you to embed amazing analytical experiences into your software. Its capabilities range from a highly integrated dashboard module and full self-service reporting to best-practice integration that blurs the lines between analytics and your application and workflows. Modernize your reporting environment with Yellowfin to ensure your customers engage with your data, discover insights faster through automation, and innovate with contextualised analytics.

CDO Sessions: Getting Real with Data Analytics

Big data leaders are no doubt being challenged by market uncertainty. Data-driven insights can help organizations assess and uncover market risks and opportunities that may arise during uncertain times. As businesses around the world adapt to digitization initiatives, modern data systems have become more mission-critical to continuity and competitive differentiation.

The reinvention of the Telco: From Pipe to Processor

The next generation of 5G networks is unlocking a mind-bending array of new use cases. Blistering speed, super low latency, and access to more powerful mobile hardware bring VR, AR and ultra high-definition experiences into sharp focus for the near future. But there’s a bigger shift being driven by 5G, and it’s not actually about speed at all. It’s about re-thinking the modern telco business model.

Building a Scalable Process Using NiFi, Kafka and HBase on CDP

Navistar is a leading global manufacturer of commercial trucks. With a fleet of 350,000 vehicles, unscheduled maintenance and vehicle breakdowns created ongoing disruption to their business. Navistar required a diagnostics platform that would help them predict when a vehicle needed maintenance to minimize downtime.

Use AI To Quickly Handle Sensitive Data Management

The growing waves of data that you’re pulling in include sensitive, personal or confidential data. This can become a compliance nightmare, especially with rules around PII, GDPR and CCPA, and it takes too much time to manually decide what should be protected. In this session, we will show how AI-driven data catalogs can identify sensitive data and share that identification with your data security platforms to automate its discovery, identification and security. You'll see how this dramatically reduces your time to onboard data and makes it safely available to your business communities.

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables, which provide ACID guarantees. A significant amount of work has gone into Hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access Hive external tables, but it cannot access Hive managed tables. To access Hive managed tables from Spark, the Hive Warehouse Connector must be used.

Data Modeling in a Post-COVID-19 World

As a result of the COVID-19 pandemic, organizations around the world have had to transform overnight. Businesses that had been delaying digital transformation, or that hadn’t been thinking about it at all, have suddenly realized that moving their data analytics to the cloud is the key to coping with and surviving the COVID-19 disruption. The next phase is about rebounding and thriving in a post-COVID-19 world.

Amazon EMR Insider Series: Optimizing big data costs with Amazon EMR & Unravel

Data is a core part of every business. As data volumes increase, so do the costs of processing them. Whether you are running your Apache Spark, Hive, or Presto workloads on-premises or on AWS, Amazon EMR is a sure way to save you money. In this session, we’ll discuss several best practices and new features that enable you to cut your operating costs and save money when processing vast amounts of data using Amazon EMR.

Digital Transformation is Way More than Just Digital

Over the last 25 years, I have had an unparalleled front-row seat to the digital transformation that is now accelerating in the connected manufacturing and automotive industry. Not many people have had the opportunity to witness the transformation and be as active in this area as I have; I consider myself lucky.

5 Pointers For Great Analytics Storytelling

Most of us know the story of “The Tortoise and the Hare.” It is one of Aesop’s classic fables in which a speedy, overconfident hare becomes complacent and realizes, all too late, that the tortoise, although outmatched, has managed to beat him in a race. It teaches us lessons about overconfidence and perseverance and has caused phrases like “slow and steady wins the race” to creep into our everyday language.

Adoption of a Cloud Data Platform, Intelligent Data Analytics While Maintaining Security, Governance and Privacy

“You cannot be the same, think the same and act the same if you hope to be successful in a world that does not remain the same.” This sentence by John C. Maxwell is so relevant to rapidly changing cloud hosting technology. Businesses understand the added value and are looking at cloud technologies to handle both operational and analytical workloads.

Introduction to Yellowfin Embedded Analytics for Developers

Yellowfin provides a spectrum of ways to deliver embedded analytics - from simple rebranding to embedding content like reports or dashboards, to the full application integration that provides your end users with self service reporting and automated data discovery seamlessly from within your application.

Yellowfin Embedded Analytics Walkthrough for Developers

Yellowfin is the only enterprise and embedded analytics suite that enables organizations to extract transformational value from their data because we combine action based dashboards, automated data discovery, and data storytelling into a single, integrated platform. Suited to more technical people, this video demonstrates how, with minimal coding effort, you would integrate Yellowfin seamlessly into your application. See how our APIs and Code Mode will enable you to build, embed and extend your product’s analytics capabilities and make your data shine.

Lumada Analytics Roadmap For A Better Data Culture

Struggling to extract insights and actionable intelligence from your data? With more data science and analytic solutions available today, do the handoffs among data scientists, IT and the business continue to disrupt your analytics value chain or are they becoming even more difficult? We’ve seen this cause “data despair” and decelerate investments in analytics and machine learning projects. In fact, only a quarter of organizations believe that they are actually building the “data culture” that fosters success.

How To Weave Multicloud Data Fabric: Roadmap Session On Lumada Edge Intelligence

Like most enterprises, you are generating huge volumes of structured, semi- and unstructured data at edge devices, core data centers and even public clouds. How can you more easily manage your data across all your repositories and avoid delaying your applied use of analytics? Our answer is to weave a multicloud data fabric for you that simplifies connecting data repositories from the edge to the core and to the cloud. This helps you gain quicker business insights with a scalable and cost-effective approach. Join this session to get an in-depth view of our upcoming Lumada Data Services product vision and strategy.

To Manage All Your Data Pipelines, Let's Follow The Lumada Dataflow Studio Roadmap

You’re building data pipelines to help your business users innovate with data. But with the shift to self-service, the data management practices need to evolve. And in addition to building your own pipelines, you’ll also need to manage hundreds or even thousands of users’ pipelines. What now? See for yourself Hitachi’s vision for Pentaho Data Integration and Lumada Dataflow Studio. You’ll learn how Lumada Dataflow Studio helps you address today’s and tomorrow’s challenges in data preparation, orchestration and monitoring.

Cloudera Operational Database experience (dbPaaS) available as Technical Preview

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution which abstracts the underlying cluster instance as a Database. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year.

The Rise Of Connected Manufacturing - How Data Is Driving Innovation Part II

A Shift Towards Industry 4.0 Is Improving Manufacturing Efficiency And Increasing Innovation. In Part II of our series with Michael Ger, Managing Director of Manufacturing and Automotive at Cloudera, he looks in greater detail at how AI, big data, and machine learning are impacting connected living and the evolution of autonomous driving.

Operational Database Scalability

Cloudera’s Operational Database provides unparalleled scale and flexibility for applications, enabling enterprises to bring together and process data of all types and from more sources, while providing developers with the flexibility they need. In this blog, we’ll look into capabilities that make Operational Database the right choice for hyperscale.

Minimizing Cloud Concentration Risk for Financial Services Institutions, Regulators and Cloud Service Providers

Since the financial crisis of 2008, regulators have been consistently working to identify emerging risks that can potentially result in financial stability events. The growth in cloud adoption across the Financial Services Industry (FSI) and the associated increase in reliance on third-party infrastructure providers has gained the attention of regulators at global, regional, and national levels.

Snowflake Enables Modern Cloud Data Analytics at US Foods

Approximately 300,000 restaurants and food service operators across the United States rely on US Foods as their national distributor for food and supplies. In return, US Foods takes a holistic approach to servicing customers by going beyond food offerings. The distributor provides a comprehensive suite of e-commerce technology and business solutions that help restaurants manage their entire business.

Connected Manufacturing Insights from the Edge with Cloudera DataFlow

Connected Manufacturing’s Pivot to an Enterprise Data Solution. Connected Manufacturing is at a turning point, catalyzed by a real, measurable shift in data types: real-time and time-series data is growing 50% faster than latent or static data forms, and streaming analytics is projected to grow at a 28% CAGR, leaving legacy data platforms that specialize in static historical data solutions, functioning on-prem or in discrete clouds, inadequate in addressing today’s rea

Snowflake Service Account Security, Part 3

Part 1 and Part 2 of this blog post series, Snowflake Service Account Security, discussed service account threats and how to mitigate those threats with Snowflake features. Part 3 demonstrates how to manage credential rotation with a sample HashiCorp Vault plugin. You can use many platforms to achieve similar results. The important thing is to understand the patterns used to apply these controls to protect your service accounts.

Building an effective data approach in a hybrid cloud world

“In today’s world of disruption and transformation, there are a few key things that all organizations are trying to figure out: how to remain relevant to their customer base, how to deal with the pressure of disruption in their industry and, undoubtedly, how to look to technology to help deliver a better service.” (Paul Mackay) Today we are sitting down with Marc Beierschoder, Analytics & Cognitive Offering Lead at Deloitte Germany, and Paul Mackay, the EMEA Cloud Lead at Cloudera, to dis

Reach New Speeds and Unlock Unstructured Data Value

Unstructured data is the biggest untapped source of value in your organization, and we can help unlock that value. Since 80% of data is unstructured and growing at an exponential rate, just having BIG data isn’t good enough. You need big data FAST in order to make more accurate and timely decisions. Hitachi Vantara can deliver your data at the speed of business with faster data access to maximize your infrastructure advantage for accuracy, productivity, competitive edge and better outcomes.

Bringing multi-cloud analytics to your data with BigQuery Omni

Today, we are introducing BigQuery Omni, a flexible, multi-cloud analytics solution that lets you cost-effectively access and securely analyze data across Google Cloud, Amazon Web Services (AWS), and Azure (coming soon), without leaving the familiar BigQuery user interface (UI). Using standard SQL and the same BigQuery APIs our customers love, you will be able to break down data silos and gain critical business insights from a single pane of glass.

How To Get Your DataOps Initiative Prioritized And Paid For

You see the clear and immediate value in DataOps, but your opinion is not the only one that matters. You need your team, your colleagues and your business partners to see that value too – or the project won’t move ahead. In this session, you'll learn the tips and techniques for building an inclusive story. We'll discuss techniques of Design Thinking you can use to translate very technical concepts into business value and outcomes. You will learn practical ways to communicate the value in your DataOps initiative and ensure that it delivers that value when you implement it.

CDP Private Cloud ends the battle between agility & control in the data center

As a BI Analyst, have you ever encountered a dashboard that wouldn’t refresh because other teams were using it? As a data scientist, have you ever had to wait 6 months before you could access the latest version of Spark? As an application architect, have you ever been asked to wait 12 weeks before you could get hardware to onboard a new application?

Agile Insights During COVID-19 with ThoughtSpot, Snowflake, and Starschema

The COVID-19 pandemic is forcing every business to see the world differently. From examining business continuity plans to modernizing workforce plans to building supply chain resiliency, no facet of business has gone untouched. As organizations combat the economic fallout now and in the coming years, agility has never been more important. The key to remaining agile is a better use of data.

Make Your Data Fabrics Work Better

To gain the full benefits of the DataOps strategy, your data lakes must change. The traditional concept of bringing all data to one place, whether on-premises or in the cloud, raises questions of timing, scale, organization and budget. The answer? Data fabric. It replaces traditional data lake organization concepts with a more flexible and economical architecture. In this session, we'll define what a data fabric is, show you how you can begin organizing around the concept, and discuss how to align it to your business objectives.

Apache Hadoop YARN in CDP Data Center 7.1: What's new and how to upgrade

This blog post covers how customers can migrate clusters and workloads to the new Cloudera Data Platform – Data Center 7.1 (CDP DC 7.1 onwards), plus highlights of this new release. CDP DC 7.1 is the on-premises version of Cloudera Data Platform.

5 Challenges of Simplifying DevOps for Data Apps

The benefits of building a DevOps culture for software companies are clear. DevOps practices integrate once-siloed teams across the software development lifecycle, from Dev to QA to Ops, resulting in both faster innovation and improved product quality. As a result, most software development teams have deployed tools to enable DevOps practices across their workflow.

A Cloud Data Platform for Data Science

Data scientists require massive amounts of data to build and train machine learning models. In the age of AI, fast and accurate access to data has become an important competitive differentiator, yet data management is commonly recognized as the most time-consuming aspect of the process. This white paper will help you identify the data requirements driving today's data science and ML initiatives and explain how you can satisfy those requirements with a cloud data platform that supports industry-leading tools.

5 Strategies to Improve Secure Data Collaboration

Many organizations struggle to share data internally across departments and externally with partners, vendors, suppliers, and customers. They use manual methods such as emailing spreadsheets or executing batch processes that require extracting, copying, moving, and reloading data. These methods are notorious for their lack of stability and security, and most importantly, for the fact that by the time data is ready for consumption, it has often become stale.

Overview of the Operational Database performance in CDP

This article gives you an overview of Cloudera’s Operational Database (OpDB) performance optimization techniques. Cloudera’s Operational Database can support high-speed transactions of up to 185K/second per table and a high of 440K/second per table. On average, the recorded transaction speed is about 100K-300K/second per node. It also shows how you can optimize your OpDB deployment in either Cloudera Data Platform (CDP) Public Cloud or Data Center.

Ask questions to BigQuery and get instant answers through Data QnA

Today, we’re announcing Data QnA, a natural language interface for analytics on BigQuery data, now in private alpha. Data QnA helps enable your business users to get answers to their analytical queries through natural language questions, without burdening business intelligence (BI) teams. This means that a business user like a sales manager can simply ask a question on their company’s dataset, and get results back that same way.

Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they are now able to take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to cloud. The main attraction of cloud is its separation of compute and storage – a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).

A Message To You Kafka - The Advantages of Real-time Data Streaming

In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. And the need for speed in getting access to data as it changes has only accelerated. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real-time.

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the currently widely used Java and Scala APIs focusing on the Data Engineering crowd.

Welcome and Introduction to DataOps.NEXT

DataOps matters, especially in today’s uncertain times. Data management and analytics are crucial to respond faster and drive results for your business, your customers and society. That’s why we built DataOps.NEXT to help you get from now to what’s next, with data. We’ll bring out Dr. Jennifer Hall, the chief of data science for the American Heart Association (AHA), to discuss how Hitachi Vantara and AHA have worked together to support research for COVID-19. Tune in for Pedro Alves, Hitachi Vantara’s head of product design and designated “Community Guy.” He’ll provide our vision and strategy for DataOps, including an update on Pentaho Open Source and Enterprise Edition.

Sifting Through COVID-19 Research With Qlik and Machine Learning

Research on COVID-19 is being produced at an accelerating rate, and machine intelligence could be crucial in helping the medical community find key information and insights. When I came across the COVID-19 Open Research Dataset (CORD-19), it contained about 57,000 scholarly articles. Just one month later, it has over 158,000 articles. If the clues to fighting COVID-19 lie in this vast repository of knowledge, how can Qlik help?

Genomics analysis with Hail, BigQuery, and Dataproc

At Google Cloud, we work with organizations performing large-scale research projects. There are a few solutions we recommend to do this type of work, so that researchers can focus on what they do best—power novel treatments, personalized medicine, and advancements in pharmaceuticals.

Building a genomics analysis architecture with Hail, BigQuery, and Dataproc

We hear from our users in the scientific community that having the right technology foundation is essential. The ability to very quickly create entire clusters of genomics processing, where billing can be stopped once you have the results you need, is a powerful tool. It empowers the scientific community to spend more time doing their research and less time fighting for on-prem cluster time and configuring software.

How Marketers Can Drive ROI from Customer Data Platforms

As first-party customer data continues to explode, companies have struggled to make it actionable for personalization, advanced analytics, and other business purposes. As a result, customer data platforms that consolidate and activate known customer information have emerged to help companies generate ROI from their data. At present, nearly 80% of marketing organizations already have a customer data platform or are developing one.

Snowflake: One Cloud Data Platform for All Your Analytics Needs

DELIVER ALL YOUR DATA WORKLOADS WITH SNOWFLAKE. Gartner predicts that 75% of all databases will be deployed or migrated to a cloud platform by 2022. But how does a cloud data platform enable a long-term strategy for maximizing all of an organization's data assets? Snowflake's cloud data platform is a highly extensible, multi-region and multi-cloud platform that powers all types of data workloads. To learn everything Snowflake offers today's forward-looking organizations, download our white paper, Snowflake: One Cloud Data Platform for All Your Analytic Needs.

10 Ways to Simplify DevOps for Data Apps with Snowflake

Most companies that build software have a strong DevOps culture and a mature tool chain in place to enable it. But for developers who need to embed a data platform into their applications to support data workloads, challenges emerge. DevOps for databases is much more complex than DevOps for code because databases contain valuable data, while code is stateless. Ways to simplify it include instantly creating any number of isolated environments and reducing schema change frequency with the variant data type.