
July 2020

How Data Exchanges Enable Secure Data Collaboration

In a global economy, real-time data analysis is closely tied to business success. Without data-driven insights, organizations find it challenging to remain competitive, improve company performance, and deliver strong user experiences, regardless of their industry. To match the pace of business, companies require transparent, data-driven relationships.

Meeting Medical Device Data Privacy, Governance, and Security Challenges

Medical devices have become increasingly complex as technology evolves, and the sheer number of these devices now being worn or implanted has grown exponentially over the past few years. There are currently over 500,000 different types of smart, connected medical devices in use that have the ability to collect, share, or store private patient data and protected health information (PHI)(1).

Snowflake Helps Finnair Improve Customer Experience with Cloud Data Analytics

Coronavirus has impacted the travel industry, but as it adapts, there is one factor airlines have always worked hard to minimize: delayed flights. Arriving late or missing a connection can severely impact the customer experience, which is why airlines work hard to maintain high rates of on-time performance (OTP). To that end, pilots may have to use extra fuel to make up for a delayed departure or to reach a destination early, even if it means circling the airport before landing.

Faster Analytics with Cloudera Data Warehouse (CDW) Demo Highlight

The cloud-led journey to digital transformation requires organizations to become significantly more data-driven, yet traditional data warehouses struggle with new data volumes, new data types, and a growing variety of use cases. In this session, we will show how Cloudera Data Warehouse guides your cloud journey with a modern hybrid cloud solution that scales to unprecedented levels, delivering insight to every part of your organization faster while saving costs.

How to Move from Basic to Advanced Marketing Analytics in Four Steps

Advanced marketing analytics can improve campaign relevance, increase customer lifetime value, accelerate insights, reduce acquisition costs, and drive ROI. But moving to advanced analytics requires a thoughtful investment in the right infrastructure for storing, tracking, and analyzing customer data, which can be daunting to companies that only have basic analytics capabilities.

Predicting 1st-Day Churn in Real-Time - MLOps Live #7 - With Product Madness (an Aristocrat co.)

Speakers: Michael Leznik, Head of Data Science; Matthieu Glotz, Data Scientist; Yaron Haviv, CTO & Co-Founder. We discuss how technology and new work processes can help the gaming and mobile app industries predict and mitigate 1st-day (or D0) user churn in real time, down to minutes and seconds, using modern streaming data architectures such as Kappa. We also explore a feature engineering improvement to the RFM (Recency, Frequency, and Monetary) churn prediction framework: the Discrete Wavelet Transform (DWT).
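
A minimal sketch of the kind of DWT-based feature engineering the session refers to, assuming the PyWavelets (pywt) library; the activity series, wavelet choice, and feature names are illustrative rather than the presenters' actual pipeline.

```python
# Illustrative only: summarize a user's first-session activity series with
# Discrete Wavelet Transform coefficients, to use alongside RFM features.
import numpy as np
import pywt

def dwt_features(activity: np.ndarray, wavelet: str = "db1", level: int = 2) -> dict:
    """Decompose a per-minute activity count series and summarize each band."""
    coeffs = pywt.wavedec(activity, wavelet, level=level)
    features = {}
    for i, band in enumerate(coeffs):
        features[f"dwt_band{i}_energy"] = float(np.sum(band ** 2))
        features[f"dwt_band{i}_mean"] = float(np.mean(band))
    return features

# Example: first-session event counts for one user, bucketed per minute.
events_per_minute = np.array([5, 3, 0, 0, 8, 2, 1, 0, 0, 0, 4, 6], dtype=float)
print(dwt_features(events_per_minute))
```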

Why our new Streaming SQL opens up your data platform

SQL has long been the universal language for working with data. In fact, it’s more relevant today than it was 40 years ago. Many data technologies were born without it and inevitably ended up adopting it later on. Apache Kafka is one of these data technologies. At Lenses.io, we were the first in the market to develop a SQL layer for Kafka (yes, before KSQL) and integrate it into a few different areas of our product for different workloads.

Why SQL is your key to querying Kafka

If you’re an engineer exploring a streaming platform like Kafka, chances are you’ve spent some time trying to work out what’s going on with the data in there. But if you’re introducing Kafka to a team of data scientists or developers unfamiliar with its idiosyncrasies, you might have spent days, weeks, months trying to tack on self-service capabilities. We’ve been there.

Data dump to data catalog for Apache Kafka

From data stagnating in warehouses to a growing number of real-time applications, in this article we explain why we need a new class of Data Catalogs: this time for real-time data. The 2010s brought us organizations “doing big data”. Teams were encouraged to dump their data into a data lake and leave it for others to harvest. But data lakes soon became data swamps.

The reinvention of the Telco: From Pipe to Processor

The next generation of 5G networks is unlocking a mind-bending array of new use cases. Blistering speed, super low latency, and access to more powerful mobile hardware bring VR, AR, and ultra-high-definition experiences into sharp focus for the near future. But there’s a bigger shift being driven by 5G, and it’s not actually about speed at all. It’s about rethinking the modern telco business model.

Building a Scalable Process Using NiFi, Kafka and HBase on CDP

Navistar is a leading global manufacturer of commercial trucks. Across its fleet of 350,000 vehicles, unscheduled maintenance and vehicle breakdowns created ongoing disruption to the business. Navistar required a diagnostics platform that would help it predict when a vehicle needed maintenance in order to minimize downtime.

Introduction to Yellowfin Embedded Analytics for Product Teams

As a recognized leader in embedded analytics, Yellowfin has been designed and built to let you embed amazing analytical experiences into your software, from a highly integrated dashboard module and full self-service reporting to best-practice integration that blurs the lines between analytics and your application workflows. Modernize your reporting environment with Yellowfin to ensure your customers engage with your data, discover insights faster through automation, and innovate with contextualized analytics.

CDO Sessions: Getting Real with Data Analytics

Big data leaders are undoubtedly being challenged by market uncertainty. Data-driven insights can help organizations assess and uncover market risks and opportunities that may arise during uncertain times. As businesses around the world adapt to digitization initiatives, modern data systems have become more mission-critical to business continuity and competitive differentiation.

Enabling high-speed Spark direct reader for Apache Hive ACID tables

Apache Hive supports transactional tables that provide ACID guarantees. A significant amount of work has gone into Hive to make these transactional tables highly performant. Apache Spark provides some capabilities to access Hive external tables, but it cannot access Hive managed tables directly. To access Hive managed tables from Spark, the Hive Warehouse Connector must be used.
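
As a rough illustration of that last point, here is a minimal sketch of reading a Hive managed (ACID) table from PySpark through the Hive Warehouse Connector, assuming HWC's pyspark_llap package is deployed and the usual HiveServer2/HWC Spark configuration is in place; the database and table names are illustrative.

```python
# Illustrative HWC usage; assumes the connector jar/zip is on the Spark
# classpath and spark.sql.hive.hiveserver2.jdbc.url etc. are configured.
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-acid-read").getOrCreate()

hive = HiveWarehouseSession.session(spark).build()

# The query runs through HiveServer2, so ACID snapshot semantics are respected.
df = hive.executeQuery("SELECT id, amount FROM sales_db.transactions LIMIT 100")
df.show()
```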

Data Modeling in a Post-COVID-19 World

As a result of the COVID-19 pandemic, organizations around the world have had to transform overnight. Businesses that had been delaying digital transformation, or that hadn’t been thinking about it at all, have suddenly realized that moving their data analytics to the cloud is the key to coping with and surviving the COVID-19 disruption. The next phase is about rebounding and thriving in a post-COVID-19 world.

What is data modeling and how can you model data for higher analytical outputs?

Being data-driven helps businesses to cut costs and produce higher returns on investments, increasing their financial viability in the fight for a piece of the market pie. But *becoming* data-driven is a more labor-intensive process. In the same way that companies must align themselves around business objectives, data professionals must align their data around data models. In other words: if you want to run a successful data-driven operation, you need to model your data first.

Use AI To Quickly Handle Sensitive Data Management

The growing waves of data that you’re pulling in include sensitive, personal, or confidential data. This can become a compliance nightmare, especially with rules around PII such as GDPR and CCPA, and it takes too much time to manually decide what should be protected. In this session, we will show how AI-driven data catalogs can identify sensitive data and share that identification with your data security platforms to automate its discovery, identification, and security. You'll see how this dramatically reduces your time to onboard data and makes it safely available to your business communities.

Amazon EMR Insider Series: Optimizing big data costs with Amazon EMR & Unravel

Data is a core part of every business. As data volumes increase, so does the cost of processing them. Whether you are running your Apache Spark, Hive, or Presto workloads on premises or on AWS, Amazon EMR is a sure way to save you money. In this session, we’ll discuss several best practices and new features that enable you to cut your operating costs and save money when processing vast amounts of data using Amazon EMR.

Digital Transformation is Way More than Just Digital

Over the last 25 years, I have had an unparalleled front-row seat to the digital transformation that is now accelerating in the connected manufacturing and automotive industry. Not many people have had the opportunity to witness the transformation and be as active in this area as I have; I consider myself lucky.

5 Pointers For Great Analytics Storytelling

Most of us know the story of “The Tortoise and the Hare.” It is one of Aesop’s classic fables in which a speedy, overconfident hare becomes complacent and realizes, all too late, that the tortoise, although outmatched, has managed to beat him in a race. It teaches us lessons about overconfidence and perseverance and has caused phrases like “slow and steady wins the race” to creep into our everyday language.

Adoption of a Cloud Data Platform, Intelligent Data Analytics While Maintaining Security, Governance and Privacy

“You cannot be the same, think the same and act the same if you hope to be successful in a world that does not remain the same.” This sentence by John C. Maxwell is so relevant to rapidly changing cloud hosting technology. Businesses understand the added value and are looking at cloud technologies to handle both operational and analytical workloads.

ML / DL Engineering Made Easy with PyTorch's Ecosystem Tools

This blog post is the first in a series on how to leverage PyTorch’s ecosystem tools to easily jumpstart your ML/DL project. The first part describes common problems that appear when developing ML/DL solutions, and the second walks through a simple image classification example demonstrating how to use Allegro Trains and PyTorch to address those problems.
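
As a flavor of what the example in the post covers, here is a minimal sketch of attaching Allegro Trains experiment tracking to a PyTorch training loop, assuming the trains package and a reachable trains-server; the project name, model, and stand-in data are illustrative.

```python
# Illustrative only: Task.init registers the run, and scalars are reported
# to the Trains UI each step. A real script would use an actual data loader.
import torch
import torch.nn as nn
from trains import Task

task = Task.init(project_name="image-classification", task_name="baseline-linear")

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

logger = task.get_logger()
for step in range(100):
    x = torch.randn(32, 1, 28, 28)          # stand-in for a real data loader
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    logger.report_scalar("train", "loss", value=loss.item(), iteration=step)
```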

Yellowfin Embedded Analytics Walkthrough for Developers

Yellowfin is the only enterprise and embedded analytics suite that enables organizations to extract transformational value from their data, because it combines action-based dashboards, automated data discovery, and data storytelling into a single, integrated platform. Suited to more technical people, this video demonstrates how, with minimal coding effort, you would integrate Yellowfin seamlessly into your application. See how our APIs and Code Mode will enable you to build, embed, and extend your product’s analytics capabilities and make your data shine.

Introduction to Yellowfin Embedded Analytics for Developers

Yellowfin provides a spectrum of ways to deliver embedded analytics - from simple rebranding to embedding content like reports or dashboards, to the full application integration that provides your end users with self service reporting and automated data discovery seamlessly from within your application.

Predicting 1st Day Churn in Real Time

Survival analysis is one of the most developed fields of statistical modeling, with many real-world applications. In the realm of mobile apps and games, retention is one of the initial focuses of the publisher once the app or game has been launched. And it remains a significant focus throughout most of the lifecycle of any endeavor.

The benefits of building an on-demand data lake in healthcare

This blog was written in partnership with Navdeep Alam, Senior Director, Global Data Warehouse, IQVIA. Healthcare is unique: it isn’t defined, like other businesses, by how much revenue can be generated, but rather by achieving positive health outcomes, better value, and saving lives through the rapid development of new treatments and therapies.

Grow your retail business through Advanced Analytics & Customer Review

When people ask you what EMR means, how many hospitals are in the area, or what’s the best way to understand your patients, you tell them “google it.” But when they ask you how to do something—track your heart rate, book an appointment, pay bills, consult with a doctor online—you say “you know, there’s an app for that.” Because most likely, there is. There’s an app for almost everything these days—and the healthcare industry is no exception.

Lumada Analytics Roadmap For A Better Data Culture

Struggling to extract insights and actionable intelligence from your data? With more data science and analytic solutions available today, do the handoffs among data scientists, IT and the business continue to disrupt your analytics value chain or are they becoming even more difficult? We’ve seen this cause “data despair” and decelerate investments in analytics and machine learning projects. In fact, only a quarter of organizations believe that they are actually building the “data culture” that fosters success.

Good Catch: Cloud Cost Monitoring

Aside from ensuring each service is working properly, one of the most challenging parts of managing a cloud-based infrastructure is cost monitoring. There are countless services to keep track of—including storage, databases, and computation—each with their own complex pricing structure. Monitoring cloud costs is quite different from monitoring other organizational costs, in that it can be difficult to detect anomalies in real time and accurately forecast monthly costs.
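
As a simple illustration of the anomaly-detection problem (not the approach described in the post), here is a sketch that flags unusual daily spend with a rolling z-score and makes a naive month-end projection; the window size and threshold are arbitrary.

```python
# Illustrative cost monitoring helpers over a pandas Series of daily spend,
# indexed by date (a DatetimeIndex), values in dollars.
import pandas as pd

def flag_cost_anomalies(daily_costs: pd.Series, window: int = 14, z: float = 3.0) -> pd.Series:
    """Return the days whose spend deviates more than `z` rolling std devs."""
    mean = daily_costs.rolling(window).mean()
    std = daily_costs.rolling(window).std()
    zscores = (daily_costs - mean) / std
    return daily_costs[zscores.abs() > z]

def forecast_month_end(daily_costs: pd.Series) -> float:
    """Naive forecast: month-to-date spend plus average daily burn for the rest."""
    last_day = daily_costs.index[-1]
    this_month = daily_costs[daily_costs.index.month == last_day.month]
    days_left = last_day.days_in_month - last_day.day
    return float(this_month.sum() + this_month.mean() * days_left)
```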

Cloudera Operational Database experience (dbPaaS) available as Technical Preview

The Cloudera Operational Database (COD) experience is a managed dbPaaS solution that abstracts the underlying cluster instance as a database. It can auto-scale based on the workload utilization of the cluster, and later this year it will add the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically).

The Rise Of Connected Manufacturing - How Data Is Driving Innovation Part II

A shift toward Industry 4.0 is improving manufacturing efficiency and increasing innovation. In Part II of our series with Michael Ger, Managing Director of Manufacturing and Automotive at Cloudera, he looks in greater detail at how AI, big data, and machine learning are impacting connected living and the evolution of autonomous driving.

How To Weave Multicloud Data Fabric: Roadmap Session On Lumada Edge Intelligence

Like most enterprises, you are generating huge volumes of structured, semi- and unstructured data at edge devices, core data centers and even public clouds. How can you more easily manage your data across all your repositories and avoid delaying your applied use of analytics? Our answer is to weave a multicloud data fabric for you that simplifies connecting data repositories from the edge to the core and to the cloud. This helps you gain quicker business insights with a scalable and cost-effective approach. Join this session to get an in-depth view of our upcoming Lumada Data Services product vision and strategy.

To Manage All Your Data Pipelines, Let's Follow The Lumada Dataflow Studio Roadmap

You’re building data pipelines to help your business users innovate with data. But with the shift to self-service, data management practices need to evolve. And in addition to building your own pipelines, you’ll also need to manage hundreds or even thousands of users’ pipelines. What now? See for yourself Hitachi’s vision for Pentaho Data Integration and Lumada Dataflow Studio. You’ll learn how Lumada Dataflow Studio helps you address today’s and tomorrow’s challenges in data preparation, orchestration, and monitoring.

Will your streaming data platform disturb your holiday?

Here's why you need to double down on your DataOps before your vacation. In the past few months, everything has changed at work (or at home). Q1 plans were scrapped. Reset buttons were smashed. It was all about cost-cutting and keeping lights on. Many app and data teams sought quick solutions and developed workarounds to data challenges and operational problems as people prepared to work from home for the foreseeable future. And now, it’s time for a holiday.

Use IAM custom roles to manage access to your BigQuery data warehouse

When migrating a data warehouse to BigQuery, one of the most critical tasks is mapping existing user permissions to equivalent Google Cloud Identity and Access Management (Cloud IAM) permissions and roles. This is especially true for migrating from large enterprise data warehouses like Teradata to BigQuery. The existing Teradata databases commonly contain multiple user-defined roles that combine access permissions and capture common data access patterns.
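
For example, a custom role that captures a common read-only access pattern might be created programmatically roughly like this, using the IAM API's Python client; the project ID, role ID, and permission list are illustrative, not a recommended mapping.

```python
# Illustrative custom-role creation with google-api-python-client; assumes
# application default credentials with permission to administer IAM roles.
from googleapiclient import discovery

iam = discovery.build("iam", "v1")

role = iam.projects().roles().create(
    parent="projects/my-analytics-project",      # hypothetical project
    body={
        "roleId": "bigQueryReadOnlyAnalyst",     # hypothetical role ID
        "role": {
            "title": "BigQuery Read-Only Analyst",
            "description": "Query datasets without modifying them",
            "includedPermissions": [
                "bigquery.jobs.create",
                "bigquery.datasets.get",
                "bigquery.tables.get",
                "bigquery.tables.getData",
                "bigquery.tables.list",
            ],
            "stage": "GA",
        },
    },
).execute()
print(role["name"])
```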

Operational Database Scalability

Cloudera’s Operational Database provides unparalleled scale and flexibility for applications, enabling enterprises to bring together and process data of all types and from more sources, while providing developers with the flexibility they need. In this blog, we’ll look into capabilities that make Operational Database the right choice for hyperscale.

How No-Code Empowers Non-Engineers Without Sacrificing Data and Analytics

No-code is growing quickly. More and more people from all roles and departments are using a wide range of new tools and apps to build and experiment without fighting for engineering resources. This creates amazing opportunities for growth, but it also means data teams have to integrate data from ever-more sources to get a full picture of their business’s operations.

Future-proofing the supply chain with real-time data

Next to the healthcare system, COVID-19’s biggest infrastructural burden fell upon the supply chain. Fluctuations in supply and demand of essential goods, along with the oil surplus, led to a freight cliff in mid-April. Outbound tender volume and spot rates bottomed out, which highlighted a massive drop in demand. As the market rebounds, technological investments are key to the industry’s recovery.

Minimizing Cloud Concentration Risk for Financial Services Institutions, Regulators and Cloud Service Providers

Since the financial crisis of 2008, regulators have been consistently working to identify emerging risks that can potentially result in financial stability events. The growth in cloud adoption across the Financial Services Industry (FSI) and the associated increase in reliance on third-party infrastructure providers has gained the attention of regulators at global, regional, and national levels.

Snowflake Enables Modern Cloud Data Analytics at US Foods

Approximately 300,000 restaurants and food service operators across the United States rely on US Foods as their national distributor for food and supplies. In return, US Foods takes a holistic approach to servicing customers by going beyond food offerings. The distributor provides a comprehensive suite of e-commerce technology and business solutions that help restaurants manage their entire business.

Use Fivetran to Sync Any Unsupported Source

Fivetran is the industry leader in fully managed data integration from a wide range of sources to your data warehouse. There are hundreds of popular sources, like Salesforce, Zendesk, NetSuite, and more, for which Fivetran has fully managed, prebuilt connectors that can deliver your data to your warehouse within minutes. However, people often struggle to get data from more obscure sources, since most data integration tools focus on what is popular.
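
For an unsupported source, Fivetran's function-based connectors let you write the extraction logic yourself. Below is a minimal sketch of such a sync function, following the state/insert/schema/hasMore response contract as I understand it from Fivetran's cloud-function connector documentation; the source API, table, and field names are hypothetical, and the cloud-function entry-point wiring is omitted.

```python
# Illustrative sync logic for a function-based connector to a niche source.
import requests

def sync(state: dict) -> dict:
    since = state.get("cursor", "1970-01-01T00:00:00Z")

    # Pull records changed since the last sync from the (hypothetical) source API.
    rows = requests.get(
        "https://api.example-source.com/v1/orders",
        params={"updated_since": since},
        timeout=30,
    ).json()

    new_cursor = max((r["updated_at"] for r in rows), default=since)
    return {
        "state": {"cursor": new_cursor},                  # persisted between syncs
        "insert": {"orders": rows},                       # rows to upsert
        "schema": {"orders": {"primary_key": ["id"]}},    # dedupe key
        "hasMore": False,                                 # no further pages
    }
```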

Why Should Your Business Move to the Cloud?

Cloud computing is a well-established IT option for businesses of all sizes in the modern era, but there are still plenty of organizations that have remained reticent about adoption. If you are in this camp, here is a look at a few of the reasons migrating an on-premises data warehouse to the cloud may make a lot of sense, along with other contexts in which the cloud could be a better fit for your business.

Connected Manufacturing Insights from the Edge with Cloudera DataFlow

Connected Manufacturing’s Pivot to an Enterprise Data Solution: Connected manufacturing is at a turning point, catalyzed by a real, measurable shift in data types. Real-time and time-series data is growing 50% faster than latent or static data forms, and streaming analytics is projected to grow at a 28% CAGR, leaving legacy data platforms that specialize in static, historical data solutions, running on-premises or in discrete clouds, inadequate for addressing today’s real-time requirements.

Snowflake Service Account Security, Part 3

In Part 1 and Part 2 of this blog post series, Snowflake Service Account Security, we discussed service account threats and how to mitigate those threats with Snowflake features. Part 3 demonstrates how to manage credential rotation with a sample HashiCorp Vault plugin. You can use many platforms to achieve similar results; the important thing is to understand the patterns used to apply these controls to protect your service accounts.
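
To make the pattern concrete, here is a minimal sketch of rotating a service account's Snowflake password, which is the kind of operation the Vault plugin automates; the connection parameters and user name are illustrative, and this is not the sample plugin from the post.

```python
# Illustrative rotation routine using snowflake-connector-python.
import secrets
import snowflake.connector

def rotate_password(admin_conn_params: dict, service_user: str) -> str:
    """Set a fresh random password on a service account and return it."""
    new_password = secrets.token_urlsafe(32)
    conn = snowflake.connector.connect(**admin_conn_params)
    try:
        # Validate service_user against an allow-list in a real system, and
        # make sure the statement text is never logged.
        conn.cursor().execute(
            f"ALTER USER {service_user} SET PASSWORD = '{new_password}'"
        )
    finally:
        conn.close()
    return new_password  # hand off to your secret store, never print it
```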

The Data Modernization Imperative. The Emergence of Kafka & the Growing Role of DataOps - Session 1

Featuring: Matt Aslett, Research Director, Data, AI and Analytics, 451 Research. Key trends in data modernization: Matt Aslett highlights data modernization strategies influenced by the coronavirus pandemic. In particular, he points to data-driven decision making and real-time technologies such as #ApacheKafka as enablers. Key initiatives such as #DataOps are discussed, along with the pivotal role DataOps plays in driving data modernization.

The Data Modernization Imperative. The Emergence of Kafka & the Growing Role of DataOps - Session 2

Featuring: Matt Aslett, Research Director, Data, AI and Analytics, 451 Research. Kafka adoption and the role of #DataOps: Matt Aslett discusses the emergence of #ApacheKafka as a leading platform for real-time, event-driven architectures. While adoption is strong, organizations are challenged by the complexity of #Kafka as their deployments expand and additional use cases are adopted.

The Data Modernization Imperative. The Emergence of Kafka & the Growing Role of DataOps - Session 3

Featuring: Matt Aslett, Research Director, Data, AI and Analytics, 451 Research. The Kafka skills shortage: Matt Aslett shares the challenges created by the wide-scale adoption of #ApacheKafka. In many cases, Kafka is a victim of its own success. Listen in for strategies on how organizations building applications on #Kafka can scale their teams in spite of a skills shortage.

Bringing multi-cloud analytics to your data with BigQuery Omni

Today, we are introducing BigQuery Omni, a flexible, multi-cloud analytics solution that lets you cost-effectively access and securely analyze data across Google Cloud, Amazon Web Services (AWS), and Azure (coming soon), without leaving the familiar BigQuery user interface (UI). Using standard SQL and the same BigQuery APIs our customers love, you will be able to break down data silos and gain critical business insights from a single pane of glass.
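
For reference, the "same BigQuery APIs" workflow looks roughly like this with the google-cloud-bigquery Python client; the project, dataset, and query are illustrative, and since BigQuery Omni was only just being announced, treat this as the familiar BigQuery pattern rather than an Omni-specific example.

```python
# Illustrative standard-SQL query through the BigQuery Python client.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project

query = """
    SELECT country, COUNT(*) AS orders
    FROM `my-analytics-project.sales.orders`
    GROUP BY country
    ORDER BY orders DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row["country"], row["orders"])
```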

Building an effective data approach in a hybrid cloud world

“In today’s world of disruption and transformation, there are a few key things that all organizations are trying to figure out: how to remain relevant to their customer base, how to deal with the pressure of disruption in their industry and, undoubtedly, how to look to technology to help deliver a better service.” - Paul Mackay. Today we are sitting down with Marc Beierschoder, Analytics & Cognitive Offering Lead at Deloitte Germany, and Paul Mackay, the EMEA Cloud Lead at Cloudera, to discuss building an effective data approach in a hybrid cloud world.

Reach New Speeds and Unlock Unstructured Data Value

Unstructured data is the biggest untapped source of value in your organization, and we can help unlock that value. Since 80% of data is unstructured and growing at an exponential rate, just having BIG data isn’t good enough. You need big data FAST in order to make more accurate and timely decisions. Hitachi Vantara can deliver your data at the speed of business with faster data access to maximize your infrastructure advantage for accuracy, productivity, competitive edge and better outcomes.

Breaking the Silos Between Data Scientists, Engineers & DevOps with New MLOps Practices

Effectively bringing machine learning to production is one of the biggest challenges that data science teams today struggle with. As organizations embark on machine learning initiatives to derive value from their data and become more “AI-driven” or “data-driven”, it’s essential to find a faster and simpler way to productionize machine learning projects so that they can make business impact faster.

CDP Private Cloud ends the battle between agility & control in the data center

As a BI Analyst, have you ever encountered a dashboard that wouldn’t refresh because other teams were using it? As a data scientist, have you ever had to wait 6 months before you could access the latest version of Spark? As an application architect, have you ever been asked to wait 12 weeks before you could get hardware to onboard a new application?

Agile Insights During COVID-19 with ThoughtSpot, Snowflake, and Starschema

The COVID-19 pandemic is forcing every business to see the world differently. From examining business continuity plans to modernizing workforce plans to building supply chain resiliency, no facet of business has gone untouched. As organizations combat the economic fallout now and in the coming years, agility has never been more important. The key to remaining agile is a better use of data.

How To Get Your DataOps Initiative Prioritized And Paid For

You see the clear and immediate value in DataOps, but your opinion is not the only one that matters. You need your team, your colleagues, and your business partners to see that value too – or the project won’t move ahead. In this session, you'll learn tips and techniques for building an inclusive story. We'll discuss Design Thinking techniques you can use to translate very technical concepts into business value and outcomes. You will learn practical ways to communicate the value of your DataOps initiative and ensure that it delivers that value when you implement it.

Apache Hadoop YARN in CDP Data Center 7.1: What's new and how to upgrade

This blog post covers how customers can migrate clusters and workloads to the new Cloudera Data Platform – Data Center 7.1 (CDP DC 7.1 onwards), plus highlights of this new release. CDP DC 7.1 is the on-premises version of Cloudera Data Platform.

5 Challenges of Simplifying DevOps for Data Apps

The benefits of building a DevOps culture for software companies are clear. DevOps practices integrate once-siloed teams across the software development lifecycle, from Dev to QA to Ops, resulting in both faster innovation and improved product quality. As a result, most software development teams have deployed tools to enable DevOps practices across their workflow.

A Dose Of Data Science Demystification

Join two data engineers and analysts in pulling back the curtain on real customer engagements, showing how to select and implement advanced data science and analytic techniques. In this session we will discuss our implementation of two data science models at a large agricultural products manufacturer: a propensity-to-buy model and a recommendation engine. We will discuss how each of these models works and how they were implemented for our client.

Make Your Data Fabrics Work Better

To gain the full benefits of the DataOps strategy, your data lakes must change. The traditional concept of bringing all data to one place, whether on-premises or in the cloud, raises questions of timing, scale, organization and budget. The answer? Data fabric. It replaces traditional data lake organization concepts with a more flexible and economical architecture. In this session, we'll define what a data fabric is, show you how you can begin organizing around the concept, and discuss how to align it to your business objectives.

A Cloud Data Platform for Data Science

Data scientists require massive amounts of data to build and train machine learning models. In the age of AI, fast and accurate access to data has become an important competitive differentiator, yet data management is commonly recognized as the most time-consuming aspect of the process. This white paper will help you identify the data requirements driving today's data science and ML initiatives and explain how you can satisfy those requirements with a cloud data platform that supports industry-leading tools.

5 Strategies to Improve Secure Data Collaboration

Many organizations struggle to share data internally across departments and externally with partners, vendors, suppliers, and customers. They use manual methods such as emailing spreadsheets or executing batch processes that require extracting, copying, moving, and reloading data. These methods are notorious for their lack of stability and security, and most importantly, for the fact that by the time data is ready for consumption, it has often become stale.

Demand for Data Grows in Agriculture

Agriculture (Ag) is the oldest and largest industrial vertical in the world, and its importance continues to grow as it becomes more challenging for people to access healthy and fresh food. A recent Agriculture Analytics Market report, released by Markets and Markets, estimates that by 2023, the global agriculture analytics market size will grow from 585 million to 1.2 billion dollars as demands for real-time data analysis and improved operations increase.

Overview of the Operational Database performance in CDP

This article gives you an overview of Cloudera’s Operational Database (OpDB) performance optimization techniques. Cloudera’s Operational Database can support high-speed transactions of up to 185K/second per table and a high of 440K/second per table. On average, the recorded transaction speed is about 100K-300K/second per node. This article provides you an overview of how you can optimize your OpDB deployment in either Cloudera Data Platform (CDP) Public Cloud or Data Center.

Powered by Fivetran Fuels Savvy Data Insights Platforms and Agencies

Powered by Fivetran (PBF) is a new offering for modern data insights platforms and agencies that provide analytics as a service. These firms build data products on top of disparate solutions such as Tableau, Snowflake, and Redshift, and offer insights to decision-makers in diverse verticals, from finance and marketing to energy and transportation.

Ask questions to BigQuery and get instant answers through Data QnA

Today, we’re announcing Data QnA, a natural language interface for analytics on BigQuery data, now in private alpha. Data QnA helps enable your business users to get answers to their analytical queries through natural language questions, without burdening business intelligence (BI) teams. This means that a business user like a sales manager can simply ask a question on their company’s dataset, and get results back that same way.

Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they are now able to take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to the cloud. The main attraction of the cloud is its separation of compute and storage – a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).
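
As a sketch of what this looks like in practice, the following assumes a Hive release that supports scheduled-query DDL (as in CDP) and submits the statement through the PyHive client; the host, schedule, and materialized view name are illustrative.

```python
# Illustrative: create a recurring task with Hive's scheduled-query DDL.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl")
cursor = conn.cursor()

# Refresh a materialized view used for BI reporting every hour.
cursor.execute("""
    CREATE SCHEDULED QUERY refresh_sales_mv
    EVERY 1 HOUR
    AS ALTER MATERIALIZED VIEW reporting.sales_mv REBUILD
""")
```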

New Connector: YouTube Analytics

The value of YouTube has grown significantly for companies looking to bolster their brands with video content. The YouTube API is report-based, and its prebuilt reports fall into one of two categories: channel reporting and content owner reporting. Channel reports refer to the videos on a specific YouTube channel, while content owner reports contain data on all the channels owned by a particular individual.

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the currently widely used Java and Scala APIs focusing on the Data Engineering crowd.

A Message To You Kafka - The Advantages of Real-time Data Streaming

In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. And, the need for speed in getting access to data as it changes has only accelerated. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real-time.
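
As a generic illustration of streaming data continuously with Kafka (not tied to any product mentioned here), a minimal produce/consume loop with the kafka-python client might look like this; the broker address and topic are illustrative.

```python
# Illustrative Kafka produce/consume round trip with kafka-python.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "status": "shipped"})
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)   # react to each event as it arrives
    break
```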

Massive growth in data today: 3 must-have skills for Data Science

In recent years, there’s been an increasing demand for data scientists left and right, across industries and across departments. In the same vein, companies are getting more and more data than they know what to do with. In fact, according to IBM, 90% of the data in the world today has been created in the last two years alone. To put this influx to good use, organizations are turning to data scientists.

Managing ML Projects - Allegro Trains vs GitHub

The resurrection of AI due to the drastic increase in computing power has allowed its loyal enthusiasts, casual spectators, and experts alike to experiment with ideas that were pure fantasies a mere two decades ago. The biggest beneficiary of this explosion in computing power and ungodly amounts of data (thank you, internet!) is none other than deep learning, the sub-field of machine learning (ML) tasked with extracting underlying features and patterns, and identifying cat images.

Sifting Through COVID-19 Research With Qlik and Machine Learning

Research on COVID-19 is being produced at an accelerating rate, and machine intelligence could be crucial in helping the medical community find key information and insights. When I came across the COVID-19 Open Research Dataset (CORD-19), it contained about 57,000 scholarly articles. Just one month later, it has over 158,000 articles. If the clues to fighting COVID-19 lie in this vast repository of knowledge, how can Qlik help?

Introduction to Machine Learning Models

Over the last 100 years alone, artificial intelligence has achieved what was once believed to be science fiction: cars that drive themselves, machine learning models that diagnose heart disease better than doctors can, and predictive customer analytics that lead to companies knowing their customers better than their parents do. This machine learning revolution was sparked by a simple question: can a computer learn without explicitly being told how?
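
A minimal sketch of that idea with scikit-learn: the model infers its decision rule from labeled examples rather than hand-written logic. The dataset and model choice are illustrative.

```python
# Illustrative "learning from examples" with a simple scikit-learn classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)   # no hand-written diagnostic rules
model.fit(X_train, y_train)                 # the "learning" step

print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```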

Welcome and Introduction to DataOps.NEXT

DataOps matters, especially in today’s uncertain times. Data management and analytics are crucial to responding faster and driving results for your business, your customers, and society. That’s why we built DataOps.NEXT: to help you get from now to what’s next, with data. We’ll bring out Dr. Jennifer Hall, the chief of data science for the American Heart Association (AHA), to discuss how Hitachi Vantara and AHA have worked together to support research on COVID-19. Tune in for Pedro Alves, Hitachi Vantara’s head of product design and designated “Community Guy.” He’ll provide our vision and strategy for DataOps, including an update on Pentaho Open Source and Enterprise Edition.

Genomics analysis with Hail, BigQuery, and Dataproc

At Google Cloud, we work with organizations performing large-scale research projects. There are a few solutions we recommend to do this type of work, so that researchers can focus on what they do best—power novel treatments, personalized medicine, and advancements in pharmaceuticals.

Building a genomics analysis architecture with Hail, BigQuery, and Dataproc

We hear from our users in the scientific community that having the right technology foundation is essential. The ability to very quickly create entire clusters of genomics processing, where billing can be stopped once you have the results you need, is a powerful tool. It empowers the scientific community to spend more time doing their research and less time fighting for on-prem cluster time and configuring software.
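
A minimal sketch of the Hail-on-Dataproc piece of such an architecture, assuming a Dataproc cluster with Hail installed (for example via hailctl); the bucket paths and filters are illustrative, and loading the exported results into BigQuery would be a separate step.

```python
# Illustrative Hail QC pass over a cohort VCF stored on Cloud Storage.
import hail as hl

hl.init()  # picks up the Spark context on the Dataproc cluster

mt = hl.import_vcf("gs://my-genomics-bucket/cohort.vcf.bgz",   # hypothetical path
                   reference_genome="GRCh38")
mt = hl.variant_qc(mt)

# Keep common, well-genotyped variants and export a table for downstream analysis.
filtered = mt.filter_rows((mt.variant_qc.AF[1] > 0.01) &
                          (mt.variant_qc.call_rate > 0.95))
filtered.rows().flatten().export("gs://my-genomics-bucket/variant_qc.tsv.bgz")
```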

How Marketers Can Drive ROI from Customer Data Platforms

As first-party customer data continues to explode, companies have struggled to make it actionable for personalization, advanced analytics, and other business purposes. As a result, customer data platforms that consolidate and activate known customer information have emerged to help companies generate ROI from their data. At present, nearly 80% of marketing organizations already have a customer data platform or are developing one.

Snowflake: One Cloud Data Platform for All Your Analytics Needs

DELIVER ALL YOUR DATA WORKLOADS WITH SNOWFLAKE. Gartner predicts that 75% of all databases will be deployed or migrated to a cloud platform by 2022. But how does a cloud data platform enable a long-term strategy for maximizing all of an organization's data assets? Snowflake's cloud data platform is a highly extensible, multi-region, multi-cloud platform that powers all types of data workloads. To learn everything Snowflake offers today's forward-looking organizations, download our white paper, Snowflake: One Cloud Data Platform for All Your Analytics Needs.

10 Ways to Simplify DevOps for Data Apps with Snowflake

Most companies that build software have a strong DevOps culture and a mature tool chain in place to enable it. But for developers who need to embed a data platform into their applications to support data workloads, challenges emerge. DevOps for databases is much more complex than DevOps for code, because databases contain valuable data while code is stateless. Two of the ten ways covered: instantly creating any number of isolated environments, and reducing schema change frequency with the VARIANT data type.
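
As a rough sketch of those two ideas with the snowflake-connector-python client (illustrative names, not the paper's examples): a zero-copy clone gives each developer or CI run an instant isolated environment, and a VARIANT column absorbs changing event schemas without migrations.

```python
# Illustrative Snowflake statements issued from Python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="dev_ci", password="...", role="SYSADMIN"
)
cur = conn.cursor()

# Isolated dev environment: a zero-copy clone of production, created in seconds.
cur.execute("CREATE DATABASE analytics_dev_pr123 CLONE analytics_prod")

# Fewer schema migrations: land semi-structured events in a VARIANT column.
cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT, loaded_at TIMESTAMP)")
cur.execute("""
    SELECT payload:user_id::STRING AS user_id, COUNT(*)
    FROM raw_events
    GROUP BY 1
""")
```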