April 2020

What's happening in BigQuery: Efficient new views and Cloud AI integrations

BigQuery, Google Cloud’s petabyte-scale data warehouse, lets you ingest and analyze data quickly and with high availability, so you can find new insights, trends, and predictions to efficiently run your business. Our engineering team is continually making improvements to BigQuery so you can get even more out of it. Recently added BigQuery features include new materialized views, column-level security, and BigQuery ML additions.

Operational Database Integrity

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post provides an overview of the OpDB data integrity capabilities that help you achieve ACID transactions and data consistency. OpDB guarantees certain properties to ensure atomicity, durability, consistency, and visibility.

Are you using the right data strategy based on the hierarchy of data needs?

Being data-driven is the holy grail of modern business. It allows you to grow 8x faster than your competition, boosts your company’s net earnings by 30% and will have VCs throwing money at you if your organization relies on AI. So, what strategy does one use to become data-driven? Well, it’s actually quite simple: if you follow this recipe to a T, you can have your data cake and eat it.

Your Next Decision Could Change Lives: Why We Need Data Skills and Analytics

The year was 1993. The place: a little town in Sweden. A serial killer was on the loose. He randomly shot at people standing at bus stops or sitting in their cars, killing one and wounding many others. The residents of Malmö lived in fear. Window blinds were shut, playgrounds were deserted. The police didn’t know where to start.

Interactive dashboard: Tracking the COVID-19 pandemic in real-time on Analance

We are proud of our Core Team for building an interactive, live COVID-19 global report on Analance. During this challenging time, Ducen is offering researchers, public health authorities, and the general public a dashboard that displays key pandemic statistics at a glance. It is accurate, timely, and easy to digest. The pandemic data set has been consolidated to report key worldwide statistics in a single view, including the total numbers and the data added in the last 24 hours.

The challenges you'll face deploying machine learning models (and how to solve them)

In 2019, organizations invested $28.5 billion into machine learning application development (Statista). Yet only 35% of organizations report having analytical models fully deployed in production (IDC). Taken together, those two statistics make it clear that there is a broad range of challenges to overcome before your models are deployed and running.

One billion files in Ozone

Apache Hadoop Ozone is a distributed key-value store that can manage both small and large files alike. Ozone was designed to address the scale limitations of HDFS with respect to small files. HDFS is designed to store large files; the recommended maximum is 300 million files per NameNode, and HDFS doesn’t scale well beyond this limit.

Pentaho 9.0 Teaser: Multicluster Enhancements

Many organizations want to run any workload from any location without the burden of rearchitecting or refactoring applications. Often, they’ll want to leverage their existing on-premises Hadoop investments and provide a seamless experience to data consumers when they migrate to the cloud to take advantage of the usability, scalability and elasticity of cloud-native solutions. Watch this video to learn more about Pentaho 9.0’s multicluster enhancements.

Operational Database Availability

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post gives you an overview of the high availability configuration capabilities of Cloudera’s OpDB. Cloudera’s Operational Database (OpDB) is cluster-based software that comes configured for high availability (HA) out of the box.

Augment EMR Workloads with CDP

The first thing that comes to mind when talking about synergy is how 2+2=5. Being the writer he was, Mark Twain described it far more eloquently as “the bonus that is achieved when things work together harmoniously”. There is a multitude of product and business examples to illustrate the point, and I particularly like how car manufacturers can bring together relatively small engines to do big things.

Create custom functionalities in Keboola's Developer Portal

Every time you write another piece of code that picks up data from an FTP server, a small piece of you dies. As a developer in the data space, you know what we’re talking about. 80% of your time can be consumed by building and improving the environment and tools, maintenance tasks, and pieces of functionality. That's simply too much time diverted from more important issues.

Ensuring data quality in healthcare: Inching closer to Analytics adoption

Advanced analytics has shaped the healthcare industry in significant ways. It has been shown to impact the way healthcare is delivered and received by streamlining workflows, improving the patient experience, and lowering overall costs. Now, not only is the use of advanced analytics widespread (47% of providers are currently using the tool), but 93% of healthcare executives also consider it important to the future of their business.

Decision Making in Uncertain Times

Leaders know that making good, fast decisions is challenging under the best of circumstances. But the trickiest decisions are those we call “big bets”: unfamiliar, high-stakes decisions. In a crisis of uncertainty such as the COVID-19 pandemic, which arrived at overwhelming speed and enormous scale, organizations face a potentially paralyzing volume of these big-bet decisions.

Machine learning in production: Human error is inevitable, here's how to prepare.

You did it. You have machine learning capabilities up and running in your organization. Success! What started as a few nascent experiments (and maybe a few failures) is now a set of carefully constructed models racing along in full production, with the ability to scale to hundreds or thousands of production models in sight. Assembling your expert team of data scientists and custodians seems like a distant memory. Now you’re looking ahead to the future: growth, innovation, revenue!

Augmented Analytics - How Associative and AI Technologies Are Changing the Face of Analytics

It’s hard to believe that we are now over 30 years into data warehousing. In that time, we have seen major changes in the tools that help users report on and analyse data. In the last twenty years, we have seen the evolution from reporting to ad hoc analysis and advanced analytics. Today, BI/Analytics is a mature market, with self-service BI and visual analysis standard in most organisations and self-service data preparation also widely deployed.

For Business Agility, Focus on Data - Not on Data Management

Effectively managing data in an edge-to-cloud world is becoming increasingly complex. Enterprises need data management simplicity and agility to maximize the benefits they can get from their data. The enterprises that succeed will shift resources away from mundane data management tasks and focus on using data to innovate and add business value.

Fresh Features: The beautiful, flexible design experience

Yellowfin 9 is defined by the belief that design matters. The ability to create a cohesive design look and feel across analytics dashboards and reports is particularly crucial for independent software vendors (ISVs) that embed analytics into their applications. Interestingly, when you take a look at the wider analytics market, few vendors are providing the toolkit that designers and developers need to build the analytical experiences they want.

How to choose your ETL tool

ETL tools help companies streamline and enhance their data operations. They automate the repetitive tasks involved in extracting raw data from sources, transforming it into a consumable format and loading it into data warehouses, where it is ready to be analyzed. With so many offerings available, all of which do the heavy lifting ‘out of the box’, it is hard to discern which ETL tool is best suited to your needs.
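To make the three ETL stages concrete, here is a minimal Python sketch, not tied to any ETL product. The CSV sample, the `orders` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from an API, file drop, or source database.
RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,5.00,DE
3,,fr
"""


def extract(raw: str) -> list[dict]:
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types, normalize values, and drop rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return clean


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a warehouse-style table (SQLite stands in here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping each stage a separate function mirrors how ETL tools usually let you swap a source or destination without touching the transformation logic.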

Why Allegro AI? With Catherine K.C. Leung, MizMaa Ventures

In this video, Catherine K.C. Leung, Co-Founder & General Partner at MizMaa Ventures, discusses the global AI market and Allegro AI. Allegro AI announced that it has closed a fundraising round, led by MizMaa Ventures, with participation from Robert Bosch Venture Capital GmbH (RBVC), Samsung Catalyst Fund and Dynamic Loop Capital.

Operational Database Management

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post gives you an overview of the OpDB management tools and features in the Cloudera Data Platform. The tools discussed in this article will help you understand the various options available to manage the operations of your OpDB cluster.

Challenges of running a big data distro in the cloud

There are many reasons to run a big data distribution, such as Cloudera Data Hub (CDH) and Hortonworks Data Platform (HDP), in the cloud with Infrastructure-as-a-Service (IaaS). The main reason is agility. When the business needs to onboard a new use case, a data admin can bring on additional virtual infrastructure to their clusters in the cloud in minutes or hours. With an on-prem cluster, it may take weeks or months to add the infrastructure capacity for the new use cases.

Predicting the Future With Linear Regression in Ruby

The world is full of linear relationships. When one apple costs $1 and two apples cost $2, it's easy to figure out the price of any number of apples. But what happens when you have hundreds of data points? What if your data source is noisy? That's when a technique called linear regression helps. In this article, Julie Kent shows us how linear regression works and walks through a practical example in Ruby.
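The article's example is in Ruby. As a language-neutral illustration of the same idea, here is a minimal Python sketch of simple linear regression using the closed-form ordinary least-squares estimates. It is not the article's code. The function names and the noisy sample data are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    return slope * x + intercept


# Exact linear data: one apple costs $1, two cost $2, and so on.
print(fit_line([1, 2, 3, 4], [1, 2, 3, 4]))  # (1.0, 0.0)

# Hypothetical noisy prices: the fitted line still recovers roughly $1 per apple.
slope, intercept = fit_line([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
print(f"estimated price of 10 apples: ${predict(slope, intercept, 10):.2f}")
```

With noisy data, the fit minimizes the sum of squared vertical distances between the points and the line. That is why it still lands close to the underlying $1-per-apple relationship.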

Evolving Insurance with Data and Analytics

Insurance companies around the world are forging ahead with innovative offerings that are fundamentally changing the insurance landscape. They are creating personalized offerings and products tailored to the specific needs of their customers. For example, they are implementing usage-based insurance (UBI) based on driving habits, miles driven and driving history, and offering discounts on health insurance based on data from health trackers.

Allegro Trains trains-agent installation tutorial

Installation and configuration tutorial for Trains-Agent, Allegro AI's zero-configuration, fire-and-forget execution agent for the Allegro Trains solution. Trains-Agent enables ML-Ops / DevOps orchestration, queue management, remote execution, automation and more for the Allegro Trains solution. Allegro Trains is an open source machine and deep learning (ML / DL) experiment manager, versioning and ML-Ops full system solution for data science and data engineering teams and projects.

The U.S. Census Enters the Digital Age with Cloudera

2020 brings a new decade and, for the U.S. Census Bureau, a new challenge. As the federal government’s (and the nation’s) leading provider of demographic and economic data, its largest initiative is the U.S. Census, which is conducted every 10 years and counts every resident in the United States. For the first time in U.S. history, the census will be conducted primarily online instead of by mail.

Now Is the Time to Take Stock in Your Dataops Readiness: Are Your Systems Ready?

As the global business climate experiences rapid change due to the health crisis, the role of data in providing much-needed solutions to urgent issues is being highlighted throughout the world. Having helped customers manage critical modern data systems for years, Unravel is seeing heightened interest in fortifying the reliability of business operations in healthcare, logistics, financial services and telecommunications.

The road to Advanced Analytics in Healthcare: Overcoming budget constraints

The healthcare space has much to gain from adopting advanced analytics, a fact that’s becoming more and more apparent during the coronavirus pandemic. In an industry where evidence-based decision-making and proactive intervention can save lives, it has become critical to analyze massive sets of data to streamline clinical and administrative workflows, enhance diagnostic accuracy, improve patient outcomes, and even lower overall expenditures.

Qlik Data Analytics - April 2020 Feature Demonstration

A longer, more detailed demonstration of the features available in the Qlik Sense April 2020 release. Note: an attempt was made to create an index with time-code URLs, but for some reason clicking a time-code link in the description just brings you to the start of the video.

Embedding AI-Powered Analytics into Your Application

Today’s leading software applications need more than just reports and dashboards to provide an edge. That edge is AI. In our next webinar series, Yellowfin SVP Daniel Shaw-Dennis goes beyond the buzzwords to cover the practicalities, marketability and steps to consider when embedding AI-powered analytics. This webinar explains how AI can multiply the value of your software application.

MLOps Automation From A to Z | Jupyter + KubeFlow + MLRun + Nuclio

Short but comprehensive end-to-end pipeline demo using the Iguazio real-time data science platform. MLOps (also known as DevOps for machine learning) is the practice of collaboration and communication between data scientists and data engineers to help manage the production machine learning (ML) lifecycle. Presented by Yaron Haviv, CTO & Co-Founder of Iguazio.

How do I move data from MySQL to BigQuery?

In a market where streaming analytics is growing in popularity, it’s critical to optimize data processing so you can reduce costs and ensure data quality and integrity. One approach is to work only with data that has changed instead of all available data. This is where change data capture (CDC) comes in handy: it is the technique that makes this optimized approach possible.
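To show what "working only with data that has changed" means, here is a minimal Python sketch of the CDC idea. It is not the article's MySQL-to-BigQuery pipeline. Production CDC tools typically read the database's change log (for MySQL, the binary log) rather than comparing snapshots. For clarity, this sketch derives change events by diffing two snapshots and then applies only those events to a replica. The row structure and function names are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]
Event = tuple[str, int, Optional[Row]]  # (operation, primary key, new row or None)


def diff_snapshots(old: dict[int, Row], new: dict[int, Row]) -> list[Event]:
    """Derive insert/update/delete events between two keyed snapshots of a table."""
    events: list[Event] = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))
        elif old[key] != row:
            events.append(("update", key, row))
    for key in old:
        if key not in new:
            events.append(("delete", key, None))
    return events


def apply_changes(target: dict[int, Row], events: list[Event]) -> None:
    """Apply change events to a replica instead of reloading the whole table."""
    for op, key, row in events:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = dict(row)  # inserts and updates are both upserts


# Hypothetical source table before and after some activity.
before = {1: {"name": "Ada", "plan": "free"}, 2: {"name": "Bob", "plan": "pro"}}
after = {1: {"name": "Ada", "plan": "pro"}, 3: {"name": "Cy", "plan": "free"}}

replica = {k: dict(v) for k, v in before.items()}
changes = diff_snapshots(before, after)
apply_changes(replica, changes)
print(changes)
```

Only three events travel to the target here, however large the unchanged part of the table is. That is where CDC gets its cost savings.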

Supercharge ML models with Distributed Xgboost on CML

Since childhood, we’ve been taught about the power of coalitions: working together to achieve a shared objective. In nature, we see this repeated frequently (swarms of bees, ant colonies, prides of lions); well, you get the idea. It is no different when it comes to machine learning models. Research and practical experience show that groups, or ensembles, of models do much better than a single silver-bullet model. Intuitively, this makes sense.
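One way to see the intuition is a small worked calculation. Under the simplifying assumption that each classifier is right with probability p and makes errors independently of the others, a strict majority vote of n classifiers is right whenever more than half of them are. This is binomial arithmetic, not XGBoost itself. XGBoost uses boosting, training trees sequentially and summing their outputs, rather than majority voting, and real models' errors are rarely independent, so treat this as an idealized illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, gives the right answer."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

For five independent 70%-accurate models, the vote is right about 83.7% of the time. As long as each model beats chance (p > 0.5), adding more independent voters pushes accuracy higher.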

How can your hospitality business survive the crisis?

Following the wave of COVID-19-related responses, the hospitality business has taken an unimaginable hit. Whether it’s the widespread fear of being infected, government-enforced shutdowns or the implementation of social distancing, businesses in the hospitality industry are witnessing the decimation of their finances. In the space of just a few weeks, foot traffic has fallen by 50%.

What's New in Talend Winter '20 Release

It has become commonplace to say that data is the lifeblood of digital transformation and that it affects every aspect of business. However, companies face an uphill battle to close the data intelligence gap and truly enable the digital transformation of their organization. With the Winter ’20 release of Talend Data Fabric, we believe we are bringing the power of data intelligence to the next level.

Operational Database Administration

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post gives you an overview of the operational database (OpDB) administration tools and features in the Cloudera Data Platform.

Benchmarking NiFi Performance and Scalability

Ever wonder how fast Apache NiFi is? Ever wonder how well NiFi scales? When a customer is looking to use NiFi in a production environment, these are usually among the first questions asked. They want to know how much hardware they will need, and whether or not NiFi can accommodate their data rates. This isn’t surprising. Today’s world consists of ever-increasing data volumes. Users need tools that make it easy to handle these data rates.

Enabling Olympic-level performance and productivity for Delta Lake on Databricks

Recently, Databricks introduced Delta Lake, a new analytics platform that combines the best elements of data lakes and data warehouses in a paradigm it calls a “lakehouse.” Delta Lake expands the breadth and depth of use cases that Databricks customers can enjoy. Databricks offers a unified analytics platform with robust support for use cases ranging from simple reporting to complex data warehousing to real-time machine learning.

A New Yellowfin 9.1 Release

With Yellowfin 9, we introduced to the world an incredibly flexible, action-based dashboard builder and progressive data storytelling capabilities that advance the dashboard experience. We’ve received great feedback since then, and this month the newly released 9.1 further enhances the user experience of analysts, developers, and business users in Yellowfin’s action-based dashboards, data storytelling, and data discovery products.

Supermarkets Optimizing Supply Chains with Unravel DataOps

Retailers are using big data to report on consumer demand, inventory availability, and supply chain performance in real time. Big data gives retail organizations a convenient way to quickly ingest petabytes of data and apply machine learning techniques to move consumer goods efficiently. A top supermarket retailer recently used Unravel to monitor its vast trove of customer data to stock the right product for the right customer, at the right time.

Hadoop: Decade Two, Day Zero*

One key aspect of the Cloudera Data Platform (CDP), which is just beginning to be understood, is how much of a recombinant evolution it represents, from an architectural standpoint, vis-à-vis Hadoop in its first decade. I’ve been having a blast showing CDP to customers over the past few months and the response has been nothing short of phenomenal…

How Talend is joining the fight against COVID-19: unlocking the best data for health researchers

COVID-19, the disease caused by the novel coronavirus, presents challenges the world hasn’t seen for decades. Humans have fought global pandemics before, and it isn’t easy. But we have an additional weapon on our side this time: data.

Don't You (Forget About Your Data)

Back in 1985, Simple Minds sang “Don’t You (Forget About Me),” the soundtrack to – what is IMHO – one of the greatest movies of the ‘80s: “The Breakfast Club.” The song famously asks us not to forget, and – if any of the wedding parties I’ve attended are anything to go by – we certainly haven’t. However, when it comes to our sensitive data, that’s not always the case.

Operational Database Accessibility

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. Cloudera’s OpDB provides a rich set of capabilities to store and access data. In this blog post, we’ll look at the accessibility capabilities of OpDB and how you can use them to access your data.

Fresh Features: the upgraded user experience

We’re continuing our series on the slick new features and design that you can find in Yellowfin 9 - a game-changing analytics product packed with new capabilities to help you get to actionable insights faster. It was time for a change to the look and feel of the Yellowfin platform and we also knew that some of the workflows could be enhanced. So, the major release of Yellowfin 9 was our chance to give Yellowfin a new look and improve the user interface and workflows while we were at it.

Have Data, AI and Bots Brought Us Closer Than Ever To Achieving The Modern Day KITT Car?

As a kid, I loved the TV show “Knight Rider.” But, for me, the star of the show wasn’t David Hasselhoff, it was the intelligent automobile KITT. KITT – the Knight Industries Two Thousand – was smart, funny and sarcastic, which is always well received by us Brits.

Augmented Analytics Explained

Now more than ever, data plays a huge role in business decision making. There is a lot of talk about how ‘AI’ and Machine Learning will influence how we live and work in the future. But how does this apply to data, analytics and business intelligence? And more importantly, how can you leverage the power of augmented analytics? Recently named a Visionary in the 2020 Gartner Magic Quadrant for Analytics and Business Intelligence Platforms, Yellowfin is at the forefront of Augmented Analytics.

Introducing BigQuery column-level security: new fine-grained access controls

We’re announcing a key capability to help organizations govern their data in Google Cloud. Our new BigQuery column-level security controls are an important step toward placing policies on data that differentiate between classes of data. This helps organizations comply with regulations that mandate such distinctions, such as GDPR or CCPA.

The journey to democratize data continues

Data is the new oil and a critical differentiator in generating retrospective, interactive, and predictive ML insights. There has been exponential growth in the amount of structured, semi-structured, and unstructured data collected within the enterprise. Harnessing this data today is difficult: typically, data in the lakes is not consistent, interpretable, accurate, timely, standardized, or sufficient. Scully et al.

How to empower your remote data team

The advantages of remote work have been praised for a while now. Team members gravitate towards the flexible work schedule and enjoy an increase in personal freedom, while companies benefit from a talent pool unencumbered by geographical borders. From remote-first tech giants like Automattic (which powers a third of the internet) to newcomers such as NASA, companies are steadily progressing towards telecommuting. But working from home comes with its own unique set of challenges.

Databox - How it works

Databox is a decision-making platform built to help you track performance, discover insights and understand what's going on with your business. It connects your cloud services, spreadsheets, databases and custom integrations to organize all of your business KPIs in one place. Databox will deliver your metrics via mobile, browser, big screen, Apple Watch®, and even Slack.