Eliminate the pitfalls on your path to public cloud

As organizations look to get smarter and more agile in how they gain value and insight from their data, they can now take advantage of a fundamental shift in architecture. In the last decade, as an industry, we have gone from monolithic machines with direct-attached storage to VMs to cloud. The main attraction of cloud is its separation of compute and storage, a major architectural shift in the infrastructure layer that changes the way data can be stored and processed.

How to run queries periodically in Apache Hive

In the lifecycle of a data warehouse in production, there are a variety of tasks that need to be executed on a recurring basis. To name a few concrete examples, scheduled tasks can be related to data ingestion (inserting data from a stream into a transactional table every 10 minutes), query performance (refreshing a materialized view used for BI reporting every hour), or warehouse maintenance (executing replication from one cluster to another on a daily basis).

Introducing FlinkSQL in Cloudera Streaming Analytics

Our 1.2.0.0 release of Cloudera Streaming Analytics Powered by Apache Flink brings a wide range of new functionality, including support for lineage and metadata tracking via Apache Atlas, support for connecting to Apache Kudu, and the first iteration of the much-awaited FlinkSQL API. Flink’s SQL interface democratizes stream processing, as it caters to a much larger community than the widely used Java and Scala APIs, which focus on the Data Engineering crowd.

A Message To You Kafka - The Advantages of Real-time Data Streaming

In these uncertain times of the COVID-19 crisis, one thing is certain – data is key to decision making, now more than ever. And the need for fast access to data as it changes has only grown. It’s no wonder, then, that organisations are looking to technologies that help solve the problem of streaming data continuously, so they can run their businesses in real time.

Welcome and Introduction to DataOps.NEXT

DataOps matters, especially in today’s uncertain times. Data management and analytics are crucial to respond faster and drive results for your business, your customers and society. That’s why we built DataOps.NEXT to help you get from now to what’s next, with data. We’ll bring out Dr. Jennifer Hall, the chief of data science for the American Heart Association (AHA), to discuss how Hitachi Vantara and AHA have worked together to support research for COVID-19. Tune in for Pedro Alves, Hitachi Vantara’s head of product design and designated “Community Guy.” He’ll provide our vision and strategy for DataOps, including an update on Pentaho Open Source and Enterprise Edition.

Sifting Through COVID-19 Research With Qlik and Machine Learning

Research on COVID-19 is being produced at an accelerating rate, and machine intelligence could be crucial in helping the medical community find key information and insights. When I came across the COVID-19 Open Research Dataset (CORD-19), it contained about 57,000 scholarly articles. Just one month later, it has over 158,000 articles. If the clues to fighting COVID-19 lie in this vast repository of knowledge, how can Qlik help?

Genomics analysis with Hail, BigQuery, and Dataproc

At Google Cloud, we work with organizations performing large-scale research projects. There are a few solutions we recommend to do this type of work, so that researchers can focus on what they do best—power novel treatments, personalized medicine, and advancements in pharmaceuticals.

Building a genomics analysis architecture with Hail, BigQuery, and Dataproc

We hear from our users in the scientific community that having the right technology foundation is essential. The ability to quickly create entire clusters for genomics processing, and to stop billing once you have the results you need, is a powerful tool. It empowers the scientific community to spend more time doing their research and less time fighting for on-prem cluster time and configuring software.