Continuous integration, delivery, and deployment, known as CI/CD, has become such a critical part of successful software projects that its benefits are hard to overstate. At the same time, containers are everywhere and very popular among developers. In practice, CI/CD lets teams gain confidence in the applications they are building by continuously testing and validating them.
There is a common misconception about data-informed decision making: that once we implement the right tools and figure out how to analyze the data correctly, the data will automatically turn into insights and translate into better business decisions. It sounds great in theory.
When I speak to people who are thinking about implementing BI, they are often overwhelmed by all the things they could measure. Many start by wanting to measure everything, which doesn’t necessarily help them. That’s because there’s an inherent cost in measuring things – everything you report and track creates an ongoing burden that your organization has to maintain. That’s why it’s important to be selective about what you measure from the get-go.
In our last blog, we talked about developing data processing jobs using Apache Beam. This time we are going to cover one of the most in-demand capabilities in the modern Big Data world: processing streaming data. The principal difference between batch and streaming is the type of input data source. When your data set is bounded (even if it is huge in size) and is not being updated while it is being processed, you would likely use a batch pipeline, as in the sketch below.
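To make the distinction concrete, here is a minimal sketch of a batch pipeline using the Apache Beam Python SDK. It reads a bounded text file, counts words, and writes the result; the file paths and the word-count scenario are illustrative assumptions, not something taken from the original post.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# A minimal batch pipeline: the input is a bounded text file, so the
# pipeline processes the whole data set once and then finishes.
# The paths 'input.txt' and 'wordcount_output' are placeholders.
with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "Read bounded input" >> beam.io.ReadFromText("input.txt")
        | "Split into words" >> beam.FlatMap(lambda line: line.split())
        | "Count per word" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
        | "Write results" >> beam.io.WriteToText("wordcount_output")
    )
```

A streaming pipeline, by contrast, would read from an unbounded source (for example, `beam.io.ReadFromPubSub`) and apply windowing, because the input never "finishes" and results have to be emitted incrementally.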