Building Streaming Data Pipelines, Part 2: Data Processing and Enrichment With SQL
In my last blog post, I looked at the essential first part of building any data pipeline: exploring the raw source data to understand its characteristics and relationships. The data describes river levels, rainfall, and other weather conditions, and is provided by the UK Environment Agency through a REST API. I used the HTTP Source connector to stream this into Apache Kafka topics (one per REST endpoint), and then Tableflow to expose these topics as Apache Iceberg tables.