On May 30th, we return to Munich for an evening of talks and conversations around stream processing. Learn more and sign up here: Stream Processing Meetup - Kafka, ML and more! (Tue, May 30, 2023, 6:30 PM, on Meetup)
This month we welcome:
Learnings From Shipping 1000+ Streaming Data Pipelines To Production - Stefan Sprenger (LinkedIn Bio)
Kafka Connect and Kafka Streams are foundational technologies in modern, real-time data architectures. They enable developers to build scalable, robust, and real-time data pipelines without having to handle the low-level consumer and producer APIs of Apache Kafka. In this talk, we share our most important, and often surprising learnings from using Kafka Connect and Kafka Streams to ship more than 1,000 streaming data pipelines to production. The goal of this talk is to enable you to build mature streaming data pipelines without having to go through the common pitfalls.
We walk you through our journey of adopting Apache Kafka, Kafka Connect, and Kafka Streams. We discuss the challenges that we faced and how we overcame them. Over the course of the talk, we provide answers to important questions, such as:
• Which metrics are useful for monitoring streaming data pipelines?
• How to deal with resource-leaking connectors impacting the health of a Kafka Connect cluster?
• How to start troubleshooting the performance of streaming data pipelines?
• How to tune Kafka Connect for handling slow data sources or data sinks?
• What’s missing in today’s ecosystem for streaming to become a commodity?
Simplifying Real-Time ML Pipelines with Quix Streams: An Open Source Python Library for ML Engineers - Tomas Neubauer (LinkedIn Bio)
As data volume and velocity continue to increase, the need for real-time machine learning (ML) is becoming more pressing. However, building real-time ML pipelines can be complex and time-consuming, requiring expertise in both ML and streaming application development. This talk will address this problem by introducing Quix Streams, an open-source Python library that makes it easy for data scientists and ML engineers to build real-time ML pipelines without having to learn the intricacies of building a streaming application from scratch.
In this talk, we’ll cover:
• The growing importance of real-time ML in today’s application stack, and the use cases for real-time ML processing
• A comparison of different ML architectures (batch, request-response, stream, and hybrid) and their pros and cons
• The current state of streaming architecture, which is typically Java-based, and the challenges this poses for data scientists and ML engineers who primarily work in Python
• An overview of Quix Streams and its features, including a demo of how to use it to build real-time ML pipelines
This talk is relevant for data scientists, ML engineers, and software engineers who are looking to adopt new technologies and practices in order to build real-time ML pipelines and stay current in their field.