Why Apache Kafka® and for which IIoT use cases? - Confluent Cloud, Kafka and event streaming explained simply

Processing data can cost real money if you tackle the issue without the right technologies – especially when processing real-time data. So today, using BMW as an example, we’ll talk about Apache Kafka®, an emerging standard in interface handling and streaming of large data packets.

Episode 74 at a glance (and click):

[09:18] Challenges, potentials and status quo – This is what the use case looks like in practice
[19:36] Solutions, offerings and services – A look at the technologies used
[33:55] Results, Business Models and Best Practices – How Success is Measured

Podcast episode summary

Confluent is the IoT tech partner for today’s episode and co-developed the Apache Kafka® standard. The standard is already used by 100,000 organizations worldwide to handle large data volumes in real time. Looking at the word “fluent”, the goal is obvious: efficient data flow. The data should flow and not just be pushed into data lakes! We’re talking about a data hub, a toolbox that lets me consume data streams, process them and feed them openly and flexibly into all systems.

Data engineering can be simple: In this podcast, Field CTO at Confluent, Kai Waehner, explains in detail from the field how data volumes are filtered, ingested, processed and reused. In addition, topics addressed include:

– Real-time data handling
– The Business Impact of Data in Motion
– Data streaming at customer BMW
– Brownfields at customers
– Functions of the data hub
– Coupling of system data with IT data (SAP)
– Data connection to the data hub

Podcast interview

Kai – for those who don’t know you: You are Field CTO at Confluent. You mainly deal with modern enterprise architectures, data streaming and also innovative open source and cloud technologies. You also have your own website, which I will also link.

Confluent is a publicly traded pioneer of a fundamentally new category of data infrastructure focused on data in motion. At its core, your cloud-native platform functions as an intelligent nervous system. It allows data in motion to flow continuously and in real time from a wide variety of sources throughout the enterprise. Is that fair to say?

Kai

That sums it up very well. Today we don’t want to go deep into the technology, but look at it from a use case perspective. That’s exactly the topic I work through together with clients. The core idea is that I can continuously process and correlate data. This includes large data streams of IoT sensor data from the factory, but also the correlation of this data with, for example, an ERP or CRM system. That’s the basic rule: moving data in real time can always add value compared with just storing it somewhere.

That’s exactly what I do with many customers around the world. Then I share those success stories with others. That’s why we’re here today, and why it’s so exciting. It really is a paradigm shift.

To go right in and frame the issue; we always talk about data streaming and real-time data. Why is what you do so important? Can you put the issue of “real-time data handling” in this context?

Kai

Over the last 20 to 30 years, when you developed software, whether on the mainframe, later on a modern server, or now in the cloud, the pattern was basically always the same: you got data from somewhere and wrote it to a database or a file. Then someone picked it up there later, for example for a weekly report or for some kind of analysis, nowadays also with AI, Big Data and similar topics. The basic idea was that the data is always “at rest”. That is, I store it somewhere, and at some point later, which can be ten minutes or even a day later, I pick up the data to analyze it or process it further.

What we’re doing now is called “Data in Motion.” I want to process and correlate the data while it is still relevant. There are use cases across all industries and business units. It is easiest to picture in the IoT environment with predictive maintenance. Of course, I can also analyze the data at rest in the database later, but by then the machine is probably already broken, and I can only see why it broke. Instead, I want to use these very insights to respond to such changes while the data is in motion. That’s the difference and the paradigm shift between data at rest in the database and data in motion, which I process continuously. This is data streaming. It is not only about large volumes, but also about transactional data. The question is how I use the data at the right moment, now rather than later.

That is the core idea. The term “real time” is important here and should always be defined, because everyone defines it differently. There’s a wide spectrum. Some are satisfied if you can solve it in ten seconds, and some have to do it in milliseconds. Then there is also “hard real time” in the OT world (Operational Technology), that is, closer to production. That’s where I can’t tolerate any latency at all because it is safety critical; that’s something else again. That’s not IT and it’s not what we do; I mention it only to draw the line.

When we talk about real time, sometimes it’s in the millisecond range, sometimes it’s in the seconds or minutes range. It’s about reacting quickly to changes.

Do you have some use cases and project examples for today that we can use to understand how your technology works and what the business impact is behind it?

Kai

I’ll talk about three different customer examples, which show that it’s a broad spectrum, because I have these types of data streams everywhere, and I can always generate added value, even if it looks different.

On the one hand these are customer-facing use cases, on the other hand they are more production-oriented, with IoT sensor data. The question you can always ask yourself, also as a listener, is this: can I respond to an event, any event, now instead of later? If the business says, “This is worth more,” then I can, for example, reduce my costs or my risk, or improve revenue or customer experience. That’s where data in motion performs better.

A specific example is BMW. They are rolling out data streaming to connect their smart factories to the cloud and to correlate the data from the factories in real time with other systems, such as SAP’s ERP. This is a back-end process to optimize OEE (Overall Equipment Effectiveness), among other things.

Another example is Bosch. They have a logistics and supply chain project, and there again it’s about monitoring the end-to-end processes in order to act on certain events in real time. That’s not just about real-time information, but about correlating what is happening now with information that has been around for a while. Bosch has built various processes there, including track and trace. When the equipment and machines sold by Bosch are in the factory or in use somewhere on a construction site, I can track everything and react to changes. These are simple use cases, such as finding a tool that has been lost on the construction site, but also early alarms to replace the batteries or because the network access is no longer working.

The third example is more customer-facing, i.e. customer experience, and again independent of the industry: topics such as after sales. How do I communicate with the customer? We have a good use case from Royal Caribbean. These are cruise ships, and data streaming is used on the ship itself. In this case it runs completely disconnected from the Internet because the connection on board is poor, and yet they provide recommendations, location-based services and upselling, for example combined with inventory management from the restaurant, and correlate this with historical data to improve revenue as well as the customer experience.

These examples serve to illustrate how broad the spectrum of use cases is across industries and business domains. Today we can talk explicitly about BMW, the Smart Factory and SAP integration.

I’m linking this again for all listeners who want to read up on other projects at this point, for example, who are in the logistics or maritime environment.

Challenges, potentials and status quo – This is what the use case looks like in practice [09:18]

Let’s start with BMW; where are we at BMW and what are exemplary processes there?

Kai

To anticipate this: at BMW there are very many projects where data streaming replaces “data at rest”, because in many cases real-time processing creates added value. The specific aspect we’re talking about is the smart factory and the shopfloor.

Like most automakers, BMW has a cloud-first strategy. That is, while production is taking place in the factory at the edge and will continue to do so in the future because it is material goods, this data is being replicated directly from the factories to the cloud in real time, and processed or visualized there using other systems.

As a result, in all these IoT scenarios we are generally talking about a hybrid infrastructure, with as much as possible moved to the cloud. That is also the case here. BMW’s strategic cloud partner is Microsoft Azure, and with us they handle everything around data integration and data streaming: the data from the smart factory is replicated directly into the cloud, where we operate the data streaming hub, process the data, and make it available to new applications that can dock onto it.

Roughly speaking, that is the scenario for this use case. Again, it tends to start small with one factory and push it to the cloud in one region; but generally the goal is for it to go global, so I can dock multiple factories across the world to the systems over time.

Hybrid in this case means that the hard real-time use cases, where you have to react quickly, run at the edge, while various other services are moved out and run on the Microsoft Azure cloud, where the other use cases take place. Is that fair to say?

Kai

Right. At the end of the day, the safety critical use cases, so how do I control the robot so it doesn’t hit a human … that’s hard real-time, that’s embedded systems with C or Rust and things like that. This has nothing to do with the IT use cases we are talking about.

But as soon as it goes into IT, whether we’re talking about data warehousing or an SAP integration, it’s not about critical real-time, it’s “only” about milliseconds or seconds. And that is the data that can then be very well outsourced to the cloud in most cases.

Often the whole thing is also bidirectional. On the one hand, I push as much as possible into the cloud for real-time analysis, but I also map more and more control decisions in the cloud as soon as they are no longer safety critical, for example when I only do a forecast or only integrate with inventory management. That’s not about hard real time, but about making a decision automatically in the cloud within milliseconds or seconds.

I think it’s quite important to delineate your domain; your domain includes the non-hard real-time use cases.

To learn more of the challenges of BMW: Where did you come into this? BMW must have said: “We don’t do this ourselves, we now rely on a system that already exists”.

Kai

Automotive manufacturers don’t start from a greenfield, after all; they already have systems in place. The basic problem is that they cannot provide information to others at all. They don’t even know exactly what is happening in a factory. What does my inventory look like right now? Is the battery of this robot already three quarters empty? This is all information that the robot or the shopfloor may already have, but the person responsible for reorders does not.

You can then digitize and automate more and more. But the problem is, the more I automate and digitize, the more data I have. Then the issue of scalability comes up because I don’t just want to operate in near real time or real time, I need to be able to scale the whole thing. At the same time, I still need to connect non-real-time systems. For example, if I’m still on premises in the data center with an old ERP system, that’s not real time either. So I need exactly this data hub, which can process and correlate data in a highly scalable way and in real time, but which can also connect and link data with other systems in the same way as in a classic database or ERP system.

Based on this, one comes to new technologies where data streaming can provide many solutions. DM and other components should be integrated in the cloud whenever it makes sense. The focus here is on solving the business problems and not on operating the infrastructure, which is where most of the work is done.

The biggest problem, not just at BMW, is that IT experts are scarce, and the few I have, I’d rather they be able to take care of applications rather than the infrastructure and its operation. That’s what the cloud service provider can cover with critical SLAs, so BMW can really take care of the business cases.

One of the very big business problems here is “time to market.” Once I have the data hub, I can get the data from the robots and the shopfloor, and at first I may build that just for visualization. Then, since the data is flowing anyway, other systems can dock onto that data later.

So the data needs to be available to the decision makers and various people. Can you give any other examples of what data packages were relevant to BMW?

Kai

Here’s the exciting point: it’s about the OT world, with high volumes of sensor data. That is what comes from the machines, from the shopfloor, from the robots, or, for example, from data historians, i.e., legacy technology that is still proprietary and perhaps runs on a Windows server in the factory. That’s where all the factory data comes from.

These are often very large volumes, and in such cases I am often not interested in every single data point. For example, if I measure the temperature of a machine, I don’t have to insert all of that data into the ERP system, but possibly only an aggregation of the last hour, or only the events where a threshold is reached, i.e. when the temperature is above a certain value.
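The reduction Kai describes, forwarding only hourly aggregates and threshold violations instead of every raw reading, can be sketched in a few lines of Python. This is a simplified illustration and not BMW’s or Confluent’s implementation: the `Reading` type, the one-hour window, the 90 °C threshold and the machine name are all assumptions, and the sketch expects readings to arrive in timestamp order per machine.

```python
from dataclasses import dataclass
from statistics import mean

WINDOW_SECONDS = 3600  # assumed: one-hour tumbling window
THRESHOLD_C = 90.0     # assumed: alert threshold in degrees Celsius


@dataclass
class Reading:
    machine_id: str
    timestamp: int      # seconds since the start of the stream
    temperature: float  # degrees Celsius


def reduce_stream(readings):
    """Turn raw readings into a much smaller stream of relevant events.

    Yields ("alert", machine, timestamp, temperature) right away when the
    threshold is exceeded, and ("hourly_avg", machine, window_start, average)
    each time a one-hour window for a machine is complete.
    """
    windows = {}  # machine_id -> (window_start, list of temperatures)
    for r in readings:
        if r.temperature > THRESHOLD_C:
            yield ("alert", r.machine_id, r.timestamp, r.temperature)
        window_start = r.timestamp - r.timestamp % WINDOW_SECONDS
        current = windows.get(r.machine_id)
        if current is not None and current[0] != window_start:
            # The previous window is closed: emit its aggregate.
            yield ("hourly_avg", r.machine_id, current[0], round(mean(current[1]), 2))
            current = None
        if current is None:
            current = (window_start, [])
            windows[r.machine_id] = current
        current[1].append(r.temperature)
    # End of input: flush the windows that are still open.
    for machine_id, (start, temps) in windows.items():
        yield ("hourly_avg", machine_id, start, round(mean(temps), 2))


readings = [
    Reading("press-1", 0, 70.0),
    Reading("press-1", 1800, 80.0),
    Reading("press-1", 3600, 95.0),
    Reading("press-1", 5400, 85.0),
]
events = list(reduce_stream(readings))
```

With a real sensor that reports many times per second, the ratio between raw readings and forwarded events would be far larger than in this four-reading example; only the `events` stream would be passed on to a system such as an ERP.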

I don’t want to push all the large amounts of data somewhere else, partly because the systems are not even built for that; I only want to pass on the relevant data from the OT world. When I go into the IT world, it’s on the one hand about classic integration components, like an SAP solution for the MES or ERP system, but also about customer master data, where I might rather use a Salesforce CRM in the cloud. Or it goes in the direction of analytics, where I connect my data warehouse and/or my data lake for machine learning and AI. Or I build my own applications with Java, C++ or .NET.

All of this data is relevant depending on the use case, and that’s what’s exciting. Then I can combine and correlate the data. That’s why I can’t say generically what BMW uses here or there. They connect the data and then the business unit can decide how to correlate it. This always happens in line with regulations such as GDPR in Europe, so that the whole thing remains data protection compliant.

It’s probably the case that BMW has hundreds of use cases, depending on the business impact and unit, running internally.

Kai

Often it’s not even clear yet what I can do with the data later. But once it flows through in real time, I can ask the next business unit, “Hey, here you get the data in real time, over there you get it later, what’s better for you?” That’s exactly how more and more teams come to these systems and pick up the data through their own interface.

This all sounds relatively big with SAP, the Oracle data, and I likewise have a wide variety of OT data; how do your customers get started? If you take a data package and push that – it’s a little more complicated in the case of BMW because it’s a cloud-first strategy, but not all are there yet. What is the mindset of your customers here?

Kai

They start small. It’s always our recommendation to start with an initial pilot project, where I dock onto just a few interfaces at first. When the first pipelines are up and running, the data is pushed into only one or two interfaces, for example for visualization. The first quick win is often simply knowing what is happening on the shopfloor at the various interfaces.

The first project must always bring business value; in whatever form. Whether it’s cost reduction or better customer experience. This is the only way to start deploying successfully. Then I can roll the whole thing out further and further.

But it is also possible to think small and start small. BMW and other manufacturers don’t arrive and say, Okay, I’m going to start a huge project and network the entire world; instead, they start with a smart factory. This is a strategic method that is more digitally inclined.

This can also be seen at Mercedes at the moment. In Stuttgart, they have their digitized factory, and of course you tend to start with something lighter because it’s more greenfield and it’s more modern with open interfaces and the new technologies that they’re using there in the factory as well. Once that’s up and running, I can roll it out more and more strategically and then decide again per factory how I’m going to connect that. Because if you go to BMW or Mercedes, it’s not like every factory worldwide looks the same. Just because it works here in Germany doesn’t mean it will work the same way somewhere in Mexico.

That’s exactly what this step-by-step integration and rolling out of new systems is. And while this example with BMW is being rolled out to factories around the world and the integration is being implemented in the cloud, in parallel in Germany, with the first use case already live, other business units can already dock onto the data here and build further use cases. That’s separated and called domain-driven design. So that I can build new applications in an agile and decoupled way; that is very important in terms of time to market.

We in Germany still have to get away from this motto of first planning and then doing. The Asians and Americans have been doing this better for years. They try things out first, and we want it to go in the same direction. Especially in the cloud, I can start something new and try it out. If it doesn’t work, I just shut it down again, because then I just shut down the cloud service and try something new.

Solutions, offerings and services – A look at the technologies used [19:36]

We talk about different standards. How does the data acquisition work for you? I now have my interfaces, whether that’s in the OT world or in the different systems; how do I retrieve the data of the systems and data sources?

Kai

There are various possibilities. Part of the data streaming platform is a so-called “cloud-native integration platform”. If I have modern interfaces, I can push the data directly into the data hub. Here we’re talking about modern open interfaces like OPC UA or MQTT, which is now well known in the IoT world, or often still an HTTP web service that I dock onto. Or I have out-of-the-box connectors to different products, like SAP S/4HANA for example; that’s the easiest option and an ideal solution. This is also exactly why most customers in such projects tend to start in the modern smart factory, because modern robots with modern standard interfaces are involved there.

Nevertheless, the reality as of today is that for most customers the whole thing is a brownfield. We all know that factories don’t just run for five or ten years, but for 20 to 40 years. In such cases, I cannot dock on via an open standard because these are proprietary PLCs and Data Historians. Then I often still need a third-party product to connect such interfaces. Often that is the Data Historian, which is running anyway and from which I push my data. Or I install another technology, or I directly pick up the files that still reside on the Windows server.

These are all different possibilities; I’m relatively flexible there. Ultimately, I need to get the data into the data hub somehow. From then on, I can push it into all the other systems according to the defined standards; then it’s easy. That’s why in these IoT projects the last mile, the integration with the shopfloor, is as a rule the difficult part, because it is not based on open standards. When I’m in the cloud, it’s all standards-based in some form.

They are often still proprietary protocols. Even with Amazon, if I have an S3 Object Store, it’s proprietary, but it’s still an open interface, so I can plug in anything I need, with very clearly defined interfaces, and get data out again. This is the current trend. That’s why it’s much easier in the cloud, both to other software-as-a-service solutions or even if I want to build my own interfaces.

When you talk about the data hub and the solution from you guys; what’s the first step? I get access to the Confluent Cloud with all the resources and capabilities that I have to connect my data to that data hub, or how’s that?

Kai

That’s exactly the charm of the cloud; not only with Confluent Cloud, but also with all the AWS services or similar. These are “serverless” or “fully managed”. That means I don’t have to worry about how to install, run, scale or support the whole thing later on, because that’s exactly what Confluent Cloud does. This allows me to just start a new project: I start the service, have a free budget at the beginning, and can simply connect a first sensor. Or even one step earlier: I don’t connect the sensor from the factory yet, but try it in the cloud first with a dummy service that produces interface data for me. Then I consume that stream of data and process it, with the same tool, for example with a SQL query that I enter in the data platform. There I filter or aggregate data, because usually not all data should go into the SAP system, especially data from the factory, but only the correlation or aggregation of the last hours, or the cases where there were temperature spikes.
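The “dummy service first” approach can be tried locally before any cloud service or real sensor is involved. The following is a minimal sketch in plain Python, not the Confluent Cloud tooling Kai refers to: the function names, the event fields, the sensor ID and the value range are invented for illustration. A generator stands in for the dummy source, and a second generator plays the consumer that filters for temperature spikes.

```python
import json
import random


def dummy_sensor(n_events, seed=42):
    """Hypothetical stand-in for a machine: yields JSON-encoded events as bytes."""
    rng = random.Random(seed)  # fixed seed so test runs are reproducible
    for i in range(n_events):
        event = {
            "sensor_id": "dummy-1",
            "seq": i,
            "temperature": round(rng.uniform(60.0, 100.0), 1),
        }
        yield json.dumps(event).encode("utf-8")


def spikes(raw_events, threshold=90.0):
    """Consumer side: decode each event and keep only temperature spikes."""
    for raw in raw_events:
        event = json.loads(raw)
        if event["temperature"] > threshold:
            yield event


spike_events = list(spikes(dummy_sensor(1000)))
print(f"{len(spike_events)} of 1000 dummy events were spikes")
```

Once the pipeline logic works against the dummy source, the generator can be swapped for a real stream without changing the filtering step, which is the point of starting with synthetic data.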

That’s the data that I’m really putting into the Confluent Cloud, so on the application side. I can either do the whole thing through a graphical interface, or if I say, no, I’m a data scientist, I use Python for everything, then I just use my Python client for the data streaming solution; so I’m completely flexible.

Meanwhile, many other proprietary solutions and data historians have a data streaming interface for Kafka and technologies like that, so I have that as a possibility as well. As a rule, even there, you use what is most comfortable for the customer.

But the important thing is that in our case I push the data to the Confluent Cloud without needing to know how to run and scale the whole thing there. That’s the huge added value, and why, from a time-to-market view, I can start a pilot much more easily in the cloud.

What’s important to understand is that you’re not a classic IoT platform that I connect all my assets to, but you’re a single distributed storage layer where I can leverage all the capacity and power of infrastructure to put a wide variety of use cases on there; can you put it that way?

Kai

Absolutely, and that’s why we’re not competing with IoT platforms, from Siemens MindSphere to any open source Eclipse projects or anything like that. This is exactly complementary. For example, they do the last mile integration to Modbus or even to OPC UA interface and unlined gateways. They also have some use cases that they implement, but above us goes: the Rest of the Enterprise, I always call it. If I want to correlate different interfaces with each other, and that is often not just the pure OT world, but also the IT world. That’s the charm of it, as is the case with BMW. I first send the data from the shopfloor level to the data hub – or as some customers call it: Central Nervous System for Real-Time Data – and from here anyone can then retrieve this data. In most cases with the customer, it’s like I’ve built some IoT use cases around it; some directly with data streaming, some again with another third party solution, or it goes into the IT world that I’m pushing the whole thing towards data analytics, so the data warehouse, the data lake and so on.

The charm of it is that I push the data in once, but the data is still in motion. That means that, whether in the OT or IT world, I can build my real-time applications on it or push the data into a data lake or data store at rest to do batch analytics.

That’s what makes this technology so different from other middleware platforms: the systems are nevertheless completely decoupled from each other, and each business unit can decide for itself whether to use .NET or Python, or to buy a third-party product or software-as-a-service product.

The fact that the systems are decoupled from each other is very important, in addition to the real-time messaging itself. The system also stores the events and information for as long as you want, so I can also replay historical data a week later.
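Decoupling and replay both follow from one idea: events are kept in an append-only log, and each consumer tracks its own read position. The toy class below illustrates that idea in memory only. It is not the Kafka API; `EventLog`, `poll` and `seek` are names chosen for this sketch, and a real log would be persistent, partitioned and distributed.

```python
class EventLog:
    """In-memory stand-in for a single append-only event log."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the stored event

    def poll(self, consumer, max_events=None):
        """Return the next events for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        end = len(self._events)
        if max_events is not None:
            end = min(start + max_events, end)
        self._offsets[consumer] = end
        return self._events[start:end]

    def seek(self, consumer, offset):
        """Move a consumer back (or forward), e.g. to replay history."""
        self._offsets[consumer] = offset


log = EventLog()
for temp in (71, 74, 93, 76):
    log.append({"temperature": temp})

# A fast consumer reads everything immediately and reacts to spikes.
alerts = [e for e in log.poll("alerting") if e["temperature"] > 90]

# A slower consumer takes only two events for now; nobody else is affected.
erp_batch = log.poll("erp", max_events=2)

# Nothing new has arrived for the alerting consumer ...
nothing_new = log.poll("alerting")

# ... but it can rewind and replay the full history at any time.
log.seek("alerting", 0)
replayed = log.poll("alerting")
```

Because reading does not remove events, the alerting consumer and the ERP consumer never interfere with each other, and a new consumer added later starts from offset 0 and sees the complete history.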

How do customers who don’t yet use your standard do it these days? Do you then build individual solutions?

Kai

It has to be said very clearly, so that this doesn’t come across wrong: “solutions” is perhaps the wrong term here, because this is infrastructure. It is not a product you buy and then do predictive maintenance or customer interactions with. That’s why it’s called a “data hub”: I can integrate my data streams with it, both from the IoT world and from the enterprise and IT world, and then further integrate, filter and process the data. I then have to build my applications on top of that.

How do customers use us? They are building what they call the “Modern Data Flow” on top of real-time data and on top of that they are building applications. That’s something they can partially build with our components, but you either have to build that yourself or buy it back in as software as a service and integrate it.

The beauty is that this is a de facto standard. Kafka is now used by over 100,000 organizations worldwide. It’s open source, which is part of the idea behind it. You can use it free of charge and operate it yourself. And if you started out running it yourself, you can hand it over to our cloud service later without any hassle or downtime.

That’s exactly how customers start: many directly in the cloud, but there are also often projects that have been running this themselves for a few years and then want to move to the cloud for cost or SLA reasons. There are various ways to get started.

Because of the community issue; you are hundreds of thousands of people working on this and it is an open source standard. That means that as a developer, I have the opportunity to help shape that there. The whole thing follows your founding idea accordingly; is that right?

Kai

On the history: this technology, “Apache Kafka”, was built over ten years ago at LinkedIn, so in Silicon Valley, because there was nothing better for processing large amounts of data in real time. It became open source, and about seven to eight years ago Kafka’s inventors got venture capital to make Kafka enterprise-ready.

The whole thing has now become so established on the market as a de facto standard that there are hardly any companies that do not use it for such use cases. That’s why there’s not just one vendor; we at Confluent are one among others, and Kafka’s founders are also involved. But there are also the big companies, whether that’s IBM, Microsoft, Amazon, or even IoT manufacturers. They either all do something with Kafka themselves, or at least they integrate with Kafka because it’s the de facto standard in the market. There’s a huge community behind it, and you can run the open source version yourself, use a cloud service, or combine the two.

That’s the idea behind it. In other words, it’s not only about streaming data, but also about operating the entire ecosystem around it, such as connectors. There are commercial connectors that we host in our cloud so I don’t have to worry about them. But you can also build your own connectors. That’s a very different idea from what many know from the OT environment, with all the Data Historians and other platforms. It moves far away from proprietary interfaces towards open interfaces. That doesn’t mean everything has to be free or open source; I might build something myself and want to make money from it, and I can still connect it openly. Then it’s back to time to market and flexibility, and to avoiding vendor lock-in. Those are the reasons why Kafka has become so established in the market.

We also have a lot of partners that we work with that all mention you because you are the interface into the integration of that full-power infrastructure, whether that’s a Salesforce, SAP, or a wide variety of partners that are connectable there.

To elaborate on your USP with scalability; it’s also about connecting up to a thousand machines. Why is what you do scalable?

Kai

Why is it scalable? Because LinkedIn built the technology over a decade ago precisely because they had scalability problems. The architecture was designed for that from the start, so there is no “scalability” problem of the kind that often existed with older platforms.

I can use Kafka these days for both Big Data Analytics use cases where I’m processing 10 GB or more of data per second with one cluster – that’s very large amounts of data coming from the shopfloor. But I can also process transactional data with real-time guarantees and also without data loss. For example, if I want to correlate a transaction with SAP and an MES system. That’s the charm of it.

In the cloud, it’s even easier; I don’t even have to take care of it there, it’s completely handled by us. I can also use open source Kafka and scale it myself. That’s where I have to take over operations and assess the risk if something fails; how do I do performance tuning as an example.

That’s the idea of fully managed and serverless.

On that note, if someone is evaluating this themselves: don’t just look at the marketing, but really look at the solutions. Fully managed is unfortunately something many vendors only claim. What they’re actually doing is provisioning infrastructure and then handing it over to the customer without supporting the whole thing or scaling it automatically. That’s the unique selling point: with us it’s fully managed, and not just the hub, i.e. the messaging, but also the storage, the data integration and the data processing. That way I can really build end-to-end solutions, including data governance and security. That’s how solutions such as Bosch’s end-to-end supply chain and logistics, BMW’s smart factory integration or the customer after-sales scenarios, B2C and B2B, come about.

Without these solutions, in the worst case scenario, the network goes down the tubes because I have so many different data points to work with. I need to build this infrastructure somewhere. You go in there, handle the infrastructure and provide the real-time data streaming to it.

Kai

Also very important: not all systems can do real time, and some cannot scale as well as modern systems in the cloud. An SAP system doesn’t expect me to push in large amounts of data. That’s why Kafka’s other unique selling point is that I push a lot of data in, but then often just correlate it and pass only the relevant data on to the next system. Or I can send the data in real time to a real-time alerting system, while on the other side SAP only asks occasionally and gets certain information. This is the important decoupling: the SAP system requests data at a different time and in a different quantity than a monitoring or MES system.
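
As a rough illustration of this decoupling, here is a minimal sketch in plain Python. It does not use the Kafka client API; `EventLog`, the consumer names and the temperature threshold are hypothetical stand-ins, chosen only to show two consumers reading the same stream at their own pace and in their own quantity.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy stand-in for one topic partition: an append-only list of events."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next index to read

    def append(self, event: dict) -> None:
        self.events.append(event)

    def poll(self, consumer: str, max_events: int) -> list:
        """Return up to max_events unread events for this consumer and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for i in range(10):
    # Hypothetical shopfloor readings; the values are made up for the example.
    log.append({"machine": "press-1", "seq": i, "temperature": 60 + 5 * i})

# The alerting consumer reads everything available and keeps only relevant events.
alerts = [e for e in log.poll("alerting", max_events=100) if e["temperature"] > 90]

# The ERP-style consumer asks only occasionally and takes a small batch.
erp_batch = log.poll("erp", max_events=3)
```

Each consumer keeps its own read position, so the slow consumer never holds back the fast one. In a real deployment that bookkeeping would be handled by the platform rather than by application code like this.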

Separate from that, there is a data analytics platform, a data lake in the cloud. I still push all the data in there because it’s used for other analytics functions, or for AI and machine learning, to train models. Those are still batch workloads in those systems; they are not built to work in real time because they serve different use cases. That’s why there is an analytics platform.

Usually those systems get the data from Kafka in near real time or in batches, because here it makes no sense to send in every single piece of information individually. And so each business unit, with its own technologies and products, can build its own interfaces and consume the data when and how it wants. That’s an example from BMW and the main reason Kafka is so successful there: you can tap the data flexibly and nothing is prescribed, neither the communication paradigm nor anything else. Everyone has freedom of choice and can use their own technology.

Results, Business Models and Best Practices – How Success is Measured [33:55]

What about your business case? What was BMW’s business case and what is the outcome for them working with you guys here?

Kai

To be fair, finding a business case for such infrastructure technologies is often difficult. The key point at BMW, or with IoT projects in general, is to first really connect the data and then make it available to various business units so that they can easily access it. In this case, time to market is a big win, because I don’t have to integrate the data over and over again point-to-point, which is more costly both in data transfer and in implementation. Instead, new projects can simply pick up all of that data and use it as the business wants: build real-time applications, or send the data to their own warehouse to run reports.

The other point is that it’s based on a de facto standard and I can use it anywhere. Today I may start on Microsoft Azure, but tomorrow there may be a factory in mainland China, where there is only Tencent and Alibaba Cloud; yet I can roll out exactly the same APIs there. These are the added values that you see very often in IoT scenarios, and ultimately the business case behind it. In terms of euros, it’s about topics like Overall Equipment Effectiveness (OEE). These are the standard metrics at the shopfloor level. The point there is that every hour of downtime costs an incredible amount of money. If I can cover topics such as monitoring, predictive maintenance or condition monitoring with real-time data, even for large data volumes, the business case behind this is often also risk minimization.
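
For readers who have not worked with OEE, a small sketch may help. It uses the commonly cited definition (availability × performance × quality), which is not spelled out in the episode, and the shift figures below are invented purely for illustration.

```python
def oee(planned_minutes: float, downtime_minutes: float,
        ideal_cycle_seconds: float, total_units: int, good_units: int) -> float:
    """Overall Equipment Effectiveness as availability * performance * quality."""
    run_minutes = planned_minutes - downtime_minutes
    availability = run_minutes / planned_minutes            # share of planned time actually running
    performance = (ideal_cycle_seconds * total_units) / (run_minutes * 60)  # actual vs. ideal speed
    quality = good_units / total_units                      # share of units without defects
    return availability * performance * quality


# Hypothetical 8-hour shift with 80 minutes of downtime (assumed numbers).
baseline = oee(planned_minutes=480, downtime_minutes=80,
               ideal_cycle_seconds=30, total_units=720, good_units=684)

# Same output and quality, but with half the downtime the availability term rises.
less_downtime = oee(planned_minutes=480, downtime_minutes=40,
                    ideal_cycle_seconds=30, total_units=720, good_units=684)
```

The three factors are multiplied, so a loss in any one of them reduces the overall figure, and downtime also changes the run time that the performance factor is measured against.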

There are many examples of how to calculate and define where the business value lies for my project and for the infrastructure.

So the business case is scalability, for one thing. There’s really nothing that can bring Kafka to its knees, no matter how much data I’m working with. Likewise the ease of integration, regardless of which connectors I need. Not to mention the whole issue of infrastructure and the traceability of individual data packets.

I can correlate the data with you. On the data hub, I can process any data together, whatever it is.

Kai

Very well summarized. That’s what we tell our customers too: start with a use case that really brings business value, because we’re still talking about data. We have seen it over many years with data lakes: just pushing all the data in and processing it later is not a win and brings no business value. That’s why we start with a pilot project with our customers, but it has to be a real project, otherwise it doesn’t make sense. We can gladly talk about it again in a follow-up, in the appropriate domains, if you are interested.

One more point from a business value perspective, which people often underestimate, and which also shows why we work with so many customers and talk about it publicly with them: even from a customer perspective, it’s very hard to find good people these days. If these projects are no longer built on proprietary legacy technology but on open de facto standards like Kafka, which people are already learning at university, then you will find people. Doing the next project in the cloud with Kafka is what people want to do, and that’s not to be underestimated from a hiring perspective when looking for new experts. It is a promising, scalable and modern technology.

Do you have any experiences from customers that you can share from the projects you have done so far?

Kai

I would like to touch on an example from another category. Porsche nicely shows this kind of journey. What’s exciting about Porsche, which has many of the same data issues as BMW, is that they are a premium provider and one of their mottos is: it’s not a customer, it’s a fan. Customers should really love the brand – turn customers into fans. They do that with a platform they’ve named “Streamzilla.” This is their big central data streaming platform, on which many applications can be built. Many of these use cases are not only IoT-heavy ones, such as over-the-air updates, which the platform helps to implement, but also customer-facing apps for after sales. In many use cases real time is not mandatory, but it is at least better for customers. For example, when I buy a new feature for the car, I can pay for it online and then get the feature directly in real time.

If anyone is interested, Porsche has already spoken at the Kafka Summit about how they are using Kafka to implement this. Porsche has now made Streamzilla the internal de facto standard for these event-driven applications. So they start with one, two, three or even four independent projects, but then they move more and more to the cloud as well. Then they define standards, for example based on Apache Kafka. That’s why we work across industries, not just in the automotive sector, which we’ve talked about a lot today.

When it comes to topics such as customer relationship management or after-sales, you can learn a lot from retailers, because they are often a few years ahead of us. There are also hundreds of examples, including the VOLLMER Group from the USA. Customers are welcome to reach out to us to discuss more success stories.

I’ll link the Kafka Summit in the show notes so that listeners can go through the whole thing again.

Thank you so much for being here today and for sharing your knowledge with us, especially so that people from other areas know that this exists, what the advantages are, and how it addresses the issue of scalability for the future.

Kai

Take care! Bye, talk to you soon!
