Computer vision and real-time object detection with underwater acoustic modems - What we can learn from dolphins for IoT.

Locate e-scooters at the bottom of the Rhine? Thanks to autonomous underwater robots working in real time, this is no longer a problem. Missing persons, oil and gas, old munitions or lost fishing nets can also be found more easily and quickly with the help of intelligent robots, acoustics and smart evaluation methods – even fish populations can be counted easily. Underwater object detection and artificial intelligence in the form of computer vision: these are the topics of the 53rd episode of the IIoT Use Case Podcast.

Podcast episode summary

Stephan Schiffner (CTO, Steadforce) and his company develop digital solutions and platforms of this kind. They are scalable, secure and sustainable – like the one built for the underwater experts at EvoLogics GmbH. A major challenge discussed in this episode: under water, data can only be transmitted at a few kilobits per second. Philipp Bannasch (Team Leader Sensor Integration, EvoLogics) talks about how the challenges of data transmission are overcome, how the Steadforce solution makes image evaluation easier for users, and how workflows are automated. This podcast episode gets to the heart of how it all works in detail. Prefer to read?

No problem: Here you can read the use case.

By the way: The use case is transferable and also applicable above water!

Who can benefit from it? Companies across industries – automotive, chemicals & pharmaceuticals, metalworking, mining, etc.

Podcast interview

Today I’m talking with Steadforce, the developer of scalable, secure and sustainable digital platforms for connecting data, services and devices in real time with associated analytics. Joining us today: their customer EvoLogics, the expert in underwater acoustic modems. In this podcast, you can find out what all this has to do with dolphin-style data transmission and what innovative projects Steadforce is involved in here.

Stephan, I’ll hand over to you. Could you briefly introduce yourself and tell us what exactly Steadforce’s core business is?

Stephan

My name is Stephan Schiffner. I am currently the CTO at Steadforce. I’ve been in the role for just under a year, and in the four or five years before that, I built up the whole analytics and AI area at our company, which is one of our three core areas. We are an IT company from Munich. We have been on the market for 35 years, which means we have a very long history and have done many things. At the moment, we are focusing primarily on collecting data for our customers, preparing it, making it usable and applying analytics methods to generate added value, and then using this information to automate, improve and optimize processes. As for my background, I studied computer science, did a master’s degree, and have been with the company for 15 years, so I have very close ties to it. In between, I also worked in business process management and spent a few years in research. On the side, I have been teaching software engineering and software development at a university for ten years now.

Supporting education is, of course, a very good thing. A quick question: you said you work with different customers. You have a very broad focus, whether it’s the networking of machines – the topic we’re discussing today, including the robots – or very different areas such as vehicle connectivity. Who are your customers?

Stephan

In principle, we focus on three sectors. One is everything related to mobility, where BMW, for example, has been a customer for a long time. Then there is the whole area of industry and production. We have also been active in the healthcare sector for many years, for example with medical associations of various kinds in Germany.

Philipp, perhaps you could also briefly introduce yourself and what exactly your core business is?

Philipp

I am EvoLogics’ team leader for sensor integration and sensor development. We come from bionics; you could say we are a spin-off of TU Berlin, and we have strong roots there. We were founded after a research project on dolphin communication and started with underwater communication and positioning devices. These are based on the knowledge we gained in the research project by analyzing the animals’ signals and asking what great things they do, and whether they are better than the previous standards. From there, we’ve grown and evolved with more and more complex systems, starting with sensor buoys and systems such as tsunami warning systems and many measuring systems that are sunk in rivers such as the Elbe. And then it went more and more into robotics. We have developed survey vehicles that have been very successful on the market for several years, and we continue to develop underwater vehicles, including autonomous ones, for surveying and research. It goes on and on.

Dolphin language – what is the background to this? Bionics is, after all, a specific field. What can you learn from dolphin language that you can also process later in the direction of data analytics, where does that come from?

Philipp

What you can learn from dolphins, of course, is that they handle the incredibly challenging environment of underwater communication very well. You have to understand that communicating acoustically under water is very difficult because there is a lot of noise, a lot of echoes and many other problems. There are thermoclines, reflections, diffraction, all sorts of things – and the dolphins are very clever in that they use very broadband signals, for example. They sing their songs; they don’t just beep. Many of our technical devices up to that point were more or less doing Morse code, you might say, or simply using modulation techniques that were very successful above water with electromagnetic waves, such as OFDM, but which are very susceptible to interference under water. In contrast, the very complex signals of dolphins are much more robust and cope better with difficult conditions.

Let’s go back to the robots for a moment. You had just touched on it a little bit, what you do. Can you elaborate on exactly what these autonomous robots, if that’s what they’re called, look like? Where do these move? Do they drive on the water? How does that work exactly?

Philipp

We now have a small zoo of robots. Zoo sums it up quite well. We started with the Sonobot, a survey vehicle that travels on the surface of the water. It was originally developed to produce depth charts of inland waters. In the meantime, both its areas of application and its shape have evolved significantly, and we are now at Generation 5. Then we also have a collection of underwater vehicles, which is where our bionic roots and the word zoo come in again. We have a manta ray robot that looks like a manta ray. We have penguins, two different varieties at the moment: very fast small ones and somewhat larger ones, which operate as a swarm and move very well in the water. We also have a small whale, the Poggy, which is an underwater measurement vehicle and complements the Sonobot very well below the surface. These are autonomous vehicles. In the near future, a so-called remotely operated vehicle (ROV), i.e. a tethered underwater vehicle, will also be added. But we started with the autonomous ones first, because it’s more exciting.

Who are these robots actually interesting for? Who are your customers?

Philipp

It’s quite a broad range. We started by developing for surveyors we knew, and it went on from there. The first customers came from nature conservation and underwater ecology. Many then came from research and fisheries research institutes and universities. Meanwhile, the police are also involved, the Dutch police, for example. Munitions clearance teams are among them, and it keeps broadening – oil and gas is also growing.

The use case in practice

You said earlier that you can learn a lot from the dolphins. Be it noise or echoes that are being processed. What can you see now with this robot under water?

Philipp

They have optical sensors on board, of course, such as cameras, so in that respect they are similar to inspection vehicles above water. But underwater, acoustics play a very big role, and that includes acoustic sensors. Here we’re talking about different sonars that create images, for example for underwater maps. As an imaging technique, side-scan sonar is the essential one. In side-scan sonar images you can see many things that can also be seen with the eye, and they look pretty similar; some objects are very clearly recognizable. But of course there are also big differences to how a normal image is created. Things get very distorted, a lot of information is missing, and a lot depends on using the sensor in just the right way, at the right angle and the right distance. Then everything fits, and you can get very nice pictures.

We’re talking with Steadforce today together about your project. In the broadest sense, this is about the IIoT or digitization. What is your vision here towards the autonomous robots or vehicles that you have developed? What potential can be leveraged here? What is perhaps your overall vision as well?

Philipp

Basically, the vehicles need to become much smarter. You have to imagine that underwater vehicles are a bit like vehicles on Mars in terms of their environment. Just consider the speed of sound and how long communication takes: electromagnetic waves barely propagate under water, so radio does not work there. That means everything runs on sound, including positioning. When a vehicle is some distance away, communication can quickly take several seconds. The bandwidth is not particularly high, you can’t transmit much, and there is no GPS down there. So you need vehicles that can recognize things as autonomously as possible and also act smartly. That’s where we entered into a very exciting pilot project with Steadforce, in which we have been working with the police in the Netherlands to find missing persons, specifically people who have drowned. That sounds scary at first, but it’s very important to find them, and it is often surprisingly difficult. You often sit at the screen for hours, taking sonar measurements to find the person, and you have to be very focused, often under pressure. That’s when we said we have to do better here. First, we need to help operators build up experience permanently, rather than relying on individual specialists who can analyze the images. After all, a sonar image like this is often like an MRI scan: if you have a lot of experience and training, you can read it, but not always right away. And given those hours of concentration, we need a system that looks over your shoulder and says: here, this is something interesting, take a closer look. Is this perhaps what you are looking for? That was our entry project with Steadforce into this world.
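To put the communication delay into numbers, here is a rough sketch. It assumes a nominal sound speed in water of about 1,500 m/s (the real value varies with temperature, salinity and depth); the function name and the example distances are illustrative and do not come from the interview.

```python
# Rough estimate of acoustic signal travel time under water.
# Assumption: nominal sound speed of ~1500 m/s; the real value varies
# with temperature, salinity and depth.
SOUND_SPEED_WATER_M_S = 1500.0


def acoustic_delay_s(distance_m: float, round_trip: bool = False) -> float:
    """Return the propagation delay in seconds over the given distance."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    one_way = distance_m / SOUND_SPEED_WATER_M_S
    return 2 * one_way if round_trip else one_way


if __name__ == "__main__":
    for distance in (500, 1500, 3000):  # illustrative distances in metres
        delay = acoustic_delay_s(distance, round_trip=True)
        print(f"{distance} m: {delay:.2f} s round trip")
```

Under this assumption, a command sent to a vehicle 3 km away and its acknowledgement need about four seconds of pure travel time, before any processing or retransmission.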

There is an operator; it’s hours spent in front of the screen evaluating these things. If I imagine the whole thing in practice – unfortunately we don’t have a picture in front of our eyes now, but maybe we can create it a bit virtually. In other words, one of your employees is sitting at the laptop on site, so to speak, and receives this data live? Or how do you have to imagine it?

Philipp

No, the police do that themselves. In the past, they always went out with a large boat. In the meantime, we are able to do this with our Sonobot, and in the near future with the underwater vehicles. What it boils down to now is that they essentially enter a heading and a search pattern; the vehicle then covers a search area and informs the operator when it has found something of interest. Of course, the operator is still looking at the screen the whole time to see what’s there, because he has to check it, and he is also very tense. But in the end, all the important things that we recognize, or rather that the neural network recognizes, are highlighted and by now also grouped. The operator can then approach these objects again very specifically, analyze them and set a course over them again at different distances to check whether this really is the object of interest. He can then directly mark the position with the vehicle so that a diver can jump in and recover or examine the object.
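The interview does not say how detections are grouped. Purely as an illustration, the following sketch groups hits that lie close together so that an operator would see one marker per object. The data layout, the distance threshold and the function name are all hypothetical.

```python
# Hypothetical sketch: group detections that lie close together so the
# operator sees one marker per object instead of many overlapping hits.
# This is NOT the method used in the project, which is not described.
from math import hypot

Detection = tuple[float, float, float]  # (x_m, y_m, confidence)


def group_detections(
    detections: list[Detection], max_dist_m: float = 2.0
) -> list[list[Detection]]:
    """Greedy grouping: a detection joins the first group that already
    contains a member within max_dist_m, otherwise it starts a new group."""
    groups: list[list[Detection]] = []
    # Process the most confident detections first so they seed the groups.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        for group in groups:
            if any(hypot(det[0] - g[0], det[1] - g[1]) <= max_dist_m for g in group):
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


if __name__ == "__main__":
    hits = [(0.0, 0.0, 0.9), (1.0, 0.0, 0.8), (10.0, 10.0, 0.7)]
    for i, group in enumerate(group_detections(hits), start=1):
        print(f"group {i}: {group}")
```

A greedy pass like this is order-dependent and only meant to show the idea; a production system would more likely use an established clustering method.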

So this is about training so-called neural networks in order to automate object recognition a bit more. A quick question before I get to that: you already mentioned a couple of challenges, such as the lack of GPS and the limited bandwidth. Can you tell me and the listeners what the challenge in data transmission is?

Philipp

Underwater communication, when you have vehicles that go under water and you want to communicate with them, is of course acoustic, and acoustically the bandwidth is still limited to dimensions that we know from the modem era, from the time when the modem beeped when dialing in. That means we’re talking about a few kilobits per second, not megabits. So we are clearly limited in terms of data transmission. We can’t do cloud computing and say we transfer all the data to the surface or to the operator and compute it somewhere in the cloud; we have to compute it on the vehicle itself. In addition, you have to consider that a side-scan sonar produces gigabytes of data in a few minutes. Depending on the resolution and the settings, that can be a gigabyte per minute. So there’s quite a bit of data coming in, and it has to be analyzed on site. It’s hard to transfer it all in full resolution, even via Wi-Fi: a surface vehicle connected via Wi-Fi is often one or two kilometers away, where you no longer get the optimum bandwidth. And even then, you don’t get the full data in full resolution, only samples that let the operator see roughly what is happening. If he really wants to take a close look at something, he needs preselected data. That is where the neural network comes in, which says: I’ll pick out the interesting images and the original data and transfer them to you in the highest resolution; if you want something else, I’ll look for that, too. So this is also quite interesting in terms of data compression.
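A back-of-the-envelope calculation shows why the raw data has to be processed on the vehicle. The sketch assumes 1 GB = 10^9 bytes and an acoustic link rate of 10 kbit/s, an illustrative value in the "few kilobits per second" range mentioned in the episode summary; neither number is a specification of any real modem.

```python
# Back-of-the-envelope: how long would raw sonar data take over an acoustic link?
# Assumptions (illustrative only): 1 GB = 1e9 bytes, link rate of 10 kbit/s.

def transfer_time_s(size_bytes: float, rate_bit_s: float) -> float:
    """Seconds needed to send size_bytes over a link of rate_bit_s."""
    if rate_bit_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / rate_bit_s


ONE_MINUTE_OF_SONAR_BYTES = 1e9  # "a gigabyte per minute", per the interview
ACOUSTIC_RATE_BIT_S = 10_000     # assumed value, not a measured one

if __name__ == "__main__":
    seconds = transfer_time_s(ONE_MINUTE_OF_SONAR_BYTES, ACOUSTIC_RATE_BIT_S)
    print(f"{seconds:.0f} s, i.e. about {seconds / 86400:.1f} days")
```

With these assumed figures, one minute of raw sonar data would occupy the acoustic link for more than a week, which is why only preselected detections and image excerpts are worth sending.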

Solutions, offers and services

Stephan, you are the expert in connecting the data from the individual devices or from the robots to the cloud or to a system and analyzing it in the next step. Can you tell us what the solution looks like overall? I think you also have the holistic cloud platform that you bring to the table? How does this work in practice?

Stephan

In principle, we have various components in this whole system. As Philipp has already hinted, we are dealing with very large amounts of data. This means that the model training itself cannot take place on the Sonobot. Instead, we have built an infrastructure that runs in the cloud and can scale, so that several different models can be trained simultaneously, because several end customers also want to access this portal. The other area of work is getting the trained model to run on the Sonobot itself, processing the images there to identify where a searched-for object might be, and then transferring that information back to shore.

I want to know more about the practice. What hardware do you actually use underwater? In the industrial context, you know quite classically, I either have the sensor itself or maybe a controller. Is it the same here or what hardware do you use?

Stephan

We have a Jetson board from NVIDIA as hardware in the Sonobot. This brings the advantage that it has a GPU on board, which of course gives us better performance in computing. In turn, various services run on it. One of them is the service we developed, for this Object Detection Model. But the whole thing is containerized, so we are also future-proof. This means that if there is ever a change of hardware, or to other models, then we can transfer this here in a relatively platform-neutral way.

Philipp

Exactly. Of course, the whole thing has to be modular so that you can grow with the technical developments that are happening. A lot is happening, especially in the area of neural networks, deep learning and image analysis. The algorithms are developing, but so is the hardware. It’s worthwhile to keep your finger on the pulse and to keep updating the hardware. In any case, the vehicles are very modular internally, so that we always have the individual components under control and everything runs well, also because we are considering making our components available to customers individually. For now, however, we are using NVIDIA.

This NVIDIA board then has the camera attached to it, or a sensor, and that’s the hardware where this data that you were talking about earlier – acoustic signals or whatever – and all the input comes together?

Philipp

Yes, that’s where it’s processed. Basically, the board is connected to the vehicle’s internal network. This is an Ethernet network in which various components interact. The side-scan sonar data is processed on our own processor, which is there specifically for this purpose, and is also made available to the NVIDIA board. So several data sources are connected to the network. Depending on the type, cameras are connected directly to the NVIDIA board; the cameras used purely for driving, however, are connected to other components.

That’s right, I had forgotten that word side-scan sonar. Right, that’s ultimately what records the data, isn’t it?

Philipp

Exactly, that’s one way to put it.

Now we have talked several times about neural networks and training on the data. Stephan, can you tell us how training the individual models on this data works in the first place? Do you already have all the data available? What do I have to do to accomplish this?

Stephan

Data is, of course, an important aspect of neural network training in general. The more data you have, the better. Now, if you are looking for missing persons, you can imagine that there is not a lot of data directly available. As I learned during the project, there are also differences between sonar devices, so I can’t simply transfer images from one device to another; they look different. We had to think about how to obtain suitable data in the first place and then, even with data that is not widely available, still arrive at a model that has the necessary accuracy and delivers the functionality. We took several approaches. The first is to use what is already there, such as past survey missions. Second, we went out to a lake, where I was able to see the Sonobot in action for the first time. We simply created test data ourselves, with Philipp acting as a diver and playing our missing person, and took images from different positions that can be used for training afterwards. Third, once I have these images, I can artificially enlarge the data set using various data augmentation techniques, for example rotating the images or coloring them differently, so that training has different starting material.
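As an illustration of the augmentation idea (rotating images and changing their coloring), here is a minimal sketch on a grayscale image stored as a list of rows. It uses plain Python so it runs without extra libraries; real projects typically use an image library for this. The function names and the brightness factors are assumptions, not details from the project.

```python
# Minimal data augmentation sketch on a grayscale image stored as a
# list of rows (values 0-255). Pure Python so it runs without extra libraries.
# All names and factors are illustrative assumptions.
Image = list[list[int]]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def scale_intensity(img: Image, factor: float) -> Image:
    """Brighten or darken the image, clamping values to 0-255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def augment(img: Image) -> list[Image]:
    """Return the original plus a few transformed variants."""
    variants = [img]
    rotated = img
    for _ in range(3):  # 90, 180 and 270 degrees
        rotated = rotate_90(rotated)
        variants.append(rotated)
    variants.append(flip_horizontal(img))
    variants.append(scale_intensity(img, 0.8))
    variants.append(scale_intensity(img, 1.2))
    return variants


if __name__ == "__main__":
    sample = [[10, 20], [30, 40]]
    print(len(augment(sample)), "training images from one original")
```

For object detection, any bounding-box labels would have to be transformed together with the image, which this sketch leaves out.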

Always with the aim of giving the operator, i.e. the customer, data that has been classified and used for object recognition? In other words, establishing this if-then relationship: if something looks like this, then it is probably a human being, for example, or another object. That’s the ultimate goal, isn’t it?

Stephan

In principle, I have two tasks. One is to recognize in the first place, is there an object in the picture? And if so, is this the object I am looking for? So classifying that. Another point – we had already mentioned briefly, this is a small amount of data to begin with, but we know we need a relatively large amount of data to train the model in the first place. The starting point was to first take a model that was generally trained for the object recognition use case – i.e., completely independent of sonar data or persons – and then to fine-tune this model in the second step, with the specific images, and then to train it for the respective use case; regardless of whether it is a missing person or something else that is being searched for.
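The two-step approach described here, starting from a generally trained model and then fine-tuning it on a small set of specific images, can be sketched in a toy form. In the sketch below, a "pretrained" feature extractor stays frozen and only a small classification head is trained. This is a deliberately simplified stand-in: the project used an object detection network whose details are not given, and every weight, data point and function name here is made up.

```python
# Toy illustration of fine-tuning: a "pretrained" feature extractor stays
# frozen and only a small classification head is trained on a few examples.
# All weights and data here are made up for illustration.
import math

# Frozen backbone weights (pretend these came from generic pretraining).
BACKBONE = [[0.5, -0.2], [0.1, 0.8]]


def extract_features(x: list[float]) -> list[float]:
    """Frozen layer: linear map followed by ReLU; never updated below."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE]


def predict(head: list[float], bias: float, x: list[float]) -> float:
    """Probability that x is the target class (sigmoid over head output)."""
    z = sum(h * f for h, f in zip(head, extract_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, bias, data) -> float:
    """Mean binary cross-entropy over (x, label) pairs."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(head, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def fine_tune(data, epochs: int = 200, lr: float = 0.5):
    """Train only the head with plain gradient descent."""
    head, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_h, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = predict(head, bias, x) - y  # d(loss)/d(z) for sigmoid + BCE
            feats = extract_features(x)
            grad_h = [g + err * f for g, f in zip(grad_h, feats)]
            grad_b += err
        head = [h - lr * g / len(data) for h, g in zip(head, grad_h)]
        bias -= lr * grad_b / len(data)
    return head, bias


# Tiny made-up dataset: label 1 = "target object", 0 = "something else".
DATA = [([2.0, 0.1], 1), ([1.8, 0.3], 1), ([0.1, 1.9], 0), ([0.2, 2.2], 0)]

if __name__ == "__main__":
    before = log_loss([0.0, 0.0], 0.0, DATA)
    head, bias = fine_tune(DATA)
    print(f"loss before: {before:.3f}, after: {log_loss(head, bias, DATA):.3f}")
```

Because the backbone is reused, only a handful of parameters need to be learned from the scarce task-specific data, which is the point of the approach described above.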

That sounds “so simple” at first. What kind of knowledge do I actually need for this? Philipp, I don’t think you guys have worked that deeply in the field before, at least in terms of data analytics. Stephan, what practical knowledge do I need?

Stephan

Of course, this requires a lot of different knowledge across the overall project. The first thing is very clear: I need to know about neural networks, about the AI methods that go with them and about which approaches I ultimately take. That is one component. The other one we just talked about: how do I get my data in the first place, and what options do I have to enlarge it artificially? But I think what you also can’t ignore is the other part: even if I have such a model, I ultimately have to bring it into production, i.e. turn it into a solution that can be used. That quickly brings us to questions such as how to set up the training in the cloud, so all the infrastructure issues. And on the other hand, and this was of course interesting for our colleagues, how can I get this to work on the edge device so that everything runs with high performance? What restrictions are there?

Philipp, how was it for you? You have been working on the project for quite some time. But the issue was new territory for you, too, wasn’t it?

Philipp

For us, this was new ground in many places. In the area of neural networks in particular, we didn’t have much expertise until then. We knew roughly what was possible. We had seen it with a lot of partners, and we work a lot with the Fraunhofer Institute, for example. We had said that we had to develop in that direction and find an entry point. So we were very happy to have found one with a very concrete goal and a very concrete project together with Steadforce, and we learned an incredible amount in a short time. Steadforce has been extremely supportive. We compared a wide variety of possible algorithms in order to select the right ones relatively quickly and in a targeted manner and to say: this is our way. Of course, we had an incredible number of other tasks going on at the same time. We had to integrate the hardware and combine the corresponding software with our vehicle software; it all had to work together. We also had to train our own personal neural networks, so to speak: if we want the system to recognize data correctly, we ourselves, as humans, first have to be able to mark the correct objects. That was challenging for us too, because we don’t have decades of expertise in analyzing side-scan sonar images for these interesting things. We are very grateful for the support we received, for example from the police. We had a lot of problems to solve, but this was an outstanding project thanks to the specific objectives and the very targeted work of Steadforce.

Because we’re already talking about it, a lot of problems to solve, Stephan, you also said a lot of infrastructure issues both on the edge and on the cloud side – I always really like the practice. If I want to start tomorrow, what would I need for components of such a solution? I don’t know if I can put it this simple, but are there steps where you say, First, Second, Third, what do I need plus the next steps, work packages like that?

Stephan

I guess the absolute minimum to start at all is the data, and in sufficient form. Without that, we can’t go any further. And then, of course, you have to approach it methodically, which model can I train, how can I build a training pipeline so that the whole thing runs reproducibly? Once we have that, then come the next steps – where can I run this and how can I transition this into a production application?

Philipp, now the question for you, in practice, what components did you need?

Philipp

Sure, everything stands and falls with the data. In addition to the correct data, you also need examples that look similar but are not the objects you are looking for. That’s always the other side: you not only have to find objects, you also have to sort out many things that are not objects. Building up this data set was one of the biggest challenges. And at the beginning we really needed a push and some expertise on where to start. In the meantime, we have also expanded our own team; we are still in close contact, but I would say that we are now well versed in the subject.

You also said at the beginning that you have several other customers. Whether that’s surveyors, or conservation institutes, to fisheries. After all, these are often probably similar issues. Of course, these are also topics that are coming your way, aren’t they?

Philipp

There is an incredible amount coming up for us. We’ve basically built this so that we’re not fixated on these particular objects, and certainly not on side-scan sonar alone. On the one hand, we want to find more kinds of objects with the help of the sonar. On the other hand, these neural networks are already running on video images, and we can use them to count fish stocks at fish farms, for example, and have found an entry point there. Beyond that, based on the findings from such detection, the next step is for the vehicles to react to it: automatically approach certain objects, verify in the video an object detected in the side-scan sonar, and then respond accordingly. So this ties in with quite a lot of topics that make us smarter in the marketplace and give us the opportunity to say: now we have really smart robots, and not just robots that can follow a course.

Results, business models and best practices

You gave me the perfect segue. My keyword is to become smarter on the market or to maintain this market position. I wanted to ask about the business models and the results, perhaps also a real business case analysis. This always interests many listeners as well. Can you tell us what the new business model behind it is for you, or to a certain extent also the business case? What’s your bottom line?

Philipp

There are several business models opening up for us. One thing is that the vehicle is fundamentally smarter and can perform various tasks on its own – which is, of course, a big advantage over the competition in many places. If the vehicle can independently recognize certain objects, verify, set the course to really analyze that and then say that’s something, then it can already do a lot that numerous other vehicles can’t do. This makes it a direct selling point for the vehicle. There’s a lot at stake. On the one hand, in the area of people search, which is of course already a broad field. Then there are safety aspects, ecological aspects: finding lost fishing nets, for example, is often a very big issue. Locating dangerous items. There is an incredible amount of old munitions and hazardous materials in the North Sea and Baltic Sea, as well as in our inland waters, that need to be found and cleared. At the moment, every technical development is very welcome in order to slowly eliminate these dangers, which are becoming more and more urgent. We see a lot of application potential there.

And you can also see in areas like fish farms and fish counting that this goes far beyond just detecting individual objects with the vehicle. It is fundamentally a technological development that will help us move forward.

I was about to say, yes, the fishing industry is big too. There’s an insane amount of potential. We also have a few topics and projects in our network that go in precisely these directions. There certainly seems to be a need to digitize such industries, to become smarter and smarter, and perhaps to open up new business opportunities.

Stephan, question for you, you are on the road with a wide variety of customers, as you said at the beginning. Is that kind of a business model, a development that you often see in other projects as well? I think about it a bit: this is now a project from practice that we have discussed; but can it be transferred? Do you see such issues of a similar nature elsewhere?

Stephan

I think, yes, this can definitely be transferred to other industries and areas. Since we’re on the Industrial IoT podcast, you can imagine that it is transferable to industry as well. Whether I’m counting fish or counting the output of my production, computer vision methods are conceivable there in general, for example in quality assurance. I believe that, quite fundamentally and irrespective of computer vision, all the AI topics, whether in natural language processing, where we are also active, or simply in the analysis of process data, are becoming an increasingly important component of remaining competitive.

Transferability, scaling and next steps

What I see in a lot of projects is that the approach always feels similar from a technology standpoint. For example, I would also count the fish industry as part of the food and beverage industry. What you described about data acquisition, going out on the boat and collecting test data first, is essentially the same thing in production, right? If I have such processes in other areas as well, it is always similar: I need data, and the data needs to be recorded. Those are always similar scenarios, right?

Stephan

The scenarios are similar. The challenge, of course, depends heavily on what these images are and what I want to do with them. From that point of view, the complexity can vary. But the concept is definitely transferable, yes.

Thank you for this exciting insight into the practice. We went from dolphin language to the whole topic of autonomous robots: how that works in practice, what kind of hardware I need, what the connectivity and the transmission to the cloud look like, and what kind of infrastructure I need. We covered it all. That was a really well-rounded and very exciting conversation. Thank you, Stephan and Philipp, for taking the time today and reporting a bit from the field.

Philipp

Thank you very much for the invitation. That was a pleasure.

Stephan

Thank you, Madeleine, from my side as well. That was a lot of fun, and it’s exciting to be a part of something like that.
