Efficient and safe voice control of mobile robots

Initial situation

NEXT. robotics is a research-oriented sales and development office with a central focus on lightweight robotics, advanced robotics and cobotics. Its services also cover software, including programming and customer-specific adaptation of existing solutions.

One overall goal is to intensify cooperation between humans and robots. Mobile robotics lags behind in this respect, especially with the latest walking robots such as Spot from Boston Dynamics or ANYmal from ANYbotics. Because their control systems are complex, walking robots are currently used mainly in research projects, even though their robust characteristics make them well suited to operating close to humans. A new type of voice control could significantly broaden their applications by improving the safety and acceptance of mobile robots when they interact with humans.


Interacting with partially autonomous mobile robots involves certain hurdles and risks for humans. A voice control system can make operation simple and intuitive, thereby increasing acceptance, safety, and the range of applications. What is needed is a concept for robust, efficient and safe voice control of mobile robots.

Solution approach

Mobile robots require adapted solutions for voice control. To minimize noise interference, a mobile remote control is recommended as the voice interface for the human operator. The acoustic input signals can be processed by powerful AI speech recognition models either in the cloud or directly on the mobile robot. For local processing, a GPU offers significant speed advantages, but most mobile walking robots lack such hardware for running AI models. The "AI box" developed at FZI is designed for exactly this kind of challenge, among others. It is mobile, flexible to integrate and energy-efficient, and it provides an embedded GPU via simple interfaces that could be used not only for speech recognition but also for visual perception tasks.

QuickCheck results

During the QuickCheck, current speech recognition modules were presented and a concept for safe voice control of partially autonomous mobile robots was outlined. A modular AI box allows AI models to run locally on the walking robot. The concept provides voice control that encapsulates robot functionalities so that the functional scope is maximized while the complexity requirements for the system components remain low. At the same time, a limited and controlled set of command options prevents potential misunderstandings, making the voice control as responsive and user-acceptable as possible.
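The idea of encapsulating robot functions behind a limited, controlled command set can be illustrated with a minimal Python sketch. The command names, synonyms, confidence thresholds and the `Recognition` type are hypothetical assumptions, not part of the QuickCheck concept. The sketch only shows how a whitelist rejects anything outside its vocabulary instead of guessing what the operator meant.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RobotCommand(Enum):
    # Hypothetical command set; the actual commands of the concept are not specified.
    STOP = "stop"
    SIT = "sit"
    STAND = "stand"
    FOLLOW = "follow me"
    RETURN = "return to base"


# Safety-critical commands accepted at a lower recognition confidence (assumption).
SAFETY_COMMANDS = {RobotCommand.STOP}

# Every accepted phrase maps to exactly one command (hypothetical synonyms).
PHRASES = {
    "stop": RobotCommand.STOP,
    "halt": RobotCommand.STOP,
    "sit": RobotCommand.SIT,
    "sit down": RobotCommand.SIT,
    "stand": RobotCommand.STAND,
    "stand up": RobotCommand.STAND,
    "follow me": RobotCommand.FOLLOW,
    "return to base": RobotCommand.RETURN,
}


@dataclass
class Recognition:
    text: str          # transcript delivered by the speech recognizer
    confidence: float  # recognizer score in [0, 1]


def parse_command(rec: Recognition, min_confidence: float = 0.8,
                  min_stop_confidence: float = 0.5) -> Optional[RobotCommand]:
    """Map a transcript to a whitelisted command, or return None to reject it."""
    # Lowercase, trim surrounding spaces and punctuation, collapse inner whitespace.
    normalized = " ".join(rec.text.lower().strip(" .!?").split())
    command = PHRASES.get(normalized)
    if command is None:
        return None  # phrases outside the vocabulary are ignored, never guessed
    threshold = min_stop_confidence if command in SAFETY_COMMANDS else min_confidence
    return command if rec.confidence >= threshold else None


if __name__ == "__main__":
    print(parse_command(Recognition("Stop!", 0.6)))            # accepted: safety command
    print(parse_command(Recognition("follow me", 0.6)))        # rejected: confidence too low
    print(parse_command(Recognition("jump the fence", 0.99)))  # rejected: not in vocabulary
```

The lower threshold for "stop" reflects one possible design choice: a falsely recognized stop command is harmless, whereas a missed one could be dangerous, so safety commands are accepted more readily than motion commands.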