Taking a look at the 'engine under the bonnet' of Voice


Voice technology is an increasingly popular solution for improving business results in logistics operations across many industries and geographic regions, writes Richard Adams, Northern region manager, Vocollect EMEA.
Voice is now used daily at thousands of work sites by hundreds of thousands of workers, enabling them to communicate verbally with the computer systems that manage logistics operations. Wearing a headset and microphone, each worker receives 'spoken' instructions delivered from a wireless, wearable computer and verbally confirms the completion of tasks back to the system, via a WiFi network. The reasons for the growing worldwide uptake of Voice are easy to see. The Voice-enabled warehouse helps customers to address a range of business challenges more effectively, including growth, cost reduction, productivity, throughput, accuracy, traceability and product mix changes. Multiple case studies validate the benefits of Voice across a wide range of industries, with productivity improvements of 20 to 40 per cent, order accuracy improvements to more than 99.995 per cent, traceability of controlled items and order fulfilment agility. It is also well-documented that mobile workers are more productive and focused, and thereby more accurate, when using wearable, hands-free, eyes-free solutions.
The proven benefits of using Voice are directly related to the technology employed within the solution, and speech recognition technology is the critical component of any Voice solution for the warehouse. Speech recognition (by humans and computers) would be a relatively easy problem to solve if humans spoke identically and consistently. But we do not. Speech utterances are like snowflakes: no two are exactly the same. Speech recognisers are challenged by factors including subtle differences in how we pronounce words in various situations and by background noise contamination, an issue of particular relevance in fast-moving warehouse environments.
Positive results
All speech recognisers make errors, inserting words not spoken and ignoring or misinterpreting words spoken. The noisy environment of a busy warehouse obviously presents a particularly challenging scenario, and the ideal of perfect speech recognition in a warehouse is an impossible one to meet. However, we are continuously moving closer to the ideal with speech recognition technology that today works extremely effectively in a wide range of noise environments, for a wide range of facility employees, responding 'instantly' to operator speech and minimising 'total cost of use'. We are achieving positive results by focusing on characteristics of work in the warehouse that our technology can use to its advantage, including the use of small, fixed vocabularies made up of short phrases and words that are repeated across a large number of transactions.
The technology Vocollect continues to develop for use in the warehouse is 'speaker-dependent': our warehouse speech recogniser is 'trained' by each individual user in a 20-minute, one-time set-up process. This contrasts with the alternative option, a 'speaker-independent' or 'untrained' speech recogniser. Choosing which option to go with is a hugely important decision for anyone putting Voice into the warehouse. So why does trained speech recognition continue to be the most popular approach? At first glance, untrained speech recognition may seem to have an advantage simply because it doesn't require the investment of user time to perform the initial set-up training. But - and this is key - a trained recogniser will generate far better returns in the long run. The particular characteristics of working in the warehouse do not just allow a fully trained recogniser to be used, they make it the obvious choice for anyone who designs a recogniser specifically for the warehouse. First and foremost, trained recognisers generate fewer word errors because they are better able to differentiate and recognise how each individual speaks each word; they do not need to allow for all the pronunciation variations of a region or language. This specialisation also better enables them to reject sounds that should not be recognised, preventing costly insertion errors.
Vocollect has recently run tests to provide a performance comparison of our own trained speech recogniser against several of the untrained recognisers available from others. In the tests, we looked at word-error rates: the number of times the recogniser makes an error per 100 words spoken. The results of our tests suggest that the increase in word-error rate when moving from a trained recogniser to an untrained one is likely to be several per cent or more. In fact, for speakers with moderate to strong accents, the increase ranged from 6 per cent to more than 20 per cent. So what does this mean in terms of cost to a warehouse operation? If it takes 3.5 seconds for a user to correct a word error by repeating what they said, the annual increased labour cost for a single 8-hour shift can be calculated to be 300 for every 1 per cent difference in the error rate. It is easy to see that for differences in word-error rates of more than 1 per cent, and for operations with multiple shifts, these per-year costs will have a major impact on a return-on-investment analysis. Furthermore, even if using a trained recogniser decreases the word-error rate by only 1 per cent (a very conservative estimate), the 'payback' period for the 20-minute upfront investment in pre-use training is less than 6 work days. The 'soft' benefits of giving workers a high-performance system should not be overlooked either.
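For readers who want to run this kind of calculation against their own operation, the arithmetic can be sketched in a few lines of Python. This is an illustrative sketch only, not the model used in the tests described above. The 3.5-second correction time and the 20-minute training time come from the text; the function names, the number of words spoken per shift, the hourly labour cost and the number of work days per year are all assumptions introduced here and should be replaced with your own figures.

```python
# Illustrative sketch: the inputs marked "assumed" are hypothetical
# and are not figures taken from the article.

SECONDS_PER_CORRECTION = 3.5   # from the article: time to repeat a misrecognised word
TRAINING_MINUTES = 20.0        # from the article: one-time set-up per user


def extra_correction_seconds_per_shift(words_per_shift, error_rate_difference,
                                       seconds_per_correction=SECONDS_PER_CORRECTION):
    """Extra seconds per shift spent correcting errors.

    error_rate_difference is a fraction, so 0.01 means 1 error per 100 words.
    """
    extra_errors = words_per_shift * error_rate_difference
    return extra_errors * seconds_per_correction


def annual_extra_labour_cost(words_per_shift, error_rate_difference,
                             hourly_labour_cost, work_days_per_year,
                             seconds_per_correction=SECONDS_PER_CORRECTION):
    """Yearly cost of the extra correction time for one worker on one shift."""
    seconds_per_shift = extra_correction_seconds_per_shift(
        words_per_shift, error_rate_difference, seconds_per_correction)
    hours_per_year = seconds_per_shift * work_days_per_year / 3600.0
    return hours_per_year * hourly_labour_cost


def training_payback_days(words_per_shift, error_rate_difference,
                          training_minutes=TRAINING_MINUTES,
                          seconds_per_correction=SECONDS_PER_CORRECTION):
    """Work days needed for the time saved to repay the set-up training time."""
    saved_per_shift = extra_correction_seconds_per_shift(
        words_per_shift, error_rate_difference, seconds_per_correction)
    return training_minutes * 60.0 / saved_per_shift


if __name__ == "__main__":
    words = 6000    # assumed: words spoken per 8-hour shift
    rate = 20.0     # assumed: labour cost per hour, in any currency
    days = 250      # assumed: work days per year
    print(annual_extra_labour_cost(words, 0.01, rate, days))
    print(training_payback_days(words, 0.01))
```

With the assumed 6,000 words per shift, a 1 per cent difference in word-error rate costs 210 seconds per shift, so the 20 minutes of training is repaid in roughly 5.7 work days. Lower speech volumes lengthen the payback period and higher volumes shorten it.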
Not standing still
A poorly performing recogniser is like a 'sticky' keyboard: frustrating. And trained speech recognition is not standing still. Indeed, at Vocollect we are continuously developing our technology, with recent enhancements including 'Adaptive Recognition', where the recogniser is continuously 'tuned' by the warehouse worker as they use it. We are extremely confident that trained Voice recognition is a key technological component of Voice in the warehouse, and the universal acceptance of Vocollect solutions by more than 300,000 users in dozens of countries, speaking scores of languages and many more dialects and local accents, underpins our confidence.
Let's be clear, Voice systems are not all the same. If you are looking at putting Voice into your warehouse, and you are examining case studies and visiting reference sites, then be sure to 'look under the bonnet' of the systems to understand what speech recognition technology is being used.

