Analyzing voice user interfaces with a diary study of Amazon Echo


A diary study using Amazon Echo as a use case

Voice user interfaces are by no means new: Siri has been on the market since 2011. But fuelled by highly visible advertising for Amazon Echo, the topic is more relevant than ever. Competitors are also very active in this field: Google with its Assistant, Microsoft with Cortana and Facebook with M in its Messenger are all vying for users' favour. As a result, graphical interfaces will increasingly be replaced by voice interfaces or used only in combination with them, because spoken language is the simplest and most practised form of communication and has been for thousands of years. Until now, voice interfaces have been prone to errors. Combined with artificial intelligence and machine learning, however, they can be expected to have a bright future, even if they are certainly not suitable for every interaction between humans and computers. When complex selection processes that place high demands on working memory are carried out via the interface, or when several pieces of information have to be processed in parallel, written language and visually prepared content are often more suitable.

Why this study?

The starting point of the study was to find out how users accept and use voice interfaces during an initial trial phase. Using Amazon Echo as an example, we wanted to find out whether voice interaction can win users over, what drives a good user experience and how this technology is adopted. We were particularly interested in one question: how can the design of skills contribute to a better UX?

A diary study accompanied users during their first Alexa trial

To answer these questions, Facit Digital, in cooperation with the voice interface agency VUI.agency, conducted a two-week diary study among 26 first-time users of Amazon Echo. The participants were given an Echo Dot, which they activated with their own Amazon account. Every two days, they described what experiences and feelings they had when using Echo, where problems occurred and what inspired them. Among other things, they were asked to regularly use "Brain Challenge", a skill developed specifically for the test. "Brain Challenge" is a skill for mental arithmetic, puzzles, quizzes and memory training at three levels of difficulty.

This skill was developed in two versions by our study partner VUI.agency. In the unguided version, users had to "navigate" through the skill without being actively offered assistance. The guided version regularly offered help explaining how to navigate through the skill.

Expectations of Alexa are often met

The results show that after the two-week trial period, more than half of the participants indicated that their expectations of Alexa had been met. They enjoyed interacting with the assistant and appreciated the friendly voice, the ease of interaction and the wide range of available skills. The average usage time in this group was about 30 minutes.

For 42% of users, expectations were only partially met or not met at all. They were often frustrated by Alexa's unnatural communication and inflexible behaviour. They repeatedly found that the voice assistant did not fully understand them. They attributed this partly to being too far from the device and partly to background noise. When Echo was playing music, for example, participants had to "scream" to get Alexa to listen. Alexa also often failed to understand foreign-language terms and names.

Entering commands correctly also caused difficulties for some participants. First, some participants did not remember the correct "Invocation Name", i.e. the name used to call the skill, so the skill could not be started at all. Second, participants did not give commands in the correct order, which also meant that skills were not executed correctly. In this context, participants also criticized Alexa for not keeping track of context. For example: "When was Mozart born?" Alexa correctly answers 1756. When then asked "And where was he born?", Alexa is overwhelmed, because she can no longer relate the question to Mozart. More "intelligence" would be desirable here.

The more critically minded users also repeatedly stated that they felt uneasy at the thought of being permanently "bugged".

Little convincing content available

Of course, the available content is also decisive for the user experience. During the two-week trial phase, the most frequently

Conclusion: What is important for a good skill?

A well-designed skill takes users by the hand and guides them through its functions. Just as with graphical interfaces, supportive dialogues can make operation easier and increase satisfaction. In graphical user interfaces, this can be done with dialog boxes such as "Do you want to save the file before closing it?" or "Open one of the recently used files". Without this assistance, users get lost or make mistakes. The same increasingly applies to voice interfaces. In addition to professional skill programming, it is therefore also important to understand users' needs and abilities when operating voice interfaces and to incorporate them into skill design. Facit Digital and VUI.agency have the tools and resources to support these processes and thus contribute to an improved user experience.


Expert recommendation from Patrick Esslinger (VUI.agency): In the study, the guided design clearly scored better than the stripped-down, unguided version. But one must not forget that more help always means more time. Unlike written assistance, spoken introductions and explanations cannot simply be skipped or clicked through quickly. A middle course between guided and unguided design is decreasing assistance. When the skill is called up for the first time, the user receives a detailed explanation or, depending on the complexity, several detailed explanations of the skill's different levels. The second time the user calls up the skill, the explanations are reduced, and from the third time on they are omitted completely (optionally, the explanations can be replayed once the user has not used the skill for more than fourteen days, for example).
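The decreasing-assistance rule described above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not VUI.agency's implementation: the names Assistance, choose_assistance and REPLAY_AFTER are hypothetical, the skill is assumed to persist a per-user session counter and last-use timestamp somewhere (storage is not shown), the fourteen-day threshold is taken from the example in the text, and "replaying" explanations is assumed to mean giving the detailed explanation again.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Assistance(Enum):
    DETAILED = "detailed"  # first call: full explanation(s) of the skill's levels
    REDUCED = "reduced"    # second call: shortened explanations
    NONE = "none"          # third call onward: no explanations


# Hypothetical inactivity threshold, taken from the "fourteen days" example.
REPLAY_AFTER = timedelta(days=14)


def choose_assistance(previous_sessions: int,
                      last_used: Optional[datetime],
                      now: datetime,
                      replay_after: Optional[timedelta] = REPLAY_AFTER) -> Assistance:
    """Decide how much spoken guidance to give at the start of a session.

    previous_sessions: how often this user has opened the skill before (assumed stored).
    last_used: timestamp of the previous session, or None for a new user.
    replay_after: optional inactivity gap after which explanations are replayed;
                  pass None to disable that optional rule.
    """
    # Optional rule: after a long break (strictly more than replay_after),
    # treat the user like a first-time user again.
    if (replay_after is not None and last_used is not None
            and now - last_used > replay_after):
        return Assistance.DETAILED
    if previous_sessions == 0:
        return Assistance.DETAILED
    if previous_sessions == 1:
        return Assistance.REDUCED
    return Assistance.NONE


if __name__ == "__main__":
    now = datetime(2018, 3, 1)
    # (previous sessions, last use) pairs covering each branch
    cases = [(0, None), (1, now - timedelta(days=1)),
             (2, now - timedelta(days=2)), (5, now - timedelta(days=20))]
    for count, last in cases:
        print(count, choose_assistance(count, last, now).value)
```

Keeping the decision in one pure function, separate from speech output and storage, makes it easy to tune the thresholds per skill, for instance giving a more complex skill a longer detailed phase.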

Christian Bopp

Managing Partner

089 7404205578