Kinesthetic Language Learning in Virtual Reality


Fluid Interfaces Group

Lei Xia

Tapping into the physicality of language to enhance the way we learn.

by Christian David Vázquez Machado

The way we speak, organize our thoughts, and even structure our metaphors is directly related to our bodies. For instance, we use "up" as a metaphor for happiness because our bodies are usually upright when we feel good or energetic; we say things like "I’m feeling up," or "My spirits rose." Similarly, we use "down" as a metaphor for sadness, as when we are sad or feeling ill, our bodies are hunched or lying down ("I fell into a depression"). This connection between language and body is so intrinsic that sensorimotor regions in our brain associated with performing motor actions light up when we speak the word associated with that action. This relatedness also holds true for second languages, and it has been widely explored as a means to enhance the learning of new concepts and vocabulary. Studies have shown that performing iconic gestures when we learn a new word (such as waving when you say hello in a new language) increases recall and retention of that new vocabulary word. Performing actions, or "enacting," also has a positive impact when learning a concept in a new language [1,2,3]. This is how many children learn their first languages at home. Unfortunately, despite this leverageable connection between the body and language, second-language education remains primarily audiovisual in nature.

Previously, I’ve written about learning languages using augmented reality (AR) to seamlessly blend learning a language into one’s daily life. It’s fair to say, however, that not everybody learns the same way. Some people prefer a more structured approach to learning, and not everyone is going to be eager to wear an AR headset while traveling in order to learn a new language. That’s why the Fluid Interfaces group is also looking at how language learning happens in the classroom. The setting is familiar: sitting in front of a blackboard and a teacher, memorizing lists of new words, reading sentences out loud, and occasionally fumbling around with our own understanding to catch up with the class. What if we could liven up this experience, leveraging kinesthetic elements and tapping into the physicality of language not only to enhance the way we learn, but also to engage playfully with the material?

Virtual Reality (VR) has often been proposed as a platform that affords embodied learning. VR can transport students to new places from the comfort of the classroom and create powerful learning experiences. Due to its immersive nature and body-tracking capabilities, VR can allow learners to do kinesthetic activities in an environment that is able to track and understand their movements, provide real-time feedback, and engage them in activities within novel contexts that strongly relate to their physical actions. Recent trends have also made affordable virtual reality devices readily available, allowing students to turn their smartphones into headsets. Google Cardboard variants make VR an approachable platform for teachers who want to try it out in their classrooms.

Over the past few months, I’ve been interested in understanding how we can leverage the natural affordances of virtual reality as a platform for language education. How could we tap into the connection between language and body to enhance the way we learn? In order to explore this question, we developed an application for the HTC Vive called Words in Motion, which leverages two powerful elements of VR:

  1. the ability to create meaningful context and
  2. the ability to understand kinesthetic interactions using body tracking.

The concept for Words in Motion is fairly straightforward: objects in the virtual world are embedded with words in the target language and associated with an action that could be performed with the object. For instance, making strokes with a virtual brush would make the word "pintar" (Spanish for "paint") float momentarily in front of the user. Multiple words could be imbued into a single object, allowing for multiple interactions that directly correlate to the associated vocabulary words. For instance, a virtual cup could be trained to recognize drinking or pouring. Moreover, any user is able to "teach" new actions by grabbing an object and performing the action a few times. You can then imagine a virtual playground with myriads of oddities, containing the collective knowledge of words not only provided by teachers, but by students as well, as they leave tidbits of wisdom behind for other learners to discover. The kinesthetic capabilities of the platform can also be combined with additional context to strengthen its benefits. Not only can students learn the word "cortar" (Spanish for "to chop") using a kitchen knife, but they could do so while preparing a plate of food in a virtual kitchen.
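The action-teaching mechanic described above amounts to template-based gesture recognition: record a few demonstrations of a motion, then match new controller traces against the stored templates. Below is a minimal sketch of that idea in Python using dynamic time warping; this is not the actual Words in Motion implementation, all names are hypothetical, and a real system would also resample and spatially normalize the tracked trajectories.

```python
import math

def dtw_distance(template, trace):
    """Dynamic time warping distance between two 3D controller paths.

    Each path is a list of (x, y, z) positions sampled over time; DTW
    tolerates differences in speed between the demonstration and the trace.
    """
    n, m = len(template), len(trace)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], trace[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a trace sample
                                 cost[i][j - 1],      # skip a template sample
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

class KinestheticObject:
    """A virtual object that maps demonstrated motions to vocabulary words."""

    def __init__(self, name):
        self.name = name
        self.templates = []  # list of (word, recorded path) pairs

    def teach(self, word, demonstrations):
        # Store each demonstrated path as a template for the word.
        for path in demonstrations:
            self.templates.append((word, path))

    def recognize(self, trace, threshold=1.0):
        # Return the word whose template best matches the motion,
        # or None if no template is close enough.
        best_word, best_cost = None, threshold
        for word, template in self.templates:
            c = dtw_distance(template, trace) / max(len(template), len(trace))
            if c < best_cost:
                best_word, best_cost = word, c
        return best_word
```

In this sketch, a teacher or student would call `teach("pintar", demos)` on a virtual brush after recording a few stroke demonstrations; at runtime, `recognize` returns the matched word to float in front of the user, with the threshold controlling how forgiving the matching is.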

In the 1960s, James Asher proposed a pedagogical approach called Total Physical Response (TPR). TPR was simple: students carried out actions to fulfill orders spoken by a teacher in the target second language. This technique leveraged kinesthetic elements to teach students new vocabulary, but it had downsides: TPR didn’t incorporate any conversational elements. Learners were tasked to perform actions, but were not required to communicate with others, which is the fundamental element of language. So, building on these ideas, we designed an activity that leverages the kinesthetic capabilities of Words in Motion, but also addresses some of the disadvantages noted in purely kinesthetic experiences like TPR. Our motivation was to engage learners in a game incorporating both kinesthetic and conversational elements, such that it could be employed as a classroom activity. In this instance, it was deployed at a university in Tokyo to support English-language instruction of college-level students.

The game’s goal was to perform a sequence of actions in the virtual kitchen environment with the correct set of objects. The activity was designed with multiple players in mind, with one player immersed in VR and the others participating from the real world. This was purposely done to address the fact that many educational facilities have constraints that limit the number of room-scale VR devices they can afford or accommodate in a classroom. The participant inside the virtual environment takes the role of the "performer," while the participants outside VR are denoted as observers.

The role of the "performer" is to execute the right sequence of actions using the target objects in the kitchen. However, there is no indication within the virtual environment that informs the performer

  1. which action needs to be performed,
  2. how to perform the action in space, or
  3. which object to perform the action with.

Participants outside the virtual environment take on the role of the observers. Observers have two views on external monitors: one screen shows the performer’s point of view; the other displays an animation showing the motion path the performer must trace, along with instructions indicating which object to perform it with. The role of the observers is to verbally convey to the performer which actions to do, how to do them, and which object to look for in the virtual kitchen.

No particular method of communication between the observers and the performer is enforced. This allows the teacher or moderator to set constraints that give the right measure of difficulty according to the participants’ fluency in the target language, ranging from full second-language communication to first-language instruction (where only the performer learns the target language by kinesthetic means). Teachers can thus engage the whole classroom as observers who practice conversationally by communicating in the second language with the performers, taking turns to engage in kinesthetic reinforcement. Creating these semi-structured instances of kinesthetic language learning with the Words in Motion system is straightforward: any object in a virtual scene can be used as a source for learning, and environments can be easily swapped to incorporate meaningful context.

We carried out a study with 60 subjects recruited from the university’s campus to explore the effects of the Words in Motion kinesthetic approach on second language learning. Students would learn 20 new vocabulary words in Spanish. Participants were divided into three conditions:

  1. a text-only condition outside of virtual reality,
  2. learning kinesthetically in virtual reality, and
  3. observing words inside virtual reality without performing actions.

Results showed that participants in the text-only condition initially outperformed both virtual kinesthetic and virtual non-kinesthetic subjects for equal exposure time to the material. The two virtual reality groups performed similarly immediately after exposure. However, a week after exposure, subjects in the virtual kinesthetic group significantly outperformed those in the virtual non-kinesthetic group, and showed no difference from participants in the text-only group. Moreover, the number of times a word was remembered by a student was directly correlated with the number of times the action associated with that word was performed, in both immediate and delayed evaluations. In other words, performing actions in VR indeed had a positive effect on the retention of new vocabulary words in a controlled environment.
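To make the correlation finding concrete, here is a small sketch of the kind of analysis involved: a Pearson correlation between per-word enactment counts and recall. The numbers below are invented for illustration, not the study’s data.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: times each word's action was performed,
# and how often that word was later recalled.
actions = [1, 2, 3, 5, 8]
recalled = [0, 1, 1, 2, 3]
r = pearson_r(actions, recalled)  # strong positive correlation (r ≈ 0.98)
```

A positive r close to 1 would correspond to the pattern reported above: words whose actions were enacted more often were recalled more often.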

Our findings support the idea that virtual reality can benefit from explicit kinesthetic elements to enhance language learning activities. But beyond the numbers, we can also imagine a future where learning a new language is more engaging and playful. Instead of sitting down and memorizing lists of words out of context (and quite possibly meaningless to the student), we can engage our bodies and tap into the physicality of language both to learn better and to engage better with the class.

Christian David Vázquez Machado is a research assistant in the Fluid Interfaces Group and an ML Learning Fellow.

This post also appeared on Medium.


[1] S. D. Kelly, T. McDevitt and M. Esch, "Brief Training with Co-Speech Gesture Lends a Hand to Word Learning in a Foreign Language," Language and Cognitive Processes, vol. 24, no. 2, pp. 313-334, 2009.

[2] K. M. Mayer, I. B. Yildiz, M. Macedonia and K. Von-Kriegstein, "Motor and visual brain areas support foreign language word learning," 2014.

[3] M. Tellier, "The effect of gestures on second language memorisation by young children," Gesture, vol. 8, no. 2, pp. 219-235, 2008.
