IEEE VR 2023 Workshop on Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality (MASSXR)

March 25, 2023 Online (hosted in Shanghai, China)

News

Introduction

Recent advances in immersive technologies, such as realistic digital humans and off-the-shelf XR devices that can capture a user's speech, face, hands, and body, together with the development of sophisticated data-driven AI algorithms, offer a great opportunity for the automatic analysis and synthesis of social and affective cues in XR, which are essential for truly user-aware interaction systems. Although affective and social signal understanding and synthesis are studied in other fields (e.g., human-robot interaction, intelligent virtual agents, or computer vision), they have not yet been explored adequately in Virtual and Augmented Reality, which demands extended-reality-specific theoretical and methodological foundations. In particular, this workshop focuses on the following research questions:

The objective of this workshop on Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality is to bring together researchers and practitioners working on social and affective computing with those working on 3D computer vision/graphics and computer animation, and to discuss the current state, future directions, opportunities, and challenges of the field. The workshop aims to establish a new platform for the development of immersive embodied intelligence at the intersection of Artificial Intelligence (AI) and Extended Reality (XR). We expect the workshop to give researchers an opportunity to develop new techniques and to lead to new collaborations among the participants.

Location and date

Keynote Speakers

Taku Komura (The University of Hong Kong, HK)

Title: Building Virtual Humans for Real-Time Communication and Interaction

Abstract: In this talk, I will present our recent work towards constructing virtual avatars that can interact with humans. First, I will present FaceFormer, our recent transformer-based model that produces facial animation from speech. FaceFormer generates realistic lip and upper-face movements by taking the longer context of the speech into account; extensive experiments and a perceptual user study show that it outperforms existing state-of-the-art methods. I will then talk about the Periodic Autoencoder (PAE), which learns periodic features from large unstructured motion datasets in an unsupervised manner. Character movements are decomposed into multiple latent channels that capture the non-linear periodicity of different body segments while progressing forward in time. The method extracts a multi-dimensional phase space from full-body motion data, which effectively clusters animations and produces a manifold in which computed feature distances provide a better similarity measure than in the original motion space, leading to better temporal and spatial alignment. We demonstrate that the learned periodic embedding significantly improves neural motion synthesis in a number of tasks, including diverse locomotion skills, style-based movements, dance motion synthesis from music, synthesis of dribbling motions in football, and motion queries for matching poses within large animation databases.
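For readers unfamiliar with the concept, the following minimal PyTorch sketch shows how a periodic autoencoder can compress a motion window into a few latent channels, summarise each channel as frequency, amplitude, offset, and phase via an FFT, and reconstruct the window from parametrised sinusoids. The layer sizes, window length, and single shared phase head are illustrative assumptions, not the architecture presented in the talk.

import math
import torch
import torch.nn as nn

class PeriodicAutoencoderSketch(nn.Module):
    """Toy periodic autoencoder: encode a motion window into a few latent
    channels, summarise each channel as (frequency, amplitude, offset, phase)
    using an FFT, and reconstruct the window from parametrised sinusoids."""

    def __init__(self, input_channels=69, phase_channels=5, frames=61, fps=30):
        super().__init__()
        self.frames, self.fps = frames, fps
        self.encoder = nn.Conv1d(input_channels, phase_channels, kernel_size=25, padding=12)
        self.decoder = nn.Conv1d(phase_channels, input_channels, kernel_size=25, padding=12)
        # One learned 2D vector per channel; its angle gives the channel's phase.
        self.phase_head = nn.Conv1d(phase_channels, 2 * phase_channels, kernel_size=frames)

    def forward(self, x):                               # x: (batch, channels, frames)
        latent = self.encoder(x)                        # (batch, phase_channels, frames)

        # Differentiable spectral statistics per latent channel.
        power = torch.fft.rfft(latent, dim=-1).abs() ** 2
        freqs = torch.fft.rfftfreq(self.frames, d=1.0 / self.fps).to(x.device)
        power, freqs = power[..., 1:], freqs[1:]        # ignore the DC bin
        frequency = (power * freqs).sum(-1) / power.sum(-1).clamp_min(1e-8)
        amplitude = 2.0 * power.sum(-1).sqrt() / self.frames
        offset = latent.mean(-1)

        # Phase as the angle of a learned 2D projection of each channel.
        vec = self.phase_head(latent).view(x.shape[0], -1, 2)
        phase = torch.atan2(vec[..., 1], vec[..., 0]) / (2.0 * math.pi)

        # Rebuild each latent channel as a sinusoid, then decode to motion space.
        t = torch.linspace(-1.0, 1.0, self.frames, device=x.device)
        signal = amplitude.unsqueeze(-1) * torch.sin(
            2.0 * math.pi * (frequency.unsqueeze(-1) * t + phase.unsqueeze(-1))
        ) + offset.unsqueeze(-1)
        return self.decoder(signal), (frequency, amplitude, offset, phase)

# Example: a batch of 8 windows of 61 frames with 23 joints x 3 channels each.
recon, phase_params = PeriodicAutoencoderSketch()(torch.randn(8, 69, 61))

In this line of work it is the extracted phase parameters, rather than the reconstruction itself, that downstream motion-synthesis networks condition on.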

Bio: Taku Komura joined The University of Hong Kong in 2020. Before that, he worked at the University of Edinburgh (2006-2020), City University of Hong Kong (2002-2006) and RIKEN (2000-2002). He received his BSc, MSc and PhD in Information Science from the University of Tokyo. His research has focused on data-driven character animation, physically-based character animation, crowd simulation, 3D modelling, cloth animation, anatomy-based modelling and robotics. Recently, his main research interests have been physically-based animation and the application of machine learning techniques to animation synthesis. He received a Royal Society Industry Fellowship (2014) and a Google AR/VR Research Award (2017).

Sylvia Xueni Pan  (Goldsmiths, University of London, UK)

Title: Immersive Social Interaction for Health and Healthcare

Abstract: Immersive technology has the potential to be transformative for many aspects of our lives, in particular in the area of how we socially engage with each other. In this talk, I will give a few examples of how we have been collaborating with neuroscientists, psychiatrists, and medical doctors on using virtual reality to improve the quality of different aspects of our real life. 

Bio: Sylvia Xueni Pan is a Professor of VR at Goldsmiths, University of London. She co-leads the Goldsmiths Social, Empathic, and Embodied VR Lab (SeeVR Lab) and the MA/MSc in Virtual and Augmented Reality programme at Goldsmiths Computing. Her research interest is the use of Virtual Reality as a medium for real-time social interaction, particularly in the application areas of training and therapy.

Shrikanth Narayanan (University of Southern California, USA)

Title:  Multimodal Machine Intelligence: From Signals to Human-centered Experiences

Abstract: Converging technological advances in sensing, machine learning, and computing offer tremendous opportunities for contextually-rich, multimodal, spatiotemporal characterization of an individual's behavior and state, and of the environment within which they operate. Behavioral signals in the audio and visual modalities available in speech, spoken language, and body language offer a window into decoding not just what one is doing but how one is thinking and feeling. At the simplest level, this could entail determining who is interacting with whom, about what, and how, using automated audio and video analysis of verbal and nonverbal behavior. Computational modeling can also target more complex, higher-level constructs, including human affective expression and processing. Behavioral signals combined with physiological signals such as heart rate, respiration, and skin conductance offer further possibilities for understanding dynamic cognitive and affective states in context. Machine intelligence could also help detect, analyze, and model deviations from what is deemed typical. This, in turn, is enabling novel possibilities for understanding and supporting various aspects of human-centered applications, notably in psychological health and well-being. This talk will highlight some of the advances in human-centered machine intelligence, with a specific focus on communicative, affective, and social behavior. It will also discuss the challenges and opportunities in creating trustworthy signal processing and machine learning approaches that are inclusive, equitable, robust, safe, and secure, e.g., with respect to protected variables such as gender, race, age, and ability.
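As a concrete, if highly simplified, illustration of fusing behavioral and physiological streams for affect recognition, the sketch below encodes each modality with its own small network and classifies the concatenated embeddings. The modality names, feature dimensions, and number of classes are placeholders rather than anything described in the talk.

import torch
import torch.nn as nn

class LateFusionAffectClassifier(nn.Module):
    """Toy late-fusion model: one small encoder per modality, with the
    concatenated embeddings mapped to a handful of affective-state classes."""

    def __init__(self, dims=None, n_classes=4):
        super().__init__()
        # Placeholder dimensions: e.g. acoustic descriptors, facial-landmark
        # features, and summary statistics of heart rate / respiration / EDA.
        dims = dims or {"audio": 88, "video": 136, "physio": 6}
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 32))
            for name, d in dims.items()
        })
        self.head = nn.Linear(32 * len(dims), n_classes)

    def forward(self, features):            # features: dict of (batch, dim) tensors
        z = torch.cat([self.encoders[name](features[name])
                       for name in sorted(self.encoders)], dim=-1)
        return self.head(z)                 # unnormalised scores over affect classes

model = LateFusionAffectClassifier()
scores = model({"audio": torch.randn(2, 88),
                "video": torch.randn(2, 136),
                "physio": torch.randn(2, 6)})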

Bio: Shrikanth (Shri) Narayanan is University Professor and Niki & C. L. Max Nikias Chair in Engineering at the University of Southern California, where he is Professor of Electrical & Computer Engineering, Computer Science, Linguistics, Psychology, Neuroscience, Pediatrics, and Otolaryngology—Head & Neck Surgery, Director of the Ming Hsieh Institute, and Research Director of the Information Sciences Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Guggenheim Fellow, a member of the European Academy of Sciences and Arts, and a Fellow of the National Academy of Inventors, the Acoustical Society of America, IEEE, ISCA, the American Association for the Advancement of Science (AAAS), the Association for Psychological Science, the Association for the Advancement of Affective Computing (AAAC), and the American Institute for Medical and Biological Engineering (AIMBE). His honors include the 2015 Engineers Council's Distinguished Educator Award, a Mellon award for mentoring excellence, the 2005 and 2009 Best Journal Paper awards from the IEEE Signal Processing Society (for which he also served as Distinguished Lecturer in 2010-11), a 2018 ISCA CSL Best Journal Paper award (he was also an ISCA Distinguished Lecturer in 2015-16), serving as the Willard R. Zemlin Memorial Lecturer for ASHA in 2017, and the Ten Year Technical Impact Award (2014) and Sustained Accomplishment Award (2020) from ACM ICMI. He has published over 1000 papers and has been granted eighteen U.S. patents. His research and inventions have led to technology commercialization, including through startups he co-founded: Behavioral Signals Technologies, focused on telecommunication services and AI-based conversational assistance, and Lyssn, focused on mental health care delivery, treatment, and quality assurance. He served as the inaugural Vice President–Education of the IEEE Signal Processing Society in 2020-22.

Julie R. Williamson (University of Glasgow, UK)

Title: Being Social in XR

Abstract: Virtual environments enable new forms of interaction and connection, but they fall short of the meaningful experiences we expect during face-to-face interactions. In the real world, we “give off” a variety of social signals, such as position, posture, gesture, and facial expression, that are crucial to interpersonal interaction and expression. Capturing or translating these signals into virtual environments may improve interaction, but we can also design beyond reality, using fundamental human experiences as a starting point. My current work focuses on techniques for establishing stable interpersonal realities when interacting across the XR spectrum.

Bio: Dr. Williamson is a lecturer in HCI at the University of Glasgow. She is part of the Glasgow Interactive Systems Group (GIST), leading the Public and Performative Interaction theme within GIST. Her research focuses on how people use technology in public spaces and how interactive technologies can be designed given the “performative” aspects of using technology in public. Her current research looks at playful interfaces for public spaces that use embedded interaction, large format displays, and whole body input. Dr. Williamson completed her PhD in Computing Science at the University of Glasgow in January 2012, supervised by Stephen Brewster. Her thesis explored the social acceptability of using multimodal interfaces in public spaces, especially with respect to mobile interfaces. She completed her Bachelor of Science Cum Laude in Informatics at the University of California, Irvine. 

Workshop contributions



Authors: Deniz Iren (Open Universiteit, NL) et al.


Abstract: Nonverbal cues such as gestures and facial expressions are indispensable in human communication. However, this essential aspect of social interaction is inaccessible to people with visual impairments. The issue can be alleviated with the assistance of augmented reality and affective computing techniques embodied as wearable technology. Even though both augmented reality and affective computing have been studied comprehensively, the real-life deployment of these techniques is hindered by the limitations of wearable devices in terms of computational capability and battery life, which calls for a holistic approach to implementing lightweight and robust affective computing methods. In this study, the authors present a prototype that combines facial expression and gesture recognition and is optimized to run on battery-powered wearable devices. Additionally, the prototype includes a haptic sleeve that communicates the detected facial expressions and gestures to the wearer.
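To make the lightweight, on-device constraint concrete, the sketch below shows how a quantised expression classifier could be run on a wearable using the TensorFlow Lite runtime. The model file, label set, and pre-cropped input are hypothetical and are not taken from the paper.

import numpy as np
from tflite_runtime.interpreter import Interpreter  # small runtime, no full TensorFlow

# Hypothetical artefacts: an int8-quantised expression model and its label set.
MODEL_PATH = "face_expression_int8.tflite"
LABELS = ["neutral", "happy", "sad", "surprised", "angry"]

interpreter = Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_expression(face_crop: np.ndarray) -> tuple[str, float]:
    """Run one on-device inference on a face crop already resized to the
    model's input shape; return the top label and a rough confidence."""
    x = np.expand_dims(face_crop.astype(inp["dtype"]), axis=0)
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0].astype(np.float32)
    probs = np.exp(scores - scores.max())    # treat outputs as logits (crude)
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return LABELS[best], float(probs[best])

The recognised expression (and, analogously, a recognised gesture) would then be translated into a short vibration pattern on the haptic sleeve, for example by mapping each class to a motor index and pulse duration.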


Authors: Magdalena Igras Cybulska (AGH UST, Kraków, Poland) et al.

Abstract: GlossoVR is a virtual reality (VR) application that combines training in public speaking in front of a virtual audience with voice-emission training through relaxation exercises. It is accompanied by digital signal processing (DSP) and artificial intelligence (AI) modules that provide automatic feedback on the user's vocal performance, behavior, and psychophysiology. In particular, the authors address speech emotion, prosody, and timbre parameters, as well as the user's hand gestures and eye movements. This article reports work in progress, focusing on the approaches, datasets, and algorithms applied in the current state of the GlossoVR project.
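As an indication of what such automatic vocal feedback can build on, the sketch below computes a few basic prosodic descriptors (pitch statistics, an RMS loudness proxy, and a pause ratio) from a recorded utterance using librosa. The GlossoVR modules themselves are the authors' own, so the feature choices and thresholds here are purely illustrative.

import librosa
import numpy as np

def prosody_summary(wav_path: str) -> dict:
    """Rough prosodic descriptors of one utterance: pitch statistics,
    an RMS loudness proxy, and the fraction of low-energy (pause-like) frames."""
    y, sr = librosa.load(wav_path, sr=16000)

    # Fundamental frequency via probabilistic YIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Short-time energy; frames well below the median count as pauses (illustrative threshold).
    rms = librosa.feature.rms(y=y)[0]
    pause_ratio = float(np.mean(rms < 0.5 * np.median(rms)))

    return {
        "pitch_mean_hz": float(np.mean(f0)) if f0.size else 0.0,
        "pitch_range_hz": float(np.ptp(f0)) if f0.size else 0.0,
        "loudness_rms_mean": float(np.mean(rms)),
        "pause_ratio": pause_ratio,
    }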

Scope

This workshop invites researchers to submit original, high-quality research papers related to multi-modal affective and social behavior analysis and synthesis in XR. Relevant topics include, but are not limited to:

Important Dates

Program

All times are Shanghai, China local time (UTC+8).

Submission Instructions

Authors are invited to submit research or work-in-progress papers:

Papers will be included in the IEEE Xplore library. Authors are encouraged to submit videos to aid the program committee in reviewing their submissions. Please anonymize your submissions, as the workshop uses a double-blind review process. Authors of accepted papers are expected to register and present their papers at the workshop. 

Papers should use the IEEE VR formatting guidelines and be submitted through the IEEE VR 2023 Precision Conference System (PCS).

When starting your submission, please make sure to select the relevant track for the workshop "IEEE VR 2023 - Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality".


Workshop format

The workshop will feature keynote talks, presentations of the accepted research papers, and an interactive panel discussion involving the keynote speakers, organizers, and the audience.

Organizing Committee

Zerrin Yumak

Utrecht University, The Netherlands

Funda Durupinar

University of Massachusetts Boston, USA

Oya Celiktutan

King's College London, UK

Pablo Cesar

CWI and TU Delft, The Netherlands

Aniket Bera

Purdue University, USA

Mar Gonzalez-Franco

Google Labs, USA

International Program Committee



Contact

If you have any questions or remarks regarding this workshop, please contact Zerrin Yumak (Z.Yumak[at]uu.nl) or Funda Durupinar (funda.durupinarbabur[at]umb.edu).