IEEE VR 2023 Workshop on Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality (MASSXR)
March 25, 2023, online (hosted in Shanghai, China)
NEWS
The early bird registration deadline is February 28. More information can be found below.
MASSXR will have four great keynotes from Taku Komura, Sylvia Xueni Pan, Shrikanth Narayanan, and Julie R. Williamson.
The submission deadline is January 11.
Introduction
Recent advances in immersive technologies, such as realistic digital humans and off-the-shelf XR devices that can capture a user's speech, face, hands, and body, together with the development of sophisticated data-driven AI algorithms, offer a great opportunity for the automatic analysis and synthesis of social and affective cues in XR, which are essential for truly user-aware interaction systems. Although affective and social signal understanding and synthesis have been studied in other fields (e.g., human-robot interaction, intelligent virtual agents, and computer vision), they have not yet been explored adequately in Virtual and Augmented Reality. This demands extended-reality-specific theoretical and methodological foundations. In particular, this workshop focuses on the following research questions:
How can we sense the user’s affective and social states using sensors available in XR?
How can we collect users’ interaction data in immersive situations?
How can we generate affective and social cues for digital humans/avatars in immersive interactions enabled by dialogue, voice, and non-verbal behaviors?
How can we develop systematic methodologies and techniques for creating plausible, trustworthy, and personalized behaviors for social and affective interaction in XR?
The objective of this workshop on Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality is to bring together researchers and practitioners working in the field of social and affective computing with those working on 3D computer vision/graphics and computer animation, and to discuss the current state and future directions, opportunities, and challenges. The workshop aims to establish a new platform for the development of immersive embodied intelligence at the intersection of Artificial Intelligence (AI) and Extended Reality (XR). We expect the workshop to provide an opportunity for researchers to develop new techniques and to lead to new collaborations among the participants.
Location and date
The IEEE-MASSXR workshop will take place during the 30th IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR 2023), which will be held March 25-29, 2023, in Shanghai, China.
IEEE-MASSXR is a half-day workshop, and it will be held online.
KEYNOTE Speakers
Taku Komura (The University of Hong Kong, HK)
Title: Building Virtual Humans for Real-Time Communication and Interaction
Abstract: In this talk, I will present our recent work towards constructing virtual avatars that can interact with humans. First, I will present FaceFormer, our recent transformer-based character control model that produces facial animation from speech. FaceFormer can produce realistic lip and upper-face movements by considering the longer context of the speech. Extensive experiments and a perceptual user study show that our approach outperforms existing state-of-the-art methods. Next, I will talk about the Periodic Autoencoder (PAE), which can learn periodic features from large unstructured motion datasets in an unsupervised manner. The character movements are decomposed into multiple latent channels that capture the non-linear periodicity of different body segments while progressing forward in time. Our method extracts a multi-dimensional phase space from full-body motion data, which effectively clusters animations and produces a manifold in which computed feature distances provide a better similarity measure than in the original motion space, achieving better temporal and spatial alignment. We demonstrate that the learned periodic embedding can significantly improve neural motion synthesis in a number of tasks, including diverse locomotion skills, style-based movements, dance motion synthesis from music, synthesis of dribbling motions in football, and motion queries for matching poses within large animation databases.
Bio: Taku Komura joined The University of Hong Kong in 2020. Before that, he worked at the University of Edinburgh (2006-2020), City University of Hong Kong (2002-2006), and RIKEN (2000-2002). He received his BSc, MSc, and PhD in Information Science from the University of Tokyo. His research has focused on data-driven character animation, physically-based character animation, crowd simulation, 3D modelling, cloth animation, anatomy-based modelling, and robotics. Recently, his main research interests have been physically-based animation and the application of machine learning techniques to animation synthesis. He received the Royal Society Industry Fellowship (2014) and the Google AR/VR Research Award (2017).
Sylvia Xueni Pan (Goldsmiths, University of London, UK)
Title: Immersive Social Interaction for Health and Healthcare
Abstract: Immersive technology has the potential to be transformative for many aspects of our lives, in particular in the area of how we socially engage with each other. In this talk, I will give a few examples of how we have been collaborating with neuroscientists, psychiatrists, and medical doctors on using virtual reality to improve the quality of different aspects of our real life.
Bio: Sylvia Xueni Pan is a Professor of VR at Goldsmiths, University of London. She co-leads the Goldsmiths Social, Empathic, and Embodied VR lab (SeeVR Lab) and the MA/MSc in Virtual and Augmented Reality programme at Goldsmiths Computing. Her research interest is the use of Virtual Reality as a medium for real-time social interaction, in particular in the application areas of training and therapy.
Shrikanth Narayanan (University of Southern California, USA)
Title: Multimodal Machine Intelligence: From Signals to Human-centered Experiences
Abstract: Converging technological advances in sensing, machine learning, and computing offer tremendous opportunities for contextually-rich multimodal, spatiotemporal characterization of an individual's behavior and state, and of the environment within which they operate. Behavioral signals in the audio and visual modalities available in speech, spoken language, and body language offer a window into decoding not just what one is doing but how one is thinking and feeling. At the simplest level, this could entail determining who is interacting with whom, about what, and how, using automated audio and video analysis of verbal and nonverbal behavior. Computational modeling can also target more complex, higher-level constructs, including human affective expression and processing. Behavioral signals combined with physiological signals such as heart rate, respiration, and skin conductance offer further possibilities for understanding dynamic cognitive and affective states in context. Machine intelligence could also help detect, analyze, and model deviation from what is deemed typical. This, in turn, is enabling novel possibilities for understanding and supporting various aspects of human-centered applications, notably in psychological health and well-being. This talk will highlight some of the advances in human-centered machine intelligence with a specific focus on communicative, affective, and social behavior. It will also discuss the challenges and opportunities in creating trustworthy signal processing and machine learning approaches that are inclusive, equitable, robust, safe, and secure, e.g., with respect to protected variables such as gender, race, age, and ability.
Bio: Shrikanth (Shri) Narayanan is University Professor and Niki & C. L. Max Nikias Chair in Engineering at the University of Southern California, where he is Professor of Electrical & Computer Engineering, Computer Science, Linguistics, Psychology, Neuroscience, Pediatrics, and Otolaryngology—Head & Neck Surgery, Director of the Ming Hsieh Institute and Research Director of the Information Sciences Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Guggenheim Fellow, member of the European Academy of Sciences and Arts, and a Fellow of the National Academy of Inventors, the Acoustical Society of America, IEEE, ISCA, the American Association for the Advancement of Science (AAAS), the Association for Psychological Science, the Association for the Advancement of Affective Computing (AAAC) and the American Institute for Medical and Biological Engineering (AIMBE). He is a recipient of several honors, including the 2015 Engineers Council’s Distinguished Educator Award, a Mellon award for mentoring excellence, the 2005 and 2009 Best Journal Paper awards from the IEEE Signal Processing Society, and serving as its Distinguished Lecturer for 2010-11, a 2018 ISCA CSL Best Journal Paper award, and serving as an ISCA Distinguished Lecturer for 2015-16, Willard R. Zemlin Memorial Lecturer for ASHA in 2017, and the Ten Year Technical Impact Award in 2014 and the Sustained Accomplishment Award in 2020 from ACM ICMI. He has published over 1000 papers and has been granted eighteen U.S. patents. His research and inventions have led to technology commercialization, including through startups he co-founded: Behavioral Signals Technologies focused on the telecommunication services and AI-based conversational assistance industry, and Lyssn focused on mental health care delivery, treatment, and quality assurance. 
He served as the inaugural Vice President–Education for the IEEE Signal Processing Society 2020-22.
Julie R. Williamson (University of Glasgow, UK)
Title: Being Social in XR
Abstract: Virtual environments enable new forms of interaction and connection, but fall short of the meaningful experiences we expect during face-to-face interactions. In the real world, we "give off" a variety of social signals, such as position, posture, gesture, facial expression, and more, that are crucial to interpersonal interaction and expression. Capturing or translating these signals into virtual environments may improve interaction, but we can also design beyond reality using fundamental human experiences as a starting point. My current work focuses on techniques for establishing stable interpersonal realities when interacting across the XR spectrum.
Bio: Dr. Williamson is a lecturer in HCI at the University of Glasgow. She is part of the Glasgow Interactive Systems Group (GIST), leading the Public and Performative Interaction theme within GIST. Her research focuses on how people use technology in public spaces and how interactive technologies can be designed given the "performative" aspects of using technology in public. Her current research looks at playful interfaces for public spaces that use embedded interaction, large-format displays, and whole-body input. Dr. Williamson completed her PhD in Computing Science at the University of Glasgow in January 2012, supervised by Stephen Brewster. Her thesis explored the social acceptability of using multimodal interfaces in public spaces, especially with respect to mobile interfaces. She completed her Bachelor of Science cum laude in Informatics at the University of California, Irvine.
Workshop contributions
Title: Augmented Reality and Affective Computing for Nonverbal Interaction Support of the Visually Impaired
Authors: Deniz Iren (Open Universiteit, NL) et al.
Abstract: Nonverbal cues such as gestures and facial expressions are indispensable in human communication. However, such an essential aspect of social interactions is inaccessible to the visually impaired. This issue can be alleviated with the assistance of augmented reality and affective computing techniques embodied as wearable technology. Even though both augmented reality and affective computing have been studied comprehensively, the real-life deployment and utilization of these techniques are hindered by the limitations of wearable devices in terms of computational capabilities and battery life. This calls for a holistic approach to implementing lightweight and robust affective computing methods. In this study, the authors present a prototype combining facial expression and gesture recognition that is optimized to function on battery-powered wearable devices. Additionally, the prototype embodies a haptic sleeve that communicates the detected facial expressions and gestures to the wearer.
Title: Towards a multimodal VR trainer of voice emission and public speaking - work in progress
Authors: Magdalena Igras-Cybulska (AGH UST, Kraków, Poland) et al.
Abstract: GlossoVR is a virtual reality (VR) application that combines training in public speaking in front of a virtual audience and in voice emission in relaxation exercises. It is accompanied by digital signal processing (DSP) and artificial intelligence (AI) modules which provide automatic feedback on the vocal performance as well as the behavior and psychophysiology of the user. In particular, the authors address parameters of speech emotions, prosody and timbre, and the user's hand gestures and eye movement. This article reports the work in progress, focusing on the approaches, datasets and algorithms applied in the current state of the GlossoVR project.
Scope
This workshop invites researchers to submit original, high-quality research papers related to multi-modal affective and social behavior analysis and synthesis in XR. Relevant topics include, but are not limited to:
Analysis and synthesis of multi-modal social and affective cues in XR
Data-driven expressive character animation (e.g., face, gaze, gestures, ...)
AI algorithms for modeling social interactions with human- and AI-driven virtual humans
Machine learning for dyadic and multi-party interactions
Generating diverse, personalized, and style-based body motions
Music-driven animation (e.g., dance, instrument playing)
Multi-modal data collection and annotation in and for XR (e.g., using VR/AR headsets, microphones, motion capture devices, and 4D scanners)
Efficient and novel machine learning methods (e.g., transfer learning, self-supervised and few-shot learning, generative and graph models)
Subjective and objective analysis of data-driven algorithms for XR
Applications in healthcare, education, and entertainment (e.g., sign language)
Important Dates
Submission deadline: January 11, 2023 (Anywhere on Earth)
Notifications: January 20, 2023
Camera-ready deadline: January 31, 2023
Conference date: March 25-29, 2023
Workshop date: March 25, 2023 (Shanghai time)
PROGRAM
08:30 - 08:40 Introduction and Welcome
08:40 - 09:20 Invited Speaker: Shrikanth Narayanan (30 mins + 10 mins Q&A)
09:20 - 10:00 Invited Speaker: Taku Komura (30 mins + 10 mins Q&A)
10:00 - 10:30 Paper presentations (10 mins presentation + 5 mins Q&A)
10:30 - 10:50 Break
10:50 - 11:30 Invited Speaker: Sylvia Xueni Pan (30 mins + 10 mins Q&A)
11:30 - 12:10 Invited Speaker: Julie Williamson (30 mins + 10 mins Q&A)
12:10 - 13:10 Panel discussion with keynote speakers and organizers
13:10 - 13:20 Closing
All times are Shanghai, China local time (UTC+8).
Submission Instructions
Authors are invited to submit research or work-in-progress papers:
Research paper: 4-6 pages + references
Work-in-progress paper: 2-3 pages + references
Papers will be included in the IEEE Xplore library. Authors are encouraged to submit videos to aid the program committee in reviewing their submissions. Please anonymize your submissions, as the workshop uses a double-blind review process. Authors of accepted papers are expected to register and present their papers at the workshop.
Papers should use the IEEE VR formatting guidelines and be submitted through the IEEE VR 2023 Precision Conference System (PCS).
When starting your submission, please make sure to select the relevant track for the workshop "IEEE VR 2023 - Multi-modal Affective and Social Behavior Analysis and Synthesis in Extended Reality".
Workshop format
The workshop will have four keynote speakers, a few research paper presentations, and an interactive panel discussion involving the keynote speakers, organizers, and the audience.
Organizing Committee
INTERNATIONAL Program Committee
Abdallah El Ali, University of Amsterdam, The Netherlands
Uttaran Bhattacharya, Adobe Inc
Mark Billinghurst, University of South Australia, Australia
Claudia Esteves, Universidad de Guanajuato, México
Hayley Hung, TU Delft, The Netherlands
Ugur Gudukbay, Bilkent University, Turkey
Sameer Kishore, Middlesex University Dubai
Stefan Kopp, Bielefeld University, Germany
Dinesh Manocha, University of Maryland, USA
Stacy Marsella, Northeastern University, USA
Rachel McDonnell, Trinity College Dublin, Ireland
Aline Normoyle, Bryn Mawr College, USA
Sylvia Xueni Pan, Goldsmiths, University of London, UK
Catherine Pelachaud, Sorbonne University, France
Voicu Popescu, Purdue University, USA
Albert Ali Salah, Utrecht University, The Netherlands
Ana Tajadura-Jiménez, University College London, UK
Nguyen Tan Viet Tuyen, King's College London, UK
Julie Williamson, University of Glasgow, UK
Contact
If you have any questions or remarks regarding this workshop, please contact Zerrin Yumak (Z.Yumak[at]uu.nl) or Funda Durupinar (funda.durupinarbabur[at]umb.edu).