GenECA

Fellow

Santosh Patapati

Location

Remote

Project Type

multimodal interactions with embodied conversational agents

Size

Demo at CVPR 2025

Introduces a robust framework for multimodal interactions with embodied conversational agents, emphasizing emotion-sensitive interaction.

Contributions are always welcome! Please reach out to santosh@cyrionlabs.org if you're interested in volunteering to help out with GenECA. We're a team of 14 volunteers (PhD students, graduate students, undergrads, and even high schoolers) currently working on finishing up the final features and writing the documentation to fully open-source the project. GenECA is the culmination of over 3 years of work, and we're in the final stretch!

Demo Video

This video demonstrates the Generalized Embodied Conversational Agent (GenECA) Framework applied to a psychotherapy use case. GenECA allows developers to create highly customizable multimodal Embodied Conversational Agents (ECAs) at a level never previously seen in the literature.

Additional Details

The demonstration shown in the video is a recreation of Tessa, a past ECA designed for psychotherapeutic interventions. Tessa was recreated using only the GUI of the framework; no technical changes or coding were needed. The 3D model, TTS voice, environment, animations, dialogue tree, behavior mapping, multimodal features, tracking algorithms, backchannel movements, and more were all configured using the GUI.

Note that a 3D model with relatively simple animations and low graphics quality was used for demonstration purposes in this video. For this reason, animations may seem rudimentary. However, we have observed that the framework can function with photorealistic models and highly complex movements. The idle movement may also be complex and contain many different cycles. In this demonstration video, the idle animation consists of two cycles, as the ECA sways from side to side.

We have only tested the framework with up to 50 hand gesture options. Within that range, we have observed that the framework slows (longer delays between interactions) as the number of gesture options grows. The same pattern applies to backchannel behaviors (e.g., eyebrow movements, smile intensity, blinking). In this video, the ECA has access to 19 unique hand gestures and backchannel movements.

Blackridge Elm Capital

8217 Cottage Drive, McKinney, Texas, 75070

424-392-9227

contact@blackridgeelm.com
