How To Add an Interactive AI Agent to SightLab VR Pro and Vizard

To start, please watch this demonstration video of an AI agent in VR with SightLab VR Pro and Vizard.

This tutorial will guide you through adding an interactive AI agent to a SightLab VR Pro session. To get access to a SightLab VR Pro demo, click here or contact sales@worldviz.com for a quote.

Note: You will need the latest version of SightLab VR Pro. Contact support@worldviz.com if you currently own SightLab VR Pro and just need the latest version.

Key Features

Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
Choose from OpenAI models (including GPT-4 and custom GPTs) or Anthropic models (such as Claude 3 Opus). Requires an API key.
Modify avatar appearance, animations, environment, and more. Works with most avatar libraries (Avaturn, ReadyPlayerMe, Mixamo, Rocketbox, Reallusion, etc.).
Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
Use speech recognition to converse using your voice or text-based input.
Choose from high-quality voices from OpenAI TTS or ElevenLabs (requires an API key).
Supports passthrough AR as well as VR.
Train the agent as it adapts using conversation history and interactions.
Works with all features of SightLab, including data collection and visualizations, experiment setup and design, eye tracking measurements, multi-user capabilities and much more.

Instructions

Installation
- Ensure you have the required libraries installed using the Vizard Package Manager. These include:
  - openai (for OpenAI GPT agents)
  - anthropic (for Anthropic Claude agents)
  - elevenlabs (for ElevenLabs text-to-speech), tested with version 1.0.0
  - SpeechRecognition
  - pyaudio
  - python-vlc
  - You also need to install VLC player (for OpenAI TTS), version 3.0.20 or higher:
    - https://get.videolan.org/vlc/3.0.20/win64/vlc-3.0.20-win64.exe
  - For ElevenLabs, you may need to install ffmpeg and mpv player.
  - Note: Requires an active internet connection.
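To confirm the libraries listed above are visible to Python, a small check like the following can help. This is only a sketch: the helper name find_missing is made up for this example, and it should be run with the same Python interpreter that Vizard uses. Note that two packages are imported under a different name than they are installed under (SpeechRecognition imports as speech_recognition, python-vlc imports as vlc).

```python
import importlib.util

# Package name (as installed) -> module name (as imported).
REQUIRED_MODULES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "elevenlabs": "elevenlabs",
    "SpeechRecognition": "speech_recognition",
    "pyaudio": "pyaudio",
    "python-vlc": "vlc",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in modules.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed through the Vizard Package Manager. This check only covers the Python libraries, not the VLC, ffmpeg, or mpv applications.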
API Keys
- Obtain API keys from OpenAI (if using ChatGPT), Anthropic (if using that model), and ElevenLabs (if using ElevenLabs instead of OpenAI's TTS).
- Create a folder named "keys" in your SightLab root directory and place these text files inside:
  - key.txt: Contains your OpenAI API key.
  - elevenlabs_key.txt: Contains your ElevenLabs API key (if using ElevenLabs).
  - ffmpeg_path.txt: Contains the path to the ffmpeg bin folder. On some setups this is not needed (ffmpeg download).
    - Copy the path of the ffmpeg bin directory and paste it into this text file.
- If using the Anthropic model, create an anthropic_key.txt file containing your Anthropic API key.
- For Gemini, you can add a text file called gemini_key.txt.
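Before launching the agent, it can be useful to verify that the key files are where you expect them. The snippet below is a minimal sketch, not the way SightLab itself reads the keys: load_key is a hypothetical helper, and it assumes the script is run from the SightLab root directory so that the relative "keys" folder resolves correctly.

```python
from pathlib import Path


def load_key(filename, keys_dir="keys"):
    """Return the stripped contents of a key file, or None if it is absent."""
    path = Path(keys_dir) / filename
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # File names taken from the list above; only the ones you use are needed.
    for name in ("key.txt", "anthropic_key.txt", "elevenlabs_key.txt",
                 "gemini_key.txt", "ffmpeg_path.txt"):
        print(name, "found" if load_key(name) else "missing or empty")
```

Stripping whitespace guards against a trailing newline in the text file, which is a common cause of rejected keys.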
Configuration
- Open the AI_Agent_Config.py script in the configs folder and configure the following options. Add new config files if you want multiple configurations (you would then change the top line of the AI_Agent.py script, where the config is imported).
  - AI_MODEL: Choose between 'CHAT_GPT' and 'CLAUDE'.
  - OPENAI_MODEL: Specify the OpenAI model name (e.g., "gpt-4").
  - ANTHROPIC_MODEL: Specify the Anthropic model name (e.g., "claude-3-opus-20240229").
  - USE_SPEECH_RECOGNITION: Toggle between speech recognition and text-based interaction.
  - SPEECH_MODEL: Choose OpenAI TTS or ElevenLabs.
  - ELEVEN_LABS_VOICE: Choose the voice for ElevenLabs.
  - OPEN_AI_VOICE: Choose the voice for OpenAI TTS.
  - USE_GUI: Choose whether to use the SightLab GUI to select environments and options.
  - chatgpt_prompt_file: Save prompts as text files in the "prompts" folder and reference which one to use here.
  - USE_PASSTHROUGH: Choose whether to use Mixed Reality Passthrough (select 'empty.osgb' for the environment).
  - ENVIRONMENT: Not necessary to set if using the GUI. Use your own or find environments in utils/resources/environment.
  - AVATAR_MODEL: The avatar model to use. Use your own or find some in utils/resources/avatar/full_body.
  - BIOPAC_ON: Choose whether to connect with Biopac AcqKnowledge to measure physiological responses.
  - Other configurations as needed, such as avatar options (see below), GUI options, max token size, history, and more (refer to the script for details).
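As a rough illustration, a handful of the options above might look like this inside a config file. The values shown are examples only, taken from the option descriptions above where possible. Options such as SPEECH_MODEL, the voice settings, ENVIRONMENT, and AVATAR_MODEL are left out because their accepted values and file names are not listed here; check AI_Agent_Config.py for those.

```python
# Illustrative values only; see AI_Agent_Config.py for the full set of options.
AI_MODEL = 'CHAT_GPT'                        # or 'CLAUDE'
OPENAI_MODEL = "gpt-4"                       # used when AI_MODEL is 'CHAT_GPT'
ANTHROPIC_MODEL = "claude-3-opus-20240229"   # used when AI_MODEL is 'CLAUDE'
USE_SPEECH_RECOGNITION = True                # False switches to text-based chat
USE_GUI = False                              # True lets you pick options in the SightLab GUI
USE_PASSTHROUGH = False                      # True for Mixed Reality Passthrough
BIOPAC_ON = False                            # True to connect with Biopac AcqKnowledge
```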
Running the Script
- Run AI_Agent.py to start

Interaction
- Hold either the 'c' key or the RH grip button to start speaking, then let go to stop, and the AI agent will respond. If INTERUPT_AI is True, you can press 'c' or the RH grip to interrupt and speak again.
  - If HOLD_KEY_TO_SPEAK is False, press the 'c' key once to start speaking to the AI.
  - If USE_KEY_WORD is set to True, say "Agent" before each interaction.
  - If USE_SPEECH_RECOGNITION is set to False, press 'c' to bring up the chat window.
- To stop the conversation, type "q" and click "OK" if using text input. If using speech, say "exit" or close the script.


Modifying Environment and Avatar(s)

See this page for places to get new assets
Avatars: works with Avaturn, ReadyPlayerMe, Reallusion, Mixamo, Rocketbox, and other FBX avatar libraries. For adding new avatars, see https://sightlab.worldviz.com/examplestemplates/adding-avatar-agents. Bring your avatar into Inspector to verify textures, scale, etc.

Reallusion Avatar (available on ActorCore Library)

Place the environment model in utils/resources/environment (the default location), or reference the new path in the config file.

Decker’s Office Environment

Modify the config file to update the environment and avatar path, as well as avatar options.


Obtaining API Keys
To use certain features of the AI Agent, you'll need to obtain API keys from the following services:

OpenAI (for ChatGPT and OpenAI Text-to-Speech):
1. Visit the OpenAI website (not the ChatGPT login page): https://openai.com/
2. Sign up for an account if you don't have one, or log in if you already do.
3. Navigate to the API section of your account.
4. Click "Create a new secret key" and copy the key.
5. Paste the copied key into a text file named "key.txt" and place it in your root SightLab folder.
ElevenLabs (for ElevenLabs Text-to-Speech):
1. Log in to your ElevenLabs account: https://elevenlabs.io/.
2. Click your profile icon in the top-right corner.
3. Click the eye icon next to the "API Key" field.
4. Copy your API key.
5. Paste the copied key into a text file named "elevenlabs_key.txt" and place it in your root SightLab folder.
Anthropic API:
1. Go to the Anthropic website (https://www.anthropic.com/) and click on the "Sign Up" button in the top right corner.
2. Fill out the sign-up form with your email address and other required information. You may need to provide details about your intended use case for the API.
3. After submitting the form, you should receive a confirmation email. Follow the instructions in the email to verify your account.
4. Once your account is verified, log in to the Anthropic website using your credentials.
5. Navigate to the API section of your account dashboard.
6. Create an API key, copy it, and paste it into a text file named "anthropic_key.txt".
Gemini and Gemini Ultra:
1. Create an API key at https://aistudio.google.com/app/apikey
2. In the Package Manager command line (cmd), run: install -q -U google-generativeai
3. See the Python quickstart: https://ai.google.dev/tutorials/python_quickstart
4. More instructions on using Gemini to come soon.

Avatar Configuration Options

TALK_ANIMATION: The animation index for the avatar's talking animation.
IDLE_ANIMATION: The animation index for the avatar's idle or default pose.
AVATAR_POSITION: A list defining the avatar's starting position in the virtual environment (format: [x, y, z]).
AVATAR_EULER: A list specifying the avatar's initial rotation in Euler angles (format: [yaw, pitch, roll]).
NECK_BONE, HEAD_BONE, SPINE_BONE: String names of the bones used for the follow viewpoint (you can find these by opening the avatar model in Inspector).
TURN_NECK: Boolean flag; set to True if the neck needs to be turned initially.
NECK_TWIST_VALUES: List of values defining the neck's twisting motion (format: [yaw, pitch, roll]).
USE_MOUTH_MORPH: Boolean flag to activate or deactivate mouth morphing animations during speech (not needed for Rocketbox avatars).
MOUTH_OPEN_ID: The ID number of the morph target for opening the mouth (find in Inspector).
MOUTH_OPEN_AMOUNT: The amount by which the mouth opens, typically a value between 0 and 1.
BLINKING: Boolean flag to enable or disable blinking animations.
BLINK_ID: The ID number of the morph target for blinking.
DISABLE_LIGHTING_AVATAR: Boolean flag to disable lighting on the avatar if the environment lighting makes the avatar look too blown out.
ATTACH_FACE_LIGHT: Boolean flag to attach a light source to the avatar's face.
FACE_LIGHT_BONE: The name of the bone to which the face light is attached if ATTACH_FACE_LIGHT is true.
MORPH_DURATION_ADJUSTMENT: Adjust this if the mouth movement continues for too long.
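To make the formats above concrete, here is a sketch of how these avatar options could be filled in. Every value is a placeholder: the bone names, animation indices, and morph target IDs are hypothetical and differ per avatar, so look them up in Inspector for your own model. MORPH_DURATION_ADJUSTMENT is omitted because its units are not described here.

```python
# Placeholder values only; look up the real ones for your avatar in Inspector.
TALK_ANIMATION = 1                 # index of the talking animation (assumed)
IDLE_ANIMATION = 0                 # index of the idle animation (assumed)
AVATAR_POSITION = [0, 0, 1.5]      # [x, y, z]
AVATAR_EULER = [180, 0, 0]         # [yaw, pitch, roll]
NECK_BONE = 'Neck'                 # hypothetical bone name
HEAD_BONE = 'Head'                 # hypothetical bone name
SPINE_BONE = 'Spine'               # hypothetical bone name
TURN_NECK = False
NECK_TWIST_VALUES = [0, 0, 0]      # [yaw, pitch, roll]
USE_MOUTH_MORPH = True             # not needed for Rocketbox avatars
MOUTH_OPEN_ID = 0                  # morph target ID (assumed)
MOUTH_OPEN_AMOUNT = 0.5            # typically between 0 and 1
BLINKING = True
BLINK_ID = 1                       # morph target ID (assumed)
DISABLE_LIGHTING_AVATAR = False
ATTACH_FACE_LIGHT = False
FACE_LIGHT_BONE = 'Head'           # only used if ATTACH_FACE_LIGHT is True
```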

Additional Information:

For prompts, wrap the GPT prompt in quotation marks ("") and use "I am..." to configure the agent. For Anthropic, quotes are not needed and you can use "You are...".
For ElevenLabs, refer to the ElevenLabs Python documentation for more details: https://github.com/elevenlabs.
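The prompt convention described above (quoted "I am..." for GPT, unquoted "You are..." for Anthropic) can be sketched with a small helper. The function name format_prompt and the example persona are made up for illustration; the resulting text is what you would save in a prompt file.

```python
def format_prompt(persona, ai_model):
    """Build a prompt string following the convention for each model family.

    persona is a short description such as 'a friendly museum guide'.
    """
    if ai_model == 'CHAT_GPT':
        # GPT prompts: first person, wrapped in quotation marks.
        return f'"I am {persona}."'
    if ai_model == 'CLAUDE':
        # Anthropic prompts: second person, no surrounding quotes.
        return f'You are {persona}.'
    raise ValueError(f"Unknown AI_MODEL: {ai_model!r}")


print(format_prompt("a friendly museum guide", 'CHAT_GPT'))
# prints: "I am a friendly museum guide."
print(format_prompt("a friendly museum guide", 'CLAUDE'))
# prints: You are a friendly museum guide.
```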

There is also a version of this available that runs purely as an education-based tool, where you can select objects in a scene and get information and labels for each item (such as paintings in an art gallery). Contact support@worldviz.com for this version.

Issues and Troubleshooting

There may be an error if your microphone is set to your VR headset while the sound output device is set to something other than the headset.
You may see an error if you are using the free version of ElevenLabs and exceed the 10,000-character limit (paid accounts get larger quotas).
ffplay error with ElevenLabs: you may need to install ffmpeg and add it to the Vizard environment path (https://www.gyan.dev/ffmpeg/builds/).
mpv player error with ElevenLabs: you may need to install mpv and add it to the Vizard environment path (https://mpv.io/installation/).
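For the ffplay and mpv errors above, a quick way to see whether the executables are reachable is to look them up on the PATH from Python. This is a sketch with a made-up helper name (find_players); run it in the same environment Vizard uses, since that is the PATH that matters.

```python
import shutil


def find_players(names=("ffmpeg", "ffplay", "mpv")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_players().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If an entry shows as not found, install the tool and add its folder (for ffmpeg, the bin folder) to the environment path, then run the check again.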


Need Help?

For more instructions on using SightLab VR Pro, see https://sightlab.worldviz.com/


We Want to Hear from You!

Try integrating an AI agent into your SightLab VR Pro session and share your experiences with us. Your feedback inspires our next Tech Tips!

For more information on SightLab, Vizard, or any of WorldViz’s solutions for VR enterprise and research applications, contact sales@worldviz.com.