Cookbook
-
Gemini Audio Video Chat
Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!
-
Google Gemini Real-Time Voice API
Talk to Gemini in real time using Google's voice API.
-
OpenAI Real-Time Voice API
Talk to ChatGPT in real time using OpenAI's voice API.
-
Hello Llama: Stop Word Detection
A code editor, built with Llama 3.3 70B, that activates when you say "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!
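The trigger-phrase idea can be illustrated at the text level. This is a minimal sketch, not the demo's actual code: it assumes a speech-to-text step has already produced a transcript, and the names `WAKE_PHRASE` and `extract_command` are hypothetical. It returns whatever was said after "Hello Llama", or `None` if the phrase never appears.

```python
import re
from typing import Optional

# Hypothetical wake phrase; the cookbook demo listens for "Hello Llama".
WAKE_PHRASE = "hello llama"


def _normalize(text: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace so that
    # "Hello, Llama!" and "hello   llama" compare equal.
    cleaned = re.sub(r"[^a-z0-9 ]+", "", text.lower())
    return " ".join(cleaned.split())


def extract_command(transcript: str, wake_phrase: str = WAKE_PHRASE) -> Optional[str]:
    """Return the text spoken after the wake phrase, or None if it was not said."""
    normalized = _normalize(transcript)
    phrase = _normalize(wake_phrase)
    # Word boundaries keep "hello llamas" from counting as the wake phrase.
    match = re.search(rf"\b{re.escape(phrase)}\b", normalized)
    if match is None:
        return None
    return normalized[match.end():].strip()


print(extract_command("Hello, Llama! Make the heading blue."))  # make the heading blue
print(extract_command("Just chatting about llamas."))            # None
```

Matching on normalized text keeps the check tolerant of the punctuation and casing that transcription models tend to vary. A real assistant would then pass the returned command to the LLM.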
-
Llama Code Editor
Create and edit HTML pages with just your voice! Powered by SambaNova Systems.
-
Audio Input/Output with mini-omni2
Build a GPT-4o-like experience with mini-omni2, an audio-native LLM.
-
Talk to Claude
Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.
-
Kyutai Moshi
Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.
-
Talk to Ultravox
Talk to Fixie.ai's audio-native Ultravox LLM with the transformers library.
-
Talk to Llama 3.2 3B
Use the Lepton API to make Llama 3.2 talk back to you!
-
Talk to Qwen2-Audio
Qwen2-Audio is a SOTA audio-to-text LLM developed by Alibaba.
-
YOLOv10 Object Detection
Run the YOLOv10 model on a user's webcam stream in real time!
-
Video Object Detection with RT-DETR
Upload a video and stream out frames with detected objects, powered by the RT-DETR model.
-
Text-to-Speech with Parler
Stream out audio generated by Parler TTS!