Cookbook

  • 🗣👀 Gemini Audio Video Chat


    Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!

    Demo

    Code

  • 🗣 Google Gemini Real Time Voice API


    Talk to Gemini in real time using Google's voice API.

    Demo

    Code

  • 🗣 OpenAI Real Time Voice API


    Talk to ChatGPT in real time using OpenAI's voice API.

    Demo

    Code

  • 🗣 Hello Llama: Stop Word Detection


    A code editor built with Llama 3.3 70b that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!

    Demo

    Code

  • 🤖 Llama Code Editor


    Create and edit HTML pages with just your voice! Powered by SambaNova Systems.

    Demo

    Code

  • 🗣 Audio Input/Output with mini-omni2


    Build a GPT-4o-like experience with mini-omni2, an audio-native LLM.

    Demo

    Code

  • 🗣 Talk to Claude


    Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.

    Demo

    Code

  • 🗣 Kyutai Moshi


    Kyutai's Moshi is a novel speech-to-speech model designed for human conversation.

    Demo

    Code

  • 🗣 Talk to Ultravox


    Talk to Fixie.AI's audio-native Ultravox LLM with the transformers library.

    Demo

    Code

  • 🗣 Talk to Llama 3.2 3b


    Use the Lepton API to make Llama 3.2 talk back to you!

    Demo

    Code

  • 🤖 Talk to Qwen2-Audio


    Qwen2-Audio is a state-of-the-art audio-to-text LLM developed by Alibaba.

    Demo

    Code

  • 📷 YOLOv10 Object Detection


    Run the YOLOv10 model on a user's webcam stream in real time!

    Demo

    Code

  • 📷 Video Object Detection with RT-DETR


    Upload a video and stream out frames with detected objects (powered by the RT-DETR model).

    Demo

    Code

  • 🔈 Text-to-Speech with Parler


    Stream out audio generated by Parler TTS!

    Demo

    Code