Advanced Configuration

Track Constraints

You can specify the track_constraints parameter to control how the data is streamed to the server. The full documentation on track constraints is here.

For example, you can control the size of the frames captured from the webcam like so:

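from gradio_webrtc import WebRTC

# Request 500x500 frames from the webcam, ideally at 30 frames per second.
track_constraints = {
    "width": {"exact": 500},
    "height": {"exact": 500},
    "frameRate": {"ideal": 30},
}
webrtc = WebRTC(track_constraints=track_constraints,
                modality="video",
                mode="send-receive")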
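The constraint values use the browser's MediaTrackConstraints format, where each property can carry keys such as "exact" or "ideal". As a minimal sketch, a hypothetical helper (build_track_constraints is an illustration, not part of gradio_webrtc) could assemble the dictionary and switch between strict and preferred sizes:

```python
def build_track_constraints(width, height, frame_rate=30, strict=True):
    """Hypothetical helper (not part of gradio_webrtc) that builds a
    track_constraints dictionary in the MediaTrackConstraints format."""
    # "exact" asks the browser to match the value or fail;
    # "ideal" is a preference the browser may deviate from.
    size_key = "exact" if strict else "ideal"
    return {
        "width": {size_key: width},
        "height": {size_key: height},
        "frameRate": {"ideal": frame_rate},
    }


# Same constraints as the example above.
track_constraints = build_track_constraints(500, 500)
```

The resulting dictionary can be passed to the component as track_constraints=track_constraints.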

Warning

WebRTC may not enforce your constraints. For example, it may rescale your video (while keeping the same aspect ratio) in order to maintain the desired frame rate or reach a better one. If you really want to enforce height, width and resolution constraints, use the rtp_params parameter and set "degradationPreference" to "maintain-resolution".

image = WebRTC(
    label="Stream",
    mode="send",
    track_constraints=track_constraints,
    rtp_params={"degradationPreference": "maintain-resolution"}
)

The RTC Configuration

You can configure how the connection is created on the client by passing an rtc_configuration parameter to the WebRTC component constructor. See the list of available arguments here.

When deploying on a remote server, an rtc_configuration parameter must be passed in. See Deployment.
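As a rough illustration, and assuming the parameter takes the same shape as the browser's RTCConfiguration dictionary, a minimal configuration that points at a public STUN server might look like this. The server choice is an assumption for the example; many deployments also need a TURN server with credentials, which is not shown here.

```python
# Assumption: a public STUN server is sufficient for this network.
# Restrictive networks usually need a TURN server as well.
rtc_configuration = {
    "iceServers": [
        {"urls": ["stun:stun.l.google.com:19302"]},
    ]
}
```

The dictionary would then be passed to the constructor as rtc_configuration=rtc_configuration.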

Reply on Pause Voice-Activity-Detection

The ReplyOnPause class runs a Voice Activity Detection (VAD) algorithm to determine when a user has stopped speaking.

First, the algorithm determines when the user has started speaking.
Then it groups the audio into chunks.
On each chunk, we determine the length of human speech in the chunk.
If the length of human speech is below a threshold, a pause is detected.

The following parameters control this algorithm:

import gradio as gr
from gradio_webrtc import AlgoOptions, ReplyOnPause, WebRTC

options = AlgoOptions(
    audio_chunk_duration=0.6,  # (1)
    started_talking_threshold=0.2,  # (2)
    speech_threshold=0.1,  # (3)
)

with gr.Blocks() as demo:
    audio = WebRTC(...)
    # Pass the options defined above through the algo_options parameter.
    audio.stream(ReplyOnPause(..., algo_options=options))

demo.launch()

(1) This is the length (in seconds) of audio chunks.
(2) If the chunk has more than 0.2 seconds of speech, the user started talking.
(3) If, after the user started speaking, there is a chunk with less than 0.1 seconds of speech, the user stopped speaking.
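To make the interaction between the two thresholds concrete, here is a simplified sketch of the pause logic described above. It is not the library's implementation: the function detect_pause is hypothetical, and the per-chunk speech durations are supplied directly rather than computed by a VAD model.

```python
def detect_pause(speech_per_chunk,
                 started_talking_threshold=0.2,
                 speech_threshold=0.1):
    """Return the index of the chunk where a pause is detected, or None.

    speech_per_chunk holds the seconds of speech found in each successive
    audio chunk. In the real component these values would come from a VAD
    model; here they are given as plain numbers for illustration.
    """
    started_talking = False
    for index, speech_seconds in enumerate(speech_per_chunk):
        if not started_talking:
            # Wait for a chunk with enough speech to count as talking.
            if speech_seconds > started_talking_threshold:
                started_talking = True
        elif speech_seconds < speech_threshold:
            # After talking has started, a near-silent chunk is a pause.
            return index
    return None


# Silence, then speech, then a quiet chunk: the pause is found at index 3.
print(detect_pause([0.0, 0.4, 0.5, 0.05, 0.3]))
```

Note that a quiet chunk before the user has started talking does not count as a pause, which is why the first chunk in the example is ignored.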

Stream Handler Input Audio

You can configure the sampling rate of the audio passed to the ReplyOnPause or StreamHandler instance with the input_sampling_rate parameter. The current default is 48000 Hz.

import gradio as gr
from gradio_webrtc import ReplyOnPause, WebRTC

with gr.Blocks() as demo:
    audio = WebRTC(...)
    # Deliver input audio to the handler at 24000 Hz instead of the default.
    audio.stream(ReplyOnPause(..., input_sampling_rate=24000))

demo.launch()

Stream Handler Output Audio

You can configure the output audio chunk size of ReplyOnPause (and any StreamHandler) with the output_sample_rate and output_frame_size parameters.

The following code (which uses the default values of these parameters) states that each output chunk will be a frame of 960 samples at a sample rate of 24,000 Hz, so each chunk corresponds to 960 / 24,000 = 0.04 seconds of audio.

import gradio as gr
from gradio_webrtc import ReplyOnPause, WebRTC

with gr.Blocks() as demo:
    audio = WebRTC(...)
    # Default values: 960-sample frames at 24000 Hz.
    audio.stream(ReplyOnPause(..., output_sample_rate=24000, output_frame_size=960))

demo.launch()

Tip

In general it is best to leave these settings untouched. In some cases, lowering the output_frame_size can yield smoother audio playback.
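The relationship between the two output parameters is plain arithmetic: the duration of one frame is the frame size divided by the sample rate. A small sketch (frame_duration_seconds is a hypothetical helper, not part of the library) shows how a smaller frame size shortens each chunk:

```python
def frame_duration_seconds(output_frame_size, output_sample_rate):
    """Seconds of audio covered by one output frame."""
    return output_frame_size / output_sample_rate


print(frame_duration_seconds(960, 24000))  # the default values
print(frame_duration_seconds(480, 24000))  # half the default frame size
```

With the defaults each frame covers 0.04 seconds; halving output_frame_size to 480 halves that to 0.02 seconds.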

Audio Icon

You can display an icon of your choice instead of the default wave animation for audio streaming. Pass any local path or URL to an image (svg, png, jpeg) to the component's icon parameter. This will display the icon as a circular button. When audio is sent or received (depending on the mode parameter), a pulse animation will emanate from the button.

You can control the button color and pulse color with the icon_button_color and pulse_color parameters. They can take any valid CSS color.

Code

audio = WebRTC(
    label="Stream",
    rtc_configuration=rtc_configuration,
    mode="receive",
    modality="audio",
    icon="phone-solid.svg",
)

Code with custom colors

audio = WebRTC(
    label="Stream",
    rtc_configuration=rtc_configuration,
    mode="receive",
    modality="audio",
    icon="phone-solid.svg",
    icon_button_color="black",
    pulse_color="black",
)

Changing the Button Text

You can supply a button_labels dictionary to change the text of the Start, Stop and Waiting buttons shown in the UI. The keys must be "start", "stop", and "waiting".

webrtc = WebRTC(
    label="Video Chat",
    modality="audio-video",
    mode="send-receive",
    button_labels={"start": "Start Talking to Gemini"}
)
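Because only three keys are recognized, a misspelled key is easy to miss. As a minimal sketch, a hypothetical helper (check_button_labels is not part of gradio_webrtc) could validate the dictionary before it is passed to the component. It assumes, as the example above suggests, that supplying only some of the keys is acceptable.

```python
ALLOWED_BUTTON_KEYS = {"start", "stop", "waiting"}


def check_button_labels(button_labels):
    """Hypothetical helper: reject keys other than start, stop, waiting."""
    unknown = set(button_labels) - ALLOWED_BUTTON_KEYS
    if unknown:
        raise ValueError(f"Unknown button_labels keys: {sorted(unknown)}")
    return button_labels


button_labels = check_button_labels({"start": "Start Talking to Gemini"})
```

The validated dictionary can then be passed as button_labels=button_labels.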