API Reference

Clients

class discord.ext.native_voice.VoiceClient

A native voice client with audio, video, and media receive support.

This client extends discord.VoiceClient with native RTP transport crypto, video send/receive, RTX/NACK handling, and media sinks.

property codecs

Codecs advertised by this client.

Type

Tuple[discord.VoiceCodec, …]

classmethod with_config(*, rtx=..., udp_qos=..., codecs=..., video_streams=..., ffmpeg_executable=..., enable_debug_stats=...)

Return a subclass with voice negotiation options preset.

These options affect voice protocol identification, so they must be present on the class passed to channel.connect.

Parameters
  • rtx (bool) – Whether to enable RTX support. This will cause the client to advertise RTX payload types for video codecs and to use RTX for video retransmissions if the voice server negotiates it. Enabling RTX may increase bandwidth usage, but can improve video quality on lossy connections. It is enabled by default.

  • udp_qos (bool) – Whether to request UDP QoS marking for the voice socket. This marks outgoing media packets with Discord’s native DSCP value on platforms that allow it, which may improve prioritisation on supported networks. It is disabled by default.

  • codecs (List[discord.VoiceCodec]) – Codec objects to advertise. When omitted, codecs are generated from local FFmpeg capabilities and sorted by the local hardware/software capability score. When provided, the codec order is preserved and priorities are recomputed after unsupported entries are skipped.

  • video_streams (List[discord.VoiceStream]) – The simulcast streams advertised to the server. Defaults to a single max quality stream.

  • ffmpeg_executable (str) – FFmpeg executable used for automatic local codec capability probing.

  • enable_debug_stats (bool) – Whether to collect debug counters for RTP/RTCP receive diagnostics. This is disabled by default to avoid performance issues.

Returns

A configured subclass of this voice client.

Return type

Type[VoiceClient]

Raises

ValueError – codecs does not include an Opus audio codec.
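The idea of a classmethod that returns a preconfigured subclass can be illustrated with a stand-in class. The sketch below does not import the library: FakeVoiceClient and its _config attribute are hypothetical, and the real implementation may differ. It only shows why the options live on the class itself, which is what gets passed to channel.connect.

```python
from typing import Any, Dict, Type


class FakeVoiceClient:
    """Hypothetical stand-in; not the real discord.ext.native_voice.VoiceClient."""

    # Class-level defaults mirroring the documented defaults for rtx and udp_qos.
    _config: Dict[str, Any] = {"rtx": True, "udp_qos": False}

    @classmethod
    def with_config(cls, **options: Any) -> Type["FakeVoiceClient"]:
        # Merge the overrides into a copy so the base class stays untouched.
        merged = {**cls._config, **options}
        # Build a new subclass that carries the preset options on the class.
        return type(f"Configured{cls.__name__}", (cls,), {"_config": merged})


# The configured class, not an instance, is what would be handed to connect.
QosVoiceClient = FakeVoiceClient.with_config(udp_qos=True)
```

With the real client, the equivalent call would presumably look like channel.connect(cls=VoiceClient.with_config(udp_qos=True)).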

property video_streams

Simulcast streams advertised by this client.

Type

Tuple[discord.VoiceStream, …]

await create_stream(*, timeout=30.0, reconnect=True, cls=...)

This function is a coroutine.

Create a Go Live stream from the current voice channel.

Parameters
  • timeout (float) – The number of seconds to wait for the stream RTC connection.

  • reconnect (bool) – Whether the stream protocol should attempt reconnects.

  • cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with. Defaults to StreamClient.

Returns

The connected stream RTC client.

Return type

StreamProtocol

Raises

discord.ClientException – The voice client is not connected or the voice session is not ready.

await disconnect(*, force=False)

This function is a coroutine.

Disconnects this voice client from voice.

property negotiated_video_codec

The video codec selected by the voice server.

Type

Optional[str]

send_audio_packet(data, *, encode=True)

Sends an audio packet composed of the data.

You must be connected to play audio.

Parameters
  • data (bytes) – The bytes-like object denoting PCM or Opus voice data.

  • encode (bool) – Indicates if data should be encoded into Opus.

Raises
  • ClientException – You are not connected.

  • opus.OpusError – Encoding the data failed.

play(source, *, after=None, application='audio', bitrate=64, fec=True, expected_packet_loss=0.0, bandwidth='full', signal_type='auto', video_width=None, video_height=None, video_fps=None, video_bitrate=None)

Play an audio or media source.

This extends discord.VoiceClient.play() with MediaSource support. When the source has video, the client starts the negotiated video transport before sending frames.

The finalizer, after, is called after the source has been exhausted or an error has occurred.

If an error happens while the media player is running, the exception is caught and the player is then stopped. If no after callback is passed, any caught exception will be logged using the library logger.

Extra parameters may be passed to the internal Opus encoder if a PCM-based audio source is used. Otherwise, they are ignored.

Parameters
  • source (discord.AudioSource) – The audio or media source to play.

  • after (Callable[[Optional[Exception]], Any]) – The finalizer that is called after the stream is exhausted. This function must have a single parameter, error, that denotes an optional exception that was raised during playing.

  • application (str) – Configures the encoder’s intended application. Can be one of: 'audio', 'voip', 'lowdelay'. Defaults to 'audio'.

  • bitrate (int) – Configures the bitrate in the audio encoder. Can be between 16 and 512. Defaults to 64.

  • fec (bool) – Configures the encoder’s use of inband forward error correction. Defaults to True.

  • expected_packet_loss (float) – Configures the encoder’s expected packet loss percentage. Requires FEC. Defaults to 0.0.

  • bandwidth (str) – Configures the encoder’s bandpass. Can be one of: 'narrow', 'medium', 'wide', 'superwide', 'full'. Defaults to 'full'.

  • signal_type (str) – Configures the type of signal being encoded. Can be one of: 'auto', 'voice', 'music'. Defaults to 'auto'.

  • video_width (Optional[int]) – Video width used when the source does not provide a VideoConfig.

  • video_height (Optional[int]) – Video height used when the source does not provide a VideoConfig.

  • video_fps (Optional[int]) – Video frame rate override.

  • video_bitrate (Optional[int]) – Video bitrate override in bits per second.

Raises

await start_video(*, width, height, fps=30, bitrate=0)

This function is a coroutine.

Start outbound video using the negotiated video codec.

This is called automatically by play() and should not be called by the user in most cases.

Parameters
  • width (int) – Encoded video width.

  • height (int) – Encoded video height.

  • fps (int) – Encoded frame rate.

  • bitrate (int) – Target bitrate in bits per second.

Raises

discord.ClientException – The voice client is not connected, or no video codec was negotiated.

await request_video(ssrc, *, quality=100, any=..., pixel_count=None)

This function is a coroutine.

Request that Discord forwards video for an SSRC.

Parameters
  • ssrc (int) – The video SSRC to request.

  • quality (int) – The requested stream quality.

  • any (Optional[int]) – The fallback quality request for otherwise unspecified streams.

  • pixel_count (Optional[int]) – Optional pixel-count hint sent with the media sink wants payload.

Raises

discord.ClientException – The voice client is not connected.

await stop_video()

This function is a coroutine.

Stop outbound video and reset video transport state.

This is called automatically by the player and should not be called by the user in most cases.

send_video_frame(frame, *, frame_time_ms=33.0, stream=None)

Packetize, encrypt, and send one encoded video frame.

Parameters
  • frame (bytes) – The encoded frame in the negotiated codec.

  • frame_time_ms (float) – The frame duration in milliseconds.

  • stream (Optional[discord.VoiceStream]) – The active simulcast stream to send on. Defaults to the selected primary stream.

Returns

The number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, no active stream is selected, the stream is inactive, or the stream has no negotiated SSRC.
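Since frame_time_ms is the duration of a single frame, it follows from the frame rate. The default of 33.0 is close to the 1000 / 30 ≈ 33.3 ms implied by the default fps=30 of start_video(). The helper below is a hypothetical convenience, not part of the library.

```python
def frame_time_ms(fps: float) -> float:
    """Return the duration of one frame in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```

A caller sending 25 fps video would then pass frame_time_ms=frame_time_ms(25) to send_video_frame().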

send_video_frames(frames, /)

Send RID-keyed encoded frames for active simulcast streams.

Parameters

frames (Dict[str, VideoFrame]) – Mapping of stream RID to encoded frame.

Returns

The total number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, an active stream has no negotiated SSRC, or no active stream is selected.

listen(sink, *, after=None)

Listen for inbound native media packets.

Parameters
  • sink (Union[MediaSink, Callable[[MediaPacket], Any]]) – The sink or callback that receives decoded media packets.

  • after (Optional[Callable[[Optional[Exception]], Any]]) – A callback called after listening stops.

Raises
  • discord.ClientException – The voice client is not connected, is already listening, or sink is already registered as a child or closed.

  • TypeError – sink is not a MediaSink or callable, or after is not callable.
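Because sink may be a plain callable, a small callable object can serve as a receiver. The sketch below treats each packet as opaque and assumes nothing about MediaPacket attributes. PacketCollector is a hypothetical name, and the strings fed to it are placeholders rather than real packets.

```python
from collections import deque
from typing import Any, Deque


class PacketCollector:
    """Callable that keeps only the most recent packets it receives."""

    def __init__(self, max_packets: int = 256) -> None:
        # A bounded deque drops the oldest packet once the limit is reached.
        self.packets: Deque[Any] = deque(maxlen=max_packets)
        self.total = 0

    def __call__(self, packet: Any) -> None:
        # The packet is stored as-is; no MediaPacket fields are inspected.
        self.total += 1
        self.packets.append(packet)


collector = PacketCollector(max_packets=2)
for fake_packet in ("p1", "p2", "p3"):  # placeholders, not real MediaPacket objects
    collector(fake_packet)
```

With a connected client, the collector would be registered with listen(collector).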

property sink

The current media receive sink, if one was provided to listen().

This property can also be used to change the active sink while receiving. The old sink is detached but not cleaned up.

Type

MediaSink

set_sink(sink, /)

Changes the active receive sink and returns the previous sink.

The old sink is detached without running MediaSink.cleanup(), so callers that keep it should clean it up explicitly when they are done.

Parameters

sink (MediaSink) – The sink to use.

Returns

The previous active sink, if any.

Return type

Optional[MediaSink]

Raises

is_listening()

bool: Whether this client is currently receiving media packets.

stop_listening()

Stop receiving media packets and clean up the active sink.
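set_sink() detaches the previous sink without cleaning it up, whereas stop_listening() cleans up only the active sink. A caller that swaps sinks may therefore want to clean up the old one explicitly. The following is a sketch using only the documented set_sink() return value and MediaSink.cleanup(); swap_sink and the two fake classes are hypothetical.

```python
from typing import Any, Optional


def swap_sink(client: Any, new_sink: Any) -> Optional[Any]:
    """Install new_sink and clean up the sink it replaces."""
    previous = client.set_sink(new_sink)  # returns the previous sink, if any
    if previous is not None:
        previous.cleanup()  # set_sink() does not run cleanup() itself
    return previous


class _FakeSink:
    """Hypothetical stand-in for a MediaSink."""

    def __init__(self) -> None:
        self.cleaned = False

    def cleanup(self) -> None:
        self.cleaned = True


class _FakeClient:
    """Hypothetical stand-in that mimics the documented set_sink() contract."""

    def __init__(self, sink: Optional[_FakeSink] = None) -> None:
        self.active = sink

    def set_sink(self, sink: _FakeSink) -> Optional[_FakeSink]:
        previous, self.active = self.active, sink
        return previous


old_sink, new_sink = _FakeSink(), _FakeSink()
client = _FakeClient(old_sink)
returned = swap_sink(client, new_sink)
```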

property average_latency

Average of the 20 most recent HEARTBEAT latencies, in seconds.

New in version 1.4.

Type

float
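A rolling average over the most recent samples can be sketched with a bounded deque. This illustrates the idea only and is not the library's implementation. LatencyWindow is a hypothetical name, and returning infinity before any sample exists is an assumption.

```python
from collections import deque
from typing import Deque


class LatencyWindow:
    """Keeps the most recent latency samples and averages them."""

    def __init__(self, size: int = 20) -> None:
        # Older samples fall off automatically once the window is full.
        self._samples: Deque[float] = deque(maxlen=size)

    def record(self, latency: float) -> None:
        self._samples.append(latency)

    @property
    def average(self) -> float:
        if not self._samples:
            return float("inf")  # assumption: no heartbeat has been measured yet
        return sum(self._samples) / len(self._samples)


window = LatencyWindow(size=3)
for sample in (1.0, 2.0, 3.0, 4.0):
    window.record(sample)
```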

property endpoint

The endpoint we are connecting to.

Type

str

get_stream(owner)

Optional[Stream]: Returns a known Go Live stream by owner ID for this voice connection.

New in version 2.2.

Parameters

owner (Snowflake) – The owner of the stream.

Returns

The stream if found.

Return type

Optional[Stream]

property guild

The guild we’re connected to, if applicable.

Type

Optional[Guild]

is_connected()

Indicates if the voice client is connected to voice.

is_paused()

Indicates whether audio playback is currently paused.

is_playing()

Indicates if we’re currently playing audio.

property latency

Latency between a HEARTBEAT and a HEARTBEAT_ACK in seconds.

This could be referred to as the Discord Voice WebSocket latency and is an analogue of the user’s voice latency as seen in the Discord client.

New in version 1.4.

Type

float

await move_to(channel, *, timeout=30.0)

This function is a coroutine.

Moves you to a different voice channel.

Parameters
  • channel (Optional[Snowflake]) – The channel to move to. Must be a voice channel.

  • timeout (Optional[float]) –

    How long to wait for the move to complete.

    New in version 2.1.

Raises

asyncio.TimeoutError – The move did not complete in time, but may still be ongoing.
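Since a timed-out move may still be ongoing, callers may prefer to treat the timeout as a soft failure rather than let it propagate. A minimal sketch follows, where try_move and _FakeClient are hypothetical and only the documented move_to(channel, timeout=...) call is assumed.

```python
import asyncio
from typing import Any


async def try_move(voice_client: Any, channel: Any, *, timeout: float = 30.0) -> bool:
    """Return True if the move completed in time, False if it timed out."""
    try:
        await voice_client.move_to(channel, timeout=timeout)
    except asyncio.TimeoutError:
        # Per the documentation, the move may still complete later.
        return False
    return True


class _FakeClient:
    """Hypothetical stand-in that either succeeds or times out."""

    def __init__(self, times_out: bool) -> None:
        self.times_out = times_out

    async def move_to(self, channel: Any, *, timeout: float) -> None:
        if self.times_out:
            raise asyncio.TimeoutError


moved = asyncio.run(try_move(_FakeClient(times_out=False), object()))
timed_out = asyncio.run(try_move(_FakeClient(times_out=True), object()))
```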

pause()

Pauses the audio playing.

resume()

Resumes the audio playing.

property session_id

The voice connection session ID.

Type

str

property source

The audio source being played, if playing.

This property can also be used to change the audio source currently being played.

Type

Optional[AudioSource]

stop()

Stops playing audio.

property stream_clients

The Go Live stream clients attached to this voice connection.

New in version 2.2.

Type

Tuple[StreamProtocol]

property streams

The Go Live streams known for this voice connection.

New in version 2.2.

Type

Tuple[Stream]

property token

The voice connection token.

Type

str

await update_speaking_state(flags)

Update the current speaking flags.

Parameters

flags (SpeakingFlags) – The new speaking flags.

property user

The user connected to voice (i.e. ourselves).

Type

ClientUser

property voice_privacy_code

Get the voice privacy code of this E2EE session’s group.

A new privacy code is created and cached each time a new transition is executed. This can be None if there is no active DAVE session.

New in version 2.1.

Type

str

await watch_stream(stream_key, *, timeout=30.0, reconnect=True, cls)

This function is a coroutine.

Watches a Go Live stream by stream key and connects with the provided stream protocol.

This is useful when the stream is not already cached. If the stream is cached, this delegates to Stream.watch().

New in version 2.2.

Parameters
  • stream_key (StreamKey) – The stream key to watch.

  • timeout (float) – The timeout in seconds to wait for the stream connection to complete.

  • reconnect (bool) – Whether the stream protocol should attempt reconnects.

  • cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with.

Raises

ClientException – You are not connected to the stream’s voice channel, or you tried to watch your own stream.

Returns

The connected stream protocol.

Return type

StreamProtocol

class discord.ext.native_voice.StreamClient

A native RTC client for a Discord Go Live stream.

Stream clients are created from VoiceClient.create_stream() or discord.Stream.watch(). By default, stream clients inherit codec and RTX policy from their parent VoiceClient, but stream protocol subclasses can override their own negotiation config.

property codecs

Codecs advertised by this stream RTC client.

Type

Tuple[discord.VoiceCodec, …]

classmethod with_config(*, rtx=..., udp_qos=..., codecs=..., video_streams=..., ffmpeg_executable=..., enable_debug_stats=...)

Return a subclass with stream RTC negotiation options preset.

Omitted codec and RTX options inherit from the parent native voice client. Options provided here apply only to the stream RTC transport.

These options affect voice protocol identification, so they must be present on the class passed to create_stream or similar.

Parameters
  • rtx (bool) – Whether to enable RTX support. This will cause the client to advertise RTX payload types for video codecs and to use RTX for video retransmissions if the voice server negotiates it. Enabling RTX may increase bandwidth usage, but can improve video quality on lossy connections. It is enabled by default.

  • udp_qos (bool) – Whether to request UDP QoS marking for the voice socket. This marks outgoing media packets with Discord’s native DSCP value on platforms that allow it, which may improve prioritisation on supported networks. It is disabled by default.

  • codecs (List[discord.VoiceCodec]) – Codec objects to advertise. When omitted, codecs are generated from local FFmpeg capabilities and sorted by the local hardware/software capability score. When provided, the codec order is preserved and priorities are recomputed after unsupported entries are skipped.

  • video_streams (List[discord.VoiceStream]) – The simulcast streams advertised to the server. Defaults to a single max quality stream.

  • ffmpeg_executable (str) – FFmpeg executable used for automatic local codec capability probing when this stream client does not inherit parent codecs.

  • enable_debug_stats (bool) – Whether to collect debug RTP/RTCP receive counters. When omitted, the stream RTC client inherits the parent voice client’s setting.

Returns

A configured subclass of this stream client.

Return type

Type[StreamClient]

Raises

ValueError – codecs does not include an Opus audio codec.

play(source, *, preview_provider=..., **kwargs)

Play media on the stream RTC transport.

This extends play() with stream preview provider support.

The finalizer, after, is called after the source has been exhausted or an error has occurred.

If an error happens while the media player is running, the exception is caught and the player is then stopped. If no after callback is passed, any caught exception will be logged using the library logger.

Extra parameters may be passed to the internal Opus encoder if a PCM-based audio source is used. Otherwise, they are ignored.

Parameters
  • source (discord.AudioSource) – The audio or media source to play.

  • after (Callable[[Optional[Exception]], Any]) – The finalizer that is called after the stream is exhausted. This function must have a single parameter, error, that denotes an optional exception that was raised during playing.

  • application (str) – Configures the encoder’s intended application. Can be one of: 'audio', 'voip', 'lowdelay'. Defaults to 'audio'.

  • bitrate (int) – Configures the bitrate in the audio encoder. Can be between 16 and 512. Defaults to 64.

  • fec (bool) – Configures the encoder’s use of inband forward error correction. Defaults to True.

  • expected_packet_loss (float) – Configures the encoder’s expected packet loss percentage. Requires FEC. Defaults to 0.0.

  • bandwidth (str) – Configures the encoder’s bandpass. Can be one of: 'narrow', 'medium', 'wide', 'superwide', 'full'. Defaults to 'full'.

  • signal_type (str) – Configures the type of signal being encoded. Can be one of: 'auto', 'voice', 'music'. Defaults to 'auto'.

  • video_width (Optional[int]) – Video width used when the source does not provide a VideoConfig.

  • video_height (Optional[int]) – Video height used when the source does not provide a VideoConfig.

  • video_fps (Optional[int]) – Video frame rate override.

  • video_bitrate (Optional[int]) – Video bitrate override in bits per second.

  • preview_provider (Optional[Callable[[], Optional[bytes]]]) – A callable returning image preview bytes. By default, the media source’s preview reader is used, if available.

Raises
  • discord.ClientException – Already playing media or not connected, you do not own this stream, or a preview was requested without a media source or preview provider.

  • TypeError – Source is not an AudioSource or after is not callable.

  • discord.opus.OpusNotLoaded – Source is not opus encoded and opus is not loaded.

  • ValueError – An improper value was passed as an encoder parameter.
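The after finalizer takes a single error parameter, which is None when the source was exhausted normally. A minimal sketch of such a callback is shown below. The logger name and the finished list are hypothetical, and the two direct calls only simulate how a player might invoke it.

```python
import logging
from typing import List, Optional

log = logging.getLogger("example.playback")  # hypothetical logger name
finished: List[Optional[Exception]] = []


def on_playback_done(error: Optional[Exception]) -> None:
    """Finalizer with the single `error` parameter that `after` requires."""
    finished.append(error)
    if error is not None:
        log.error("playback stopped with an error: %r", error)
    else:
        log.info("playback finished")


# Simulate the two ways the finalizer can be invoked.
on_playback_done(None)
on_playback_done(RuntimeError("encoder failed"))
```

With a connected client, this would be passed as play(source, after=on_playback_done).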

await disconnect(*, force=False)

This function is a coroutine.

Disconnect this stream RTC client and clean up stream playback.

set_preview_provider(provider, /)

Set the callable used for stream preview uploads.

Parameters

provider (Optional[Callable[[], Optional[bytes]]]) – The preview provider to use, or None to clear it.

start_preview_loop(provider=None, /, *, interval=300.0, retry_interval=60.0, start_delay=0.5)

Start periodic stream preview uploads.

The interval parameters default to the values used by the Discord client.

Parameters
  • provider (Optional[Callable[[], Optional[bytes]]]) – The preview provider to use. When omitted, the current provider is reused.

  • interval (float) – Number of seconds between successful preview uploads.

  • retry_interval (float) – Number of seconds to wait after a skipped or failed preview upload.

  • start_delay (float) – Number of seconds to wait before the first preview upload attempt.

Raises

discord.ClientException – This client does not own the stream or no preview provider is set.
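The three timing parameters can be read as a simple schedule: wait start_delay before the first attempt, then interval after a successful upload or retry_interval after a skipped or failed one. The sketch below models only that reading of the documented parameters. preview_delays is a hypothetical helper, and the actual loop may behave differently.

```python
from typing import Iterable, List


def preview_delays(
    outcomes: Iterable[bool],
    *,
    interval: float = 300.0,
    retry_interval: float = 60.0,
    start_delay: float = 0.5,
) -> List[float]:
    """Return the wait before each upload attempt.

    Each outcome is True for a successful upload and False for a skipped or
    failed one. The first entry is the wait before the first attempt; every
    later entry is the wait that follows the corresponding outcome.
    """
    delays = [start_delay]
    for succeeded in outcomes:
        delays.append(interval if succeeded else retry_interval)
    return delays


schedule = preview_delays([True, False, True])
```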

stop_preview_loop()

Stop periodic stream preview uploads.

await on_stream_create(stream)

This function is a coroutine.

An event handler called when the connected stream is created.

This mirrors on_stream_create() for the stream protocol instance that is connected to the stream.

Parameters

stream (Stream) – The stream that was created.

await on_stream_available(stream)

This function is a coroutine.

An event handler called when the connected stream becomes available again.

This is dispatched after a stream that was previously marked Stream.unavailable receives a new create event.

Parameters

stream (Stream) – The stream that became available.

await on_stream_server_update(data)

This function is a coroutine.

An event handler called when Discord sends the stream RTC server data.

This event is used by stream protocol implementations to finish or resume the stream RTC connection. Unlike the public stream events, this exposes the raw gateway payload because it contains the stream token and endpoint.

Parameters

data (dict) – The raw stream server update gateway payload.

await on_stream_update(_before, after)

This function is a coroutine.

An event handler called when the connected stream is updated.

Parameters
  • before (Stream) – The stream before the update.

  • after (Stream) – The stream after the update.

await on_stream_unavailable(stream)

This function is a coroutine.

An event handler called when the connected stream becomes temporarily unavailable.

Unavailable streams remain cached and may later dispatch on_stream_available() when Discord reports the stream again.

Parameters

stream (Stream) – The stream that became unavailable.

await on_stream_delete(_stream, _reason)

This function is a coroutine.

An event handler called when the connected stream is deleted.

Parameters
  • stream (Stream) – The stream that was deleted.

  • reason (StreamDeleteReason) – The reason the stream was deleted or rejected.

property average_latency

Average of the 20 most recent HEARTBEAT latencies, in seconds.

New in version 1.4.

Type

float

await create_stream(*, timeout=30.0, reconnect=True, cls=...)

This function is a coroutine.

Create a Go Live stream from the current voice channel.

Parameters
  • timeout (float) – The number of seconds to wait for the stream RTC connection.

  • reconnect (bool) – Whether the stream protocol should attempt reconnects.

  • cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with. Defaults to StreamClient.

Returns

The connected stream RTC client.

Return type

StreamProtocol

Raises

discord.ClientException – The voice client is not connected or the voice session is not ready.

property endpoint

The endpoint we are connecting to.

Type

str

get_stream(owner)

Optional[Stream]: Returns a known Go Live stream by owner ID for this voice connection.

New in version 2.2.

Parameters

owner (Snowflake) – The owner of the stream.

Returns

The stream if found.

Return type

Optional[Stream]

property guild

The guild we’re connected to, if applicable.

Type

Optional[Guild]

is_connected()

Indicates if the voice client is connected to voice.

is_listening()

bool: Whether this client is currently receiving media packets.

is_paused()

Indicates whether audio playback is currently paused.

is_playing()

Indicates if we’re currently playing audio.

property latency

Latency between a HEARTBEAT and a HEARTBEAT_ACK in seconds.

This could be referred to as the Discord Voice WebSocket latency and is an analogue of the user’s voice latency as seen in the Discord client.

New in version 1.4.

Type

float

listen(sink, *, after=None)

Listen for inbound native media packets.

Parameters
  • sink (Union[MediaSink, Callable[[MediaPacket], Any]]) – The sink or callback that receives decoded media packets.

  • after (Optional[Callable[[Optional[Exception]], Any]]) – A callback called after listening stops.

Raises
  • discord.ClientException – The voice client is not connected, is already listening, or sink is already registered as a child or closed.

  • TypeError – sink is not a MediaSink or callable, or after is not callable.

await move_to(channel, *, timeout=30.0)

This function is a coroutine.

Moves you to a different voice channel.

Parameters
  • channel (Optional[Snowflake]) – The channel to move to. Must be a voice channel.

  • timeout (Optional[float]) –

    How long to wait for the move to complete.

    New in version 2.1.

Raises

asyncio.TimeoutError – The move did not complete in time, but may still be ongoing.

property negotiated_video_codec

The video codec selected by the voice server.

Type

Optional[str]

pause()

Pauses the audio playing.

await request_video(ssrc, *, quality=100, any=..., pixel_count=None)

This function is a coroutine.

Request that Discord forwards video for an SSRC.

Parameters
  • ssrc (int) – The video SSRC to request.

  • quality (int) – The requested stream quality.

  • any (Optional[int]) – The fallback quality request for otherwise unspecified streams.

  • pixel_count (Optional[int]) – Optional pixel-count hint sent with the media sink wants payload.

Raises

discord.ClientException – The voice client is not connected.

resume()

Resumes the audio playing.

send_audio_packet(data, *, encode=True)

Sends an audio packet composed of the data.

You must be connected to play audio.

Parameters
  • data (bytes) – The bytes-like object denoting PCM or Opus voice data.

  • encode (bool) – Indicates if data should be encoded into Opus.

Raises
  • ClientException – You are not connected.

  • opus.OpusError – Encoding the data failed.

send_video_frame(frame, *, frame_time_ms=33.0, stream=None)

Packetize, encrypt, and send one encoded video frame.

Parameters
  • frame (bytes) – The encoded frame in the negotiated codec.

  • frame_time_ms (float) – The frame duration in milliseconds.

  • stream (Optional[discord.VoiceStream]) – The active simulcast stream to send on. Defaults to the selected primary stream.

Returns

The number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, no active stream is selected, the stream is inactive, or the stream has no negotiated SSRC.

send_video_frames(frames, /)

Send RID-keyed encoded frames for active simulcast streams.

Parameters

frames (Dict[str, VideoFrame]) – Mapping of stream RID to encoded frame.

Returns

The total number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, an active stream has no negotiated SSRC, or no active stream is selected.

property session_id

The voice connection session ID.

Type

str

set_sink(sink, /)

Changes the active receive sink and returns the previous sink.

The old sink is detached without running MediaSink.cleanup(), so callers that keep it should clean it up explicitly when they are done.

Parameters

sink (MediaSink) – The sink to use.

Returns

The previous active sink, if any.

Return type

Optional[MediaSink]

Raises

property sink

The current media receive sink, if one was provided to listen().

This property can also be used to change the active sink while receiving. The old sink is detached but not cleaned up.

Type

MediaSink

property source

The audio source being played, if playing.

This property can also be used to change the audio source currently being played.

Type

Optional[AudioSource]

await start_video(*, width, height, fps=30, bitrate=0)

This function is a coroutine.

Start outbound video using the negotiated video codec.

This is called automatically by play() and should not be called by the user in most cases.

Parameters
  • width (int) – Encoded video width.

  • height (int) – Encoded video height.

  • fps (int) – Encoded frame rate.

  • bitrate (int) – Target bitrate in bits per second.

Raises

discord.ClientException – The voice client is not connected, or no video codec was negotiated.

stop()

Stops playing audio.

stop_listening()

Stop receiving media packets and clean up the active sink.

await stop_video()

This function is a coroutine.

Stop outbound video and reset video transport state.

This is called automatically by the player and should not be called by the user in most cases.

property stream_clients

The Go Live stream clients attached to this voice connection.

New in version 2.2.

Type

Tuple[StreamProtocol]

property stream_key

The stream key being connected to.

Type

StreamKey

property streams

The Go Live streams known for this voice connection.

New in version 2.2.

Type

Tuple[Stream]

property token

The voice connection token.

Type

str

await update_speaking_state(flags)

Update the current speaking flags.

Parameters

flags (SpeakingFlags) – The new speaking flags.

property user

The user connected to voice (i.e. ourselves).

Type

ClientUser

property video_streams

Simulcast streams advertised by this client.

Type

Tuple[discord.VoiceStream, …]

property voice_privacy_code

Get the voice privacy code of this E2EE session’s group.

A new privacy code is created and cached each time a new transition is executed. This can be None if there is no active DAVE session.

New in version 2.1.

Type

str

await watch_stream(stream_key, *, timeout=30.0, reconnect=True, cls)

This function is a coroutine.

Watches a Go Live stream by stream key and connects with the provided stream protocol.

This is useful when the stream is not already cached. If the stream is cached, this delegates to Stream.watch().

New in version 2.2.

Parameters
  • stream_key (StreamKey) – The stream key to watch.

  • timeout (float) – The timeout in seconds to wait for the stream connection to complete.

  • reconnect (bool) – Whether the stream protocol should attempt reconnects.

  • cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with.

Raises

ClientException – You are not connected to the stream’s voice channel, or you tried to watch your own stream.

Returns

The connected stream protocol.

Return type

StreamProtocol

Media Sources

class discord.ext.native_voice.MediaSource

An audio source that can also yield encoded video frames.

video_realtime

Whether video frame pacing should track wall-clock capture timing.

Type

bool

video_retry_delay

Delay used before retrying video reads that temporarily return no frames.

Type

float

video_catchup_frames

Maximum number of video frames to send in one player tick while catching up.

Type

int
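One way to picture a catch-up cap is as a limit on how many frames a player sends in one tick when it has fallen behind the clock. The sketch below is an assumption about how such a cap could be applied, not a description of the library's player. frames_due and all of its parameters are hypothetical.

```python
def frames_due(elapsed_s: float, frames_sent: int, fps: float, catchup_limit: int) -> int:
    """Frames to send this tick: those owed by the clock, capped by the limit."""
    # Frames that should have been sent by now, minus those already sent.
    owed = int(elapsed_s * fps) - frames_sent
    # Never send a negative count, and never exceed the catch-up limit.
    return max(0, min(owed, catchup_limit))
```

For example, a player that is five frames behind with a limit of three would send three frames on this tick and the rest on later ticks.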

has_audio()

bool: Whether this source currently has audio to read.

read_video()

Read one encoded video frame for the primary stream.

Returns

The next encoded video frame, if one is available.

Return type

Optional[VideoFrame]

read_video_streams(streams)

Read encoded video frames for the active outbound simulcast streams.

The default implementation preserves the single-stream read_video() behaviour and returns a frame for the first active stream only. Sources that can encode multiple simulcast outputs should override this and return RID-keyed frames for each stream they are able to produce on this tick.

Parameters

streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.

Returns

A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.

Return type

Optional[Mapping[str, VideoFrame]]
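The three documented return shapes (None, an empty mapping, and a non-empty mapping) carry different meanings, so a consumer has to distinguish None from an empty mapping explicitly. A minimal sketch follows. classify_video_read and its string labels are hypothetical, and the frame values are treated as opaque.

```python
from typing import Any, Mapping, Optional


def classify_video_read(result: Optional[Mapping[str, Any]]) -> str:
    """Map a read_video_streams() result to one of three states."""
    if result is None:
        return "finished"  # the video lane has ended
    if not result:
        return "pending"  # no frame is ready yet; try again later
    return "frames"  # at least one RID-keyed frame is available
```

Checking `result is None` before truthiness matters here, because an empty mapping is also falsy but means something different.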

read_preview()

Read image preview bytes for a Go Live stream preview.

Returns

Encoded image bytes for a stream preview, if available.

Return type

Optional[Union[bytes, bytearray, memoryview]]

has_video()

bool: Whether this source currently has video to read.

supports_simulcast()

bool: Whether read_video_streams() can emit multiple video outputs.

property video_config

Video parameters known by this source.

Type

Optional[VideoConfig]

on_media_sink_wants(wants)

Handle a remote media sink wants update for this source.

The default implementation does nothing. Adaptive sources can override this to adjust their encoder, bitrate, resolution, or selected output stream when Discord asks for a different quality.

Parameters

wants (MediaSinkWants) – The remote quality requests sent by Discord.

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

is_opus()

Checks if the audio source is already encoded in Opus.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If the is_opus() method returns True, then it must return 20ms worth of Opus encoded audio. Otherwise, it must be 20ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes
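The 3,840-byte figure follows from the PCM format: 48,000 samples per second × 0.020 s × 2 channels × 2 bytes per sample. The sketch below derives it and shows the read() contract with a duck-typed source that emits silence. SilenceSource is a hypothetical stand-in and does not subclass the real MediaSource.

```python
SAMPLE_RATE = 48_000   # samples per second
CHANNELS = 2           # stereo
BYTES_PER_SAMPLE = 2   # 16-bit PCM
FRAME_MS = 20          # one read() call covers 20 ms

FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * CHANNELS * BYTES_PER_SAMPLE


class SilenceSource:
    """Duck-typed stand-in that yields a fixed number of silent PCM frames."""

    def __init__(self, frames: int) -> None:
        self._remaining = frames

    def is_opus(self) -> bool:
        return False  # read() returns raw PCM, not Opus

    def read(self) -> bytes:
        if self._remaining <= 0:
            return b""  # an empty bytes object signals the end of the audio
        self._remaining -= 1
        return bytes(FRAME_BYTES)  # zero-filled bytes are silence


source = SilenceSource(frames=2)
chunks = [source.read() for _ in range(3)]
```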

class discord.ext.native_voice.AudioMediaSource(original, /)

Wraps an existing discord.AudioSource as a media source.

This keeps first-party discord.py audio sources, such as discord.PCMAudio, discord.FFmpegPCMAudio, and discord.FFmpegOpusAudio, usable in unified media pipelines.

Parameters

original (discord.AudioSource) – The audio source to wrap.

original

The wrapped audio source.

Type

discord.AudioSource

has_audio()

bool: Whether this source currently has audio to read.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_opus()

Checks if the audio source is already encoded in Opus.

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.PCMMediaSource(stream, /, *, close=False)

A media source backed by raw 16-bit 48 kHz stereo PCM bytes.

This mirrors discord.PCMAudio for file-like raw PCM inputs while keeping the source composable with video-capable media sources.

Parameters
  • stream (bytes) – A bytes-like object that yields 20 ms PCM frames.

  • close (bool) – Whether to close the stream when the source is exhausted or cleaned up.

stream

The wrapped binary stream.

Type

bytes

has_audio()

bool: Whether this source currently has audio to read.

is_opus()

Checks if the audio source is already encoded in Opus.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.AudioFrameSource(frames, /, *, opus=False)

An audio source backed by an iterable of audio frames.

This is the in-memory/custom-producer counterpart to discord.py’s file-like discord.PCMAudio. PCM frames should be 20 ms of 48 kHz stereo signed 16-bit audio; Opus frames may be variable length.

Parameters
  • frames (Iterable[Union[bytes, bytearray, memoryview]]) – The audio frames to read from.

  • opus (bool) – Whether the frames are already Opus encoded.
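
As an illustration of the kind of iterable this source expects, the sketch below generates 20 ms PCM frames for a sine tone. The generator name and its parameters are hypothetical, and little-endian sample order is an assumption based on common PCM usage rather than something stated here:

```python
import math
import struct
from typing import Iterator

SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * 20 // 1000  # 960 samples per channel


def tone_frames(freq_hz: float, seconds: float, volume: float = 0.2) -> Iterator[bytes]:
    """Yield 20 ms frames of a sine tone as signed 16-bit stereo PCM."""
    total_frames = round(seconds * 1000) // 20
    amplitude = int(32767 * volume)
    for frame_index in range(total_frames):
        samples = []
        for i in range(SAMPLES_PER_FRAME):
            t = (frame_index * SAMPLES_PER_FRAME + i) / SAMPLE_RATE_HZ
            value = int(amplitude * math.sin(2 * math.pi * freq_hz * t))
            samples.extend((value, value))  # same sample on left and right
        # "<h" packs each sample as a little-endian signed 16-bit integer.
        yield struct.pack(f"<{len(samples)}h", *samples)


frames = list(tone_frames(440.0, seconds=0.1))
```

A generator like this could then be passed as the frames argument, for example AudioFrameSource(tone_frames(440.0, 5.0)).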

has_audio()

bool: Whether this source currently has audio to read.

is_opus()

Checks if the audio source is already encoded in Opus.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.PCMAudio(stream)

Represents a raw 16-bit 48 kHz stereo PCM audio source.

stream

A file-like object that reads byte data representing raw PCM.

Type

file object

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes
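
The read() contract for a file-like PCM stream can be sketched with the standard library alone. This is a simplified stand-in, not the class's actual implementation; in particular, treating a short final read as end of audio is an assumption:

```python
import io

FRAME_SIZE = 3840  # 20 ms of 16-bit 48 kHz stereo PCM


def read_frame(stream: io.BufferedIOBase) -> bytes:
    """Return one full 20 ms frame, or b"" once the stream cannot supply one."""
    data = stream.read(FRAME_SIZE)
    if len(data) != FRAME_SIZE:
        return b""  # a short or empty read is treated as end of audio
    return data


# Two full frames of silence followed by a partial frame.
stream = io.BytesIO(bytes(FRAME_SIZE * 2 + 100))
frames = []
while frame := read_frame(stream):
    frames.append(frame)
```

Returning an empty bytes object, rather than raising, is what lets the caller detect completion with a simple truthiness check.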

class discord.ext.native_voice.FFmpegAudio(source, *, executable='ffmpeg', args, **subprocess_kwargs)

Represents an FFmpeg (or AVConv) based AudioSource.

User-created AudioSources that use FFmpeg differently from FFmpegPCMAudio and FFmpegOpusAudio should subclass this.

New in version 1.3.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.FFmpegPCMAudio(source, *, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None)

An audio source from FFmpeg (or AVConv).

This launches a subprocess for the given input file.

Warning

You must have the ffmpeg or avconv executable on your PATH environment variable in order for this to work.

Parameters
  • source (Union[str, io.BufferedIOBase]) – The input that ffmpeg will take and convert to PCM bytes. If pipe is True then this is a file-like object that is passed to the stdin of ffmpeg.

  • executable (str) –

    The executable name (and path) to use. Defaults to ffmpeg.

    Warning

    Since this class spawns a subprocess, care should be taken to not pass in an arbitrary executable name when using this parameter.

  • pipe (bool) – If True, denotes that source parameter will be passed to the stdin of ffmpeg. Defaults to False.

  • stderr (Optional[file object]) – A file-like object to pass to the Popen constructor.

  • before_options (Optional[str]) – Extra command line arguments to pass to ffmpeg before the -i flag.

  • options (Optional[str]) – Extra command line arguments to pass to ffmpeg after the -i flag.

Raises

ClientException – The subprocess failed to be created.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_opus()

Checks if the audio source is already encoded in Opus.

class discord.ext.native_voice.FFmpegOpusAudio(source, *, bitrate=None, codec=None, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None)

An audio source from FFmpeg (or AVConv).

This launches a subprocess for the given input file. However, rather than producing PCM packets that still need to be encoded to Opus, as FFmpegPCMAudio does, this class produces Opus packets directly, skipping the encoding step done by the library.

Alternatively, instead of instantiating this class directly, you can use FFmpegOpusAudio.from_probe() to probe for bitrate and codec information. This can be used to opportunistically skip pointless re-encoding of existing Opus audio data for a boost in performance at the cost of a short initial delay to gather the information. The same can be achieved by passing copy to the codec parameter, but only if you know that the input source is Opus encoded beforehand.

New in version 1.3.

Warning

You must have the ffmpeg or avconv executable on your PATH environment variable in order for this to work.

Parameters
  • source (Union[str, io.BufferedIOBase]) – The input that ffmpeg will take and convert to Opus bytes. If pipe is True then this is a file-like object that is passed to the stdin of ffmpeg.

  • bitrate (int) – The bitrate in kbps to encode the output to. Defaults to 128.

  • codec (Optional[str]) –

    The codec to use to encode the audio data. Normally this would be just libopus, but is used by FFmpegOpusAudio.from_probe() to opportunistically skip pointlessly re-encoding Opus audio data by passing copy as the codec value. Any values other than copy, opus, or libopus will be considered libopus. Defaults to libopus.

    Warning

    Do not provide this parameter unless you are certain that the audio input is already Opus encoded. For typical use FFmpegOpusAudio.from_probe() should be used to determine the proper value for this parameter.

  • executable (str) –

    The executable name (and path) to use. Defaults to ffmpeg.

    Warning

    Since this class spawns a subprocess, care should be taken to not pass in an arbitrary executable name when using this parameter.

  • pipe (bool) – If True, denotes that source parameter will be passed to the stdin of ffmpeg. Defaults to False.

  • stderr (Optional[file object]) – A file-like object to pass to the Popen constructor.

  • before_options (Optional[str]) – Extra command line arguments to pass to ffmpeg before the -i flag.

  • options (Optional[str]) – Extra command line arguments to pass to ffmpeg after the -i flag.

Raises

ClientException – The subprocess failed to be created.

classmethod await from_probe(source, *, method=None, **kwargs)

This function is a coroutine.

A factory method that creates a FFmpegOpusAudio after probing the input source for audio codec and bitrate information.

Examples

Use this function to create an FFmpegOpusAudio instance instead of the constructor:

source = await discord.FFmpegOpusAudio.from_probe("song.webm")
voice_client.play(source)

If you are on Windows and don’t have ffprobe installed, use the fallback method to probe using ffmpeg instead:

source = await discord.FFmpegOpusAudio.from_probe("song.webm", method='fallback')
voice_client.play(source)

Using a custom method of determining codec and bitrate:

def custom_probe(source, executable):
    # some analysis code here
    return codec, bitrate

source = await discord.FFmpegOpusAudio.from_probe("song.webm", method=custom_probe)
voice_client.play(source)

Parameters
  • source – Identical to the source parameter for the constructor.

  • method (Optional[Union[str, Callable[str, str]]]) – The probing method used to determine bitrate and codec information. As a string, valid values are native to use ffprobe (or avprobe) and fallback to use ffmpeg (or avconv). As a callable, it must take two string arguments, source and executable. Both parameters are the same values passed to this factory function. executable will default to ffmpeg if not provided as a keyword argument.

  • kwargs – The remaining parameters to be passed to the FFmpegOpusAudio constructor, excluding bitrate and codec.

Raises
  • AttributeError – Invalid probe method, must be 'native' or 'fallback'.

  • TypeError – Invalid value for probe parameter, must be str or a callable.

Returns

An instance of this class.

Return type

FFmpegOpusAudio
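
A custom probing callable might be written along the following lines. This is a sketch under stated assumptions: parse_ffprobe_audio is a hypothetical helper, an ffprobe binary is assumed to be on PATH, and how the library treats a missing bitrate is not covered here:

```python
import json
import subprocess
from typing import Optional, Tuple


def parse_ffprobe_audio(output: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract (codec, bitrate in kbps) from ffprobe's JSON stream listing."""
    streams = json.loads(output).get("streams", [])
    if not streams:
        return None, None
    stream = streams[0]
    codec = stream.get("codec_name")
    bit_rate = stream.get("bit_rate")  # ffprobe reports bits per second as a string
    bitrate_kbps = round(int(bit_rate) / 1000) if bit_rate is not None else None
    return codec, bitrate_kbps


def custom_probe(source: str, executable: str) -> Tuple[Optional[str], Optional[int]]:
    # Assumption: `ffprobe` is on PATH; the ffmpeg `executable` name is ignored here.
    args = ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_streams", "-select_streams", "a:0", source]
    output = subprocess.check_output(args, timeout=20)
    return parse_ffprobe_audio(output.decode("utf-8"))


sample = '{"streams": [{"codec_name": "opus", "bit_rate": "128000"}]}'
print(parse_ffprobe_audio(sample))  # ('opus', 128)
```

Keeping the parsing separate from the subprocess call makes the probe logic testable without a media file.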

classmethod await probe(source, *, method=None, executable=None)

This function is a coroutine.

Probes the input source for bitrate and codec information.

Parameters

Raises
  • AttributeError – Invalid probe method, must be 'native' or 'fallback'.

  • TypeError – Invalid value for probe parameter, must be str or a callable.

Returns

A 2-tuple with the codec and bitrate of the input source.

Return type

Optional[Tuple[Optional[str], int]]

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20 ms worth of Opus-encoded audio. Otherwise, it must return 20 ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_opus()

Checks if the audio source is already encoded in Opus.

class discord.ext.native_voice.VideoFrameSource(frames, *, codec, fps, width=0, height=0, bitrate=0)

A video source backed by an iterable of already-encoded frames.

Parameters
  • frames (Iterable[Union[VideoFrame, bytes, bytearray, memoryview]]) – Encoded video frames to read from.

  • codec (str) – The Discord video codec name for the frames.

  • fps (int) – The frame rate used to derive frame durations.

  • width (int) – The encoded frame width in pixels.

  • height (int) – The encoded frame height in pixels.

  • bitrate (int) – The target video bitrate in bits per second.

codec

The normalized Discord video codec name.

Type

str

frame_time_ms

The default frame duration in milliseconds.

Type

float
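
The relationship between fps and the default frame duration is presumably the usual one, 1000 ms divided by the frame rate. A minimal sketch under that assumption; the helper name is illustrative and not a library function:

```python
def frame_duration_ms(fps: int) -> float:
    """Return the duration of one frame in milliseconds for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Presentation offsets of the first four frames at 25 fps.
offsets_ms = [index * frame_duration_ms(25) for index in range(4)]
print(offsets_ms)  # [0.0, 40.0, 80.0, 120.0]
```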

has_video()

bool: Whether this source currently has video to read.

property video_config

Video configuration for frames from this source.

Type

VideoConfig

read_video()

Read one encoded video frame for the primary stream.

Returns

The next encoded video frame, if one is available.

Return type

Optional[VideoFrame]

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.EncodedVideoSource(source, *, codec, fps, width=0, height=0, bitrate=0)

A video source backed by already-encoded video frames.

VP8, VP9, and AV1 inputs are read as IVF streams. H264 and H265 inputs are read as Annex B streams with access unit delimiters.

Parameters
  • source (Union[str, os.PathLike, BinaryIO]) – A path or binary stream containing encoded video frames.

  • codec (str) – The video codec name for the input.

  • fps (int) – The frame rate used to derive frame durations.

  • width (int) – The encoded frame width in pixels.

  • height (int) – The encoded frame height in pixels.

  • bitrate (int) – The target video bitrate in bits per second.
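
For readers unfamiliar with the IVF container mentioned above, the sketch below walks its layout: a 32-byte file header starting with the signature DKIF, then a 12-byte header (payload size and timestamp) before each frame. It illustrates the format only and is not the reader this class uses internally:

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# signature, version, header size, fourcc, width, height,
# timebase denominator, timebase numerator, frame count, 4 unused bytes
IVF_FILE_HEADER = struct.Struct("<4sHH4sHHIII4x")  # 32 bytes
IVF_FRAME_HEADER = struct.Struct("<IQ")            # 12 bytes: size, timestamp


def read_ivf_frames(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (timestamp, payload) for each frame in an IVF stream."""
    header = stream.read(IVF_FILE_HEADER.size)
    if len(header) < IVF_FILE_HEADER.size:
        raise ValueError("truncated IVF file header")
    signature, _version, header_size, _fourcc, *_rest = IVF_FILE_HEADER.unpack(header)
    if signature != b"DKIF":
        raise ValueError("not an IVF stream")
    stream.read(max(0, header_size - IVF_FILE_HEADER.size))  # skip any header extension
    while True:
        frame_header = stream.read(IVF_FRAME_HEADER.size)
        if len(frame_header) < IVF_FRAME_HEADER.size:
            return  # end of stream (or a truncated trailing header)
        size, timestamp = IVF_FRAME_HEADER.unpack(frame_header)
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated final frame; stop rather than emit partial data
        yield timestamp, payload


# Build a tiny synthetic IVF stream with two fake frames.
demo = IVF_FILE_HEADER.pack(b"DKIF", 0, 32, b"VP80", 640, 360, 30, 1, 2)
demo += IVF_FRAME_HEADER.pack(3, 0) + b"abc"
demo += IVF_FRAME_HEADER.pack(2, 1) + b"de"
frames = list(read_ivf_frames(io.BytesIO(demo)))
```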

codec

The normalized Discord video codec name.

Type

str

frame_time_ms

The default frame duration in milliseconds.

Type

float

has_video()

bool: Whether this source currently has video to read.

property video_config

Video parameters known by this source.

Type

Optional[VideoConfig]

read_video()

Read one encoded video frame for the primary stream.

Returns

The next encoded video frame, if one is available.

Return type

Optional[VideoFrame]

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.SimulcastVideoSource(sources, /)

A video source composed of RID-keyed video sources.

Each child source should produce encoded frames for the same codec, with keys matching the negotiated discord.VoiceStream RIDs.

Parameters

sources (Dict[str, MediaSource]) – The child video sources, keyed by RTP stream ID.

sources

The child sources, keyed by RTP stream ID.

Type

Dict[str, MediaSource]

has_video()

bool: Whether this source currently has video to read.

read_video_streams(streams)

Read encoded video frames for the active outbound simulcast streams.

The default implementation preserves the single-stream read_video() behaviour and returns a frame for the first active stream only. Sources that can encode multiple simulcast outputs should override this and return RID-keyed frames for each stream they are able to produce on this tick.

Parameters

streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.

Returns

A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.

Return type

Optional[Mapping[str, VideoFrame]]
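
The three possible return values (RID-keyed frames, an empty mapping, or None) can be illustrated with stand-in objects. StubSource and read_streams are hypothetical names, the RIDs "100" and "50" are made up, and treating the lane as finished only when every child is finished is an assumption:

```python
from typing import Dict, List, Mapping, Optional


class StubSource:
    """Hypothetical stand-in for a child video source; not a library class."""

    def __init__(self, frames: List[bytes]) -> None:
        self._frames = list(frames)

    def is_finished(self) -> bool:
        return not self._frames

    def read_video(self) -> Optional[bytes]:
        return self._frames.pop(0) if self._frames else None


def read_streams(
    sources: Dict[str, StubSource], active_rids: List[str]
) -> Optional[Mapping[str, bytes]]:
    """Return RID-keyed frames, {} when nothing is ready, or None when all children are done."""
    if all(source.is_finished() for source in sources.values()):
        return None
    frames: Dict[str, bytes] = {}
    for rid in active_rids:
        source = sources.get(rid)
        frame = source.read_video() if source is not None else None
        if frame is not None:
            frames[rid] = frame
    return frames


sources = {"100": StubSource([b"hi-1"]), "50": StubSource([b"lo-1", b"lo-2"])}
first = read_streams(sources, ["100", "50"])   # both RIDs produce a frame
second = read_streams(sources, ["100"])        # "100" is exhausted, "50" is not
third = read_streams(sources, ["50"])          # last remaining frame
fourth = read_streams(sources, ["100", "50"])  # every child is finished
```

Distinguishing an empty mapping from None lets the caller keep polling a source that is merely between frames.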

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.FFmpegVideoSource(command, *, codec, fps, width=0, height=0, bitrate=0, preview_command=None, pipe_source=None, stderr=None, live_timestamps=False)

An encoded video source backed by an FFmpeg subprocess.

The subprocess writes codec-ready H264/H265 Annex B or VP8/VP9/AV1 IVF frames to stdout for the native RTP packetizers.

Parameters
  • command (List[str]) – The FFmpeg command to run.

  • codec (str) – The Discord video codec name produced by FFmpeg.

  • fps (int) – The target frame rate.

  • width (int) – The encoded frame width in pixels.

  • height (int) – The encoded frame height in pixels.

  • bitrate (int) – The target video bitrate in bits per second.

  • preview_command (Optional[List[str]]) – FFmpeg command used to produce a stream preview image frame.

  • pipe_source (Any) – Optional file-like object or native desktop capture source piped into FFmpeg stdin.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • live_timestamps (bool) – Whether frame durations should track wall-clock capture timing.
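
As background on the Annex B byte stream that FFmpeg writes for H264/H265, the sketch below splits a buffer into NAL units on the 00 00 01 start code (optionally preceded by an extra zero byte). It illustrates the format only; the function names are hypothetical, and the type helper applies to H264 headers, not H265:

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(data: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units without their start codes."""
    nals: List[bytes] = []
    start = data.find(START_CODE)
    while start != -1:
        payload_start = start + len(START_CODE)
        next_start = data.find(START_CODE, payload_start)
        end = len(data) if next_start == -1 else next_start
        # Trailing zero bytes belong to the next (4-byte) start code or to padding.
        nal = data[payload_start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        start = next_start
    return nals


def h264_nal_type(nal: bytes) -> int:
    """Return the H264 nal_unit_type (low five bits of the first byte)."""
    return nal[0] & 0x1F


# An access unit delimiter (type 9) followed by an IDR slice (type 5).
stream = b"\x00\x00\x00\x01\x09\xf0" + b"\x00\x00\x00\x01\x65\x88\x84"
nals = split_annexb(stream)
nal_types = [h264_nal_type(nal) for nal in nals]
```

In H264, an access unit delimiter (type 9) marks the start of each coded picture, which is why delimiters make frame boundaries easy to find.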

command

The FFmpeg command being run.

Type

List[str]

preview_command

The FFmpeg preview command, if configured.

Type

Optional[List[str]]

codec

The normalized Discord video codec name.

Type

str

frame_time_ms

The default frame duration in milliseconds.

Type

float

classmethod preflight_desktop(*, width, height, fps=1, codec='H264', bitrate=4000000, executable='ffmpeg', input_args=None, before_options=None, transcoder=None, native_capture=False, output_index=0, timeout=15.0)

Check whether the configured desktop source can produce an encoded frame.

This is useful before joining voice, since desktop capture and encoder failures are often caused by the local session rather than Discord transport.

Parameters
  • width (int) – The capture width in pixels.

  • height (int) – The capture height in pixels.

  • fps (int) – The capture frame rate.

  • codec (str) – The Discord video codec to encode.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

  • native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).

  • output_index (int) – The native desktop output index to capture.

  • timeout (float) – Maximum seconds to wait for the preflight encode.

Raises
  • discord.ClientException – Desktop capture, FFmpeg startup, encoder validation, or the preflight encode failed.

  • RuntimeError – Platform desktop capture defaults are not available.

classmethod from_desktop(codec, *, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, transcoder=None, native_capture=False, output_index=0, display=...)

Create an FFmpeg video source from the current desktop capture input.

Parameters
  • codec (str) – The Discord video codec to encode.

  • width (int) – The capture width in pixels.

  • height (int) – The capture height in pixels.

  • fps (int) – The capture frame rate.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

  • native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).

  • output_index (int) – The native desktop output index to capture.

  • display (Optional[str]) – The X11 display name used by the default Linux desktop input.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

classmethod from_file(source, codec, *, width, height, fps, bitrate, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None, source_codec=None, input_args=None, preview_input_args=None, transcoder=None)

Create an FFmpeg video source from a file or stdin pipe.

Parameters
  • source (Union[str, os.PathLike, BinaryIO]) – A video path or binary stream.

  • codec (str) – The Discord video codec to encode.

  • width (int) – The encoded video width in pixels.

  • height (int) – The encoded video height in pixels.

  • fps (int) – The target frame rate.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • pipe (bool) – Whether to pipe source into FFmpeg stdin instead of treating it as a path.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • source_codec (Optional[str]) – The input video codec used for decoder selection.

  • input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

classmethod await from_probe(source, codec=None, *, width=None, height=None, fps=None, bitrate=None, method=None, executable='ffmpeg', stderr=None, before_options=None, options=None, input_args=None, preview_input_args=None, transcoder=None)

This function is a coroutine.

Create a video source while probing missing video metadata first.

Parameters
  • source (Union[str, os.PathLike]) – The video file path to probe and encode.

  • codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.

  • width (Optional[int]) – The encoded video width in pixels. If omitted, the first video stream is probed.

  • height (Optional[int]) – The encoded video height in pixels. If omitted, the first video stream is probed.

  • fps (Optional[int]) – The target frame rate. If omitted, the first video stream is probed.

  • bitrate (Optional[int]) – The target video bitrate in bits per second. If omitted, the first video stream is probed.

  • method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.

  • executable (str) – The FFmpeg executable to run.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

classmethod await probe(source, *, method=None, executable='ffmpeg')

This function is a coroutine.

Probe the first video stream for codec, width, height, FPS, and bitrate.

Parameters
  • source (Union[str, os.PathLike]) – The video file path to probe.

  • method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.

  • executable (str) – The FFmpeg executable used to locate ffprobe or run fallback probing.

Returns

The discovered video stream metadata.

Return type

VideoProbeInfo

has_video()

bool: Whether this source currently has video to read.

property video_config

Video parameters known by this source.

Type

Optional[VideoConfig]

read_video()

Read one encoded video frame for the primary stream.

Returns

The next encoded video frame, if one is available.

Return type

Optional[VideoFrame]

read_preview()

Read image preview bytes for a Go Live stream preview.

Returns

Encoded image bytes for a stream preview, if available.

Return type

Optional[Union[bytes, bytearray, memoryview]]

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after it is done playing audio.

class discord.ext.native_voice.FFmpegMediaSource(*, audio=None, video=None)

A composite FFmpeg source that can provide audio and video together.

Parameters

classmethod from_file(source, codec, *, width, height, fps, bitrate, executable='ffmpeg', pipe=False, audio=True, opus_audio=False, audio_bitrate=128, audio_stderr=None, audio_before_options=None, audio_options=None, video_stderr=None, video_before_options=None, video_options=None, video_source_codec=None, video_input_args=None, preview_input_args=None, video_transcoder=None)

Create an FFmpeg media source from a file or video stdin pipe.

Parameters
  • source (Union[str, os.PathLike, BinaryIO]) – A media path or binary stream.

  • codec (str) – The Discord video codec to encode.

  • width (int) – The encoded video width in pixels.

  • height (int) – The encoded video height in pixels.

  • fps (int) – The target video frame rate.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • pipe (bool) – Whether to pipe source into FFmpeg stdin for video.

  • audio (bool) – Whether to include audio from the input.

  • opus_audio (bool) – Whether to copy/probe Opus audio instead of decoding to PCM.

  • audio_bitrate (int) – The audio bitrate in kbps when using Opus audio.

  • audio_stderr (Optional[BinaryIO]) – Where audio FFmpeg stderr is redirected.

  • audio_before_options (Optional[str]) – Extra audio FFmpeg options placed before input options.

  • audio_options (Optional[str]) – Extra audio FFmpeg output options.

  • video_stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.

  • video_before_options (Optional[str]) – Extra video FFmpeg options placed before input options.

  • video_options (Optional[str]) – Extra video FFmpeg output options.

  • video_source_codec (Optional[str]) – The input video codec used for decoder selection.

  • video_input_args (Optional[List[str]]) – Explicit video FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

Returns

The created media source.

Return type

FFmpegMediaSource

Raises
  • discord.ClientException – pipe=True was used with audio=True or FFmpeg setup failed.

  • TypeError – source is incompatible with the selected pipe mode.

  • ValueError – codec is not a supported Discord video codec.

classmethod await from_probe(source, codec=None, *, width=None, height=None, fps=None, bitrate=None, method=None, video_method=None, executable='ffmpeg', audio=True, audio_stderr=None, audio_before_options=None, audio_options=None, video_stderr=None, video_before_options=None, video_options=None, video_input_args=None, preview_input_args=None, video_transcoder=None)

This function is a coroutine.

Create a media source while probing media metadata first.

This mirrors discord.FFmpegOpusAudio.from_probe() for unified audio/video playback, letting FFmpeg copy Opus audio when possible and using the first video stream for missing codec, width, height, FPS, and bitrate values.

Parameters
  • source (Union[str, os.PathLike]) – The media file path to probe and encode.

  • codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.

  • width (Optional[int]) – The encoded video width in pixels. If omitted, the first video stream is probed.

  • height (Optional[int]) – The encoded video height in pixels. If omitted, the first video stream is probed.

  • fps (Optional[int]) – The target video frame rate. If omitted, the first video stream is probed.

  • bitrate (Optional[int]) – The target video bitrate in bits per second. If omitted, the first video stream is probed.

  • method (Optional[Union[str, Callable[[str, str], Any]]]) – The audio probing method passed to discord.FFmpegOpusAudio.from_probe().

  • video_method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.

  • executable (str) – The FFmpeg executable to run.

  • audio (bool) – Whether to include audio from the input.

  • audio_stderr (Optional[BinaryIO]) – Where audio FFmpeg stderr is redirected.

  • audio_before_options (Optional[str]) – Extra audio FFmpeg options placed before input options.

  • audio_options (Optional[str]) – Extra audio FFmpeg output options.

  • video_stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.

  • video_before_options (Optional[str]) – Extra video FFmpeg options placed before input options.

  • video_options (Optional[str]) – Extra video FFmpeg output options.

  • video_input_args (Optional[List[str]]) – Explicit video FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

Returns

The created media source.

Return type

FFmpegMediaSource
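
A usage sketch based on the signature above. The helper name play_media_file is hypothetical, and passing the resulting source to play() on the native voice client is an assumption not confirmed by this section:

```python
async def play_media_file(channel, path: str) -> None:
    """Connect to a voice channel and play a file through a probed media source."""
    # Imported lazily so this sketch can be loaded without the extension installed.
    from discord.ext import native_voice

    voice_client = await channel.connect(cls=native_voice.VoiceClient)
    # Leaving codec, width, height, fps, and bitrate unset asks from_probe to
    # fill them in from the first video stream, as described above.
    source = await native_voice.FFmpegMediaSource.from_probe(path)
    # Assumption: the native VoiceClient accepts media sources through play().
    voice_client.play(source)
```

Passing explicit values for any of the optional parameters overrides the probed result for that field only.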

Raises

classmethod await probe_video(source, *, method=None, executable='ffmpeg')

This function is a coroutine.

Probe the first video stream for codec, width, height, FPS, and bitrate.

Parameters
  • source (Union[str, os.PathLike]) – The media file path to probe.

  • method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.

  • executable (str) – The FFmpeg executable used to locate ffprobe or run fallback probing.

Returns

The discovered video stream metadata.

Return type

VideoProbeInfo

classmethod preflight_desktop(*, width, height, fps=1, codec='H264', bitrate=4000000, executable='ffmpeg', input_args=None, before_options=None, video_transcoder=None, native_capture=False, output_index=0, timeout=15.0)

Check whether the configured FFmpeg desktop input can capture a frame.

Parameters
  • width (int) – The capture width in pixels.

  • height (int) – The capture height in pixels.

  • fps (int) – The capture frame rate.

  • codec (str) – The Discord video codec to encode.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

  • native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms.

  • output_index (int) – The native desktop output index to capture.

  • timeout (float) – Maximum seconds to wait for the preflight encode.

Raises
  • discord.ClientException – Desktop capture, FFmpeg startup, encoder validation, or the preflight encode failed.

  • RuntimeError – Platform desktop capture defaults are not available.

  • ValueError – codec is not a supported Discord video codec.

classmethod from_desktop(codec, *, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, audio=None, video_transcoder=None, native_capture=False, output_index=0)

Create an FFmpeg media source from desktop capture video.

Parameters
  • codec (str) – The Discord video codec to encode.

  • width (int) – The capture width in pixels.

  • height (int) – The capture height in pixels.

  • fps (int) – The capture frame rate.

  • bitrate (int) – The target video bitrate in bits per second.

  • executable (str) – The FFmpeg executable to run.

  • input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.

  • stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • audio (Optional[discord.AudioSource]) – Existing audio source to combine with the desktop video source.

  • video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

  • native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms.

  • output_index (int) – The native desktop output index to capture.

Returns

The created media source.

Return type

FFmpegMediaSource

Raises
class discord.ext.native_voice.FFmpegSimulcastVideoSource(sources, /)

An FFmpeg-backed simulcast source with one encoder per RID.

This source is intended for camera/self-video style simulcast. Each child encoder produces an encoded frame stream for one advertised discord.VoiceStream RID, and VoiceClient sends only active negotiated RIDs.

read_video_streams(streams)

Read encoded video frames for the active outbound simulcast streams.

The default implementation preserves the single-stream read_video() behaviour and returns a frame for the first active stream only. Sources that can encode multiple simulcast outputs should override this and return RID-keyed frames for each stream they are able to produce on this tick.

Parameters

streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.

Returns

A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.

Return type

Optional[Mapping[str, VideoFrame]]
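
The three return shapes described above are easy to conflate, because both None and an empty mapping are falsy. A minimal sketch of a caller that tells them apart follows; describe_result is a hypothetical helper, and plain bytes stand in for VideoFrame objects.

```python
from typing import Mapping, Optional


def describe_result(frames: Optional[Mapping[str, bytes]]) -> str:
    """Classify a read_video_streams()-style result (hypothetical helper)."""
    if frames is None:
        # None means the video lane is finished.
        return "finished"
    if not frames:
        # An empty mapping means no frame is ready on this tick.
        return "not ready"
    # Otherwise the keys are RIDs and the values are encoded frames.
    return "frames for RIDs: " + ", ".join(sorted(frames))
```

Checking `frames is None` before `not frames` matters here: a bare truthiness test would treat "not ready yet" as "finished".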

classmethod from_desktop(codec, *, streams, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, transcoder=None, native_capture=False, output_index=0)

Create a simulcast source from the current desktop capture input.

Parameters
  • codec (str) – The Discord video codec to encode.

  • width (int) – The source capture width in pixels.

  • height (int) – The source capture height in pixels.

  • fps (int) – The source frame rate.

  • bitrate (int) – The source video bitrate in bits per second.

  • streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.

  • executable (str) – The FFmpeg executable to run.

  • input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

  • native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).

  • output_index (int) – The native desktop output index to capture.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises
  • discord.ClientException – Duplicate stream RIDs, desktop capture, FFmpeg startup, or encoder selection failed.

  • RuntimeError – Platform desktop capture defaults are not available.

  • ValueError – codec is not a supported Discord video codec.

classmethod from_file(source, codec, *, streams, width, height, fps, bitrate, executable='ffmpeg', stderr=None, before_options=None, options=None, source_codec=None, input_args=None, preview_input_args=None, transcoder=None)

Create a simulcast source from a video file.

Parameters
  • source (Union[str, os.PathLike]) – The video file path to encode.

  • codec (str) – The Discord video codec to encode.

  • width (int) – The source video width in pixels.

  • height (int) – The source video height in pixels.

  • fps (int) – The source frame rate.

  • bitrate (int) – The source video bitrate in bits per second.

  • streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.

  • executable (str) – The FFmpeg executable to run.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • source_codec (Optional[str]) – The input video codec used for decoder selection.

  • input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises
classmethod await from_probe(source, codec=None, *, streams, width=None, height=None, fps=None, bitrate=None, method=None, executable='ffmpeg', stderr=None, before_options=None, options=None, input_args=None, preview_input_args=None, transcoder=None)

This function is a coroutine.

Create a simulcast source while probing missing video metadata first.

Parameters
  • source (Union[str, os.PathLike]) – The video file path to probe and encode.

  • codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.

  • width (Optional[int]) – The source video width in pixels. If omitted, the first video stream is probed.

  • height (Optional[int]) – The source video height in pixels. If omitted, the first video stream is probed.

  • fps (Optional[int]) – The source frame rate. If omitted, the first video stream is probed.

  • bitrate (Optional[int]) – The source video bitrate in bits per second. If omitted, the first video stream is probed.

  • streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.

  • method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.

  • executable (str) – The FFmpeg executable to run.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.

  • preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.

  • transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises
  • discord.ClientException – Required video metadata could not be probed, duplicate stream RIDs were found, or FFmpeg setup failed.

  • ValueError – codec is not a supported Discord video codec.

class discord.ext.native_voice.MultiMediaSource(sources, /)

Combines multiple sources into one playable media source.

The source mixes multiple PCM audio inputs into one audio lane. Opus audio is supported only when it is the sole audio input, since encoded Opus cannot be mixed without decoding first. Video uses the first video-capable source that yields a frame.

Parameters

sources (List[discord.AudioSource]) – The sources to combine.
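
Mixing several PCM inputs into one lane can be sketched as a per-sample sum with clamping. This reference does not say how the library mixes internally, so the following is only one common approach, assuming 16-bit little-endian PCM; mix_pcm_frames is a hypothetical helper.

```python
import struct


def mix_pcm_frames(frames):
    """Sum 16-bit little-endian PCM frames sample by sample (hypothetical helper)."""
    frames = [f for f in frames if f]
    if not frames:
        return b""
    sample_count = max(len(f) for f in frames) // 2
    mixed = [0] * sample_count
    for frame in frames:
        count = len(frame) // 2
        samples = struct.unpack(f"<{count}h", frame[: count * 2])
        for i, sample in enumerate(samples):
            mixed[i] += sample
    # Clamp to the signed 16-bit range so loud inputs saturate instead of wrapping.
    clamped = [max(-32768, min(32767, s)) for s in mixed]
    return struct.pack(f"<{sample_count}h", *clamped)
```

The sum needs decoded samples, which is why an Opus input cannot take part in a mix without being decoded first.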

property sources

The sources being combined.

Type

Sequence[discord.AudioSource]

has_audio()

bool: Whether this source currently has audio to read.

has_video()

bool: Whether this source currently has video to read.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20ms worth of Opus-encoded audio. Otherwise, it must return 20ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes
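
The PCM frame size quoted above follows from the sample rate, channel count, and sample width. A short worked calculation, with hypothetical constant names:

```python
SAMPLE_RATE = 48_000  # samples per second, per channel
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per 16-bit sample


def pcm_frame_size(frame_ms: int = 20) -> int:
    """Return the byte length of one PCM frame of the given duration."""
    samples_per_channel = SAMPLE_RATE * frame_ms // 1000
    return samples_per_channel * CHANNELS * SAMPLE_WIDTH


# One 20ms frame of digital silence, useful as padding in a custom read().
silence_frame = bytes(pcm_frame_size())
```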

is_opus()

Checks if the audio source is already encoded in Opus.

read_video()

Read one encoded video frame for the primary stream.

Returns

The next encoded video frame, if one is available.

Return type

Optional[VideoFrame]

read_video_streams(streams)

Read encoded video frames for the active outbound simulcast streams.

The default implementation preserves the single-stream read_video() behaviour and returns a frame for the first active stream only. Sources that can encode multiple simulcast outputs should override this and return RID-keyed frames for each stream they are able to produce on this tick.

Parameters

streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.

Returns

A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.

Return type

Optional[Mapping[str, VideoFrame]]

read_preview()

Read image preview bytes for a Go Live stream preview.

Returns

Encoded image bytes for a stream preview, if available.

Return type

Optional[Union[bytes, bytearray, memoryview]]

property video_config

Video configuration from the active video source.

Type

Optional[VideoConfig]

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after the source is done playing audio.

class discord.ext.native_voice.CompositeMediaSource(*, audio=None, video=None)

Combines separate audio and video sources into one media source.

Parameters
  • audio (Optional[discord.AudioSource]) – The source used for audio frames.

  • video (Optional[MediaSource]) – The source used for video frames and stream previews.

audio

The source used for audio frames.

Type

Optional[discord.AudioSource]

video

The source used for video frames and stream previews.

Type

Optional[MediaSource]

has_audio()

bool: Whether this source currently has audio to read.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20ms worth of Opus-encoded audio. Otherwise, it must return 20ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_opus()

Checks if the audio source is already encoded in Opus.

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after the source is done playing audio.

class discord.ext.native_voice.MediaVolumeTransformer(original, volume=1.0)

Adjusts PCM audio volume while preserving video from another source.

Parameters
original

The wrapped source.

Type

discord.AudioSource

property volume

The audio volume multiplier.

Type

float

has_audio()

bool: Whether this source currently has audio to read.

read()

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20ms worth of Opus-encoded audio. Otherwise, it must return 20ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns

A bytes-like object that represents the PCM or Opus data.

Return type

bytes

is_opus()

Checks if the audio source is already encoded in Opus.

is_finished()

bool: Whether this source has no more media to produce.

cleanup()

Called when clean-up needs to be done.

Useful for clearing buffer data or processes after the source is done playing audio.

Media Sinks

class discord.ext.native_voice.MediaSink

Base class for receive-side media sinks.

Sinks can be chained by passing a destination sink to another sink. The root sink is owned by VoiceClient.listen() and is cleaned up when listening stops.

Parameters

destination (Optional[MediaSink]) – A child sink to register under this sink.

property root

The root sink in this sink chain.

Type

MediaSink

property parent

The parent sink in this chain.

Type

Optional[MediaSink]

property child

The first child sink, if any.

Type

Optional[MediaSink]

property children

Child sinks registered under this sink.

Type

Sequence[MediaSink]

property voice_client

The voice client owning this sink.

Type

Optional[discord.VoiceProtocol]

property client

The Discord client owning this sink.

Type

Optional[discord.Client]

property closed

Whether this sink has been cleaned up.

Type

bool

for ... in walk_children(*, with_self=False)

Yield child sinks depth-first.

Parameters

with_self (bool) – Whether to yield this sink before its children.

Yields

MediaSink – Child sinks in depth-first order.
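
The traversal order can be illustrated with a stand-in tree. Node below is a hypothetical class, not MediaSink, and the sketch assumes a pre-order walk in which each child is yielded before its own children.

```python
from typing import Iterator, List


class Node:
    """Hypothetical stand-in for a sink with child sinks."""

    def __init__(self, name: str, children=()):
        self.name = name
        self.children: List["Node"] = list(children)

    def walk_children(self, *, with_self: bool = False) -> Iterator["Node"]:
        if with_self:
            yield self
        for child in self.children:
            # Each child comes first, followed by its whole subtree.
            yield from child.walk_children(with_self=True)


tree = Node("root", [Node("a", [Node("a1")]), Node("b")])
order = [node.name for node in tree.walk_children()]
order_with_self = [node.name for node in tree.walk_children(with_self=True)]
```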

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.BasicSink(callback, *, media_types=None, codecs=None)

A sink that forwards each accepted packet to a callback.

Parameters
  • callback (Callable[[MediaPacket], Any]) – The callback invoked for each accepted packet.

  • media_types (Optional[List[str]]) – Media types to accept.

  • codecs (Optional[List[str]]) – Codec names to accept.

callback

The callback invoked for each accepted packet.

Type

Callable[[MediaPacket], Any]

class discord.ext.native_voice.QueueSink(destination=..., *, media_types=None, codecs=None, maxsize=0, drop_oldest=False)

Stores decoded receive packets in a queue.Queue.

This is useful when application code wants to consume multiplexed audio and video packets from its own worker instead of doing all work inside the receive callback.

Parameters
  • destination (queue.Queue) – Queue to write packets into. If omitted, a queue is created.

  • media_types (Optional[List[str]]) – Media types to accept.

  • codecs (Optional[List[str]]) – Codec names to accept.

  • maxsize (int) – Maximum size for a created queue.

  • drop_oldest (bool) – Whether to drop the oldest packet when the queue is full.
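
The drop_oldest policy can be sketched with the standard queue module alone. put_drop_oldest is a hypothetical helper showing one way to evict the oldest entry when a bounded queue is full; it is not the sink's actual write() implementation.

```python
import queue


def put_drop_oldest(q, item):
    """Put item into q, evicting the oldest entries while q is full.

    Returns the number of entries dropped to make room.
    """
    dropped = 0
    while True:
        try:
            q.put_nowait(item)
            return dropped
        except queue.Full:
            try:
                q.get_nowait()
            except queue.Empty:
                continue  # a consumer drained the queue first; retry the put
            q.task_done()  # keep join() accounting balanced for the evicted item
            dropped += 1


packets = queue.Queue(maxsize=2)
total_dropped = sum(put_drop_oldest(packets, n) for n in (1, 2, 3))
remaining = [packets.get_nowait() for _ in range(packets.qsize())]
```

Dropping the oldest entry favours fresh media over completeness, which usually suits live consumers better than blocking the receive path.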

queue

The queue receiving packets.

Type

queue.Queue

drop_oldest

Whether the oldest packet is dropped when the queue is full.

Type

bool

dropped

Number of packets dropped by this sink.

Type

int

write(packet)

Queue one packet.

Parameters

packet (MediaPacket) – The packet to queue.

Returns

Whether the packet was accepted by the queue.

Return type

bool

get(block=True, timeout=None)

Remove and return one packet from the queue.

Parameters
  • block (bool) – Whether to block until a packet is available.

  • timeout (Optional[float]) – Maximum seconds to block.

Returns

The next queued packet.

Return type

MediaPacket

Raises

queue.Empty – The queue is empty and block is False or the timeout elapses.

get_nowait()

MediaPacket: Remove and return one packet without blocking.

Raises

queue.Empty – The queue is empty.

qsize()

int: The approximate queue size.

empty()

bool: Whether the queue is empty.

full()

bool: Whether the queue is full.

task_done()

Indicate that a queued packet has been processed.

Raises

ValueError – Called more times than there were queued packets.

join()

Block until all queued packets are marked done.

class discord.ext.native_voice.AsyncQueueSink(destination=..., *, loop=None, media_types=None, codecs=None, maxsize=0, drop_oldest=False)

Stores decoded receive packets in an asyncio.Queue.

Async equivalent to QueueSink.

Parameters
  • destination (asyncio.Queue) – Queue to write packets into. If omitted, a queue is created.

  • loop (Optional[asyncio.AbstractEventLoop]) – The event loop used to schedule queue writes from the receive thread.

  • media_types (Optional[List[str]]) – Media types to accept.

  • codecs (Optional[List[str]]) – Codec names to accept.

  • maxsize (int) – Maximum size for a created queue.

  • drop_oldest (bool) – Whether to drop the oldest packet when the queue is full.
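
Because asyncio.Queue is not thread-safe, writes coming from a receive thread have to be handed to the event loop. A minimal sketch of that hand-off using loop.call_soon_threadsafe follows; the thread function and packet strings are hypothetical stand-ins, and the sink's real scheduling may differ.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    packets: asyncio.Queue = asyncio.Queue()

    def receive_thread():
        # Stand-in for the receive thread: never touch the queue directly,
        # ask the loop to perform the put on its own thread instead.
        for i in range(3):
            loop.call_soon_threadsafe(packets.put_nowait, f"packet-{i}")

    worker = threading.Thread(target=receive_thread)
    worker.start()
    received = [await packets.get() for _ in range(3)]
    worker.join()
    return received


received = asyncio.run(main())
```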

queue

The queue receiving packets.

Type

asyncio.Queue

loop

The event loop used to schedule queue writes.

Type

Optional[asyncio.AbstractEventLoop]

drop_oldest

Whether the oldest packet is dropped when the queue is full.

Type

bool

dropped

Number of packets dropped by this sink.

Type

int

await get()

This function is a coroutine.

MediaPacket: Remove and return one packet from the async queue.

get_nowait()

MediaPacket: Remove and return one packet without blocking.

Raises

asyncio.QueueEmpty – The queue is empty.

qsize()

int: The approximate queue size.

empty()

bool: Whether the queue is empty.

full()

bool: Whether the queue is full.

task_done()

Indicate that a queued packet has been processed.

Raises

ValueError – Called more times than there were queued packets.

await join()

This function is a coroutine.

Wait until all queued packets are marked done.

class discord.ext.native_voice.MultiSink(destinations, /)

Fan out each received packet to multiple child sinks.

Parameters

destinations (List[MediaSink]) – The child sinks to fan out to.

property child

The first child sink, if any.

Type

Optional[MediaSink]

property children

Child sinks registered under this fan-out.

Type

Sequence[MediaSink]

class discord.ext.native_voice.PerUserSink(factory, /, *, fallback_to_ssrc=True)

Lazily creates one child sink per received user.

If a packet arrives before Discord has mapped the SSRC to a user ID, the packet is routed by SSRC. When a later packet for that SSRC has a user ID, the existing child is promoted to the user key so recordings stay together.

Parameters
  • factory (Callable[[int], MediaSink]) – Callable used to create a sink for each user ID or fallback SSRC.

  • fallback_to_ssrc (bool) – Whether packets without a user ID should be routed by SSRC.
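
The SSRC-to-user promotion described above can be sketched with three dictionaries. PerKeyRouter is a hypothetical stand-in, not the library's class, and the factory here returns a plain list in place of a MediaSink.

```python
class PerKeyRouter:
    """Hypothetical routing table illustrating SSRC-to-user promotion."""

    def __init__(self, factory, *, fallback_to_ssrc=True):
        self.factory = factory
        self.fallback_to_ssrc = fallback_to_ssrc
        self.by_user = {}     # user ID -> child
        self.by_ssrc = {}     # SSRC -> child created before the user was known
        self.ssrc_owner = {}  # SSRC -> user ID, once learned

    def route(self, ssrc, user_id=None):
        if user_id is None:
            user_id = self.ssrc_owner.get(ssrc)
        if user_id is not None:
            self.ssrc_owner[ssrc] = user_id
            child = self.by_user.get(user_id)
            if child is None:
                # Promote the SSRC-keyed child if one exists, else create one.
                child = self.by_ssrc.pop(ssrc, None)
                if child is None:
                    child = self.factory(user_id)
                self.by_user[user_id] = child
            return child
        if not self.fallback_to_ssrc:
            return None  # no user ID and no fallback: the packet is not routed
        if ssrc not in self.by_ssrc:
            self.by_ssrc[ssrc] = self.factory(ssrc)
        return self.by_ssrc[ssrc]


router = PerKeyRouter(lambda key: [])
early = router.route(1234)              # arrives before the user is known
later = router.route(1234, user_id=42)  # same SSRC, now mapped to a user
```

Reusing the early child for the later packets is what keeps one speaker's recording in a single output.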

factory

Callable used to create child sinks.

Type

Callable[[int], MediaSink]

fallback_to_ssrc

Whether packets without a user ID are routed by SSRC.

Type

bool

property children

All currently-created per-user sinks.

Type

Sequence[MediaSink]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.WaveSink(destination)

Writes decoded audio packets to a WAV file.

Parameters

destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object.

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.MixedWaveSink(destination, *, users=None)

Records decoded audio packets into one timeline-aligned WAV file.

Unlike WaveSink, this sink uses each packet’s RTP timestamp to place audio on the output timeline.
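
Placing audio by RTP timestamp amounts to turning a timestamp difference into a byte offset. The sketch below assumes the 48 kHz RTP clock used for Opus and 16-bit stereo output. Both helpers are hypothetical, and the sketch overwrites rather than mixes, so it does not show how the sink combines overlapping speakers or aligns the separate timestamp bases of different SSRCs.

```python
BYTES_PER_SAMPLE_FRAME = 4  # 16-bit stereo: 2 bytes x 2 channels


def timeline_offset(rtp_timestamp: int, base_timestamp: int) -> int:
    """Byte offset of a packet on the output timeline (hypothetical helper)."""
    # RTP timestamps are 32-bit and wrap, so take the difference modulo 2**32.
    elapsed_samples = (rtp_timestamp - base_timestamp) & 0xFFFFFFFF
    return elapsed_samples * BYTES_PER_SAMPLE_FRAME


def place(timeline: bytearray, offset: int, pcm: bytes) -> None:
    """Write pcm at offset, padding any gap before it with silence."""
    end = offset + len(pcm)
    if len(timeline) < end:
        timeline.extend(bytes(end - len(timeline)))
    timeline[offset:end] = pcm
```

A packet that arrives 20ms after the base (960 samples at 48 kHz) lands 3,840 bytes into the timeline, and any gap before it stays silent.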

Parameters
wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.FFmpegSink(destination, *, executable='ffmpeg', before_options=None, options=None, stderr=None)

Writes decoded audio packets into an FFmpeg subprocess.

Parameters
  • destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object. File-like destinations receive FFmpeg stdout.

  • executable (str) – The FFmpeg executable to run.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

returncode

The FFmpeg process return code after cleanup.

Type

Optional[int]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.FFmpegMuxSink(destination, *, video_codec=None, width=0, height=0, fps=30, audio=True, video=True, executable='ffmpeg', before_options=None, options=None, output_format=None, audio_codec=None, shortest=True, stderr=None, keep_temp=False, timeout=120.0)

Records multiplexed receive packets into one FFmpeg output.

Audio packets are decoded to timestamp-aligned PCM and video packets are written as their decoded frame payloads.

Parameters
  • destination (Union[str, os.PathLike, BinaryIO]) – Output path or binary file-like object. File-like destinations receive FFmpeg stdout.

  • video_codec (Optional[str]) – Restrict recording to a single Discord video codec.

  • width (int) – Video width used for codecs that require container dimensions.

  • height (int) – Video height used for codecs that require container dimensions.

  • fps (int) – Fallback video frame rate for muxing.

  • audio (bool) – Whether to record audio packets.

  • video (bool) – Whether to record video packets.

  • executable (str) – The FFmpeg executable to run.

  • before_options (Optional[str]) – Extra FFmpeg options placed before input options.

  • options (Optional[str]) – Extra FFmpeg output options.

  • output_format (Optional[str]) – Explicit FFmpeg output format.

  • audio_codec (Optional[str]) – Audio codec to encode with during muxing.

  • shortest (bool) – Whether to stop muxed output at the shortest audio/video input.

  • stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

  • keep_temp (bool) – Whether to keep temporary elementary stream files after cleanup.

  • timeout (Optional[float]) – Maximum seconds to wait for FFmpeg muxing during cleanup.

destination

The configured output destination.

Type

Union[str, os.PathLike, BinaryIO]

video_codec

The selected or detected Discord video codec.

Type

Optional[str]

width

Video width used for muxing.

Type

int

height

Video height used for muxing.

Type

int

fps

Fallback video frame rate for muxing.

Type

int

audio_enabled

Whether audio recording is enabled.

Type

bool

video_enabled

Whether video recording is enabled.

Type

bool

returncode

The FFmpeg process return code after cleanup.

Type

Optional[int]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.EncodedVideoSink(destination, *, codec, width=0, height=0, fps=30, rtp_timestamps=False)

Writes received encoded video frames to IVF or Annex B output.

Parameters
  • destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object.

  • codec (str) – The Discord video codec to write.

  • width (int) – Video width for IVF headers.

  • height (int) – Video height for IVF headers.

  • fps (int) – Video frame rate for IVF headers when RTP timestamps are not used.

  • rtp_timestamps (bool) – Whether IVF frame timestamps should be derived from RTP timestamps.
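
The width, height, and fps parameters map onto fields of the 32-byte IVF file header. A minimal sketch of the IVF layout follows. The helper names are hypothetical, the fourcc values are the conventional ones (for example VP80), and this reference does not state which codecs the sink writes as IVF and which as Annex B.

```python
import struct


def ivf_header(fourcc: bytes, width: int, height: int, fps: int,
               frame_count: int = 0) -> bytes:
    """Build a 32-byte IVF file header with a timebase of 1/fps."""
    return struct.pack(
        "<4sHH4sHHIII4x",
        b"DKIF",      # signature
        0,            # version
        32,           # header length in bytes
        fourcc,       # codec fourcc, e.g. b"VP80"
        width,
        height,
        fps,          # timebase denominator
        1,            # timebase numerator
        frame_count,  # often patched once the frame count is known
    )


def ivf_frame(payload: bytes, pts: int) -> bytes:
    """Prefix one encoded frame with its 12-byte IVF frame header."""
    return struct.pack("<IQ", len(payload), pts) + payload
```

With a 1/fps timebase, the pts is simply the frame index. Deriving timestamps from RTP would instead call for a timebase that matches the RTP clock.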

codec

The normalized Discord video codec name.

Type

str

width

Video width for output metadata.

Type

int

height

Video height for output metadata.

Type

int

fps

Video frame rate for output metadata.

Type

int

rtp_timestamps

Whether output timestamps are derived from RTP timestamps.

Type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.PCMDecodeSink(destination, /, *, fec=False)

Decodes Opus audio packets to PCM before forwarding them to another sink.

Parameters
  • destination (MediaSink) – The child sink to forward decoded packets to.

  • fec (bool) – Whether to attempt Opus in-band FEC recovery for one missing packet.

fec

Whether Opus in-band FEC recovery is enabled.

Type

bool

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.SilenceFillSink(destination, /, *, silence_after=0.06, frame_duration=0.02, max_silence=1.0)

Pads short receive-audio gaps with synthetic PCM silence packets.

The sink forwards real packets to its destination, then emits audio/pcm silence for active audio SSRCs after a short gap. This is useful for sinks that consume a continuous PCM timeline, such as FFmpeg, callback, and queue consumers. The default silence duration is bounded so a speaker that stops talking does not produce endless output.

Parameters
  • destination (MediaSink) – The child sink to forward real and synthetic packets to.

  • silence_after (float) – Seconds to wait after the last audio packet before emitting silence.

  • frame_duration (float) – Duration of each synthetic PCM silence packet in seconds.

  • max_silence (Optional[float]) – Maximum seconds of silence to emit for each active audio track.
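
How the three timing parameters interact can be sketched as simple arithmetic. The helper below is hypothetical and assumes silence starts once silence_after has elapsed, is emitted in whole frames, and stops at max_silence; the sink's exact scheduling is not documented here and may differ.

```python
def silence_frame_count(gap, *, silence_after=0.06, frame_duration=0.02,
                        max_silence=1.0):
    """Estimate how many silence frames cover a gap of `gap` seconds."""
    # Work in whole microseconds to avoid floating-point rounding surprises.
    gap_us = round(gap * 1_000_000)
    after_us = round(silence_after * 1_000_000)
    frame_us = round(frame_duration * 1_000_000)
    if gap_us < after_us:
        return 0  # the gap is too short to trigger any fill
    fill_us = gap_us - after_us
    if max_silence is not None:
        fill_us = min(fill_us, round(max_silence * 1_000_000))
    return fill_us // frame_us
```

Under these assumptions, a speaker who stops for five seconds produces at most 50 silence frames with the defaults, rather than an unbounded stream.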

silence_after

Seconds to wait after the last audio packet before emitting silence.

Type

float

frame_duration

Duration of each synthetic PCM silence packet in seconds.

Type

float

max_silence

Maximum seconds of silence to emit for each active audio track.

Type

Optional[float]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.MediaSinkVolumeTransformer(destination, volume=1.0, /)

Adjusts PCM audio volume before forwarding to another sink.

Parameters
  • destination (MediaSink) – The child sink to forward transformed packets to.

  • volume (float) – The initial audio volume multiplier.

property volume

The audio volume multiplier.

Type

float

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()

Close this sink and all child sinks.

class discord.ext.native_voice.ConditionalFilter(destination, predicate, /)

A sink filter that forwards packets when a predicate returns true.

Parameters
  • destination (MediaSink) – The child sink to forward accepted packets to.

  • predicate (Callable[[MediaPacket], bool]) – The predicate used to accept packets.

predicate

The predicate used to accept packets.

Type

Callable[[MediaPacket], bool]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

class discord.ext.native_voice.TimedFilter(destination, duration, *, start_on_init=False)

Forward packets for a bounded duration.

Parameters
  • destination (MediaSink) – The child sink to forward accepted packets to.

  • duration (float) – The number of seconds to accept packets for.

  • start_on_init (bool) – Whether the duration timer starts when the filter is created.

duration

The number of seconds to accept packets for.

Type

float

start_time

The monotonic time when the filter started accepting packets.

Type

Optional[float]
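
The effect of start_on_init can be sketched with an injectable clock. TimedGate is a hypothetical stand-in, and it assumes that when start_on_init is False the timer starts on the first packet, which is consistent with start_time being optional.

```python
import time


class TimedGate:
    """Hypothetical stand-in that accepts packets for a bounded duration."""

    def __init__(self, duration, *, start_on_init=False, clock=time.monotonic):
        self.duration = duration
        self._clock = clock
        self.start_time = clock() if start_on_init else None

    def accepts(self) -> bool:
        now = self._clock()
        if self.start_time is None:
            # Assumption: the timer starts when the first packet arrives.
            self.start_time = now
        return now - self.start_time < self.duration


# A fake clock makes the behaviour deterministic for illustration.
ticks = iter([10.0, 10.5, 12.5])
gate = TimedGate(2.0, clock=lambda: next(ticks))
decisions = [gate.accepts(), gate.accepts(), gate.accepts()]
```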

class discord.ext.native_voice.UserFilter(destination, user, /)

Forward only packets from a specific user.

Parameters
  • destination (MediaSink) – The child sink to forward accepted packets to.

  • user (discord.abc.Snowflake) – The user whose media packets should be accepted.

user_id

The ID of the accepted user.

Type

int

class discord.ext.native_voice.MediaFilter(destination, *, media_types=None, codecs=None, users=None)

Forward packets matching media type, codec, and user filters.

Parameters
  • destination (MediaSink) – The child sink to forward accepted packets to.

  • media_types (Optional[List[str]]) – Media types to accept.

  • codecs (Optional[List[str]]) – Codec names to accept.

  • users (Optional[List[discord.abc.Snowflake]]) – Users whose media packets should be accepted.

media_types

Media types accepted by this filter.

Type

Optional[Set[str]]

codecs

Codec names accepted by this filter.

Type

Optional[Set[str]]

user_ids

User IDs accepted by this filter.

Type

Optional[Set[int]]

wants_media(media_type, codec)

Return whether this sink wants a media type/codec pair.

Parameters
  • media_type (str) – The decoded media type, such as audio or video.

  • codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

Data Objects

class discord.ext.native_voice.MediaPacket

Represents one decoded receive-side media packet.

For video, payload is a full depacketized encoded frame. The RTP fields, raw, and extension fields correspond to the RTP packet that completed that frame.

media_type

The media type, currently audio or video.

Type

str

codec

The decoded codec name.

Type

str

payload

The Opus packet, PCM packet, or full encoded video frame.

Type

bytes

payload_type

The media RTP payload type.

Type

int

marker

Whether the RTP marker bit was set.

Type

bool

sequence

The RTP sequence number.

Type

int

timestamp

The RTP timestamp.

Type

int

ssrc

The normalized media SSRC.

Type

int

user_id

The mapped user ID, if Discord has identified the SSRC.

Type

Optional[int]

raw

The raw encrypted RTP packet received from the socket.

Type

bytes

extension_payload

The decrypted one-byte RTP extension payload bytes.

Type

bytes

rtp_extended

Whether the RTP extension bit was set.

Type

bool

rtp_extensions

Parsed one-byte RTP extension elements.

Type

Tuple[RTPExtension, …]

rtp_packets

Parsed RTP packets that produced this media packet.

Type

Tuple[RTPPacket, …]

received_at

Local monotonic timestamp for when this packet/frame was decoded.

Type

Optional[float]

rtcp_time

Unix timestamp mapped from RTCP sender reports or RTP absolute send time, if either was available.

Type

Optional[float]

speaking_flags

The decoded Discord speaking flags, if this is an audio packet.

Type

Optional[discord.SpeakingFlags]

audio_level

Decoded RTP audio-level extension value, where 0 is loudest and 127 is silence.

Type

Optional[int]

audio_voice_activity

RTP audio-level voice activity bit, if present.

Type

Optional[bool]
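
The audio_level scale can be read as a level in dBov, following the RTP audio-level header extension (RFC 6464), where the value is the negated dBov level. The sketch below shows the conversion; the helper names and the -60 dBov silence threshold are illustrative assumptions, not part of the library.

```python
from typing import Optional


def audio_level_to_dbov(audio_level: Optional[int]) -> Optional[float]:
    """Convert an RFC 6464 audio level (0..127) to dBov (0.0 down to -127.0)."""
    if audio_level is None:
        return None  # the extension was absent on this packet
    if not 0 <= audio_level <= 127:
        raise ValueError("audio level must be in the range 0..127")
    return -float(audio_level)


def is_probably_silent(audio_level: Optional[int], threshold_dbov: float = -60.0) -> bool:
    """Return True when the level is at or below the (assumed) threshold.

    A missing level is treated as unknown, not as silence.
    """
    dbov = audio_level_to_dbov(audio_level)
    return dbov is not None and dbov <= threshold_dbov
```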

class discord.ext.native_voice.MediaSinkWants

Represents a Discord media sink wants payload.

wants

Per-SSRC quality requests. Positive values select the requested send quality; 0 means the receiver does not want that SSRC forwarded.

Type

Dict[int, int]

any

The fallback quality request for otherwise unspecified streams.

Type

Optional[int]

pixel_counts

Per-SSRC preferred pixel counts.

Type

Dict[int, float]
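
The relationship between wants and any can be sketched as a lookup with a fallback. The helper names below are hypothetical, and treating a missing request (None) as "not requested" is an assumption made for illustration only.

```python
from typing import Dict, Optional


def resolve_requested_quality(
    wants: Dict[int, int], any_quality: Optional[int], ssrc: int
) -> Optional[int]:
    """Return the quality requested for ssrc, or the fallback if unspecified."""
    if ssrc in wants:
        return wants[ssrc]
    return any_quality


def is_requested(wants: Dict[int, int], any_quality: Optional[int], ssrc: int) -> bool:
    """Positive values request the stream; 0 asks for it not to be forwarded."""
    quality = resolve_requested_quality(wants, any_quality, ssrc)
    # Assumption: no per-SSRC entry and no fallback means nothing was requested.
    return quality is not None and quality > 0


example_wants = {111: 100, 222: 0}
```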

class discord.ext.native_voice.VideoConfig(codec, width, height, fps=30, bitrate=0)

Playback parameters for a video-capable MediaSource.

This lets discord.ext.native_voice.VoiceClient.play() start video automatically when the source knows its own dimensions and codec.

codec

The encoded video codec name, such as H264.

Type

str

width

The encoded video width in pixels.

Type

int

height

The encoded video height in pixels.

Type

int

fps

The target frame rate.

Type

int

bitrate

The target video bitrate in bits per second.

Type

int

class discord.ext.native_voice.VideoFrame(data, frame_time_ms=33.0)

Represents one encoded video frame yielded by a MediaSource.

data

The encoded frame bytes for the selected video codec.

Type

bytes

frame_time_ms

The duration of the frame in milliseconds.

Type

float
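
For constant-frame-rate video, frame_time_ms follows directly from the frame rate. The default of 33.0 is close to the 30 fps case. The helper below is a hypothetical illustration and not part of the library.

```python
def frame_duration_ms(fps: float) -> float:
    """Return the duration of one frame in milliseconds for a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```

At 25 fps each frame lasts 40 ms; at 30 fps roughly 33.3 ms.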

class discord.ext.native_voice.VideoProbeInfo(width=None, height=None, fps=None, bitrate=None, codec=None)

Metadata discovered for a video input.

width

The video width in pixels.

Type

Optional[int]

height

The video height in pixels.

Type

Optional[int]

fps

The frame rate, rounded to an integer.

Type

Optional[int]

bitrate

The video bitrate in bits per second.

Type

Optional[int]

codec

The Discord video codec name, if it could be mapped.

Type

Optional[str]

class discord.ext.native_voice.VideoTranscoderConfig(encoder=None, decoder=None, prefer_hardware=True, validate_encoder=True, validate_decoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

FFmpeg codec selection options for video sources.

encoder

Exact FFmpeg video encoder to use, or a mapping of Discord codec name to FFmpeg encoder name. If omitted, an available encoder is selected for the target codec.

Type

Optional[Union[str, Dict[str, str]]]

decoder

Exact FFmpeg video decoder to use for the input, or a mapping of Discord codec name to FFmpeg decoder name. This is emitted as an input option before -i.

Type

Optional[Union[str, Dict[str, str]]]

prefer_hardware

Whether to prefer low-latency hardware encoders when FFmpeg advertises them.

Type

bool

validate_encoder

Whether to validate explicit encoders against ffmpeg -encoders before starting.

Type

bool

validate_decoder

Whether to validate explicit decoders against ffmpeg -decoders before starting.

Type

bool

encoder_options

Extra arguments appended immediately after the selected encoder options.

Type

List[str]

input_options

Extra arguments inserted before the input arguments.

Type

List[str]

output_options

Extra arguments appended after the source options and before the output format.

Type

List[str]

video_filters

Full FFmpeg video filtergraph fragments. If omitted, sources use the default low-latency software scale and yuv420p conversion.

Type

Optional[List[str]]
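
The option ordering described by these attributes can be sketched as a command-line builder. This is only an illustration of where each group lands relative to -i and the output format: build_ffmpeg_args is a hypothetical helper, the placement of -vf relative to the encoder is a guess, and the real sources may add further arguments of their own.

```python
from typing import List, Optional, Sequence


def build_ffmpeg_args(
    source: str,
    *,
    encoder: str,
    decoder: Optional[str] = None,
    input_options: Sequence[str] = (),
    encoder_options: Sequence[str] = (),
    output_options: Sequence[str] = (),
    video_filters: Optional[Sequence[str]] = None,
    output_format: str = "h264",  # assumed raw H.264 output, for illustration
) -> List[str]:
    args: List[str] = ["ffmpeg"]
    args += list(input_options)          # inserted before the input arguments
    if decoder is not None:
        args += ["-c:v", decoder]        # before -i, so it selects the decoder
    args += ["-i", source]
    if video_filters:
        args += ["-vf", ",".join(video_filters)]
    args += ["-c:v", encoder]            # after -i, so it selects the encoder
    args += list(encoder_options)        # immediately after the encoder selection
    args += list(output_options)         # last group before the output format
    args += ["-f", output_format, "pipe:1"]
    return args


args = build_ffmpeg_args(
    "input.mp4",
    encoder="libx264",
    decoder="h264",
    input_options=["-re"],
    encoder_options=["-preset", "veryfast"],
    output_options=["-g", "60"],
    video_filters=["scale=1280:720", "format=yuv420p"],
)
```

FFmpeg applies an option to the next input or output that follows it, which is why the same -c:v flag selects a decoder before -i and an encoder after it.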

classmethod software(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Prefer software encoders and skip hardware encoder probing.

Parameters
  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod nvenc(*, preset=None, tune=None, gpu=None, spatial_aq=None, temporal_aq=None, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Use NVIDIA NVENC encoders for H264, H265, and AV1.

Parameters
  • preset (Optional[str]) – NVENC preset option.

  • tune (Optional[str]) – NVENC tuning option.

  • gpu (Optional[int]) – GPU index passed to NVENC.

  • spatial_aq (Optional[bool]) – Whether to enable NVENC spatial adaptive quantization.

  • temporal_aq (Optional[bool]) – Whether to enable NVENC temporal adaptive quantization.

  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod amf(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Use AMD AMF encoders for H264, H265, and AV1.

Parameters
  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod qsv(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Use Intel Quick Sync Video encoders for H264, H265, VP9, and AV1.

Parameters
  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod vaapi(*, device='/dev/dri/renderD128', validate_encoder=True, encoder_options=(), input_options=(), output_options=())

Use VAAPI encoders.

Parameters
  • device (str) – VAAPI render device path.

  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod video_toolbox(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Use macOS VideoToolbox encoders for H264 and H265.

Parameters
  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod media_foundation(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)

Use Windows Media Foundation encoders for H264, H265, and AV1.

Parameters
  • validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.

  • encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.

  • input_options (List[str]) – Extra arguments inserted before input arguments.

  • output_options (List[str]) – Extra arguments appended after source options and before output format.

  • video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

class discord.ext.native_voice.RTPExtension

Represents a parsed one-byte RTP header extension.

id

The RTP extension ID.

Type

int

data

The extension payload bytes.

Type

bytes

class discord.ext.native_voice.RTPPacket

Represents one parsed receive-side RTP packet.

For RTX packets, payload is the recovered associated media payload and sequence is the original media sequence number. The transport RTX SSRC and payload type are preserved in rtx_ssrc and rtx_payload_type.

media_type

The media type, currently audio or video.

Type

str

codec

The decoded codec name.

Type

str

payload

The RTP media payload.

Type

bytes

payload_type

The media RTP payload type.

Type

int

marker

Whether the RTP marker bit was set.

Type

bool

sequence

The RTP sequence number.

Type

int

timestamp

The RTP timestamp.

Type

int

ssrc

The normalized media SSRC.

Type

int

user_id

The mapped user ID, if Discord has identified the SSRC.

Type

Optional[int]

raw

The raw encrypted RTP packet received from the socket.

Type

bytes

extension_payload

The decrypted one-byte RTP extension payload bytes.

Type

bytes

rtp_extended

Whether the RTP extension bit was set.

Type

bool

rtp_extensions

Parsed one-byte RTP extension elements.

Type

Tuple[RTPExtension, …]

rtx

Whether this packet was received through RTX retransmission.

Type

bool

rtx_ssrc

The RTX transport SSRC, if this packet was repaired.

Type

Optional[int]

rtx_payload_type

The RTX RTP payload type, if this packet was repaired.

Type

Optional[int]

audio_level

Decoded RTP audio-level extension value, where 0 is loudest and 127 is silence.

Type

Optional[int]

audio_voice_activity

RTP audio-level voice activity bit, if present.

Type

Optional[bool]
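
RTP sequence numbers are 16-bit and wrap from 65535 back to 0, so comparing the sequence values of consecutive packets needs modular arithmetic. The following sketch shows one common way to do that; the helper names are hypothetical and not part of the library.

```python
from typing import List


def sequence_delta(previous: int, current: int) -> int:
    """Signed distance from previous to current on the 16-bit sequence circle."""
    delta = (current - previous) & 0xFFFF
    # Distances of half the circle or more are read as "current is older".
    return delta - 0x10000 if delta >= 0x8000 else delta


def missing_between(previous: int, current: int) -> List[int]:
    """Sequence numbers skipped between two packets, in send order."""
    delta = sequence_delta(previous, current)
    # An empty list is returned for consecutive, duplicate, or reordered packets.
    return [(previous + step) & 0xFFFF for step in range(1, delta)]
```

A negative delta indicates a reordered or retransmitted packet rather than a gap.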

class discord.ext.native_voice.RTPSendStats

Represents the latest RTP send state for an SSRC.

ssrc

The RTP SSRC.

Type

int

sequence

The latest RTP sequence number sent.

Type

int

transport_sequence

The latest RTP transport-wide sequence number sent, if available.

Type

Optional[int]

updated_at

Local monotonic timestamp for the latest update.

Type

float

class discord.ext.native_voice.AudioSendStats

Represents audio RTP send counters.

ssrc

The audio RTP SSRC.

Type

int

packets_sent

Number of audio RTP packets sent.

Type

int

octets_sent

Number of audio payload octets sent.

Type

int

last_sequence

The latest audio RTP sequence number sent, if available.

Type

Optional[int]

updated_at

Local monotonic timestamp for the latest send update, if available.

Type

Optional[float]

class discord.ext.native_voice.RTCPReceiverReport

Represents one RTCP receiver report block.

sender_ssrc

The SSRC that sent the receiver report.

Type

int

source_ssrc

The SSRC that the report describes.

Type

int

fraction_lost

The packet loss fraction reported by the receiver.

Type

int

cumulative_lost

The cumulative packet loss count reported by the receiver.

Type

int

extended_high_sequence

The extended highest sequence number received.

Type

int

jitter

The interarrival jitter value reported by the receiver.

Type

int

last_sender_report

Compact NTP timestamp from the last sender report.

Type

int

delay_since_last_sender_report

Delay since the last sender report, in RTCP units of 1/65536 seconds.

Type

int

received_at

Local monotonic timestamp for when this report was decoded.

Type

float
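
The integer fields of a receiver report can be converted to more familiar units. The sketch below assumes the attributes carry the raw RFC 3550 wire values (fraction_lost as an 8-bit fixed-point fraction, the two sender-report fields in units of 1/65536 seconds, and jitter in RTP timestamp units). The helper names are hypothetical, and the caller must supply the arrival time as a compact NTP value because received_at is a monotonic timestamp.

```python
from typing import Optional


def loss_fraction(fraction_lost: int) -> float:
    """Convert the 8-bit fixed-point loss field to a fraction in [0, 1)."""
    return fraction_lost / 256.0


def round_trip_seconds(
    arrival_compact_ntp: int,
    last_sender_report: int,
    delay_since_last_sender_report: int,
) -> Optional[float]:
    """Round-trip time per RFC 3550: arrival - LSR - DLSR, in seconds."""
    if last_sender_report == 0:
        return None  # the receiver has not seen a sender report yet
    units = (
        arrival_compact_ntp - last_sender_report - delay_since_last_sender_report
    ) & 0xFFFFFFFF  # compact NTP values are 32-bit and may wrap
    return units / 65536.0


def jitter_ms(jitter: int, clock_rate: int) -> float:
    """Convert jitter from RTP timestamp units to milliseconds."""
    return jitter * 1000.0 / clock_rate
```

For example, a jitter of 960 at a 48000 Hz clock rate corresponds to 20 ms.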

class discord.ext.native_voice.MediaPlayerStats

Represents media player send timing statistics.

started_at

Local monotonic timestamp for when playback started or resumed.

Type

float

audio_frames_sent

Number of audio frames sent.

Type

int

video_frame_batches_sent

Number of video frame batches sent.

Type

int

video_frames_sent

Number of encoded video frames sent.

Type

int

video_packets_sent

Number of RTP video packets sent.

Type

int

late_video_frames

Number of video frames sent later than their scheduled time.

Type

int

max_video_late_ms

Maximum observed video lateness in milliseconds.

Type

float

audio_send_mean_ms

Mean time spent sending an audio frame in milliseconds.

Type

float

audio_send_max_ms

Maximum time spent sending an audio frame in milliseconds.

Type

float

video_send_mean_ms

Mean time spent sending a video frame batch in milliseconds.

Type

float

video_send_max_ms

Maximum time spent sending a video frame batch in milliseconds.

Type

float

video_send_interval_mean_ms

Mean interval between video frame batch sends in milliseconds.

Type

float

video_send_interval_p95_ms

Approximate p95 interval between video frame batch sends in milliseconds.

Type

float

video_send_interval_max_ms

Maximum interval between video frame batch sends in milliseconds.

Type

float

sleep_mean_ms

Mean time spent sleeping in the player loop in milliseconds.

Type

float

sleep_max_ms

Maximum time spent sleeping in the player loop in milliseconds.

Type

float
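
A p95 figure such as video_send_interval_p95_ms can be approximated from a window of interval samples. The nearest-rank method below is one simple way to do this; it is an illustrative sketch and not necessarily how the library computes the value.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], percentile: float) -> float:
    """Return the nearest-rank percentile of samples, or 0.0 when empty."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least percentile% of samples at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical send intervals in milliseconds, with one late batch.
intervals_ms = [30.0, 33.0, 34.0, 35.0, 90.0]
p95 = percentile_nearest_rank(intervals_ms, 95)
```

Comparing the p95 interval with the mean shows whether occasional stalls are hidden by an otherwise healthy average.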