API Reference

Clients

Attributes

average_latency
codecs
endpoint
guild
latency
negotiated_video_codec
session_id
sink
source
stream_clients
streams
token
user
video_streams
voice_privacy_code

Methods

cls VoiceClient.with_config
async create_stream
async disconnect
def get_stream
def is_connected
def is_listening
def is_paused
def is_playing
def listen
async move_to
def pause
def play
async request_video
def resume
def send_audio_packet
def send_video_frame
def send_video_frames
def set_sink
async start_video
def stop
def stop_listening
async stop_video
async update_speaking_state
async watch_stream

class discord.ext.native_voice.VoiceClient¶

A native voice client with audio, video, and media receive support.

This client extends discord.VoiceClient with native RTP transport crypto, video send/receive, RTX/NACK handling, and media sinks.

property codecs¶

Codecs advertised by this client.

Type: Tuple[discord.VoiceCodec, …]

classmethod with_config(*, rtx=..., udp_qos=..., codecs=..., video_streams=..., ffmpeg_executable=..., enable_debug_stats=...)¶

Return a subclass with voice negotiation options preset.

These options affect voice protocol identification, so they must be present on the class passed to channel.connect.

Parameters

rtx (bool) – Whether to enable RTX support. This will cause the client to advertise RTX payload types for video codecs and to use RTX for video retransmissions if the voice server negotiates it. Enabling RTX may increase bandwidth usage, but can improve video quality on lossy connections. It is enabled by default.
udp_qos (bool) – Whether to request UDP QoS marking for the voice socket. This marks outgoing media packets with Discord’s native DSCP value on platforms that allow it, which may improve prioritisation on supported networks. It is disabled by default.
codecs (List[discord.VoiceCodec]) – Codec objects to advertise. When omitted, codecs are generated from local FFmpeg capabilities and sorted by the local hardware/software capability score. When provided, the codec order is preserved and priorities are recomputed after unsupported entries are skipped.
video_streams (List[discord.VoiceStream]) – The simulcast streams advertised to the server. Defaults to a single max quality stream.
ffmpeg_executable (str) – FFmpeg executable used for automatic local codec capability probing.
enable_debug_stats (bool) – Whether to collect debug counters for RTP/RTCP receive diagnostics. This is disabled by default to avoid performance issues.

Returns

A configured subclass of this voice client.

Return type

Type[VoiceClient]

Raises

ValueError – codecs does not include an Opus audio codec.
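The "configured subclass" pattern described above can be illustrated with a small self-contained sketch. It uses a stand-in class, not the real VoiceClient: FakeVoiceClient and the string codec names are hypothetical, and the library's actual implementation may differ. The point is only that the options live on the returned class, which is why that class is the one that has to be passed to channel.connect.

```python
from typing import Optional, Sequence, Tuple, Type


class FakeVoiceClient:
    """Stand-in for a voice client; NOT the real discord.ext.native_voice API."""

    # Class-level defaults, assumed here to match the documented defaults.
    rtx: bool = True
    udp_qos: bool = False
    codecs: Tuple[str, ...] = ("opus", "H264")

    @classmethod
    def with_config(
        cls,
        *,
        rtx: Optional[bool] = None,
        udp_qos: Optional[bool] = None,
        codecs: Optional[Sequence[str]] = None,
    ) -> Type["FakeVoiceClient"]:
        overrides = {}
        if rtx is not None:
            overrides["rtx"] = rtx
        if udp_qos is not None:
            overrides["udp_qos"] = udp_qos
        if codecs is not None:
            # Mirror the documented ValueError: an Opus audio codec is required.
            if "opus" not in codecs:
                raise ValueError("codecs must include an Opus audio codec")
            overrides["codecs"] = tuple(codecs)  # the caller's order is preserved
        # Build a new subclass so the base class keeps its own defaults.
        return type(f"Configured{cls.__name__}", (cls,), overrides)


Configured = FakeVoiceClient.with_config(rtx=False, codecs=["opus", "VP8"])
```

With the real library, the class returned by with_config would be passed as the cls argument of channel.connect, which the description above requires.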

property video_streams¶

Simulcast streams advertised by this client.

Type: Tuple[discord.VoiceStream, …]

await create_stream(*, timeout=30.0, reconnect=True, cls=...)¶

This function is a coroutine.

Create a Go Live stream from the current voice channel.

Parameters

timeout (float) – The number of seconds to wait for stream RTC connection.
reconnect (bool) – Whether the stream protocol should attempt reconnects.
cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with. Defaults to StreamClient.

Returns

The connected stream RTC client.

Return type

StreamProtocol

Raises

discord.ClientException – The voice client is not connected or the voice session is not ready.

await disconnect(*, force=False)¶

This function is a coroutine.

Disconnects this voice client from voice.

property negotiated_video_codec¶

The video codec selected by the voice server.

Type: Optional[str]

send_audio_packet(data, *, encode=True)¶

Sends an audio packet composed of the data.

You must be connected to play audio.

Parameters

data (bytes) – The bytes-like object denoting PCM or Opus voice data.
encode (bool) – Indicates if data should be encoded into Opus.

Raises

ClientException – You are not connected.
opus.OpusError – Encoding the data failed.

play(source, *, after=None, application='audio', bitrate=64, fec=True, expected_packet_loss=0.0, bandwidth='full', signal_type='auto', video_width=None, video_height=None, video_fps=None, video_bitrate=None)¶

Play an audio or media source.

This extends discord.VoiceClient.play() with MediaSource support. When the source has video, the client starts the negotiated video transport before sending frames.

The finalizer, after, is called after the source has been exhausted or an error has occurred.

If an error happens while the media player is running, the exception is caught and the player is then stopped. If no after callback is passed, any caught exception will be logged using the library logger.

Extra parameters may be passed to the internal opus encoder if a PCM based audio source is used. Otherwise, they are ignored.

Parameters

source (discord.AudioSource) – The audio or media source to play.
after (Callable[[Optional[Exception]], Any]) – The finalizer that is called after the stream is exhausted. This function must have a single parameter, error, that denotes an optional exception that was raised during playing.
application (str) – Configures the encoder’s intended application. Can be one of: 'audio', 'voip', 'lowdelay'. Defaults to 'audio'.
bitrate (int) – Configures the bitrate in the audio encoder. Can be between 16 and 512. Defaults to 64.
fec (bool) – Configures the encoder’s use of inband forward error correction. Defaults to True.
expected_packet_loss (float) – Configures the encoder’s expected packet loss percentage. Requires FEC. Defaults to 0.0.
bandwidth (str) – Configures the encoder’s bandpass. Can be one of: 'narrow', 'medium', 'wide', 'superwide', 'full'. Defaults to 'full'.
signal_type (str) – Configures the type of signal being encoded. Can be one of: 'auto', 'voice', 'music'. Defaults to 'auto'.
video_width (Optional[int]) – Video width used when the source does not provide a VideoConfig.
video_height (Optional[int]) – Video height used when the source does not provide a VideoConfig.
video_fps (Optional[int]) – Video frame rate override.
video_bitrate (Optional[int]) – Video bitrate override in bits per second.

Raises

discord.ClientException – Already playing media or not connected.
TypeError – Source is not an AudioSource or after is not callable.
discord.opus.OpusNotLoaded – Source is not opus encoded and opus is not loaded.
ValueError – An improper value was passed as an encoder parameter.
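A finalizer for the after parameter takes a single error argument, which is None when the source was exhausted normally. The following is a minimal sketch of such a callback; the function name and logger name are hypothetical.

```python
import logging
from typing import Optional

log = logging.getLogger("my_bot.voice")  # hypothetical logger name


def on_playback_done(error: Optional[Exception]) -> None:
    """Finalizer with the single ``error`` parameter that ``after`` expects."""
    if error is not None:
        # The player caught an exception and stopped; report it ourselves.
        log.error("Playback stopped with an error: %r", error)
    else:
        log.info("Playback finished normally")
```

It would be passed as `voice_client.play(source, after=on_playback_done)`, where voice_client and source are assumed to be a connected client and a valid source.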

await start_video(*, width, height, fps=30, bitrate=0)¶

This function is a coroutine.

Start outbound video using the negotiated video codec.

This is called automatically by play() and should not be called by the user in most cases.

Parameters

width (int) – Encoded video width.
height (int) – Encoded video height.
fps (int) – Encoded frame rate.
bitrate (int) – Target bitrate in bits per second.

Raises

discord.ClientException – The voice client is not connected, or no video codec was negotiated.

await request_video(ssrc, *, quality=100, any=..., pixel_count=None)¶

This function is a coroutine.

Request that Discord forwards video for an SSRC.

Parameters

ssrc (int) – The video SSRC to request.
quality (int) – The requested stream quality.
any (Optional[int]) – The fallback quality request for otherwise unspecified streams.
pixel_count (Optional[int]) – Optional pixel-count hint sent with the media sink wants payload.

Raises

discord.ClientException – The voice client is not connected.

await stop_video()¶

This function is a coroutine.

Stop outbound video and reset video transport state.

This is called automatically by the player and should not be called by the user in most cases.

send_video_frame(frame, *, frame_time_ms=33.0, stream=None)¶

Packetize, encrypt, and send one encoded video frame.

Parameters

frame (bytes) – The encoded frame in the negotiated codec.
frame_time_ms (float) – The frame duration in milliseconds.
stream (Optional[discord.VoiceStream]) – The active simulcast stream to send on. Defaults to the selected primary stream.

Returns

The number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, no active stream is selected, the stream is inactive, or the stream has no negotiated SSRC.
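Since frame_time_ms is the duration of one frame in milliseconds, a source with a constant frame rate can derive it as 1000 / fps. The default of 33.0 is roughly one frame at 30 fps. The helper below is only an illustration and its name is hypothetical.

```python
def frame_duration_ms(fps: float) -> float:
    """Return the duration of one video frame in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Durations for a few common frame rates.
durations = {fps: frame_duration_ms(fps) for fps in (25, 30, 50)}
```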

send_video_frames(frames, /)¶

Send RID-keyed encoded frames for active simulcast streams.

Parameters: frames (Dict[str, VideoFrame]) – Mapping of stream RID to encoded frame.
Returns: The total number of RTP packets sent.
Return type: int
Raises: discord.ClientException – The voice client is not connected, video has not been started, an active stream has no negotiated SSRC, or no active stream is selected.

listen(sink, *, after=None)¶

Listen for inbound native media packets.

Parameters

sink (Union[MediaSink, Callable[[MediaPacket], Any]]) – The sink or callback that receives decoded media packets.
after (Optional[Callable[[Optional[Exception]], Any]]) – A callback called after listening stops.

Raises

discord.ClientException – The voice client is not connected, is already listening, or sink is already registered as a child or closed.
TypeError – sink is not a MediaSink or callable, or after is not callable.

property sink¶

The current media receive sink, if one was provided to listen().

This property can also be used to change the active sink while receiving. The old sink is detached but not cleaned up.

Type: MediaSink

set_sink(sink, /)¶

Changes the active receive sink and returns the previous sink.

The old sink is detached without running MediaSink.cleanup(), so callers that keep it should clean it up explicitly when they are done.

Parameters

sink (MediaSink) – The sink to use.

Returns

The previous active sink, if any.

Return type

Optional[MediaSink]

Raises

ValueError – The voice client is not currently listening.
discord.ClientException – sink is already registered as a child or closed.
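Because set_sink returns the previous sink without calling MediaSink.cleanup(), the caller is responsible for cleaning it up. The sketch below shows that hand-over with stand-in classes; FakeSink and FakeReceiver are hypothetical and only model the documented behaviour.

```python
from typing import Optional


class FakeSink:
    """Stand-in for MediaSink that records whether cleanup() ran."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cleaned_up = False

    def cleanup(self) -> None:
        self.cleaned_up = True


class FakeReceiver:
    """Stand-in for a listening client that holds one active sink."""

    def __init__(self, sink: Optional[FakeSink] = None) -> None:
        self._sink = sink

    def set_sink(self, sink: FakeSink) -> Optional[FakeSink]:
        previous, self._sink = self._sink, sink
        return previous  # detached only; cleanup() is deliberately not called


receiver = FakeReceiver(FakeSink("recording"))
old_sink = receiver.set_sink(FakeSink("monitor"))
was_cleaned_by_swap = old_sink.cleaned_up  # still False after the swap
old_sink.cleanup()  # the caller cleans up once it is done with the old sink
```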

is_listening()¶: bool: Whether this client is currently receiving media packets.

stop_listening()¶: Stop receiving media packets and clean up the active sink.

property average_latency¶

Average of most recent 20 HEARTBEAT latencies in seconds.

New in version 1.4.

Type: float
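An average over the most recent 20 latencies can be pictured as a fixed-size rolling window. The sketch below is one way to model that, not the library's implementation; the class name is hypothetical, and returning infinity for an empty window is an assumption of this sketch.

```python
from collections import deque
from typing import Deque


class LatencyWindow:
    """Keeps the most recent ``size`` latency samples and averages them."""

    def __init__(self, size: int = 20) -> None:
        # A bounded deque drops the oldest sample once it is full.
        self._recent: Deque[float] = deque(maxlen=size)

    def record(self, latency: float) -> None:
        self._recent.append(latency)

    @property
    def average(self) -> float:
        if not self._recent:
            return float("inf")  # assumption: no samples yet means "unknown"
        return sum(self._recent) / len(self._recent)


window = LatencyWindow()
for sample in range(25):
    window.record(float(sample))  # only samples 5..24 remain in the window
```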

property endpoint¶

The endpoint we are connecting to.

Type: str

get_stream(owner)¶

Optional[Stream]: Returns a known Go Live stream by owner ID for this voice connection.

New in version 2.2.

Parameters: owner (Snowflake) – The owner of the stream.
Returns: The stream if found.
Return type: Optional[Stream]

property guild¶

The guild we’re connected to, if applicable.

Type: Optional[Guild]

is_connected()¶: Indicates if the voice client is connected to voice.

is_paused(): Indicates if we’re currently playing audio but are paused.

is_playing()¶: Indicates if we’re currently playing audio.

property latency¶

Latency between a HEARTBEAT and a HEARTBEAT_ACK in seconds.

This could be referred to as the Discord Voice WebSocket latency and is analogous to the user voice latencies shown in the Discord client.

New in version 1.4.

Type: float

await move_to(channel, *, timeout=30.0)¶

This function is a coroutine.

Moves you to a different voice channel.

Parameters

channel (Optional[Snowflake]) – The channel to move to. Must be a voice channel.
timeout (Optional[float]) – How long to wait for the move to complete. New in version 2.1.

Raises

asyncio.TimeoutError – The move did not complete in time, but may still be ongoing.

pause()¶: Pauses the audio playing.

resume()¶: Resumes the audio playing.

property session_id¶

The voice connection session ID.

Type: str

property source¶

The audio source being played, if playing.

This property can also be used to change the audio source currently being played.

Type: Optional[AudioSource]

stop()¶: Stops playing audio.

property stream_clients¶

The Go Live stream clients attached to this voice connection.

New in version 2.2.

Type: Tuple[StreamProtocol]

property streams¶

The Go Live streams known for this voice connection.

New in version 2.2.

Type: Tuple[Stream]

property token¶

The voice connection token.

Type: str

await update_speaking_state(flags)¶

Update the current speaking flags.

Parameters: flags (SpeakingFlags) – The new speaking flags.

property user¶

The user connected to voice (i.e. ourselves).

Type: ClientUser

property voice_privacy_code¶

Get the voice privacy code of this E2EE session’s group.

A new privacy code is created and cached each time a new transition is executed. This can be None if there is no active DAVE session.

New in version 2.1.

Type: str

await watch_stream(stream_key, *, timeout=30.0, reconnect=True, cls)¶

This function is a coroutine.

Watches a Go Live stream by stream key and connects with the provided stream protocol.

This is useful when the stream is not already cached. If the stream is cached, this delegates to Stream.watch().

New in version 2.2.

Parameters

stream_key (StreamKey) – The stream key to watch.
timeout (float) – The timeout in seconds to wait for the stream connection to complete.
reconnect (bool) – Whether the stream protocol should attempt reconnects.
cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with.

Raises

ClientException – You are not connected to the stream’s voice channel, or you tried to watch your own stream.

Returns

The connected stream protocol.

Return type

StreamProtocol

Attributes

average_latency
codecs
endpoint
guild
latency
negotiated_video_codec
session_id
sink
source
stream_clients
stream_key
streams
token
user
video_streams
voice_privacy_code

Methods

cls StreamClient.with_config
async create_stream
async disconnect
def get_stream
def is_connected
def is_listening
def is_paused
def is_playing
def listen
async move_to
async on_stream_available
async on_stream_create
async on_stream_delete
async on_stream_server_update
async on_stream_unavailable
async on_stream_update
def pause
def play
async request_video
def resume
def send_audio_packet
def send_video_frame
def send_video_frames
def set_preview_provider
def set_sink
def start_preview_loop
async start_video
def stop
def stop_listening
def stop_preview_loop
async stop_video
async update_speaking_state
async watch_stream

class discord.ext.native_voice.StreamClient¶

A native RTC client for a Discord Go Live stream.

Stream clients are created from VoiceClient.create_stream() or discord.Stream.watch(). By default, stream clients inherit codec and RTX policy from their parent VoiceClient, but stream protocol subclasses can override their own negotiation config.

property codecs¶

Codecs advertised by this stream RTC client.

Type: Tuple[discord.VoiceCodec, …]

classmethod with_config(*, rtx=..., udp_qos=..., codecs=..., video_streams=..., ffmpeg_executable=..., enable_debug_stats=...)¶

Return a subclass with stream RTC negotiation options preset.

Omitted codec and RTX options inherit from the parent native voice client. Options provided here apply only to the stream RTC transport.

These options affect voice protocol identification, so they must be present on the class passed to create_stream or similar.

Parameters

rtx (bool) – Whether to enable RTX support. This will cause the client to advertise RTX payload types for video codecs and to use RTX for video retransmissions if the voice server negotiates it. Enabling RTX may increase bandwidth usage, but can improve video quality on lossy connections. It is enabled by default.
udp_qos (bool) – Whether to request UDP QoS marking for the voice socket. This marks outgoing media packets with Discord’s native DSCP value on platforms that allow it, which may improve prioritisation on supported networks. It is disabled by default.
codecs (List[discord.VoiceCodec]) – Codec objects to advertise. When omitted, codecs are generated from local FFmpeg capabilities and sorted by the local hardware/software capability score. When provided, the codec order is preserved and priorities are recomputed after unsupported entries are skipped.
video_streams (List[discord.VoiceStream]) – The simulcast streams advertised to the server. Defaults to a single max quality stream.
ffmpeg_executable (str) – FFmpeg executable used for automatic local codec capability probing when this stream client does not inherit parent codecs.
enable_debug_stats (bool) – Whether to collect debug RTP/RTCP receive counters. When omitted, the stream RTC client inherits the parent voice client’s setting.

Returns

A configured subclass of this stream client.

Return type

Type[StreamClient]

Raises

ValueError – codecs does not include an Opus audio codec.

play(source, *, preview_provider=..., **kwargs)¶

Play media on the stream RTC transport.

This extends play() with stream preview provider support.

The finalizer, after, is called after the source has been exhausted or an error has occurred.

Extra parameters may be passed to the internal opus encoder if a PCM based audio source is used. Otherwise, they are ignored.

Parameters

source (discord.AudioSource) – The audio or media source to play.
after (Callable[[Optional[Exception]], Any]) – The finalizer that is called after the stream is exhausted. This function must have a single parameter, error, that denotes an optional exception that was raised during playing.
application (str) – Configures the encoder’s intended application. Can be one of: 'audio', 'voip', 'lowdelay'. Defaults to 'audio'.
bitrate (int) – Configures the bitrate in the audio encoder. Can be between 16 and 512. Defaults to 64.
fec (bool) – Configures the encoder’s use of inband forward error correction. Defaults to True.
expected_packet_loss (float) – Configures the encoder’s expected packet loss percentage. Requires FEC. Defaults to 0.0.
bandwidth (str) – Configures the encoder’s bandpass. Can be one of: 'narrow', 'medium', 'wide', 'superwide', 'full'. Defaults to 'full'.
signal_type (str) – Configures the type of signal being encoded. Can be one of: 'auto', 'voice', 'music'. Defaults to 'auto'.
video_width (Optional[int]) – Video width used when the source does not provide a VideoConfig.
video_height (Optional[int]) – Video height used when the source does not provide a VideoConfig.
video_fps (Optional[int]) – Video frame rate override.
video_bitrate (Optional[int]) – Video bitrate override in bits per second.
preview_provider (Optional[Callable[[], Optional[bytes]]]) – A callable returning image preview bytes. By default, the media source’s preview reader is used, if available.

Raises

discord.ClientException – Already playing media, not connected, you do not own this stream, or a preview was requested without a media source or preview provider.
TypeError – Source is not an AudioSource or after is not callable.
discord.opus.OpusNotLoaded – Source is not opus encoded and opus is not loaded.
ValueError – An improper value was passed as an encoder parameter.

await disconnect(*, force=False)¶

This function is a coroutine.

Disconnect this stream RTC client and clean up stream playback.

set_preview_provider(provider, /)¶

Set the callable used for stream preview uploads.

Parameters: provider (Optional[Callable[[], Optional[bytes]]]) – The preview provider to use, or None to clear it.

start_preview_loop(provider=None, /, *, interval=300.0, retry_interval=60.0, start_delay=0.5)¶

Start periodic stream preview uploads.

All interval parameters default to Discord client behavior.

Parameters

provider (Optional[Callable[[], Optional[bytes]]]) – The preview provider to use. When omitted, the current provider is reused.
interval (float) – Number of seconds between successful preview uploads.
retry_interval (float) – Number of seconds to wait after a skipped or failed preview upload.
start_delay (float) – Number of seconds to wait before the first preview upload attempt.

Raises

discord.ClientException – This client does not own the stream or no preview provider is set.
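One reading of the three timing parameters is that the loop waits start_delay before the first attempt, then interval after each successful upload and retry_interval after each skipped or failed one. The sketch below only computes that wait sequence under this assumption; the function name is hypothetical and no uploads are performed.

```python
from typing import Iterable, List


def preview_delays(
    outcomes: Iterable[bool],
    *,
    interval: float = 300.0,
    retry_interval: float = 60.0,
    start_delay: float = 0.5,
) -> List[float]:
    """Return the waits (in seconds) before each upload attempt.

    ``outcomes`` holds True for a successful upload and False for a
    skipped or failed one, in the order the attempts happen.
    """
    delays = [start_delay]  # wait before the very first attempt
    for succeeded in outcomes:
        delays.append(interval if succeeded else retry_interval)
    return delays


schedule = preview_delays([True, False, True])
```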

stop_preview_loop()¶: Stop periodic stream preview uploads.

await on_stream_create(stream)¶

This function is a coroutine.

An event handler called when the connected stream is created.

This mirrors on_stream_create() for the stream protocol instance that is connected to the stream.

Parameters: stream (Stream) – The stream that was created.

await on_stream_available(stream)¶

This function is a coroutine.

An event handler called when the connected stream becomes available again.

This is dispatched after a stream that was previously marked Stream.unavailable receives a new create event.

Parameters: stream (Stream) – The stream that became available.

await on_stream_server_update(data)¶

This function is a coroutine.

An event handler called when Discord sends the stream RTC server data.

This event is used by stream protocol implementations to finish or resume the stream RTC connection. Unlike the public stream events, this exposes the raw gateway payload because it contains the stream token and endpoint.

Parameters: data (dict) – The raw stream server update gateway payload.

await on_stream_update(_before, after)¶

This function is a coroutine.

An event handler called when the connected stream is updated.

Parameters

before (Stream) – The stream before the update.
after (Stream) – The stream after the update.

await on_stream_unavailable(stream)¶

This function is a coroutine.

An event handler called when the connected stream becomes temporarily unavailable.

Unavailable streams remain cached and may later dispatch on_stream_available() when Discord reports the stream again.

Parameters: stream (Stream) – The stream that became unavailable.

await on_stream_delete(_stream, _reason)¶

This function is a coroutine.

An event handler called when the connected stream is deleted.

Parameters

stream (Stream) – The stream that was deleted.
reason (StreamDeleteReason) – The reason the stream was deleted or rejected.

property average_latency¶

Average of most recent 20 HEARTBEAT latencies in seconds.

New in version 1.4.

Type: float

await create_stream(*, timeout=30.0, reconnect=True, cls=...)¶

This function is a coroutine.

Create a Go Live stream from the current voice channel.

Parameters

timeout (float) – The number of seconds to wait for stream RTC connection.
reconnect (bool) – Whether the stream protocol should attempt reconnects.
cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with. Defaults to StreamClient.

Returns

The connected stream RTC client.

Return type

StreamProtocol

Raises

discord.ClientException – The voice client is not connected or the voice session is not ready.

property endpoint¶

The endpoint we are connecting to.

Type: str

get_stream(owner)¶

Optional[Stream]: Returns a known Go Live stream by owner ID for this voice connection.

New in version 2.2.

Parameters: owner (Snowflake) – The owner of the stream.
Returns: The stream if found.
Return type: Optional[Stream]

property guild¶

The guild we’re connected to, if applicable.

Type: Optional[Guild]

is_connected()¶: Indicates if the voice client is connected to voice.

is_listening()¶: bool: Whether this client is currently receiving media packets.

is_paused(): Indicates if we’re currently playing audio but are paused.

is_playing()¶: Indicates if we’re currently playing audio.

property latency¶

Latency between a HEARTBEAT and a HEARTBEAT_ACK in seconds.

This could be referred to as the Discord Voice WebSocket latency and is analogous to the user voice latencies shown in the Discord client.

New in version 1.4.

Type: float

listen(sink, *, after=None)¶

Listen for inbound native media packets.

Parameters

sink (Union[MediaSink, Callable[[MediaPacket], Any]]) – The sink or callback that receives decoded media packets.
after (Optional[Callable[[Optional[Exception]], Any]]) – A callback called after listening stops.

Raises

discord.ClientException – The voice client is not connected, is already listening, or sink is already registered as a child or closed.
TypeError – sink is not a MediaSink or callable, or after is not callable.

await move_to(channel, *, timeout=30.0)¶

This function is a coroutine.

Moves you to a different voice channel.

Parameters

channel (Optional[Snowflake]) – The channel to move to. Must be a voice channel.
timeout (Optional[float]) – How long to wait for the move to complete. New in version 2.1.

Raises

asyncio.TimeoutError – The move did not complete in time, but may still be ongoing.

property negotiated_video_codec¶

The video codec selected by the voice server.

Type: Optional[str]

pause()¶: Pauses the audio playing.

await request_video(ssrc, *, quality=100, any=..., pixel_count=None)¶

This function is a coroutine.

Request that Discord forwards video for an SSRC.

Parameters

ssrc (int) – The video SSRC to request.
quality (int) – The requested stream quality.
any (Optional[int]) – The fallback quality request for otherwise unspecified streams.
pixel_count (Optional[int]) – Optional pixel-count hint sent with the media sink wants payload.

Raises

discord.ClientException – The voice client is not connected.

resume()¶: Resumes the audio playing.

send_audio_packet(data, *, encode=True)¶

Sends an audio packet composed of the data.

You must be connected to play audio.

Parameters

data (bytes) – The bytes-like object denoting PCM or Opus voice data.
encode (bool) – Indicates if data should be encoded into Opus.

Raises

ClientException – You are not connected.
opus.OpusError – Encoding the data failed.

send_video_frame(frame, *, frame_time_ms=33.0, stream=None)¶

Packetize, encrypt, and send one encoded video frame.

Parameters

frame (bytes) – The encoded frame in the negotiated codec.
frame_time_ms (float) – The frame duration in milliseconds.
stream (Optional[discord.VoiceStream]) – The active simulcast stream to send on. Defaults to the selected primary stream.

Returns

The number of RTP packets sent.

Return type

int

Raises

discord.ClientException – The voice client is not connected, video has not been started, no active stream is selected, the stream is inactive, or the stream has no negotiated SSRC.

send_video_frames(frames, /)¶

Send RID-keyed encoded frames for active simulcast streams.

Parameters: frames (Dict[str, VideoFrame]) – Mapping of stream RID to encoded frame.
Returns: The total number of RTP packets sent.
Return type: int
Raises: discord.ClientException – The voice client is not connected, video has not been started, an active stream has no negotiated SSRC, or no active stream is selected.

property session_id¶

The voice connection session ID.

Type: str

set_sink(sink, /)¶

Changes the active receive sink and returns the previous sink.

The old sink is detached without running MediaSink.cleanup(), so callers that keep it should clean it up explicitly when they are done.

Parameters

sink (MediaSink) – The sink to use.

Returns

The previous active sink, if any.

Return type

Optional[MediaSink]

Raises

ValueError – The voice client is not currently listening.
discord.ClientException – sink is already registered as a child or closed.

property sink¶

The current media receive sink, if one was provided to listen().

This property can also be used to change the active sink while receiving. The old sink is detached but not cleaned up.

Type: MediaSink

property source¶

The audio source being played, if playing.

This property can also be used to change the audio source currently being played.

Type: Optional[AudioSource]

await start_video(*, width, height, fps=30, bitrate=0)¶

This function is a coroutine.

Start outbound video using the negotiated video codec.

This is called automatically by play() and should not be called by the user in most cases.

Parameters

width (int) – Encoded video width.
height (int) – Encoded video height.
fps (int) – Encoded frame rate.
bitrate (int) – Target bitrate in bits per second.

Raises

discord.ClientException – The voice client is not connected, or no video codec was negotiated.

stop()¶: Stops playing audio.

stop_listening()¶: Stop receiving media packets and clean up the active sink.

await stop_video()¶

This function is a coroutine.

Stop outbound video and reset video transport state.

This is called automatically by the player and should not be called by the user in most cases.

property stream_clients¶

The Go Live stream clients attached to this voice connection.

New in version 2.2.

Type: Tuple[StreamProtocol]

property stream_key¶

The stream key being connected to.

Type: StreamKey

property streams¶

The Go Live streams known for this voice connection.

New in version 2.2.

Type: Tuple[Stream]

property token¶

The voice connection token.

Type: str

await update_speaking_state(flags)¶

Update the current speaking flags.

Parameters: flags (SpeakingFlags) – The new speaking flags.

property user¶

The user connected to voice (i.e. ourselves).

Type: ClientUser

property video_streams¶

Simulcast streams advertised by this client.

Type: Tuple[discord.VoiceStream, …]

property voice_privacy_code¶

Get the voice privacy code of this E2EE session’s group.

A new privacy code is created and cached each time a new transition is executed. This can be None if there is no active DAVE session.

New in version 2.1.

Type: str

await watch_stream(stream_key, *, timeout=30.0, reconnect=True, cls)¶

This function is a coroutine.

Watches a Go Live stream by stream key and connects with the provided stream protocol.

This is useful when the stream is not already cached. If the stream is cached, this delegates to Stream.watch().

New in version 2.2.

Parameters

stream_key (StreamKey) – The stream key to watch.
timeout (float) – The timeout in seconds to wait for the stream connection to complete.
reconnect (bool) – Whether the stream protocol should attempt reconnects.
cls (Type[StreamProtocol]) – A type that subclasses StreamProtocol to connect with.

Raises

ClientException – You are not connected to the stream’s voice channel, or you tried to watch your own stream.

Returns

The connected stream protocol.

Return type

StreamProtocol

Media Sources

class discord.ext.native_voice.MediaSource¶

An audio source that can also yield encoded video frames.

video_realtime¶

Whether video frame pacing should track wall-clock capture timing.

Type: bool

video_retry_delay¶

Delay used before retrying video reads that temporarily return no frames.

Type: float

video_catchup_frames¶

Maximum number of video frames to send in one player tick while catching up.

Type: int

has_audio()¶: bool: Whether this source currently has audio to read.

read_video()¶

Read one encoded video frame for the primary stream.

Returns: The next encoded video frame, if one is available.
Return type: Optional[VideoFrame]

read_video_streams(streams)¶

Read encoded video frames for the active outbound simulcast streams.

The default implementation preserves the single-stream read_video() behaviour and returns a frame for the first active stream only. Sources that can encode multiple simulcast outputs should override this and return RID-keyed frames for each stream they are able to produce on this tick.

Parameters: streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.
Returns: A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.
Return type: Optional[Mapping[str, VideoFrame]]
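The three documented return values (a RID-keyed mapping, an empty mapping, and None) can be illustrated with stand-ins. FakeFrame and FakeSimulcastSource are hypothetical, and for simplicity the method below takes RID strings directly rather than discord.VoiceStream objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass
class FakeFrame:
    """Stand-in for VideoFrame; the real class's fields are not modelled."""

    data: bytes


class FakeSimulcastSource:
    """Stand-in source that produces the three documented return values."""

    def __init__(self, queued: Dict[str, List[bytes]]) -> None:
        self._queued = queued  # RID -> encoded frames still waiting to be sent
        self.finished = False

    def read_video_streams(self, rids: List[str]) -> Optional[Mapping[str, FakeFrame]]:
        if self.finished:
            return None  # the video lane is finished
        frames: Dict[str, FakeFrame] = {}
        for rid in rids:
            pending = self._queued.get(rid)
            if pending:
                frames[rid] = FakeFrame(pending.pop(0))
        return frames  # an empty mapping means no frame is ready yet


source = FakeSimulcastSource({"100": [b"hi-res"], "50": []})
first = source.read_video_streams(["100", "50"])   # only "100" has a frame
second = source.read_video_streams(["100", "50"])  # nothing is ready this tick
source.finished = True
third = source.read_video_streams(["100", "50"])   # lane finished
```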

read_preview()¶

Read image preview bytes for a Go Live stream preview.

Returns: Encoded image bytes for a stream preview, if available.
Return type: Optional[Union[bytes, bytearray, memoryview]]

has_video()¶: bool: Whether this source currently has video to read.

supports_simulcast()¶: bool: Whether read_video_streams() can emit multiple video outputs.

property video_config¶

Video parameters known by this source.

Type: Optional[VideoConfig]

on_media_sink_wants(wants)¶

Handle a remote media sink wants update for this source.

The default implementation does nothing. Adaptive sources can override this to adjust their encoder, bitrate, resolution, or selected output stream when Discord asks for a different quality.

Parameters: wants (MediaSinkWants) – The remote quality requests sent by Discord.

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

is_opus()¶: Checks if the audio source is already encoded in Opus.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

If is_opus() returns True, this must return 20ms worth of Opus-encoded audio. Otherwise, it must return 20ms worth of 16-bit 48 kHz stereo PCM, which is 3,840 bytes per frame.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes
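
The PCM frame size quoted for read() follows directly from the audio format. The short calculation below shows the arithmetic; the constant names are chosen for this example only.

```python
SAMPLE_RATE = 48_000  # samples per second, per channel
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per signed 16-bit sample
FRAME_MS = 20         # duration of one read() call

SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000            # 960
FRAME_SIZE = SAMPLES_PER_FRAME * CHANNELS * SAMPLE_WIDTH      # 3840
print(FRAME_SIZE)
```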

class discord.ext.native_voice.AudioMediaSource(original, /)¶

Wraps an existing discord.AudioSource as a media source.

This keeps first-party discord.py audio sources, such as discord.PCMAudio, discord.FFmpegPCMAudio, and discord.FFmpegOpusAudio, usable in unified media pipelines.

Parameters: original (discord.AudioSource) – The audio source to wrap.

original¶

The wrapped audio source.

Type: discord.AudioSource

has_audio()¶: bool: Whether this source currently has audio to read.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.PCMMediaSource(stream, /, *, close=False)¶

A media source backed by raw 16-bit 48 kHz stereo PCM bytes.

This mirrors discord.PCMAudio for file-like raw PCM inputs while keeping the source composable with video-capable media sources.

Parameters

stream (bytes) – A bytes-like object that yields 20 ms PCM frames.
close (bool) – Whether to close the stream when the source is exhausted or cleaned up.

stream¶

The wrapped binary stream.

Type: bytes

has_audio()¶: bool: Whether this source currently has audio to read.

is_opus()¶: Checks if the audio source is already encoded in Opus.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.AudioFrameSource(frames, /, *, opus=False)¶

An audio source backed by an iterable of audio frames.

This is the in-memory/custom-producer counterpart to discord.py’s file-like discord.PCMAudio. PCM frames should be 20 ms of 48 kHz stereo signed 16-bit audio; Opus frames may be variable length.

Parameters

frames (Iterable[Union[bytes, bytearray, memoryview]]) – The audio frames to read from.
opus (bool) – Whether the frames are already Opus encoded.
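
As an illustration of the expected PCM input, the sketch below slices a raw PCM buffer into 20 ms frames. `pcm_frames` is a hypothetical helper, and padding the final short frame with silence is an assumption made for this example rather than documented library behaviour.

```python
from typing import Iterator

FRAME_SIZE = 3840  # 20 ms of 48 kHz stereo signed 16-bit PCM


def pcm_frames(pcm: bytes) -> Iterator[bytes]:
    """Yield fixed-size 20 ms frames, padding the final one with silence."""
    for offset in range(0, len(pcm), FRAME_SIZE):
        frame = pcm[offset:offset + FRAME_SIZE]
        if len(frame) < FRAME_SIZE:
            # Zero bytes are silence in signed 16-bit PCM.
            frame += b"\x00" * (FRAME_SIZE - len(frame))
        yield frame


frames = list(pcm_frames(b"\x01" * 5000))
```

An iterable like this is the kind of value the frames parameter is described as accepting when opus is False.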

has_audio()¶: bool: Whether this source currently has audio to read.

is_opus()¶: Checks if the audio source is already encoded in Opus.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.PCMAudio(stream)¶

Represents a raw 16-bit 48 kHz stereo PCM audio source.

stream¶

A file-like object that reads byte data representing raw PCM.

Type: file object

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

class discord.ext.native_voice.FFmpegAudio(source, *, executable='ffmpeg', args, **subprocess_kwargs)¶

Represents an FFmpeg (or AVConv) based AudioSource.

User-created AudioSources that use FFmpeg differently from FFmpegPCMAudio and FFmpegOpusAudio should subclass this.

New in version 1.3.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.FFmpegPCMAudio(source, *, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None)¶

An audio source from FFmpeg (or AVConv).

This launches a sub-process to a specific input file given.

Warning

You must have the ffmpeg or avconv executable in your PATH environment variable in order for this to work.

Parameters

source (Union[str, io.BufferedIOBase]) – The input that ffmpeg will take and convert to PCM bytes. If pipe is True then this is a file-like object that is passed to the stdin of ffmpeg.
executable (str) –
The executable name (and path) to use. Defaults to ffmpeg.

Warning

Since this class spawns a subprocess, care should be taken to not pass in an arbitrary executable name when using this parameter.
pipe (bool) – If True, denotes that source parameter will be passed to the stdin of ffmpeg. Defaults to False.
stderr (Optional[file object]) – A file-like object to pass to the Popen constructor.
before_options (Optional[str]) – Extra command line arguments to pass to ffmpeg before the -i flag.
options (Optional[str]) – Extra command line arguments to pass to ffmpeg after the -i flag.

Raises

ClientException – The subprocess failed to be created.
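
To make the placement of before_options and options relative to the -i flag concrete, here is a rough sketch of how such an argument list could be assembled. `build_pcm_args` is a hypothetical helper written for this example; the argument list the library actually passes to FFmpeg may differ.

```python
import shlex
from typing import List, Optional


def build_pcm_args(
    source: str,
    *,
    pipe: bool = False,
    before_options: Optional[str] = None,
    options: Optional[str] = None,
) -> List[str]:
    """Assemble FFmpeg arguments for raw PCM output (illustrative only)."""
    args: List[str] = []
    if before_options:
        args.extend(shlex.split(before_options))  # placed before -i
    # With pipe=True the input is read from stdin, written as "-".
    args.extend(["-i", "-" if pipe else source])
    # Raw signed 16-bit little-endian PCM, 48 kHz, stereo.
    args.extend(["-f", "s16le", "-ar", "48000", "-ac", "2"])
    if options:
        args.extend(shlex.split(options))         # placed after -i
    args.append("pipe:1")                         # write to stdout
    return args
```

For example, `build_pcm_args("song.mp3", before_options="-ss 30", options="-vn")` places `-ss 30` ahead of the input so FFmpeg seeks before decoding, while `-vn` applies to the output.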

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

class discord.ext.native_voice.FFmpegOpusAudio(source, *, bitrate=None, codec=None, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None)¶

An audio source from FFmpeg (or AVConv).

This launches a sub-process to a specific input file given. However, rather than producing PCM packets that must then be encoded to Opus, as FFmpegPCMAudio does, this class produces Opus packets directly, skipping the encoding step done by the library.

Alternatively, instead of instantiating this class directly, you can use FFmpegOpusAudio.from_probe() to probe for bitrate and codec information. This can be used to opportunistically skip pointless re-encoding of existing Opus audio data for a boost in performance at the cost of a short initial delay to gather the information. The same can be achieved by passing copy to the codec parameter, but only if you know that the input source is Opus encoded beforehand.

New in version 1.3.

Warning

You must have the ffmpeg or avconv executable in your PATH environment variable in order for this to work.

Parameters

source (Union[str, io.BufferedIOBase]) – The input that ffmpeg will take and convert to Opus bytes. If pipe is True then this is a file-like object that is passed to the stdin of ffmpeg.
bitrate (int) – The bitrate in kbps to encode the output to. Defaults to 128.
codec (Optional[str]) –
The codec to use to encode the audio data. Normally this would be just libopus, but is used by FFmpegOpusAudio.from_probe() to opportunistically skip pointlessly re-encoding Opus audio data by passing copy as the codec value. Any values other than copy, opus, or libopus will be considered libopus. Defaults to libopus.

Warning

Do not provide this parameter unless you are certain that the audio input is already Opus encoded. For typical use FFmpegOpusAudio.from_probe() should be used to determine the proper value for this parameter.
executable (str) –
The executable name (and path) to use. Defaults to ffmpeg.

Warning

Since this class spawns a subprocess, care should be taken to not pass in an arbitrary executable name when using this parameter.
pipe (bool) – If True, denotes that source parameter will be passed to the stdin of ffmpeg. Defaults to False.
stderr (Optional[file object]) – A file-like object to pass to the Popen constructor.
before_options (Optional[str]) – Extra command line arguments to pass to ffmpeg before the -i flag.
options (Optional[str]) – Extra command line arguments to pass to ffmpeg after the -i flag.

Raises

ClientException – The subprocess failed to be created.

classmethod await from_probe(source, *, method=None, **kwargs)¶

This function is a coroutine.

A factory method that creates a FFmpegOpusAudio after probing the input source for audio codec and bitrate information.

Examples

Use this function to create an FFmpegOpusAudio instance instead of the constructor:

source = await discord.FFmpegOpusAudio.from_probe("song.webm")
voice_client.play(source)

If you are on Windows and don’t have ffprobe installed, use the fallback method to probe using ffmpeg instead:

source = await discord.FFmpegOpusAudio.from_probe("song.webm", method='fallback')
voice_client.play(source)

Using a custom method of determining codec and bitrate:

def custom_probe(source, executable):
    # some analysis code here
    return codec, bitrate

source = await discord.FFmpegOpusAudio.from_probe("song.webm", method=custom_probe)
voice_client.play(source)

Parameters

source – Identical to the source parameter for the constructor.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The probing method used to determine bitrate and codec information. As a string, valid values are native to use ffprobe (or avprobe) and fallback to use ffmpeg (or avconv). As a callable, it must take two string arguments, source and executable. Both parameters are the same values passed to this factory function. executable will default to ffmpeg if not provided as a keyword argument.
kwargs – The remaining parameters to be passed to the FFmpegOpusAudio constructor, excluding bitrate and codec.

Raises

AttributeError – Invalid probe method, must be 'native' or 'fallback'.
TypeError – Invalid value for the method parameter, must be str or a callable.

Returns

An instance of this class.

Return type

FFmpegOpusAudio

classmethod await probe(source, *, method=None, executable=None)¶

This function is a coroutine.

Probes the input source for bitrate and codec information.

Parameters

source – Identical to the source parameter for FFmpegOpusAudio.
method – Identical to the method parameter for FFmpegOpusAudio.from_probe().
executable (str) – Identical to the executable parameter for FFmpegOpusAudio.

Raises

AttributeError – Invalid probe method, must be 'native' or 'fallback'.
TypeError – Invalid value for the method parameter, must be str or a callable.

Returns

A 2-tuple with the codec and bitrate of the input source.

Return type

Optional[Tuple[Optional[str], int]]
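
As a rough illustration of what a native probe has to do, the sketch below extracts a codec and bitrate from ffprobe's JSON output, as produced by a command such as `ffprobe -v quiet -print_format json -show_streams -select_streams a:0 <source>`. `parse_audio_probe` is a hypothetical helper, and the choice of fields and the conversion to kbps are assumptions for this example, not a description of the library's internals.

```python
import json
from typing import Optional, Tuple


def parse_audio_probe(output: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (codec, bitrate in kbps) from ffprobe JSON (illustrative only)."""
    data = json.loads(output)
    streams = data.get("streams") or []
    if not streams:
        return None, None
    stream = streams[0]
    codec = stream.get("codec_name")
    # ffprobe reports bit_rate as a string in bits per second, and some
    # containers omit it entirely.
    raw = stream.get("bit_rate")
    bitrate = round(int(raw) / 1000) if raw is not None else None
    return codec, bitrate
```

A callable with the (source, executable) signature described for the method parameter could run ffprobe itself and hand its output to a parser like this one.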

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

class discord.ext.native_voice.VideoFrameSource(frames, *, codec, fps, width=0, height=0, bitrate=0)¶

A video source backed by an iterable of already-encoded frames.

Parameters

frames (Iterable[Union[VideoFrame, bytes, bytearray, memoryview]]) – Encoded video frames to read from.
codec (str) – The Discord video codec name for the frames.
fps (int) – The frame rate used to derive frame durations.
width (int) – The encoded frame width in pixels.
height (int) – The encoded frame height in pixels.
bitrate (int) – The target video bitrate in bits per second.

codec¶

The normalized Discord video codec name.

Type: str

frame_time_ms¶

The default frame duration in milliseconds.

Type: float
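
The relationship between fps and the default frame duration can be sketched as below, assuming the duration is simply the reciprocal of the frame rate. The function is a stand-in written for this example and is not the library's implementation.

```python
def frame_time_ms(fps: int) -> float:
    """Default duration of one frame in milliseconds (illustrative only)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```

At 25 fps this gives 40.0 ms per frame, which is two 20 ms audio frames for every video frame.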

has_video()¶: bool: Whether this source currently has video to read.

property video_config¶

Video configuration for frames from this source.

Type: VideoConfig

read_video()¶

Read one encoded video frame for the primary stream.

Returns: The next encoded video frame, if one is available.
Return type: Optional[VideoFrame]

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.EncodedVideoSource(source, *, codec, fps, width=0, height=0, bitrate=0)¶

A video source backed by already-encoded video frames.

VP8, VP9, and AV1 inputs are read as IVF streams. H264 and H265 inputs are read as Annex B streams with access unit delimiters.
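
For readers unfamiliar with the IVF container, the following sketch reads frames from an IVF stream using only the standard library. It follows the common IVF layout (a 32-byte file header starting with `DKIF`, then a 12-byte header before each frame); `iter_ivf_frames` is a hypothetical helper and not the parser this class uses.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# signature, version, header size, fourcc, width, height,
# timebase denominator, timebase numerator, frame count, 4 unused bytes
IVF_FILE_HEADER = struct.Struct("<4sHH4sHHIII4x")  # 32 bytes
IVF_FRAME_HEADER = struct.Struct("<IQ")            # frame size, timestamp


def iter_ivf_frames(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (timestamp, payload) pairs from an IVF stream."""
    header = stream.read(IVF_FILE_HEADER.size)
    if len(header) < IVF_FILE_HEADER.size:
        raise ValueError("truncated IVF header")
    signature, _version, header_size, *_rest = IVF_FILE_HEADER.unpack(header)
    if signature != b"DKIF":
        raise ValueError("not an IVF stream")
    if header_size > IVF_FILE_HEADER.size:
        # Skip any header bytes beyond the standard 32.
        stream.read(header_size - IVF_FILE_HEADER.size)
    while True:
        frame_header = stream.read(IVF_FRAME_HEADER.size)
        if len(frame_header) < IVF_FRAME_HEADER.size:
            return  # end of stream
        size, timestamp = IVF_FRAME_HEADER.unpack(frame_header)
        payload = stream.read(size)
        if len(payload) < size:
            return  # truncated final frame
        yield timestamp, payload


# Build a tiny two-frame IVF stream in memory and read it back.
blob = (
    IVF_FILE_HEADER.pack(b"DKIF", 0, 32, b"VP80", 640, 360, 30, 1, 2)
    + IVF_FRAME_HEADER.pack(3, 0) + b"abc"
    + IVF_FRAME_HEADER.pack(2, 1) + b"de"
)
frames = list(iter_ivf_frames(io.BytesIO(blob)))
```

The payloads in the demo are placeholder bytes rather than real VP8 data; only the container framing is being exercised.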

Parameters

source (Union[str, os.PathLike, BinaryIO]) – A path or binary stream containing encoded video frames.
codec (str) – The video codec name for the input.
fps (int) – The frame rate used to derive frame durations.
width (int) – The encoded frame width in pixels.
height (int) – The encoded frame height in pixels.
bitrate (int) – The target video bitrate in bits per second.

codec¶

The normalized Discord video codec name.

Type: str

frame_time_ms¶

The default frame duration in milliseconds.

Type: float

has_video()¶: bool: Whether this source currently has video to read.

property video_config¶

Video parameters known by this source.

Type: Optional[VideoConfig]

read_video()¶

Read one encoded video frame for the primary stream.

Returns: The next encoded video frame, if one is available.
Return type: Optional[VideoFrame]

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.SimulcastVideoSource(sources, /)¶

A video source composed of RID-keyed video sources.

Each child source should produce encoded frames for the same codec, with keys matching the negotiated discord.VoiceStream RIDs.

Parameters: sources (Dict[str, MediaSource]) – The child video sources, keyed by RTP stream ID.

sources¶

The child sources, keyed by RTP stream ID.

Type: Dict[str, MediaSource]
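
The idea of RID-keyed child sources can be sketched with stand-in objects as follows. `ListSource` and `read_active` are hypothetical and exist only for this example; the RID strings and the rule for returning None are assumptions, not the library's documented behaviour.

```python
from typing import Dict, Iterable, Iterator, Mapping, Optional


class ListSource:
    """Stand-in child source that yields pre-encoded frames from a list."""

    def __init__(self, frames: Iterable[bytes]) -> None:
        self._frames: Iterator[bytes] = iter(frames)
        self.finished = False

    def read_video(self) -> Optional[bytes]:
        frame = next(self._frames, None)
        if frame is None:
            self.finished = True
        return frame


def read_active(
    sources: Mapping[str, ListSource], active_rids: Iterable[str]
) -> Optional[Dict[str, bytes]]:
    """Collect one frame per active RID; None once every child is finished."""
    frames: Dict[str, bytes] = {}
    for rid in active_rids:
        child = sources.get(rid)
        if child is None or child.finished:
            continue
        frame = child.read_video()
        if frame is not None:
            frames[rid] = frame
    if not frames and all(child.finished for child in sources.values()):
        return None
    return frames
```

Only children whose RID is in the active list are read, so an inactive encoder is not drained while its stream is not being sent.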

has_video()¶: bool: Whether this source currently has video to read.

read_video_streams(streams)¶

Read encoded video frames for the active outbound simulcast streams.

Parameters: streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.
Returns: A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.
Return type: Optional[Mapping[str, VideoFrame]]

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.FFmpegVideoSource(command, *, codec, fps, width=0, height=0, bitrate=0, preview_command=None, pipe_source=None, stderr=None, live_timestamps=False)¶

An encoded video source backed by an FFmpeg subprocess.

The subprocess writes codec-ready H264/H265 Annex B or VP8/VP9/AV1 IVF frames to stdout for the native RTP packetizers.
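
To show what an Annex B stream with access unit delimiters looks like to a reader of FFmpeg's stdout, the sketch below splits an H264 byte stream into NAL units and groups them into access units. Both functions are hypothetical helpers written for this example and are not the library's packetizer.

```python
import re
from typing import List

START_CODE = re.compile(b"\x00\x00\x01")
H264_AUD = 9  # nal_unit_type of an access unit delimiter in H.264


def split_nal_units(data: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units without start codes."""
    starts = [match.end() for match in START_CODE.finditer(data)]
    units: List[bytes] = []
    for index, begin in enumerate(starts):
        end = starts[index + 1] - 3 if index + 1 < len(starts) else len(data)
        # rstrip drops the leading zero of a following 4-byte start code;
        # a NAL unit itself never ends in a zero byte.
        unit = data[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
    return units


def group_h264_access_units(data: bytes) -> List[List[bytes]]:
    """Group NAL units into access units, starting a new one at each AUD."""
    access_units: List[List[bytes]] = []
    for unit in split_nal_units(data):
        nal_type = unit[0] & 0x1F  # low five bits of the H.264 NAL header
        if nal_type == H264_AUD or not access_units:
            access_units.append([])
        access_units[-1].append(unit)
    return access_units
```

H265 uses a two-byte NAL header, so the type would instead be read as `(unit[0] >> 1) & 0x3F`, where the access unit delimiter is type 35.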

Parameters

command (List[str]) – The FFmpeg command to run.
codec (str) – The Discord video codec name produced by FFmpeg.
fps (int) – The target frame rate.
width (int) – The encoded frame width in pixels.
height (int) – The encoded frame height in pixels.
bitrate (int) – The target video bitrate in bits per second.
preview_command (Optional[List[str]]) – FFmpeg command used to produce a stream preview image frame.
pipe_source (Any) – Optional file-like object or native desktop capture source piped into FFmpeg stdin.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
live_timestamps (bool) – Whether frame durations should track wall-clock capture timing.

command¶

The FFmpeg command being run.

Type: List[str]

preview_command¶

The FFmpeg preview command, if configured.

Type: Optional[List[str]]

codec¶

The normalized Discord video codec name.

Type: str

frame_time_ms¶

The default frame duration in milliseconds.

Type: float

classmethod preflight_desktop(*, width, height, fps=1, codec='H264', bitrate=4000000, executable='ffmpeg', input_args=None, before_options=None, transcoder=None, native_capture=False, output_index=0, timeout=15.0)¶

Check whether the configured desktop source can produce an encoded frame.

This is useful before joining voice, since desktop capture and encoder failures are often caused by the local session rather than Discord transport.

Parameters

width (int) – The capture width in pixels.
height (int) – The capture height in pixels.
fps (int) – The capture frame rate.
codec (str) – The Discord video codec to encode.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.
native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).
output_index (int) – The native desktop output index to capture.
timeout (float) – Maximum seconds to wait for the preflight encode.

Raises

discord.ClientException – Desktop capture, FFmpeg startup, encoder validation, or the preflight encode failed.
RuntimeError – Platform desktop capture defaults are not available.

classmethod from_desktop(codec, *, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, transcoder=None, native_capture=False, output_index=0, display=...)¶

Create an FFmpeg video source from the current desktop capture input.

Parameters

codec (str) – The Discord video codec to encode.
width (int) – The capture width in pixels.
height (int) – The capture height in pixels.
fps (int) – The capture frame rate.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.
native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).
output_index (int) – The native desktop output index to capture.
display (Optional[str]) – The X11 display name used by the default Linux desktop input.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

discord.ClientException – Desktop capture, FFmpeg startup, or encoder selection failed.
RuntimeError – Platform desktop capture defaults are not available.
ValueError – codec is not a supported Discord video codec.
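
As a rough picture of what platform desktop capture defaults might look like when input_args is omitted, the sketch below builds FFmpeg input arguments for two platforms using FFmpeg's x11grab and gdigrab input devices. `default_desktop_input` is a hypothetical helper; the defaults the library actually selects are not specified here, and other platforms, including macOS, are left out of this sketch.

```python
from typing import List


def default_desktop_input(
    platform: str, *, width: int, height: int, fps: int, display: str = ":0.0"
) -> List[str]:
    """Return FFmpeg input arguments for desktop capture (illustrative only)."""
    if platform.startswith("linux"):
        # X11 screen capture; display names the X server and screen.
        return [
            "-f", "x11grab",
            "-framerate", str(fps),
            "-video_size", f"{width}x{height}",
            "-i", display,
        ]
    if platform == "win32":
        # GDI-based capture of the whole desktop.
        return ["-f", "gdigrab", "-framerate", str(fps), "-i", "desktop"]
    raise RuntimeError(f"no desktop capture defaults for {platform!r}")
```

Passing `sys.platform` as the first argument selects the branch for the running interpreter, and an unsupported platform surfaces as a RuntimeError, in the spirit of the error listed above.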

classmethod from_file(source, codec, *, width, height, fps, bitrate, executable='ffmpeg', pipe=False, stderr=None, before_options=None, options=None, source_codec=None, input_args=None, preview_input_args=None, transcoder=None)¶

Create an FFmpeg video source from a file or stdin pipe.

Parameters

source (Union[str, os.PathLike, BinaryIO]) – A video path or binary stream.
codec (str) – The Discord video codec to encode.
width (int) – The encoded video width in pixels.
height (int) – The encoded video height in pixels.
fps (int) – The target frame rate.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
pipe (bool) – Whether to pipe source into FFmpeg stdin instead of treating it as a path.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
source_codec (Optional[str]) – The input video codec used for decoder selection.
input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

TypeError – source is incompatible with the selected pipe mode.
discord.ClientException – FFmpeg startup or encoder selection failed.
ValueError – codec is not a supported Discord video codec.

classmethod await from_probe(source, codec=None, *, width=None, height=None, fps=None, bitrate=None, method=None, executable='ffmpeg', stderr=None, before_options=None, options=None, input_args=None, preview_input_args=None, transcoder=None)¶

This function is a coroutine.

Create a video source while probing missing video metadata first.

Parameters

source (Union[str, os.PathLike]) – The video file path to probe and encode.
codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.
width (Optional[int]) – The encoded video width in pixels. If omitted, the first video stream is probed.
height (Optional[int]) – The encoded video height in pixels. If omitted, the first video stream is probed.
fps (Optional[int]) – The target frame rate. If omitted, the first video stream is probed.
bitrate (Optional[int]) – The target video bitrate in bits per second. If omitted, the first video stream is probed.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.
executable (str) – The FFmpeg executable to run.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created video source.

Return type

FFmpegVideoSource

Raises

AttributeError – method names an invalid video probe method.
TypeError – method is not a string, callable, or None.
discord.ClientException – Required video metadata could not be probed or FFmpeg setup failed.
ValueError – codec is not a supported Discord video codec.

classmethod await probe(source, *, method=None, executable='ffmpeg')¶

This function is a coroutine.

Probe the first video stream for codec, width, height, FPS, and bitrate.

Parameters

source (Union[str, os.PathLike]) – The video file path to probe.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.
executable (str) – The FFmpeg executable used to locate ffprobe or run fallback probing.

Returns

The discovered video stream metadata.

Return type

VideoProbeInfo
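
One detail a custom video probe usually has to handle is that ffprobe reports frame rates as rational strings such as `30000/1001`. The sketch below parses such a value; `parse_frame_rate` is a hypothetical helper, and how VideoProbeInfo stores the result is not assumed here.

```python
from fractions import Fraction


def parse_frame_rate(value: str) -> Fraction:
    """Parse an ffprobe rate string such as "30000/1001" or "25/1"."""
    try:
        rate = Fraction(value)
    except ZeroDivisionError:
        # ffprobe can report "0/0" when the rate is unknown.
        raise ValueError(f"invalid frame rate: {value!r}") from None
    if rate <= 0:
        raise ValueError(f"invalid frame rate: {value!r}")
    return rate
```

Because the fps parameters in this reference are typed as int, a caller would typically round the parsed value, for example `round(parse_frame_rate("30000/1001"))`, before passing it on.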

has_video()¶: bool: Whether this source currently has video to read.

property video_config¶

Video parameters known by this source.

Type: Optional[VideoConfig]

read_video()¶

Read one encoded video frame for the primary stream.

Returns: The next encoded video frame, if one is available.
Return type: Optional[VideoFrame]

read_preview()¶

Read image preview bytes for a Go Live stream preview.

Returns: Encoded image bytes for a stream preview, if available.
Return type: Optional[Union[bytes, bytearray, memoryview]]

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes once the source is done playing.

class discord.ext.native_voice.FFmpegMediaSource(*, audio=None, video=None)¶

A composite FFmpeg source that can provide audio and video together.

Parameters

audio (Optional[discord.AudioSource]) – The FFmpeg-backed audio source.
video (Optional[MediaSource]) – The FFmpeg-backed video source.

classmethod from_file(source, codec, *, width, height, fps, bitrate, executable='ffmpeg', pipe=False, audio=True, opus_audio=False, audio_bitrate=128, audio_stderr=None, audio_before_options=None, audio_options=None, video_stderr=None, video_before_options=None, video_options=None, video_source_codec=None, video_input_args=None, preview_input_args=None, video_transcoder=None)¶

Create an FFmpeg media source from a file or video stdin pipe.

Parameters

source (Union[str, os.PathLike, BinaryIO]) – A media path or binary stream.
codec (str) – The Discord video codec to encode.
width (int) – The encoded video width in pixels.
height (int) – The encoded video height in pixels.
fps (int) – The target video frame rate.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
pipe (bool) – Whether to pipe source into FFmpeg stdin for video.
audio (bool) – Whether to include audio from the input.
opus_audio (bool) – Whether to copy/probe Opus audio instead of decoding to PCM.
audio_bitrate (int) – The audio bitrate in kbps when using Opus audio.
audio_stderr (Optional[BinaryIO]) – Where audio FFmpeg stderr is redirected.
audio_before_options (Optional[str]) – Extra audio FFmpeg options placed before input options.
audio_options (Optional[str]) – Extra audio FFmpeg output options.
video_stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.
video_before_options (Optional[str]) – Extra video FFmpeg options placed before input options.
video_options (Optional[str]) – Extra video FFmpeg output options.
video_source_codec (Optional[str]) – The input video codec used for decoder selection.
video_input_args (Optional[List[str]]) – Explicit video FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

Returns

The created media source.

Return type

FFmpegMediaSource

Raises

discord.ClientException – pipe=True was used with audio=True or FFmpeg setup failed.
TypeError – source is incompatible with the selected pipe mode.
ValueError – codec is not a supported Discord video codec.

classmethod await from_probe(source, codec=None, *, width=None, height=None, fps=None, bitrate=None, method=None, video_method=None, executable='ffmpeg', audio=True, audio_stderr=None, audio_before_options=None, audio_options=None, video_stderr=None, video_before_options=None, video_options=None, video_input_args=None, preview_input_args=None, video_transcoder=None)¶

This function is a coroutine.

Create a media source while probing media metadata first.

This mirrors discord.FFmpegOpusAudio.from_probe() for unified audio/video playback, letting FFmpeg copy Opus audio when possible and using the first video stream for missing codec, width, height, FPS, and bitrate values.

Parameters

source (Union[str, os.PathLike]) – The media file path to probe and encode.
codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.
width (Optional[int]) – The encoded video width in pixels. If omitted, the first video stream is probed.
height (Optional[int]) – The encoded video height in pixels. If omitted, the first video stream is probed.
fps (Optional[int]) – The target video frame rate. If omitted, the first video stream is probed.
bitrate (Optional[int]) – The target video bitrate in bits per second. If omitted, the first video stream is probed.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The audio probing method passed to discord.FFmpegOpusAudio.from_probe().
video_method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.
executable (str) – The FFmpeg executable to run.
audio (bool) – Whether to include audio from the input.
audio_stderr (Optional[BinaryIO]) – Where audio FFmpeg stderr is redirected.
audio_before_options (Optional[str]) – Extra audio FFmpeg options placed before input options.
audio_options (Optional[str]) – Extra audio FFmpeg output options.
video_stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.
video_before_options (Optional[str]) – Extra video FFmpeg options placed before input options.
video_options (Optional[str]) – Extra video FFmpeg output options.
video_input_args (Optional[List[str]]) – Explicit video FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.

Returns

The created media source.

Return type

FFmpegMediaSource

Raises

discord.ClientException – Required media metadata could not be probed or FFmpeg setup failed.
ValueError – codec is not a supported Discord video codec.

classmethod await probe_video(source, *, method=None, executable='ffmpeg')¶

This function is a coroutine.

Probe the first video stream for codec, width, height, FPS, and bitrate.

Parameters

source (Union[str, os.PathLike]) – The media file path to probe.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.
executable (str) – The FFmpeg executable used to locate ffprobe or run fallback probing.

Returns

The discovered video stream metadata.

Return type

VideoProbeInfo

classmethod preflight_desktop(*, width, height, fps=1, codec='H264', bitrate=4000000, executable='ffmpeg', input_args=None, before_options=None, video_transcoder=None, native_capture=False, output_index=0, timeout=15.0)¶

Check whether the configured FFmpeg desktop input can capture a frame.

Parameters

width (int) – The capture width in pixels.
height (int) – The capture height in pixels.
fps (int) – The capture frame rate.
codec (str) – The Discord video codec to encode.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.
native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms.
output_index (int) – The native desktop output index to capture.
timeout (float) – Maximum seconds to wait for the preflight encode.

Raises

discord.ClientException – Desktop capture, FFmpeg startup, encoder validation, or the preflight encode failed.
RuntimeError – Platform desktop capture defaults are not available.
ValueError – codec is not a supported Discord video codec.

classmethod from_desktop(codec, *, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, audio=None, video_transcoder=None, native_capture=False, output_index=0)¶

Create an FFmpeg media source from desktop capture video.

Parameters

codec (str) – The Discord video codec to encode.
width (int) – The capture width in pixels.
height (int) – The capture height in pixels.
fps (int) – The capture frame rate.
bitrate (int) – The target video bitrate in bits per second.
executable (str) – The FFmpeg executable to run.
input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.
stderr (Optional[Union[IO[bytes], int]]) – Where video FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
audio (Optional[discord.AudioSource]) – Existing audio source to combine with the desktop video source.
video_transcoder (Optional[VideoTranscoderConfig]) – Video encoder and filter selection options.
native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms.
output_index (int) – The native desktop output index to capture.

Returns

The created media source.

Return type

FFmpegMediaSource

Raises

discord.ClientException – Desktop capture, FFmpeg startup, or encoder selection failed.
RuntimeError – Platform desktop capture defaults are not available.
ValueError – codec is not a supported Discord video codec.

class discord.ext.native_voice.FFmpegSimulcastVideoSource(sources, /)¶

An FFmpeg-backed simulcast source with one encoder per RID.

This source is intended for camera/self-video style simulcast. Each child encoder produces an encoded frame stream for one advertised discord.VoiceStream RID, and VoiceClient sends only active negotiated RIDs.

read_video_streams(streams)¶

Read encoded video frames for the active outbound simulcast streams.

Parameters: streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.
Returns: A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.
Return type: Optional[Mapping[str, VideoFrame]]
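
The three possible return shapes can be told apart as in the sketch below. It is illustrative only: drain_video, read, and send are hypothetical stand-ins that are not part of this extension, and frames are modelled as plain bytes rather than VideoFrame objects.

```python
from typing import Callable, Mapping, Optional


def drain_video(
    read: Callable[[], Optional[Mapping[str, bytes]]],
    send: Callable[[str, bytes], None],
    *,
    max_polls: int = 1000,
) -> int:
    """Poll a reader until it reports that the video lane is finished.

    ``read`` stands in for a bound ``read_video_streams`` call.
    Returns the number of frames handed to ``send``.
    """
    sent = 0
    for _ in range(max_polls):
        frames = read()
        if frames is None:
            # None: the video lane is finished, so stop polling.
            break
        if not frames:
            # Empty mapping: no frame is ready yet, so poll again.
            continue
        for rid, frame in frames.items():
            send(rid, frame)
            sent += 1
    return sent
```

A real sender would likely sleep or wait on a timer between polls instead of spinning.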

classmethod from_desktop(codec, *, streams, width, height, fps, bitrate, executable='ffmpeg', input_args=None, stderr=None, before_options=None, options=None, transcoder=None, native_capture=False, output_index=0)¶

Create a simulcast source from the current desktop capture input.

Parameters

codec (str) – The Discord video codec to encode.
width (int) – The source capture width in pixels.
height (int) – The source capture height in pixels.
fps (int) – The source frame rate.
bitrate (int) – The source video bitrate in bits per second.
streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.
executable (str) – The FFmpeg executable to run.
input_args (Optional[List[str]]) – FFmpeg input arguments. When omitted, platform desktop capture defaults are used.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.
native_capture (bool) – Whether to use the native desktop capture bridge on supported platforms (currently Windows only).
output_index (int) – The native desktop output index to capture.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises

discord.ClientException – Duplicate stream RIDs were provided, or desktop capture, FFmpeg startup, or encoder selection failed.
RuntimeError – Platform desktop capture defaults are not available.
ValueError – codec is not a supported Discord video codec.

classmethod from_file(source, codec, *, streams, width, height, fps, bitrate, executable='ffmpeg', stderr=None, before_options=None, options=None, source_codec=None, input_args=None, preview_input_args=None, transcoder=None)¶

Create a simulcast source from a video file.

Parameters

source (Union[str, os.PathLike]) – The video file path to encode.
codec (str) – The Discord video codec to encode.
width (int) – The source video width in pixels.
height (int) – The source video height in pixels.
fps (int) – The source frame rate.
bitrate (int) – The source video bitrate in bits per second.
streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.
executable (str) – The FFmpeg executable to run.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
source_codec (Optional[str]) – The input video codec used for decoder selection.
input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises

discord.ClientException – Duplicate stream RIDs were provided, or FFmpeg startup or encoder selection failed.
ValueError – codec is not a supported Discord video codec.

classmethod await from_probe(source, codec=None, *, streams, width=None, height=None, fps=None, bitrate=None, method=None, executable='ffmpeg', stderr=None, before_options=None, options=None, input_args=None, preview_input_args=None, transcoder=None)¶

This function is a coroutine.

Create a simulcast source, probing any missing video metadata first.

Parameters

source (Union[str, os.PathLike]) – The video file path to probe and encode.
codec (Optional[str]) – The Discord video codec to encode. If omitted, the first video stream is probed.
width (Optional[int]) – The source video width in pixels. If omitted, the first video stream is probed.
height (Optional[int]) – The source video height in pixels. If omitted, the first video stream is probed.
fps (Optional[int]) – The source frame rate. If omitted, the first video stream is probed.
bitrate (Optional[int]) – The source video bitrate in bits per second. If omitted, the first video stream is probed.
streams (List[discord.VoiceStream]) – The simulcast stream descriptors to encode.
method (Optional[Union[str, Callable[[str, str], Any]]]) – The video probing method.
executable (str) – The FFmpeg executable to run.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
input_args (Optional[List[str]]) – Explicit FFmpeg input arguments.
preview_input_args (Optional[List[str]]) – FFmpeg input arguments used to produce stream previews.
transcoder (Optional[VideoTranscoderConfig]) – Encoder and filter selection options.

Returns

The created simulcast video source.

Return type

FFmpegSimulcastVideoSource

Raises

discord.ClientException – Required video metadata could not be probed, duplicate stream RIDs were found, or FFmpeg setup failed.
ValueError – codec is not a supported Discord video codec.

class discord.ext.native_voice.MultiMediaSource(sources, /)¶

Combines multiple sources into one playable media source.

The source mixes multiple PCM audio inputs into one audio lane. Opus audio is supported only when it is the sole audio input, since encoded Opus cannot be mixed without decoding first. Video uses the first video-capable source that yields a frame.

Parameters: sources (List[discord.AudioSource]) – The sources to combine.
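
As a rough illustration of what mixing PCM inputs into one audio lane involves, the sketch below sums signed 16-bit samples and clamps the result. It is not this class's implementation: mix_pcm16 is a hypothetical helper, and it assumes all frames use the same sample format and the platform's native byte order.

```python
from array import array
from typing import Sequence


def mix_pcm16(frames: Sequence[bytes]) -> bytes:
    """Mix signed 16-bit PCM frames by summing samples and clamping."""
    if not frames:
        return b""
    # Only mix the span every frame covers, rounded down to whole samples.
    length = min(len(frame) for frame in frames) // 2 * 2
    mixed = [0] * (length // 2)
    for frame in frames:
        samples = array("h")
        samples.frombytes(frame[:length])
        for index, sample in enumerate(samples):
            mixed[index] += sample
    # Clamp to the int16 range so loud overlaps saturate instead of wrapping.
    clamped = array("h", (max(-32768, min(32767, s)) for s in mixed))
    return clamped.tobytes()
```

Encoded Opus cannot be summed this way, which is why it has to be the sole audio input.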

property sources¶

The sources being combined.

Type: Sequence[discord.AudioSource]

has_audio()¶: bool: Whether this source currently has audio to read.

has_video()¶: bool: Whether this source currently has video to read.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

read_video()¶

Read one encoded video frame for the primary stream.

Returns: The next encoded video frame, if one is available.
Return type: Optional[VideoFrame]

read_video_streams(streams)¶

Read encoded video frames for the active outbound simulcast streams.

Parameters: streams (List[discord.VoiceStream]) – The active outbound video streams selected by the voice client.
Returns: A mapping of RTP stream ID to encoded frame, None when the video lane is finished, or an empty mapping when no frame is ready yet.
Return type: Optional[Mapping[str, VideoFrame]]

read_preview()¶

Read image preview bytes for a Go Live stream preview.

Returns: Encoded image bytes for a stream preview, if available.
Return type: Optional[Union[bytes, bytearray, memoryview]]

property video_config¶

Video configuration from the active video source.

Type: Optional[VideoConfig]

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes after the source is done playing audio.

class discord.ext.native_voice.CompositeMediaSource(*, audio=None, video=None)¶

Combines separate audio and video sources into one media source.

Parameters

audio (Optional[discord.AudioSource]) – The source used for audio frames.
video (Optional[MediaSource]) – The source used for video frames and stream previews.

audio¶

The source used for audio frames.

Type: Optional[discord.AudioSource]

video¶

The source used for video frames and stream previews.

Type: Optional[MediaSource]

has_audio()¶: bool: Whether this source currently has audio to read.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes after the source is done playing audio.

class discord.ext.native_voice.MediaVolumeTransformer(original, volume=1.0)¶

Adjusts PCM audio volume while preserving video from another source.

Parameters

original (discord.AudioSource) – The source to wrap.
volume (float) – The initial volume multiplier.

original¶

The wrapped source.

Type: discord.AudioSource

property volume¶

The audio volume multiplier.

Type: float
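
A volume multiplier applied to PCM amounts to scaling each sample and clamping it. The following is a minimal sketch under the assumption of signed 16-bit samples in native byte order; scale_pcm16 is a hypothetical helper and not how this class is necessarily implemented.

```python
from array import array


def scale_pcm16(data: bytes, volume: float) -> bytes:
    """Multiply signed 16-bit PCM samples by ``volume`` with clamping."""
    samples = array("h")
    samples.frombytes(data)
    scaled = array(
        "h",
        # Clamp so values above 1.0 saturate rather than overflow int16.
        (max(-32768, min(32767, int(sample * volume))) for sample in samples),
    )
    return scaled.tobytes()
```

This only makes sense for PCM; Opus data would have to be decoded before scaling.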

has_audio()¶: bool: Whether this source currently has audio to read.

read()¶

Reads 20ms worth of audio.

Subclasses must implement this.

If the audio is complete, return an empty bytes-like object to signal this.

Returns: A bytes-like object that represents the PCM or Opus data.
Return type: bytes

is_opus()¶: Checks if the audio source is already encoded in Opus.

is_finished()¶: bool: Whether this source has no more media to produce.

cleanup()¶

Called when clean-up needs to be done.

Useful for clearing buffered data or stopping processes after the source is done playing audio.

Media Sinks¶

class discord.ext.native_voice.MediaSink¶

Base class for receive-side media sinks.

Sinks can be chained by passing a destination sink to another sink. The root sink is owned by VoiceClient.listen() and is cleaned up when listening stops.

Parameters: destination (Optional[MediaSink]) – A child sink to register under this sink.

property root¶

The root sink in this sink chain.

Type: MediaSink

property parent¶

The parent sink in this chain.

Type: Optional[MediaSink]

property child¶

The first child sink, if any.

Type: Optional[MediaSink]

property children¶

Child sinks registered under this sink.

Type: Sequence[MediaSink]

property voice_client¶

The voice client owning this sink.

Type: Optional[discord.VoiceProtocol]

property client¶

The Discord client owning this sink.

Type: Optional[discord.Client]

property closed¶

Whether this sink has been cleaned up.

Type: bool

for ... in walk_children(*, with_self=False)¶

Yield child sinks depth-first.

Parameters: with_self (bool) – Whether to yield this sink before its children.
Yields: MediaSink – Child sinks in depth-first order.
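
The traversal order can be pictured with a small stand-in tree. Node is a hypothetical class used only for illustration, and the pre-order visit (a sink before its own children) is an assumption about what "depth-first" means here.

```python
from typing import Iterable, Iterator, List


class Node:
    """Hypothetical stand-in for a sink that has child sinks."""

    def __init__(self, name: str, children: Iterable["Node"] = ()) -> None:
        self.name = name
        self.children: List["Node"] = list(children)

    def walk_children(self, *, with_self: bool = False) -> Iterator["Node"]:
        if with_self:
            yield self
        for child in self.children:
            # Each child is yielded before its own descendants (pre-order).
            yield from child.walk_children(with_self=True)
```

For a root with children b and c, where b has a child d, the walk yields b, d, c.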

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.BasicSink(callback, *, media_types=None, codecs=None)¶

A sink that forwards each accepted packet to a callback.

Parameters

callback (Callable[[MediaPacket], Any]) – The callback invoked for each accepted packet.
media_types (Optional[List[str]]) – Media types to accept.
codecs (Optional[List[str]]) – Codec names to accept.

callback¶

The callback invoked for each accepted packet.

Type: Callable[[MediaPacket], Any]

class discord.ext.native_voice.QueueSink(destination=..., *, media_types=None, codecs=None, maxsize=0, drop_oldest=False)¶

Stores decoded receive packets in a queue.Queue.

This is useful when application code wants to consume multiplexed audio and video packets from its own worker instead of doing all work inside the receive callback.

Parameters

destination (queue.Queue) – Queue to write packets into. If omitted, a queue is created.
media_types (Optional[List[str]]) – Media types to accept.
codecs (Optional[List[str]]) – Codec names to accept.
maxsize (int) – Maximum size for a created queue.
drop_oldest (bool) – Whether to drop the oldest packet when the queue is full.

queue¶

The queue receiving packets.

Type: queue.Queue

drop_oldest¶

Whether the oldest packet is dropped when the queue is full.

Type: bool

dropped¶

Number of packets dropped by this sink.

Type: int

write(packet)¶

Queue one packet.

Parameters: packet (MediaPacket) – The packet to queue.
Returns: Whether the packet was accepted by the queue.
Return type: bool
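
The interaction between maxsize, drop_oldest, and dropped can be sketched with a plain queue.Queue. DropOldestBuffer is a hypothetical class written for illustration; whether the real sink counts rejected packets in dropped exactly this way is an assumption.

```python
import queue


class DropOldestBuffer:
    """Bounded buffer that can evict its oldest entry when full."""

    def __init__(self, maxsize: int = 0, *, drop_oldest: bool = False) -> None:
        self.queue: "queue.Queue[object]" = queue.Queue(maxsize)
        self.drop_oldest = drop_oldest
        self.dropped = 0

    def write(self, packet: object) -> bool:
        try:
            self.queue.put_nowait(packet)
            return True
        except queue.Full:
            if not self.drop_oldest:
                # Reject the new packet and keep what is already queued.
                self.dropped += 1
                return False
        # Full and drop_oldest is set: evict the oldest entry to make room.
        try:
            self.queue.get_nowait()
        except queue.Empty:
            pass  # a consumer emptied the queue in the meantime
        else:
            self.queue.task_done()  # keep join() accounting balanced
            self.dropped += 1
        try:
            self.queue.put_nowait(packet)
        except queue.Full:
            self.dropped += 1
            return False
        return True
```

Dropping the oldest entry favours fresh media over completeness, which usually suits live monitoring better than recording.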

get(block=True, timeout=None)¶

Remove and return one packet from the queue.

Parameters

block (bool) – Whether to block until a packet is available.
timeout (Optional[float]) – Maximum seconds to block.

Returns

The next queued packet.

Return type

MediaPacket

Raises

queue.Empty – block is False and the queue is empty, or the timeout elapses before a packet is available.

get_nowait()¶

MediaPacket: Remove and return one packet without blocking.

Raises: queue.Empty – The queue is empty.

qsize()¶: int: The approximate queue size.

empty()¶: bool: Whether the queue is empty.

full()¶: bool: Whether the queue is full.

task_done()¶

Indicate that a queued packet has been processed.

Raises: ValueError – Called more times than there were queued packets.

join()¶: Block until all queued packets are marked done.
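
The get, task_done, and join methods follow the usual queue.Queue worker pattern. The sketch below shows that pattern with a plain queue.Queue standing in for the sink; consume and handle are hypothetical names.

```python
import queue
import threading
from typing import Callable


def consume(
    q: "queue.Queue[object]",
    handle: Callable[[object], None],
    stop: threading.Event,
) -> None:
    """Worker loop: process queued packets until ``stop`` is set."""
    while True:
        try:
            # A timeout keeps the worker responsive to the stop event.
            packet = q.get(timeout=0.1)
        except queue.Empty:
            if stop.is_set():
                return
            continue
        try:
            handle(packet)
        finally:
            # Mark the packet done even if the handler raised, so join()
            # does not block forever.
            q.task_done()
```

A caller would typically start this in a threading.Thread, call join() on the queue to wait for outstanding packets, and then set the stop event.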

class discord.ext.native_voice.AsyncQueueSink(destination=..., *, loop=None, media_types=None, codecs=None, maxsize=0, drop_oldest=False)¶

Stores decoded receive packets in an asyncio.Queue.

Async equivalent to QueueSink.

Parameters

destination (asyncio.Queue) – Queue to write packets into. If omitted, a queue is created.
loop (Optional[asyncio.AbstractEventLoop]) – The event loop used to schedule queue writes from the receive thread.
media_types (Optional[List[str]]) – Media types to accept.
codecs (Optional[List[str]]) – Codec names to accept.
maxsize (int) – Maximum size for a created queue.
drop_oldest (bool) – Whether to drop the oldest packet when the queue is full.

queue¶

The queue receiving packets.

Type: asyncio.Queue

loop¶

The event loop used to schedule queue writes.

Type: Optional[asyncio.AbstractEventLoop]

drop_oldest¶

Whether the oldest packet is dropped when the queue is full.

Type: bool

dropped¶

Number of packets dropped by this sink.

Type: int

await get()¶

This function is a coroutine.

MediaPacket: Remove and return one packet from the async queue.

get_nowait()¶

MediaPacket: Remove and return one packet without blocking.

Raises: asyncio.QueueEmpty – The queue is empty.

qsize()¶: int: The approximate queue size.

empty()¶: bool: Whether the queue is empty.

full()¶: bool: Whether the queue is full.

task_done()¶

Indicate that a queued packet has been processed.

Raises: ValueError – Called more times than there were queued packets.

await join()¶

This function is a coroutine.

Wait until all queued packets are marked done.
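
Because asyncio.Queue is not thread-safe, writes coming from a receive thread have to be scheduled onto the event loop. The sketch below shows one common way to do that with loop.call_soon_threadsafe; write_from_thread and receive_demo are hypothetical, and the real sink may schedule its writes differently.

```python
import asyncio
import threading
from typing import List


def write_from_thread(
    loop: asyncio.AbstractEventLoop, q: "asyncio.Queue[int]", packet: int
) -> None:
    """Hand a packet to an asyncio.Queue from a thread outside the loop."""
    loop.call_soon_threadsafe(q.put_nowait, packet)


async def receive_demo(count: int = 3) -> List[int]:
    loop = asyncio.get_running_loop()
    q: "asyncio.Queue[int]" = asyncio.Queue()

    def producer() -> None:
        # Stands in for the receive thread that delivers packets.
        for index in range(count):
            write_from_thread(loop, q, index)

    thread = threading.Thread(target=producer)
    thread.start()
    received = [await q.get() for _ in range(count)]
    thread.join()
    return received
```

Running asyncio.run(receive_demo()) collects the packets in the order the thread scheduled them.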

class discord.ext.native_voice.MultiSink(destinations, /)¶

Fan out each received packet to multiple child sinks.

Parameters: destinations (List[MediaSink]) – The child sinks to fan out to.

property child¶

The first child sink, if any.

Type: Optional[MediaSink]

property children¶

Child sinks registered under this fan-out.

Type: Sequence[MediaSink]

class discord.ext.native_voice.PerUserSink(factory, /, *, fallback_to_ssrc=True)¶

Lazily creates one child sink per received user.

If a packet arrives before Discord has mapped the SSRC to a user ID, the packet is routed by SSRC. When a later packet for that SSRC has a user ID, the existing child is promoted to the user key so recordings stay together.

Parameters

factory (Callable[[int], MediaSink]) – Callable used to create a sink for each user ID or fallback SSRC.
fallback_to_ssrc (bool) – Whether packets without a user ID should be routed by SSRC.
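
The promotion behaviour described above can be sketched as a small router. PerUserRouter and route are hypothetical names, and remembering the SSRC-to-user mapping for later packets that arrive without a user ID is an assumption added for the example.

```python
from typing import Callable, Dict, List, Optional


class PerUserRouter:
    """Creates one child per user, keyed by SSRC until the user is known."""

    def __init__(
        self, factory: Callable[[int], object], *, fallback_to_ssrc: bool = True
    ) -> None:
        self.factory = factory
        self.fallback_to_ssrc = fallback_to_ssrc
        self._by_user: Dict[int, object] = {}
        self._by_ssrc: Dict[int, object] = {}
        self._ssrc_user: Dict[int, int] = {}

    @property
    def children(self) -> List[object]:
        return [*self._by_user.values(), *self._by_ssrc.values()]

    def route(self, ssrc: int, user_id: Optional[int] = None) -> Optional[object]:
        if user_id is None:
            # Reuse a mapping learned from an earlier packet, if any.
            user_id = self._ssrc_user.get(ssrc)
        if user_id is not None:
            self._ssrc_user[ssrc] = user_id
            child = self._by_user.get(user_id)
            if child is None:
                # Promote an existing SSRC-keyed child so output stays together.
                child = self._by_ssrc.pop(ssrc, None)
                if child is None:
                    child = self.factory(user_id)
                self._by_user[user_id] = child
            return child
        if not self.fallback_to_ssrc:
            return None
        child = self._by_ssrc.get(ssrc)
        if child is None:
            child = self._by_ssrc[ssrc] = self.factory(ssrc)
        return child
```

With fallback_to_ssrc disabled, packets that have no user ID are simply not routed in this sketch.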

factory¶

Callable used to create child sinks.

Type: Callable[[int], MediaSink]

fallback_to_ssrc¶

Whether packets without a user ID are routed by SSRC.

Type: bool

property children¶

All currently-created per-user sinks.

Type: Sequence[MediaSink]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.WaveSink(destination)¶

Writes decoded audio packets to a WAV file.

Parameters: destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object.
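
For reference, writing PCM frames to a WAV container with the standard library looks roughly like this. write_wav is a hypothetical helper, and the 48 kHz, stereo, 16-bit defaults are assumptions based on the usual Discord PCM format rather than something this page states.

```python
import io
import wave
from typing import BinaryIO, Iterable, Union


def write_wav(
    destination: Union[str, BinaryIO],
    pcm_frames: Iterable[bytes],
    *,
    channels: int = 2,
    sample_width: int = 2,
    sample_rate: int = 48000,
) -> None:
    """Write raw PCM frames into a WAV file or seekable binary stream."""
    with wave.open(destination, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for frame in pcm_frames:
            wav.writeframes(frame)


# One 20 ms frame is 960 samples x 2 channels x 2 bytes = 3840 bytes.
buffer = io.BytesIO()
write_wav(buffer, [bytes(3840), bytes(3840)])
```

The wave module patches the header sizes on close, so a stream destination has to be seekable.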

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.MixedWaveSink(destination, *, users=None)¶

Records decoded audio packets into one timeline-aligned WAV file.

Unlike WaveSink, this sink uses each packet’s RTP timestamp to place audio on the output timeline.

Parameters

destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object.
users (Optional[List[int]]) – User IDs to include. When omitted, all users are mixed.
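
Timeline alignment by RTP timestamp can be pictured as placing each frame at an offset derived from its timestamp and summing overlaps. The sketch below is a simplification under several assumptions: place_on_timeline is hypothetical, the first packet is taken as the earliest, timestamps count samples per channel, and samples are signed 16-bit in native byte order.

```python
from array import array
from typing import Iterable, List, Tuple


def place_on_timeline(
    packets: Iterable[Tuple[int, bytes]], *, channels: int = 2
) -> bytes:
    """Mix (rtp_timestamp, pcm) pairs into one buffer by timestamp offset."""
    items = list(packets)
    if not items:
        return b""
    base = items[0][0]
    timeline: List[int] = []
    for timestamp, pcm in items:
        samples = array("h")
        samples.frombytes(pcm)
        # Mask the difference so 32-bit timestamp wraparound is tolerated.
        start = ((timestamp - base) & 0xFFFFFFFF) * channels
        end = start + len(samples)
        if end > len(timeline):
            # Gaps between packets are left as silence (zeros).
            timeline.extend([0] * (end - len(timeline)))
        for index, sample in enumerate(samples):
            total = timeline[start + index] + sample
            timeline[start + index] = max(-32768, min(32767, total))
    return array("h", timeline).tobytes()
```

A production recorder would also need per-SSRC timestamp bases and handling for packets that arrive before the base; both are omitted here.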

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.FFmpegSink(destination, *, executable='ffmpeg', before_options=None, options=None, stderr=None)¶

Writes decoded audio packets into an FFmpeg subprocess.

Parameters

destination (Union[str, os.PathLike, bytes]) – Output path or binary file-like object. File-like destinations receive FFmpeg stdout.
executable (str) – The FFmpeg executable to run.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.

returncode¶

The FFmpeg process return code after cleanup.

Type: Optional[int]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.FFmpegMuxSink(destination, *, video_codec=None, width=0, height=0, fps=30, audio=True, video=True, executable='ffmpeg', before_options=None, options=None, output_format=None, audio_codec=None, shortest=True, stderr=None, keep_temp=False, timeout=120.0)¶

Records multiplexed receive packets into one FFmpeg output.

Audio packets are decoded to timestamp-aligned PCM and video packets are written as their decoded frame payloads.

Parameters

destination (Union[str, os.PathLike, BinaryIO]) – Output path or binary file-like object. File-like destinations receive FFmpeg stdout.
video_codec (Optional[str]) – Restrict recording to a single Discord video codec.
width (int) – Video width used for codecs that require container dimensions.
height (int) – Video height used for codecs that require container dimensions.
fps (int) – Fallback video frame rate for muxing.
audio (bool) – Whether to record audio packets.
video (bool) – Whether to record video packets.
executable (str) – The FFmpeg executable to run.
before_options (Optional[str]) – Extra FFmpeg options placed before input options.
options (Optional[str]) – Extra FFmpeg output options.
output_format (Optional[str]) – Explicit FFmpeg output format.
audio_codec (Optional[str]) – Audio codec to encode with during muxing.
shortest (bool) – Whether to stop muxed output at the shortest audio/video input.
stderr (Optional[Union[IO[bytes], int]]) – Where FFmpeg stderr is redirected.
keep_temp (bool) – Whether to keep temporary elementary stream files after cleanup.
timeout (Optional[float]) – Maximum seconds to wait for FFmpeg muxing during cleanup.

destination¶

The configured output destination.

Type: Union[str, os.PathLike, BinaryIO]

video_codec¶

The selected or detected Discord video codec.

Type: Optional[str]

width¶

Video width used for muxing.

Type: int

height¶

Video height used for muxing.

Type: int

fps¶

Fallback video frame rate for muxing.

Type: int

audio_enabled¶

Whether audio recording is enabled.

Type: bool

video_enabled¶

Whether video recording is enabled.

Type: bool

returncode¶

The FFmpeg process return code after cleanup.

Type: Optional[int]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.EncodedVideoSink(destination, *, codec, width=0, height=0, fps=30, rtp_timestamps=False)¶

Writes received encoded video frames to IVF or Annex B output.

Parameters

destination (Union[str, os.PathLike, bytes]) – Output path or bytes-like object.
codec (str) – The Discord video codec to write.
width (int) – Video width for IVF headers.
height (int) – Video height for IVF headers.
fps (int) – Video frame rate for IVF headers when RTP timestamps are not used.
rtp_timestamps (bool) – Whether IVF frame timestamps should be derived from RTP timestamps.
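
To show where width, height, and fps end up in IVF output, the sketch below builds the 32-byte IVF file header and a 12-byte frame header with struct. The helper names are hypothetical, and using fps/1 as the timebase is an assumption about the non-RTP-timestamp case.

```python
import struct


def ivf_file_header(
    fourcc: bytes, width: int, height: int, fps: int, frame_count: int = 0
) -> bytes:
    """Build a 32-byte little-endian IVF file header."""
    return struct.pack(
        "<4sHH4sHHIIII",
        b"DKIF",      # signature
        0,            # version
        32,           # header length in bytes
        fourcc,       # codec FourCC, for example b"VP80"
        width,
        height,
        fps,          # timebase denominator
        1,            # timebase numerator
        frame_count,
        0,            # unused
    )


def ivf_frame(payload: bytes, pts: int) -> bytes:
    """Prefix one encoded frame with its IVF frame header."""
    # Frame header: 32-bit payload size followed by a 64-bit timestamp.
    return struct.pack("<IQ", len(payload), pts) + payload
```

With an fps/1 timebase the pts simply counts frames; deriving it from RTP timestamps would call for a different timebase.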

codec¶

The normalized Discord video codec name.

Type: str

width¶

Video width for output metadata.

Type: int

height¶

Video height for output metadata.

Type: int

fps¶

Video frame rate for output metadata.

Type: int

rtp_timestamps¶

Whether output timestamps are derived from RTP timestamps.

Type: bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.PCMDecodeSink(destination, /, *, fec=False)¶

Decodes Opus audio packets to PCM before forwarding them to another sink.

Parameters

destination (MediaSink) – The child sink to forward decoded packets to.
fec (bool) – Whether to attempt Opus in-band FEC recovery for one missing packet.

fec¶

Whether Opus in-band FEC recovery is enabled.

Type: bool

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.SilenceFillSink(destination, /, *, silence_after=0.06, frame_duration=0.02, max_silence=1.0)¶

Pads short receive-audio gaps with synthetic PCM silence packets.

The sink forwards real packets to its destination, then emits audio/pcm silence for active audio SSRCs after a short gap. This is useful for sinks that consume a continuous PCM timeline, such as FFmpeg, callback, and queue consumers. The default silence duration is bounded so a speaker that stops talking does not produce endless output.

Parameters

destination (MediaSink) – The child sink to forward real and synthetic packets to.
silence_after (float) – Seconds to wait after the last audio packet before emitting silence.
frame_duration (float) – Duration of each synthetic PCM silence packet in seconds.
max_silence (Optional[float]) – Maximum seconds of silence to emit for each active audio track.
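
The relationship between the three timing parameters can be sketched as follows. silence_frames is a hypothetical helper; treating the gap as fully padded once it exceeds silence_after, and the 48 kHz stereo 16-bit frame size, are assumptions made for the example.

```python
from typing import List, Optional


def silence_frames(
    gap: float,
    *,
    silence_after: float = 0.06,
    frame_duration: float = 0.02,
    max_silence: Optional[float] = 1.0,
    sample_rate: int = 48000,
    channels: int = 2,
) -> List[bytes]:
    """Return zero-filled PCM frames to pad a receive gap of ``gap`` seconds."""
    if gap < silence_after:
        # Gaps shorter than the threshold are left alone.
        return []
    # Bound the padding so a speaker who stopped talking ends the output.
    fill = gap if max_silence is None else min(gap, max_silence)
    # The small epsilon guards against float division landing just below
    # a whole number of frames.
    count = int(fill / frame_duration + 1e-9)
    samples_per_frame = round(sample_rate * frame_duration)
    frame = bytes(samples_per_frame * channels * 2)  # 16-bit samples
    return [frame] * count
```

With the defaults, a five-second gap is capped at one second of padding, that is 50 frames of 3840 bytes each.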

silence_after¶

Seconds to wait after the last audio packet before emitting silence.

Type: float

frame_duration¶

Duration of each synthetic PCM silence packet in seconds.

Type: float

max_silence¶

Maximum seconds of silence to emit for each active audio track.

Type: Optional[float]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.MediaSinkVolumeTransformer(destination, volume=1.0, /)¶

Adjusts PCM audio volume before forwarding to another sink.

Parameters

destination (MediaSink) – The child sink to forward transformed packets to.
volume (float) – The initial audio volume multiplier.

property volume¶

The audio volume multiplier.

Type: float

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

cleanup()¶: Close this sink and all child sinks.

class discord.ext.native_voice.ConditionalFilter(destination, predicate, /)¶

A sink filter that forwards packets when a predicate returns true.

Parameters

destination (MediaSink) – The child sink to forward accepted packets to.
predicate (Callable[[MediaPacket], bool]) – The predicate used to accept packets.

predicate¶

The predicate used to accept packets.

Type: Callable[[MediaPacket], bool]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool

class discord.ext.native_voice.TimedFilter(destination, duration, *, start_on_init=False)¶

Forward packets for a bounded duration.

Parameters

destination (MediaSink) – The child sink to forward accepted packets to.
duration (float) – The number of seconds to accept packets for.
start_on_init (bool) – Whether the duration timer starts when the filter is created.

duration¶

The number of seconds to accept packets for.

Type: float

start_time¶

The monotonic time when the filter started accepting packets.

Type: Optional[float]
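
A bounded-duration gate built on a monotonic clock might look like the sketch below. TimedGate is hypothetical; starting the timer on the first packet when start_on_init is false is an inference from the attribute descriptions, not documented behaviour.

```python
import time
from typing import Callable, Optional


class TimedGate:
    """Accepts packets for ``duration`` seconds after the timer starts."""

    def __init__(
        self,
        duration: float,
        *,
        start_on_init: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.duration = duration
        self._clock = clock
        self.start_time: Optional[float] = clock() if start_on_init else None

    def accepts(self) -> bool:
        now = self._clock()
        if self.start_time is None:
            # Assumed behaviour: the first packet starts the timer.
            self.start_time = now
        return now - self.start_time < self.duration
```

A monotonic clock is used so wall-clock adjustments cannot shorten or extend the window.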

class discord.ext.native_voice.UserFilter(destination, user, /)¶

Forward only packets from a specific user.

Parameters

destination (MediaSink) – The child sink to forward accepted packets to.
user (discord.abc.Snowflake) – The user whose media packets should be accepted.

user_id¶

The ID of the accepted user.

Type: int

class discord.ext.native_voice.MediaFilter(destination, *, media_types=None, codecs=None, users=None)¶

Forward packets matching media type, codec, and user filters.

Parameters

destination (MediaSink) – The child sink to forward accepted packets to.
media_types (Optional[List[str]]) – Media types to accept.
codecs (Optional[List[str]]) – Codec names to accept.
users (Optional[List[discord.abc.Snowflake]]) – Users whose media packets should be accepted.

media_types¶

Media types accepted by this filter.

Type: Optional[Set[str]]

codecs¶

Codec names accepted by this filter.

Type: Optional[Set[str]]

user_ids¶

User IDs accepted by this filter.

Type: Optional[Set[int]]

wants_media(media_type, codec)¶

Return whether this sink wants a media type/codec pair.

Parameters

media_type (str) – The decoded media type, such as audio or video.
codec (str) – The decoded media codec name.

Returns

Whether this sink wants packets with the provided media type and codec.

Return type

bool
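
A media type and codec check of this kind can be sketched as a pair of optional sets. make_wants is a hypothetical helper; treating None as "accept everything" and comparing names case-insensitively are both assumptions.

```python
from typing import Callable, Iterable, Optional


def make_wants(
    media_types: Optional[Iterable[str]] = None,
    codecs: Optional[Iterable[str]] = None,
) -> Callable[[str, str], bool]:
    """Build a wants_media-style predicate from optional allow-lists."""
    types = None if media_types is None else {t.lower() for t in media_types}
    names = None if codecs is None else {c.lower() for c in codecs}

    def wants_media(media_type: str, codec: str) -> bool:
        # A filter left as None places no restriction on that dimension.
        if types is not None and media_type.lower() not in types:
            return False
        if names is not None and codec.lower() not in names:
            return False
        return True

    return wants_media
```

For example, make_wants(media_types=["audio"]) accepts any audio codec and rejects every video packet.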

Data Objects¶

class discord.ext.native_voice.MediaPacket¶

Represents one decoded receive-side media packet.

For video, payload is a full depacketized encoded frame. The RTP fields, raw, and extension fields correspond to the RTP packet that completed that frame.

media_type¶

The media type, currently audio or video.

Type: str

codec¶

The decoded codec name.

Type: str

payload¶

The Opus packet, PCM packet, or full encoded video frame.

Type: bytes

payload_type¶

The media RTP payload type.

Type: int

marker¶

Whether the RTP marker bit was set.

Type: bool

sequence¶

The RTP sequence number.

Type: int

timestamp¶

The RTP timestamp.

Type: int

ssrc¶

The normalized media SSRC.

Type: int

user_id¶

The mapped user ID, if Discord has identified the SSRC.

Type: Optional[int]

raw¶

The raw encrypted RTP packet received from the socket.

Type: bytes

extension_payload¶

The decrypted one-byte RTP extension payload bytes.

Type: bytes

rtp_extended¶

Whether the RTP extension bit was set.

Type: bool

rtp_extensions¶

Parsed one-byte RTP extension elements.

Type: Tuple[RTPExtension, …]

rtp_packets¶

Parsed RTP packets that produced this media packet.

Type: Tuple[RTPPacket, …]

received_at¶

Local monotonic timestamp for when this packet/frame was decoded.

Type: Optional[float]

rtcp_time¶

Unix timestamp mapped from RTCP sender reports or RTP absolute send time, if either was available.

Type: Optional[float]

speaking_flags¶

The decoded Discord speaking flags, if this is an audio packet.

Type: Optional[discord.SpeakingFlags]

audio_level¶

Decoded RTP audio-level extension value, where 0 is loudest and 127 is silence.

Type: Optional[int]

audio_voice_activity¶

RTP audio-level voice activity bit, if present.

Type: Optional[bool]
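
The two audio-level fields correspond to the single data byte of the RTP client-to-mixer audio level extension (RFC 6464): the top bit is the voice activity flag and the low seven bits are the level in -dBov. A minimal sketch of that decoding, with a hypothetical helper name:

```python
from typing import Tuple


def parse_audio_level(data_byte: int) -> Tuple[int, bool]:
    """Split an RFC 6464 audio-level byte into (level, voice_activity)."""
    voice_activity = bool(data_byte & 0x80)
    # 0 is the loudest level and 127 represents silence.
    level = data_byte & 0x7F
    return level, voice_activity
```

The level is an attenuation, so smaller numbers mean louder audio.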

class discord.ext.native_voice.MediaSinkWants¶

Represents a Discord media sink wants payload.

wants¶

Per-SSRC quality requests. Positive values select the requested send quality; 0 means the receiver does not want that SSRC forwarded.

Type: Dict[int, int]

any¶

The fallback quality request for otherwise unspecified streams.

Type: Optional[int]

pixel_counts¶

Per-SSRC preferred pixel counts.

Type: Dict[int, float]

class discord.ext.native_voice.VideoConfig(codec, width, height, fps=30, bitrate=0)¶

Playback parameters for a video-capable MediaSource.

This lets discord.ext.native_voice.VoiceClient.play() start video automatically when the source knows its own dimensions and codec.

codec¶

The encoded video codec name, such as H264.

Type: str

width¶

The encoded video width in pixels.

Type: int

height¶

The encoded video height in pixels.

Type: int

fps¶

The target frame rate.

Type: int

bitrate¶

The target video bitrate in bits per second.

Type: int
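
To illustrate how fps and bitrate relate, the sketch below estimates the average byte budget per encoded frame. VideoConfigSketch is a hypothetical stand-in that mirrors the documented constructor so the example runs without the library. Treating a bitrate of 0 as "unspecified" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoConfigSketch:
    """Hypothetical stand-in mirroring VideoConfig(codec, width, height, fps, bitrate)."""

    codec: str
    width: int
    height: int
    fps: int = 30
    bitrate: int = 0


def average_bytes_per_frame(config: VideoConfigSketch) -> Optional[float]:
    """Average encoded frame size implied by the target bitrate and frame rate."""
    if config.bitrate <= 0 or config.fps <= 0:
        return None  # assumption: 0 means no explicit target was given
    return config.bitrate / config.fps / 8  # bits per second -> bytes per frame
```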

class discord.ext.native_voice.VideoFrame(data, frame_time_ms=33.0)¶

Represents one encoded video frame yielded by a MediaSource.

data¶

The encoded frame bytes for the selected video codec.

Type: bytes

frame_time_ms¶

The duration of the frame in milliseconds.

Type: float
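
The default frame_time_ms of 33.0 corresponds to roughly 30 frames per second. For other frame rates the duration can be derived as below; the helper name is hypothetical.

```python
def frame_time_ms_for_fps(fps: float) -> float:
    """Duration of one frame in milliseconds at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```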

class discord.ext.native_voice.VideoProbeInfo(width=None, height=None, fps=None, bitrate=None, codec=None)¶

Metadata discovered for a video input.

width¶

The video width in pixels.

Type: Optional[int]

height¶

The video height in pixels.

Type: Optional[int]

fps¶

The frame rate, rounded to an integer.

Type: Optional[int]

bitrate¶

The video bitrate in bits per second.

Type: Optional[int]

codec¶

The Discord video codec name, if it could be mapped.

Type: Optional[str]
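
Probing tools such as ffprobe report frame rates as fraction strings like 30000/1001, which is why fps is documented as rounded. A minimal sketch of that rounding follows. Whether this library probes with ffprobe is an assumption, and parse_frame_rate is a hypothetical helper.

```python
from fractions import Fraction
from typing import Optional


def parse_frame_rate(value: str) -> Optional[int]:
    """Round a fraction string such as '30000/1001' to an integer frame rate."""
    try:
        rate = Fraction(value)
    except (ValueError, ZeroDivisionError):
        return None  # malformed input or a '0/0' placeholder
    if rate <= 0:
        return None
    return round(rate)
```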

class discord.ext.native_voice.VideoTranscoderConfig(encoder=None, decoder=None, prefer_hardware=True, validate_encoder=True, validate_decoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

FFmpeg codec selection options for video sources.

encoder¶

Exact FFmpeg video encoder to use, or a mapping of Discord codec name to FFmpeg encoder name. If omitted, an available encoder is selected for the target codec.

Type: Optional[Union[str, Dict[str, str]]]

decoder¶

Exact FFmpeg video decoder to use for the input, or a mapping of Discord codec name to FFmpeg decoder name. This is emitted as an input option before -i.

Type: Optional[Union[str, Dict[str, str]]]
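
Both encoder and decoder accept either one name or a per-codec mapping. The resolution can be sketched as follows. The function name is hypothetical, and returning None when a mapping has no entry for the codec (leaving automatic selection to take over) is an assumption.

```python
from typing import Dict, Optional, Union


def resolve_for_codec(
    option: Optional[Union[str, Dict[str, str]]], codec: str
) -> Optional[str]:
    """Pick the FFmpeg encoder or decoder name that applies to `codec`."""
    if option is None:
        return None  # nothing configured
    if isinstance(option, str):
        return option  # one exact name used for every codec
    return option.get(codec)  # per-codec mapping; None if the codec is missing
```

For example, a mapping of {"H264": "h264_nvenc", "H265": "hevc_nvenc"} resolves to h264_nvenc for H264 and to None for AV1.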

prefer_hardware¶

Prefer low-latency hardware encoders when FFmpeg advertises them.

Type: bool

validate_encoder¶

Validate explicit encoders against ffmpeg -encoders before starting.

Type: bool

validate_decoder¶

Validate explicit decoders against ffmpeg -decoders before starting.

Type: bool

encoder_options¶

Extra arguments appended immediately after the selected encoder options.

Type: List[str]

input_options¶

Extra arguments inserted before the input arguments.

Type: List[str]

output_options¶

Extra arguments appended after the source options and before the output format.

Type: List[str]

video_filters¶

Full FFmpeg video filtergraph fragments. If omitted, sources use the default low-latency software scale and yuv420p conversion.

Type: Optional[List[str]]
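
The ordering described by the option attributes above can be pictured as positions in an FFmpeg command line. The sketch below is an illustration of that ordering only, not the command this library actually builds. The function name, the position of the -vf filter argument, and the output format are all assumptions.

```python
from typing import List, Optional, Sequence


def build_ffmpeg_args(
    source: str,
    encoder: str,
    *,
    decoder: Optional[str] = None,
    encoder_options: Sequence[str] = (),
    input_options: Sequence[str] = (),
    output_options: Sequence[str] = (),
    video_filters: Optional[Sequence[str]] = None,
    output_format: str = "h264",  # assumption: raw H.264 written to stdout
) -> List[str]:
    """Assemble an illustrative FFmpeg argument list in the documented order."""
    args: List[str] = ["ffmpeg"]
    args += list(input_options)              # before the input arguments
    if decoder is not None:
        args += ["-c:v", decoder]            # decoder is an input option before -i
    args += ["-i", source]
    if video_filters:
        args += ["-vf", ",".join(video_filters)]  # assumption: filters precede the encoder
    args += ["-c:v", encoder]
    args += list(encoder_options)            # immediately after the encoder
    args += list(output_options)             # before the output format
    args += ["-f", output_format, "pipe:1"]
    return args
```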

classmethod software(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Prefer software encoders and skip hardware encoder probing.

Parameters

validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod nvenc(*, preset=None, tune=None, gpu=None, spatial_aq=None, temporal_aq=None, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Use NVIDIA NVENC encoders for H264, H265, and AV1.

Parameters

preset (Optional[str]) – NVENC preset option.
tune (Optional[str]) – NVENC tuning option.
gpu (Optional[int]) – GPU index passed to NVENC.
spatial_aq (Optional[bool]) – Whether to enable NVENC spatial adaptive quantization.
temporal_aq (Optional[bool]) – Whether to enable NVENC temporal adaptive quantization.
validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod amf(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Use AMD AMF encoders for H264, H265, and AV1.

Parameters

validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod qsv(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Use Intel Quick Sync Video encoders for H264, H265, VP9, and AV1.

Parameters

validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod vaapi(*, device='/dev/dri/renderD128', validate_encoder=True, encoder_options=(), input_options=(), output_options=())¶

Use VAAPI encoders.

Parameters

device (str) – VAAPI render device path.
validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod video_toolbox(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Use macOS VideoToolbox encoders for H264 and H265.

Parameters

validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig

classmethod media_foundation(*, validate_encoder=True, encoder_options=(), input_options=(), output_options=(), video_filters=None)¶

Use Windows Media Foundation encoders for H264, H265, and AV1.

Parameters

validate_encoder (bool) – Whether to validate the selected encoder before starting FFmpeg.
encoder_options (List[str]) – Extra arguments appended immediately after selected encoder options.
input_options (List[str]) – Extra arguments inserted before input arguments.
output_options (List[str]) – Extra arguments appended after source options and before output format.
video_filters (Optional[List[str]]) – Full FFmpeg video filtergraph fragments.

Returns

The configured transcoder options.

Return type

VideoTranscoderConfig
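
One possible way to choose among the preset constructors above is to key off the host platform. The mapping below is an assumption made for illustration: it ignores which GPU vendor is present, so nvenc, amf, and qsv are never selected, and the helper name is hypothetical.

```python
import sys


def suggested_preset_name(platform: str = sys.platform) -> str:
    """Return the name of a VideoTranscoderConfig classmethod for a platform."""
    if platform == "darwin":
        return "video_toolbox"
    if platform in ("win32", "cygwin"):
        return "media_foundation"
    if platform.startswith("linux"):
        return "vaapi"  # relies on the default render device existing
    return "software"
```

The returned name could then be looked up with getattr on VideoTranscoderConfig and called without arguments, since every preset constructor documented above has defaults for all of its keyword parameters.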

class discord.ext.native_voice.RTPExtension¶

Represents a parsed one-byte RTP header extension.

id¶

The RTP extension ID.

Type: int

data¶

The extension payload bytes.

Type: bytes
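
In the one-byte header format, each element starts with a byte holding a 4-bit ID and a 4-bit length field that stores the data length minus one. The parser below is a minimal sketch of that layout, not this library's implementation. It assumes the input is the extension body after the 4-byte extension header, and it returns plain tuples rather than RTPExtension objects.

```python
from typing import List, Tuple


def parse_one_byte_extensions(payload: bytes) -> List[Tuple[int, bytes]]:
    """Split a one-byte-header extension body into (id, data) pairs."""
    elements: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(payload):
        header = payload[offset]
        offset += 1
        ext_id = header >> 4
        if ext_id == 0:
            continue  # ID 0 marks a padding byte
        if ext_id == 15:
            break  # ID 15 is reserved; stop parsing
        length = (header & 0x0F) + 1  # stored as length minus one
        data = payload[offset:offset + length]
        if len(data) < length:
            break  # truncated element; ignore the remainder
        elements.append((ext_id, data))
        offset += length
    return elements
```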

class discord.ext.native_voice.RTPPacket¶

Represents one parsed receive-side RTP packet.

For RTX packets, payload is the recovered associated media payload and sequence is the original media sequence number. The transport RTX SSRC and payload type are preserved in rtx_ssrc and rtx_payload_type.
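
For background, an RTX retransmission payload begins with the 2-byte original sequence number, followed by the original media payload. The sketch below shows that split on raw bytes. It is an illustration of the wire format, not this library's code, and the function name is hypothetical.

```python
import struct
from typing import Tuple


def split_rtx_payload(rtx_payload: bytes) -> Tuple[int, bytes]:
    """Return (original_sequence, original_payload) from an RTX payload."""
    if len(rtx_payload) < 2:
        raise ValueError("RTX payload is too short to hold a sequence number")
    # The original sequence number is a big-endian unsigned 16-bit integer.
    (original_sequence,) = struct.unpack_from("!H", rtx_payload)
    return original_sequence, rtx_payload[2:]
```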

media_type¶

The media type, currently audio or video.

Type: str

codec¶

The decoded codec name.

Type: str

payload¶

The RTP media payload.

Type: bytes

payload_type¶

The media RTP payload type.

Type: int

marker¶

Whether the RTP marker bit was set.

Type: bool

sequence¶

The RTP sequence number.

Type: int

timestamp¶

The RTP timestamp.

Type: int

ssrc¶

The normalized media SSRC.

Type: int

user_id¶

The mapped user ID, if Discord has identified the SSRC.

Type: Optional[int]

raw¶

The raw encrypted RTP packet received from the socket.

Type: bytes

extension_payload¶

The decrypted one-byte RTP extension payload bytes.

Type: bytes

rtp_extended¶

Whether the RTP extension bit was set.

Type: bool

rtp_extensions¶

Parsed one-byte RTP extension elements.

Type: Tuple[RTPExtension, …]

rtx¶

Whether this packet was received through RTX retransmission.

Type: bool

rtx_ssrc¶

The RTX transport SSRC, if this packet was repaired.

Type: Optional[int]

rtx_payload_type¶

The RTX RTP payload type, if this packet was repaired.

Type: Optional[int]

audio_level¶

Decoded RTP audio-level extension value in -dBov, where 0 is loudest and 127 is silence.

Type: Optional[int]

audio_voice_activity¶

RTP audio-level voice activity bit, if present.

Type: Optional[bool]

class discord.ext.native_voice.RTPSendStats¶

Represents the latest RTP send state for an SSRC.

ssrc¶

The RTP SSRC.

Type: int

sequence¶

The latest RTP sequence number sent.

Type: int

transport_sequence¶

The latest RTP transport-wide sequence number sent, if available.

Type: Optional[int]

updated_at¶

Local monotonic timestamp for the latest update.

Type: float
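
RTP sequence numbers are 16 bits wide and wrap around, so comparing two sequence values from successive snapshots needs modular arithmetic. A minimal sketch, assuming fewer than 65536 packets were sent between the snapshots; the helper name is hypothetical.

```python
def sequence_delta(previous: int, current: int) -> int:
    """Number of sequence steps from `previous` to `current`, modulo 2**16."""
    return (current - previous) & 0xFFFF
```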

class discord.ext.native_voice.AudioSendStats¶

Represents audio RTP send counters.

ssrc¶

The audio RTP SSRC.

Type: int

packets_sent¶

Number of audio RTP packets sent.

Type: int

octets_sent¶

Number of audio payload octets sent.

Type: int

last_sequence¶

The latest audio RTP sequence number sent, if available.

Type: Optional[int]

updated_at¶

Local monotonic timestamp for the latest send update, if available.

Type: Optional[float]
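
Since octets_sent is a running counter, an outgoing audio bitrate can be estimated from two snapshots taken at different times. The sketch below works on plain values taken from octets_sent and updated_at; the function name is hypothetical.

```python
from typing import Optional


def audio_bitrate_bps(
    previous_octets: int,
    previous_time: float,
    current_octets: int,
    current_time: float,
) -> Optional[float]:
    """Average payload bitrate in bits per second between two snapshots."""
    elapsed = current_time - previous_time
    if elapsed <= 0:
        return None  # snapshots are identical or out of order
    return (current_octets - previous_octets) * 8 / elapsed
```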

class discord.ext.native_voice.RTCPReceiverReport¶

Represents one RTCP receiver report block.

sender_ssrc¶

The SSRC that sent the receiver report.

Type: int

source_ssrc¶

The SSRC that the report describes.

Type: int

fraction_lost¶

The packet loss fraction reported by the receiver.

Type: int

cumulative_lost¶

The cumulative packet loss count reported by the receiver.

Type: int

extended_high_sequence¶

The extended highest sequence number received.

Type: int

jitter¶

The interarrival jitter value reported by the receiver.

Type: int

last_sender_report¶

Compact NTP timestamp from the last sender report.

Type: int

delay_since_last_sender_report¶

Delay since the last sender report, in RTCP units of 1/65536 seconds.

Type: int

received_at¶

Local monotonic timestamp for when this report was decoded.

Type: float
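
The integer fields above follow the RTCP wire encoding, so they usually need converting before display. The helpers below sketch those conversions under the assumption that each attribute holds the raw value from the report block. All function names are hypothetical, and the clock rate must be supplied by the caller (48000 for Opus audio, 90000 for video).

```python
from typing import Tuple


def fraction_lost_percent(fraction_lost: int) -> float:
    """Convert the 8-bit fixed-point loss fraction to a percentage."""
    return fraction_lost / 256 * 100


def jitter_ms(jitter: int, clock_rate: int) -> float:
    """Convert interarrival jitter from RTP timestamp units to milliseconds."""
    return jitter * 1000 / clock_rate


def split_extended_sequence(extended_high_sequence: int) -> Tuple[int, int]:
    """Split into (wraparound cycles, highest 16-bit sequence number)."""
    return extended_high_sequence >> 16, extended_high_sequence & 0xFFFF


def delay_seconds(delay_since_last_sender_report: int) -> float:
    """Convert the delay from units of 1/65536 seconds to seconds."""
    return delay_since_last_sender_report / 65536
```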

class discord.ext.native_voice.MediaPlayerStats¶

Represents media player send timing statistics.

started_at¶

Local monotonic timestamp for when playback started or resumed.

Type: float

audio_frames_sent¶

Number of audio frames sent.

Type: int

video_frame_batches_sent¶

Number of video frame batches sent.

Type: int

video_frames_sent¶

Number of encoded video frames sent.

Type: int

video_packets_sent¶

Number of RTP video packets sent.

Type: int

late_video_frames¶

Number of video frames sent later than their scheduled time.

Type: int

max_video_late_ms¶

Maximum observed video lateness in milliseconds.

Type: float

audio_send_mean_ms¶

Mean time spent sending an audio frame in milliseconds.

Type: float

audio_send_max_ms¶

Maximum time spent sending an audio frame in milliseconds.

Type: float

video_send_mean_ms¶

Mean time spent sending a video frame batch in milliseconds.

Type: float

video_send_max_ms¶

Maximum time spent sending a video frame batch in milliseconds.

Type: float

video_send_interval_mean_ms¶

Mean interval between video frame batch sends in milliseconds.

Type: float

video_send_interval_p95_ms¶

Approximate p95 interval between video frame batch sends in milliseconds.

Type: float

video_send_interval_max_ms¶

Maximum interval between video frame batch sends in milliseconds.

Type: float

sleep_mean_ms¶

Mean time spent sleeping in the player loop in milliseconds.

Type: float

sleep_max_ms¶

Maximum time spent sleeping in the player loop in milliseconds.

Type: float
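
Two derived figures are often more useful than the raw counters: the share of video frames that went out late, and the effective batch send rate implied by the mean interval. A minimal sketch on plain values taken from the attributes above; both helper names are hypothetical.

```python
from typing import Optional


def late_frame_ratio(late_video_frames: int, video_frames_sent: int) -> Optional[float]:
    """Fraction of sent video frames that missed their scheduled time."""
    if video_frames_sent <= 0:
        return None  # nothing sent yet
    return late_video_frames / video_frames_sent


def batch_rate_per_second(video_send_interval_mean_ms: float) -> Optional[float]:
    """Average number of video frame batches sent per second."""
    if video_send_interval_mean_ms <= 0:
        return None  # no interval has been measured yet
    return 1000 / video_send_interval_mean_ms
```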
