`diart.blocks.embedding`#

Module Contents#

Classes#

`SpeakerEmbedding`
`OverlappedSpeechPenalty`	Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.
`EmbeddingNormalization`
`OverlapAwareSpeakerEmbedding`	Extract overlap-aware speaker embeddings given an audio chunk and its segmentation.

class diart.blocks.embedding.SpeakerEmbedding(model, device=None)#

Parameters:

model (diart.models.EmbeddingModel) –
device (Optional[torch.device]) –

static from_pretrained(model, use_hf_token=True, device=None)#

Parameters:

use_hf_token (Union[Text, bool, None]) –
device (Optional[torch.device]) –

Return type:

SpeakerEmbedding

__call__(waveform, weights=None)#

Calculate speaker embeddings of the input audio. If weights are given, calculate one embedding per speaker from the same waveform.

Parameters:

waveform (TemporalFeatures, shape (samples, channels) or (batch, samples, channels)) –
weights (Optional[TemporalFeatures], shape (frames, speakers) or (batch, frames, speakers)) – Per-speaker and per-frame weights. Defaults to no weights.

Returns:

embeddings – If weights are provided, the shape is (batch, speakers, embedding_dim), otherwise the shape is (batch, embedding_dim). If batch size == 1, the batch dimension is omitted.

Return type:

torch.Tensor
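
The shape rules above can be summarized in a small helper. This is only a sketch of the documented contract, not part of diart: the function name `expected_embedding_shape` is hypothetical, and the default `embedding_dim` of 512 is an assumed placeholder, since the real value depends on the embedding model.

```python
from typing import Optional, Tuple


def expected_embedding_shape(
    waveform_shape: Tuple[int, ...],
    weights_shape: Optional[Tuple[int, ...]] = None,
    embedding_dim: int = 512,  # assumed placeholder; depends on the model
) -> Tuple[int, ...]:
    """Predict the output shape documented for SpeakerEmbedding.__call__."""
    if len(waveform_shape) not in (2, 3):
        raise ValueError(
            "waveform must be (samples, channels) or (batch, samples, channels)"
        )
    # A 2-dimensional waveform is treated as a batch of size 1
    batch = waveform_shape[0] if len(waveform_shape) == 3 else 1

    if weights_shape is None:
        # No weights: one embedding per waveform
        shape = (batch, embedding_dim)
    else:
        if len(weights_shape) not in (2, 3):
            raise ValueError(
                "weights must be (frames, speakers) or (batch, frames, speakers)"
            )
        # With weights: one embedding per speaker
        speakers = weights_shape[-1]
        shape = (batch, speakers, embedding_dim)

    # The batch dimension is omitted when the batch size is 1
    return shape[1:] if batch == 1 else shape
```

For example, a single 5-second mono chunk at 16 kHz with weights for 3 speakers would give `expected_embedding_shape((80000, 1), (293, 3))`, that is `(3, 512)` under the assumed embedding size (the 293 frames are likewise illustrative).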

class diart.blocks.embedding.OverlappedSpeechPenalty(gamma=3, beta=10, normalize=False)#

Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.

Note

For more information, see “Overlap-Aware Low-Latency Online Speaker Diarization based on End-to-End Local Segmentation” (Section 2.2.1 Segmentation-driven speaker embedding). This block implements Equation 2.

Parameters:

gamma (float, optional) – Exponent to lower low-confidence predictions. Defaults to 3.
beta (float, optional) – Softmax temperature parameter (the temperature is actually 1/beta) used to lower joint speaker activations. Defaults to 10.
normalize (bool, optional) – Whether to min-max normalize weights to be in the range [0, 1]. Defaults to False.

__call__(segmentation)#

Parameters:

segmentation (diart.features.TemporalFeatures) –

Return type:

diart.features.TemporalFeatures
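
To make the roles of `gamma` and `beta` concrete, here is a minimal pure-Python sketch of the penalty as described above. It assumes that the weights are computed as `(score * softmax(beta * scores)) ** gamma`, with the softmax taken over speakers in each frame. Plain lists stand in for tensors, the function name is hypothetical, the `1e-8` floor is an assumption, and the `normalize` option is left out. Check the paper's Equation 2 and the diart source for the exact formulation.

```python
import math
from typing import List


def overlapped_speech_penalty(
    segmentation: List[List[float]], gamma: float = 3, beta: float = 10
) -> List[List[float]]:
    """Weights of shape (frames, speakers) from scores of the same shape."""
    weights = []
    for frame in segmentation:
        # Softmax over speakers; a larger beta means a sharper distribution
        peak = max(frame)  # subtracted for numerical stability
        exps = [math.exp(beta * (score - peak)) for score in frame]
        total = sum(exps)
        probs = [e / total for e in exps]
        # The exponent gamma pushes low-confidence values towards zero.
        # The small floor (assumed) keeps weights strictly positive.
        weights.append(
            [max((score * prob) ** gamma, 1e-8) for score, prob in zip(frame, probs)]
        )
    return weights
```

With this sketch, a frame where one speaker clearly dominates, such as `[0.9, 0.1]`, keeps a high weight for that speaker, while a fully overlapped frame such as `[0.9, 0.9]` gives both speakers a much lower weight.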

class diart.blocks.embedding.EmbeddingNormalization(norm=1)#

Parameters:

norm (Union[float, torch.Tensor]) –

__call__(embeddings)#

Parameters:

embeddings (torch.Tensor) –

Return type:

torch.Tensor
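
As an illustration of what normalizing to a target norm means, the following sketch rescales a single embedding vector. It assumes the L2 norm, which this page does not state explicitly, and uses a plain list in place of a tensor. The function name is hypothetical.

```python
import math
from typing import List


def normalize_embedding(embedding: List[float], norm: float = 1.0) -> List[float]:
    """Rescale one embedding so that its L2 norm equals `norm`."""
    # Current L2 length of the vector (a zero vector would raise ZeroDivisionError)
    length = math.sqrt(sum(x * x for x in embedding))
    return [norm * x / length for x in embedding]
```

For instance, `normalize_embedding([3.0, 4.0])` keeps the direction of the vector and scales its length from 5 down to 1.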

class diart.blocks.embedding.OverlapAwareSpeakerEmbedding(model, gamma=3, beta=10, norm=1, normalize_weights=False, device=None)#

Extract overlap-aware speaker embeddings given an audio chunk and its segmentation.

Parameters:

model (EmbeddingModel) – A pre-trained embedding model.
gamma (float, optional) – Exponent to lower low-confidence predictions. Defaults to 3.
beta (float, optional) – Softmax’s temperature parameter (actually 1/beta) to lower joint speaker activations. Defaults to 10.
norm (float or torch.Tensor of shape (batch, speakers, 1) where batch is optional) – The target norm for the embeddings. It can be different for each speaker. Defaults to 1.
normalize_weights (bool, optional) – Whether to min-max normalize embedding weights to be in the range [0, 1]. Defaults to False.
device (Optional[torch.device]) – The device on which to run the embedding model. Defaults to GPU if available or CPU if not.

static from_pretrained(model, gamma=3, beta=10, norm=1, use_hf_token=True, normalize_weights=False, device=None)#

Parameters:

gamma (float) –
beta (float) –
norm (Union[float, torch.Tensor]) –
use_hf_token (Union[Text, bool, None]) –
normalize_weights (bool) –
device (Optional[torch.device]) –

__call__(waveform, segmentation)#

Parameters:

waveform (diart.features.TemporalFeatures) –
segmentation (diart.features.TemporalFeatures) –

Return type:

torch.Tensor
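
The constructor parameters suggest that this block chains the three blocks documented on this page: the overlapped speech penalty turns the segmentation into weights, the embedding model uses those weights, and the result is normalized. The sketch below shows that data flow under this assumption. All function names are hypothetical, the `toy_*` stand-ins are deliberately trivial and do not reflect real model behaviour, and plain lists stand in for tensors.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # plain nested lists stand in for tensors


def overlap_aware_embedding(
    waveform: Matrix,
    segmentation: Matrix,
    penalty: Callable[[Matrix], Matrix],
    embed: Callable[[Matrix, Matrix], Matrix],
    normalize: Callable[[Matrix], Matrix],
) -> Matrix:
    """Chain the blocks: penalty -> weighted embedding -> normalization."""
    weights = penalty(segmentation)        # (frames, speakers)
    embeddings = embed(waveform, weights)  # (speakers, embedding_dim)
    return normalize(embeddings)           # same shape, fixed norm


# Toy stand-ins, for illustration only (not diart's implementations)
def toy_penalty(segmentation: Matrix) -> Matrix:
    return [[score ** 3 for score in frame] for frame in segmentation]


def toy_embed(waveform: Matrix, weights: Matrix) -> Matrix:
    # One 2-dimensional "embedding" per speaker:
    # [sum of that speaker's weights, number of samples]
    speakers = len(weights[0])
    return [
        [sum(frame[k] for frame in weights), float(len(waveform))]
        for k in range(speakers)
    ]


def toy_normalize(embeddings: Matrix) -> Matrix:
    return [
        [x / math.sqrt(sum(v * v for v in emb)) for x in emb] for emb in embeddings
    ]


result = overlap_aware_embedding(
    waveform=[[0.0], [0.1], [0.2], [0.3]],  # (samples, channels)
    segmentation=[[1.0, 0.0], [1.0, 0.5]],  # (frames, speakers)
    penalty=toy_penalty,
    embed=toy_embed,
    normalize=toy_normalize,
)
```

Here `result` holds one unit-norm vector per speaker. In real use the three callables would be replaced by the corresponding diart blocks.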
