diart.blocks.embedding
Module Contents
Classes
- SpeakerEmbedding
- OverlappedSpeechPenalty: Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.
- EmbeddingNormalization
- OverlapAwareSpeakerEmbedding: Extract overlap-aware speaker embeddings given an audio chunk and its segmentation.
- class diart.blocks.embedding.SpeakerEmbedding(model, device=None)
- Parameters:
model (diart.models.EmbeddingModel) –
device (Optional[torch.device]) –
- static from_pretrained(model, use_hf_token=True, device=None)
- Parameters:
use_hf_token (Union[Text, bool, None]) –
device (Optional[torch.device]) –
- Return type:
SpeakerEmbedding
- __call__(waveform, weights=None)
Calculate speaker embeddings of the input audio. If weights are given, calculate one embedding per speaker from the same waveform, using that speaker's per-frame weights.
- Parameters:
waveform (TemporalFeatures, shape (samples, channels) or (batch, samples, channels)) –
weights (Optional[TemporalFeatures], shape (frames, speakers) or (batch, frames, speakers)) – Per-speaker and per-frame weights. Defaults to no weights.
- Returns:
embeddings – If weights are provided, the shape is (batch, speakers, embedding_dim); otherwise it is (batch, embedding_dim). If the batch size is 1, the batch dimension is omitted.
- Return type:
torch.Tensor
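The shape rule stated under Returns can be spelled out as a small helper. This is only a sketch of the documented rule, not code from diart: the function name expected_embedding_shape and the sizes used below (80000 samples, 293 frames, 512 embedding dimensions) are hypothetical and chosen purely for illustration.

```python
def expected_embedding_shape(waveform_shape, embedding_dim, weights_shape=None):
    """Return the documented output shape for a given input shape.

    waveform_shape: (samples, channels) or (batch, samples, channels)
    weights_shape:  None, (frames, speakers) or (batch, frames, speakers)
    """
    # A 2-dimensional waveform is treated as a batch of size 1.
    batch = waveform_shape[0] if len(waveform_shape) == 3 else 1
    if weights_shape is None:
        shape = (batch, embedding_dim)
    else:
        speakers = weights_shape[-1]  # the last axis of weights is speakers
        shape = (batch, speakers, embedding_dim)
    # The batch dimension is omitted when the batch size is 1.
    return shape[1:] if batch == 1 else shape


# Illustrative sizes only (assumptions, not values taken from diart).
single_no_weights = expected_embedding_shape((80000, 1), 512)
single_weighted = expected_embedding_shape((80000, 1), 512, (293, 3))
batched_weighted = expected_embedding_shape((4, 80000, 1), 512, (4, 293, 3))
```

With these inputs, the unweighted single chunk yields one embedding, while the weighted calls yield one embedding per speaker.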
- class diart.blocks.embedding.OverlappedSpeechPenalty(gamma=3, beta=10, normalize=False)
Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.
Note
For more information, see “Overlap-Aware Low-Latency Online Speaker Diarization based on End-to-End Local Segmentation” (Section 2.2.1 Segmentation-driven speaker embedding). This block implements Equation 2.
- Parameters:
gamma (float, optional) – Exponent applied to the scores to lower low-confidence predictions. Defaults to 3.
beta (float, optional) – Inverse softmax temperature (the temperature is 1/beta), used to lower joint speaker activations. Defaults to 10.
normalize (bool, optional) – Whether to min-max normalize weights to be in the range [0, 1]. Defaults to False.
- __call__(segmentation)
- Parameters:
segmentation (diart.features.TemporalFeatures) –
- Return type:
diart.features.TemporalFeatures
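The effect of gamma and beta can be illustrated in plain Python. This is a minimal sketch, not diart's implementation: it assumes the penalty raises each speaker score to the power gamma and multiplies it by the softmax over speakers (with scores scaled by beta), also raised to gamma. The function name penalty_weights and the example scores are hypothetical, and the optional min-max normalization is left out.

```python
import math


def penalty_weights(segmentation, gamma=3.0, beta=10.0):
    """Assumed form of the penalty: w = s**gamma * softmax(beta * s)**gamma.

    segmentation: list of frames, each a list of per-speaker scores in [0, 1].
    Returns a list with the same (frames, speakers) layout.
    """
    weights = []
    for frame in segmentation:
        # Numerically stable softmax over the speakers of this frame.
        peak = max(beta * s for s in frame)
        exps = [math.exp(beta * s - peak) for s in frame]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Low scores are damped by s**gamma, shared activity by probs**gamma.
        weights.append([(s ** gamma) * (p ** gamma) for s, p in zip(frame, probs)])
    return weights


scores = [
    [0.9, 0.1],  # one confident speaker
    [0.9, 0.9],  # two speakers active at once (overlap)
    [0.5, 0.1],  # one low-confidence speaker
]
weights = penalty_weights(scores)
```

Under these assumptions, the first speaker keeps a high weight only in the first frame: the overlapping frame and the low-confidence frame both receive much smaller weights.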
- class diart.blocks.embedding.EmbeddingNormalization(norm=1)
- Parameters:
norm (Union[float, torch.Tensor]) –
- __call__(embeddings)
- Parameters:
embeddings (torch.Tensor) –
- Return type:
torch.Tensor
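Rescaling embeddings to a target norm can be sketched as follows. This is an illustration under stated assumptions rather than diart's code: it assumes the norm in question is the L2 norm, handles only a scalar target, and uses a hypothetical function name (rescale_to_norm) and made-up vectors.

```python
import math


def rescale_to_norm(embeddings, norm=1.0):
    """Rescale each embedding so that its L2 norm equals `norm`.

    embeddings: list of vectors laid out as (speakers, embedding_dim).
    Assumes no vector is all zeros (that would divide by zero).
    """
    rescaled = []
    for vector in embeddings:
        length = math.sqrt(sum(x * x for x in vector))
        rescaled.append([norm * x / length for x in vector])
    return rescaled


embeddings = [[3.0, 4.0], [0.0, 2.0]]  # toy two-dimensional embeddings
unit = rescale_to_norm(embeddings)            # every vector now has norm 1
doubled = rescale_to_norm(embeddings, 2.0)    # every vector now has norm 2
```

The direction of each vector is preserved; only its length changes, so cosine similarities between embeddings are unaffected.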
- class diart.blocks.embedding.OverlapAwareSpeakerEmbedding(model, gamma=3, beta=10, norm=1, normalize_weights=False, device=None)
Extract overlap-aware speaker embeddings given an audio chunk and its segmentation.
- Parameters:
model (EmbeddingModel) – A pre-trained embedding model.
gamma (float, optional) – Exponent to lower low-confidence predictions. Defaults to 3.
beta (float, optional) – Inverse softmax temperature (the temperature is 1/beta), used to lower joint speaker activations. Defaults to 10.
norm (float or torch.Tensor of shape (batch, speakers, 1) where batch is optional) – The target norm for the embeddings. It can be different for each speaker. Defaults to 1.
normalize_weights (bool, optional) – Whether to min-max normalize embedding weights to be in the range [0, 1]. Defaults to False.
device (Optional[torch.device]) – The device on which to run the embedding model. Defaults to GPU if available or CPU if not.
- static from_pretrained(model, gamma=3, beta=10, norm=1, use_hf_token=True, normalize_weights=False, device=None)
- Parameters:
gamma (float) –
beta (float) –
norm (Union[float, torch.Tensor]) –
use_hf_token (Union[Text, bool, None]) –
normalize_weights (bool) –
device (Optional[torch.device]) –
- __call__(waveform, segmentation)
- Parameters:
waveform (diart.features.TemporalFeatures) –
segmentation (diart.features.TemporalFeatures) –
- Return type:
torch.Tensor