diart.blocks.clustering
Module Contents
Classes
OnlineSpeakerClustering | Implements constrained incremental online clustering of speakers and manages cluster centers.
- class diart.blocks.clustering.OnlineSpeakerClustering(tau_active, rho_update, delta_new, metric='cosine', max_speakers=20)
Implements constrained incremental online clustering of speakers and manages cluster centers.
- Parameters:
tau_active (float) – Threshold for detecting active speakers. It is applied to the maximum per-speaker output activation of the local segmentation model.
rho_update (float) – Threshold for deciding whether an extracted embedding is used to update a centroid. The centroid to which a local speaker is mapped is only updated if that speaker's ratio of speech duration to chunk duration is greater than this threshold.
delta_new (float) – Threshold on the distance between a speaker embedding and a centroid. If the distance from a local speaker's embedding to every centroid is larger than delta_new, a new centroid is created for that speaker.
metric (str) – The distance metric to use. Defaults to "cosine".
max_speakers (int) – Maximum number of global speakers to track through a conversation. Defaults to 20.
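As a rough illustration of how the three thresholds gate a local speaker, the sketch below uses plain Python lists in place of the np.ndarray inputs. The helper names (`cosine_distance`, `is_active`, `is_long_enough`, `needs_new_centroid`) are hypothetical, and approximating the speech/chunk ratio by the mean activation is an assumption that this page does not confirm.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def is_active(activations, tau_active):
    # tau_active is compared with the maximum activation of one local speaker.
    return max(activations) >= tau_active

def is_long_enough(activations, rho_update):
    # Assumption: the speech/chunk ratio is approximated by the mean activation.
    return sum(activations) / len(activations) > rho_update

def needs_new_centroid(embedding, centroids, delta_new):
    # A new centroid is needed when every existing centroid is farther than delta_new.
    return all(cosine_distance(embedding, c) > delta_new for c in centroids)

# One local speaker observed over four frames, and two existing centroids.
activations = [0.1, 0.9, 0.8, 0.2]
centroids = [[1.0, 0.0], [0.0, 1.0]]

active = is_active(activations, tau_active=0.5)
long_enough = is_long_enough(activations, rho_update=0.3)
far_away = needs_new_centroid([-1.0, -1.0], centroids, delta_new=1.0)
close_by = needs_new_centroid([1.0, 0.1], centroids, delta_new=1.0)
```

In this toy setup the speaker counts as active and long enough, the embedding `[-1.0, -1.0]` is far from both centroids, and `[1.0, 0.1]` is close to the first one.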
- property num_free_centers: int
- Return type:
int
- property num_known_speakers: int
- Return type:
int
- property num_blocked_speakers: int
- Return type:
int
- property inactive_centers: List[int]
- Return type:
List[int]
- get_next_center_position()
- Return type:
Optional[int]
- init_centers(dimension)
Initializes the speaker centroid matrix.
- Parameters:
dimension (int) – Dimension of embeddings used for representing a speaker.
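One plausible reading of `init_centers`, the counting properties, and `get_next_center_position` is a fixed table of `max_speakers` centroid slots, some of which are in use. The sketch below models that reading with a hypothetical `CenterTable` class. The attribute names, the zero initialization, and the "lowest free index" policy are assumptions for illustration, not confirmed diart internals. Blocked speakers are left out because this page does not describe them.

```python
from typing import List, Optional, Set

class CenterTable:
    """Hypothetical fixed-capacity table of speaker centroids."""

    def __init__(self, max_speakers: int = 20):
        self.max_speakers = max_speakers
        self.centers: Optional[List[List[float]]] = None
        self.active: Set[int] = set()  # slots that hold a known speaker

    def init_centers(self, dimension: int) -> None:
        # Allocate one zero vector of size `dimension` per slot.
        self.centers = [[0.0] * dimension for _ in range(self.max_speakers)]
        self.active = set()

    @property
    def num_known_speakers(self) -> int:
        return len(self.active)

    @property
    def num_free_centers(self) -> int:
        return self.max_speakers - self.num_known_speakers

    @property
    def inactive_centers(self) -> List[int]:
        return [i for i in range(self.max_speakers) if i not in self.active]

    def get_next_center_position(self) -> Optional[int]:
        # Lowest unused slot, or None when the table is full.
        free = self.inactive_centers
        return free[0] if free else None

table = CenterTable(max_speakers=2)
table.init_centers(dimension=3)
first_slot = table.get_next_center_position()
table.active.add(first_slot)  # pretend a speaker now occupies this slot
```

Returning `None` when no slot is free matches the documented `Optional[int]` return type of `get_next_center_position`.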
- update(assignments, embeddings)
Updates the speaker centroids given a list of assignments and local speaker embeddings.
- Parameters:
assignments (Iterable[Tuple[int, int]]) – An iterable of two-element tuples, where the first element is the source speaker and the second is the target speaker.
embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix containing embeddings for all local speakers.
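This page does not spell out the update rule. A minimal sketch, assuming that each assigned local embedding is simply added to its target centroid (a running sum, which keeps the centroid's direction meaningful under cosine distance), could look like the following. `update_centers` is a hypothetical free function, plain lists stand in for the np.ndarray inputs, and "source" is read as the local speaker and "target" as the global speaker.

```python
from typing import Iterable, List, Tuple

def update_centers(
    centers: List[List[float]],
    assignments: Iterable[Tuple[int, int]],
    embeddings: List[List[float]],
) -> None:
    """Accumulate local embeddings into their assigned centroids, in place."""
    for local_spk, global_spk in assignments:
        # Assumption: the update is an element-wise running sum.
        centers[global_spk] = [
            c + e for c, e in zip(centers[global_spk], embeddings[local_spk])
        ]

centers = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.0, 0.5], [0.5, 0.0]]  # two local speakers

# Local speaker 0 maps to global speaker 1, local speaker 1 to global speaker 0.
update_centers(centers, [(0, 1), (1, 0)], embeddings)
```

Centroids that receive no assignment are left untouched, which is how a caller could skip speakers that fail the `rho_update` check.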
- add_center(embedding)
Adds a new speaker centroid initialized to a given embedding.
- Parameters:
embedding (np.ndarray) – Embedding vector of a local speaker.
- Returns:
center_index – Index of the created center
- Return type:
int
- identify(segmentation, embeddings)
Identify the centroids to which the input speaker embeddings belong.
- Parameters:
segmentation (np.ndarray, shape (frames, local_speakers)) – Matrix of segmentation outputs
embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix of embeddings
- Returns:
speaker_map – A mapping from local speakers to global speakers.
- Return type:
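The matching step behind `identify` can be sketched as a one-to-one assignment between local embeddings and centroids, followed by a `delta_new` check. The sketch assumes that the "constraint" means two local speakers from the same chunk cannot share a centroid, and it uses a brute-force search as a stand-in for an optimal assignment solver. `match_speakers` and `cosine_distance` are hypothetical helpers, the segmentation-based filtering of inactive speakers is omitted, and there must be at least as many centroids as local speakers.

```python
import math
from itertools import permutations
from typing import Dict, List, Optional

def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def match_speakers(
    embeddings: List[List[float]],
    centers: List[List[float]],
    delta_new: float,
) -> Dict[int, Optional[int]]:
    """Map each local speaker to a distinct centroid, or to None if none is close."""
    dist = [[cosine_distance(e, c) for c in centers] for e in embeddings]
    # Brute-force search over one-to-one assignments; fine for a handful of speakers.
    best = min(
        permutations(range(len(centers)), len(embeddings)),
        key=lambda perm: sum(dist[i][j] for i, j in enumerate(perm)),
    )
    # A match farther than delta_new is flagged as needing a new centroid.
    return {i: (j if dist[i][j] <= delta_new else None) for i, j in enumerate(best)}

centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
speaker_map = match_speakers([[0.1, 0.9], [0.9, 0.1]], centers, delta_new=0.5)
unknown = match_speakers([[0.0, -1.0]], centers, delta_new=0.5)
```

Here local speaker 0 lands on centroid 1 and local speaker 1 on centroid 0, while the embedding `[0.0, -1.0]` is too far from every centroid and is mapped to `None`.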
- __call__(segmentation, embeddings)
- Parameters:
segmentation (pyannote.core.SlidingWindowFeature) –
embeddings (torch.Tensor) –
- Return type:
pyannote.core.SlidingWindowFeature
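This page lists only the types of `__call__`, not its behavior. Assuming it returns the input segmentation with local speaker columns relabeled as global speakers, the relabeling step could be sketched as below. `relabel` is a hypothetical helper, nested lists stand in for the `SlidingWindowFeature` data, and padding the output to `max_speakers` columns is an assumption.

```python
from typing import Dict, List

def relabel(
    segmentation: List[List[float]],
    speaker_map: Dict[int, int],
    max_speakers: int,
) -> List[List[float]]:
    """Copy each local speaker's column into its global speaker's column."""
    # Start with silence for every global speaker at every frame.
    out = [[0.0] * max_speakers for _ in segmentation]
    for t, frame in enumerate(segmentation):
        for local_spk, global_spk in speaker_map.items():
            out[t][global_spk] = frame[local_spk]
    return out

# Two frames, two local speakers; local 0 is global 2 and local 1 is global 0.
local_segmentation = [[0.9, 0.1], [0.8, 0.3]]
global_segmentation = relabel(local_segmentation, {0: 2, 1: 0}, max_speakers=3)
```

Global speakers with no local counterpart in the chunk keep an activation of zero in every frame.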