`diart.blocks.clustering`#

Module Contents#

class diart.blocks.clustering.OnlineSpeakerClustering(tau_active, rho_update, delta_new, metric='cosine', max_speakers=20)#

Implements constrained incremental online clustering of speakers and manages cluster centers.

Parameters:

tau_active (float) – Threshold for detecting active speakers. It is compared against the maximum per-speaker output activation of the local segmentation model.
rho_update (float) – Threshold for considering the extracted embedding when updating the centroid of the local speaker. The centroid to which a local speaker is mapped is only updated if the ratio of speech/chunk duration of a given local speaker is greater than this threshold.
delta_new (float) – Threshold on the distance between a speaker embedding and a centroid. If the distance from a local speaker's embedding to every centroid is larger than delta_new, a new centroid is created for that speaker.
metric (str) – The distance metric to use. Defaults to "cosine".
max_speakers (int) – Maximum number of global speakers to track through a conversation. Defaults to 20.
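
To make the roles of tau_active and rho_update concrete, here is a minimal pure-Python sketch of how the two thresholds could be applied to one chunk of segmentation output. It is not diart's code: the helper name `active_and_long_speakers` is hypothetical, the mean activation is assumed to stand in for the speech/chunk duration ratio, and whether each comparison is strict is also an assumption.

```python
from typing import List, Tuple


def active_and_long_speakers(
    segmentation: List[List[float]],
    tau_active: float,
    rho_update: float,
) -> Tuple[List[int], List[int]]:
    """Return (active, long) local speaker indices for one chunk.

    `segmentation` is a frames x local_speakers matrix of activations in [0, 1].
    Hypothetical helper for illustration only.
    """
    num_frames = len(segmentation)
    num_speakers = len(segmentation[0])
    active, long_speakers = [], []
    for spk in range(num_speakers):
        column = [frame[spk] for frame in segmentation]
        # tau_active is compared with the peak activation of the speaker.
        if max(column) >= tau_active:
            active.append(spk)
        # Assumption: mean activation approximates the speech/chunk duration ratio.
        if sum(column) / num_frames > rho_update:
            long_speakers.append(spk)
    return active, long_speakers


segmentation = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.0],
    [0.9, 0.1, 0.1],
    [0.7, 0.1, 0.0],
]
active, long_speakers = active_and_long_speakers(
    segmentation, tau_active=0.6, rho_update=0.3
)
```

In this toy chunk, the second speaker peaks above tau_active but speaks too briefly to pass rho_update, so under the assumptions above it would be tracked without contributing to a centroid update.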

property num_free_centers: int#

property num_known_speakers: int#

property num_blocked_speakers: int#

property inactive_centers: List[int]#

get_next_center_position()#

init_centers(dimension)#

Initializes the speaker centroid matrix.

Parameters:

dimension (int) – Dimension of embeddings used for representing a speaker.

update(assignments, embeddings)#

Updates the speaker centroids given a list of assignments and local speaker embeddings

Parameters:

assignments (Iterable[Tuple[int, int]]) – An iterable of two-element tuples, where the first element is the source speaker and the second element is the target speaker.
embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix containing embeddings for all local speakers.
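
The shape of the `assignments` argument is easiest to see in a small sketch. The function below is a hypothetical stand-in, not the library's method: it assumes the source index selects a row of `embeddings`, the target index selects a centroid, and the update is a plain element-wise sum (which changes the centroid's direction, the only thing a cosine metric looks at). The actual update rule may differ.

```python
from typing import Dict, Iterable, List, Tuple


def update_centers(
    centers: Dict[int, List[float]],
    assignments: Iterable[Tuple[int, int]],
    embeddings: List[List[float]],
) -> Dict[int, List[float]]:
    """Fold each assigned embedding into its target centroid, in place.

    Hypothetical helper; the additive rule is an assumption.
    """
    for source_spk, target_spk in assignments:
        centers[target_spk] = [
            c + e for c, e in zip(centers[target_spk], embeddings[source_spk])
        ]
    return centers


centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
embeddings = [[0.0, 2.0], [3.0, 1.0]]
# Source speaker 0 updates centroid 1; source speaker 1 updates centroid 0.
update_centers(centers, [(0, 1), (1, 0)], embeddings)
```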

add_center(embedding)#

Add a new speaker centroid initialized to a given embedding

Parameters:

embedding (np.ndarray) – Embedding vector of some local speaker.

Returns:

center_index – Index of the created center.

Return type:

int
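
How `init_centers`, `get_next_center_position` and `add_center` might fit together can be sketched with a fixed-capacity store. The class `CenterStore` below is hypothetical and written with plain lists; it only mirrors the method names documented here, and details such as lazy initialization and raising when no slot is free are assumptions rather than documented behavior.

```python
from typing import List, Optional


class CenterStore:
    """Hypothetical fixed-capacity centroid store (not diart's implementation)."""

    def __init__(self, max_speakers: int) -> None:
        self.max_speakers = max_speakers
        self.centers: Optional[List[List[float]]] = None
        self.active = [False] * max_speakers

    def init_centers(self, dimension: int) -> None:
        # One zero-filled row per potential global speaker.
        self.centers = [[0.0] * dimension for _ in range(self.max_speakers)]

    def get_next_center_position(self) -> Optional[int]:
        # First unused slot, or None when every slot is taken.
        for index, used in enumerate(self.active):
            if not used:
                return index
        return None

    def add_center(self, embedding: List[float]) -> int:
        if self.centers is None:
            # Assumption: the matrix is created lazily from the first embedding.
            self.init_centers(len(embedding))
        index = self.get_next_center_position()
        if index is None:
            raise RuntimeError("No free center is available")
        self.centers[index] = list(embedding)
        self.active[index] = True
        return index


store = CenterStore(max_speakers=2)
first = store.add_center([0.1, 0.2, 0.3])
second = store.add_center([0.4, 0.5, 0.6])
```

With `max_speakers=2`, a third call to `add_center` would raise in this sketch, which is one simple way to enforce the speaker limit.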

identify(segmentation, embeddings)#

Identify the centroids to which the input speaker embeddings belong.

Parameters:

segmentation (np.ndarray, shape (frames, local_speakers)) – Matrix of segmentation outputs
embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix of embeddings

Returns:

speaker_map – A mapping from local speakers to global speakers.

Return type:

SpeakerMap
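
The role of delta_new in identification can be illustrated with a simplified, greedy nearest-centroid sketch. This is deliberately not the constrained assignment the class performs: it ignores `segmentation`, may map two local speakers to the same centroid, and returns a plain dict instead of a SpeakerMap. The function names are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import Dict, List, Optional


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def greedy_identify(
    embeddings: List[List[float]],
    centers: List[List[float]],
    delta_new: float,
) -> Dict[int, Optional[int]]:
    """Map each local speaker to its nearest centroid.

    A value of None marks a speaker whose distance to every centroid exceeds
    delta_new, i.e. a candidate for a new centroid.
    """
    mapping: Dict[int, Optional[int]] = {}
    for local_spk, embedding in enumerate(embeddings):
        distances = [cosine_distance(embedding, center) for center in centers]
        best = min(range(len(centers)), key=distances.__getitem__)
        mapping[local_spk] = best if distances[best] <= delta_new else None
    return mapping


centers = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.9, 0.1], [-1.0, -1.0]]
speaker_map = greedy_identify(embeddings, centers, delta_new=0.5)
```

Here the first local speaker lies close to centroid 0, while the second is far from both centroids and is therefore flagged as new.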

__call__(segmentation, embeddings)#

Parameters:

Return type:

pyannote.core.SlidingWindowFeature