diart.blocks.clustering

Module Contents

Classes

OnlineSpeakerClustering

Implements constrained incremental online clustering of speakers and manages cluster centers.

class diart.blocks.clustering.OnlineSpeakerClustering(tau_active, rho_update, delta_new, metric='cosine', max_speakers=20)

Implements constrained incremental online clustering of speakers and manages cluster centers.

Parameters:
  • tau_active (float) – Threshold for detecting active speakers. This threshold is applied to the maximum per-speaker output activation of the local segmentation model.

  • rho_update (float) – Threshold for deciding whether an extracted embedding is used to update a centroid. The centroid to which a local speaker is mapped is only updated if the ratio of that speaker's speech duration to the chunk duration is greater than this threshold.

  • delta_new (float) – Threshold on the distance between a speaker embedding and a centroid. If the distance from a local speaker's embedding to every centroid is larger than delta_new, a new centroid is created for that speaker.

  • metric (str) – The distance metric to use. Defaults to "cosine".

  • max_speakers (int) – Maximum number of global speakers to track through a conversation. Defaults to 20.
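As a rough illustration of how tau_active and rho_update act on a segmentation matrix, the sketch below uses plain nested lists in place of np.ndarray so it has no dependencies. The helper names (column, active_speakers, long_speakers) are hypothetical and not part of diart. Using the mean activation as a stand-in for the speech/chunk duration ratio, and the choice of >= for tau_active, are both assumptions.

```python
def column(matrix, j):
    """Return column j of a (frames, local_speakers) matrix given as nested lists."""
    return [row[j] for row in matrix]


def active_speakers(segmentation, tau_active):
    """Local speakers whose peak activation reaches tau_active (comparison operator assumed)."""
    n_speakers = len(segmentation[0])
    return [j for j in range(n_speakers) if max(column(segmentation, j)) >= tau_active]


def long_speakers(segmentation, rho_update):
    """Local speakers whose mean activation exceeds rho_update.

    Assumption: mean activation approximates the speech/chunk duration ratio.
    """
    n_frames = len(segmentation)
    n_speakers = len(segmentation[0])
    return [
        j for j in range(n_speakers)
        if sum(column(segmentation, j)) / n_frames > rho_update
    ]


# Four frames, three local speakers.
segmentation = [
    [0.9, 0.0, 0.1],
    [0.8, 0.0, 0.7],
    [0.9, 0.1, 0.0],
    [0.7, 0.0, 0.0],
]

active = active_speakers(segmentation, tau_active=0.5)
long_enough = long_speakers(segmentation, rho_update=0.3)
```

Here speaker 2 is active (it peaks at 0.7) but speaks too briefly to pass rho_update, so under these assumptions it could be identified without contributing to a centroid update.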

property num_free_centers: int
Return type:

int

property num_known_speakers: int
Return type:

int

property num_blocked_speakers: int
Return type:

int

property inactive_centers: List[int]
Return type:

List[int]

get_next_center_position()
Return type:

Optional[int]

init_centers(dimension)

Initializes the speaker centroid matrix.

Parameters:

dimension (int) – Dimension of embeddings used for representing a speaker.

update(assignments, embeddings)

Updates the speaker centroids given a list of assignments and local speaker embeddings.

Parameters:
  • assignments (Iterable[Tuple[int, int]]) – An iterable of two-element tuples, where the first element is the source (local) speaker and the second element is the target (global) speaker.

  • embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix containing embeddings for all local speakers.
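The update step can be pictured with a small sketch. The function update_centers below is hypothetical, and accumulating embeddings as a running sum is an assumption; this page does not say whether the real update uses a sum, a mean, or something else. Plain lists stand in for np.ndarray.

```python
def update_centers(centers, assignments, embeddings):
    """Add each assigned local embedding to its target centroid, in place.

    Assumption: centroids are kept as running sums of the embeddings mapped to them.
    """
    for local_spk, global_spk in assignments:
        centers[global_spk] = [
            c + e for c, e in zip(centers[global_spk], embeddings[local_spk])
        ]
    return centers


centers = [[1.0, 0.0], [0.0, 1.0]]      # two global speakers, embedding_dim = 2
embeddings = [[0.0, 2.0], [3.0, 0.0]]   # two local speakers
assignments = [(0, 1), (1, 0)]          # (local speaker, global speaker)

update_centers(centers, assignments, embeddings)
```

With a cosine metric, a running sum and a running mean point in the same direction, so either choice yields the same nearest centroid.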

add_center(embedding)

Adds a new speaker centroid initialized to a given embedding.

Parameters:

embedding (np.ndarray) – Embedding vector of some local speaker.

Returns:

center_index – Index of the created center.

Return type:

int
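To make the relationship between init_centers, get_next_center_position, add_center, and num_free_centers more concrete, here is a minimal sketch of a fixed-capacity centroid table. CentroidStore is a hypothetical class, not diart's implementation. Lazy initialization inside add_center and raising an error when no slot is free are both assumptions.

```python
from typing import List, Optional, Set


class CentroidStore:
    """Hypothetical fixed-capacity table of speaker centroids (not diart's class)."""

    def __init__(self, max_speakers: int) -> None:
        self.max_speakers = max_speakers
        self.centers: Optional[List[List[float]]] = None
        self.active: Set[int] = set()

    def init_centers(self, dimension: int) -> None:
        """Allocate one zero vector per possible global speaker."""
        self.centers = [[0.0] * dimension for _ in range(self.max_speakers)]
        self.active = set()

    @property
    def num_free_centers(self) -> int:
        return self.max_speakers - len(self.active)

    def get_next_center_position(self) -> Optional[int]:
        """Return the lowest unused slot index, or None if every slot is taken."""
        for position in range(self.max_speakers):
            if position not in self.active:
                return position
        return None

    def add_center(self, embedding: List[float]) -> int:
        """Store the embedding in the next free slot and return that slot's index."""
        if self.centers is None:
            self.init_centers(len(embedding))  # assumption: lazy initialization
        position = self.get_next_center_position()
        if position is None:
            raise RuntimeError("no free centroid slot left")  # assumed behavior
        self.centers[position] = list(embedding)
        self.active.add(position)
        return position


store = CentroidStore(max_speakers=2)
first = store.add_center([0.1, 0.9])
second = store.add_center([0.8, 0.2])
```

After the two calls above, both slots are occupied, so get_next_center_position() returns None, which matches the Optional[int] return type documented for that method.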

identify(segmentation, embeddings)

Identify the centroids to which the input speaker embeddings belong.

Parameters:
  • segmentation (np.ndarray, shape (frames, local_speakers)) – Matrix of segmentation outputs.

  • embeddings (np.ndarray, shape (local_speakers, embedding_dim)) – Matrix of embeddings.

Returns:

speaker_map – A mapping from local speakers to global speakers.

Return type:

SpeakerMap
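The role of delta_new during identification can be sketched with a simple greedy nearest-centroid rule. This is a deliberate simplification: it does not enforce the constraint that two local speakers map to different global speakers, it ignores tau_active, and it returns a plain dict rather than a SpeakerMap. The names cosine_distance and greedy_identify are hypothetical, and the embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def greedy_identify(embeddings, centers, delta_new, max_speakers):
    """Map each local speaker to its nearest centroid, or to a newly created one.

    New centroids are appended to `centers` in place. A local speaker stays
    unmapped when it is far from every centroid and no slot is left.
    """
    mapping = {}
    for local_spk, embedding in enumerate(embeddings):
        distances = [cosine_distance(embedding, center) for center in centers]
        nearest = min(range(len(centers)), key=distances.__getitem__) if centers else None
        if nearest is not None and distances[nearest] <= delta_new:
            mapping[local_spk] = nearest
        elif len(centers) < max_speakers:
            centers.append(list(embedding))
            mapping[local_spk] = len(centers) - 1
    return mapping


centers = [[1.0, 0.0]]                   # one known global speaker
embeddings = [[0.9, 0.1], [0.0, 1.0]]    # two local speakers
speaker_map = greedy_identify(embeddings, centers, delta_new=0.5, max_speakers=2)
```

The first local speaker lies close to the existing centroid and is mapped to it. The second is orthogonal to it (cosine distance 1.0, above delta_new), so a new centroid is created.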

__call__(segmentation, embeddings)
Parameters:
  • segmentation (pyannote.core.SlidingWindowFeature) –

  • embeddings (torch.Tensor) –

Return type:

pyannote.core.SlidingWindowFeature