diart.operators

Module Contents

Classes

Functions

rearrange_audio_stream([duration, step, sample_rate])

buffer_slide(n)

accumulate_output(duration, step[, patch_collar])

Accumulate predictions and audio indefinitely: O(N) space complexity.

buffer_output(duration, step, latency, sample_rate[, ...])

Store last predictions and audio inside a fixed buffer.

Attributes

diart.operators.Operator
class diart.operators.AudioBufferState
chunk: numpy.ndarray | None
buffer: numpy.ndarray | None
start_time: float
changed: bool
static initial()
static has_samples(num_samples)
Parameters:

num_samples (int) –

static to_sliding_window(sample_rate)
Parameters:

sample_rate (int) –

diart.operators.rearrange_audio_stream(duration=5, step=0.5, sample_rate=16000)
Parameters:
  • duration (float) –

  • step (float) –

  • sample_rate (int) –

Return type:

Operator
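The reference gives no description for this operator. Going only by its name and parameters, one plausible reading is that it regroups incoming audio blocks of arbitrary size into overlapping chunks of `duration` seconds, emitted every `step` seconds. The sketch below illustrates that idea with plain Python lists; `sliding_chunks` is a hypothetical helper and not part of diart, and the real operator works on a reactive stream rather than a generator.

```python
from typing import Iterable, Iterator, List


def sliding_chunks(
    blocks: Iterable[List[float]],
    duration: float = 5.0,
    step: float = 0.5,
    sample_rate: int = 16000,
) -> Iterator[List[float]]:
    """Regroup arbitrary-sized blocks into overlapping fixed-size chunks.

    Hypothetical helper for illustration only (not diart's implementation).
    """
    chunk_size = int(round(duration * sample_rate))
    step_size = int(round(step * sample_rate))
    buffer: List[float] = []
    for block in blocks:
        buffer.extend(block)
        # Emit every full chunk currently available, sliding by one step each time.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            del buffer[:step_size]


# Toy numbers: 4 samples per second, 1 s chunks, 0.5 s step.
blocks = [[0, 1, 2], [3, 4], [5, 6, 7]]
chunks = list(sliding_chunks(blocks, duration=1.0, step=0.5, sample_rate=4))
```

With these toy settings each chunk holds 4 samples and consecutive chunks share 2 of them, regardless of how the input blocks were split.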

diart.operators.buffer_slide(n)
Parameters:

n (int) –
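This entry is also undocumented. Assuming the name means "keep a sliding buffer of the last `n` items", the behavior can be sketched as follows. `sliding_buffer` is a hypothetical stand-in, and whether the real operator emits partially filled buffers at the start of the stream is not stated in this reference.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_buffer(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield a snapshot of the last `n` items after each new item.

    Hypothetical helper for illustration only (not diart's implementation).
    """
    window: Deque[T] = deque(maxlen=n)  # the oldest item is dropped automatically
    for item in items:
        window.append(item)
        yield list(window)


snapshots = list(sliding_buffer("abcd", n=2))
```

In this sketch the first snapshot contains a single item, and every later snapshot contains the two most recent ones.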

class diart.operators.PredictionWithAudio
property has_audio: bool
Return type:

bool

prediction: pyannote.core.Annotation
waveform: pyannote.core.SlidingWindowFeature | None
class diart.operators.OutputAccumulationState
property cropped_waveform: pyannote.core.SlidingWindowFeature
Return type:

pyannote.core.SlidingWindowFeature

annotation: pyannote.core.Annotation | None
waveform: pyannote.core.SlidingWindowFeature | None
real_time: float
next_sample: int | None
static initial()
Return type:

OutputAccumulationState

to_tuple()
Return type:

Tuple[Optional[pyannote.core.Annotation], Optional[pyannote.core.SlidingWindowFeature], float]

diart.operators.accumulate_output(duration, step, patch_collar=0.05)

Accumulate predictions and audio indefinitely, with O(N) space complexity. A pre-allocated buffer doubles in size each time it fills up, so only O(log N) concatenation operations are needed.
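The doubling strategy can be illustrated with a small, self-contained sketch. `GrowingBuffer` is a hypothetical class written for this example; it stores samples in a plain Python list instead of the waveform type diart uses, and it counts reallocations so the logarithmic growth is visible.

```python
from typing import List, Sequence


class GrowingBuffer:
    """Append-only sample buffer that doubles its capacity when full.

    Hypothetical class for illustration only (not diart's implementation).
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._data: List[float] = [0.0] * capacity  # pre-allocated storage
        self.size = 0            # number of valid samples stored so far
        self.reallocations = 0   # how many times the storage was doubled

    def append(self, samples: Sequence[float]) -> None:
        needed = self.size + len(samples)
        # Double the storage until the new samples fit.
        while needed > len(self._data):
            self._data.extend([0.0] * len(self._data))
            self.reallocations += 1
        self._data[self.size:needed] = samples
        self.size = needed

    def view(self) -> List[float]:
        """Return only the valid part of the storage."""
        return self._data[: self.size]


buf = GrowingBuffer(capacity=4)
for i in range(8):
    buf.append([float(i)] * 4)  # eight blocks of four samples each
```

Here 32 samples are stored after 8 appends, but the storage is doubled only 3 times (4 to 8, 8 to 16, 16 to 32).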

Parameters:
  • duration (float) – Buffer duration in seconds.

  • step (float) – Duration of the chunks at each event in seconds. The first chunk may be longer because of the latency.

  • patch_collar (float, optional) – Collar used to merge speaker turns of the same speaker, in seconds. Defaults to 0.05 (i.e. 50 ms).

Return type:

A ReactiveX operator implementing this behavior.
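To make the `patch_collar` parameter concrete, the following sketch merges turns of the same speaker whenever the gap between them is at most the collar. It is an assumption about what "collar to merge speaker turns" means in practice: `merge_with_collar` and the `(start, end, speaker)` tuple layout are hypothetical, whereas the real operator works on `pyannote.core.Annotation` objects.

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker), times in seconds


def merge_with_collar(turns: List[Turn], collar: float = 0.05) -> List[Turn]:
    """Merge same-speaker turns separated by a gap of at most `collar` seconds.

    Hypothetical helper for illustration only (not diart's implementation).
    """
    merged: List[Turn] = []
    last_index: Dict[str, int] = {}  # speaker -> index of their latest turn in `merged`
    for start, end, speaker in sorted(turns):
        idx = last_index.get(speaker)
        if idx is not None and start - merged[idx][1] <= collar:
            # Gap is small enough: extend the speaker's previous turn.
            prev_start, prev_end, _ = merged[idx]
            merged[idx] = (prev_start, max(prev_end, end), speaker)
        else:
            last_index[speaker] = len(merged)
            merged.append((start, end, speaker))
    return merged


turns = [(0.0, 1.0, "A"), (1.03, 2.0, "A"), (2.5, 3.0, "A"), (0.5, 1.5, "B")]
patched = merge_with_collar(turns)
```

With the default 50 ms collar, the 30 ms gap between the first two turns of speaker A is patched, while the 500 ms gap before A's last turn is kept.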

diart.operators.buffer_output(duration, step, latency, sample_rate, patch_collar=0.05)

Store last predictions and audio inside a fixed buffer. Provides the best time/space complexity trade-off if the past data is not needed.

Parameters:
  • duration (float) – Buffer duration in seconds.

  • step (float) – Duration of the chunks at each event in seconds. The first chunk may be longer because of the latency.

  • latency (float) – Latency of the system in seconds.

  • sample_rate (int) – Sample rate of the audio source.

  • patch_collar (float, optional) – Collar used to merge speaker turns of the same speaker, in seconds. Defaults to 0.05 (i.e. 50 ms).

Return type:

A ReactiveX operator implementing this behavior.
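The fixed-buffer idea can be sketched as a bounded window over the most recent samples. `FixedAudioBuffer` is a hypothetical class for illustration: it only models the `duration` and `sample_rate` parameters, and it does not attempt to reproduce how the real operator handles `latency`, `step`, or `patch_collar`.

```python
from collections import deque
from typing import Deque, Iterable, List


class FixedAudioBuffer:
    """Keep only the last `duration` seconds of audio samples.

    Hypothetical class for illustration only (not diart's implementation).
    """

    def __init__(self, duration: float, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        max_samples = int(round(duration * sample_rate))
        self._samples: Deque[float] = deque(maxlen=max_samples)
        self._total = 0  # number of samples seen since the start of the stream

    def push(self, samples: Iterable[float]) -> None:
        for sample in samples:
            self._samples.append(sample)  # the oldest sample is dropped when full
            self._total += 1

    @property
    def start_time(self) -> float:
        """Timestamp (in seconds) of the oldest sample still in the buffer."""
        return (self._total - len(self._samples)) / self.sample_rate

    def contents(self) -> List[float]:
        return list(self._samples)


fixed = FixedAudioBuffer(duration=2.0, sample_rate=4)  # holds at most 8 samples
fixed.push(range(6))
fixed.push(range(6, 12))
```

Memory use stays constant no matter how long the stream runs, which is the trade-off the description refers to: older data is discarded in exchange for bounded space.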