`diart.operators`#

Module Contents#

Classes#

`AudioBufferState`
`PredictionWithAudio`
`OutputAccumulationState`

Functions#

`rearrange_audio_stream`([duration, step, sample_rate])
`buffer_slide`(n)
`accumulate_output`(duration, step[, patch_collar])	Accumulate predictions and audio indefinitely: O(N) space complexity.
`buffer_output`(duration, step, latency, sample_rate[, ...])	Store the most recent predictions and audio in a fixed-size buffer.

Attributes#

Operator

diart.operators.Operator#

class diart.operators.AudioBufferState#

chunk: numpy.ndarray | None#

buffer: numpy.ndarray | None#

start_time: float#

changed: bool#

static initial()#

static has_samples(num_samples)#

Parameters: num_samples (int)

static to_sliding_window(sample_rate)#

Parameters: sample_rate (int)

diart.operators.rearrange_audio_stream(duration=5, step=0.5, sample_rate=16000)#

Parameters:

duration (float)
step (float)
sample_rate (int)

Return type:

Operator
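
The page gives no description for `rearrange_audio_stream`. Judging only by its parameter names and defaults, it plausibly regroups incoming audio into overlapping windows of `duration` seconds, advancing by `step` seconds. The sketch below illustrates that idea on a plain NumPy array. The helper `sliding_chunks` is hypothetical and is not part of diart, and the real operator works on a reactive stream rather than on a complete array.

```python
import numpy as np


def sliding_chunks(samples, duration=5.0, step=0.5, sample_rate=16000):
    """Yield (start_time, chunk) pairs: `duration`-second windows every `step` seconds.

    Hypothetical helper for illustration only; not a diart function.
    """
    chunk_size = int(round(duration * sample_rate))
    step_size = int(round(step * sample_rate))
    # Only full windows are emitted; trailing samples that do not fill a window are skipped.
    for start in range(0, len(samples) - chunk_size + 1, step_size):
        yield start / sample_rate, samples[start:start + chunk_size]


audio = np.zeros(6 * 16000, dtype=np.float32)  # 6 seconds of silence at 16 kHz
chunks = list(sliding_chunks(audio))
```

With the default values, consecutive windows overlap by 4.5 seconds, so each new window contributes only 0.5 seconds of new audio.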

diart.operators.buffer_slide(n)#

Parameters: n (int)
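
`buffer_slide` is also undocumented here. Assuming from its name that it emits a sliding window over the last `n` items of a stream, the behavior can be sketched with a bounded `deque`. The generator `slide` below is a hypothetical stand-in operating on a plain iterable, not the diart operator itself.

```python
from collections import deque


def slide(values, n):
    """Yield the last `n` items seen so far, once per incoming item.

    Hypothetical illustration; not a diart function.
    """
    window = deque(maxlen=n)  # the oldest item is dropped automatically once full
    for value in values:
        window.append(value)
        yield list(window)


windows = list(slide([1, 2, 3, 4], n=2))
```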

class diart.operators.PredictionWithAudio#

property has_audio: bool#

Return type: bool

prediction: pyannote.core.Annotation#

waveform: pyannote.core.SlidingWindowFeature | None#

class diart.operators.OutputAccumulationState#

property cropped_waveform: pyannote.core.SlidingWindowFeature#

Return type: pyannote.core.SlidingWindowFeature

annotation: pyannote.core.Annotation | None#

waveform: pyannote.core.SlidingWindowFeature | None#

real_time: float#

next_sample: int | None#

static initial()#

Return type: OutputAccumulationState

to_tuple()#

Return type: Tuple[Optional[pyannote.core.Annotation], Optional[pyannote.core.SlidingWindowFeature], float]

diart.operators.accumulate_output(duration, step, patch_collar=0.05)#

Accumulate predictions and audio indefinitely: O(N) space complexity. Uses a pre-allocated buffer that doubles in size whenever it fills up, so only O(log N) concatenation operations are needed.

Parameters:

duration (float) – Buffer duration in seconds.
step (float) – Duration in seconds of the chunk received at each event. The first chunk may be longer because of the latency.
patch_collar (float, optional) – Collar, in seconds, used to merge turns of the same speaker. Defaults to 0.05 (i.e. 50 ms).

Returns:

A ReactiveX operator implementing this behavior.
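
The doubling strategy mentioned in the description can be sketched as follows. This is a minimal illustration of the general technique, not diart's implementation: the class `GrowingBuffer` and its `num_reallocations` counter are hypothetical names, and the initial capacity is assumed to be positive.

```python
import numpy as np


class GrowingBuffer:
    """Append-only sample buffer that doubles its capacity when full.

    Hypothetical illustration; not a diart class.
    """

    def __init__(self, capacity):
        # `capacity` is assumed to be a positive number of samples.
        self._data = np.zeros(capacity, dtype=np.float32)
        self._size = 0
        self.num_reallocations = 0

    def append(self, chunk):
        needed = self._size + len(chunk)
        if needed > len(self._data):
            # Double until the new chunk fits, then grow with a single concatenation.
            new_capacity = len(self._data)
            while new_capacity < needed:
                new_capacity *= 2
            extra = np.zeros(new_capacity - len(self._data), dtype=self._data.dtype)
            self._data = np.concatenate([self._data, extra])
            self.num_reallocations += 1
        self._data[self._size:needed] = chunk
        self._size = needed

    @property
    def values(self):
        # Only the filled part of the pre-allocated array is exposed.
        return self._data[:self._size]


buf = GrowingBuffer(capacity=4)
for i in range(100):
    buf.append(np.full(1, i, dtype=np.float32))
```

Appending 100 one-sample chunks to a buffer of capacity 4 triggers 5 reallocations (4 to 8, 16, 32, 64, 128) rather than 100 concatenations, which is where the O(log N) bound comes from.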

diart.operators.buffer_output(duration, step, latency, sample_rate, patch_collar=0.05)#

Store the most recent predictions and audio in a fixed-size buffer. Provides the best time/space complexity trade-off when past data is not needed.

Parameters:

duration (float) – Buffer duration in seconds.
step (float) – Duration in seconds of the chunk received at each event. The first chunk may be longer because of the latency.
latency (float) – Latency of the system in seconds.
sample_rate (int) – Sample rate of the audio source.
patch_collar (float, optional) – Collar, in seconds, used to merge turns of the same speaker. Defaults to 0.05 (i.e. 50 ms).

Returns:

A ReactiveX operator implementing this behavior.
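
As a rough illustration of the fixed-buffer idea, the sketch below keeps only the last `duration` seconds of audio, so memory use stays bounded by `duration * sample_rate` samples however long the stream runs. The class `RollingBuffer` is a hypothetical name, `duration` is assumed to be positive, and predictions and latency handling are left out; this is not diart's implementation.

```python
import numpy as np


class RollingBuffer:
    """Keep only the most recent `duration` seconds of samples.

    Hypothetical illustration; not a diart class.
    """

    def __init__(self, duration, sample_rate):
        # Assumes duration > 0 so that the capacity is at least one sample.
        self._capacity = int(round(duration * sample_rate))
        self._data = np.zeros(0, dtype=np.float32)

    def append(self, chunk):
        # Concatenate, then drop the oldest samples beyond the fixed capacity.
        self._data = np.concatenate([self._data, chunk])[-self._capacity:]

    @property
    def values(self):
        return self._data


rolling = RollingBuffer(duration=1.0, sample_rate=4)  # capacity of 4 samples
rolling.append(np.arange(0, 3, dtype=np.float32))
rolling.append(np.arange(3, 6, dtype=np.float32))
```

After the second call the buffer holds the samples 2, 3, 4 and 5; the two oldest samples have been discarded.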
