diart.operators#
Module Contents#
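Classes#
AudioBufferState |
PredictionWithAudio |
OutputAccumulationState |
Functions#
rearrange_audio_stream(duration, step, sample_rate) |
buffer_slide(n) |
accumulate_output(duration, step, patch_collar) | Accumulate predictions and audio to infinity: O(N) space complexity.
buffer_output(duration, step, latency, sample_rate, patch_collar) | Store last predictions and audio inside a fixed buffer.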
Attributes#
- diart.operators.Operator#
- class diart.operators.AudioBufferState#
- chunk: numpy.ndarray | None#
- buffer: numpy.ndarray | None#
- start_time: float#
- changed: bool#
- static initial()#
- static has_samples(num_samples)#
- Parameters:
num_samples (int) –
- static to_sliding_window(sample_rate)#
- Parameters:
sample_rate (int) –
- diart.operators.rearrange_audio_stream(duration=5, step=0.5, sample_rate=16000)#
- Parameters:
duration (float) –
step (float) –
sample_rate (int) –
- Return type:
Operator
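The reference gives no description for `rearrange_audio_stream`. As a rough illustration of what regrouping an audio stream into overlapping windows can look like, the sketch below turns arbitrarily sized chunks into windows of `duration` seconds emitted every `step` seconds. This is an assumption based on the parameter names, not diart's implementation: `sliding_windows` is a hypothetical helper, it works on plain iterables rather than ReactiveX observables, and samples are plain floats rather than NumPy arrays.

```python
from typing import Iterable, Iterator, List, Tuple


def sliding_windows(
    chunks: Iterable[List[float]],
    duration: float = 5,
    step: float = 0.5,
    sample_rate: int = 16000,
) -> Iterator[Tuple[float, List[float]]]:
    """Yield (start_time, window) pairs from a stream of sample chunks.

    Hypothetical helper for illustration only; not part of diart.
    """
    window_size = int(round(duration * sample_rate))
    step_size = int(round(step * sample_rate))
    if window_size <= 0 or step_size <= 0:
        raise ValueError("duration and step must cover at least one sample")

    buffer: List[float] = []
    start_sample = 0  # position of buffer[0] in the overall stream
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every full window that the buffer now contains.
        while len(buffer) >= window_size:
            yield start_sample / sample_rate, buffer[:window_size]
            # Slide forward by one step, dropping the oldest samples.
            buffer = buffer[step_size:]
            start_sample += step_size


# Toy stream: 10 samples at 2 Hz, delivered in chunks of 3, 3 and 4 samples.
stream = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
windows = list(sliding_windows(stream, duration=2, step=1, sample_rate=2))
```

With a 2-second window and a 1-second step, consecutive windows share half of their samples, and the chunk boundaries of the incoming stream no longer matter.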
- diart.operators.buffer_slide(n)#
- Parameters:
n (int) –
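The reference does not describe `buffer_slide` either. One plausible reading of the name, stated here as an assumption rather than documented behavior, is a buffer that keeps the last `n` items of a stream and emits its current contents at every new item. `slide_buffer` below is a hypothetical stand-in written over plain iterables instead of ReactiveX observables.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def slide_buffer(items: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield a snapshot of the last `n` items after each new item.

    Hypothetical helper for illustration only; not part of diart.
    """
    window: Deque[T] = deque(maxlen=n)  # the oldest item is evicted automatically
    for item in items:
        window.append(item)
        yield list(window)


snapshots = list(slide_buffer("abcd", n=2))
```

Until `n` items have arrived, the snapshots are shorter than `n`; afterwards each snapshot drops the oldest item and adds the newest one.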
- class diart.operators.PredictionWithAudio#
- property has_audio: bool#
- Return type:
bool
- prediction: pyannote.core.Annotation#
- waveform: pyannote.core.SlidingWindowFeature | None#
- class diart.operators.OutputAccumulationState#
- property cropped_waveform: pyannote.core.SlidingWindowFeature#
- Return type:
pyannote.core.SlidingWindowFeature
- annotation: pyannote.core.Annotation | None#
- waveform: pyannote.core.SlidingWindowFeature | None#
- real_time: float#
- next_sample: int | None#
- static initial()#
- Return type:
- to_tuple()#
- Return type:
Tuple[Optional[pyannote.core.Annotation], Optional[pyannote.core.SlidingWindowFeature], float]
- diart.operators.accumulate_output(duration, step, patch_collar=0.05)#
Accumulate predictions and audio to infinity: O(N) space complexity. Uses a pre-allocated buffer that doubles in size once full: O(log N) concatenation operations.
- Parameters:
duration (float) – Buffer duration in seconds.
step (float) – Duration of the chunks at each event in seconds. The first chunk may be longer because of the latency.
patch_collar (float, optional) – Collar to merge speaker turns of the same speaker, in seconds. Defaults to 0.05 (i.e. 50 ms).
- Return type:
A ReactiveX operator implementing this behavior.
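The O(log N) claim follows from the doubling strategy: storing N samples in a buffer that doubles whenever it fills up needs only about log2(N / initial capacity) reallocations. The sketch below illustrates that idea in plain Python. `GrowingBuffer` is a hypothetical helper, not the buffer diart uses internally, and it stores floats in a list rather than audio in a NumPy array.

```python
from typing import List


class GrowingBuffer:
    """Append-only sample store whose capacity doubles when full.

    Hypothetical helper for illustration only; not part of diart.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.data: List[float] = [0.0] * capacity  # pre-allocated storage
        self.size = 0  # number of valid samples stored so far
        self.reallocations = 0  # number of times the storage was grown

    def append(self, chunk: List[float]) -> None:
        needed = self.size + len(chunk)
        # Double the capacity until the new chunk fits.
        while needed > len(self.data):
            self.data.extend([0.0] * len(self.data))
            self.reallocations += 1
        # Write the chunk into the free region without changing the capacity.
        self.data[self.size:needed] = chunk
        self.size = needed

    def contents(self) -> List[float]:
        """Return only the valid samples, without the unused tail."""
        return self.data[: self.size]


buf = GrowingBuffer(capacity=4)
for i in range(100):
    buf.append([float(i)])  # one sample per event
```

Here 100 appends trigger only 5 reallocations (4 → 8 → 16 → 32 → 64 → 128), whereas concatenating at every event would copy the accumulated data 100 times.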
- diart.operators.buffer_output(duration, step, latency, sample_rate, patch_collar=0.05)#
Store the last predictions and audio inside a fixed buffer. Provides the best time/space complexity trade-off if past data is not needed.
- Parameters:
duration (float) – Buffer duration in seconds.
step (float) – Duration of the chunks at each event in seconds. The first chunk may be longer because of the latency.
latency (float) – Latency of the system in seconds.
sample_rate (int) – Sample rate of the audio source.
patch_collar (float, optional) – Collar to merge speaker turns of the same speaker, in seconds. Defaults to 0.05 (i.e. 50 ms).
- Return type:
A ReactiveX operator implementing this behavior.
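A fixed buffer trades history for bounded memory: once it holds `duration` seconds of audio, every new chunk pushes out the oldest samples. The sketch below shows that idea for audio samples only. `FixedBuffer` is a hypothetical helper, not diart's implementation, and it leaves out the `latency` handling and the `patch_collar` merging of predictions.

```python
from typing import List


class FixedBuffer:
    """Keep only the most recent `duration` seconds of samples.

    Hypothetical helper for illustration only; not part of diart.
    """

    def __init__(self, duration: float, sample_rate: int) -> None:
        self.max_samples = int(round(duration * sample_rate))
        self.sample_rate = sample_rate
        self.samples: List[float] = []
        self.dropped = 0  # number of samples discarded so far

    def append(self, chunk: List[float]) -> None:
        self.samples.extend(chunk)
        overflow = len(self.samples) - self.max_samples
        if overflow > 0:
            # Discard the oldest samples so the buffer never exceeds its size.
            del self.samples[:overflow]
            self.dropped += overflow

    @property
    def start_time(self) -> float:
        """Time in seconds of the first sample still held in the buffer."""
        return self.dropped / self.sample_rate


buf = FixedBuffer(duration=2, sample_rate=4)  # holds at most 8 samples
for i in range(5):
    buf.append([float(i)] * 4)  # five chunks of one second each
```

After five seconds of input the buffer holds only seconds 3 and 4, and its memory use no longer grows with the length of the stream.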