I was wondering how this could be achieved in Rust. My goal is to generate loudness information from a huge set of tracks, progressively, and display it in a UI later.
That picture is not a spectrum or histogram. It is just the graph of the audio waveform itself. You don't need to transform the audio data at all; just to display it.
The quality and speed of the display will depend on how well it handles having many more points/samples to display than the width of the image; a naive line drawer will work but potentially be quite slow.
(I don't have a specific Rust library to recommend for this; I haven't needed to solve this problem in Rust myself.)
To elaborate on this a bit: most digital audio is represented via pulse-code modulation (PCM), where the amplitude of a sound is sampled at regular intervals (usually something like 44.1 kHz) to approximate the analogue signal.
So when you decode audio from something like a WAV file, the output is effectively the Y values for the points on your graph (with the X axis being time).
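To make the "Y values" point concrete, here is a minimal sketch assuming mono, signed 16-bit little-endian PCM with the WAV header already stripped off. A real file needs a proper decoder that handles the header, interleaved channels and other bit depths. The helper names `le_bytes_to_i16` and `pcm_i16_to_points` are hypothetical, made up for this example:

```rust
/// Decode raw little-endian 16-bit PCM bytes (mono, header already removed).
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
fn le_bytes_to_i16(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}

/// Turn samples into (time in seconds, value) points: X is time, Y is the
/// instantaneous sample value scaled to roughly [-1.0, 1.0).
fn pcm_i16_to_points(samples: &[i16], sample_rate: u32) -> Vec<(f64, f64)> {
    samples
        .iter()
        .enumerate()
        .map(|(i, &s)| (i as f64 / sample_rate as f64, s as f64 / 32768.0))
        .collect()
}
```

Plotting these points directly gives exactly the waveform picture; no transform is involved.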
You don't need to transform the audio data at all; just to display it.
Absolutely, and this is my intention. Sorry if I wasn't clear about it.
How come it's not a histogram, though? This indicates the volume of the audio track's samples, which means there is one vertical bar per, let's say, 1k samples.
And I don't know how to obtain this result. Is there a Rust crate that generates such information (like returning a Vec<u64> of loudness values) that I could then display using something like plotters?
Should be fairly easy to do with slice::chunks (with chunk_size = sample_count / image_width, rounded up) and computing a “representative” value for each chunk (e.g. the average of absolute values, or sqrt(sum of squares); a plain average won’t work because on average the signal is zero unless there’s a DC bias).
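A minimal sketch of that suggestion, assuming the track has already been decoded into mono `f32` samples in [-1.0, 1.0]. The function names are hypothetical. The chunk size is rounded up so there are never more chunks than pixel columns, and is kept at least 1 because `slice::chunks` panics on a chunk size of zero. The RMS version divides by the chunk length before taking the square root, so the (possibly shorter) last chunk stays comparable with the others:

```rust
/// Samples per pixel column, rounded up so there are at most `width` chunks.
fn chunk_size(sample_count: usize, width: usize) -> usize {
    assert!(width > 0, "image width must be non-zero");
    ((sample_count + width - 1) / width).max(1)
}

/// Average of absolute values for each chunk (one value per pixel column).
fn mean_abs_per_chunk(samples: &[f32], width: usize) -> Vec<f32> {
    samples
        .chunks(chunk_size(samples.len(), width))
        .map(|c| c.iter().map(|s| s.abs()).sum::<f32>() / c.len() as f32)
        .collect()
}

/// Root mean square for each chunk: sqrt(sum of squares / chunk length).
fn rms_per_chunk(samples: &[f32], width: usize) -> Vec<f32> {
    samples
        .chunks(chunk_size(samples.len(), width))
        .map(|c| (c.iter().map(|s| s * s).sum::<f32>() / c.len() as f32).sqrt())
        .collect()
}
```

Either function gives one value per column, roughly the `Vec` of loudness values asked about, which can then be handed to a plotting crate. Both are a single linear pass over the samples.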
Seems logical, but do you know a crate that would do that? I guess the computation of this "representative" value isn't a matter of "average" volume but a more convoluted formula. And there's the performance aspect of this, too.
A histogram would have “buckets of loudness” on the x axis and number of samples in each bucket on the y axis, whereas you simply have time on the x axis and “loudness” on the y axis.
DSP pedantry: This is not quite correct. The instantaneous value (a sound pressure / displacement / voltage) is sampled. The amplitude of a signal is abs() of that instantaneous value over a period.
A better visualization strategy is to take the min and max of the samples within that time range, then paint the vertical range from min to max. That way, you get to view all of the peaks without averaging them away (this is very important for accurate visualization of transients in the sound), and the result is similar to (but more efficient than) if you drew lines between every sample.
(To get a visually antialiased result, you might do this at a finer granularity than once per pixel column, then draw all of them with averaging of the pixels, not the samples.)
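A sketch of the min/max approach together with one possible reading of the antialiasing note, assuming mono `f32` samples in [-1.0, 1.0]. Min/max is computed over `k` sub-columns per pixel column, each sub-column's vertical span is drawn, and a pixel's value is the fraction of its sub-columns that cover it (0.0 is background, 1.0 is fully covered). The names `min_max_per_column`, `amp_to_row`, `render_waveform` and the `k` parameter are hypothetical, and the `Vec<Vec<f32>>` stands in for whatever image type you actually draw into:

```rust
/// (min, max) of the samples falling into each of at most `columns` columns.
/// Rounding the chunk size up means the rightmost columns may stay empty when
/// the sample count is not much larger than `columns`.
fn min_max_per_column(samples: &[f32], columns: usize) -> Vec<(f32, f32)> {
    assert!(columns > 0, "column count must be non-zero");
    // Never 0: slice::chunks panics on a chunk size of zero.
    let chunk = ((samples.len() + columns - 1) / columns).max(1);
    samples
        .chunks(chunk)
        .map(|c| {
            c.iter()
                .fold((f32::INFINITY, f32::NEG_INFINITY), |(lo, hi), &s| (lo.min(s), hi.max(s)))
        })
        .collect()
}

/// Map a value in [-1.0, 1.0] to a pixel row, with row 0 at the top (+1.0).
fn amp_to_row(a: f32, height: usize) -> usize {
    (((1.0 - a.clamp(-1.0, 1.0)) / 2.0) * (height - 1) as f32).round() as usize
}

/// Grayscale image indexed as image[row][column], values in 0.0..=1.0.
/// With k = 1 this is the plain "one min-to-max span per column" rendering.
fn render_waveform(samples: &[f32], width: usize, height: usize, k: usize) -> Vec<Vec<f32>> {
    assert!(width > 0 && height > 0 && k > 0);
    let mut image = vec![vec![0.0f32; width]; height];
    for (i, &(lo, hi)) in min_max_per_column(samples, width * k).iter().enumerate() {
        let x = i / k; // pixel column this sub-column belongs to
        for row in amp_to_row(hi, height)..=amp_to_row(lo, height) {
            // Average over pixels: each sub-column contributes 1/k coverage.
            image[row][x] += 1.0 / k as f32;
        }
    }
    image
}
```

A single-sample spike survives here because it becomes the max of its column instead of being averaged away.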
A histogram plots the count or probability weight of values falling into different intervals. An amplitude in itself is not a histogram; it's just an instantaneous value (or maybe the average over short windows) of the strength of the signal.
You could obtain a histogram from the amplitude by counting how often (e.g. in how many samples) the amplitude lies in a specific interval. That would basically be the "density" of the projection of the signal onto the vertical axis. That quantity is distinct from the signal itself, which is interpreted over time.
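To make the distinction concrete, here is a sketch of that amplitude histogram, assuming `f32` samples in [-1.0, 1.0] and equal-width bins (`amplitude_histogram` is a hypothetical name). Note that the result has no time axis at all:

```rust
/// Count how many samples fall into each of `bins` equal-width intervals
/// covering [-1.0, 1.0]. Out-of-range values are clamped to the edges.
fn amplitude_histogram(samples: &[f32], bins: usize) -> Vec<u64> {
    assert!(bins > 0, "need at least one bin");
    let mut counts = vec![0u64; bins];
    for &s in samples {
        let t = (s.clamp(-1.0, 1.0) + 1.0) / 2.0; // map to [0.0, 1.0]
        let idx = ((t * bins as f32) as usize).min(bins - 1); // 1.0 goes in the last bin
        counts[idx] += 1;
    }
    counts
}
```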
Root mean square (RMS) amplitude has some useful properties as a measure of loudness. It involves squaring each value in an interval, averaging all of those squares, then taking the square root.
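Written out for an interval of N samples x_1 to x_N:

$$
x_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} x_i^2}
$$

For example, the alternating signal +1, -1, +1, -1 has a plain mean of 0 but an RMS of 1, and a full-scale sine wave has an RMS of $1/\sqrt{2} \approx 0.707$ times its peak.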
Here is a method one might consider: 1) divide the signal into a series of equal-duration intervals, then compute the RMS over each interval to determine its average amplitude; 2) use those loudness values as the input to a histogram calculation to determine how many intervals fall into each loudness range across the whole series.
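A sketch of those two steps, assuming mono `f32` samples and RMS values bucketed over [0.0, `max_level`] in equal-width bins. `rms_per_interval`, `loudness_histogram` and `max_level` are hypothetical names chosen for this example:

```rust
/// Step 1: RMS of each consecutive interval of `interval_len` samples
/// (the last interval may be shorter).
fn rms_per_interval(samples: &[f32], interval_len: usize) -> Vec<f32> {
    assert!(interval_len > 0, "interval length must be non-zero");
    samples
        .chunks(interval_len)
        .map(|c| (c.iter().map(|s| s * s).sum::<f32>() / c.len() as f32).sqrt())
        .collect()
}

/// Step 2: count how many intervals fall into each of `bins` equal-width
/// loudness ranges over [0.0, max_level]; louder values go in the last bin.
fn loudness_histogram(rms_values: &[f32], bins: usize, max_level: f32) -> Vec<u64> {
    assert!(bins > 0 && max_level > 0.0);
    let mut counts = vec![0u64; bins];
    for &v in rms_values {
        let idx = ((v / max_level * bins as f32) as usize).min(bins - 1);
        counts[idx] += 1;
    }
    counts
}
```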
To get the RMS amplitude of a selected interval, one can use sox:
# sox foo.wav -n stat
Samples read: 220500
Length (seconds): 5.000000
Scaled by: 2147483647.0
Maximum amplitude: 0.999939
Minimum amplitude: -1.000000
Midline amplitude: -0.000031
Mean norm: 0.079951
Mean amplitude: -0.002050
RMS amplitude: 0.244085
Maximum delta: 0.386505
Minimum delta: 0.000000
Mean delta: 0.007803
RMS delta: 0.024331
Rough frequency: 699
Volume adjustment: 1.000
This looks like it will generate the RMS amplitude of each window, where the window is determined by a ring buffer size.
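As a rough illustration of that windowed idea (not the code of whatever that comment is describing), here is a sketch of a sliding-window RMS that keeps the last `window` samples in a `VecDeque` ring buffer together with a running sum of squares, so each new sample costs O(1). `SlidingRms` is a hypothetical name; the sum is kept in `f64` to limit the rounding drift that a long run of additions and subtractions accumulates:

```rust
use std::collections::VecDeque;

/// RMS over the most recent `window` samples (fewer while still filling up).
struct SlidingRms {
    buf: VecDeque<f32>,
    window: usize,
    sum_sq: f64,
}

impl SlidingRms {
    fn new(window: usize) -> Self {
        assert!(window > 0, "window must be non-zero");
        SlidingRms { buf: VecDeque::with_capacity(window), window, sum_sq: 0.0 }
    }

    /// Add one sample and return the RMS of the current window.
    fn push(&mut self, sample: f32) -> f32 {
        if self.buf.len() == self.window {
            // Buffer full: drop the oldest sample and its contribution.
            if let Some(old) = self.buf.pop_front() {
                self.sum_sq -= (old as f64) * (old as f64);
            }
        }
        self.buf.push_back(sample);
        self.sum_sq += (sample as f64) * (sample as f64);
        // max(0.0) guards against tiny negative values from cancellation.
        (self.sum_sq.max(0.0) / self.buf.len() as f64).sqrt() as f32
    }
}
```

For a UI you would typically not keep every output, but read it once per pixel column or once per UI frame.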
Many of the plotting libraries have built-in histogram calculation. You might select a plotting library to get the histogram capability. For example: