Plotters - Creating a spectrogram/heatmap with log scaling

Hi all,

I'm working on a tool to batch-generate spectrograms for a bunch of sound files. I'm using rustfft to compute the Fourier transforms, and plotters to draw them.

Currently my plotting code is not very sophisticated: it splits the DrawingArea into n_freq_bins x n_samples rectangles and colors each one individually according to its scaled spectral density. There are a number of issues with this approach, but it works as a hacky MVP.

// Imports assumed for this snippet: ndarray, ndarray-stats (for max_skipnan),
// plotters, and colorous.
use ndarray::Array2;
use ndarray_stats::QuantileExt;
use plotters::coord::Shift;
use plotters::prelude::*;

pub fn plot_spectrogram<DB: DrawingBackend>(
    spectrogram: &Array2<f32>,
    drawing_area: &DrawingArea<DB, Shift>,
) {
    // get some dimensions for drawing
    // The shape is in [nrows, ncols], meaning [n_samples, n_freqbins], but we want to transpose this
    // so that in our graph, x = time, y = frequency.
    let (num_samples, num_freq_bins) = match spectrogram.shape() {
        &[num_rows, num_columns] => (num_rows, num_columns),
        _ => panic!("Spectrogram is a {}D array, expected a 2D array.", spectrogram.ndim())
    };

    println!("...from a spectrogram with {} samples x {} frequency bins.", num_samples, num_freq_bins);
    let spectrogram_cells = drawing_area.split_evenly((num_freq_bins, num_samples));

    // Scaling values
    let windows_scaled = spectrogram.map(|i| i.abs()/(num_freq_bins as f32));
    let highest_spectral_density = windows_scaled.max_skipnan();
 
    // transpose and flip around to prepare for graphing
    /* the array is currently oriented like this:
        t = 0 |
              |
              |
              |
              |
        t = n +-------------------
            f = 0              f = m
        so it needs to be flipped...
        t = 0 |
              |
              |
              |
              |
        t = n +-------------------
            f = m              f = 0
        ...and transposed...
        f = m |
              |
              |
              |
              |
        f = 0 +-------------------
            t = 0              t = n
        
        ... in order to look like a proper spectrogram
    */
    let windows_flipped = windows_scaled.slice(ndarray::s![.., ..; -1]); // reverse the frequency axis
    let windows_flipped = windows_flipped.t(); // then transpose so x = time, y = frequency
    // Finally add a color scale
    let color_scale = colorous::MAGMA;
    
    // fill the cells with the appropriate color
    for (cell, spectral_density) in spectrogram_cells.iter().zip(windows_flipped.iter()) {
        let spectral_density_scaled = spectral_density.sqrt() / highest_spectral_density.sqrt();
        let color = color_scale.eval_continuous(spectral_density_scaled as f64);
        cell.fill(&RGBColor(color.r, color.g, color.b)).unwrap();
    }
}

One issue with this approach is that I'm plotting the frequencies on a linear scale rather than a logarithmic one. Compare these two spectrograms, one generated with the sonogram crate and one generated with my tool:

(sound clip is the last 15 seconds of Aphex Twin's Windowlicker)

Most spectrograms use a logarithmic frequency scale, because human perception of pitch is logarithmic. But how do I do that with plotters? I see that plotters has some provision for log scaling via the LogCoord struct and similar, but if I'm trying to create an image rather than something like a line graph, do I need to do something special to fill in an area rather than just plot a point?
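For concreteness, the kind of log support I mean is the axis combinator on an ordinary chart. A rough sketch (untested for my image-fill case):

use plotters::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let root = BitMapBackend::new("chart.png", (640, 480)).into_drawing_area();
    root.fill(&WHITE)?;
    // log_scale() turns the y range into a LogCoord; this gives the axis
    // behavior I want, but for points/lines, not a filled image.
    let mut chart = ChartBuilder::on(&root)
        .margin(10)
        .x_label_area_size(30)
        .y_label_area_size(40)
        .build_cartesian_2d(0f32..15f32, (20f32..22_050f32).log_scale())?;
    chart.configure_mesh().draw()?;
    Ok(())
}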

And if there's a better-suited library for this type of plotting, I'd love to know that as well.

I would just compute the logarithm in code.

let spectral_density_scaled = spectral_density.sqrt() / highest_spectral_density.sqrt();
let sd_log = spectral_density_scaled.ln();

let color = color_scale.eval_continuous(sd_log as f64);

Sorry, to clarify: I'm not talking about log-scaling the spectral density (the brightness/color of pixels), but rather the frequency bins (the Y axis).

Note how the spiral that Aphex Twin put at the end of his song looks more round and even on the left side of the image, versus the right.

I think my comment on the spectral_density_scaled part of the code was confusing, so I've edited it.

Aha, and your matrix has a value for each coordinate? I would write a formula that takes a coordinate in the output picture, computes where it is in the input, rounds it, and then looks it up in the original image.

So a loop over each x and y in the output that does this:

output[(x,y)] = input[((y_offset - y).exp() as usize, x)];

Here you choose y_offset as the actual value of y at coordinate zero. You may also want to scale it.

The flip and transpose can be combined into this.
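Something like this, as a rough sketch (assuming an ndarray Array2<f32> laid out [n_samples, n_freq_bins] and at least two frequency bins; the function name and the exact scaling are made up, so adjust to taste):

use ndarray::Array2;

// For every output pixel, compute which input cell it maps to and copy the
// value. The flip and transpose happen in the same pass as the log scaling.
// `input` is [n_samples, n_freq_bins]; the result is [n_freq_bins, n_samples].
fn log_scale_freq_axis(input: &Array2<f32>) -> Array2<f32> {
    let (n_samples, n_freq_bins) = input.dim();
    let log_max = (n_freq_bins as f32).ln();
    Array2::from_shape_fn((n_freq_bins, n_samples), |(y, x)| {
        // row 0 of the output is the top of the image (highest frequency)
        let frac = (n_freq_bins - 1 - y) as f32 / (n_freq_bins - 1) as f32;
        // map [0, 1] logarithmically onto input bins [0, n_freq_bins - 1]
        let bin = ((frac * log_max).exp().floor() as usize - 1).min(n_freq_bins - 1);
        input[(x, bin)]
    })
}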

Aha, and your matrix has a value for each coordinate?

Exactly right; the matrix stores a spectral density (value) for each frequency bin (column) and sample (row).

This approach seems much more workable than trying to rely on plotters to do the scaling, and if I go further and just do the flipping/transposing/scaling on the actual matrix, I can also avoid a quirk in plotters (drawing areas must be at least 1px by 1px). I'll give that a shot.

And voila:

(screenshot: the resulting log-scaled spectrogram)

I ended up using the approach that Alice suggested: transforming the spectrogram itself by looking up the exponent of the index (conceptually, output[x] = input[x.exp()]).

Here is the actual function that I ended up writing. It works, but it currently allocates a fresh Vec for every row, and the type signature could stand to be cleaned up (in practice it only ever needs to take an f32 or f64):

use ndarray::ArrayView1;
use num_traits::{FromPrimitive, Num};
use std::fmt::Debug;
use std::iter::Sum;

fn log_distort_vector<'a, T: Num + Sum<&'a T> + FromPrimitive + Copy + Debug>(
    input: ArrayView1<'a, T>,
) -> Vec<T> {
    let length = input.len();
    // width of one output bin, measured in log2 space
    let log_length = f32::log2(length as f32);
    let log_bin_width = log_length / length as f32;

    (1..=length)
        .map(|i| {
            // output bin i averages the input bins in [2^(w*(i-1)), 2^(w*i))
            let lower = (log_bin_width * (i as f32 - 1.0)).exp2().floor() as usize;
            let higher = (log_bin_width * i as f32).exp2().floor() as usize;

            // if the range is empty (common at the low end), fall back to
            // the single input bin at `lower`
            input
                .slice(ndarray::s![lower..higher])
                .mean()
                .unwrap_or_else(|| {
                    *input.get(lower).unwrap_or_else(|| {
                        panic!("index {} out of bounds for array with length {}", lower, length)
                    })
                })
        })
        .collect()
}

I call into this from my spectrogram calculation function like so:

let log_transformed = windows
    .axis_iter(Axis(0))
    .flat_map(log_distort_vector)
    .collect::<Vec<_>>();

let log_transformed = Array::from_shape_vec(windows_shape, log_transformed).unwrap();
let log_transformed = log_transformed.into_dimensionality().unwrap();

Again, I could save an allocation and make my code more readable by modifying the array in place, but in practice generating the image itself takes much more time than the spectrogram calculation and scaling.
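If I do revisit it, something along these lines is what I have in mind. It's not strictly in place, but it drops the per-row Vec allocations by writing each row into one preallocated output (a sketch, untested; log_distort_into and log_distort_all are made-up names):

use ndarray::{Array2, ArrayView1, ArrayViewMut1, Axis};

// Same log distortion as log_distort_vector, but writing into a
// preallocated output row instead of returning a fresh Vec per call.
fn log_distort_into(input: ArrayView1<f32>, mut output: ArrayViewMut1<f32>) {
    let length = input.len();
    let log_bin_width = (length as f32).log2() / length as f32;
    for i in 1..=length {
        let lower = (log_bin_width * (i as f32 - 1.0)).exp2().floor() as usize;
        let higher = (log_bin_width * i as f32).exp2().floor() as usize;
        output[i - 1] = input
            .slice(ndarray::s![lower..higher])
            .mean()
            .unwrap_or_else(|| input[lower]);
    }
}

// Caller: one output allocation for the whole spectrogram.
fn log_distort_all(windows: &Array2<f32>) -> Array2<f32> {
    let mut out = Array2::zeros(windows.raw_dim());
    for (src, dst) in windows.axis_iter(Axis(0)).zip(out.axis_iter_mut(Axis(0))) {
        log_distort_into(src, dst);
    }
    out
}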

