std::fs::read slow?

Okay so this is becoming quite the rabbit hole...

Alignment doesn't matter, but the offset into an allocation (or page?) does

This C code measures how long the read syscall takes, based on how far the buf pointer is offset.

I got the idea because Julia's read doesn't 32-byte align its allocations/pointers either, and still gets the faster 45ms reads.

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

/* Current wall-clock time in milliseconds. */
int64_t millis(void)
{
    struct timespec now;
    timespec_get(&now, TIME_UTC);
    return ((int64_t) now.tv_sec) * 1000 + ((int64_t) now.tv_nsec) / 1000000;
}

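int main(void) {
    puts("offset milliseconds");
    for (int off = 0; off < 100; ++off) {
        int fd = open("testfile", O_RDONLY);
        size_t n = 400000000;
        /* One extra page of slack so that buf+off stays in bounds. */
        char *buf = malloc(n + 4096);
        if (fd < 0 || buf == NULL) {
            puts("setup failed");
            return 1;
        }
        /* Time only the read syscall itself. */
        int64_t a = millis();
        if (read(fd, buf+off, n) != (ssize_t) n) {
            puts("oops");
        }
        int64_t b = millis();
        free(buf);
        close(fd);
        /* b - a is an int64_t, so cast it to match the format specifier. */
        printf("% 3d % 5lld\n", off, (long long) (b - a));
    }
}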

Output:

offset milliseconds
  0   133
  1   132
  2   132
  3   133
  4   131
  5   131
  6   133
  7   134
  8   134
  9   133
 10   133
 11   132
 12   132
 13   131
 14   130
 15   130
 16    46
 17    48
 18    47
 19    47
 20    47
 21    48
 22    48
 23    47
 24    47
 25    47
 26    48

So starting with an offset of 16 into the allocation, the read takes ~45ms instead of ~130ms.

What?!

Well that's not the best part yet! The reads stay fast until shortly before the next page boundary (up to offset 4080), and then the whole pattern repeats one page (4096 bytes) later!

offset milliseconds
 4071    45
 4072    45
 4073    45
 4074    45
 4075    45
 4076    46
 4077    46
 4078    46
 4079    46
 4080    46
 4081   129 //!<-- suddenly slow again!
 4082   127
 4083   125
 4084   125
 4085   125
 4086   125
 4087   125
 4088   125
 4089   126
 4090   126
 4091   128
 4092   126
 4093   126
 4094   125
 4095   125
 4096   126
 4097   126
 4098   126
 4099   129
 4100   129
 4101   130
 4102   131
 4103   131
 4104   132
 4105   131
 4106   131
 4107   133
 4108   133
 4109   134
 4110   133
 4111   131
 4112    45 //!<-- and fast again!
 4113    45
 4114    46
 4115    45
 4116    46
 4117    46
 4118    46
 4119    46
 4120    46
 4121    45
 4122    45
 4123    46
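
To tell "offset into the allocation" apart from "offset into the page", one option is to look at where buf+off actually lands inside its page. Here is a minimal sketch, which I have not run against these measurements. page_offset and alloc_page_aligned are hypothetical helper names, and 4096 is an assumed page size (sysconf(_SC_PAGESIZE) would report the real one). My unverified guess is that malloc returns a pointer 16 bytes past a page boundary for an allocation this large. If so, the slow offsets 0..15 and 4081..4111 would together correspond to page offsets 1..31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Offset of a pointer within its page. page_size must be a power of two. */
size_t page_offset(const void *p, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size_t) ((uintptr_t) p & (page_size - 1));
}

/* Allocate a page-aligned buffer that can hold n bytes plus up to one page
 * of extra offset, so that buf + off has page offset exactly off for
 * 0 <= off < page_size. Returns NULL on failure; release with free(). */
char *alloc_page_aligned(size_t n, size_t page_size)
{
    /* aligned_alloc (C11) wants a size that is a multiple of the alignment,
     * so round up to whole pages and add one page of slack. */
    size_t total = (n / page_size + 2) * page_size;
    return aligned_alloc(page_size, total);
}
```

In the benchmark, swapping malloc(n + 4096) for alloc_page_aligned(n, 4096) and printing page_offset(buf+off, 4096) next to each timing should show whether the slowdown follows the page offset or the allocation offset.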

This is crazy to me. I guess it might be time to try to debug the Linux kernel?