Okay so this is becoming quite the rabbit hole...
Alignment doesn't matter, but the offset into the allocation (or page?) does.
This C code measures how long the read syscall takes depending on how far the buf pointer is offset from the start of the allocation.
I got the idea because Julia's read doesn't 32-byte align its allocations/pointers either, and still gets the faster 45ms reads.
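#include <fcntl.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock time in milliseconds. */
int64_t millis(void)
{
    struct timespec now;
    timespec_get(&now, TIME_UTC);
    return ((int64_t) now.tv_sec) * 1000 + ((int64_t) now.tv_nsec) / 1000000;
}

int main(void) {
    puts("offset milliseconds");
    for (int off = 0; off < 100; ++off) {
        int fd = open("testfile", O_RDONLY);
        size_t n = 400000000;
        /* One extra page so that buf+off stays inside the allocation. */
        char *buf = malloc(n + 4096);
        if (fd < 0 || buf == NULL) {
            puts("setup failed");
            return 1;
        }
        int64_t a = millis();
        /* read returns ssize_t; cast n so the comparison is signed. */
        if (read(fd, buf + off, n) != (ssize_t) n) {
            puts("oops");
        }
        int64_t b = millis();
        free(buf);
        close(fd);
        /* b - a is an int64_t, so print it with PRId64 rather than %d. */
        printf("% 3d % 5" PRId64 "\n", off, b - a);
    }
}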
Output:
offset milliseconds
0 133
1 132
2 132
3 133
4 131
5 131
6 133
7 134
8 134
9 133
10 133
11 132
12 132
13 131
14 130
15 130
16 46
17 48
18 47
19 47
20 47
21 48
22 48
23 47
24 47
25 47
26 48
So starting with an offset of 16 into the allocation, the read takes ~45ms instead of ~130ms.
What?!
Well that's not the best part yet! This pattern stays constant until shortly before the next page boundary!
offset milliseconds
4071 45
4072 45
4073 45
4074 45
4075 45
4076 46
4077 46
4078 46
4079 46
4080 46
4081 129 //!<-- suddenly slow again!
4082 127
4083 125
4084 125
4085 125
4086 125
4087 125
4088 125
4089 126
4090 126
4091 128
4092 126
4093 126
4094 125
4095 125
4096 126
4097 126
4098 126
4099 129
4100 129
4101 130
4102 131
4103 131
4104 132
4105 131
4106 131
4107 133
4108 133
4109 134
4110 133
4111 131
4112 45 //!<-- and fast again!
4113 45
4114 46
4115 45
4116 46
4117 46
4118 46
4119 46
4120 46
4121 45
4122 45
4123 46
This is crazy to me. I guess it might be time to try to debug the Linux kernel?