How can `numpy.empty` not result in UB when reading uninitialized memory in Rust does?

Hi,

After reading the Rust Internals post "What The Hardware Does" is not What Your Program Does: Uninitialized Memory, I would conclude that both Rust and C have an issue with reading uninitialized memory.

However, NumPy (written in C), one of the most popular Python libraries for numerics, has a function `numpy.empty`. The example shows printing the uninitialized values (i.e. reading uninitialized memory).

How come this is sound, but reading uninitialized memory in Rust is not? Is it because the code has already been compiled and thus no further optimizations can be performed?

Some digging:

The implementation calls `PyArray_NewFromDescr`, which calls `npy_alloc_cache`, which calls `_npy_alloc_cache`, which calls `alloc`. I can't find where `alloc` is defined to check whether it is a `malloc`.


My guess is that strictly speaking, it isn't sound.


I think the only reason it doesn't result in miscompilations is that numpy is dynamically linked to CPython, so it is not possible for the compiler to observe that uninitialized memory was used.


It is not sound.

The sad truth is that most "C programmers", or people who write C code, don't really know C. The same goes for C++. Universities are full of courses that teach simplistic, semi-false, or downright wrong gut instincts while teaching students C and/or C++. Some popular internet forums are probably even worse. People who "learned" pre-standard C before 1989 or something now write UB-ridden code in horrible style for all sorts of systems, and they pass on the "knowledge" to the younger generations.

The reading and printing of uninitialized memory in NumPy's C core is not correct. It is Undefined Behavior, but I think fixing it is a ship that has long sailed.


That is quite often true.

To be fair to old time C programmers I think they are quite reasonable in expecting a read of an uninitialised variable to produce an undefined result. As in:

int i;
printf("%d\n", i);

They would not expect modern compilers and optimisers to derail their code completely and render the rest of their program undefined.

At least the numpy.empty documentation warns the user:

...it requires the user to manually set all the values in the array, and should be used with caution.
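
A minimal sketch of the cautious pattern that warning seems to ask for, assuming NumPy is installed: treat the result of `np.empty` as write-only scratch space and assign every element before anything reads it. The helper name `squares` is hypothetical and only serves the illustration.

```python
import numpy as np


def squares(n: int) -> np.ndarray:
    """Hypothetical helper: return [0, 1, 4, ...] as float64."""
    # The contents of `out` are arbitrary here; nothing reads them yet.
    out = np.empty(n, dtype=np.float64)
    # Overwrite every element in one step before the array escapes.
    out[:] = np.arange(n, dtype=np.float64) ** 2
    return out


result = squares(5)
print(result)  # every element was assigned above, so this read is fine
```

If the cost of initialisation does not matter, `np.zeros` avoids the question entirely, since the caller never gets to see unwritten memory.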

I wonder if more people can reproduce this. In Docker I could only reproduce it once. On my local machine it happens all the time:

a = np.empty(10000)
s = a.sum()
for _ in range(100):
    assert s == a.sum()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-e28546471745> in <module>
      2 s = a.sum()
      3 for _ in range(100):
----> 4     assert s == a.sum()

AssertionError: 

UB in "safe" Python. 🙈


Python is not a memory-safe language. It is quite easy to cause UB in the interpreter as I've recently found out.

eval((lambda:0).__code__.replace(co_consts=()))
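
A hedged sketch of why that one-liner is dangerous, without actually triggering it: as far as I can tell, `CodeType.replace` (available since Python 3.8) swaps the constants tuple without checking it against the bytecode, so the resulting code object can refer to constant slots that no longer exist. The snippet below only builds and inspects the object; it never executes it. The variable names are mine, and the exact contents of the original `co_consts` depend on the interpreter version.

```python
import types

# The code object behind a trivial lambda.
original = (lambda: 0).__code__

# Swap in an empty constants tuple; every other field is carried over as-is.
# Assumption: no validation against the bytecode happens at this point.
broken = original.replace(co_consts=())

print(type(broken) is types.CodeType)  # still an ordinary code object
print(original.co_consts)              # version-dependent contents
print(broken.co_consts)                # ()

# Deliberately NOT done here: eval(broken) or calling a function built from it.
```

Running `broken` is what may crash the interpreter, because constant lookups in the bytecode are not bounds-checked at execution time on the affected versions.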
