How can `numpy.empty` not result in UB, but Rust uninitialized do?

jorgecarleitao · September 2, 2021, 3:29pm

Hi,

After going through "What The Hardware Does" is not What Your Program Does: Uninitialized Memory - Rust Internals, I would conclude that both Rust and C would have an issue with reading uninitialized memory.

However, numpy (written in C), one of the most popular Python libraries for numerics, has a function numpy.empty. Its documentation example prints the uninitialized values (i.e. it reads uninitialized memory).

How come this is sound, but reading uninitialized memory in Rust is not? Is it because the code has already been compiled and thus no optimizations can be performed?
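For comparison, here is a minimal sketch of how a numpy.empty-style buffer is usually kept sound in Rust: the storage is allocated without being initialized, but it is only exposed as f64 after every slot has been written. The helper name empty_then_fill is hypothetical (it is not a NumPy or standard library API); Vec::spare_capacity_mut and MaybeUninit::write are the standard library pieces it relies on.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper (not a NumPy or std API): allocate `len` slots without
/// initializing them, write every slot, and only then expose them as `f64`.
fn empty_then_fill(len: usize, fill: impl Fn(usize) -> f64) -> Vec<f64> {
    let mut buf: Vec<f64> = Vec::with_capacity(len);

    // The unused capacity is typed as `MaybeUninit<f64>`, so it cannot be
    // read as `f64` by accident. `take(len)` guards against the allocator
    // handing back more capacity than requested.
    let spare: &mut [MaybeUninit<f64>] = buf.spare_capacity_mut();
    for (i, slot) in spare.iter_mut().enumerate().take(len) {
        slot.write(fill(i));
    }

    // SAFETY: the loop above initialized the first `len` elements, and
    // `len` does not exceed the capacity requested from `with_capacity`.
    unsafe { buf.set_len(len) };
    buf
}

fn main() {
    let v = empty_then_fill(4, |i| i as f64 * 0.5);
    println!("{:?}", v);
}
```

The difference from numpy.empty is that the uninitialized state never escapes the function: a caller can only observe values that were actually written.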

Some digging:

That calls PyArray_NewFromDescr, which calls npy_alloc_cache, which calls _npy_alloc_cache, which calls alloc. I can't find where alloc is defined to check whether it is a malloc.

alice · September 2, 2021, 3:38pm

My guess is that strictly speaking, it isn't sound.

bjorn3 · September 2, 2021, 3:53pm

I think the only reason it doesn't result in miscompilations is that numpy is dynamically linked to cpython, and as such it is not possible for the compiler to observe that uninitialized memory was used.

H2CO3 · September 2, 2021, 4:06pm

It is not.

The sad truth is that most "C programmers", or people who write C code, don't really know C. The same goes for C++. Universities are full of courses that teach simplistic, semi-false, or downright wrong gut instincts while teaching students C and/or C++. Some popular internet forums are probably even worse. People who "learned" pre-standard C before 1989 or something now write UB-ridden code in horrible style for all sorts of systems, and they pass on the "knowledge" to the younger generations.

The reading and printing of uninitialized memory in NumPy's C core is not correct. It is Undefined Behavior, but I think fixing it is a ship that has long sailed.

ZiCog · September 2, 2021, 4:31pm

That is quite often true.

To be fair to old time C programmers I think they are quite reasonable in expecting a read of an uninitialised variable to produce an undefined result. As in:

int i;
printf("%d\n", i);

Rather than expecting modern-day compilers and optimisers to completely derail their code and render the rest of their program undefined.
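For contrast, a minimal sketch of the Rust analogue of that snippet: the "declared but not yet assigned" state gets its own type, MaybeUninit, and the value can only be soundly read after it has been written. The helper name init_late is hypothetical and exists only for illustration.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: the Rust counterpart of declaring `int i;` and
/// assigning it later.
fn init_late(value: i32) -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();

    // Calling `slot.assume_init()` at this point would be undefined behavior:
    // it is the analogue of printing `i` before assigning it.

    slot.write(value);

    // SAFETY: `slot` was initialized by the `write` call just above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_late(7));
}
```

The unsafe block marks the one place where the programmer, not the compiler, vouches that the memory has been initialized.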

At least the numpy.empty documentation warns the user:

...it requires the user to manually set all the values in the array, and should be used with caution.

ritchie46 · September 2, 2021, 6:07pm

I wonder if more people can reproduce this. In Docker I could only reproduce it once. On my local machine it happens all the time:

a = np.empty(10000)
s = a.sum()
for _ in range(100):
    assert s == a.sum()

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-2-e28546471745> in <module>
      2 s = a.sum()
      3 for _ in range(100):
----> 4     assert s == a.sum()

AssertionError:

UB in "safe" Python.

jschievink · September 2, 2021, 6:18pm

Python is not a memory-safe language. It is quite easy to cause UB in the interpreter as I've recently found out.

eval((lambda:0).__code__.replace(co_consts=()))
