What can I do to avoid "thread 'main' has overflowed its stack" when working with large arrays?
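The usual fix is to put the large data on the heap rather than the stack, e.g. with `Vec` or `Box<[T]>`. A minimal sketch (the 64 MiB size is just an illustrative assumption, chosen to exceed a typical 8 MiB main-thread stack):

```rust
fn main() {
    // A large array as a local variable lives on the stack and can overflow it:
    // let a = [0u8; 64 * 1024 * 1024]; // may abort: "has overflowed its stack"

    // A Vec puts the data on the heap instead, so stack size doesn't matter:
    let v = vec![0u8; 64 * 1024 * 1024];
    assert_eq!(v.len(), 64 * 1024 * 1024);
    assert_eq!(v[0], 0);
    assert_eq!(v[v.len() - 1], 0);
}
```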

That looks to me like the default implementation on the trait, no?

For low-enough alignments (which is almost everything, unless you're forcing particularly high alignment manually), it's just a call to calloc: alloc.rs - source

(Or whatever the allocator you're using chooses to do.)
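For reference, the Rust-side entry point being discussed can be exercised directly through `std::alloc::alloc_zeroed`; with the default system allocator and an ordinary alignment, this is the path that ends up in calloc. A sketch (the size and alignment values are arbitrary):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

fn main() {
    // 1 KiB with 8-byte alignment: a "low-enough" alignment for the
    // calloc fast path described above.
    let layout = Layout::from_size_align(1024, 8).unwrap();
    unsafe {
        let p = alloc_zeroed(layout);
        assert!(!p.is_null());
        // alloc_zeroed guarantees the returned block is zeroed.
        assert!(std::slice::from_raw_parts(p, 1024).iter().all(|&b| b == 0));
        dealloc(p, layout);
    }
}
```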

In Rust, they're not allocating at all - they're using whatever you've created them from.

I tried to find the implementation of calloc in libc, found this

It seems that calloc does more than malloc: a malloc is always performed, and the memset is guarded behind "likely"/"expect" branch hints.

void *
__libc_calloc (size_t n, size_t elem_size)
{
  mstate av;
  mchunkptr oldtop;
  INTERNAL_SIZE_T sz, oldtopsize;
  void *mem;
  unsigned long clearsize;
  unsigned long nclears;
  INTERNAL_SIZE_T *d;
  ptrdiff_t bytes;

  if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
    {
       __set_errno (ENOMEM);
       return NULL;
    }

  sz = bytes;

  if (!__malloc_initialized)
    ptmalloc_init ();

  MAYBE_INIT_TCACHE ();

  if (SINGLE_THREAD_P)
    av = &main_arena;
  else
    arena_get (av, sz);

  if (av)
    {
      /* Check if we hand out the top chunk, in which case there may be no
	 need to clear. */
#if MORECORE_CLEARS
      oldtop = top (av);
      oldtopsize = chunksize (top (av));
# if MORECORE_CLEARS < 2
      /* Only newly allocated memory is guaranteed to be cleared.  */
      if (av == &main_arena &&
	  oldtopsize < mp_.sbrk_base + av->max_system_mem - (char *) oldtop)
	oldtopsize = (mp_.sbrk_base + av->max_system_mem - (char *) oldtop);
# endif
      if (av != &main_arena)
	{
	  heap_info *heap = heap_for_ptr (oldtop);
	  if (oldtopsize < (char *) heap + heap->mprotect_size - (char *) oldtop)
	    oldtopsize = (char *) heap + heap->mprotect_size - (char *) oldtop;
	}
#endif
    }
  else
    {
      /* No usable arenas.  */
      oldtop = 0;
      oldtopsize = 0;
    }
  mem = _int_malloc (av, sz);

  assert (!mem || chunk_is_mmapped (mem2chunk (mem)) ||
          av == arena_for_chunk (mem2chunk (mem)));

  if (!SINGLE_THREAD_P)
    {
      if (mem == 0 && av != NULL)
	{
	  LIBC_PROBE (memory_calloc_retry, 1, sz);
	  av = arena_get_retry (av, sz);
	  mem = _int_malloc (av, sz);
	}

      if (av != NULL)
	__libc_lock_unlock (av->mutex);
    }

  /* Allocation failed even after a retry.  */
  if (mem == 0)
    return 0;

  mchunkptr p = mem2chunk (mem);

  /* If we are using memory tagging, then we need to set the tags
     regardless of MORECORE_CLEARS, so we zero the whole block while
     doing so.  */
  if (__glibc_unlikely (mtag_enabled))
    return tag_new_zero_region (mem, memsize (p));

  INTERNAL_SIZE_T csz = chunksize (p);

  /* Two optional cases in which clearing not necessary */
  if (chunk_is_mmapped (p))
    {
      if (__builtin_expect (perturb_byte, 0))
        return memset (mem, 0, sz);

      return mem;
    }

#if MORECORE_CLEARS
  if (perturb_byte == 0 && (p == oldtop && csz > oldtopsize))
    {
      /* clear only the bytes from non-freshly-sbrked memory */
      csz = oldtopsize;
    }
#endif

  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
     contents have an odd number of INTERNAL_SIZE_T-sized words;
     minimally 3.  */
  d = (INTERNAL_SIZE_T *) mem;
  clearsize = csz - SIZE_SZ;
  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
  assert (nclears >= 3);

  if (nclears > 9)
    return memset (d, 0, clearsize);

  else
    {
      *(d + 0) = 0;
      *(d + 1) = 0;
      *(d + 2) = 0;
      if (nclears > 4)
        {
          *(d + 3) = 0;
          *(d + 4) = 0;
          if (nclears > 6)
            {
              *(d + 5) = 0;
              *(d + 6) = 0;
              if (nclears > 8)
                {
                  *(d + 7) = 0;
                  *(d + 8) = 0;
                }
            }
        }
    }

  return mem;
}
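One detail worth pulling out of the source above: before allocating anything, `__libc_calloc` checks whether `n * elem_size` overflows (via `__builtin_mul_overflow`) and fails with ENOMEM if so. The same check translated to Rust, as a sketch (`checked_total_size` is a hypothetical name):

```rust
// Mirrors the overflow check at the top of __libc_calloc: the total
// byte count must not wrap around before we allocate.
fn checked_total_size(n: usize, elem_size: usize) -> Option<usize> {
    n.checked_mul(elem_size)
}

fn main() {
    assert_eq!(checked_total_size(4, 8), Some(32));
    // Overflow is reported instead of silently wrapping
    // (glibc sets errno to ENOMEM and returns NULL here).
    assert_eq!(checked_total_size(usize::MAX, 2), None);
}
```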

I suspect it uses the heap for large data but it is possible NumPy places such things on the stack.

The "trick" is to include "stack probes". When a thread starts, the operating system reserves the requested amount of stack space and places a "guard page" at the end. When the guard page is touched, a page fault occurs and the operating system takes charge: it extends the stack, places a guard page at the new end, then returns control to the thread. In other words, the operating system grows the stack as needed, as long as the thread touches the guard page. That's the purpose of a "stack probe": when a function needs lots of stack space, it can "probe ahead" to ensure the stack has been extended. Obviously, there's a performance hit from stack probes, so some compilers / run-time libraries choose not to emit them. Apparently Rust does not.
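Whether or not the stack grows on demand, you can also just reserve a bigger stack up front for a worker thread via `std::thread::Builder::stack_size`. A sketch (the 32 MiB and 16 MiB figures are assumptions for illustration, not recommendations):

```rust
use std::thread;

fn main() {
    // Request a 32 MiB stack so a 16 MiB local array fits comfortably.
    let handle = thread::Builder::new()
        .stack_size(32 * 1024 * 1024)
        .spawn(|| {
            let big = [0u8; 16 * 1024 * 1024]; // on the stack, but it fits
            big.len()
        })
        .unwrap();
    assert_eq!(handle.join().unwrap(), 16 * 1024 * 1024);
}
```

Note this only affects threads you spawn; the main thread's stack size is fixed at program start (configurable at link time, or via ulimit on Unix).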

If NumPy does use the stack then it very likely also uses stack probes to ensure there's enough space.

Rust has them, modulo how much LLVM supports a given platform. It's an exploit mitigation though (it segfaults), not a stack extension mechanism.


On some platforms, it does appear to be possible to grow the stack at runtime -- rustc does that (ensure_sufficient_stack in rustc_data_structures::stack - Rust) to be able to process excessively-nested syntax, for example.


I think all tier 1 is covered.

The implementation includes the answer to the ultimate question of life, the universe, and everything. That's handy.