File owned by my LD_PRELOAD library gets closed somewhere

My LD_PRELOAD library opens a tracker file to which it writes file system access metadata.
The library also intercepts the libc close() call, and if the file descriptor being closed is the tracker FD, it exits with a traceback so I can find out who is explicitly closing it.
Additionally, I explicitly clear FD_CLOEXEC on FD 800, so that exec does not close it.
I noticed that some programs explicitly close certain file descriptors, so when my library opens its file I also duplicate the FD to 800 and close the original, so that all of my library's FDs are at 800 and above and not among the lower FD numbers that get cleaned up by some other applications.
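
Roughly, that part of the setup looks like this (a minimal sketch using the libc crate, not my exact code):

use std::io;

const TRACKER_FD: libc::c_int = 800;

// Move the freshly opened tracker fd up to 800 and clear FD_CLOEXEC so it
// survives execve(). Sketch only; error handling is simplified.
unsafe fn pin_tracker_fd(orig_fd: libc::c_int) -> io::Result<libc::c_int> {
    if libc::dup2(orig_fd, TRACKER_FD) < 0 {
        return Err(io::Error::last_os_error());
    }
    libc::close(orig_fd);

    let flags = libc::fcntl(TRACKER_FD, libc::F_GETFD);
    if flags < 0 || libc::fcntl(TRACKER_FD, libc::F_SETFD, flags & !libc::FD_CLOEXEC) < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(TRACKER_FD)
}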

But still, during a run of the tool on a large build, my library errors/tracebacks while trying to write to the FD it owns, and it looks like something is closing FD 800.
That means someone is closing the FD, but I can't figure out who. I presume close() is not being called, since I intercept that call and would know if it were closing 800.
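
For reference, the close() guard is roughly this (a simplified sketch, not my exact interception code):

#[no_mangle]
pub unsafe extern "C" fn close(fd: libc::c_int) -> libc::c_int {
    if fd == 800 {
        // Someone is explicitly closing the tracker fd: complain loudly and
        // abort so I get a backtrace of the caller.
        eprintln!("close() called on tracker fd 800!");
        std::process::abort();
    }
    // Forward everything else to the real libc close via dlsym(RTLD_NEXT).
    let real = libc::dlsym(libc::RTLD_NEXT, b"close\0".as_ptr() as *const libc::c_char);
    let real: unsafe extern "C" fn(libc::c_int) -> libc::c_int = std::mem::transmute(real);
    real(fd)
}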

I suspect this is happening during the fork/exec process, but I can't figure out where or who is closing it.
I have an strace of the process and I don't see any "close(800)" in it, and I am stumped as to how to debug this further.

Apart from the exec syscall (which shouldn't close it, because I clear FD_CLOEXEC) and the libc close() call that I intercept, is there any other way FD 800 could be closed?

Anyone have any ideas? How can I debug this?

You could use SystemTap probe kernel.function("close_fd") to print kernel and user backtraces, or dig even deeper on "filp_close".

Great tip on SystemTap.
Spent the last few hours learning how to use it and adding it to my debugging toolset.
Thanks.

STAP now logs all the close() operations, but I don't see any close(800), which is the specific FD number I use.

I am still finding that the FD gets closed, even though I don't see any close() syscalls.

The only other way I can imagine the FD getting closed is via the execv* APIs, and that only happens if FD_CLOEXEC is set.
And I explicitly disable that.

Is there anything else that could be closing the FD that I should be watching for?

Did you try "filp_close"? AFAIK all methods of closing end up there, but it won't have the fd number at that point. You can get the path from fullpath_struct_file(task_current(), $file), I think, but I didn't test it.

Haven't got that part working yet. I'm on RHEL 8.
I tried

probe kernel.function("filep_close")
{
printf("%s %d: %s(%s:%d)\n", execname(), pid(), ppfunc(), 
        kernel_string($filp->f_path->dentry->d_iname),
        $filp->f_path->dentry->d_inode->i_ino);  
print_usyms(ubacktrace())
}

but got the following error.

semantic error: while resolving probe point: identifier 'kernel' at /ws/sarvi-sjc/wisktrack/stap/default.stp:26:7
        source: probe kernel.function("filep_close")
                      ^

semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/4.18.0-147.3.1.el8_1.x86_64/build'

So I need to figure out how to get past that.

bash-4.4$ sudo yum install kernel-debuginfo kernel-debuginfo-common-x86_64
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Last metadata expiration check: 0:04:07 ago on Mon 21 Dec 2020 10:17:31 PM PST.
No match for argument: kernel-debuginfo
No match for argument: kernel-debuginfo-common-x86_64
Error: Unable to find a match: kernel-debuginfo kernel-debuginfo-common-x86_64
bash-4.4$

Will figure this out

BTW, does SystemTap give anything more than strace in terms of tracking close syscalls?

Debug packages live in separate repositories, but yum debuginfo-install kernel should enable those.

probe syscall.close should be the same as what strace reports, but one advantage there is that SystemTap doesn't attach with ptrace -- so it doesn't conflict with other tools that do (strace, gdb, ...).

Need a bit more debugging advice.
I now intercept vfork() as well, since, as you said, pthread_atfork doesn't help with vfork, so I had to intercept vfork directly.

On doing this, my test ran into a segfault/core dump.

But the core dump doesn't seem useful, and I am stumped as to how to debug further, since it isn't offering anything.

(gdb) core /nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/host-Linux/core.make-4.2.1-p8.19444
[New LWP 19444]
Core was generated by `/auto/binos-tools/bin/make-4.2.1-p8 CBS2_MAKE=1 -C /nobackup/sarvi/xewisktest/v'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f3dbf730740 in ?? ()
#2  0x00007ffef7f260d0 in ?? ()
#3  0x00007f3dbf730070 in ?? ()
#4  0x00007f3dbf72f030 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb)

I do have an strace of the crashing process; it looks as below:

write(800, "Jm91trOCf6q9607pXleN3E WISKENV  {\"USER\":\"root\",\"HOME\":\"/root\",\"PATH\":\"/usr/share/Modules/bin:/users/sarvi/.cargo/bin:/auto/binos-tools/bin:/router/bin:/usr/cisco/bin:/usr/atria/bin:/usr/bin:/usr/local/bin:/usr/local/etc:/bin:/usr/X11R6/bin:/usr/sbin:/sbin:/usr/bin:/auto/nova-env/Lab/Labpatch/bin\",\"TERM\":\"xterm-256color\"}\n", 323) = 323
vfork()                                 = 19445
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++

And I do have strace and SystemTap enabled, tracking vfork.
strace shows vfork as the last call invoked before the crash.

make-4.2.1-p8(19444) vfork ()
 0x7f3dbef4e78c : __vfork+0xc/0x40 [/usr/lib64/libc-2.28.so]
 0x7f3dbf72e664 : _ZN9wisktrack8my_vfork17h92abfda70529569cE+0x24/0x40 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf6fc519 : _ZN9wisktrack5vfork28_$u7b$$u7b$closure$u7d$$u7d$17h02899b1e78ae2f1eE+0x9/0x20 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf730168 : _ZN3std9panicking3try7do_call17h521a236190e958c8E+0x28/0x50 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf730fcd : __rust_try+0x1d/0x50 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf72ede1 : _ZN3std9panicking3try17h07ae10fa909cf1d8E+0x31/0xb0 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf709801 : _ZN3std5panic12catch_unwind17hc74a89db7b572747E+0x11/0x20 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x7f3dbf72e6ac : vfork+0x2c/0x60 [/ws/sarvi-sjc/wisktrack/target/debug/libwisktrack.so]
 0x4109e6 : 0x4109e6

I added debug prints before and after invoking the original vfork from my intercept code, and I see this:

vfork(35006)
vfork() -> 0
vfork() -> 35007

which suggests the intercepted vfork was called and returned fine.

I started rewriting my libwisktrack.so in Rust primarily to avoid such segfaults/crashes, but have still hit a good share of them.

Any idea how you would go about debugging this further?

What does your interception code do? vfork is pretty special as everything, even the stack, remains shared with the parent. You're not supposed to return from the caller in the child, as that may break the stack frame for the parent, but I suspect that is what's happening if you have a wrapper that returns.

If you only need to instrument something before the vfork, you might be able to arrange this with a tail-call to the real vfork, so you're not adding stack frames. Alternatively, you can approximate vfork with fork instead, but some programs do explicitly use the effects of shared memory, as rr discovered.
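
For example, the fork-based approximation could be as simple as an exported override like this (untested sketch; it deliberately gives up vfork's shared-memory and stopped-parent semantics):

// Approximate vfork() with fork(): the child gets its own stack, so
// returning through this wrapper is safe in both parent and child.
#[no_mangle]
pub unsafe extern "C" fn vfork() -> libc::pid_t {
    libc::fork()
}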

Spot on.
Very interesting.
I didn't have much of a choice in debugging, so I started stripping out and eliminating all the other libc intercepts. vfork was the last man standing and was causing the segfault. :slight_smile:

/*  pid_t vfork(void); */
hook! {
    unsafe fn vfork() -> libc::pid_t => my_vfork {
        // setdebugmode!("vfork");
        // wisktrack_initialize();
        // tracker::reportfork("vfork");
        event!(Level::INFO, "vfork()");
        real!(vfork)()
    }
}

Where hook! is a macro defined in my fork of the redhook library.

I am not sure I understand this.
What, in your opinion, is the above code doing wrong?
Is there an assumption in the caller of vfork that there is nothing between it and the original vfork in the return path?
Does this mean vfork cannot be intercepted?

I read this " You should use vfork() when your child process simply modifies the process state and then calls one of the exec() functions. Because of the shared address space, you must avoid doing anything in the child that impacts the parent when it resumes execution. For example, if your exec() call fails, you must call _exit(), and not exit(), because calling exit() would close standard I/O stream buffers for the parent as well as the child."

which explains the stdout error I am seeing.

make-4.2.1-p8: Entering directory '/nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/host-Linux'
cbs: ERROR: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/VIEW_ROOT/cisco.comp/cbs/scripts/cbs", line 1320, in <module>
    main()
  File "/nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/VIEW_ROOT/cisco.comp/cbs/scripts/cbs", line 1123, in main
    cbs_execute_subcommand( CBS.OPTIONS.subcommand, targets )
  File "/nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/VIEW_ROOT/cisco.comp/cbs/scripts/cbs", line 604, in cbs_execute_subcommand
    execute_cbs_command( subcommand, targets )
  File "/nobackup/sarvi/xewisktest/vob/cisco.comp/BUILD_TREE/VIEW_ROOT/cisco.comp/cbs/scripts/cbs", line 526, in execute_cbs_command
    ( command )( contexts )
  File "/nobackup/sarvi/xewisktest/vob/ss.comp1/cbs/scripts/cbslib/cbs_cbs2.py", line 39, in cbs_viewtree
    CBS.IO.out( CBS.OPTIONS.view_root )
  File "/nobackup/sarvi/xewisktest/vob/ss.comp1/cbs/scripts/cbslib/lib_cbs.py", line 1255, in out
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe

I came across this one. Not sure how to use it, though, or how it works.
It specifically calls out vfork.

There will be a call chain like user_fn -> your_vfork -> real_vfork. Returning from a function usually requires popping the return address from the stack, although that pop doesn't immediately change the memory itself.

So the child returns to user_fn and calls execve. That call will push new stuff onto the stack, clobbering the old return address (which it doesn't think is needed any more). Then the parent gets to run, but when it tries to return from your_vfork it gets clobbered data and ends up so lost, the debugger doesn't even know where you are.

I haven't verified exactly, but I'm assuming that the real implementation of vfork must deal with this at a low level, fixing its stack frame from registers or perhaps using a jump to return instead.

I think you probably can't do it in pure Rust.

In musl's x86_64/vfork.s, they do pop %rdx to save the return address in the register, then push %rdx to put it back after the syscall. In the child that's a no-op, but in the parent that will restore the stack after whatever the child did.

Whereas their arch-generic vfork.c notes "vfork syscall cannot be made from C code", and uses a plain fork/clone instead.


The glibc version has comments to explain:

Should this be a bug/feature request against Rust? Should I open it?
This obviously is something that needs to be solved from a Rust perspective.

For now I am not intercepting vfork. I don't need it for my functionality; I needed it to initialize some parts of my LD_PRELOAD library that involved opening files and writing to them, which had issues when done from within the library constructor.

That is not at all obvious to me. This is a very niche thing you want to do, and I'm not even sure what a native Rust solution would look like. I think I'd just use inline asm! once that feature is stable.

It might be good enough for you to intercept posix_spawn and posix_spawnp instead, which probably account for most vfork calls.
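
A hook for that could look roughly like this (untested sketch that goes through dlsym(RTLD_NEXT, ...) directly; adapt it to your redhook-style macros):

use libc::{c_char, c_int, pid_t, posix_spawn_file_actions_t, posix_spawnattr_t};

type PosixSpawnFn = unsafe extern "C" fn(
    *mut pid_t, *const c_char,
    *const posix_spawn_file_actions_t, *const posix_spawnattr_t,
    *const *mut c_char, *const *mut c_char) -> c_int;

#[no_mangle]
pub unsafe extern "C" fn posix_spawn(
    pid: *mut pid_t, path: *const c_char,
    file_actions: *const posix_spawn_file_actions_t,
    attrp: *const posix_spawnattr_t,
    argv: *const *mut c_char, envp: *const *mut c_char) -> c_int
{
    // Record the spawn (or fix up envp to carry LD_PRELOAD) here, then
    // forward to the real libc implementation. posix_spawnp would be
    // wrapped the same way.
    let real = libc::dlsym(libc::RTLD_NEXT, b"posix_spawn\0".as_ptr() as *const c_char);
    let real: PosixSpawnFn = std::mem::transmute(real);
    real(pid, path, file_actions, attrp, argv, envp)
}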

Makes sense.

I will open a ticket against rust-lang for this issue, for tracking purposes.
And like you said, if asm is the way to go, I suspect it will be closed.

I already intercept the spawn* calls, since I need to propagate my LD_PRELOAD library by setting the LD_PRELOAD env variable before they run. So I think we are covered.
Thanks
Sarvi

Need a bit more debugging advice.

I am not intercepting fork/vfork anymore.
LD_PRELOAD=libwisktrack.so intercepts only the openat/execv* and wait* APIs for now.
The program I am testing with is a Python script.
The strace shows this program making a few clone() and wait*() system calls.
One of them fails as follows because the cloned process segfaults.

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f1a6d6b3350) = 33577
........
write(1, "Analyzing obj-x86_64_crb-ngwc\n", 30) = 30
select(4, [3], [], [], NULL)            = 1 (in [3])
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=33577, si_uid=19375, si_status=SIGSEGV, si_utime=0, si_stime=26} ---
read(3, "", 6)                          = 0
close(3)                                = 0
close(6)                                = 0
wait4(33577, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], 0, NULL) = 33577

The cloned subprocess crashes; its strace as it segfaults is as follows:

read(6, "#-------------------------------------------------------------------\n# Makefile for IOSD on x86_64 (little-endian) ngwc Linux platforms\n#\n# Based on obj-armv8el_crb-ngwc/makefile\n#\n# December 2016, Ismail Badawi\n#\n# Copyright (c) 2016-2020 by Cisco Systems, Inc.\n# All rights reserved.\n#------------------------------------------------------------------\n\n.DELETE_ON_ERROR:\n\nLICENSE_PLATFORM ?= nyquist\n\nifeq ($(origin SYSROOT),undefined)\nexport SYSROOT :=       $(shell ../scripts/sysroot)\nendif\n\nCLANG_CRB := 1\n\nTARGET ?= x86_64_cge7\n\nPlx ?= -ngwc\n\n# Independent Build Contexts(IBC) Notes:\n# Each Platform, may have some assumptions about the contents of the \"current\n# object directories\". These assumptions are about some pre-existing contents\n# inside the object directory where the main platform build is being invoked\n# from.\n# In IBC, after ORCHESTRATION is done, we are invoking the build from some\n# auto-generated directories under SYSROOT, with their name in the form of\n# $(PFX_GEN)$(Tx).\n# Therefore, in order t"..., 4096) = 3348
pipe([7, 8])                            = 0
vfork()                                 = 33579
close(8)                                = 0
read(7, "/nobackup/sarvi/xewisktest/vob/ios/sys\n", 200) = 39
read(7, "", 161)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=33579, si_uid=19375, si_status=0, si_utime=5, si_stime=1} ---
rt_sigreturn({mask=[]})                 = 0
close(7)                                = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++

The vforked process (33579) seems to have completed and exited successfully, but the parent process above segfaults as shown.
Since vfork is involved, I suspect the vfork/exec operation messed things up.

But there is nothing between the vfork interception and the execve that runs the program, as the strace below for the vforked process shows. And it ends fine as well.

dup2(8, 1)                              = 1
close(8)                                = 0
getcwd("/nobackup/sarvi/xewisktest/vob/ios/sys/obj-x86_64_crb-ngwc", 512) = 59
execve("../scripts/sysroot", ["../scripts/sysroot"], ["LD_PRELOAD=/ws/sarvi-sjc/wisktrack/${LIB}/libwisktrack.so", "RUST_BACKTRACE=1", "PATH=/usr/share/Modules/bin:/users/sarvi/.cargo/bin:/auto/binos-tools/bin:/router/bin:/usr/cisco/bin:/usr/atria/bin:/usr/bin:/usr/local/bin:/usr/local/etc:/bin:/usr/X11R6/bin:/usr/sbin:/sbin:/usr/bin:/auto/nova-env/Lab/Labpatch/bin", "WISK_TRACE=/nobackup/sarvi/xewisktest/wisktrace.log", "SHLVL=1", "TERM=xterm-256color", "WISK_PUUID=5zmEwpNzycymdlkeTSEdMF", "OLDPWD=/nobackup/sarvi/xewisktest", "WISK_CONFIG=", "BINOS_ROOT=/nobackup/sarvi/xewisktest/binos", "HOME=/users/sarvi", "PWD=/nobackup/sarvi/xewisktest/vob/ios/sys", "USER=sarvi", "_=/nobackup/sarvi/xewisktest/vob/ios/sys/../../cisco.comp/cbs/scripts/iosmake", "WISK_WSROOT=/nobackup/sarvi/xewisktest", "WISK_TRACK="]) = 0
.......
rt_sigaction(SIGCHLD, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGIO, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGSYS, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
write(1, "/nobackup/sarvi/xewisktest/vob/ios/sys\n", 39) = 39
exit_group(0)                           = ?
+++ exited with 0 +++

Any advice on how I can debug the segfault of the parent process? STAP is not helping here, or rather I am not sure what I should be looking at in STAP.

Did it actually save a core dump? On some systems you can check coredumpctl for saved cores, but you may need to change ulimit -c first.

It does save a core file.
But the core file's backtrace doesn't seem useful.

bash-4.4$ /router/bin/gdb-8.1.3 
GNU gdb (GDB) 8.1.3
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) core vob/ios/sys/obj-x86_64_crb-ngwc/core.iosmake.66447
[New LWP 66447]
Core was generated by `/router/bin/python /nobackup/sarvi/xewisktest/vob/ios/sys/../../cisco.comp/cbs/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc3aff86cd0 in ?? ()
(gdb) bt
#0  0x00007fc3aff86cd0 in ?? ()
#1  0x0000000000000000 in ?? ()

Is there more I could be doing to debug with the core file?

I was able to reproduce a similar crash and narrow down which code made the difference between segfaulting and not.

I do a bunch of initialization in Rust code that gets called from the library constructor of libwisktrack.so.
This includes opening a tracing file and a tracker file and writing data to them, as well as reading a config file.
All of which works fine.
It seems to call mmap/mprotect and brk to set up the data segment during the library constructor phase.

fstat(3, {st_dev=makedev(0, 64), st_ino=188883803, st_mode=S_IFREG|0755, st_nlink=2, st_uid=19375, st_gid=25, st_blksize=32768, st_blocks=82672, st_size=42153864, st_atime=1609266198 /* 2020-12-29T10:23:18.259702000-0800 */, st_atime_nsec=259702000, st_mtime=1609266191 /* 2020-12-29T10:23:11.747711000-0800 */, st_mtime_nsec=747711000, st_ctime=1609266192 /* 2020-12-29T10:23:12.115697000-0800 */, st_ctime_nsec=115697000}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9ee49bf000
mmap(NULL, 8451856, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9ee3f89000
mprotect(0x7f9ee4551000, 2093056, PROT_NONE) = 0
mmap(0x7f9ee4750000, 299008, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5c7000) = 0x7f9ee4750000
close(3)                                = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
..........................
getrandom(NULL, 0, GRND_NONBLOCK)       = 0
getrandom("\xe7\x87\x0f\xd0\x07\x7b\x17\xd7\xf0\x16\xbb\x17\xd5\x38\xd4\x39\xd3\x92\x92\x28\x18\x62\xf0\xb0\x78\xcc\x02\x83\x2c\xc7\x32\x2c", 32, 0) = 32
brk(NULL)                               = 0x1b3d000
brk(0x1b5e000)                          = 0x1b5e000
getpid()                                = 28945
getpid()                                = 28945
write(2, "8AEjTregSeJaBuhOSncma9", 22)  = 22
write(2, "-", 1)                        = 1
write(2, "28945", 5)                    = 5
write(2, ": Constructor: ", 15)         = 15
write(2, "28945", 5)                    = 5
write(2, "\n\n", 2)                     = 2

The difference that causes the crash is an extra call to lazy_static::initialize(&APP64BITONLY_PATTERNS),
which ends up regex-compiling a large piece of data from the config that has already been read.
It does some string template "render"ing and RegexSet compiling with RegexSet::new(&p),
which seems to result in a ton of brk system calls, which I suspect are resizing the program's data segment.
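
For reference, the static in question is roughly shaped like this (illustrative sketch; the real pattern list comes from the rendered config, and the names differ):

use lazy_static::lazy_static;
use regex::RegexSet;

// Placeholder for the template-rendered pattern strings read from my config.
fn patterns_from_config() -> Vec<String> {
    vec![r"^NOMATCH/.*$".to_string()]
}

lazy_static! {
    static ref APP64BITONLY_PATTERNS: RegexSet = {
        let p: Vec<String> = patterns_from_config();
        RegexSet::new(&p).expect("bad pattern")
    };
}

// The extra call made from the library constructor that triggers the crash:
// lazy_static::initialize(&APP64BITONLY_PATTERNS);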

write(801, "O8YImgWZKrtUiywY0ExktE: APP64BITONLY_PATTERNS Reading....\n", 58) = 58
write(801, "O8YImgWZKrtUiywY0ExktE: p: [\"^/nobackup/sarvi/xewisktest/binos/linkfarm/x86_64_cge7/sdk/sysroots/x86_64\\\\-xesdk\\\\-linux/usr/bin/x86_64\\\\-cisco\\\\-linux/(x86_64\\\\-cisco\\\\-linux\\\\-addr2line|x86_64\\\\-cisco\\\\-linux\\\\-objdump|x86_64\\\\-cisco\\\\-linux\\\\-ld|x86_64\\\\-cisco\\\\-linux\\\\-readelf|x86_64\\\\-cisco\\\\-linux\\\\-ld\\\\.bfd|x86_64\\\\-cisco\\\\-linux\\\\-gcov|x86_64\\\\-cisco\\\\-linux\\\\-size|x86_64\\\\-cisco\\\\-linux\\\\-ar|x86_64\\\\-cisco\\\\-linux\\\\-gcc\\\\-nm|x86_64\\\\-cisco\\\\-linux\\\\-gcc|x86_64\\\\-cisco\\\\-linux\\\\-as|x86_64\\\\-cisco\\\\-linux\\\\-gcc\\\\-ranlib|x86_64\\\\-cisco\\\\-linux\\\\-merge\\\\-gcda|x86_64\\\\-cisco\\\\-linux\\\\-strings|x86_64\\\\-cisco\\\\-linux\\\\-objcopy|x86_64\\\\-cisco\\\\-linux\\\\-c\\\\+\\\\+filt|x86_64\\\\-cisco\\\\-linux\\\\-nm|x86_64\\\\-cisco\\\\-linux\\\\-gprof|x86_64\\\\-cisco\\\\-linux\\\\-cpp|x86_64\\\\-cisco\\\\-linux\\\\-elfedit|x86_64\\\\-cisco\\\\-linux\\\\-g\\\\+\\\\+|x86_64\\\\-cisco\\\\-linux\\\\-strip|x86_64\\\\-cisco\\\\-linux\\\\-gcov\\\\-dump|x86_64\\\\-cisco\\\\-linux\\\\-ranlib|x86_64\\\\-cisco\\\\-linux\\\\-gcc\\\\-ar)\", \"^/nobackup/sarvi/xewisktest/binos/linkfarm/x86_64_cge7/sysr"..., 12929) = 12929
brk(0x117d000)                          = 0x117d000
brk(0x1179000)                          = 0x1179000
brk(0x119b000)                          = 0x119b000
brk(0x119a000)                          = 0x119a000
brk(0x11bb000)                          = 0x11bb000
brk(0x11dc000)                          = 0x11dc000
brk(0x11fd000)                          = 0x11fd000
......
brk(0x146d000)                          = 0x146d000
brk(0x1451000)                          = 0x1451000
mmap(NULL, 434176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6000d25000
mmap(NULL, 434176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6000cbb000
brk(0x144f000)                          = 0x144f000
brk(0x1448000)                          = 0x1448000
write(801, "O8YImgWZKrtUiywY0ExktE: APP64BITONLY_PATTERNS Reading....Done\n", 62) = 62

The constructor completes fine.

write(2, ": Constructor Done: ", 20)    = 20
write(2, "31780", 5)                    = 5
write(2, ", ", 2)                       = 2
write(2, "[\"/nobackup/sarvi/xewisktest/vob/ss.comp1/onep/tools/cthrift/cthrift-0.3.14/bin/./cthrift.exe\",\"--prefix\",\"onep\",\"-o\",\"presentation/InBoundToNeIdl-gen\",\"include/./InBoundToNeIdl.h\"]", 181) = 181
write(2, "\n\n", 2)   

But the segfault/crash happens in the main program, way further down, possibly towards the end of the program

write(4, "zero__,\n    \t},\n};\n#define cthrift_recv_struct_UtdFileRepuRetroAlertMsgIDL_zero__  cthrift_recv_struct_list_zero__\nstatic struct cthrift_struct_info__ cthrift_recv_struct_UtdFileRepuRetroAlertMsgIDL_info__[1] = {{\n    \t\"struct_UtdFileRepuRetroAlertMsgIDL\",\n    \tCTHRIFT_FLAGS_ISSET__,\n    \toffsetof(struct UtdFileRepuRetroAlertMsgIDL, isset__),\n    \tsizeof(((struct UtdFileRepuRetroAlertMsgIDL *)0)->isset__),\n    \tsizeof(struct UtdFileRepuRetroAlertMsgIDL),\n    \t5,\n    \t5,\n    \tcthrift_recv_struct_UtdFileRepuRetroAlertMsgIDL_fields__,\n  }};\nconst char * onep_get_name_cthrift_recv_struct_UtdFileRepuRetroAlertMsgIDL_info__ (void) {\n  \treturn (cthrift_recv_struct_UtdFileRepuRetroAlertMsgIDL_info__[0].cc_name_);\n}\nstatic struct cthrift_field_info__ cthrift_recv_struct_UtdFileAnalysisUploadAlertMsgIDL_fields__[] = {\n  \t{ /* timestamp */\n    \t\"timestamp\",\n    \tCTHRIFT_FLAGS_READ__|CTHRIFT_FLAGS_ISSET__,\n    \t1,\n    \tCTHRIFT_TYPE_I64__,\n    \t0,\n    \toffsetof(struct UtdFileAnalysisUploadAlertMsgIDL, timestamp),\n    \t0,\n"..., 4096) = 4096
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x4} ---
+++ killed by SIGSEGV (core dumped) +++

I then reduced the regex compilation to a tiny operation:

write(801, "FYIgeA920l8KgSbcWotJmA: p: [\"^NOMATCH/.*$\"]\n", 44) = 44

And I don't see the long list of brk syscalls growing the data segment, and there is no crash.

So, bottom line, it looks like somehow doing large mallocs or compute operations in the library constructor causes this crash in the main program.

Question: Are there any limitations on how much memory can be malloc'd or how much data can be defined/used from within the library constructor function? Is there a way to do this better?

This is one of the reasons I tried moving some of this code into a Rust Once() block and calling it from the syscall intercepts during main program execution, outside the library constructor. But that ran into its own set of problems: there was no consistent and reliable place to call the one-time initialization code, and Once had problems with vfork.
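
For completeness, that deferred-initialization attempt looked roughly like this (sketch; heavy_init() stands in for the config/regex work):

use std::sync::Once;

static INIT: Once = Once::new();

// Placeholder for reading the config and building the RegexSet.
fn heavy_init() {}

// Called at the top of every intercepted libc function instead of doing the
// heavy allocation work in the library constructor.
fn ensure_initialized() {
    INIT.call_once(heavy_init);
}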