ACCESS_VIOLATION after returning from a Rust fn that made an FFI call to a DLL

Hi all

I have a Delphi 2.0 program that uses an external DLL. This DLL is declared as follows in the Delphi code:

Function SynthesizeForms
  (lemma : PChar; withApp : integer; codeType : integer;
   var outBuf : array of SynthFormSet; bufLength : integer) : integer;
   stdcall; external 'fmsynth.dll';

and it is called like this:

var
  formsBuffer : array[0..299] of SynthFormSet;
  synteesitudvorme, i, j, x_tyybinumber, x_variandinumber : integer;
  lemma : array[0..29] of char;
...
  synteesitudvorme := SynthesizeForms (lemma, 0, j, formsBuffer, 300);

I've been struggling to define the correct Rust declaration for this external function and eventually came up with this (where dynlib is the libloading crate):

type SynthFn<'lib> = dynlib::Symbol<'lib, unsafe extern "stdcall" fn(dt::PChar, dt::Integer, dt::Integer, &mut [SynthFormSet], dt::Integer) -> dt::Integer>;

and it seems to work: at least the DLL function gets called, fills the array with (more or less) expected values and returns a count, as expected.

BUT

After I return from the Rust fn that calls this DLL function, I invariably get this:

error: process didn't exit successfully: `<exe-name censored>` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

In WinDbg I can see some drop_in_place calls before the exception, so I presume this is the function's exit path dropping values from the stack, but other than that I have no clue what's going on :frowning:

I seem to have exhausted whatever knowledge of x86 architecture/WinDbg I might have had, so in total desperation I humbly summon the community.

Maybe there's something I've missed in the Symbol definition? Or is it due to the (rather) long array of structs being defined on the stack? (That doesn't seem to prevent the Delphi code from working, though.)

Or might I benefit from looking through the IR (if I can get it somehow)?

Anyway, any suggestion is more than welcome!

&mut [SynthFormSet]

vs

array of SynthFormSet

According to the Stack Overflow question "delphi - How to translate "array of" param to C++?", Delphi passes two parameters for an open array, while in Rust you should pass only one.

But I used Delphi a long time ago, so I may be wrong.

Amazingly, Rust also passes two (pointer and length) if it's declared like that. However, even if I pass only a pointer followed by the next argument (which is essentially the length), it also works (the external function returns normally), but that does not help with the ACCESS_VIOLATION.

I've also changed the array to be allocated on the heap, as in

    let mut buffer: Box<[SynthFormSet; BUF_SIZE as usize]> = Box::new([Default::default(); BUF_SIZE as usize]);

which did not change a bit :frowning:
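On the "two parameters" point: a Rust slice reference is a fat pointer (data pointer plus element count), whereas a raw element pointer is a single word. The minimal sketch below checks that and shows the usual FFI-friendly alternative of splitting the slice yourself with as_mut_ptr() and len(). Item is a hypothetical stand-in for SynthFormSet; keep in mind that &mut [T] is not an FFI-safe type, so Rust makes no promise about how it is handed to foreign code.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

// Hypothetical stand-in for SynthFormSet; only its size matters here.
#[derive(Clone, Copy, Default)]
#[repr(C)]
struct Item {
    value: i32,
}

fn main() {
    // A slice reference is a "fat" pointer: data pointer + element count.
    assert_eq!(size_of::<&mut [Item]>(), 2 * size_of::<usize>());
    // A raw element pointer is a single machine word.
    assert_eq!(size_of::<*mut Item>(), size_of::<usize>());

    let mut buffer = [Item::default(); 300];

    // What a C-style `T *buf, int len` parameter pair expects:
    // split the slice into its two halves explicitly.
    let ptr: *mut Item = buffer.as_mut_ptr();
    let len = i32::try_from(buffer.len()).expect("length fits in i32");
    println!("ptr = {:p}, len = {}", ptr, len);
}
```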

I know nothing about Delphi, but after searching a bit, this looks like a weird interaction between Rust slices and Delphi dynamic arrays: Delphi dynamic arrays seem to be some kind of Box<[T]> that, if I skimmed correctly, is cloned each time it is used (I suspect this happens when it is passed as an argument), which would allow the function to deallocate the input array afterwards.

This should usually crash right away, but if the Delphi DLL and Rust use the same allocator, the code may only crash later, when Rust deallocates the Box<[...]> or Vec<_> (a double free).

To confirm this, can you try using the following buffer?

let mut buffer = Box::leak(Box::new(
    [SynthFormSet::default(); BUF_SIZE as usize]
));

This changed exactly nothing.

Is there a way to debug this so that I can see which value is being dropped when the exception occurs?

And by the way, I don't think it's cloned inside the DLL function, if that's what you mean: why would I be getting correct values at exactly the same address where my initial buffer was created?

Fair enough, I didn't know how the function operated, although the outBuf name was a fair hint :wink:

You can do something like this:

#[repr(transparent)]
struct VerboseDrop<T: ?Sized> (pub T);

impl<T: ?Sized> Drop for VerboseDrop<T> { fn drop (&mut self) {
    eprintln!(
        "Dropping the following bytes at {address:p}: {bytes:#x?}",
        address = &*self,
        bytes = unsafe { ::core::slice::from_raw_parts(
            &*self as *const Self as *const u8,
            ::core::mem::size_of_val(&*self),
        )},
    );
}}

Then wrap your variables with VerboseDrop (thanks to #[repr(transparent)], you should be able to use VerboseDrop directly even across FFI).

For instance,

let mut buffer = Box::new(VerboseDrop(
    [SynthFormSet::default(); BUF_SIZE as usize]
));

And I suggest you do the same with the PChar input too.

(Rust is full of black magic, it seems)

OK, now that I've done it:

    let buffer_storage = Box::new(VerboseDrop(
        [SynthFormSet::default(); (BUF_SIZE-1) as usize]
    ));
    // Note: SynthFormSet is Copy, so this copies the array out of the Box
    let mut buffer = (*buffer_storage).0;

This is what I get at the end:

Dropping the following bytes at 0xc37ed0: [
    0x0,
    0x0,
    0x0,
    0x0,
...
    0x0,
]
error: process didn't exit successfully: `<exe-name censored>` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

I've cut out approximately 64500 lines; it's a little bit too verbose.

But the gist seems clear: the exception occurs right after the Box is dropped. (I also added an eprintln to confirm that buffer_storage indeed has this address:
eprintln!("Address of buffer is: {:p}", &*buffer_storage);
and it matches the one printed by VerboseDrop's drop.)

However, I don't understand why it's all zeroes. Maybe it's because all the members were dropped first, and since they aren't wrapped in VerboseDrop I can't see those drops in the logs?

PS: I've slightly modified the output of VerboseDrop and now it looks like this:

Address of buffer is: 0x9f8448
...
Dropping 64285 bytes at 0x9f8448
error: process didn't exit successfully: `<>` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
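One possible explanation for the all-zero dump, assuming it is buffer (not buffer_storage) that gets handed to the DLL: SynthFormSet must be Copy for the [x; N] repeat expression to compile, so let mut buffer = (*buffer_storage).0; copies the whole array out of the Box onto the stack. The DLL would then fill the copy, while VerboseDrop later prints the untouched boxed original. A minimal sketch with hypothetical stand-ins (Form, Wrapper, and fill in place of the DLL):

```rust
// Hypothetical stand-in for SynthFormSet; like the real one it is Copy,
// which the `[x; N]` repeat expression requires.
#[derive(Clone, Copy, Default)]
struct Form {
    stem_length: i32,
}

// Hypothetical stand-in for VerboseDrop (Drop impl omitted for brevity).
struct Wrapper<T>(T);

// Stand-in for the DLL: writes into whatever buffer it is given.
fn fill(buf: &mut [Form]) {
    for (i, f) in buf.iter_mut().enumerate() {
        f.stem_length = i as i32 + 1;
    }
}

fn main() {
    let mut storage = Box::new(Wrapper([Form::default(); 4]));

    // Same shape as `let mut buffer = (*buffer_storage).0;`:
    // this COPIES the array out of the Box onto the stack.
    let mut copy = (*storage).0;
    fill(&mut copy);
    assert_eq!(copy[0].stem_length, 1); // the copy was filled...
    assert_eq!(storage.0[0].stem_length, 0); // ...the boxed original was not

    // Borrowing the field instead makes the writes land inside the Box.
    fill(&mut storage.0);
    assert_eq!(storage.0[0].stem_length, 1);
}
```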

By the way, if we redefine drop like this, does that mean the actual drop no longer happens?

VerboseDrop works because it wraps the value it holds "transparently": #[repr(transparent)] guarantees that it has exactly the same size and layout as the wrapped value, so it can be passed in its place over FFI. It also implements the Drop trait, which gives it a hook that runs before the value is deallocated. The actual drop still happens: after your drop() returns, the compiler still drops each field recursively, for the fields that need it (usize doesn't, for example, but Vec does). So a Drop impl adds to the pre-existing drop code rather than replacing it.
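A quick way to convince yourself that a custom Drop adds to the normal drop rather than replacing it (the names LOG, record, Inner and Wrapper are made up for illustration): the wrapper's drop() runs first, then the wrapped value is still dropped by the compiler-generated drop glue.

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which drop code runs.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn record(event: &'static str) {
    LOG.with(|log| log.borrow_mut().push(event));
}

// Stands in for a field that needs dropping (like a Vec).
struct Inner;

impl Drop for Inner {
    fn drop(&mut self) {
        record("inner dropped");
    }
}

// Same shape as VerboseDrop: a transparent wrapper with its own Drop impl.
#[repr(transparent)]
struct Wrapper<T>(T);

impl<T> Drop for Wrapper<T> {
    fn drop(&mut self) {
        record("wrapper drop() ran");
        // No manual cleanup of self.0 here: the compiler-generated drop glue
        // drops it automatically after this method returns.
    }
}

fn main() {
    drop(Wrapper(Inner));
    LOG.with(|log| {
        assert_eq!(*log.borrow(), ["wrapper drop() ran", "inner dropped"]);
    });
}
```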


I've moved all my variables (including buffer_storage) inside the unsafe {} block used to call the DLL, and added an eprintln debug statement after that block, right before the return expression (Ok(())).

Now I can see that the value is dropped fine and the statement is printed, but the exception is still raised from within the function's return sequence:

Dropping 64285 bytes at 0x906e80
We've left the unsafe block
error: process didn't exit successfully: `<>` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

In the meantime I've implemented a minimal reproducible example in C++, and it doesn't have any problems... Which is cool, of course, but also a little sad, as I'd really like to use this task to teach myself some Rust rather than scratch that C++ itch all over again :frowning:

This is basically the whole solution (don't @ me, my C++ knowledge is very, very outdated):

// synth-forms.cpp : This file contains the 'main' function. Program execution begins and ends there.
//

#include "pch.h"
#include <iostream>
#include <Windows.h>
#include <string>

#pragma pack(1)
struct SynthForm {
	char form[30];
	int	stem_length;
};

#pragma pack(1)
struct SynthFormSet {
	int declination_type;
	char part_of_speech[3];
	int number_of_options;
	int parallel_forms;
	char form_code[30];
	SynthForm forms[5];
};

typedef int(__stdcall *MYPROC)(const char *, int, int, SynthFormSet[], int);

void synthesize(const char *word) {
	SynthFormSet *buffer = new SynthFormSet[300];


	HINSTANCE dll_instance = ::LoadLibrary(TEXT("fmsynth.dll"));

	if (dll_instance)
	{
		MYPROC func = (MYPROC) ::GetProcAddress(dll_instance, "SynthesizeForms");


		if (func)
		{
			char str[300];
			std::string forms;
				
			int count = func(word, 0, 0, buffer, 300);
			std::cout << "Options found: " << count << "\n";

			for (int i = 0; i < count; i++) {

				forms.clear();
				SynthFormSet formset = buffer[i];

				sprintf_s(str, "%s, %d, %d, %d, %s, ", formset.part_of_speech, formset.declination_type, formset.number_of_options,
					formset.parallel_forms, formset.form_code);

				forms.append(str);

				str[0] = 0;

				for (int j = 0; j < formset.parallel_forms; j++) {
					if (formset.forms[j].stem_length > 0) {

						/* I know, I know !! it's shitty as... ! BUT string manipulation makes me do it :( */
						if (j > 0) {
							sprintf_s(str, " ~ %s (%d)", formset.forms[j].form, formset.forms[j].stem_length);
						}
						else {
							sprintf_s(str, "%s (%d)", formset.forms[j].form, formset.forms[j].stem_length);
						}
						forms.append(str);
					}
				}


				std::cout << forms << '\n';
			}
		}

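		//::FreeLibrary(dll_instance);
	}

	// Allocated with new[], so it must be released with delete[]
	// (outside the if, so it is also freed when the DLL fails to load).
	delete[] buffer;
}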

int main()
{
	synthesize("iga");
	synthesize("hea");
	std::cout << "Hello World!\n";
}

The only difference I can think of is that the C++ version doesn't do any encoding/decoding (yet), so it can only process native (CP-1257) strings; otherwise it does the same thing I want to do in Rust.

I just thought: maybe I could wrap the DLL call in a C++ library and link it statically into Rust? Are there any tutorials for this particular case? Also, would it make any difference? :thinking:
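For comparison with the C++ structs, here is a sketch of how they could be mirrored in Rust with #[repr(C, packed)], assuming Delphi integer is a 32-bit signed int, char is a single byte (AnsiChar in Delphi 2.0), and the Delphi records are packed like the C++ #pragma pack(1) versions. Under those assumptions SynthFormSet is 4 + 3 + 4 + 4 + 30 + 5 × 34 = 215 bytes, and 299 × 215 = 64,285, which matches the "Dropping 64285 bytes" reported for the (BUF_SIZE-1)-element buffer (assuming BUF_SIZE is 300, as in the C++ and Delphi calls).

```rust
use std::mem::size_of;

// Mirrors the C++ `#pragma pack(1)` structs. Assumptions: Delphi `integer`
// is i32, Delphi `char` is one byte, and the Delphi records are packed too.
#[repr(C, packed)]
#[derive(Clone, Copy, Default)]
pub struct SynthForm {
    pub form: [u8; 30],
    pub stem_length: i32,
}

#[repr(C, packed)]
#[derive(Clone, Copy, Default)]
pub struct SynthFormSet {
    pub declination_type: i32,
    pub part_of_speech: [u8; 3],
    pub number_of_options: i32,
    pub parallel_forms: i32,
    pub form_code: [u8; 30],
    pub forms: [SynthForm; 5],
}

fn main() {
    // 30 + 4 bytes, no padding.
    assert_eq!(size_of::<SynthForm>(), 34);
    // 4 + 3 + 4 + 4 + 30 + 5 * 34 bytes, no padding.
    assert_eq!(size_of::<SynthFormSet>(), 215);
    // Byte count of a 299-element buffer, as printed by VerboseDrop.
    assert_eq!(299 * size_of::<SynthFormSet>(), 64_285);

    // Fields of a packed struct must be copied out, never borrowed in place.
    let set = SynthFormSet::default();
    let n = set.parallel_forms;
    println!("parallel_forms = {}", n);
}
```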

Can you try my Rust translation of your C++ code?

// word: &CStr
SYNTHESIZE_FORMS(
    word.as_ptr(),
    0,
    0,
    buffer.as_mut_ptr(),
    buffer.len().try_into().expect("Overflow"), // 300
)

Amazing!! This works exactly like my C++ code! It's unbelievable that someone could just... do it so quickly and have it running on the first try!!

Man, you're my super-hero!

Now I've got to figure out the difference... it doesn't look much like my Rust version, to be honest.


I've managed to make minimal changes to my code, and it seems the crux was indeed the signature and the call (who would have thought, ha-ha):

Now it's

type SynthFn<'lib> = dynlib::Symbol<'lib, unsafe extern "stdcall" fn(dt::PChar, dt::Integer, dt::Integer, *mut SynthFormSet, dt::Integer) -> dt::Integer>;
...
    let (buffer, count) = synthesize_encoded(lemma.as_ptr());

...
fn synthesize_encoded(word: *const u8) -> ([SynthFormSet; BUF_SIZE], usize) {
    let mut buffer = [SynthFormSet::default(); BUF_SIZE];

    let count = usize::try_from(unsafe {
        SYNTHESIZE_FN(
            word,
            0,
            0,
            buffer.as_mut_ptr(),
            buffer.len().try_into().expect("Overflow"),
        )
    }).expect("Overflow");
    println!("Options found: {}", count);
    (buffer, count)
}

This is a bit disappointing for me personally, as I was pretty damn confident that the stack was as identical as it could possibly be to what Delphi has when the DLL symbol is called... :frowning_face:
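A plausible (unconfirmed) reason the stack only looked identical: with stdcall the callee pops its own arguments, so if the caller pushes a two-word slice where the DLL expects a single pointer, the stack pointer is off by one word after the call, which would fit a crash that only shows up when the Rust function returns. One way to keep that from recurring is to hide the raw signature behind a safe wrapper that does the pointer/length split itself. The sketch below assumes that design; RawSynth, synthesize and mock_synth are hypothetical names, the struct is trimmed to one field, and the mock stands in for fmsynth.dll so the example runs anywhere. extern "system" means stdcall on 32-bit Windows and the C ABI elsewhere; with libloading you should be able to pass *symbol, since a Symbol derefs to the function pointer.

```rust
use std::convert::{TryFrom, TryInto};
use std::ffi::{c_char, CStr};

// Hypothetical, trimmed stand-in for the real packed SynthFormSet record.
#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct SynthFormSet {
    pub declination_type: i32,
}

// Raw signature mirroring the C++ typedef: ONE element pointer plus an
// explicit length. "system" is stdcall on 32-bit Windows, C elsewhere.
pub type RawSynth =
    unsafe extern "system" fn(*const c_char, i32, i32, *mut SynthFormSet, i32) -> i32;

// Hypothetical safe wrapper: callers hand over a slice and the split into
// pointer + length happens here, so a fat &mut [T] never crosses the FFI boundary.
pub fn synthesize(f: RawSynth, word: &CStr, buf: &mut [SynthFormSet]) -> usize {
    let len: i32 = buf.len().try_into().expect("buffer too large for i32");
    // SAFETY: assumes `f` writes at most `len` elements and returns how many it wrote.
    let count = unsafe { f(word.as_ptr(), 0, 0, buf.as_mut_ptr(), len) };
    // Clamp, so a misbehaving callee can't make callers index past the buffer.
    usize::try_from(count).expect("negative count").min(buf.len())
}

// Mock standing in for the DLL export so the sketch runs without fmsynth.dll.
unsafe extern "system" fn mock_synth(
    _word: *const c_char,
    _with_app: i32,
    _code_type: i32,
    out: *mut SynthFormSet,
    len: i32,
) -> i32 {
    let n = len.min(2); // pretend two forms were found
    for i in 0..n as usize {
        unsafe { (*out.add(i)).declination_type = i as i32 + 1 };
    }
    n
}

fn main() {
    let mut buffer = vec![SynthFormSet::default(); 300];
    let word = CStr::from_bytes_with_nul(b"iga\0").expect("valid C string");
    let count = synthesize(mock_synth, word, &mut buffer);
    println!("Options found: {}", count);
}
```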

And also, there's so much Rust yet to be learned... and it's really fascinating that the community has members like @Yandros! I hope you find the job you're looking for, you definitely deserve it!!


That's the beauty of Rust once one masters it (which @yandros obviously has and I have not, yet). With Rust, the 80% of C/C++ programming errors that arise from null-pointer dereferences, use-after-free, thread concurrency issues, etc. simply can't get past the compiler in safe code. Programs usually work as soon as rustc accepts the source code. If any application logic errors remain, they can typically be located with debug print statements rather than by running an object-code debugger.


This is a great statement. The only time I find myself needing a debugger is when I'm doing FFI, like C#/Rust FFI, where I couldn't figure out why writing to an array in Rust didn't change it in C#.
