Trying to compare codes between c++ and rust


#1

Hi, I am a new visitor and a newbie, although newbies in Rust are often considered pros in other languages such as C / C++.
I am new to those too; I am a total beginner.
My experience is a little JavaScript, C#, ASP.NET, SQL, etc.
Those languages are far easier to pick up than Rust. Rust tries to simplify the management of resources and memory,
but I noticed that along the way it introduces some complications on the syntax side…
I would be happy to know whether there is a chance of a kindergarten-level, step-by-step introduction to Rust, as opposed to the available tutorials, which are not so welcoming to total, say, inexperienced programmers. That is only natural, as the language’s goal is to move users of other low-level languages such as C and C++ to a newer and better development environment.

My question is: should I stick to the easy-to-learn Java and C#, or is there a blog or tutorial for total beginners that does not assume you have mastered another systems programming language before trying Rust?
The code below is easy to medium level C++, and it was easy enough for me to get the result I wanted with C++,
although I had to ask around for a way to clean up all the allocated memory after using it.
The code is quite straightforward, but I couldn’t get even close to producing a similar Rust version.
Any suggestions on how to code it in Rust?

void GetPacksChar(int size, PackChar** DpArrPnt )
{
    int count = 0;
    int TmpStrSize = 10;
    *DpArrPnt = (PackChar*)CoTaskMemAlloc( size * sizeof( PackChar ));
    PackChar* CurPackPnt = *DpArrPnt;
    char dummyStringDataObject[]= "abcdefgHij";
    for ( int i = 0; i < size; i++,CurPackPnt++ )
    {
        
        dummyStringDataObject[TmpStrSize-1] = '0' + i % (126 - '0');    
        CurPackPnt->IntVal=i;
        CurPackPnt->buffer = strdup(dummyStringDataObject);// (char*)malloc(sizeof(dummyStringDataObject));
        //strcpy(CurPackPnt->buffer, dummyStringDataObject);
        //CurPackPnt->buffer = dummyStringDataObject;
    }

}

The struct is simple: a char* (string) field and an integer field.

typedef struct _dataPack{
unsigned int IVal;
char* Sval;
}dataPack;

thanks a lot .


#2

Hello! It’s always exciting to see a new person interested in Rust.

I’m having a hard time figuring out how that code example you gave would fit in a bigger program. The purpose of that for loop is particularly confusing! I don’t think I would recommend this kind of problem for a beginner to solve if they are just trying to learn the language.

However, I think I understand what it is supposed to do. I don’t think it would help you if I just rewrote it in Rust for you, so I will walk you through it, instead!

First of all, we have to create a new struct. The struct you gave doesn’t match the struct that the code appears to be expecting, so we will design a struct that will work for this code, and then try to translate the code into something that will work in Rust.

We should probably name it PackChar just to stay consistent with the C++ version. That kind of struct name is perfectly allowed in Rust (where each word is capitalized, we call this “camel case”), so we will stick with it. Now, for the fields.

Rust does not have an actual unsigned int type, but it does have something very close! Since unsigned int is an unsigned, 32-bit integer, at least on all modern machines, u32 is the equivalent in Rust. The struct that is actually supposed to work with the GetPacksChar function probably takes an int instead of an unsigned int, because that is what i is defined as when we get to this line:

CurPackPnt->IntVal=i;

However, since we never use negative numbers here, unsigned works just fine.

While Rust technically does have a char* type (to support interaction with C code), we will use something a little better for our Rust code, because it handles all the nasty pointer-stuff for us: String. This is similar to std::string in C++. The way we use it will be quite a bit different than the way we would use char*, but it will be so much better.

So our struct definition should look like this:

struct PackChar {
    int_val: u32,
    buffer: String,
}

I used the field names from the function code because they looked cleaner. Notice that I changed IntVal to int_val. This form is the preferred style in Rust; it is called “snake case”.

The next thing we need to do is come up with a new signature for our GetPacksChar function, because Rust won’t accept void GetPacksChar(int size, PackChar** DpArrPnt). So let’s take a second and think about what this function is supposed to do.

It takes an argument, int size, that appears to be the number of PackChar to create, since we use it later to allocate in CoTaskMemAlloc(size * sizeof( PackChar ));, and then as our stopping point for our for loop. That will work for us. However, since Rust doesn’t have an int type, we have to pick an alternative again. Let’s just go with u32 to be consistent.

The PackChar** DpArrPnt argument is very interesting. It looks like we’re taking a pointer to a pointer, probably because the caller wants our function to update that pointer so it can use it after our function returns. This pointer is probably meant to point to several PackChar allocated in a row; this is reinforced by us allocating size * sizeof( PackChar ) and assigning the result to this out-pointer. Calling this function probably looks something like this:

PackChar* MyPacksChar;
GetPacksChar(10, &MyPacksChar);

However, in Rust, taking a pointer to return a value in is an anti-pattern. This means it’s the opposite of what we actually should be doing, because the language gives us better tools to solve this problem. Additionally, we really do not want to deal with raw pointers like this, because it gives us a lot of chances to screw up.

Instead of taking an out-pointer like this and returning void, we will instead return an instance of a type that handles a “row of the same type” situation like this for us: Vec. The C++ equivalent would be std::vector.

So our final function signature should look something like this:

fn get_packs_char(size: u32) -> Vec<PackChar>

Notice that I changed the function name to snake case. This is the preferred style in Rust here, too.

Now, we look at the guts of the function. That first line looks to be pointless—count is never used—so let’s just ignore it.

TmpStrSize and its interaction with dummyStringDataObject is a little scary. If someone accidentally made TmpStrSize equal to 11, then this line:

dummyStringDataObject[TmpStrSize-1] = '0' + i % (126 - '0');

would write past the bounds of dummyStringDataObject and change something totally unrelated! This is very, very dangerous! In fact, the Heartbleed bug was caused by a closely related mistake: OpenSSL trusted a length supplied by the other side of the connection and copied that many bytes out of a buffer without checking it against the real size. An attacker could make the library read out whatever was in memory after the buffer, just by giving a number much larger than the data actually sent!

However, because Rust is super awesome, we don’t have to do bad stuff like this just to make our programs work. In fact, we won’t be doing anything like this at all. Instead, we’ll use the format!() macro, which works a lot like C’s sprintf() function, to create the String right when we need it, which will also eliminate our strdup() call.
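To make the comparison with sprintf() a bit more concrete, here is a minimal sketch of format!() on its own. The helper name make_label is made up for this illustration and is not part of the code in this thread.

```rust
// Hypothetical helper, for illustration only: joins a prefix and one character.
fn make_label(prefix: &str, last: char) -> String {
    // format!() allocates and returns a brand-new String.
    // There is no destination buffer whose size we could get wrong.
    format!("{}{}", prefix, last)
}

fn main() {
    let label = make_label("abcdefgHi", '7');
    println!("{} ({} bytes)", label, label.len());
}
```

Unlike sprintf(), the result owns its own memory, so no separate strdup() step is needed before storing it in a struct.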

So, without further ado, let’s look at our updated function and then go down line-by-line:

fn get_packs_char(size: u32) -> Vec<PackChar> {
    use std::char;

    let mut out_vec = Vec::new();

    for i in 0 .. size {
        let int_0 = '0' as u32;
        let last_char_val = int_0 + i % (126 - int_0);
        let last_char = char::from_u32(last_char_val).unwrap();
        let buffer = format!("abcdefgHi{}", last_char);

        let pack_char = PackChar {
            int_val: i,
            buffer: buffer,
        };

        out_vec.push(pack_char);
    }

    out_vec
}

That’s pretty concise, if I do say so myself! It’s a few lines longer than the original version but I wanted to make it pretty easy to read. Let’s go down line-by-line, shall we?

We can skip the function signature since we already talked about it earlier.

    use std::char;

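We need this import to get the char::from_u32() function. (On recent Rust versions, from_u32 is also an associated function of the primitive char type, so the code compiles without the import as well.) Imports are usually done at the top of the file, but putting it here is fine for our purposes.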

    let mut out_vec = Vec::new();

This line is pretty simple. It just says, “Create a mutable variable called out_vec, and initialize it to a new (empty) Vec.” We don’t pre-allocate like we did in the C++ version, because Rust won’t let us access uninitialized memory in safe code. (Vec does have the with_capacity() function which does pre-allocate, but it still won’t let you access that memory so it doesn’t mean much to us. It’s just there for optimization, mostly.) The Vec will adjust its allocation as we add elements to it, all behind the scenes so we don’t have to deal with it.
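If you are curious about with_capacity(), here is a small sketch of the difference between length and capacity. The function name squares_prealloc is invented for this example; the point is only that reserved memory is not readable until something has been pushed into it.

```rust
// Hypothetical helper: builds the squares of 0..n in a pre-allocated Vec.
fn squares_prealloc(n: usize) -> Vec<usize> {
    // Room for at least n elements is reserved, but the Vec is still empty.
    let mut v: Vec<usize> = Vec::with_capacity(n);
    assert_eq!(v.len(), 0);
    assert!(v.capacity() >= n);

    for i in 0..n {
        // Each push initializes one slot; no reallocation is needed here.
        v.push(i * i);
    }
    v
}

fn main() {
    println!("{:?}", squares_prealloc(4));
}
```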

    for i in 0 .. size {

This is Rust’s take on a for loop. Since Rust is based heavily around iterators, the for loop takes them directly: 0 .. size creates an iterator that will yield integers in-order from 0 up to, but not including size. This is roughly equivalent to for(int i = 0; i < size; i++), but it’s much more powerful because it lets you loop through collections with the same construct.
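As a quick sketch of "the same construct for ranges and collections", the two helpers below (names made up for this example) use for once over a range and once over a slice:

```rust
// Hypothetical helper: sums 0, 1, ..., size - 1 using a range.
fn sum_range(size: u32) -> u32 {
    let mut total = 0;
    for i in 0..size {
        total += i;
    }
    total
}

// Hypothetical helper: sums the elements of a slice with the same loop syntax.
fn sum_slice(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += *v; // v is a reference to each element
    }
    total
}

fn main() {
    println!("{}", sum_range(4));
    println!("{}", sum_slice(&[10, 20, 30]));
}
```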

        let int_0 = '0' as u32;
        let last_char_val = int_0 + i % (126 - int_0);

This is the same arithmetic we see in the first line of the for loop in the C++ version. I just translated to Rust and cleaned it up a bit. Since char does not directly support arithmetic, we have to convert '0' to a u32 first, and I figured it would be cleaner if we used an immutable variable instead of doing the conversion twice.

        let last_char = char::from_u32(last_char_val).unwrap();

This call converts last_char_val, which is a u32, to a char. However, not every u32 can be safely converted to a char (because of Unicode screwiness), so this function does a checked conversion, returning Option<char>, which in Rust means a value that may or may not be present (kind of like pointers in C++ that can be NULL but much safer to work with and not necessarily involving pointers). The .unwrap() call just says, “Convert this Option<char> to a char. If it’s not available, then just quit; I don’t want to deal with that case.”

Since our arithmetic guarantees that last_char_val will be in a valid range for converting to char, this error should never happen. Rust just forces us to acknowledge the possibility. This might seem annoying in this case, but it’s very beneficial overall, because programmers are generally horrible at remembering to handle edge cases.
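If you would rather not use .unwrap(), a match handles both cases explicitly. This is only a sketch: the function name describe is hypothetical, and it calls char::from_u32 as an associated function, which assumes a reasonably recent Rust version.

```rust
// Hypothetical helper: reports whether a number is a valid Unicode scalar value.
fn describe(code: u32) -> String {
    match char::from_u32(code) {
        // The conversion succeeded and we get the char out of the Option.
        Some(c) => format!("U+{:04X} is {:?}", code, c),
        // Some numbers (for example surrogates) are not valid chars.
        None => format!("U+{:04X} is not a valid char", code),
    }
}

fn main() {
    println!("{}", describe(0x41));
    println!("{}", describe(0xD800));
}
```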

        let buffer = format!("abcdefgHi{}", last_char);

This line combines the first line of the for loop in the C++ version with the strdup() call. The format!() is a macro invocation that basically says, “Take this string, "abcdefgHi{}" and replace the {} in it with the value of last_char. Then, give me a new String as a result.” This is much safer than manually replacing the last character in the string and then copying it. Since we never deal with indices, we can’t accidentally go out of bounds.

        let pack_char = PackChar {
            int_val: i,
            buffer: buffer,
        };

        out_vec.push(pack_char);

Since Vec won’t allow us to access uninitialized memory like we could with PackChar** in C++, we have to create the struct on the stack first before adding it to the vector. out_vec.push() simply adds pack_char as the last element of the vector. Again, this is much safer than in C++ because we’re not using indices directly and potentially accessing uninitialized memory.

    out_vec

This is an implicit return. Since almost everything is an expression in Rust, the final expression of a function body, written without a trailing semicolon, becomes the function’s return value. In Java or C++ we would have to write something explicit like

    return out_vec;

though you can do this in Rust if you want to return from an earlier point in the function.
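Here is a small sketch that puts both forms side by side. The function last_char_value is invented for this example and simply reuses the arithmetic from the loop; the early return for 0 is there only to show the syntax.

```rust
// Hypothetical helper: computes the code of the last character for index i.
fn last_char_value(i: u32) -> u32 {
    let int_0 = '0' as u32; // 48

    if i == 0 {
        // Explicit return: leaves the function early.
        return int_0;
    }

    // Final expression without a semicolon: this is the return value.
    int_0 + i % (126 - int_0)
}

fn main() {
    println!("{}", last_char_value(0));
    println!("{}", last_char_value(5));
}
```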

And that’s it! The best part is, you don’t have to do any of the cleanup yourself! The returned Vec<PackChar> will deallocate when the caller stops using it, and it will also make sure every String inside every PackChar is deallocated as well. You don’t have to worry about any of that. Rust takes care of it all for you.
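If you want to see that cleanup happen, one way is to count destructor calls. The sketch below is not part of the function above: the Tracked type, the DROPPED counter and make_and_drop are all made up so that the automatic cleanup becomes observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, incremented each time a Tracked value is destroyed.
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for PackChar.
struct Tracked {
    buffer: String,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        // Runs automatically; the String field is freed right after this.
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

// Builds n values, lets the Vec go out of scope, and reports how many were dropped.
fn make_and_drop(n: usize) -> usize {
    let before = DROPPED.load(Ordering::SeqCst);
    {
        let v: Vec<Tracked> = (0..n)
            .map(|i| Tracked { buffer: format!("item{}", i) })
            .collect();
        assert!(v.iter().all(|t| t.buffer.starts_with("item")));
    } // v ends here: every element and every String inside is released.
    DROPPED.load(Ordering::SeqCst) - before
}

fn main() {
    println!("dropped {} values", make_and_drop(3));
}
```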

As for tutorials, have you read The Rust Programming Language book yet? It’s the official guide to Rust, maintained by the Rust team and community. From what I can tell, it doesn’t assume too much about the reader. It especially shouldn’t assume any systems programming experience, because a lot of our community has come from higher-level languages like Ruby.

If you do find yourself struggling with a particular section, please let someone know! We need to make sure that our official literature on the language is easy enough to comprehend for most people. You can open an issue on the Rust GitHub if you spot something that you think someone should take a look at.

It sounds like English isn’t your first language. That’s okay! Rust has a thriving multilingual community. There might be a translation in your native language, if you think that would be easier to understand. If not, perhaps you can help start on one? Sometimes the best way that you can help yourself learn something is to try explaining it to others.

If you have IRC, I do recommend getting on #rust and #rust-beginners on the Moznet IRC. Those are the best places to get immediate responses to your questions, though they’re not always active. (Just please don’t try pasting code snippets in IRC; it’s way too much for the chat.) There might be a Rust channel for speakers of your language as well, or you can ask if you can start one!

Note that to join #rust you will need to register a nickname; we’ve had some problems with spammers recently and have had to restrict the channels to registered users only. This may have been lifted since I last got on IRC, but if you find yourself unable to join, this is probably the reason why.

If nothing else, you can always come back here! That’s what this forum is for.

Was this helpful at all, or was it just far too much to read?


#3

Hi there, I’m in the same situation as you. I’ve only had one programming course while I’ve been at college; it was an introductory Java course. Until about two months ago, I’d only ever made small Python and C programs. Most were under 100 lines, and I only made them because I was bored at work.

Learning Rust has been a challenge for me, mainly because I lack any sort of foundation in software development or computer science. That lack of a solid foundation is partly what attracted me to Rust, though. The main program that I’m responsible for at work, a client / server binary that lets me run commands or upload / download files, is written in C. The program works, but I was running into various problems while building it, like use-after-free bugs, that slowed development and really made me doubt my ability to create useful programs. When I read that Rust protected against these sorts of problems, I figured I’d give it a shot.

I’m glad I started learning it. After using it for a while I feel like I’m not really fighting the compiler anymore, so I can spend more time writing my program instead of trying to fix things like ownership issues. I also end up spending a lot of time on Wikipedia when I don’t understand a programming concept (OOP, methods, etc.), or referencing the official Rust book when I want a more in-depth explanation of something Rust-specific. The IRC channel, rust-beginners on irc.mozilla.org, is also super helpful. Every time I’ve gone to the channel, there have been very knowledgeable people in chat who are willing to help.


#4

Wow @DroidLogician, good answer.


#5

This info could be distilled and added into this collection:

https://alvalea.gitbooks.io/rust-for-cpp/

I’m going to ping the author but it seems it’s been a while since the last update.


#6

Hi everyone,

Well, in that book I gathered some basic examples of Rust features and general-purpose software patterns that apply to most programming languages. However, I see this example as too specific to be added to that collection.

Cheers


#7

@DroidLogician wow, thanks a lot for such a wonderful, broad and complete answer… it just shows how welcoming the Rust community is.
It was very kind of you. Some of the tutorials really do seem to require a higher level of programming experience, but as I mentioned, it’s a new language, and I have the opportunity to help newcomers at my level of experience with a tutorial of my own. That’s a good idea; I’ll just have to get a little further in before I try something like that, and I will be happy to in the future.

For now I am going to build a new project with this function and test it.
I will create a Cargo build (I just learned about Cargo.toml config files) and learn how to create a library (a .dll). When I first started to use Rust, I just used plain rustc compilation with no projects, and then interop with Rust from C#. So far I have successfully done some tests with simpler code that returns simpler data types, as opposed to the array of structs (Vec). I just hope it doesn’t come with a performance loss versus a plain pointer to an array, or at least not a great one.
I really thank you. People should be aware that an answer like yours is very heartwarming; it has an effect. Cheers, mate.

P.S. I now see that the link you posted is where I learned how to use Cargo.


#8

I have been referred to this thread from IRC by @rbanay, who wanted to make that function ultimately interoperate with C#. I’m no expert in C# (and actually I haven’t tried FFI with C# at all), but given that we already have (supposedly) working C code, I think we should make our Rust function as close to the original C function as possible. Let’s see how to do that.


For completeness, let’s start with a simple function, like squaring an int-typed input. It will look like this:

extern crate libc;
use libc::c_int;

#[no_mangle]
pub extern "C" fn square(val: c_int) -> c_int { val * val }

What does it mean? Besides the otherwise normal Rust syntax, the following has been added:

  • You need to use exact C types (c_int, c_long etc.) if possible. You don’t know the size of (say) int in C/C++ (well, you may know that it’s 32 bits long on ILP32 and LP64 platforms and so on, but Rust is all about explicitness…). You may sidestep this issue by using explicitly sized types (i32/int32_t etc.), but I will stick with the types familiar from the C side for now.
  • extern "C" tells the Rust compiler that this function may be called from C/C++ code which doesn’t know about Rust at all, so the function should follow the C calling convention. You may be fine without it (as Rust and C speak roughly the same language) but then strange errors will ensue later.
  • #[no_mangle] is similar, but more related to the way that the function is found and resolved from C/C++ code.
  • You may not have to make an exported function public at the moment, but this ensures that it won’t be optimized away later. It also keeps the Rust compiler from getting confused and spewing false warnings.
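As a rough sketch of the points above, the following compiles as an ordinary program and calls the exported function from Rust itself. It assumes std::os::raw::c_int from the standard library in place of the libc crate, so that the snippet has no dependencies, and it leaves #[no_mangle] as a comment because a plain executable does not need the unmangled symbol. Neither choice comes from the post itself.

```rust
// Assumption: the standard-library alias for C's int, used instead of libc::c_int.
use std::os::raw::c_int;

// In the real library this function would also carry #[no_mangle].
pub extern "C" fn square(val: c_int) -> c_int {
    val * val
}

fn main() {
    // From Rust's side, an extern "C" function is called like any other.
    println!("square(3) = {}", square(3));

    // It can also be stored as a C-ABI function pointer,
    // which is roughly how a C caller sees it.
    let f: extern "C" fn(c_int) -> c_int = square;
    println!("via pointer: {}", f(1024));
}
```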

Compiling this is a bit of a headache (ugh), because i) we need an external dependency for FFI (the libc crate) and ii) we have to build a dynamic library as opposed to a normal Rust library (which is static). Fortunately Cargo will do the hard work for us. From now on let’s say that our source code is located at foo.rs. Put Cargo.toml into the same directory with the following contents:

[package]
name = "foo"
version = "0.0.0"

[dependencies]
libc = "0.2"

[lib]
path = "foo.rs"
crate-type = ["dylib"]

Now running cargo build will give you target/debug/libfoo.so, which is the dynamic library you want. (The file name depends on the platform; on Windows it is foo.dll.) Note that this is not optimized; to get an optimized library you run cargo build --release, and the resulting library will be placed in target/release/libfoo.so instead.

Python corner

If you know Python as well, Python has an excellent built-in library called ctypes which can be used to easily verify if your dylib is working or not:

>>> import ctypes
>>> square = ctypes.CDLL('target/debug/libfoo.so').square
>>> square(3)
9
>>> square(1024)
1048576

Simple, isn’t it?


Now on to building a data structure. You cannot really use Rust’s convenient types like String or Vec<T> here: your result type has to be visible to the caller as is, and you cannot easily control how the caller sees Rust types. Therefore we will again adjust to the caller.

As a next step, we will build a function which returns an allocated array of c_ints containing 0^2, 1^2, 2^2, ..., (n-1)^2 given the argument n. Here we go:

#[no_mangle]
pub extern "C" fn squares(size: c_int) -> *mut c_int {
    let v: Vec<_> = (0..size).map(|i| i * i).collect();
    Box::into_raw(v.into_boxed_slice()) as *mut _
}

Here the first line is a normal, idiomatic Rust way to build a vector… wait, did I say that you cannot use Vec<T> above? That is half the truth: you cannot return it, but you can use it internally to make the final allocated array. The logic goes like this:

  1. The original Vec<T> is made in the ordinary Rust way.
  2. The vector is converted to Box<[T]> via into_boxed_slice. This is like Vec<T> but not growable, and is thus composed of a pointer to the [T] (“unsized” slice) and its length.
  3. Box<[T]> is then consumed into the raw *mut [T], which has no ownership. At this stage you can forget this value and create a leak; this is safe in Rust, although not very desirable. (That’s also why we don’t need unsafe in this function!)
  4. While *mut [T] is almost what we want, C does not have a concept of combined pointer-length data. We already know the length, so we discard it by explicitly casting to *mut T.
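The four steps can be spelled out with every intermediate type written down. This is only a sketch: leak_squares and reclaim_squares are names invented here, and std::os::raw::c_int is assumed in place of libc::c_int to keep the snippet free of dependencies.

```rust
use std::os::raw::c_int; // assumption: std alias instead of libc::c_int

// Hypothetical helper showing the four steps one at a time.
fn leak_squares(size: c_int) -> *mut c_int {
    let v: Vec<c_int> = (0..size).map(|i| i * i).collect(); // step 1
    let boxed: Box<[c_int]> = v.into_boxed_slice(); // step 2
    let fat: *mut [c_int] = Box::into_raw(boxed); // step 3: pointer + length
    fat as *mut c_int // step 4: the length is discarded
}

/// Safety: `ptr` and `size` must come from a single `leak_squares(size)` call
/// with a non-negative size, and must be reclaimed only once.
unsafe fn reclaim_squares(ptr: *mut c_int, size: c_int) -> Vec<c_int> {
    unsafe { Vec::from_raw_parts(ptr, size as usize, size as usize) }
}

fn main() {
    let size = 5;
    let ptr = leak_squares(size);

    // Read through the raw pointer, much like a C caller would.
    let third = unsafe { *ptr.add(2) };
    println!("third element: {}", third);

    // Take ownership back so the memory is freed when v is dropped.
    let v = unsafe { reclaim_squares(ptr, size) };
    println!("{:?}", v);
}
```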

Note that Rust uses the system memory allocator for dylibs, so allocating and deallocating the memory is safe as long as you don’t mix different allocators. (Mixing means using any malloc and free equivalents that are not a matching pair: C++’s operator new and HeapFree, for example, don’t pair.)

Now that we have an allocating function, we had better have another function that cleans up the mess we’ve created:

#[no_mangle]
pub extern "C" fn free_squares(size: c_int, out: *mut c_int) {
    let v: Vec<_> = unsafe { Vec::from_raw_parts(out, size as usize, size as usize) };
    drop(v);
}

Here we recreate the vector from the original size and the returned array out. Vec actually has a method for this (from_raw_parts), but we need to be careful about the third parameter, the capacity. This is the size of the memory that the allocator really allocated for us (as opposed to the size we requested). Depending on the allocator, a wrong capacity may cause trouble. But above we used Box<[T]> as an intermediate type, which makes the length and the capacity equal (“shrink to fit”), so we know the exact capacity to give.

Having regained ownership of the vector, we simply drop it to deallocate. let _: Vec<T> = ...; would also work, but I prefer to be explicit.
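The claim that going through Box<[T]> makes length and capacity equal can be checked with a small sketch. The function boxed_round_trip is made up for this purpose, and it relies on how the standard library currently converts a boxed slice back into a Vec.

```rust
// Hypothetical check: returns (capacity before, length after, capacity after).
fn boxed_round_trip() -> (usize, usize, usize) {
    // Deliberately reserve much more room than we use.
    let mut v: Vec<i32> = Vec::with_capacity(16);
    v.extend_from_slice(&[1, 2, 3]);
    let cap_before = v.capacity();

    // into_boxed_slice drops the excess capacity ("shrink to fit").
    let boxed: Box<[i32]> = v.into_boxed_slice();

    // Converting back gives a Vec whose capacity matches its length.
    let v2: Vec<i32> = boxed.into_vec();
    (cap_before, v2.len(), v2.capacity())
}

fn main() {
    let (before, len, after) = boxed_round_trip();
    println!("capacity before: {}, len: {}, capacity after: {}", before, len, after);
}
```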

Python corner

This time we need to give an explicit return type (restype) to the bound functions, but otherwise it is straightforward.

>>> import ctypes
>>> dll = ctypes.CDLL('target/debug/libfoo.so')
>>> squares      = dll.squares;      squares.restype = ctypes.POINTER(ctypes.c_int)
>>> free_squares = dll.free_squares; free_squares.restype = None
>>> v = squares(10)
>>> v[0:10] # Python doesn't have the length, so we should be explicit about this
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> free_squares(10, v)
>>> # now it is bad to use `v`

Making an arbitrarily complex data structure is similar, just with more work. The following is the adjusted version of the original Rusty function by @DroidLogician:

use libc::c_char;

#[repr(C)]
pub struct PackChar {
    pub int_val: c_int,
    pub buffer: *mut c_char, // changed
    pub buffer_size: c_int, // added
}

#[no_mangle]
pub extern "C" fn get_packs_char(size: c_int) -> *mut PackChar {
    use std::char;

    let mut out_vec = Vec::new();

    for i in 0..size {
        let int_0 = '0' as u32;
        let last_char_val = int_0 + i as u32 % (126 - int_0);
        let last_char = char::from_u32(last_char_val).unwrap();

        let buffer = format!("abcdefgHi{}", last_char);
        let buffer_size = buffer.len() as c_int;
        let buffer = Box::into_raw(buffer.into_bytes().into_boxed_slice()) as *mut _; // added

        let pack_char = PackChar {
            int_val: i,
            buffer: buffer,
            buffer_size: buffer_size,
        };

        out_vec.push(pack_char);
    }

    Box::into_raw(out_vec.into_boxed_slice()) as *mut _ // changed
}

Not changed that much, right? The code remains almost the same, with some modifications:

  • This time we also need to make the PackChar structure safe to pass to C/C++ code. This is the job of the #[repr(C)] attribute. Again, without it, you may be fine but you may get strange errors later.
  • The String is decomposed into buffer and buffer_size. Unlike in C, Rust does not terminate the string with a zero byte so we should be explicit about the length. In this code you may suppose that the length is always 10 (and have assert_eq!(buffer_size, 10) somewhere), but I believe your original code is not what you really wanted anyway.
  • String is really a Vec<u8> in disguise with a UTF-8 constraint. Therefore we can treat it like an ordinary Vec<T> in terms of allocation (via into_bytes).
  • We need the byte length as opposed to the character length (whatever that means), and String::len returns exactly that, so we are safe.
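To illustrate the last two points, here is a short sketch comparing the byte length with the character count and then taking the bytes out of a String. The helper name byte_and_char_len is invented for this example.

```rust
// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Plain ASCII: both numbers agree.
    println!("{:?}", byte_and_char_len("abcdefgHi0"));

    // U+00E9 takes two bytes in UTF-8, so the numbers differ.
    println!("{:?}", byte_and_char_len("h\u{e9}llo"));

    // into_bytes hands over the underlying Vec<u8> without copying.
    let bytes: Vec<u8> = String::from("abc").into_bytes();
    println!("{:?}", bytes);
}
```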

We also need to free the returned array:

#[no_mangle]
pub extern "C" fn free_packs_char(size: c_int, out: *mut PackChar) {
    let v: Vec<PackChar> = unsafe { Vec::from_raw_parts(out, size as usize, size as usize) };
    for pack_char in v.into_iter() {
        let buffer = pack_char.buffer;
        let buffer_size = pack_char.buffer_size as usize;
        let buffer = unsafe { Vec::from_raw_parts(buffer, buffer_size, buffer_size) };
        drop(buffer);
    }
}

This time, we recreate the vector and then iterate over each element to free the string inside. v.into_iter() will consume v during the iteration, so we don’t have to drop it explicitly (actually, we can’t).

Dropping the string is nothing new (as we have used Box<[u8]> as an intermediate type again), but note that we don’t need to convert it back to a String. It is possible to do so, but it would just verify that the input really was encoded in UTF-8, and that input would be discarded immediately. What a waste.
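Both ways of getting rid of a buffer can be sketched in isolation. The three function names below are hypothetical, and the sketch uses *mut u8 rather than *mut c_char to avoid the libc dependency, so it is an illustration of the idea and not a drop-in replacement for the code above.

```rust
// Hypothetical helper: turns a String into a raw pointer plus byte length.
fn into_raw_bytes(s: String) -> (*mut u8, usize) {
    let len = s.len();
    let ptr = Box::into_raw(s.into_bytes().into_boxed_slice()) as *mut u8;
    (ptr, len)
}

/// Safety: `ptr` and `len` must come from one `into_raw_bytes` call, used once.
unsafe fn free_raw_bytes(ptr: *mut u8, len: usize) {
    // Cheap path: rebuild the Vec<u8> and drop it, with no UTF-8 check.
    drop(unsafe { Vec::from_raw_parts(ptr, len, len) });
}

/// Safety: same contract as `free_raw_bytes`.
unsafe fn take_string(ptr: *mut u8, len: usize) -> Result<String, std::string::FromUtf8Error> {
    // Checked path: only worth it if the String is actually needed again.
    String::from_utf8(unsafe { Vec::from_raw_parts(ptr, len, len) })
}

fn main() {
    let (p, n) = into_raw_bytes(String::from("abcdefgHi0"));
    let s = unsafe { take_string(p, n) }.expect("bytes came from a String");
    println!("{}", s);

    let (p2, n2) = into_raw_bytes(String::from("xyz"));
    unsafe { free_raw_bytes(p2, n2) };
}
```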

Python corner

This time we have to tell Python about our shiny structure, and also how to interpret internal pointers within it. Note that I’m an old man using Python 2 (ugh) so the code interpreting a byte string may not work in Python 3. Your mileage may vary.

>>> import ctypes
>>> dll = ctypes.CDLL('target/debug/libfoo.so')
>>> class PackChar(ctypes.Structure):
...     _fields_ = [('int_val', ctypes.c_int),
...                 ('buffer', ctypes.POINTER(ctypes.c_char)),  # not c_char_p: the buffer has no NUL terminator
...                 ('buffer_size', ctypes.c_int)]
...     def __repr__(self): # so that we can pretty-print the structure
...         return '<PackChar int_val={int_val!r} buffer={buffer!r}>'.format(
...             int_val=self.int_val,
...             buffer=self.buffer[0:self.buffer_size],
...         )
...
>>> get_packs_char  = dll.get_packs_char;  get_packs_char.restype = ctypes.POINTER(PackChar)
>>> free_packs_char = dll.free_packs_char; free_packs_char.restype = None
>>> out = get_packs_char(6)
>>> import pprint; pprint.pprint(out[0:6])
[<PackChar int_val=0 buffer='abcdefgHi0'>,
 <PackChar int_val=1 buffer='abcdefgHi1'>,
 <PackChar int_val=2 buffer='abcdefgHi2'>,
 <PackChar int_val=3 buffer='abcdefgHi3'>,
 <PackChar int_val=4 buffer='abcdefgHi4'>,
 <PackChar int_val=5 buffer='abcdefgHi5'>]
>>> free_packs_char(6, out)
>>> # again, we can't use `out` now

No Rust (the fungus) or Python (the snake) was harmed during the preparation of this article.