Hello, I started studying Rust a few days ago, and I have been amazed by how many Rust design decisions I agree with; this is unusual. I am still a Rust newbie: so far I have read just the very nice "Rust by Example" (http://rustbyexample.com ), several blog posts and some Reddit threads.
I usually program in D and Python, and I also like the Haskell, ATS2 (www.ats-lang.org ), Whiley (whiley.org ), Julia (julialang.org ) and Ada languages, for different reasons. I have opened more than one thousand enhancement requests/bug reports for D, I maintain about one thousand RosettaCode (rosettacode.org ) solutions for D (many entries have multiple solutions), and I have helped design the D range algorithms (std.range - D Programming Language ). English isn't my native language, so if you spot some English mistakes feel free to point them out.
Below is a first batch of notes and small questions (most links here lack the https: prefix because, this being my first post, the system doesn't let me write more than two links...).
- Do you know any Rust programmers/users in Italy? I'd like to meet some of them and organize a small meetup. (I haven't visited a Rust IRC channel yet.)
- So far (I am still at an early stage of studying Rust) the only Rust language design decision I have found that I don't like much regards the % operator, which behaves as in C99/D (it computes the remainder of truncated division, so the result takes the sign of the dividend):
fn main() {
    let n1 = std::env::args()
        .nth(1)
        .map_or(Ok(10), |n| n.parse::<i64>())
        .unwrap();
    let n2 = std::env::args()
        .nth(2)
        .map_or(Ok(20), |n| n.parse::<i64>())
        .unwrap();
    println!("{}", n1 % n2);
}
Rust outputs:
>test1 10 3
1
>test1 10 -3
1
>test1 -10 3
-1
>test1 -10 -3
-1
Similar operations on the Python Shell give:
>>> 10 % 3
1
>>> 10 % -3
-2
>>> -10 % 3
2
>>> -10 % -3
-1
I can understand the desire for C99 semantic compatibility, and regarding performance I know what result the CPUs give. But in most cases, when I'm using signed numbers, I need the modulus (the result that takes the sign of the divisor). In some languages the C99 % operator is also bug-prone (I am not yet sure whether this is also true for Rust).
Example: if you want to simulate a simple 1D cellular automaton using a 1D Von Neumann neighborhood (that is, reading the current cell, the cell before, and the cell after, to compute the next generation of the automaton), with a 1D toroidal arrangement (wrap-around), you can access the three needed cells like this in Python 2:
V = [10, 20, 30, 40, 50]
N = len(V)
for i in xrange(N):
    print V[(i - 1) % N], V[i], V[(i + 1) % N]
That outputs:
50 10 20
10 20 30
20 30 40
30 40 50
40 50 10
(In Python you can even write just "print V[i - 1], V[i], V[(i + 1) % N]" because negative indexes wrap around from the end of the array).
In the D language you can't use % directly here; you have to use something like this (same output):
enum mod = (in int n, in int m) pure nothrow @safe @nogc =>
    ((n % m) + m) % m;
void main() {
    import std.stdio;
    immutable V = [10, 20, 30, 40, 50];
    foreach (immutable int i; 0 .. V.length)
        writefln("%d %d %d", V[(i - 1).mod($)],
                             V[i],
                             V[(i + 1) % $]);
}
In Rust you can't use % directly here either (also because the index 'i' is unsigned, so i - 1 underflows on the first iteration of the loop). One solution (same output):
fn main() {
    let v = [10, 20, 30, 40, 50];
    let n = v.len();
    for i in 0 .. n {
        println!("{} {} {}", v[if i == 0 { n - 1 } else { i - 1 }],
                             v[i],
                             v[(i + 1) % n]);
    }
}
Do you have better ways to write that code in Rust?
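One possible alternative, as a minimal sketch: adding n before subtracting 1 keeps the arithmetic inside the unsigned range, so a single % is enough for the previous cell too. The helper name `neighbors` is made up for this example.

```rust
/// Returns (previous, current, next) for index `i` of a ring-shaped slice.
/// `neighbors` is a hypothetical helper, not a standard library function.
fn neighbors(v: &[i32], i: usize) -> (i32, i32, i32) {
    let n = v.len();
    // (i + n - 1) cannot underflow when n >= 1, and maps index 0 to n - 1.
    (v[(i + n - 1) % n], v[i], v[(i + 1) % n])
}

fn main() {
    let v = [10, 20, 30, 40, 50];
    for i in 0..v.len() {
        let (prev, cur, next) = neighbors(&v, i);
        println!("{} {} {}", prev, cur, next);
    }
}
```

This only works for an offset of one cell at a time (or any offset not larger than n); larger negative offsets still need a real modulus.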
One of the very few languages whose design I like on this point is Ada, which has the "mod" operator to compute the modulus (sign of the divisor) and the "rem" operator to compute the remainder (sign of the dividend). This avoids problems with signed numbers and makes the performance implications clearly visible.
Perhaps a second operator for the modulus could be added to Rust, or at least a small function in the Rust standard library (if not already present).
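To make the difference concrete, here is a small sketch of such a function. `floor_mod` is a hypothetical name; it uses the same double-% trick as the D helper above and reproduces the Python results. As far as I know, more recent Rust versions also offer `rem_euclid` on integer types, but that one always returns a non-negative result, so it differs from the floored modulus when the divisor is negative.

```rust
/// Floored modulus: the result takes the sign of the divisor `m`,
/// like Python's `%`. `floor_mod` is a hypothetical helper name.
/// Note: `(n % m) + m` can overflow for values near the i64 limits.
fn floor_mod(n: i64, m: i64) -> i64 {
    ((n % m) + m) % m
}

fn main() {
    let cases: [(i64, i64); 4] = [(10, 3), (10, -3), (-10, 3), (-10, -3)];
    for &(n, m) in &cases {
        println!(
            "n={} m={}  %: {}  floor_mod: {}  rem_euclid: {}",
            n, m, n % m, floor_mod(n, m), n.rem_euclid(m)
        );
    }
}
```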
- This D code allocates a fixed-size array of 32-bit integers on the stack and then tries to fetch an item past the end of the array:
void main() {
    immutable int[3] arr = [10, 20, 30];
    const size_t IDX = 3;
    immutable r = arr[IDX];
}
The D compiler gives this compile-time error:
test.d(4): Error: array index 3 is out of bounds arr[0 .. 3]
A similar Rust program compiles without errors:
fn main() {
    let arr = [10, 20, 30];
    const IDX: usize = 3;
    println!("{}", arr[IDX]);
}
Something similar happens with slices that have statically known bounds.
Is this nice improvement planned for Rust, or is it already in some enhancement request, RFC, or something similar? (In a later post I'll discuss a generalization of this idea.)
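Independently of what the compiler checks statically, a run-time alternative to the panic is the slice method `get`, which returns an Option. A minimal sketch (the helper name `checked_lookup` is invented for the example):

```rust
/// Looks up `idx` without panicking.
/// `checked_lookup` is a hypothetical helper name.
fn checked_lookup(arr: &[i32], idx: usize) -> Option<i32> {
    arr.get(idx).copied()
}

fn main() {
    let arr = [10, 20, 30];
    const IDX: usize = 3;
    match checked_lookup(&arr, IDX) {
        Some(x) => println!("{}", x),
        None => println!("index {} is out of bounds", IDX),
    }
}
```

This moves the problem to run time rather than solving it at compile time, so it is not a replacement for the D-style static check.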
- This Rust program gives a "non-exhaustive patterns" error if I comment out the last impossible case:
fn main() {
    let n : u32 = 10;
    let r = match n % 3 {
        0 => 10,
        1 => 20,
        2 => 30,
        //_ => unreachable!() // ?
    };
}
Even this code gives a similar error:
fn main() {
    let x = 0u8;
    let y = match x {
        0 ... 255 => 1, // error?
    };
}
I don't expect a Rust compiler to infer exhaustiveness when there are match cases with an "if", but I think that in the common match{} situations where no "if" is present the Rust compiler should be smarter and avoid asking for a useless "_ => unreachable!()" case. Has this enhancement already been requested somewhere online?
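The two cases are different in nature, and a sketch may help to separate them. The first needs value-range reasoning about `n % 3`; the second only needs the patterns themselves to cover the whole type. The code below assumes a compiler recent enough to check integer ranges for exhaustiveness and to accept the `..=` range syntax; the function names are invented.

```rust
/// The compiler does not track the possible values of `n % 3`,
/// so the catch-all arm is still needed here.
fn bucket(n: u32) -> u32 {
    match n % 3 {
        0 => 10,
        1 => 20,
        2 => 30,
        _ => unreachable!(),
    }
}

/// Assumption: a recent compiler, where a range pattern covering every
/// u8 value is accepted as exhaustive without a `_` arm.
fn all_bytes(x: u8) -> u32 {
    match x {
        0..=255 => 1,
    }
}

fn main() {
    println!("{} {}", bucket(10), all_bytes(0));
}
```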
- This D code defines two fixed-size array types (stack-allocated, with mutable values and a length fixed at compile time), the second with the same length as the first:
void main() {
    alias Data1 = int[100];
    alias Data2 = uint[Data1.length];
}
How do you write the same in Rust? This fails:
fn main() {
    type Data1 = [i32; 100];
    type Data2 = [u32; Data.len()];
}
test.rs:3:24: 3:28 error: unresolved name Data [E0425]
test.rs:3     type Data2 = [u32; Data.len()];
                                 ^~~~
test.rs:3:24: 3:28 help: run rustc --explain E0425 to see a detailed explanation
error: aborting due to previous error
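One way that does work, as a sketch, is to name the length once as a constant and use it in both type aliases. `LEN` is a name invented for the example; this shares the length but does not derive it from the first type, which is what the D code does.

```rust
// Hypothetical shared length constant.
const LEN: usize = 100;

type Data1 = [i32; LEN];
type Data2 = [u32; LEN];

fn main() {
    let a: Data1 = [0; LEN];
    let b: Data2 = [0; LEN];
    println!("{} {}", a.len(), b.len());
}
```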
- Why don't Rust associative data structures (like hash maps and binary search trees) use the handy indexing syntax to get an item?
I mean something like (D code):
void main() {
    int[int] aa; // Built-in D associative array.
    aa[1] = 10;
    assert(aa[1] == 10);
}
That is, instead of writing "aa.insert(1, 10);" to set an item, and something similar to get an item.
Finding an item in a well-designed hash table should be amortized O(1). D language conventions allow the indexing syntax for operations that are O(log n) or faster, so both hash tables and search trees qualify.
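For comparison, a sketch of what std::collections::HashMap offers as far as I can tell: reading with aa[&key] works (it panics when the key is missing), while assigning with aa[key] = value is not supported, so insert() or entry() are used for writes. The function name `build` is invented.

```rust
use std::collections::HashMap;

/// Builds a small map using insert() and entry().
fn build() -> HashMap<i32, i32> {
    let mut aa = HashMap::new();
    aa.insert(1, 10);
    // entry() gives insert-or-update access to a slot.
    *aa.entry(1).or_insert(0) += 5;
    aa
}

fn main() {
    let aa = build();
    // Read-only indexing takes a reference to the key and panics if absent.
    println!("{}", aa[&1]);
    // get() returns an Option instead of panicking.
    println!("{:?}", aa.get(&2));
}
```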
- This Rust program:
#![feature(box_syntax)]
fn main() {
    let x = box 10;
    println!("{}", x);
    println!("{:?}", x);
}
Outputs:
10
10
But if I have understood what I have read, the difference between {} and {:?} is similar to the difference between str() and repr() in Python, so I expected an output more like:
10
box 10
Or perhaps:
10
box(10)
What do you think?
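A small sketch of where the two formats do differ and where they don't. It uses Box::new instead of the unstable box syntax (an assumption made so the example needs no feature flag), and `show` is an invented helper.

```rust
use std::fmt::{Debug, Display};

/// Formats a value with both `{}` and `{:?}`.
/// `show` is a hypothetical helper name.
fn show<T: Display + Debug>(x: T) -> (String, String) {
    (format!("{}", x), format!("{:?}", x))
}

fn main() {
    // For strings, Debug adds quotes, much like repr() in Python.
    println!("{:?}", show("hi"));
    // For a boxed integer, both formats print the inner value only.
    println!("{:?}", show(Box::new(10)));
}
```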
- Can you ask (compactly, in few tokens) std::collections::HashMap to use a faster (not cryptographically safe) hash function (especially for strings)? Something like:
let mut aa: HashMap<String, i32, Hash::Fast> = HashMap::new();
Using a cryptographically safe hash function by default is acceptable, but in some cases I need higher performance, and I've seen experimentally that D built-in associative arrays (and sometimes even Python dicts) are faster than a Rust HashMap at building a hash table of strings. (In the past, D associative arrays were safer against attacks because, despite using a cryptographically unsafe hash function, they used a Red-Black tree for each bucket. Later the trees were replaced by faster linked lists.)
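A sketch of how this can be spelled with the third type parameter of HashMap, assuming the std::hash::BuildHasherDefault adapter. The hasher here is a hand-written FNV-1a, chosen only as a simple non-cryptographic example; `Fnv1a`, `FastMap` and `count_words` are names invented for this post, and I have not benchmarked this code.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical 64-bit FNV-1a hasher (fast, NOT resistant to collision attacks).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

/// Hypothetical alias: a HashMap that hashes keys with Fnv1a.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn count_words(text: &str) -> FastMap<String, i32> {
    // HashMap::new() is only available for the default hasher,
    // so the map is created with default() instead.
    let mut counts = FastMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("a b a c a");
    println!("{}", counts["a"]);
}
```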
- On this Rust program:
fn main() {
    printnl!("{}", 1);
}
The rustc compiler gives:
test.rs:2:5: 2:12 error: macro undefined: 'printnl!'
test.rs:2     printnl!("{}", 1);
              ^~~~~~~
error: aborting due to previous error
On a similar D program:
import std.stdio;
void main() {
    writenl(1);
}
The D compiler gives a more useful error message:
test.d(3): Error: undefined identifier 'writenl', did you mean template 'writeln(T...)(T args)'?
Similarly, on this Rust program:
fn somefunc() {}
fn main() {
    somefun();
}
The rustc compiler gives:
test.rs:3:5: 3:12 error: unresolved name somefun [E0425]
test.rs:3     somefun();
              ^~~~~~~
test.rs:3:5: 3:12 help: run rustc --explain E0425 to see a detailed explanation
error: aborting due to previous error
While on this similar D program:
void somefunc() {}
void main() {
    somefun();
}
The D compiler gives:
test.d(3): Error: undefined identifier 'somefun', did you mean function 'somefunc'?
The D compiler catches similar small mistakes (up to a Levenshtein distance of 2 or 3, divided by kind) for user-defined identifiers. Has this enhancement already been requested somewhere online?
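To show how little machinery such a suggestion needs, here is a sketch of the idea: compute the Levenshtein distance between the unknown name and every known name, and propose the closest one if it is within a threshold. This is only an illustration, not how the D compiler (or rustc) actually implements it; `levenshtein` and `suggest` are invented names.

```rust
/// Classic dynamic-programming Levenshtein distance, keeping two rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of a and the first j of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: deleting i chars turns a[..i] into the empty string.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j - 1] + cost) // substitution or match
                .min(prev[j] + 1)         // deletion
                .min(cur[j - 1] + 1);     // insertion
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns the closest known name within `max_dist`, if any.
fn suggest<'a>(name: &str, known: &[&'a str], max_dist: usize) -> Option<&'a str> {
    known
        .iter()
        .copied()
        .map(|k| (levenshtein(name, k), k))
        .filter(|&(d, _)| d <= max_dist)
        .min_by_key(|&(d, _)| d)
        .map(|(_, k)| k)
}

fn main() {
    match suggest("somefun", &["main", "somefunc"], 2) {
        Some(s) => println!("did you mean '{}'?", s),
        None => println!("no suggestion"),
    }
}
```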
- Do you know of a method/function like "each()" that consumes an iterator, a bit like .inspect(), to be appended at the end of a chain of map/filter calls?
fn main() {
    (0 .. 10)
        .map(|x| x * x)
        .inspect(|x| println!("{} ", x))
        .each();
}
That code should behave roughly like:
fn main() {
    for _ in (0 .. 10)
        .map(|x| x * x)
        .inspect(|x| println!("{} ", x)) {
    }
}
An alternative design (perhaps nicer looking):
fn main() {
    (0 .. 10)
        .map(|x| x * x)
        .for_each(|x| println!("{} ", x));
}
Scala sequences have "foreach":
But in one case in another language it was removed:
social.msdn.microsoft.com/Forums/en-US/758f7b98-e3ce-41e5-82a2-109f1df446c2/where-is-listtforeach
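As far as I know, later Rust releases have an Iterator::for_each method with the shape of the alternative design above. A minimal sketch assuming such a release (the function `squares` is invented, and collects into a Vec only so the result can be checked):

```rust
/// Consumes the iterator chain with for_each, pushing into a Vec.
/// In real code `.collect()` would be the more direct way to build the Vec.
fn squares() -> Vec<i32> {
    let mut out = Vec::new();
    (0..10).map(|x| x * x).for_each(|x| out.push(x));
    out
}

fn main() {
    (0..10).map(|x| x * x).for_each(|x| println!("{} ", x));
    println!("{:?}", squares());
}
```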
- I have done a small experiment regarding RVO (Copy elision - Wikipedia ), this is Rust code:
const N : usize = 1000;
type Data = [i32; N];
#[inline(never)]
fn pippo() -> Data {
    let mut data: Data = [0; N]; // Initialize.
    for i in 0 .. data.len() {
        data[i] = i as i32; // Initialize again.
    }
    return data;
}
fn main() {
    let data2 = pippo();
    println!("{}", data2[0]);
}
Asm, compiled in release mode (rustc 1.6.0-nightly (8ca0acc25 2015-10-28)):
_ZN5pippo20hbf0a92f9b1becfacmaaE:
pushq %r14
pushq %rbx
subq $4008, %rsp
movq %rdi, %r14
leaq 8(%rsp), %rdi
xorl %ebx, %ebx
xorl %esi, %esi
movl $4000, %edx
callq memset@PLT
movdqa .LCPI0_0(%rip), %xmm0
movdqa .LCPI0_1(%rip), %xmm1
.align 16, 0x90
.LBB0_1:
movd %ebx, %xmm2
pshufd $0, %xmm2, %xmm2
movdqa %xmm2, %xmm3
paddd %xmm0, %xmm3
paddd %xmm1, %xmm2
movdqu %xmm3, 8(%rsp,%rbx,4)
movdqu %xmm2, 24(%rsp,%rbx,4)
addq $8, %rbx
cmpq $1000, %rbx
jne .LBB0_1
leaq 8(%rsp), %rsi
movl $4000, %edx
movq %r14, %rdi
callq memcpy@PLT
addq $4008, %rsp
popq %rbx
popq %r14
retq
The same in D with the ldc2 compiler, in release noboundscheck mode:
alias Data = int[1000];
Data pippo() {
    Data data;
    foreach (immutable i; 0 .. data.length)
        data[i] = i;
    return data;
}
void main() {
    import std.stdio;
    immutable data2 = pippo();
    data2[0].writeln;
}
Asm (LDC2, 0.15.2-beta2, based on DMD v2.066.1 and LLVM 3.6.1):
__D6test315pippoFZG1000i:
pushl %esi
subl $12, %esp
movl %eax, %esi
movl %esi, (%esp)
movl $4000, 8(%esp)
movl $0, 4(%esp)
calll _memset
xorl %eax, %eax
movdqa LCPI0_0, %xmm0
movdqa LCPI0_1, %xmm1
.align 16, 0x90
LBB0_1:
movd %eax, %xmm2
pshufd $0, %xmm2, %xmm2
movdqa %xmm2, %xmm3
paddd %xmm0, %xmm3
paddd %xmm1, %xmm2
movdqu %xmm3, (%esi,%eax,4)
movdqu %xmm2, 16(%esi,%eax,4)
addl $8, %eax
cmpl $1000, %eax
jne LBB0_1
addl $12, %esp
popl %esi
retl
I think the "callq memcpy@PLT" near the end of the Rust asm shows that RVO isn't happening.
I have seen this issue; perhaps it's the same problem?
github.com/rust-lang/rfcs/issues/788
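Where the extra copy matters, one workaround that does not depend on the optimizer is to let the caller own the buffer and pass it by mutable reference. A sketch under that assumption (the function name `fill` is invented; I have not inspected the asm this version generates):

```rust
const N: usize = 1000;
type Data = [i32; N];

/// Writes into a caller-provided buffer, so nothing is returned by value.
/// `fill` is a hypothetical replacement for `pippo`.
#[inline(never)]
fn fill(data: &mut Data) {
    for (i, slot) in data.iter_mut().enumerate() {
        *slot = i as i32;
    }
}

fn main() {
    let mut data2: Data = [0; N];
    fill(&mut data2);
    println!("{}", data2[0]);
}
```

The price is a less natural signature, which is why having the compiler elide the copy would still be preferable.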
- As a more general comment, I think Rust should try to improve on several things:
a) Strive to reduce the amount of unsafe{} code needed in most programs, by adding new compiler checks that run on unsafe{} code to make it less unsafe, improving the type system, introducing standard library items that safely wrap some unsafety, and so on;
b) Add things to the standard library (and to the language, if necessary) that make Rust handier, quicker and more natural to use, almost like a scripting language, for when you want to write that kind of code, usually in smaller programs (this is also called "scaling down" in the Scala community);
c) Add things that allow the programmer to specify types and behaviours more precisely, as in Ada. This is at odds with the previous wish, so the use of such things should be optional. They get used when the code needs to be fully specified and as bug-free as possible. Even this kind of code should be sufficiently succinct (unlike Ada);
d) I think the Rust design should also take a look at the languages used for high performance numerical computing: perhaps just a few things could be added/modified to improve this usage of Rust;
e) Improve the type system (for example integrals and enums used as template arguments, higher order generics, CTFE) to reduce code repetition and remove some usages of Rust macros, without introducing too much unsafety, while keeping the language sufficiently small for normal programmers to learn and still focused on practicality and safety.
(I am not suggesting turning Rust into a scripting language, a high performance language, or a high integrity language. I am saying that often a small number of additions can improve those use cases significantly without changing the overall language much, like fulfilling 70% of those needs with 10% of added things).
Thank you,
later,
leonardo
http://www.fantascienza.net/leonardo/