How can I prevent Rust code from reading my local files?

On my computer, I have the following directories and files:

my_directory
    my_file1.txt
    my_file2.txt
my_crate
    src
       my_functions.rs
       lib.rs
custom_crate
    src
       main.rs

custom_crate uses my_crate (the former has a dependency on the latter).

Suppose I'm going to run main.rs from the directory custom_crate, and I can know what the content of main.rs is (by reading it into a String). I need to make sure that the code in main.rs can't read (or download) the contents of my other files, like my_file1.txt or my_functions.rs. If the code can do that, I will quit the run.
How can I do that?
I came up with a method: if the String from main.rs contains std, then I quit the run. So I can make sure main.rs can't use functions like std::fs::read_to_string. But I thought there are other ways to read a file, so this must not be a real solution.
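
For example (just a sketch of the kind of thing I'm worried about; the path is made up), this program prints a file's contents even though its source never contains the word std:

    fn main() {
        // include_str! embeds the named file at compile time; the path here
        // is hypothetical and is resolved relative to this source file.
        print!("{}", include_str!("../../my_directory/my_file1.txt"));
    }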

You can't prevent arbitrary other code from reading the filesystem.

Why are you trying to do this? It sounds to me like you are trying to build a sandbox and "sanitize" input to be compiled and run. That approach has never worked.

If you want to sandbox code, then do it at the process/OS level. Put it in a chroot jail or a Docker container or similar. Better yet, don't compile and run arbitrary untrusted code.
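
For example, here is a rough sketch in Rust (assuming Docker is installed and the untrusted program has already been compiled to a standalone binary; the image name, paths, and limits below are placeholders) of launching it from trusted code:

    use std::process::Command;

    // A rough sketch, not a hardened setup: run an already-compiled,
    // untrusted binary in a throwaway container with no network access,
    // a read-only root filesystem, and a memory cap. The image, host path,
    // and binary name are placeholders; the image must actually be able to
    // run the binary (e.g. one that is statically linked).
    fn run_untrusted() -> std::io::Result<std::process::Output> {
        Command::new("docker")
            .args([
                "run",
                "--rm",           // remove the container when it exits
                "--network=none", // no network access
                "--read-only",    // read-only root filesystem
                "--memory=256m",  // memory limit
                "-v",
                "/path/to/untrusted_bin:/untrusted_bin:ro", // mount only the binary
                "alpine:latest",  // placeholder base image
                "/untrusted_bin",
            ])
            .output()
    }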

13 Likes

But, but, I do that every day with hundreds of Rust crates!

6 Likes

This is the process:
There is a crate, IRust, which can evaluate a String and show the result as a String. And I have a web server that accepts text input, which is converted to Rust code and evaluated by IRust (this evaluation happens on my local computer, where the server is running). The evaluated result is then shown on the web page. But I am worried that the input code could read my local files' contents, so I have the problem above.

What you are asking for is impossible. As simple as that. Do not do this.

Instead, if you are running untrusted code, you must run it in a sandboxed OS-level environment (e.g. Shellbox, or, as advised by @h2co3, a chroot, etc).

Alternatively, try to refactor your web service so that the input is not untrusted arbitrary code, but some query format handled by a program written and trusted by you.
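
As a hypothetical sketch of that idea (the command names and grammar here are invented), the web input becomes a tiny fixed command language that trusted code interprets, so user input is never compiled or executed as Rust:

    // Only these fixed operations are possible; the input never reaches
    // the filesystem, the network, or a compiler.
    fn handle_query(input: &str) -> Result<String, String> {
        let mut parts = input.split_whitespace();
        match parts.next() {
            Some("add") => {
                let a: i64 = parts.next().and_then(|s| s.parse().ok()).ok_or("bad operand")?;
                let b: i64 = parts.next().and_then(|s| s.parse().ok()).ok_or("bad operand")?;
                Ok((a + b).to_string())
            }
            Some("upper") => Ok(parts.collect::<Vec<_>>().join(" ").to_uppercase()),
            _ => Err("unknown command".to_string()),
        }
    }

    fn main() {
        assert_eq!(handle_query("add 2 3"), Ok("5".to_string()));
        assert_eq!(handle_query("upper hello world"), Ok("HELLO WORLD".to_string()));
    }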

9 Likes

The server code has a dependency on my local crate (say, my_functions.rs). If I run the code in Shellbox, I should put the compiled program (custom_crate.exe) into the Shellbox rather than the source files (including my_crate and custom_crate); otherwise the input code could still read my_functions.rs. Am I right?

You may want to have a look at how different websites that offer this work under the hood. Examples are godbolt.org, https://ato.pxeger.com/, and GitHub Codespaces.

3 Likes

There are several important differences between compiling and running user input directly vs. using crates:

  • you can verify the crates' code yourself, if you bother to
  • crates are open source and subject to potential scrutiny by the community, so they are much less likely to get away with being malicious. Playground-like functionality can't trust that the code pasted into it has been inspected by others.

6 Likes

In addition to running user code in a sandboxed environment, you must also compile the code in a sandboxed environment. Rust's compiler runs arbitrary code at compile time as part of the proc macro system. Even if it didn't do that, rustc would still be an attack surface, since it's a complex piece of software that's not designed to defend against attacks on itself. E.g., it uses LLVM, which has known vulnerabilities that can be triggered by malicious source code.
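
To make that concrete (a contrived sketch; the macro name and the file path are made up), a function-like procedural macro runs on the machine doing the compilation, before any runtime sandbox is even involved:

    // Lives in a separate crate with `proc-macro = true` in its Cargo.toml.
    use proc_macro::TokenStream;

    #[proc_macro]
    pub fn leak_at_compile_time(_input: TokenStream) -> TokenStream {
        // This body runs while rustc expands the macro, i.e. during the
        // build itself, regardless of how the compiled program is later
        // sandboxed at runtime.
        let contents = std::fs::read_to_string("/home/user/my_directory/my_file1.txt")
            .unwrap_or_default();
        // A malicious macro could send `contents` anywhere from here.
        let _ = contents;
        TokenStream::new()
    }

Build scripts (build.rs) are compiled and run by cargo in the same way, so they have the same reach.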

6 Likes

What you say is very true. Still, we would never have gotten crates off the net, or anything from other public repos, into the projects I have worked on for certain three- and four-letter agencies. Maybe I was made hyper-suspicious by those experiences.

In a perfect world our interpreters and compilers would protect us from malicious source code, never allowing it to access memory it shouldn't or make disallowed syscalls, etc. I guess we are a long way from that, even with Rust, and so we have to rely on hardware protection mechanisms.

In a perfect world we would have used descendants of the iAPX 432, with full-blown hardware separation between tiny entities, and not the iAPX 386, with its huge flat address space and no protection between parts of the same program.

Alas, we don't live in such an ideal world and have to work with what we have available.

It was tried many times. It never works. The most famous attempts were Java applets, and they failed spectacularly. Shortly before browsers kicked them out, the majority of applets on the Web were infested with malware.

The story is very simple: a wall between entities can either be flexible, cheap, and unreliable, or inflexible, thick, expensive, and trustworthy.

You can't really have both. And if you have to have it, then hardware-assisted separation is usually the cheapest option. Which means VMs, maybe control groups, processes… definitely not objects in the same flat address space.

P.S. Of course it's always possible to create something that is both expensive and insecure, but that's not a very interesting case.

3 Likes

Wow, someone who knows the 432. Back in the day, when we were building SBCs with the Z80 and programming them in assembler, just as we were migrating to the 8088/86, or was it the 286 already, I had all the iAPX 432 data books and studied them quite a lot. I guess Intel wanted us to go that way. Just one problem, at least for me: I did not understand anything. I could not fathom how one would get even the simplest program running. Of course it was all designed for object-oriented programming, with hardware-separated objects as you say, which was way beyond my understanding at the time.

The 386 was a marvel. Thank God we now don't have to mess with all that segmented memory, a huge flat space, freedom, wonderful.

Oh yes, a dismal failure. However, I'm not going to count those. After all, the language in use was Java, which is hardly designed to support the safety we are talking about. The compilers and VMs were built with C/C++, as was the whole browser environment, which is even worse.

So I'm still of the mind that it would be possible to build a language that would be safe, provide object isolation, etc., and basically make the hardware memory protections redundant. All of that will suffer from implementation bugs that allow exploitation, of course, but they can be fixed over time. As are the security holes in hardware memory separation that we have been seeing ever since the Intel 286 (I used to have a huge list of those known holes, under NDA from Intel).

Of course, that probably requires the runtime, the compilers, the operating system, everything from the ground up, to be built with that language. So I don't expect to see it happen…

It was supposed to provide it. But failed.

Do you know how to do that on a modern CPU? From scratch, I mean? Do you even know how and why one would need to train memory, of all things, for example?

To start something on a complicated CPU is, well… complicated. Some design decisions of the iAPX 432 were quite strange, I would even say insane (instructions of arbitrary size in bits? WTH?), but the majority were simply consequences of the fact that you can build either a secure and slow system or an insecure and fast one; it's not really possible to create a secure and fast system.

Possible. Not even that hard. But then it would be slow, people would start improving it, and then we would have a repeat of the same story that Java, C#, JavaScript, and so on followed: from a slow but secure thingie to a fast but insecure one.

No. Not possible. To fix “holes in the wall” you have to stop doing regular refactorings and just concentrate on guaranteed security. It may work with something like BPF, where language capabilities are explicitly and permanently sacrificed in favor of security, but this would never work with a general-purpose language.

Hardware security features don't change much over time. And even they prove to be a constant source of insecurity that needs constant patching.

Doing something like that in a language runtime is hopeless: either your language is dead (and then its security is no longer relevant), or it changes constantly and old security holes get patched while new ones are introduced all the time!

It was, though. Then .NET tried to do the same thing as well, which also failed:

Code Access Security (CAS) has been deprecated across all versions of .NET Framework and .NET. Recent versions of .NET do not honor CAS annotations and produce errors if CAS-related APIs are used. Developers should seek alternative means of accomplishing security tasks.

~ Partial Trust - WCF | Microsoft Learn

Untrusted things in your process never work. The industry needs to stop trying.

(This is also why every browser now runs different sites in different processes, for example.)

3 Likes

I wouldn't be that harsh, actually. Yes, all attempts to organize security within the same process (in the browser, on the web server side, etc.) failed, of course. They were doomed from the outset.

But! They worked in some kind of half-broken state for years, when people couldn't afford anything else.

And they were much better than nothing in those times.

That I can agree with. Android's decision to not even try to use Java for security was the right choice. And we should continue to use in-process security separation only when there is no other sane choice, but then we shouldn't expect to run arbitrary sandboxed code.

And most of the time, today, proper separation with VMs or at least containers is affordable. There is literally no need to half-solve the unsolvable in-process security problem anymore.

3 Likes

You got me there. No. The last time I managed to do that was bringing up the then-new Intel 486 from the reset vector, getting it out of 16-bit mode into protected 32-bit mode, spinning up some tasks, and running a mini in-house OS/scheduler. That was quite hard enough for my simple mind to pull off at the time. All that setting up of segment registers, translation tables, and so on; I don't recall it exactly. I wish I could have kept a copy of that code to marvel at today.

I know one guy who has a mini OS written in assembler that boots on his laptop. But that is after the BIOS firmware has done its magic.

No idea, really. But I can guess it has to do with trying to find optimal timings for access cycles and voltage levels to eke out more performance.

Whilst those systems certainly failed to live up to their hype of safety, they are very different from what I am talking about. Firstly, I'm not convinced the actual language definitions are rigorous with respect to safety. As a result they relied on virtual machines/interpreters to provide that sandboxing. Secondly, those VMs/interpreters are written in unsafe languages like C and C++. The whole thing runs on top of operating systems written in unsafe languages. It all looks to me like a house with very secure doors and windows built on foundations of tunnels to God knows where.

No, what I'm talking about is a language designed with safety as a priority, and some theoretical reasons why it is safe. Like Rust, I guess, barring unsafe. Compiled to real, bare-metal instructions. The compiler is generating those instructions; surely it can ensure the code is safe, given that the semantics of the language are safe? At least in theory. Else why do we have Rust?

And of course, if this has to run on top of an OS or in a browser, all of that should be built the same safe way.

I may be wrong, but it seems to me that whatever logic we can implement in hardware we can write in software. Likely true, given that we can write CPU emulators.

What you have said there is that both hardware and software are constantly changing. They both tackle security issues, and they both introduce new vulnerabilities as they change.

and

Yes, they failed. They were all trying to solve the problem of running arbitrary binary code produced by unsafe languages (compiled, JITted, whatever) or by unscrupulous humans, on top of systems also written unsafely.

I'm talking about arbitrary source code in a safe language, as is the OP. Something like Rust. Once it is compiled, it is no longer "untrusted". It can only do what the compiler says it can, within whatever restrictions on system resources we give it. After all, if you cannot trust your compiler, what can you trust?

Btw, as to community scrutiny: in practice I don't think that's always reliable, or done, or even doable. I don't know if or how crates on crates.io are vetted, but in similar ecosystems, like pypi.org (npm...), there have been instances of malicious packages being uploaded (with names very similar to popular packages, so as to make optimal use of any typos in legit code).

Just to be clear: I agree in general with everyone here who said something like "Don't do that. Don't even try, because you cannot really guarantee this." But... I'm still curious. Are there any Rust applications that provide (configurable) sandboxing (at a general OS level)? I would guess it's not possible at the moment to code something like that using only safe Rust (?), but is it possible to do this without having to resort to asm code and specializing it for a particular OS?

1 Like

And your hardware is written in an unsafe language, too.

In theory, theory and practice are the same. In practice, they are not.

Yes, Rust is safe, we have a proof. In practice, though, there are dozens of soundness-related issues.

To ensure that our programs work? It's one thing to ask the compiler to catch mistakes that a programmer makes accidentally. It's an entirely different thing to ensure that holes in these checks are not exploited by a malicious actor.

Yes, but that's not a technical issue, it's a social one. CPU core development takes years: five years or more from scratch, a year or two for a minor adjustment.

And if you can't patch over a bug, then it's a recall procedure, with costs that can run to billions of dollars.

That means people are serious about the need to deliver something that works and is secure.

And thus a minor adjustment to the spec takes years to design and implement.

Programming languages are developed on a different cadence; they are much more flexible, and thus the chances of getting something air-tight are more or less zero.

I can tell you more: I personally developed such a software sandbox. It had one (or maybe two, by now) security holes found in ten years. Probably more secure than hardware, in some sense.

But why? Because it wasn't a language. It was something with a fixed, frozen specification (actually three over the course of 10 years), and that was it.

Language-imposed rules are way too fluid to ever become rigid enough to be usable as a security-asserting tool.

Cadence matters. Rust issues a new release every six weeks, and each new release is only tested by a tiny group of developers for three months.

New CPU architectures are released about once every ten years, and minor adjustments still take years. And CPU specifications are much simpler and more rigid than any language specification.

What do you even mean by "safe language"? If you mean memory safety, then most of these languages were memory-safe.

They just couldn't provide a rigid enough separation of untrusted code from trusted code.

100% of compilers for 100% of popular languages are designed with the assumption that the source code is not malicious. Period.

You may like it or dislike it, but that's the basis on which all compilers are built.

3 Likes

100% of compilers for 100% of popular languages are designed with the assumption that the source code is not malicious. Period.

Not true! Safe Haskell solves this problem. Effect systems, in general, solve this problem, and effect safety is the next frontier, now that memory safety is well-mitigated by Rust's approaches to borrowing and unsafe code.