Can I use a single global file object to write from multiple threads?

If I had code like this:

use std::env::var;
use std::fs::{create_dir_all, File, OpenOptions};

use lazy_static::lazy_static;

pub struct Tracker {
    pub file: File,
}

impl Tracker {
    pub fn init() -> Tracker {
        // `wsroot` and `uuid` are defined elsewhere (not shown here).
        let fname = match var("WISK_TRACK") {
            Ok(v) => v,
            Err(_) => {
                if wsroot.is_empty() {
                    // create_dir_all also succeeds if the directory already exists.
                    create_dir_all("/tmp/wisktrack").unwrap();
                    format!("/tmp/wisktrack/track.{}", uuid)
                } else {
                    format!("{}/wisktrack/track.{}", wsroot, uuid)
                }
            }
        };
        Tracker {
            file: OpenOptions::new().create(true).append(true).open(&fname).unwrap(),
        }
    }
}

lazy_static! {
    pub static ref TRACKER: Tracker = Tracker::init();
}

Can I write to &TRACKER.file from multiple threads, or do I need locking around &TRACKER.file access?

A File does allow writing to it from multiple threads at the same time, since Write is implemented for &File. But be careful: unless the things you write are very short, your writes can get interleaved. To prevent this, I recommend putting a lock around it anyway.
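For illustration, here is a minimal sketch of the lock-around-the-file approach. It is not the original poster's code: the function name `write_with_lock`, the record format, and the temp-file name are all made up for the example. Each thread formats its whole record first and only then takes the lock, so the lock is held just for the write itself.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawns `threads` writers that each append
/// `lines_per_thread` lines to `path` through one mutex-protected file handle.
fn write_with_lock(path: &Path, threads: usize, lines_per_thread: usize) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let shared = Arc::new(Mutex::new(file));

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || -> io::Result<()> {
                for i in 0..lines_per_thread {
                    // Build the whole record before locking to keep the critical section short.
                    let line = format!("thread {} record {}\n", t, i);
                    let mut f = shared.lock().unwrap();
                    f.write_all(line.as_bytes())?;
                    // The guard `f` is dropped here, which releases the lock.
                }
                Ok(())
            })
        })
        .collect();

    for h in handles {
        h.join().expect("writer thread panicked")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Example file name; any writable path works.
    let path = std::env::temp_dir().join("lock_demo.log");
    let _ = fs::remove_file(&path);
    write_with_lock(&path, 4, 100)?;
    let contents = fs::read_to_string(&path)?;
    println!("wrote {} lines", contents.lines().count());
    Ok(())
}
```

With the lock in place, no two `write_all` calls on this handle can overlap, so records from different threads cannot be mixed within a line. The order of lines across threads is still whatever the scheduler produces.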

Each of my writes will be < 4096 bytes, which I believe is the size that is guaranteed to be written as a single block.
So I'm guessing I will be OK?

You could try.
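One way to try it is a small experiment like the sketch below. It assumes a fixed-width record format ("tNNN rNNNNNN") invented for this example, and the names `write_unlocked` and `is_intact` are hypothetical. Several threads append through a shared `&File` with no lock, and afterwards the file is scanned for records that did not come back intact. Whether any turn up depends on the OS and filesystem, so a clean run on one machine is not a guarantee.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Arc;
use std::thread;

/// Hypothetical experiment: appends records from several threads with no lock
/// and returns the lines that are not intact records.
fn write_unlocked(path: &Path, threads: usize, per_thread: usize) -> io::Result<Vec<String>> {
    let file = Arc::new(OpenOptions::new().create(true).append(true).open(path)?);

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let file = Arc::clone(&file);
            thread::spawn(move || -> io::Result<()> {
                // `Write` is implemented for `&File`, so a shared reference is enough.
                let mut out: &File = &file;
                for i in 0..per_thread {
                    // Format the complete record first so it is handed over as one buffer.
                    let record = format!("t{:03} r{:06}\n", t, i);
                    out.write_all(record.as_bytes())?;
                }
                Ok(())
            })
        })
        .collect();
    for h in handles {
        h.join().expect("writer thread panicked")?;
    }

    let contents = fs::read_to_string(path)?;
    Ok(contents
        .lines()
        .filter(|l| !is_intact(l))
        .map(String::from)
        .collect())
}

/// A record is intact if it looks exactly like "tNNN rNNNNNN" (12 bytes).
fn is_intact(line: &str) -> bool {
    let b = line.as_bytes();
    b.len() == 12
        && b[0] == b't'
        && b[1..4].iter().all(u8::is_ascii_digit)
        && b[4] == b' '
        && b[5] == b'r'
        && b[6..].iter().all(u8::is_ascii_digit)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("unlocked_demo.log");
    let _ = fs::remove_file(&path);
    let damaged = write_unlocked(&path, 8, 10_000)?;
    println!("{} damaged records", damaged.len());
    Ok(())
}
```

Note that `write_all` may issue more than one underlying write if the OS accepts only part of the buffer, which is one more reason not to treat this as a guarantee.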

A lock is bad. If you don't mind the order of the writes, you could use an MPSC channel to send them to another thread or task, which consumes them and writes them to the file.
If your writes are append-only, remember to set the append flag when opening the file, so that every write goes to the current end of the file.

I'm curious, how is a lock "bad" and why is a channel better?

In this case, using a lock means the threads must do the writes themselves. Disk I/O is slow, and only one thread can write at a time, so the others must block. The writes end up happening sequentially, and there's lock/unlock overhead on top.
Using a separate thread to do the writes means the producing threads don't have to block waiting for a write to complete. Also, a lot of channel implementations are lock-free, which is usually faster than a lock-based implementation.

The lock might require one of your writing threads to wait without doing anything useful while another is using the file. If you use a channel that feeds a dedicated writing thread, the messages get queued immediately and written as soon as the writer deals with everything in front of it in line.
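The dedicated-writer idea could look roughly like the sketch below, using `std::sync::mpsc` from the standard library. The helper names `spawn_writer` and `run_producers` and the record format are assumptions made for the example, not anything from the original code. One thread owns the file; every other thread only holds a `Sender` and queues its records.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: starts a thread that owns the file and writes every
/// record it receives. Returns the sender and a handle to wait on the writer.
fn spawn_writer(path: PathBuf) -> io::Result<(Sender<String>, JoinHandle<io::Result<()>>)> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || -> io::Result<()> {
        let mut out = BufWriter::new(file);
        // This loop ends once every Sender has been dropped.
        for record in rx {
            out.write_all(record.as_bytes())?;
        }
        out.flush()
    });
    Ok((tx, handle))
}

/// Hypothetical driver: several producer threads feed one writer thread.
fn run_producers(path: &Path, threads: usize, per_thread: usize) -> io::Result<()> {
    let (tx, writer) = spawn_writer(path.to_path_buf())?;
    let producers: Vec<_> = (0..threads)
        .map(|t| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..per_thread {
                    // send() only queues the record; it does not wait for the disk.
                    if tx.send(format!("thread {} record {}\n", t, i)).is_err() {
                        break; // the writer has stopped, most likely after an I/O error
                    }
                }
            })
        })
        .collect();
    drop(tx); // drop the original sender so the writer can see the channel close
    for p in producers {
        p.join().expect("producer thread panicked");
    }
    writer.join().expect("writer thread panicked")
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("channel_demo.log");
    let _ = fs::remove_file(&path);
    run_producers(&path, 4, 100)?;
    println!("wrote {} lines", fs::read_to_string(&path)?.lines().count());
    Ok(())
}
```

Because only the writer thread touches the file, records cannot be torn, and records from any single sender arrive in the order they were sent. The relative order between different threads is still not fixed.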

It’s more performant, but might be undesirable in some circumstances: if you’re producing data faster than the disk can write, you’ll have an ever-growing in-memory buffer. Also, if you’re writing some kind of guaranteed-persistence software (like a database), the original thread may need to wait for its write to finish anyway.
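If the ever-growing buffer is a concern, one option in the standard library is a bounded channel, `std::sync::mpsc::sync_channel`, where `send` blocks once the queue holds `capacity` messages. The sketch below is only an illustration; the function names `queued_before_full` and `write_bounded` and the capacities are made up for the example.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Hypothetical demo: counts how many records fit into a bounded channel
/// that nobody is draining.
fn queued_before_full(capacity: usize) -> usize {
    let (tx, _rx) = sync_channel::<String>(capacity);
    let mut queued = 0;
    loop {
        match tx.try_send(String::from("record\n")) {
            Ok(()) => queued += 1,
            // Full: a blocking send() would wait here instead of growing the queue.
            Err(TrySendError::Full(_)) => return queued,
            Err(TrySendError::Disconnected(_)) => return queued,
        }
    }
}

/// Hypothetical helper: writes `count` records through a bounded queue.
/// send() blocks while the queue is full, so memory use stays limited.
fn write_bounded(path: &Path, capacity: usize, count: usize) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let (tx, rx) = sync_channel::<String>(capacity);
    let writer = thread::spawn(move || -> io::Result<()> {
        for record in rx {
            file.write_all(record.as_bytes())?;
        }
        Ok(())
    });
    for i in 0..count {
        if tx.send(format!("record {}\n", i)).is_err() {
            break; // the writer has stopped, most likely after an I/O error
        }
    }
    drop(tx); // close the channel so the writer loop ends
    writer.join().expect("writer thread panicked")
}

fn main() -> io::Result<()> {
    println!("queue accepted {} records before filling up", queued_before_full(4));
    let path = std::env::temp_dir().join("bounded_demo.log");
    let _ = fs::remove_file(&path);
    write_bounded(&path, 16, 1000)
}
```

The trade-off is the one described above: with a bounded queue, a producer that outruns the disk ends up waiting, much as it would with a lock, but only once the queue is full.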

I have written a simulator of a distributed system that runs a gazillion threads. Each thread produces trace records that are written to a single file. It is not uncommon for those records to appear out of order, e.g., the trace reports a message being received before it was sent. On rare occasions a record is incomplete, which I assume is caused by simultaneous writes.

These anomalies aren't a problem for me, since the trace is used only for debugging and analysis. You will have to be more careful than I have been if they are important to you.

Oooh, because of the "infinite buffer" of a standard channel. :bulb:
