Rust create sparse file?

Is there a way to create sparse files from Rust? By "sparse file" I mean a file where, if an entire page is all zeros, the OS does not allocate disk space for it.

Googling turns up various ways to create such files via truncate or dd. However, I can't find any way to create one from Rust (searching returns sparse linear algebra libraries).

[Context: for mmapped on-disk b-trees, if we drop an entire page, we don't want it to waste disk space.]

According to this article, you can create a sparse file by using lseek() to move the file's cursor to a particular location and the OS will automatically treat the space in between as empty...

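#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int create_sparse_file(const char *path, uint64_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1) {
        return -1;
    }
    /* Seek to the last byte; the skipped range becomes a hole. */
    if (size == 0 || lseek(fd, (off_t)(size - 1), SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    /* Writing one byte sets the file length to `size`. */
    if (write(fd, "\0", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}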

... and looking at the source code for std's filesystem APIs on *nix, it seems like we use lseek() (imported as lseek64()) when implementing Seek for std::fs::File.

I haven't tried it, but in theory you should be able to create a sparse file by seeking to the end location and writing a single byte there, as the C code does.
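For reference, here is a minimal Rust sketch of a related approach that uses File::set_len instead of seek-and-write. On Unix, set_len is implemented with ftruncate, and extending a file that way typically leaves the new range unallocated, though whether a hole is actually created depends on the filesystem. The helper name create_sparse_with_set_len is made up for illustration and is not a std API.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Creates (or truncates) `path` and extends it to `size` bytes without
/// writing any data. The new range reads back as zeros.
/// Hypothetical helper, named for this example only.
pub fn create_sparse_with_set_len(path: &Path, size: u64) -> io::Result<()> {
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    // Extending via set_len does not write zeros explicitly; the filesystem
    // decides whether the extended range is backed by disk blocks.
    file.set_len(size)?;
    Ok(())
}
```

Calling create_sparse_with_set_len(Path::new("sparse.bin"), 1 << 30) should give a file with a logical length of 1 GiB.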

The full patch was declined (issue 58635), but you can see a method of copying sparse files in Rust in this PR. There's also the hole-punch crate, which seems unmaintained but can likely be used as a reference.

Sorry for bumping an old thread, but I was just looking for this and I wanted to confirm that, at least on Linux, using seek and writing a dummy byte seems to work.

For example:

use std::fs::File;
use std::io::{Seek, SeekFrom, Write};

let mut file = File::create("sparse.rs").unwrap();
file.seek(SeekFrom::Start(1140549487)).unwrap();
file.write_all(&[0]).unwrap();

This will create a roughly 1.1 GB sparse file called sparse.rs in an unwrap-happy way :sweat_smile:
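To check whether a file like this really ended up sparse, one option on Unix is to compare its logical length with the number of blocks the filesystem reports as allocated. The sketch below assumes a Unix target (it relies on std::os::unix::fs::MetadataExt), and logical_and_allocated is a made-up helper name.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only extension trait
use std::path::Path;

/// Returns `(logical_len, allocated_bytes)` for `path`.
/// Hypothetical helper, named for this example only.
pub fn logical_and_allocated(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    // `blocks()` reports allocated space in 512-byte units,
    // independent of the filesystem's own block size.
    Ok((meta.len(), meta.blocks() * 512))
}
```

For a sparse file, the second value should come out much smaller than the first. The exact allocated size depends on the filesystem, so treat it as a rough indicator.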

This is my fault for titling the post 'create sparse file?'

I actually need something stronger.

Suppose we mmap a file and write some data to page N.

Then, at some later point, we zero all of page N.

At this point, I want page N to be "sparse" in the file.

=====

In practice, this shows up in database code, where after we reclaim a page, we zero it and want the filesystem to store the all-zero page "sparsely".

On Linux, you can use the (non-POSIX) fallocate system call. See the "Deallocating file space" section of its man page, which covers the FALLOC_FL_PUNCH_HOLE mode.
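A rough sketch of what that could look like from Rust, without pulling in any crate: it declares fallocate by hand and assumes a 64-bit Linux target, where off_t is a 64-bit signed integer (the libc crate would be the more usual way to get this binding). The flag values are the ones from linux/falloc.h, and punch_hole is a made-up helper name.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Flag values from <linux/falloc.h>.
const FALLOC_FL_KEEP_SIZE: c_int = 0x01;
const FALLOC_FL_PUNCH_HOLE: c_int = 0x02;

extern "C" {
    // Assumption: 64-bit Linux, so off_t is i64.
    fn fallocate(fd: c_int, mode: c_int, offset: i64, len: i64) -> c_int;
}

/// Deallocates `len` bytes starting at `offset` in `file`.
/// The file length is unchanged and the range reads back as zeros.
/// Hypothetical helper, named for this example only.
pub fn punch_hole(file: &File, offset: u64, len: u64) -> io::Result<()> {
    if offset > i64::MAX as u64 || len > i64::MAX as u64 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "offset or len does not fit in off_t",
        ));
    }
    // PUNCH_HOLE must be combined with KEEP_SIZE, per the man page.
    let mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
    // SAFETY: the fd is valid for the lifetime of `file`, and fallocate
    // does not touch any memory owned by this process.
    let rc = unsafe { fallocate(file.as_raw_fd(), mode, offset as i64, len as i64) };
    if rc == 0 {
        Ok(())
    } else {
        // Typically EOPNOTSUPP if the filesystem cannot punch holes.
        Err(io::Error::last_os_error())
    }
}
```

Per the man page, whole filesystem blocks inside the range are removed and partial blocks are only zeroed, so for the reclaimed-page case you would pass a page-aligned offset and length, assuming the page size is a multiple of the filesystem block size. The file must be open for writing.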
