Upload only processed artifact to crates.io

I don't think the issue below has a 'native' solution, but I wanted to make sure and ask for workarounds.

So there's a repository containing the following pieces:

  • some raw, unprocessed data. Call it RAW.
    • the single source of truth and tracked in version control. Cannot just be "thrown away". Might change over time.
    • required for the crate to work (this is a hybrid crate, with a thin main.rs in front of a lib.rs)
    • large, beyond the crates.io upload limit (currently 10 MB, AFAIK)
  • a build.rs step to convert that raw data to something more tangible. Let's call that DATA.
    • DATA is all that end users of the crate, and library users, ever need. They wouldn't need to care about RAW
    • DATA is much smaller than RAW, below the size limit of crates.io
    • has to be regenerated whenever RAW changes
  • DATA is used via include_bytes! inside the crate's source code

As a diagram, the (desired) situation is:

+---------------- GitHub repo ------------------+
|                                               |
|      +-----+  "Large,                         |
|      | RAW |   potentially > 10 MB,           |
|      |     |   Single Source of Truth"        |
|      +-----+                                  |
|         |                                     |
|         v                                     |
|     +----------+                              |
|     | build.rs |  "Processes file"            |
|     +----------+                              |
|        |                                      |
|  +---- | ------ crates.io ------------------+ |
|  |     |                                    | |
|  |     v                                    | |
|  |  +------+    "Much smaller               | |
|  |  | DATA |     than RAW, < 10 MB"         | |
|  |  +------+                                | |
|  |     |                                    | |
|  |     v                                    | |
|  |  +-------------+                         | |
|  |  | src/lib.rs  |  "include_bytes(DATA)"  | |
|  |  +-------------+                         | |
|  +------------------------------------------+ |
+-----------------------------------------------+

I reckon users (bin and lib) cannot build and use the project this way?

I see two cases:

  1. build.rs not on crates.io: Can build.rs simply be left out of the upload (excluded in Cargo.toml)? That doesn't seem right. Plus, it places artifacts like DATA into OUT_DIR, which is not a location that cargo package can include files from.

    On the other hand, all build.rs does is generate DATA, so if the latter is already present somehow, src/lib.rs won't care build.rs is missing.

  2. build.rs on crates.io (the normal case anyway): after the crate is downloaded from crates.io via cargo install or cargo add, build.rs will run either way, but it won't find RAW, so the build fails.

    Including a network call for downloading RAW, as one potential solution, seems unfortunate.

It feels wrong to burden end users with RAW and its weight when all they care about is the crate's public API (they won't even notice DATA unless they look closely). What could be done?

build.rs exists specifically to run code after the crate has been downloaded from the registry, so if you don't want that to happen, you could move that code into a separate crate (as its main.rs) instead, and run it whenever you change RAW.

build.rs could skip processing if RAW is missing and DATA is present.
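
A minimal sketch of such a build.rs, under several assumptions that are not from the thread: a prebuilt copy of DATA ships inside the package at the hypothetical path data/data.bin, RAW lives at the hypothetical path raw/input.bin and is left out of the published package, and src/lib.rs reads the result from OUT_DIR (for example via include_bytes!(concat!(env!("OUT_DIR"), "/data.bin"))). The process function is only a placeholder for the real conversion. The sketch does not say how the shipped copy of DATA gets produced and kept in sync with RAW.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real RAW -> DATA processing.
/// Here it only drops ASCII whitespace bytes.
fn process(raw: &[u8]) -> Vec<u8> {
    raw.iter().copied().filter(|b| !b.is_ascii_whitespace()).collect()
}

/// Writes DATA to `out`. Returns true if it was regenerated from RAW,
/// false if the prebuilt copy shipped with the package was reused.
fn prepare_data(raw: &Path, shipped: &Path, out: &Path) -> io::Result<bool> {
    if raw.exists() {
        // Repository checkout: RAW is available, so regenerate DATA.
        fs::write(out, process(&fs::read(raw)?))?;
        Ok(true)
    } else if shipped.exists() {
        // Published package: RAW is missing, fall back to the shipped DATA.
        fs::copy(shipped, out)?;
        Ok(false)
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            "neither RAW nor a prebuilt DATA was found",
        ))
    }
}

fn main() -> io::Result<()> {
    // Hypothetical paths, relative to the package root (where build.rs runs).
    let raw = Path::new("raw/input.bin");
    let shipped = Path::new("data/data.bin");

    // Ask Cargo to rerun this script when an input that is present changes.
    for input in [raw, shipped] {
        if input.exists() {
            println!("cargo:rerun-if-changed={}", input.display());
        }
    }

    // Cargo sets OUT_DIR for build scripts; outside of Cargo this is an error.
    let out_dir = env::var_os("OUT_DIR")
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "OUT_DIR is not set"))?;
    prepare_data(raw, shipped, &PathBuf::from(out_dir).join("data.bin"))?;
    Ok(())
}
```

Writing only into OUT_DIR keeps the build script from modifying the source tree; the price is that the package has to carry a prebuilt DATA somewhere else.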

Hmmm. That doesn't solve the OUT_DIR problem.

For this I use a separate bin crate in the workspace, which is never itself published to crates.io. It just converts the "raw" input into a data file and puts it in the real crate's src dir, which is then published together with the crate.
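
A rough sketch of what the main.rs of such a generator crate might look like. The paths raw/input.txt and mycrate/src/data.bin, the crate name mycrate, and the convert function are all hypothetical placeholders, not taken from the setup described above; the real conversion would go where convert is.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the real RAW -> DATA conversion.
/// Here it keeps only the trimmed, non-empty lines of the input.
fn convert(raw: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for line in raw.lines().map(str::trim).filter(|l| !l.is_empty()) {
        out.extend_from_slice(line.as_bytes());
        out.push(b'\n');
    }
    out
}

/// Reads RAW, converts it, and writes DATA into the published crate's
/// source tree. Returns the number of bytes written.
fn generate(raw_path: &Path, data_path: &Path) -> io::Result<usize> {
    let raw = fs::read_to_string(raw_path)?;
    let data = convert(&raw);
    if let Some(parent) = data_path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(data_path, &data)?;
    Ok(data.len())
}

fn main() -> io::Result<()> {
    // Hypothetical layout; run from the workspace root whenever RAW changes.
    let written = generate(
        Path::new("raw/input.txt"),
        Path::new("mycrate/src/data.bin"),
    )?;
    println!("wrote {written} bytes of DATA");
    Ok(())
}
```

With this layout, the published crate would need no build.rs at all: src/lib.rs could use include_bytes!("data.bin"), which resolves relative to the file containing the macro call, and the generator's Cargo.toml could set publish = false so it is never uploaded by accident. The generated file has to be present in the crate's directory at packaging time so that cargo package picks it up.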
