Install Duplicates About 4.7% of .rustup on Windows

This is partially a follow-up to "Correct Way to Install Rust on Windows for Multiple Users?".

I finally got around to modifying "FindDuplicateFiles" for this purpose and when I inspect the .rustup directory tree, I see about 4.6 to 4.8% of the space is wasted with duplicates. This, by the way, is very low compared to many installations, so this is not a complaint. It's just information.

I did a clean, default install of 1.68.2.

The .cargo tree has 13 identical files in the "bin" subdirectory, but they are all hard-linked, so no space is wasted. The summary for .rustup is below. I can supply the details if there is any interest.
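For readers who want to check the hard-link claim on their own machine, here is a minimal sketch in Python. It is not part of FindDuplicateFiles. The file names are stand-ins created in a temporary directory, and `are_hard_linked` is a hypothetical helper name.

```python
import os
import tempfile


def are_hard_linked(path_a, path_b):
    """Return True when both paths refer to the same file on disk."""
    return os.path.samefile(path_a, path_b)


def demo():
    """Create one file, one hard link to it, and one independent copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "cargo.exe")  # stand-in name only
        linked = os.path.join(tmp, "rustc.exe")    # stand-in name only
        copied = os.path.join(tmp, "copy.exe")     # stand-in name only
        with open(original, "wb") as f:
            f.write(b"proxy binary stand-in")
        os.link(original, linked)  # second directory entry, same data
        with open(copied, "wb") as f:
            f.write(b"proxy binary stand-in")  # same bytes, separate file
        return (
            are_hard_linked(original, linked),
            are_hard_linked(original, copied),
            os.stat(original).st_nlink,  # number of names for this file
        )


if __name__ == "__main__":
    print(demo())
```

Pointing `are_hard_linked` at two of the executables under .cargo\bin should show whether they share storage. A link count above 1 is the same signal the report calls "already hardlinked".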

GP

:: Processed 34,560 files with 2,316 groups, 30,212 potential duplicates, and 880 actual duplicates in 212 groups
::     saving 50,030,831 (52,273,152 on Disk) bytes
::     out of 1,047,802,379 (1,137,864,704 on Disk) bytes
::     for a ratio of 4.8% (4.6% on Disk).
::     Detailed Usage:
::         Type     Groups    Files            Bytes
::         ----     ------    -----            -----
::         Total     6,258   34,560    1,047,802,379
::         Filesize  2,316   30,212      261,954,423
::         Content     212    1,092       72,140,512
::         Vol-Cont    212    1,092       72,140,512
::         SymLinks      0        0                0
::     Found 0 duplicate files on different Volumes with a total of 0 bytes.
::     Found 0 already hardlinked files with 0 (0 on Disk) bytes.
::     6258 Modulo Groups (Buckets) were occupied (2316 inspected) out of 32767.
::     ScanTime = 00:00:02.3225951, MatchTime = 00:00:15.2430169, TotalTime = 00:00:17.5656120,
::     0 directory Subtrees, 0 directory Contents, and 0 Files were not processed due to Exceptions.
::     Codes: F - First Copy on Volume,      L - Linked to First Copy,
::            S - Subsequent Copy on Volume, X - Linked to Subsequent Copy.
::            Only Code 'S' can result in space savings.
::
:: Find Duplicate Files: a Windows(tm) utility.
:: Copyright (c) 2009-2023 by Trailing Edge Technology, All Rights Reserved.
:: Run  End  Time: 3/28/2023 11:26:29 PM local and 3/29/2023 5:26:29 AM UTC.
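The percentages in the summary can be reproduced from the byte counts it prints. The short Python sketch below does that arithmetic. The reading of "880 actual duplicates" as the Content file count minus one kept copy per group is my own inference from the numbers, not something the tool documents.

```python
# Figures copied from the summary above.
saved_bytes = 50_030_831
saved_on_disk = 52_273_152
total_bytes = 1_047_802_379
total_on_disk = 1_137_864_704
content_files = 1_092   # files whose content matches another file
content_groups = 212    # groups of identical files


def percent(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"


logical_ratio = percent(saved_bytes, total_bytes)
on_disk_ratio = percent(saved_on_disk, total_on_disk)

# Assumption: one file per group has to stay, so only the rest are removable.
removable_copies = content_files - content_groups

if __name__ == "__main__":
    print(logical_ratio, on_disk_ratio, removable_copies)
```

The two ratios differ because the on-disk figures are rounded up to whole allocation units, which inflates the total more than the savings.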

For future reference, this may be of interest: Forum Code Formatting and Syntax Highlighting

I’ve already edited your post accordingly :wink: (though I’m not sure whether or not the **s that used to create bold text should stay, so feel free to follow up with a subsequent edit)


Thanks. I found the hint and edited in parallel. You won the race. GP

Do you know what the actual duplicated files are? The only ones I know of are ~/.rustup/toolchains/stable-x86_64-pc-windows-msvc/bin/std-*.dll, which is a copy of ~/.rustup/toolchains/stable-x86_64-pc-windows-msvc/lib/rustlib/x86_64-pc-windows-msvc/lib/std-*.dll, and similarly for test instead of std. The former copy is a dependency of rustc.exe itself, while the latter copy is the one your code is linked against. The distinction is important when cross-compiling and during the bootstrap process of rustc itself: in both cases, the std that user code is linked against is a different one from the one that rustc.exe is linked against.

Yes, I have a complete list. The list is about 1/4 MB. Few are DLLs. Most are .js, .css, or .html files. There are some .woff/woff2 and some .svg files.

I am not allowed to upload the file containing the full list as it is a text file (.txt).

I'm far from an expert with forums, and I don't know how to make it available. I don't have a website.

Edit 1: I sent a reply to the notification e-mail with the output of the FindDuplicateFiles program.
Edit 2: The e-mail was rejected due to its length.

Edit 3: See Post 9 for a link to the complete list of duplicate files.

GP

Here is the start of the list of duplicate files:

C:\Users\Glenn>dotnet "\Program Files\Trailing Edge Technology\FindDuplicateFiles.dll" .rustup
:: Find Duplicate Files: a Windows(tm) utility.
:: Copyright (c) 2009-2023 by Trailing Edge Technology, All Rights Reserved.
:: Run Start Time: 3/28/2023 11:26:11 PM local and 3/29/2023 5:26:11 AM UTC.
::
:: Command Line Parameters: '.rustup'.
::
:: Processing directory "C:\Users\Glenn\.rustup" (".rustup")
::
:: Searching for possible duplicates.
::
:: Possible Duplicates with FileSize = 26 (%32,767 = 26):
::   Actual Duplicates with HashCode = 0x66CAC4850B9CFE4D | 0xCA982C30A63FFE45 | 0x7A2726287D28026C | 0xBDDC871E37A9674F:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\unix\io\sidebar-items1.68.2.js"
::     1    2    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\prelude\rust_2021\sidebar-items1.68.2.js"
::     1    3    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\prelude\rust_2021\sidebar-items1.68.2.js"
::     1    4    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\prelude\rust_2024\sidebar-items1.68.2.js"
::     1    5    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\prelude\rust_2024\sidebar-items1.68.2.js"
::     1    6    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\prelude\rust_2015\sidebar-items1.68.2.js"
::     1    7    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\prelude\rust_2015\sidebar-items1.68.2.js"
::     1    8    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\prelude\rust_2018\sidebar-items1.68.2.js"
::     1    9    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\prelude\rust_2018\sidebar-items1.68.2.js"
::     1   10    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\windows\thread\sidebar-items1.68.2.js"
::     1   11    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\wasi\prelude\sidebar-items1.68.2.js"
::     1   12    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\io\prelude\sidebar-items1.68.2.js"
::     1   13    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\windows\prelude\sidebar-items1.68.2.js"
::     1   14    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\unix\prelude\sidebar-items1.68.2.js"
::     1   15    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\primitive\sidebar-items1.68.2.js"
::     1   16    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\primitive\sidebar-items1.68.2.js"
::     1   17    1    S           26  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\wasi\io\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 32,794 (%32,767 = 27):
::   Actual Duplicates with HashCode = 0x7C1B187FB7D5657A | 0xEC797EBB8293E752 | 0x5CB17CB2DAE18BA7 | 0x98CD6BCB0FF220E0:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F       32,794  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\arch\wasm32\sidebar-items1.68.2.js"
::     1    2    1    S       32,794  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\arch\wasm64\sidebar-items1.68.2.js"
::     1    3    1    S       32,794  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\arch\wasm\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 72 (%32,767 = 72):
::   Actual Duplicates with HashCode = 0x066A4B5664B54016 | 0x00250279733EFF1A | 0x5EA1E48D77680717 | 0x82CC83BB2C0EE53F:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\nomicon\.nojekyll"
::     1    2    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\unstable-book\.nojekyll"
::     1    3    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\book\2018-edition\.nojekyll"
::     1    4    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\reference\.nojekyll"
::     1    5    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\cargo\.nojekyll"
::     1    6    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\error_codes\.nojekyll"
::     1    7    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\book\first-edition\.nojekyll"
::     1    8    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\clippy\.nojekyll"
::     1    9    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\style-guide\.nojekyll"
::     1   10    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\book\second-edition\.nojekyll"
::     1   11    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\rust-by-example\.nojekyll"
::     1   12    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\rustc\.nojekyll"
::     1   13    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\edition-guide\.nojekyll"
::     1   14    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\rustdoc\.nojekyll"
::     1   15    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\embedded-book\.nojekyll"
::     1   16    1    S           72  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\book\.nojekyll"
::
:: Possible Duplicates with FileSize = 84 (%32,767 = 84):
::   Actual Duplicates with HashCode = 0x001601586DF736B0 | 0xCCD01E8C9639B30B | 0x46ABE2254BFB7EA9 | 0xD388AF7CDE8C71B4:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F           84  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\arch\mips64\sidebar-items1.68.2.js"
::     1    2    1    S           84  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\arch\mips\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 134 (%32,767 = 134):
::   Actual Duplicates with HashCode = 0x2E5E260C7CB443B2 | 0x552DFC65FA7CB0BB | 0x1707282C0CAD8385 | 0x67CE2D9EA1AA942A:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F          134  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\boxed\sidebar-items1.68.2.js"
::     1    2    1    S          134  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\alloc\boxed\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 147 (%32,767 = 147):
::   Actual Duplicates with HashCode = 0x9696B793B904CF3E | 0xAF2932EF714F5786 | 0x1DB65855817C6E55 | 0x1600DBBA534144EC:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F          147  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\core\pin\sidebar-items1.68.2.js"
::     1    2    1    S          147  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\pin\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 157 (%32,767 = 157):
::   Actual Duplicates with HashCode = 0x0B6E3F0DFA60FA57 | 0x1829C3A1F95A87B3 | 0x6884E2401D0816FC | 0x4B7A8801AB369C6C:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F          157  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\wasi\ffi\sidebar-items1.68.2.js"
::     1    2    1    S          157  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\std\os\unix\ffi\sidebar-items1.68.2.js"
::
:: Possible Duplicates with FileSize = 179 (%32,767 = 179):
::   Actual Duplicates with HashCode = 0x8DAF6C993D7E3074 | 0x31D8A3397D4F2481 | 0x7175E23C882F8B30 | 0x71382ECB330C11FB:
::   Vol Copy Link Code         Size  Name
::   ---  ---  ---  ---     --------  ----
::     1    1    1    F          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\unix\fs\trait.DirEntryExt2.js"
::     1    2    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\fd\owned\trait.AsFd.js"
::     1    3    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\fd\raw\trait.FromRawFd.js"
::     1    4    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\io\trait.Read.js"
::     1    5    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\fd\raw\trait.IntoRawFd.js"
::     1    6    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\io\trait.BufRead.js"
::     1    7    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\fd\raw\trait.AsRawFd.js"
::     1    8    1    S          179  "C:\Users\Glenn\.rustup\toolchains\stable-x86_64-pc-windows-msvc\share\doc\rust\html\implementors\std\os\windows\io\raw\trait.FromRawHandle.js"

This sample is not representative as it's ordered by filesize % 32767 due to the program's design.
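To illustrate why the listing comes out in that order, here is a rough Python sketch of the general approach the report suggests: group files by size, visit the sizes in order of size % 32767, and only hash files that share a size. This is not the actual FindDuplicateFiles code. SHA-256 and the function names are my assumptions, and the sketch ignores existing hard links and reads whole files into memory.

```python
import hashlib
import os
from collections import defaultdict

BUCKETS = 32_767  # modulus shown in the report as "%32,767"


def bucket_of(size):
    """Bucket index for a file size, as printed next to each FileSize."""
    return size % BUCKETS


def find_duplicates(root):
    """Return lists of paths with identical content, lowest bucket first."""
    by_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.islink(path):
                continue  # skip symbolic links
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for size in sorted(by_size, key=lambda s: (bucket_of(s), s)):
        paths = by_size[size]
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

With this ordering a 32,794-byte file lands in bucket 27, right after the 26-byte files, which matches the second group in the listing above.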

GP

GitHub gists support large text files without a problem, so in case you have a GitHub account, you could post there, and put a link here. The same could of course also be done with a lot of other text file hosters, e.g. pastebin.com comes to mind (and they don't even require an account IIRC).

If it's mostly docs stuff that is duplicated, then that sounds like something that could be addressed. Docs tend to be a lot of small files, which makes them relatively slow to install. So deduplicating the distributed docs could be a win for install times.

Then again it may be a lot of effort for too little gain. I'm not sure of the trade-offs here.

The list of files has been uploaded to PasteBin.com at: https://pastebin.com/z6UcTxNj
with tags Rust Install Windows Duplicate Files

Edit 1: I'd suggest downloading the file to avoid line-wrap.

GP

"Then again it may be a lot of effort for too little gain. I'm not sure of the trade-offs here."

I have no strong opinion here. I only brought it up to inform the more knowledgeable.

It's only ~50 MB and 880 files. But since .cargo/bin already uses hard links, it might be fairly easy to implement.
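As a rough idea of what such a step could look like, here is a small Python sketch that replaces one duplicate with a hard link to the copy being kept. This is only an illustration under my own assumptions, not how rustup installs anything. `link_duplicate` and the ".linktmp" suffix are hypothetical names.

```python
import filecmp
import os
import tempfile


def link_duplicate(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` if contents match.

    Returns the number of bytes that no longer need separate storage.
    """
    if os.path.samefile(keep, duplicate):
        return 0  # already the same file on disk
    if not filecmp.cmp(keep, duplicate, shallow=False):
        raise ValueError("files differ; refusing to link")
    size = os.path.getsize(duplicate)
    temp_name = duplicate + ".linktmp"  # hypothetical temporary name
    os.link(keep, temp_name)            # new name for the kept file
    os.replace(temp_name, duplicate)    # swap it in over the duplicate
    return size


def demo():
    """Link two identical 72-byte files, then try again."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.nojekyll")
        second = os.path.join(tmp, "second.nojekyll")
        for path in (first, second):
            with open(path, "wb") as f:
                f.write(b"x" * 72)
        saved = link_duplicate(first, second)
        again = link_duplicate(first, second)
        return saved, again, os.path.samefile(first, second)


if __name__ == "__main__":
    print(demo())
```

Two caveats apply to any scheme like this: hard links only work within one volume, and editing one name changes the content seen through every other name, so it only suits files that are never modified in place.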

GP
