Using the "image" crate, I've read a .png file, written it to a new .png file, and read it back. The image content matches.
(I checked by differencing the images in GIMP.)
The metadata, as displayed by metadataview.com, is the same, except for the checksum. The file header bytes match. The file lengths differ slightly, which is puzzling.
This is weird but OK. I just need a hash of the image for de-duplication purposes. The hash function of the image crate gives different hashes for these files. Is there some function that gives me a hash of just the image bytes? Yes, I could write one, but it would probably be slow.
(Tried to upload the images here, but this forum system converts them to JPEG, which would just confuse the issue. They're map tiles for a virtual world. They can change, but seldom do, so a mapping system needs to detect changes.)
The PNG format allows several different filter options, and it compresses the filtered data with DEFLATE, which has its own range of settings. Different DEFLATE implementations may also produce different outputs for the same uncompressed input.
Therefore, two compressed PNGs of the same image are not guaranteed to have the same bytes. There are even tools like pngcrush that will encode the same image with many different combinations of filters and compression settings in order to find the smallest possible output.
You should either compare decoded image data, or use an uncompressed file format, or ensure your images are always compressed by the same tool with the same algorithms and settings.
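As a rough illustration of the "compare decoded image data" option, here is a minimal sketch that hashes the dimensions and the decoded samples rather than the file bytes. It uses 64-bit FNV-1a only because it is short and gives the same value on every machine; that choice, and the names `fnv1a` and `pixel_hash`, are assumptions for the example rather than anything the image crate provides.

```rust
/// 64-bit FNV-1a over a byte slice, continuing from `state`.
fn fnv1a(mut state: u64, bytes: &[u8]) -> u64 {
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(PRIME);
    }
    state
}

/// Hash decoded pixels plus dimensions, ignoring how the file was compressed.
/// `pixels` is assumed to be tightly packed 8-bit samples, row by row
/// (for example RGBA8). Hypothetical helper, not part of any crate.
fn pixel_hash(width: u32, height: u32, pixels: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    let mut h = fnv1a(OFFSET_BASIS, &width.to_le_bytes());
    h = fnv1a(h, &height.to_le_bytes());
    fnv1a(h, pixels)
}
```

With the image crate, the decoded buffer could come from something like `image::open(path)?.to_rgba8()` followed by `as_raw()`. This is one pass over the pixels, so it should not be noticeably slower than decoding the file in the first place.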
Thank you. That's a big help, knowing that there are minor differences. I differenced the two in GIMP and saw solid black, but a slight difference in black level might not be noticed.
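A numeric comparison catches what the eye misses in a difference layer. The sketch below counts differing pixels and reports the largest per-channel difference between two decoded buffers. The `DiffStats` type and the `diff_stats` function are hypothetical names for this example, and it assumes 8-bit samples with the same layout in both buffers.

```rust
/// Summary of how two equally sized sample buffers differ.
#[derive(Debug, PartialEq)]
struct DiffStats {
    differing_pixels: usize,
    max_channel_delta: u8,
}

/// Compare two decoded buffers with `channels` 8-bit samples per pixel.
/// Returns `None` if the lengths differ or are not a whole number of pixels.
fn diff_stats(a: &[u8], b: &[u8], channels: usize) -> Option<DiffStats> {
    if channels == 0 || a.len() != b.len() || a.len() % channels != 0 {
        return None;
    }
    let mut stats = DiffStats { differing_pixels: 0, max_channel_delta: 0 };
    for (pa, pb) in a.chunks_exact(channels).zip(b.chunks_exact(channels)) {
        // Largest absolute difference among this pixel's channels.
        let delta = pa.iter().zip(pb).map(|(x, y)| x.abs_diff(*y)).max().unwrap_or(0);
        if delta > 0 {
            stats.differing_pixels += 1;
            stats.max_channel_delta = stats.max_channel_delta.max(delta);
        }
    }
    Some(stats)
}
```

A result with a small `max_channel_delta` and many differing pixels would point at lossy decoding or color conversion rather than at the PNG encoder.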
There's a long list of individual differences (1169 pixels in total), but all of them are within ~2 levels per color channel. Was the image resized or color-converted somewhere in the pipeline? I can't replicate it with a load/save pair myself.
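// Round-trip a PNG through the image crate and check that the decoded pixels survive.
fn main() {
    let input = image::open("/tmp/imga.png").unwrap();
    input.save("/tmp/imgc.png").unwrap();
    let outcome = image::open("/tmp/imgc.png").unwrap();
    assert_eq!(input, outcome, "holds on my machine");
}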
Cargo.lock for that test project:
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4
[[package]]
name = "adler2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa"
[[package]]
name = "autocfg"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
[[package]]
name = "bitflags"
version = "2.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843867be96c8daad0d758b57df9392b6d8d271134fce549de6ce169ff98a92af"
[[package]]
name = "bytemuck"
version = "1.25.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8efb64bd706a16a1bdde310ae86b351e4d21550d98d056f22f8a7f7a2183fec"
[[package]]
name = "byteorder-lite"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8f1fe948ff07f4bd06c30984e69f5b4899c516a3ef74f34df92a2df2ab535495"
[[package]]
name = "cfg-if"
version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "crc32fast"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9481c1c90cbf2ac953f07c8d4a58aa3945c425b7185c9154d67a65e4230da511"
dependencies = [
"cfg-if",
]
[[package]]
name = "fdeflate"
version = "0.3.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e6853b52649d4ac5c0bd02320cddc5ba956bdb407c4b75a2c6b75bf51500f8c"
dependencies = [
"simd-adler32",
]
[[package]]
name = "flate2"
version = "1.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843fba2746e448b37e26a819579957415c8cef339bf08564fe8b7ddbd959573c"
dependencies = [
"crc32fast",
"miniz_oxide",
]
[[package]]
name = "image"
version = "0.25.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6506c6c10786659413faa717ceebcb8f70731c0a60cbae39795fdf114519c1a"
dependencies = [
"bytemuck",
"byteorder-lite",
"moxcms",
"num-traits",
"png",
]
[[package]]
name = "miniz_oxide"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316"
dependencies = [
"adler2",
"simd-adler32",
]
[[package]]
name = "moxcms"
version = "0.7.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac9557c559cd6fc9867e122e20d2cbefc9ca29d80d027a8e39310920ed2f0a97"
dependencies = [
"num-traits",
"pxfm",
]
[[package]]
name = "num-traits"
version = "0.2.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841"
dependencies = [
"autocfg",
]
[[package]]
name = "png"
version = "0.18.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "60769b8b31b2a9f263dae2776c37b1b28ae246943cf719eb6946a1db05128a61"
dependencies = [
"bitflags",
"crc32fast",
"fdeflate",
"flate2",
"miniz_oxide",
]
[[package]]
name = "pxfm"
version = "0.1.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7186d3822593aa4393561d186d1393b3923e9d6163d3fbfd6e825e3e6cf3e6a8"
dependencies = [
"num-traits",
]
[[package]]
name = "rewrite"
version = "0.1.0"
dependencies = [
"image",
]
[[package]]
name = "simd-adler32"
version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e320a6c5ad31d271ad523dcf3ad13e2767ad8b1cb8f047f75a8aeaf8da139da2"
Those images are actually being fetched from a remote server (not mine) as .jpg images, decompressed, and recompressed as .png. Something in that chain is not precisely repeatable. It may not be in the Rust side at all. I assumed it was my code that was non-repeatable, but that may not be the case.
I read the files from a server as .jpg, and that part of the operation may not be exactly repeatable. Right now I don't think it's the .png compression. It may be the .jpg decompression, or even inconsistency between the servers of the server farm. Anyway, thanks for helping chase that down.
(What I'm doing: I take the map tiles of Second Life, get elevation data from another source, turn them into 3D objects, and use a new Second Life client to display them for distant terrain areas. The effect is like Google Earth or GTA V; you can see a very long distance. The present system has a very limited view distance.)
JPEG doesn't have a precisely defined decoding. It involves math on cosines in a non-RGB color space, which in practical implementations has plenty of approximations and rounding steps that are implementation-specific. You will get different results even depending on your CPU, because optimized assembly for different CPUs may perform operations in a different order, which will round numbers differently.
On top of that, PNG and JPEG files may contain color profiles, which also involve non-linear math and transforms. Implementations of those have many approximations that depend on your libraries, OS, and possibly even the type of display you're using (a color processing library will choose different sizes of lookup tables, interpolation methods, or different transform algorithms depending on the similarity between the file's color profile and the color space used by your graphics program/OS/display).
Color profiles make PNG effectively (slightly) lossy when implementations apply the embedded color profile to convert pixels into software's internal/working color space for display and editing purposes, and then back to the original color space defined in the file when saving.
Deflate compression used internally in PNG requires making decisions for every byte, which affect overall compression efficiency, but in ways that are impossible to fully predict, so every implementation (and every compression setting they have) is a bag of tricks and heuristics, and they won't all make exactly the same decisions for millions of bytes.
In PNG you can compare/hash raw pixel values without any color profiles applied. You may want to normalize color modes first (e.g. expand grayscale mode to RGB), so that images with effectively the same pixels, just stored in a more compact mode, compare as equal.
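To illustrate the normalization step, here is a small sketch that expands a few 8-bit layouts to RGBA8 before comparing or hashing. The `Layout` enum and `normalize_to_rgba8` are made-up names for this example; it assumes 8 bits per sample and does not handle palettes or 16-bit data.

```rust
/// Sample layouts this sketch knows about (8 bits per sample).
#[derive(Clone, Copy)]
enum Layout {
    Gray8,
    GrayAlpha8,
    Rgb8,
    Rgba8,
}

/// Expand any supported layout to RGBA8 so equal pixels give equal bytes.
/// Gray becomes R = G = B, and a missing alpha channel becomes fully opaque.
fn normalize_to_rgba8(layout: Layout, data: &[u8]) -> Vec<u8> {
    match layout {
        Layout::Gray8 => data.iter().flat_map(|&g| [g, g, g, 255]).collect(),
        Layout::GrayAlpha8 => data
            .chunks_exact(2)
            .flat_map(|p| [p[0], p[0], p[0], p[1]])
            .collect(),
        Layout::Rgb8 => data
            .chunks_exact(3)
            .flat_map(|p| [p[0], p[1], p[2], 255])
            .collect(),
        Layout::Rgba8 => data.to_vec(),
    }
}
```

If you already decode with the image crate, `DynamicImage::to_rgba8()` performs this kind of conversion for you, so a hand-written version like this is mainly useful for seeing what the normalization does.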
In JPEG the least bad thing you can do is compare the quantized coefficients of the raw YUV channels, which eliminates issues caused by non-integer math and lossless compression optimizations. It won't be robust against even simple re-saving of a file, and there's no simple way around that. There are perceptual hash algorithms that can tolerate some differences, at the cost of having false positives.
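For a feel of what a perceptual hash does, below is a minimal sketch of one of the simplest variants, an "average hash" over an 8x8 grid. It is only an illustration under stated assumptions: the input is an 8-bit grayscale buffer, the function names are made up, and real perceptual hashing libraries use more careful resampling and stronger algorithms.

```rust
/// Average hash of an 8-bit grayscale image: 64 bits, one per cell of an 8x8 grid.
/// Returns `None` if the buffer size does not match or the image is smaller than 8x8.
fn average_hash(width: usize, height: usize, gray: &[u8]) -> Option<u64> {
    if width < 8 || height < 8 || gray.len() != width * height {
        return None;
    }
    // Mean brightness of each grid cell.
    let mut cells = [0u64; 64];
    for cy in 0..8 {
        let (y0, y1) = (cy * height / 8, (cy + 1) * height / 8);
        for cx in 0..8 {
            let (x0, x1) = (cx * width / 8, (cx + 1) * width / 8);
            let mut sum = 0u64;
            for y in y0..y1 {
                for x in x0..x1 {
                    sum += u64::from(gray[y * width + x]);
                }
            }
            cells[cy * 8 + cx] = sum / ((y1 - y0) * (x1 - x0)) as u64;
        }
    }
    let mean = cells.iter().sum::<u64>() / 64;
    // Bit i is set when cell i is brighter than the mean of all cells.
    let mut bits = 0u64;
    for (i, &c) in cells.iter().enumerate() {
        if c > mean {
            bits |= 1u64 << i;
        }
    }
    Some(bits)
}

/// Number of bits in which two hashes differ; small values suggest similar images.
fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Two tiles would then be treated as "the same" when the Hamming distance is below some threshold you pick. That threshold is where the false positives come from: a genuinely changed tile with a similar brightness layout can land under it.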
The supply chain for this image is Second Life mapping system -> AWS bucket -> Akamai CDN cache -> Rust "image" JPG decoder. Something in that chain is almost, but not quite, repeatable. Probably not a Rust problem.
Almost definitely not a Rust problem, after more analysis. It looks more like Akamai caches behind a load balancer being out of sync. I have a test program re-reading the same URL, and I get one of two different results, somewhat randomly.
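The kind of check described here can be sketched as a tally of distinct responses. The fetching itself is left out; the function below is a hypothetical helper that assumes you have already collected the raw response bodies from repeated requests to the same URL.

```rust
/// Count how many times each distinct payload appears among repeated fetches.
/// Returns (payload length in bytes, occurrence count) pairs, most frequent first.
fn tally_variants(samples: &[Vec<u8>]) -> Vec<(usize, usize)> {
    let mut counts: std::collections::BTreeMap<&[u8], usize> =
        std::collections::BTreeMap::new();
    for s in samples {
        *counts.entry(s.as_slice()).or_insert(0) += 1;
    }
    let mut out: Vec<(usize, usize)> =
        counts.iter().map(|(k, &n)| (k.len(), n)).collect();
    // Stable sort by descending count.
    out.sort_by(|a, b| b.1.cmp(&a.1));
    out
}
```

Exactly two entries in the result, each seen many times, would fit the picture of two cache nodes serving different copies of the same tile.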