Read Compressed, Little-Endian File on Big-Endian Machine

travis · March 23, 2020, 3:52am

I've got a binary file that's compressed with LZSS, and the data is stored in little-endian layout. I'm trying to decode the file on my PC, which is big-endian, but I haven't had any luck with that. I'm using the compression crate to decode the LZSS data. Does anyone have any ideas for getting around the endianness? Right now, it seems like I'm better off learning the LZSS compression algorithm and writing my own decoder that can take endianness into account. Either that, or run my program on a Raspberry Pi, since it's little-endian. I'm trying the Pi route right now to see if I can rule out endianness.

Hyeonu · March 23, 2020, 4:06am

Both filesystems and most compression algorithms, including LZSS, operate on byte streams and don't care about endianness.

What does "little-endian layout" mean here? Does it mean UTF-16LE-encoded text data?

travis · March 23, 2020, 4:32am

By "little-endian layout" I only meant that if the compression algorithm needed to interpret multiple bytes as numeric data at any point (i16, u16, i32, u32, f32, f64, etc.), then the bytes would be read in reverse order on a big-endian machine. The header data of the image files contained in those LZSS blobs needed to be reversed.
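For multi-byte header fields like these, one host-independent approach is to build each value from its bytes explicitly instead of reinterpreting memory. A minimal sketch, assuming a purely hypothetical 8-byte header (`width`, `height`, `data_len` are illustrative names, not the real format of these image files):

```rust
/// Hypothetical header layout, for illustration only:
/// u16 width, u16 height, u32 payload length, all stored little-endian.
#[derive(Debug, PartialEq)]
struct Header {
    width: u16,
    height: u16,
    data_len: u32,
}

/// Parses the first 8 bytes as little-endian fields.
/// Returns `None` if the slice is too short.
fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 8 {
        return None;
    }
    Some(Header {
        // `from_le_bytes` gives the same result on little- and big-endian hosts.
        width: u16::from_le_bytes([bytes[0], bytes[1]]),
        height: u16::from_le_bytes([bytes[2], bytes[3]]),
        data_len: u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]),
    })
}
```

Because the byte order is named in the code rather than inherited from the CPU, no manual reversing is needed on either kind of machine.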

It sounds like this isn't the case though. I could be using the compression crate incorrectly since I'm still figuring out how LZSS works. I don't know where to get the sizes for the window size, search buffer, or look-ahead. Not sure if they're contained within the files themselves, or if it's a hard-coded constant that just needs to be "known" for all of those files.
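In many LZSS variants these sizes are fixed constants of the format rather than values stored in the file. The sketch below decodes one commonly seen parameterization (4096-byte window, matches of 3 to 18 bytes, flag bits read from the least significant bit, window pre-filled with spaces). All of these values are assumptions for illustration; whether the files in question use them is unknown, and this is not the compression crate's API.

```rust
// Assumed parameters of one common LZSS variant; other formats differ.
const WINDOW_SIZE: usize = 4096; // size of the sliding window (ring buffer)
const MAX_MATCH: usize = 18;     // longest back-reference
const THRESHOLD: usize = 2;      // matches this short are stored as literals

/// Decodes an LZSS stream under the assumptions above.
/// Stops quietly when the input runs out.
fn lzss_decode(input: &[u8]) -> Vec<u8> {
    let mut window = [0x20u8; WINDOW_SIZE];
    let mut pos = WINDOW_SIZE - MAX_MATCH; // assumed initial write position
    let mut out = Vec::new();
    let mut bytes = input.iter().copied();

    while let Some(flags) = bytes.next() {
        // Each flag byte describes the next 8 items, lowest bit first.
        for bit in 0..8 {
            if flags & (1 << bit) != 0 {
                // Flag bit 1: a literal byte.
                let Some(b) = bytes.next() else { return out };
                out.push(b);
                window[pos] = b;
                pos = (pos + 1) % WINDOW_SIZE;
            } else {
                // Flag bit 0: a 2-byte reference with a 12-bit window offset
                // and a 4-bit length, assembled with shifts and masks.
                let (Some(lo), Some(hi)) = (bytes.next(), bytes.next()) else {
                    return out;
                };
                let offset = lo as usize | ((hi as usize & 0xF0) << 4);
                let len = (hi as usize & 0x0F) + THRESHOLD + 1;
                for k in 0..len {
                    let b = window[(offset + k) % WINDOW_SIZE];
                    out.push(b);
                    window[pos] = b;
                    pos = (pos + 1) % WINDOW_SIZE;
                }
            }
        }
    }
    out
}
```

Note that the reference is read one byte at a time and combined with shifts, so the decoder behaves the same on little- and big-endian hosts. If the output is garbage, the constants (or the bit order of the flags) are the first things to vary.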

RAOF · March 23, 2020, 8:56am

Typically, on-disk formats (i.e., any format that might be seen by another computer) will have their layouts explicitly specified, and well-written libraries for them will handle endianness correctly as a consequence.

This is especially so for compression formats, where compact representation is paramount, and so you tend to have things like “these 5 bits are the offset into array $foo, these 3 bits are the 0-based bias for the $frobinator”; there is no concept of endianness there!
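As a rough illustration of that kind of bit-level layout, here is a sketch that splits a single byte into a 5-bit field and a 3-bit field. Putting the 5-bit field in the high bits is an assumption made for the example; a real format's specification would say which bits come first.

```rust
/// Splits one byte into a 5-bit field (high bits) and a 3-bit field (low bits).
/// The field order is an assumption for illustration.
fn split_fields(byte: u8) -> (u8, u8) {
    let offset = byte >> 3;        // top 5 bits, range 0..=31
    let bias = byte & 0b0000_0111; // low 3 bits, range 0..=7
    (offset, bias)
}
```

Everything here happens inside one byte, so the host's byte order never enters into it.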

travis · March 23, 2020, 6:57pm

Ok, I see. So compression generally doesn't have concepts of 16-bit and 32-bit data. It's all just raw bits and sometimes bytes, and if operations span multiple bytes, it's about jumping between bytes, never really treating multiple bytes as one solid concept such as a 16-bit or 32-bit datatype.

I think I understand now. I haven't really done anything with compression, so these concepts are really new to me. I'm trying to decompress some image files from the PSP version of Final Fantasy IV, and this is somehow the first time I've ever had to learn about compression algorithms.

system · June 21, 2020, 6:57pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
