Orz - compress better than bzip2 and faster than gzip

#1

https://github.com/richox/orz

orz is a data compressor based on an optimized ROLZ/Huffman algorithm. On most benchmark data it now compresses better than bzip2 and runs faster than gzip.
orz is under active development, and we are looking for ways to improve its performance. Suggestions and contributions are very welcome.

#2

Thanks for making this!

Can it be used as a library and for stream compression like gzip? E.g. to wrap an AsyncRead/AsyncWrite bytestream so that it is compressed/decompressed?

From the table in the repo, though, it looks like decompression speed is where orz faces the real competition…

#3

What about pbzip2? As far as I know, it uses the same algorithm as the ordinary bzip2 utility,
but because it runs in parallel it can use all CPU cores, so it is roughly n times faster than bzip2. So is orz a single-threaded or a multi-threaded program?

#4

orz is currently single-threaded. It can be parallelized by simply splitting the input data into blocks. As @najamelan said, I will try to split it into a compression library and a CLI app in the future, and apply parallelization in the CLI.
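
To make the block-splitting idea concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: `compress_block` is a toy run-length encoder standing in for a real per-block compressor, and none of these function names come from orz. It spawns one thread per block, which a real implementation would probably cap at the number of cores.

```rust
use std::thread;

/// Stand-in for a real block compressor (NOT orz's API):
/// trivial run-length encoding as (run length, byte) pairs.
fn compress_block(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < block.len() {
        let byte = block[i];
        let mut run = 1usize;
        while i + run < block.len() && block[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of the toy encoder above.
fn decompress_block(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in data.chunks(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Compress each fixed-size block on its own scoped thread.
/// Results are collected in input order. `block_size` must be non-zero.
fn compress_parallel(input: &[u8], block_size: usize) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(block_size)
            .map(|block| s.spawn(move || compress_block(block)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Decompress the blocks in order and concatenate them.
fn decompress_all(blocks: &[Vec<u8>]) -> Vec<u8> {
    blocks.iter().flat_map(|b| decompress_block(b)).collect()
}

fn main() {
    let input: Vec<u8> = b"aaaabbbbccccdddd".repeat(1000);
    let blocks = compress_parallel(&input, 4096);
    assert_eq!(decompress_all(&blocks), input);
    let packed: usize = blocks.iter().map(Vec::len).sum();
    println!("{} blocks, {} -> {} bytes", blocks.len(), input.len(), packed);
}
```

The trade-off is that each block is compressed without seeing the others, so matches across block boundaries are lost and the ratio usually gets a little worse as blocks get smaller.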

#5

It might be useful to implement the parallelization in the library. Clients might be interested in that. You can provide an API where the user can choose whether to use multiple threads or not. I think that with rayon it should be straightforward to have a reliable implementation (I mean cross platform out of the box etc).

You can always hide features like this behind a feature flag to avoid imposing the dependencies on users that don’t need them.
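
A rough sketch of what that could look like, assuming a hypothetical Cargo feature named `parallel` that pulls in rayon as an optional dependency. `compress_block` and `compress_blocks` are made-up names, and the placeholder just copies its input. Without the feature the sequential version is compiled and rayon is never referenced.

```rust
/// Hypothetical per-block compressor; a placeholder that copies its input,
/// not orz's API.
fn compress_block(block: &[u8]) -> Vec<u8> {
    block.to_vec()
}

/// Built only with `--features parallel` (hypothetical feature name);
/// rayon's `par_iter` spreads the blocks over its thread pool and
/// `collect` keeps them in input order.
#[cfg(feature = "parallel")]
fn compress_blocks(blocks: &[&[u8]]) -> Vec<Vec<u8>> {
    use rayon::prelude::*;
    blocks.par_iter().map(|b| compress_block(b)).collect()
}

/// Default build: same signature, plain sequential loop, no extra dependency.
#[cfg(not(feature = "parallel"))]
fn compress_blocks(blocks: &[&[u8]]) -> Vec<Vec<u8>> {
    blocks.iter().map(|b| compress_block(b)).collect()
}

fn main() {
    let first: &[u8] = b"hello ";
    let second: &[u8] = b"world";
    let out = compress_blocks(&[first, second]);
    println!("compressed {} blocks", out.len());
}
```

Callers see the same function either way, so the thread-count choice can be layered on top later without breaking users who build without the feature.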

I generally put everything in a lib, except main and clap (command line argument parsing) specific code.

#6

I think you should mention the --silent command line argument within the --help message :nerd_face:

Also, what extension do you suggest for the encoded result? .orz?

Quick test:

encode                                                                                                  
======                                                                                                  
                                                                                                        
time bzip2 for_bzip2.pbrt                                                                               
84.173u 0.255s 1:24.45 99.9%  0+0k 0+99000io 0pf+0w                                                     
time gzip for_gzip.pbrt                                                                                 
18.442u 0.235s 0:18.68 99.9%  0+0k 0+127896io 0pf+0w                                                    
time orz encode --silent for_orz.pbrt for_orz.pbrt.orz                                                  
12.138u 0.194s 0:12.33 99.9%  0+0k 0+111872io 0pf+0w                                                    
                                                                                                        
total 484424                                                                                            
-rw-r--r-- 1 jan users  50687667 Apr 17 11:26 for_bzip2.pbrt.bz2                                        
-rw-r--r-- 1 jan users  65482538 Apr 17 11:25 for_gzip.pbrt.gz                                          
-rw-r--r-- 1 jan users 322597185 Apr 17 11:26 for_orz.pbrt                                              
-rw-rw-rw- 1 jan users  57271395 Apr 17 11:44 for_orz.pbrt.orz                                          
                                                                                                        
decode                                                                                                  
======                                                                                                  
                                                                                                        
time bunzip2 for_bzip2.pbrt.bz2                                                                         
12.664u 0.652s 0:13.32 99.9%  0+0k 0+630080io 0pf+0w                                                    
time gunzip for_gzip.pbrt.gz                                                                            
2.583u 0.312s 0:02.89 100.0%  0+0k 0+630080io 0pf+0w                                                    
time orz decode --silent for_orz.pbrt.orz from_orz.pbrt                                                 
2.986u 0.294s 0:03.28 99.6% 0+0k 0+630080io 0pf+0w                                                      
                                                                                                        
total 1316096                                                                                           
-rw-r--r-- 1 jan users 322597185 Apr 17 11:26 for_bzip2.pbrt                                            
-rw-r--r-- 1 jan users 322597185 Apr 17 11:25 for_gzip.pbrt                                             
-rw-r--r-- 1 jan users 322597185 Apr 17 11:26 for_orz.pbrt                                              
-rw-rw-rw- 1 jan users  57271395 Apr 17 11:44 for_orz.pbrt.orz                                          
-rw-rw-rw- 1 jan users 322597185 Apr 17 11:47 from_orz.pbrt                                             
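
For anyone who wants the numbers above as ratios and throughput, here is a small sketch that computes them. The sizes and elapsed times are copied from the listings; the helper names are made up for this example, and 1 MB is taken as 10^6 bytes.

```rust
/// Compressed size as a percentage of the original size.
fn ratio_percent(compressed: u64, original: u64) -> f64 {
    compressed as f64 / original as f64 * 100.0
}

/// Throughput in MB/s (1 MB = 10^6 bytes) over the uncompressed size.
fn throughput_mb_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / 1e6 / seconds
}

fn main() {
    let original = 322_597_185u64;
    // (tool, compressed bytes, encode seconds, decode seconds),
    // taken from the elapsed times and file sizes in the listings above.
    let runs = [
        ("bzip2", 50_687_667u64, 84.45, 13.32),
        ("gzip", 65_482_538, 18.68, 2.89),
        ("orz", 57_271_395, 12.33, 3.28),
    ];
    for (tool, size, enc, dec) in runs {
        println!(
            "{:<6} ratio {:>5.2}%  encode {:>6.2} MB/s  decode {:>6.2} MB/s",
            tool,
            ratio_percent(size, original),
            throughput_mb_s(original, enc),
            throughput_mb_s(original, dec),
        );
    }
}
```

On this one file orz encodes faster than gzip and lands between gzip and bzip2 on size, while gunzip is still slightly ahead on decode.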