Contributed by marco.
The issues at hand here are that this code is GPL'd and is written in C++ for added obfuscation. What would make this algorithm really useful is a free, BSD-licensed implementation with the exact same API as libz; in other words, a drop-in replacement for libz.
If you intend to take on this challenge, here are some guidelines:
* BSD licensed
* written per style(9)
* written in C
* should have a full regression test
* should follow the OpenBSD coding and design practices
Happy coding!
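For anyone wondering what "the exact same API as libz" means in practice, here is a minimal sketch of a compress2()/uncompress() round trip, assuming nothing beyond the stock zlib header and linking with -lz. These are among the entry points a drop-in replacement would have to mirror exactly; the sketch only exercises the existing libz, it is not the requested replacement.

/*
 * Round trip through the libz one-shot API: deflate a buffer with
 * compress2(), inflate it back with uncompress(), print the sizes.
 * Build with something like: cc roundtrip.c -lz  (file name is just an example)
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	const char *msg = "hello, libz API";
	uLong srclen = (uLong)strlen(msg) + 1;	/* include the NUL */
	Bytef comp[128], decomp[128];		/* plenty for this tiny input */
	uLongf complen = sizeof(comp);
	uLongf decomplen = sizeof(decomp);

	/* Z_BEST_COMPRESSION is level 9, like gzip -9. */
	if (compress2(comp, &complen, (const Bytef *)msg, srclen,
	    Z_BEST_COMPRESSION) != Z_OK)
		return (1);

	/* Decompress and make sure the original bytes come back. */
	if (uncompress(decomp, &decomplen, comp, complen) != Z_OK)
		return (1);

	printf("%lu -> %lu -> %lu bytes: %s\n", srclen, complen,
	    decomplen, (const char *)decomp);
	return (0);
}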
Update: I should have done some more research. I was incorrect about bzip2 being patented. I searched the patent offices and only found references to it as being public domain. My bad, sorry for the confusion. Thanks to tedu for bringing this to my attention.
By Noryungi (82.127.29.248) n o r y u n g i @ y a h o o . c o m on
Quick question...
The OpenBSD web site states the following:
Integrate good code from any source with acceptable copyright (ISC or Berkeley style preferred, GPL acceptable as a last recourse but not in the kernel, NDA never acceptable).
Since I don't think LZMA will be integrated in the kernel anytime soon, is it really necessary to create another implementation of the GPL'd LZMA? I can understand the appeal of a "pure C" LZMA, but rewriting the whole thing may be overkill... For instance, has anyone contacted the author of the LZMA utilities to arrange a release under the BSD license? Is this possible?
By Anonymous Coward (143.166.226.16) on
By Noryungi (82.127.29.248) n o r y u n g i @ y a h o o . c o m on
By Anonymous Coward (204.245.224.15) on
Not even as a "last recourse"?
By Fábio Olivé Leite (200.248.155.122) on
By Anonymous Coward (62.252.32.11) on
By Anonymous Coward (143.166.226.16) on
By Anonymous Coward (129.215.13.83) on
The LGPL license terms do not require software which uses an LGPL-ed library (say) to be redistributed under the LGPL.
Also:
"SPECIAL EXCEPTION: Igor Pavlov, as the author of this code, expressly permits you to statically or dynamically link your code (or bind by name) to the files from LZMA SDK without subjecting your linked code to the terms of the CPL or GNU LGPL. Any modifications or additions to files from LZMA SDK, however, are subject to the GNU LGPL or CPL terms."
[http://www.7-zip.org/sdk.html]
By Anonymous Coward (66.11.66.41) on
By Anonymous Coward (143.166.255.16) on
By Anonymous Coward (131.251.0.11) on
By Anonymous Coward (24.80.111.103) on
By gwyllion (134.58.253.130) on
By sthen (81.168.66.226) on
By Rembrandt (84.188.234.142) on
1st: it takes a lot of time to process... yeah, but
2nd: the compression is awesome.
I compressed the source code of a bot (malware) which is about ~52.3 MB as bzip2. Let's call this bot "Phatbot"...
So guess how small LZMA compressed the file.
lzmash -9 -> 7.2 MB!
normal lzma: 9.3MB
The source is about 153MB uncompressed.
So the compression ratio is really... unbelievable (153 MB down to 7.2 MB is roughly 21:1, versus about 3:1 for bzip2).
This is incredible, really.
OK, the Phatbot source may contain a lot of similar stuff (like plugins and so on).
But even the OpenBSD src.tar.gz comes out at just 60 MB with a normal lzma run (no tuning), versus ~100 MB for the tar.gz provided by the FTP/HTTP mirrors.
Just to show some people how much space this algorithm could save...
The decompression speed is also much better than bzip2's (with all the source code tarballs I tested).
Kind regards,
Rembrandt
By tedu (71.139.175.127) on
By Anonymous Coward (151.41.11.190) on
By Thorsten Glaser (81.173.171.215) tg@mirbsd.de on http://mirbsd.de/
Not patented everywhere, though. (I don't care ;)
By Anonymous Coward (128.171.90.200) on
Is LZMA any faster or better at compressing?
By Anonymous Coward (71.145.130.63) on
By Anonymous Coward (128.171.90.200) on
"I have to admit that I was very impressed with the performance of the compressor after Todd showed me his results. The only downside seems to be the time it takes to compress something. Decompression speed is about the same as gzip or bzip2."
That tells me very little
By gwyllion (134.58.253.114) on
By Thorsten Glaser (81.173.175.132) tg@mirbsd.de on http://mirbsd.de/
Compare zopen(3) as in, for example,
http://cvs.mirbsd.de/src/usr.bin/compress/zopen.c
with gzopen(3) as in, for example,
http://cvs.mirbsd.de/src/lib/libz/gzio.c
The latter requires you to use gzread/write/seek/close.
I found it ugly. My solution:
http://cvs.mirbsd.de/src/lib/libz/gzfopen.c
It's a non-standard extension, admittedly, but it works well enough for savecore(8) to operate on gzipped kernels.
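In case you do not want to chase the CVSweb links, here is a hypothetical sketch of the idea (this is not the gzfopen.c above, and the helper names are made up): wrap a zlib gzFile in a stdio FILE * using BSD's funopen(3), so code that expects plain fread()/fgets(), such as something reading a gzipped kernel, does not have to change.

/*
 * Illustrative only: a gzfopen()-style wrapper built on funopen(3),
 * forwarding stdio reads/writes/seeks/closes to the zlib gz* calls.
 */
#include <stdio.h>
#include <zlib.h>

static int
gzf_read(void *cookie, char *buf, int len)
{
	return (gzread((gzFile)cookie, buf, (unsigned)len));
}

static int
gzf_write(void *cookie, const char *buf, int len)
{
	return (gzwrite((gzFile)cookie, buf, (unsigned)len));
}

static fpos_t
gzf_seek(void *cookie, fpos_t off, int whence)
{
	/* gzseek(3) cannot seek from SEEK_END; good enough for a sketch. */
	return ((fpos_t)gzseek((gzFile)cookie, (z_off_t)off, whence));
}

static int
gzf_close(void *cookie)
{
	return (gzclose((gzFile)cookie));
}

FILE *
gzfopen(const char *path, const char *mode)
{
	gzFile gz;

	if ((gz = gzopen(path, mode)) == NULL)
		return (NULL);
	return (funopen(gz, gzf_read, gzf_write, gzf_seek, gzf_close));
}

The weak spot is seeking: gzseek(3) cannot seek relative to the end of a compressed stream, so anything that relies on SEEK_END still needs special handling.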
By Richard (195.212.199.56) on
An interesting and somewhat different angle to take I suppose :o)
By Anonymous Coward (202.45.99.226) on
By arndt (129.143.29.10) on
By Nate (65.95.228.43) on
By Fábio Olivé Leite (15.227.249.72) on
Speaking of Wikipedia, I just found this fairly interesting bit of information in Pufferfish:
Due to some unknown selection pressure, intronic and extragenic sequences have been drastically reduced within this family. As a result, they have the smallest-known genomes yet found amongst the vertebrate animals, while containing a genetic repertoire very similar to other fishes and thus comparable to vertebrates generally. Since these genomes are relatively compact it is relatively fast and inexpensive to compile their complete sequences, as has been done for two species (Takifugu rubripes and Tetraodon nigroviridis).
The similarities with OpenBSD's source tree are amazing.
Damn, every hit on Wikipedia means at least half an hour of unbillable time. ;-)
By Ste (217.205.77.85) on
By Anonymous Coward (202.45.99.226) on
By Anonymous Coward (63.237.125.191) on
I've always wondered... Are these documented somewhere?
By Nate (65.94.61.3) on
By Chas (147.154.235.51) on
I have a big data warehouse on HP-UX with Oracle 7, and I like compressing everything down to one DLT. gzip takes 36 hours if I don't spread it out over multiple CPUs (240 MHz is getting pretty creaky).
I wish that there was a C compressor as well as a decompressor.
By tedu (69.12.168.114) on
By Anonymous Coward (24.37.81.6) on
Thanks for the info.
By Ray (192.193.220.141) on
I think this is obvious enough: The issues at hand here are that this code is GPLd and is written in C++ for added obfuscation.
By Nick Holland (68.43.117.34) nick@holland-consulting.net on http://www.openbsd.org/faq/
Three copies of the same file:
-r--r--r-- 3 archive archive 654813222 Jan 23 11:43 smf
-r--r--r-- 3 archive archive 654813222 Jan 23 11:43 smf1
-r--r--r-- 3 archive archive 654813222 Jan 23 11:43 smf2
These are mbox mail spools... yes, 650M in size. LOTS of binary attachments. gzip achieved relatively little compression on these files.
$ time rzip smf1
3m26.64s real 3m5.39s user 0m6.34s system
$ time lzma e smf2 smf2.lz
17m53.29s real 17m30.78s user 0m2.18s system
And the results:
$ ls -l smf*
-r--r--r-- 2 archive archive 654813222 Jan 23 11:43 smf
-r--r--r-- 1 archive archive 293921881 Feb 1 08:35 smf1.rz
-r--r--r-- 2 archive archive 654813222 Jan 23 11:43 smf2
-rw-r--r-- 1 archive archive 359881838 Feb 1 09:00 smf2.lz
IN THIS CASE, rzip was the clear winner in both speed and resulting size (654,813,222 down to 293,921,881 bytes, about 2.2:1, versus about 1.8:1 for lzma). HOWEVER, it used huge amounts of memory to accomplish this task, memory the amd64 system I did this on happened to have. If I had only 256M, I suspect the comparative performance would be significantly different due to swapping.
When setting up this system, I tried gzip and bzip2. bzip2 impressed me with how much more time it took than gzip while providing very little improvement in compression. For this application, bzip2 was just an insulting joke, not a solution. rzip took about as long as bzip2, but gave much greater compression.
rzip has its quirks: it is not pipeable, and decompression thrashes the disk rather more than expected, so I suspect rzip's decompression would be much slower than most other decompression processes. This creates some problems on a machine with a single disk system when you try to unpack multiple rzip files at once (hint: just don't). The lack of pipeability is annoying at times: if you want to verify the MD5 of a compressed file's contents, you must uncompress it to disk first. Very "anti-Unix" in style.
Moral: there are many kinds of compression systems out there. There are all kinds of data. Experiment with your data. Don't use my results for anything other than "hey, maybe we should look at ..."
Nick.