zstd compression algorithm and the dethroned old king xz

Here are the comparative numbers reported by Arch devs, on which they based their decision to use this fast but resource hungry compression tool.  XZ still wins in size and loses on time, while ZSTD is a huge loser in memory use while compressing; decompressing is comparable and equally fast.  Zstd (gang) software also relies heavily on very current, powerful, server grade machines to provide the benefit of speed, to make up for what it lacks in quality.  Compression software should primarily be judged on its ability to compress, and zstd fails miserably against this 45 year old trusty switchblade called xz.  So we can conclude that arch has an abundance of computing/building/packaging apparatus, with truckloads of spare ram to process many packages in parallel.

Arch comparison test ZSTD vs XZ

My article (a link to it) was removed from r/linux yesterday for no good reason, although it was 100% linux related material, and when I complained I was permanently banned from posting there.

https://www.reddit.com/r/linux/comments/ejn5c5/arch_2020_welcomes_its_little_brothers_and/

In case you are wondering, I was reporting that arch nearly silently started using this facebook compression algorithm for packaging, and here is their own test data to support this decision:

https://lists.archlinux.org/pipermail/arch-dev-public/2019-March/029520.html

 

A different set of tests about more compression utilities

From this article, which makes a more general (many compression tools compared) but also more in-depth comparison than the tests above (those were tailored to arch’s use and strive to make zstd look good), we isolated two tables on xz and zstd.

The next test was with a much larger file/archive – this time using linux 5.1-rc5

Note in column 4 the % of cpu utilized (on a 4th gen. i7, 8 thread machine): the speed is due to multithreading.  So on a single or double core machine the effect should be roughly analogous to dividing the speed by the number of threads you lack.  2sec on 8 threads is about 16sec on a single thread, and the inverse for MB/s: 80MB/s with multicore becomes about 10MB/s on a single thread.  So on a lesser machine than the tester’s, don’t expect the speeds to be as spectacular.  What we can’t conclude is what zstd’s ram use deficiency would be on a single or double core machine.
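The arithmetic above can be sketched in a few lines. This is only a rough estimate that assumes perfectly linear scaling with thread count, which real compressors rarely achieve; the function names are made up for illustration.

```python
def scale_time(seconds, threads_measured, threads_available):
    """Estimate wall-clock time on a machine with a different thread count.

    Assumes perfectly linear scaling, which is an optimistic simplification.
    """
    return seconds * threads_measured / threads_available


def scale_throughput(mb_per_s, threads_measured, threads_available):
    """Estimate throughput (MB/s) under the same linear-scaling assumption."""
    return mb_per_s * threads_available / threads_measured


# 2 s measured on 8 threads, estimated for a single thread
print(scale_time(2.0, 8, 1))          # 16.0
# 80 MB/s measured on 8 threads, estimated for a single thread
print(scale_throughput(80.0, 8, 1))   # 10.0
```

On a real machine the single-thread result will usually be somewhat better than this estimate, since multithreaded runs lose some efficiency to coordination.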

About “free space to distribute software”

The mirror space and bandwidth to distribute those compressed packages are paid for by others (us in most cases, through public university servers).  The arch-devs and their super building machines are relieved of the burden of packaging time, at the cost of the tons of additional memory required to do the compression, which, if I interpret it correctly, counterbalances the multi-threading abilities.  Meanwhile the increase in size and bandwidth to distribute packages falls on the users and the mirrors feeding them.

On the question of why both r/linux and r/archlinux blocked content on the xz/zstd change:

As a late announcement in archlinux.org news, 8 days after the shift took effect, and AFTER our articles and banned posts on r/linux and r/archlinux, they made the following statement to cover their “posteriors”.

Don’t make it personal to r/linux and r/archlinux moderators. This is the real reflection of the status of linux and its evolution. A year or two ago Google took NSA’s speck cryptography algorithm and pushed linux to adopt it. Linux did. And many distros left it enabled to be used by unsuspecting users. A popular outcry was met by a silent decision to dump it eventually, so whining and cursing eventually works, or in the case of Linus should I call it whistle-blowing? I think it was around 4.17-4.18 that Linux had included speck. Arch switched it off after several other distributions had already done so, but still included the code into the kernel.

So it is not linux, it is not r/linux, or Arch-Linux, it is a problematic decision making fashion across most of linux.  What I find even more problematic is the passive audience of “customers” who refrain from getting involved.  They just care about their “free as in beer software” filling the empty cells of disk space on their pc.  I would recommend that more people get involved and influence the decisions made, and not allow Large Multinational Corporations to keep making all the decisions about their software, corroding the nature of open and free software and the freedom of users/sysadmins to choose their tools.

Based on Fedora’s and Arch’s decision to switch package compression tools, many more distributions will try to “catch up to the trend” without judgement or further research.  Those who are limited by economic realities and rely on cheaper, older networked machines to do packaging will soon find out the burdens of using such a tool as zstd, regardless of our value judgement to reject it based on its origins and not on performance.

Like your mommy told you when you were young, don’t accept candy from a stranger, or a needle from a cheap pusher!  And facebook is and will always be a stranger to the real world of open and free software, and it offends our intelligence to see it presented as a well-meaning contributor.

 

Enough?  We will add more data and sources as they come up from friends and activists against the hydra of corporatism and domination.

 

4 thoughts on “zstd compression algorithm and the dethroned old king xz”

  1. Hello,

    This is Dylan (KISS Linux). I’ve been reading your posts and I’m sorry to see what happened on Reddit. I’ve noticed this censorship trend for a long while now and I despise it really.

    Don’t worry too much about it as I and others are reading your posts, learning from them and sharing them around. See: https://github.com/kisslinux/community/commit/cd29cbd27e34a767378c9585b8964760909afd48 (zstd: Drop from community)

    I removed zstd from KISS (wasn’t used for anything other than btrfs-progs which I will also be removing).

    What’s funny is that I was talking to someone just this week about the zstd issue and the jump towards the next “new shiny thing”.

    Keep it up. 🙂


  2. Hi Dylan, I am not writing on Kiss since you “are here” to write better than I could 🙂
    Happy New Year

    I have been doing some xz/zstd tests, and when you use the simple -T (--threads) option not only does xz kick butt, their claims about reproducing the same sums are crap. Only single core gives a different sum; on an 8 thread machine, from compression level 1 through 9 and from 2 to 8 threads, I got 100% the same sums within each compression level. The level of compression is unbeatable by zstd. When zstd has a speed advantage, its compression ratio is mediocre.

    So their reports ARE INTENTIONALLY skewed to justify the choice. To me this translates to motive. And the motive is to use users as guinea pigs to assist the development of facebook’s toy.

    I used a 15MB archive for my tests, and such sizes are subsecond processes. Only on single thread tests with max compression I went to 5s.
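For anyone who wants to repeat the checksum test, here is a minimal sketch. It assumes the `xz` command is on your PATH and simply hashes what `xz -c` writes to stdout for every level and thread count; the function names are my own illustration, not part of any tool.

```python
import hashlib
import shutil
import subprocess


def sha256_of_output(argv):
    """Run a command and return the SHA-256 hex digest of its stdout."""
    out = subprocess.run(argv, check=True, stdout=subprocess.PIPE).stdout
    return hashlib.sha256(out).hexdigest()


def xz_digests(path, levels=range(1, 10), threads=range(1, 9)):
    """Map (level, threads) to the digest of the compressed stream.

    Returns an empty dict when xz is not installed.
    """
    if shutil.which("xz") is None:
        return {}
    return {
        (level, t): sha256_of_output(["xz", f"-{level}", f"-T{t}", "-c", path])
        for level in levels
        for t in threads
    }


def distinct_digests_per_level(digests):
    """Count how many different digests each compression level produced."""
    per_level = {}
    for (level, _threads), digest in digests.items():
        per_level.setdefault(level, set()).add(digest)
    return {level: len(found) for level, found in per_level.items()}
```

Calling `distinct_digests_per_level(xz_digests("some-archive.tar"))` (the file name is a placeholder) gives one count per level: a count of 1 means every thread setting produced an identical file at that level, a count of 2 means one setting differed.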

    What ya think?
    Void’s xbps has been capable of zstd since much earlier than arch. They choose not to use it, but give you the freedom to build and install your own packages with it from your own repository, if you want to.

    I have been banned from both r/linux and r/archlinux since the 3rd of Jan 2020. On the 4th they placed an announcement on their webpage about utilizing zstd now. They have been shipping .zst pkgs since 12/27!!!


  3. from arsv via /r/initFreedom

    Suggestion: instead of copying somebody’s data, write a script to benchmark them on some easily available files. It’s very easy to do, and you would avoid getting called out instantly by the first person who bothered to check.

    time xz -kd linux-5.1.tar.xz
    user 0m8.805s
    
    time zstd -kd linux-5.1.tar.zst
    user 0m1.165s
    

    From my experience, these times are representative. It’s about this kind of difference for common package-related tasks. I’m not sure how Arch got the numbers they posted, no idea, their dataset is not really what most people care about anyway. Nonetheless, even for what I think are common use cases, the effect is there and it’s quite noticeable. Zstd trades a bit of compression, like 10% larger files, for something like 8x decompression speed-up over LZMA.

    And I must point out that it’s not only that Zstd is fast, it’s also that LZMA is unusually slow.

    I’ve been messing around with LZMA, and I will be again very soon specifically with package management applications in mind. It’s not a simple problem, it’s something that needs to be addressed properly. Just going around and denying Zstd exists will not get you anywhere. You’ll just piss people off and make them sneer every time the issue is brought up, making life very difficult for anyone who’d hopefully come up with an actual valid alternative to Zstd.

    fungalnet

    Why, don’t you trust the guy that published them?

    Max compression for zstd is 19, for xz it is 9, right?

    % time zstd -19k texlive-core-2019.52579-1-any.pkg.tar
    texlive-core-2019.52579-1-any.pkg.tar : 33.25% (438732800 => 145889607 bytes, texlive-core-2019.52579-1-any.pkg.tar.zst)

    zstd -20k texlive-core-2019.52579-1-any.pkg.tar 128.26s user 0.21s system 100% cpu 2:08.37 total

    % time xz -9kT8 texlive-core-2019.52579-1-any.pkg.tar
    xz -9kT8 texlive-core-2019.52579-1-any.pkg.tar 140.21s user 1.04s system 208% cpu 1:07.70 total

    140M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar.zst
    134M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar.xz
    419M Jan 8 21:23 texlive-core-2019.52579-1-any.pkg.tar

    With 8 threads, xz took about half the wall-clock time to compress (1:07 against 2:08) and the end size was smaller by 4-5%

    To decompress zstd wins:

    % time xz -kd texlive-core-2019.52579-1-any.pkg.tar.xz
    xz -kd texlive-core-2019.52579-1-any.pkg.tar.xz 6.68s user 0.23s system 99% cpu 6.959 total

    % time zstd -kd texlive-core-2019.52579-1-any.pkg.tar.zst
    zstd -kd texlive-core-2019.52579-1-any.pkg.tar.zst 0.38s user 0.16s system 99% cpu 0.538 total

    Now, the average total of daily upgrades for a user is less than this, but let’s say it is as much as this. The difference in decompression time is about 6.5s. Installing the packages takes so much longer that the difference becomes negligible. The size to download and to store packages has increased by about 5%. At 500KB/s this is a significant difference: for a 150MB pkg like this the difference is 6MB, which is 12s. So with zstd we save about 6.5s decompressing, lose 12s downloading, and need 5% more disk space than with xz (to keep pkgs in case you need to reinstall).
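To make the arithmetic easy to check, here is a small sketch using the sizes and times from the runs above. The 500KB/s link speed is my assumed example, and I count 1MB as 1000KB to keep the numbers round.

```python
# Sizes as listed by ls above (rounded to whole MB)
xz_size_mb = 134
zstd_size_mb = 140

# Wall-clock decompression times from the time output above
xz_decompress_s = 6.959
zstd_decompress_s = 0.538

# Assumed download speed for a slow connection
link_kb_per_s = 500

extra_mb = zstd_size_mb - xz_size_mb
size_increase_pct = extra_mb / xz_size_mb * 100
extra_download_s = extra_mb * 1000 / link_kb_per_s
decompress_saving_s = xz_decompress_s - zstd_decompress_s
net_loss_s = extra_download_s - decompress_saving_s

print(f"size increase:       {size_increase_pct:.1f}%")      # 4.5%
print(f"extra download time: {extra_download_s:.1f}s")       # 12.0s
print(f"decompress saving:   {decompress_saving_s:.1f}s")    # 6.4s
print(f"net loss with zstd:  {net_loss_s:.1f}s")             # 5.6s
```

On a faster connection the extra download time shrinks and the balance tips the other way, so change `link_kb_per_s` to match your own line before drawing conclusions.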

    Now, you see the difference is in compression, not decompression (128s over 67s) . The user will never notice 5-10s delay per day on a daily upgrade. Since you asked me to produce numbers I am showing how Arch’s disk space can be cut by 5% and their compression time down to half. So why are we doing this again?
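For those who would rather measure on their own files than trust anybody’s tables, a minimal sketch of a timing script follows. It assumes `xz` and `zstd` are installed and on the PATH, skips whichever is missing, and uses the same levels as my runs above; the function names and the default command table are only illustrative.

```python
import shutil
import subprocess
import time


def time_command(argv):
    """Run a command once, discard its stdout, return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(path, candidates=None):
    """Time compression of one file to stdout with each available tool.

    Tools that are not installed are skipped, so the result may be empty.
    """
    if candidates is None:
        candidates = {
            "xz -9 -T8": ["xz", "-9", "-T8", "-c", path],
            "zstd -19": ["zstd", "-19", "-c", path],
        }
    results = {}
    for label, argv in candidates.items():
        if shutil.which(argv[0]) is None:
            continue  # tool not installed, nothing to measure
        results[label] = time_command(argv)
    return results
```

Something like `benchmark("texlive-core-2019.52579-1-any.pkg.tar")` returns a dict of label to seconds. A single run is noisy, so repeat it a few times and look at the spread before comparing.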

    Both xz and zstd are Arch’s packages.

    232K Nov 13 02:53 /var/cache/pacman/pkg/xz-5.2.4-2-x86_64.pkg.tar.xz
    392K Nov 28 08:25 /var/cache/pacman/pkg/zstd-1.4.4-1-x86_64.pkg.tar.xz

    Ohhh... wait, xz itself, this 45 year old algorithm, is not much more than half as big as this 3yo zstd facebook marvel.

    I am still going to question the motives. For modernization being the motive, the flying magnetic train is an improvement over the 1000s of years development of the wheel, but for some reason I still see many wheels around. I’d love to have a magnetic skateboard to go around town, but for now I keep my 30yo bicycle well lubed.

