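Using LZF to compress strings/byte[] at runtime in Unity

I have recently been building a lockstep (frame-sync) demo that relays network messages. The earlier LAN version of the demo sent each message as a string wrapped a second time in Protobuf. On the public internet, bandwidth and traffic both cost money, so the messages are worth compressing.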

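LZMA, GZip, and LZF

Pros and cons of the three compression algorithms:

LZMA: the compression algorithm originally introduced as the default of the 7z format; high compression ratio, but slow;
GZip: lower compression ratio than LZMA but somewhat faster; commonly used between web servers and browsers;
LZF: the compression algorithm built into Redis; it prioritizes fast compression and decompression, so naturally its compression ratio is the lowest of the three.

Google's Snappy is another library with the same focus on speed; it is not covered here.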
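For comparison, a GZip baseline needs nothing beyond the .NET base class library. The sketch below is not from the original post; the class name GZipBaseline is made up for illustration, and it assumes the scripting runtime exposes System.IO.Compression.GZipStream. It can serve as a reference point when measuring LZF against GZip on the same payloads.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper for benchmarking only; not part of the LZF script.
public static class GZipBaseline
{
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream has to be disposed before the result is read,
            // otherwise the final block and the trailer are not written.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] input)
    {
        using (var source = new MemoryStream(input))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Keep in mind that the gzip container adds a fixed header and trailer (at least 18 bytes), which is a noticeable share of a message that is only a few hundred bytes long.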

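Why LZF

Compression and decompression happen when lockstep messages are sent and received, so they effectively run in real time: 22 compressions and 22 decompressions per second. Running in real time means CPU time and GC allocation must stay low, and memory use must stay low as well.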

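I actually picked LZMA first. The message strings are not very long, so in theory the CPU cost should have been modest.

I first found an LZMA library written by a developer abroad and called compress and decompress from Update as a test. Speed was acceptable, but memory leaked at about 50 MiB per second, and before long my PC ran out of memory. That was clearly unusable.

Next I found an LZMA C library wrapped by a Chinese developer. In the same test the memory leak was gone, and GC allocation was only about 1.2 KiB per frame, which is perfectly acceptable. However, the calls cost as much as 10 to 20 ms of CPU time per frame. I traced the time down to the C library's compress and decompress entry points; going any further would have meant reading the C source, so I gave up. CPU time this high is not workable: if compression alone costs that much, the rest of the game logic cannot run.

Then I came across LZF, which describes itself as a high-performance compression algorithm. I brought in a pure C# implementation of it, tested it, and found that it fully meets the requirements (Android and PC).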
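The comparisons above come down to two numbers per library: time per call and managed allocation. A minimal harness for collecting both might look like the sketch below. It is an illustration rather than the setup actually used for the tests described here; CodecBenchmark and its members are hypothetical names, and the GC.GetTotalMemory delta is only a rough indicator, since a collection during the loop can shrink it or even make it negative. Inside Unity, the Profiler's per-frame GC Alloc column is the more reliable source.

```csharp
using System;
using System.Diagnostics;

// Hypothetical micro-benchmark helper; all names here are illustrative.
public static class CodecBenchmark
{
    public struct Result
    {
        public double MillisecondsPerCall;
        public long ManagedBytesDelta; // rough: affected by any GC that runs in between
    }

    // codec is any byte[] -> byte[] function, for example a compress method.
    public static Result Measure(Func<byte[], byte[]> codec, byte[] payload, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        codec(payload); // warm-up call so one-time JIT cost is not measured

        long before = GC.GetTotalMemory(true); // force a collection for a clean baseline
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            codec(payload);
        }
        watch.Stop();
        long after = GC.GetTotalMemory(false);

        Result result;
        result.MillisecondsPerCall = watch.Elapsed.TotalMilliseconds / iterations;
        result.ManagedBytesDelta = after - before;
        return result;
    }
}
```

Any method with a byte[] in, byte[] out signature can be passed as the codec, so the same harness can be pointed at each candidate library with an identical payload.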

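Utility script

The script's original author posted it on the international Unity forum (not the Unity CN Q&A forum), and it has also been included in the unity-ui-extensions repository (an unofficial but well-known collection of Unity utilities).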

//
// http://forum.unity3d.com/threads/lzf-compression-and-decompression-for-unity.152579/
//

/*
 * Improved version to C# LibLZF Port:
 * Copyright (c) 2010 Roman Atachiants <kelindar@gmail.com>
 *
 * Original CLZF Port:
 * Copyright (c) 2005 Oren J. Maurice <oymaurice@hazorea.org.il>
 *
 * Original LibLZF Library  Algorithm:
 * Copyright (c) 2000-2008 Marc Alexander Lehmann <schmorp@schmorp.de>
 *
 * Redistribution and use in source and binary forms, with or without modifica-
 * tion, are permitted provided that the following conditions are met:
 *
 *   1.  Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *
 *   2.  Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *
 *   3.  The name of the author may not be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MER-
 * CHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
 * EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPE-
 * CIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTH-
 * ERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Alternatively, the contents of this file may be used under the terms of
 * the GNU General Public License version 2 (the "GPL"), in which case the
 * provisions of the GPL are applicable instead of the above. If you wish to
 * allow the use of your version of this file only under the terms of the
 * GPL and not to allow others to use your version of this file under the
 * BSD license, indicate your decision by deleting the provisions above and
 * replace them with the notice and other provisions required by the GPL. If
 * you do not delete the provisions above, a recipient may use your version
 * of this file under either the BSD or the GPL.
 */

using System;

namespace UnityEngine.UI.Extensions
{
    /// <summary>
    /// Improved C# LZF Compressor, a very small data compression library. The compression algorithm is extremely fast.
    /// Note for strings, ensure you only use Unicode else special characters may get corrupted.
    /// </summary>
    public static class CLZF2
    {
        private static readonly uint HLOG = 14;
        private static readonly uint HSIZE = (1 << 14);
        private static readonly uint MAX_LIT = (1 << 5);
        private static readonly uint MAX_OFF = (1 << 13);
        private static readonly uint MAX_REF = ((1 << 8) + (1 << 3));

        /// <summary>
        /// Hashtable, that can be allocated only once
        /// </summary>
        private static readonly long[] HashTable = new long[HSIZE];

        // Compresses inputBytes
        public static byte[] Compress(byte[] inputBytes)
        {
            // Starting guess, increase it later if needed
            int outputByteCountGuess = inputBytes.Length * 2;
            byte[] tempBuffer = new byte[outputByteCountGuess];
            int byteCount = lzf_compress(inputBytes, ref tempBuffer);

            // If byteCount is 0, then increase buffer and try again
            while (byteCount == 0)
            {
                outputByteCountGuess *= 2;
                tempBuffer = new byte[outputByteCountGuess];
                byteCount = lzf_compress(inputBytes, ref tempBuffer);
            }

            byte[] outputBytes = new byte[byteCount];
            Buffer.BlockCopy(tempBuffer, 0, outputBytes, 0, byteCount);
            return outputBytes;
        }

        // Decompresses inputBytes
        public static byte[] Decompress(byte[] inputBytes)
        {
            // Starting guess, increase it later if needed
            int outputByteCountGuess = inputBytes.Length * 2;
            byte[] tempBuffer = new byte[outputByteCountGuess];
            int byteCount = lzf_decompress(inputBytes, ref tempBuffer);

            // If byteCount is 0, then increase buffer and try again
            while (byteCount == 0)
            {
                outputByteCountGuess *= 2;
                tempBuffer = new byte[outputByteCountGuess];
                byteCount = lzf_decompress(inputBytes, ref tempBuffer);
            }

            byte[] outputBytes = new byte[byteCount];
            Buffer.BlockCopy(tempBuffer, 0, outputBytes, 0, byteCount);
            return outputBytes;
        }

        /// <summary>
        /// Compresses the data using LibLZF algorithm
        /// </summary>
        /// <param name="input">Reference to the data to compress</param>
        /// <param name="output">Reference to a buffer which will contain the compressed data</param>
        /// <returns>The size of the compressed archive in the output buffer</returns>
        public static int lzf_compress(byte[] input, ref byte[] output)
        {
            int inputLength = input.Length;
            int outputLength = output.Length;

            Array.Clear(HashTable, 0, (int)HSIZE);

            long hslot;
            uint iidx = 0;
            uint oidx = 0;
            long reference;

            uint hval = (uint)(((input[iidx]) << 8) | input[iidx + 1]); // FRST(in_data, iidx);
            long off;
            int lit = 0;

            for (;;)
            {
                if (iidx < inputLength - 2)
                {
                    hval = (hval << 8) | input[iidx + 2];
                    hslot = ((hval ^ (hval << 5)) >> (int)(((3 * 8 - HLOG)) - hval * 5) & (HSIZE - 1));
                    reference = HashTable[hslot];
                    HashTable[hslot] = (long)iidx;


                    if ((off = iidx - reference - 1) < MAX_OFF
                        && iidx + 4 < inputLength
                        && reference > 0
                        && input[reference + 0] == input[iidx + 0]
                        && input[reference + 1] == input[iidx + 1]
                        && input[reference + 2] == input[iidx + 2]
                        )
                    {
                        /* match found at *reference++ */
                        uint len = 2;
                        uint maxlen = (uint)inputLength - iidx - len;
                        maxlen = maxlen > MAX_REF ? MAX_REF : maxlen;

                        if (oidx + lit + 1 + 3 >= outputLength)
                            return 0;

                        do
                            len++;
                        while (len < maxlen && input[reference + len] == input[iidx + len]);

                        if (lit != 0)
                        {
                            output[oidx++] = (byte)(lit - 1);
                            lit = -lit;
                            do
                                output[oidx++] = input[iidx + lit];
                            while ((++lit) != 0);
                        }

                        len -= 2;
                        iidx++;

                        if (len < 7)
                        {
                            output[oidx++] = (byte)((off >> 8) + (len << 5));
                        }
                        else
                        {
                            output[oidx++] = (byte)((off >> 8) + (7 << 5));
                            output[oidx++] = (byte)(len - 7);
                        }

                        output[oidx++] = (byte)off;

                        iidx += len - 1;
                        hval = (uint)(((input[iidx]) << 8) | input[iidx + 1]);

                        hval = (hval << 8) | input[iidx + 2];
                        HashTable[((hval ^ (hval << 5)) >> (int)(((3 * 8 - HLOG)) - hval * 5) & (HSIZE - 1))] = iidx;
                        iidx++;

                        hval = (hval << 8) | input[iidx + 2];
                        HashTable[((hval ^ (hval << 5)) >> (int)(((3 * 8 - HLOG)) - hval * 5) & (HSIZE - 1))] = iidx;
                        iidx++;
                        continue;
                    }
                }
                else if (iidx == inputLength)
                    break;

                /* one more literal byte we must copy */
                lit++;
                iidx++;

                if (lit == MAX_LIT)
                {
                    if (oidx + 1 + MAX_LIT >= outputLength)
                        return 0;

                    output[oidx++] = (byte)(MAX_LIT - 1);
                    lit = -lit;
                    do
                        output[oidx++] = input[iidx + lit];
                    while ((++lit) != 0);
                }
            }

            if (lit != 0)
            {
                if (oidx + lit + 1 >= outputLength)
                    return 0;

                output[oidx++] = (byte)(lit - 1);
                lit = -lit;
                do
                    output[oidx++] = input[iidx + lit];
                while ((++lit) != 0);
            }

            return (int)oidx;
        }


        /// <summary>
        /// Decompresses the data using LibLZF algorithm
        /// </summary>
        /// <param name="input">Reference to the data to decompress</param>
        /// <param name="output">Reference to a buffer which will contain the decompressed data</param>
        /// <returns>Returns decompressed size</returns>
        public static int lzf_decompress(byte[] input, ref byte[] output)
        {
            int inputLength = input.Length;
            int outputLength = output.Length;

            uint iidx = 0;
            uint oidx = 0;

            do
            {
                uint ctrl = input[iidx++];

                if (ctrl < (1 << 5)) /* literal run */
                {
                    ctrl++;

                    if (oidx + ctrl > outputLength)
                    {
                        //SET_ERRNO (E2BIG);
                        return 0;
                    }

                    do
                        output[oidx++] = input[iidx++];
                    while ((--ctrl) != 0);
                }
                else /* back reference */
                {
                    uint len = ctrl >> 5;

                    int reference = (int)(oidx - ((ctrl & 0x1f) << 8) - 1);

                    if (len == 7)
                        len += input[iidx++];

                    reference -= input[iidx++];

                    if (oidx + len + 2 > outputLength)
                    {
                        //SET_ERRNO (E2BIG);
                        return 0;
                    }

                    if (reference < 0)
                    {
                        //SET_ERRNO (EINVAL);
                        return 0;
                    }

                    output[oidx++] = output[reference++];
                    output[oidx++] = output[reference++];

                    do
                        output[oidx++] = output[reference++];
                    while ((--len) != 0);
                }
            }
            while (iidx < inputLength);

            return (int)oidx;
        }
    }
}

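Usage

The static utility class above exposes two methods, Compress and Decompress, for compression and decompression respectively.

1. Compressing or decompressing byte[]

Call Compress and Decompress directly.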
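One caveat when the bytes come from the network: lzf_decompress returns 0 both when the output buffer is too small and when the data is invalid, so Decompress as written keeps doubling its buffer on corrupt input until an allocation fails. A truncated stream can also throw IndexOutOfRangeException. The sketch below wraps lzf_decompress with an upper bound on the output size. SafeLzf, TryDecompress, and the maxOutputBytes parameter are hypothetical names introduced here, not part of the original script.

```csharp
using System;
using UnityEngine.UI.Extensions; // namespace declared by the CLZF2 script above

// Hypothetical wrapper; only CLZF2.lzf_decompress comes from the script.
public static class SafeLzf
{
    // Returns false instead of looping when the data is corrupt or would
    // expand beyond maxOutputBytes (a limit chosen by the caller).
    public static bool TryDecompress(byte[] input, int maxOutputBytes, out byte[] result)
    {
        result = null;
        if (input == null || input.Length == 0 || maxOutputBytes <= 0)
            return false;

        // Same starting guess as CLZF2.Decompress, clamped to the cap.
        int size = (int)Math.Min(Math.Max((long)input.Length * 2, 16), maxOutputBytes);
        while (true)
        {
            byte[] buffer = new byte[size];
            int count;
            try
            {
                count = CLZF2.lzf_decompress(input, ref buffer);
            }
            catch (IndexOutOfRangeException)
            {
                return false; // stream ended in the middle of a run
            }

            if (count > 0)
            {
                result = new byte[count];
                Buffer.BlockCopy(buffer, 0, result, 0, count);
                return true;
            }

            if (size >= maxOutputBytes)
                return false; // invalid data, or larger than the cap allows

            size = (int)Math.Min((long)size * 2, maxOutputBytes);
        }
    }
}
```

For lockstep messages of a few hundred bytes, a cap in the range of a few kilobytes would be a plausible choice; the right value depends on the largest message the protocol can legitimately produce.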

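2. Compressing or decompressing strings

Compress: convert the string to byte[] with System.Text.Encoding.UTF8.GetBytes, then compress the bytes with the Compress method above.
Decompress: decompress the byte[] with the Decompress method above, then turn the result back into a string with System.Text.Encoding.UTF8.GetString.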
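The two steps can be wrapped in a small helper. This is a minimal sketch that assumes the CLZF2 class from the script above is in the project; LzfStringCodec is a made-up name. The length check is there because lzf_compress reads the first two input bytes unconditionally, so an input shorter than 2 bytes would otherwise fail with IndexOutOfRangeException.

```csharp
using System;
using System.Text;
using UnityEngine.UI.Extensions; // namespace declared by the CLZF2 script above

// Hypothetical convenience wrapper around CLZF2 for string payloads.
public static class LzfStringCodec
{
    public static byte[] CompressString(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        byte[] raw = Encoding.UTF8.GetBytes(text);
        // CLZF2.lzf_compress indexes input[0] and input[1] before its main loop.
        if (raw.Length < 2)
            throw new ArgumentException("Text must encode to at least 2 bytes.", "text");

        return CLZF2.Compress(raw);
    }

    public static string DecompressString(byte[] compressed)
    {
        if (compressed == null) throw new ArgumentNullException("compressed");

        byte[] raw = CLZF2.Decompress(compressed);
        return Encoding.UTF8.GetString(raw);
    }
}
```

Both sides have to agree on the encoding: bytes produced with UTF8.GetBytes must be decoded with UTF8.GetString, otherwise non-ASCII characters will come back corrupted.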

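Speed and compression ratio

In my tests, compressing and decompressing a 400-byte byte[] almost always took under 1 ms, and the GC allocation it caused was negligible, so it does not become a performance bottleneck.

With execution this fast, the compression ratio is also acceptable: for the current message fields, a 400-byte byte[] typically compresses to around 260 bytes.

Of course, the ratio depends on the actual content. If many fields repeat, the data compresses further; if few do, it compresses less.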
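To put the 400-to-260 figure in terms of traffic, the arithmetic can be written out as below. This is only a back-of-the-envelope sketch: BandwidthEstimate is a hypothetical name, the 22 messages per second is the send rate mentioned earlier, and the numbers cover payload bytes only, not protocol headers.

```csharp
// Hypothetical helper for estimating payload traffic; names are illustrative.
public static class BandwidthEstimate
{
    // Fraction of bytes saved, e.g. 400 -> 260 saves about 0.35 (35%).
    public static double SavedFraction(int rawBytes, int compressedBytes)
    {
        return 1.0 - (double)compressedBytes / rawBytes;
    }

    // Payload bytes per second for a fixed message size and send rate.
    public static int BytesPerSecond(int bytesPerMessage, int messagesPerSecond)
    {
        return bytesPerMessage * messagesPerSecond;
    }
}
```

With these inputs, BytesPerSecond(400, 22) is 8800 and BytesPerSecond(260, 22) is 5720, so each client saves roughly 3 KB of payload per second in each direction.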