栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 面试经验 > 面试问答

字节数组的基数N编码

面试问答 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

字节数组的基数N编码

编辑
[2020/01/26]:FWIW,下面的代码及其单元测试与我在Github上的开源库一起发布。

编辑
[2016/04/19]:如果您喜欢异常,则可能希望更改一些Depre实现代码以抛出,

InvalidDataException
而不仅仅是返回null。

编辑 [2014/09/14]:我在Enpre()中添加了一个“
HACK”,以处理输入中最后一个字节已签名的情况(如果要转换为sbyte)。我现在能想到的唯一明智的解决方案是将数组的大小调整为1。通过了此案例的其他单元测试,但我没有重新运行perf代码以解决此类情况。如果可以帮助您,请始终对Enpre()的输入在末尾包含一个虚拟0字节,以避免进行其他分配。

用法

我创建了一个RadixEncoding类(在“代码”部分中找到),该类使用三个参数初始化:

  1. 基数数字作为字符串(长度决定了实际的基数),
  2. 输入字节数组的假定字节顺序(字节序),
  3. 以及用户是否希望编码/解码逻辑确认结束的零字节。

要创建base-36编码,并使用little-endian输入,并考虑到以零字节结尾:

const string k_base36_digits = "0123456789abcdefghijklmnopqrstuvwxyz";var base36_no_zeros = new RadixEncoding(k_base36_digits, EndianFormat.Little, false);

然后实际执行编码/解码:

const string k_input = "A test 1234";byte[] input_bytes = System.Text.Encoding.UTF8.GetBytes(k_input);string enpred_string = base36_no_zeros.Enpre(input_bytes);byte[] depred_bytes = base36_no_zeros.Depre(enpred_string);

性能

与Diagnostics.Stopwatch一起计时,在i7 860 @ 2.80GHz上运行。定时EXE本身运行,而不是在调试器下运行。

使用上面相同的 k_base36_digits 字符串 EndianFormat.Little 初始化编码,并 确认结尾为零字节
(即使UTF8字节没有多余的结尾零字节)

编码“测试1234”的UTF8字节1,000,000次需要2.6567905

编码“测试1234。稍大一点!”的UTF8字节。100,000次需要1.1577325秒。
要解码相同的字符串,相同的时间需要1.244326秒

如果没有CodeContracts生成器,则必须使用if / throw代码重新实现合同。

using System;using System.Collections.Generic;using System.Numerics;using Contract = System.Diagnostics.Contracts.Contract;public enum EndianFormat{    /// <summary>Least Significant Bit order (lsb)</summary>    /// <remarks>Right-to-Left</remarks>    /// <see cref="BitConverter.IsLittleEndian"/>    Little,    /// <summary>Most Significant Bit order (msb)</summary>    /// <remarks>Left-to-Right</remarks>    Big,};/// <summary>Enpres/depres bytes to/from a string</summary>/// <remarks>/// Enpred string is always in big-endian ordering/// /// <p>Enpre and Depre take a <b>includeProceedingZeros</b> parameter which acts as a work-around/// for an edge case with our BigInteger implementation./// MSDN says BigInteger byte arrays are in LSB->MSB ordering. So a byte buffer with zeros at the /// end will have those zeros ignored in the resulting enpred radix string./// If such a loss in precision absolutely cannot occur pass true to <b>includeProceedingZeros</b>/// and for a tiny bit of extra processing it will handle the padding of zero digits (encoding)/// or bytes (decoding).</p>/// <p>Note: doing this for decoding <b>may</b> add an extra byte more than what was originally /// given to Enpre.</p>/// </remarks>// based on the answers from http://prereview.stackexchange.com/questions/14084/base-36-encoding-of-a-byte-array/public class RadixEncoding{    const int kByteBitCount = 8;    readonly string kDigits;    readonly double kBitsPerDigit;    readonly BigInteger kRadixBig;    readonly EndianFormat kEndian;    readonly bool kIncludeProceedingZeros;    /// <summary>Numerial base of this encoding</summary>    public int Radix { get { return kDigits.Length; } }    /// <summary>Endian ordering of bytes input to Enpre and output by Depre</summary>    public EndianFormat Endian { get { return kEndian; } }    /// <summary>True if we want ending zero bytes to be enpred</summary>    public bool IncludeProceedingZeros { get { return kIncludeProceedingZeros; } }    public override string ToString()    {        return string.Format("base-{0} {1}", Radix.ToString(), kDigits);    }    /// <summary>Create a radix enprer using the given characters as the digits in the radix</summary>    /// <param name="digits">Digits to use for the radix-enpred string</param>    /// <param name="bytesEndian">Endian ordering of bytes input to Enpre and output by Depre</param>    /// <param name="includeProceedingZeros">True if we want ending zero bytes to be enpred</param>    public RadixEncoding(string digits,        EndianFormat bytesEndian = EndianFormat.Little, bool includeProceedingZeros = false)    {        Contract.Requires<ArgumentNullException>(digits != null);        int radix = digits.Length;        kDigits = digits;        kBitsPerDigit = System.Math.Log(radix, 2);        kRadixBig = new BigInteger(radix);        kEndian = bytesEndian;        kIncludeProceedingZeros = includeProceedingZeros;    }    // Number of characters needed for encoding the specified number of bytes    int EncodingCharsCount(int bytesLength)    {        return (int)Math.Ceiling((bytesLength * kByteBitCount) / kBitsPerDigit);    }    // Number of bytes needed to decoding the specified number of characters    int DecodingBytesCount(int charsCount)    {        return (int)Math.Ceiling((charsCount * kBitsPerDigit) / kByteBitCount);    }    /// <summary>Enpre a byte array into a radix-enpred string</summary>    /// <param name="bytes">byte array to enpre</param>    /// <returns>The bytes in enpred into a radix-enpred string</returns>    /// <remarks>If <paramref name="bytes"/> is zero length, returns an empty string</remarks>    public string Enpre(byte[] bytes)    {        Contract.Requires<ArgumentNullException>(bytes != null);        Contract.Ensures(Contract.Result<string>() != null);        // Don't really have to do this, our pre will build this result (empty string),        // but why not catch the condition before doing work?        if (bytes.Length == 0) return string.Empty;        // if the array ends with zeros, having the capacity set to this will help us know how much        // 'padding' we will need to add        int result_length = EncodingCharsCount(bytes.Length);        // List<> has a(n in-place) Reverse method. StringBuilder doesn't. That's why.        var result = new List<char>(result_length);        // HACK: BigInteger uses the last byte as the 'sign' byte. If that byte's MSB is set,         // we need to pad the input with an extra 0 (ie, make it positive)        if ( (bytes[bytes.Length-1] & 0x80) == 0x80 ) Array.Resize(ref bytes, bytes.Length+1);        var dividend = new BigInteger(bytes);        // IsZero's computation is less complex than evaluating "dividend > 0"        // which invokes BigInteger.CompareTo(BigInteger)        while (!dividend.IsZero)        { BigInteger remainder; dividend = BigInteger.DivRem(dividend, kRadixBig, out remainder); int digit_index = System.Math.Abs((int)remainder); result.Add(kDigits[digit_index]);        }        if (kIncludeProceedingZeros) for (int x = result.Count; x < result.Capacity; x++)     result.Add(kDigits[0]); // pad with the character that represents 'zero'        // orientate the characters in big-endian ordering        if (kEndian == EndianFormat.Little) result.Reverse();        // If we didn't end up adding padding, ToArray will end up returning a TrimExcess'd array,         // so nothing wasted        return new string(result.ToArray());    }    void DepreImplPadResult(ref byte[] result, int padCount)    {        if (padCount > 0)        { int new_length = result.Length + DecodingBytesCount(padCount); Array.Resize(ref result, new_length); // new bytes will be zero, just the way we want it        }    }    #region Depre (Little Endian)    byte[] DepreImpl(string chars, int startIndex = 0)    {        var bi = new BigInteger();        for (int x = startIndex; x < chars.Length; x++)        { int i = kDigits.IndexOf(chars[x]); if (i < 0) return null; // invalid character bi *= kRadixBig; bi += i;        }        return bi.ToByteArray();    }    byte[] DepreImplWithPadding(string chars)    {        int pad_count = 0;        for (int x = 0; x < chars.Length; x++, pad_count++) if (chars[x] != kDigits[0]) break;        var result = DepreImpl(chars, pad_count);        DepreImplPadResult(ref result, pad_count);        return result;    }    #endregion    #region Depre (Big Endian)    byte[] DepreImplReversed(string chars, int startIndex = 0)    {        var bi = new BigInteger();        for (int x = (chars.Length-1)-startIndex; x >= 0; x--)        { int i = kDigits.IndexOf(chars[x]); if (i < 0) return null; // invalid character bi *= kRadixBig; bi += i;        }        return bi.ToByteArray();    }    byte[] DepreImplReversedWithPadding(string chars)    {        int pad_count = 0;        for (int x = chars.Length - 1; x >= 0; x--, pad_count++) if (chars[x] != kDigits[0]) break;        var result = DepreImplReversed(chars, pad_count);        DepreImplPadResult(ref result, pad_count);        return result;    }    #endregion    /// <summary>Depre a radix-enpred string into a byte array</summary>    /// <param name="radixChars">radix string</param>    /// <returns>The depred bytes, or null if an invalid character is encountered</returns>    /// <remarks>    /// If <paramref name="radixChars"/> is an empty string, returns a zero length array    ///     /// Using <paramref name="IncludeProceedingZeros"/> has the potential to return a buffer with an    /// additional zero byte that wasn't in the input. So a 4 byte buffer was enpred, this could end up    /// returning a 5 byte buffer, with the extra byte being null.    /// </remarks>    public byte[] Depre(string radixChars)    {        Contract.Requires<ArgumentNullException>(radixChars != null);        if (kEndian == EndianFormat.Big) return kIncludeProceedingZeros ? DepreImplReversedWithPadding(radixChars) : DepreImplReversed(radixChars);        else return kIncludeProceedingZeros ? DepreImplWithPadding(radixChars) : DepreImpl(radixChars);    }};

基本单元测试

using System;using Microsoft.VisualStudio.TestTools.UnitTesting;static bool ArraysCompareN<T>(T[] input, T[] output)    where T : IEquatable<T>{    if (output.Length < input.Length) return false;    for (int x = 0; x < input.Length; x++)        if(!output[x].Equals(input[x])) return false;    return true;}static bool RadixEncodingTest(RadixEncoding encoding, byte[] bytes){    string enpred = encoding.Enpre(bytes);    byte[] depred = encoding.Depre(enpred);    return ArraysCompareN(bytes, depred);}[TestMethod]public void TestRadixEncoding(){    const string k_base36_digits = "0123456789abcdefghijklmnopqrstuvwxyz";    var base36 = new RadixEncoding(k_base36_digits, EndianFormat.Little, true);    var base36_no_zeros = new RadixEncoding(k_base36_digits, EndianFormat.Little, true);    byte[] ends_with_zero_neg = { 0xFF, 0xFF, 0x00, 0x00 };    byte[] ends_with_zero_pos = { 0xFF, 0x7F, 0x00, 0x00 };    byte[] text = System.Text.Encoding.ASCII.GetBytes("A test 1234");    Assert.IsTrue(RadixEncodingTest(base36, ends_with_zero_neg));    Assert.IsTrue(RadixEncodingTest(base36, ends_with_zero_pos));    Assert.IsTrue(RadixEncodingTest(base36_no_zeros, text));}


转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/415692.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号