Is it faster to get/put from a non-direct ByteBuffer than from a direct ByteBuffer?
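If you compare a heap buffer with a direct buffer that does not use the native byte order (most systems are little-endian, while the default for a direct ByteBuffer is big-endian), the performance is very similar.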
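If you use native-ordered byte buffers, the performance can be significantly better for multi-byte values. For single bytes it makes little difference whatever you do.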
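A minimal sketch for checking the two claims above on your own JVM: it runs the same getInt/putInt copy loop over heap buffers, direct buffers in the default BIG_ENDIAN order, and direct buffers in native order. The class name and the 256 KB size are assumptions for this example, and the timings depend on the JVM and hardware, so no expected output is given.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OrderComparison {
    static final int SIZE = 256 * 1024;

    // Copies ints from src to dst with relative getInt/putInt and returns the elapsed nanos.
    static long copyInts(ByteBuffer src, ByteBuffer dst) {
        src.clear();
        dst.clear();
        long start = System.nanoTime();
        while (src.remaining() >= 4 && dst.remaining() >= 4)
            dst.putInt(src.getInt());
        return System.nanoTime() - start;
    }

    // Repeats the copy 10 times so the JIT has a chance to compile the loop,
    // then reports the time of the last run only.
    static void report(String label, ByteBuffer a, ByteBuffer b) {
        long time = 0;
        for (int i = 0; i < 10; i++)
            time = copyInts(a, b);
        int operations = a.capacity() / 4 * 2; // one getInt + one putInt per int
        System.out.printf("%-24s %.1f ns per getInt/putInt%n", label, (double) time / operations);
    }

    public static void main(String... args) {
        ByteOrder nat = ByteOrder.nativeOrder();
        System.out.println("native order: " + nat);
        // Heap and direct buffers both start out BIG_ENDIAN unless order(...) is called.
        report("heap", ByteBuffer.allocate(SIZE), ByteBuffer.allocate(SIZE));
        report("direct, BIG_ENDIAN", ByteBuffer.allocateDirect(SIZE), ByteBuffer.allocateDirect(SIZE));
        report("direct, native order",
                ByteBuffer.allocateDirect(SIZE).order(nat),
                ByteBuffer.allocateDirect(SIZE).order(nat));
    }
}
```

Each buffer encodes and decodes with its own byte order, so copying between buffers of different orders still preserves the int values; only the byte layout in memory differs.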
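In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM dependent, and AFAIK the Android VM also treats them as intrinsics in recent versions.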
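If you dump the generated assembly, you can see that each Unsafe intrinsic is turned into a single machine-code instruction, i.e. it does not carry the overhead of a JNI call.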
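To look at the generated code yourself you need a hot method that HotSpot will JIT-compile. The sketch below is one such candidate (the class name is made up). Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly prints the compiled code, but only if the hsdis disassembler plugin is installed for your JVM; the exact instructions you see will vary by JVM version and CPU.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IntrinsicProbe {
    // Hot method: once compiled, the getInt calls are expected to be inlined
    // into plain loads plus bounds checks (this is JVM dependent).
    static long sumInts(ByteBuffer bb) {
        long sum = 0;
        for (int i = 0; i + 4 <= bb.limit(); i += 4)
            sum += bb.getInt(i);
        return sum;
    }

    // Suggested run (requires hsdis to see disassembly):
    //   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly IntrinsicProbe
    public static void main(String... args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
        for (int i = 0; i < bb.capacity(); i += 4)
            bb.putInt(i, 1);
        long total = 0;
        // Call the method many times so the JIT compiles it.
        for (int i = 0; i < 20_000; i++)
            total += sumInts(bb);
        System.out.println(total);
    }
}
```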
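In fact, if you are micro-tuning, you may find that most of the time spent in a ByteBuffer getXxxx or putXxxx call goes on bounds checking, not on the actual memory access. For this reason I still use Unsafe directly when I need maximum performance (note: Oracle discourages this).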
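The bounds checks are part of the ByteBuffer contract: an absolute getInt(index) throws IndexOutOfBoundsException unless 0 <= index <= limit - 4, and a relative getInt() throws BufferUnderflowException when fewer than four bytes remain. This small sketch (hypothetical class name) only makes those checks visible; a raw Unsafe.getInt(address) performs no such check.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

final class BoundsCheckDemo {
    // True if the absolute getInt at 'index' was rejected by the bounds check.
    static boolean absoluteRejected(ByteBuffer bb, int index) {
        try {
            bb.getInt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // True if a relative getInt was rejected because fewer than 4 bytes remain.
    static boolean relativeRejected(ByteBuffer bb) {
        try {
            bb.getInt();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String... args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(8);
        System.out.println("getInt(4) rejected: " + absoluteRejected(bb, 4));
        System.out.println("getInt(5) rejected: " + absoluteRejected(bb, 5));
        bb.position(6); // only 2 bytes remain
        System.out.println("getInt() rejected:  " + relativeRejected(bb));
    }
}
```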
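If I have to read/write from a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then (for writes) update the direct ByteBuffer fully from that byte array?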
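I would be surprised to find a case where that beats the simple approach. ;) It sounds very complicated.
Often the simplest solutions are better and faster.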
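For comparison, here is a minimal sketch of the staged approach from the question next to the simple one. The class and method names and the 4096-byte scratch size are assumptions for the example. Both methods leave the same bytes in the direct buffer; the staged one just adds an extra array and an extra copy.

```java
import java.nio.ByteBuffer;

final class StagingDemo {
    // Hypothetical per-thread scratch array; writeStaged fails with
    // BufferOverflowException if values.length * 4 exceeds 4096.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[4096]);

    // Simple approach: write each int straight into the direct buffer.
    static void writeDirect(ByteBuffer direct, int[] values) {
        direct.clear();
        for (int v : values)
            direct.putInt(v);
    }

    // Staged approach: fill the thread-local byte[] using the direct buffer's
    // byte order, then copy it into the direct buffer with one bulk put.
    static void writeStaged(ByteBuffer direct, int[] values) {
        byte[] scratch = SCRATCH.get();
        ByteBuffer heap = ByteBuffer.wrap(scratch).order(direct.order());
        for (int v : values)
            heap.putInt(v);
        direct.clear();
        direct.put(scratch, 0, heap.position());
    }
}
```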
您可以使用此代码自己对此进行测试。
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class ByteBufferTest {
        public static void main(String... args) {
            ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            // Repeat the test so the later runs show JIT-compiled performance.
            for (int i = 0; i < 10; i++)
                runTest(bb1, bb2);
        }

        private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
            bb1.clear();
            bb2.clear();
            long start = System.nanoTime();
            // Copy bb1 into bb2 one int at a time through the ByteBuffer API.
            while (bb2.remaining() > 0)
                bb2.putInt(bb1.getInt());
            long time = System.nanoTime() - start;
            // One getInt plus one putInt per 4-byte int.
            int operations = bb1.capacity() / 4 * 2;
            System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
        }
    }

prints
    Each putInt/getInt took an average of 83.9 ns
    Each putInt/getInt took an average of 1.4 ns
    Each putInt/getInt took an average of 34.7 ns
    Each putInt/getInt took an average of 1.3 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.3 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns
    Each putInt/getInt took an average of 1.2 ns
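I am pretty sure a JNI call takes longer than 1.2 ns.
To demonstrate that it is not a "JNI" call but the checks surrounding it that cause the delay, you can write the same loop using Unsafe directly.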
    import java.lang.reflect.Field;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    import sun.misc.Unsafe;
    import sun.nio.ch.DirectBuffer;

    // On Java 9+, compile and run with --add-exports java.base/sun.nio.ch=ALL-UNNAMED
    // so that sun.nio.ch.DirectBuffer is accessible.
    public class UnsafeTest {
        public static void main(String... args) {
            ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
            for (int i = 0; i < 10; i++)
                runTest(bb1, bb2);
        }

        private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
            Unsafe unsafe = getTheUnsafe();
            long start = System.nanoTime();
            long addr1 = ((DirectBuffer) bb1).address();
            long addr2 = ((DirectBuffer) bb2).address();
            // Copy bb2 into bb1 via raw addresses, with no bounds checks.
            for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
                unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
            long time = System.nanoTime() - start;
            int operations = bb1.capacity() / 4 * 2;
            System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
        }

        // Obtain the Unsafe singleton via reflection on its private "theUnsafe" field.
        public static Unsafe getTheUnsafe() {
            try {
                Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
                theUnsafe.setAccessible(true);
                return (Unsafe) theUnsafe.get(null);
            } catch (Exception e) {
                throw new AssertionError(e);
            }
        }
    }

prints
    Each putInt/getInt took an average of 40.4 ns
    Each putInt/getInt took an average of 44.4 ns
    Each putInt/getInt took an average of 0.4 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
    Each putInt/getInt took an average of 0.3 ns
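So you can see that the native call is much faster than you would expect for a JNI call. The main reason for the remaining delay is likely to be the speed of the L2 cache. ;)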
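As a rough back-of-envelope check, assuming the steady-state figures above and that each operation moves one 4-byte int, the per-operation times translate into these data rates:

$$
\text{operations} = \frac{256 \times 1024}{4} \times 2 = 131\,072, \qquad
\text{working set} = 2 \times 256\,\text{KB} = 512\,\text{KB}
$$

$$
\frac{4\,\text{B}}{0.3\,\text{ns}} \approx 13.3\,\text{GB/s}\ (\text{Unsafe}), \qquad
\frac{4\,\text{B}}{1.2\,\text{ns}} \approx 3.3\,\text{GB/s}\ (\text{ByteBuffer})
$$

Whether the 512 KB working set fits in L2 depends on the specific CPU model's cache sizes, which the timings alone do not tell you.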
All of these were run on an i3 at 3.3 GHz.



