Thursday, July 26, 2012

ARM's New 64 Bit Instruction Set

You may have heard that ARM, whose CPUs are extremely popular for embedded devices, is trying to move into the low-power server market. One of the main obstacles to using ARM processors in servers today is that ARM is only a 32 bit architecture (A32). That means that a single process can address at most 4GB of memory (and some of it is reserved for the OS kernel). That isn't a problem on current embedded devices, but it can be for large multi-threaded server applications. To address this issue, ARM has been working on a 64 bit instruction set (A64). To my knowledge there is no commercially available hardware that implements this instruction set yet, but ARM has already released patches to the Linux kernel to support it.

To my surprise, this new 64 bit instruction set is quite different from the existing 32 bit instruction sets. (Perhaps I shouldn't be surprised, since the two Thumb instruction sets were themselves quite different from the original ARM instruction set.) It looks like a very clean RISC-style design. Here are my highlights:
  • All instructions are 32 bits wide (unlike the Thumb variants, but like the original A32).
  • 31 general purpose 64 bit wide registers (instead of 14 general purpose 32-bit registers in A32). Register number 31 is not a general purpose register: depending on the instruction, it either reads as zero or refers to the stack pointer. The general purpose registers can be accessed as 32 bit (called w0, w1, ..., w30) or 64 bit registers (called x0, x1, ..., x30).
  • Neither the stack pointer (SP) nor the program counter (PC) is a general purpose register. They can only be read and modified by certain instructions.
  • In A32, most instructions could be executed conditionally. This is no longer the case.
  • The conditional instructions that remain are not executed conditionally; instead they pick one of two inputs based on a condition. For example, the "conditional select" instruction CSEL x2, x4, x5, cond implements x2 = if cond then x4 else x5. This subsumes a conditional move: CMOV x1, x2, cond can be defined as a synonym for CSEL x1, x2, x1, cond. There are many more of these conditional instructions, but all of them always write the target register.
  • A conditional compare instruction can be used to implement C's short-circuiting semantics. A conditional compare performs its comparison and sets the condition flags from it only if the given condition holds; otherwise it sets the flags to an immediate value encoded in the instruction.
  • There is now an integer division instruction. However, it does not generate an exception/trap upon division by zero. Instead (x/0) = 0. That may seem odd, but I think it's a good idea. A conditional test before a division instruction is likely to be cheaper than a kernel trap.
  • The virtual address space is 49 bits or 512TB. Unlike x86-64/AMD64, where the top 16 bits must all be zero or all one, the highest 8 bits may optionally be usable as a tag. This is configured using a system register. I'm not sure if that will require kernel support. It would certainly come in handy for implementing many higher-level programming languages.
  • A number of instructions for PC-relative addressing. This is useful for position independent code.
  • SIMD instruction support is now guaranteed. ARMv8 also adds support for crypto instructions. These are also available in A32.
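
To make the conditional select, conditional compare and division bullets above a bit more concrete, here is a rough C model of their semantics. It is only a sketch: the function names (a64_csel, a64_cmov, a64_cmp, a64_ccmp, a64_both, a64_udiv) and the Flags struct are made up for illustration, only the Z flag is modelled, and only unsigned division is covered. As far as I can tell from ARM's documentation, when the condition of a conditional compare does not hold, the flags are loaded from an immediate encoded in the instruction, and that is what the model does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not ARM's definition. Only the Z flag is tracked. */
typedef struct { bool z; } Flags;

/* CSEL xd, xn, xm, cond: xd is always written, with either xn or xm. */
uint64_t a64_csel(bool cond, uint64_t xn, uint64_t xm) {
    return cond ? xn : xm;
}

/* The CMOV synonym from the text: CSEL xd, xn, xd, cond. */
uint64_t a64_cmov(bool cond, uint64_t xd, uint64_t xn) {
    return a64_csel(cond, xn, xd);
}

/* CMP a, b: Z is set when the operands are equal. */
Flags a64_cmp(uint64_t a, uint64_t b) {
    Flags f = { a == b };
    return f;
}

/* CCMP a, b, #else_flags, cond: compare if cond holds,
   otherwise load the flags from the immediate. */
Flags a64_ccmp(bool cond, uint64_t a, uint64_t b, Flags else_flags) {
    return cond ? a64_cmp(a, b) : else_flags;
}

/* C's (a == 0 && b == 5) as CMP followed by CCMP,
   with no branch between the two tests. */
bool a64_both(uint64_t a, uint64_t b) {
    Flags not_equal = { false };        /* immediate used when a != 0 */
    Flags f = a64_cmp(a, 0);
    f = a64_ccmp(f.z, b, 5, not_equal);
    return f.z;
}

/* Unsigned division: a zero divisor yields 0 instead of trapping. */
uint64_t a64_udiv(uint64_t n, uint64_t d) {
    return d == 0 ? 0 : n / d;
}

/* A few sanity checks that double as usage examples. */
void a64_examples(void) {
    assert(a64_csel(true, 4, 5) == 4);
    assert(a64_cmov(false, 1, 2) == 1);  /* condition false: xd unchanged */
    assert(a64_both(0, 5));
    assert(!a64_both(1, 5));             /* second compare is skipped */
    assert(a64_udiv(7, 0) == 0);
}
```

A compiler that wants the usual "trap on division by zero" behaviour would have to emit an explicit test of the divisor before the division, which is the conditional test mentioned in the division bullet.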
All the existing ARM instruction sets (except perhaps Jazelle) will still be supported. I don't think you can dynamically switch between different instruction sets as was the case for A32/Thumb, though.
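
Coming back to the address tag mentioned in the list above: the reason it is handy for higher-level languages is that a runtime can keep a small type tag in the top byte of a pointer. Here is a minimal C sketch of the bit manipulation involved. The helper names are my own, the code works on plain uint64_t values rather than real pointers, and it assumes an address whose top 8 bits are zero. Whether a tagged value could be dereferenced directly depends on the hardware being configured to ignore the top byte, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the tag lives in bits 63..56 of the word. */
#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Attach an 8 bit tag to an address whose top byte is zero. */
uint64_t tag_pointer(uint64_t addr, uint8_t tag) {
    assert((addr & ~ADDR_MASK) == 0);   /* top byte must be free */
    return addr | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the tag back out of a tagged word. */
uint8_t pointer_tag(uint64_t tagged) {
    return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Recover the untagged address. */
uint64_t strip_tag(uint64_t tagged) {
    return tagged & ADDR_MASK;
}
```

For example, tag_pointer(0x1000, 3) gives 0x0300000000001000, pointer_tag returns 3 for that value, and strip_tag recovers 0x1000.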

Further reading:


lewurm said...

nice post, I'm really excited about this harsh step on ISA level. btw, there is also an article on lwn:

Unknown said...

In the future (or maybe in just a few years), ARM processors will replace x86 processors if ARM keeps moving forward 3x faster than the veteran Intel. We can see how little time it took ARM to go from single-core to quad-core, still at milliwatt power consumption.