Assembly Arithmetic and Logic Instructions

Embedded Systems with ARM Cortex-M #6

Data processing instructions can be classified into seven categories:

  1. Arithmetic Instructions

  2. Reorder Instructions

  3. Extension Instructions

  4. Bitwise Logic Instructions

  5. Shift Instructions

  6. Comparison Instructions

  7. Data Copy Instructions

Program Status Register

Cortex-M processors have five status flags: N, Z, V, C, Q

  • The negative flag (N) is set if the result of ALU is negative and is cleared otherwise.

  • The zero flag (Z) is set if the ALU result is zero and is cleared otherwise.

  • The carry flag (C) is set if a carry occurs in unsigned addition and is cleared otherwise.
    For unsigned subtraction, it is set if no borrow has occurred, and is cleared otherwise.

  • The overflow flag (V) is set if an overflow takes place when performing a signed addition or subtraction and is cleared otherwise.

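  • The saturation flag (Q) is set if an SSAT or USAT instruction causes saturation. Unlike the other flags, Q is sticky: once set, it stays set until software clears it explicitly by writing to APSR with MSR.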

Most data processing instructions of Cortex-M processors have an option to update these ALU status flags. These flags are stored in the program status register (PSR).

The program status register is a combination of three special registers:

  1. The application program status register (APSR)

  2. The interrupt program status register (IPSR)

  3. The execution program status register (EPSR)

Because APSR, IPSR, and EPSR have no overlap in bit fields, the processor combines them into one register, the PSR (also called xPSR), to allow convenient access.

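In APSR, the GE (greater than or equal) flags are set by SIMD instructions to indicate whether the corresponding byte or halfword results are greater than or equal to zero. The GE flags are only available on processors with the DSP extension, such as Cortex-M4 and Cortex-M7.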

In EPSR, the T flag indicates whether the processor is in Thumb state or ARM32 state. Since Cortex-M processors only support Thumb-2/Thumb instructions, the T flag has a fixed value of 1 in Cortex-M.
Additionally, the IT bit fields (IT[7:6] and IT[5:0]) in EPSR hold the condition states associated with the current IF-THEN (IT) block. An IT block is a convenient approach to implementing conditionally executed instructions.

In IPSR, the least significant 9 bits are zero if the processor is in the thread mode, or the exception or interrupt number if the processor is in the handler mode. On reset, the processor is in the thread mode.

These special registers can only be accessed by using two special instructions:

  • MRS (move from a special register to a general register)

  • MSR (move from a general register to a special register)

Specifically, MRS reads these registers, and MSR writes to these registers. The following gives a few examples.

MRS r0, apsr        ; Read APSR
MRS r0, ipsr        ; Read IPSR
MRS r0, epsr        ; Read EPSR
MRS r0, xpsr        ; Read APSR, IPSR, and EPSR
MSR apsr_nzcvq, r0  ; Change N,Z,C,V,Q flags in APSR
MSR apsr_g, r0      ; Copy r0[19:16] to GE[3:0] in APSR (Not on Cortex-M3)
MSR apsr_nzcvqg, r0 ; Change N,Z,C,V,Q and GE flags (Not on Cortex-M3)
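
Once MRS has copied xPSR into a general-purpose register, the individual fields can be separated with shifts and masks. The following C sketch illustrates this, assuming the ARMv7-M bit layout (N, Z, C, V, Q in bits 31 down to 27, GE in bits 19:16, and the exception number in bits 8:0). The type and function names are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields of an xPSR value. */
typedef struct {
    unsigned n, z, c, v, q;  /* APSR condition and saturation flags      */
    unsigned ge;             /* APSR GE[3:0] (DSP extension only)        */
    unsigned exception;      /* IPSR exception number, 0 in thread mode  */
} xpsr_fields_t;

/* Split a raw xPSR value (as read by "MRS r0, xpsr") into its fields. */
static xpsr_fields_t decode_xpsr(uint32_t xpsr)
{
    xpsr_fields_t f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.ge = (xpsr >> 16) & 0xFu;
    f.exception = xpsr & 0x1FFu;   /* least significant 9 bits */
    return f;
}
```

For instance, decoding the value 0x61000003 yields Z = 1, C = 1, and exception number 3.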

Updating Program Status Flags

It is an option for an arithmetic or logic instruction to set the processor status flags. If the 'S' suffix is appended to an instruction mnemonic, the processor modifies the status flags based on the computation result.

For example, the ADDS instruction changes the N, Z, C, and V flags when performing addition.

In contrast, ADD does not change these flags. If an instruction does not update these flags, the existing value of each flag, set by a previous instruction, is preserved.

Data comparison instructions, such as CMP (compare), CMN (compare negative), TST (test), and TEQ (test equivalence), set these flags even though they do not have the 'S' suffix.

ADD  r1, r2, r3  ; r1 = r2 + r3, but will not update N, Z, C, and V flags
ADDS r1, r2, r3  ; r1 = r2 + r3, and will update N, Z, C, and V flags

While the first instruction ADD does not change the N, Z, C, and V flags, the second instruction ADDS modifies the flags in the following ways:

  1. The overflow flag by assuming that r2 and r3 hold signed integers represented in two's complement.

  2. The carry flag by assuming that r2 and r3 hold unsigned integers.

  3. The zero flag by checking whether the result saved in the destination register r1 is zero or not.

  4. The negative flag by checking the sign bit of r1 (the most significant bit of r1).
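
The four rules above can be modeled in a few lines of C. This is a minimal sketch, not a description of the hardware: the function name adds_flags and the packing of the flags into one value, (N << 3) | (Z << 2) | (C << 1) | V, are assumptions made for illustration.

```c
#include <stdint.h>

/* Flags that ADDS would produce for a + b,
   packed as (N << 3) | (Z << 2) | (C << 1) | V. */
static unsigned adds_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                 /* 32-bit sum, wraps modulo 2^32 */

    unsigned n = r >> 31;               /* sign bit of the result */
    unsigned z = (r == 0u);             /* result is zero */
    unsigned c = (r < a);               /* unsigned carry out of bit 31 */
    /* Signed overflow: both inputs have the same sign,
       but the sign of the result differs from it. */
    unsigned v = (~(a ^ b) & (a ^ r)) >> 31;

    return (n << 3) | (z << 2) | (c << 1) | v;
}
```

For example, 0x7FFFFFFF + 1 sets N and V (a signed overflow without an unsigned carry), while 0xFFFFFFFF + 1 sets Z and C (an unsigned carry without a signed overflow).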

If the Barrel shifter is used, the source operand may update the program status flags.

For example, the bitwise logical ANDS instruction can update the N, Z, and C flags. In the following instruction, the N flag is set if the most significant bit of r1 is 1, and the Z flag is set if r1 equals 0.

ANDS r1, r2, r3     ; r1 = r2 AND r3

It is easy to understand that most logical instructions do not update the overflow flag. How does a logical instruction update the carry flag? The answer lies in the second operand of a logical instruction. If the second operand uses the Barrel shifter, then the processor updates the carry flag based on the shift or rotation result.

ANDS r1, r2, r3, LSL #3  ; r1 = r2 AND (r3 << 3)

When MOVS uses the Barrel shifter, the processor also updates the Z, N, and C flags.

MOVS r2, r1, LSR #3     ; r2 = r1 >> 3

However, the Barrel shifter does not change the flags if it is employed in an arithmetic instruction. For example, in the following instruction, the flags depend on the result of addition, instead of logical shift left.

ADDS r1, r2, r3, LSL #3     ; r1 = r2 + (r3 << 3)

If the program is written in assembly, it is the programmer's responsibility to interpret and use these flags correctly. For programs written in high-level languages, compilers automatically interpret these flags.

Shift and Rotate

As shown in the below figure, the second ALU operand is equipped with a Barrel shifter, which is a special digital circuit for quick shift and rotation. Barrel shifters are usually not available on other processors such as PIC and AVR.

There are five types of shift and rotate operations: LSL, LSR, ASR, ROR, and RRX

  • LSL (logical shift left) moves all bits of a register value left by n bits and zeros are shifted in at the right end.
    LSL is equivalent to multiplication by 2**n (<< operation in C).

  • LSR (logical shift right) moves all bits of a register value right by n bits and zeros are shifted in at the left end.
    LSR is equivalent to unsigned division by 2**n (>> operation on unsigned numbers in C).

  • ASR (arithmetic shift right) moves all bits right by n bits and copies of the left most bit (the sign bit) are shifted in at the left end.
    ASR is equivalent to signed division by 2**n with the result rounded toward negative infinity (>> operation on signed numbers in most C compilers).

  • ROR (rotate right) is the circular shift, in which all 32 bits are shifted right simultaneously as if the right end of the register is joined with its left end. The bit shifted out from the right end of the register is copied into the carry bit. The carry bit can be optionally used to update the carry flag of the processor status register.

  • RRX (rotate right with extend) works similarly to ROR except that the carry bit joins the rotating circle, and RRX can rotate the data by only one bit.

LSL r1, r2     ; r1 = r1 << r2
LSL r1, #3     ; r1 = r1 << 3
LSL r1, r2, #3 ; r1 = r2 << 3
LSL r1, r2, r3 ; r1 = r2 << r3
ROR r1, r2     ; r1 = rotate r1 by r2 bits
RRX r1, r2     ; r1 = r2 rotated right by one bit through the carry flag
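
The difference between LSR, ASR, and RRX is easiest to see in a small software model. The C sketch below is only an illustration under stated assumptions: the function names are hypothetical, shift amounts are restricted to 1 through 31, and ASR is written with unsigned arithmetic because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* LSR: zeros enter from the left end. */
static uint32_t lsr32(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return x >> n;
}

/* ASR: copies of the sign bit enter from the left end. */
static uint32_t asr32(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* RRX: rotate right by one bit; the old carry enters at bit 31. */
static uint32_t rrx32(uint32_t x, unsigned carry_in)
{
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}

/* The bit that RRX shifts out of bit 0 (the new carry bit). */
static unsigned rrx_carry_out(uint32_t x)
{
    return x & 1u;
}
```

Note that asr32 applied to -7 with n = 1 gives -4, whereas the C expression -7 / 2 gives -3, because the shift rounds toward negative infinity and C division rounds toward zero.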

The C language does not provide rotate operations (ROR and RRX). The compiler automatically uses a rotation instruction if it can improve the performance. Besides, ARM assembly language does not provide a rotate left assembly instruction. However, a rotate left by (n) bits can be replaced with a rotate right by (32 - n) bits. For example, rotating left by 6 bits has the same result as rotating right by 26 bits.
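
The rotate-left identity can be checked with a short C sketch. The helper names ror32 and rol32 are hypothetical; the code expresses a rotation with two shifts, a pattern that compilers commonly recognize and translate into a single ROR instruction.

```c
#include <stdint.h>

/* Rotate right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0u)
        return x;                      /* avoid an undefined shift by 32 */
    return (x >> n) | (x << (32u - n));
}

/* Rotate left by n bits, expressed as a rotate right by (32 - n) bits. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return ror32(x, (32u - (n & 31u)) & 31u);
}
```

With these helpers, rol32(x, 6) and ror32(x, 26) return the same value for every x.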

Note that the carry bit is not the carry flag of the processor status register. Therefore, none of these shift and rotate instructions updates the status flags by default. If these flags need to be updated, a shift or rotate instruction must have the suffix S specified. Moreover, these instructions cannot modify the overflow flag.

LSL r1, #3     ; r1 = r1 << 3, but will not update the flags
LSLS r1, #3    ; r1 = r1 << 3, and will update the N, Z, C flags
               ; LSLS does not update the V flag

Programs often use the Barrel shifter to replace slow multiplication and division instructions to improve the speed, as shown below.

ADD r0, r2, r1, LSL #1    ; r0 = r2 + (r1 << 1) = r2 + 2 x r1
ADD r1, r0, r0, LSR #3    ; r1 = r0 + (r0 >> 3) = r0 + r0/8

When the Barrel shifter is used in a move instruction (MOVS or MVNS) or in a logical instruction with the 'S' suffix (such as ANDS, ORRS, EORS, or BICS), it updates the carry flag.

Arithmetic Instructions

The below table lists arithmetic instructions that produce 32-bit results.

| Instruction | Description |
|---|---|
| ADD {Rd,} Rn, Op2 | Add. Rd = Rn + Op2 |
| ADC {Rd,} Rn, Op2 | Add with Carry. Rd = Rn + Op2 + Carry |
| SUB {Rd,} Rn, Op2 | Subtract. Rd = Rn - Op2 |
| SBC {Rd,} Rn, Op2 | Subtract with Carry. Rd = Rn - Op2 + Carry - 1 |
| RSB {Rd,} Rn, Op2 | Reverse Subtract. Rd = Op2 - Rn |
| MUL {Rd,} Rn, Rm | Multiply. Rd = Rn * Rm |
| MLA Rd, Rn, Rm, Ra | Multiply with Accumulate. Rd = Ra + (Rn * Rm) |
| MLS Rd, Rn, Rm, Ra | Multiply and Subtract. Rd = Ra - (Rn * Rm) |
| SDIV {Rd,} Rn, Rm | Signed Divide. Rd = Rn/Rm |
| UDIV {Rd,} Rn, Rm | Unsigned Divide. Rd = Rn/Rm |
| SSAT Rd, #n, Rm{,shift #s} | Signed Saturate |
| USAT Rd, #n, Rm{,shift #s} | Unsigned Saturate |

Addition and Subtraction Instructions

Most of these instructions take two source operands, and the 32-bit result is saved in a destination register. While the first source operand is a register, the second source operand is flexible and can be a register, an immediate constant, or an inline Barrel shifter.

SUB r3, r2, r1          ; r3 = r2 - r1
RSB r3, r2, #987        ; r3 = 987 - r2
ADD r0, r0, r0, LSL #3  ; r0 = r0 + (r0 << 3)

If an instruction has three operands, the first source operand (the second operand in the instruction) cannot be a constant in most instructions (SSAT and USAT are exceptions).

SUB r0, #1, r3  ; Not allowed, causing a syntax error

The example below shows the implementation of subtracting two 96-bit integers by using SUB and SBC. A 96-bit integer is saved in three registers.

; C = A - B
; Subtracting two 96-bit integers A (r2:r1:r0) and B (r5:r4:r3).
; Three registers to hold a 96-bit integer: upper, middle, lower word
; Result C (r8:r7:r6)
; A = 00001234,00000002,FFFFFFFF
; B = 12345678,00000004,00000001

LDR r0, =0xFFFFFFFF  ; A's lower 32 bits
LDR r1, =0x00000002  ; A's middle 32 bits
LDR r2, =0x00001234  ; A's upper 32 bits

LDR r3, =0x00000001  ; B's lower 32 bits
LDR r4, =0x00000004  ; B's middle 32 bits
LDR r5, =0x12345678  ; B's upper 32 bits

; Subtract B from A
SUBS r6, r0, r3 ; C[31:0] = A[31:0] - B[31:0], update carry

; Carry flag is 1 if no borrow has occurred in the previous subtraction
SBCS r7, r1, r4 ; C[63:32] = A[63:32] - B[63:32] + carry - 1, update carry
SBC  r8, r2, r5 ; C[95:64] = A[95:64] - B[95:64] + carry - 1
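
To see how the carry flag propagates the borrow from word to word, the same 96-bit subtraction can be traced in C. This is a sketch under stated assumptions: the names sbc32, u96_t, make_u96, and sub96 are hypothetical, and the carry is modeled as an explicit variable instead of a processor flag.

```c
#include <stdint.h>

/* A 96-bit integer held in three 32-bit words. */
typedef struct { uint32_t lo, mid, hi; } u96_t;

static u96_t make_u96(uint32_t hi, uint32_t mid, uint32_t lo)
{
    u96_t x;
    x.lo = lo; x.mid = mid; x.hi = hi;
    return x;
}

/* ARM-style subtract with carry: result = a - b + carry - 1.
   The carry-out is 1 when no borrow occurred. */
static uint32_t sbc32(uint32_t a, uint32_t b, unsigned carry_in,
                      unsigned *carry_out)
{
    /* a - b - 1 + carry equals a + NOT(b) + carry in 32-bit arithmetic */
    uint64_t wide = (uint64_t)a + (uint32_t)~b + carry_in;
    *carry_out = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* C = A - B, mirroring the SUBS / SBCS / SBC sequence. */
static u96_t sub96(u96_t a, u96_t b)
{
    u96_t c;
    unsigned carry;
    c.lo  = sbc32(a.lo,  b.lo,  1u,    &carry);  /* SUBS: carry-in acts as 1 */
    c.mid = sbc32(a.mid, b.mid, carry, &carry);  /* SBCS */
    c.hi  = sbc32(a.hi,  b.hi,  carry, &carry);  /* SBC  */
    return c;
}
```

For the operands used above, the lower word is 0xFFFFFFFE with no borrow, the middle word is 0xFFFFFFFE with a borrow, and the upper word is therefore 0xEDCBBBBB.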

Short Multiplication and Division Instructions

The result of a multiplication may have more than 32 bits. However, the destination register only holds the least significant 32 bits (LSB32) of the result.

MUL r6, r4, r2      ; r6 = LSB32( r4 x r2 ); the low 32 bits are the same
                    ; for signed and unsigned operands, so there is no
                    ; separate unsigned multiply instruction
MLA r6, r4, r1, r0  ; r6 = r0 + LSB32( r4 x r1 )
MLS r6, r4, r1, r0  ; r6 = r0 - LSB32( r4 x r1 )
SDIV r3, r2, r1     ; signed divide, r3 = r2/r1
UDIV r3, r2, r1     ; unsigned divide, r3 = r2/r1
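
The least significant 32 bits of a product do not depend on whether the operands are interpreted as signed or unsigned, which is why one MUL instruction is sufficient for both. The C sketch below models MUL, MLA, and MLS on 32-bit words; the function names are hypothetical, and it assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* MUL: keep only the least significant 32 bits of the product. */
static uint32_t mul32(uint32_t rn, uint32_t rm)
{
    return rn * rm;
}

/* MLA: Rd = Ra + (Rn * Rm), truncated to 32 bits. */
static uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra + rn * rm;
}

/* MLS: Rd = Ra - (Rn * Rm), truncated to 32 bits. */
static uint32_t mls32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra - rn * rm;
}
```

For example, multiplying the bit pattern of -3 by 5 gives the bit pattern of -15, even though the function only uses unsigned arithmetic.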

Long Multiplication Instructions

Long multiplication instructions produce a 64-bit result.

| Instruction | Description |
|---|---|
| UMULL RdLo, RdHi, Rn, Rm | Unsigned long multiply. RdHi, RdLo = unsigned (Rn * Rm) |
| SMULL RdLo, RdHi, Rn, Rm | Signed long multiply. RdHi, RdLo = signed (Rn * Rm) |
| UMLAL RdLo, RdHi, Rn, Rm | Unsigned long multiply with accumulate. RdHi, RdLo = unsigned (RdHi, RdLo + Rn * Rm) |
| SMLAL RdLo, RdHi, Rn, Rm | Signed long multiply with accumulate. RdHi, RdLo = signed (RdHi, RdLo + Rn * Rm) |

Two registers are used to store a 64-bit result, with the high register (RdHi) holding the most significant 32 bits, and the low register (RdLo) holding the least significant 32 bits.

UMULL r3, r4, r0, r1   ; r4:r3 = r0 x r1, r4 = MSB bits, r3 = LSB bits
SMULL r3, r4, r0, r1   ; r4:r3 = r0 x r1
UMLAL r3, r4, r0, r1   ; r4:r3 = r4:r3 + r0 x r1
SMLAL r3, r4, r0, r1   ; r4:r3 = r4:r3 + r0 x r1
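
The split of a 64-bit product into a high and a low register can be reproduced in C with 64-bit integer types. The following sketch is illustrative only; pair32_t and the function names are hypothetical stand-ins for the RdHi:RdLo register pair.

```c
#include <stdint.h>

/* Models the RdHi:RdLo register pair. */
typedef struct { uint32_t lo, hi; } pair32_t;

static pair32_t split64(uint64_t v)
{
    pair32_t p;
    p.lo = (uint32_t)v;           /* least significant 32 bits */
    p.hi = (uint32_t)(v >> 32);   /* most significant 32 bits  */
    return p;
}

/* UMULL: unsigned 32 x 32 -> 64-bit product. */
static pair32_t umull(uint32_t rn, uint32_t rm)
{
    return split64((uint64_t)rn * rm);
}

/* SMULL: signed 32 x 32 -> 64-bit product. */
static pair32_t smull(int32_t rn, int32_t rm)
{
    return split64((uint64_t)((int64_t)rn * (int64_t)rm));
}

/* UMLAL: add the unsigned product to the 64-bit accumulator. */
static pair32_t umlal(pair32_t acc, uint32_t rn, uint32_t rm)
{
    uint64_t a = ((uint64_t)acc.hi << 32) | acc.lo;
    return split64(a + (uint64_t)rn * rm);
}
```

Unlike the 32-bit MUL, the upper word here differs between the signed and unsigned versions: 0xFFFFFFFF times 0xFFFFFFFF gives a high word of 0xFFFFFFFE with umull, but 0 with smull, where both operands mean -1.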

Saturation Instructions

The saturation instructions limit a given input to a configurable signed or unsigned range. When the input value exceeds the specified range, its output is then set as the maximum or minimum value of the selected range. Otherwise, the output is equal to the input. The saturation instructions take one immediate source operand and one register source operand.

$$SSAT(x) = \left\{ \begin{array}{cl} 2^{n-1} - 1 & : \ x \gt 2^{n-1} - 1\\ -2^{n-1} & : \ x \lt -2^{n-1} \\ x & : \ otherwise \end{array} \right.$$

$$USAT(x) = \left\{ \begin{array}{cl} 2^{n} - 1 & : \ x \gt 2^{n} - 1\\ 0 & : \ x \lt 0 \\ x & : \ otherwise \end{array} \right.$$

The following gives two examples in which n is 11. Note the second operand is an immediate number in SSAT and USAT.

SSAT r2, #11, r1  ; output range: -2^10 to 2^10 - 1
USAT r2, #11, r1  ; output range: 0 to 2^11 - 1
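
The clamping behavior can be expressed directly in C. The sketch below models only the numeric result, not the Q flag, and it assumes the input register holds a signed 32-bit value. The function names ssat and usat are used here for readability and are not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturate to n bits: range -2^(n-1) to 2^(n-1) - 1. */
static int32_t ssat(int32_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);
    int64_t max = ((int64_t)1 << (n - 1)) - 1;
    int64_t min = -((int64_t)1 << (n - 1));
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* Unsigned saturate to n bits: range 0 to 2^n - 1. */
static uint32_t usat(int32_t x, unsigned n)
{
    assert(n <= 31);
    int64_t max = ((int64_t)1 << n) - 1;
    if (x < 0)   return 0u;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}
```

With n = 11, ssat maps 5000 to 1023 and -5000 to -1024, while usat maps 5000 to 2047 and any negative input to 0.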

Barrel Shifter

The key advantage of a Barrel shifter is that it can shift or rotate a register by a specified number of bits in one clock cycle.

Typically, a Barrel shifter is implemented as a cascade of parallel 2-to-1 multiplexers. The below figure gives an example implementation of a four-bit Barrel shifter that performs rotate right. The S1 S0 indicates the amount of rotation. The implementation of logic shift is similar, except that a zero bit is shifted in either from the right end or the left end.

Figure: Four-bit Barrel shifter that performs rotate right

Table: Truth table of rotate right
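
The multiplexer cascade can also be modeled in software. The C sketch below assumes a two-stage design in which the first stage rotates by one position when S0 is 1 and the second stage rotates by two positions when S1 is 1; the exact wiring of the original figure may differ, and all names are hypothetical.

```c
/* 2-to-1 multiplexer on a 4-bit value: selects b when sel is 1, else a. */
static unsigned mux2(unsigned sel, unsigned a, unsigned b)
{
    return sel ? b : a;
}

/* Fixed rotate right of a 4-bit value by k positions (k = 1 or 2).
   In hardware this is pure wiring, not logic. */
static unsigned wire_ror4(unsigned x, unsigned k)
{
    return ((x >> k) | (x << (4u - k))) & 0xFu;
}

/* Four-bit rotate right by the amount encoded in S1 S0 (0 to 3). */
static unsigned barrel_ror4(unsigned x, unsigned s1, unsigned s0)
{
    unsigned in     = x & 0xFu;
    unsigned stage0 = mux2(s0, in,     wire_ror4(in, 1u));      /* rotate by 0 or 1 */
    unsigned stage1 = mux2(s1, stage0, wire_ror4(stage0, 2u));  /* rotate by 0 or 2 */
    return stage1;
}
```

Because each stage only selects between two fixed wirings, the delay is the same for every rotation amount, which is what makes the single-cycle operation possible.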

The Barrel shifter is special hardware that can perform shift and rotation on the second ALU source operand. Therefore, not only can a shift and rotate instruction be used as a standalone assembly instruction, but it can also be utilized in other instructions to make changes to the second source operand.

ASR and LSR differ on whether the sign is preserved. For example:

ADD r1, r0, r0, LSL #3    ; r1 = r0 + (r0 << 3) = r0 + 8 * r0
ADD r1, r0, r0, LSR #3    ; r1 = r0 + (r0 >> 3) = r0 + r0/8 (unsigned)
ADD r1, r0, r0, ASR #3    ; r1 = r0 + (r0 >> 3) = r0 + r0/8 (signed)

We can leverage the Barrel shifter to speed up the application.

  • Without the Barrel shifter, two separate instructions would be required to carry out each of the above instructions. This would not only increase the size of a binary program but also take more processor cycles to complete the same task.

  • The Barrel shifter can also replace slow multiplication instructions, as shown in the following example.

ADD r1, r0, r0, LSL #3    ; r1 = r0 + r0 * 8 = r0 * 9
; Instead of
MOV r2, #9        ; r2 = 9
MUL r1, r0, r2    ; r1 = r0 * 9
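
The same shift-and-add idea applies to other constants close to a power of two. The C sketch below shows two cases; the function names are hypothetical, and the comments give one possible single-instruction equivalent for each.

```c
#include <stdint.h>

/* x * 9 = x + 8x, matching ADD r1, r0, r0, LSL #3. */
static uint32_t times9(uint32_t x)
{
    return x + (x << 3);
}

/* x * 7 = 8x - x, matching RSB r1, r0, r0, LSL #3. */
static uint32_t times7(uint32_t x)
{
    return (x << 3) - x;
}
```

Both functions wrap modulo 2^32 on overflow, exactly as the 32-bit MUL instruction does.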

Bitwise Logic Operations

Bitwise operations treat input operands as a sequence of binary bits, rather than as integer numbers. The computation is carried out at the bit level. For example, we can reset a specific bit of a register to zero or set a specific bit of a register to one, leaving the other bits unchanged.

There are four commonly used bitwise Boolean operators: AND, OR, Exclusive OR (EOR), and negation (NOT).

The following table shows the bitwise assembly instructions supported in Cortex-M.

| Instruction | Description |
|---|---|
| AND {Rd,} Rn, Op2 | Bitwise logic AND. Rd = Rn & Op2 |
| ORR {Rd,} Rn, Op2 | Bitwise logic OR. Rd = Rn \| Op2 |
| EOR {Rd,} Rn, Op2 | Bitwise logic XOR. Rd = Rn ^ Op2 |
| ORN {Rd,} Rn, Op2 | Bitwise logic NOT OR. Rd = Rn \| (NOT Op2) |
| BIC {Rd,} Rn, Op2 | Bitwise clear. Rd = Rn & (NOT Op2) |
| BFC Rd, #lsb, #width | Bit field clear. Rd[(width+lsb-1) : lsb] = 0 |
| BFI Rd, Rn, #lsb, #width | Bit field insert. Rd[(width+lsb-1) : lsb] = Rn[(width-1) : 0] |
| MVN Rd, Op2 | Logically negate all bits. Rd = 0xFFFFFFFF EOR Op2 |

These instructions operate at the bit level. They perform logic operations for each pair of bits that are at the same position of inputs.
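
The two bit field instructions, BFC and BFI, are less familiar than the others, so a software model may help. The C sketch below follows the definitions Rd[(width+lsb-1):lsb] = 0 and Rd[(width+lsb-1):lsb] = Rn[(width-1):0]. The function names are hypothetical, and the assertions encode the assumption that lsb is at most 31 and width is between 1 and 32 - lsb.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with `width` ones starting at bit position `lsb`. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    assert(lsb <= 31 && width >= 1 && width <= 32 - lsb);
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFC: clear the selected field of rd, keep all other bits. */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* BFI: copy the low `width` bits of rn into the selected field of rd. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t mask = field_mask(lsb, width);
    return (rd & ~mask) | ((rn << lsb) & mask);
}
```

For example, clearing 8 bits starting at bit 4 of 0xFFFFFFFF gives 0xFFFFF00F, and inserting 0x12 into the same field gives 0xFFFFF12F.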

Bit Mask

We often use bit masks to manipulate a particular subset of binary bits in a single bitwise operation conveniently. For an integer N, its bit mask is constructed as follows:

  • The mask has the same number of bits in binary as the integer N.

  • Bit mask(i) is 1 if bit N(i) is to be operated on; otherwise, mask(i) is 0.

  • If mask(i) is 1, we say bit N(i) is masked.

The mask can separate the binary bits of an integer into two parts. The part selected by the bit mask is examined or modified, and the other part is ignored. If a bit in the bit mask is 1, the corresponding bit in the target variable is chosen. For example, a mask of 0b00110100 (0x34) selects bits 2, 4, and 5 of the target variable.

  • N = 0xA2 = 0b10100010

  • Mask = 0x34 = 0b00110100

| Bitwise Operator | Symbol | Example |
|---|---|---|
| AND | & | C = N & Mask; |
| OR | \| | C = N \| Mask; |
| EXCLUSIVE-OR (EOR) | ^ | C = N ^ Mask; |
| NOT | ~ | C = ~N; |
| SHIFT RIGHT | >> | C = N >> 2; |
| SHIFT LEFT | << | C = N << 2; |
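
Applying the mask operations to the values N = 0xA2 and Mask = 0x34 makes the effect of each operator concrete. The helper names in this C sketch are hypothetical; each one wraps a single operator on 8-bit operands, and the comments show the result for these two values.

```c
#include <stdint.h>

/* Keep only the masked bits: 0xA2 & 0x34 = 0x20. */
static uint8_t select_bits(uint8_t n, uint8_t mask)
{
    return (uint8_t)(n & mask);
}

/* Set the masked bits: 0xA2 | 0x34 = 0xB6. */
static uint8_t set_bits(uint8_t n, uint8_t mask)
{
    return (uint8_t)(n | mask);
}

/* Toggle the masked bits: 0xA2 ^ 0x34 = 0x96. */
static uint8_t toggle_bits(uint8_t n, uint8_t mask)
{
    return (uint8_t)(n ^ mask);
}

/* Clear the masked bits: 0xA2 & ~0x34 = 0x82. */
static uint8_t clear_bits(uint8_t n, uint8_t mask)
{
    return (uint8_t)(n & (uint8_t)~mask);
}
```

In every case, bits 0, 1, 3, 6, and 7 of N, which are not selected by the mask, are either passed through unchanged or, for select_bits, forced to zero.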

The following gives C and assembly example programs to set, clear, toggle and check bits in a variable.

Checking a Bit (&)

We can check whether a bit is 1 by performing bitwise AND operation with the corresponding mask. In this example, register r2, representing variable b, is non-zero only when bit 5 in register r0 is 1.

char a = 0x34;
char mask = 1<<5;
char b;
// Check bit 5
b = a & mask;

LDR r0, =0x34      ; r0 = a
LDR r1, =(1<<5)    ; r1 = mask
ANDS r2, r0, r1    ; r2 = b

; Another solution

LDR r0, =0x34
ANDS r2, r0, #(1<<5)

Setting a Bit

ORing a bit with 1 sets this bit, and ORing a bit with 0 does not change it. Therefore, ORing a variable with the mask sets all bits marked by the mask, while keeping all the other bits unchanged.

char a = 0x34;
char mask = 1<<5;
// Set bit 5
a |= mask;

LDR r0, =0x34      ; r0 = a
LDR r1, =(1<<5)    ; r1 = mask
ORR r0, r0, r1

; Another solution

LDR r0, =0x34
ORR r0, r0, #(1<<5)

Clearing a Bit (&, ~)

ANDing a bit with 0 clears this bit, and ANDing a bit with 1 does not change it. Therefore, ANDing a variable with the negation of the mask clears all data bits marked by the mask.

char a = 0x34;
char mask = 1<<5;
// Reset bit 5
a &= ~mask;

LDR r0, =0x34        ; r0 = a
LDR r1, =(1<<5)      ; r1 = mask
MVN r1, r1           ; NOT
ANDS r0, r0, r1

; Another solution

LDR r0, =0x34
BIC r0, r0, #(1<<5)  ; r0 = r0 AND NOT (1<<5)

Toggling a Bit (^)

Exclusive OR (EOR) between 1 and a bit inverts this bit, and exclusive OR between 0 and a bit keeps the bit unchanged. Therefore, exclusive OR between a value and a mask toggles all bits selected by the mask.

char a = 0x34;
char mask = 1<<5;
// Toggle bit 5
a ^= mask;

LDR r0, =0x34        ; r0 = a
LDR r1, =(1<<5)      ; r1 = mask
EOR r0, r0, r1

; Another solution

LDR r0, =0x34
EOR r0, r0, #(1<<5)

In C, the Boolean operations are A && B (Boolean and), A || B (Boolean or), and !B (Boolean not), which are different from the above bitwise operations.

  • The Boolean operators perform word-wide operations, not bitwise. For example, 0x10 & 0x01 equals 0x00, but 0x10 && 0x01 equals 0x01.

  • The bitwise negation expression ~0x01 equals 0xFFFFFFFE, but Boolean NOT expression !0x01 equals 0x00.

Using EQU to Define a Mask in Assembly

To make programs easier to read, we often give a name to a mask. For example, we define the bit masks for the clock enable and disable bits for GPIO ports.

RCC_AHB2ENR_GPIOAEN EQU (0x00000001) ; GPIO port A clock enable
RCC_AHB2ENR_GPIOBEN EQU (0x00000002) ; GPIO port B clock enable
RCC_AHB2ENR_GPIOCEN EQU (0x00000004) ; GPIO port C clock enable

LDR r7, =RCC_BASE                ; Address of reset and clock control (RCC)
LDR r1, [r7, #RCC_AHB2ENR]       ; Load AHB2ENR from memory into r1
ORR r1, r1, #RCC_AHB2ENR_GPIOAEN ; Enable clock of GPIO port A
ORR r1, r1, #RCC_AHB2ENR_GPIOBEN ; Enable clock of GPIO port B
ORR r1, r1, #RCC_AHB2ENR_GPIOCEN ; Enable clock of GPIO port C
STR r1, [r7, #RCC_AHB2ENR]       ; Save to RCC->AHB2ENR

By using EQU, the program defines three constants (such as RCC_AHB2ENR_GPIOAEN). These constants are bit masks, which make it easier to manipulate individual bits. It is not a good programming style to set or clear bits directly by using constants instead of a named mask, such as the following instruction.

ORR r1, r1, #0x7  ; Set bits 0, 1, and 2

Updating Program Status Flags in Assembly

The logic operations with S suffix, including ANDS, ORRS, EORS, ORNS, and MVNS update the N, Z, C flags in APSR. None of them affects the V flag. Neither BFC nor BFI updates these four flags.

It is understandable that a logical instruction with S suffix can update the negative and zero flags in APSR. You may wonder how a logic operation can change the carry flag. The carry flag is updated when the second source operand uses the Barrel shifter. For example,

ANDS r0, r1, r2, LSL #3 ; Update N, Z, C flags. (V is unchanged)

The carry flag of the above ANDS operation is, in fact, the carry of the LSLS r2, #3 operation.
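
The origin of the carry flag in a shifted logical operation can be traced in C. This sketch models ANDS Rd, Rn, Rm, LSL #n for shift amounts from 1 to 31; the function name and the packing of the flags as (N << 2) | (Z << 1) | C are assumptions made for illustration. The V flag is not affected and is therefore left out.

```c
#include <assert.h>
#include <stdint.h>

/* Flags of ANDS Rd, Rn, Rm, LSL #n, packed as (N << 2) | (Z << 1) | C. */
static unsigned ands_lsl_flags(uint32_t rn, uint32_t rm, unsigned n)
{
    assert(n >= 1 && n <= 31);

    uint32_t op2    = rm << n;                   /* shifted second operand */
    unsigned carry  = (rm >> (32u - n)) & 1u;    /* last bit shifted out   */
    uint32_t result = rn & op2;

    unsigned neg  = result >> 31;                /* N from the AND result  */
    unsigned zero = (result == 0u);              /* Z from the AND result  */
    return (neg << 2) | (zero << 1) | carry;     /* C from the shifter     */
}
```

With rm = 0x20000000 and n = 3, the only set bit is shifted out, so the result is zero and both Z and C are set, even though the AND operation itself produces no carry.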

Reversing the Order of Bits and Bytes

Instructions for reversing the bit or byte orders are useful, particularly when data exchanged between two systems have different formats. For example, the REV instruction is useful to convert data that are exchanged between different endian systems.

| Instruction | Description |
|---|---|
| RBIT Rd, Rn | Reverse bit order in a word. for (i = 0; i < 32; i++) Rd[i] = Rn[31 - i] |
| REV Rd, Rn | Reverse byte order in a word. Rd[31:24] = Rn[7:0]; Rd[23:16] = Rn[15:8]; Rd[15:8] = Rn[23:16]; Rd[7:0] = Rn[31:24] |
| REV16 Rd, Rn | Reverse byte order in each halfword. Rd[15:8] = Rn[7:0]; Rd[7:0] = Rn[15:8]; Rd[31:24] = Rn[23:16]; Rd[23:16] = Rn[31:24] |
| REVSH Rd, Rn | Reverse byte order in bottom halfword and sign extend. Rd[15:8] = Rn[7:0]; Rd[7:0] = Rn[15:8]; Rd[31:16] = Rn[7] replicated 16 times |

The following gives a few examples of changing the bit or the byte order of a value stored in register r0.

LDR r0, =0x12345678  ; r0 = 0x12345678
RBIT r1, r0          ; Reverse bits, r1 = 0x1E6A2C48

LDR r0, =0x12345678  ; r0 = 0x12345678
REV r1, r0           ; Reverse byte order, r1 = 0x78563412
REV16 r2, r0         ; Reverse byte order in each halfword, r2 = 0x34127856

LDR r0, =0x33448899  ; r0 = 0x33448899
REVSH r1, r0         ; Reverse bytes in lower halfword and extend sign
                     ; r1 = 0xFFFF9988
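
The four reversal instructions can be reproduced in portable C, which is useful for checking expected values by hand. The sketch below is a software model with hypothetical function names that mirror the mnemonics; it is not how the processor implements them.

```c
#include <stdint.h>

/* REV: reverse the four bytes of a word. */
static uint32_t rev(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* REV16: swap the two bytes inside each halfword. */
static uint32_t rev16(uint32_t x)
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* REVSH: swap the bytes of the bottom halfword, then sign extend. */
static uint32_t revsh(uint32_t x)
{
    uint32_t h = ((x >> 8) & 0xFFu) | ((x & 0xFFu) << 8);
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

/* RBIT: reverse all 32 bits. */
static uint32_t rbit(uint32_t x)
{
    uint32_t r = 0u;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move bit i of x to bit 31 - i of r */
        x >>= 1;
    }
    return r;
}
```

On a little-endian host, rev gives the value that a big-endian system would read from the same four bytes, which is the typical use when exchanging data between systems of different endianness.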

Sign and Zero Extension

Most computers represent signed integers in two's complement. When a signed integer is converted to another signed integer with more bits, the sign bit (i.e., the most significant bit or the leftmost bit) should be duplicated to maintain the integer's sign. Duplicating the sign bit is called sign extension.

When an unsigned integer is converted to another unsigned integer with more bits, zero extension is deployed to place zeros in the upper bits of the output.

int8_t a  = -1;   // a signed 8-bit integer, a = 0xFF
int16_t b = -2;   // a signed 16-bit integer, b = 0xFFFE
int32_t c;        // a signed 32-bit integer
c = a;            // sign extension, c = 0xFFFFFFFF
c = b;            // sign extension, c = 0xFFFFFFFE

uint8_t d = 1;    // an unsigned 8-bit integer, d = 0x01
uint32_t e;       // an unsigned 32-bit integer
e = d;            // zero extension, e = 0x00000001

The below table shows the assembly instructions that perform sign and zero extension.

| Instruction | Description |
|---|---|
| SXTB {Rd,} Rm {,ROR #n} | Sign extend a byte. Rd[31:0] = Sign Extend ((Rm ROR n)[7:0]) |
| SXTH {Rd,} Rm {,ROR #n} | Sign extend a halfword. Rd[31:0] = Sign Extend ((Rm ROR n)[15:0]) |
| UXTB {Rd,} Rm {,ROR #n} | Zero extend a byte. Rd[31:0] = Zero Extend ((Rm ROR n)[7:0]) |
| UXTH {Rd,} Rm {,ROR #n} | Zero extend a halfword. Rd[31:0] = Zero Extend ((Rm ROR n)[15:0]) |

The optional rotation amount n must be 8, 16, or 24; when it is omitted, no rotation is applied.

The SXTB (Sign Extend Byte) instruction takes an 8-bit value from a register, sign-extends it to 32 bits, and stores the result in a destination register. The SXTB {Rd,} Rm {, ROR #n} variant of this instruction allows for an optional rotation of the source register's value before the sign extension is applied.

; r0 = 0x11228091
SXTB r1, r0    ; r1 = 0xFFFFFF91, sign extend a byte
SXTH r1, r0    ; r1 = 0xFFFF8091, sign extend a halfword
UXTB r1, r0    ; r1 = 0x00000091, zero extend a byte
UXTH r1, r0    ; r1 = 0x00008091, zero extend a halfword
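
The effect of the optional rotation is not visible in the examples above, so the following C sketch models the four instructions including it. It assumes the rotation is given in bits (0, 8, 16, or 24), as in the assembler syntax ROR #8; the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0u)
        return x;
    return (x >> n) | (x << (32u - n));
}

/* SXTB: rotate, take the low byte, replicate bit 7 into bits 31:8. */
static uint32_t sxtb(uint32_t rm, unsigned rot)
{
    uint32_t b = ror32(rm, rot) & 0xFFu;
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

/* SXTH: rotate, take the low halfword, replicate bit 15 into bits 31:16. */
static uint32_t sxth(uint32_t rm, unsigned rot)
{
    uint32_t h = ror32(rm, rot) & 0xFFFFu;
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

/* UXTB and UXTH: rotate, then keep the low byte or halfword. */
static uint32_t uxtb(uint32_t rm, unsigned rot) { return ror32(rm, rot) & 0xFFu; }
static uint32_t uxth(uint32_t rm, unsigned rot) { return ror32(rm, rot) & 0xFFFFu; }
```

With r0 = 0x11228091, a rotation of 8 selects the second byte (0x80), so sxtb(r0, 8) returns 0xFFFFFF80, and a rotation of 16 selects the third byte, so uxtb(r0, 16) returns 0x22.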

Data Comparison

There are four different data comparison instructions.

| Command | Function | Flags |
|---|---|---|
| CMP Rn, Op2 | Compare | Set NZCV flags on Rn - Op2 |
| CMN Rn, Op2 | Compare Negative | Set NZCV flags on Rn + Op2 |
| TST Rn, Op2 | Test | Set NZC flags on Rn AND Op2 |
| TEQ Rn, Op2 | Test Equivalence | Set NZC flags on Rn EOR Op2 |

  • The CMP instruction subtracts the value of Op2 from the value in Rn. It is the same as a SUBS instruction, except that the processor discards the result. CMP updates the N, Z, C, and V flags per the subtraction result.

  • The CMN instruction adds the value of Op2 to the value in Rn. CMN Rn, Op2 is like ADDS Rn, Op2 except that the result is discarded. CMN updates N, Z, C, and V.

  • The instruction TST Rn, Op2 performs a bitwise AND operation on Rn and Op2. Unlike ANDS Rn, Op2, the TST instruction discards the result. TST updates the N and Z flags. If Op2 uses the Barrel shifter, TST also updates the C flag during the calculation of Op2. However, it does not affect the V flag.

  • The TEQ instruction performs a bitwise exclusive OR operation on Rn and Op2. TEQ Rn, Op2 is the same as EORS Rn, Op2 except that the result is discarded. TEQ updates the N, Z, and C flags.

TEQ and TST have different usages.

  • TEQ is to check whether two values are equal. After TEQ completes, the zero flag is set if two operands are equal; otherwise, the zero flag is clear.

  • TST examines whether the target bits selected by the second operand are all clear.

  • TST cannot check the equivalence of two operands.
    For example, when r0 = 0b1010 and r1 = 0b0101, the instruction TST r0, r1 sets the zero flag because the result of AND is 0. However, these two operands are not equal.

| Instruction | r0 | r1 | Action |
|---|---|---|---|
| TST r0, r1 | 0b1010 | 0b0101 | Set Z flag |
| TEQ r0, r1 | 0b1010 | 0b0101 | Clear Z flag |
| TST r0, r1 | 0b1010 | 0b1010 | Clear Z flag |
| TEQ r0, r1 | 0b1010 | 0b1010 | Set Z flag |
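
The zero flag behavior in the table can be written as two one-line predicates. This C sketch models only the Z flag; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Z flag after TST: set when none of the bits selected by op2 is 1 in rn. */
static bool tst_sets_z(uint32_t rn, uint32_t op2)
{
    return (rn & op2) == 0u;
}

/* Z flag after TEQ: set when rn and op2 are bitwise identical. */
static bool teq_sets_z(uint32_t rn, uint32_t op2)
{
    return (rn ^ op2) == 0u;
}
```

For r0 = 0b1010 (0xA) and r1 = 0b0101 (0x5), tst_sets_z is true and teq_sets_z is false; for two equal non-zero operands, the outcomes are reversed.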

The following gives a few examples of data comparison.

CMP r0, #3          ; Compare r0 with 3
CMN r0, #10         ; Compare r0 with -10 
CMP r0, r1          ; Compare r0 with r1
TEQ r0, #'?'        ; Compare r0 with ASCII value of '?' (0x3F)

MOV r1, #(1<<31)    ; r1 = 0x80000000
TST r0, r1          ; check whether the sign bit is 1

Data Movement between Registers

We can classify instructions for moving data between registers into two categories:

  • Move data between two general-purpose registers

  • Move data between a general-purpose register and a special-purpose register

MOV (move) and MVN (move not) are used to copy data between two general-purpose registers.

MRS and MSR move content between special registers and general registers.

Special registers include APSR, IPSR, EPSR, IEPSR, IAPSR, EAPSR, PSR, MSP, PSP, PRIMASK, BASEPRI, BASEPRI_MAX, FAULTMASK, and CONTROL.

| Instruction | Description |
|---|---|
| MOV Rd, Op2 | Rd = Op2 |
| MVN Rd, Op2 | Rd = NOT Op2 |
| MRS Rd, spec_reg | Move from special register to general register |
| MSR spec_reg, Rm | Move from general register to special register |

MOV and MVN can also load an immediate value into a register.

; examples of MOV and MVN
MOV r4, r5            ; Copy r5 to r4
MVN r4, r5            ; r4 = bitwise logical NOT of r5
MOV r1, r2, LSL #3    ; r1 = r2 << 3
MOV r0, PC            ; Copy PC (r15) to r0
MOV r1, SP            ; Copy SP (r13) to r1

; copy a special-purpose register to a general-purpose register
MRS r0, APSR          ; Read flag state into r0
MRS r0, IPSR          ; Read exception/interrupt state into r0
MRS r0, EPSR          ; Read execution state into r0
MRS r0, PSR           ; Copy combined APSR, IPSR, and EPSR into r0

; copy a general-purpose register to a special-purpose register
MSR APSR_nzcvq, r0    ; Write flag state
MSR BASEPRI, r0       ; Write to base priority mask register; disable
                      ; exceptions with same or lower priority level

Bit Field Extract

There are two instructions that extract adjacent bits from one register.

| Instruction | Description |
|---|---|
| SBFX Rd, Rn, #lsb, #width | Signed Bit Field Extract. Rd[(width-1):0] = Rn[(width+lsb-1):lsb]; Rd[31:width] = Replicate (Rn[width+lsb-1]) |
| UBFX Rd, Rn, #lsb, #width | Unsigned Bit Field Extract. Rd[(width-1):0] = Rn[(width+lsb-1):lsb]; Rd[31:width] = Replicate (0) |

  • The #lsb parameter, ranging from 0 to 31, specifies the starting position.

  • The #width parameter, ranging from 1 to (32 - #lsb), indicates the number of contiguous bits to be extracted.

UBFX simply places zero in the upper bits, while SBFX duplicates the sign bit. The sign bit, in this case, is not the most significant bit; instead, it is the bit at the position of #width + #lsb - 1.

The following shows two examples of extracting 8 bits from register r3, starting at bit 4. One has no sign extension, and the other has sign extension.

; Assume r3 = 0x1234CDEF
UBFX r4, r3, #4, #8    ; r4 = 0x000000DE (zero extension)
SBFX r4, r3, #4, #8    ; r4 = 0xFFFFFFDE (sign extension)
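
Both extractions can be modeled in C with a shift, a mask, and, for the signed case, a sign extension from bit (width - 1) of the extracted field. The function names below are hypothetical, and the assertion encodes the parameter ranges stated above.

```c
#include <assert.h>
#include <stdint.h>

/* UBFX: extract `width` bits starting at `lsb`, zero extended. */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width)
{
    assert(lsb <= 31 && width >= 1 && width <= 32 - lsb);
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (rn >> lsb) & mask;
}

/* SBFX: extract the same field, then sign extend it to 32 bits. */
static uint32_t sbfx(uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t field = ubfx(rn, lsb, width);
    uint32_t sign  = 1u << (width - 1u);    /* sign bit of the field */
    return (field ^ sign) - sign;           /* replicate it into the upper bits */
}
```

For r3 = 0x1234CDEF, the 8-bit field starting at bit 4 is 0xDE. Its top bit is 1, so ubfx returns 0x000000DE while sbfx returns 0xFFFFFFDE.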