Optimization techniques are employed to improve the performance of the translated code. Compilers provide options (flags) to control the optimization level; these flags instruct the compiler on how aggressively, and in what manner, it should optimise, to achieve better runtime performance, reduced code size, or a balance between the two.
For example, gcc provides a variety of optimization flags that can be used to control the optimization process. Some of the most commonly used flags include:
- O0: No optimization (the default). Compilation is fastest, but the compiler generates the most straightforward, often least performant, machine code.
- O1: Basic optimization. Reduces code size and execution time without taking an excessive amount of compilation time.
- O2: Further optimization. Includes almost all recommended optimizations that do not involve a space-speed trade-off.
- O3: Full optimization, including optimizations that might increase the generated code size.
- Os: Optimise for size. Prioritises reducing the code size over execution speed.
Instruction Compression
In many modern architectures, including RISC designs, there’s often a fixed instruction length, ensuring simplicity in fetching and decoding operations. However, not all instructions need the full width provided, leading to potential inefficiencies in memory usage.
Instruction compression aims to address this by:
- Identifying Common Patterns: By analysing frequently used instruction sequences or patterns, these can be represented in a compressed form.
- Variable-length Encoding: Instead of having a fixed length for every instruction, compressed instructions might use variable-length encoding, where frequent instructions are represented using fewer bits.
- Decompression Mechanism: For execution, compressed instructions need to be decompressed. This decompression happens either in hardware (before the instruction is executed) or via specialised software routines.
Instruction Level Optimization
Instruction-level optimization is the process of enhancing the efficiency and performance of individual instructions in a program, often within the context of a particular ISA. It directly impacts the speed, power consumption, and overall efficiency of code execution on a hardware platform.
Multiple techniques can be used for this.
Static Scheduling
Reordering instructions at compile-time to reduce pipeline hazards.
What it tries to achieve:
- No two instructions fight for the same resource in the same cycle.
- No instruction reads a value before it is produced.
- Pipeline bubbles are minimized.
Loop Unrolling
Increasing the loop body’s size by replicating its content multiple times, reducing the overhead of loop control. For smaller loops, the loop control can be removed entirely. Loop unrolling can be paired with pipeline scheduling for better results.
It reduces branch frequency and stalls, but increases register pressure.
Strip Mining
Dividing a loop (with an unknown iteration count) into smaller loops, each operating on a smaller subset of the data. This improves cache locality and reduces the number of iterations required to complete the loop.
Split the loop into:
- a small leftover loop (n mod k iterations)
- a main loop (n/k iterations), which is the unrolled version.
Function Inlining
Replacing a function call with the actual body of the function. This avoids the overhead of function calling, but causes duplicated code and a bigger executable file.
Precompute values
Replacing computations with their values at compile time. This is beneficial when dealing with invariant values inside loops or frequently called functions.