1. 5
  1. 2

    The linker relaxation logic is one of my least favourite parts of RISC-V. Every modern linker has leaned heavily into single-pass linking (mold to an extreme degree) and into parallelisation, but RISC-V requires a bunch of inherently sequential multi-pass steps for good code density. The way that some of the relocations are encoded is also a bit horrifying, where the relocation doesn’t point to the target, it points to the instruction that has a different relocation that identifies the target.

    1. 1

      RISC-V object files aren’t that hostile to parallelization. I managed to implement multi-threaded relaxation logic in mold, and it looks like it’s working satisfactorily. RISC-V object files are indeed inefficient though; they simply have too many relocations, and relaxation is not optional due to the presence of R_RISCV_ALIGN.

      For the RISC-V style paired relocations, there’s a simple but efficient way to handle them. You don’t need to associate LO12 relocations with HI20 relocations. Just write full 32-bit values (as opposed to 20-bit values) for HI20 relocations so that LO12 relocations can read the least significant 12 bits from the output buffer. After you apply all relocations, write back the original instructions for HI20 relocations. Here is the code: https://github.com/rui314/mold/blob/main/elf/arch-riscv64.cc#L376-L390
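A minimal sketch of the three-pass idea described above. It is not mold's actual code: the `Reloc` record, the `RelType` enum and the helper names are hypothetical, only one I-type LO12 flavour is modelled, and the host is assumed to be little-endian like RISC-V.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified relocation record (not mold's data structure).
enum class RelType { PcrelHi20, PcrelLo12I };

struct Reloc {
  RelType type;
  uint32_t offset;  // byte offset of the instruction to patch
  uint32_t value;   // Hi20: full 32-bit displacement to the target
                    // Lo12I: byte offset of the paired Hi20 instruction
};

static uint32_t read32(const std::vector<uint8_t>& buf, uint32_t off) {
  uint32_t v;
  std::memcpy(&v, buf.data() + off, 4);
  return v;
}

static void write32(std::vector<uint8_t>& buf, uint32_t off, uint32_t v) {
  std::memcpy(buf.data() + off, &v, 4);
}

// U-type immediate lives in bits 31:12; add 0x800 to compensate for the
// sign extension of the low 12 bits in the paired instruction.
static uint32_t set_utype(uint32_t insn, uint32_t val) {
  return (insn & 0x00000fffu) | ((val + 0x800u) & 0xfffff000u);
}

// I-type immediate lives in bits 31:20.
static uint32_t set_itype(uint32_t insn, uint32_t val) {
  return (insn & 0x000fffffu) | ((val & 0xfffu) << 20);
}

void apply_relocs(const std::vector<uint8_t>& input,
                  std::vector<uint8_t>& out,
                  const std::vector<Reloc>& rels) {
  // Pass 1: park the full 32-bit value in the slot of each HI20 instruction.
  for (const Reloc& r : rels)
    if (r.type == RelType::PcrelHi20)
      write32(out, r.offset, r.value);

  // Pass 2: each LO12 reads the parked value from the output buffer, so no
  // lookup table from LO12 to HI20 relocations is needed.
  for (const Reloc& r : rels)
    if (r.type == RelType::PcrelLo12I) {
      uint32_t full = read32(out, r.value);
      write32(out, r.offset, set_itype(read32(input, r.offset), full));
    }

  // Pass 3: restore the original HI20 instruction from the input and patch
  // in only its upper 20 bits.
  for (const Reloc& r : rels)
    if (r.type == RelType::PcrelHi20) {
      uint32_t full = read32(out, r.offset);
      write32(out, r.offset, set_utype(read32(input, r.offset), full));
    }
}
```

Because pass 3 takes the opcode and register fields from the input bytes, the sketch does not care which instruction carries the HI20 relocation.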

      1. 1

        If mold currently uses one iteration for linker relaxation, it may need to switch to multiple iterations to address some rare cases (even if a linker script is not used). Since symbol values change during relaxation, a branch displacement computed from the old addresses is inaccurate: it becomes stale as soon as the symbol addresses are updated.
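One way stale addresses matter can be shown with a toy model, sketched here under loud assumptions: code is a flat list of items, a call starts as an 8-byte `auipc`+`jalr` pair, and it may shrink to a 4-byte `jal` when the displacement fits in `jal`'s ±1 MiB range. The `Item` type and `relax_to_fixed_point` are hypothetical names, and alignment (which can make distances grow) is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a piece of code in the output.
struct Item {
  int64_t size;  // current size in bytes
  int target;    // index of the call target, or -1 if this is not a call
};

constexpr int64_t kJalRange = 1 << 20;  // jal reaches [-2^20, 2^20) bytes

// Repeats relaxation until no call shrinks any more. Returns the number of
// passes that changed something.
int relax_to_fixed_point(std::vector<Item>& items) {
  int passes = 0;
  for (;;) {
    // Recompute every address from the current sizes.
    std::vector<int64_t> addr(items.size());
    int64_t pos = 0;
    for (std::size_t i = 0; i < items.size(); i++) {
      addr[i] = pos;
      pos += items[i].size;
    }

    // Decide relaxations using the addresses computed at the start of the
    // pass; they go stale as soon as the first item shrinks.
    bool changed = false;
    for (std::size_t i = 0; i < items.size(); i++) {
      Item& it = items[i];
      if (it.target < 0 || it.size != 8)
        continue;
      int64_t disp = addr[it.target] - addr[i];
      if (-kJalRange <= disp && disp < kJalRange) {
        it.size = 4;
        changed = true;
      }
    }

    if (!changed)
      return passes;
    passes++;
  }
}
```

With a call that is exactly 4 bytes out of range and a second, relaxable call sitting between it and its target, the first pass shrinks only the inner call; the outer one comes into range on the second pass.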

        HI20/LO12 do have more problems. One is GP, and it seems that some folks don’t want to remove it even if the real-world benefit is questionable: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/298 (I think my binutils patch is essentially rejected.) The lui relaxation requires many InputSectionBase::relocations in lld code, which has a small memory/performance impact but is still not nice.

        Having InputSectionBase::relocations does hurt memory usage/performance but it probably makes certain extensions (e.g. adopting a new relocation format) more convenient.

        1. 1

          Thanks! If I read your code correctly, you’re assuming that the HI20 relocation is applied only to AUIPC instructions. This is true in the base specification but is not the case in all of the RISC-V extensions that I’ve worked with. I think it would be reasonable to specify that other instructions need to introduce a different relocation type, but I believe that relocation types are 8-bit values (not sure if this is an ELF restriction or an internal LLVM one?). RISC-V already needs twice as many as most architectures, because equivalent load and store instructions put their immediate fields in different places, so if every vendor extension needed a non-overlapping bit of the relocation space then we could run out very quickly.

          1. 1

            I don’t think the code assumes that the HI20 relocation is applied only to AUIPC. It should work for any instruction.

            1. 1

              Ah, sorry, I see - you’re grabbing the original instruction from the original source. I thought you were filling it in with a pattern. I shouldn’t read code when I’m this tired.

        2. 1

          Thanks for the comments.

          where the relocation doesn’t point to the target, it points to the instruction that has a different relocation that identifies the target.

          I think this is for the PC-relative auipc used by RISC-V. RISC-V uses the first instruction (usually hi20) as the anchor, so the second instruction (lo12) has to know its offset to the first instruction. AArch64’s page-aligned adrp seems to solve this problem in a more elegant way, but I heard somewhere that it is patented.
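The arithmetic behind the pairing can be sketched as follows. The displacement is measured from the `auipc`'s own PC, and the low 12 bits are sign-extended by the second instruction, so the upper part has to be rounded to compensate. `HiLo` and `split_hi_lo` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical result type for this illustration.
struct HiLo {
  uint32_t hi20;  // value for the 20-bit immediate of auipc/lui
  int32_t lo12;   // sign-extended 12-bit immediate of the second instruction
};

// Splits a 32-bit displacement so that (hi20 << 12) + lo12 == disp
// modulo 2^32.
constexpr HiLo split_hi_lo(int32_t disp) {
  // Sign-extend the low 12 bits: values 0x800..0xfff become negative.
  int32_t lo = ((disp & 0xfff) ^ 0x800) - 0x800;
  // Adding 0x800 before shifting rounds the upper part up whenever the
  // low part is negative.
  uint32_t hi = (static_cast<uint32_t>(disp) + 0x800u) >> 12;
  return HiLo{hi, lo};
}
```

For example, a displacement of 0x1800 splits into an upper part of 2 and a lower part of -2048, since 0x2000 - 0x800 = 0x1800. Both halves depend on the same displacement from the same PC, which is why the second instruction needs a way to find the first.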

          RISC-V’s linker relaxation has three main issues.

          • Assembler complexity. Seems unavoidable.
          • R_RISCV_ALIGN cannot be silently ignored. This is unfixable now.
          • Relocation size overhead. I have more complaints on this point…

          ELF relocations were designed to be generic, supporting (unlimited in practice) types and both dynamic/static uses in one format. The format is very bloated and the least efficient among binary formats used by modern OSes (Mach-O, PE/COFF). I am thinking of picking a more efficient format for static relocations. The section header is costly on ELF (sizeof(Elf64_Shdr)=64) and the ELF spirit discourages a section providing metadata for multiple sections. These points make a space-efficient format particularly difficult to design. Using ELFCLASS32 on a 64-bit architecture wouldn’t be too bad but ELFCLASS32 is somewhat unfortunately taken by not-commonly-used ILP32 ABIs…
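To put numbers on the overhead, here is a small sketch that restates the ELF record layouts by hand (field names follow the ELF specification; the struct names and the `rela_section_cost64` helper are made up for this example so that it does not depend on a system `<elf.h>`).

```cpp
#include <cstddef>
#include <cstdint>

// Hand-written mirrors of Elf64_Rela, Elf32_Rela and Elf64_Shdr.
struct Elf64Rela {
  uint64_t r_offset;
  uint64_t r_info;    // symbol index in the upper 32 bits, type in the lower
  int64_t r_addend;
};

struct Elf32Rela {
  uint32_t r_offset;
  uint32_t r_info;    // symbol index in the upper 24 bits, type in the lower 8
  int32_t r_addend;
};

struct Elf64Shdr {
  uint32_t sh_name;
  uint32_t sh_type;
  uint64_t sh_flags;
  uint64_t sh_addr;
  uint64_t sh_offset;
  uint64_t sh_size;
  uint32_t sh_link;
  uint32_t sh_info;
  uint64_t sh_addralign;
  uint64_t sh_entsize;
};

static_assert(sizeof(Elf64Rela) == 24, "one ELF64 RELA entry is 24 bytes");
static_assert(sizeof(Elf32Rela) == 12, "one ELF32 RELA entry is 12 bytes");
static_assert(sizeof(Elf64Shdr) == 64, "one ELF64 section header is 64 bytes");

// Bytes spent on one ELF64 relocation section holding n entries, counting
// its section header but not the section name string.
constexpr std::size_t rela_section_cost64(std::size_t n) {
  return sizeof(Elf64Shdr) + n * sizeof(Elf64Rela);
}
```

Under these layouts, a relocation section with a single entry costs 88 bytes to patch one 4-byte instruction, and the ELFCLASS32 entry is half the size of the ELFCLASS64 one.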