Overengineering a Global Descriptor Table in C++

1 December 2022

x86 is a complex architecture with numerous functionalities and possible configuration options (naturally, given it being categorised as a complex instruction set - it certainly lives up to such a label). At the start of the long journey of building my own operating system, I noticed a recurring theme while trying my best to not make the emulated computer crash - everything is specified with tables. Memory segmentation, paging, interrupt handling - all specified by defining table structures in memory and then informing the CPU of their existence using special-purpose assembly instructions (lgdt for the Global Descriptor Table, lidt for the Interrupt Descriptor Table, and so on). The focus of this article is on only the first table a budding x86 operating system developer is likely to define: the Global Descriptor Table (GDT). The GDT is quite an annoying table because for the majority of OSes it is essentially useless, but still it must be declared or you risk causing your future self much frustration. Let’s take a look at how it works.

The Global Descriptor Table is Pointless

The purpose of the GDT is to define memory segmentation. Memory segmentation is a system of dividing up computer memory into, as the name would imply, segments (i.e., subdivisions of memory). Different segments can have different privilege levels (to prevent user code from meddling with the kernel for example), be designated as storing only code or only data, or be made read-only. Virtual memory is also supported using segmentation.

Having fine-grain control over memory like this certainly sounds appealing (for security purposes and preventing memory corruption for example) and virtual memory is certainly useful, so why did I call the GDT, the structure which defines how memory is segmented, useless? Because there’s an alternative system available - paging. Paging is a memory management system that offers all the benefits of memory segmentation but with additional flexibility. I intend to write in detail about paging in a future article so keep a look out for that!

For compatibility purposes, the CPU boots in 16-bit real mode which places significant restrictions on what our operating system can do (in 16-bit real mode only a single megabyte of memory can be addressed versus up to four gigabytes in 32-bit protected mode, to give one example). Setting up the GDT is a requirement to transition to 32-bit protected mode and access all (or most of) the computer’s resources.

One of the nice advantages of using a preexisting bootloader such as GRUB is that the transition to 32-bit protected mode is typically handled for us. So that means GRUB handles the GDT, right? While it does create and load a GDT (hurrah!), we do still need to define our own once control is handed over from GRUB (aww). This is because while it is probably possible to get away just using GRUB’s GDT, there is a risk it may get overwritten, corrupted, or otherwise cause problems because it exists only for the purpose of transitioning to protected mode and not for long term use. As a sensible operating system developer there’s no getting out of it - you will define your own GDT and you will enjoy it!

Template of a Table

All of the various tables in the x86 universe share a number of common features: a starting address in memory, a type of entry (i.e., what fields are contained in each row of the table), some number of entries/rows, and a process by which the CPU is informed of the table’s existence. We can define a class template which models all of this:

template <typename Entry>
class Table {
 public:
  explicit Table(uint8_t* ptr) : base_addr(reinterpret_cast<uint32_t>(ptr)), current_ptr(ptr) {}
  explicit Table(uint32_t addr) : base_addr(addr), current_ptr(reinterpret_cast<uint8_t*>(addr)) {}

  virtual void write(Entry entry) = 0;
  virtual void load() const = 0;

 protected:
  const uint32_t base_addr;
  uint8_t* current_ptr;
};

Here our type parameter Entry defines what fields each table row will have, write will write an entry to current_ptr, the base_addr member is the 32-bit address at which the table table will start being written from, and load will tell the CPU to start using our new table.

Creating the GDT

Rows in the GDT are in memory structured as follows:

Base Flags Limit Access Byte Base Limit
1 byte 4 bits 4 bits 1 byte 3 bytes 2 bytes

The GDT page on the OS Dev wiki gives a nice description of what each of these columns mean - I won’t bother repeating that here.

It is possible to have a table row be a tightly-packed struct and then write it directly to the appropriate location in memory like:

struct Row {
  uint16_t limit_low;
  uint16_t base_low;
  uint8_t base_middle;
  uint8_t access;
  uint8_t limit_high_and_flags;
  uint8_t base_high;
} __attribute__((packed)); // force compiler to tightly pack the struct in memory

There are two disadvantages to this approach from a usability perspective. Firstly, encoding the limit and base is a pain due to the need to split each across multiple fields in the struct - it would be much nicer if we could just write the base as a single 32-bit value rather than splitting it across 3 different struct fields. The other annoyance is that flags and the access byte both encode quite a few different options in a way that a numerical value does not make clear. For example, the flags value 0b0100 indicates that the limit is expressed in 1 byte blocks and that the current GDT row defines a 32-bit protected mode segment - is it possible to tell as much without looking it up? Consider the following example:

Row{
  .limit_low = 0x00FF,
  .base_low = 0x0000,
  .base_middle = 0x01,
  .access = 0b10010110, // what does this mean?!
  .limit_high_and_flags = 0b00001111, // and this?!
  .base_high = 0,
}

This would work fine but it isn’t very readable in my opinion. Instead, I opted to make things a bit more high level. Beginning with the access byte:

enum class DescriptorType : uint8_t {
  //          S  E
  System,  // 0  0
  Data,    // 1  0
  Code,    // 1  1
};

enum class DataSegmentDirection : uint8_t {
  Up,    // DC = 0
  Down,  // DC = 1
};

enum class CodeSegmentConforming : uint8_t {
  ExactRing,         // DC = 0
  EqualOrLowerRing,  // DC = 1
};

class Access {
 public:
  static Access null() {
    return Access(false, DescriptorType::System, tables::PrivilegeLevel::Kernel);
  }
  static Access system(tables::PrivilegeLevel level) {
    return Access(true, DescriptorType::System, level);
  }
  static Access data(tables::PrivilegeLevel level, DataSegmentDirection direction, bool readable) {
    return Access(true, DescriptorType::Data, level, readable, direction);
  }
  static Access code(tables::PrivilegeLevel level, CodeSegmentConforming conforming, bool writable) {
    return Access(true, DescriptorType::Code, level, writable, DataSegmentDirection::Up, conforming);
  }

  uint8_t as_byte() const {
    uint8_t b = present ? 0b10000000 : 0;

    b |= static_cast<uint8_t>(tables::privilege_level_as_crumb(privilege_level) << 5);

    switch (type) {
      case DescriptorType::System:
        break;
      case DescriptorType::Data:
        b |= (1 << 4);
        break;
      case DescriptorType::Code:
        b |= (0b11 << 3);
        break;
    }

    bool data_seg_down = (data_seg_direction == DataSegmentDirection::Down);
    bool code_seg_equal_or_lower_ring = (code_seg_conforming == CodeSegmentConforming::EqualOrLowerRing);
    if (data_seg_down || code_seg_equal_or_lower_ring) {
      b |= (1 << 2);
    }

    if (readable_writable) {
      b |= 0b10;
    }

    return b;
  }

  const bool present;
  const DescriptorType type;
  const tables::PrivilegeLevel privilege_level;
  const bool readable_writable;
  const DataSegmentDirection data_seg_direction;
  const CodeSegmentConforming code_seg_conforming;

 private:
  Access(
      bool p,
      DescriptorType t,
      tables::PrivilegeLevel l,
      bool rw = false,
      DataSegmentDirection d = DataSegmentDirection::Up,
      CodeSegmentConforming c = CodeSegmentConforming::ExactRing)
      : present(p), type(t), privilege_level(l), readable_writable(rw), data_seg_direction(d), code_seg_conforming(c) {}
};

One of the slightly awkward things about the access byte is that the behaviour of some of its bits is dependent on the type of segment in question (code, data, or segment) - this led to the decision to use multiple constructors to logically model this (well, technically just one private constructor and then a number of public static functions returning Access, for all you pedants out there). In a sense, I made an effort to model a sum type - something which C++ doesn’t really support. The equivalent code in Haskell is much more succinct:

import Data.Bits

data PrivilegeLevel = User | Kernel

privilegeLevelAsCrumb (User) = 0b11
privilegeLevelAsCrumb (Kernel) = 0

data Direction = GrowsUp | GrowsDown deriving (Eq)
data Conforming = ExactRing | EqualOrLowerRing deriving (Eq)
data Access = Null
            | System PrivilegeLevel
            | Data PrivilegeLevel Direction Bool
            | Code PrivilegeLevel Conforming Bool

accessAsByte (Null) = 0

accessAsByte (System privilege) =
    0b10010000 + (shiftL (privilegeLevelAsCrumb privilege) 5)

accessAsByte (Data privilege direction readable) =
    0b10010000 + (shiftL dpl 5) + (shiftL (fromEnum isDown) 2) + (shiftL (fromEnum readable) 1)
    where dpl = privilegeLevelAsCrumb privilege
          isDown = (direction == GrowsDown)

accessAsByte (Code privilege conforming writable) =
    0b10011000 + (shiftL dpl 5) + (shiftL (fromEnum isConforming) 2) + (shiftL (fromEnum writable) 1)
    where dpl = privilegeLevelAsCrumb privilege
          isConforming = (conforming == EqualOrLowerRing)

Much nicer. Unfortunately, much to the disdain of current me, me of a couple months ago decided to use C++ instead of a more elegant language.

In addition to those show above, various other enums and structs are defined so that flags can also be expressed in a human-readable manner, ultimately culminating in the GDT entry struct looking like:

struct EntryGDT {
  const uint32_t base;
  const uint32_t limit;
  const Access access;
  const Flags flags;
};

An entry may then be created like so:

EntryGDT{
      .base = 0x10000,
      .limit = 0xFFFFF,
      .access = Access::data(
          PrivilegeLevel::Kernel,
          DataSegmentDirection::Down,
          true),
      .flags = Flags{
          .granularity = GranularityFlag::Page,
          .size = SizeFlag::Protected32,
      },
  }

Hopefully I am not alone in thinking this is much more readable than the tightly-packed struct alternative!

Loading the GDT

Defining the GDT structure in memory is all well and good but of no use until the CPU is actually made to use it. How do we do that?

As mentioned earlier, the lgdt is the assembly instruction that we need, but it’s not enough to just slap that instruction somewhere in our code. First, we need to create a small structure to describe where in memory our GDT starts and ends and then we can pass the address of that structure to the lgdt instruction. Let’s create a function which we can call from C++ to do this:

section .bss
  gdt_descriptor: resb 6

section .text
  global load_gdt:function
  load_gdt:
    mov ax, [esp + 4] ; limit parameter
    mov [gdt_descriptor], ax

    mov eax, [esp + 8] ; base parameter
    mov [gdt_descriptor + 2], eax

    lgdt [gdt_descriptor]

The lgdt instruction does not automatically update the various segment registers (cs, ds, es, etc.) automatically - we need to do that ourselves. With the exception of cs (the code segment register), updating these registers is quite simple - each just needs to be set to the offset of the entry describing the relevant data segment within the GDT using mov instructions. For example, if the data segment is described in the second row of the GDT after the null row, then each segment register should be set to the value 0x10 (2 times the size of each 8 byte entry).

Setting the cs register is a bit different as it cannot be modified with mov directly. Instead, we can use what is known as a far jump (a jump to address and segment address pair). Assuming the code segment is described in the 1st row after the null row, then:

jmp 0x8:.label
.label:

But what if we don’t want to hardcore this 0x8 value? We’ve built this system in C++ to construct an arbitrary GDT - what if the code segment entry isn’t defined at the 0x8 offset? Let’s see if we can take the code segment entry offset as an argument on the stack instead:

jmp [esp + 12]:.label ; error: comma, decorator or end of line expected
.label:

This doesn’t compile as the far jump instruction expects immediate values that are known at compile time. Fortunately, it’s possible to get around this with a slightly hacky approach using the retf instruction. This instruction is to ret what the far jump instruction is to the common or garden jmp label instruction, or in other words it is for performing a far jump when returning from a function call. By strategically placing the segment and address on the stack, we can trick retf into thinking we’re returning from a function call and therefore set cs to a value not known at compile time:

push dword [esp + 12] ; segment to jump to
push .label ; label to jump to
retf

This is equivalent to our jmp instruction before except it actually compiles!

Bringing it all together, our final GDT-loading assembly code is as follows:

; void load_gdt(uint32_t limit, uint32_t base, uint32_t code_seg_offset, uint32_t data_seg_offset)
global load_gdt:function
load_gdt:
  mov ax, [esp + 4] ; limit parameter
  mov [gdt_descriptor], ax

  mov eax, [esp + 8] ; base parameter
  mov [gdt_descriptor + 2], eax

  lgdt [gdt_descriptor] ; load the GDT

  ; set CS register appropriately
  push dword [esp + 12] ; segment to jump to (code_seg_offset parameter)
  push .flush ; label to jump to
  retf ; far return is used as bit of a hack to perform `jmp segment:offset` where `segment` isn't an immediate

.flush:
  mov ax, [esp + 16] ; offset of the data segment (data_seg_offset parameter)
  mov ds, ax
  mov es, ax
  mov fs, ax
  mov gs, ax
  mov ss, ax

  ret

Back in C++, we can now implement the GlobalDescriptorTable::load method like so:

extern "C" void load_gdt(uint32_t limit, uint32_t base, uint32_t code_seg_offset, uint32_t data_seg_offset);

void GlobalDescriptorTable::load() const {
  auto limit = reinterpret_cast<uint32_t>(current_ptr) - base_addr - 1;
  load_gdt(limit, base_addr, code_seg_offset, data_seg_offset);
}

Amazing - now we can call GlobalDescriptorTable::write a few times, then call load, and we will have memory segmentation. More importantly, we will most likely never have to think about the GDT again (hopefully)!

That concludes our time looking at the global descriptor table. In my next article I hope to begin diving into paging - the superior, more complex, and more widely used x86 memory management scheme. Thanks for reading!

Relevant Source Code

Sources/Additional Reading