SiFive Blog

The latest insights and in-depth technical analysis from RISC-V experts

December 11, 2017

All Aboard, Part 9: Paging and the MMU in the RISC-V Linux Kernel

This entry will cover the RISC-V port of Linux's memory management subsystem. Since the vast majority of the memory management code in Linux is architecture-independent, our port's memory management code mostly handles interfacing with the MMU, defining the page table format, and interfacing with drivers that have memory allocation constraints.

I will refrain from discussing the RISC-V memory model in this blog, both because it isn't yet finished and because it's complicated enough to warrant its own series of blog posts.

Also, as a side note: for those of you not following the internet, we've gotten our core architecture port into Linus' tree and are slated to release as part of 4.15. This won't be a fully bootable RISC-V system, but it's at least a good first step!

Privilege Levels in RISC-V Systems

The RISC-V ISA defines a stack of execution environments. While each environment is designed to be classically virtualizable, in standard systems each level in the stack is designed to provide the next level's execution environment. Starting from the least-privileged level in the stack, the execution environments are:

  • User-mode software executes in an AEE (Application Execution Environment). On Linux systems, the AEE is also known as the user ABI: the set of system calls supported by the kernel. The AEE also includes the entire user ISA, since user-mode programs are expected to be able to execute instructions other than just scall.
  • Supervisor-mode software executes in an SEE (Supervisor Execution Environment). This environment consists of the supervisor-mode instructions and CSRs defined by the privileged ISA document, along with the SBI. Supervisor-mode software is expected to provide multiple AEEs; Linux provides one AEE to each process.
  • Hypervisor-mode software executes in an HEE (Hypervisor Execution Environment), and is expected to provide multiple SEEs. The hypervisor-mode section of the privileged ISA document is still being written, so we'll ignore this for now.
  • Machine-mode software executes in an MEE (Machine Execution Environment), and is expected to provide one higher-level execution context. Since the privileged mode ISA makes implementing the U, S, and H extensions optional (thus allowing for M, M+U, M+S+U, and M+H+S+U systems), it's expected that different machine-mode software implementations will provide either an HEE, SEE, or AEE.

While it's fairly standard to provide execution environment stacks that do not match this hierarchy, the software executing in each environment can't tell the difference. That's not to say that user-mode software is entirely portable: for example, the Linux AEE is different from the FreeBSD AEE because they provide different system calls. The intent is simply that programs written to execute in the Linux AEE can't tell if they're executing on Linux on hardware, in Linux running in Spike, or in QEMU's user-mode emulation. None of this is a new concept; it's just a bit more explicitly stated in the RISC-V ISA specification than it is on many other architectures.

Since RISC-V is classically virtualizable at every level of the privilege stack, no explicit hardware support is necessary to provide any execution environment: for example, QEMU's user-mode emulation provides an AEE on systems that have no hardware support for any RISC-V ISA. While these systems can be made reasonably performant, the main purpose of RISC-V is to enable hardware implementations of the ISA -- for example, even though a Xeon running QEMU will probably be the fastest implementation of the RISC-V Linux AEE for the foreseeable future it would be more appropriate to run a hardware implementation of RISC-V on my wristwatch.

The RISC-V ISA documents are designed to allow the software implementations at various levels of the privilege stack to use the execution environment they're written against in order to provide the execution environment above them. Some of these are so obvious you probably haven't noticed that we've been talking about them for a while: for example, it's assumed that the hardware handles executing an addi instruction in userspace without the kernel's intervention because it would be silly not to.

We've been able to put off the discussion of privileged levels until now because the vast majority of the design is obvious: all of the user instructions are handled by the hardware without supervisor intervention except for scall, which just transfers control to the kernel's single trap entry point -- see my previous blog post All Aboard, Part 7: Entering and Exiting the Linux Kernel on RISC-V for details. Since application execution environments provide the illusion that user-space programs have access to a big flat address space we've been able to more or less ignore memory when discussing user applications. As is common with computing systems, memory is the hard part -- thus, we only really need to discuss RISC-V privilege modes when talking about paging.

For this blog post, we'll be focusing on supervisor code running on reasonable systems -- thus we won't do things like emulating unsupported instructions from userspace. The focus of the blog will be on how to provide a RISC-V application execution environment given a supervisor execution environment.

The RISC-V Application-Class Supervisor Execution Environments

Supervisor programs, like Linux, execute on a supervisor execution environment. Much like how the user-level ISA leaves many of the specifics of the AEE to be implemented in different ways on different platforms (system calls on Linux vs BSD, for example), the privileged ISA doesn't specify all the details of the SEE that application-class supervisors (like Linux or BSD) can expect -- those will be specified as part of the platform specification.

In this blog, I'll quickly go over a few key aspects of the RISC-V supervisor execution environment for application-class supervisors. This environment is designed to support UNIX-style operating systems running in supervisor mode, emulating POSIX-compliant application execution environments. A highlight of the proposed (with the caveat that I'm not in the platform specification working group, so this is all just my guess) requirements of the application-class SEE are:

  • Either the RV32I or RV64I base ISAs, along with the M, A, and C extensions. The F and D extensions are optional but paired together, leaving the possible standard ISAs for application-class SEEs as RV32IMAC, RV32IMAFDC (RV32GC), RV64IMAC, and RV64IMAFDC (RV64GC).
  • On RV32I-based systems, support for Sv32 page-based virtual memory.
  • On RV64I-based systems, support for at least Sv48 page-based virtual memory.
  • Upon entering the SEE, the PMAs are set such that memory accesses are point-to-point strongly ordered between harts and devices.
  • An SBI that implements various fences, timers, and a console.

The application-class SEE, as specified by the upcoming RISC-V platform specification, is the contract between standard Linux distributions and hardware vendors -- of course, these restrictions don't apply in the embedded space, where many of them would be onerous. In practice: if you expect users to be able to swap out the boot media on your platform, then you should meet the requirements of the application-class SEE.

The RISC-V Linux Application Execution Environments

Supervisor-mode software on RISC-V uses a supervisor execution environment in order to provide one or multiple application execution environments. Fundamentally, an AEE (like any execution environment) is simply the definition of the next state of the machine upon every instruction's execution. On RISC-V systems, the AEE depends on:

  • The ISA string, which determines what the vast majority of instructions do as well as which registers constitute the machine's current state.
  • The supervisor's user-visible ABI, which determines what the scall instruction does. This is different from the C compiler's ABI, which defines the interface between different components of the application.
  • The contents of the entire memory address space.

In an idealized world, each process consists of its own independent AEE, with Linux multiplexing these on top of a single SEE. Of course, there are all sorts of problems with this model in practice, but none of them are specific to RISC-V systems. The concept of a self-contained and well-defined AEE is still useful from a standards standpoint, and we hope to make progress on properly specifying the RISC-V Linux AEE family (as well as AEEs for other POSIX-like systems) as our ports mature.

Paging on RISC-V Systems

After that lengthy digression into the definition of RISC-V's privileged modes, we can finally get to the whole point of this blog post: paging on RISC-V systems. Paging is the main mechanism used to provide user mode with the illusion of having an AEE -- like most things in computer architecture, it turns out that memory is the tricky part.

One of the nice things about designing an ISA at the time RISC-V was designed is that so many different solutions to difficult problems have been tried that we pretty much know what to do now. Thus, we arrived at a pretty standard page-based virtual memory system when designing RISC-V's supervisor virtual memory interface. The exact page table formats and such are listed in the relevant RISC-V ISA manuals so I won't go through them here, but there are a few highlights:

  • Pages are 4KiB at the leaf node, and it's possible to map large contiguous regions with every level of the page table.
  • RV32I-based systems can have up to 34-bit physical addresses with a two-level page table.
  • RV64I-based systems can have multiple virtual address widths, starting at 39 bits (Sv39) and growing in 9-bit increments (Sv48, with larger widths reserved for the future).
  • Mappings must be synchronized via the sfence.vma instruction.
  • There are bits for global mappings, supervisor-only, read/write/execute, and accessed/dirty.
  • There is a single valid bit, which allows storing XLEN-1 bits of flags in an otherwise-unused page table entry. Additionally, there are two bits of software flags in mapped pages.
  • Address space identifiers are 9 bits on RV32I and 16 bits on RV64I, and they're a hint, so a valid implementation is to ignore them.
  • The accessed and dirty bits are strongly ordered with respect to accesses from the same hart, but are optional (with a trap-based mechanism when unsupported).

The Linux implementation of paging is functional but not complete: we're missing support for ASIDs, for example. Like many things in our port, these extra features will come with time.

Handling Device DMA

RISC-V does not currently define an IOMMU, so device accesses are performed in a single linear address space provided by the SEE (aka, physical memory). Combined with the lack of a mechanism to modify PMAs, this makes device IO on RISC-V very simple: we essentially just don't do anything specific to our ISA.

Handling 32-bit DMA Regions

Some devices only support 32-bit addressing even when attached to a system with longer physical addresses. Since RISC-V lacks an IOMMU, we handle these devices by using kernel bounce buffers. This is correct but slow: while it may be fine for SoC-style systems where the set of devices is well known at elaboration time, as more complicated RISC-V systems become available we will eventually need to standardize a mechanism for virtualizing device addressing.

Our bounce buffer mechanism simply uses the standard mechanisms provided by Linux, so there isn't anything RISC-V specific about it. We provide a 32-bit ZONE_DMA, allow allocating from that, and use bounce buffers to handle DMA for already-allocated pages outside the legal region.


Read more of the All Aboard blog series:

Palmer Dabbelt

Palmer Dabbelt is an engineer at Meta, with prior experience as an engineer at Rivos and a software engineer at Google. Before that, he worked at SiFive as Director of Software Engineering and as an engineer.
