Notes on Virtualization
From Ggl's wiki
Xen
Concepts
Xen is a hypervisor, i.e. a small kernel that multiplexes hardware resources between domains and provides an interface to the underlying hardware. The resources are CPU, memory and I/O (inputs and outputs). A domain is a running guest instance. The host/guest terminology reflects this: the host lets guests share and access its hardware. There are three types of domains:
- dom0
- domU
- stub domain
The dom0 is a privileged domain. It runs the device drivers (block devices such as SATA disks, network devices such as network cards) and administrative processes (like xenstore and xend). To take advantage of existing drivers, the dom0 is usually a modified Linux kernel.
The domU is what is commonly called a VM (Virtual Machine). In paravirtual (PV) mode it communicates with the Xen hypervisor and the dom0 through shared memory (ring buffers) and event channels. The hypervisor provides access to memory and CPU. The dom0 provides access to devices. Drivers in the dom0 are called backend drivers (blkback, netback, etc.) while drivers in the domU are called frontend drivers (blkfront, netfront, etc.). This scheme hides the underlying device from the domU: the dom0 exposes one common interface per device family (block, network, console, ...), so whatever actual device is installed on the machine, the domU only has to implement the client side of that interface.
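The frontend/backend exchange described above can be pictured with a toy ring. This is only a sketch: the class name `SharedRing`, its fields and the `pending_event` flag (standing in for an event-channel notification) are assumptions made for illustration, not the real Xen ring ABI.

```python
class SharedRing:
    """Toy model of a frontend/backend shared ring (not the real Xen layout)."""

    def __init__(self, size=4):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0           # advanced by the frontend when it queues a request
        self.req_cons = 0           # advanced by the backend when it takes a request
        self.pending_event = False  # stands in for an event-channel notification

    def push_request(self, request):
        """Frontend side: queue a request, then notify the backend."""
        if self.req_prod - self.req_cons == self.size:
            return False            # ring full; the frontend has to wait
        self.slots[self.req_prod % self.size] = request
        self.req_prod += 1
        self.pending_event = True
        return True

    def pop_request(self):
        """Backend side: take the oldest request, or None if the ring is empty."""
        if self.req_cons == self.req_prod:
            self.pending_event = False
            return None
        request = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return request


ring = SharedRing(size=2)
# The third request does not fit in a two-slot ring.
accepted = [ring.push_request(("read", sector)) for sector in (10, 11, 12)]
served = [ring.pop_request(), ring.pop_request(), ring.pop_request()]
```

The producer and consumer indices only ever grow; the slot is found by taking them modulo the ring size, which is why "full" is detected as a difference equal to the size.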
Memory management is a core task of the hypervisor, as it is of any operating system. When approaching memory management in the context of a hypervisor, be clear about whose point of view you are taking. The hypervisor sees the actual machine pages. In order to multiplex these pages between guest domains it maintains several pagetables. A domU sees pages that it treats as if they were the machine's own. If it supports virtual memory, it maintains tables to provide a consistent address space to its processes: the operating system maps a process into virtual memory and must preserve the same address space even when it pages some of the process's pages out to disk, remapping each page when the process touches it again. A process accesses memory through linear address translation.
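As an illustration of linear address translation, the sketch below splits an address into pagetable indices and a page offset. It assumes the x86-64 layout with 4 KiB pages and four levels of 9-bit indices; the function name `split_linear_address` is made up for this example.

```python
PAGE_SHIFT = 12   # 4 KiB pages: the low 12 bits are the offset in the page
INDEX_BITS = 9    # 512 entries per pagetable
LEVELS = 4        # assumed x86-64 four-level paging


def split_linear_address(addr):
    """Return ([top-level index, ..., lowest-level index], page offset)."""
    offset = addr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * INDEX_BITS
        indices.append((addr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


# An address built so that each level's index is easy to recognise.
addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
indices, offset = split_linear_address(addr)
# indices is [1, 2, 3, 4] and offset is 5
```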
Xen Memory Management
Extracted from Xen Memory Management Presentation by Samuel Thibault.
Definitions for memory pages, frame numbers, addresses, allocations, etc. are in xen/include/mm.h. See also xen/include/asm-x86/mm.h:
1. gpfn/gpaddr: A guest-specific pseudo-physical frame number or address.
2. gmfn/gmaddr: A machine address from the p.o.v. of a particular guest.
3. mfn/maddr: A real machine frame number or address.
4. pfn/paddr: Used in 'polymorphic' functions that work across all address spaces, depending on context. See the pagetable conversion macros in asm-x86/page.h for examples. Also 'paddr_t' is big enough to store any physical address.
This scheme provides consistent function and variable names even when different guests are running in different memory-management modes.
1. A guest running in auto-translated mode (e.g., shadow_mode_translate()) will have gpfn == gmfn and gmfn != mfn.
2. A paravirtualised x86 guest will have gpfn != gmfn and gmfn == mfn.
3. A paravirtualised guest with no pseudophysical overlay will have gpfn == gmfn == mfn.
Virtual / Physical / Machine address
- Frame vs Page
- PFN: physical frame number, guest abstraction for tracking/allocating RAM (usually fairly contiguous)
- GFN: guest frame number, guest idea of hardware address, used in guest pagetables
- MFN: machine frame number, actual hardware address
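Whichever of the three numbering spaces is in use, a frame number and an address are related in the same way. A minimal sketch, assuming 4 KiB pages; the helper names `frame_to_addr` and `addr_to_frame` are invented for this note.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def frame_to_addr(frame_number, offset=0):
    """Address of the byte at `offset` inside frame `frame_number`."""
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset must lie within one page")
    return (frame_number << PAGE_SHIFT) | offset


def addr_to_frame(addr):
    """Split an address into (frame number, offset within the frame)."""
    return addr >> PAGE_SHIFT, addr & (PAGE_SIZE - 1)


addr = frame_to_addr(0x1234, 0x56)       # 0x1234056
frame, offset = addr_to_frame(addr)      # (0x1234, 0x56)
```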
PV (paravirt) pagetables (direct paging):
- pfn -> mfn table managed by the guest
- shared mfn -> pfn provided by xen
- gfn == mfn so pagetables can be directly used by the hardware
- Xen checks the content of the guest pagetables before allowing the hardware to see them
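The two tables in the list above can be sketched with plain dictionaries. The frame numbers are arbitrary, and the names `p2m`, `m2p`, `make_pte` and `pte_to_pfn` are illustrative assumptions rather than Xen or Linux identifiers used exactly this way.

```python
# Guest-managed table: pseudo-physical frame number -> machine frame number.
p2m = {0: 0x8A00, 1: 0x13F7, 2: 0x8A01}

# Reverse table (provided by Xen in the real system); here it only covers
# this one toy guest's frames.
m2p = {mfn: pfn for pfn, mfn in p2m.items()}


def make_pte(pfn, writable):
    """A PV guest stores the machine frame directly in its pagetable entry."""
    return {"mfn": p2m[pfn], "writable": writable}


def pte_to_pfn(pte):
    """Reading an entry back requires the reverse lookup."""
    return m2p[pte["mfn"]]


pte = make_pte(1, writable=True)
round_trip = pte_to_pfn(pte)   # back to pfn 1
```

Because the entry already holds a machine frame, the hardware can walk the guest's pagetables without a further translation step, which is the point of direct paging.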
Enforcing isolation:
- Guest pagetables must have a pagetable type
- Xen checks that page contents obey the typing rules before allowing them to take on pagetable type
- Typing rules:
- No mapping other guests' frames
- No read-write mappings of frames with pagetable type
- Modifying an already-typed pagetable requires a call to Xen to check that the modification obeys the rules (or trap-and-emulate assistance from Xen)
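The two typing rules listed above can be expressed as a small validation pass. This is a sketch only: `validate_pagetable`, `frame_owner` and `frame_type` are hypothetical names, and real Xen tracks ownership and type with reference counts on each frame rather than with dictionaries.

```python
def validate_pagetable(domid, entries, frame_owner, frame_type):
    """Return True if every entry obeys the two typing rules.

    entries:     list of {"mfn": int, "writable": bool}
    frame_owner: mfn -> owning domain id
    frame_type:  mfn -> "pagetable" or "data"
    """
    for entry in entries:
        mfn = entry["mfn"]
        if frame_owner.get(mfn) != domid:
            return False   # rule 1: no mapping of other guests' frames
        if entry["writable"] and frame_type.get(mfn) == "pagetable":
            return False   # rule 2: no read-write mapping of a pagetable frame
    return True


frame_owner = {0x10: 1, 0x11: 1, 0x20: 2}
frame_type = {0x10: "data", 0x11: "pagetable", 0x20: "data"}

ok = validate_pagetable(
    1,
    [{"mfn": 0x10, "writable": True}, {"mfn": 0x11, "writable": False}],
    frame_owner, frame_type)
foreign = validate_pagetable(
    1, [{"mfn": 0x20, "writable": False}], frame_owner, frame_type)
rw_pagetable = validate_pagetable(
    1, [{"mfn": 0x11, "writable": True}], frame_owner, frame_type)
```

Only once such a check passes would the frame be allowed to take on pagetable type; afterwards the guest can no longer write to it directly, which is why later modifications go through Xen.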
Grant tables:
- A guest explicitly allows other guests to map its frames
- The mapper makes a hypercall with a domid, an opaque index, and the address of a PTE (PageTable Entry)
- Xen checks that the granting guest's grant table allows access to this frame and modifies the PTE accordingly
- Needs explicit unmap hypercall when finished
- Can also grant-copy, where Xen does a memcpy() from/to a granted frame instead of mapping it
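The grant, map and unmap steps above can be modelled roughly as follows. The class `GrantTable` and the functions `map_grant` and `unmap_grant` are made-up names for this sketch; they are not the hypercall interface, and the "PTE" is just a dictionary.

```python
class GrantTable:
    """Toy grant table owned by one domain."""

    def __init__(self, owner_domid):
        self.owner = owner_domid
        self.entries = {}   # grant reference -> entry
        self.next_ref = 0

    def grant(self, frame, to_domid, readonly=False):
        """The owner explicitly allows `to_domid` to map `frame`."""
        ref = self.next_ref
        self.next_ref += 1
        self.entries[ref] = {"frame": frame, "domid": to_domid,
                             "readonly": readonly, "mapcount": 0}
        return ref


def map_grant(table, mapper_domid, ref, pte, writable):
    """What Xen does on a map request: check the grant, then fill the PTE."""
    entry = table.entries.get(ref)
    if entry is None or entry["domid"] != mapper_domid:
        return False
    if writable and entry["readonly"]:
        return False
    pte["frame"] = entry["frame"]
    pte["writable"] = writable
    entry["mapcount"] += 1
    return True


def unmap_grant(table, ref, pte):
    """The explicit unmap: clear the PTE and drop the mapping count."""
    pte.clear()
    table.entries[ref]["mapcount"] -= 1


table = GrantTable(owner_domid=1)
ref = table.grant(frame=0x8A00, to_domid=2, readonly=True)
pte = {}
denied_other = map_grant(table, 3, ref, pte, writable=False)  # wrong domain
denied_write = map_grant(table, 2, ref, pte, writable=True)   # read-only grant
mapped = map_grant(table, 2, ref, pte, writable=False)
mapped_frame = pte["frame"]
unmap_grant(table, ref, pte)
```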
HVM pagetables:
- PFN -> MFN table managed by Xen
- GFN == PFN so another layer of translation is needed
- Guest won't cooperate in enforcing access control
- Two options:
- Xen builds shadow copies of guest pagetables with the extra translations and controls added, or
- hardware support for using a second set of pagetables containing extra translations and controls
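The extra layer of translation can be sketched as two lookups in a row. For brevity the guest pagetable is flattened to a single dictionary from virtual page number to GFN, which is a simplification; `translate`, `guest_pagetable` and `p2m` are names chosen for this example.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(vaddr, guest_pagetable, p2m):
    """Guest virtual address -> machine address, in two steps."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    gfn = guest_pagetable.get(vpn)   # layer 1: controlled by the guest
    if gfn is None:
        return None                  # would be a guest page fault
    mfn = p2m.get(gfn)               # layer 2: controlled by Xen
    if mfn is None:
        return None                  # guest frame not backed by machine memory
    return (mfn << PAGE_SHIFT) | offset


guest_pagetable = {0x10: 0x2, 0x11: 0x3}   # virtual page -> GFN
p2m = {0x2: 0x8A00, 0x3: 0x13F7}           # GFN -> MFN
machine_addr = translate(0x10ABC, guest_pagetable, p2m)   # 0x8A00ABC
unmapped = translate(0x20000, guest_pagetable, p2m)       # None
```

Since the guest never edits the second table, Xen can enforce access control there without the guest's cooperation.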
Shadow pagetables:
- Xen maintains a copy of guest frames that are used as pagetables
- Guest never sees the shadow => we can add any translations and restrictions we like
- 13 different kinds of shadows depending on the kind of pagetable: a single frame can have up to 10 shadows at once
- 3 kinds of shadows for faking out superpages (2MB of contiguous PFNs does not mean 2MB of contiguous MFNs)
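The superpage point can be made concrete: a guest 2MB mapping covers 512 contiguous PFNs, but the shadow has to hold one 4KB entry per frame because the backing MFNs may be scattered. A minimal sketch, with `shadow_superpage` and `is_machine_contiguous` as hypothetical helpers and an invented PFN -> MFN table.

```python
ENTRIES_PER_SUPERPAGE = 512   # 2 MiB / 4 KiB


def shadow_superpage(first_pfn, p2m):
    """Expand one guest 2 MiB mapping into 512 separate 4 KiB shadow entries."""
    return [p2m[first_pfn + i] for i in range(ENTRIES_PER_SUPERPAGE)]


def is_machine_contiguous(mfns):
    """True only if the MFNs could be covered by a single real superpage."""
    return all(b == a + 1 for a, b in zip(mfns, mfns[1:]))


# Contiguous PFNs 0..511, backed by MFNs that are contiguous except for one.
p2m = {pfn: 0x4000 + pfn for pfn in range(ENTRIES_PER_SUPERPAGE)}
p2m[100] = 0x9000

shadow_entries = shadow_superpage(0, p2m)
contiguous = is_machine_contiguous(shadow_entries)   # False
```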
Interesting article on LWN.
References
- Linux KVM as a Learning Tool
- libvirt
- Augeas Configuration parsing and editing library Haskell bindings.
- libguestfs Library for accessing and modifying virtual machine (VM) disk images.
- other interesting projects from Red Hat Emerging Technologies

