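1. x86_64 does not support the segmentation mechanism (segment limits are largely ignored in 64-bit mode), so Xen relies on the page-table mechanism to control access. Xen and its data reside at 0xffff800000000000 - 0xffff87ffffffffff, i.e. in the low part of what would otherwise be kernel space, whereas on x86_32 Xen resides at the very top of the address space.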
[include/asm-x86/config.h]
/*
 * Memory layout:
 *  0x0000000000000000 - 0x00007fffffffffff [128TB, 2^47 bytes, PML4:0-255]
 *    Guest-defined use (see below for compatibility mode guests).
 *  0x0000800000000000 - 0xffff7fffffffffff [16EB]
 *    Inaccessible: current arch only supports 48-bit sign-extended VAs.
 *  0xffff800000000000 - 0xffff803fffffffff [256GB, 2^38 bytes, PML4:256]
 *    Read-only machine-to-phys translation table (GUEST ACCESSIBLE).
 *  0xffff804000000000 - 0xffff807fffffffff [256GB, 2^38 bytes, PML4:256]
 *    Reserved for future shared info with the guest OS (GUEST ACCESSIBLE).
 *  0xffff808000000000 - 0xffff80ffffffffff [512GB, 2^39 bytes, PML4:257]
 *    Reserved for future use.
 *  0xffff810000000000 - 0xffff817fffffffff [512GB, 2^39 bytes, PML4:258]
 *    Guest linear page table.
 *  0xffff818000000000 - 0xffff81ffffffffff [512GB, 2^39 bytes, PML4:259]
 *    Shadow linear page table.
 *  0xffff820000000000 - 0xffff827fffffffff [512GB, 2^39 bytes, PML4:260]
 *    Per-domain mappings (e.g., GDT, LDT).
 *  0xffff828000000000 - 0xffff8283ffffffff [16GB,  2^34 bytes, PML4:261]
 *    Machine-to-phys translation table.
 *  0xffff828400000000 - 0xffff8287ffffffff [16GB,  2^34 bytes, PML4:261]
 *    Page-frame information array.
 *  0xffff828800000000 - 0xffff828bffffffff [16GB,  2^34 bytes, PML4:261]
 *    ioremap()/fixmap area.
 *  0xffff828c00000000 - 0xffff828c3fffffff [1GB,   2^30 bytes, PML4:261]
 *    Compatibility machine-to-phys translation table.
 *  0xffff828c40000000 - 0xffff828c7fffffff [1GB,   2^30 bytes, PML4:261]
 *    High read-only compatibility machine-to-phys translation table.
 *  0xffff828c80000000 - 0xffff828cbfffffff [1GB,   2^30 bytes, PML4:261]
 *    Xen text, static data, bss.
 *  0xffff828cc0000000 - 0xffff82ffffffffff [461GB,             PML4:261]
 *    Reserved for future use.
 *  0xffff830000000000 - 0xffff83ffffffffff [1TB,   2^40 bytes, PML4:262-263]
 *    1:1 direct mapping of all physical memory.
 *  0xffff840000000000 - 0xffff87ffffffffff [4TB,   2^42 bytes, PML4:264-271]
 *    Reserved for future use.
 *  0xffff880000000000 - 0xffffffffffffffff [120TB, PML4:272-511]
 *    Guest-defined use.
 */
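
The PML4 slot numbers in the layout comment can be checked by hand: each PML4 entry covers 2^39 bytes, so the slot is simply bits 47..39 of the virtual address. Below is a minimal sketch of that arithmetic. The helper names (`pml4_index`, `is_canonical`, `in_xen_reserved_range`) and the two `XEN_VIRT_*` macros are made up for illustration and are not Xen's own definitions; only the address bounds come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounds of the Xen-reserved region, copied from the layout comment.
 * The macro names are hypothetical, not Xen's. */
#define XEN_VIRT_FIRST 0xffff800000000000ULL
#define XEN_VIRT_LAST  0xffff87ffffffffffULL

/* Each PML4 entry maps 2^39 bytes; the slot is bits 47..39 of the address. */
static unsigned int pml4_index(uint64_t va)
{
    return (unsigned int)((va >> 39) & 0x1ff);
}

/* A 48-bit address is canonical when bits 63..47 are all copies of bit 47. */
static bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;            /* the upper 17 bits */
    return top == 0 || top == 0x1ffff;
}

/* True if va lies in the region Xen keeps for itself (PML4 slots 256..271). */
static bool in_xen_reserved_range(uint64_t va)
{
    assert(is_canonical(va));           /* non-canonical addresses never map */
    return va >= XEN_VIRT_FIRST && va <= XEN_VIRT_LAST;
}
```

For example, `pml4_index(0xffff828c80000000ULL)` (the start of Xen text) lands in slot 261, and `0xffff880000000000` is the first address of slot 272, where guest-defined use resumes.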

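2. The shadow page table is used mainly in two places. The first is page-table maintenance under full virtualization, where the overhead is high, although hardware support from VT-x or AMD-V reduces that cost to some extent. The second is live migration of a guest OS, where a shadow page table is needed to track the pages that are modified after they have been transferred.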
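
To make the live-migration use concrete: the usual idea is that guest pages are write-protected through the shadow page table, so the first write to a page faults, the fault handler records the page as dirty, and the migration loop later re-sends the recorded pages. The sketch below models only the dirty bitmap part. It is an assumption-laden toy, not Xen's log-dirty code: `struct dirty_log`, `mark_dirty`, `is_dirty`, `harvest_dirty` and `NR_PAGES` are all hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES      1024u   /* hypothetical guest size, in pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per guest page frame; a set bit means "written since last harvest". */
struct dirty_log {
    unsigned long bitmap[(NR_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

/* Would be called from the write-fault path for a write-protected page. */
static void mark_dirty(struct dirty_log *log, unsigned int pfn)
{
    assert(pfn < NR_PAGES);
    log->bitmap[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

static int is_dirty(const struct dirty_log *log, unsigned int pfn)
{
    assert(pfn < NR_PAGES);
    return (int)((log->bitmap[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL);
}

/* Collect up to max dirty frames for one copy round, clearing each bit taken.
 * Returns the number of frames written to out[]. */
static size_t harvest_dirty(struct dirty_log *log, unsigned int *out, size_t max)
{
    size_t n = 0;
    for (unsigned int pfn = 0; pfn < NR_PAGES && n < max; pfn++) {
        if (is_dirty(log, pfn)) {
            out[n++] = pfn;
            log->bitmap[pfn / BITS_PER_WORD] &= ~(1UL << (pfn % BITS_PER_WORD));
        }
    }
    return n;
}
```

A migration loop built on this would call `harvest_dirty()` once per round and stop iterating when the returned count becomes small enough to send during a brief pause of the guest.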

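Today I also cleared up a concept that had always been fuzzy for me. I had skimmed The Definitive Guide to Xen Hypervisor, which mentions a "writable page table", and I then confused it with the shadow page table from a paper I read later. They are in fact two completely different things. The shadow page table, as described above, is mainly for the full-virtualization case: the hardware walks the shadow page table maintained by Xen rather than the guest page table. The writable page table, on the other hand, is used in the para-virtualization case, and it is handled along this path: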

  1. arch/x86/traps.c::do_page_fault()->fixup_page_fault()

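Xen calls ptwr_do_page_fault() to handle a guest OS page-table update when all of the following conditions hold: (1) the fault is not taken in IRQ context and interrupts are not disabled (the IF bit in eflags is set); (2) the faulting address is not in the range reserved for the hypervisor; (3) the guest OS is in kernel mode; (4) the write bit of the error code is set and the reserved bit is not set.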
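
The four conditions above can be written out as a single predicate. This is a sketch under stated assumptions, not the code in fixup_page_fault(): `struct fault_ctx` and `should_try_ptwr` are hypothetical, and the hypervisor range is taken from the layout in item 1. The error-code and EFLAGS bit positions are the architectural x86 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PFEC_PRESENT  (1u << 0)
#define PFEC_WRITE    (1u << 1)
#define PFEC_USER     (1u << 2)
#define PFEC_RESERVED (1u << 3)

#define EFLAGS_IF     (1u << 9)   /* interrupt-enable flag */

/* Hypothetical snapshot of the state the fault handler looks at. */
struct fault_ctx {
    bool     in_irq;              /* fault taken while servicing an interrupt */
    uint32_t eflags;              /* guest EFLAGS at the time of the fault */
    uint64_t addr;                /* faulting virtual address (CR2) */
    bool     guest_kernel_mode;   /* guest was running its kernel */
    uint32_t error_code;          /* page-fault error code */
};

static bool in_hypervisor_range(uint64_t va)
{
    return va >= 0xffff800000000000ULL && va <= 0xffff87ffffffffffULL;
}

/* True when the fault is worth handing to the writable-page-table path. */
static bool should_try_ptwr(const struct fault_ctx *c)
{
    assert(c != NULL);
    if (c->in_irq || !(c->eflags & EFLAGS_IF))   /* condition (1) */
        return false;
    if (in_hypervisor_range(c->addr))            /* condition (2) */
        return false;
    if (!c->guest_kernel_mode)                   /* condition (3) */
        return false;
    return (c->error_code & PFEC_WRITE) &&       /* condition (4) */
           !(c->error_code & PFEC_RESERVED);
}
```

Written this way it is easy to see that a read fault, or a fault caused by a reserved bit in a page-table entry, never reaches the writable-page-table handler.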

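Next comes the key function, ptwr_do_page_fault(). It uses guest_get_eff_l1e to fetch the PTE that maps the accessed virtual address, then looks up the page that this PTE points to. It then confirms that the guest OS really is trying to modify a PTE, which requires all of the following: (1) the present bit is set and the rw bit is not set; (2) the mfn (machine frame number) is valid, i.e. below the maximum. The check in the code is !mfn_valid(l1e_get_pfn(pte)), which works because in PV mode mfn = pfn; (3) the page type is PGT_l1_page_table, i.e. the lowest-level page table; (4) the page's reference count is non-zero; (5) the page's owner is the current domain.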
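
The five checks above can be sketched as follows. This is a simplified model, not Xen's implementation: `struct page_model`, `enum page_type`, `pte_frame` and `is_ptwr_candidate` are hypothetical stand-ins for Xen's page_info bookkeeping and l1e accessors, and the frame field is assumed to occupy bits 12..51 of a 64-bit PTE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)

/* Hypothetical stand-in for the per-page bookkeeping Xen keeps. */
enum page_type { PG_NONE, PG_L1_PAGE_TABLE, PG_L2_PAGE_TABLE };

struct page_model {
    enum page_type type;    /* what the page is currently used as */
    unsigned long  count;   /* reference count */
    int            owner;   /* id of the owning domain */
};

/* Frame number stored in a 64-bit PTE (bits 12..51). */
static uint64_t pte_frame(uint64_t pte)
{
    return (pte >> 12) & ((1ULL << 40) - 1);
}

/* pte: the entry mapping the faulting address; pg: the page it points to. */
static bool is_ptwr_candidate(uint64_t pte, uint64_t max_mfn,
                              const struct page_model *pg, int current_domain)
{
    assert(pg != NULL);
    if ((pte & (PTE_PRESENT | PTE_RW)) != PTE_PRESENT)
        return false;                       /* (1) present, but read-only */
    if (pte_frame(pte) >= max_mfn)
        return false;                       /* (2) frame number in range */
    if (pg->type != PG_L1_PAGE_TABLE)
        return false;                       /* (3) target is an L1 table */
    if (pg->count == 0)
        return false;                       /* (4) page is still referenced */
    return pg->owner == current_domain;     /* (5) owned by the faulting domain */
}
```

Only when this predicate holds does it make sense to emulate the faulting write; any other combination means the fault was not a page-table update and has to be handled elsewhere.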

Once all of these checks pass, x86_emulate() is called to run the ptwr_emulate_ops code.

In addition, Xen 3.3.1 appears to make use of the reserved bits here. According to the Intel manual: "When the PSE and PAE flags in control register CR4 are set, the processor generates a page fault if reserved bits are not set to 0", and "The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1". So after Xen intercepts the guest OS's first attempt to modify a PTE, it can set this reserved bit. The next access will then still raise a page fault because of the reserved bit, at which point Xen checks whether the machine address written by the guest OS is correct and then clears the reserved bit again.