[PATCH v2] kvm/x86: Handle async PF in RCU read-side critical section

    From Paul E. McKenney@21:1/5 to Boqun Feng on Mon Oct 2 22:50:10 2017
    On Mon, Oct 02, 2017 at 10:43:00PM +0800, Boqun Feng wrote:
    On Mon, Oct 02, 2017 at 01:41:03PM +0000, Paolo Bonzini wrote:
    [...]

    Wanpeng, the call site of kvm_async_pf_task_wait() in kvm_handle_page_fault() is for the nested scenario, right? I take it we should handle it as if the fault happened while the L1 guest was running in kernel mode, so @user should be 0, right?

    In that case we can schedule, actually. The guest will let another
    process run.

    In fact we could schedule safely most of the time in the
    !user_mode(regs) case, it's just that with PREEMPT=n there's no
    knowledge of whether we can do so. This explains why we have never seen the bug before.

    Thanks, looks like I confused myself a little bit here. You are right.
    So in a PREEMPT=n kernel, the only case where we cannot schedule is when
    the async PF *interrupts* the *kernel*. In kvm_handle_page_fault() we
    did not actually interrupt the kernel, so it's fine.

    Actually, we should be able to do a little bit better than that.
    If PREEMPT=n but PREEMPT_COUNT=y, then preempt_count() will know
    about RCU read-side critical sections via preempt_disable().

    So maybe something like this?

        n.halted = is_idle_task(current) ||
                   preempt_count() > 1 ||
                   (!IS_ENABLED(CONFIG_PREEMPT) && !IS_ENABLED(CONFIG_PREEMPT_COUNT) && !user) ||
                   rcu_preempt_depth();
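
    To make the effect of the extra PREEMPT_COUNT test concrete, here is a
    minimal user-space sketch of the expression above. It is not kernel code:
    struct apf_ctx, must_halt() and must_halt_examples() are hypothetical
    names, and the struct fields merely stand in for is_idle_task(current),
    preempt_count(), rcu_preempt_depth() and the two IS_ENABLED() checks. The
    baseline preempt count of 1 used in the examples is an assumption
    (presumably the bucket lock held at that point).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is a kernel API. */
struct apf_ctx {
	bool idle_task;         /* models is_idle_task(current) */
	int  preempt_count;     /* models preempt_count() */
	int  rcu_depth;         /* models rcu_preempt_depth() */
	bool cfg_preempt;       /* models IS_ENABLED(CONFIG_PREEMPT) */
	bool cfg_preempt_count; /* models IS_ENABLED(CONFIG_PREEMPT_COUNT) */
	bool user;              /* fault was taken from user mode */
};

/* Mirrors the suggested n.halted expression: true = halt, false = may sleep. */
bool must_halt(const struct apf_ctx *c)
{
	return c->idle_task ||
	       c->preempt_count > 1 ||
	       (!c->cfg_preempt && !c->cfg_preempt_count && !c->user) ||
	       c->rcu_depth != 0;
}

void must_halt_examples(void)
{
	/* PREEMPT=n, PREEMPT_COUNT=n, kernel-mode fault: nothing tracks
	 * critical sections, so the only safe answer is to halt. */
	struct apf_ctx blind = { .preempt_count = 1 };
	assert(must_halt(&blind));

	/* PREEMPT=n, PREEMPT_COUNT=y, kernel-mode fault, count at the assumed
	 * baseline of 1: the counter is trusted, so the task may sleep. */
	struct apf_ctx counted = { .preempt_count = 1, .cfg_preempt_count = true };
	assert(!must_halt(&counted));

	/* Same config with a raised count, as inside a read-side section. */
	counted.preempt_count = 2;
	assert(must_halt(&counted));
}
```

    The only behavioural change relative to the v2 hunk is the second case:
    with PREEMPT_COUNT=y the kernel-mode fault no longer forces a halt on its
    own.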

    Thanx, Paul

    I had already applied v1, can you rebase and resend please? Thanks,


    Sure, I'm going to rename that parameter to "interrupt_kernel" (probably
    a bad name), indicating whether the async PF interrupts the kernel.
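
    A rough user-space sketch of what the renamed parameter could mean at the
    two call sites, for the PREEMPT=n case only. Everything here is an
    assumption made for illustration: the helper names are hypothetical, and
    the mapping of each call site to interrupt_kernel is inferred from the
    discussion above, not taken from a posted patch. Note that v2 passes 0 for
    @user in kvm_handle_page_fault(), which the "!user" test turns into an
    unconditional halt on PREEMPT=n.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, PREEMPT=n only: the !IS_ENABLED(CONFIG_PREEMPT) term
 * is always true here, so it is folded away. interrupt_kernel is true only
 * when the async PF arrived while kernel code was executing.
 */
bool halted_preempt_n(bool idle_task, int preempt_count, int rcu_depth,
		      bool interrupt_kernel)
{
	return idle_task || preempt_count > 1 || interrupt_kernel ||
	       rcu_depth != 0;
}

/* Assumed mapping for the do_async_page_fault() call site. */
bool interrupt_kernel_from_exception(bool user_mode_regs)
{
	return !user_mode_regs;
}

/* Assumed mapping for the kvm_handle_page_fault() call site (nested case):
 * the kernel was not interrupted, so scheduling is fine. */
bool interrupt_kernel_from_vmexit(void)
{
	return false;
}

void interrupt_kernel_examples(void)
{
	/* Exception taken in kernel mode: must halt. */
	assert(halted_preempt_n(false, 1, 0, interrupt_kernel_from_exception(false)));
	/* Exception taken in user mode: may sleep. */
	assert(!halted_preempt_n(false, 1, 0, interrupt_kernel_from_exception(true)));
	/* Nested path: may sleep, unlike passing user = 0 under "!user". */
	assert(!halted_preempt_n(false, 1, 0, interrupt_kernel_from_vmexit()));
}
```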

    But it's a little bit late today, will do that tomorrow.

    Regards,
    Boqun


    Paolo

    arch/x86/include/asm/kvm_para.h | 4 ++--
    arch/x86/kernel/kvm.c | 9 ++++++---
    arch/x86/kvm/mmu.c | 2 +-
    3 files changed, 9 insertions(+), 6 deletions(-)

    diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
    index bc62e7cbf1b1..0a5ae6bb128b 100644
    --- a/arch/x86/include/asm/kvm_para.h
    +++ b/arch/x86/include/asm/kvm_para.h
    @@ -88,7 +88,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
     bool kvm_para_available(void);
     unsigned int kvm_arch_para_features(void);
     void __init kvm_guest_init(void);
    -void kvm_async_pf_task_wait(u32 token);
    +void kvm_async_pf_task_wait(u32 token, int user);
     void kvm_async_pf_task_wake(u32 token);
     u32 kvm_read_and_reset_pf_reason(void);
     extern void kvm_disable_steal_time(void);
    @@ -103,7 +103,7 @@ static inline void kvm_spinlock_init(void)

     #else /* CONFIG_KVM_GUEST */
     #define kvm_guest_init() do {} while (0)
    -#define kvm_async_pf_task_wait(T) do {} while(0)
    +#define kvm_async_pf_task_wait(T, U) do {} while(0)
     #define kvm_async_pf_task_wake(T) do {} while(0)

     static inline bool kvm_para_available(void)
    diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
    index aa60a08b65b1..916f519e54c9 100644
    --- a/arch/x86/kernel/kvm.c
    +++ b/arch/x86/kernel/kvm.c
    @@ -117,7 +117,7 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
             return NULL;
     }

    -void kvm_async_pf_task_wait(u32 token)
    +void kvm_async_pf_task_wait(u32 token, int user)
     {
             u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
             struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
    @@ -140,7 +140,10 @@ void kvm_async_pf_task_wait(u32 token)

             n.token = token;
             n.cpu = smp_processor_id();
    -        n.halted = is_idle_task(current) || preempt_count() > 1;
    +        n.halted = is_idle_task(current) ||
    +                   preempt_count() > 1 ||
    +                   (!IS_ENABLED(CONFIG_PREEMPT) && !user) ||
    +                   rcu_preempt_depth();
             init_swait_queue_head(&n.wq);
             hlist_add_head(&n.link, &b->list);
             raw_spin_unlock(&b->lock);
    @@ -268,7 +271,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
             case KVM_PV_REASON_PAGE_NOT_PRESENT:
                     /* page is swapped out by the host. */
                     prev_state = exception_enter();
    -                kvm_async_pf_task_wait((u32)read_cr2());
    +                kvm_async_pf_task_wait((u32)read_cr2(), user_mode(regs));
                     exception_exit(prev_state);
                     break;
             case KVM_PV_REASON_PAGE_READY:
    diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
    index eca30c1eb1d9..106d4a029a8a 100644
    --- a/arch/x86/kvm/mmu.c
    +++ b/arch/x86/kvm/mmu.c
    @@ -3837,7 +3837,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
             case KVM_PV_REASON_PAGE_NOT_PRESENT:
                     vcpu->arch.apf.host_apf_reason = 0;
                     local_irq_disable();
    -                kvm_async_pf_task_wait(fault_address);
    +                kvm_async_pf_task_wait(fault_address, 0);
                     local_irq_enable();
                     break;
             case KVM_PV_REASON_PAGE_READY:


