
Commit fba333a

neobuddy89 authored and gregtwallace committed
ARM: Use TTBR1 instead of reserved context ID
On ARMv7 CPUs that cache first level page table entries (like the Cortex-A15), using a reserved ASID while changing the TTBR or flushing the TLB is unsafe. This is because the CPU may cache the first level entry as the result of a speculative memory access while the reserved ASID is assigned. After the process owning the page tables dies, the memory will be reallocated and may be written with junk values which can be interpreted as global, valid PTEs by the processor. This will result in the TLB being populated with bogus global entries.

This patch avoids the use of a reserved context ID in the v7 switch_mm and ASID rollover code by temporarily using the swapper_pg_dir pointed at by TTBR1, which contains only global entries that are not tagged with ASIDs.

Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
[[email protected]: add LPAE support]
Signed-off-by: Catalin Marinas <[email protected]>
Change-Id: I2dd82501dac5ee402765aaa0ffb3f7f577a603c9

ARM: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW on ASID-capable CPUs

Since the ASIDs must be unique to an mm across all the CPUs in a system, the __new_context() function needs to broadcast a context reset event to all the CPUs during ASID allocation if a roll-over occurred. Such IPIs cannot be issued with interrupts disabled and ARM had to define __ARCH_WANT_INTERRUPTS_ON_CTXSW.

This patch changes the check_context() function to check_and_switch_context() called from switch_mm(). In case of ASID-capable CPUs (ARMv6 onwards), if a new ASID is needed and the interrupts are disabled, it defers the __new_context() and cpu_switch_mm() calls to the post-lock switch hook where the interrupts are enabled. Setting the reserved TTBR0 was also moved to check_and_switch_context() from cpu_v7_switch_mm().

Reviewed-by: Will Deacon <[email protected]>
Tested-by: Will Deacon <[email protected]>
Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Conflicts:
	arch/arm/mm/proc-v7-2level.S

Change-Id: I48e39730c49ff8d30ce566e8a03cf54557869a52

ARM: Remove current_mm per-cpu variable

The current_mm variable was used to store the new mm between the switch_mm() and switch_to() calls where an IPI to reset the context could have set the wrong mm. Since the interrupts are disabled during context switch, there is no need for this variable, current->active_mm already points to the current mm when interrupts are re-enabled.

Reviewed-by: Will Deacon <[email protected]>
Tested-by: Will Deacon <[email protected]>
Reviewed-by: Frank Rowand <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

ARM: 7502/1: contextidr: avoid using bfi instruction during notifier

The bfi instruction is not available on ARMv6, so instead use an and/orr sequence in the contextidr_notifier. This gets rid of the assembler error:

   Assembler messages:
   Error: selected processor does not support ARM mode `bfi r3,r2,#0,#8'

Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/mm/context.c

Change-Id: Id64c0145d0dcd1cfe9e2aba59f86c08ec5fbf649

ARM: mm: remove IPI broadcasting on ASID rollover

ASIDs are allocated to MMU contexts based on a rolling counter. This means that after 255 allocations we must invalidate all existing ASIDs via an expensive IPI mechanism to synchronise all of the online CPUs and ensure that all tasks execute with an ASID from the new generation.

This patch changes the rollover behaviour so that we rely instead on the hardware broadcasting of the TLB invalidation to avoid the IPI calls. This works by keeping track of the active ASID on each core, which is then reserved in the case of a rollover so that currently scheduled tasks can continue to run. For cores without hardware TLB broadcasting, we keep track of pending flushes in a cpumask, so cores can flush their local TLB before scheduling a new mm.

Reviewed-by: Catalin Marinas <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

Conflicts:
	arch/arm/mm/context.c

Change-Id: I58990400aaaaef35319f7b3fb2f84fe7e46cb581

ARM: mm: avoid taking ASID spinlock on fastpath

When scheduling a new mm, we take a spinlock so that we can:
1. Safely allocate a new ASID, if required
2. Update our active_asids field without worrying about parallel updates to reserved_asids
3. Ensure that we flush our local TLB, if required

However, this has the nasty effect of serialising context-switch across all CPUs in the system. The usual (fast) case is where the next mm has a valid ASID for the current generation. In such a scenario, we can avoid taking the lock and instead use atomic64_xchg to update the active_asids variable for the current CPU. If a rollover occurs on another CPU (which would take the lock), when copying the active_asids into the reserved_asids another atomic64_xchg is used to replace each active_asids with 0. The fast path can then detect this case and fall back to spinning on the lock.

Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

ARM: mm: use bitmap operations when allocating new ASIDs

When allocating a new ASID, we must take care not to re-assign a reserved ASID-value to a new mm. This requires us to check each candidate ASID against those currently reserved by other cores before assigning a new ASID to the current mm.

This patch improves the ASID allocation algorithm by using a bitmap-based approach. Rather than iterating over the reserved ASID array for each candidate ASID, we simply find the first zero bit, ensuring that those indices corresponding to reserved ASIDs are set when flushing during a rollover event.

Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

ARM: 7649/1: mm: mm->context.id fix for big-endian

Since the new ASID code in b5466f8728527a05a493cc4abe9e6f034a1bbaab ("ARM: mm: remove IPI broadcasting on ASID rollover") was changed to use 64bit operations it has broken the BE operation due to an issue with the MM code accessing sub-fields of mm->context.id.

When running in BE mode we see the values in mm->context.id are stored with the highest value first, so the LDR in the arch/arm/mm/proc-macros.S reads the wrong part of this field. To resolve this, change the LDR in the mmid macro to load from +4.

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7658/1: mm: fix race updating mm->context.id on ASID rollover

If a thread triggers an ASID rollover, other threads of the same process must be made to wait until the mm->context.id for the shared mm_struct has been updated to the new generation and associated book-keeping (e.g. TLB invalidation) has been performed.

However, there is a *tiny* window where both mm->context.id and the relevant active_asids entry are updated to the new generation, but the TLB flush has not been performed, which could allow another thread to return to userspace with a dirty TLB, potentially leading to data corruption. In reality this will never occur because one CPU would need to perform a context-switch in the time it takes another to do a couple of atomic test/set operations, but we should plug the race anyway.

This patch moves the active_asids update until after the potential TLB flush on context-switch.

Cc: <[email protected]> # 3.8
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7659/1: mm: make mm->context.id an atomic64_t variable

mm->context.id is updated under asid_lock when a new ASID is allocated to an mm_struct. However, it is also read without the lock when a task is being scheduled and checking whether or not the current ASID generation is up-to-date.

If two threads of the same process are being scheduled in parallel and the bottom bits of the generation in their mm->context.id match the current generation (that is, the mm_struct has not been used for ~2^24 rollovers) then the non-atomic, lockless access to mm->context.id may yield the incorrect ASID.

This patch fixes this issue by making mm->context.id an atomic64_t, ensuring that the generation is always read consistently. For code that only requires access to the ASID bits (e.g. TLB flushing by mm), then the value is accessed directly, which GCC converts to an ldrb.

Cc: <[email protected]> # 3.8
Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/include/asm/mmu.h

Change-Id: I682895d6357a91ecc439c8543fa94f1aecbfcb4c

ARM: 7661/1: mm: perform explicit branch predictor maintenance when required

The ARM ARM requires branch predictor maintenance if, for a given ASID, the instructions at a specific virtual address appear to change. From the kernel's point of view, that means:

- Changing the kernel's view of memory (e.g. switching to the identity map)
- ASID rollover (since ASIDs will be re-allocated to new tasks)

This patch adds explicit branch predictor maintenance when either of the two conditions above are met.

Reviewed-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)

On Cortex-A15 (r0p0..r3p2) the TLBI/DSB are not adequately shooting down all use of the old entries. This patch implements the erratum workaround which consists of:

1. Dummy TLBIMVAIS and DSB on the CPU doing the TLBI operation.
2. Send IPI to the CPUs that are running the same mm (and ASID) as the one being invalidated (or all the online CPUs for global pages).
3. CPU receiving the IPI executes a DMB and CLREX (part of the exception return code already).

Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Russell King <[email protected]>

Conflicts:
	arch/arm/Kconfig

Change-Id: I4513d042301a1faad817b83434280462cc176df1

msm: rtb: Log the context id in the rtb

Store the context id in the register trace buffer. The process id can be derived from the context id. This gives a general idea about what process was last running when the RTB stopped.

Change-Id: I2fb8934d008b8cf3666f1df2652846c15faca776
Signed-off-by: Laura Abbott <[email protected]>
(cherry picked from commit 445eb9a)

Conflicts:
	arch/arm/mach-msm/include/mach/msm_rtb.h

ARM: 7767/1: let the ASID allocator handle suspended animation

commit ae120d9edfe96628f03d87634acda0bfa7110632 upstream.

When a CPU is running a process, the ASID for that process is held in a per-CPU variable (the "active ASIDs" array). When the ASID allocator handles a rollover, it copies the active ASIDs into a "reserved ASIDs" array to ensure that a process currently running on another CPU will continue to run unaffected. The active array is zero-ed to indicate that a rollover occurred.

Because of this mechanism, a reserved ASID is only remembered for a single rollover. A subsequent rollover will completely refill the reserved ASIDs array.

In a severely oversubscribed environment where a CPU can be prevented from running for extended periods of time (think virtual machines), the above has a horrible side effect:

   [P{a} denotes process P running with ASID a]

   CPU-0 runs A{x}		[active = <x 0>]
   CPU-0 is suspended
   CPU-1 runs B{y}		[active = <x y>]
   rollover:			[active = <0 0>, reserved = <x y>]
   CPU-1 runs B{y}		[active = <0 y>, reserved = <x y>]
   rollover:			[active = <0 0>, reserved = <0 y>]
   CPU-1 runs C{x}		[active = <0 x>]
   CPU-0 resumes and runs A{x}

At that stage, both A and C have the same ASID, with deadly consequences.

The fix is to preserve reserved ASIDs across rollovers if the CPU doesn't have an active ASID when the rollover occurs.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: 7768/1: prevent risks of out-of-bound access in ASID allocator

commit b8e4a4740fa2b17c0a447b3ab783b3dc10702e27 upstream.

On a CPU that never ran anything, both the active and reserved ASID fields are set to zero. In this case the ASID_TO_IDX() macro will return -1, which is not a very useful value to index a bitmap.

Instead of trying to offset the ASID so that ASID #1 is actually bit 0 in the asid_map bitmap, just always ignore bit 0 and start the search from bit 1. This makes the code a bit more readable, and without risk of OoB access.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reported-by: Catalin Marinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

ARM: 7703/1: Disable preemption in broadcast_tlb*_a15_erratum()

Commit 93dc688 (ARM: 7684/1: errata: Workaround for Cortex-A15 erratum 798181 (TLBI/DSB operations)) introduces calls to smp_processor_id() and smp_call_function_many() with preemption enabled. This patch disables preemption and also optimises the smp_processor_id() call in broadcast_tlb_mm_a15_erratum(). The broadcast_tlb_a15_erratum() function is changed to use smp_call_function() which disables preemption.

Signed-off-by: Catalin Marinas <[email protected]>
Reported-by: Geoff Levand <[email protected]>
Reported-by: Nicolas Pitre <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 7769/1: Cortex-A15: fix erratum 798181 implementation

commit 0d0752bca1f9a91fb646647aa4abbb21156f316c upstream.

Looking into the active_asids array is not enough, as we also need to look into the reserved_asids array (they both represent processes that are currently running). Also, not holding the ASID allocator lock is racy, as another CPU could schedule that process and trigger a rollover, making the erratum workaround miss an IPI.

Exposing this outside of context.c is a little ugly on the side, so let's define a new entry point that the erratum workaround can call to obtain the cpumask.

Acked-by: Will Deacon <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

arm: mm: Clean ASID patchset

Change-Id: Id5b0cc6d5300d293e33baf6603453bde0df6d6d8

android/lowmemorykiller: Ignore tasks with freed mm

A killed task can stay in the task list long after its memory has been returned to the system, therefore ignore any tasks whose mm struct has been freed.

Change-Id: I76394b203b4ab2312437c839976f0ecb7b6dde4e
CRs-fixed: 450383
Signed-off-by: Liam Mark <[email protected]>
Signed-off-by: Pranav Vashi <[email protected]>
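Taken together, the allocator that these messages describe can be modelled in ordinary user-space C. The sketch below only illustrates the scheme sketched above (lockless fast path via xchg, bitmap allocation starting at bit 1, per-CPU active/reserved ASIDs preserved across rollover); it uses C11 atomics in place of atomic64_t, a pthread mutex in place of the raw spinlock, and reduces TLB/branch-predictor maintenance to comments. It is an illustrative assumption-laden sketch, not the actual arch/arm/mm/context.c code.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ASID_BITS		8
#define ASID_MASK		(~0ULL << ASID_BITS)
#define ASID_FIRST_VERSION	(1ULL << ASID_BITS)
#define NUM_USER_ASIDS		ASID_FIRST_VERSION
#define NR_CPUS			4

static pthread_mutex_t cpu_asid_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic uint64_t asid_generation = ASID_FIRST_VERSION;
static _Atomic uint64_t active_asids[NR_CPUS];
static uint64_t reserved_asids[NR_CPUS];
static bool asid_map[NUM_USER_ASIDS];	/* stand-in for the kernel bitmap */
static bool tlb_flush_pending[NR_CPUS];

/* Rollover: start a new generation and re-reserve the ASIDs still in use. */
static void flush_context(void)
{
	memset(asid_map, 0, sizeof(asid_map));
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		/* Zeroing the slot pushes that CPU's fast path onto the lock. */
		uint64_t asid = atomic_exchange(&active_asids[cpu], 0);
		/* No active ASID (CPU idle/suspended): keep the old reservation. */
		if (asid == 0)
			asid = reserved_asids[cpu];
		asid_map[asid & ~ASID_MASK] = true;
		reserved_asids[cpu] = asid;
		tlb_flush_pending[cpu] = true;
	}
}

static bool is_reserved_asid(uint64_t asid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (reserved_asids[cpu] == asid)
			return true;
	return false;
}

static uint64_t new_context(_Atomic uint64_t *mm_id)
{
	uint64_t asid = atomic_load(mm_id);
	uint64_t generation = atomic_load(&asid_generation);
	uint64_t bit;

	/* A task that kept running across the rollover keeps its ASID. */
	if (asid != 0 && is_reserved_asid(asid))
		return generation | (asid & ~ASID_MASK);

	/* Bit 0 is never handed out; the search starts from bit 1. */
	for (bit = 1; bit < NUM_USER_ASIDS && asid_map[bit]; bit++)
		;
	if (bit == NUM_USER_ASIDS) {	/* all usable ASIDs taken: roll over */
		generation = atomic_fetch_add(&asid_generation,
					      ASID_FIRST_VERSION) +
			     ASID_FIRST_VERSION;
		flush_context();
		for (bit = 1; asid_map[bit]; bit++)
			;
	}
	asid_map[bit] = true;
	return generation | bit;
}

/* Called for the incoming mm on every context switch. */
static void check_and_switch_context(_Atomic uint64_t *mm_id, int cpu)
{
	uint64_t asid = atomic_load(mm_id);

	/*
	 * Fast path: the mm's generation is current and no rollover has
	 * zeroed our active_asids slot; one xchg publishes the ASID and
	 * the lock is never taken.
	 */
	if (!((asid ^ atomic_load(&asid_generation)) >> ASID_BITS) &&
	    atomic_exchange(&active_asids[cpu], asid))
		return;		/* program the MMU with this ASID and run */

	/* Slow path: stale generation, or a rollover is in progress. */
	pthread_mutex_lock(&cpu_asid_lock);
	asid = atomic_load(mm_id);
	if ((asid ^ atomic_load(&asid_generation)) >> ASID_BITS) {
		asid = new_context(mm_id);
		atomic_store(mm_id, asid);
	}
	if (tlb_flush_pending[cpu]) {
		tlb_flush_pending[cpu] = false;
		/* local TLB + branch predictor flush would happen here */
	}
	atomic_store(&active_asids[cpu], asid);
	pthread_mutex_unlock(&cpu_asid_lock);
}

The point of the fast path is that the common context switch costs two atomic operations instead of a contended spinlock; the 7658/1 ordering fix is reflected here by storing active_asids only after the pending-flush check on the slow path.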
1 parent 0231cde commit fba333a

14 files changed: +306 -219 lines changed

arch/arm/Kconfig

Lines changed: 10 additions & 0 deletions

@@ -1534,6 +1534,16 @@ config ARM_ERRATA_775420
 	  to deadlock. This workaround puts DSB before executing ISB if
 	  an abort may occur on cache maintenance.
 
+config ARM_ERRATA_798181
+	bool "ARM errata: TLBI/DSB failure on Cortex-A15"
+	depends on CPU_V7 && SMP
+	help
+	  On Cortex-A15 (r0p0..r3p2) the TLBI*IS/DSB operations are not
+	  adequately shooting down all use of the old entries. This
+	  option enables the Linux kernel workaround for this erratum
+	  which sends an IPI to the CPUs that are running the same ASID
+	  as the one being invalidated.
+
 endmenu
 
 source "arch/arm/common/Kconfig"

arch/arm/include/asm/highmem.h

Lines changed: 7 additions & 0 deletions

@@ -41,6 +41,13 @@ extern void kunmap_high(struct page *page);
 #endif
 #endif
 
+/*
+ * Needed to be able to broadcast the TLB invalidation for kmap.
+ */
+#ifdef CONFIG_ARM_ERRATA_798181
+#undef ARCH_NEEDS_KMAP_HIGH_GET
+#endif
+
 #ifdef ARCH_NEEDS_KMAP_HIGH_GET
 extern void *kmap_high_get(struct page *page);
 #else

arch/arm/include/asm/mmu.h

Lines changed: 7 additions & 8 deletions

@@ -5,19 +5,16 @@
 
 typedef struct {
 #ifdef CONFIG_CPU_HAS_ASID
-	unsigned int id;
-	raw_spinlock_t id_lock;
+	atomic64_t id;
 #endif
 	unsigned int kvm_seq;
 	unsigned long sigpage;
 } mm_context_t;
 
 #ifdef CONFIG_CPU_HAS_ASID
-#define ASID(mm)	((mm)->context.id & 255)
-
-/* init_mm.context.id_lock should be initialized. */
-#define INIT_MM_CONTEXT(name)	\
-	.context.id_lock = __RAW_SPIN_LOCK_UNLOCKED(name.context.id_lock),
+#define ASID_BITS	8
+#define ASID_MASK	((~0ULL) << ASID_BITS)
+#define ASID(mm)	((mm)->context.id.counter & ~ASID_MASK)
 #else
 #define ASID(mm)	(0)
 #endif
@@ -30,7 +27,7 @@ typedef struct {
  * modified for 2.6 by Hyok S. Choi <[email protected]>
  */
 typedef struct {
-	unsigned long end_brk;
+	unsigned long end_brk;
 } mm_context_t;
 
 #endif
@@ -40,6 +37,8 @@ typedef struct {
  * so enable interrupts over the context switch to avoid high
 * latency.
 */
+#ifndef CONFIG_CPU_HAS_ASID
 #define __ARCH_WANT_INTERRUPTS_ON_CTXSW
+#endif
 
 #endif
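The macros just above fold a rollover generation into the upper bits of the 64-bit context.id and keep the hardware ASID in the low ASID_BITS. A tiny stand-alone illustration of that packing, with example values only (not kernel code):

#include <stdint.h>
#include <stdio.h>

#define ASID_BITS	8
#define ASID_MASK	(~0ULL << ASID_BITS)

int main(void)
{
	/* Generation 3, hardware ASID 0x2a, packed the way context.id is. */
	uint64_t id = (3ULL << ASID_BITS) | 0x2a;

	printf("generation=%llu asid=%llu\n",
	       (unsigned long long)(id >> ASID_BITS),
	       (unsigned long long)(id & ~ASID_MASK));
	return 0;
}

Running this prints "generation=3 asid=42".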

arch/arm/include/asm/mmu_context.h

Lines changed: 19 additions & 52 deletions

@@ -24,66 +24,39 @@ void __check_kvm_seq(struct mm_struct *mm);
 
 #ifdef CONFIG_CPU_HAS_ASID
 
-/*
- * On ARMv6, we have the following structure in the Context ID:
- *
- * 31                         7          0
- * +-------------------------+-----------+
- * |      process ID         |   ASID    |
- * +-------------------------+-----------+
- * |              context ID             |
- * +-------------------------------------+
- *
- * The ASID is used to tag entries in the CPU caches and TLBs.
- * The context ID is used by debuggers and trace logic, and
- * should be unique within all running processes.
- */
-#define ASID_BITS		8
-#define ASID_MASK		((~0) << ASID_BITS)
-#define ASID_FIRST_VERSION	(1 << ASID_BITS)
-
-extern unsigned int cpu_last_asid;
-#ifdef CONFIG_SMP
-DECLARE_PER_CPU(struct mm_struct *, current_mm);
-#endif
-
-void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
-void __new_context(struct mm_struct *mm);
-
-static inline void check_context(struct mm_struct *mm)
+void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk);
+#define init_new_context(tsk,mm)	({ atomic64_set(&mm->context.id, 0); 0; })
+
+#ifdef CONFIG_ARM_ERRATA_798181
+void a15_erratum_get_cpumask(int this_cpu, struct mm_struct *mm,
+			     cpumask_t *mask);
+#else /* !CONFIG_ARM_ERRATA_798181 */
+static inline void a15_erratum_get_cpumask(int this_cpu, struct mm_struct *mm,
+					   cpumask_t *mask)
 {
-	/*
-	 * This code is executed with interrupts enabled. Therefore,
-	 * mm->context.id cannot be updated to the latest ASID version
-	 * on a different CPU (and condition below not triggered)
-	 * without first getting an IPI to reset the context. The
-	 * alternative is to take a read_lock on mm->context.id_lock
-	 * (after changing its type to rwlock_t).
-	 */
-	if (unlikely((mm->context.id ^ cpu_last_asid) >> ASID_BITS))
-		__new_context(mm);
-
-	if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
-		__check_kvm_seq(mm);
 }
+#endif /* CONFIG_ARM_ERRATA_798181 */
 
-#define init_new_context(tsk,mm) (__init_new_context(tsk,mm),0)
-
-#else
+#else /* !CONFIG_CPU_HAS_ASID */
 
-static inline void check_context(struct mm_struct *mm)
+static inline void check_and_switch_context(struct mm_struct *mm,
+					    struct task_struct *tsk)
 {
 #ifdef CONFIG_MMU
 	if (unlikely(mm->context.kvm_seq != init_mm.context.kvm_seq))
 		__check_kvm_seq(mm);
+	cpu_switch_mm(mm->pgd, mm);
 #endif
 }
 
 #define init_new_context(tsk,mm)	0
 
-#endif
+#define finish_arch_post_lock_switch() do { } while (0)
+
+#endif /* CONFIG_CPU_HAS_ASID */
 
 #define destroy_context(mm)	do { } while(0)
+#define activate_mm(prev,next)	switch_mm(prev, next, NULL)
 
 /*
  * This is called when "tsk" is about to enter lazy TLB mode.
@@ -119,19 +92,13 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 		__flush_icache_all();
 #endif
 	if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next) {
-#ifdef CONFIG_SMP
-		struct mm_struct **crt_mm = &per_cpu(current_mm, cpu);
-		*crt_mm = next;
-#endif
-		check_context(next);
-		cpu_switch_mm(next->pgd, next);
+		check_and_switch_context(next, tsk);
 		if (cache_is_vivt())
 			cpumask_clear_cpu(cpu, mm_cpumask(prev));
 	}
 #endif
 }
 
 #define deactivate_mm(tsk,mm)	do { } while (0)
-#define activate_mm(prev,next)	switch_mm(prev, next, NULL)
 
 #endif

arch/arm/include/asm/thread_info.h

Lines changed: 3 additions & 1 deletion

@@ -154,7 +154,9 @@ extern int vfp_restore_user_hwstate(struct user_vfp __user *,
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
 #define TIF_RESTORE_SIGMASK	20
 #define TIF_SECCOMP		21
-#define TIF_MM_RELEASED		22	/* task MM has been released */
+#define TIF_SWITCH_MM		22	/* deferred switch_mm */
+#define TIF_MM_RELEASED		23	/* task MM has been released */
+
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)

arch/arm/include/asm/tlbflush.h

Lines changed: 15 additions & 0 deletions

@@ -471,6 +471,21 @@ static inline void local_flush_bp_all(void)
 	isb();
 }
 
+#ifdef CONFIG_ARM_ERRATA_798181
+static inline void dummy_flush_tlb_a15_erratum(void)
+{
+	/*
+	 * Dummy TLBIMVAIS. Using the unmapped address 0 and ASID 0.
+	 */
+	asm("mcr p15, 0, %0, c8, c3, 1" : : "r" (0));
+	dsb();
+}
+#else
+static inline void dummy_flush_tlb_a15_erratum(void)
+{
+}
+#endif
+
 /*
  * flush_pmd_entry
  *

arch/arm/kernel/asm-offsets.c

Lines changed: 1 addition & 1 deletion

@@ -105,7 +105,7 @@ int main(void)
   BLANK();
 #endif
 #ifdef CONFIG_CPU_HAS_ASID
-  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id));
+  DEFINE(MM_CONTEXT_ID,		offsetof(struct mm_struct, context.id.counter));
   BLANK();
 #endif
   DEFINE(VMA_VM_MM,		offsetof(struct vm_area_struct, vm_mm));

arch/arm/kernel/smp.c

Lines changed: 1 addition & 0 deletions

@@ -297,6 +297,7 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
 	 * switch away from it before attempting any exclusive accesses.
 	 */
 	cpu_switch_mm(mm->pgd, mm);
+	local_flush_bp_all();
 	enter_lazy_tlb(mm, current);
 	local_flush_tlb_all();
 

arch/arm/kernel/smp_tlb.c

Lines changed: 53 additions & 0 deletions

@@ -12,6 +12,7 @@
 
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
+#include <asm/mmu_context.h>
 
 /**********************************************************************/
 
@@ -69,12 +70,59 @@ static inline void ipi_flush_bp_all(void *ignored)
 	local_flush_bp_all();
 }
 
+#ifdef CONFIG_ARM_ERRATA_798181
+static int erratum_a15_798181(void)
+{
+	unsigned int midr = read_cpuid_id();
+
+	/* Cortex-A15 r0p0..r3p2 affected */
+	if ((midr & 0xff0ffff0) != 0x410fc0f0 || midr > 0x413fc0f2)
+		return 0;
+	return 1;
+}
+#else
+static int erratum_a15_798181(void)
+{
+	return 0;
+}
+#endif
+
+static void ipi_flush_tlb_a15_erratum(void *arg)
+{
+	dmb();
+}
+
+static void broadcast_tlb_a15_erratum(void)
+{
+	if (!erratum_a15_798181())
+		return;
+
+	dummy_flush_tlb_a15_erratum();
+	smp_call_function(ipi_flush_tlb_a15_erratum, NULL, 1);
+}
+
+static void broadcast_tlb_mm_a15_erratum(struct mm_struct *mm)
+{
+	int this_cpu;
+	cpumask_t mask = { CPU_BITS_NONE };
+
+	if (!erratum_a15_798181())
+		return;
+
+	dummy_flush_tlb_a15_erratum();
+	this_cpu = get_cpu();
+	a15_erratum_get_cpumask(this_cpu, mm, &mask);
+	smp_call_function_many(&mask, ipi_flush_tlb_a15_erratum, NULL, 1);
+	put_cpu();
+}
+
 void flush_tlb_all(void)
 {
 	if (tlb_ops_need_broadcast())
 		on_each_cpu(ipi_flush_tlb_all, NULL, 1);
 	else
 		local_flush_tlb_all();
+	broadcast_tlb_a15_erratum();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
@@ -83,6 +131,7 @@ void flush_tlb_mm(struct mm_struct *mm)
 		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
 	else
 		local_flush_tlb_mm(mm);
+	broadcast_tlb_mm_a15_erratum(mm);
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
@@ -95,6 +144,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 					&ta, 1);
 	} else
 		local_flush_tlb_page(vma, uaddr);
+	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
 }
 
 void flush_tlb_kernel_page(unsigned long kaddr)
@@ -105,6 +155,7 @@ void flush_tlb_kernel_page(unsigned long kaddr)
 		on_each_cpu(ipi_flush_tlb_kernel_page, &ta, 1);
 	} else
 		local_flush_tlb_kernel_page(kaddr);
+	broadcast_tlb_a15_erratum();
 }
 
 void flush_tlb_range(struct vm_area_struct *vma,
@@ -119,6 +170,7 @@ void flush_tlb_range(struct vm_area_struct *vma,
 					&ta, 1);
 	} else
 		local_flush_tlb_range(vma, start, end);
+	broadcast_tlb_mm_a15_erratum(vma->vm_mm);
 }
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
@@ -130,6 +182,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		on_each_cpu(ipi_flush_tlb_kernel_range, &ta, 1);
 	} else
 		local_flush_tlb_kernel_range(start, end);
+	broadcast_tlb_a15_erratum();
 }
 
 void flush_bp_all(void)

arch/arm/kernel/suspend.c

Lines changed: 1 addition & 0 deletions

@@ -68,6 +68,7 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	ret = __cpu_suspend(arg, fn);
 	if (ret == 0) {
 		cpu_switch_mm(mm->pgd, mm);
+		local_flush_bp_all();
 		local_flush_tlb_all();
 	}
 
