Nuvoton: Fix crypto AC management #9081

ccli8 · 2018-12-13T03:21:40Z

Description

This PR is continuation of #8985. Originally, it is to replace busy-wait loop with mutex in crypto AC management. But in PDMC Greentea test, mbedtls_sha256_init() (lock mutex) and mbedtls_sha256_free() (unlock mutex) are not necessarily called by the same thread, so managing crypto AC with mutex is suspended. This PR is reduced to include minor fixes with crypto AC.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

kjbracey · 2018-12-13T09:21:01Z

That hang isn't really possible is it? Except in terms of software error? If you're releasing, you know the lock is held, so the expected value will be there. That is never going to loop, right?

This code is something I've been conscious of, because it's one of the very few users of atomic_cas, and like others you don't really need the atomic_cas. You just want an atomic flag.

I actually added atomic flag for 5.11, so I suggest you switch to using it. Then that routine would just be atomic_flag_clear.

yanesca

I have left comments with a suggestion and a couple of questions.

Even in a single thread application it is legal to have several mbedtls_xxx_context with overlapping lifetimes. For example initialising ctx1 and then initialising ctx2 without freeing ctx1 first should be supported. But looking at the code this second init call would always hang. Or am I missing something?

yanesca · 2018-12-13T15:16:59Z

features/mbedtls/targets/TARGET_NUVOTON/TARGET_M480/ecp/ecp_internal_alt.c

@@ -53,6 +53,7 @@
 *        would be defined in mbedtls/ecp.h from ecp.c for our inclusion */
 #define ECP_SHORTWEIERSTRASS

+#include "mbedtls/ssl.h"


I think it would be better to include just"mbedtls/platform.h" here.

@yanesca I fix it in b16b1db.

yanesca · 2018-12-13T15:49:06Z

targets/TARGET_NUVOTON/TARGET_NUC472/crypto/crypto-misc.c

@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail)
 static void crypto_submodule_release(uint16_t *submodule_avail)
 {
    uint16_t expectedCurrentValue = 0;


Isn't the expected value 1 here?

It's a backwards "available" flag. So 0 when in use, and write 1 to release.

ccli8 · 2018-12-14T10:00:56Z

targets/TARGET_NUVOTON/TARGET_NUC472/crypto/crypto-misc.c

@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail)
 static void crypto_submodule_release(uint16_t *submodule_avail)
 {
    uint16_t expectedCurrentValue = 0;
-    while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1));
+    while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1)) {


@kjbracey-arm Yes, this loop is unnecessary. Anyway, semaphore is used to substitute for busy-wait loop.

kjbracey · 2018-12-13T16:02:48Z

Yes, it is perfectly legal to init two contexts within 1 thread, and this would indeed hang. Sounds like we're missing a test case.

I've made other comments on #8985 about the locking problem for this - I already thought we needed a software fallback, and now I'm sure we do.

ccli8 · 2018-12-14T09:56:41Z

@kjbracey-arm @yanesca Following your recommendations, I re-organize the commits. The title is also changed to reflect the intention.

For deadlock,
1. For AES/DES AC, they are locked for their real operation rather than for the whole lifetime of AES/DES context. So no deadlock.
2. For SHA AC, because it doesn't support SHA context save & restore, it can only be locked for the whole lifetime of SHA context. When SHA AC is not available, S/W SHA fallback has supported before to avoid deadlock.
3. For ECC AC, its lock period is changed to its real operation from the whole lifetime of ECP context. So deadlock can be avoided as AES/DES AC.
For double init for same/different context,
1. For SHA AC, same as above, double init is allowed with support for S/W SHA fallback.
2. For AES/DES/ECC AC, same as above, they are locked in start of real operation rather than context init, so double init is allowed.
Choose semaphore for synchronization mechanism. Compared to busy-wait loop, it doesn't bite CPU. Compared to mutex, it allows SHA context init/free in different threads. Known drawback of it is priority inversion.

kjbracey

~~I don't think this addresses the fundamental problem that there can be two SHA contexts active in the same thread, and there's no reason why a SHA context can't otherwise be very long-running.~~

It's not acceptable to make other SHA users wait potentially forever - this implementation means there can only ever be 1 SHA instance live at a time in a system. There are potential applications that need to do a high-priority SHA, eg for a radio ack, and they must not be delayed by a large bulk SHA happening on a firmware download.

I think you need to have a software fallback for the SHA for 2nd, 3rd, 4th live context. You could have an implementation which lets a context use hardware for its lifetime if the hardware is free when it's inited. That is going to be a bit performance unpredictable as to who gets to do acceleration, but would probably be acceptable in practice. Such an implementation could just use an atomic flag, as it's not going to block waiting for the hardware.

For the others, they should just use mutexes, avoiding the priority inversion.

Actually, this does somewhat cover what I'm talking about, let me reconsider.

kjbracey · 2018-12-14T10:11:45Z

targets/TARGET_NUVOTON/TARGET_NUC472/crypto/crypto-misc.c

+    uint16_t expectedCurrentValue = SYNCOBJ_INITSTATUS_UNINIT;
+    while (1) {
+        /* Try to initialize semaphore if it has not initialized yet */
+        if (core_util_atomic_cas_u16(&crypto_submod_sem->init_status,


This model still leads to a busy-wait. I suggest you have a look at the cxa_guard implementations in mbed_retarget.cpp - uses a mutex to wait for another thread initialising. Borrow the logic (and the singleton_lock) from that.

Once C11 is in place you should be able to use call_once.

Another possibility is to use SingletonPtr with a custom class that contains the Semaphore(1,1) you need. That would mean changing this file to C++.

kjbracey · 2018-12-14T10:29:49Z

Okay, this is getting close to functional, but attempting to share the same locking types for the SHA and non-SHA cases doesn't make sense, and it's getting complicated by the attempt to do everything in C, which means you can't use the supporting C++ primitives for init-on-first-use.

I suggest:

Convert to C++, so you can use SingletonPtr for your AES etc locking objects, and that deals with init on first reference. Having a whole new local implementation of call_once is just increasing the scope for errors.
The AES etc locking objects should contain a Mutex. (Or just be a SingletonPtr<Mutex>).
The SHA locking should not share that mechanism at all - it just needs a core_util_atomic_flag, as it's never blocking.

(And hooray, no more semaphores. I hate semaphores. They're almost never the answer.)

ccli8 · 2018-12-17T08:13:06Z

@kjbracey-arm Following your suggestions, I make the following modifications:

For SHA AC, change to atomic flag core_util_atomic_flag_test_and_set/core_util_atomic_flag_clear for its lock management.
For non-SHA AC (AES, DES, etc.), change back to mutex from semaphore for their lock management. Because these locks last only for AC's real operation rather than the whole lifetime of context, there would be no case lock/unlock are called in different threads and disobeys mutex requirement.
For thread-safe mutex construct on its first use, convert to C++ to utilize SingletonPtr.

kjbracey

Looking good, but think it could be simplified and clarified a bit more.

kjbracey · 2018-12-17T08:18:41Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.cpp

 }

 bool crypto_des_acquire(bool blocking)
 {
-    return crypto_submodule_acquire(&crypto_des_sem, blocking);
+    return crypto_des_mutex->trylock_for(blocking ? osWaitForever : 0);


Use of osWaitForever isn't documented with trylock_for, so I'd rather this was split into separate lock and trylock_for cases, if you really want the nonblocking version.

Do you really need the nonblocking version?

@kjbracey-arm I remove the blocking parameter. As suggested, blocking or not depends on lock scope. For non-SHA AC, it is non-blocking to support trylock and fallback to S/W SHA on failure; for SHA AC, it is blocking to acquire AC for a short sequence of operations.

kjbracey · 2018-12-17T08:43:22Z

features/mbedtls/targets/TARGET_NUVOTON/TARGET_M480/ecp/ecp_internal_alt.c

-    crypto_init();
-    ECC_ENABLE_INT();
-
+    /* Address mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are not strictly paired


Is this change needed any longer? The intent is that the init is held only for a short sequence of operations, and your case 2 certainly couldn't happen on a single thread.

Breaking it up like this does make it more granular, I guess - gives the opportunity to interleave ops. Not sure how important that is for ECP, if at all. (It's not like the lighter-weight stuff like SHA, where there will sometimes be time-critical ones).

Even in a single thread application it is legal to have several mbedtls_xxx_context with overlapping lifetimes. For example initialising ctx1 and then initialising ctx2 without freeing ctx1 first should be supported ...

@kjbracey-arm Per @yanesca commented as above, it is legal to have two mbedtls_xxx_context overlapping in the same thread, so I move locking ecc AC from mbedtls_internal_ecp_init to its real operation period.

internal_ecp_init/free are not contexts (despite the init/free naming being shared with that for contexts). They're specific paired pre- and post-op calls.

See documentation changes made in Mbed-TLS/mbedtls#2267

@kjbracey-arm With mbedtls_internal_ecp_init/mbedtls_internal_ecp_free guaranteeing paired, non-overlapping, and non-cross-public-function, I move ecc AC init/free back to them.

kjbracey · 2018-12-17T08:44:35Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.cpp

+ * 1. Except for SHA AC which doesn't support context save & restore, we lock all crypto AC
+ *    for just their real operation rather than the whole lifetime of their crypto context.
+ *    For SHA AC, we provide SHA S/W fallback when SHA AC is not available. This policy can
+ *    avoid deadlock. We choose mutex to synchronize access to crypto non-SHA AC. Known


No priority inversion any more

@kjbracey-arm I fixed the comment in 4d882aa. Because mutex lock scope is limited to a short sequence of operations, priority inversion issue gets less serious, right?

There is no priority inversion at all any more - that's the point of using a mutex. The RTOS can and does have priority inheritance for mutexes: https://www.keil.com/pack/doc/CMSIS/RTOS2/html/group__CMSIS__RTOS__MutexMgmt.html

@kjbracey-arm I got it. Thanks for your information.

kjbracey · 2018-12-17T08:45:12Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.cpp

+ */
+
+/* Mutex for crypto AES AC management */
+SingletonPtr<rtos::Mutex> crypto_aes_mutex;


These should still be static.

@kjbracey-arm Fixed in 2cc76eb.

kjbracey · 2018-12-17T08:45:54Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.cpp

 {
-    return crypto_submodule_acquire(&crypto_sha_avail);
+    if (blocking) {
+        while (core_util_atomic_flag_test_and_set(&crypto_sha_atomic_flag));


No, you mustn't ever do this. And you don't, so take out the blocking case.

@kjbracey-arm As above, blocking parameter has removed.

kjbracey · 2018-12-17T08:47:25Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.h

-bool crypto_aes_acquire(void);
+/* Acquire/release ownership of crypto sub-module
+ * 
+ * \param blocking  false for non-blocking


You don't need the blocking parameter, and the entire choice of locking primitive is on the basis the AES/DES/ECC always block, and SHA never does, so it would be clearer to take it out.

And the distinction could then just be in the function names:

void crypto_aes_acquire(void); bool crypto_sha_try_acquire(void);

@kjbracey-arm Fix as above

kjbracey · 2018-12-17T08:51:23Z

features/mbedtls/targets/TARGET_NUVOTON/TARGET_M480/ecp/ecp_internal_alt.c

+     * 3. Read back result through internal_mpi_read_eccreg()
+     */
+    internal_open_ecc_ac();
+    ecc_ac_open = true;


Can't you just put the close ahead of the cleanup label?

@kjbracey-arm I must guarantee internal_open_ecc_ac and internal_close_ecc_ac are paired. When MBEDTLS_MPI_CHK evaluates false, it will go to cleanup label. So I can only call internal_close_ecc_ac after cleanup label for code concision.

#define MBEDTLS_MPI_CHK(f) do { if( ( ret = f ) != 0 ) goto cleanup; } while( 0 )

Ah, missed that. Okay, that could be dealt with with another cleanup_hw label ahead of the close.

(Or, as you're in C++ now you start putting in RAII stuff that automatically destroys the things)

But the bool is fine.

(Except I still think you could continue to make these calls in the init/free anyway, as per other comments).

@kjbracey-arm As above, ecc AC init/free are moved back to mbedtls_internal_ecp_init / mbedtls_internal_ecp_free.

kjbracey

Looks good, only remaining nit is about the mutex lock - don't check the return value, it can't fail. System would error out first.

Having void xxx_acquire versus bool xxx_try_acquire would make the design difference clearer.

kjbracey · 2018-12-18T07:50:42Z

targets/TARGET_NUVOTON/TARGET_M480/crypto/crypto-misc.cpp

@@ -121,42 +140,42 @@ void crypto_zeroize32(uint32_t *v, size_t n)

 bool crypto_aes_acquire(void)
 {
-    return crypto_submodule_acquire(&crypto_aes_avail);
+    return crypto_aes_mutex->lock() == osOK;


No need for this check - indeed it's deprecated.

@note This function treats RTOS errors as fatal system errors, so it can only return osOK. Use of the return value is deprecated, as the return is expected to become void in the future.

@kjbracey-arm OK. I remove the return code check and introduce xxx_try_acquire to make code more clear.

gilles-peskine-arm

Looks ok to me, coming from the Mbed TLS side and not familiar with the Mbed OS APIs or the Nuvoton hardware.

kjbracey

All good now, but the commits should all be squashed together - don't need the 13 commits it took to get here in the final history.

yanesca

Looks good to me.

cmonr · 2018-12-18T22:53:25Z

@ccli8 Squash the commits, and then we'll be ready to start CI.

1. For SHA AC, use atomic flag to manage its ownership. (1) Nuvoton SHA AC doesn't support SHA context save & restore, so S/W SHA fallback has been supported before. To make non-blocking 'acquire' semantics clearer, introduce 'try_acquire' to substitute for 'acquire'. (2) No biting CPU due to mechanism above. (3) No deadlock due to mechanism above. 2. For AES/DES/ECC AC, change to mutex to manage their ownership. (1) Change crypto-misc.c to crypto-misc.cpp to utilize C++ SingletonPtr which guarantees thread-safe mutex construct-on-first-use. (2) With change to crypto-misc.cpp, add 'extern "C"' modifier to CRYPTO_IRQHandler() to avoid name mangling in C++. (3) No priority inversion because mutex has osMutexPrioInherit attribute bit set. (4) No deadlock because these AC are all locked for a short sequence of operations rather than the whole lifetime of mbedtls context. (5) For double mbedtls_internal_ecp_init() issue, it has been fixed in upper mbedtls layer. So no need to change ecc init/free flow.

ccli8 · 2018-12-19T02:50:24Z

@kjbracey-arm @cmonr squash done

0xc0170 · 2018-12-19T09:51:49Z

CI started

mbed-ci · 2018-12-19T13:55:09Z

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 1
Build artifacts

…o_ac Nuvoton: Fix crypto AC management

kjbracey · 2019-02-04T12:35:15Z

@ccli8 - searching the tree, I see that M2351 has a bunch of crypto stuff which still spinlocks like these platforms originally did. I think this PR's work needs to be ported to that too.

ccli8 · 2019-02-04T16:03:57Z

searching the tree, I see that M2351 has a bunch of crypto stuff which still spinlocks like these platforms originally did. I think this PR's work needs to be ported to that too.

@kjbracey-arm OK. I will port this PR's work to M2351.

cmonr requested a review from a team December 13, 2018 03:56

cmonr added the needs: review label Dec 13, 2018

0xc0170 requested a review from a team December 13, 2018 11:20

yanesca reviewed Dec 13, 2018

View reviewed changes

[M487] Fix return error code when ECC H/W acceleratioin is failed

b16b1db

ccli8 force-pushed the nuvoton_refine_crypto_ac branch from 333d573 to 4578639 Compare December 14, 2018 09:33

ccli8 changed the title ~~Nuvoton: Fix minor issues with crypto AC~~ Nuvoton: Fix crypto AC management Dec 14, 2018

kjbracey suggested changes Dec 14, 2018

View reviewed changes

0xc0170 added needs: work and removed needs: review labels Dec 14, 2018

kjbracey reviewed Dec 17, 2018

View reviewed changes

kjbracey suggested changes Dec 18, 2018

View reviewed changes

gilles-peskine-arm approved these changes Dec 18, 2018

View reviewed changes

kjbracey approved these changes Dec 18, 2018

View reviewed changes

yanesca approved these changes Dec 18, 2018

View reviewed changes

ccli8 force-pushed the nuvoton_refine_crypto_ac branch from 9347486 to ca44675 Compare December 19, 2018 02:48

cmonr added needs: CI and removed needs: work labels Dec 19, 2018

0xc0170 added ready for merge release-version: 5.11.1 labels Dec 19, 2018

0xc0170 removed the needs: CI label Dec 19, 2018

cmonr pushed a commit to cmonr/mbed-os that referenced this pull request Dec 19, 2018

Merge pull request ARMmbed#9081 from OpenNuvoton/nuvoton_refine_crypt…

c447b81

…o_ac Nuvoton: Fix crypto AC management

cmonr merged commit 9edbcd7 into ARMmbed:master Dec 19, 2018

cmonr removed the ready for merge label Dec 20, 2018

screamerbg mentioned this pull request Dec 23, 2018

PR #9081 breaks Nuvoton M480 based targets for non-RTOS/baremetal mode #9190

Closed

ccli8 mentioned this pull request Dec 24, 2018

Nuvoton: Fix crypto compile error with Mbed OS 2 #9191

Merged

ccli8 mentioned this pull request Feb 11, 2019

M2351: Fix crypto AC management #9656

Merged

ccli8 deleted the nuvoton_refine_crypto_ac branch February 12, 2019 01:51

Nuvoton: Fix crypto AC management #9081

Nuvoton: Fix crypto AC management #9081

Uh oh!

Conversation

ccli8 commented Dec 13, 2018

Description

Pull request type

Uh oh!

kjbracey commented Dec 13, 2018

Uh oh!

yanesca left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as resolved.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kjbracey commented Dec 13, 2018

Uh oh!

ccli8 commented Dec 14, 2018

Uh oh!

kjbracey left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kjbracey commented Dec 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ccli8 commented Dec 17, 2018

Uh oh!

kjbracey left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

kjbracey left a comment •

edited

Loading

kjbracey commented Dec 14, 2018 •

edited

Loading