Skip to content

Nuvoton: Fix crypto AC management #9081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 19, 2018

Conversation

ccli8
Copy link
Contributor

@ccli8 ccli8 commented Dec 13, 2018

Description

This PR is continuation of #8985. Originally, it is to replace busy-wait loop with mutex in crypto AC management. But in PDMC Greentea test, mbedtls_sha256_init() (lock mutex) and mbedtls_sha256_free() (unlock mutex) are not necessarily called by the same thread, so managing crypto AC with mutex is suspended. This PR is reduced to include minor fixes with crypto AC.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

@cmonr cmonr requested a review from a team December 13, 2018 03:56
@kjbracey
Copy link
Contributor

That hang isn't really possible is it? Except in terms of software error? If you're releasing, you know the lock is held, so the expected value will be there. That is never going to loop, right?

This code is something I've been conscious of, because it's one of the very few users of atomic_cas, and like others you don't really need the atomic_cas. You just want an atomic flag.

I actually added atomic flag for 5.11, so I suggest you switch to using it. Then that routine would just be atomic_flag_clear.

@0xc0170 0xc0170 requested a review from a team December 13, 2018 11:20
Copy link
Contributor

@yanesca yanesca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left comments with a suggestion and a couple of questions.

Even in a single thread application it is legal to have several mbedtls_xxx_context with overlapping lifetimes. For example initialising ctx1 and then initialising ctx2 without freeing ctx1 first should be supported. But looking at the code this second init call would always hang. Or am I missing something?

@@ -53,6 +53,7 @@
* would be defined in mbedtls/ecp.h from ecp.c for our inclusion */
#define ECP_SHORTWEIERSTRASS

#include "mbedtls/ssl.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to include just"mbedtls/platform.h" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanesca I fix it in b16b1db.

@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail)
static void crypto_submodule_release(uint16_t *submodule_avail)
{
uint16_t expectedCurrentValue = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the expected value 1 here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a backwards "available" flag. So 0 when in use, and write 1 to release.

@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail)
static void crypto_submodule_release(uint16_t *submodule_avail)
{
uint16_t expectedCurrentValue = 0;
while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1));
while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1)) {

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm Yes, this loop is unnecessary. Anyway, semaphore is used to substitute for busy-wait loop.

@kjbracey
Copy link
Contributor

Yes, it is perfectly legal to init two contexts within 1 thread, and this would indeed hang. Sounds like we're missing a test case.

I've made other comments on #8985 about the locking problem for this - I already thought we needed a software fallback, and now I'm sure we do.

@ccli8 ccli8 force-pushed the nuvoton_refine_crypto_ac branch from 333d573 to 4578639 Compare December 14, 2018 09:33
@ccli8 ccli8 changed the title Nuvoton: Fix minor issues with crypto AC Nuvoton: Fix crypto AC management Dec 14, 2018
@ccli8
Copy link
Contributor Author

ccli8 commented Dec 14, 2018

@kjbracey-arm @yanesca Following your recommendations, I re-organize the commits. The title is also changed to reflect the intention.

  1. For deadlock,
    1. For AES/DES AC, they are locked for their real operation rather than for the whole lifetime of AES/DES context. So no deadlock.
    2. For SHA AC, because it doesn't support SHA context save & restore, it can only be locked for the whole lifetime of SHA context. When SHA AC is not available, S/W SHA fallback has supported before to avoid deadlock.
    3. For ECC AC, its lock period is changed to its real operation from the whole lifetime of ECP context. So deadlock can be avoided as AES/DES AC.
  2. For double init for same/different context,
    1. For SHA AC, same as above, double init is allowed with support for S/W SHA fallback.
    2. For AES/DES/ECC AC, same as above, they are locked in start of real operation rather than context init, so double init is allowed.
  3. Choose semaphore for synchronization mechanism. Compared to busy-wait loop, it doesn't bite CPU. Compared to mutex, it allows SHA context init/free in different threads. Known drawback of it is priority inversion.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this addresses the fundamental problem that there can be two SHA contexts active in the same thread, and there's no reason why a SHA context can't otherwise be very long-running.

It's not acceptable to make other SHA users wait potentially forever - this implementation means there can only ever be 1 SHA instance live at a time in a system. There are potential applications that need to do a high-priority SHA, eg for a radio ack, and they must not be delayed by a large bulk SHA happening on a firmware download.

I think you need to have a software fallback for the SHA for 2nd, 3rd, 4th live context. You could have an implementation which lets a context use hardware for its lifetime if the hardware is free when it's inited. That is going to be a bit performance unpredictable as to who gets to do acceleration, but would probably be acceptable in practice. Such an implementation could just use an atomic flag, as it's not going to block waiting for the hardware.

For the others, they should just use mutexes, avoiding the priority inversion.

Actually, this does somewhat cover what I'm talking about, let me reconsider.

uint16_t expectedCurrentValue = SYNCOBJ_INITSTATUS_UNINIT;
while (1) {
/* Try to initialize semaphore if it has not initialized yet */
if (core_util_atomic_cas_u16(&crypto_submod_sem->init_status,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This model still leads to a busy-wait. I suggest you have a look at the cxa_guard implementations in mbed_retarget.cpp - uses a mutex to wait for another thread initialising. Borrow the logic (and the singleton_lock) from that.

Once C11 is in place you should be able to use call_once.

Another possibility is to use SingletonPtr with a custom class that contains the Semaphore(1,1) you need. That would mean changing this file to C++.

@kjbracey
Copy link
Contributor

kjbracey commented Dec 14, 2018

Okay, this is getting close to functional, but attempting to share the same locking types for the SHA and non-SHA cases doesn't make sense, and it's getting complicated by the attempt to do everything in C, which means you can't use the supporting C++ primitives for init-on-first-use.

I suggest:

  • Convert to C++, so you can use SingletonPtr for your AES etc locking objects, and that deals with init on first reference. Having a whole new local implementation of call_once is just increasing the scope for errors.
  • The AES etc locking objects should contain a Mutex. (Or just be a SingletonPtr<Mutex>).
  • The SHA locking should not share that mechanism at all - it just needs a core_util_atomic_flag, as it's never blocking.

(And hooray, no more semaphores. I hate semaphores. They're almost never the answer.)

@ccli8
Copy link
Contributor Author

ccli8 commented Dec 17, 2018

@kjbracey-arm Following your suggestions, I make the following modifications:

  • For SHA AC, change to atomic flag core_util_atomic_flag_test_and_set/core_util_atomic_flag_clear for its lock management.
  • For non-SHA AC (AES, DES, etc.), change back to mutex from semaphore for their lock management. Because these locks last only for AC's real operation rather than the whole lifetime of context, there would be no case lock/unlock are called in different threads and disobeys mutex requirement.
  • For thread-safe mutex construct on its first use, convert to C++ to utilize SingletonPtr.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, but think it could be simplified and clarified a bit more.

}

bool crypto_des_acquire(bool blocking)
{
return crypto_submodule_acquire(&crypto_des_sem, blocking);
return crypto_des_mutex->trylock_for(blocking ? osWaitForever : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of osWaitForever isn't documented with trylock_for, so I'd rather this was split into separate lock and trylock_for cases, if you really want the nonblocking version.

Do you really need the nonblocking version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm I remove the blocking parameter. As suggested, blocking or not depends on lock scope. For non-SHA AC, it is non-blocking to support trylock and fallback to S/W SHA on failure; for SHA AC, it is blocking to acquire AC for a short sequence of operations.

crypto_init();
ECC_ENABLE_INT();

/* Address mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are not strictly paired
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed any longer? The intent is that the init is held only for a short sequence of operations, and your case 2 certainly couldn't happen on a single thread.

Breaking it up like this does make it more granular, I guess - gives the opportunity to interleave ops. Not sure how important that is for ECP, if at all. (It's not like the lighter-weight stuff like SHA, where there will sometimes be time-critical ones).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in a single thread application it is legal to have several mbedtls_xxx_context with overlapping lifetimes. For example initialising ctx1 and then initialising ctx2 without freeing ctx1 first should be supported ...

@kjbracey-arm Per @yanesca commented as above, it is legal to have two mbedtls_xxx_context overlapping in the same thread, so I move locking ecc AC from mbedtls_internal_ecp_init to its real operation period.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal_ecp_init/free are not contexts (despite the init/free naming being shared with that for contexts). They're specific paired pre- and post-op calls.

See documentation changes made in Mbed-TLS/mbedtls#2267

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm With mbedtls_internal_ecp_init/mbedtls_internal_ecp_free guaranteeing paired, non-overlapping, and non-cross-public-function, I move ecc AC init/free back to them.

* 1. Except for SHA AC which doesn't support context save & restore, we lock all crypto AC
* for just their real operation rather than the whole lifetime of their crypto context.
* For SHA AC, we provide SHA S/W fallback when SHA AC is not available. This policy can
* avoid deadlock. We choose mutex to synchronize access to crypto non-SHA AC. Known
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No priority inversion any more

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm I fixed the comment in 4d882aa. Because mutex lock scope is limited to a short sequence of operations, priority inversion issue gets less serious, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no priority inversion at all any more - that's the point of using a mutex. The RTOS can and does have priority inheritance for mutexes: https://www.keil.com/pack/doc/CMSIS/RTOS2/html/group__CMSIS__RTOS__MutexMgmt.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm I got it. Thanks for your information.

*/

/* Mutex for crypto AES AC management */
SingletonPtr<rtos::Mutex> crypto_aes_mutex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should still be static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm Fixed in 2cc76eb.

{
return crypto_submodule_acquire(&crypto_sha_avail);
if (blocking) {
while (core_util_atomic_flag_test_and_set(&crypto_sha_atomic_flag));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you mustn't ever do this. And you don't, so take out the blocking case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm As above, blocking parameter has removed.

bool crypto_aes_acquire(void);
/* Acquire/release ownership of crypto sub-module
*
* \param blocking false for non-blocking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the blocking parameter, and the entire choice of locking primitive is on the basis the AES/DES/ECC always block, and SHA never does, so it would be clearer to take it out.

And the distinction could then just be in the function names:

void crypto_aes_acquire(void);
bool crypto_sha_try_acquire(void);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm Fix as above

* 3. Read back result through internal_mpi_read_eccreg()
*/
internal_open_ecc_ac();
ecc_ac_open = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just put the close ahead of the cleanup label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm I must guarantee internal_open_ecc_ac and internal_close_ecc_ac are paired. When MBEDTLS_MPI_CHK evaluates false, it will go to cleanup label. So I can only call internal_close_ecc_ac after cleanup label for code concision.

#define MBEDTLS_MPI_CHK(f) do { if( ( ret = f ) != 0 ) goto cleanup; } while( 0 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed that. Okay, that could be dealt with with another cleanup_hw label ahead of the close.

(Or, as you're in C++ now you start putting in RAII stuff that automatically destroys the things)

But the bool is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Except I still think you could continue to make these calls in the init/free anyway, as per other comments).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm As above, ecc AC init/free are moved back to mbedtls_internal_ecp_init / mbedtls_internal_ecp_free.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only remaining nit is about the mutex lock - don't check the return value, it can't fail. System would error out first.

Having void xxx_acquire versus bool xxx_try_acquire would make the design difference clearer.

@@ -121,42 +140,42 @@ void crypto_zeroize32(uint32_t *v, size_t n)

bool crypto_aes_acquire(void)
{
return crypto_submodule_acquire(&crypto_aes_avail);
return crypto_aes_mutex->lock() == osOK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this check - indeed it's deprecated.

     @note This function treats RTOS errors as fatal system errors, so it can only return osOK.
     Use of the return value is deprecated, as the return is expected to become void in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm OK. I remove the return code check and introduce xxx_try_acquire to make code more clear.

Copy link

@gilles-peskine-arm gilles-peskine-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, coming from the Mbed TLS side and not familiar with the Mbed OS APIs or the Nuvoton hardware.

Copy link
Contributor

@kjbracey kjbracey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good now, but the commits should all be squashed together - don't need the 13 commits it took to get here in the final history.

Copy link
Contributor

@yanesca yanesca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@cmonr
Copy link
Contributor

cmonr commented Dec 18, 2018

@ccli8 Squash the commits, and then we'll be ready to start CI.

1. For SHA AC, use atomic flag to manage its ownership.
   (1) Nuvoton SHA AC doesn't support SHA context save & restore, so S/W
       SHA fallback has been supported before. To make non-blocking 'acquire'
       semantics clearer, introduce 'try_acquire' to substitute for 'acquire'.
   (2) No biting CPU due to mechanism above.
   (3) No deadlock due to mechanism above.
2. For AES/DES/ECC AC, change to mutex to manage their ownership.
   (1) Change crypto-misc.c to crypto-misc.cpp to utilize C++ SingletonPtr
       which guarantees thread-safe mutex construct-on-first-use.
   (2) With change to crypto-misc.cpp, add 'extern "C"' modifier to CRYPTO_IRQHandler()
       to avoid name mangling in C++.
   (3) No priority inversion because mutex has osMutexPrioInherit attribute
       bit set.
   (4) No deadlock because these AC are all locked for a short sequence
       of operations rather than the whole lifetime of mbedtls context.
   (5) For double mbedtls_internal_ecp_init() issue, it has been fixed in upper
       mbedtls layer. So no need to change ecc init/free flow.
@ccli8 ccli8 force-pushed the nuvoton_refine_crypto_ac branch from 9347486 to ca44675 Compare December 19, 2018 02:48
@ccli8
Copy link
Contributor Author

ccli8 commented Dec 19, 2018

@kjbracey-arm @cmonr squash done

@0xc0170
Copy link
Contributor

0xc0170 commented Dec 19, 2018

CI started

@mbed-ci
Copy link

mbed-ci commented Dec 19, 2018

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 1
Build artifacts

@0xc0170 0xc0170 removed the needs: CI label Dec 19, 2018
cmonr pushed a commit to cmonr/mbed-os that referenced this pull request Dec 19, 2018
@cmonr cmonr merged commit 9edbcd7 into ARMmbed:master Dec 19, 2018
@kjbracey
Copy link
Contributor

kjbracey commented Feb 4, 2019

@ccli8 - searching the tree, I see that M2351 has a bunch of crypto stuff which still spinlocks like these platforms originally did. I think this PR's work needs to be ported to that too.

@ccli8
Copy link
Contributor Author

ccli8 commented Feb 4, 2019

searching the tree, I see that M2351 has a bunch of crypto stuff which still spinlocks like these platforms originally did. I think this PR's work needs to be ported to that too.

@kjbracey-arm OK. I will port this PR's work to M2351.

@ccli8 ccli8 deleted the nuvoton_refine_crypto_ac branch February 12, 2019 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants