-
Notifications
You must be signed in to change notification settings - Fork 3k
Nuvoton: Fix crypto AC management #9081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
That hang isn't really possible is it? Except in terms of software error? If you're releasing, you know the lock is held, so the expected value will be there. That is never going to loop, right? This code is something I've been conscious of, because it's one of the very few users of atomic_cas, and like others you don't really need the atomic_cas. You just want an atomic flag. I actually added atomic flag for 5.11, so I suggest you switch to using it. Then that routine would just be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left comments with a suggestion and a couple of questions.
Even in a single thread application it is legal to have several mbedtls_xxx_context
with overlapping lifetimes. For example initialising ctx1
and then initialising ctx2
without freeing ctx1
first should be supported. But looking at the code this second init call would always hang. Or am I missing something?
@@ -53,6 +53,7 @@ | |||
* would be defined in mbedtls/ecp.h from ecp.c for our inclusion */ | |||
#define ECP_SHORTWEIERSTRASS | |||
|
|||
#include "mbedtls/ssl.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to include just"mbedtls/platform.h"
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail) | |||
static void crypto_submodule_release(uint16_t *submodule_avail) | |||
{ | |||
uint16_t expectedCurrentValue = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the expected value 1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a backwards "available" flag. So 0 when in use, and write 1 to release.
@@ -216,7 +216,11 @@ static bool crypto_submodule_acquire(uint16_t *submodule_avail) | |||
static void crypto_submodule_release(uint16_t *submodule_avail) | |||
{ | |||
uint16_t expectedCurrentValue = 0; | |||
while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1)); | |||
while (! core_util_atomic_cas_u16(submodule_avail, &expectedCurrentValue, 1)) { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm Yes, this loop is unnecessary. Anyway, semaphore is used to substitute for busy-wait loop.
Yes, it is perfectly legal to init two contexts within 1 thread, and this would indeed hang. Sounds like we're missing a test case. I've made other comments on #8985 about the locking problem for this - I already thought we needed a software fallback, and now I'm sure we do. |
333d573
to
4578639
Compare
@kjbracey-arm @yanesca Following your recommendations, I re-organize the commits. The title is also changed to reflect the intention.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this addresses the fundamental problem that there can be two SHA contexts active in the same thread, and there's no reason why a SHA context can't otherwise be very long-running.
It's not acceptable to make other SHA users wait potentially forever - this implementation means there can only ever be 1 SHA instance live at a time in a system. There are potential applications that need to do a high-priority SHA, eg for a radio ack, and they must not be delayed by a large bulk SHA happening on a firmware download.
I think you need to have a software fallback for the SHA for 2nd, 3rd, 4th live context. You could have an implementation which lets a context use hardware for its lifetime if the hardware is free when it's inited. That is going to be a bit performance unpredictable as to who gets to do acceleration, but would probably be acceptable in practice. Such an implementation could just use an atomic flag, as it's not going to block waiting for the hardware.
For the others, they should just use mutexes, avoiding the priority inversion.
Actually, this does somewhat cover what I'm talking about, let me reconsider.
uint16_t expectedCurrentValue = SYNCOBJ_INITSTATUS_UNINIT; | ||
while (1) { | ||
/* Try to initialize semaphore if it has not initialized yet */ | ||
if (core_util_atomic_cas_u16(&crypto_submod_sem->init_status, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This model still leads to a busy-wait. I suggest you have a look at the cxa_guard implementations in mbed_retarget.cpp - uses a mutex to wait for another thread initialising. Borrow the logic (and the singleton_lock
) from that.
Once C11 is in place you should be able to use call_once
.
Another possibility is to use SingletonPtr
with a custom class that contains the Semaphore(1,1)
you need. That would mean changing this file to C++.
Okay, this is getting close to functional, but attempting to share the same locking types for the SHA and non-SHA cases doesn't make sense, and it's getting complicated by the attempt to do everything in C, which means you can't use the supporting C++ primitives for init-on-first-use. I suggest:
(And hooray, no more semaphores. I hate semaphores. They're almost never the answer.) |
@kjbracey-arm Following your suggestions, I make the following modifications:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, but think it could be simplified and clarified a bit more.
} | ||
|
||
bool crypto_des_acquire(bool blocking) | ||
{ | ||
return crypto_submodule_acquire(&crypto_des_sem, blocking); | ||
return crypto_des_mutex->trylock_for(blocking ? osWaitForever : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use of osWaitForever
isn't documented with trylock_for
, so I'd rather this was split into separate lock
and trylock_for
cases, if you really want the nonblocking version.
Do you really need the nonblocking version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm I remove the blocking
parameter. As suggested, blocking or not depends on lock scope. For non-SHA AC, it is non-blocking to support trylock
and fallback to S/W SHA on failure; for SHA AC, it is blocking to acquire AC for a short sequence of operations.
crypto_init(); | ||
ECC_ENABLE_INT(); | ||
|
||
/* Address mbedtls_internal_ecp_init()/mbedtls_internal_ecp_free() are not strictly paired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change needed any longer? The intent is that the init
is held only for a short sequence of operations, and your case 2 certainly couldn't happen on a single thread.
Breaking it up like this does make it more granular, I guess - gives the opportunity to interleave ops. Not sure how important that is for ECP, if at all. (It's not like the lighter-weight stuff like SHA, where there will sometimes be time-critical ones).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in a single thread application it is legal to have several mbedtls_xxx_context with overlapping lifetimes. For example initialising ctx1 and then initialising ctx2 without freeing ctx1 first should be supported ...
@kjbracey-arm Per @yanesca commented as above, it is legal to have two mbedtls_xxx_context overlapping in the same thread, so I move locking ecc AC from mbedtls_internal_ecp_init
to its real operation period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal_ecp_init/free
are not contexts (despite the init/free naming being shared with that for contexts). They're specific paired pre- and post-op calls.
See documentation changes made in Mbed-TLS/mbedtls#2267
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm With mbedtls_internal_ecp_init
/mbedtls_internal_ecp_free
guaranteeing paired, non-overlapping, and non-cross-public-function, I move ecc AC init/free back to them.
* 1. Except for SHA AC which doesn't support context save & restore, we lock all crypto AC | ||
* for just their real operation rather than the whole lifetime of their crypto context. | ||
* For SHA AC, we provide SHA S/W fallback when SHA AC is not available. This policy can | ||
* avoid deadlock. We choose mutex to synchronize access to crypto non-SHA AC. Known |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No priority inversion any more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm I fixed the comment in 4d882aa. Because mutex lock scope is limited to a short sequence of operations, priority inversion issue gets less serious, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no priority inversion at all any more - that's the point of using a mutex. The RTOS can and does have priority inheritance for mutexes: https://www.keil.com/pack/doc/CMSIS/RTOS2/html/group__CMSIS__RTOS__MutexMgmt.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm I got it. Thanks for your information.
*/ | ||
|
||
/* Mutex for crypto AES AC management */ | ||
SingletonPtr<rtos::Mutex> crypto_aes_mutex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should still be static
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm Fixed in 2cc76eb.
{ | ||
return crypto_submodule_acquire(&crypto_sha_avail); | ||
if (blocking) { | ||
while (core_util_atomic_flag_test_and_set(&crypto_sha_atomic_flag)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you mustn't ever do this. And you don't, so take out the blocking
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm As above, blocking
parameter has removed.
bool crypto_aes_acquire(void); | ||
/* Acquire/release ownership of crypto sub-module | ||
* | ||
* \param blocking false for non-blocking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the blocking parameter, and the entire choice of locking primitive is on the basis the AES/DES/ECC always block, and SHA never does, so it would be clearer to take it out.
And the distinction could then just be in the function names:
void crypto_aes_acquire(void);
bool crypto_sha_try_acquire(void);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm Fix as above
* 3. Read back result through internal_mpi_read_eccreg() | ||
*/ | ||
internal_open_ecc_ac(); | ||
ecc_ac_open = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just put the close
ahead of the cleanup
label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm I must guarantee internal_open_ecc_ac
and internal_close_ecc_ac
are paired. When MBEDTLS_MPI_CHK
evaluates false, it will go to cleanup
label. So I can only call internal_close_ecc_ac
after cleanup
label for code concision.
#define MBEDTLS_MPI_CHK(f) do { if( ( ret = f ) != 0 ) goto cleanup; } while( 0 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, missed that. Okay, that could be dealt with with another cleanup_hw
label ahead of the close.
(Or, as you're in C++ now you start putting in RAII stuff that automatically destroys the things)
But the bool is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Except I still think you could continue to make these calls in the init
/free
anyway, as per other comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm As above, ecc AC init/free are moved back to mbedtls_internal_ecp_init
/ mbedtls_internal_ecp_free
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, only remaining nit is about the mutex lock - don't check the return value, it can't fail. System would error out first.
Having void xxx_acquire
versus bool xxx_try_acquire
would make the design difference clearer.
@@ -121,42 +140,42 @@ void crypto_zeroize32(uint32_t *v, size_t n) | |||
|
|||
bool crypto_aes_acquire(void) | |||
{ | |||
return crypto_submodule_acquire(&crypto_aes_avail); | |||
return crypto_aes_mutex->lock() == osOK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this check - indeed it's deprecated.
@note This function treats RTOS errors as fatal system errors, so it can only return osOK.
Use of the return value is deprecated, as the return is expected to become void in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm OK. I remove the return code check and introduce xxx_try_acquire
to make code more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me, coming from the Mbed TLS side and not familiar with the Mbed OS APIs or the Nuvoton hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good now, but the commits should all be squashed together - don't need the 13 commits it took to get here in the final history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
@ccli8 Squash the commits, and then we'll be ready to start CI. |
1. For SHA AC, use atomic flag to manage its ownership. (1) Nuvoton SHA AC doesn't support SHA context save & restore, so S/W SHA fallback has been supported before. To make non-blocking 'acquire' semantics clearer, introduce 'try_acquire' to substitute for 'acquire'. (2) No biting CPU due to mechanism above. (3) No deadlock due to mechanism above. 2. For AES/DES/ECC AC, change to mutex to manage their ownership. (1) Change crypto-misc.c to crypto-misc.cpp to utilize C++ SingletonPtr which guarantees thread-safe mutex construct-on-first-use. (2) With change to crypto-misc.cpp, add 'extern "C"' modifier to CRYPTO_IRQHandler() to avoid name mangling in C++. (3) No priority inversion because mutex has osMutexPrioInherit attribute bit set. (4) No deadlock because these AC are all locked for a short sequence of operations rather than the whole lifetime of mbedtls context. (5) For double mbedtls_internal_ecp_init() issue, it has been fixed in upper mbedtls layer. So no need to change ecc init/free flow.
9347486
to
ca44675
Compare
@kjbracey-arm @cmonr squash done |
CI started |
Test run: SUCCESSSummary: 11 of 11 test jobs passed |
…o_ac Nuvoton: Fix crypto AC management
@ccli8 - searching the tree, I see that M2351 has a bunch of crypto stuff which still spinlocks like these platforms originally did. I think this PR's work needs to be ported to that too. |
@kjbracey-arm OK. I will port this PR's work to M2351. |
Description
This PR is continuation of #8985. Originally, it is to replace busy-wait loop with mutex in crypto AC management. But in PDMC Greentea test,
mbedtls_sha256_init()
(lock mutex) andmbedtls_sha256_free()
(unlock mutex) are not necessarily called by the same thread, so managing crypto AC with mutex is suspended. This PR is reduced to include minor fixes with crypto AC.Pull request type