Skip to content

MbedCRC and CRC HAL revisions #11559

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 13, 2019
Merged

MbedCRC and CRC HAL revisions #11559

merged 3 commits into from
Nov 13, 2019

Conversation

kjbracey
Copy link
Contributor

Description

Coupled changes to help speed and size optimisation, reduce special cases, and increase configurability.

  • Use compile-time detection of hardware CRC capability, so unneeded code and tables do not go into the image.

  • Add global JSON config option to allow choice between no tables, 16-entry tables or 256-entry tables for software CRC. Default set to 16-entry, reducing ROM size from previous 256-entry.

  • Allow manual override in template parameter to force software or bitwise CRC for a particular instance.

  • Micro-optimisations, particularly use of RBIT instruction and optimising bitwise computation using inline assembler.

  • Add support for 32-bit CRCs and reversals to TMPM3HQ target.

Incompatible changes as part of removing special cases:

  • Remove special-case "POLY_32BIT_REV_ANSI" - users can use standard POLY_32BIT_ANSI, which now uses the same 16-entry tables by default, or can use hardware acceleration, which was disabled for POLY_32BIT_REV_ANSI. MbedCRC<POLY_32BIT_ANSI, 32, CrcMode::TABLE> can be used to force software like POLY_32BIT_REV_ANSI.
  • The precomputed table for POLY_16BIT_IBM had errors - this has been corrected, but software CRC results will be different from the previous software calculation.
  • < 8-bit CRC results are no longer are shifted up in the output value, but placed in the lowest bits, like other sizes. This means that code performing the SD command CRC will now need to use (crc << 1) | 1, rather than crc | 1.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[X] Breaking change

Reviewers

@bulislaw, @deepikabhavnani, @evedon, @pan-

Release Notes

  • The MbedCRC class has been improved and optimised. There is now a global JSON configuration drivers.crc-table-size controlling table usage, and individual uses can be size-optimised via template parameters. However some changes are backwards-incompatible:
    • The special-case handling of POLY_32BIT_REV_ANSI has been removed - the same result can be obtained via POLY_32BIT_ANSI.
    • CRCs smaller than 8 bits now return results in the standard position at the bottom of the register - previously they were shifted with random bits at the bottom.
    • The previous precomputed table for POLY_16BIT_IBM had errors - this has been corrected, but CRC results will be different from the previous software calculation (but will match any hardware calculation)

@kjbracey kjbracey force-pushed the crc branch 2 times, most recently from 8f3b1e9 to d8acc7e Compare September 24, 2019 14:55
@ciarmcom
Copy link
Member

@kjbracey-arm, thank you for your changes.
@evedon @bulislaw @maclobdell @deepikabhavnani @pan- @ARMmbed/mbed-os-core @ARMmbed/mbed-os-storage @ARMmbed/mbed-os-hal @ARMmbed/mbed-os-test @ARMmbed/mbed-os-maintainers please review.

@kjbracey kjbracey force-pushed the crc branch 2 times, most recently from 82de4e6 to 4c63461 Compare September 25, 2019 07:18
Copy link
Member

@pan- pan- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kjbracey-arm that looks great. I will spend some time latter (not today) to review it in deep.

I just wonder if it wouldn't have been easier for porting to have hardware CRC supported in targets.json. Inheriting targets benefit from parents definitions. Not all vendors have a common/objects.h mechanism. As long as it is documented; it should be sufficient.

*
* Platform support for particular CRC polynomials is indicated by the
* macro HAL_CRC_IS_SUPPORTED(polynomial, width), which must be defined
* in device.h, or a file included from there.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this pr, it is defined in objects.h

Copy link
Contributor Author

@kjbracey kjbracey Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe objects.h is a formal part of the HAL API - the core code includes device.h, and most (or all?) targets include objects.h from there.

It seems the usual structure is that each device.h is maximally specific for a target, and those include a more common objects.h.

(Update: searched, and there's no reference to objects.h outside /target, apart from one mention in cryptocell's Readme.md on porting)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device.h is the one. We used to have objects.h but after added config system it almost vanished.

@kjbracey
Copy link
Contributor Author

kjbracey commented Sep 25, 2019

I just wonder if it wouldn't have been easier for porting to have hardware CRC supported in targets.json.

We do have DEVICE_CRC, which is an overall indication. But we need a more specific determination based on polynomial and width. It's typical for targets to support 16 and/or 32-bit only, or only fixed polynomials.

So DEVICE_CRC indicates that we provide the HAL_CRC_IS_SUPPORTED macro (was previously a function); one of the (pre-existing) tests checks that returns true for at least one of the standard CRCs, and then the function definitions must exist.

Copy link
Contributor

@SeppoTakalo SeppoTakalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No unittests?

@kjbracey
Copy link
Contributor Author

kjbracey commented Sep 25, 2019

No unittests?

Not sure what that would add over the Greentea module test - which we need anyway to cover the board and chip variance (M0 vs M4 assembly and HW accel versus not).

The Greentea test could do with being expanded to ensure fuller coverage - in particular the previous table errors were not picked up because it only CRC checks 123456789.

There is an issue about the config though - how can we give Greentea tests a config such that at least some targets test the table size set to 256? The 3 main modes (bitwise, table, hardware) get exercised already, but the two possible table sizes aren't.

@SeppoTakalo
Copy link
Contributor

I was thinking that does it make sense to test the software implementation in unittests, and HW implementation in Greentea.. but we still need to test the SW in Greentea as well.. but does it have to be all platforms?

@kjbracey
Copy link
Contributor Author

Yeah, could shift to unit test away from Greentea for the "driver" software test. But I think you do want to check that the MbedCRC class does the right thing on a bunch of platforms with different HAL_CRC_IS_SUPPORTED and DEVICE_CRC settings, plus you need to test the assembler. The code is quite target-adaptive, unlike more straight middleware bits like event loops or whatever.

There is a separate HAL CRC Greentea test just exercising the HAL API for targets that support it.

@kjbracey
Copy link
Contributor Author

kjbracey commented Sep 27, 2019

@pan-, if you're interested in staring at compiler output again, much of the aim of this PR was to try to exclude dead code, with the most important point being to make the "do I have hardware for this CRC" test compile-time.

That leaves us with polynomial and size as template parameters, hardware availability based on template parameters, and the 2 xor and 2 flip options as constructor parameters.

So, in principle, given

  MbedCRC<poly, 32> crc;
  crc.compute(data, len, &result);

The compiler should be able to fully optimise out the xor and flip options from the constructor. And that's the reason for the SDBlockDriver changes.

GCC fully optimises that as I intend, but clang x86 (in compiler explorer) doesn't. It ends up inlining pointless code that checks _reflect_result, when it knows full well what it's set to. Weird. (I haven't checked to see if ARMC6 does better - I doubt it would). Not found any way of persuading it to do better. :(

@bulislaw
Copy link
Member

bulislaw commented Oct 2, 2019

that looks awesome! I'm afraid I don't fully understand all of the C++ magic though, @pan- would be awesome if you could have a closer look.

@ARMmbed/team-cypress @ARMmbed/team-st-mcd @ARMmbed/team-silabs @ARMmbed/team-nxp @ARMmbed/team-toshiba can you please review changes to your HAL implementation in this commit b7f41cf

fyi @MarceloSalazar @maclobdell

@bulislaw
Copy link
Member

bulislaw commented Oct 2, 2019

@ARMmbed/mbed-os-maintainers that's a breaking change that shouldn't go to 5.15
@kjbracey-arm the handbook will need to be updated probably both on user and porting sides

@jeromecoutant
Copy link
Collaborator

ST CI:

target platform_name test suite result elapsed_time (sec) copy_method
NUCLEO_F091RC-ARMC6 NUCLEO_F091RC tests-mbed_drivers-crc OK 17.95 default
NUCLEO_F091RC-ARMC6 NUCLEO_F091RC tests-mbed_hal-crc OK 18.19 default
NUCLEO_F103RB-ARMC6 NUCLEO_F103RB tests-mbed_drivers-crc OK 18.42 default
NUCLEO_F207ZG-ARMC6 NUCLEO_F207ZG tests-mbed_drivers-crc OK 17.0 default
NUCLEO_F303ZE-ARMC6 NUCLEO_F303ZE tests-mbed_drivers-crc OK 18.21 default
NUCLEO_F303ZE-ARMC6 NUCLEO_F303ZE tests-mbed_hal-crc OK 18.45 default
NUCLEO_F446RE-ARMC6 NUCLEO_F446RE tests-mbed_drivers-crc OK 16.94 default
NUCLEO_F767ZI-ARMC6 NUCLEO_F767ZI tests-mbed_drivers-crc OK 17.24 default
NUCLEO_F767ZI-ARMC6 NUCLEO_F767ZI tests-mbed_hal-crc OK 17.56 default
NUCLEO_H743ZI-ARMC6 NUCLEO_H743ZI tests-mbed_drivers-crc OK 16.9 default
NUCLEO_H743ZI-ARMC6 NUCLEO_H743ZI tests-mbed_hal-crc OK 17.35 default
NUCLEO_L073RZ-ARMC6 NUCLEO_L073RZ tests-mbed_drivers-crc OK 20.11 default
NUCLEO_L073RZ-ARMC6 NUCLEO_L073RZ tests-mbed_hal-crc OK 20.3 default
NUCLEO_L152RE-ARMC6 NUCLEO_L152RE tests-mbed_drivers-crc OK 17.84 default
NUCLEO_L476RG-ARMC6 NUCLEO_L476RG tests-mbed_drivers-crc OK 17.88 default
NUCLEO_L476RG-ARMC6 NUCLEO_L476RG tests-mbed_hal-crc OK 18.04 default
NUCLEO_WB55RG-ARMC6 NUCLEO_WB55RG tests-mbed_drivers-crc OK 16.88 default
NUCLEO_WB55RG-ARMC6 NUCLEO_WB55RG tests-mbed_hal-crc OK 17.29 default

@adbridge adbridge added the release-version: 6.0.0-alpha-1 First pre-release version of 6.0.0 label Oct 7, 2019
@kjbracey
Copy link
Contributor Author

GCC fully optimises that as I intend, but clang x86 (in compiler explorer) doesn't. It ends up inlining pointless code that checks _reflect_result, when it knows full well what it's set to. Weird. (I haven't checked to see if ARMC6 does better - I doubt it would). Not found any way of persuading it to do better. :(

I think I sort of understand the reluctance to optimise - when compiling

  • compute_partial_start
  • compute_partial
  • compute_partial_end

When clang doesn't inline the compute_partial, it decides it can't be absolutely certain about what the compute_partial does, so considers the possibility that the object could be changed by the call. (Even though the members are all const). So the optimisation of compute_partial_end is suppressed.

GCC does inline the compute_partial, so can be more certain. I'm not sure there is anything technically stopping clang assuming (non-volatile) const members don't change though.

@mbed-ci
Copy link

mbed-ci commented Nov 12, 2019

Test run: FAILED

Summary: 2 of 4 test jobs failed
Build number : 3
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_unittests
  • jenkins-ci/mbed-os-ci_build-ARM

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 13, 2019

Please review unit test failures, related to the change set

@kjbracey
Copy link
Contributor Author

The ARM build failure was spurious though?

* Change "is supported" check to be a macro, so it can be done at
  compile-time.
* Eliminate weird shift on 7-bit CRCs.
* Add support for 32-bit CRCs and reversals to TMPM3HQ.
* Use compile-time detection of hardware CRC capability, so unneeded
  code and tables do not go into the image.
* Add global JSON config option to allow choice between no tables,
  16-entry tables or 256-entry tables for software CRC. Default set
  to 16-entry, reducing ROM size from previous 256-entry.
* Allow manual override in template parameter to force software or
  bitwise CRC for a particular instance.
* Micro-optimisations, particularly use of `RBIT` instruction and
  optimising bitwise computation using inline assembler.

Incompatible changes:

* Remove special-case "POLY_32BIT_REV_ANSI" - users can use standard
  POLY_32BIT_ANSI, which now uses the same 16-entry tables by default,
  or can use hardware acceleration, which was disabled for
  POLY_32BIT_REV_ANSI. MbedCRC<POLY_32BIT_ANSI, 32, CrcMode::TABLE> can
  be used to force software like POLY_32BIT_REV_ANSI.
* The precomputed table for POLY_16BIT_IBM had errors - this has been
  corrected, but software CRC results will be different from the previous
  software calculation.
* < 8-bit CRC results are no longer are shifted up in the output value,
  but placed in the lowest bits, like other sizes. This means that code
  performing the SD command CRC will now need to use `(crc << 1) | 1`,
  rather than `crc | 1`.
* Make mbed_error use bitwise MbedCRC call rather than local
  implementation.
* Remove use of POLY_32BIT_REV_ANSI from LittleFS.
* Move some MbedCRC instances closer to use - construction cost is
  trivial, and visibility aids compiler optimisation.
@kjbracey
Copy link
Contributor Author

kjbracey commented Nov 13, 2019

Updated to fix unit tests - needed a portable definition of __RBIT, and to disable DEVICE_CRC from the build so we don't look for HAL_CRC_IS_SUPPORTED.

Unit test build failure was newly-arising from the MbedCRC-using TDBStore being added to the unit test source list in #11797

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 13, 2019

CI restarted

@mbed-ci
Copy link

mbed-ci commented Nov 13, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 4
Build artifacts

@adbridge
Copy link
Contributor

OK this should not have gone to master prior to the creation of 5.15. @0xc0170 this needs to be reverted before we make the 5.15 branch ....

@0xc0170
Copy link
Contributor

0xc0170 commented Nov 27, 2019

Remarked, revert also in 5.15. Master should be now OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants