Skip to content

[flang] IEEE underflow control for Arm #124170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 27, 2025
Merged

[flang] IEEE underflow control for Arm #124170

merged 1 commit into from
Jan 27, 2025

Conversation

vdonaldson
Copy link
Contributor

Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and IEEE_SET_UNDERFLOW_MODE code for Arm.

Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and
IEEE_SET_UNDERFLOW_MODE code for Arm.
@llvmbot llvmbot added flang:runtime flang Flang issues not falling into any other category labels Jan 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-flang-runtime

Author: None (vdonaldson)

Changes

Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and IEEE_SET_UNDERFLOW_MODE code for Arm.


Full diff: https://github.com/llvm/llvm-project/pull/124170.diff

2 Files Affected:

  • (modified) flang/include/flang/Tools/TargetSetup.h (+16-15)
  • (modified) flang/runtime/exceptions.cpp (+29-7)
diff --git a/flang/include/flang/Tools/TargetSetup.h b/flang/include/flang/Tools/TargetSetup.h
index d1b0da3a42c897..5d23df6823a948 100644
--- a/flang/include/flang/Tools/TargetSetup.h
+++ b/flang/include/flang/Tools/TargetSetup.h
@@ -24,34 +24,35 @@ namespace Fortran::tools {
     const std::string &compilerVersion, const std::string &compilerOptions) {
 
   const llvm::Triple &targetTriple{targetMachine.getTargetTriple()};
-  // FIXME: Handle real(3) ?
-  if (targetTriple.getArch() != llvm::Triple::ArchType::x86_64) {
-    targetCharacteristics.DisableType(
-        Fortran::common::TypeCategory::Real, /*kind=*/10);
-  }
+
+  targetCharacteristics.set_ieeeFeature(evaluate::IeeeFeature::Halting, true);
+
   if (targetTriple.getArch() == llvm::Triple::ArchType::x86_64) {
     targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/3);
     targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/4);
     targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/8);
   }
+
   if (targetTriple.isARM() || targetTriple.isAArch64()) {
     targetCharacteristics.set_haltingSupportIsUnknownAtCompileTime();
     targetCharacteristics.set_ieeeFeature(
         evaluate::IeeeFeature::Halting, false);
-  } else {
-    targetCharacteristics.set_ieeeFeature(evaluate::IeeeFeature::Halting);
+    targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/3);
+    targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/4);
+    targetCharacteristics.set_hasSubnormalFlushingControl(/*kind=*/8);
+  }
+
+  if (targetTriple.getArch() != llvm::Triple::ArchType::x86_64) {
+    targetCharacteristics.DisableType(
+        Fortran::common::TypeCategory::Real, /*kind=*/10);
   }
 
-  // Figure out if we can support F128: see
-  // flang/runtime/Float128Math/math-entries.h
-  // TODO: this should be taken from TargetInfo::getLongDoubleFormat to support
-  // cross-compilation
+  // Check for kind=16 support. See flang/runtime/Float128Math/math-entries.h.
+  // TODO: Take this from TargetInfo::getLongDoubleFormat for cross compilation.
 #ifdef FLANG_RUNTIME_F128_MATH_LIB
-  // we can use libquadmath wrappers
-  constexpr bool f128Support = true;
+  constexpr bool f128Support = true; // use libquadmath wrappers
 #elif HAS_LDBL128
-  // we can use libm wrappers
-  constexpr bool f128Support = true;
+  constexpr bool f128Support = true; // use libm wrappers
 #else
   constexpr bool f128Support = false;
 #endif
diff --git a/flang/runtime/exceptions.cpp b/flang/runtime/exceptions.cpp
index f541b8e844aded..7fca0c431f8cd0 100644
--- a/flang/runtime/exceptions.cpp
+++ b/flang/runtime/exceptions.cpp
@@ -11,7 +11,9 @@
 #include "flang/Runtime/exceptions.h"
 #include "terminator.h"
 #include <cfenv>
-#if __x86_64__
+#if __aarch64__
+#include <fpu_control.h>
+#elif __x86_64__
 #include <xmmintrin.h>
 #endif
 
@@ -90,20 +92,40 @@ bool RTNAME(SupportHalting)([[maybe_unused]] uint32_t except) {
 #endif
 }
 
+// A hardware FZ (flush to zero) bit is the negation of the
+// ieee_[get|set]_underflow_mode GRADUAL argument.
+#if defined(_MM_FLUSH_ZERO_MASK)
+// The MXCSR FZ bit affects computations of real kinds 3, 4, and 8.
+#elif defined(_FPU_GETCW)
+// The FPCR FZ bit affects computations of real kinds 3, 4, and 8.
+// bit 24: FZ   -- single, double precision flush to zero bit
+// bit 19: FZ16 -- half precision flush to zero bit [not currently relevant]
+#define _FPU_FPCR_FZ_MASK_ 0x01080000
+#endif
+
 bool RTNAME(GetUnderflowMode)(void) {
-#if _MM_FLUSH_ZERO_MASK
-  // The MXCSR Flush to Zero flag is the negation of the ieee_get_underflow_mode
-  // GRADUAL argument. It affects real computations of kinds 3, 4, and 8.
+#if defined(_MM_FLUSH_ZERO_MASK)
   return _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_OFF;
+#elif defined(_FPU_GETCW)
+  uint32_t fpcr;
+  _FPU_GETCW(fpcr);
+  return (fpcr & _FPU_FPCR_FZ_MASK_) != _FPU_FPCR_FZ_MASK_;
 #else
   return false;
 #endif
 }
 void RTNAME(SetUnderflowMode)(bool flag) {
-#if _MM_FLUSH_ZERO_MASK
-  // The MXCSR Flush to Zero flag is the negation of the ieee_set_underflow_mode
-  // GRADUAL argument. It affects real computations of kinds 3, 4, and 8.
+#if defined(_MM_FLUSH_ZERO_MASK)
   _MM_SET_FLUSH_ZERO_MODE(flag ? _MM_FLUSH_ZERO_OFF : _MM_FLUSH_ZERO_ON);
+#elif defined(_FPU_GETCW)
+  uint32_t fpcr;
+  _FPU_GETCW(fpcr);
+  if (flag) {
+    fpcr &= ~_FPU_FPCR_FZ_MASK_;
+  } else {
+    fpcr |= _FPU_FPCR_FZ_MASK_;
+  }
+  _FPU_SETCW(fpcr);
 #endif
 }
 

Copy link
Contributor

@pawosm-arm pawosm-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the expert here, but code looks good to me.
One question looking at ARM doc for FPCR.FZ, does FPCR.AH also needs to be set to 0 or 1 to get the expected IEEE behavior (it looks like it impacts the flushing of fp operation inputs in the FZ doc)?

FPCR.AH seems to also impacts the rounding (RMode), so maybe modifying it here is a bad idea.

@vdonaldson
Copy link
Contributor Author

Not the expert here, but code looks good to me. One question looking at ARM doc for FPCR.FZ, does FPCR.AH also needs to be set to 0 or 1 to get the expected IEEE behavior (it looks like it impacts the flushing of fp operation inputs in the FZ doc)?

FPCR.AH seems to also impacts the rounding (RMode), so maybe modifying it here is a bad idea.

Thanks @jeanPerier for looking into this. My reading is that the AH bit affects whether denormal operands are treated as zero or not, which is distinct from whether or not denormal results are flushed to zero. My interpretation of the Fortran standard is that these ieee functions apply to results, not operands, and the AH bit is not relevant here. If at some point we implement (standard-external) compilation options to control denormal operand handling, that would involve the AH bit.

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation Val, LGTM.

@vdonaldson vdonaldson merged commit 3684ec4 into llvm:main Jan 27, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 27, 2025

LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx running on linaro-flang-aarch64-libcxx while building flang at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/15262

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
115.462 [123/5/7217] Generating ../../../../include/flang/ieee_exceptions.mod
115.473 [123/4/7218] Generating ../../../../include/flang/cudadevice.mod
115.489 [123/3/7219] Generating ../../../../include/flang/ieee_arithmetic.mod
115.563 [123/2/7220] Generating ../../../../include/flang/iso_fortran_env.mod
116.024 [6/118/7221] Building C object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/complex-reduction.c.o
116.025 [6/117/7222] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/iostat.cpp.o
116.026 [6/116/7223] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/terminator.cpp.o
116.028 [6/115/7224] Building C object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/Float128Math/complex-math.c.o
116.145 [6/114/7225] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/buffer.cpp.o
116.194 [6/113/7226] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/exceptions.cpp.o
FAILED: tools/flang/runtime/CMakeFiles/FortranRuntime.dir/exceptions.cpp.o 
/usr/local/bin/c++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/tools/flang/runtime -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/runtime -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/include -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/tools/flang/include -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/tools/mlir/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-libcxx/build/tools/clang/include -isystem /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/llvm/../clang/include -stdlib=libc++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -fno-lto -O3 -DNDEBUG -std=c++17 -fPIC   -U_GLIBCXX_ASSERTIONS -U_LIBCPP_ENABLE_ASSERTIONS -UNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -MD -MT tools/flang/runtime/CMakeFiles/FortranRuntime.dir/exceptions.cpp.o -MF tools/flang/runtime/CMakeFiles/FortranRuntime.dir/exceptions.cpp.o.d -o tools/flang/runtime/CMakeFiles/FortranRuntime.dir/exceptions.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-libcxx/llvm-project/flang/runtime/exceptions.cpp
../llvm-project/flang/runtime/exceptions.cpp:111:14: error: value size does not match register size specified by the constraint and modifier [-Werror,-Wasm-operand-widths]
  111 |   _FPU_GETCW(fpcr);
      |              ^
../llvm-project/flang/runtime/exceptions.cpp:111:3: note: use constraint modifier "w"
  111 |   _FPU_GETCW(fpcr);
      |   ^
/usr/include/aarch64-linux-gnu/fpu_control.h:33:30: note: expanded from macro '_FPU_GETCW'
   33 |   __asm__ __volatile__ ("mrs    %0, fpcr" : "=r" (fpcr))
      |                                 ^
../llvm-project/flang/runtime/exceptions.cpp:122:14: error: value size does not match register size specified by the constraint and modifier [-Werror,-Wasm-operand-widths]
  122 |   _FPU_GETCW(fpcr);
      |              ^
../llvm-project/flang/runtime/exceptions.cpp:122:3: note: use constraint modifier "w"
  122 |   _FPU_GETCW(fpcr);
      |   ^
/usr/include/aarch64-linux-gnu/fpu_control.h:33:30: note: expanded from macro '_FPU_GETCW'
   33 |   __asm__ __volatile__ ("mrs    %0, fpcr" : "=r" (fpcr))
      |                                 ^
../llvm-project/flang/runtime/exceptions.cpp:128:14: error: value size does not match register size specified by the constraint and modifier [-Werror,-Wasm-operand-widths]
  128 |   _FPU_SETCW(fpcr);
      |              ^
../llvm-project/flang/runtime/exceptions.cpp:128:3: note: use constraint modifier "w"
  128 |   _FPU_SETCW(fpcr);
      |   ^
/usr/include/aarch64-linux-gnu/fpu_control.h:36:36: note: expanded from macro '_FPU_SETCW'
   36 |   __asm__ __volatile__ ("msr    fpcr, %0" : : "r" (fpcr))
      |                                       ^
3 errors generated.
116.246 [6/112/7227] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/main.cpp.o
116.265 [6/111/7228] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/character.cpp.o
116.268 [6/110/7229] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/extrema.cpp.o
116.297 [6/109/7230] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/allocatable.cpp.o
116.360 [6/108/7231] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/format.cpp.o
116.362 [6/107/7232] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/edit-input.cpp.o
116.373 [6/106/7233] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/descriptor.cpp.o
116.401 [6/105/7234] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/edit-output.cpp.o
116.426 [6/104/7235] Building CXX object tools/flang/runtime/CMakeFiles/FortranRuntime.dir/file.cpp.o

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:runtime flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants