Commit ec0f5aa

jensmaurer authored and tkoeppe committed
P1467R9 Extended floating-point types and standard names
1 parent b834e4e commit ec0f5aa

14 files changed (+656, -436 lines changed)

source/back.tex

Lines changed: 4 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -13,8 +13,10 @@ \chapter{Bibliography}
1313
\doccite{Information technology --- Language independent arithmetic ---
1414
Part 1: Integer and floating point arithmetic}
1515
\item
16-
ISO/IEC/IEEE 60559:2011, \doccite{Information technology ---
17-
Microprocessor Systems --- Floating-Point arithmetic}
16+
ISO/IEC TS 18661-3:2015,
17+
\doccite{Information Technology ---
18+
Programming languages, their environments, and system software interfaces ---
19+
Floating-point extensions for C --- Part 3: Interchange and extended types}
1820
% Other international standards.
1921
\item
2022
%%% Format for the following entry is based on that specified at

source/basic.tex

Lines changed: 150 additions & 7 deletions
@@ -4995,15 +4995,23 @@
49954995
The types
49964996
\keyword{float}, \keyword{double}, and \tcode{\keyword{long} \keyword{double}},
49974997
and cv-qualified versions\iref{basic.type.qualifier} thereof,
4998+
are collectively termed
4999+
\defnx{standard floating-point types}{type!floating-point!standard}.
5000+
An implementation may also provide additional types
5001+
that represent floating-point values and define them (and cv-qualified versions thereof) to be
5002+
\defnx{extended floating-point types}{type!floating-point!extended}.
5003+
The standard and extended floating-point types
49985004
are collectively termed \defnx{floating-point types}{type!floating-point}.
4999-
The value
5000-
representation of floating-point types is \impldef{value representation of
5001-
floating-point types}.
5002-
\indextext{floating-point type!implementation-defined}%
50035005
\begin{note}
5004-
This document imposes no requirements on the accuracy of
5005-
floating-point operations; see also~\ref{support.limits}.
5006+
Any additional implementation-specific types representing floating-point values
5007+
that are not defined by the implementation to be extended floating-point types
5008+
are not considered to be floating-point types, and
5009+
this document imposes no requirements on them or
5010+
their interactions with floating-point types.
50065011
\end{note}
5012+
Except as specified in \ref{basic.extended.fp},
5013+
the object and value representations and accuracy of operations
5014+
of floating-point types are \impldef{representation of floating-point types}.
50075015

50085016
\pnum
50095017
Integral and floating-point types are collectively
@@ -5049,6 +5057,90 @@
50495057
same value representation, they are nevertheless different types.
50505058
\end{note}
50515059

5060+
\rSec2[basic.extended.fp]{Optional extended floating-point types}
5061+
5062+
\pnum
5063+
If the implementation supports an extended floating-point type\iref{basic.fundamental}
5064+
whose properties are specified by
5065+
the ISO/IEC/IEEE 60559 floating-point interchange format binary16,
5066+
then the \grammarterm{typedef-name} \tcode{std::float16_t}
5067+
is defined in the header \libheaderref{stdfloat} and names such a type,
5068+
the macro \mname{STDCPP_FLOAT16_T} is defined\iref{cpp.predefined}, and
5069+
the floating-point literal suffixes \tcode{f16} and \tcode{F16}
5070+
are supported\iref{lex.fcon}.
5071+
5072+
\pnum
5073+
If the implementation supports an extended floating-point type
5074+
whose properties are specified by
5075+
the ISO/IEC/IEEE 60559 floating-point interchange format binary32,
5076+
then the \grammarterm{typedef-name} \tcode{std::float32_t}
5077+
is defined in the header \libheader{stdfloat} and names such a type,
5078+
the macro \mname{STDCPP_FLOAT32_T} is defined, and
5079+
the floating-point literal suffixes \tcode{f32} and \tcode{F32} are supported.
5080+
5081+
\pnum
5082+
If the implementation supports an extended floating-point type
5083+
whose properties are specified by
5084+
the ISO/IEC/IEEE 60559 floating-point interchange format binary64,
5085+
then the \grammarterm{typedef-name} \tcode{std::float64_t}
5086+
is defined in the header \libheader{stdfloat} and names such a type,
5087+
the macro \mname{STDCPP_FLOAT64_T} is defined, and
5088+
the floating-point literal suffixes \tcode{f64} and \tcode{F64} are supported.
5089+
5090+
\pnum
5091+
If the implementation supports an extended floating-point type
5092+
whose properties are specified by
5093+
the ISO/IEC/IEEE 60559 floating-point interchange format binary128,
5094+
then the \grammarterm{typedef-name} \tcode{std::float128_t}
5095+
is defined in the header \libheader{stdfloat} and names such a type,
5096+
the macro \mname{STDCPP_FLOAT128_T} is defined, and
5097+
the floating-point literal suffixes \tcode{f128} and \tcode{F128} are supported.
5098+
5099+
\pnum
5100+
If the implementation supports an extended floating-point type
5101+
with the properties, as specified by ISO/IEC/IEEE 60559, of
5102+
radix ($b$) of 2,
5103+
storage width in bits ($k$) of 16,
5104+
precision in bits ($p$) of 8,
5105+
maximum exponent ($emax$) of 127, and
5106+
exponent field width in bits ($w$) of 8, then
5107+
the \grammarterm{typedef-name} \tcode{std::bfloat16_t}
5108+
is defined in the header \libheader{stdfloat} and names such a type,
5109+
the macro \mname{STDCPP_BFLOAT16_T} is defined, and
5110+
the floating-point literal suffixes \tcode{bf16} and \tcode{BF16} are supported.
5111+
5112+
\pnum
5113+
\begin{note}
5114+
A summary of the parameters for each type is given in \tref{basic.extended.fp}.
5115+
The precision $p$ includes the implicit 1 bit at the beginning of the mantissa,
5116+
so the storage used for the mantissa is $p-1$ bits.
5117+
ISO/IEC/IEEE 60559 does not assign a name for a type
5118+
having the parameters specified for \tcode{std::bfloat16_t}.
5119+
\end{note}
5120+
\begin{floattable}
5121+
{Properties of named extended floating-point types}{basic.extended.fp}{llllll}
5122+
\topline
5123+
\lhdr{Parameter} & \chdr{\tcode{float16_t}} & \chdr{\tcode{float32_t}} &
5124+
\chdr{\tcode{float64_t}} & \chdr{\tcode{float128_t}} &
5125+
\rhdr{\tcode{bfloat16_t}} \\
5126+
\capsep
5127+
ISO/IEC/IEEE 60559 name & binary16 & binary32 & binary64 & binary128 & \\
5128+
$k$, storage width in bits & 16 & 32 & 64 & 128 & 16 \\
5129+
$p$, precision in bits & 11 & 24 & 53 & 113 & 8 \\
5130+
$emax$, maximum exponent & 15 & 127 & 1023 & 16383 & 127 \\
5131+
$w$, exponent field width in bits & 5 & 8 & 11 & 15 & 8 \\
5132+
\end{floattable}
5133+
5134+
\pnum
5135+
\recommended
5136+
Any names that the implementation provides for
5137+
the extended floating-point types described in this subsection
5138+
that are in addition to the names defined in the \libheader{stdfloat} header
5139+
should be chosen to increase compatibility and interoperability
5140+
with the interchange types
5141+
\tcode{_Float16}, \tcode{_Float32}, \tcode{_Float64}, and \tcode{_Float128}
5142+
defined in ISO/IEC TS 18661-3 and with future versions of the C standard.
5143+
50525144
\rSec2[basic.compound]{Compound types}
50535145

50545146
\pnum
@@ -5337,7 +5429,7 @@
53375429
has the top-level cv-qualifier \keyword{volatile}.
53385430
\end{example}
53395431

5340-
\rSec2[conv.rank]{Integer conversion rank}%
5432+
\rSec2[conv.rank]{Conversion ranks}%
53415433
\indextext{conversion!integer rank}
53425434

53435435
\pnum
@@ -5394,6 +5486,57 @@
53945486
conversions\iref{expr.arith.conv}.
53955487
\end{note}
53965488

5489+
\pnum
5490+
Every floating-point type has a \defnadj{floating-point}{conversion rank}
5491+
defined as follows:
5492+
\begin{itemize}
5493+
\item
5494+
The rank of a floating-point type \tcode{T} is greater than
5495+
the rank of any floating-point type
5496+
whose set of values is a proper subset of the set of values of \tcode{T}.
5497+
\item
5498+
The rank of \tcode{\keyword{long} \keyword{double}} is greater than
5499+
the rank of \keyword{double},
5500+
which is greater than the rank of \keyword{float}.
5501+
\item
5502+
Two extended floating-point types with the same set of values have equal ranks.
5503+
\item
5504+
An extended floating-point type with the same set of values as
5505+
exactly one cv-unqualified standard floating-point type
5506+
has a rank equal to the rank of that standard floating-point type.
5507+
\item
5508+
An extended floating-point type with the same set of values as
5509+
more than one cv-unqualified standard floating-point type
5510+
has a rank equal to the rank of \keyword{double}.
5511+
\end{itemize}
5512+
\begin{note}
5513+
The conversion ranks of floating-point types \tcode{T1} and \tcode{T2}
5514+
are unordered if the set of values of \tcode{T1} is
5515+
neither a subset nor a superset of the set of values of \tcode{T2}.
5516+
This can happen when one type has both a larger range and a lower precision
5517+
than the other.
5518+
\end{note}
5519+
5520+
\pnum
5521+
Floating-point types that have equal floating-point conversion ranks
5522+
are ordered by floating-point conversion subrank.
5523+
The subrank forms a total order among types with equal ranks.
5524+
The types
5525+
\tcode{std::float16_t},
5526+
\tcode{std::float32_t},
5527+
\tcode{std::float64_t}, and
5528+
\tcode{std::float128_t}\iref{stdfloat.syn}
5529+
have a greater conversion subrank than any standard floating-point type
5530+
with equal conversion rank.
5531+
Otherwise, the conversion subrank order is
5532+
\impldef{floating-point conversion subrank}.
5533+
5534+
\pnum
5535+
\begin{note}
5536+
The floating-point conversion rank and subrank are used in
5537+
the definition of the usual arithmetic conversions\iref{expr.arith.conv}.
5538+
\end{note}
5539+
53975540
\rSec1[basic.exec]{Program execution}
53985541

53995542
\rSec2[intro.execution]{Sequential execution}
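
As an illustration of the [basic.extended.fp] wording added in this file, the following is a minimal sketch (not part of the commit) that checks the table parameters through `std::numeric_limits`. The helper names `matches_table` and `extended_fp_parameters_hold` and the macro `SKETCH_HAVE_STDFLOAT` are hypothetical. The sketch assumes that `sizeof(T) * CHAR_BIT` equals the storage width $k$ and that `numeric_limits` is specialized for the extended types; every check is skipped when the implementation does not define the corresponding `__STDCPP_FLOATn_T__` macro.

```cpp
#include <climits>
#include <limits>
#if __has_include(<stdfloat>)
#  include <stdfloat>
#  define SKETCH_HAVE_STDFLOAT 1
#else
#  define SKETCH_HAVE_STDFLOAT 0
#endif

// Hypothetical helper: compares type T against one column of the table.
// k = storage width in bits, p = precision in bits, emax = maximum exponent.
template <class T>
constexpr bool matches_table(int k, int p, int emax) {
    using lim = std::numeric_limits<T>;
    return static_cast<int>(sizeof(T) * CHAR_BIT) == k  // assumption: no padding
        && lim::radix == 2
        && lim::digits == p                // digits counts the implicit leading bit
        && lim::max_exponent == emax + 1;  // numeric_limits reports emax + 1
}

// Returns true if every type the implementation provides matches the table.
// The types are optional, so checks for missing types are skipped.
constexpr bool extended_fp_parameters_hold() {
    bool ok = true;
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT16_T__)
    ok = ok && matches_table<std::float16_t>(16, 11, 15);
#endif
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT32_T__)
    ok = ok && matches_table<std::float32_t>(32, 24, 127);
#endif
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT64_T__)
    ok = ok && matches_table<std::float64_t>(64, 53, 1023);
#endif
    return ok;
}

static_assert(extended_fp_parameters_hold());
```

On an implementation that provides none of the types, the function returns `true` without checking anything, which reflects that the types are optional.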

source/declarations.tex

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6019,8 +6019,10 @@
60196019
\begin{itemize}
60206020
\item from a floating-point type to an integer type, or
60216021

6022-
\item from \tcode{long double} to \tcode{double} or \tcode{float}, or from
6023-
\tcode{double} to \tcode{float}, except where the source is a constant expression and
6022+
\item from a floating-point type \tcode{T} to another floating-point type
6023+
whose floating-point conversion rank is neither greater than nor equal to
6024+
that of \tcode{T},
6025+
except where the source is a constant expression and
60246026
the actual value after conversion
60256027
is within the range of values that can be represented (even if it cannot be represented exactly),
60266028
or
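
A minimal sketch of how the reworded narrowing rule plays out in list-initialization (not part of the commit). The function name `widening_list_init_ok` and the macro `SKETCH_HAVE_STDFLOAT` are hypothetical, and the comments assume `float` is binary32, so that it has the same rank as `std::float32_t`. The ill-formed case is left as a comment so the block still compiles.

```cpp
#if __has_include(<stdfloat>)
#  include <stdfloat>
#  define SKETCH_HAVE_STDFLOAT 1
#else
#  define SKETCH_HAVE_STDFLOAT 0
#endif

// Returns true if the widening list-initialization preserved the value,
// or if the implementation does not provide the types used here.
constexpr bool widening_list_init_ok() {
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT32_T__) && defined(__STDCPP_FLOAT64_T__)
    std::float32_t narrow = 1.5f;  // float -> float32_t: equal rank (assumed), allowed
    std::float64_t wide{narrow};   // destination rank is greater: not narrowing
    // std::float32_t bad{wide};   // destination rank is lesser: narrowing, ill-formed
    return static_cast<double>(wide) == 1.5;  // 1.5 is exactly representable
#else
    return true;
#endif
}

static_assert(widening_list_init_ok());
```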

source/expressions.tex

Lines changed: 46 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -927,7 +927,13 @@
927927
\pnum
928928
\indextext{conversion!floating-point}%
929929
A prvalue of floating-point type can be converted to a prvalue of
930-
another floating-point type. If the source value can be exactly
930+
another floating-point type
931+
with a greater or equal conversion rank\iref{conv.rank}.
932+
A prvalue of standard floating-point type can be converted to
933+
a prvalue of another standard floating-point type.
934+
935+
\pnum
936+
If the source value can be exactly
931937
represented in the destination type, the result of the conversion is
932938
that exact representation. If the source value is between two adjacent
933939
destination values, the result of the conversion is an
@@ -1112,24 +1118,36 @@
11121118
\item If either operand is of scoped enumeration type\iref{dcl.enum}, no conversions
11131119
are performed; if the other operand does not have the same type, the expression is
11141120
ill-formed.
1115-
1116-
\item If either operand is of type \tcode{\keyword{long} \keyword{double}}, the
1117-
other shall be converted to \tcode{\keyword{long} \keyword{double}}.
1118-
1119-
\item Otherwise, if either operand is \keyword{double}, the other shall be
1120-
converted to \keyword{double}.
1121-
1122-
\item Otherwise, if either operand is \keyword{float}, the other shall be
1123-
converted to \keyword{float}.
1124-
1125-
\item Otherwise, the integral promotions\iref{conv.prom} shall be
1121+
\item Otherwise, if either operand is of floating-point type,
1122+
the following rules are applied:
1123+
\begin{itemize}
1124+
\item
1125+
If both operands have the same type, no further conversion is needed.
1126+
\item
1127+
Otherwise, if one of the operands is of a non-floating-point type,
1128+
that operand is converted to the type of
1129+
the operand with the floating-point type.
1130+
\item
1131+
Otherwise, if the floating-point conversion ranks\iref{conv.rank} of
1132+
the types of the operands are ordered but not equal,
1133+
then the operand of the type with the lesser floating-point conversion rank
1134+
is converted to the type of the other operand.
1135+
\item
1136+
Otherwise, if the floating-point conversion ranks of the types of
1137+
the operands are equal,
1138+
then the operand with the lesser floating-point conversion subrank\iref{conv.rank}
1139+
is converted to the type of the other operand.
1140+
\item
1141+
Otherwise, the expression is ill-formed.
1142+
\end{itemize}
1143+
\item Otherwise, the integral promotions\iref{conv.prom} are
11261144
performed on both operands.
11271145
\begin{footnote}
11281146
As a consequence, operands of type \keyword{bool}, \keyword{char8_t}, \keyword{char16_t},
11291147
\keyword{char32_t}, \keyword{wchar_t}, or an enumerated type are converted
11301148
to some integral type.
11311149
\end{footnote}
1132-
Then the following rules shall be applied to the promoted operands:
1150+
Then the following rules are applied to the promoted operands:
11331151

11341152
\begin{itemize}
11351153

@@ -1138,20 +1156,20 @@
11381156

11391157
\item Otherwise, if both operands have signed integer types or both have
11401158
unsigned integer types, the operand with the type of lesser integer
1141-
conversion rank shall be converted to the type of the operand with
1159+
conversion rank is converted to the type of the operand with
11421160
greater rank.
11431161

11441162
\item Otherwise, if the operand that has unsigned integer type has rank
11451163
greater than or equal to the rank of the type of the other operand, the
1146-
operand with signed integer type shall be converted to the type of the
1164+
operand with signed integer type is converted to the type of the
11471165
operand with unsigned integer type.
11481166

11491167
\item Otherwise, if the type of the operand with signed integer type can
11501168
represent all of the values of the type of the operand with unsigned
1151-
integer type, the operand with unsigned integer type shall be converted
1169+
integer type, the operand with unsigned integer type is converted
11521170
to the type of the operand with signed integer type.
11531171

1154-
\item Otherwise, both operands shall be converted to the unsigned
1172+
\item Otherwise, both operands are converted to the unsigned
11551173
integer type corresponding to the type of the operand with signed
11561174
integer type.
11571175
\end{itemize}
@@ -4046,6 +4064,17 @@
40464064
underlying type of the enumeration\iref{conv.fpint}, and subsequently to
40474065
the enumeration type.
40484066

4067+
\pnum
4068+
A prvalue of floating-point type can be explicitly converted to
4069+
any other floating-point type.
4070+
If the source value can be exactly represented in the destination type,
4071+
the result of the conversion has that exact representation.
4072+
If the source value is between two adjacent destination values,
4073+
the result of the conversion is
4074+
an \impldef{result of inexact floating-point conversion} choice of
4075+
either of those values.
4076+
Otherwise, the behavior is undefined.
4077+
40494078
\pnum
40504079
\indextext{cast!base class}%
40514080
\indextext{cast!derived class}%
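
The revised usual arithmetic conversions in this file can be sketched as a handful of compile-time checks (not part of the commit). The function name `usual_arithmetic_conversions_hold` and the macro `SKETCH_HAVE_STDFLOAT` are hypothetical. The `std::float32_t` check assumes `float` is binary32, so the two types have equal rank and the subrank rule selects `std::float32_t`. Checks for types the implementation does not provide are skipped.

```cpp
#include <type_traits>
#if __has_include(<stdfloat>)
#  include <stdfloat>
#  define SKETCH_HAVE_STDFLOAT 1
#else
#  define SKETCH_HAVE_STDFLOAT 0
#endif

// Returns true if every check that can be compiled on this implementation holds.
constexpr bool usual_arithmetic_conversions_hold() {
    bool ok = true;
    // Standard types: double has greater rank than float.
    ok = ok && std::is_same_v<decltype(1.0f + 1.0), double>;
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT32_T__)
    // Equal rank (assumed): std::float32_t has the greater subrank.
    ok = ok && std::is_same_v<decltype(std::float32_t{} + 1.0f), std::float32_t>;
    // A non-floating-point operand is converted to the floating-point type.
    ok = ok && std::is_same_v<decltype(std::float32_t{} + 1), std::float32_t>;
#endif
#if SKETCH_HAVE_STDFLOAT && defined(__STDCPP_FLOAT16_T__)
    // Ordered, unequal ranks: the lesser-ranked operand is converted.
    ok = ok && std::is_same_v<decltype(std::float16_t{} + 1.0f), float>;
#endif
    return ok;
}

static_assert(usual_arithmetic_conversions_hold());
```

An expression mixing `std::float16_t` and `std::bfloat16_t` would fall into the final "ill-formed" bullet, since neither set of values contains the other; that case cannot be shown in a block that has to compile.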

source/intro.tex

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@
3333
For undated references, the latest edition of the referenced document
3434
(including any amendments) applies.
3535
\begin{itemize}
36+
% ISO documents in numerical order.
3637
\item ISO/IEC 2382, \doccite{Information technology --- Vocabulary}
3738
\item ISO 8601:2004, \doccite{Data elements and interchange formats ---
3839
Information interchange --- Representation of dates and times}
@@ -58,9 +59,12 @@
5859
\end{footnote}
5960
\doccite{Information technology ---
6061
Universal Multiple-Octet Coded Character Set (UCS)}
62+
\item ISO/IEC/IEEE 60559:2020, \doccite{Information technology ---
63+
Microprocessor Systems --- Floating-Point arithmetic}
6164
\item ISO 80000-2:2009, \doccite{Quantities and units ---
6265
Part 2: Mathematical signs and symbols
6366
to be used in the natural sciences and technology}
67+
% Other international standards.
6468
\item Ecma International, \doccite{ECMAScript
6569
\begin{footnote}
6670
ECMAScript\textregistered\ is a registered trademark of Ecma
