
Commit e26d26e

Merge 2022-07 CWG Motion 14
P1467R9 Extended floating-point types and standard names
2 parents 63a1dd7 + 9aa76a2 commit e26d26e

15 files changed: +659 -526 lines changed

source/back.tex

Lines changed: 4 additions & 2 deletions
@@ -13,8 +13,10 @@ \chapter{Bibliography}
1313
\doccite{Information technology --- Language independent arithmetic ---
1414
Part 1: Integer and floating point arithmetic}
1515
\item
16-
ISO/IEC/IEEE 60559:2011, \doccite{Information technology ---
17-
Microprocessor Systems --- Floating-Point arithmetic}
16+
ISO/IEC TS 18661-3:2015,
17+
\doccite{Information Technology ---
18+
Programming languages, their environments, and system software interfaces ---
19+
Floating-point extensions for C --- Part 3: Interchange and extended types}
1820
% Other international standards.
1921
\item
2022
%%% Format for the following entry is based on that specified at

source/basic.tex

Lines changed: 150 additions & 7 deletions
@@ -4995,15 +4995,23 @@
49954995
The types
49964996
\keyword{float}, \keyword{double}, and \tcode{\keyword{long} \keyword{double}},
49974997
and cv-qualified versions\iref{basic.type.qualifier} thereof,
4998+
are collectively termed
4999+
\defnx{standard floating-point types}{type!floating-point!standard}.
5000+
An implementation may also provide additional types
5001+
that represent floating-point values and define them (and cv-qualified versions thereof) to be
5002+
\defnx{extended floating-point types}{type!floating-point!extended}.
5003+
The standard and extended floating-point types
49985004
are collectively termed \defnx{floating-point types}{type!floating-point}.
4999-
The value
5000-
representation of floating-point types is \impldef{value representation of
5001-
floating-point types}.
5002-
\indextext{floating-point type!implementation-defined}%
50035005
\begin{note}
5004-
This document imposes no requirements on the accuracy of
5005-
floating-point operations; see also~\ref{support.limits}.
5006+
Any additional implementation-specific types representing floating-point values
5007+
that are not defined by the implementation to be extended floating-point types
5008+
are not considered to be floating-point types, and
5009+
this document imposes no requirements on them or
5010+
their interactions with floating-point types.
50065011
\end{note}
5012+
Except as specified in \ref{basic.extended.fp},
5013+
the object and value representations and accuracy of operations
5014+
of floating-point types are \impldef{representation of floating-point types}.
50075015

50085016
\pnum
50095017
Integral and floating-point types are collectively
@@ -5049,6 +5057,90 @@
50495057
same value representation, they are nevertheless different types.
50505058
\end{note}
50515059

5060+
\rSec2[basic.extended.fp]{Optional extended floating-point types}
5061+
5062+
\pnum
5063+
If the implementation supports an extended floating-point type\iref{basic.fundamental}
5064+
whose properties are specified by
5065+
the ISO/IEC/IEEE 60559 floating-point interchange format binary16,
5066+
then the \grammarterm{typedef-name} \tcode{std::float16_t}
5067+
is defined in the header \libheaderref{stdfloat} and names such a type,
5068+
the macro \mname{STDCPP_FLOAT16_T} is defined\iref{cpp.predefined}, and
5069+
the floating-point literal suffixes \tcode{f16} and \tcode{F16}
5070+
are supported\iref{lex.fcon}.
5071+
5072+
\pnum
5073+
If the implementation supports an extended floating-point type
5074+
whose properties are specified by
5075+
the ISO/IEC/IEEE 60559 floating-point interchange format binary32,
5076+
then the \grammarterm{typedef-name} \tcode{std::float32_t}
5077+
is defined in the header \libheader{stdfloat} and names such a type,
5078+
the macro \mname{STDCPP_FLOAT32_T} is defined, and
5079+
the floating-point literal suffixes \tcode{f32} and \tcode{F32} are supported.
5080+
5081+
\pnum
5082+
If the implementation supports an extended floating-point type
5083+
whose properties are specified by
5084+
the ISO/IEC/IEEE 60559 floating-point interchange format binary64,
5085+
then the \grammarterm{typedef-name} \tcode{std::float64_t}
5086+
is defined in the header \libheader{stdfloat} and names such a type,
5087+
the macro \mname{STDCPP_FLOAT64_T} is defined, and
5088+
the floating-point literal suffixes \tcode{f64} and \tcode{F64} are supported.
5089+
5090+
\pnum
5091+
If the implementation supports an extended floating-point type
5092+
whose properties are specified by
5093+
the ISO/IEC/IEEE 60559 floating-point interchange format binary128,
5094+
then the \grammarterm{typedef-name} \tcode{std::float128_t}
5095+
is defined in the header \libheader{stdfloat} and names such a type,
5096+
the macro \mname{STDCPP_FLOAT128_T} is defined, and
5097+
the floating-point literal suffixes \tcode{f128} and \tcode{F128} are supported.
5098+
5099+
\pnum
5100+
If the implementation supports an extended floating-point type
5101+
with the properties, as specified by ISO/IEC/IEEE 60559, of
5102+
radix ($b$) of 2,
5103+
storage width in bits ($k$) of 16,
5104+
precision in bits ($p$) of 8,
5105+
maximum exponent ($emax$) of 127, and
5106+
exponent field width in bits ($w$) of 8, then
5107+
the \grammarterm{typedef-name} \tcode{std::bfloat16_t}
5108+
is defined in the header \libheader{stdfloat} and names such a type,
5109+
the macro \mname{STDCPP_BFLOAT16_T} is defined, and
5110+
the floating-point literal suffixes \tcode{bf16} and \tcode{BF16} are supported.
5111+
5112+
\pnum
5113+
\begin{note}
5114+
A summary of the parameters for each type is given in \tref{basic.extended.fp}.
5115+
The precision $p$ includes the implicit 1 bit at the beginning of the mantissa,
5116+
so the storage used for the mantissa is $p-1$ bits.
5117+
ISO/IEC/IEEE 60559 does not assign a name for a type
5118+
having the parameters specified for \tcode{std::bfloat16_t}.
5119+
\end{note}
5120+
\begin{floattable}
5121+
{Properties of named extended floating-point types}{basic.extended.fp}{llllll}
5122+
\topline
5123+
\lhdr{Parameter} & \chdr{\tcode{float16_t}} & \chdr{\tcode{float32_t}} &
5124+
\chdr{\tcode{float64_t}} & \chdr{\tcode{float128_t}} &
5125+
\rhdr{\tcode{bfloat16_t}} \\
5126+
\capsep
5127+
ISO/IEC/IEEE 60559 name & binary16 & binary32 & binary64 & binary128 & \\
5128+
$k$, storage width in bits & 16 & 32 & 64 & 128 & 16 \\
5129+
$p$, precision in bits & 11 & 24 & 53 & 113 & 8 \\
5130+
$emax$, maximum exponent & 15 & 127 & 1023 & 16383 & 127 \\
5131+
$w$, exponent field width in bits & 5 & 8 & 11 & 15 & 8 \\
5132+
\end{floattable}
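
The rows of the table above are related to each other: in the ISO/IEC/IEEE 60559 interchange encodings, one sign bit, $w$ exponent bits, and $p-1$ stored significand bits fill the $k$-bit storage width, and $emax$ equals $2^{w-1}-1$. The following sketch checks those two relations for every column. The struct, constant, and function names are illustrative only and are not part of the wording.

```cpp
#include <cassert>

// Parameters as named in the table: k (storage width in bits),
// p (precision in bits), emax (maximum exponent), w (exponent field width).
struct FormatParams {
    int k;
    int p;
    int emax;
    int w;
};

// One constant per table column; the names are hypothetical.
constexpr FormatParams kBinary16 {16, 11, 15, 5};
constexpr FormatParams kBinary32 {32, 24, 127, 8};
constexpr FormatParams kBinary64 {64, 53, 1023, 11};
constexpr FormatParams kBinary128{128, 113, 16383, 15};
constexpr FormatParams kBfloat16 {16, 8, 127, 8};

// True if 1 sign bit + w exponent bits + (p - 1) stored significand bits
// add up to k, and emax equals 2^(w-1) - 1.
constexpr bool is_consistent(FormatParams f) {
    return f.k == 1 + f.w + (f.p - 1) && f.emax == (1 << (f.w - 1)) - 1;
}

static_assert(is_consistent(kBinary16), "binary16 row");
static_assert(is_consistent(kBinary32), "binary32 row");
static_assert(is_consistent(kBinary64), "binary64 row");
static_assert(is_consistent(kBinary128), "binary128 row");
static_assert(is_consistent(kBfloat16), "bfloat16 row");
```

The \tcode{bfloat16_t} column passes the same checks as the named interchange formats, which is why it can be described with the same parameter set even though ISO/IEC/IEEE 60559 gives it no name.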
5133+
5134+
\pnum
5135+
\recommended
5136+
Any names that the implementation provides for
5137+
the extended floating-point types described in this subsection
5138+
that are in addition to the names defined in the \libheader{stdfloat} header
5139+
should be chosen to increase compatibility and interoperability
5140+
with the interchange types
5141+
\tcode{_Float16}, \tcode{_Float32}, \tcode{_Float64}, and \tcode{_Float128}
5142+
defined in ISO/IEC TS 18661-3 and with future versions of the C standard.
5143+
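
A minimal sketch of how a program might probe for the optional types added in this subsection. It assumes that \tcode{\mname{STDCPP_FLOAT16_T}} and its siblings are spelled \tcode{__STDCPP_FLOAT16_T__} and so on in source code, and it guards the include so that it also compiles where the \tcode{<stdfloat>} header is absent. The function name is hypothetical.

```cpp
#include <cassert>
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Counts how many of the five optional typedef-names the implementation
// announces through the predefined macros.
constexpr int supported_extended_fp_types() {
    return 0
#ifdef __STDCPP_FLOAT16_T__
        + 1
#endif
#ifdef __STDCPP_FLOAT32_T__
        + 1
#endif
#ifdef __STDCPP_FLOAT64_T__
        + 1
#endif
#ifdef __STDCPP_FLOAT128_T__
        + 1
#endif
#ifdef __STDCPP_BFLOAT16_T__
        + 1
#endif
        ;
}

#ifdef __STDCPP_FLOAT32_T__
// When the macro is defined, the typedef-name and the f32/F32 literal
// suffixes are available together.
constexpr std::float32_t kHalf = 0.5f32;
static_assert(kHalf == 0.5F32, "both suffix spellings name the same type");
#endif
```

On an implementation that provides none of the types the function returns 0 and the guarded block is skipped, so the same source can be used unchanged on older toolchains.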
50525144
\rSec2[basic.compound]{Compound types}
50535145

50545146
\pnum
@@ -5337,7 +5429,7 @@
53375429
has the top-level cv-qualifier \keyword{volatile}.
53385430
\end{example}
53395431

5340-
\rSec2[conv.rank]{Integer conversion rank}%
5432+
\rSec2[conv.rank]{Conversion ranks}%
53415433
\indextext{conversion!integer rank}
53425434

53435435
\pnum
@@ -5394,6 +5486,57 @@
53945486
conversions\iref{expr.arith.conv}.
53955487
\end{note}
53965488

5489+
\pnum
5490+
Every floating-point type has a \defnadj{floating-point}{conversion rank}
5491+
defined as follows:
5492+
\begin{itemize}
5493+
\item
5494+
The rank of a floating-point type \tcode{T} is greater than
5495+
the rank of any floating-point type
5496+
whose set of values is a proper subset of the set of values of \tcode{T}.
5497+
\item
5498+
The rank of \tcode{\keyword{long} \keyword{double}} is greater than
5499+
the rank of \keyword{double},
5500+
which is greater than the rank of \keyword{float}.
5501+
\item
5502+
Two extended floating-point types with the same set of values have equal ranks.
5503+
\item
5504+
An extended floating-point type with the same set of values as
5505+
exactly one cv-unqualified standard floating-point type
5506+
has a rank equal to the rank of that standard floating-point type.
5507+
\item
5508+
An extended floating-point type with the same set of values as
5509+
more than one cv-unqualified standard floating-point type
5510+
has a rank equal to the rank of \keyword{double}.
5511+
\end{itemize}
5512+
\begin{note}
5513+
The conversion ranks of floating-point types \tcode{T1} and \tcode{T2}
5514+
are unordered if the set of values of \tcode{T1} is
5515+
neither a subset nor a superset of the set of values of \tcode{T2}.
5516+
This can happen when one type has both a larger range and a lower precision
5517+
than the other.
5518+
\end{note}
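
The subset rule and the note on unordered ranks can be illustrated with a small model. The sketch below assumes radix-2 formats in the ISO/IEC/IEEE 60559 style, where the minimum exponent is $1 - emax$; under that assumption the values of one format are contained in another when both its precision and its maximum exponent are no larger. It models only the subset-based ordering and ignores the rules specific to standard types and the subrank. All names are hypothetical.

```cpp
#include <cassert>

// A radix-2 format described by precision p and maximum exponent emax.
struct Format {
    int p;
    int emax;
};

enum class RankOrder { less, equal, greater, unordered };

// Assumption: emin == 1 - emax, as in ISO/IEC/IEEE 60559 formats, so
// containment of value sets reduces to comparing p and emax.
constexpr bool values_subset(Format a, Format b) {
    return a.p <= b.p && a.emax <= b.emax;
}

// Orders two formats by set inclusion of their values.
constexpr RankOrder compare_rank(Format a, Format b) {
    return values_subset(a, b) && values_subset(b, a) ? RankOrder::equal
         : values_subset(a, b)                        ? RankOrder::less
         : values_subset(b, a)                        ? RankOrder::greater
                                                      : RankOrder::unordered;
}

// Parameters taken from the table in [basic.extended.fp].
constexpr Format kFloat16 {11, 15};
constexpr Format kBfloat16{8, 127};
constexpr Format kFloat32 {24, 127};
constexpr Format kFloat64 {53, 1023};

// float16_t has more precision, bfloat16_t has more range: unordered.
static_assert(compare_rank(kFloat16, kBfloat16) == RankOrder::unordered, "");
static_assert(compare_rank(kBfloat16, kFloat32) == RankOrder::less, "");
static_assert(compare_rank(kFloat64, kFloat32) == RankOrder::greater, "");
```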
5519+
5520+
\pnum
5521+
Floating-point types that have equal floating-point conversion ranks
5522+
are ordered by floating-point conversion subrank.
5523+
The subrank forms a total order among types with equal ranks.
5524+
The types
5525+
\tcode{std::float16_t},
5526+
\tcode{std::float32_t},
5527+
\tcode{std::float64_t}, and
5528+
\tcode{std::float128_t}\iref{stdfloat.syn}
5529+
have a greater conversion subrank than any standard floating-point type
5530+
with equal conversion rank.
5531+
Otherwise, the conversion subrank order is
5532+
\impldef{floating-point conversion subrank}.
5533+
5534+
\pnum
5535+
\begin{note}
5536+
The floating-point conversion rank and subrank are used in
5537+
the definition of the usual arithmetic conversions\iref{expr.arith.conv}.
5538+
\end{note}
5539+
53975540
\rSec1[basic.exec]{Program execution}
53985541

53995542
\rSec2[intro.execution]{Sequential execution}

source/declarations.tex

Lines changed: 4 additions & 2 deletions
@@ -5948,8 +5948,10 @@
59485948
\begin{itemize}
59495949
\item from a floating-point type to an integer type, or
59505950

5951-
\item from \tcode{long double} to \tcode{double} or \tcode{float}, or from
5952-
\tcode{double} to \tcode{float}, except where the source is a constant expression and
5951+
\item from a floating-point type \tcode{T} to another floating-point type
5952+
whose floating-point conversion rank is neither greater than nor equal to
5953+
that of \tcode{T},
5954+
except where the source is a constant expression and
59535955
the actual value after conversion
59545956
is within the range of values that can be represented (even if it cannot be represented exactly),
59555957
or
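
For the standard types the reworded bullet behaves as before, since the rank of \tcode{float} is neither greater than nor equal to that of \tcode{double}. The sketch below shows the constant-expression exception and, in a comment, the ill-formed case. The variable and function names are illustrative only.

```cpp
#include <cassert>

// Well-formed: the source is a constant expression and its value after
// conversion is within the range of float, even though 0.1 cannot be
// represented exactly in either type.
constexpr double kTenth = 0.1;
constexpr float kNarrowedConstant{kTenth};

inline float to_float(double d) {
    // float f{d};  // ill-formed: narrowing, because d is not a constant
    //              // expression and float's rank is lower than double's
    return static_cast<float>(d);  // an explicit conversion is permitted
}
```

The same reasoning would apply to a pair such as \tcode{std::float64_t} and \tcode{std::float32_t} on implementations that provide them, because the rule is now stated in terms of conversion rank rather than a fixed list of types.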

source/expressions.tex

Lines changed: 46 additions & 17 deletions
@@ -929,7 +929,13 @@
929929
\pnum
930930
\indextext{conversion!floating-point}%
931931
A prvalue of floating-point type can be converted to a prvalue of
932-
another floating-point type. If the source value can be exactly
932+
another floating-point type
933+
with a greater or equal conversion rank\iref{conv.rank}.
934+
A prvalue of standard floating-point type can be converted to
935+
a prvalue of another standard floating-point type.
936+
937+
\pnum
938+
If the source value can be exactly
933939
represented in the destination type, the result of the conversion is
934940
that exact representation. If the source value is between two adjacent
935941
destination values, the result of the conversion is an
@@ -1114,24 +1120,36 @@
11141120
\item If either operand is of scoped enumeration type\iref{dcl.enum}, no conversions
11151121
are performed; if the other operand does not have the same type, the expression is
11161122
ill-formed.
1117-
1118-
\item If either operand is of type \tcode{\keyword{long} \keyword{double}}, the
1119-
other shall be converted to \tcode{\keyword{long} \keyword{double}}.
1120-
1121-
\item Otherwise, if either operand is \keyword{double}, the other shall be
1122-
converted to \keyword{double}.
1123-
1124-
\item Otherwise, if either operand is \keyword{float}, the other shall be
1125-
converted to \keyword{float}.
1126-
1127-
\item Otherwise, the integral promotions\iref{conv.prom} shall be
1123+
\item Otherwise, if either operand is of floating-point type,
1124+
the following rules are applied:
1125+
\begin{itemize}
1126+
\item
1127+
If both operands have the same type, no further conversion is needed.
1128+
\item
1129+
Otherwise, if one of the operands is of a non-floating-point type,
1130+
that operand is converted to the type of
1131+
the operand with the floating-point type.
1132+
\item
1133+
Otherwise, if the floating-point conversion ranks\iref{conv.rank} of
1134+
the types of the operands are ordered but not equal,
1135+
then the operand of the type with the lesser floating-point conversion rank
1136+
is converted to the type of the other operand.
1137+
\item
1138+
Otherwise, if the floating-point conversion ranks of the types of
1139+
the operands are equal,
1140+
then the operand with the lesser floating-point conversion subrank\iref{conv.rank}
1141+
is converted to the type of the other operand.
1142+
\item
1143+
Otherwise, the expression is ill-formed.
1144+
\end{itemize}
1145+
\item Otherwise, the integral promotions\iref{conv.prom} are
11281146
performed on both operands.
11291147
\begin{footnote}
11301148
As a consequence, operands of type \keyword{bool}, \keyword{char8_t}, \keyword{char16_t},
11311149
\keyword{char32_t}, \keyword{wchar_t}, or an enumerated type are converted
11321150
to some integral type.
11331151
\end{footnote}
1134-
Then the following rules shall be applied to the promoted operands:
1152+
Then the following rules are applied to the promoted operands:
11351153

11361154
\begin{itemize}
11371155

@@ -1140,20 +1158,20 @@
11401158

11411159
\item Otherwise, if both operands have signed integer types or both have
11421160
unsigned integer types, the operand with the type of lesser integer
1143-
conversion rank shall be converted to the type of the operand with
1161+
conversion rank is converted to the type of the operand with
11441162
greater rank.
11451163

11461164
\item Otherwise, if the operand that has unsigned integer type has rank
11471165
greater than or equal to the rank of the type of the other operand, the
1148-
operand with signed integer type shall be converted to the type of the
1166+
operand with signed integer type is converted to the type of the
11491167
operand with unsigned integer type.
11501168

11511169
\item Otherwise, if the type of the operand with signed integer type can
11521170
represent all of the values of the type of the operand with unsigned
1153-
integer type, the operand with unsigned integer type shall be converted
1171+
integer type, the operand with unsigned integer type is converted
11541172
to the type of the operand with signed integer type.
11551173

1156-
\item Otherwise, both operands shall be converted to the unsigned
1174+
\item Otherwise, both operands are converted to the unsigned
11571175
integer type corresponding to the type of the operand with signed
11581176
integer type.
11591177
\end{itemize}
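
The floating-point rules in the rewritten usual arithmetic conversions can be observed through \tcode{decltype}. The sketch below checks the standard types unconditionally and the extended types only when the corresponding predefined macros are present. It assumes that \tcode{float} has the same set of values as binary32 on any implementation that defines \tcode{__STDCPP_FLOAT32_T__}, which this wording does not guarantee. The alias name \tcode{sum_t} is hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#  endif
#endif

// Type of A + B after the usual arithmetic conversions.
template <class A, class B>
using sum_t = decltype(std::declval<A>() + std::declval<B>());

// A non-floating-point operand is converted to the floating-point type.
static_assert(std::is_same<sum_t<int, float>, float>::value, "");

// Ordered, unequal ranks: the operand of lesser rank is converted.
static_assert(std::is_same<sum_t<float, double>, double>::value, "");
static_assert(std::is_same<sum_t<long double, double>, long double>::value, "");

#ifdef __STDCPP_FLOAT32_T__
// Equal ranks (assuming float is binary32): std::float32_t has the greater
// subrank, so the float operand is converted.
static_assert(std::is_same<sum_t<float, std::float32_t>, std::float32_t>::value, "");
#endif

#if defined(__STDCPP_FLOAT16_T__) && defined(__STDCPP_BFLOAT16_T__)
// sum_t<std::float16_t, std::bfloat16_t> would be ill-formed: the ranks of
// the two types are unordered, so the last rule in the list applies.
#endif
```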
@@ -4096,6 +4114,17 @@
40964114
underlying type of the enumeration\iref{conv.fpint}, and subsequently to
40974115
the enumeration type.
40984116

4117+
\pnum
4118+
A prvalue of floating-point type can be explicitly converted to
4119+
any other floating-point type.
4120+
If the source value can be exactly represented in the destination type,
4121+
the result of the conversion has that exact representation.
4122+
If the source value is between two adjacent destination values,
4123+
the result of the conversion is
4124+
an \impldef{result of inexact floating-point conversion} choice of
4125+
either of those values.
4126+
Otherwise, the behavior is undefined.
4127+
40994128
\pnum
41004129
\indextext{cast!base class}%
41014130
\indextext{cast!derived class}%
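
The new \tcode{static_cast} paragraph in the hunk above allows an explicit conversion between any two floating-point types and constrains the result to the exact value or one of its two neighbours. A minimal sketch for \tcode{double} to \tcode{float} follows; the helper name is hypothetical, and it relies on every \tcode{float} value being representable as a \tcode{double}.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True if `result` equals `source` exactly, or if `source` lies strictly
// between `result` and the neighbouring float on the other side of it.
inline bool is_exact_or_adjacent(double source, float result) {
    const double r = result;  // float to double preserves the value
    if (r == source) return true;
    if (r < source) {
        const float up =
            std::nextafter(result, std::numeric_limits<float>::infinity());
        return source < static_cast<double>(up);
    }
    const float down =
        std::nextafter(result, -std::numeric_limits<float>::infinity());
    return static_cast<double>(down) < source;
}
```

For example, \tcode{is_exact_or_adjacent(0.1, static_cast<float>(0.1))} is expected to hold whichever of the two adjacent values the implementation chooses, while a source value outside the finite range of the destination falls under the final sentence of the paragraph and is not covered by this helper.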

source/intro.tex

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,7 @@
3333
For undated references, the latest edition of the referenced document
3434
(including any amendments) applies.
3535
\begin{itemize}
36+
% ISO documents in numerical order.
3637
\item ISO/IEC 2382, \doccite{Information technology --- Vocabulary}
3738
\item ISO 8601:2004, \doccite{Data elements and interchange formats ---
3839
Information interchange --- Representation of dates and times}
@@ -58,9 +59,12 @@
5859
\end{footnote}
5960
\doccite{Information technology ---
6061
Universal Multiple-Octet Coded Character Set (UCS)}
62+
\item ISO/IEC/IEEE 60559:2020, \doccite{Information technology ---
63+
Microprocessor Systems --- Floating-Point arithmetic}
6164
\item ISO 80000-2:2009, \doccite{Quantities and units ---
6265
Part 2: Mathematical signs and symbols
6366
to be used in the natural sciences and technology}
67+
% Other international standards.
6468
\item Ecma International, \doccite{ECMAScript
6569
\begin{footnote}
6670
ECMAScript\textregistered\ is a registered trademark of Ecma
