Commit 1b205b3

P1467R9 Extended floating-point types and standard names
1 parent d59a4f3 commit 1b205b3

14 files changed, 656 additions and 436 deletions

source/back.tex

Lines changed: 4 additions & 2 deletions
@@ -13,8 +13,10 @@ \chapter{Bibliography}
 \doccite{Information technology --- Language independent arithmetic ---
 Part 1: Integer and floating point arithmetic}
 \item
-ISO/IEC/IEEE 60559:2011, \doccite{Information technology ---
-Microprocessor Systems --- Floating-Point arithmetic}
+ISO/IEC TS 18661-3:2015,
+\doccite{Information Technology ---
+Programming languages, their environments, and system software interfaces ---
+Floating-point extensions for C --- Part 3: Interchange and extended types}
 % Other international standards.
 \item
 %%% Format for the following entry is based on that specified at

source/basic.tex

Lines changed: 150 additions & 7 deletions
@@ -4993,15 +4993,23 @@
 The types
 \keyword{float}, \keyword{double}, and \tcode{\keyword{long} \keyword{double}},
 and cv-qualified versions\iref{basic.type.qualifier} thereof,
+are collectively termed
+\defnx{standard floating-point types}{type!floating-point!standard}.
+An implementation may also provide additional types
+that represent floating-point values and define them (and cv-qualified versions thereof) to be
+\defnx{extended floating-point types}{type!floating-point!extended}.
+The standard and extended floating-point types
 are collectively termed \defnx{floating-point types}{type!floating-point}.
-The value
-representation of floating-point types is \impldef{value representation of
-floating-point types}.
-\indextext{floating-point type!implementation-defined}%
 \begin{note}
-This document imposes no requirements on the accuracy of
-floating-point operations; see also~\ref{support.limits}.
+Any additional implementation-specific types representing floating-point values
+that are not defined by the implementation to be extended floating-point types
+are not considered to be floating-point types, and
+this document imposes no requirements on them or
+their interactions with floating-point types.
 \end{note}
+Except as specified in \ref{basic.extended.fp},
+the object and value representations and accuracy of operations
+of floating-point types is \impldef{representation of floating-point types}.
 
 \pnum
 Integral and floating-point types are collectively
@@ -5047,6 +5055,90 @@
 same value representation, they are nevertheless different types.
 \end{note}
 
+\rSec2[basic.extended.fp]{Optional extended floating-point types}
+
+\pnum
+If the implementation supports an extended floating-point type\iref{basic.fundamental}
+whose properties are specified by
+the ISO/IEC/IEEE 60559 floating-point interchange format binary16,
+then the \grammarterm{typedef-name} \tcode{std::float16_t}
+is defined in the header \libheaderref{stdfloat} and names such a type,
+the macro \mname{STDCPP_FLOAT16_T} is defined\iref{cpp.predefined}, and
+the floating-point literal suffixes \tcode{f16} and \tcode{F16}
+are supported\iref{lex.fcon}.
+
+\pnum
+If the implementation supports an extended floating-point type
+whose properties are specified by
+the ISO/IEC/IEEE 60559 floating-point interchange format binary32,
+then the \grammarterm{typedef-name} \tcode{std::float32_t}
+is defined in the header \libheader{stdfloat} and names such a type,
+the macro \mname{STDCPP_FLOAT32_T} is defined, and
+the floating-point literal suffixes \tcode{f32} and \tcode{F32} are supported.
+
+\pnum
+If the implementation supports an extended floating-point type
+whose properties are specified by
+the ISO/IEC/IEEE 60559 floating-point interchange format binary64,
+then the \grammarterm{typedef-name} \tcode{std::float64_t}
+is defined in the header \libheader{stdfloat} and names such a type,
+the macro \mname{STDCPP_FLOAT64_T} is defined, and
+the floating-point literal suffixes \tcode{f64} and \tcode{F64} are supported.
+
+\pnum
+If the implementation supports an extended floating-point type
+whose properties are specified by
+the ISO/IEC/IEEE 60559 floating-point interchange format binary128,
+then the \grammarterm{typedef-name} \tcode{std::float128_t}
+is defined in the header \libheader{stdfloat} and names such a type,
+the macro \mname{STDCPP_FLOAT128_T} is defined, and
+the floating-point literal suffixes \tcode{f128} and \tcode{F128} are supported.
+
+\pnum
+If the implementation supports an extended floating-point type
+with the properties, as specified by ISO/IEC/IEEE 60559, of
+radix ($b$) of 2,
+storage width in bits ($k$) of 16,
+precision in bits ($p$) of 8,
+maximum exponent ($emax$) of 127, and
+exponent field width in bits ($w$) of 8, then
+the \grammarterm{typedef-name} \tcode{std::bfloat16_t}
+is defined in the header \libheader{stdfloat} and names such a type,
+the macro \mname{STDCPP_BFLOAT16_T} is defined, and
+the floating-point literal suffixes \tcode{bf16} and \tcode{BF16} are supported.
+
+\pnum
+\begin{note}
+A summary of the parameters for each type is given in \tref{basic.extended.fp}.
+The precision $p$ includes the implicit 1 bit at the beginning of the mantissa,
+so the storage used for the mantissa is $p-1$ bits.
+ISO/IEC/IEEE 60559 does not assign a name for a type
+having the parameters specified for \tcode{std::bfloat16_t}.
+\end{note}
+\begin{floattable}
+{Properties of named extended floating-point types}{basic.extended.fp}{llllll}
+\topline
+\lhdr{Parameter} & \chdr{\tcode{float16_t}} & \chdr{\tcode{float32_t}} &
+\chdr{\tcode{float64_t}} & \chdr{\tcode{float128_t}} &
+\rhdr{\tcode{bfloat16_t}} \\
+\capsep
+ISO/IEC/IEEE 60559 name & binary16 & binary32 & binary64 & binary128 & \\
+$k$, storage width in bits & 16 & 32 & 64 & 128 & 16 \\
+$p$, precision in bits & 11 & 24 & 53 & 113 & 8 \\
+$emax$, maximum exponent & 15 & 127 & 1023 & 16383 & 127 \\
+$w$, exponent field width in bits & 5 & 8 & 11 & 15 & 8 \\
+\end{floattable}
+
+\pnum
+\recommended
+Any names that the implementation provides for
+the extended floating-point types described in this subsection
+that are in addition to the names defined in the \libheader{stdfloat} header
+should be chosen to increase compatibility and interoperability
+with the interchange types
+\tcode{_Float16}, \tcode{_Float32}, \tcode{_Float64}, and \tcode{_Float128}
+defined in ISO/IEC TS 18661-3 and with future versions of the C standard.
+
 \rSec2[basic.compound]{Compound types}
 
 \pnum
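Reviewer sketch (not part of the commit): one way a program might probe for the optional types added in [basic.extended.fp], assuming a C++23 toolchain that ships the <stdfloat> header. The predefined macros are spelled __STDCPP_FLOAT16_T__ and so on, which is the rendered form of \mname{STDCPP_FLOAT16_T}. The helper names matches_table and available_types_match_table are hypothetical. Note that std::numeric_limits<T>::max_exponent is one greater than the emax used in the table, so the sketch subtracts 1 before comparing.

```cpp
#include <cassert>   // used only by the usage check shown after the block
#include <limits>
#if __has_include(<stdfloat>)
#include <stdfloat>  // C++23; declares only the types the implementation supports
#endif

// Hypothetical helper: true if T is a radix-2 type with precision p and
// maximum exponent emax, using the parameter names from the table.
template <class T>
constexpr bool matches_table(int p, int emax) {
    using lim = std::numeric_limits<T>;
    return lim::radix == 2 && lim::digits == p && lim::max_exponent - 1 == emax;
}

// Checks every named type that this implementation advertises through its
// feature macro. Types whose macro is not defined are skipped.
constexpr bool available_types_match_table() {
    bool ok = true;
#ifdef __STDCPP_FLOAT16_T__
    ok = ok && matches_table<std::float16_t>(11, 15);
#endif
#ifdef __STDCPP_FLOAT32_T__
    ok = ok && matches_table<std::float32_t>(24, 127);
#endif
#ifdef __STDCPP_FLOAT64_T__
    ok = ok && matches_table<std::float64_t>(53, 1023);
#endif
#ifdef __STDCPP_FLOAT128_T__
    ok = ok && matches_table<std::float128_t>(113, 16383);
#endif
#ifdef __STDCPP_BFLOAT16_T__
    ok = ok && matches_table<std::bfloat16_t>(8, 127);
#endif
    return ok;
}

#ifdef __STDCPP_FLOAT32_T__
// The f32 literal suffix is available whenever the macro is defined.
constexpr std::float32_t one_half = 0.5f32;
#endif
```

A caller could evaluate assert(available_types_match_table()) at run time or use the same expression in a static_assert. On an implementation that provides none of the types the function trivially returns true.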
@@ -5335,7 +5427,7 @@
 has the top-level cv-qualifier \keyword{volatile}.
 \end{example}
 
-\rSec2[conv.rank]{Integer conversion rank}%
+\rSec2[conv.rank]{Conversion ranks}%
 \indextext{conversion!integer rank}
 
 \pnum
@@ -5392,6 +5484,57 @@
 conversions\iref{expr.arith.conv}.
 \end{note}
 
+\pnum
+Every floating-point type has a \defnadj{floating-point}{conversion rank}
+defined as follows:
+\begin{itemize}
+\item
+The rank of a floating point type \tcode{T} is greater than
+the rank of any floating-point type
+whose set of values is a proper subset of the set of values of \tcode{T}.
+\item
+The rank of \tcode{\keyword{long} \keyword{double}} is greater than
+the rank of \keyword{double},
+which is greater than the rank of \keyword{float}.
+\item
+Two extended floating-point types with the same set of values have equal ranks.
+\item
+An extended floating-point type with the same set of values as
+exactly one cv-unqualified standard floating-point type
+has a rank equal to the rank of that standard floating-point type.
+\item
+An extended floating-point type with the same set of values as
+more than one cv-unqualified standard floating-point type
+has a rank equal to the rank of \keyword{double}.
+\end{itemize}
+\begin{note}
+The conversion ranks of floating-point types \tcode{T1} and \tcode{T2}
+are unordered if the set of values of \tcode{T1} is
+neither a subset nor a superset of the set of values of \tcode{T2}.
+This can happen when one type has both a larger range and a lower precision
+than the other.
+\end{note}
+
+\pnum
+Floating-point types that have equal floating-point conversion ranks
+are ordered by floating-point conversion subrank.
+The subrank forms a total order among types with equal ranks.
+The types
+\tcode{std::float16_t},
+\tcode{std::float32_t},
+\tcode{std::float64_t}, and
+\tcode{std::float128_t}\iref{stdfloat.syn}
+have a greater conversion subrank than any standard floating-point type
+with equal conversion rank.
+Otherwise, the conversion subrank order is
+\impldef{floating-point conversion subrank}.
+
+\pnum
+\begin{note}
+The floating-point conversion rank and subrank are used in
+the definition of the usual arithmetic conversions\iref{expr.arith.conv}.
+\end{note}
+
 \rSec1[basic.exec]{Program execution}
 
 \rSec2[intro.execution]{Sequential execution}
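Reviewer sketch (not part of the commit): the value-set rule behind the floating-point conversion rank can be modelled with the p and emax parameters from the table in [basic.extended.fp]. The model below is an illustration under stated assumptions, not the normative definition: it assumes radix-2 formats with subnormals whose minimum exponent is 1 - emax, as in ISO/IEC/IEEE 60559, and it ignores the special rules for the standard types and the subrank tie-break. The names fp_format, rank_order, and compare_rank are hypothetical.

```cpp
#include <cassert>  // used only by the usage check shown after the block

// Parameters of a radix-2 format, named as in the table:
// p is the precision in bits, emax the maximum exponent.
struct fp_format {
    int p;
    int emax;
};

enum class rank_order { less, equal, greater, unordered };

// Under the assumptions above, every value of format a is also a value of
// format b exactly when a has no more precision and no more exponent range.
constexpr rank_order compare_rank(fp_format a, fp_format b) {
    const bool a_in_b = a.p <= b.p && a.emax <= b.emax;
    const bool b_in_a = b.p <= a.p && b.emax <= a.emax;
    if (a_in_b && b_in_a) return rank_order::equal;
    if (a_in_b) return rank_order::less;
    if (b_in_a) return rank_order::greater;
    return rank_order::unordered;  // neither value set contains the other
}

// Rows taken from the table of named extended floating-point types.
constexpr fp_format binary16{11, 15};
constexpr fp_format binary32{24, 127};
constexpr fp_format binary64{53, 1023};
constexpr fp_format bfloat16{8, 127};
```

With these rows, compare_rank(binary16, bfloat16) yields rank_order::unordered, because binary16 has the higher precision while the bfloat16 format has the larger range. That is the situation the note in [conv.rank] describes.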

source/declarations.tex

Lines changed: 4 additions & 2 deletions
@@ -6019,8 +6019,10 @@
 \begin{itemize}
 \item from a floating-point type to an integer type, or
 
-\item from \tcode{long double} to \tcode{double} or \tcode{float}, or from
-\tcode{double} to \tcode{float}, except where the source is a constant expression and
+\item from a floating-point type \tcode{T} to another floating-point type
+whose floating-point conversion rank is neither greater than nor equal to
+that of \tcode{T},
+except where the source is a constant expression and
 the actual value after conversion
 is within the range of values that can be represented (even if it cannot be represented exactly),
 or
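Reviewer sketch (not part of the commit): how the reworded narrowing rule plays out in list-initialization. The extended-type lines assume a C++23 implementation on which float and std::float32_t have equal conversion rank (both binary32); that is an assumption about the platform, not something the diff guarantees. The function name list_init_examples is hypothetical, and the ill-formed cases are left as comments so the block still compiles.

```cpp
#include <cassert>  // used only by the usage check shown after the block
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

inline double list_init_examples() {
    float f = 0.25f;
    double d{f};    // OK: the rank of double is greater than the rank of float
    float g{0.5};   // OK: constant expression whose value is within range
    // float h{d};  // ill-formed: narrowing, and d is not a constant expression
    double sum = d + g;
#if defined(__STDCPP_FLOAT32_T__) && defined(__STDCPP_FLOAT64_T__)
    std::float32_t x{f};    // OK under the equal-rank assumption stated above
    std::float64_t y{x};    // OK: destination has the greater rank
    // std::float32_t z{y}; // ill-formed: destination rank is lower
    sum += static_cast<double>(y);
#endif
    return sum;
}
```

All the values involved are exactly representable, so the function returns 0.75 without the extended types and 1.0 when they are available.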

source/expressions.tex

Lines changed: 46 additions & 17 deletions
@@ -927,7 +927,13 @@
 \pnum
 \indextext{conversion!floating-point}%
 A prvalue of floating-point type can be converted to a prvalue of
-another floating-point type. If the source value can be exactly
+another floating-point type
+with a greater or equal conversion rank\iref{conv.rank}.
+A prvalue of standard floating-point type can be converted to
+a prvalue of another standard floating-point type.
+
+\pnum
+If the source value can be exactly
 represented in the destination type, the result of the conversion is
 that exact representation. If the source value is between two adjacent
 destination values, the result of the conversion is an
@@ -1112,24 +1118,36 @@
 \item If either operand is of scoped enumeration type\iref{dcl.enum}, no conversions
 are performed; if the other operand does not have the same type, the expression is
 ill-formed.
-
-\item If either operand is of type \tcode{\keyword{long} \keyword{double}}, the
-other shall be converted to \tcode{\keyword{long} \keyword{double}}.
-
-\item Otherwise, if either operand is \keyword{double}, the other shall be
-converted to \keyword{double}.
-
-\item Otherwise, if either operand is \keyword{float}, the other shall be
-converted to \keyword{float}.
-
-\item Otherwise, the integral promotions\iref{conv.prom} shall be
+\item Otherwise, if either operand is of floating-point type,
+the following rules are applied:
+\begin{itemize}
+\item
+If both operands have the same type, no further conversion is needed.
+\item
+Otherwise, if one of the operands is of a non-floating-point type,
+that operand is converted to the type of
+the operand with the floating-point type.
+\item
+Otherwise, if the floating-point conversion ranks\iref{conv.rank} of
+the types of the operands are ordered but not equal,
+then the operand of the type with the lesser floating-point conversion rank
+is converted to the type of the other operand.
+\item
+Otherwise, if the floating-point conversion ranks of the types of
+the operands are equal,
+then the operand with the lesser floating-point conversion subrank\iref{conv.rank}
+is converted to the type of the other operand.
+\item
+Otherwise, the expression is ill-formed.
+\end{itemize}
+\item Otherwise, the integral promotions\iref{conv.prom} are
 performed on both operands.
 \begin{footnote}
 As a consequence, operands of type \keyword{bool}, \keyword{char8_t}, \keyword{char16_t},
 \keyword{char32_t}, \keyword{wchar_t}, or an enumerated type are converted
 to some integral type.
 \end{footnote}
-Then the following rules shall be applied to the promoted operands:
+Then the following rules are applied to the promoted operands:
 
 \begin{itemize}
 
@@ -1138,20 +1156,20 @@
 
 \item Otherwise, if both operands have signed integer types or both have
 unsigned integer types, the operand with the type of lesser integer
-conversion rank shall be converted to the type of the operand with
+conversion rank is converted to the type of the operand with
 greater rank.
 
 \item Otherwise, if the operand that has unsigned integer type has rank
 greater than or equal to the rank of the type of the other operand, the
-operand with signed integer type shall be converted to the type of the
+operand with signed integer type is converted to the type of the
 operand with unsigned integer type.
 
 \item Otherwise, if the type of the operand with signed integer type can
 represent all of the values of the type of the operand with unsigned
-integer type, the operand with unsigned integer type shall be converted
+integer type, the operand with unsigned integer type is converted
 to the type of the operand with signed integer type.
 
-\item Otherwise, both operands shall be converted to the unsigned
+\item Otherwise, both operands are converted to the unsigned
 integer type corresponding to the type of the operand with signed
 integer type.
 \end{itemize}
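Reviewer sketch (not part of the commit): the result types that the new floating-point bullets of the usual arithmetic conversions produce for a few operand pairs. The extended-type checks assume a C++23 implementation where float is binary32 and double is binary64, so that float and std::float32_t have equal rank and std::float32_t wins on subrank. The alias arith_result_t and the function conversions_behave_as_described are hypothetical names.

```cpp
#include <cassert>  // used only by the usage check shown after the block
#include <type_traits>
#include <utility>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// Type of a + b after the usual arithmetic conversions.
template <class A, class B>
using arith_result_t = decltype(std::declval<A>() + std::declval<B>());

constexpr bool conversions_behave_as_described() {
#if defined(__STDCPP_FLOAT32_T__) && defined(__STDCPP_FLOAT64_T__)
    return
        // A non-floating-point operand is converted to the floating-point type.
        std::is_same<arith_result_t<int, std::float32_t>, std::float32_t>::value
        // Ordered, unequal ranks: the lesser rank is converted to the greater.
        && std::is_same<arith_result_t<float, std::float64_t>, std::float64_t>::value
        && std::is_same<arith_result_t<std::float32_t, double>, double>::value
        // Equal ranks: the lesser subrank is converted to the greater.
        && std::is_same<arith_result_t<float, std::float32_t>, std::float32_t>::value;
    // Unordered ranks, for example std::float16_t + std::bfloat16_t,
    // make the expression ill-formed, so they cannot be checked here.
#else
    // Fallback when the extended types are unavailable: standard types only.
    return std::is_same<arith_result_t<float, double>, double>::value
        && std::is_same<arith_result_t<int, float>, float>::value;
#endif
}
```

The function is constexpr, so conversions_behave_as_described() can be used in a static_assert as well as in a run-time assert.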
@@ -4020,6 +4038,17 @@
 underlying type of the enumeration\iref{conv.fpint}, and subsequently to
 the enumeration type.
 
+\pnum
+A prvalue of floating-point type can be explicitly converted to
+any other floating-point type.
+If the source value can be exactly represented in the destination type,
+the result of the conversion has that exact representation.
+If the source value is between two adjacent destination values,
+the result of the conversion is
+an \impldef{result of inexact floating-point conversion} choice of
+either of those values.
+Otherwise, the behavior is undefined.
+
 \pnum
 \indextext{cast!base class}%
 \indextext{cast!derived class}%
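Reviewer sketch (not part of the commit): the new static_cast paragraph permits an explicit conversion between any two floating-point types, including one that lowers the conversion rank, and guarantees an exact result when the source value is representable in the destination. The helper names convert_explicitly and exact_values_survive are hypothetical, and the extended-type branch assumes a C++23 implementation that defines the corresponding feature macros.

```cpp
#include <cassert>  // used only by the usage check shown after the block
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// Explicit conversion between floating-point types, whatever their ranks.
template <class To, class From>
constexpr To convert_explicitly(From value) {
    return static_cast<To>(value);
}

// 0.5 is exactly representable in every binary format discussed here,
// so the converted value must compare equal to the destination literal.
constexpr bool exact_values_survive() {
#if defined(__STDCPP_FLOAT32_T__) && defined(__STDCPP_FLOAT64_T__)
    return convert_explicitly<std::float32_t>(0.5f64) == 0.5f32;
#else
    return convert_explicitly<float>(0.5) == 0.5f;
#endif
}
```

For a source value that lies between two adjacent destination values, the paragraph leaves the choice between them implementation-defined, so a portable check can only bound the result rather than name it.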

source/intro.tex

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,7 @@
 For undated references, the latest edition of the referenced document
 (including any amendments) applies.
 \begin{itemize}
+% ISO documents in numerical order.
 \item ISO/IEC 2382, \doccite{Information technology --- Vocabulary}
 \item ISO 8601:2004, \doccite{Data elements and interchange formats ---
 Information interchange --- Representation of dates and times}
@@ -58,9 +59,12 @@
 \end{footnote}
 \doccite{Information technology ---
 Universal Multiple-Octet Coded Character Set (UCS)}
+\item ISO/IEC/IEEE 60559:2020, \doccite{Information technology ---
+Microprocessor Systems --- Floating-Point arithmetic}
 \item ISO 80000-2:2009, \doccite{Quantities and units ---
 Part 2: Mathematical signs and symbols
 to be used in the natural sciences and technology}
+% Other international standards.
 \item Ecma International, \doccite{ECMAScript
 \begin{footnote}
 ECMAScript\textregistered\ is a registered trademark of Ecma
