Floating-point

Overview#

Floating-point is a Data representation that approximates a real number as a significand scaled by an exponent of some base; for example, -6.25 is stored as -1.5625 × 2².

For more on Floating-point, Ldapwiki refers you to Wikipedia: Floating-point_arithmetic#Floating-point_numbers

IEEE standardized the computer representation for binary Floating-point numbers in IEEE 754 (a.k.a. IEC 60559) in 1985; the standard was revised in 2008 and is used by almost all modern computers. IBM mainframes support IBM's own hexadecimal Floating-point format and IEEE 754-2008 decimal Floating-point in addition to the IEEE 754 binary format. The Cray T90 series had an IEEE version, but the SV1 still uses the Cray Floating-point format.
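
As a concrete illustration of the binary32 layout defined by IEEE 754 (1 sign bit, 8 exponent bits biased by 127, 23 explicit significand bits), the following Python sketch unpacks those fields using only the standard library; the function name binary32_fields is illustrative, not part of any standard API.

{{{
import struct

def binary32_fields(x: float):
    """Return the (sign, biased exponent, significand) bit fields of x as binary32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                    # 1 bit
    exponent = (bits >> 23) & 0xFF       # 8 bits, biased by 127
    significand = bits & 0x7FFFFF        # 23 explicit bits, implicit leading 1
    return sign, exponent, significand

sign, exp, frac = binary32_fields(-6.25)
# -6.25 = -1.5625 * 2**2, so the unbiased exponent is 2
print(sign, exp - 127, 1 + frac / 2**23)   # prints: 1 2 1.5625
}}}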

Bfloat16 (Brain Floating Point) format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a truncated (16-bit) version of the 32-bit IEEE 754 single-precision Floating-point format (binary32), with the intent of accelerating Machine Learning and near-sensor computing. Bfloat16 preserves the approximate dynamic range of 32-bit floating-point numbers by retaining all 8 exponent bits, but keeps only 8 bits of significand precision (7 explicit bits plus the implicit leading bit) rather than the 24-bit significand of the binary32 format. Bfloat16 numbers are even less suitable for integer calculations than single-precision 32-bit floating-point numbers.
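
To make the truncation relationship concrete, here is a minimal Python sketch that converts binary32 to bfloat16 by keeping only the top 16 bits; the helper names to_bfloat16_bits and from_bfloat16_bits are illustrative. Plain truncation corresponds to round-toward-zero, whereas real hardware typically rounds to nearest-even.

{{{
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a binary32 value to its upper 16 bits (bfloat16, round-toward-zero)."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16   # keeps sign, all 8 exponent bits, top 7 fraction bits

def from_bfloat16_bits(bits16: int) -> float:
    """Widen a bfloat16 bit pattern back to binary32 by zero-filling the low bits."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

# Dynamic range survives: all 8 exponent bits are kept.
print(from_bfloat16_bits(to_bfloat16_bits(3.0e38)))    # ~2.99e38
# Precision drops to roughly 2-3 decimal digits: only 7 fraction bits remain.
print(from_bfloat16_bits(to_bfloat16_bits(1.234567)))  # 1.234375
}}}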

Bfloat16 was originally developed by Google and implemented in its third-generation Tensor Processing Unit (TPU).

More Information#

There might be more information for this subject on one of the following: