## How to get the sign, mantissa and exponent of a floating point number


I have a program that runs on two processors, one of which does not have floating point support. So I need to perform floating point calculations using fixed point on that processor. For that purpose, I will be using a floating point emulation library.

I first need to extract the signs, mantissas and exponents of floating point numbers on the processor that does support floating point. So my question is: how can I get the sign, mantissa and exponent of a single precision floating point number?

Following the format from this figure,

This is what I've done so far, but apart from the sign, neither the mantissa nor the exponent is correct. I think I'm missing something.

    void getSME( int& s, int& m, int& e, float number )
    {
        unsigned int* ptr = (unsigned int*)&number;

        s = *ptr >> 31;
        e = *ptr & 0x7f800000;
        e >>= 23;
        m = *ptr & 0x007fffff;
    }

I think it is better to use unions to do the casts; it is clearer.

    #include <stdio.h>

    typedef union {
        float f;
        struct {
            unsigned int mantissa : 23;
            unsigned int exponent : 8;
            unsigned int sign : 1;
        } parts;
    } float_cast;

    int main(void)
    {
        float_cast d1 = { .f = 0.15625 };
        printf("sign = %x\n", d1.parts.sign);
        printf("exponent = %x\n", d1.parts.exponent);
        printf("mantissa = %x\n", d1.parts.mantissa);
        return 0;
    }

Example based on http://en.wikipedia.org/wiki/Single_precision

Note that the mantissa and exponent returned by `frexp` are not the raw mantissa and exponent fields stored in the floating point number format. For example, the IEEE-754 format uses a separate sign bit and stores the exponent with a bias added.

My advice is to stick to rule 0 and not redo what the standard libraries already do, if that is enough. Look at math.h (cmath in standard C++) and the functions frexp, frexpf and frexpl, which break a floating point value (double, float, or long double) into its significand and exponent parts. To extract the sign from the significand you can use signbit, also in math.h / cmath, or copysign (in C++, only since C++11). Some alternatives, with slightly different semantics, are modf and ilogb/scalbn, available in C++11; http://en.cppreference.com/w/cpp/numeric/math/logb compares them, but I didn't find in the documentation how all these functions behave with +/-inf and NaNs.

Finally, if you really want to use bitmasks (e.g., you desperately need to know the exact bits, your program may have different NaNs with different representations, and you don't trust the above functions), at least make everything platform-independent by using the macros in float.h/cfloat.


Find out the format of the floating point numbers used on the CPU that directly supports floating point and break it down into those parts. The most common format is IEEE-754.

Alternatively, you could obtain those parts using a few special functions (`double frexp(double value, int *exp);` and `double ldexp(double x, int exp);`) as shown in this answer.

Another option is to use `%a` with `printf()`.


You're `&`ing the wrong bits. I think you want:

    s = *ptr >> 31;
    e = *ptr & 0x7f800000;
    e >>= 23;
    m = *ptr & 0x007fffff;

Remember, when you `&`, you are zeroing out bits that you don't set. So in this case, you want to zero out the sign bit when you get the exponent, and you want to zero out the sign bit and the exponent when you get the mantissa.

Note that the masks come directly from your picture. So, the exponent mask will look like:

0 11111111 00000000000000000000000

and the mantissa mask will look like:

0 00000000 11111111111111111111111


On Linux, the glibc-headers package provides the header `<ieee754.h>` with floating point type definitions, e.g.:

    union ieee754_double
    {
        double d;

        /* This is the IEEE 754 double-precision format. */
        struct
        {
    #if __BYTE_ORDER == __BIG_ENDIAN
            unsigned int negative:1;
            unsigned int exponent:11;
            /* Together these comprise the mantissa. */
            unsigned int mantissa0:20;
            unsigned int mantissa1:32;
    #endif /* Big endian. */
    #if __BYTE_ORDER == __LITTLE_ENDIAN
    # if __FLOAT_WORD_ORDER == __BIG_ENDIAN
            unsigned int mantissa0:20;
            unsigned int exponent:11;
            unsigned int negative:1;
            unsigned int mantissa1:32;
    # else
            /* Together these comprise the mantissa. */
            unsigned int mantissa1:32;
            unsigned int mantissa0:20;
            unsigned int exponent:11;
            unsigned int negative:1;
    # endif
    #endif /* Little endian. */
        } ieee;

        /* This format makes it easier to see if a NaN is a signalling NaN. */
        struct
        {
    #if __BYTE_ORDER == __BIG_ENDIAN
            unsigned int negative:1;
            unsigned int exponent:11;
            unsigned int quiet_nan:1;
            /* Together these comprise the mantissa. */
            unsigned int mantissa0:19;
            unsigned int mantissa1:32;
    #else
    # if __FLOAT_WORD_ORDER == __BIG_ENDIAN
            unsigned int mantissa0:19;
            unsigned int quiet_nan:1;
            unsigned int exponent:11;
            unsigned int negative:1;
            unsigned int mantissa1:32;
    # else
            /* Together these comprise the mantissa. */
            unsigned int mantissa1:32;
            unsigned int mantissa0:19;
            unsigned int quiet_nan:1;
            unsigned int exponent:11;
            unsigned int negative:1;
    # endif
    #endif
        } ieee_nan;
    };

    #define IEEE754_DOUBLE_BIAS 0x3ff /* Added to exponent. */


##### Comments

- Try to start from here: en.wikipedia.org/wiki/Single-precision_floating-point_format, but I am almost sure that you saw this
- Aliasing through pointer conversion is not supported by the C standard and may be troublesome in some compilers. It is preferable to use `(union { float f; uint32_t u; }) { number } .u`. This returns a `uint32_t` that is the bytes of the `float` `number` reinterpreted as a 32-bit unsigned integer.
- I'm assuming IEEE 754 32 bit binary. Are you aware of the following issues? (1) The exponent is biassed, by adding 127 to the actual exponent. (2) All except very small floats are normalized, and the leading 1 bit of a normalized float mantissa is not stored.
- Do you mean C or C++? (C has no references, only pointers.)
- Three problems: 0. not removing the bias from the encoded exponent 1. not adding the implicit mantissa bit for normal nonzero numbers 2. not handling denormals, infinities and sNaNs/qNaNs
- "For some reason, this original purpose of the union got "overriden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation is not a valid use of unions. It generally leads to undefined behavior." stackoverflow.com/a/2313676/1127387
- There's no law that says you have to only use things for what they were originally created for. Otherwise the first plane wouldn't have used bits of bicycle. "Generally" undefined? What about those occasions when it is defined, or when you're happy with the behaviour on a given platform/situation?
- This method fails when 1) `float` is not IEEE 754 32 bit binary (not so rare) 2) `unsigned` is 16-bit (common in embedded world) 3) endian of `unsigned/float` do not match. (rare). 4) Mathematical interpretation is used for `exponent/mantissa` as this answer shows the biased exponent and the incomplete significand/mantissa.
- Is the above code portable? What happens on big and little endian machines?
- Very late to the party here, but no, the `union` is not better because it is not guaranteed to work at all. It certainly is not portable. Nothing constrains the C implementation to lay out the bitfields such that the union maps them to the desired pieces of the `float` representation, the separate question of relying on type punning at all notwithstanding.
- @MetallicPriest Try now, I had the wrong masks the first time.
- What about the so called hidden bit? I don't see anyone set it: `m |= 0x00800000;`. Note that the number should be checked for special values (denormals, NaN, infinities) first, since these require different treatment.
- @RudyVelthuis From their original code, it doesn't look like they were trying to actually obtain the values of the exponent and mantissa, just trying to get the bit representation of each. I'm assuming this because they didn't OR in the hidden bit or normalize the sign, but I could be wrong.