Performance – Generating a random number from a binomial distribution

The critical part of my Monte Carlo simulation application is generating a random number from a binomial distribution with parameters n = size and p = 0.5. Here is my current implementation:

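#include <stdint.h>     // int64_t, UINT64_MAX
#include <stdio.h>      // fprintf
#include <immintrin.h>  // _rdrand64_step, _popcnt64 (build with -mrdrnd -mpopcnt)

// Draws one sample from Binomial(size, 0.5): each random bit is a fair coin
// flip, so the number of set bits among `size` random bits is the result.
int64_t rbinom(int64_t size) {
    if (!size) {
        return 0;
    }

    int64_t result = 0;
    // Consume whole 64-bit words while at least 64 trials remain.
    while (size >= 64) {
        unsigned long long random64;  // _rdrand64_step expects unsigned long long *
        while (!_rdrand64_step(&random64)) {
            fprintf(stderr, "HW_RND_GEN is not ready\n");
        }
        result += _popcnt64(random64);
        size -= 64;
    }

    // Remaining 0..63 trials: keep only the low `size` bits of one more word.
    unsigned long long random64;
    while (!_rdrand64_step(&random64)) {
        fprintf(stderr, "HW_RND_GEN is not ready\n");
    }
    result += _popcnt64(random64 & ~(UINT64_MAX << size));

    return result;
}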
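The popcount trick above rests on a standard identity. Assuming the RDRAND output bits are independent and each is 1 with probability 1/2, the number of set bits among n of them is exactly a Binomial(n, 1/2) sample; splitting the sum into 64-bit words does not change it. The mask in the tail step keeps the low s bits, where s is the remaining size:

```latex
% B_1, ..., B_n: independent random bits, each assumed to be 1 with probability 1/2
X = \sum_{i=1}^{n} B_i = \operatorname{popcount}(B_1 B_2 \cdots B_n)
  \sim \operatorname{Binomial}\!\left(n, \tfrac{1}{2}\right),
\qquad
\Pr[X = k] = \binom{n}{k} 2^{-n},
\qquad
\mathbb{E}[X] = \frac{n}{2}, \quad \operatorname{Var}[X] = \frac{n}{4}.

% Tail mask for the last 0 <= s < 64 trials, computed modulo 2^{64}:
\left(2^{64} - 1\right) \ll s \equiv 2^{64} - 2^{s} \pmod{2^{64}},
\qquad
\neg\left(2^{64} - 2^{s}\right) = 2^{s} - 1 \quad \text{(the low } s \text{ bits set)}.
```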

However, the profiling results terrify me:


I am spending 99.68% of the time in this function! How can I optimize it?

The result does not need to be cryptographically secure, as long as it is good enough for Monte Carlo simulations.
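Since cryptographic quality is not required, one option worth measuring is to keep the popcount idea but replace RDRAND, which on many CPUs is considerably slower per call than a simple software generator, with a fast non-cryptographic PRNG. The sketch below assumes xoshiro256** as the generator, seeded through splitmix64; the names `xoshiro256ss_state`, `rng_seed`, `popcount64`, and `rbinom_prng` are hypothetical, and the generator state is passed explicitly so each thread can own its own. It is an illustration under those assumptions, not a benchmarked replacement.

```c
#include <stdint.h>

// Hypothetical state type for xoshiro256** (256 bits of state).
typedef struct {
    uint64_t s[4];
} xoshiro256ss_state;

// splitmix64: used here only to expand one 64-bit seed into the 4-word state.
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

// xoshiro256**: fast, non-cryptographic 64-bit generator.
static uint64_t xoshiro256ss_next(xoshiro256ss_state *st) {
    uint64_t *s = st->s;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);
    return result;
}

// Seeds the generator; xoshiro's state must not be all zeros, which the
// splitmix64 expansion avoids in practice.
void rng_seed(xoshiro256ss_state *st, uint64_t seed) {
    for (int i = 0; i < 4; i++) {
        st->s[i] = splitmix64(&seed);
    }
}

// Portable SWAR popcount; on x86 the question's _popcnt64 could be used instead.
static int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

// Same popcount algorithm as rbinom(), with words drawn from the PRNG.
int64_t rbinom_prng(xoshiro256ss_state *st, int64_t size) {
    int64_t result = 0;
    while (size >= 64) {  // whole blocks of 64 trials
        result += popcount64(xoshiro256ss_next(st));
        size -= 64;
    }
    if (size > 0) {       // remaining 1..63 trials; skip the draw when none remain
        uint64_t mask = ~(UINT64_MAX << size);
        result += popcount64(xoshiro256ss_next(st) & mask);
    }
    return result;
}
```

Typical use would be one `xoshiro256ss_state` per thread, a single `rng_seed(&st, seed)` call at startup, and `rbinom_prng(&st, size)` in the inner loop. The cost is still one generator call per 64 trials, so whether this is fast enough for the sizes in the simulation is something to profile rather than assume.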