Algorithms: shuffling items efficiently in \$ N \$ cubes using \$ O (N) \$ of space

I have come across a puzzle of challenging algorithms when trying to generate a large amount of test data. The problem is the following:

• We have $$N$$ cubes $$B_1$$ through $$B_N$$. Each bucket $$B_i$$ maps to a single Item $$a_i$$ and a tell $$k_i$$. In total, the collection keeps $$T = sum_1 ^ N {k_i}$$ articles. This is a more compact representation of a vector of $$T$$ articles where each $$a_i$$ is repeated $$k_i$$ times.

• We want to output a random list of $$T$$ articles, all equally probable permutations, using only $$O (N)$$ Space and minimum complexity time. (Assume a perfect RNG.)

• $$N$$ it's pretty big and $$T$$ it is much bigger; 5,000 and 5,000,000 in the problem that led me to this investigation.

Now clearly the complexity of time is at least $$O (T)$$ since we have to take out so many articles. But how close can we get to that lower limit? Some algorithms:

• Algorithm 1: Expand the cubes in a vector $$T$$ Articles and use of Fisher-Yates. This uses $$O (T)$$ time, but also $$O (T)$$ Space, we want to avoid.

• Algorithm 2: For each step, choose a random number $$R$$ since $$[0,T-1]$$. Go through the cubes, subtracting. $$k_i$$ since $$R$$ every time, until $$R <0$$; then exit $$i$$ and decrement $$k_i$$ Y $$T$$. This seems correct and does not use extra space. However, take $$O (NT)$$ time, which is quite slow when $$N$$ It is long.

• Algorithm 3: Convert the cube vector into a balanced binary tree with cubes in the leaf nodes; the depth should be close to $$log_2 {N}$$. Each node stores the total count of all the cubes below it. To shuffle, choose a random number $$R$$ since $$[0,T-1]$$, then descend to the tree accordingly, decreasing each count of nodes as we go; when descending to the right, reduce $$R$$ By the count of the left. When we reach a leaf node, we will emit its value. Uses $$O (N)$$ space and $$O (T log {N})$$ hour.

• Algorithm 3a: same as Algorithm 3, but with a Huffman tree; this should be faster if the $$k_i$$ the values ​​vary widely, since the most visited nodes will be closer to the root. The performance is more difficult to evaluate, but it seems that it would vary from $$O (T)$$ to $$O (T log {N})$$ depending on the distribution of $$k_i$$.

Algorithm 3 is the best I have found. Here are some illustrations to clarify it:

Does anyone know of a more efficient algorithm? I tried to search with several terms, but I could not find any discussion about this particular task.