C++ – Creating 3 command pools per thread

I'm trying to set up a multithreaded renderer with Vulkan and I have a question about command pools.

In this talk, https://on-demand.gputechconf.com/siggraph/2016/video/sig1625-tristan-lorach-vulkan-nvidia-essentials.mp4, at about 14 minutes, they talk about creating 3 command pools per thread and cycling through them.

I have some questions about this, because something seems to be off.
Please excuse my ignorance if I have misunderstood; so far I have only done single-threaded rendering.

  1. Does this mean that you alternate between the 3 images you get from vkAcquireNextImageKHR (assuming there are 3) and use a different command pool for each one?

  2. Why not just use one command pool per thread? Why more than one? A single thread can only write to one at a time. Why not just have 3 command buffers from 1 pool per thread?

  3. Assuming it is implemented as suggested, does that mean it should be something like this:
    For each thread out of N threads:
      Get the command pool (1 of 3) for the thread, depending on what you get from vkAcquireNextImageKHR
      Record into the command buffer that was allocated from that pool
    Once the N buffers have been recorded, put them in a primary command buffer
    Submit the primary buffer to the GPU
    Present the swapchain image

  4. Is resetting a command pool an expensive operation, or does it simply set some flag, like when a std::vector<int> is cleared (int has no destructor, so it only sets the internal size to 0)?

  5. Is there any reason why I would use more than one command buffer per thread in a single frame, since everything I have to record can be put in the one I am currently using? (So basically, in the Nvidia example, you would have 3 command pools per thread and one command buffer from each of these pools.)
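
To make point 3 concrete, here is a rough sketch of the cycling as I currently understand it. It makes no Vulkan calls: `MockCommandPool`, `ThreadPools`, `recordFrame` and `kPoolsPerThread` are hypothetical stand-ins I made up for illustration, and `reset()` only mirrors the std::vector analogy from point 4, not what a driver really does. Whether `slot` should be the image index from vkAcquireNextImageKHR or a plain frame counter is exactly what I am asking in point 1, and synchronization is left out entirely.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a VkCommandPool and the buffer allocated from it.
struct MockCommandPool {
    std::vector<std::string> recorded;  // commands recorded into this pool's buffer

    // Stand-in for vkResetCommandPool: drops the recorded commands, keeps the storage.
    void reset() { recorded.clear(); }

    void record(const std::string& cmd) { recorded.push_back(cmd); }
};

// Assumed to match the 3 pools per thread mentioned in the talk.
constexpr std::size_t kPoolsPerThread = 3;

// Each worker thread owns its own ring of pools.
struct ThreadPools {
    std::array<MockCommandPool, kPoolsPerThread> pools;

    MockCommandPool& forSlot(std::size_t slot) { return pools[slot % kPoolsPerThread]; }
};

// Records one frame for every thread (sequentially here, for simplicity) and
// returns how many commands would be gathered into the primary command buffer.
std::size_t recordFrame(std::vector<ThreadPools>& threads, std::size_t slot) {
    std::size_t total = 0;
    for (std::size_t t = 0; t < threads.size(); ++t) {
        MockCommandPool& pool = threads[t].forSlot(slot);
        pool.reset();  // reuse the pool that was last used for this slot
        pool.record("draw from thread " + std::to_string(t));
        total += pool.recorded.size();
    }
    return total;
}
```

With 2 threads, calling `recordFrame(threads, 0)`, then `recordFrame(threads, 1)`, then `recordFrame(threads, 0)` again would reuse pool 0 of each thread on the third call while pool 1 keeps whatever was recorded for the second.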

For number 2, I suspect it has something to do with the buffer from the previous frame not being available for writing by the time the new frame starts. That seems a bit strange: the data was already sent to the GPU and the last frame has been processed, so why isn't the buffer reported as free? Or am I wrong?