What is this "time delay"?
When a miner finds a block, it broadcasts that block to the network. As with all real-world networks, the information takes some time (latency, the "delay time") to reach the other nodes. In particular, there is some latency between the miner who found the block and the other miners in the network. Even if the latency is small (milliseconds), there is still some chance that another miner will find a block within that "delay time."
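To get a feel for how large that chance is, here is a rough sketch. It assumes block discovery behaves like a Poisson process with an average interval of 600 seconds (Bitcoin's 10-minute target) and that the rest of the network holds essentially all of the hash power. The function name `prob_competing_block` is made up for this illustration.

```python
import math

def prob_competing_block(latency_seconds, mean_block_interval=600.0):
    """Chance that some other miner finds a block during the propagation delay.

    Assumption: blocks arrive as a Poisson process, so the waiting time for
    the next block is exponentially distributed with the given mean.
    """
    return 1.0 - math.exp(-latency_seconds / mean_block_interval)

# Print the chance for a few illustrative latencies (in seconds).
for latency in (0.1, 1.0, 10.0):
    print(f"{latency:>5} s -> {prob_competing_block(latency):.4%}")
```

Under these assumptions the chance is small but never zero, and it grows roughly linearly with latency while the latency is much shorter than the block interval.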
What exactly is the "proof of work size"?
Nodes follow what is colloquially known as the "longest chain", but in reality they follow the chain with the most accumulated proof of work (the difference is subtle, but important). At a given difficulty, every valid block represents the same amount of work, so saying that a chain has "more proof of work" usually amounts to saying that it is "longer". That chain is the one nodes treat as the valid chain.
If two miners find valid blocks at the same block height, it is not yet clear which of the two blocks will end up in the longest chain. That is settled only when a miner finds a new block built on top of one of them. At that point, the longer chain becomes clear and the network nodes follow it.
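The difference between "longest" and "most work" can be sketched in a few lines. This is an illustration only, not node source code: a chain is represented simply as a list of block targets, the per-block work is taken as the expected number of hashes needed to meet the target (2^256 divided by target + 1), and the helper names are hypothetical.

```python
def block_work(target):
    # Expected number of hash attempts needed to find a hash <= target.
    return 2**256 // (target + 1)

def chain_work(targets):
    # Accumulated work of a chain, given the target of each of its blocks.
    return sum(block_work(t) for t in targets)

def best_chain(chains):
    # Pick the chain with the most accumulated work.
    # max() keeps the first candidate on a tie; this sketch treats that as
    # "the chain this node saw first", which is an assumption.
    return max(chains, key=chain_work)

EASY = 2**240   # higher target = easier block = less work
HARD = 2**238   # lower target = harder block = more work

# Two competing blocks at the same height: equal work, no clear winner yet.
fork_a = [EASY, EASY]
fork_b = [EASY, EASY]
print(chain_work(fork_a) == chain_work(fork_b))

# A new block is found on top of fork_b: it now has more work and wins.
fork_b_extended = fork_b + [EASY]
print(best_chain([fork_a, fork_b_extended]) is fork_b_extended)

# A shorter chain can still carry more work if its blocks were harder.
long_easy = [EASY, EASY, EASY]
short_hard = [HARD, HARD]
print(best_chain([long_easy, short_hard]) is short_hard)
```

The last case is why "chain with more work" is the accurate description: counting blocks alone would pick the wrong chain there.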
Protocol rules mentioned above: When exactly do the nodes (or the network) verify them?
The consensus rules are defined by the source code itself; there is no formal specification.
When a node in the network learns about a new transaction or block, it runs a series of checks to determine whether that transaction or block is valid. If it is not valid, it is discarded.
When a miner assembles a block template, they choose only transactions they have already determined to be valid, and they construct the block carefully so that the block itself is also valid. All of this happens before they start hashing; otherwise they would be wasting hash power on an invalid block.
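The "validate first, hash later" order can be sketched as follows. This is a toy model, not the real consensus rules: the `Tx` type, `is_valid_tx`, and `build_template` are hypothetical names, and the only checks are that a transaction spends existing unspent outputs and does not create value out of nothing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    txid: str
    spends: tuple        # ids of the outputs this transaction consumes
    input_value: int
    output_value: int

def is_valid_tx(tx, unspent):
    # Toy validity check (assumption): must spend at least one output,
    # every spent output must still be unspent, and outputs must not
    # exceed inputs.
    return (
        bool(tx.spends)
        and all(o in unspent for o in tx.spends)
        and tx.output_value <= tx.input_value
    )

def build_template(candidates, unspent):
    # Select only valid transactions, before any hashing starts.
    unspent = set(unspent)
    chosen = []
    for tx in candidates:
        if is_valid_tx(tx, unspent):
            chosen.append(tx)
            # Mark outputs as spent so a later transaction in the same
            # block cannot spend them again.
            unspent -= set(tx.spends)
    return chosen

candidates = [
    Tx("a", ("o1",), 10, 9),   # valid
    Tx("b", ("o1",), 10, 9),   # double spend of o1, rejected
    Tx("c", ("o2",), 5, 6),    # creates value, rejected
]
template = build_template(candidates, {"o1", "o2"})
print([tx.txid for tx in template])
```

Only after a template like this has been built would the miner start hashing, so no hash power is spent on a block that other nodes would discard.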