performance tuning – Should I worry about concurrent access to LibraryLinked code?

According to these answers, when a LibraryLink computation() needs to return several heterogeneous results (say {MTensor, mint, double}), the way to go is to store the results in some global library state and then retrieve each field (MTensor, mint, double) with a dedicated function such as retrieve_array_from_computation(), retrieve_int_from_computation() and retrieve_real_from_computation().
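To make the pattern concrete, here is a minimal sketch of how I understand it. It is not actual LibraryLink code: ComputationResult, g_last_result and the plain C++ types are hypothetical stand-ins for MTensor, mint and double, and the real functions would use the LibraryLink calling convention from WolframLibrary.h.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the {MTensor, mint, double} triple.
// Real code would use the types declared in WolframLibrary.h.
struct ComputationResult {
    std::vector<double> array;  // stands in for MTensor
    long count = 0;             // stands in for mint
    double value = 0.0;         // stands in for double / mreal
};

// Global library state holding the results of the most recent call.
static ComputationResult g_last_result;

// Does the work and stores all three results in the global state.
void computation(long n) {
    g_last_result.array.assign(static_cast<std::size_t>(n), 1.5);
    g_last_result.count = n;
    g_last_result.value = 1.5 * static_cast<double>(n);
}

// One dedicated retrieval function per field.
const std::vector<double>& retrieve_array_from_computation() {
    return g_last_result.array;
}

long retrieve_int_from_computation() {
    return g_last_result.count;
}

double retrieve_real_from_computation() {
    return g_last_result.value;
}
```

With a single global slot like this, a second call to computation() overwrites whatever the first call stored, which is what prompts the questions below.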

Does this imply that some {MTensor, mint, double} struct must be kept in some global state within the library?
If so, should I worry about possible concurrent access, e.g. if the Mathematica user calls computation() from a ParallelTable or similar?

In that case I would need to store not just a single {MTensor, mint, double} in the global state, but a whole random-access collection of such objects protected by a mutex. computation() would then return an index into the collection, which the calling thread must use so that it does not retrieve results computed in another thread.
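A rough sketch of the mutex-protected collection I have in mind, assuming the library state really is shared between threads. ResultStore, put, take and g_store are hypothetical names, and the plain C++ types again stand in for MTensor, mint and double rather than using the real LibraryLink API.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the {MTensor, mint, double} triple.
struct ComputationResult {
    std::vector<double> array;
    long count = 0;
    double value = 0.0;
};

// Collection of results keyed by a handle; every access holds the mutex.
class ResultStore {
public:
    // Stores a result and returns the handle the caller must keep.
    long put(ComputationResult r) {
        std::lock_guard<std::mutex> lock(mutex_);
        const long handle = next_handle_++;
        results_.emplace(handle, std::move(r));
        return handle;
    }

    // Moves the result for `handle` into `out` and removes it from the store.
    // Returns false if the handle is unknown or was already taken.
    bool take(long handle, ComputationResult& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = results_.find(handle);
        if (it == results_.end()) {
            return false;
        }
        out = std::move(it->second);
        results_.erase(it);
        return true;
    }

private:
    std::mutex mutex_;
    std::unordered_map<long, ComputationResult> results_;
    long next_handle_ = 1;
};

static ResultStore g_store;

// Returns a handle instead of writing to a single shared slot.
long computation(long n) {
    ComputationResult r;
    r.array.assign(static_cast<std::size_t>(n), 1.5);
    r.count = n;
    r.value = 1.5 * static_cast<double>(n);
    return g_store.put(std::move(r));
}
```

Each caller would pass its handle back to the retrieval functions, so two concurrent calls never read each other's results. Removing the entry on retrieval keeps the collection from growing without bound, at the cost of allowing each result to be fetched only once.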

While I’m okay with doing all this, I’m not confident it’s truly needed, because I’m not sure what the memory model is for parallelism with LibraryLink. Is the library’s global state actually shared among threads, or does each kernel run its own independent copy of the loaded library?