Loops over large arrays are not really a good idea in Python; that is why your original list comprehension is not terribly fast. Your numpy version is loop-free, but as far as I know, np.repeat actually makes copies of your data, which is again quite inefficient. An alternative would be np.tile, which may not need to copy the data. But we really don't need to bother, since numpy has a great feature called broadcasting, which often makes np.tile completely unnecessary.
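As a small illustration of that point (the values here are made up for the demo), broadcasting lets a row vector and a column vector be compared element-wise without materializing either tiled copy:

```python
import numpy as np

vals = np.array([1.0, 5.0, 9.0])   # shape (3,)
lows = np.array([0.0, 4.0])        # shape (2,)

# Tiling approach: explicitly replicate both arrays to shape (2, 3)
tiled = np.tile(vals, (2, 1)) >= np.tile(lows.reshape(-1, 1), (1, 3))

# Broadcasting approach: numpy virtually expands the (1, 3) and (2, 1)
# operands to (2, 3) without copying any data
broadcast = vals.reshape(1, -1) >= lows.reshape(-1, 1)

print(np.array_equal(tiled, broadcast))
```

Both expressions produce the same (2, 3) boolean matrix; broadcasting just skips the intermediate copies.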
To evaluate performance, I created a more abstract version of your list comprehension:

    def get_valid_op(arr, lowers, uppers):
        # For each value, check whether it falls into any [lower, upper) interval
        return np.asarray([any((val >= lowers) & (val < uppers)) for val in arr])
and also a broadcasting version:

    def get_valid_arr(arr, lowers, uppers):
        # Compare every value against every interval at once via broadcasting
        valid = (arr.reshape(1, -1) >= lowers.reshape(-1, 1)) & (arr.reshape(1, -1) < uppers.reshape(-1, 1))
        return valid.any(axis=0)
The second is practically the same algorithm as your repeat/reshape code.
With some test data modeled after your description

    arr = np.linspace(0, 1000, 70000)
    starts = np.linspace(0, 150, 151) * 400
    ends = starts + np.random.randint(0, 200, starts.shape)  # I assumed non-overlapping regions here
we can first check correctness

    assert all(get_valid_op(arr, starts, ends) == get_valid_arr(arr, starts, ends))

and then time:
    %timeit -n 10 get_valid_op(arr, starts, ends)
    511 ms ± 5.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %timeit -n 10 get_valid_arr(arr, starts, ends)
    37.8 ms ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
An order of magnitude faster. Not bad for a start 😉
But since working with large intermediate matrices (valid has shape (150, 70000) before the reduction) also has a cost, I stepped back and returned to looping, just a little:
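To put a rough number on that intermediate cost (a back-of-the-envelope sketch using the shape mentioned above):

```python
import numpy as np

# The broadcast comparison materializes a boolean matrix of shape
# (n_regions, n_values) before .any(axis=0) reduces it away.
n_regions, n_values = 150, 70000
intermediate = np.zeros((n_regions, n_values), dtype=bool)
print(intermediate.nbytes / 1e6, "MB")  # one byte per bool: ~10.5 MB per mask
```

And since the `&` expression builds two such masks plus their conjunction before reducing, the real peak allocation is a small multiple of that.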
    def get_valid_loop(arr, lowers, uppers):
        # Accumulate a boolean mask, iterating only over the region boundaries
        valid = np.zeros(arr.shape, dtype=bool)
        for start, end in zip(lowers, uppers):
            valid = np.logical_or(valid, np.logical_and(start <= arr, arr < end))
        return valid
Unlike your list comprehension, this version iterates only over the shorter region-boundary vectors, which means roughly two orders of magnitude fewer iterations.
We can then again check correctness

    assert all(get_valid_op(arr, starts, ends) == get_valid_loop(arr, starts, ends))

and time it:
    %timeit -n 10 get_valid_loop(arr, starts, ends)
    18.1 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
As the results show, this version is even faster on my synthetic benchmark inputs. In the end, you will have to benchmark all versions on your own application data and see which one works best.
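If you are not working in IPython, the same measurement can be done with the standard-library `timeit` module (a sketch; the function and test data follow the snippets above, and your real numbers will differ by machine):

```python
import timeit
import numpy as np

def get_valid_loop(arr, lowers, uppers):
    # Loop over region boundaries, accumulating a boolean mask over arr
    valid = np.zeros(arr.shape, dtype=bool)
    for start, end in zip(lowers, uppers):
        valid |= (start <= arr) & (arr < end)
    return valid

arr = np.linspace(0, 1000, 70000)
starts = np.linspace(0, 150, 151) * 400
ends = starts + np.random.randint(0, 200, starts.shape)

# Average wall-clock time over 10 calls, in milliseconds
t = timeit.timeit(lambda: get_valid_loop(arr, starts, ends), number=10)
print(f"{t / 10 * 1000:.1f} ms per call")
```

`timeit` disables garbage collection during the measurement by default, so it gives reasonably stable per-call averages even outside a notebook.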