I’m working on a side project that performs API requests against a third-party service. I’m limited in how many requests I can send, so each minute the application is scheduled to fetch a certain number of entities from the database and make API requests for them. The application is designed to run in a clustered environment, so if the third-party API allows 1000 requests per minute and I have 4 instances running, each instance should send only 250 requests per minute.
The part I’m having trouble finding a good solution for is retrieving the entities from the database in a way that guarantees no two instances send requests for the same entities.
For example, with 4 instances running and 2500 of these entity rows in the database: the first instance queries rows 0 to 250, does some work with those entities, and sends requests for them to the third-party service. The second instance, when triggered by the scheduler, should then pick up rows 250 to 500, since the first batch has already been processed by the first instance.
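To make the partitioning concrete, here is a minimal sketch of the fetch step for one instance. The names (`fetch_batch`, `BATCH_SIZE`) are hypothetical, and the database table is simulated with a plain list; in the real application this would be an ORM or SQL query along the lines of `SELECT ... ORDER BY id LIMIT :batch OFFSET :offset`.

```python
# Stand-in for the 2500 entity rows in the database.
ENTITIES = list(range(2500))
BATCH_SIZE = 250  # per-instance, per-minute request budget (1000 / 4)

def fetch_batch(offset, batch_size=BATCH_SIZE):
    """Return the entities in the half-open range [offset, offset + batch_size)."""
    return ENTITIES[offset:offset + batch_size]

# First instance takes rows 0..249, the second takes 250..499, and so on --
# the open question is how each instance learns which offset is "next".
first_batch = fetch_batch(0)
second_batch = fetch_batch(250)
```

The hard part, of course, is coordinating the `offset` value across instances, which is what the rest of the question is about.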
The schedulers, as you know, can fire at the same time on different instances, which is where this becomes problematic.
I’m already using Redis for caching, and one solution that came to mind was to use Redis’s GETSET and store the offset and limit in Redis, since Redis is single-threaded.
e.g. once the first instance has fetched its entities, it would set the offset and limit in Redis to 250:500 (offset:limit); when the second instance is triggered, it would read that value, query the entities in that range, and then update it to 500:750.
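Here is a minimal sketch of that claim step. To keep the snippet self-contained I’m simulating the shared Redis counter with an in-process lock; in the real setup each instance would issue a single atomic Redis command, e.g. `INCRBY` on an offset key (with redis-py, roughly `new_end = r.incrby("entity:offset", BATCH_SIZE)`), since `INCRBY` returns the updated value in one atomic step. All names here are hypothetical.

```python
import threading

BATCH_SIZE = 250

# Stand-in for the shared Redis counter. In Redis the atomicity comes
# from the server being single-threaded; here a lock plays that role.
_offset = 0
_lock = threading.Lock()

def claim_range(batch_size=BATCH_SIZE):
    """Atomically claim the next half-open range [start, start + batch_size)."""
    global _offset
    with _lock:
        start = _offset
        _offset += batch_size
    return start, start + batch_size

# Each scheduler tick on each instance claims a disjoint range,
# even if two ticks fire at the same moment.
```

Each call advances the shared counter and hands back a range no other caller can receive, which is the property the clustered scheduler needs.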
I’m not sure whether this is the right solution, or whether there is something out there that would be more appropriate for this use case.
Note: I know that 1000 requests per minute is nothing and could be handled perfectly well by a single node, but this is a side project I’m using mostly for learning.