I need to distribute a set of long-running jobs across a group of containers in ECS. Essentially, each job needs to open a socket connection to a remote server and begin streaming data to be imported into a database.
For each client, any number of socket connections may be needed to consume different data sources. Creating a new ECS service for each client is impractical, since it would also require new ECS task definitions with slightly different configurations, which would ultimately result in thousands of service / task definitions. That would quickly become a maintenance and monitoring nightmare, so I am looking for a simpler solution.
The list of "feeds" is relatively static and is stored in a document database. Feeds are only added when new customers sign up. My initial idea is to have a fixed number of containers responsible for retrieving the feed configurations from the database. Each container would attempt to acquire a lease on a feed and, if it gets the lease, start the feed and let it run until the container is removed or the connection is broken. Each container would have to check periodically whether new feeds are available in the pool. It would also have to extend the lease while the feed is running so that another container does not pick up the same feed. There might be a race condition here where the lease expires before it is extended, so I'll have to be careful to always extend it in time.
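To make the lease idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `InMemoryLeaseStore`, `try_acquire` and `renew` are illustrative names, and the in-memory dict only stands in for the document database. A real store would need an atomic conditional write (compare-and-set) for both operations. The point it shows is that a renewal must fail once the lease has expired, so the container knows to stop the feed instead of assuming it still owns it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    owner: str
    expires_at: float


class InMemoryLeaseStore:
    """Hypothetical stand-in for the document database.

    A real implementation would perform each method as one atomic
    conditional write so two containers cannot both win the same lease.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._leases: Dict[str, Lease] = {}
        self._clock = clock

    def try_acquire(self, feed_id: str, owner: str, ttl: float) -> bool:
        """Take the lease if it is free, expired, or already ours."""
        now = self._clock()
        current = self._leases.get(feed_id)
        if current is not None and current.expires_at > now and current.owner != owner:
            return False  # another container holds a live lease
        self._leases[feed_id] = Lease(owner, now + ttl)
        return True

    def renew(self, feed_id: str, owner: str, ttl: float) -> bool:
        """Extend a lease we still hold; fail if it expired or changed hands."""
        now = self._clock()
        current = self._leases.get(feed_id)
        if current is None or current.owner != owner or current.expires_at <= now:
            return False  # caller should stop the feed and re-acquire
        current.expires_at = now + ttl
        return True
```

Under these assumptions, a container would call `renew` well before the TTL elapses (for example every third of the TTL) and shut the feed down as soon as `renew` returns `False`.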
This solution would work, but there are some obvious pitfalls. If all containers start at roughly the same time, there needs to be a way to limit how many jobs each container can claim at once, so that 1 or 2 containers don't end up starting all the jobs. One approach would be to claim one job every two seconds until the pool is empty and all feeds are leased. This could still lead to an uneven distribution of jobs, and ramp-up could take a while before every job has been taken from the pool and leased. Alternatively, a container could claim a feed, start it, and then go find another one. Some containers may start their feeds faster and go looking for another job before other containers have finished starting theirs, which leads to hot spots on certain containers.
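One possible way to limit how many jobs a single container claims is a fair-share cap, sketched below in Python. This is an assumption on my part rather than something from the design above: it presumes each container knows the total number of feeds and the (fixed) number of containers. `fair_share_cap` and `claim_up_to_cap` are hypothetical names, and `try_acquire` is whatever lease call the store provides.

```python
import math
from typing import Callable, Iterable, List


def fair_share_cap(total_feeds: int, container_count: int) -> int:
    """Most feeds one container may hold if the load is spread evenly."""
    if container_count <= 0:
        raise ValueError("container_count must be positive")
    return math.ceil(total_feeds / container_count)


def claim_up_to_cap(
    feed_ids: Iterable[str],
    already_held: int,
    cap: int,
    try_acquire: Callable[[str], bool],
) -> List[str]:
    """Try to lease feeds in order, stopping once this container is at its cap."""
    claimed: List[str] = []
    for feed_id in feed_ids:
        if already_held + len(claimed) >= cap:
            break  # leave the remaining feeds for other containers
        if try_acquire(feed_id):
            claimed.append(feed_id)
    return claimed
```

With 10 feeds and 3 containers the cap is 4, so a fast container can take at most 4 feeds no matter how early it starts. The trade-off is that if a container dies, the survivors need the cap recomputed from the new container count before they can pick up its feeds.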
Another approach could be to use something like consistent hashing. If each container knows its own ID and the IDs of the other containers, it can hash (or mod) the feed configuration ID and determine which container the feed belongs to. This would distribute the jobs more evenly, but each container would still have to check periodically for new feeds, for example when a container has been removed and the leases on its feeds have expired.
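As a rough illustration of the hashing idea, here is a sketch using rendezvous (highest-random-weight) hashing, which is a close relative of consistent hashing rather than a hash ring proper. I picked it only because it needs nothing beyond the list of container IDs. `owner_of` and `_score` are hypothetical names, and how each container learns the current list of IDs is left open.

```python
import hashlib
from typing import Iterable


def _score(container_id: str, feed_id: str) -> int:
    """Deterministic pseudo-random weight for a (container, feed) pair."""
    digest = hashlib.sha256(f"{container_id}:{feed_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(feed_id: str, container_ids: Iterable[str]) -> str:
    """Return the container that should run the feed.

    Every container evaluates this independently and reaches the same
    answer as long as they agree on the set of container IDs.
    """
    ids = list(container_ids)
    if not ids:
        raise ValueError("no containers available")
    # Tie-break on the ID itself so the result never depends on input order.
    return max(ids, key=lambda cid: (_score(cid, feed_id), cid))
```

Each container would then only try to lease the feeds for which `owner_of(feed_id, ids)` returns its own ID. When a container disappears, only the feeds it owned are reassigned; the others stay where they are. Leases would still be worth keeping as a safety net for the window in which containers disagree about membership.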
I have also thought that an actor framework such as Akka might be designed for exactly this kind of problem, but that would come with significant implementation complexity.
I am open to any and all suggestions, and to having holes poked in my existing proposals!