python – Ideal architecture for an ML training -> API service workflow, with multiple models / services?

I am planning to build a workflow / environment to train and serve NLP classifiers, roughly as follows:

  1. The model training system takes in annotated documents from a subset of preconfigured sources, along with a set of user-defined parameters controlling how the model is built (for example, which n-grams to generate, whether to apply negation handling or stemming, etc.)
  2. The model training system writes a model file to an S3 bucket
  3. A Flask-based API service loads the model from S3 at startup and uses it to provide real-time predictions
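A minimal sketch of step 3's load-once-at-startup pattern. The S3 download is stubbed out (in practice it would be a boto3 `get_object` call inside a Flask app); the bucket and key names are illustrative:

```python
import pickle


def fetch_model_bytes(bucket, key):
    # Stand-in for an S3 download, e.g.
    # boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read().
    # Here we serialize a dummy "model" so the sketch runs on its own.
    dummy_model = {"name": key, "predict": None}
    return pickle.dumps(dummy_model)


class ModelStore:
    """Loads each model once and caches it for the life of the process."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = pickle.loads(fetch_model_bytes(self.bucket, key))
        return self._cache[key]


# At API startup you would eagerly warm the cache:
store = ModelStore("my-model-bucket")
model = store.get("spam-classifier/v3.pkl")
```

Loading eagerly at startup (rather than lazily per request) keeps the first prediction from paying the S3 download cost.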

However, there are some caveats:

  • The training workflow will be integrated into multiple independent services, not just one
  • Each service can have several models attached (so a POSTed document would receive a response containing multiple classifications, one from each model)
  • Calls per minute per service would be relatively low (maybe a call to a service every few minutes)
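To make the multi-model caveat concrete, the response to a single POSTed document might aggregate one prediction per attached model. A sketch with dummy stand-in models (the model names and response fields are illustrative):

```python
def classify_document(document, models):
    """Run every model attached to this service and collect the results."""
    return {
        "document_id": document["id"],
        "classifications": [
            {"model": name, "label": model(document["text"])}
            for name, model in models.items()
        ],
    }


# Dummy stand-ins for trained classifiers:
models = {
    "sentiment": lambda text: "positive" if "good" in text else "negative",
    "topic": lambda text: "sports" if "match" in text else "other",
}

response = classify_document({"id": 1, "text": "a good match"}, models)
```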

I've researched existing offerings like SageMaker, but it appears to be limited to one API service per model. It also seems designed for API services receiving thousands of calls per second, which is not at all cost-effective for my needs.

As such, here is my plan:

Pre-/post-processing package. A code repo containing all the pre- and post-processing methods that the classification pipeline can call (both at training time and at prediction time). These methods contain a large amount of logical variation, dictated by input parameters. On its own, this code is not deployed anywhere.
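The parameter-driven variation could be exposed as a single entry point whose keyword arguments toggle each step. The toggles shown are the ones mentioned above (n-grams, stemming, negation); the implementations are crude stand-ins for real NLP components:

```python
def preprocess(text, ngram_n=1, stemming=False, negation=False):
    """Tokenize and generate features according to user-defined parameters."""
    tokens = text.lower().split()
    if stemming:
        # Crude suffix stripping as a stand-in for a real stemmer (e.g. Porter).
        tokens = [t[:-1] if t.endswith("s") else t for t in tokens]
    if negation:
        # Mark every token that follows a negation word.
        out, negated = [], False
        for t in tokens:
            out.append("NOT_" + t if negated else t)
            if t in ("not", "no", "never"):
                negated = True
        tokens = out
    # Generate n-grams of the requested size.
    return ["_".join(tokens[i:i + ngram_n])
            for i in range(len(tokens) - ngram_n + 1)]
```

Because both the training service and the API services import this one function with the same parameters, features are guaranteed to match between training and prediction.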

Training service. A high-resource EC2 instance that imports the pre-/post-processing package, has input connectors to all possible data sources, and writes its output models to S3. Data scientists enter a set of parameters and data sources and run training on this instance.
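The training run could be driven by a parameter dict, with the resulting model pickled to a file that stands in for the S3 upload (boto3's `upload_file` in practice). The "training" here is a trivial majority-class stand-in just to make the shape of the workflow concrete:

```python
import pickle
import tempfile


def train(documents, params):
    # Stand-in training loop: count label frequencies and predict the majority.
    # A real run would first build features via the preprocessing package.
    counts = {}
    for doc in documents:
        counts[doc["label"]] = counts.get(doc["label"], 0) + 1
    majority = max(counts, key=counts.get)
    return {"params": params, "predict_label": majority}


def publish(model, path):
    # Stand-in for the S3 upload step, e.g.
    # boto3.client("s3").upload_file(path, bucket, key).
    with open(path, "wb") as f:
        pickle.dump(model, f)


docs = [{"text": "a", "label": "spam"}, {"text": "b", "label": "spam"},
        {"text": "c", "label": "ham"}]
model = train(docs, {"ngram_n": 2, "stemming": True})
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    publish(model, tmp.name)
```

Storing the training parameters inside the model file (as above) lets the API services re-apply the exact same preprocessing at prediction time.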

Model storage. The output models are stored in several S3 buckets, organized according to the data sources they came from and the type of classifier.

API services. A series of low-resource Flask-based API services that use configuration files to dictate which models to load from the S3 buckets. These also import the pre-/post-processing package, so the same methods can be applied when predicting on incoming documents.
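The per-service configuration could be a small JSON (or YAML) file listing the bucket and model keys to load. A sketch that parses such a config and resolves it through an injected loader, so the S3 call itself stays stubbed; the service and key names are made up for illustration:

```python
import json

# Example config a service would ship with:
CONFIG = json.loads("""
{
  "service": "claims-intake",
  "bucket": "nlp-models",
  "models": ["claims/urgency-v2.pkl", "claims/category-v5.pkl"]
}
""")


def load_models(config, fetch):
    """Resolve every configured model key through the injected fetch function."""
    return {key: fetch(config["bucket"], key) for key in config["models"]}


# Stub fetch standing in for an S3 download + unpickle:
loaded = load_models(CONFIG, lambda bucket, key: {"bucket": bucket, "key": key})
```

Injecting the fetch function keeps the config-handling code testable without AWS credentials, and means adding a model to a service is a config change rather than a code change.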

So, my questions:

Does this general architecture make sense? Or are there sections that I should rethink?

Or should I look for cost-effective off-the-shelf systems that would handle this better than building the whole ecosystem myself?