I'm working on a system that analyzes non-trivial amounts of data. The analysis is pyramidal: several inputs combine to produce intermediate values, which in turn combine with each other and with further inputs to produce higher-order intermediate values, and so on, until the final value is produced. As a simple example: inputs A and B combine to make intermediate product X, inputs A and C combine to make intermediate product Y, inputs C and D combine to make intermediate product Z, and then the intermediate products X, Y, and Z combine with input E to make the final result. See below for a rough diagram of this example relationship.
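To make the example concrete, the same pyramid can be captured as a simple adjacency map, with a helper that answers "what must be recomputed when this changes?" (a minimal Python sketch; the names `DEPENDS_ON` and `dependents_of` are mine, not from the question):

```python
# Dependency pyramid from the example:
#
#   A   B   A   C   C   D
#    \ /     \ /     \ /
#     X       Y       Z       E
#      \      |      /       /
#       +-----+-----+-------+
#             |
#           result
#
# Each product maps to the set of things it is computed from.
DEPENDS_ON = {
    "X": {"A", "B"},
    "Y": {"A", "C"},
    "Z": {"C", "D"},
    "result": {"X", "Y", "Z", "E"},
}

def dependents_of(name, graph=DEPENDS_ON):
    """Everything that directly or transitively uses `name`."""
    direct = {product for product, deps in graph.items() if name in deps}
    out = set(direct)
    for product in direct:
        out |= dependents_of(product, graph)
    return out
```

For instance, `dependents_of("A")` yields `{"X", "Y", "result"}` — exactly the caches that must be invalidated when A changes, while Z, B, C, D, and E stay cached.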
Almost all of these steps are expensive enough to have a noticeable performance cost for the user, both retrieving the inputs and computing the intermediate products. On the plus side, the inputs change only when the user requests it, so their values can be cached and reused, and the system can be notified explicitly when an input has changed. Knowing when to invalidate the caches is therefore not a problem.
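Because change notifications are explicit, each cache can be a plain memoized value that stays valid until it is told otherwise. A minimal sketch of such a node (the class and method names are hypothetical, not from the question):

```python
class CachedNode:
    """Caches an expensive computation until explicitly invalidated."""

    def __init__(self, compute):
        self._compute = compute  # expensive function over the node's inputs
        self._value = None
        self._valid = False

    def get(self):
        # Recompute only if an invalidation has occurred since the last get().
        if not self._valid:
            self._value = self._compute()
            self._valid = True
        return self._value

    def invalidate(self):
        # Called when any upstream input or intermediate changes.
        self._valid = False
```

The open question is purely who calls `invalidate()` on which nodes, which is what the options below are about.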
The problem I'm struggling with is where the responsibility for invalidating the intermediate-product caches should live. After all, if input A changes in the example above, we must recompute X and Y, but we can keep the cached value of Z, as well as B, C, D, and E.
I'm using the event aggregator pattern, but I can't decide which events the cache for, say, Y should use to trigger its invalidation. I can see a few options:
- The Y cache could subscribe to the A-data-changed and C-data-changed events and invalidate itself when it receives either. This requires Y to know that it depends on A and C, and means that higher-level caches must subscribe to more and more data-changed events to cover all of their dependencies (and their dependencies' dependencies, and so on).
- The A and C caches could publish a Y-data-changed event whenever they receive their own data-changed events. This would require them to know that Y depends on them, and it would also require them to publish data-changed events for every downstream product that depends on them. In addition, it would generate redundant duplicate change events for the higher-level products (since every dependency tries to relay the message).
- A separate dependency manager / event translator could track all of the dependencies, and know, for example, that when it sees a C-data-changed event it must publish Y-data-changed and Z-data-changed events, plus events for anything that depends on Y or Z, all the way up the pyramid.
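The third option can be sketched as a translator that sits between the aggregator and the caches: it owns the dependency graph and, on any change event, publishes one invalidation event per transitive dependent. This is a hedged illustration only — `publish`, `register`, and `on_changed` are hypothetical stand-ins for whatever your event aggregator exposes:

```python
from collections import defaultdict, deque

class DependencyManager:
    """Translates a single X-data-changed event into data-changed events
    for every product that transitively depends on X, each exactly once."""

    def __init__(self, publish):
        self._publish = publish              # e.g. the aggregator's publish()
        self._dependents = defaultdict(set)  # name -> direct dependents

    def register(self, product, inputs):
        # Declare that `product` is computed from `inputs`.
        for inp in inputs:
            self._dependents[inp].add(product)

    def on_changed(self, name):
        # Breadth-first walk; `seen` guards against duplicate events
        # when two paths converge (e.g. X and Y both feed the result).
        seen, queue = set(), deque([name])
        while queue:
            current = queue.popleft()
            for dep in self._dependents[current]:
                if dep not in seen:
                    seen.add(dep)
                    self._publish(f"{dep}-data-changed")
                    queue.append(dep)
```

Wiring up the example, `on_changed("A")` publishes exactly three events — for X, Y, and the result — so the duplicate-event problem from the second option disappears, and no cache needs to know more than its own name.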
I'm inclined to go with the third option and register the new manager at or near the DI composition root. Are there advantages or disadvantages to these approaches that I've overlooked? Or alternative approaches altogether?