Let’s say we have N different input static datasets D_n, for example (don’t focus on the example but more on the concept here) using N=3;
a dataset D_1 composed of images,
a dataset D_2 of features
and a dataset D_3 of objects.
On each dataset, I can apply some parameters P_n to do some (simple or sometime quite complex) computations on it.
Let’s call a ‘run’, the results of the combination of the application of
some defined set of values for the image parameters P_1 applied to the images dataset D_1 (which would lead to some processed data D_1′ corresponding to the dataset D_1) +
some defined set values of the features parameters P_2 applied to the features dataset D_2 (which would lead to some processed data D_2′ corresponding to the dataset D_2) +
some defined set of values of the object parameters P_3 applied to the object dataset D_3 (which would lead to some processed data D_3′ corresponding to the dataset D_3) .
How would you designed a relational database (I’m using PostgreSQL but I guess it’s not important at this stage) so that we can store both 1) the results of each and every run using different sets of defined values for the parameters P_n on the datasets D_n and 2) the values or the parameters used for getting these results.
For the moment, each time I want to run the computations using different parameter values, I change these values in a config file, overwriting the previous ones, and I output results to files on the disk, which also got overwritten for each new run. But I’d like to keep them all now.
Another problem that I have is that I’m writing images and some geographical data on the disk, mainly raster files, which are quite huge, therefore it would probably not be a good idea to store them in the database but only keeping a link (to the file) in the DB.
Last but not least, if I store values for parameters in a
parameter table in my database, I cannot run the computation in parallel on the same database as I would need different values setup in the same table for two different runs.
That’s why I’m searching to better design and organize my datasets and parameters when playing with different sets of values.