We are currently researching our case of storing the distances between cities. Right now we have 6 billion of those distances.
Our structure right now in SQL Server is that we have a column that represents a float number which represents the relationship between cities.
So here is what I mean, we have a city with ID 1 inside the
Locations table and a city with ID 2 inside the same table, a row with distance from 1 to 2 will look like so
1000 miles. That column is indexed.
So as an example we need to get the distance from the city
1000 to the city
2535, we need to find
1000.2535 inside the
Distances table to get that distance.
Besides that example, we need to select 1000 rows with a generated list of 1000 IDs from those 6 billion rows. What I mean is
SELECT id, distance FROM Distances WHERE id IN (1000.2535, 1.2, etc.)
Right now we’ve only tested SQL Server on a local machine and it gives us around 300 ms for such query of 1000 rows, but only when we set a 50 ms timeout (this is needed for a lot of parallel requests from multiple users), if 50 ms timeout is not used it just grows exponentially like 300 ms for the first, 500 ms for the second, 800 ms for the third, etc.
And right now we taking a look at ElasticSearch specifically for
So my questions are:
- Which database would you recommend for such a use case?
- What would you recommend besides what we’ve thought of maybe some other ideas like splitting into two different columns cities IDs, etc?
- What would be the best ways to optimize such a database?