sql server – Which database to use when you have a 6 billion of rows inside and need to query rows with from list of IDs?


We are currently researching our case of storing the distances between cities. Right now we have 6 billion of those distances.

Our structure right now in SQL Server is that we have a column that represents a float number which represents the relationship between cities.

So here is what I mean, we have a city with ID 1 inside the Locations table and a city with ID 2 inside the same table, a row with distance from 1 to 2 will look like so 1.21000 miles. That column is indexed.

So as an example we need to get the distance from the city 1000 to the city 2535, we need to find 1000.2535 inside the Distances table to get that distance.

Besides that example, we need to select 1000 rows with a generated list of 1000 IDs from those 6 billion rows. What I mean is SELECT id, distance FROM Distances WHERE id IN (1000.2535, 1.2, etc.)

Right now we’ve only tested SQL Server on a local machine and it gives us around 300 ms for such query of 1000 rows, but only when we set a 50 ms timeout (this is needed for a lot of parallel requests from multiple users), if 50 ms timeout is not used it just grows exponentially like 300 ms for the first, 500 ms for the second, 800 ms for the third, etc.

And right now we taking a look at ElasticSearch specifically for mget.

So my questions are:

  1. Which database would you recommend for such a use case?
  2. What would you recommend besides what we’ve thought of maybe some other ideas like splitting into two different columns cities IDs, etc?
  3. What would be the best ways to optimize such a database?