Cache tag problem causing block to disappear

We have a Drupal 8 site with a few thousand pages, all of which are in one huge hierarchical menu. Many pages display this menu as a navigation sidebar. Each page can be configured so that the root of the displayed menu is the page itself, its parent, or its grandparent, and so that a chosen number of menu levels is shown. Did it make sense to put everything in one giant menu? Probably not, but that’s what I’m working with.

The problem is that, because they are all in the same menu, they all get the cache tag: which means that saving any node purges basically the whole site. The sidebar menu was rendered using a view that matched only the current node, followed by a node_view hook that built the appropriate menu. The view added the node_list tag, which also caused site-wide purges on every node save.
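To make the purge behavior concrete, here is a toy model of tag-based invalidation. It is not Drupal's cache API (the real calls live on `\Drupal\Core\Cache\Cache`, e.g. `Cache::invalidateTags()`). `TaggedCache`, `save_node`, and the page paths are hypothetical. It assumes every cached page carries `node_list`, as described above, and models updating an existing node as invalidating both `node:ID` and `node_list`.

```python
class TaggedCache:
    """Toy tag-invalidated cache (hypothetical; not Drupal's API)."""

    def __init__(self):
        # cache key -> (cached value, tags attached to it)
        self.entries = {}

    def set(self, key, value, tags):
        self.entries[key] = (value, frozenset(tags))

    def get(self, key):
        hit = self.entries.get(key)
        return None if hit is None else hit[0]

    def invalidate_tags(self, tags):
        """Delete every entry that carries at least one of `tags`; return their keys."""
        tags = set(tags)
        purged = {key for key, (_, entry_tags) in self.entries.items() if entry_tags & tags}
        for key in purged:
            del self.entries[key]
        return purged


def save_node(cache, nid):
    # Model of updating an existing node: its own tag and node_list are invalidated.
    return cache.invalidate_tags({f"node:{nid}", "node_list"})


# Five pages that all picked up node_list from the view-based sidebar.
cache = TaggedCache()
for nid in range(1, 6):
    cache.set(f"/node/{nid}", f"page {nid}", {f"node:{nid}", "node_list"})
print(sorted(save_node(cache, 3)))  # ['/node/1', '/node/2', '/node/3', '/node/4', '/node/5']

# The same pages tagged only with their own node: only /node/3 is purged.
narrow = TaggedCache()
for nid in range(1, 6):
    narrow.set(f"/node/{nid}", f"page {nid}", {f"node:{nid}"})
print(sorted(save_node(narrow, 3)))  # ['/node/3']
```

One shared tag on every entry is enough to turn a single save into a full purge, which is the same effect the shared menu tag has.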

To solve this, I moved the code into a block in a custom module, which got rid of the node_list tag. (By the way, I have no idea why a view was being used; it seems totally unnecessary.) For the menu tags, I removed the and added node:(node_id) for each menu item, so that a given page is purged only if one of the nodes in its specific menu changes. For new nodes, I have it purge a sibling (which purges every page whose menu contains that sibling), so that any page sharing the same menu is purged and the new page appears in those menus too.
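The per-page tag scheme above can be sketched as a toy model. Everything here is hypothetical: the menu is plain dicts rather than Drupal's menu tree, "levels" counts the root as level 1, and falling back to the parent's tag when a new node has no siblings is an added assumption, not part of the setup described. In a real block the resulting tags would go into the render array's `#cache['tags']`. Drupal's `Cache::mergeTags()` can combine tag lists.

```python
def menu_root(nid, parent_of, levels_up):
    """Pick the sidebar root: 0 = the page itself, 1 = its parent, 2 = its grandparent.

    Stops early if the top of the menu is reached.
    """
    root = nid
    for _ in range(levels_up):
        parent = parent_of.get(root)
        if parent is None:
            break
        root = parent
    return root


def sidebar_tags(root, children_of, levels):
    """Return node:ID tags for the `levels` menu levels rendered from `root` (root = level 1)."""
    tags = set()
    frontier = [root]
    for _ in range(levels):
        next_frontier = []
        for nid in frontier:
            tags.add(f"node:{nid}")
            next_frontier.extend(children_of.get(nid, []))
        frontier = next_frontier
    return tags


def tags_for_new_node(nid, parent_of, children_of):
    """Tag to invalidate when `nid` is added: an existing sibling, else the parent (assumption)."""
    parent = parent_of.get(nid)
    if parent is None:
        return set()
    siblings = [s for s in children_of.get(parent, []) if s != nid]
    return {f"node:{siblings[0]}"} if siblings else {f"node:{parent}"}


# Hypothetical menu: 1 has children 2 and 3; 2 has 4 and 5; 4 has 6.
parent_of = {2: 1, 3: 1, 4: 2, 5: 2, 6: 4}
children_of = {1: [2, 3], 2: [4, 5], 4: [6]}

# Page 4, rooted at its parent, showing two levels: node 2 plus its children.
root = menu_root(4, parent_of, levels_up=1)
print(root, sorted(sidebar_tags(root, children_of, levels=2)))  # 2 ['node:2', 'node:4', 'node:5']

# New node 7 under node 2: purging sibling 4 reaches every sidebar that lists node 2's children.
parent_of[7] = 2
children_of[2].append(7)
print(tags_for_new_node(7, parent_of, children_of))  # {'node:4'}
```

Purging one sibling works in this model because any sidebar that renders a level renders all of that level's items. If a real menu only rendered the active trail, that assumption would not hold.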

Finally, I added url.path to the cache contexts so that pages wouldn’t get a cached menu built for a different page.
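One way to picture a cache context is as part of the cache key. The sketch below simplifies the idea and does not use Drupal's actual cache ID format. `render_cache_key`, `sidebar_menu`, and the paths are hypothetical. It shows why url.path keeps one page's sidebar from being served on another.

```python
def render_cache_key(base, contexts, request):
    """Toy render-cache ID: a base ID plus the current value of each cache context."""
    parts = [base]
    for context in sorted(contexts):
        parts.append(f"[{context}]={request[context]}")
    return ":".join(parts)


team = {"url.path": "/about/team"}
history = {"url.path": "/about/history"}

# With url.path, each page gets its own cache entry for the sidebar block.
print(render_cache_key("sidebar_menu", {"url.path"}, team))     # sidebar_menu:[url.path]=/about/team
print(render_cache_key("sidebar_menu", {"url.path"}, history))  # sidebar_menu:[url.path]=/about/history

# Without it, both pages map to the same key, so whichever renders first wins.
print(render_cache_key("sidebar_menu", set(), team) ==
      render_cache_key("sidebar_menu", set(), history))  # True
```

In this model, any entry stored under a key that lacks the context is shared by every path that computes the same key.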

And it worked!

… but only in the local environment and in the staging environment on an Acquia server. When we deployed to production, everything looked fine at first, until we got reports that some pages were missing the sidebar menu altogether!

Production traffic goes through a CDN and Varnish on Acquia before it reaches Drupal, but I don’t see how the CDN or Varnish could cause this, since Drupal builds the page. There are no errors in the log. Could I be missing a cache context or something? I’m stumped because I can’t reproduce the problem locally. Any suggestions would be great. Thanks!