git – How can I keep track of which existing lines of code/text have passed some new review?

Firstly, this is a really tough topic to do searches for because you get a wealth of information on “code review”, which is different from what I’m referring to here. That normally refers to reviewing new code that is being introduced into the codebase. That’s not what I’m interested in here.

The basic problem: You have a fairly large collection of textual files, some of them source code, and some plain text or Markdown files. You have it all under source control (though perhaps only recently). Some characteristic or requirement of the project changes. Perhaps there’s a new security requirement, or perhaps you want to migrate to a different API. Perhaps you want to make sure documentation hasn’t gotten out of sync with the actual code, or perhaps some new idea has come along and you want to make sure all engineering/planning documentation takes it into account.

At that moment, conceptually, every line of code and every paragraph of text in all of the project’s files has in some sense become “invalid”, and someone will have to go through every class or every function or every line of code or every code comment or every section or every paragraph and “revalidate” them. Many elements will simply be found to be trivially “valid”, but some will have to be changed, and may trigger changes that are scattered through other elements and files.

There may be occasion to put this review aside, work on other things, and come back to it later.

The main challenge I’m concerned with here is: How do you keep track of which lines have and haven’t been validated according to the new requirement?

I’m looking for a solution that is agnostic to the type of work unit being reviewed. For instance, a solution that assumes that the review is being done on classes or functions wouldn’t work, because external documentation will often be part of what needs review.

Possible Ideas

I’ve thought about how I might go about this with the tools I have presently. My projects of interest will tend to be versioned under Git, although asset-heavy projects will probably use Perforce instead. (I’ve been learning Git, still have to research Perforce.) Projects right now are mostly in either Visual Studio 2008 or 2019. Non-code text files tend to be edited outside these, typically using EmEditor, for instance.

Idea #1: Mark up in a branch

Basically, start a new branch dedicated to the review, and within that branch mark things up with code comments and the like to indicate what has been validated. This is nice and simple and can be done with my existing tools but does have some downsides, mostly connected with how copiously you insert these markings.

If you use them copiously:

  • It’s more effort.
  • It creates more divergence from the mainline which may make it harder to keep the branch synchronized. (You don’t want to ever merge these marks into the mainline as they would simply clutter it up.)
  • The marks could become distracting to the reviewer.

If you use them sparingly, you mostly avoid these problems but then it becomes error-prone because it’s harder to notice what has and hasn’t actually been validated.

Idea #2: Totally valid branch

Start with a completely empty branch (by git checkout --orphan and then unstage everything), and only admit pieces into this branch as they have been validated.

On one hand, this is an attractive approach because it’s less error-prone: you know everything in this branch was explicitly validated. It also avoids awkward mark-up, and you can use existing GUI tools such as git-gui to pick off individual lines to mark as valid, just by staging and committing them.

In Git I think it would be fairly easy to implement this by doing:

git switch some-review
git merge --no-commit main
git reset HEAD

At this point, your work tree will have the whole project in it, with anything that still needs review as unstaged changes. From here, you can just pick off some bit to review, and once it’s valid, stage and commit it.

This does have some drawbacks:

  • You have to periodically do the above song and dance, because otherwise the branch is totally unbuildable and doesn’t even contain anything you still need to review. And if at any point you forget the --no-commit or the git reset HEAD, you’re going to make an annoying mess.
  • You have to be careful not to accidentally stage things you haven’t actually validated, because there is really no other indication of whether you’ve reviewed it or not.
  • It may push the limits of what Git can keep track of. As a pathological example, consider a single valid line at some random position in an otherwise unvalidated 500 line file. Git can stage this one line, but then on the next merge, it would have no idea where this line came from, so you’d be faced with an artificial merge conflict. It would likewise be impossible to straightforwardly cherry-pick this into the mainline. So, you’d have to validate in hunks large enough for Git to recognize the context, and this could also become more challenging if you’ve made changes to many of the lines involved. Worse, if the context you’re staging is not good enough, there is no warning of this until you go to do another merge and it fails.
  • Due to the context problem, you’re probably better off to only make changes outside the review branch and then merge them back in, but this will become rather cumbersome.
  • Merging mainline changes would be a bit cumbersome and error-prone. You can’t just do a normal merge because that would clobber your entire validation state. You have to do a --no-commit merge and then figure out what parts to stage based on what was already on the review branch or has otherwise been newly reviewed.
  • Because large parts of the project tree are intentionally missing from the review branch, diffs will not differentiate between new mainline changes and stuff which has always been in mainline but which you just haven’t reviewed yet. This may make it harder to understand any parallel changes.
  • git-gui performs very badly on text files that are not hard-wrapped, so when working on documentation, you’d have to jump back and forth between git-gui and some editor that can actually word wrap lines instead of having them scroll off the screen. This back-and-forth would be cumbersome and a bit error-prone.
  • I‘m not sure if Perforce would permit a similar workflow and be able to track this even as well as Git could.

Question(s)

Is there any better way to do this, than my above two ideas? Either with my existing tools, or with some other type of tool?

Or, if these are my only options, what other factors might favour the one approach versus the other?

So as not to turn this into a shopping question, I’m not going to ask if there’s any specific piece of software that would be better suited than Git, Visual Studio, and EmEditor, but if there is some class of software other than revision control systems, IDEs, and text editors that would handle this better, then identifying that general class would be useful and then I can investigate the individual options on my own. (In the same sense as Q:”We keep clobbering each other’s edits,” A:”You need version control software.”)