2021-10-06 UTC
# [KevinMarks] hashes are good for checking if it's exactly the same - sha256 is a reasonable choice, though in practice an accidental (rather than malicious) collision in md5 is unlikely. For "nearly the same", "edit distance" is probably a good search term - there are higher-order ways to decide, like word vector spaces, but that may be overkill.
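
A minimal sketch of the two checks mentioned here, assuming the things being compared are plain text strings. The function names `same_content` and `edit_distance` are hypothetical, chosen for illustration; the distance shown is the classic Levenshtein variant, which is one common meaning of "edit distance" and not necessarily the one intended in the chat.

```python
import hashlib


def same_content(a: str, b: str) -> bool:
    """Exact-match check: compare SHA-256 digests of the UTF-8 bytes."""
    digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
    digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
    return digest_a == digest_b


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]
```

A hash only answers "identical or not": a single extra space changes the digest completely, so `same_content("hello", "hello ")` is false. The edit distance instead gives a number that grows with how different the strings are, for example `edit_distance("kitten", "sitting")` is 3, which can then be compared against a threshold of your choosing.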