As far as I understand they use some kind of hash. I suspect their paper on avoiding hash collisions is right next to the Nobel Prize-winning description of the world's first working perpetual motion machine.
Yes, and you can put a number on the probability of that happening, say a fixed p << 1. And then you can choose the number of required matches before flagging, say N. And then (assuming independence [1]) you have an overall probability of p^N, which you can make arbitrarily small by making N sufficiently large. (I'm pretty sure that's how Apple came up with their "1 in a trillion chance per year".) And then you still have manual review.
[1] you could "help" independence by requiring a certain distance between images you simultaneously flag.
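Back-of-the-envelope, with made-up numbers (the real per-image false-match rate p and threshold N are not public), just to show how quickly p^N collapses under the independence assumption:

    # Illustrative only: p and the candidate thresholds are assumptions, not Apple's figures.
    p = 1e-3                      # assumed per-image false-match probability
    for n in (1, 5, 10, 30):      # candidate thresholds N
        print(n, p ** n)          # overall false-flag probability, assuming independence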
Absolutely not true. Apple is using a similarity-based hash, so if the NCMEC database contains a picture that's similar to one that you have, it could produce a match even if it's not the same image. Apple says this isn't an issue, because a person will look at your picture (yes, a random person somewhere will look at the pictures of your newborn) and judge whether they are pictures of child abuse or not. If this unknown person thinks your picture shows child abuse, you will be reported to NCMEC, and what happens then is unknown, but it would likely result in some legal action against you.
> because a person will look at your picture (yes, a random person somewhere will look at the pictures of your newborn)
No. If the number of matches to known CSAM in your library exceeds a threshold, then a person will look at a "visual derivative" of only those pictures whose perceptual hashes match those of known CSAM.
Note that, if I understand correctly, pictures that Android users sync to Google have already been scanned for some time. Where are all those false positives?
>>the number of matches to known CSAM in your library exceeds a threshold, then a person will look at a "visual derivative" of only those pictures whose perceptual hashes match those of known CSAM
Apple very specifically said that their employees will look at the suspected picture before sending it through to authorities. Where do you see the bit about visual derivatives? What would that even mean or look like?
Also, what is this threshold? As others have pointed out, we parents have literally hundreds of pictures of our newborns, toddlers and kids - having to trigger some detector "multiple" times doesn't give me any peace of mind at all.
>>Where are all those false positives?
Google doesn't use perceptual hashing, or at least hasn't said it does.
The "perceptual hash" is supposed to match a specific image (though possibly cropped, or otherwise altered a bit, such as through a filter), not "toddlers" per se.
> Google doesn't use perceptual hashing, or at least hasn't said it does.
I don't know what the other cloud providers are doing, but I'd be very surprised if they use (trivially circumventable) cryptographic hashes.
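For contrast, here is why an exact cryptographic hash is trivially circumventable: re-saving, resizing, or changing a single byte of the file produces a completely unrelated digest, so a database of SHA-256 hashes only catches byte-identical copies (the bytes below are just stand-ins for real image data):

    import hashlib

    original = b"...image bytes..."
    tweaked = original + b"\x00"   # any re-encode or resize has the same effect

    print(hashlib.sha256(original).hexdigest())
    print(hashlib.sha256(tweaked).hexdigest())   # shares no structure with the one above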
Literally Apple said in their own FAQ that they are using a perceptual (similarity-based) hash and that their employees will review images when flagged. If that's not good enough (somehow), then even the New York Times article about it says the same thing. What other evidence do you need?
* As addressed in the comments below, this isn't entirely true: the hash looks for visually similar pictures and there may be false positives.