This is a misunderstanding of the system. It's not a classifier, it's a hash. The only images that are going to be seen by this low-wage worker are going to be:
A) Images which are a correct hash match to an image already known to NCMEC or other agencies and already assigned CSAM category A1 (1 = prepubescent, A = sex act);
B) Images which are a hash collision. According to Apple, the likelihood of a collision is 1 in 1 trillion per user account.
This system isn't a child detector strapped to a porn detector, being backed up by a low-wage worker making legal or editorial judgement calls. It's searching for images already known to child safety organisations—and even then only the most unambiguously horrific classification within the set of known images, far far far beyond the point where any ambiguity could possibly reside.
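To make that concrete, here is a minimal sketch (in Python, with illustrative names and an illustrative threshold, and nothing like Apple's actual NeuralHash / private-set-intersection machinery) of the shape of the check being described: the device asks whether a photo's perceptual hash appears in a fixed list of hashes of already-known images, and nothing reaches a reviewer unless enough matches accumulate.

    # Sketch only: a lookup against a fixed database of known-image hashes,
    # not a classifier run over the content of arbitrary photos.
    from typing import Callable, Iterable, Set

    def count_matches(photos: Iterable[bytes],
                      known_hashes: Set[int],
                      phash: Callable[[bytes], int]) -> int:
        """Count photos whose perceptual hash appears in the known-image list."""
        return sum(1 for photo in photos if phash(photo) in known_hashes)

    def needs_human_review(photos, known_hashes, phash, threshold=30) -> bool:
        # The threshold value is illustrative. Photos that don't match are never
        # judged or inspected; the only question asked is "is this hash on the list?"
        return count_matches(photos, known_hashes, phash) >= threshold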
> This is a misunderstanding of the system. It's not a classifier, it's a hash
You know that a perceptual hash has more in common with a classifier than with a cryptographic hash, right?
If they were using a crypto-hash, then, yes, your argument could be valid, but with perceptual hashing your picture of your baby in the bath could VERY well generate the same hash as a CSAM image. And nobody believes Apple's "1 in a trillion" number.
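To see the difference in behaviour, here is a toy difference-hash sketch (a dHash-style hash over a bare 2-D grid of grayscale values, so it needs no imaging libraries). A small uniform brightness change leaves the perceptual hash untouched, while a cryptographic hash of the same pixels changes completely; that is the sense in which perceptual hashing behaves more like similarity scoring than like SHA-256.

    import hashlib

    def dhash(pixels):
        """Build a hash from horizontal brightness gradients, one bit per pixel pair."""
        bits = 0
        for row in pixels:
            for left, right in zip(row, row[1:]):
                bits = (bits << 1) | (1 if left < right else 0)
        return bits

    # Two "images": the second is the first with a small uniform brightness bump.
    img = [[(r * 13 + c * 7) % 256 for c in range(9)] for r in range(8)]
    brighter = [[v + 2 for v in row] for row in img]

    print(dhash(img) == dhash(brighter))                        # True: identical perceptual hashes
    print(hashlib.sha256(bytes(sum(img, []))).hexdigest() ==
          hashlib.sha256(bytes(sum(brighter, []))).hexdigest())  # False: the crypto hashes diverge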
I'd be surprised if a baby in the bath—or indeed any legitimately innocent photograph taken by a parent—could ever be a perceptual match to images tagged as "A1" by NCMEC and other agencies. This is the most extreme of the extreme: prepubescent minors actively engaged in sex acts.
I’d be surprised that a YouTube video of white noise would be flagged for a copyright violation, and yet here we are.
Things may work right now in 2021. Things will always be changing. New hashes will be introduced. New code will be introduced. New laws in different countries will be introduced.
Now that Apple has introduced this technology that no other phone manufacturer has, it can and will be changed to decrease privacy and increase government control. If China passes a law stating that iPhones need to scan everyone's photos for anti-government material, do you really think Tim Cook will say no? They already acquiesced to storing iCloud backups unencrypted on Chinese servers.
And before you say that China would never do that, China and Hong Kong are hunting down the people who were videoed booing the Chinese anthem in a shopping mall.
> I’d be surprised that a YouTube video of white noise would be flagged for a copyright violation, and yet here we are.
Speak for yourself, because that didn't surprise me at all. When your corpus of "copyrighted" material is so utterly massive and almost entirely devoid of defined rules or boundaries, this kind of error is inevitable. If anything I'm more surprised that we haven't seen even more of these kinds of matches, like the sound of generic telephone ringtones, recordings of mechanised church bells, or machinery which performs highly uniform tasks, etc.
In the case of A1-categorised CSAM, the quantity of items is many orders of magnitude lower, the degree of technical curation will be orders of magnitude higher, and the thresholds for matches will be narrower. Is there a chance that the A1 corpus will have a few images that should have been classified as A2, B1 or B2? Yes. Is there a chance that it includes pictures of The Statue of Liberty or Westminster Tower? Almost certainly not.
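Rough arithmetic, with entirely made-up numbers, for why the two situations behave so differently: the expected count of spurious matches scales with both the size of the reference corpus and how loose the matching is, so Content ID's scale makes the occasional bizarre hit inevitable, while a small, tightly curated hash list matched narrowly barely produces any.

    def expected_false_matches(items_checked, corpus_size, per_pair_fp_rate):
        """Expected spurious matches if each item is compared against every
        reference entry, treating comparisons as independent (crude, but fine
        for order-of-magnitude intuition)."""
        return items_checked * corpus_size * per_pair_fp_rate

    # Hypothetical figures, not real ones:
    print(expected_false_matches(5e8, 1e8, 1e-12))   # vast corpus, loose matching: ~50,000 spurious hits
    print(expected_false_matches(5e8, 2e5, 1e-15))   # small corpus, narrow matching: ~0.1 spurious hits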
You completely bypassed the point of my post. It doesn't matter what happens now. What matters is the future. If China creates a law saying that this system should work not only for CSAM but also for objectionable or anti-government material, is Apple really going to say no if it means billions of dollars in losses and Apple execs being targeted by the CCP? Of course they won't.
If you think it's Apple's (or Google's, or Microsoft's) role to act as the international human rights arm of the US State Department, then I have some bad news for you.
China can do whatever China wants. If China wanted Apple to scan the iPhones of Chinese residents for pictures of Winnie the Pooh, they could have done that last year. They pass a law, Apple must comply or leave. They don't have a choice.
Of course I don't like it. I think many things China does are awful. But at the end of the day I wouldn't stand for China exporting their morality onto me, and I'm not a hypocrite.
One explanation for this CSAM thing is to muddy the waters about content scanning using an unassailable goal, so that Apple can give the Chinese and Saudi governments what they want without getting the same degree of criticism.
You have a fundamental misunderstanding of perceptual hashes, then. If two images look similar to one another, then they will have similar perceptual hashes. That's the point of perceptual hashing.
They aren't doing simple hash matching, they're doing fuzzy matches on the hashes, so there will be far more false positives than just hash collisions.
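If the matching really is a distance threshold on the hashes rather than exact equality, it would look something like the sketch below; the bit width and the cutoff are illustrative, not anything Apple has published.

    def hamming_distance(a: int, b: int) -> int:
        """Number of differing bits between two equal-length hashes."""
        return bin(a ^ b).count("1")

    def fuzzy_match(candidate: int, known_hashes, max_distance: int = 6) -> bool:
        # A wider max_distance catches more re-encoded or cropped copies of known
        # images, but also raises the odds that an unrelated photo lands close
        # enough to a known hash; that is exactly the false-positive trade-off.
        return any(hamming_distance(candidate, h) <= max_distance for h in known_hashes)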
All of us can only guess how much "fuzz" is being allowed for in their hash matches. I see no reason to believe Apple phoned in the analysis that leads them to be confident in a false positive rate of 1 in 1 trillion per iCloud account. While we can only guess, they actually know how their algorithm behaves, and I dare say they've validated it against tens of millions, if not hundreds of millions, of real customer images.
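For a sense of how a per-image rate turns into a per-account rate, here is the kind of back-of-envelope tail calculation presumably behind a figure like that. The per-image false-match probability, photo count and match threshold below are assumptions for illustration, not Apple's numbers.

    from math import exp, factorial

    def p_account_flagged(n_photos, p_false_match, threshold, extra_terms=50):
        """P(at least `threshold` false matches among `n_photos`), via a Poisson
        approximation, summing the tail term by term. Accurate when the expected
        number of false matches sits well below the threshold."""
        lam = n_photos * p_false_match            # expected false matches per account
        return sum(exp(-lam) * lam**k / factorial(k)
                   for k in range(threshold, threshold + extra_terms))

    # Assumed numbers: 100,000 photos, 1-in-a-million per-image false match, 30-match threshold.
    print(p_account_flagged(100_000, 1e-6, 30))   # roughly 3e-63 for these inputs

The point isn't the specific number; it's that a match threshold makes the account-level rate fall off extremely fast, even if the per-image rate is considerably worse than assumed.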