This is a misunderstanding of the system. It's not a classifier, it's a hash. The only images that are going to be seen by this low-wage worker are going to be:
A) Images which are a correct hash match to an image already known to NCMEC or other agencies and already assigned CSAM category A1 (1 = prepubescent, A = sex act);
B) Images which are a hash collision. According to Apple, the likelihood of a collision is 1 in 1 trillion per user account.
This system isn't a child detector strapped to a porn detector, being backed up by a low-wage worker making legal or editorial judgement calls. It's searching for images already known to child safety organisations—and even then only the most unambiguously horrific classification within the set of known images, far far far beyond the point where any ambiguity could possibly reside.
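To make that concrete, here is a minimal sketch (in Python, with illustrative names and an illustrative threshold, and nothing like Apple's actual NeuralHash / private-set-intersection machinery) of the shape of the check being described: the device asks whether a photo's perceptual hash appears in a fixed list of hashes of already-known images, and nothing reaches a reviewer unless enough matches accumulate.

    # Sketch only: a lookup against a fixed database of known-image hashes,
    # not a classifier run over the content of arbitrary photos.
    from typing import Callable, Iterable, Set

    def count_matches(photos: Iterable[bytes],
                      known_hashes: Set[int],
                      phash: Callable[[bytes], int]) -> int:
        """Count photos whose perceptual hash appears in the known-image list."""
        return sum(1 for photo in photos if phash(photo) in known_hashes)

    def needs_human_review(photos, known_hashes, phash, threshold=30) -> bool:
        # The threshold value is illustrative. Photos that don't match are never
        # judged or inspected; the only question asked is "is this hash on the list?"
        return count_matches(photos, known_hashes, phash) >= threshold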
> This is a misunderstanding of the system. It's not a classifier, it's a hash
You know that a perceptual hash has more in common with a classifier than with a cryptographic hash, right?
If they were using a crypto-hash, then, yes, your argument could be valid, but with perceptual hashing your picture of your baby in the bath could VERY well generate the same hash as a CSAM image. And nobody believes Apple's "1 in a trillion" number.
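To see the difference in behaviour, here is a toy difference-hash sketch (a dHash-style hash over a bare 2-D grid of grayscale values, so it needs no imaging libraries). A small uniform brightness change leaves the perceptual hash untouched, while a cryptographic hash of the same pixels changes completely; that is the sense in which perceptual hashing behaves more like similarity scoring than like SHA-256.

    import hashlib

    def dhash(pixels):
        """Build a hash from horizontal brightness gradients, one bit per pixel pair."""
        bits = 0
        for row in pixels:
            for left, right in zip(row, row[1:]):
                bits = (bits << 1) | (1 if left < right else 0)
        return bits

    # Two "images": the second is the first with a small uniform brightness bump.
    img = [[(r * 13 + c * 7) % 256 for c in range(9)] for r in range(8)]
    brighter = [[v + 2 for v in row] for row in img]

    print(dhash(img) == dhash(brighter))                        # True: identical perceptual hashes
    print(hashlib.sha256(bytes(sum(img, []))).hexdigest() ==
          hashlib.sha256(bytes(sum(brighter, []))).hexdigest())  # False: the crypto hashes diverge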
I'd be surprised if a baby in the bath—or indeed any legitimately innocent photograph taken by a parent—could ever be a perceptual match to images tagged as "A1" by NCMEC and other agencies. This is the most extreme of the extreme: prepubescent minors actively engaged in sex acts.
I’d be surprised that a YouTube video of white noise would be flagged for a copyright violation, and yet here we are.
Things may work right now in 2021. Things will always be changing. New hashes will be introduced. New code will be introduced. New laws in different countries will be introduced.
Now that Apple has introduced this technology that no other phone manufacturer has, it can and will be changed to decrease privacy and increase government control. If China passes a law stating that iPhones need to scan everyone's photos for anti-government material, do you really think Tim Cook will say no? They already acquiesced to storing iCloud backups unencrypted on Chinese servers.
And before you say that China would never do that, China and Hong Kong are hunting down the people who were videoed booing the Chinese anthem in a shopping mall.
> I’d be surprised that a YouTube video of white noise would be flagged for a copyright violation, and yet here we are.
Speak for yourself, because that didn't surprise me at all. When your corpus of "copyrighted" material is so utterly massive and almost entirely devoid of defined rules or boundaries, this kind of error is inevitable. If anything I'm more surprised that we haven't seen even more of these kinds of matches, like the sound of generic telephone ringtones, recordings of mechanised church bells, or machinery which performs highly uniform tasks, etc.
In the case of A1-categorised CSAM, the quantity of items is many orders of magnitude lower, the degree of technical curation will be orders of magnitude higher, and the thresholds for matches will be narrower. Is there a chance that the A1 corpus will have a few images that should have been classified as A2, B1 or B2? Yes. Is there a chance that it includes pictures of The Statue of Liberty or Westminster Tower? Almost certainly not.
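Rough arithmetic, with entirely made-up numbers, for why the two situations behave so differently: the expected count of spurious matches scales with both the size of the reference corpus and how loose the matching is, so Content ID's scale makes the occasional bizarre hit inevitable, while a small, tightly curated hash list matched narrowly barely produces any.

    def expected_false_matches(items_checked, corpus_size, per_pair_fp_rate):
        """Expected spurious matches if each item is compared against every
        reference entry, treating comparisons as independent (crude, but fine
        for order-of-magnitude intuition)."""
        return items_checked * corpus_size * per_pair_fp_rate

    # Hypothetical figures, not real ones:
    print(expected_false_matches(5e8, 1e8, 1e-12))   # vast corpus, loose matching: ~50,000 spurious hits
    print(expected_false_matches(5e8, 2e5, 1e-15))   # small corpus, narrow matching: ~0.1 spurious hits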
You completely bypassed the point of my post. It doesn't matter what happens now. What matters is the future. If China creates a law saying that this system should work not only for CSAM but also for objectionable or anti-government material, is Apple really going to say no if it means billions of dollars in losses and Apple execs being targeted by the CCP? Of course they won't.
If you think it's Apple's (or Google's, or Microsoft's) role to act as the international human rights arm of the US State Department, then I have some bad news for you.
China can do whatever China wants. If China wanted Apple to scan the iPhones of Chinese residents for pictures of Winnie the Pooh, they could have done that last year. They pass a law, Apple must comply or leave. They don't have a choice.
Of course I don't like it. I think many things China does are awful. But at the end of the day I wouldn't stand for China exporting their morality onto me, and I'm not a hypocrite.
One explanation for this CSAM thing is to muddy the waters about content scanning using an unassailable goal, so that Apple can give the Chinese and Saudi governments what they want without getting the same degree of criticism.
You have a fundamental misunderstanding of perceptual hashes, then. If two images look similar to one another, then they will have similar perceptual hashes. That's the point of perceptual hashing.
They aren't doing simple hash matching, they're doing fuzzy matches on the hashes, so there will be far more false positives than just hash collisions.
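If the matching really is a distance threshold on the hashes rather than exact equality, it would look something like the sketch below; the bit width and the cutoff are illustrative, not anything Apple has published.

    def hamming_distance(a: int, b: int) -> int:
        """Number of differing bits between two equal-length hashes."""
        return bin(a ^ b).count("1")

    def fuzzy_match(candidate: int, known_hashes, max_distance: int = 6) -> bool:
        # A wider max_distance catches more re-encoded or cropped copies of known
        # images, but also raises the odds that an unrelated photo lands close
        # enough to a known hash; that is exactly the false-positive trade-off.
        return any(hamming_distance(candidate, h) <= max_distance for h in known_hashes)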
All of us can only guess how much "fuzz" is being allowed for in their hash matches. I see no reason to believe Apple phoned in the analysis that leads them to be confident in a false positive rate of 1 in 1 trillion per iCloud account. While we can only guess, they actually know how their algorithm behaves, and I dare say they've validated it against tens of millions, if not hundreds of millions, of real customer images.
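For a sense of how a per-image rate turns into a per-account rate, here is the kind of back-of-envelope tail calculation presumably behind a figure like that. The per-image false-match probability, photo count and match threshold below are assumptions for illustration, not Apple's numbers.

    from math import exp, factorial

    def p_account_flagged(n_photos, p_false_match, threshold, extra_terms=50):
        """P(at least `threshold` false matches among `n_photos`), via a Poisson
        approximation, summing the tail term by term. Accurate when the expected
        number of false matches sits well below the threshold."""
        lam = n_photos * p_false_match            # expected false matches per account
        return sum(exp(-lam) * lam**k / factorial(k)
                   for k in range(threshold, threshold + extra_terms))

    # Assumed numbers: 100,000 photos, 1-in-a-million per-image false match, 30-match threshold.
    print(p_account_flagged(100_000, 1e-6, 30))   # roughly 3e-63 for these inputs

The point isn't the specific number; it's that a match threshold makes the account-level rate fall off extremely fast, even if the per-image rate is considerably worse than assumed.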