
99% of internet discussion on this topic is junk.

And how is that?

It seems like the Gruber article follows a common formula for justifying controversial approaches. First, "most of what you hear is junk"; then "here's a bunch of technical points everyone gets wrong" (where getting them wrong might not change the basic situation); then go over the non-controversial parts; and finally get to the controversial parts and give the standard "think of the children" explanation. But if you've cleared away all other discussion of the situation, you can make these apologetics sound like new insight.

Is Apple "scanning people's photos"? Basically yes? They're doing it with signatures but that's how any mass surveillance would work. They promise to do this only with CSAM but they previously promised to not scan your phone's data at all.



But some of those technical points are important. Parent comment was concerned that photos of their own kids will get them in trouble - it appears the system was explicitly designed to prevent that.


The Daring Fireball article actually is a little deceptive here. It goes over a bunch of reasons why ordinary photos won't get parents in trouble and then gives a further, couched justification of the fingerprinting approach.

The question is whether an ordinary baby photo is likely to collide with one of the CSAM hashes Apple will be scanning for. I don't think Apple can give a definite no here (Edit: how could they guarantee that a system that finds any disguised/distorted CSAM won't tag a random baby picture with a similar appearance? And given such a collision, the picture might be looked at by Apple and maybe law enforcement).

Separately, Apple does promise only to scan things going to iCloud for now. But their credibility no longer appears high given they're suddenly scanning users' photos on the users' own machines.

Edited for clarity.


> how could they guarantee that a system that finds any disguised/distorted CSAM won't tag a random baby picture with a similar appearance?

Cannot guarantee, but by choosing a sufficiently high threshold, you can make the probability of that happening arbitrarily small. And then you have human review.
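
To give a feel for how fast a threshold drives that probability down, a quick back-of-the-envelope sketch (the library size and per-photo false-match rate here are numbers I made up, not anything Apple has published):

    from math import exp, lgamma, log

    def poisson_tail(lam, n, terms=200):
        # P(at least n false matches), Poisson approximation to the binomial,
        # summed in log space so tiny probabilities don't vanish or overflow.
        return sum(exp(-lam + k * log(lam) - lgamma(k + 1))
                   for k in range(n, n + terms))

    # Made-up numbers: 100,000 photos in a library, each with a 1-in-a-million
    # chance of falsely matching the hash list.
    lam = 100_000 * 1e-6   # expected false matches ~= 0.1
    for threshold in (1, 5, 10, 30):
        print(threshold, poisson_tail(lam, threshold))

Already at a threshold of a few matches the chance of an innocent account being flagged is astronomically small - under the (big) assumptions that the per-photo rate really is that low and the matches are independent.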

> And given such a collision, the picture might be looked at by Apple and maybe law enforcement

No, not "the picture", but a "visual derivative".


> "visual derivative".

Do you have any idea what that means? Because I certainly don't - how could you possibly identify whether an image is CSAM without looking at something that is essentially the same image?

What is a visual derivative? Take that algorithm and run it over some normal images and show me what they look like.

All of this is being aggressively talked around because everyone knows it's not going to stand up to any reasonable scrutiny (e.g. there are plenty of big image datasets out there - does Apple's implementation flag on any of those? Who knows - they're not going to refer to anything specific about how they got "1 in a trillion").


> Do you have any idea what that means?

No, I don't know what that means. Presumably it is some sort of thumbnail, maybe color inverted or something.

> does Apple's implementation flag on any of those? Who knows - they're not going to refer to anything specific about how they got "1 in a trillion"

I assume they've tested NeuralHash on big datasets of innocuous pictures, and gotten some sort of bound on the probability of false positives p, and then chosen N such that p^N << 10^-12, and furthermore imposed some condition on the "distance" between offending images (to ensure some semblance of independence). At least that's what I'd do after thinking about the problem for a minute.
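
As a sketch of that last step - given some bound p from testing, how large N has to be before p^N drops to about 10^-12 (these p values are made up; Apple hasn't published theirs):

    def min_threshold(p, target=1e-12):
        # Smallest number of independent matches N with p**N at or below target.
        n = 1
        while p ** n > target:
            n += 1
        return n

    for p in (1e-2, 1e-3, 1e-6):
        n = min_threshold(p)
        print(f"p={p:g}  ->  N={n}  (p**N ~= {p ** n:.1e})")

The whole thing hinges on the matches being independent, which is exactly what that distance condition between offending images would be trying to ensure.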


> I assume they've tested NeuralHash on big datasets of innocuous pictures, and gotten some sort of bound on the probability of false positives p, and then chosen N such that p^N << 10^-12

What's interesting about this faulty argument is that it hinges on the assumption that "innocuous pictures" is a well-defined space that you can use for testing and get reliable predictions from.

A neural network does classification by drawing a complex curve between one large set and another in a high-dimensional feature space. The problem is that those features can, and often do, include incidental things like lighting, subject placement, and so forth. And this often works in testing because your target data set really does uniquely have feature X, so you get a result saying your system can reliably find X - but when you go out to the real world, you find those incidental features in images that have nothing to do with your target.
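
A toy version of that failure mode (made-up data, nothing to do with NeuralHash itself): train a simple classifier on data where an incidental feature happens to correlate with the target class, and it will happily lean on that feature and misfire once the correlation breaks.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000

    # Feature 0 is the thing we actually care about; feature 1 is incidental
    # (think lighting or subject placement) but happens to separate the
    # training sets cleanly.
    X_target = np.column_stack([rng.normal(1.0, 1.0, n), rng.normal(2.0, 0.2, n)])
    X_other  = np.column_stack([rng.normal(-1.0, 1.0, n), rng.normal(-2.0, 0.2, n)])
    X = np.vstack([X_target, X_other])
    y = np.concatenate([np.ones(n), np.zeros(n)])

    clf = LogisticRegression().fit(X, y)
    print("training accuracy:", clf.score(X, y))   # looks great

    # In the wild the incidental feature is no longer tied to the class:
    # these are "other" images that merely share the target set's lighting.
    X_wild = np.column_stack([rng.normal(-1.0, 1.0, n), rng.normal(2.0, 0.2, n)])
    print("fraction wrongly flagged:", clf.predict(X_wild).mean())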

I don't know exactly how NeuralHash works, but I'd presume it has the same fundamental limitations. It has to find images even after they've been put through easy filters that change every particular pixel, so it's hard to see how it wouldn't match picture A against picture B when the two look alike if you squint.
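
To make the squint-test worry concrete, here's a toy perceptual hash (a plain block-average hash - emphatically not NeuralHash, which is a learned model, but it shows the same trade-off): the robustness that lets it survive a filter also puts visually similar but different pictures close together.

    import numpy as np

    def average_hash(img, size=8):
        # Block-average down to size x size, then threshold against the mean.
        h, w = img.shape
        blocks = img.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
        return (blocks > blocks.mean()).ravel()

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    yy, xx = np.mgrid[0:256, 0:256] / 256.0

    def scene(cx, cy):
        # Smooth gradient plus one bright blob - a crude stand-in for a photo.
        return 0.5 * xx + 0.5 * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / 0.02)

    picture_a = scene(0.30, 0.40)
    filtered_a = picture_a * 1.2 + 0.05   # brightness filter: every pixel changes
    picture_b = scene(0.35, 0.45)         # a different picture that looks similar

    print("A vs filtered A:", hamming(average_hash(picture_a), average_hash(filtered_a)))
    print("A vs similar B: ", hamming(average_hash(picture_a), average_hash(picture_b)))

The filtered copy hashes identically, and the genuinely different picture typically lands only a few bits away - which is the collision concern in miniature.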




