Tuning-Free Personalized Image Generation (meta.com)
82 points by LarsDu88 3 months ago | 45 comments



I know exactly why Facebook / Meta are researching this.

Just imagine the possibilities for advertisers: Instead of telling someone how happy they would be if only they bought your expensive car, let's just spam them with AI pictures of themselves sitting in said expensive car, ideally next to some very attractive other people that match their dating preferences.

Facebook has all the data they need to create very pleasant dream scenarios for you. And they have the connections to monetize those dreams. Didn't The Expanse have a scene with someone addicted to living in a fantasy world? I thought it was meant as a warning, but this wouldn't be the first time that an elaborate warning was misunderstood as an instruction manual.


This is quite thought-provoking. I can totally see ads for, say, Disney World, where they put you in the picture instead of an actor. I mean, the whole goal of these ads is already to have you imagine yourself there. Putting you in the picture makes it that much easier.


lol, if it is a good enough ad

Just add it to my Instagram timeline and I can skip the trip and the cost. Everyone else (including me in 30 years) thinks I went.


We sell text-to-image model finetuning (aka "Dreambooth") as a service and yes, this is one of the use cases.

Recently a travel agency used our platform to generate images of people in the destinations they were advertising.


Ew


This is a much bigger thing than the Llama 3.1 release. Llama 3.1 doesn't really help Meta's bottom line.

But content creation and ads are Meta's killer app. By having a model that doesn't require finetuning, they just changed the whole game.


Where do I sign up for my personalized AI content filter bot which can reliably detect ads and remove them from my browser?


I think the future will be a web browser running inside a VM, where the final DOM, including all referenced resources, goes through a filter before being rendered. That way, it's impossible for the website to detect whether you display the ads or just load all the necessary resources for rendering but mask them out.
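
A rough sketch of that filter-the-final-DOM idea, assuming a headless browser (Playwright here) and a hard-coded, made-up list of ad selectors; a real filter would obviously need something far smarter than fixed CSS selectors:

```python
# Sketch: render a page headlessly, strip suspected ad elements from the
# final DOM, then hand the cleaned HTML to whatever actually displays it.
# The selector list below is a made-up placeholder, not a real blocklist.
from playwright.sync_api import sync_playwright

AD_SELECTORS = ["iframe[src*='doubleclick']", ".ad-banner", "[id^='google_ads']"]

def fetch_filtered(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        # Remove matching elements only after the page (and its ads) has
        # loaded, so the site still sees all resource requests go out.
        for selector in AD_SELECTORS:
            page.evaluate(
                "sel => document.querySelectorAll(sel).forEach(e => e.remove())",
                selector,
            )
        html = page.content()
        browser.close()
    return html

if __name__ == "__main__":
    print(fetch_filtered("https://example.com")[:500])
```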


The future will be “AI PCs” with a powerful on-device chip that can filter out on-screen ads, but enabled only by subscription.


Hmmm, that'd be an interesting startup idea!

How do ad blockers work exactly?


They're entirely manual. A whole bunch of volunteers write filter rules to block known ads. There's a big GitHub repo where people can post issues about ads they've found, and volunteers will write filter rules to block them.

See https://github.com/easylist/easylist/issues
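
For a sense of what those filter rules look like, here is a toy matcher for a couple of EasyList-style network rules (the `||domain^` anchor syntax). Real ad blockers implement a much richer grammar, plus element-hiding rules like `##.ad-banner`; the rules below are illustrative, not taken from EasyList:

```python
import re

# A few rules in (simplified) EasyList network-filter syntax.
# "||" anchors to a domain boundary, "^" matches a separator or end of URL.
RULES = [
    "||doubleclick.net^",
    "||adservice.example.com^",
    "/banner_ads/",
]

def rule_to_regex(rule: str) -> re.Pattern:
    pattern = re.escape(rule)
    pattern = pattern.replace(re.escape("||"), r"^https?://([a-z0-9-]+\.)*")
    pattern = pattern.replace(re.escape("^"), r"([/:?#]|$)")
    return re.compile(pattern)

COMPILED = [rule_to_regex(r) for r in RULES]

def is_blocked(url: str) -> bool:
    return any(r.search(url) for r in COMPILED)

print(is_blocked("https://ads.doubleclick.net/pixel"))    # True
print(is_blocked("https://example.com/banner_ads/1.js"))  # True
print(is_blocked("https://example.com/article"))          # False
```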


This does seem like something AI could automate.


Digital immune system


> Llama 3.1 doesn't really help Meta's bottom line

Not directly. But most generative AI needs text models. And Meta definitely doesn't want OpenAI or Google or someone else controlling the state of the art. Zuck is essentially preventing anyone from getting too big by making sure the only "quite good" options aren't locked behind someone's metered API.




OP here: Thanks for changing the title as well!


Thanks. The original link will expire after some time; this will be really helpful when that happens.


Photographic images generated by these systems tend to look like the graffiti portraits you see on fairground attractions.

I've done a lot of photorealistic drawings, and the trick to making something look real is to get the tones exactly right. Misjudge a tone a bit and the result looks like a mediocre drawing or a painting. In other words, the gradient of skin tones is off, which is ironic, I guess.

I assume that there is a systemic error in (linearly?) interpolating colors (in the wrong color space?) somewhere, which potentially could be easy to fix and lead to improved photorealism. On the other hand, it might be a horrible problem to fix, because it would require accurate radiosity and raytracing to get right.
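
As a small illustration of that suspicion (not a claim about what these models actually do internally), here is the difference between averaging two sRGB colors directly and averaging them in linear light; the naive sRGB midpoint comes out noticeably darker than the physically sensible blend. The two colors are arbitrary placeholders:

```python
# Averaging gamma-encoded sRGB values directly vs. in linear light.
def srgb_to_linear(c: float) -> float:
    c /= 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> int:
    s = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
    return round(s * 255)

a, b = (230, 170, 140), (60, 35, 25)  # two arbitrary skin-ish tones

naive = tuple(round((x + y) / 2) for x, y in zip(a, b))
linear = tuple(
    linear_to_srgb((srgb_to_linear(x) + srgb_to_linear(y)) / 2)
    for x, y in zip(a, b)
)

print("midpoint in sRGB space:  ", naive)   # darker than you might expect
print("midpoint in linear light:", linear)  # brighter, more plausible blend
```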


I know what you mean, my theory is that it's an emergent property from RLHF tuning penalizing examples of bad/incoherent lighting, which pushes the model towards that kind of vague "lit from everywhere" style which is relatively easy to sell as correct without a proper understanding of light transport. It looks amateurish because that's the same trick an amateur human might use to try to sell photorealism without good lighting fundamentals.


The fact that these models rely heavily on classifier-free guidance has a strong impact on the tones of the image.
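
For reference, classifier-free guidance combines the conditional and unconditional noise predictions at each sampling step roughly like this (schematic tensors only; the guidance scale of 7.5 is just a common default, not anything specific to this paper):

```python
import torch

def cfg_step(eps_uncond: torch.Tensor, eps_cond: torch.Tensor,
             guidance_scale: float = 7.5) -> torch.Tensor:
    """Standard classifier-free guidance combination.

    High guidance scales push samples toward the prompt but are also known
    to exaggerate contrast and saturation, which is one theory for the
    characteristic "AI look" in tones.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy example with random "noise predictions" of the shape a latent UNet
# might produce (batch, channels, height, width).
eps_u = torch.randn(1, 4, 64, 64)
eps_c = torch.randn(1, 4, 64, 64)
guided = cfg_step(eps_u, eps_c, guidance_scale=7.5)
print(guided.shape)  # torch.Size([1, 4, 64, 64])
```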


Exactly. No humans are involved in most of the process.


It doesn’t help that RGB is very badly tuned for many skin tones.


Do these models really operate in RGB space? I would have thought that using a perceptual color space to generate images meant to be perceived by humans would be low hanging fruit.


As a total cluebie on generative art, I would assume that the neural networks involved use linear weights and ReLU only. If the training data and the output are in RGB pixels, then it would be reasonable to suppose that this introduces some bias.

It may not be enough to use a perceptual color space only. The gradients in skin tones, or any other complex texture, are non-linear due to lighting and curvature.

Is there someone in the room who does know how things work, and whether this hypothesis is wrong or not?


At the very least they ultimately output to RGB. The fleshtone part of the spectrum is quite small.


I have a feeling this is partly due to training data being selected by an "aesthetic score". If weirdly airbrushed skin consistently has a high aesthetic score, that's what the model gets trained on.


Up until recently, to insert yourself into an image generation algorithm, you had to use a technique like Dreambooth, which involves finetuning the model itself with a new mapping of the subject to a rare token.

Meta just released and productionized a new technique that doesn't require finetuning at all.

This enables a whole host of new possibilities... People can now be inserted into scenes or outfits at will, without any sort of time-consuming model training.
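
For context, Meta's paper has its own architecture, but the general flavor of tuning-free personalization (IP-Adapter-style approaches, for example) is to encode the subject photo once with a frozen image encoder and inject the resulting tokens into the diffusion model's cross-attention alongside the text prompt, with no weight updates per subject. A minimal, hypothetical sketch of that decoupled cross-attention idea:

```python
import torch
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    """Toy illustration: latent queries attend to text tokens and, separately,
    to image-derived subject tokens; the two results are summed. No weights
    need retraining per subject -- only a new subject embedding is supplied
    at inference time."""

    def __init__(self, dim: int = 64, heads: int = 4, image_scale: float = 0.6):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_scale = image_scale  # how strongly the subject is injected

    def forward(self, latent, text_tokens, subject_tokens):
        text_out, _ = self.text_attn(latent, text_tokens, text_tokens)
        subj_out, _ = self.image_attn(latent, subject_tokens, subject_tokens)
        return text_out + self.image_scale * subj_out

# Dummy tensors standing in for UNet latents, text-encoder output, and the
# output of a frozen image encoder run on the user's photo.
latent = torch.randn(1, 256, 64)        # (batch, latent tokens, dim)
text_tokens = torch.randn(1, 77, 64)    # e.g. CLIP text sequence length
subject_tokens = torch.randn(1, 4, 64)  # a handful of subject tokens

block = DecoupledCrossAttention()
print(block(latent, text_tokens, subject_tokens).shape)  # torch.Size([1, 256, 64])
```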


This will be great for people on Instagram.


Given how absurd Instagram/social media already is (entire cottage industries of "private jet" stages in warehouses, etc.), it will arguably be a benefit for society when it completely jumps the shark and anyone can generate over-the-top ridiculousness in seconds.


>Up until recently, to insert yourself into an image generation algorithm, you had to use a technique like Dreambooth

I mean, not really, you could just train a LoRA for example (it doesn't require training with Dreambooth).


Well, the point is, both LoRA and Dreambooth require fine-tuning the model (i.e., training).
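
To make the distinction concrete: a LoRA only learns a small low-rank delta on top of frozen base weights, but that delta still has to be trained on photos of the subject. A generic sketch of the idea (hypothetical, not tied to any particular diffusion codebase):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update.

    Personalizing with a LoRA means optimizing only A and B on images of the
    subject -- far fewer parameters than full Dreambooth fine-tuning, but it
    is still gradient-based training."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the base model stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank A and B matrices: 2 * 4 * 64 = 512
```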


To be clear for folks, this is "fine-tuning" ;) DreamBooth from 2022: https://dreambooth.github.io.

Might want to update the HN title to reflect the paper title. It's really just applying multiple techniques that already exist. The paper's title is "Imagine yourself: Tuning-Free Personalized Image Generation." Nice paper though!


They didn't "release" anything, it's a paper.


The future of Netflix isn't going to feature DiCaprio or Zendaya. It will be you, your wife, and your friends on the screen as hobbits adventuring to Mordor.


This hypothetical future gets brought up a lot, but would the novelty of something like that really hold up for more than one or two viewings? There's nothing stopping you from replacing the names in an eBook with the names of people you know personally, but beyond young children I can't see anyone actually being enamored by that.


While this may have some appeal, I think it'll be similar to the fake Time magazine covers that made it look like someone you knew was named Time's person of the year. Good for a chuckle, but not much more.

I think applying the same idea to video games makes more sense, especially given the autonomy you have in a video game, but even then, the appeal wears off pretty quickly.

Games have had features that let you put your likeness in the game before, and that feature probably isn't what people were buying the game for. Tony Hawk's Pro Skater 2 for Dreamcast allowed you to map a photograph of your face onto the in-game player. Odd example, but I actually just dusted off my old Dreamcast and remembered this feature the other day, as the 20-year-old game save had my 20-years-younger face on the main character. What I recall about that experience was that it felt special for about 2 minutes, and then I never thought about it again until feeling confused about why I was in the game before remembering.


Is this a common desire? I have absolutely no interest in watching myself inserted into a film or TV show.


It sounds like it would be a common desire, like when you see the futuristic computer interface in Minority Report. It seems cool on the surface but falls apart the minute you imagine the reality of using it in practice. Your arms would get tired very quickly trying to control an interface in 3D space.

The idea that we'd someday have no more shared experience around media is harrowing and thankfully the public isn't actually calling for it.


No, it's why we invented the phrase "main character syndrome" for people who exhibit this behavior.


This is actually the premise of an episode of Black Mirror in Season 6 called "Joan Is Awful". It shows an interestingly dark take on the negatives that could potentially arise from this: https://en.m.wikipedia.org/wiki/Joan_Is_Awful


I do not agree, and think that most people don't want this.


Why on earth would I want to go to the movies and watch myself?


nobody wants that


The last example in the paper, with the boy and girl, definitely has faking-a-girlfriend vibes.



