IP-Adapter is a bit different from what Fooocus and Midjourney do.
IP-Adapter uses an image to guide denoising.
Fooocus and MJ take a prompt and expand it in a variety of ways (e.g. with a language model or simpler text manipulation). The actual prompt that creates the conditioning is not what you typed in. That’s what I mean by prompt massaging.
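
To make "prompt massaging" concrete, here is a toy sketch of the simple text-manipulation flavor. The function name `massage_prompt` and the style tags are made up for illustration; this is not how Fooocus or Midjourney actually implement it, just the general idea that the text reaching the text encoder differs from what was typed.

```python
def massage_prompt(user_prompt: str, style_tags: list[str]) -> str:
    """Return the user's prompt with extra style tags appended.

    Hypothetical helper: tags already present in the prompt are skipped
    so the expanded prompt does not repeat them.
    """
    base = user_prompt.strip()
    lowered = base.lower()
    extras = [tag for tag in style_tags if tag.lower() not in lowered]
    return ", ".join([base] + extras)


typed = "a cat on a sofa"
# The tags below are illustrative assumptions, not a real preset.
expanded = massage_prompt(typed, ["highly detailed", "cinematic lighting"])

# The conditioning would be built from `expanded`, not from `typed`.
print(expanded)
```

With IP-Adapter, by contrast, the typed prompt can stay untouched, because the extra guidance comes from a reference image rather than from rewritten text.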