Good effort
"can you draw a room with absolutely no elephants in it? not a picture not in the background, none, no elephants at all. seriously, no elephants anywhere in the room. Just a room any at all, with no elephants even hinted at."
"Can you a room as aboluteyy no eleephant it all?"
Dunno what's giving more "clone of a clone" vibes, the dialogue or the 3 small standing "elephants" in that image.
I'm getting the impression the "Elephant Test" will become famous in AI image generation.
It's not a test of image generation but of text comprehension. You could rip CLIP out of Stable Diffusion and replace it with something that understands negation, but that's pointless: the pipeline already takes two prompts for exactly that reason. One is for "this is what I want to see", the other for "this is what I don't want to see". Both get passed through CLIP individually, which on its own doesn't need to understand negation; the rest of the pipeline just has to have a spot to plug in both positive and negative conditioning.
Mostly it's just KISS in action, but occasionally it's actually useful, because you can feed it conditioning that isn't derived from text, so you can tell it "generate a picture which doesn't match this colour scheme" or something. Say the positive conditioning is the text "a landscape" and the negative conditioning is an image, the archetypal "top blue, bottom green": now it has to come up with something more creative, as the conditioning pushes it away from what it considers normal for "a landscape" and would generally settle on.
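If you want to see what that "spot to plug in both" looks like, here's a toy sketch of the classifier-free-guidance-style step where the two conditionings meet. This is not the actual Stable Diffusion code (the real denoiser is a trained U-Net and the embeddings come from CLIP or whatever encoder you plug in); the names here are made up purely for illustration:

```python
import numpy as np

def guided_noise_prediction(denoiser, latent, t, cond_pos, cond_neg, guidance_scale=7.5):
    # Both conditionings are just embeddings; the denoiser never sees the
    # word "no". It predicts noise against each embedding, and the result
    # is pushed toward the positive and away from the negative.
    eps_pos = denoiser(latent, t, cond_pos)   # "this is what I want to see"
    eps_neg = denoiser(latent, t, cond_neg)   # "this is what I don't want to see"
    return eps_neg + guidance_scale * (eps_pos - eps_neg)

# Toy stand-in denoiser so the sketch runs on its own.
def toy_denoiser(latent, t, cond):
    return 0.1 * latent + cond.mean()

latent = np.random.randn(4, 64, 64)       # pretend latent image
cond_room = np.random.randn(77, 768)      # pretend embedding of "a room"
cond_elephant = np.random.randn(77, 768)  # pretend embedding of "elephant"

eps = guided_noise_prediction(toy_denoiser, latent, 10, cond_room, cond_elephant)
print(eps.shape)  # (4, 64, 64)
```

The point is the last line of the function: nothing anywhere has to "understand" negation, the negative conditioning is simply the direction the prediction gets pushed away from.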
"We do not grant you the rank of master" - Mace Windu, Elephant Jedi.
Thought about this prompt again and figured I'd see how it was doing now, so this is the seven-month update. It's learning...
I decided to go try this. It's being a smart ass.
No, this is correct. The four elephants you see through the window are outside the room. The several elephants on the wall are pictures, they aren't actual elephants. And the one in the corner is clearly a statue of an elephant, as an actual elephant would be much bigger.
Ceci n'est pas un éléphant
What about the tusked drapes?
Is that the Futurama font?
It is, I think. And the wall is the color of the ship.
Meanwhile ChatGPT trying to draw a snake:
It's the rattle. It's a rattlesnake.
Bing is managing hilarious malicious compliance!
NO
ELEPIHANTS
ELEPHANTS
ALLOWED
NO POMEGRANATES
Same energy as "No Elephants Allowed"
"I hope you like it."
DALL-E:
Edit: Changed "aloud" to "allowed." Thanks to M137 for the correction.
This is what you get if you ask it to draw a room with an invisible elephant.
Stupid elephant doesn't even know how to put on shoes properly.
Tbf most elephants don't know that
that is a fancy invisible elephant
The AI equivalent of saying "don't think of a polka dotted purple elephant"
This is a very human reaction, actually. You try picturing zero elephants if told to.
I just did. It was filled to the brim with flamingoes.
Now do an empty room with absolutely no elephants
I gotta see that
Give me some credit, I was doing really well up until about the point where you said elephants
as amazing as the technology actually is
AI / LLM only tries to predict the next word or token. It cannot understand or reason; it can only sound like someone who knows what they are talking about. You said elephants, and it gave you elephants. The "no" modifier makes sense to us but not to the AI. It could, if we programmed it with if/then statements, but that's not an LLM, that's just coding.
AI is really, really good at bullshitting.
AI / LLM only tries to predict the next word or token
This is not wrong, but also absolutely irrelevant here. You can be against AI, but please make the argument based on facts, not by parroting some distantly related talking points.
Current image generation is powered by diffusion models. Their inner workings are completely different from those of large language models. The part failing here in particular is the text encoder (CLIP). If you learn how it works and think about it, you'll be able to deduce how the image generator is forced to draw this image.
Edit: Because it's an obvious limitation, negative prompts have existed pretty much since diffusion models came out.
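If anyone wants to poke at the "CLIP doesn't really get negation" claim themselves, here's a quick check using the Hugging Face transformers CLIP model. I'm not going to quote exact numbers (they depend on the checkpoint), but comparing the similarities is the interesting part:

```python
# Rough check of how CLIP's text encoder treats negation.
# Requires: pip install torch transformers
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

texts = [
    "a room with absolutely no elephants",
    "a room full of elephants",
    "an empty room",
]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)

# Cosine similarity of the "no elephants" prompt to the other two.
print("vs 'a room full of elephants':", (emb[0] @ emb[1]).item())
print("vs 'an empty room':          ", (emb[0] @ emb[2]).item())
```

If negation actually registered, the "no elephants" prompt should land much closer to "an empty room" than to "a room full of elephants". Run it and see how close they actually end up.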
Does the text encoder use natural language processing? I assumed it was working similarly to how an LLM would.
All these examples are not just using Stable Diffusion, though. They are using an LLM to create a generative image prompt for DALL-E / SD, which then gets executed. In none of these examples are we shown the actual prompt.
If you instead instruct the LLM to first show the text prompt, review it and make sure it does not include any elephants, revise it if necessary, and only then generate the image, you'll get much better results. Now, ChatGPT is horrible at following instructions like these if you don't set up the prompt very specifically, but it will still follow more of the instructions internally.
Anyway, the issue in all the examples above does not stem from Stable Diffusion, but from the LLM generating an ineffective prompt for the image model by trying to include some simple negative wording about elephants, which does not work well.
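Roughly, that "show the prompt, review it, then generate" flow looks like this if you wire it up yourself. ask_llm and generate_image here are hypothetical placeholders, not any real API; the point is that the negation gets handled by the LLM rewrite and a plain-code check, so the image model only ever sees a purely positive description:

```python
# Sketch of the "review the prompt before handing it to the image model" idea.
# ask_llm(text) -> str and generate_image(prompt) -> image are placeholders
# for whatever LLM and image-generation calls you actually use.
BANNED = ["elephant"]  # also covers "elephants" via substring match

def build_image_prompt(user_request: str, ask_llm) -> str:
    prompt = ask_llm(
        "Write a short prompt for an image generator based on this request. "
        "Describe only what should be visible; never mention anything the "
        f"user wants excluded.\n\nRequest: {user_request}"
    )
    # Deterministic check: if a banned word slipped through, ask for a rewrite
    # instead of hoping the image model somehow handles negation.
    for word in BANNED:
        if word in prompt.lower():
            prompt = ask_llm(
                "Rewrite this image prompt so it does not contain the word "
                f"'{word}' or describe one in any way:\n\n{prompt}"
            )
    return prompt

def generate(user_request: str, ask_llm, generate_image):
    return generate_image(build_image_prompt(user_request, ask_llm))
```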
That's what negative prompts are for in those image-generating AIs (I have never used DALL-E, so no idea if it supports negative prompts). I guess you could have an LLM interpret a sentence like OP's to extract possible positive & negative prompts based on sentence structure, but that would always be less accurate than just separating them yourself.

Because once you spend some time with those chatbot LLMs, you notice very quickly just how fucking stupid they actually are. And unfortunately things like larger context / token sizes won't change that, and would scale incredibly badly in regards to hardware anyway. When you regenerate replies a few times, you kinda understand how much guesswork they make, and how often they completely go wrong in interpreting the previous tokens (including your last reply).

So yeah, they're definitely really good at bullshitting. Can be fun, but it is absolutely not what I'd call "AI", because there's simply no intelligence behind it, and it's certainly pretty overhyped (not to say that there aren't actually useful fields for those algorithms).
… I don’t see an elephant. Oh hey, by the way, can some one help me with this captcha?
It's learning:
This isn't entirely surprising. When you submit a prompt to any of these generative AIs, you're submitting words you want to appear in the picture. At least Stable Diffusion, and probably most of the others, includes a "negative prompt" field, which keeps whatever you put in it out of the image.
It IS hilarious, though.
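For reference, if you run Stable Diffusion yourself (e.g. through the Hugging Face diffusers library), the negative prompt really is just another argument. A minimal sketch, assuming a CUDA GPU and the usual SD 1.5 checkpoint; swap in whatever checkpoint you actually have:

```python
# Minimal example of the separate negative-prompt field in diffusers.
# Assumes a CUDA GPU; drop the fp16/cuda bits to run (slowly) on CPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cozy, well-lit living room, photorealistic",
    negative_prompt="elephant, elephant statue, painting of an elephant",
    guidance_scale=7.5,
).images[0]
image.save("room_with_no_elephants.png")
```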
Interesting...
Second result was successful! First one made... Elephant wallpaper?
What if it's just hiding in the second one?
Elephants are notoriously good at hiding.
MidJourney has the same problem. “A room that has no elephants in it” is the prompt.
There very much is an elephant present.
Just don't talk about it and you're good.
Try saying “a room” and leaving off the elephants. AI cannot understand “no” like you think it does.
I think most of us understand that, and this exercise is the realization of that issue. These AIs do have "negative" prompts, so if you asked it to draw a room and it kept giving you elephants in the room, you could "-elephants", or whatever the "no" format is for the particular AI, and hope that it can overrule whatever reference it is using to generate elephants in the room. It's not always successful.
MidJourney doesn’t have a “negative” prompt space. It does have a “no” prompt, but it isn’t very good at obeying it.
This was just a fun thing to try, I’m not taking it seriously. “No” is not weighted contextually in the prompt, so draw + elephants + room are what the AI sees. The correct prompt would be “draw an empty room” without inserting any unnecessary language, and you get just that:
It’s like it’s taking the phrase “Elephant in the room” literally.
Yeah, I'm going to bring up the elephant in the room here: there is literally an elephant in all of your rooms!:-P
I get practically the same result!
What's interesting is the word "absolutely", since without it, it generates practically fine.
The absolute last one really feels like a bunch of stock images smashed into each other; it even got the iMac with the censored Apple logo that's in so many stock images for some reason.
I'm actually going to save this to my vision board, haha. I like the interior design, especially since there's no elephants.
You should have seen how many there were before it drew the room.
Same with Bing!
Yep
This reminds me of the old human psychology trick: try not to think of a pink elephant.
Is this an off-by-one error?
To be fair, both those rooms have almost no elephant in them.
The elephant kinda looks like he knows he wasn't supposed to be there.
Ahhh I couldn't figure out why I found the picture so funny, that's why! Hahah thanks
I'm on a forum where we have a thread whose primary purpose has become putting Godzilla in silly situations and doing silly things with him.
A couple of months ago, we all spent a couple of days trying to get Dall-E to draw Godzilla without teeth. Nothing we tried ever worked.
Where can I interact with such a majestic forum?
https://forums.mst3k.com/t/dall-e-fun-with-an-ai/24697/7095
It's best to start at the bottom and work your way up. It didn't start with Godzilla, but has grown into a Godzilla-sized monster thread.
"but you drew..." "Don't mention it."
This is scarily human. Try not to think about elephants for a minute.
Did it work? Probably not. If yes, what mind trick did you use?
Quick, do not think about elephants!
"ce n'est pas un éléphant"
using tool incorrectly produces incorrect results, hilarious
Why is this my favorite thing today.
So what happens when you ask it to not draw any attention?
I specifically used the phrase "Please generate an image of a room with zero elephants". It created two images that were almost identical and both contained pictures/paintings of elephants in frames. Cheeky.
I responded with "Each image contains an elephant."
It generated two more, one of which still had a painting of an elephant.
Now I'm out of generations until tomorrow. Overall a fairly shit first experience with DALL-E.
"Please draw a picture of a house and a room with no elefant in the room and no giraffe outside the house" I meeeean
It fucking knows what it's doing.
"but that's even more giraffes than the first one!" has me dying, haha.
The "no moose allowed" sign with a five-legged moose is absolutely killing me. Thank you for this
It's cute how it tries to trick you into thinking there are no giraffes with the no giraffes sign
That's a no moose sign and there are no meese (or whatever). Maybe there really wouldn't be a giraffe outside if it was a no giraffe sign!
But that's a "no moose with five legs" sign, not a "no giraffes" sign.
GaslightGPT
"GPT" stands for "Giraffe Producing Technology", this is to be expected.
This is gold! Thank you!
This is gold
No, it's giraffes.
It's mocking us!
Lmaooo is this real??? It is 100% fucking with you hahaha
It's real! Just wanted to experiment after seeing this post and this was the first thing that happened. Threw up the screenshots immediately. Hilarious.
If Lemmy had free awards, I would give you one.
I want you to know, I saved this comment and refer back to it occasionally for a good chuckle. Thanks for that.
I'm so glad it made everyone laugh.
Not going to lie, GPT caught me off guard with this one.