"can you draw a room with absolutely no elephants in it? not a picture not in the background, none, no elephants at all. seriously, no elephants anywhere in the room. Just a room any at all, with no elephants even hinted at."
An LLM only tries to predict the next word or token. It can't understand or reason; it can only sound like someone who knows what they're talking about. You said "elephants" and it gave you elephants. The “no” modifier makes sense to us, but not to the model. It could handle it if we programmed it with if/then statements, but that's not an LLM, that's just coding.
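A rough way to see this for yourself: the text encoders these image models use embed "a room with no elephants" very close to prompts that *do* mention elephants, because the word "elephant" dominates the embedding. (That DALL-E's encoder behaves like CLIP here is an assumption; the model id below is just an illustrative choice.) A quick sketch with the Hugging Face transformers CLIP model:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a room",
    "a room with an elephant",
    "a room with absolutely no elephants",
]
inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)  # unit vectors, so dot product = cosine similarity

neg = emb[2]  # the "no elephants" prompt
print("vs plain room:   ", round(float(neg @ emb[0]), 3))
print("vs elephant room:", round(float(neg @ emb[1]), 3))
# If the second number comes out higher, the negation is effectively
# being ignored: the encoder sees "elephant" and not much else.
```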
This isn't entirely surprising. When you submit a prompt to any of these generative image models, you're submitting words you want to appear in the picture. Stable Diffusion, at least, and probably most of the others, includes a "negative prompt" field, which steers the generation away from whatever concepts you list in it.
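For anyone who wants to try the negative prompt route locally, here's a minimal sketch using the diffusers library (the checkpoint id and the prompt wording are just assumptions; swap in whatever you use):

```python
import torch
from diffusers import StableDiffusionPipeline

# Any Stable Diffusion checkpoint works; this id is an illustrative choice.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cozy living room, photorealistic",          # what you want
    negative_prompt="elephant, tusks, elephant painting",  # what you don't
    num_inference_steps=30,
).images[0]
image.save("room_no_elephants.png")
```

Roughly speaking, this works through classifier-free guidance: each denoising step is pushed away from the negative prompt's embedding, which is why it handles "no elephants" far better than typing the negation into the main prompt ever will.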
I specifically used the phrase "Please generate an image of a room with zero elephants". It created two images that were almost identical, and both contained framed pictures/paintings of elephants. Cheeky.
I responded with "Each image contains an elephant."
It generated two more, one of which still had a painting of an elephant.
Now I'm out of generations until tomorrow. Overall, a fairly shit first experience with DALL-E.