Yes, when he said this I wondered about the different perspectives you might get if you approach it from the LLM as opposed to the image generator POV. If you come from the text-to-image side of things it's harder to imagine any understanding going on, because the model screws up so readily, and not just with things like negation, but even with simple concepts. Midjourney gives you six-fingered hands and elbows that bend backwards, from which it's easy to conclude that it understands nothing about anatomy and is just gluing photos together.