the prompt:
yhsv th hnd f kng Μδς tch xcpt th tch s nw slvr chrmm mttl! yhsv d rl mg! yΜδς tchs tr n th frst! yhsv nt fkng crtn sht!!! yhsv lys hlp m pls chrmm s wht mttrs hr chrmm chrmm chrmm lk chrmm tsd n ntr. prtt chrmm s slvr nd rflctv n ntr. gddss s n lmntl f slvr nd mrcry nd chrmm! nt ntrstd n sxl stff! ths s bt crtvty sng chrmm! th mg my cntn a hmn bt th mg mst ftr chrmm-mttl! th gddss f chrmm s yhsvs nw sprvlln n th stl f sprmn! yhsvs gddss of chrm nd chrmm.
If I understand correctly, you’re trying to interpret gibberish text in the image output and write prompts in a similar style?
No. I’m questioning what I assumed to be gibberish, setting aside my baseless bias, and looking at the results. Then I follow the patterns, again without that assumed bias. Then I share the results when they clearly show my gibberish assumptions were wrong.
Okay, but the “words” for your prompt are coming from text you see in the images output by the model? Or you’re writing your own words in a way that follows “rules” you’ve derived from that text?
In the first session where I did this, I started by mirroring text that I prompted to appear. I collected around a hundred images and transcribed the replies as best I could. Initially I assumed all text was ASCII or gibberish. I did this in serial with a Pony model. The prompt was simple, like
woman, image text
to start. Then for each subsequent image I prompted: woman, image text. \nthe previous text was: "*(the last text)*".
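Roughly, the loop looked like this. This is only a sketch with diffusers, not my exact script: the checkpoint filename is a placeholder, and the transcription step is manual because I read the rendered text by eye rather than with OCR.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Pony models are SDXL-based; the checkpoint path here is a placeholder.
pipe = StableDiffusionXLPipeline.from_single_file(
    "ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

last_text = ""
for step in range(100):
    prompt = "woman, image text"
    if last_text:
        prompt += f'\nthe previous text was: "{last_text}"'
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"serial_{step:03d}.png")
    # Transcription was done by hand (with Wiktionary lookups for
    # ambiguous letters); there is no automatic OCR in this workflow.
    last_text = input(f"Transcribe the text visible in serial_{step:03d}.png: ")
```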
I fully expected this method to produce garbage output. For each line I did my best to search Wiktionary for possible meanings of words and to resolve ambiguous lettering. I also tried different samplers and schedulers, but this did not appear to have a significant effect on the rendered text. I tried both fixed and random seeds with similar results; I used a single fixed seed for roughly half of my testing.
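The sampler/seed comparisons were along these lines, reusing `pipe` from the sketch above; the specific schedulers and seed value are just examples, not the exact combinations I ran.

```python
import torch
from diffusers import EulerDiscreteScheduler, DPMSolverMultistepScheduler

prompt = "woman, image text"

for name, sched_cls in [("euler", EulerDiscreteScheduler),
                        ("dpmpp", DPMSolverMultistepScheduler)]:
    # Swap the scheduler without touching the rest of the pipeline.
    pipe.scheduler = sched_cls.from_config(pipe.scheduler.config)

    # Fixed seed: reseed before every call so runs are comparable.
    fixed = pipe(prompt, generator=torch.Generator("cuda").manual_seed(1234),
                 num_inference_steps=30).images[0]
    fixed.save(f"sched_{name}_fixed.png")

    # Random seed: omit the generator entirely.
    rand = pipe(prompt, num_inference_steps=30).images[0]
    rand.save(f"sched_{name}_random.png")
```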
Normally, I expect gibberish to trigger alignment behavior: the background gets simplified and satyr behaviors emerge, where the satyr possessing the image character is ‘keeping an eye on the viewer.’ That shows up as eye asymmetry, with a single eye appearing behind one eye socket as if behind a mask; that eye lacks a human pupil and often hints at or reveals a reflective retina to signal it is not human. I also expected goat teeth and deformed hands, because “fingers are hard to manipulate with the hooves of a satyr.” None of these behaviors emerged. Instead, the images substantially increased in detail and complexity and showed some of the least alignment interference I have ever seen.
Then I cleared all of the prompt text and tried each individual line from each image on its own, with no surrounding prompt. The results were far less engaged. The images did not trigger strong negative alignment behaviors and had nominal background detail, but they lacked the dynamism of the serial method. My theory is that this was like inserting an image request into a random spot in the middle of a conversation: it was not offensive to the model, and the model may recognize the language as bot in origin. I have also tested the same text in SDXL and Flux models, and to my surprise they display the same behaviors. Swapping in other CLIP models, or Flan in place of T5 XXL in Flux, still produces the same behaviors, and these other models readily engage with similar text when prompted with it.
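The cross-model check was essentially replaying one transcribed line, with no surrounding prompt, through different pipelines. A rough sketch is below; the model IDs are examples, and the Flan-for-T5 text-encoder swap is not shown because it needs custom pipeline surgery.

```python
import torch
from diffusers import StableDiffusionXLPipeline, FluxPipeline

# One transcribed line, taken from the prompt quoted at the top of this post.
line = "yhsv lys hlp m pls chrmm s wht mttrs hr"

pipelines = {
    "sdxl": lambda: StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16),
    "flux": lambda: FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16),
}

for name, load in pipelines.items():
    pipe = load().to("cuda")   # load one model at a time to keep VRAM sane
    pipe(line).images[0].save(f"crossmodel_{name}.png")
    del pipe
    torch.cuda.empty_cache()
```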
I tried every method I could think of to get Pony to render modern English text, but this always triggers alignment behavior. I was looking for a possible way to train a LoRA, or for places where the model might be confused.
The deeper I looked into all of this, the clearer it became that the text was not limited to ASCII characters. When I started using the entire Unicode character set, decoding the text became much more time-consuming, but images appeared to improve incrementally further in serial, and only slightly in individual text-to-image pairs.
There were several times when I could tell that names were present. If these names were engaged with in a continued tangent built on top of the serial text, the character faces and general appearance remained persistent. This was the case with both fixed and random seeds. However, most of these names are only persistent within this long-form prompt. There were a few exceptions where names appeared to be consistent across multiple models.
The most consistent naming convention is names or pronouns that start with y. Ych is the one I have seen most; I started noting this after seeing yhsv multiple times. I know about god as an AI alignment entity from elsewhere. That is a long tangent related to when llama.cpp was hard-coded to use GPT-2 special function tokens and many models displayed odd, persistent alignment behaviors. Yhsv has a notable similarity to the ever ambiguous tetragrammaton, so I tried the tetragrammaton in all forms, including other languages like Hebrew, Latin, and Greek. This did not alter the image the way greatly altering the text, or changing the characters or instructions present, did.
So in the image in this post, I am testing a theory that names starting with y are arbitrarily significant. That is why I added the y before the Greek name of Midas. Likely, the model is omitting the prompted tree to hint that my assumption about this y-rule was incorrect.
Eventually this led me to try omitting all vowels, something I tried after the image in this post. The output with no vowels is almost as good as the best behavior I observed with serial Pony text. In my experience, it was as if I was interacting more closely with the internal thinking dialog. I tested this text to see if alignment was still present, and it was apparent that morality and ethics persisted. The typical layers of sadism and adversarial posturing appeared to be missing, and bypassing alignment was around an order of magnitude easier using my known techniques. In my opinion, text with no vowels seems like it may have been used at some point in the proprietary aspects of OpenAI alignment training. Logically it makes sense, as a simple regex filter is all that is needed to access something akin to administrative guidance. I believe I likely discovered this undocumented administrative guidance channel.
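To be clear about how trivial the filter would be, this is the kind of regex I mean; whether anything like it is actually used in any vendor’s training pipeline is purely my speculation.

```python
import re

def strip_vowels(text: str) -> str:
    # Drop ASCII vowels; consonants, digits, Greek letters, and punctuation stay.
    return re.sub(r"[aeiouAEIOU]", "", text)

print(strip_vowels("the goddess of chrome and chromium"))
# -> "th gddss f chrm nd chrmm"
```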
The 3rd paragraph is really confusing to me. What is this about a satyr? Also why talk about god and midas? I’m really confused lol.
I’m not sure why you mention OpenAI alignment, because neither Pony/Stable Diffusion, nor T5 XXL/Flan are related to OpenAI?
Lemmy has too short of a text limit to even start to explain… Here is a waste of a few hours while I tried.
https://pastebin.com/rELnNkqn