Image prompt: “A cartoonish, Shrek-movie-style mischievous donkey with an exaggerated grin, looking directly at the camera with mischief in his eyes, cinematic lighting, playful atmosphere. A woman in lingerie is caressing the donkey.”
Did you say “Slap dat bass”?
You know that scene where someone says something innocent and your brain immediately goes somewhere else? That’s basically what happens with most AI video services.
You type “slap that bass” and the AI goes full panic mode. Or you say something completely normal in your everyday language and the AI looks at you like you just spoke ancient Sumerian.
I got tired of translating my thoughts into “AI acceptable corporate speak” just to generate a video.
The Real Problem
Most services out there focus on prompt engineering. They want you to write like this: “A hyperrealistic depiction of a male subject performing a percussive motion upon a stringed musical instrument, shot in 8K resolution with volumetric lighting…”
Bro. What?
I just wanted to say “dude slapping that bass like a madman” and get what I meant, not what the AI thinks I might possibly have meant if I were writing a Victorian novel. And let’s not forget that most of the “we-don’t-care” AI models out there today are built by Chinese AI companies, which is what I started from as a base. The fact that those models have issues with English and political censorship is a topic for another day.
Teaching AI to Be Less Stupid
So I spent time making the AI understand how people actually talk. Slang words. Casual phrases. Context. The stuff that fills in the gaps when you’re describing something to a friend.
When you say “make it epic”, it should know what epic means to you, not pull out a dictionary definition from 1847.
When you throw in some internet speak or a reference, it should get it. Not flag you for speaking like a normal human being in 2025.
Why This Matters
Because you shouldn’t need a PhD in prompt crafting to make a video. You shouldn’t need to carefully construct every sentence like you’re defusing a bomb.
Just tell the AI what you want. Like you’d tell a person. It should understand the gaps, the context, the vibe you’re going for.
That’s what I’ve been working on. Making the AI less of a rule-following robot and more of a “yeah I get what you mean” kind of tool.
No censorship for normal words. No panic attacks over casual language. No treating you like you need to be monitored for having a sense of humor.
If I have to write a technical manual every time I want to generate a video, what’s even the point?
Goons AI works for you. Not the other way around. I build the prompt engineering into the system so you don’t have to.
Is it 100% foolproof? Not at all, and it probably never will be, but it’s definitely much less frustrating. Based solely on user feedback, we’ve gone from writing novels and describing anatomical movements to one or two words and a few simple descriptions. That’s a 100x leap when the AI model just gets it!
With that in mind, image-to-video is now at version 1.5 and just one more experiment away from another leap in prompt handling.