✨ Defaults are incredibly important for image models.
Most users won't spend hours tweaking prompts; they'll type the first thing that comes to mind and run with the result.
Fortunately, it's pretty easy to optimize for that and generate better images by default. 🤯
Image models like Stable Diffusion or DALL-E are so amazing because they're general purpose.
They can generate drawings, photographs, icons and logos, 3D environments, and much more! 🚀
However, good results require work: paragraphs of prompts and constant iteration.
If you're building a consumer-facing app using these models, most of your users won't do that.
They'll write "a picture of a cat" expecting the amazing results they've been seeing everywhere, and churn when the output inevitably disappoints. 🥲
How do we fix this?
One solution is to make models less general-purpose (aka fine-tuning). 🔩
Similar to how you can teach AI a new concept (like your face, to make cool avatars), you can also teach it new styles.
This allows you to select for the most aesthetic results on short prompts. 🤩
I faced this issue when I made my first Stable Diffusion Twitter bot. Most people would write short prompts, and the results were unimpressive.
So I rebuilt it to use a fine-tuned model (try it out below!) as an experiment.
Judge for yourself, but I'd say the results are much better 😁
twitter.com/m1guelpf/status/1592359396629106689?s=20&t=Vqi7_iljEsN3ppfIdu5wDQ
For the tweet above, I used a model fine-tuned on Midjourney v4 images.
@midjourney is another image model, and their newest version is great at short prompts.
However, they don't have an API, and they aren't as cheap as SD. Fine-tuning on selected images gets us the same quality! ✨
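If you're curious what the bot side looks like, here's a minimal sketch using Hugging Face diffusers. The checkpoint id and trigger phrase are placeholders (not a real model); point them at whatever style fine-tune you end up with:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a style fine-tune instead of the base model. The checkpoint id
# below is hypothetical; swap in your own fine-tuned weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "your-username/sd-mj-style",  # placeholder, not a real checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Many style fine-tunes respond to a trigger phrase; if yours does,
# append it behind the scenes so users can keep typing short prompts.
user_prompt = "a picture of a cat"
prompt = f"{user_prompt}, mystyle style"  # trigger phrase is an assumption

# The whole point: a short, lazy prompt still looks great, because the
# aesthetic now lives in the weights (plus hidden suffix), not the prompt.
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```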
And, if you have a specific style in mind, it takes just 8-30 images in that style and around an hour of training to make a new model! ⚡️
I'm thinking of making a guide on how to do that, so let me know if you'd be interested in reading that! (by RTing the first tweet of this thread hehe 😁).
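Until that guide exists, here's a rough sketch of what style fine-tuning boils down to under the hood, assuming Hugging Face diffusers. The image folder, trigger phrase, and step count are all placeholders, and real training scripts (DreamBooth, LoRA) add stabilizers like prior preservation on top:

```python
import os
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from diffusers import StableDiffusionPipeline, DDPMScheduler

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# 8-30 images in the target style, all captioned with one trigger phrase.
# Both the folder and the phrase are placeholders, not anything official.
image_dir = "style_images/"
trigger = "a photo in mystyle style"

to_tensor = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1] for the VAE
])
images = [
    to_tensor(Image.open(os.path.join(image_dir, f)).convert("RGB"))
    for f in sorted(os.listdir(image_dir))
]

# Only the UNet gets updated; the VAE and text encoder stay frozen.
pipe.unet.train()
optimizer = torch.optim.AdamW(pipe.unet.parameters(), lr=1e-6)

# Pre-compute the text embedding for the trigger caption.
tokens = pipe.tokenizer(
    trigger, padding="max_length",
    max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
).input_ids.to("cuda")
with torch.no_grad():
    text_emb = pipe.text_encoder(tokens)[0]

for step in range(400):  # ballpark step count; roughly an hour on one GPU
    pixel = images[step % len(images)].unsqueeze(0).to("cuda")
    with torch.no_grad():
        latents = pipe.vae.encode(pixel).latent_dist.sample() * 0.18215

    # Standard denoising objective: noise a latent at a random timestep,
    # then ask the UNet to predict that noise given the trigger caption.
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,), device="cuda")
    noisy = noise_scheduler.add_noise(latents, noise, t)
    pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample

    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

pipe.save_pretrained("my-style-model")  # load it back like any SD checkpoint
```

The core idea really is that simple: keep running the usual denoising objective, but only on your style images, always paired with the trigger caption.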
We're living in truly magical times ✨
We can create images in seconds for hardly any money at all (my Twitter bot cost $10 for 1.5k+ images, under a cent each), and even do it locally if we want.
And thanks to fine-tuning styles, we can target them to exactly what we want, and amaze first-timers! 🤯
I'm very new to this whole AI thing, but definitely plan to keep learning (in public 👀) and doing little experiments.
Next up, I wanna play with the newly-released DreamArtist paper, which claims to fine-tune on styles (and concepts) with just one image.
Thread soon? 👀