[Eric's note: I originally wrote this in January 2023 and apparently forgot to publish it, so here it is in April 2025]
Stable Diffusion Overview
If you have looked at anything techie lately you may have noticed the new buzzword of the times is "A.I.", which supposedly stands for Artificial Intelligence. Cameras are now "AI Powered." TVs have "AI Upscaling." Companies selling regular software are now magically "AI Software Companies." We love buzzwords. (My favorite is "The AI Cloud").
One more of those AI-type things is the digital imaging tool called Stable Diffusion. I call it a "digital imaging tool" because all the techno-buzzwords in the descriptions across the Internet make it confusing. Because who wouldn't want to use a "deep learning, text-to-image latent diffusion model with deep generative neural network artwork"?
Fancy words.
Enough with the sarcasm? Let's get serious.
What is Stable Diffusion?
An open-source tool that allows you to create or modify images using text descriptors.
Ok but what does Stable Diffusion do?
You can create an image with Stable Diffusion by giving it a text prompt describing the image you want and hitting go; after some computation time an image is generated for your viewing pleasure. The streets call this "txt2img" as in "Text to Image".
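If you want to script this yourself rather than use a hosted site or a graphical frontend, here is a minimal txt2img sketch using the Hugging Face diffusers library. This is an alternative route to the official repository's scripts, not what I used for the images in this article; the model name, output filename, and GPU assumption are mine.

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the Stable Diffusion v1.5 weights and build a text-to-image pipeline.
# Assumes an Nvidia GPU; drop torch_dtype and .to("cuda") to run (slowly) on CPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One prompt in, one image out.
image = pipe("spooky house on a hillside").images[0]
image.save("spooky_house.png")
```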
The image modification capabilities are numerous and, like the rest of Stable Diffusion, under active development, but the main ones currently are:
Image modification, where you provide an image and give Stable Diffusion a text prompt describing how to modify or warp it. This is generally referred to as "img2img" as in "Image to Image".
Inpainting is the next feature, which allows you to take an existing image, mark an area you want modified or replaced, type out a text prompt for what you want that specific area to look like, and hit go.
Outpainting is the third main modification feature: it allows you to expand a given image based on your text prompt. Stable Diffusion will create the missing area out of thin air, thus expanding your image.
Image resizing is another feature, which differs from outpainting because it does not create new content; it simply resizes your image to the specified pixel size. Resizing can take smaller images and upscale them to higher resolutions or shrink existing images. When upscaling to larger resolutions, more detail is added to the image (since more pixels are available) but the same point of view and scale are left in place.
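As a rough illustration of the upscaling idea done in code, here is a sketch using the diffusers 4x upscaler pipeline. The model ID and filenames are assumptions for illustration only, and this is not the tool I used for the images in this article.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# The x4 upscaler adds detail while keeping the framing of the original image.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Keep the input small (e.g. 128x128); the output is 4x larger in each dimension.
low_res = Image.open("small_input.png").convert("RGB").resize((128, 128))

# The prompt should describe what is in the picture being upscaled.
upscaled = pipe(prompt="spooky house on a hillside", image=low_res).images[0]
upscaled.save("upscaled_input.png")
```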
Running Stable Diffusion
There are a multitude of online hosted versions of Stable Diffusion around, with new sites coming online and disappearing daily. You can find them by just searching "stable diffusion" on your favorite search engine, though a lot of the time they are too busy or overloaded to even operate. Others may charge money or require you to create an account.
As Stable Diffusion is actually an open-source project, you can run it yourself on your own gear by cloning it from
https://github.com/Stability-AI/stablediffusion. This is the official code repository, and the included instructions are straightforward. My advice is to follow the directions exactly and in the correct order. Stable Diffusion is a Python program, so it will run on pretty much any platform: your own computer running Windows or Linux, or even VM instances on GCP, AWS, Azure, etc.
An Nvidia GPU with 8GB or more of onboard video RAM is strongly recommended. Workarounds exist for smaller cards, AMD systems, and even CPU-only setups. The first installation I did was without a GPU and it took roughly 4 minutes to generate a single image. Correcting the configuration brought that down to about 35 seconds, but my GPU is several years old. Internet forums have people boasting they can generate an image in about a second with their fancy latest-generation $1200 GPU cards.
There are a ton of Stable Diffusion project clones out there, which can be both good and bad. One of the more popular ones is Stable Diffusion WebUI by AUTOMATIC1111 -
https://github.com/AUTOMATIC1111/stable-diffusion-webui - which is essentially Stable Diffusion with an easier installer and a web-based frontend. For the images in this article I leveraged the aforementioned WebUI for ease of use. The original Stable Diffusion is command-line only as of the time of writing.
Show me the money
In 2021 I had the absolute honor and pleasure to do an interview with John Romero (of Doom, Quake, Commander Keen et al fame). One of the topics we ended up on was toolsets and the future of video game production. He said (paraphrasing) that soon he could see creators not having to actually create all their content pixel by pixel. Instead, he envisioned that you would be able to say "put a portrait of a famous-looking person here" or "create a spooky house on a hillside." Granted, a lot of toolsets exist with pre-created assets, but he was talking about on-the-fly content creation without an artist somewhere having to spend the time making it. It blew my mind as something that could absolutely revolutionize indie and smaller game studios, and would definitely bleed over to the movie and TV industry, YouTube, and other content generators.
Well, without further ado, here is "spooky house on a hillside" leveraging Stable Diffusion txt2img and default settings:
That is actually pretty neat. We got a hillside, the house is dark so it is sort of spooky, and not too sure about the orange door thing but that kind of makes it spooky too. If I just flashed this image to you, unless you are some sort of professional, I doubt you would have known it was made in Stable Diffusion. And unless you look closely at a higher resolution it almost looks like a photograph. Pretty neat! This is even the first result I got with default settings and the stated prompt!
Now I will confess that "prompt engineer" is not something I will have on my resume anytime soon, but with my few hours of playing with Stable Diffusion (and reading several blog posts) I've learned you can get a different image by using a more detailed prompt such as "spooky house on a hillside, fog, postapocalyptic, overgrown with plant life and ivy, artgerm, yoshitaka amano, gothic interior, 8k, octane render, unreal engine":
(Full disclosure, for this second picture I leveraged the prompt template from
https://github.com/Dalabad/stable-diffusion-prompt-templates).
Now that is badass. This one does not look as photo-realistic as the first, but that is to be expected since two rendering engines are mentioned in the prompt (Octane and Unreal). The image could use a little cleanup around that sky artifact above the trees, but otherwise this looks like something you could use for a book, movie poster, or storyboard. Frankly, you could even just enhance and enlarge it for a print or desktop background.
You can even generate multiple images at once leveraging the same prompt:
This way you could play around with a prompt, and when you get close to the results you want, just tell it to batch out 10 or 500 images. Then pick the ones you like, do some Photoshop work (or inpainting!), and you have your picture.
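In the WebUI this is just a batch-count setting; scripted with diffusers, a batch might look something like the sketch below. The model name and filenames are assumed, and the fixed seed is my own addition so the batch is repeatable.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("spooky house on a hillside, fog, postapocalyptic, overgrown with "
          "plant life and ivy, artgerm, yoshitaka amano, gothic interior, 8k")

# Generate several candidates from the same prompt; a fixed seed makes this reproducible.
generator = torch.Generator("cuda").manual_seed(1234)
images = pipe(prompt, num_images_per_prompt=4, generator=generator).images
for i, img in enumerate(images):
    img.save(f"spooky_{i:02d}.png")
```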
Image Modification - img2img - Outpainting
Outpainting is such an interesting idea. Let's say you have an image, like our spooky house above, but you want it to be bigger. And by bigger I mean a wider shot, one that includes more of the forest which surrounds it. Now, if you are out in the world with your camera you just take a few pictures by panning left and right, then stitch them together to make a panorama. With Stable Diffusion you can do what is called Outpainting; I assume it has that name because you are painting (creating) more outwards from the original. If we pump our spooky house into img2img, select outpainting, tell it to expand the image left and right by 128 pixels, and keep the exact same prompt, we get:
Pretty neat for about 90 seconds of computer time.
If we run it a second time on the new image we get even more of the surrounding landscape:
Not bad at all! There are still some of those flying artifacts above the trees which I should have cleaned up before performing the Outpainting, but otherwise it does look like our spooky house is in the haunted swampy woods. The skinny trees on the right side look repetitive, as does the fence or rock feature on the bottom portion of the picture, but that too could be modified. Closer inspection also makes the first big tree on the right look a little odd with the vertical line where the fog starts; this is where we started the Outpainting on the first run. I did use default settings, so modifying the blur and denoising settings might give better results, but again nothing that seems too far-fetched to modify to improve our spookiness.
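If you want to script outpainting instead of using the WebUI, one rough approach is to paste the image onto a wider canvas, mask the new blank area, and let an inpainting model fill it in. Here is a sketch with diffusers; the model ID, filenames, and the 512x512 source-size assumption are illustrative, not the exact settings used above.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pad = 128  # pixels to add on each side, as in the example above
src = Image.open("spooky_house.png").convert("RGB")  # assumed to be 512x512

# Paste the original onto a wider black canvas (512x512 -> 768x512).
canvas = Image.new("RGB", (src.width + 2 * pad, src.height), "black")
canvas.paste(src, (pad, 0))

# The mask is white where new content should be painted, black over the original.
mask = Image.new("L", canvas.size, 255)
mask.paste(0, (pad, 0, pad + src.width, src.height))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

result = pipe(
    prompt="spooky house on a hillside, fog, postapocalyptic, overgrown with plant life and ivy",
    image=canvas,
    mask_image=mask,
    height=canvas.height,  # dimensions should stay multiples of 64
    width=canvas.width,
).images[0]
result.save("spooky_house_wider.png")
```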
Image Modification - img2img - Modification
If I give Stable Diffusion a portrait of myself and tell it to convert it to cyberpunk art by using the prompt "nvinkpunk, man smiling", we get results like this:
And by changing the number of iterations we get this:
This could be really fun after playing with the prompt or input pictures. Cyberpunk family Christmas cards for everyone next year? The "nvinkpunk" part of the prompt is a keyword that triggers a model trained specifically on cyberpunk art, which I found here:
https://huggingface.co/Envvi/Inkpunk-Diffusion. The ability to train your own models, or leverage other specifically trained models, opens up a crazy new world even deeper in the Stable Diffusion rabbit hole (these models also seem to provide more accurate results when you know what you want).
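For anyone scripting this instead of using the WebUI, an img2img sketch with that Inkpunk model via diffusers might look like the following. It assumes the Hugging Face repo provides diffusers-format weights; the input filename and the strength value are illustrative, since I generated my images through the WebUI.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load the community Inkpunk model referenced above.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "Envvi/Inkpunk-Diffusion", torch_dtype=torch.float16
).to("cuda")

init = Image.open("portrait.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="nvinkpunk, man smiling",
    image=init,
    strength=0.6,        # how far the result may stray from the source photo
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]
image.save("inkpunk_portrait.png")
```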
Image Modification - img2img - Inpainting
Taking that same portrait of me and marking the background to be replaced with the Inpainting technique allows me to take the picture from my hallway and make it appear as if I am in the mountains with the prompt "man posing in front of mountains at sunset, photorealistic, light effect, ultra clear, 8k":
Granted the result is rather crude, but that is definitely because of my lack of skill and a steady hand. If I had leveraged a real photo tool (Photoshop, GIMP) to create the mask instead of the web GUI, and modified a few parameters, the result would be much improved.
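As a sketch of the same idea done programmatically: paint a mask in Photoshop or GIMP (white over the background to be replaced, black over the subject to keep), then feed the photo and mask to an inpainting model. The filenames and model ID below are assumptions for illustration.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# The mask is painted by hand: white where the background should be replaced,
# black where the subject should be preserved.
portrait = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("background_mask.png").convert("L").resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="man posing in front of mountains at sunset, photorealistic, light effect, ultra clear, 8k",
    image=portrait,
    mask_image=mask,
).images[0]
image.save("portrait_in_mountains.png")
```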
Stable Diffusion - so what?
We've seen a few examples of what a beginner can do with Stable Diffusion leveraging off-the-shelf hardware. To say this technology is only in its infancy and developing rapidly would be an understatement; it has only been publicly available for 5 months! It will be exciting to see where Stable Diffusion is in a year.
People are developing scripts and services for specific technique results, training models for all sorts of different artworks and styles (or even people and pets!), automation tools to deploy Stable Diffusion to multiple cloud providers, tools to run prompts on video, and more. There are results and ideas people are working on that I personally do not have the words to describe accurately, yet.
The most exciting opportunity for Stable Diffusion is its potential to enable someone who has an idea, but not the skillset or means, to create the images they want.
Think of someone who wants to write a children's book but has no drawing ability or funds to hire an artist. A person who starts a company and needs to keep costs down, so they create their own stock photos instead of paying Getty Images. A college film student who has the vision but struggles with storyboards... and on and on.
Stable Diffusion being released as an open-source project with a very permissive license is only going to inspire other people to build upon this team's work to create bigger and better software. The future impact of this industry is going to be absolutely astounding.
Originally I ignored the news about Stable Diffusion because it sounded like a Photoshop filter or just another AI fad. Then something happened in December 2022 (what it was I do not know), and the floodgates of negative articles, comments, and posts opened. Something is afoot when all the news about a new technology details how dangerous it is, or that it is going to put people out of work, or make life for others that much harder. Even posts on Ycombinator were mostly negative: explaining how it is "useless", "doesn't make any sense", and "why not just use Photoshop".
The answer to all this [fake] fear is:
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C Clarke
...and people are afraid of magic.
To be fair, there are a ton of unknowns and issues with Stable Diffusion. Faces and hands it creates can be very creepy. Text and logos are downright horrible. It can create convincing phony pictures of famous people. Traits or attributes of specific people can come out with downright offensive results. There are already communities of people using it to create illicit pornography or put public figures in compromising situations. Stable Diffusion could be paired with other tools like deepfakes, with which someone could potentially do great harm to others.
But let's also be realistic: in July of 2022 (the month before Stable Diffusion was released) it was already possible for someone to create any sort of media that could potentially do great harm. We have not outlawed the media outlets, celebrity interviews, or Facebook, so why should yet another image tool be targeted with such vitriol? With the release of Stable Diffusion, the robot AI overlords are not magically going to end the planet.
The opportunities for positive outcomes are great. I, for one, am excited to see where Stable Diffusion goes in the future.
*disclaimer*
This document is my own and does not represent anything from any other entity. I will not be held liable for anything bad that comes of it. All images displayed on this page were created or modified with AI by the author. AI tools used: Stable Diffusion 1.5 (model downloaded 12/28/2022).
Written by Eric Wamsley
Posted: Jan 4, 2023 2:01pm