Stable Diffusion Models – How to Install and Use a Model

Models in the Stable Diffusion framework are essentially sets of pre-trained weights, also known as checkpoint files. These models are designed to generate images based on the data used to train them.

In other words, the type of images a model can generate depends entirely on the data it has been trained on. For example, if a model has not been trained on cat images, it cannot generate a cat image. Conversely, if a model has only been trained on cat images, it can only generate images of cats.

In this article, we will provide a detailed introduction to Stable Diffusion models, including some of the most common ones such as v1.4, v1.5, F222, Anything V3, and Open Journey v4. We will also discuss how to install and use these models, as well as how to merge them to create new models with enhanced capabilities.

Fine-Tuned Models

What Is Fine-Tuning?

Regarding Stable Diffusion models, fine-tuning is an essential technique that can significantly improve their performance. Fine-tuning involves taking a pre-trained model trained on a broad dataset and further training it on a smaller, more specific dataset.

This allows the model to learn specific features and patterns from the new dataset, improving its accuracy and relevance for specific tasks. By fine-tuning a Stable Diffusion model, you can bias it towards generating images that are more similar to the new dataset while maintaining the versatility of the original model.

This is because the model retains its ability to generate high-quality outputs across various domains while improving its ability to generate outputs more relevant to the task at hand.

Fine-Tuning models can improve Stable Diffusion performance

Why Do People Make Them?

While Stable Diffusion models are highly versatile and effective, they may not always be able to generate images of specific sub-genres of anime. For instance, while the model can generate anime-style images using the keyword “anime” in the prompt, generating images of a specific sub-genre may be challenging.

In such cases, instead of attempting to modify the prompt, fine-tuning the model with images of the desired sub-genre is a more effective approach. By fine-tuning the model with specific sub-genre images, the model can learn to generate more accurate and relevant images for that particular sub-genre. This process of fine-tuning can significantly enhance the model’s performance on specific tasks and can be applied to other domains beyond anime as well.

How Are They Made?

Fine-tuning is a crucial aspect of Stable Diffusion models, and various methods are available for it. Two popular methods are Additional training and Dreambooth.

Additional training involves training a base model like Stable Diffusion v1.4 or v1.5 with an additional dataset specific to your interests. For example, suppose you want to generate images of vintage cars. In that case, you can train the Stable Diffusion v1.5 model with an additional dataset of vintage cars to bias the aesthetic of the generated images towards that sub-genre.

Dreambooth is another fine-tuning method initially developed by Google. It allows you to inject custom subjects into text-to-image models with just a few custom images. For instance, you can take pictures of yourself and use Dreambooth to put yourself into the model. However, to condition the model, a special keyword is required. This method is beneficial for generating personalized content.

Textual inversion is another less popular fine-tuning technique that serves the same purpose as Dreambooth. The goal is to inject a custom subject into the model with only a few examples. A new keyword is created specifically for the new object, and only the text embedding network is fine-tuned while keeping the rest of the model unchanged. In simpler terms, it’s like describing a new concept using existing words.
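To make the idea concrete, here is a minimal sketch of using a textual inversion embedding with the Hugging Face diffusers library. The embedding file name and the <my-style> trigger word are placeholders for whatever embedding you actually download; a Dreambooth-trained checkpoint is used the same way as any other checkpoint, with its special keyword included in the prompt.

```python
# A minimal sketch, assuming a v1-style base model and a downloaded embedding file.
import torch
from diffusers import StableDiffusionPipeline

# Load a base checkpoint (any v1-compatible model works here).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Register the embedding: it only adds a new keyword to the text encoder,
# leaving the rest of the model untouched.
pipe.load_textual_inversion("./my-embedding.pt", token="<my-style>")  # placeholder file and token

# Use the new keyword in the prompt like any other word.
image = pipe("a castle on a cliff, in the style of <my-style>").images[0]
image.save("castle.png")
```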

Models

Stable Diffusion models can be divided into two groups: v1 and v2. The v1 models have been around longer and remain the most widely used, while the v2 models are the newer releases, introducing changes such as a higher native resolution.

Within each group, thousands of fine-tuned Stable Diffusion models are available, and new ones are being created daily. These models have been trained on different datasets and are tailored to particular styles and subjects.

For general purposes, there are several popular models that you can use, including Stable Diffusion v1.4, v1.5, Anything V3, and Open Journey. Each model has its own strengths and weaknesses, so it’s essential to compare its features and performance to find the best one for your needs.

Stable Diffusion v1.4

The Stable Diffusion v1.4 model was developed and publicly released by Stability AI in August 2022. This model is considered to be the first of its kind to be available for public use. It is designed to be a versatile, general-purpose model, making it suitable for a wide range of text-to-image generation tasks.

The v1.4 model has gained popularity due to its ability to generate high-quality output with minimal customization. However, for users with specific requirements or preferences, it may be necessary to fine-tune the model or explore other Stable Diffusion models.
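If you prefer working in code rather than a GUI, a checkpoint like this can also be loaded programmatically. Below is a minimal sketch using the Hugging Face diffusers library, assuming the public "CompVis/stable-diffusion-v1-4" repository and a CUDA-capable GPU; the prompt is only an example.

```python
# A minimal sketch of loading Stable Diffusion v1.4 with the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # move to GPU if one is available

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```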

Stable Diffusion v1.5

Stable Diffusion v1.5 was introduced by Runway ML, a partner of Stability AI, in October 2022. The model resumes training from the v1.2 checkpoint and has undergone further training steps to improve its performance. Unfortunately, the model page does not specify exactly what was improved.

While the results produced by v1.5 are slightly different from those of v1.4, it is unclear whether they are better. However, like v1.4, v1.5 can be treated as a general-purpose model and used for various tasks.

F222

F222 is a Stable Diffusion model initially trained for generating nude images. However, users have found that it can also be used to generate beautiful female portraits with accurate body part relations.

Despite its initial purpose, the model is surprisingly good at rendering clothing: when prompted with wardrobe-related terms such as “dress” and “jeans,” F222 produces visually appealing images of female subjects in a variety of outfits.

Anything V3

Anything V3 is a Stable Diffusion model that is designed specifically for generating high-quality anime-style images. The model is trained using a large dataset of anime-style images, and you can use Danbooru tags, such as “1girl” and “white hair,” to generate images that match specific criteria.
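As a rough illustration, tag-style prompting looks like the sketch below. It assumes the Anything V3 checkpoint has already been loaded into a diffusers pipeline the same way as in the earlier example, and the tags themselves are only examples.

```python
# A minimal sketch of Danbooru-style tag prompting, assuming `pipe` already
# holds an Anything V3 checkpoint loaded as in the earlier diffusers example.
prompt = "1girl, white hair, blue eyes, school uniform, cherry blossoms, masterpiece"

image = pipe(prompt).images[0]
image.save("anime_portrait.png")
```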

One of the most significant benefits of Anything V3 is that it allows you to cast celebrities in anime style, which can then be blended seamlessly with illustrative elements. This feature makes it an excellent tool for creating unique and creative visual content.

However, one potential drawback of Anything V3 is that it may produce female characters with disproportionate body shapes. If you prefer a more realistic look, you can combine Anything V3 with F222, the photorealism-oriented model described above.

By merging the two models (see How to Merge Two Models below), you can create anime-style images with a more realistic body shape that matches your preferences.

Open Journey

The Open Journey model has been fine-tuned on images generated by the Midjourney v4 model, resulting in a unique aesthetic. The fine-tuning preserves its general-purpose capabilities, so it remains suitable for a variety of tasks.

Midjourney v4 images have a distinct style, and that style has been absorbed into Open Journey, giving it a new visual language that combines the strengths of both models and produces visually striking, diverse outputs.

Open Journey is therefore an excellent choice if you are looking for a general-purpose model with a unique visual style influenced by Midjourney v4.

Other models

DreamShaper

The DreamShaper model is designed specifically for portrait illustrations that sit between photorealistic and computer-generated styles. It has been fine-tuned to be easy to use and to produce high-quality results that will appeal to fans of this style of artwork.

ChilloutMix

ChilloutMix is a Stable Diffusion model designed to generate photo-quality images of Asian females. It is similar to F222, another Stable Diffusion model, but specifically focuses on generating Asian females. To generate these images, you can use the Korean embedding “ulzzang-6500-v1,” which specializes in creating girls that resemble K-pop stars.

However, it is essential to note that, like F222, ChilloutMix may sometimes generate nude images. To avoid this, you can use wardrobe-related terms like “dress” and “jeans” in the prompt when generating images and include “nude” as a negative prompt.

This will help suppress the generation of nude images and ensure that the generated images align with your desired output.
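As a rough sketch, the prompt/negative-prompt combination described above might look like this in diffusers, assuming the ChilloutMix checkpoint has already been loaded into a pipeline called pipe; the prompt wording is only an example.

```python
# A minimal sketch: wardrobe terms in the prompt plus "nude" in the negative prompt,
# assuming `pipe` already holds the ChilloutMix (or any similar) checkpoint.
image = pipe(
    "photo of a woman in a summer dress, city street, soft natural lighting",
    negative_prompt="nude, nsfw",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("portrait.png")
```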

Waifu-diffusion

Waifu Diffusion is a model fine-tuned for Japanese anime-style images.

Robo Diffusion

Robo Diffusion is a fascinating robot-style model that can turn almost any subject into a robot: whatever concept you describe in the prompt tends to come out looking robotic in some way.

Mo-di-diffusion

If you are interested in creating designs similar to those seen in Pixar movies, this model suits you.

Inkpunk Diffusion

Inkpunk Diffusion is a Dreambooth-trained model with a unique illustration style.

V2 Models

Stability AI has recently launched a new series of models called version 2. The company has released two versions, namely 2.0 and 2.1.

The v2 models have undergone significant changes, such as the availability of a higher resolution version, 768×768 pixels, in addition to the previously available 512×512 pixels.

Moreover, explicit content has been removed from the training data, so it is no longer possible to generate pornographic materials.
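For reference, here is a minimal sketch of loading the higher-resolution v2.1 model with diffusers; "stabilityai/stable-diffusion-2-1" is the public 768-pixel repository, and the prompt is only an example.

```python
# A minimal sketch of running the 768x768 v2.1 model with diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# v2 models trained at 768x768 give the best results at that resolution.
image = pipe("a misty mountain lake at sunrise", height=768, width=768).images[0]
image.save("lake.png")
```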

Although one might assume everyone has switched to the v2 models, the Stable Diffusion community has found that the 2.0 model often produces worse images. Users have difficulty generating content with powerful keywords such as celebrity and artist names.

However, the 2.1 model has partially addressed these issues, and its images look better. It is also easier to generate artistic style with this model.

Nevertheless, most users have not completely transitioned to the 2.1 model yet. While some people occasionally use it, they mostly rely on v1 models. If you decide to try out the v2 models, make sure to check out some tips to avoid common frustrations.

How to Install and Use a Model

Note that these instructions are specific to v1 models only. Please refer to the relevant instructions if you are using v2.0 or v2.1.

To install a model in AUTOMATIC1111 GUI, follow these steps:

  1. Download the checkpoint (.ckpt) file for the model you want to use.
  2. Place the checkpoint file in the following folder: stable-diffusion-webui/models/Stable-diffusion/ (a scripted alternative to steps 1 and 2 is sketched after these instructions).
  3. Click the reload button next to the checkpoint dropdown.
  4. The checkpoint file should now be available for selection. Choose the new checkpoint file to use the model.
  5. Alternatively, you can access the model panel by clicking the “iPod” button under Generate. From there, select the Checkpoints tab and choose a model.

If you are new to AUTOMATIC1111 GUI, you may find that some models are already preloaded in the Colab notebook included in the Quick Start Guide.
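If you would rather script the download and placement (steps 1 and 2 above), a minimal sketch using the huggingface_hub library looks like this; the repository and file name are only examples, so substitute whichever checkpoint you actually want.

```python
# A minimal sketch: download a checkpoint from Hugging Face and copy it into
# the web UI's model folder. Repo ID and file name below are only examples.
import shutil
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",  # example repository
    filename="v1-5-pruned-emaonly.ckpt",       # example checkpoint file
)
shutil.copy(ckpt_path, "stable-diffusion-webui/models/Stable-diffusion/")
```

After copying the file, click the reload button described in step 3 so the new checkpoint shows up in the dropdown.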

How to Merge Two Models

To combine two models through the AUTOMATIC1111 GUI, follow these simple steps:

  1. Open the AUTOMATIC1111 GUI and navigate to the Checkpoint Merger tab.
  2. Select the two models you want to merge in the Primary model (A) and Secondary model (B) fields.
  3. Use the multiplier (M) to adjust the weight of each model. For instance, setting it to 0.5 would give each model equal importance in the merged result (the weighted sum this performs is sketched after these steps).
  4. Click on the Run button to start the merging process.
  5. Once completed, the new merged model will be ready for use.
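Under the hood, the default weighted-sum mode is essentially an element-wise blend of the two sets of weights. The sketch below shows the idea, assuming two v1-style .ckpt files with matching keys; in practice the Checkpoint Merger tab handles this (and the other merge modes) for you.

```python
# A rough sketch of a weighted-sum merge: merged = A * (1 - M) + B * M.
# Assumes both files are v1-style checkpoints with matching state_dict keys.
import torch

M = 0.5  # multiplier: 0.0 keeps model A unchanged, 1.0 keeps model B

a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]

merged = {
    # Blend floating-point weights; copy non-float entries (e.g. index buffers) from A.
    k: (1 - M) * a[k] + M * b[k] if torch.is_floating_point(a[k]) else a[k]
    for k in a if k in b
}
torch.save({"state_dict": merged}, "merged_model.ckpt")
```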

Other Model Types

Four types of files are commonly referred to as “models,” so it is helpful to know which type people mean:

  • Checkpoint models: These are the actual Stable Diffusion models that contain everything needed to generate an image. They are large files, typically 2-7 GB, and do not require additional files. This article focuses on checkpoint models.
  • Textual inversions or embeddings: These are small files that define new keywords to generate new objects or styles. They are typically between 10-100 KB in size and must be used with a checkpoint model.
  • LoRA models: These are small patch files used with checkpoint models to modify styles. They are typically between 10-200 MB in size and must be used with a checkpoint model (a loading example follows this list).
  • Hypernetworks: These network modules can be added to checkpoint models. They are typically between 5-300 MB in size and must be used with a checkpoint model.
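As a hedged sketch, the two smaller file types most often stacked on top of a checkpoint (embeddings and LoRAs) can be loaded in diffusers like this; the file names and trigger word are placeholders for whatever you download.

```python
# A minimal sketch of layering a LoRA and an embedding onto a base checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("./my-style-lora.safetensors")           # placeholder LoRA file
pipe.load_textual_inversion("./my-embedding.pt", token="<kw>")  # placeholder embedding

image = pipe("a portrait of a woman, <kw>").images[0]
image.save("styled_portrait.png")
```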

This article discusses Stable Diffusion models, their creation process, some commonly used ones, and the technique for combining them. Utilizing these models makes it much simpler to achieve the desired aesthetic style in your visual creations.
