Alchemy! Training Stable Diffusion to Produce a Custom LoRA Model - 六虎

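LoRA stands for Low-Rank Adaptation of Large Language Models, a technique developed by Microsoft researchers to make fine-tuning large language models practical. GPT-3, for example, has 175 billion parameters. To make it good at work in a specific domain it needs fine-tuning, but fine-tuning all of GPT-3 directly is too expensive and cumbersome.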

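LoRA freezes the pretrained model weights and injects trainable layers (low-rank decomposition matrices) into each Transformer block (the Transformer is the "T" in GPT). Because no gradients have to be computed for the frozen weights, the amount of computation needed for training drops sharply. The research found that LoRA's fine-tuning quality is on par with full-model fine-tuning. As an analogy, think of it as a small model attached to the large one, or a plug-in.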
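
To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style layer. It assumes the usual formulation h = Wx + B(Ax), where W is frozen and only the two small matrices A and B are trained. The function names, the toy matrices, and the 768-wide layer with rank 8 used for the parameter count are illustrative assumptions, not values taken from any particular model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute h = W x + scale * B (A x).

    W (d x k) is the frozen pretrained weight.
    A (r x k) and B (d x r) are the small trainable matrices, with r << min(d, k).
    """
    base = matvec(W, x)               # frozen path, never updated
    delta = matvec(B, matvec(A, x))   # low-rank trainable path
    return [b + scale * d for b, d in zip(base, delta)]


def full_params(d, k):
    """Number of weights updated when fine-tuning W directly."""
    return d * k


def lora_params(d, k, r):
    """Number of weights updated when training only A and B."""
    return r * (d + k)


# Toy example (assumed sizes): d = 3 outputs, k = 2 inputs, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[0.5, 0.5]]             # r x k
B = [[0.0], [0.0], [0.0]]    # d x r, zero-initialised so training starts from W
x = [2.0, 4.0]

print(lora_forward(W, A, B, x))   # equals matvec(W, x) while B is all zeros
print(full_params(768, 768), lora_params(768, 768, 8))
```

With these assumed sizes the low-rank pair holds 12288 trainable weights against 589824 for the full matrix, which is where the savings in training cost come from.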

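Depending on GPU performance, training usually takes from one to several hours. The process is nicknamed "alchemy" (炼丹)!

The main steps are below. Enough talk, let's get started!

1. GPU

First, you need a GPU. An NVIDIA card with at least 8 GB of VRAM is recommended. Then install the GPU driver; see my earlier article on using the GPU with Docker on CentOS.

2. Training environment

Ever since Docker came along, I have disliked installing piles of development environments on the host, so this time I use an image that ships stable-diffusion-webui with its web UI already packaged, which is also convenient for testing once training is finished. I recommend the kestr3l/stable-diffusion-webui image, which is based on nvidia/cuda:11.7.1-devel-ubuntu22.04; I have tested it myself and it works. Here is the docker-compose.yml file I use: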

version: "3"
services: 
  sd-webui:
    image: kestr3l/stable-diffusion-webui:1.1.0
    container_name: sd-webui
    restart: always
    ports:
      - "7860:7860"
      - "7861:7861"
    ulimits:
      memlock: -1
      stack: 67108864
    shm_size: 4G
    deploy:
      resources:
        limits:
          cpus: "8.00"
          memory: 16G
        reservations:
          devices:
            - capabilities: [gpu]
    volumes:
      # Map the downloaded model files into the container
      - ./models:/home/user/stable-diffusion-webui/models:cached
      # Override the container's default startup script so we can control it manually
      - ./entrypoint-debug.sh:/usr/local/bin/entrypoint.sh:cached

Contents of entrypoint-debug.sh:

#! /bin/sh
python3

You can download Stable Diffusion models from civitai and put them in the ./models/Stable-diffusion directory on the host. You can also download some LoRA models and drop them into ./models/Lora.
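
Before starting the web UI it can help to confirm that the files landed where the volume mapping expects them. The following is a small sketch, run on the host next to docker-compose.yml. The helper name list_models and the set of file extensions are assumptions made for illustration; only the two directory names come from the setup above.

```python
from pathlib import Path

# Assumed checkpoint extensions; adjust if your downloads use something else.
MODEL_SUFFIXES = {".safetensors", ".ckpt"}


def list_models(models_root):
    """Return {subfolder: sorted model file names} for the two mapped folders."""
    root = Path(models_root)
    found = {}
    for sub in ("Stable-diffusion", "Lora"):
        folder = root / sub
        if folder.is_dir():
            found[sub] = sorted(
                p.name
                for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
            )
        else:
            found[sub] = []  # folder missing: nothing to report
    return found


if __name__ == "__main__":
    for sub, names in list_models("./models").items():
        print(f"{sub}: {names if names else 'no model files found'}")
```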

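Once the models are in place, try running the Stable Diffusion web UI: inside the container, run ./webui.sh -f --listen from /home/user/stable-diffusion-webui. Before the first start it downloads and installs many dependency packages. Network access from mainland China can be unreliable, so you may need a proxy for the installation.

If you see output like the following, the installation succeeded: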

root@cebe51b82933:/home/user/stable-diffusion-webui# ./webui.sh -f --listen
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################
################################################################
Running on root user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Launching launch.py...
################################################################
./webui.sh: line 168: lspci: command not found
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Commit hash: <none>
Installing requirements for Web UI
Launching Web UI with arguments: --listen
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading weights [fc2511737a] from /home/user/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp32Fix.safetensors
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0): 
Model loaded in 16.0s (0.8s create model, 14.9s load weights).
Running on local URL:  http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.

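Open http://127.0.0.1:7860 or http://<LAN IP>:7860 in a browser and you can start generating AI art.

I have to say, the images generated by the chilloutmix_NiPrunedFp32Fix model are really good!

3. Install the training GUI

To lower the barrier to training, I use a Gradio-based web GUI. The project is called Kohya's GUI on GitHub.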

# Clone the project
git clone https://github.com/bmaltais/kohya_ss.git
# Run the setup script
cd kohya_ss
bash ubuntu_setup.sh

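Because this runs inside Docker, the ubuntu_setup.sh script may not work properly, so I usually enter the container and run the commands one at a time by hand: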

apt install python3-tk
python3 -m venv venv
source venv/bin/activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --use-pep517 --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/linux/xformers-0.0.14.dev0-cp310-cp310-linux_x86_64.whl

Run accelerate config to generate the configuration file, choosing the options shown below:

(venv) root@cebe51b82933:/home/user/kohya_ss# accelerate config
2023-03-13 06:45:22.678222: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-13 06:45:22.922383: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-13 06:45:23.593040: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-13 06:45:23.593158: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-13 06:45:23.593177: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
--------------------------------------------------------------------------------------------------In which compute environment are you running?
This machine                                                                                      
--------------------------------------------------------------------------------------------------Which type of machine are you using?                                                              
No distributed training                                                                           
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]:NO            
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO                                 
Do you want to use DeepSpeed? [yes/NO]: NO                                                        
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all                                                                                                
--------------------------------------------------------------------------------------------------Do you wish to use FP16 or BF16 (mixed precision)?
fp16                                                                                              
accelerate configuration saved at /root/.cache/huggingface/accelerate/default_config.yaml

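4. Launch the training GUI

Run python kohya_gui.py --listen 0.0.0.0 --server_port 7861 --inbrowser --share. As the output shows, --share also creates a temporary public gradio.live link; leave it out if you only need local access.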

(venv) root@cebe51b82933:/home/user/kohya_ss# python kohya_gui.py --listen 0.0.0.0 --server_port 7861 --inbrowser --share                                                                         
Load CSS...
Running on local URL:  http://0.0.0.0:7861
Running on public URL: https://49257631b1b39d3db5.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces

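You can now open http://127.0.0.1:7861 in the browser. The interface looks like this:

5. Prepare the training images

Collect the images you want to train on into a single folder; at least 15 are recommended. I use photos of Tang Wei (汤唯) here:

Then open the Stable Diffusion web UI to preprocess the images:

Click the Preprocess button and wait for it to finish. If all goes well, 512*512 images and caption files are generated in the dest folder.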
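
A quick way to check the preprocessing result is to look for images that ended up without a caption. The sketch below assumes each caption is a .txt file with the same base name as its image, and that the images are PNG or JPEG; both are assumptions about the output, and images_missing_captions is a hypothetical helper, not part of the web UI.

```python
from pathlib import Path

# Assumed image extensions produced by preprocessing.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def images_missing_captions(dest_dir):
    """Return sorted names of images in dest_dir that have no same-named .txt file."""
    dest = Path(dest_dir)
    return sorted(
        p.name
        for p in dest.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and not p.with_suffix(".txt").exists()
    )
```

Calling images_missing_captions("dest") should return an empty list when every image has its caption; any names it returns are worth re-running through Preprocess or captioning by hand.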

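6. Start training

Go to the training GUI. A lot of parameters need to be set; see the screenshot:

I recommend creating a new folder, for example train_lora, with three subfolders inside it: image, log, and model. The image folder holds the pictures produced by preprocessing.

The preprocessed images cannot sit directly in image. They go in a subfolder whose name matters: it takes the form <repeats>_<name>, where the number is how many steps each image is trained for. As a rule of thumb, LoRA training needs at least 1500 steps in total, and each image should be trained for at least 100 steps. With 15 or more images, name the folder 100_Hunzi. With fewer than 15, say 10, use 150_Hunzi (10 × 150 = 1500), and so on. This part matters, so do the arithmetic carefully. It is also what makes LoRA powerful: training works with this few images.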
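
The folder-name arithmetic can be sketched in a few lines of Python. The two thresholds (1500 total steps, 100 steps per image) are the rule of thumb described here; the function names and the make_layout helper are hypothetical, added only for illustration.

```python
import math
from pathlib import Path


def repeats_per_image(num_images, min_total_steps=1500, min_repeats=100):
    """Smallest per-image repeat count that satisfies both minimums."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return max(min_repeats, math.ceil(min_total_steps / num_images))


def make_layout(root, num_images, concept):
    """Create image/<repeats>_<concept>, log and model under root; return the image folder."""
    root = Path(root)
    image_dir = root / "image" / f"{repeats_per_image(num_images)}_{concept}"
    for folder in (image_dir, root / "log", root / "model"):
        folder.mkdir(parents=True, exist_ok=True)
    return image_dir


# Repeat counts for a few dataset sizes.
for n in (10, 12, 15, 20):
    print(n, "images ->", f"{repeats_per_image(n)}_Hunzi")
```

For example, make_layout("train_lora", 10, "Hunzi") would create train_lora/image/150_Hunzi along with the log and model folders; copy the preprocessed images and caption files into the returned folder.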

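Click the Train button to start the alchemy:

The finished model (the "elixir") ends up in the train_lora/model folder:

Finally, here are images generated with this model: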
