介绍

2024 年 1 月 17 日，新一代大言语模型墨客浦语 2.0（InternLM2）正式发布（GitHub 库房地址）。比较于第一代 InternLM，InternLM2 在推理、对话体会等方面的能力全面提高，东西调用能力全体升级，并支撑 20 万字超长上下文，实现长文对话 “难如登天”。

InternLM2 包含 InternLM2-7B 和 InternLM2-20B 两种模型标准（20B 模型比 7B 模型功能更强壮），每种标准又根据不同的运用场景，分为以下四种模型：InternLM2-Base、InternLM2、InternLM2-Chat-SFT 和 InternLM2-Chat。其间 InternLM2 是官方引荐运用的基础模型，InternLM2-Chat 是官方引荐运用的对话模型。下文首要介绍 InternLM2-Chat-7B 模型的布置和运用。

模型	HuggingFace	ModelScope
InternLM2-Chat-7B	库房地址	库房地址
InternLM2-Chat-20B	库房地址	库房地址

环境预备

Featurize 算力渠道供给了高效快捷的在线实验环境，在渠道上租用合适的 GPU 实例，布置大模型，方便快捷，省时省力，并且价格亲民。

本人实际布置 InternLM2-Chat-7B 模型耗费显存 20 GB 左右（受实际参数配置影响，仅供参考），因此租用一张 RTX 3090 或许 RTX 4090 的 GPU 实例就能满意模型运转条件。

关于 Featurize 渠道的运用，建议直接阅览官方文档，上手操作十分简单，在此不在赘述。

模型布置&运用

页面交互方法

两种布置方法仅仅页面展现效果不同，并无本质区别，挑选其间一种方法布置即可。

Gradio

LMDeploy 东西中封装了 Gradio，咱们运用该东西布置模型。

LMDeploy 所需的运转环境和模型布置代码已整理到下方的脚本文件中，履行脚本文件即可一键布置。

首先解说下发动指令中的几个参数意义，各参数取值可根据硬件条件自行调整。

tp（tensor_parallel_size）：表明运用几张 GPU 来运转一个模型。
max_batch_size：批处理巨细，该参数值越大，吞吐量越高，但会占用更多显存。
cache_max_entry_count：设置 k/v 缓存巨细，会占用显存。当值为 0~1 之间的小数时，表明 k/v block 运用的内存百分比（例如显存 60 G，该值设置为 0.5，则 k/v 运用的内存总量为 60 * 0.5 = 30G）。当值 >1 时，表明 k/v block 数量。
./internlm2-chat-7b：模型本地存储路径。

具体操作步骤如下。

通过 ssh 终端连接到服务器实例，新建 deploy.sh 脚本文件，文件内容如下。

cd ~
# 安装运转环境
echo "Installing Python dependencies"
pip install lmdeploy socksio gradio==3.50.2

# 安装 Git ltf 扩展包
echo "Installing git lfs extension"
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install -y git-lfs
git lfs install

# 拉取模型库
echo "Download repo"
git clone https://huggingface.co/internlm/internlm2-chat-7b

# 发动模型
echo "start model"
python3 -m lmdeploy.serve.gradio.app --tp=1 --max_batch_size=64 --cache_max_entry_count=0.1 --server_name=0.0.0.0 --server_port=8888 ./internlm2-chat-7b

2. 履行 sh deploy.sh 指令发动脚本。脚本履行大约需求 5 分钟时刻（模型库房中有几个大文件）。新开一个终端窗口，履行指令 watch -n 1 nvidia-smi 可以实时观察 GPU 资源的运用状况。

模型布置完结，履行下面指令，敞开 Featurize 端口。端口敞开后 Featurize 会供给公网拜访地址。

# 敞开端口
featurize port export 8888
# 查看已敞开的端口
featurize port list

4. 拜访公网地址，运用模型。

Streamlit

官方 GitHub 库房中供给了运用 Streamlit 布置模型的代码。示例代码默许加载长途 Hugging Face 库房中的模型，如果现已将模型下载到本地，可以修改源码从本地加载模型。

脚本文件如下，可直接履行，一键布置。

cd ~

# 安装环境
pip install streamlit==1.24.0
pip install transformers==4.37.0

# 克隆代码
git clone https://github.com/InternLM/InternLM.git

# 运转
streamlit run ./InternLM/chat/web_demo.py

默许发动端口：8501，记住敞开 Featurize 端口。交互页面如下所示。

代码方法

注意：代码中./internlm2-chat-7b 为模型本地存储路径，请根据实际状况自行调整。

Transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

if __name__ == '__main__':
    # 没有本地模型，替换为 internlm/internlm2-chat-7b
   tokenizer = AutoTokenizer.from_pretrained("./internlm2-chat-7b", trust_remote_code=True)
   model = AutoModelForCausalLM.from_pretrained("./internlm2-chat-7b", device_map="auto",
                         trust_remote_code=True, torch_dtype=torch.float16)
   model = model.eval()
   response, history = model.chat(tokenizer, "你好 我是 Cleaner", history=[])
   print(response)

ModelScope

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM

if __name__ == '__main__':
    # 没有本地模型，替换为 Shanghai_AI_Laboratory/internlm2-chat-7b
   model_dir = snapshot_download('./internlm2-chat-7b')
   tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
   model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True,
                         torch_dtype=torch.float16)
   model = model.eval()
   response, history = model.chat(tokenizer, "你好 我是 Cleaner", history=[])
   print(response)

LMDeploy

LMDeploy 运用文档

from lmdeploy import pipeline, TurbomindEngineConfig

if __name__ == '__main__':
   backend_config = TurbomindEngineConfig(tp=1,
                      max_batch_size=64,
                      cache_max_entry_count=0.1)
  # 没有本地模型，替换为 internlm/internlm2-chat-7b
   pipe = pipeline("./internlm2-chat-7b", backend_cofing=backend_cofing)
   response = pipe(["你好 我是 Cleaner"])
   print(response)

总结

整理 InternLM2 的特色，协助想要运用大言语模型的个人开发者或许企业，在面对众多大言语模型时，能够了解大言语模型供给的能力，并结合本身的需求与本钱，做出清晰明确的挑选。

开源免费、可商用。
超长上下文支撑：200K token 的输入与了解。（书本等大文本数据做摘要总结、若干轮对话后回想之前的内容（难如登天））
支撑东西调用能力：能够在一次交互中屡次调用东西，完结相对复杂的任务。（Agent）
支撑微谐和练习。（供给专有数据集，打造个人/企业私有化大模型）。

结尾

作为一名软件开发人员，大模型的相关运用现已成为我日常工作和生活中的常用东西，本人也在不断跟进了解人工智能的发展状况。

大模型从对话、谈天到东西调用、长文了解，乃至多模态，在不断打破人类认知，带给咱们无限的想象空间。

也许未来的某一天，咱们可以拥有自己的贾维斯（Friday）。

如果本文对你有协助的话，欢迎 点赞 + 保藏 ，十分感谢！

我是 Cleaner，咱们下期再会~

大语言模型 InternLM2（书生·浦语）一键部署

介绍

环境预备

模型布置&运用

页面交互方法

Gradio

Streamlit

代码方法

Transformers

ModelScope

LMDeploy

总结

结尾

作者信息

大语言模型 InternLM2（书生·浦语） 一键部署

介绍

环境预备

模型布置&运用

页面交互方法

Gradio

Streamlit

代码方法

Transformers

ModelScope

LMDeploy

总结

结尾

相关文章

iOS对象的内存分析

SRC漏洞挖掘经验+技巧篇

2022容器格式全面指南

画图实战-Python实现某产品全年销量数据多种样式可视化

作者信息

大语言模型 InternLM2（书生·浦语）一键部署