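This article is a signed, first-release article on the Xitu (稀土) technical community. Reposting is prohibited within 30 days of publication; after 30 days, reposting without authorization is prohibited. Infringement will be pursued.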

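1. Introduction

Many search engines have a built-in reverse image search feature, which can greatly simplify searching. Today we are going to implement a reverse image search engine.

Let's first discuss what makes reverse image search hard. The first problem is how to compare the similarity of images, and what counts as "similar" in the first place. A person can tell at a glance, but a computer cannot. Images are stored as numeric matrices, so comparing images means comparing matrices. This raises some further problems.

The second problem is size. Images usually differ in size, and matrices of different sizes cannot be compared directly. This one is easy to solve: just resize the images to the same dimensions.

The third problem is that pixels carry very limited information and cannot express abstract properties such as drawing style, objects, or overall color scheme.

Based on the above, we now have two problems to solve: what information to use in place of raw pixels, and how to compute similarity. We will tackle them one by one. Before that, let's implement a simple version of reverse image search.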

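2. A Simple Reverse Image Search

2.1 How to Compute Similarity

Let's first look at using raw pixels directly as the image representation and see how reverse image search would work in that case. A very simple idea is to compute the Euclidean distance between two images. If the target image is target and an image in the gallery is source, the distance is computed as follows: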

$distance=\sqrt{\sum (target - source)^2}$

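The n images with the smallest distance are then returned as the search results.

This approach may look unreliable, but it can give decent results in practice. If the gallery images are not very complex, such as anime avatars, it is simple and effective; in other cases the results are less stable.

2.2 Image Search Based on Euclidean Distance

Image search based on Euclidean distance is implemented in the following steps:

1. Resize the images to the same size; the Euclidean distance cannot be computed between images of different sizes.
2. Pick one image as the target image, i.e. the query image.
3. Iterate over the gallery, compute the Euclidean distance to each image, and record it in a list.
4. Sort the list and take the n images with the smallest distance.

Here we use images of Crayon Shin-chan as the gallery. Below are some sample images:

Some of the images share a similar style, and we hope to find images in a similar style given one image. The code is as follows: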
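
To make the distance and the "smallest n" step concrete, here is a minimal sketch on toy data. The 2×2 arrays and the names squared_distance and top_n are made up for illustration; they stand in for real images and are not part of the article's code.

```python
import numpy as np

def squared_distance(a, b):
    # Cast to float so the subtraction cannot overflow for integer inputs
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float((diff ** 2).sum())

def top_n(target, gallery, n):
    # Score every gallery entry, then keep the n smallest distances
    scored = [(name, squared_distance(target, img)) for name, img in gallery.items()]
    return sorted(scored, key=lambda x: x[1])[:n]

# Hypothetical toy "images": 2x2 grayscale arrays
target = np.array([[0, 0], [0, 0]])
gallery = {
    "a": np.array([[1, 1], [1, 1]]),  # squared distance 4
    "b": np.array([[3, 0], [0, 0]]),  # squared distance 9
    "c": np.array([[0, 0], [0, 1]]),  # squared distance 1
}

print(top_n(target, gallery, 2))
```

The square root is left out here: it does not change the ordering, so ranking by the squared distance gives the same search results.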

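import os
import random

import cv2
import numpy as np

base_path = r"G:\datasets\lbxx"
# Collect the paths of all images in the gallery
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
# Pick one image at random as the target (query) image
target_path = random.choice(files)
target = cv2.imread(target_path)
h, w, _ = target.shape
distances = []
# Iterate over the gallery
for file in files:
    # Read the image; skip files that OpenCV cannot decode
    source = cv2.imread(file)
    if source is None:
        continue
    # Resize to the same size as the target image
    source = cv2.resize(source, (w, h))
    # Squared Euclidean distance (no square root; the ranking is the same).
    # Cast to a signed type first, because uint8 subtraction wraps around.
    diff = target.astype(np.int64) - source.astype(np.int64)
    distance = (diff ** 2).sum()
    distances.append((file, distance))
# Take the 6 nearest images; the first is the target itself, leaving 5 results
distances = sorted(distances, key=lambda x: x[-1])[:6]
# Resize to a common size so the images can be stacked horizontally
imgs = [cv2.resize(cv2.imread(path), (w, h)) for path, _ in distances]
result = np.hstack(imgs)
cv2.imwrite("result.jpg", result)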
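
One detail is worth checking when computing pixel distances like this. OpenCV loads images as uint8 arrays, and uint8 arithmetic wraps around instead of going negative. The small sketch below, with two made-up single-pixel arrays, compares the distance computed directly on uint8 values with the one computed after casting to a wider type.

```python
import numpy as np

a = np.array([10], dtype=np.uint8)
b = np.array([40], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256, so 10 - 40 does not give -30 here
wrapped = int(((a - b) ** 2).sum())

# Casting to a wider signed type first gives the true squared difference
exact = int(((a.astype(np.int64) - b.astype(np.int64)) ** 2).sum())

print(wrapped, exact)
```

The two numbers differ, so a pixel-based distance should be computed on arrays that were cast to a signed or floating-point type.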

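Below are some of the better search results. The leftmost image is the target and the rest are the search results.

With cat and dog images instead, the search results look like this: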

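2.3 Problems

The implementation above has two problems. First, pixels do not capture the deeper meaning of an image, so the search often returns images that merely have similar colors. Second, there is the computational cost: a 224×224 image with 3 color channels has 150,528 values, so computing the Euclidean distance is relatively slow. In addition, every search has to traverse the whole gallery, and for a large gallery the amount of computation becomes unacceptable. The approach therefore needs some improvements.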
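
As a rough back-of-the-envelope check of that cost, the sketch below counts the values per image and the subtractions a full linear scan needs. The gallery sizes are arbitrary examples, and the function name is made up for illustration.

```python
# Values per image at 224x224 with 3 color channels
height, width, channels = 224, 224, 3
values_per_image = height * width * channels

def subtractions_for_linear_scan(gallery_size):
    # One subtraction (plus a square and an add) per value, per gallery image
    return gallery_size * values_per_image

print(values_per_image)
for n in (1_000, 100_000, 10_000_000):
    print(n, subtractions_for_linear_scan(n))
```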

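3. Improvement 1: Replacing Pixels with Features

3.1 Image Features

Image representations have evolved from raw pixels to handcrafted features and then to deep learning features. By comparison, image features extracted by a convolutional neural network have several advantages:

Strong generalization: the extracted features are less affected by viewing angle, position, brightness, and so on than pixels and handcrafted features are.
Fewer dimensions: extracting features from a 224×224 image with ResNet50 returns a 7×7×2048 tensor, which is smaller than the number of pixel values.
Abstraction: unlike the previous two, features extracted by a convolutional neural network are abstract. For example, they carry information about the class of the objects in the image, which the previous two cannot achieve.

In this article we use ResNet50 to extract image features.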
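
To compare the sizes mentioned above, the sketch below flattens an array with the same shape as the ResNet50 output for one 224×224 image. The array is filled with random numbers as a stand-in; no model is loaded here.

```python
import numpy as np

# Stand-in for the encoder output of one image: shape (1, 7, 7, 2048).
# Random values are used; this does not run ResNet50.
rng = np.random.default_rng(0)
feature_map = rng.random((1, 7, 7, 2048), dtype=np.float32)

# Flatten to a single vector so it can be compared with other vectors
flat = feature_map.reshape(-1)
pixel_values = 224 * 224 * 3

print(flat.shape, pixel_values)
```

The flattened feature vector has 100,352 values, compared with 150,528 raw pixel values.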

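3.2 What Embeddings Can Do

The features extracted by ResNet50 can also be called an embedding, which can be thought of simply as an image vector. In recent years embeddings have shown great potential in artificial intelligence, especially in natural language processing.

3.2.1 Visualizing Relationships

Early on, embeddings were mainly used as word vectors: word2vec turns words into vectors, which makes some neat operations possible. One example is visualizing the relationships between words, as in the figure below:

The figure visualizes six words: mother, father, car, auto, cat, and tiger. It clearly shows that mother and father are close together, as are car and auto, and cat and tiger, which all matches common sense.

3.2.2 Relationship Arithmetic

Ideally, each dimension of a well-trained embedding would have a concrete meaning, for example the first dimension representing part of speech, the second sentiment, and so on for the other dimensions. If this holds, even approximately, then vector arithmetic can be used to compute relationships between words.

For example, "mother − woman + man ≈ father", or "king − man + woman ≈ queen". Searching for "who is the Beethoven of physics" used to return very strange results, but if the question is turned into "Beethoven − music + physics ≈ ?", it becomes much simpler.

3.2.3 Clustering

Once images and text can be represented as embeddings, clustering algorithms can group images or text automatically. The photo albums on many phones have such an automatic grouping feature.

Clustering can also speed up search, which is covered in detail later.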
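
"Close" can be made measurable with a similarity score between vectors. The sketch below uses cosine similarity on hand-picked 2-D vectors. These numbers are hypothetical and chosen only so that related words point in similar directions; real word2vec vectors are learned and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D vectors chosen by hand for illustration
vectors = {
    "mother": (0.9, 0.1),
    "father": (0.8, 0.2),
    "car": (0.1, 0.9),
    "auto": (0.2, 0.8),
}

def most_similar(word):
    # Compare the word against every other word and keep the best match
    others = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return max(others, key=lambda x: x[1])[0]

print(most_similar("mother"), most_similar("car"))
```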
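
The arithmetic can be sketched on a toy vocabulary. The vectors below are invented so that the first dimension roughly means "royalty" and the second means "gender"; trained embeddings are not this clean, so this is only an illustration of the idea.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical vectors: (royalty, gender) with male = 1 and female = -1
vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "father": (0.2, 1.0),
    "mother": (0.2, -1.0),
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    query = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: sq_dist(vectors[w], query))

print(analogy("king", "man", "woman"))
print(analogy("mother", "woman", "man"))
```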

3.3 Improving Reverse Image Search

Below we use image features instead of pixels to improve the search. The code is as follows:

import os
import random

import cv2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

w, h = 224, 224
# Load ResNet50 without the fully connected head
encoder = ResNet50(include_top=False)
base_path = r"G:\datasets\lbxx"
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
target_path = random.choice(files)
target = cv2.resize(cv2.imread(target_path), (w, h))
# Extract the features of the target image
target = encoder(preprocess_input(target[None]))
distances = []
for file in files:
    # Read the image; skip files that OpenCV cannot decode
    source = cv2.imread(file)
    if source is None:
        continue
    # Resize the image and extract its features
    source = cv2.resize(source, (w, h))
    source = encoder(preprocess_input(source[None]))
    # Squared Euclidean distance between the two feature tensors
    distance = np.sum((target - source) ** 2)
    distances.append((file, distance))
# Take the 6 nearest images; the first is the target itself, leaving 5 results
distances = sorted(distances, key=lambda x: x[-1])[:6]
# Resize to a common size so the images can be stacked horizontally
imgs = [cv2.resize(cv2.imread(path), (w, h)) for path, _ in distances]
result = np.hstack(imgs)
cv2.imwrite("result.jpg", result)

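Here we use ResNet50 pretrained on ImageNet as the feature extraction network. The key operations are:

Load the model

# Load the convolutional layers of ResNet50 and drop the fully connected part
encoder = ResNet50(include_top=False)

Preprocess the image

# Resize the image to 224×224 and apply ResNet50's built-in preprocessing
target = cv2.resize(cv2.imread(target_path), (w, h))
target = preprocess_input(target[None])

Extract features

# Extract features with the ResNet50 network (the input was already preprocessed above)
target = encoder(target)

Below are the search results after the improvement: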

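4. Improvement 2: Speeding Up Search with Clustering

4.1 How It Works

In the previous examples we used linear search, which has to traverse every image, so the search complexity is O(n). A tree structure is often used to store the searchable content, reducing the complexity to O(log n). Here we use a simpler method: clustering.

First we cluster the image features into c clusters, each with its own cluster center. The cluster center can be seen as the average of the cluster, and samples within the same cluster are relatively similar to each other.

After clustering, we take the vector of the target image and find which of the c cluster centers it is closest to. We then do a linear search within that cluster for the most similar images.

4.2 Implementation

The implementation consists of the following steps:

Convert the images to vectors

This code is basically the same as before, but this time, for speed, we store the image features in the file embeddings.pkl: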
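
The two-stage idea can be sketched on toy data before turning to the real implementation. The 2-D points, file names, and cluster assignments below are made up, and the centers are simply the mean of each group rather than the result of running a clustering algorithm.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Hypothetical 2-D embeddings, already grouped into two clusters
clusters = {
    0: {"a.jpg": (0.0, 0.0), "b.jpg": (0.2, 0.1), "c.jpg": (0.1, 0.3)},
    1: {"d.jpg": (5.0, 5.0), "e.jpg": (5.2, 4.9), "f.jpg": (4.8, 5.3)},
}

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

# Each cluster center is the average of the cluster's members
centers = {cid: centroid(list(members.values())) for cid, members in clusters.items()}

def search(target, n=2):
    # Stage 1: find the nearest cluster center
    nearest = min(centers, key=lambda cid: sq_dist(target, centers[cid]))
    # Stage 2: linear scan inside that cluster only
    members = clusters[nearest]
    ranked = sorted(members, key=lambda name: sq_dist(target, members[name]))
    return nearest, ranked[:n]

print(search((5.1, 5.0)))
```

Only the two centers and the three members of one cluster are compared with the query; the other cluster is never scanned.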

import os
import pickle

import cv2
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

w, h = 224, 224
# Load the model
encoder = ResNet50(include_top=False)
base_path = r"G:\datasets\lbxx"
# Collect the paths of all images
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
# Convert each image to a vector
embeddings = []
for file in files:
    # Read the image; skip files that OpenCV cannot decode
    source = cv2.imread(file)
    if source is None:
        continue
    # Resize to the network input size and extract features
    source = cv2.resize(source, (w, h))
    embedding = encoder(preprocess_input(source[None]))
    embeddings.append({
        "filepath": file,
        # Flatten to a 1-D NumPy array so it pickles cleanly
        "embedding": embedding.numpy().reshape(-1)
    })
# Cache the embeddings on disk
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)

Cluster all the vectors

This can be done with sklearn:

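import pickle

import joblib
import numpy as np
from sklearn.cluster import KMeans

with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
# Stack the embeddings into an (n_images, n_features) array
X = np.array([np.asarray(item['embedding']) for item in embeddings])
# n_clusters must not exceed the number of images
kmeans = KMeans(n_clusters=500)
kmeans.fit(X)
# Record which cluster each image belongs to
preds = kmeans.predict(X)
for item, pred in zip(embeddings, preds):
    item['cluster'] = int(pred)
# Save the fitted model and the updated embeddings
joblib.dump(kmeans, 'kmeans.pkl')
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)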

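If there are many images, this step takes a while. Afterwards, calling kmeans.predict tells us which cluster an image belongs to; this can also be stored in advance.

Find the cluster center closest to the input image

After training, all cluster centers are available: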

kmeans.cluster_centers_

Now we need to find the cluster center closest to the input image, which works the same way as the earlier search:

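import sys

centers = kmeans.cluster_centers_
# Flatten the target features so they match the shape of a cluster center
target_vec = np.asarray(target).reshape(-1)
# Find the nearest cluster
closet_cluster = 0
closet_distance = sys.float_info.max
for idx, center in enumerate(centers):
    distance = np.sum((target_vec - center) ** 2)
    if distance < closet_distance:
        closet_distance = distance
        closet_cluster = idx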
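
The same nearest-center lookup can also be written without an explicit loop. This is a sketch using NumPy broadcasting on made-up 2-D centers; the function name nearest_center is hypothetical, and in the real code the centers would come from kmeans.cluster_centers_.

```python
import numpy as np

def nearest_center(target_vec, centers):
    # centers has shape (c, d) and target_vec has shape (d,);
    # broadcasting subtracts the target from every row at once
    dists = ((centers - target_vec) ** 2).sum(axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Hypothetical cluster centers in 2-D
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
print(nearest_center(np.array([4.0, 6.0]), centers))
```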

Search for images within that cluster

This is also basically the same as before:

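# Flatten the target features so they match the stored embeddings
target_vec = np.asarray(target).reshape(-1)
distances = []
for item in embeddings:
    # Only look at images in the nearest cluster
    if item['cluster'] != closet_cluster:
        continue
    embedding = np.asarray(item['embedding'])
    distance = np.sum((target_vec - embedding) ** 2)
    distances.append((item['filepath'], distance))
# Sort by distance and keep the 11 nearest images
distances = sorted(distances, key=lambda x: x[-1])[:11]
# Resize to a common size so the images can be stacked horizontally
imgs = [cv2.resize(cv2.imread(path), (224, 224)) for path, _ in distances]
result = np.hstack(imgs)
cv2.imwrite("result.jpg", result)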

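Below are some search results:

The results are still quite good, and this time the search is much faster. However, this approach is rather tedious to code. To make the code more concise, we next introduce a vector database.

5. Vector Databases

5.1 Vector Databases

A vector database differs from a traditional database in that it can store vector fields and run similarity searches over them. With a vector database the retrieval above is easy to implement, and performance is better than before.

Vector databases also have a lot in common with traditional databases. A relational database is organized into connections, databases, tables, and records. In a vector database these correspond to connections, databases, collections, and entities. A collection can include an embedding field, which can be used for vector search.

5.2 Using the Milvus Vector Database

Here is a brief look at how to use the Milvus vector database. First install Milvus by running the following two commands: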
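
A rough count shows where the speedup comes from. Assuming the n images are spread evenly over c clusters (real clusters are only approximately balanced), a search compares the query with c centers and then with about n / c images. The function names below are made up for this estimate.

```python
import math

def comparisons_linear(n):
    # Linear search compares the query with every image
    return n

def comparisons_clustered(n, c):
    # c comparisons against the centers, then about n / c inside one cluster
    # (assumes equally sized clusters)
    return c + n / c

n = 10_000
print(comparisons_linear(n))
for c in (10, 100, 500, 1000):
    print(c, comparisons_clustered(n, c))
# c + n / c is smallest when c is close to sqrt(n)
print(math.isqrt(n))
```

Under this assumption, 10,000 images with 100 clusters need about 200 comparisons instead of 10,000.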

wget https://github.com/milvus-io/milvus/releases/download/v2.2.11/milvus-standalone-docker-compose.yml -O docker-compose.yml
sudo docker-compose up -d

Once Milvus is up and running, connect to the database with the following code:

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
connections.connect(host='127.0.0.1', port='19530')

Then create a collection:

def create_milvus_collection(collection_name, dim):
    # Drop any existing collection with the same name and start fresh
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True,
                    auto_id=True),
        FieldSchema(name='filepath', dtype=DataType.VARCHAR, description='filepath', max_length=512),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim),
    ]
    schema = CollectionSchema(fields=fields, description='reverse image search')
    collection = Collection(name=collection_name, schema=schema)
    # Create an IVF_FLAT index on the embedding field, using L2 distance
    index_params = {
        'metric_type': 'L2',
        'index_type': "IVF_FLAT",
        'params': {"nlist": 2048}
    }
    collection.create_index(field_name="embedding", index_params=index_params)
    return collection
collection = create_milvus_collection('images', 2048)

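The second argument of create_milvus_collection is the embedding dimension; here we pass the dimension of the image features. Next we store the image features in the vector database. Note that the dimension cannot exceed 32768, but the flattened ResNet50 output (7×7×2048 = 100,352 values) exceeds this limit. To get around this, we can reduce the dimension with PCA or obtain the image embeddings some other way.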

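import pickle

import numpy as np
from sklearn.decomposition import PCA

with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
# Stack the embeddings into an (n_images, n_features) array
X = np.array([np.asarray(item['embedding']) for item in embeddings])
# Reduce to 2048 dimensions; PCA requires n_components <= number of images
pca = PCA(n_components=2048)
X = pca.fit_transform(X)
for item, vec in zip(embeddings, X):
    item['embedding'] = vec
# Save the reduced embeddings and the fitted PCA (needed later to transform queries)
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)
with open('pca.pkl', 'wb') as f:
    pickle.dump(pca, f)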
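
For readers who want to see what the reduction does, here is a minimal PCA sketch written with NumPy's SVD instead of sklearn. The data is random and the sizes (50 vectors of dimension 20, reduced to 5) are arbitrary; pca_fit and pca_transform are made-up helper names, not sklearn functions.

```python
import numpy as np

def pca_fit(X, n_components):
    # Center the data, then take the top right-singular vectors
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Rows of vt are principal directions, ordered by explained variance
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    # Project centered data onto the principal directions
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))  # 50 hypothetical embeddings of dimension 20
mean, components = pca_fit(X, n_components=5)

reduced = pca_transform(X, mean, components)
# A query must be projected with the same mean and components as the gallery
query = pca_transform(X[:1], mean, components)
print(reduced.shape, query.shape)
```

Note that the number of components cannot exceed the number of samples or the number of features, so reducing to 2048 dimensions needs a gallery of at least 2048 images.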

Now the data can be inserted. The code is as follows:

import pickle

import numpy as np

with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
# Insert one row per image; the id field is generated automatically (auto_id=True)
for item in embeddings:
    collection.insert([
        [item['filepath']],
        [np.asarray(item['embedding'], dtype=np.float32).tolist()]
    ])
# Flush to persist the inserted data
collection.flush()

Now, to search for an image, only the following code is needed:

import os
import pickle
import random

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from pymilvus import connections, Collection

# Load the PCA fitted earlier (it was saved with pickle)
with open('pca.pkl', 'rb') as f:
    pca = pickle.load(f)
w, h = 224, 224
encoder = ResNet50(include_top=False)
base_path = r"G:\datasets\lbxx"
files = [os.path.join(base_path, file) for file in os.listdir(base_path)]
target_path = random.choice(files)
target = cv2.resize(cv2.imread(target_path), (w, h))
# Extract features, flatten them, and apply the same PCA as the stored embeddings
target = encoder(preprocess_input(target[None]))
target = tf.reshape(target, (1, -1)).numpy()
target = pca.transform(target)
# Connect to the database and load the images collection
connections.connect(host='127.0.0.1', port='19530')
collection = Collection(name='images')
# Note: "offset": 5 skips the 5 nearest hits, so results start from the 6th
search_params = {"metric_type": "L2", "params": {"nprobe": 10}, "offset": 5}
collection.load()
# Search the database
results = collection.search(
    data=[target[0].tolist()],
    anns_field='embedding',
    param=search_params,
    output_fields=['filepath'],
    limit=10,
    consistency_level="Strong"
)
collection.release()
# Read the matched images and stack them side by side
images = []
for result in results[0]:
    entity = result.entity
    filepath = entity.get('filepath')
    image = cv2.resize(cv2.imread(filepath), (w, h))
    images.append(np.array(image))
result = np.hstack(images)
cv2.imwrite("result.jpg", result)

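Below are some search results. Overall they are quite good. Because of the dimensionality reduction, the search quality may be slightly worse than before, but the overall efficiency is much higher.

6. Summary

In this article we shared how to implement reverse image search. The main idea is to convert images into vector representations and then use a similarity measure to find the closest images in the gallery. We started with linear search, which is the least efficient. We then improved it with clustering: the images are divided into clusters and the search is split into two steps, finding the cluster and then finding the nearest images, which greatly improves search efficiency.

The improved code became rather tedious, so we introduced a vector database and used it to implement the retrieval. That completes the whole program.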
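
The IVF_FLAT index used above is closely related to the clustering approach from section 4: as far as I understand it, vectors are grouped into nlist buckets and a search scans only the nprobe buckets whose centers are closest to the query. The sketch below imitates that idea in NumPy on made-up 2-D data. It is a conceptual illustration, not Milvus code, and ivf_search is a hypothetical name.

```python
import numpy as np

def ivf_search(query, vectors, centers, assignments, nprobe, limit):
    # Rank buckets by the distance from the query to their centers
    center_d = ((centers - query) ** 2).sum(axis=1)
    probed = np.argsort(center_d)[:nprobe]
    # Scan only the vectors that live in the probed buckets
    candidates = np.flatnonzero(np.isin(assignments, probed))
    d = ((vectors[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(d)[:limit]
    return candidates[order].tolist()

# Hypothetical data: five vectors on a line, split into two buckets
vectors = np.array([[0.0, 0.0], [0.2, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.1, 0.0], [3.0, 0.0]])
assignments = np.array([0, 0, 1, 1, 1])
query = np.array([1.4, 0.0])

print(ivf_search(query, vectors, centers, assignments, nprobe=1, limit=1))
print(ivf_search(query, vectors, centers, assignments, nprobe=2, limit=1))
```

With nprobe=1 the true nearest vector (index 2) is missed because it sits in the other bucket; probing both buckets finds it. A larger nprobe trades speed for accuracy.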
