I. Overview
Elasticsearch is a popular open-source search engine used to store, search, and analyze data. The basic operations (CRUD) in Elasticsearch 7.x are as follows:
1. Create an index:
PUT /index_name
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
2. View an index:
GET /index_name
3. Delete an index:
DELETE /index_name
4. Create a document:
POST /index_name/_doc
{
"field1": "value1",
"field2": "value2"
}
5. Get a document:
GET /index_name/_doc/doc_id
6. Update a document:
POST /index_name/_update/doc_id
{
"doc": {
"field1": "new_value1"
}
}
7. Delete a document:
DELETE /index_name/_doc/doc_id
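All of these operations are performed through the Elasticsearch REST API. They are only the basic operations; Elasticsearch supports many others, such as search, aggregation, and text analysis. To learn more about using Elasticsearch, see the official Elasticsearch documentation.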
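For readers calling the REST API from code, the sketch below only builds (it does not send) the 7.x-style requests listed above, using the standard-library `urllib.request.Request`. The base URL `http://localhost:9200` and the helper name `es_request` are assumptions for illustration; the update request uses the 7.x `_update/<id>` endpoint.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed address of a local node

def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data, headers = None, {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")       # JSON request body
        headers["Content-Type"] = "application/json"  # required when a body is sent
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

# The seven CRUD operations above, in 7.x endpoint form
crud_requests = [
    es_request("PUT", "/index_name", {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}),
    es_request("GET", "/index_name"),
    es_request("DELETE", "/index_name"),
    es_request("POST", "/index_name/_doc", {"field1": "value1", "field2": "value2"}),
    es_request("GET", "/index_name/_doc/doc_id"),
    es_request("POST", "/index_name/_update/doc_id", {"doc": {"field1": "new_value1"}}),
    es_request("DELETE", "/index_name/_doc/doc_id"),
]

for req in crud_requests:
    print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to `urllib.request.urlopen(req)`; error statuses such as 404 or 409 are raised as `urllib.error.HTTPError`.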
II. Elasticsearch CRUD: detailed examples
1) Adding documents
1. Specifying the document ID
PUT blog/_doc/1
{
"title":"1、VMware Workstation虚拟机软件装置图解",
"author":"chengyuqiang",
"content":"1、VMware Workstation虚拟机软件装置图解...",
"url":"http://x.co/6nc81"
}
Elasticsearch returns a response in JSON format:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 2
}
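Response fields:
- _index: the name of the index the document is in
- _type: the name of the document's type (always _doc in 7.x)
- _id: the document ID
- _version: the document version
- result: created, meaning the document was created
- _shards: information about the replication process of the index operation
  - total: the number of shard copies (primary and replica shards) the index operation should be executed on
  - successful: the number of shard copies on which the index operation succeeded
  - failed: the number of shard copies on which the index operation failed; when it fails on a replica shard, the replication-related errors are included in the response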
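In the example above, total is 2 but successful is 1 while failed is 0. That is typically what you see when the index has a replica that is not allocated (for example, on a single-node cluster), and the write is still acknowledged. A minimal Python check along those lines, assuming the response body is available as a string (the function name `write_succeeded` is hypothetical):

```python
import json

def write_succeeded(response_text):
    """Return True if an index/create/update response reports a successful write.

    successful < total is tolerated (e.g. an unassigned replica); failed must be 0,
    and only "created" or "updated" count as a write.
    """
    resp = json.loads(response_text)
    shards = resp["_shards"]
    return (resp.get("result") in ("created", "updated")
            and shards["successful"] >= 1
            and shards["failed"] == 0)

# The response from the "specifying the document ID" example, compacted onto one line
sample = ('{"_index":"blog","_type":"_doc","_id":"1","_version":1,"result":"created",'
          '"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":2}')
print(write_succeeded(sample))  # True
```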
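2. Without a document ID
You can omit the document ID when adding a document, in which case Elasticsearch generates the ID automatically as a string. Note that you must use POST rather than PUT in this case.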
POST blog/_doc
{
"title":"2、Linux服务器装置图解",
"author":"chengyuqiang",
"content":"2、Linux服务器装置图解解...",
"url":"http://x.co/6nc82"
}
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "5P2-O2gBNSQY7o-KMw2P",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
2) Getting documents
1. Get a document by its ID
GET blog/_doc/1
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件装置图解",
"author" : "chengyuqiang",
"content" : "1、VMware Workstation虚拟机软件装置图解...",
"url" : "http://x.co/6nc81"
}
}
Response fields:
- found is true, meaning the document was found
- _source holds the content of the document
2. When the document does not exist
GET blog/_doc/2
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
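A found value of false means the requested document does not exist.
3. Check whether a document exists
HEAD blog/_doc/1
Output (a HEAD request returns only a status code: 200 if the document exists, 404 otherwise):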
200 - OK
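3) Updating documents
1. Replace the document with ID 1. A PUT to an existing ID overwrites the entire document, so this request removes the author field and changes the content field: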
PUT blog/_doc/1
{
"title":"1、VMware Workstation虚拟机软件装置图解",
"content":"下载得到VMware-workstation-full-15.0.2-10952284.exe可履行文件...",
"url":"http://x.co/6nc81"
}
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
_version is now 2.
View the document:
GET blog/_doc/1
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件装置图解",
"content" : "下载得到VMware-workstation-full-15.0.2-10952284.exe可履行文件...",
"url" : "http://x.co/6nc81"
}
}
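The missing author field is the key point: indexing to an existing ID replaces the whole _source rather than merging into it. The in-memory model below (all names are hypothetical, not a client API) contrasts that with a partial update (`POST blog/_update/1` with a `doc` body), which merges the given fields into the existing source. It assumes the version simply increases by one per write and ignores details such as noop detection.

```python
import copy

store = {}  # hypothetical stand-in for one index: doc_id -> {"_version": int, "_source": dict}

def index_doc(doc_id, source):
    """PUT index/_doc/id: create the document, or replace its entire _source."""
    version = store[doc_id]["_version"] + 1 if doc_id in store else 1
    store[doc_id] = {"_version": version, "_source": copy.deepcopy(source)}
    return "created" if version == 1 else "updated"

def merge(target, patch):
    """Recursively merge patch into target, the way a partial "doc" update merges objects."""
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)

def partial_update(doc_id, patch):
    """POST index/_update/id with {"doc": patch}: merge into the existing _source."""
    entry = store[doc_id]
    merge(entry["_source"], patch)
    entry["_version"] += 1

r1 = index_doc("1", {"title": "t", "author": "chengyuqiang", "content": "old", "url": "http://x.co/6nc81"})
r2 = index_doc("1", {"title": "t", "content": "new", "url": "http://x.co/6nc81"})  # full replace
after_put = copy.deepcopy(store["1"])
partial_update("1", {"author": "chengyuqiang"})  # merge: the other fields are kept
print(after_put["_source"])
print(store["1"])
```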
2. To avoid overwriting an existing document when adding one, use the _create endpoint:
PUT blog/_create/1
{
"title":"1、VMware Workstation虚拟机软件装置图解",
"content":"下载得到VMware-workstation-full-15.0.2-10952284.exe可履行文件...",
"url":"http://x.co/6nc81"
}
The document already exists, so the request fails with HTTP 409:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[_doc][1]: version conflict, document already exists (current version [2])",
"index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
"shard": "1",
"index": "blog"
}
],
"type": "version_conflict_engine_exception",
"reason": "[_doc][1]: version conflict, document already exists (current version [2])",
"index_uuid": "GqC2fSqPS06GRfTLmh1TLg",
"shard": "1",
"index": "blog"
},
"status": 409
}
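The _create endpoint (equivalently, `op_type=create`) makes a write create-only. The plain-Python model below mimics that rule; `create_doc`, `VersionConflict`, and the seeded `docs` dict are hypothetical, and the error text is only modeled on the response above.

```python
class VersionConflict(Exception):
    """Stand-in for the 409 version_conflict_engine_exception."""

docs = {"1": {"_version": 2, "_source": {"title": "existing"}}}  # hypothetical current state

def create_doc(doc_id, source):
    """Mimic PUT index/_create/id: never overwrite an existing document."""
    if doc_id in docs:
        current = docs[doc_id]["_version"]
        raise VersionConflict(
            f"[{doc_id}]: version conflict, document already exists (current version [{current}])")
    docs[doc_id] = {"_version": 1, "_source": dict(source)}
    return 201  # HTTP status for a newly created document

try:
    create_doc("1", {"title": "replacement"})
    conflict = False
except VersionConflict as err:
    conflict = True
    print(err)

status = create_doc("9", {"title": "new"})
```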
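3. Updating individual fields
Use a script to update a specific field. In the script (Painless, the default scripting language), ctx is the execution context object: the script reads _source through it and then modifies the content field.
POST blog/_update/1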
{
"script": {
"source": "ctx._source.content=\"从官网下载VMware-workstation,双击可履行文件进行装置...\""
}
}
The response is:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
Retrieve the document again:
GET blog/_doc/1
The response is:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件装置图解",
"content" : "从官网下载VMware-workstation,双击可履行文件进行装置...",
"url" : "http://x.co/6nc81"
}
}
4. Adding a field
POST blog/_update/1
{
"script": {
"source": "ctx._source.author=\"chengyuqiang\""
}
}
Retrieve the document again:
GET blog/_doc/1
The response is:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件装置图解",
"content" : "从官网下载VMware-workstation,双击可履行文件进行装置...",
"url" : "http://x.co/6nc81",
"author" : "chengyuqiang"
}
}
5. Removing a field
POST blog/_update/1
{
"script": {
"source": "ctx._source.remove(\"url\")"
}
}
Retrieve the document again:
GET blog/_doc/1
The response is:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"found" : true,
"_source" : {
"title" : "1、VMware Workstation虚拟机软件装置图解",
"content" : "从官网下载VMware-workstation,双击可履行文件进行装置...",
"author" : "chengyuqiang"
}
}
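The three scripts above (set content, add author, remove url) are ordinary map operations on ctx._source. As a rough analogue only (Python, not Painless; the `ctx` dict and `apply_script` helper are hypothetical), they act on the document's source like this. Setting or adding a field can also be done with a partial `doc` update, but with the update API, removing a field needs a script such as the one above.

```python
# Hypothetical Python analogue of the Painless scripts above; ctx mimics the script context
ctx = {"_source": {
    "title": "1、VMware Workstation虚拟机软件装置图解",
    "content": "下载得到VMware-workstation-full-15.0.2-10952284.exe可履行文件...",
    "url": "http://x.co/6nc81",
}}

def apply_script(ctx, op):
    """Apply one script-style operation to ctx["_source"] in place."""
    kind, field, *value = op
    if kind == "set":        # ctx._source.field = value (updates or adds the field)
        ctx["_source"][field] = value[0]
    elif kind == "remove":   # ctx._source.remove("field")
        ctx["_source"].pop(field, None)
    else:
        raise ValueError(f"unsupported op: {kind}")

for op in [("set", "content", "从官网下载VMware-workstation,双击可履行文件进行装置..."),
           ("set", "author", "chengyuqiang"),
           ("remove", "url")]:
    apply_script(ctx, op)

print(sorted(ctx["_source"]))  # ['author', 'content', 'title']
```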
4) Deleting documents
DELETE blog/_doc/1
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 6,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1
}
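Check again whether the document exists:
HEAD blog/_doc/1
The response is 404 - Not Found.
5) Bulk operations
When there are very many documents (production systems routinely handle massive data), operating on them one at a time is impractical. Fortunately, Elasticsearch provides batch operations. Just as mget retrieves multiple documents in one request, the Bulk API performs batch indexing, deletion, and updates: it lets you issue multiple create, index, update, or delete requests in a single step.
The bulk request body is formatted slightly differently from other request bodies: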
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
...
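This format is a stream of valid single-line JSON documents joined together by newline characters (\n). Note the following:
- Every line must end with a newline (\n), including the last line. The newlines act as markers that separate the lines.
- Lines must not contain unescaped newline characters, because they would interfere with parsing. This means the JSON cannot be pretty-printed.
- The action/metadata line specifies what to do with which document. The metadata should give the _index, _type, and _id of the document to be indexed, created, updated, or deleted (in 7.x, _type is deprecated and can be omitted).
- The request body line contains the document's _source itself, that is, the fields and values of the document. It is required for index and create operations. An update also needs a body line (with doc or script), while delete takes no body line.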
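The following Python sketch applies these rules when assembling a bulk body: one compact JSON object per line, a newline after every line including the last, and no body line for delete. `build_bulk_body` is a hypothetical helper, and its metadata omits `_type`, which is deprecated in 7.x. The payload would be sent with `POST /_bulk` and a `Content-Type: application/x-ndjson` header.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, body) triples into an NDJSON _bulk payload.

    body must be None for delete; index/create/update each need a body line.
    """
    lines = []
    for action, metadata, body in operations:
        if action == "delete" and body is not None:
            raise ValueError("delete takes no body line")
        if action in ("index", "create", "update") and body is None:
            raise ValueError(f"{action} requires a body line")
        # Compact separators: one JSON object per line, never pretty-printed
        lines.append(json.dumps({action: metadata}, ensure_ascii=False, separators=(",", ":")))
        if body is not None:
            lines.append(json.dumps(body, ensure_ascii=False, separators=(",", ":")))
    # Every line, including the last one, must end with "\n"
    return "".join(line + "\n" for line in lines)

payload = build_bulk_body([
    ("delete", {"_index": "blog", "_id": "1"}, None),
    ("update", {"_index": "blog", "_id": "3", "retry_on_conflict": 3}, {"doc": {"title": "Xshell教程"}}),
    ("index", {"_index": "blog", "_id": "4"}, {"title": "4、CentOS 7.x根本设置"}),
])
print(payload)
```

`json.dumps` escapes any newline inside a string value as `\n`, so a serialized line can never be split by the content it carries.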
1. Bulk import
POST /_bulk
{ "create": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "title": "1、VMware Workstation虚拟机软件装置图解" ,"author":"chengyuqiang","content":"官网下载VMware-workstation,双击可履行文件进行装置" , "url":"http://x.co/6nc81" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "2" }}
{ "title": "2、Linux服务器装置图解" ,"author": "chengyuqiang" ,"content": "VMware模拟Linux服务器装置图解" , "url": "http://x.co/6nc82" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "3" }}
{ "title": "3、Xshell 6 个人版装置与长途操作衔接服务器" , "author": "chengyuqiang" ,"content": "Xshell 6 个人版装置与长途操作衔接服务器..." , "url": "http://x.co/6nc84" }
The Elasticsearch response contains an items array that lists the result of each action, in the same order as in the request:
{
"took" : 132,
"errors" : false,
"items" : [
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 7,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
}
]
}
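Because each action succeeds or fails independently, a client should check the top-level errors flag and then the individual items. A minimal sketch, assuming the response body is available as a string; `failed_bulk_items` and the sample failure response are hypothetical:

```python
import json

def failed_bulk_items(response_text):
    """Return (action, _id, status) for every failed item in a _bulk response."""
    resp = json.loads(response_text)
    if not resp["errors"]:
        return []  # the top-level flag says every item succeeded
    failures = []
    for item in resp["items"]:
        # Each item is a single-key object such as {"create": {...}}
        action, result = next(iter(item.items()))
        if "error" in result:  # failed items carry an "error" object and a 4xx/5xx status
            failures.append((action, result["_id"], result["status"]))
    return failures

# Hypothetical response in which the second create hit an existing ID
sample = json.dumps({"took": 3, "errors": True, "items": [
    {"create": {"_index": "blog", "_id": "1", "result": "created", "status": 201}},
    {"create": {"_index": "blog", "_id": "2", "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}},
]})
print(failed_bulk_items(sample))  # [('create', '2', 409)]
```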
2. Bulk operations mixing delete, update, and create
POST /_bulk
{ "delete": { "_index": "blog", "_type": "_doc", "_id": "1" }}
{ "update": { "_index": "blog", "_type": "_doc", "_id": "3", "retry_on_conflict" : 3} }
{ "doc" : {"title" : "Xshell教程"} }
{ "index": { "_index": "blog", "_type": "_doc", "_id": "4" }}
{ "title": "4、CentOS 7.x根本设置" ,"author":"chengyuqiang","content":"CentOS 7.x根本设置","url":"http://x.co/6nc85" }
{ "create": { "_index": "blog", "_type": "_doc", "_id": "5" }}
{ "title": "5、图解Linux下JDK装置与环境变量装备","author":"chengyuqiang" ,"content": "图解JDK装置装备" , "url": "http://x.co/6nc86" }
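In version 7.0, the retry_on_conflict parameter (how many times to retry an update after a version conflict) replaced the earlier _retry_on_conflict. The response is: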
{
"took" : 125,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1,
"status" : 200
}
},
{
"index" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "blog",
"_type" : "_doc",
"_id" : "5",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1,
"status" : 201
}
}
]
}
6) Multi-get (mget)
GET blog/_mget
{
"ids" : ["1", "2","3"]
}
The document with ID 1 has been deleted, so it is not found:
{
"docs" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"found" : false
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "2、Linux服务器装置图解",
"author" : "chengyuqiang",
"content" : "VMware模拟Linux服务器装置图解",
"url" : "http://x.co/6nc82"
}
},
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"found" : true,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版装置与长途操作衔接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
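Each entry in docs reports found on its own, so a client typically separates hits from misses. A small sketch (the helper name `split_mget` is hypothetical; the sample is the response above, trimmed):

```python
import json

def split_mget(response_text):
    """Split an _mget response into ({_id: _source} for hits, [missing _ids])."""
    found, missing = {}, []
    for doc in json.loads(response_text)["docs"]:
        if doc.get("found"):
            found[doc["_id"]] = doc["_source"]
        else:  # anything without found: true (a miss or an error) counts as missing
            missing.append(doc["_id"])
    return found, missing

sample = json.dumps({"docs": [
    {"_index": "blog", "_type": "_doc", "_id": "1", "found": False},
    {"_index": "blog", "_type": "_doc", "_id": "2", "_version": 1, "found": True,
     "_source": {"title": "2、Linux服务器装置图解"}},
    {"_index": "blog", "_type": "_doc", "_id": "3", "_version": 2, "found": True,
     "_source": {"title": "Xshell教程"}},
]})
hits, misses = split_mget(sample)
print(sorted(hits), misses)  # ['2', '3'] ['1']
```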
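7) Simple search
This section briefly introduces document search; later chapters cover it in detail.
1. Term query
A term query looks up the exact term in the inverted index and does not analyze its input. At index time, the default standard analyzer lowercases English words and splits Chinese text into single characters, so centos matches the term indexed from CentOS and the single character 程 matches the title Xshell教程, while a two-character Chinese term matches nothing.
[Example 1]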
GET blog/_search
{
"query": {
"term": {
"title": "centos"
}
}
}
Output:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.71023846,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.71023846,
"_source" : {
"title" : "4、CentOS 7.x根本设置",
"author" : "chengyuqiang",
"content" : "CentOS 7.x根本设置",
"url" : "http://x.co/6nc85"
}
}
]
}
}
[Example 2]
GET blog/_search
{
"query": {
"term": {
"title": "远程"
}
}
}
Output:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
[Example 3]
GET blog/_search
{
"query": {
"term": {
"title": "程"
}
}
}
Output:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3486402,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.3486402,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版装置与长途操作衔接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
}
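2. Match query
Unlike the exact term query, a match query first analyzes its input with the field's analyzer and then matches any document in which at least one of the resulting terms occurs (the terms are combined with OR by default). Here the standard analyzer splits the query text 远程 into 远 and 程, and 程 occurs in the title Xshell教程, so document 3 is returned:
GET blog/_search
{
"query": {
"match": {
"title": {
"query": "远程"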
}
}
}
}
Output:
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3486402,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.3486402,
"_source" : {
"title" : "Xshell教程",
"author" : "chengyuqiang",
"content" : "Xshell 6 个人版装置与长途操作衔接服务器...",
"url" : "http://x.co/6nc84"
}
}
]
}
}
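The difference between the term and match results can be reproduced with a rough model of analysis. The sketch below approximates the standard analyzer (everything lowercased; runs of ASCII letters and digits become one token; each CJK ideograph becomes its own token). That regular expression is a simplification of the real Unicode tokenizer, and the titles are those left in the blog index after the bulk operations above.

```python
import re

# Rough approximation of the standard analyzer (not the real Unicode word-boundary rules)
TOKEN_RE = re.compile(r"[0-9A-Za-z]+|[\u4e00-\u9fff]")

def analyze(text):
    return [token.lower() for token in TOKEN_RE.findall(text)]

titles = {
    "2": "2、Linux服务器装置图解",
    "3": "Xshell教程",
    "4": "4、CentOS 7.x根本设置",
    "5": "5、图解Linux下JDK装置与环境变量装备",
}
index = {doc_id: set(analyze(title)) for doc_id, title in titles.items()}

def term_query(term):
    """term: the input is NOT analyzed; it must equal one indexed token exactly."""
    return sorted(doc_id for doc_id, tokens in index.items() if term in tokens)

def match_query(text):
    """match: analyze the input, then return documents containing ANY resulting token (OR)."""
    query_tokens = set(analyze(text))
    return sorted(doc_id for doc_id, tokens in index.items() if tokens & query_tokens)

print(term_query("centos"))  # ['4']
print(term_query("远程"))    # []
print(term_query("程"))      # ['3']
print(match_query("远程"))   # ['3']
```

In this model `term_query("CentOS")` returns nothing because the indexed tokens are lowercased, while `match_query("CentOS")` finds document 4, mirroring how the two query types treat their input.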
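8) Routing
When you index a document ("index" as a verb: adding the document to the inverted index), it is stored in exactly one primary shard. That shard can live on any data node; it is not necessarily on the master node.
How does Elasticsearch know which shard a document belongs to? When you create a new document, how does it decide whether to store it in shard 1 or shard 2? The answer lies in Elasticsearch's routing mechanism. Put simply, Elasticsearch hashes a routing value, so documents with the same routing value always end up in the same primary shard. The shard is computed as: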
shard = hash(routing) % number_of_primary_shards
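About the formula:
- The routing value is an arbitrary string; it defaults to the document's _id but can be customized. The routing string is passed through a hash function to produce a number, which is divided by the number of primary shards. The remainder, which falls in the range [0, number_of_primary_shards - 1], is the number of the shard that holds the document.
- As described earlier, the number of primary shards is set when the index is created and cannot be changed afterward. If it changed, all previously computed routing results would become invalid and existing documents could no longer be found.
- With the default routing, the algorithm spreads documents fairly evenly across all shards, so the data does not become skewed.
- By default, the routing value is the document's _id. You can set the _id when creating a document; if you don't, Elasticsearch generates a random one. Either way, a request that targets a single document by _id can compute its shard directly. A search, however, cannot know in advance which shards hold the matching documents, so unless a routing value is supplied, the request is broadcast to all shards.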
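The formula is easy to state in code. The sketch below uses `zlib.crc32` only as a stand-in deterministic hash: Elasticsearch itself uses a Murmur3-based hash and, in 7.x, an additional routing_num_shards factor, so the shard numbers here will not match a real cluster. It illustrates the two properties discussed above: a given routing value always maps to the same shard, and changing the number of primary shards generally sends documents to different shards.

```python
import zlib

def shard_for(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards (CRC32 stands in for the real hash)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

doc_ids = [str(i) for i in range(1, 11)]  # the default routing value is the _id
with_2_shards = {d: shard_for(d, 2) for d in doc_ids}
with_3_shards = {d: shard_for(d, 3) for d in doc_ids}

# Deterministic: recomputing gives the same shard every time
assert all(shard_for(d, 2) == with_2_shards[d] for d in doc_ids)
# The remainder is always in [0, number_of_primary_shards - 1]
assert all(0 <= s < 3 for s in with_3_shards.values())

moved = [d for d in doc_ids if with_2_shards[d] != with_3_shards[d]]
print(with_2_shards)
print("IDs that would map to a different shard with 3 primaries:", moved)
```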
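Suppose we have an index with 10 shards. A search request is executed on the cluster roughly as follows:
- The search request is sent to one node.
- The node that receives the request (the coordinating node) broadcasts the query to every shard of the index, each served by either a primary or a replica copy.
- Each shard executes the query and returns its results.
- The coordinating node merges and sorts the results and returns them to the user.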
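The last step, merging per-shard results on the coordinating node, is essentially a k-way merge of lists that are each already sorted by score. A sketch with hypothetical per-shard hits given as (score, _id); in Elasticsearch's default query-then-fetch flow, a separate fetch phase then retrieves the selected documents, which this sketch omits.

```python
import heapq
from itertools import islice

# Hypothetical per-shard results: (score, _id), each list sorted by descending score
shard_results = [
    [(1.35, "3"), (0.40, "7")],
    [(0.71, "4"), (0.20, "9")],
    [(0.90, "12")],
]

def merge_top_hits(per_shard, size):
    """Merge the shards' sorted hit lists and keep the global top `size` _ids."""
    merged = heapq.merge(*per_shard, key=lambda hit: hit[0], reverse=True)
    return [doc_id for _, doc_id in islice(merged, size)]

print(merge_top_hits(shard_results, 3))  # ['3', '12', '4']
```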
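With routing understood, we can assign a routing value when creating a certain category of documents, so that Elasticsearch knows how to locate the right shard when handling that category. For example, books of a particular genre can be stored on a particular shard; a search for that genre that passes the same routing value then queries only that shard, so results from multiple shards need not be merged (that shard may still hold documents with other routing values, so the query should still filter on the genre). Routing gives Elasticsearch the information it needs to decide which shards are used for storage and queries, and the same routing value always maps to the same shard. In short, user-supplied routing values make targeted storage and targeted search possible.
All document APIs (GET, INDEX, DELETE, BULK, UPDATE, MGET) accept a routing parameter that customizes how documents map to shards. It is added like any other URL parameter: url?name=value.
PUT blog/_doc/1?routing=hardon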
{
"title":"1、VMware装置",
"author":"hadron",
"content":"VMware Workstation虚拟机软件装置图解...",
"url":"http://x.co/6nc81"
}
Output:
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 12,
"_primary_term" : 1
}
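Retrieve it, passing the same routing value (without it, the request is routed by _id and may go to a shard that does not hold the document):
GET blog/_doc/1?routing=hardon
Output: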
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_routing" : "hardon",
"found" : true,
"_source" : {
"title" : "1、VMware装置",
"author" : "hadron",
"content" : "VMware Workstation虚拟机软件装置图解...",
"url" : "http://x.co/6nc81"
}
}
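[Note] Custom routing values can cause uneven data distribution. For example, if user hadron has a very large number of documents (hundreds of thousands) while most other users have only a few to a few dozen, the shard holding hadron's documents becomes much larger than the others.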
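The effect is easy to see with the simplified formula. In this sketch the document counts are hypothetical and CRC32 again stands in for Elasticsearch's real routing hash: because all of hadron's documents carry the same routing value, they land on a single shard no matter how many shards the index has.

```python
import zlib
from collections import Counter

def shard_for(routing, number_of_primary_shards):
    # CRC32 stands in for Elasticsearch's routing hash (illustrative only)
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Hypothetical workload: documents per user, with the user name as the routing value
docs_per_user = {"hadron": 100_000}
docs_per_user.update({f"user{u}": 20 for u in range(200)})

per_shard = Counter()
for user, count in docs_per_user.items():
    per_shard[shard_for(user, 5)] += count

print(sorted(per_shard.items()))  # one shard holds at least hadron's 100,000 documents
```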
9) Version control
[Example 1] Without a version
PUT website
{
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 1
}
}
}
PUT /website/_create/1
{
"title": "My first blog entry",
"text": "Just trying this out..."
}
View it:
GET website/_doc/1
Output:
{
"_index" : "website",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"title" : "My first blog entry",
"text" : "Just trying this out..."
}
}
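[Example 2] Specifying a version
Since 7.0, the internal _version can no longer be used for optimistic concurrency control (a request such as PUT website/_doc/1?version=1 is rejected). Instead, pass the _seq_no and _primary_term of the copy you last read; for the document just created they are 0 and 1:
PUT website/_doc/1?if_seq_no=0&if_primary_term=1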
{
"title": "My first blog entry",
"text": "Starting to get the hang of this..."
}
Output:
{
"_index" : "website",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
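In 7.x, optimistic concurrency control on internal versions compares _seq_no and _primary_term: the write is applied only if the document still has the values the client last read. The in-memory model below (the `Shard` class and `VersionConflict` are hypothetical) shows why a second client holding the same, now stale, values gets a conflict.

```python
class VersionConflict(Exception):
    """Stand-in for HTTP 409 version_conflict_engine_exception."""

class Shard:
    """Hypothetical single-shard store; _seq_no is one counter shared by all its documents."""

    def __init__(self, primary_term=1):
        self.docs, self.next_seq_no, self.primary_term = {}, 0, primary_term

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            seen = (current["_seq_no"], current["_primary_term"]) if current else None
            if seen != (if_seq_no, if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        self.docs[doc_id] = {"_source": source,
                             "_version": current["_version"] + 1 if current else 1,
                             "_seq_no": self.next_seq_no,
                             "_primary_term": self.primary_term}
        self.next_seq_no += 1
        return self.docs[doc_id]

shard = Shard()
created = shard.index("1", {"title": "My first blog entry", "text": "Just trying this out..."})
# Two clients both read _seq_no=0 and _primary_term=1; the first conditional write wins
updated = shard.index("1", {"title": "My first blog entry", "text": "Starting to get the hang of this..."},
                      if_seq_no=0, if_primary_term=1)
try:
    shard.index("1", {"title": "stale write"}, if_seq_no=0, if_primary_term=1)
    stale_accepted = True
except VersionConflict as err:
    stale_accepted = False
    print(err)
```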
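With version_type=external, the version number is supplied by an external system instead. For example, to create a new blog post with external version number 5: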
PUT /website/_doc/2?version=5&version_type=external
{
"title": "My first external blog entry",
"text": "Starting to get the hang of this..."
}
In the response, the current _version is 5:
{
"_index" : "website",
"_type" : "_doc",
"_id" : "2",
"_version" : 5,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
Now update the document, specifying a new version number of 10:
PUT /website/_doc/2?version=10&version_type=external
{
"title": "My first external blog entry",
"text": "This is a piece of cake..."
}
The request succeeds and sets the current _version to 10:
{
"_index" : "website",
"_type" : "_doc",
"_id" : "2",
"_version" : 10,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
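If you rerun this request, it fails with the same kind of conflict error we saw earlier, because the external version provided is not greater than the current version held by Elasticsearch: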
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[_doc][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
"index_uuid": "5616aEUkQ7yvQIYUDyLudg",
"shard": "0",
"index": "website"
}
],
"type": "version_conflict_engine_exception",
"reason": "[_doc][2]: version conflict, current version [10] is higher or equal to the one provided [10]",
"index_uuid": "5616aEUkQ7yvQIYUDyLudg",
"shard": "0",
"index": "website"
},
"status": 409
}
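The external rule is simply "accept only a strictly greater version" (the external_gte variant also accepts an equal one). A plain-Python model of that rule, with hypothetical names, replaying the three requests above:

```python
class VersionConflict(Exception):
    """Stand-in for HTTP 409 version_conflict_engine_exception."""

versions = {}  # hypothetical store: doc_id -> current external version

def index_external(doc_id, version):
    """version_type=external: accept the write only if version > current version."""
    current = versions.get(doc_id)
    if current is not None and version <= current:
        raise VersionConflict(
            f"[{doc_id}]: version conflict, current version [{current}] "
            f"is higher or equal to the one provided [{version}]")
    versions[doc_id] = version
    return "created" if current is None else "updated"

results = [index_external("2", 5), index_external("2", 10)]
try:
    index_external("2", 10)  # replaying the same request
    replay_accepted = True
except VersionConflict as err:
    replay_accepted = False
    print(err)
```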
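10) refresh
1. Refresh immediately so the document is visible
The following requests create a document and immediately refresh the index so that it is visible to search: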
DELETE test
PUT test/_doc/1?refresh
{"message": "测验文档1"}
PUT test/_doc/2?refresh=true
{"message": "测验文档2"}
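2. No refresh
The following requests create a document without doing anything to make it visible to search; it becomes searchable only after the next periodic refresh (see index.refresh_interval, 1s by default):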
PUT test/_doc/3
{"message": "测验文档3"}
PUT test/_doc/4?refresh=false
{"message": "测验文档4"}
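3. Wait until the change is visible
With refresh=wait_for, the request does not force a refresh; it waits until a refresh makes the change visible and then returns: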
PUT test/_doc/5?refresh=wait_for
{"message": "测验文档5"}
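The three refresh modes can be pictured with a toy near-real-time model: writes go into a buffer and become searchable only when a refresh moves them into the searchable view. Everything here is hypothetical and single-threaded; in particular, a real refresh=wait_for blocks until the next scheduled refresh happens, which this toy can only imitate by running that refresh itself.

```python
class NearRealTimeIndex:
    """Toy model: indexed documents are searchable only after a refresh."""

    def __init__(self):
        self.buffer, self.searchable = {}, {}

    def refresh(self):
        """Make every buffered write visible to search."""
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def scheduled_refresh(self):
        """Stands in for the periodic refresh driven by index.refresh_interval."""
        self.refresh()

    def index(self, doc_id, source, refresh="false"):
        self.buffer[doc_id] = source
        if refresh in ("", "true"):   # ?refresh or ?refresh=true: refresh right away
            self.refresh()
        elif refresh == "wait_for":   # real ES waits for the next scheduled refresh;
            self.scheduled_refresh()  # this single-threaded toy just runs it immediately
        # "false" (the default): nothing happens until the next scheduled refresh

    def search(self, doc_id):
        return self.searchable.get(doc_id)

idx = NearRealTimeIndex()
idx.index("1", {"message": "测验文档1"}, refresh="")
idx.index("2", {"message": "测验文档2"}, refresh="true")
idx.index("3", {"message": "测验文档3"})
idx.index("4", {"message": "测验文档4"}, refresh="false")
before = (idx.search("3"), idx.search("4"))  # (None, None): not refreshed yet
idx.scheduled_refresh()
idx.index("5", {"message": "测验文档5"}, refresh="wait_for")
```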
That covers the common Elasticsearch operations. For more API operations, see the official documentation.