What is ES?

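Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can search it quickly, fine-tune relevance, run powerful analytics, and scale with ease.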

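What is a DSL statement?

DSL statements: the RESTful-style requests, with JSON bodies, that ES uses to create, read, update, and delete data.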

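Basics

Before learning DSL statements, keep in mind that ES is in effect also a database. It stores and queries data differently from the relational databases most of us know, and its data structures (inverted indexes above all) make it fast to search and analyze large volumes of data.

Mapping its concepts onto a familiar database such as MySQL makes it easier to understand.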

MySQL                 Elasticsearch
Database              Index
Table schema          Mapping (_mapping)
Table                 Type (deprecated in 7.x, removed in 8.x)
Row                   Document
Column                Field

Details


Create

Create an index

PUT /studentinfo  // studentinfo is the index name

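Set the mapping

Some field attributes:

type: the data type of the field. It can be:

  • String: text (full-text, analyzed into terms) or keyword (exact values such as email addresses, countries, or brand names, where splitting the value into terms makes no sense)

  • Numeric: long, short, byte, integer, double, float

  • Date: date

  • Boolean: boolean

  • Object: object

  • Geo: 1. geo_point: a point given by latitude and longitude, for example "32.896536,120.561981" (as a string, the order is lat,lon)

    2. geo_shape: a complex geometry built from several points, for example the line LINESTRING(120.561981 32.896536, 121.189631 30.785423) (WKT puts longitude first)

index: whether the field is indexed and therefore searchable. Defaults to true. Whether the value is split into terms depends on the field type and analyzer, not on this flag.

analyzer: the analyzer (tokenizer) used for a text field

properties: the sub-fields of an object field

copy_to: copies the values of several fields into one combined field, so a query can search that single field instead of each source field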

POST /studentinfo/_mapping
{
"_source": {"enabled": true},
"dynamic": true,
"properties" : {
"imei" : {"type" : "text","analyzer" : "keyword","fielddata": true},
"ispid" : {"type":"integer","ignore_malformed": true},
"msisdn" : {"type" : "text","analyzer" : "keyword","fielddata": true},
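// "analyzer" : "ik_smart" selects the IK analyzer (a plugin that must be installed separately)
// "ik_smart" is coarse-grained and yields fewer terms; "ik_max_word" is the finest-grained and yields the most terms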
"maddr_s" : {"type" : "text","analyzer" : "ik_smart","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"maddr_p" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"maddr_c" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"netaddr_s" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"netaddr_p" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"netaddr_c" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"seraddr_s" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"seraddr_p" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"seraddr_c" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"uli" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"x_sip" : {"type" : "ip"},
"x_dip" : {"type" : "ip"},
"i_sport" : {"type" : "keyword","doc_values":true},
"i_dport" : {"type" : "keyword","doc_values":true},
"guti" : {"type" : "keyword","doc_values":true},
"i_nsapi" : {"type" : "long","ignore_malformed": true},
"apn" : {"type" : "text", "analyzer" : "keyword"},
"rai" : {"type" : "text", "analyzer" : "keyword"},
"gsnu" : {"type" : "text", "analyzer" : "keyword"},
"teid" : {"type" : "text", "analyzer" : "keyword"},
"endtime" : {"type" : "long","ignore_malformed": true},
"id" : {"type" : "long"},
"i_type" : {"type" : "text","analyzer" : "keyword"},
"conndirect" : {"type" : "text", "analyzer" : "keyword"},
"protocoltype" : {"type" : "text" ,"analyzer" : "keyword"},
"i_trojan_type" : {"type" : "long","ignore_malformed": true},
"pguti":{"type" : "text" ,"analyzer" : "keyword"},
"sid":{"type" : "text" ,"analyzer" : "keyword"},
"x_imsi":{"type" : "text" ,"analyzer" : "keyword","fielddata": true},
"pteid":{"type" : "text" ,"analyzer" : "keyword"},
"flow":{"type" : "long","ignore_malformed": true},
"vpsfirm":{"type" : "text" ,"analyzer" : "keyword"},
"x_begintime":{"type" : "long","ignore_malformed": true},
"netaddr" : {"type" : "keyword","doc_values":true},
"seraddr" : {"type" : "keyword","doc_values":true},
"maddr" : {"type" : "keyword","doc_values":true},
"paddr_s" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"paddr_p" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"paddr_c" : {"type" : "text","analyzer" : "ik_max_word","norms" : false,"fields" : {"raw" : {"type" : "keyword","doc_values":true}}},
"paddr" : {"type" : "keyword","doc_values":true},

// copy_to example
"copy_toTestAll": {"type":"text","analyzer" : "ik_max_word"},
"copy_toTest1": {"type":"text","copy_to":"copy_toTestAll"},
"copy_toTest2": {"type":"text","copy_to":"copy_toTestAll"},
"copy_toTest3": {"type":"text","copy_to":"copy_toTestAll"},

"sshclient" : {"type" : "keyword","doc_values":true},
"sshserver" : {"type" : "keyword","doc_values":true},
"sshver" : {"type" : "keyword","doc_values":true},
"b_firstpacket" : {"type" : "binary"},
"b_firstpacket1" : {"type" : "text", "analyzer" : "keyword"},
"appbigt" :{"type" : "integer"},
"appsubt" :{"type" : "integer"}
}
}
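
Most address fields in the mapping above repeat the same definition: an analyzed text field with a keyword sub-field named raw. As a rough sketch only, a client-side script could generate that part of the mapping instead of writing it by hand. The helper name text_with_raw and the shortened field list are assumptions made for this illustration; the output is just the JSON body you would send to the _mapping endpoint.

```python
import json

def text_with_raw(analyzer="ik_max_word"):
    """Analyzed text field plus an exact-match keyword sub-field named raw."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "norms": False,
        "fields": {"raw": {"type": "keyword", "doc_values": True}},
    }

# A shortened, illustrative subset of the address fields from the mapping.
address_fields = ["maddr_p", "maddr_c", "netaddr_p", "netaddr_c"]
mapping = {"properties": {name: text_with_raw() for name in address_fields}}
mapping["properties"]["maddr_s"] = text_with_raw("ik_smart")

# Print the request body for POST /studentinfo/_mapping.
print(json.dumps(mapping, indent=2))
```

Searching maddr_p then goes through the IK analyzer, while aggregating or sorting would use maddr_p.raw.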

Insert a document (data)

POST /index_name/_doc/doc_id  // if doc_id is omitted, ES generates a random id
{
  "id": "1",
  "name": "张三",
  "age": "15",
  "height": "1.45",
  "isHealth": true,
  "remark": "middle school student"
}

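Bulk-insert documents

Similar to the SQL statement: INSERT INTO testTable (xx,xx) VALUES ('xx','xx'),('xx','xx'),('xx','xx')

Note: the bulk body is newline-delimited JSON, so every action line and every document must sit on a single line; Elasticsearch rejects a pretty-printed bulk body. In Kibana, Ctrl + I re-indents the current request and can be used to get the body into this shape.

POST /studentinfo/_bulk
{"index":{}}
{"name":"李四","age":"28","height":"1.65","isHealth":true,"remark":"office worker"}
{"index":{}}
{"name":"王五","age":"37","height":"1.75","isHealth":true,"remark":"manager"}
{"index":{}}
{"name":"赵六","age":"24","height":"1.65","isHealth":true,"remark":"college graduate"}
{"index":{}}
{"name":"田七","age":"24","height":"1.65","isHealth":true,"remark":"good citizen"}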
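
When the documents come from a program rather than from Kibana, the one-object-per-line layout is easiest to get right by generating it. The following is a minimal sketch, assuming the documents are plain Python dicts; build_bulk_body is a hypothetical helper name, and sending the body over HTTP is left out.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON bulk body that indexes every dict in docs."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))            # action line
        lines.append(json.dumps(doc, ensure_ascii=False))  # document line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

students = [
    {"name": "李四", "age": "28"},
    {"name": "王五", "age": "37"},
]
body = build_bulk_body(students)
print(body)
```

json.dumps never emits line breaks unless an indent is requested, so each action and each document ends up on exactly one line.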

Query

Search an index

GET /studentinfo/_search  // studentinfo is the index name
{
  "query": {
    "match_all": {}
  }
}

Get a document by id

GET /index_name/_doc/doc_id

match does full-text matching, term matches an exact value, range filters by range, and exists requires that a field is present. all_interests is a user-defined aggregation name.

GET /index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "about": "travel"
          }
        },
        {
          "term": {
            "sex": "boy"
          }
        },
        {
          "range": {
            "age": {
              "gte": 16,  // 16 <= age < 25
              "lt": 25
            }
          }
        },
        {
          "exists": {
            "field": "age"
          }
        }
      ]
    }
  },
  "aggs": {
    "all_interests": {
      "terms": { "field": "age" }
    }
  }
}

Wildcard (fuzzy) query

Works on text and keyword fields.

GET /student/_search
{
  "query": {
    "wildcard": {
      "name": "*li*"
    }
  }
}

Aggregation queries

Note: to aggregate on a text field, use its keyword sub-field, "field.keyword".

Single-field bucket counts

Roughly equivalent to select age, count(*) from aggs group by age

GET /index_name/_search
{
  "aggs": {
    "all_interests": {  // user-defined name
      "terms": {
        "field": "age",
        "size": 1000
      }
    }
  }
}

Single-field average

Roughly equivalent to select color, avg(price) as avg_price from aggs group by color

GET /index_name/_search
{
  "size": 0,
  "aggs": {
    "s": {
      "terms": {
        "field": "color.keyword",  // to aggregate on a text field, use "field.keyword"
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Nested aggregation on two fields

GET /index_name/_search
{
  "aggs": {
    "aggCount": {  // user-defined name
      "terms": {
        "field": "firstip"
      },
      "aggs": {
        "idCount": {  // user-defined name
          "terms": {
            "field": "emlcountry.keyword"  // to aggregate on a text field, use "field.keyword"
          }
        }
      }
    }
  }
}
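
To make the SQL comparison concrete, here is a small in-memory sketch of what a terms aggregation with a nested avg computes: one bucket per distinct value, each with a document count and an average. It runs on a plain Python list and does not talk to Elasticsearch; the function name terms_with_avg and the sample car data are made up for the illustration.

```python
from collections import defaultdict

def terms_with_avg(docs, group_field, value_field):
    """Group docs by group_field; return {key: (doc_count, average)}."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[group_field]].append(doc[value_field])
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in buckets.items()}

cars = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "blue", "price": 15000},
]
result = terms_with_avg(cars, "color", "price")
print(result)
```

The bucket key corresponds to the group by column, the count to doc_count in the response, and the average to the avg_price sub-aggregation.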

Sorting results

Find documents whose address is "北京市" (Beijing) and sort them by endtime in descending order (desc).

PS: sort on numeric, date, or keyword fields. A text field cannot be sorted on unless fielddata is enabled for it.

GET /index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "seraddr_p": {
              "value": "北京市"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "endtime": {
        "order": "desc"
      }
    }
  ]
}

AND / OR queries

Implements name == "a" AND (city == "b" OR city == "c")

GET /index_name/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "name": "a"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "city": "b"
          }
        },
        {
          "match_phrase": {
            "city": "c"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "size": 5
}
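
The same AND/OR shape comes up often, so a client program might build it from parameters. This is only a sketch under that assumption; must_and_any_of is a hypothetical helper, and it produces the request body without sending it.

```python
import json

def must_and_any_of(must_field, must_value, any_field, any_values):
    """Build: must_field == must_value AND any_field is one of any_values."""
    return {
        "query": {
            "bool": {
                "must": [{"match_phrase": {must_field: must_value}}],
                "should": [{"match_phrase": {any_field: v}} for v in any_values],
                # Without this, should clauses next to a must clause only
                # influence scoring and are not required to match.
                "minimum_should_match": 1,
            }
        }
    }

query = must_and_any_of("name", "a", "city", ["b", "c"])
print(json.dumps(query, indent=2))
```

The key detail is minimum_should_match: leaving it out would turn the OR group into an optional scoring boost rather than a condition.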

Highlighting

(Matched terms are wrapped in the tags you specify and returned as HTML fragments.)

GET /index_name/_search
{
  "query": {
    "match": {
      "name": "四"
    }
  },
  "highlight": {
    "pre_tags": "<b style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "name": {}
    }
  }
}

Pagination

Equivalent to select * from table limit 2,2

Pagination formula: int start = (pageNum - 1) * size

GET /index_name/_search
{
  "query": {
    "match_all": {}
  },
  "from": 2,
  "size": 2
}
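
The pagination formula can be written as a tiny helper. This is a minimal sketch, assuming 1-based page numbers; page_params is a name chosen for this example.

```python
def page_params(page_num, size):
    """Translate a 1-based page number into Elasticsearch from/size values."""
    if page_num < 1 or size < 1:
        raise ValueError("page_num and size must be >= 1")
    return {"from": (page_num - 1) * size, "size": size}

# Page 2 with 2 hits per page matches the request above.
print(page_params(2, 2))  # {'from': 2, 'size': 2}
```

By default from + size may not exceed 10,000 (the index.max_result_window setting), so this style of paging is meant for the first pages of a result set rather than for deep scrolling.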

Geo queries

Form 1 (less common): draw a rectangle from two corner points and return everything inside it. FIELD stands for the name of a geo_point field.
GET /index_name/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": {
          "lat": 44.4,
          "lon": 33.3
        },
        "bottom_right": {
          "lat": 22.2,
          "lon": 55.5
        }
      }
    }
  }
}

Form 2 (common): draw a circle with a 5 km radius around your own position and return everything inside it.
GET /index_name/_search
{
  "query": {
    "geo_distance": {
      "distance": "5km",
      "FIELD": "11.1,22.2"
    }
  }
}
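
Conceptually, a geo_distance query keeps the points whose distance to the center is at most the given radius. The sketch below reproduces that idea with the haversine formula on plain (lat, lon) tuples. It is an illustration only: Elasticsearch uses its own distance implementation, and the mean Earth radius used here is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(center, point, radius_km):
    """True if point lies inside the circle of radius_km around center."""
    return haversine_km(*center, *point) <= radius_km

center = (11.1, 22.2)  # same "lat,lon" as in the query above
print(within_distance(center, (11.12, 22.2), 5))  # about 2.2 km away
print(within_distance(center, (11.2, 22.2), 5))   # about 11 km away
```

The first point falls inside the 5 km circle and the second does not, which is the filtering the query performs on every document's geo_point.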

Delete

Delete an index

DELETE /studentinfo  // studentinfo is the index name

Delete a document by id

DELETE /index_name/_doc/1  // document id = 1

Delete documents in an index by condition

POST /index_name/_delete_by_query
{
  "query": {
    "term": {
      "_id": "100000100"
    }
  }
}

Delete all documents

POST /index_name/_delete_by_query?pretty
{
  "query": {
    "match_all": {}
  }
}

Update

Update an index mapping

Unlike MySQL, ES does not let you change the mapping of an existing field; you can only add new fields.

PUT /index_name/_mapping
{
  "properties": {
    "new_field_name": {
      "type": "text"
    }
  }
}

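Update a document (data)

Method 1: full update

Send the same request used to insert a document, but with PUT instead of POST.

If the document exists, the old document is deleted and the new one is inserted (full replacement).

If the document does not exist, the new document is simply inserted.

PUT /index_name/_doc/doc_id
{
  "id": "1",
  "name": "张三",
  "age": "16",
  "height": "1.55",
  "isHealth": true,
  "remark": "high school student"
}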

Method 2: partial update

Only the listed fields change; the rest of the document is kept.

POST /index_name/_update/doc_id
{
  "doc": {
    "field_name": "new value"
  }
}
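
The difference between the two methods can be mimicked with plain dictionaries. This is a simplified sketch: the function names are invented for the example, and the merge is shallow, whereas Elasticsearch merges nested objects in a partial update recursively.

```python
def full_update(old_doc, new_doc):
    """PUT /index/_doc/id: the stored document becomes exactly new_doc."""
    return dict(new_doc)

def partial_update(old_doc, doc):
    """POST /index/_update/id with {"doc": doc}: listed fields overwrite, others stay."""
    merged = dict(old_doc)
    merged.update(doc)  # shallow merge; enough for flat documents
    return merged

stored = {"name": "张三", "age": "15", "remark": "middle school student"}
print(full_update(stored, {"name": "张三", "age": "16"}))
print(partial_update(stored, {"age": "16"}))
```

After the full update the remark field is gone because it was not part of the new body; after the partial update it is still there.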

Cluster

View the cluster's nodes

GET /_cat/nodes