Customizing the mapping when importing data into Elasticsearch with Logstash

The template concept

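When Logstash imports data into ES, the indices are usually split by day, so creating each mapping by hand is tedious. ES maintains index templates for this purpose: a template defines a mapping, and when the name of a new index matches the template's pattern, the index is created with that mapping automatically. A template can also define the number of primary and replica shards, analysis settings, and other index properties.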

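Every template has a name that describes its purpose, and precedence between templates is managed with order. Each entry inside dynamic_templates also has a name, a mapping field that says which mapping to apply, and at least one parameter (for example match) that defines which fields the entry applies to.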

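In Elasticsearch, if you have a group of similar fields and want to set their mapping in one place, you can use dynamic templates (dynamic_templates). In my case the fields parsed by Logstash have no explicit type, so they all come out as string, and none of them need to be analyzed.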

When Logstash starts, it uses a default dynamic mapping template named logstash. You can view it through the _template API (GET _template/logstash).

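During startup, Logstash logs which template it is using. In that output, path=>nil on the first line means that no custom template was found, so the default template is used. It is finally stored in Elasticsearch's templates under the name logstash. The template's content is listed after the parameter notes below.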

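Parameters of a dynamic template:

match_mapping_type restricts a template to fields of a particular detected type, as in the standard dynamic mapping rules, for example string or long.

match (and its inverse, unmatch) tests only the field name, for example "*_es". If it is "*", every field that also has the given match_mapping_type is matched.

path_match (and its inverse, path_unmatch) tests the full dotted path of a field inside an object. For example, address.*.name matches a field such as address.home.name.

Content of the default logstash template: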

{
	"logstash": {
		"order": 0,
		"template": "logstash-*",
		"settings": {
			"index": {
				"refresh_interval": "5s"
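			}
		},
		"mappings": {
			"_default_": {    # document type
				"dynamic_templates": [    # fixed keyword; the value must be an array (square brackets)
				{
					"message_field": {    # template name
						"mapping": {
							"fielddata": {
								"format": "disabled"
							},
							"index": "analyzed",    # whether the field is analyzed; this demo is from ES 2.4.0, which differs slightly from 6.5.4
							"omit_norms": true,
							"type": "string"    # the type the field is mapped to
						},
						"match_mapping_type": "string",    # detected type to match
						"match": "message"    # field-name pattern to match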
					}
				}, {
					"string_fields": {
						"mapping": {
							"fielddata": {
								"format": "disabled"
							},
							"index": "analyzed",
							"omit_norms": true,
							"type": "string",
							"fields": {
								"raw": {
									"ignore_above": 256,
									"index": "not_analyzed",
									"type": "string"
								}
							}
						},
						"match_mapping_type": "string",
						"match": "*"
					}
				}
               ],
				"_all": {
					"omit_norms": true,
					"enabled": true
				},
				"properties": {
					"@timestamp": {
						"type": "date"
					},
					"geoip": {
						"dynamic": true,
						"properties": {
							"ip": {
								"type": "ip"
							},
							"latitude": {
								"type": "float"
							},
							"location": {
								"type": "geo_point"
							},
							"longitude": {
								"type": "float"
							}
						}
					},
					"@version": {
						"index": "not_analyzed",
						"type": "string"
					}
				}
			}
		},
		"aliases": {}
	}
}
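
To make the matching rules concrete, here is a small Python model of how an entry in dynamic_templates is chosen for a new field. It is only a sketch under assumptions, not Elasticsearch code: pick_dynamic_template is a hypothetical helper, fnmatchcase stands in for the simple * wildcard (it also interprets ? and [...], which ES does not), and the two templates are reduced copies of message_field and string_fields from the listing above. As in ES, entries are checked in order and the first one that matches wins.

```python
from fnmatch import fnmatchcase

# Reduced copies of the two dynamic templates in the default logstash template.
DYNAMIC_TEMPLATES = [
    {"message_field": {"match": "message", "match_mapping_type": "string",
                       "mapping": {"type": "string", "index": "analyzed"}}},
    {"string_fields": {"match": "*", "match_mapping_type": "string",
                       "mapping": {"type": "string", "index": "analyzed"}}},
]


def pick_dynamic_template(templates, field_path, detected_type):
    """Return (name, mapping) of the first matching template, or None."""
    # match/unmatch look at the field name only; path_* look at the full path.
    field_name = field_path.rsplit(".", 1)[-1]
    for entry in templates:
        for name, rule in entry.items():
            wanted_type = rule.get("match_mapping_type", "*")
            if wanted_type not in ("*", detected_type):
                continue
            if "match" in rule and not fnmatchcase(field_name, rule["match"]):
                continue
            if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
                continue
            if "path_match" in rule and not fnmatchcase(field_path, rule["path_match"]):
                continue
            if "path_unmatch" in rule and fnmatchcase(field_path, rule["path_unmatch"]):
                continue
            return name, rule["mapping"]
    return None


# A hypothetical path_match rule, using the address.*.name pattern from the text.
ADDRESS_TEMPLATES = [
    {"address_names": {"path_match": "address.*.name",
                       "mapping": {"type": "string", "index": "not_analyzed"}}},
]

print(pick_dynamic_template(DYNAMIC_TEMPLATES, "message", "string"))
print(pick_dynamic_template(DYNAMIC_TEMPLATES, "host", "string"))
print(pick_dynamic_template(DYNAMIC_TEMPLATES, "offset", "long"))
print(pick_dynamic_template(ADDRESS_TEMPLATES, "address.home.name", "string"))
```

A long field such as offset matches neither template, so in this model it falls back to the ordinary dynamic mapping rules.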

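If you do not specify an index name, Logstash creates indices with its default template: it installs a template named logstash in Elasticsearch (matching logstash-*), and then creates one index per day named logstash-%{+YYYY.MM.dd} to store the logs.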
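
As a rough illustration of the daily naming and of why those indices pick up the logstash template, the sketch below builds the index name for an event and checks it against the logstash-* pattern. The helper names daily_index_name and template_applies are made up for this example, and fnmatchcase is only a stand-in for the wildcard match ES performs. The date part is computed in UTC, since Logstash formats %{+YYYY.MM.dd} from the event's @timestamp in UTC.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def daily_index_name(event_time, prefix="logstash-"):
    """Build the per-day index name from a timezone-aware timestamp, in UTC."""
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")


def template_applies(index_name, pattern="logstash-*"):
    """Check whether an index name falls under a template pattern."""
    return fnmatchcase(index_name, pattern)


# 07:30 on Feb 19 in UTC+8 is still Feb 18 in UTC, so the event
# lands in the index for Feb 18.
event = datetime(2019, 2, 19, 7, 30, tzinfo=timezone(timedelta(hours=8)))
index = daily_index_name(event)
print(index, template_applies(index))
print(template_applies("searchlog-2019.02.18"))
```

An index whose name does not start with logstash- is not matched by the pattern, which is the basis of the first workaround described further down.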

In this case each logstash-%{+YYYY.MM.dd} index has two types: _default_ and logs.

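Creating indices without the default Logstash template

If you do not want indices to be created from Logstash's default template, there are two options:

1. Set a custom index name in the output section of the Logstash configuration file, one that does not start with logstash-. The default template only matches indices whose names start with logstash-.

2. Set manage_template => false to turn off Logstash's automatic template management. Logstash will then not call the Elasticsearch API to create a template. The default is true. Note that when this parameter is false, a custom template is not loaded either.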

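The index type

By default, the documents Logstash submits to Elasticsearch have the type "logs". There are two ways to customize it: set the document_type parameter in the output, or set the type parameter in the input. document_type in the output takes precedence over type in the input. From version 5.0 on, however, the type attribute of the input plugin is no longer used as the document type by default, so the type has to be set with document_type in the output plugin.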

Custom templates

1. Create a templates directory under logstash-2.4.0 (mkdir templates)

2. Write the template file (vim demo_template.json)

3. Configure the following parameters in the Logstash output plugin:

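manage_template => true	Enables Logstash's automatic template management. true is the default; when it is false, Logstash does not call the Elasticsearch API to create the template.
template => "/xxx/logstash-2.4.0/templates/logs.json"	Path to the template file.
template_name => "template_name"	Name of the mapping template. If template_name is not set, the default value logstash is used.
template_overwrite => true	Whether to overwrite a template of the same name that already exists in Elasticsearch. This is separate from order: when several templates match the same index (for example, all match names starting with searchlog-), they are merged, and the values from the template with the higher order override those from templates with a lower order.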
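
The order behaviour is easier to see with a small model. The sketch below is an assumption-laden simplification rather than ES code: effective_settings and deep_merge are hypothetical helpers, the two searchlog templates are invented for the example, and only settings are merged (ES merges mappings in the same way). Matching templates are applied from the lowest order to the highest, so higher orders win on conflicts.

```python
from copy import deepcopy
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_settings(templates, index_name):
    """Merge the settings of every matching template, lowest order first."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    result = {}
    for tpl in sorted(matching, key=lambda t: t.get("order", 0)):
        result = deep_merge(result, tpl.get("settings", {}))
    return result


# Two hypothetical templates whose patterns overlap.
templates = [
    {"template": "searchlog-*", "order": 0,
     "settings": {"number_of_shards": 5, "refresh_interval": "5s"}},
    {"template": "searchlog-2019*", "order": 1,
     "settings": {"refresh_interval": "60s"}},
]

print(effective_settings(templates, "searchlog-2019.02.18"))
print(effective_settings(templates, "searchlog-2018.12.31"))
```

For the 2019 index both templates apply, and refresh_interval comes from the order 1 template while number_of_shards is kept from order 0.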

(1) Static template demo:

{
	"template": "normal-index-*",
	"order": 1,
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 0,
		"refresh_interval": "60s"
	},
	"mappings": {
		"normal": {
			"dynamic":"strict",
			"properties": {
				"traceid": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"systemid": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"offset": {
					"type": "long"
				},
				"prospector": {
					"properties": {
						"type": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						}
					}
				},
				"source": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"message": {
					"search_analyzer": "ik_smart",
					"analyzer": "ik",
					"store": true,
					"type": "string"
				},
				"tags": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"input": {
					"properties": {
						"type": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						}
					}
				},
				"@timestamp": {
					"format": "strict_date_optional_time||epoch_millis",
					"type": "date"
				},
				"loglevel": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"@version": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"beat": {
					"properties": {
						"hostname": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						},
						"name": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						},
						"version": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						}
					}
				},
				"host": {
					"properties": {
						"name": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						}
					}
				},
				"fields": {
					"properties": {
						"type": {
							"index": "not_analyzed",
							"store": true,
							"type": "string"
						}
					}
				},
				"logtime": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				},
				"methodname": {
					"index": "not_analyzed",
					"store": true,
					"type": "string"
				}
			}
		}
	}
}

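Two questions come up first. What happens if a new type that was not defined before is written to an index? And what happens if a new field that was not defined before is written to a type?

If a new, previously undefined type is written to an index, its mapping is generated from the properties defined under _default_. If a new, previously undefined field is written to a type, its mapping is generated by matching against that type's dynamic_templates.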

(2) Dynamic template demo:

{
	"template": "normal-index-*",
	"order": 1,
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 0
	},
	"mappings": {
		"_default_": {
			"_all": {
				"enabled": true,
				"omit_norms": true
			},
			"dynamic_templates": [{
				"message_field": {
					"match": "message",
					"match_mapping_type": "string",
					"mapping": {
						"type": "string",
						"index": "analyzed",
						"search_analyzer": "ik_max_word",
						"analyzer": "ik_max_word",
						"omit_norms": true,
						"fielddata": {
							"format": "disabled"
						}
					}
				}
			},
			{
				"string_fields": {
					"match": "*",
					"match_mapping_type": "string",
					"mapping": {
						"type": "string",
						"index": "not_analyzed",
						"doc_values": true
					}
				}
			}
			],
			"properties": {
				"@timestamp": {
					"type": "date"
				},
				"@version": {
					"type": "string",
					"index": "not_analyzed"
				}
			},
			"dynamic_date_formats": ["yyyy-MM-dd HH:mm:ss.SSS"]
		}
	}
}

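Summary: using templates

1. Static templates:
Suited to scenarios where the index fields are fixed. I first assumed that once the template was configured, adding extra fields would cause an error. In my test it did not, and the new fields were mapped automatically. What makes ES reject unmapped fields is "dynamic": "strict" on the type; with "dynamic": "true" (the default), new fields are mapped automatically.
Pros: the schema is known and the business scenario is explicit, so fields are not mapped arbitrarily. This avoids mapping metadata growing until it exhausts ES memory and takes down the whole cluster.
Cons: configuration is somewhat tedious when there are many fields.

2. Dynamic templates:
Suited to scenarios where the number of fields is not known in advance and many fields share the same configuration. Adding fields does not cause errors.
Pros: any field can be added dynamically without changing the schema.
Cons: if a very large number of fields is added, the ES cluster may go down.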
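
The difference between the two approaches largely comes down to the dynamic setting of a type. The following sketch models its three values in plain Python. It is a simplified illustration, not ES code: index_document and StrictDynamicMappingError are hypothetical names, and a mapping is reduced to a set of known field names.

```python
class StrictDynamicMappingError(Exception):
    """Stand-in for the error ES raises when "dynamic" is "strict"."""


def index_document(mapping, doc):
    """Return the field names present in the mapping after indexing doc."""
    dynamic = str(mapping.get("dynamic", "true")).lower()
    known = set(mapping.get("properties", {}))
    for field in doc:
        if field in known:
            continue
        if dynamic == "strict":
            # Unmapped fields are rejected and the document is not indexed.
            raise StrictDynamicMappingError(f"unmapped field [{field}] is not allowed")
        if dynamic == "true":
            # The new field is added to the mapping automatically.
            known.add(field)
        # dynamic == "false": the field is kept in _source but not mapped.
    return known


properties = {"traceid": {"type": "string"}}
doc = {"traceid": "abc", "extra": 1}

print(index_document({"dynamic": "true", "properties": properties}, doc))
print(index_document({"dynamic": "false", "properties": properties}, doc))
try:
    index_document({"dynamic": "strict", "properties": properties}, doc)
except StrictDynamicMappingError as err:
    print("rejected:", err)
```

With true the mapping grows with every new field name, which is the growth risk listed under the dynamic template's cons. With strict the document is refused instead.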

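Designing an index template is an important step in a search project, and there is a lot to consider, for example:
(1) Is the set of fields fixed?
(2) What is the type of each field?
(3) Is the field analyzed?
(4) Is the field indexed?
(5) Is the field stored?
(6) Is the field used for sorting?
(7) Is the field boosted?
Other factors matter as well, such as maintaining and changing the dictionary, or changes in the search architecture.
If this is not planned thoroughly up front, changing any one of these items later requires rebuilding the index, which is very costly and time-consuming, especially with large data volumes.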

Original link: https://blog.csdn.net/qq_39669058/article/details/87615055