Source:
http://blog.csdn.net/shirdrn/article/details/9718387
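Solr Replication gives Solr servers high availability: if one index replica is lost to a disk failure or an accidental deletion, the other replicas can keep serving requests. However, Solr Replication alone can only manage and maintain a single index. Once the index grows large enough, search performance becomes the bottleneck, and apart from redesigning the index and partitioning it logically, there is no good way to scale out the query servers.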
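SolrCloud was introduced to solve this problem. SolrCloud uses a ZooKeeper ensemble for coordination and splits an index (called a Collection in SolrCloud) into shards, which can be placed on different physical nodes. The shards (Shard) of a Collection do not overlap; that is, together the physical shards make up one complete Collection. To keep shard data available, SolrCloud supports Solr Replication out of the box, so each shard can be replicated and stored redundantly. Below, we install and configure a SolrCloud cluster based on Solr 4.3.1, the latest release at the time of writing, and put distributed index storage and retrieval into practice.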
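The following Python sketch makes the idea of disjoint shards concrete with hash-range routing. In Solr 4.x the default compositeId router hashes each document's uniqueKey with MurmurHash3. It then gives every shard a contiguous slice of the signed 32-bit hash range, recorded in clusterstate.json.

The sketch differs from Solr in two ways:
- It swaps in zlib.crc32 as a stand-in hash.
- It splits the range evenly, without Solr's boundary rounding.

Its shard assignments therefore will not match a real cluster. `partition_range` and `shard_for` are hypothetical helper names.

```python
import zlib

HASH_MIN = -2**31      # signed 32-bit hash space, as used by Solr's hash ranges
HASH_MAX = 2**31 - 1

def partition_range(num_shards):
    """Split the 32-bit hash space into num_shards contiguous, non-overlapping ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    step = 2**32 // num_shards
    ranges, start = [], HASH_MIN
    for i in range(num_shards):
        # the last range absorbs the remainder so the whole space is covered
        end = HASH_MAX if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed one."""
    return value - 2**32 if value >= 2**31 else value

def shard_for(doc_id, ranges):
    """Return the 1-based shard whose range contains the document's hash."""
    # stand-in hash: Solr uses MurmurHash3, not CRC-32
    h = signed32(zlib.crc32(str(doc_id).encode("utf-8")))
    for number, (lo, hi) in enumerate(ranges, start=1):
        if lo <= h <= hi:
            return number
    raise AssertionError("ranges do not cover the hash space")
```

For example, `shard_for(42, partition_range(3))` returns one of 1, 2 or 3, and always the same one for the same id. Every hash falls in exactly one range, so each document lives in exactly one shard. Replication then copies whole shards rather than splitting them further.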
Preparation
- Server information
Three servers:
- 10.95.3.61 master
- 10.95.3.62 slave1
- 10.95.3.65 slave4
- ZooKeeper cluster configuration
Install a ZooKeeper ensemble on each of the three nodes above, using zookeeper-3.4.5.
First, configure zoo.cfg on the master node as follows:
[hadoop@master ~]$ vi applications/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data
# the port at which the clients will connect
clientPort=2188

dataLogDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data/logs

server.1=master:4888:5888
server.2=slave1:4888:5888
server.3=slave4:4888:5888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
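The timing and quorum settings in this zoo.cfg can be checked with a little arithmetic:
- initLimit and syncLimit are measured in ticks. With tickTime=2000, a follower gets 10 * 2000 = 20000 ms to connect to and sync with the leader, and may lag by at most 5 * 2000 = 10000 ms.
- In each server.N=host:port1:port2 line, followers use the first port to talk to the leader; the second port is for leader election.
- A three-server ensemble needs a majority of 2 to operate, so it tolerates one failed node.

The sketch below derives these numbers from the file. It uses hypothetical helper names and simplified key=value parsing rather than full Java properties syntax.

```python
ZOO_CFG = """\
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data
clientPort=2188
server.1=master:4888:5888
server.2=slave1:4888:5888
server.3=slave4:4888:5888
#autopurge.purgeInterval=1
"""

def parse_zoo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

def summarize(conf):
    """Derive timeouts (ms), the server table and quorum size from parsed settings."""
    tick = int(conf["tickTime"])
    servers = {}
    for key, value in conf.items():
        if key.startswith("server."):
            host, peer_port, election_port = value.split(":")[:3]
            servers[int(key.split(".", 1)[1])] = (host, int(peer_port), int(election_port))
    n = len(servers)
    return {
        "init_limit_ms": int(conf["initLimit"]) * tick,
        "sync_limit_ms": int(conf["syncLimit"]) * tick,
        "client_port": int(conf["clientPort"]),
        "servers": servers,
        "quorum": n // 2 + 1,              # strict majority of the ensemble
        "tolerated_failures": (n - 1) // 2,
    }
```

Running `print(summarize(parse_zoo_cfg(ZOO_CFG)))` shows the values discussed above. This is also a quick way to see why ensembles are usually sized with an odd number of servers.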
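Next, create the data directories configured above (dataDir and dataLogDir). Each node also needs a file named myid in its dataDir that contains only that node's number from the server.N lines: 1 on master, 2 on slave1, 3 on slave4. Because the directory is copied from master, check myid on each node after copying. Then copy the installation to the other two nodes: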
[hadoop@master ~]$ scp -r applications/zookeeper/zookeeper-3.4.5 hadoop@slave1:~/applications/zookeeper/
[hadoop@master ~]$ scp -r applications/zookeeper/zookeeper-3.4.5 hadoop@slave4:~/applications/zookeeper/
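Each ZooKeeper server identifies itself through a file named myid in its dataDir, which holds the N of its server.N line. The sketch below writes that file. It assumes each node's hostname exactly matches its server.N entry; socket.gethostname() may return a fully qualified name on some systems, so check before relying on it. `write_myid` is a hypothetical helper.

```python
from pathlib import Path

# server number -> hostname, copied from the server.N lines in zoo.cfg
SERVERS = {1: "master", 2: "slave1", 3: "slave4"}

def write_myid(data_dir, hostname, servers=SERVERS):
    """Write <data_dir>/myid holding this host's server number and return the number."""
    matches = [sid for sid, host in servers.items() if host == hostname]
    if len(matches) != 1:
        raise ValueError(f"{hostname!r} must appear exactly once in the server list")
    path = Path(data_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "myid").write_text(f"{matches[0]}\n")
    return matches[0]

# Usage on each node (assumes the hostname matches the zoo.cfg entry):
# import socket
# write_myid("/home/hadoop/applications/zookeeper/zookeeper-3.4.5/data", socket.gethostname())
```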
Start the ZooKeeper ensemble by starting the ZooKeeper service on each node:
[hadoop@master ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@master zookeeper-3.4.5]$ bin/zkServer.sh start

[hadoop@slave1 ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@slave1 zookeeper-3.4.5]$ bin/zkServer.sh start

[hadoop@slave4 ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@slave4 zookeeper-3.4.5]$ bin/zkServer.sh start
Check the ensemble status to make sure it started correctly:
[hadoop@master zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

[hadoop@slave1 zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower

[hadoop@slave4 zookeeper-3.4.5]$ bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/applications/zookeeper/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
As shown, slave4 is the leader of the ZooKeeper ensemble.
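Besides zkServer.sh status, ZooKeeper 3.4.x answers short "four-letter word" commands such as srvr and stat on the client port. Their reply includes the same Mode: line. The sketch below queries every node and checks that the ensemble has exactly one leader. The helpers are hypothetical; the host names and port 2188 come from this post.

```python
import socket

HOSTS = ["master", "slave1", "slave4"]
CLIENT_PORT = 2188  # clientPort from zoo.cfg

def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mode(reply):
    """Extract the value of the 'Mode:' line (e.g. leader or follower), or None if absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None

def ensemble_ok(modes):
    """Healthy ensemble: exactly one leader, every other node a follower."""
    values = list(modes.values())
    return values.count("leader") == 1 and values.count("follower") == len(values) - 1

# Usage against the live ensemble:
# modes = {h: parse_mode(four_letter_word(h, CLIENT_PORT)) for h in HOSTS}
# print(modes, ensemble_ok(modes))
```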
- SolrCloud directories
We use /home/hadoop/applications/solr/cloud to hold Solr's library and configuration files. It contains two subdirectories, lib and multicore.
Index data is stored in a separate directory, /home/hadoop/applications/storage/cloud/data.
SolrCloud configuration
First configure Solr on one node; we choose master.
1. Basic Solr configuration
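Unpack the downloaded Solr archive and extract solr-4.3.1\example\webapps\solr.war into solr-4.3.1\example\webapps\solr. Copy the jar files from solr-4.3.1\example\lib\ext (the SLF4J and log4j logging libraries, which the Solr 4.3 war does not bundle) into solr-4.3.1\example\webapps\solr\WEB-INF\lib, alongside the jars already there. Rename the extracted solr directory to solr-cloud and upload it to the webapps directory of Tomcat on the server. The directory name becomes the web context, matching the hostContext solr-cloud used in solr.xml.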
Also copy all jar files under solr-4.3.1\example\webapps\solr\WEB-INF\lib and solr-4.3.1\example\lib\ext into /home/hadoop/applications/solr/cloud/lib/, which later serves as the classpath for ZkCLI:
[hadoop@master ~]$ ls /home/hadoop/applications/solr/cloud/lib/
commons-cli-1.2.jar lucene-analyzers-common-4.3.1.jar lucene-suggest-4.3.1.jar
commons-codec-1.7.jar lucene-analyzers-kuromoji-4.3.1.jar noggit-0.5.jar
commons-fileupload-1.2.1.jar lucene-analyzers-phonetic-4.3.1.jar org.restlet-2.1.1.jar
commons-io-2.1.jar lucene-codecs-4.3.1.jar org.restlet.ext.servlet-2.1.1.jar
commons-lang-2.6.jar lucene-core-4.3.1.jar slf4j-api-1.6.6.jar
guava-13.0.1.jar lucene-grouping-4.3.1.jar slf4j-log4j12-1.6.6.jar
httpclient-4.2.3.jar lucene-highlighter-4.3.1.jar solr-core-4.3.1.jar
httpcore-4.2.2.jar lucene-memory-4.3.1.jar solr-solrj-4.3.1.jar
httpmime-4.2.3.jar lucene-misc-4.3.1.jar spatial4j-0.3.jar
jcl-over-slf4j-1.6.6.jar lucene-queries-4.3.1.jar wstx-asl-3.2.7.jar
jul-to-slf4j-1.6.6.jar lucene-queryparser-4.3.1.jar zookeeper-3.4.5.jar
log4j-1.2.16.jar lucene-spatial-4.3.1.jar
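The jar-copying step can be scripted. This minimal sketch assumes the archive was unpacked into ./solr-4.3.1 on a Linux host, so the paths use / rather than the \ shown above. `collect_jars` is a hypothetical helper. If two source directories contain a jar with the same name, the one copied later wins.

```python
import shutil
from pathlib import Path

def collect_jars(source_dirs, target_dir):
    """Copy every *.jar from source_dirs into target_dir; return the sorted jar names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = set()
    for src in source_dirs:
        for jar in sorted(Path(src).glob("*.jar")):
            shutil.copy2(jar, target / jar.name)  # keeps timestamps; overwrites same-named jars
            copied.add(jar.name)
    return sorted(copied)

# Usage (paths assumed):
# collect_jars(["solr-4.3.1/example/webapps/solr/WEB-INF/lib", "solr-4.3.1/example/lib/ext"],
#              "/home/hadoop/applications/solr/cloud/lib")
```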
The layout of /home/hadoop/applications/solr/cloud/multicore is shown in the figure.
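The paths used later in this post imply at least the following layout under /home/hadoop/applications/solr/cloud:
- the solr.xml location in the startup log;
- the -confdir passed to upconfig;
- the three files listed under /configs/myconf in ZooKeeper.

The sketch creates empty placeholders for that layout. The layout is an inference from those paths, and `make_skeleton` is a hypothetical helper.

```python
from pathlib import Path

# Inferred from later steps: solr.xml sits in the Solr home (startup log), and
# upconfig reads multicore/collection1/conf, which holds the files seen in ZooKeeper.
LAYOUT = [
    "lib",
    "multicore/solr.xml",
    "multicore/collection1/conf/schema.xml",
    "multicore/collection1/conf/solrconfig.xml",
    "multicore/collection1/conf/solrcore.properties",
]

def make_skeleton(root):
    """Create the directories and empty files in LAYOUT; return relative file paths."""
    root = Path(root)
    for rel in LAYOUT:
        path = root / rel
        if path.suffix:  # entries with an extension are files
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        else:
            path.mkdir(parents=True, exist_ok=True)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```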
The configuration files in the conf directory above are described below:
- schema.xml
<?xml version="1.0" ?>

<schema name="example core two" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" omitNorms="true"/>
    <fieldType name="long" class="solr.TrieLongField"/>
    <fieldtype name="int" class="solr.IntField"/>
    <fieldtype name="float" class="solr.FloatField"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
  </types>
  <fields>
    <field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>
    <field name="area" type="string" indexed="true" stored="false" multiValued="false"/>
    <field name="building_type" type="int" indexed="true" stored="false" multiValued="false"/>
    <field name="category" type="string" indexed="true" stored="false" multiValued="false"/>
    <field name="temperature" type="int" indexed="true" stored="false" multiValued="false"/>
    <field name="code" type="int" indexed="true" stored="false" multiValued="false"/>
    <field name="latitude" type="float" indexed="true" stored="false" multiValued="false"/>
    <field name="longitude" type="float" indexed="true" stored="false" multiValued="false"/>
    <field name="when" type="date" indexed="true" stored="false" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>area</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
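With this schema, an indexing request must supply id, which is declared required="true". Solr maintains _version_ itself. The other fields are stored="false", so they can be searched but are not returned in results. The sketch below builds a Solr XML update message (`<add><doc><field .../></doc></add>`) for hypothetical sample values. Date fields expect ISO-8601 UTC timestamps ending in Z.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def to_solr_value(value):
    """Format a Python value as text for Solr's XML update format."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("use timezone-aware datetimes for date fields")
        # TrieDateField takes ISO-8601 in UTC with a trailing Z
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def build_add_xml(docs):
    """Build an <add> message; each doc is a dict of field name -> value."""
    add = ET.Element("add")
    for doc in docs:
        if "id" not in doc:
            raise ValueError("schema.xml declares id as required")
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(d, "field", name=name)
            field.text = to_solr_value(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical sample document matching the fields above:
# sample = {"id": 1, "area": "beijing", "building_type": 2, "category": "office",
#           "temperature": 25, "code": 100, "latitude": 39.9, "longitude": 116.4,
#           "when": datetime(2013, 8, 1, 15, 11, 5, tzinfo=timezone.utc)}
# print(build_add_xml([sample]))
```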
- solrconfig.xml
<?xml version="1.0" encoding="UTF-8" ?>

<config>
  <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <dataDir>${solr.shard.data.dir:}</dataDir>
  <schemaFactory class="ClassicIndexSchemaFactory"/>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.shard.data.dir:}</str>
    </updateLog>
  </updateHandler>

  <!-- realtime get handler, guaranteed to return the latest stored fields of any document, without the need to commit or open a new searcher. The current implementation relies on the updateLog feature being enabled. -->
  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
    </lst>
  </requestHandler>
  <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy"/>
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" formdataUploadLimitInKB="2048"/>
  </requestDispatcher>

  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true"/>
  <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler"/>
  <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy">
    <lst name="defaults">
      <str name="separator">,</str>
      <str name="header">true</str>
      <str name="encapsulator">"</str>
    </lst>
    <updateLog>
      <str name="dir">${solr.shard.data.dir:}</str>
    </updateLog>
  </requestHandler>
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>
  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="sample">
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.FastLRUCache" size="10240" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="10240" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="10240" initialSize="512" autowarmCount="0"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>20</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>
  <admin>
    <defaultQuery>solr</defaultQuery>
  </admin>
</config>
- solrcore.properties
solr.shard.data.dir=/home/hadoop/applications/storage/cloud/data
The property solr.shard.data.dir is referenced in solrconfig.xml (as ${solr.shard.data.dir:}) and specifies where index data is stored.
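Solr resolves ${name:default} placeholders in its configuration files against system properties and core properties, such as those in solrcore.properties. An empty default, as in ${solr.shard.data.dir:}, yields an empty string when the property is unset. The sketch below mimics that resolution for a plain dict of properties. It is a simplified model with hypothetical helpers and simplified properties parsing, not Solr's implementation.

```python
import re

# ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")

def load_properties(text):
    """Minimal key=value reader; skips blank lines and #/! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def substitute(text, props):
    """Replace placeholders; a missing name with no default raises KeyError."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(name)
    return _PLACEHOLDER.sub(repl, text)
```

For example, `substitute("<dataDir>${solr.shard.data.dir:}</dataDir>", load_properties("solr.shard.data.dir=/home/hadoop/applications/storage/cloud/data"))` produces `<dataDir>/home/hadoop/applications/storage/cloud/data</dataDir>`.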
- solr.xml
This file holds the ZooKeeper-related settings as well as the Solr core configuration:
<?xml version="1.0" encoding="UTF-8" ?>

<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
  </cores>
</solr>
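Note: no core elements are configured here. Once the whole setup is installed, we create the Collection and its Shards through Solr's REST interface, and that step updates these configuration files.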
2. Managing the configuration files in ZooKeeper
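SolrCloud relies on the ZooKeeper ensemble to propagate configuration changes to every node promptly, so the configuration files must be uploaded to ZooKeeper. upconfig uploads the conf directory as a configuration set named myconf, and linkconfig links the collection collection1 to that set: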
[hadoop@master ~]$ java -classpath .:/home/hadoop/applications/solr/cloud/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost master:2188,slave1:2188,slave4:2188 -confdir /home/hadoop/applications/solr/cloud/multicore/collection1/conf -confname myconf

[hadoop@master ~]$ java -classpath .:/home/hadoop/applications/solr/cloud/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection collection1 -confname myconf -zkhost master:2188,slave1:2188,slave4:2188
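When scripting these two uploads, pass the arguments as a list rather than as one shell string. That keeps the lib/* wildcard away from the shell, so the JVM expands the classpath wildcard itself. The minimal sketch below uses hypothetical helper names and reuses the paths and ZooKeeper addresses from the commands above.

```python
SOLR_LIB_CP = ".:/home/hadoop/applications/solr/cloud/lib/*"
ZK_HOSTS = ["master:2188", "slave1:2188", "slave4:2188"]

def zkcli_argv(*args, classpath=SOLR_LIB_CP):
    """argv for org.apache.solr.cloud.ZkCLI; no shell is involved, so '*' reaches the JVM."""
    return ["java", "-classpath", classpath, "org.apache.solr.cloud.ZkCLI", *args]

def upconfig_argv(confdir, confname, zk_hosts=ZK_HOSTS):
    """Upload a local conf directory to ZooKeeper under confname."""
    return zkcli_argv("-cmd", "upconfig", "-zkhost", ",".join(zk_hosts),
                      "-confdir", confdir, "-confname", confname)

def linkconfig_argv(collection, confname, zk_hosts=ZK_HOSTS):
    """Link a collection to an uploaded configuration set."""
    return zkcli_argv("-cmd", "linkconfig", "-collection", collection,
                      "-confname", confname, "-zkhost", ",".join(zk_hosts))

# Usage on a node with the Solr jars installed:
# import subprocess
# subprocess.run(upconfig_argv("/home/hadoop/applications/solr/cloud/multicore/collection1/conf", "myconf"), check=True)
# subprocess.run(linkconfig_argv("collection1", "myconf"), check=True)
```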
After the upload, check what is stored in ZooKeeper:
[hadoop@master ~]$ cd applications/zookeeper/zookeeper-3.4.5/
[hadoop@master zookeeper-3.4.5]$ bin/zkCli.sh -server master:2188
…
[zk: master:2188(CONNECTED) 0] ls /
[configs, collections, zookeeper]
[zk: master:2188(CONNECTED) 2] ls /configs
[myconf]
[zk: master:2188(CONNECTED) 3] ls /configs/myconf
[solrcore.properties, solrconfig.xml, schema.xml]
3. Tomcat configuration and startup
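Add the following setting to Tomcat's startup script bin/catalina.sh. Tomcat's HTTP connector must also listen on 8888, the hostPort set in solr.xml; the startup log below shows http-bio-8888.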
JAVA_OPTS="-server -Xmx4096m -Xms1024m -verbose:gc -Xloggc:solr_gc.log -Dsolr.solr.home=/home/hadoop/applications/solr/cloud/multicore -DzkHost=master:2188,slave1:2188,slave4:2188"
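Two of these options, the -D system properties, are what turn this Tomcat into a SolrCloud node:
- solr.solr.home points at the directory containing solr.xml.
- zkHost names the ZooKeeper ensemble to join.

Both values reappear in the startup log. The small sketch below, a hypothetical helper, pulls the -D properties out of the option string. It is useful for checking that every node points at the same ensemble.

```python
import shlex

JAVA_OPTS = ("-server -Xmx4096m -Xms1024m -verbose:gc -Xloggc:solr_gc.log "
             "-Dsolr.solr.home=/home/hadoop/applications/solr/cloud/multicore "
             "-DzkHost=master:2188,slave1:2188,slave4:2188")

def system_properties(java_opts):
    """Return the -Dkey=value system properties from a JVM option string."""
    props = {}
    for token in shlex.split(java_opts):
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props
```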
Start the Tomcat server:
[hadoop@master ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@master apache-tomcat-7.0.42]$ bin/catalina.sh start
The log looks like this:
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: :HADOOP_HOME/lib/native:/dw/snappy/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 01, 2013 3:11:03 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8888"]
Aug 01, 2013 3:11:03 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8009"]
Aug 01, 2013 3:11:03 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1410 ms
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Aug 01, 2013 3:11:03 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.42
Aug 01, 2013 3:11:03 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/ROOT
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/host-manager
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/manager
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/examples
Aug 01, 2013 3:11:04 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/solr-cloud
2013-08-01 15:11:05.369 [localhost-startStop-1] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init()
2013-08-01 15:11:05.392 [localhost-startStop-1] INFO org.apache.solr.core.SolrResourceLoader - No /solr/home in JNDI
2013-08-01 15:11:05.393 [localhost-startStop-1] INFO org.apache.solr.core.SolrResourceLoader - using system property solr.solr.home: /home/hadoop/applications/solr/cloud/multicore
2013-08-01 15:11:05.402 [localhost-startStop-1] INFO org.apache.solr.core.CoreContainer - looking for solr config file: /home/hadoop/applications/solr/cloud/multicore/solr.xml
2013-08-01 15:11:05.403 [localhost-startStop-1] INFO org.apache.solr.core.CoreContainer - New CoreContainer 1665441141
2013-08-01 15:11:05.406 [localhost-startStop-1] INFO org.apache.solr.core.CoreContainer - Loading CoreContainer using Solr Home: '/home/hadoop/applications/solr/cloud/multicore/'
2013-08-01 15:11:05.406 [localhost-startStop-1] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/home/hadoop/applications/solr/cloud/multicore/'
2013-08-01 15:11:05.616 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='adminHandler']
2013-08-01 15:11:05.618 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/int[@name='coreLoadThreads']
2013-08-01 15:11:05.620 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='coreRootDirectory']
2013-08-01 15:11:05.621 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='distribUpdateConnTimeout']
2013-08-01 15:11:05.622 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='distribUpdateSoTimeout']
2013-08-01 15:11:05.624 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/str[@name='host']
2013-08-01 15:11:05.626 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/str[@name='hostContext']
2013-08-01 15:11:05.628 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='hostPort']
2013-08-01 15:11:05.630 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='leaderVoteWait']
2013-08-01 15:11:05.632 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='managementPath']
2013-08-01 15:11:05.633 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='sharedLib']
2013-08-01 15:11:05.635 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='shareSchema']
2013-08-01 15:11:05.636 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/int[@name='transientCacheSize']
2013-08-01 15:11:05.638 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='zkClientTimeout']
2013-08-01 15:11:05.640 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='zkHost']
2013-08-01 15:11:05.647 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/str[@name='class']
2013-08-01 15:11:05.648 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/str[@name='enabled']
2013-08-01 15:11:05.649 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/int[@name='size']
2013-08-01 15:11:05.654 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/int[@name='threshold']
2013-08-01 15:11:05.657 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@coreLoadThreads
2013-08-01 15:11:05.658 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@sharedLib
2013-08-01 15:11:05.659 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@zkHost
2013-08-01 15:11:05.661 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/@class
2013-08-01 15:11:05.662 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/@enabled
2013-08-01 15:11:05.663 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/@size
2013-08-01 15:11:05.665 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/@threshold
2013-08-01 15:11:05.666 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@adminHandler
2013-08-01 15:11:05.668 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@distribUpdateConnTimeout
2013-08-01 15:11:05.669 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@distribUpdateSoTimeout
2013-08-01 15:11:05.672 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@host=${host:}
2013-08-01 15:11:05.673 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@hostContext=${hostContext:solr-cloud}
2013-08-01 15:11:05.674 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@hostPort=8888
2013-08-01 15:11:05.676 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@leaderVoteWait
2013-08-01 15:11:05.677 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@managementPath
2013-08-01 15:11:05.679 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@shareSchema
2013-08-01 15:11:05.680 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@transientCacheSize
2013-08-01 15:11:05.681 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@zkClientTimeout=${zkClientTimeout:15000}
2013-08-01 15:11:05.686 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/@class
2013-08-01 15:11:05.692 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/@name
2013-08-01 15:11:05.694 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/int[@connTimeout]
2013-08-01 15:11:05.695 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/int[@socketTimeout]
2013-08-01 15:11:05.699 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@defaultCoreName=collection1
2013-08-01 15:11:05.700 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/@persistent=true
2013-08-01 15:11:05.701 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@adminPath=/admin/cores
2013-08-01 15:11:05.713 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='adminHandler']
2013-08-01 15:11:05.714 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/int[@name='coreLoadThreads']
2013-08-01 15:11:05.715 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='coreRootDirectory']
2013-08-01 15:11:05.718 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='distribUpdateConnTimeout']
2013-08-01 15:11:05.719 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='distribUpdateSoTimeout']
2013-08-01 15:11:05.720 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/str[@name='host']
2013-08-01 15:11:05.722 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/str[@name='hostContext']
2013-08-01 15:11:05.723 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='hostPort']
2013-08-01 15:11:05.724 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='leaderVoteWait']
2013-08-01 15:11:05.727 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='managementPath']
2013-08-01 15:11:05.728 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='sharedLib']
2013-08-01 15:11:05.729 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/str[@name='shareSchema']
2013-08-01 15:11:05.730 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/int[@name='transientCacheSize']
2013-08-01 15:11:05.735 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='zkClientTimeout']
2013-08-01 15:11:05.737 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/solrcloud/int[@name='zkHost']
2013-08-01 15:11:05.740 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/str[@name='class']
2013-08-01 15:11:05.747 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/str[@name='enabled']
2013-08-01 15:11:05.749 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/int[@name='size']
2013-08-01 15:11:05.752 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/int[@name='threshold']
2013-08-01 15:11:05.755 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@coreLoadThreads
2013-08-01 15:11:05.756 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@sharedLib
2013-08-01 15:11:05.759 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/@zkHost
2013-08-01 15:11:05.760 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/@class
2013-08-01 15:11:05.761 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/@enabled
2013-08-01 15:11:05.763 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/@size
2013-08-01 15:11:05.764 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/logging/watcher/@threshold
2013-08-01 15:11:05.765 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@adminHandler
2013-08-01 15:11:05.768 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@distribUpdateConnTimeout
2013-08-01 15:11:05.769 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@distribUpdateSoTimeout
2013-08-01 15:11:05.770 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@host=${host:}
2013-08-01 15:11:05.771 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@hostContext=${hostContext:solr-cloud}
2013-08-01 15:11:05.772 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@hostPort=8888
2013-08-01 15:11:05.774 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@leaderVoteWait
2013-08-01 15:11:05.776 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@managementPath
2013-08-01 15:11:05.777 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@shareSchema
2013-08-01 15:11:05.778 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/@transientCacheSize
2013-08-01 15:11:05.779 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@zkClientTimeout=${zkClientTimeout:15000}
2013-08-01 15:11:05.780 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/@class
2013-08-01 15:11:05.781 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/@name
2013-08-01 15:11:05.783 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/int[@connTimeout]
2013-08-01 15:11:05.785 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/shardHandlerFactory/int[@socketTimeout]
2013-08-01 15:11:05.786 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@defaultCoreName=collection1
2013-08-01 15:11:05.787 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/@persistent=true
2013-08-01 15:11:05.788 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null solr/cores/@adminPath=/admin/cores
2013-08-01 15:11:05.791 [localhost-startStop-1] DEBUG org.apache.solr.core.Config - null missing optional solr/cores/shardHandlerFactory
2013-08-01 15:11:05.799 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting socketTimeout to: 0
2013-08-01 15:11:05.802 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting urlScheme to: http://
2013-08-01 15:11:05.802 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting connTimeout to: 0
2013-08-01 15:11:05.803 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxConnectionsPerHost to: 20
2013-08-01 15:11:05.803 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting corePoolSize to: 0
2013-08-01 15:11:05.804 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maximumPoolSize to: 2147483647
2013-08-01 15:11:05.805 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxThreadIdleTime to: 5
2013-08-01 15:11:05.805 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting sizeOfQueue to: -1
2013-08-01 15:11:05.806 [localhost-startStop-1] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting fairnessPolicy to: false
2013-08-01 15:11:05.824 [localhost-startStop-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&socketTimeout=0&connTimeout=0&retry=false
2013-08-01 15:11:06.248 [localhost-startStop-1] INFO org.apache.solr.core.CoreContainer - Registering Log Listener
2013-08-01 15:11:06.251 [localhost-startStop-1] INFO org.apache.solr.core.CoreContainer - Zookeeper client=master:2188,slave1:2188,slave4:2188
2013-08-01 15:11:06.273 [localhost-startStop-1] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnections=500&maxConnectionsPerHost=16&socketTimeout=0&connTimeout=0
2013-08-01 15:11:06.402 [localhost-startStop-1] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
2013-08-01 15:11:06.461 [localhost-startStop-1-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@4b1707b4 name:ZooKeeperConnection Watcher:master:2188,slave1:2188,slave4:2188 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
2013-08-01 15:11:06.462 [localhost-startStop-1] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper
2013-08-01 15:11:06.485 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /overseer/queue
2013-08-01 15:11:06.523 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /overseer/collection-queue-work
2013-08-01 15:11:06.546 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /live_nodes
2013-08-01 15:11:06.555 [localhost-startStop-1] INFO org.apache.solr.cloud.ZkController - Register node as live in ZooKeeper:/live_nodes/10.95.3.61:8888_solr-cloud
2013-08-01 15:11:06.562 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /live_nodes/10.95.3.61:8888_solr-cloud
2013-08-01 15:11:06.578 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /overseer_elect/election
2013-08-01 15:11:06.626 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /overseer_elect/leader
2013-08-01 15:11:06.644 [localhost-startStop-1] INFO org.apache.solr.cloud.Overseer - Overseer (id=234248255751323650-10.95.3.61:8888_solr-cloud-n_0000000000) starting
2013-08-01 15:11:06.667 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /overseer/queue-work
2013-08-01 15:11:06.697 [Overseer-234248255751323650-10.95.3.61:8888_solr-cloud-n_0000000000] INFO org.apache.solr.cloud.OverseerCollectionProcessor - Process current queue of collection creations
2013-08-01 15:11:06.698 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /clusterstate.json
2013-08-01 15:11:06.711 [localhost-startStop-1] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /aliases.json
2013-08-01 15:11:06.720 [localhost-startStop-1] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cluster state from ZooKeeper...
2013-08-01 15:11:06.780 [Thread-2] INFO org.apache.solr.cloud.Overseer - Starting to work on the main queue
2013-08-01 15:11:06.829 [localhost-startStop-1] INFO org.apache.solr.servlet.SolrDispatchFilter - user.dir=/home/hadoop/servers/apache-tomcat-7.0.42
2013-08-01 15:11:06.829 [localhost-startStop-1] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
Aug 01, 2013 3:11:06 PM org.apache.catalina.startup.HostConfig deployDirectory
INFO: Deploying web application directory /home/hadoop/servers/apache-tomcat-7.0.42/webapps/docs
Aug 01, 2013 3:11:06 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8888"]
Aug 01, 2013 3:11:06 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-bio-8009"]
Aug 01, 2013 3:11:06 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3163 ms
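I enabled DEBUG mode to make debugging easier.
At this point the SolrCloud cluster has only one live node, and a collection1 instance has been generated by default. This instance is effectively virtual: the web interface at http://master:8888/solr-cloud/ shows no SolrCloud information at all, as in the figure: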
4. Synchronize data and configuration, then start the other nodes
To install Tomcat and the Solr server on the other two nodes, copy the corresponding directories and create the data directory:
[hadoop@master ~]$ scp -r servers/ hadoop@slave1:~/
[hadoop@master ~]$ scp -r servers/ hadoop@slave4:~/

[hadoop@master ~]$ scp -r applications/solr/cloud hadoop@slave1:~/applications/solr/
[hadoop@master ~]$ scp -r applications/solr/cloud hadoop@slave4:~/applications/solr/

[hadoop@slave1 ~]$ mkdir -p applications/storage/cloud/data/
[hadoop@slave4 ~]$ mkdir -p applications/storage/cloud/data/
Start the Solr servers on the other nodes:
[hadoop@slave1 ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@slave1 apache-tomcat-7.0.42]$ bin/catalina.sh start

[hadoop@slave4 ~]$ cd servers/apache-tomcat-7.0.42
[hadoop@slave4 apache-tomcat-7.0.42]$ bin/catalina.sh start
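Check the state stored in the ZooKeeper cluster with the ZooKeeper command-line client. All three nodes are now registered under /live_nodes:
[zk: master:2188(CONNECTED) 3] ls /live_nodes
[10.95.3.65:8888_solr-cloud, 10.95.3.61:8888_solr-cloud, 10.95.3.62:8888_solr-cloud]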
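Each entry under /live_nodes identifies one running Solr instance. Judging from the listing above, an entry has the form host:port_context (for example 10.95.3.61:8888_solr-cloud). The Java sketch below assumes exactly that form; other Solr versions may encode the context differently. It splits an entry into its parts and rebuilds the node's base URL. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: decodes live-node names such as "10.95.3.61:8888_solr-cloud". */
final class LiveNodeName {
    final String host;
    final int port;
    final String context;

    private LiveNodeName(String host, int port, String context) {
        this.host = host;
        this.port = port;
        this.context = context;
    }

    // Assumes the form host:port_context seen in the /live_nodes listing
    static LiveNodeName parse(String nodeName) {
        int colon = nodeName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No ':' in node name: " + nodeName);
        }
        int underscore = nodeName.indexOf('_', colon + 1); // first '_' after the port
        if (underscore < 0) {
            throw new IllegalArgumentException("No '_' in node name: " + nodeName);
        }
        String host = nodeName.substring(0, colon);
        int port = Integer.parseInt(nodeName.substring(colon + 1, underscore));
        String context = nodeName.substring(underscore + 1);
        return new LiveNodeName(host, port, context);
    }

    // Base URL of the Solr web application on that node
    String baseUrl() {
        return "http://" + host + ":" + port + "/" + context;
    }

    public static void main(String[] args) {
        List<String> liveNodes = Arrays.asList(
                "10.95.3.65:8888_solr-cloud", "10.95.3.61:8888_solr-cloud", "10.95.3.62:8888_solr-cloud");
        for (String name : liveNodes) {
            System.out.println(parse(name).baseUrl());
        }
    }
}
```

Running main prints one base URL per line, for example http://10.95.3.65:8888/solr-cloud.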
5. Create the Collection, Shards, and Replicas
- Create the Collection and its initial Shards
Create the Collection directly through the REST interface (the Collections API):
[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4103</int>
  </lst>
  <lst name="success">
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">3367</int>
      </lst>
      <str name="core">mycollection_shard2_replica1</str>
      <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
    </lst>
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">3280</int>
      </lst>
      <str name="core">mycollection_shard1_replica1</str>
      <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
    </lst>
    <lst>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">3690</int>
      </lst>
      <str name="core">mycollection_shard3_replica1</str>
      <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
    </lst>
  </lst>
</response>
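The parameters in the URL above mean:
name: the name of the Collection to create
numShards: the number of shards
replicationFactor: the number of copies of each shard, leader included (replicationFactor=1 means each shard has a single copy and no extra replicas)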
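The Collections API is plain HTTP, so the same CREATE request can also be assembled in code. The Java sketch below only builds the URL from the three parameters just described; it sends nothing. The class name is hypothetical. The base URL http://master:8888/solr-cloud is the one used throughout this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a Collections API CREATE request URL. */
final class CollectionsCreateUrl {

    static String build(String baseUrl, String name, int numShards, int replicationFactor) {
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + encode(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor; // copies per shard, leader included
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Same request as the curl command above
        System.out.println(build("http://master:8888/solr-cloud", "mycollection", 3, 1));
    }
}
```

The printed URL is identical to the one passed to curl above, so it can be sent with curl or any HTTP client.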
If the CREATE request above completes without errors, a Collection named mycollection now exists, with one shard on each node. The state can also be checked in ZooKeeper:
[zk: master:2188(CONNECTED) 5] ls /collections
[mycollection, collection1]
[zk: master:2188(CONNECTED) 6] ls /collections/mycollection
[leader_elect, leaders]
From the figure above, we can see which Solr shard sits on which node:
shard3 10.95.3.61 master
shard1 10.95.3.62 slave1
shard2 10.95.3.65 slave4
In fact, the Solr configuration file on the master node has also changed: an entry for the new core mycollection_shard3_replica1 has been written to it, as shown below:
[hadoop@master ~]$ cat applications/solr/cloud/multicore/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
    <core loadOnStartup="true" shard="shard3" instanceDir="mycollection_shard3_replica1/" transient="false" name="mycollection_shard3_replica1" collection="mycollection"/>
  </cores>
</solr>
- Create replicas
Next, we replicate the initial shards that have just been created.
shard1 is already on slave1. We replicate it to master and slave4 with the following commands:
[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_2&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1485</int>
  </lst>
  <str name="core">mycollection_shard1_replica_2</str>
  <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>

[hadoop@master ~]$ curl 'http://master:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_3&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2543</int>
  </lst>
  <str name="core">mycollection_shard1_replica_3</str>
  <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>

[hadoop@slave4 ~]$ curl 'http://slave4:8888/solr-cloud/admin/cores?action=CREATE&collection=mycollection&name=mycollection_shard1_replica_4&shard=shard1'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2405</int>
  </lst>
  <str name="core">mycollection_shard1_replica_4</str>
  <str name="saved">/home/hadoop/applications/solr/cloud/multicore/solr.xml</str>
</response>
The result is that shard1, whose first copy is on slave1, now has two replicas on master and one on slave4:
- on master: mycollection_shard1_replica_2 and mycollection_shard1_replica_3
- on slave4: mycollection_shard1_replica_4
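A Core Admin CREATE request differs from the Collections API call. It creates one core on the node that receives the request, and the collection and shard parameters attach that core to an existing shard as a new replica. That is why the command for slave4 was sent to slave4 itself. The minimal Java sketch below builds such a URL; the class name is hypothetical and nothing is sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds a Core Admin CREATE URL for adding a replica. */
final class AddReplicaUrl {

    // nodeBaseUrl selects the node on which the new core (replica) will live
    static String build(String nodeBaseUrl, String collection, String shard, String coreName) {
        return nodeBaseUrl + "/admin/cores?action=CREATE"
                + "&collection=" + encode(collection)
                + "&name=" + encode(coreName)
                + "&shard=" + encode(shard);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // The request that created mycollection_shard1_replica_4 on slave4
        System.out.println(build("http://slave4:8888/solr-cloud", "mycollection", "shard1",
                "mycollection_shard1_replica_4"));
    }
}
```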
The directory changes on master and slave4 show the same thing:
[hadoop@master ~]$ ll applications/solr/cloud/multicore/
total 24
drwxrwxr-x. 4 hadoop hadoop 4096 Aug 1 09:58 collection1
drwxrwxr-x. 3 hadoop hadoop 4096 Aug 1 15:41 mycollection_shard1_replica_2
drwxrwxr-x. 3 hadoop hadoop 4096 Aug 1 15:42 mycollection_shard1_replica_3
drwxrwxr-x. 3 hadoop hadoop 4096 Aug 1 15:23 mycollection_shard3_replica1
-rw-rw-r--. 1 hadoop hadoop 784 Aug 1 15:42 solr.xml
-rw-rw-r--. 1 hadoop hadoop 1004 Aug 1 10:02 zoo.cfg

[hadoop@slave4 ~]$ ll applications/solr/cloud/multicore/
total 20
drwxrwxr-x. 4 hadoop hadoop 4096 Aug 1 14:53 collection1
drwxrwxr-x. 3 hadoop hadoop 4096 Aug 1 15:44 mycollection_shard1_replica_4
drwxrwxr-x. 3 hadoop hadoop 4096 Aug 1 15:23 mycollection_shard2_replica1
-rw-rw-r--. 1 hadoop hadoop 610 Aug 1 15:44 solr.xml
-rw-rw-r--. 1 hadoop hadoop 1004 Aug 1 15:08 zoo.cfg
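Here, mycollection_shard3_replica1 and mycollection_shard2_replica1 are the shards generated automatically when the Collection was created, that is, the first copy of each shard.
The web interface shows the state of shard1 more intuitively, as in the figure:
Looking at the master node again, the Solr configuration file has changed once more: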
[hadoop@master ~]$ cat applications/solr/cloud/multicore/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="collection1" host="${host:}" adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="8888" hostContext="${hostContext:solr-cloud}">
    <core loadOnStartup="true" shard="shard3" instanceDir="mycollection_shard3_replica1/" transient="false" name="mycollection_shard3_replica1" collection="mycollection"/>
    <core loadOnStartup="true" shard="shard1" instanceDir="mycollection_shard1_replica_2/" transient="false" name="mycollection_shard1_replica_2" collection="mycollection"/>
    <core loadOnStartup="true" shard="shard1" instanceDir="mycollection_shard1_replica_3/" transient="false" name="mycollection_shard1_replica_3" collection="mycollection"/>
  </cores>
</solr>
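At this point, we have finished configuring a SolrCloud cluster on 3 physical nodes.
Indexing data
Based on the schema.xml defined earlier, we built our own dataset with the following code: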
package org.shirdrn.solr.data;

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

public class BuildingSampleGenerator {

    // Note: this formats the local time followed by a literal 'Z'; call
    // df.setTimeZone(TimeZone.getTimeZone("UTC")) if real UTC timestamps are needed.
    private final DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    private Random random = new Random();

    static String[] areas = {
        "北京", "上海", "深圳", "广州", "天津", "重庆", "成都",
        "银川", "沈阳", "大连", "吉林", "郑州", "徐州", "兰州",
        "东京", "纽约", "贵州", "长春", "大连", "武汉", "南京",
        "海口", "太原", "济南", "日照", "菏泽", "包头", "松原"
    };

    private long pre = 0L;

    // Generates an id from System.nanoTime(). If the clock has not advanced
    // since the previous call, wait until it does, so ids from this generator never repeat.
    public synchronized long genId() {
        long current = System.nanoTime();
        while (current <= pre) {
            try {
                Thread.sleep(0, 1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            current = System.nanoTime();
        }
        pre = current; // remember the last id handed out
        return current;
    }

    public String genArea() {
        return areas[random.nextInt(areas.length)];
    }

    private int maxLatitude = 90;
    private int maxLongitude = 180;

    // Random coordinate: latitude in [0, 90), longitude in [0, 180)
    public Coordinate genCoordinate() {
        double lat = random.nextInt(maxLatitude) + random.nextDouble();
        double lon = random.nextInt(maxLongitude) + random.nextDouble();
        return new Coordinate(lat, lon);
    }

    private Random random1 = new Random(System.currentTimeMillis());
    private Random random2 = new Random(2 * System.currentTimeMillis());

    // Number of floors in [1, 99] (not written to the data file)
    public int genFloors() {
        return 1 + random1.nextInt(50) + random2.nextInt(50);
    }

    // Static nested class: it needs no reference to the enclosing generator
    public static class Coordinate {

        double latitude;
        double longitude;

        public Coordinate() {
        }

        public Coordinate(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }

        public double getLatitude() {
            return latitude;
        }

        public double getLongitude() {
            return longitude;
        }
    }

    static int[] signs = {-1, 1};

    // Temperature in [-80, 80]
    public int genTemperature() {
        return signs[random.nextInt(2)] * random.nextInt(81);
    }

    static String[] codes = {"A", "B", "C", "D", "E", "F", "G", "H", "I",
        "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V",
        "W", "X", "Y", "Z"};

    public String genCode() {
        return codes[random.nextInt(codes.length)];
    }

    static int[] types = {0, 1, 2, 3};

    public int genBuildingType() {
        return types[random.nextInt(types.length)];
    }

    static String[] categories = {
        "办公建筑", "教育建筑", "商业建筑", "文教建筑", "医卫建筑",
        "住宅", "宿舍", "公寓", "工业建筑"};

    public String genBuildingCategory() {
        return categories[random.nextInt(categories.length)];
    }

    // Writes a UTF-8 CSV file: one header line followed by `count` random records
    public void generate(String file, int count) throws IOException {
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), "UTF-8"))) {
            w.write("id,area,building_type,category,temperature,code,latitude,longitude,when");
            w.newLine();

            for (int i = 0; i < count; i++) {
                String when = df.format(new Date());

                StringBuilder sb = new StringBuilder();
                sb.append(genId()).append(",")
                    .append("\"").append(genArea()).append("\"").append(",")
                    .append(genBuildingType()).append(",")
                    .append("\"").append(genBuildingCategory()).append("\"").append(",")
                    .append(genTemperature()).append(",")
                    .append(genCode()).append(",");
                Coordinate coord = genCoordinate();
                sb.append(coord.latitude).append(",")
                    .append(coord.longitude).append(",")
                    .append("\"").append(when).append("\"");
                w.write(sb.toString());
                w.newLine();
            }
        } // try-with-resources closes the writer even if writing fails
        System.out.println("Finished: file=" + file);
    }

    public static void main(String[] args) throws Exception {
        BuildingSampleGenerator gen = new BuildingSampleGenerator();
        String file = "E:\\Develop\\eclipse-jee-kepler\\workspace\\solr-data\\building_files";
        // 10 files, building_files_100w_00.csv .. _09.csv, each with 1,000,000 ("100w") records
        for (int i = 0; i <= 9; i++) {
            String f = file + "_100w_0" + i + ".csv";
            gen.generate(f, 1000000);
        }
    }

}
The generated files are listed below. The "100w" in each file name stands for 100 万, that is, 1,000,000 records per file:
[hadoop@master solr-data]$ ll building_files_100w*
-rw-rw-r--. 1 hadoop hadoop 109025853 Jul 26 14:05 building_files_100w_00.csv
-rw-rw-r--. 1 hadoop hadoop 108015504 Jul 26 10:53 building_files_100w_01.csv
-rw-rw-r--. 1 hadoop hadoop 108022184 Jul 26 11:00 building_files_100w_02.csv
-rw-rw-r--. 1 hadoop hadoop 108016854 Jul 26 11:00 building_files_100w_03.csv
-rw-rw-r--. 1 hadoop hadoop 108021750 Jul 26 11:00 building_files_100w_04.csv
-rw-rw-r--. 1 hadoop hadoop 108017496 Jul 26 11:00 building_files_100w_05.csv
-rw-rw-r--. 1 hadoop hadoop 108016193 Jul 26 11:00 building_files_100w_06.csv
-rw-rw-r--. 1 hadoop hadoop 108023537 Jul 26 11:00 building_files_100w_07.csv
-rw-rw-r--. 1 hadoop hadoop 108014684 Jul 26 11:00 building_files_100w_08.csv
-rw-rw-r--. 1 hadoop hadoop 108022044 Jul 26 11:00 building_files_100w_09.csv
The data file format is as follows:
[hadoop@master solr-data]$ head building_files_100w_00.csv
id,area,building_type,category,temperature,code,latitude,longitude,when
18332617097417,"广州",2,"医卫建筑",61,N,5.160762478343409,62.92919119315037,"2013-07-26T14:05:55.832Z"
18332617752331,"成都",1,"教育建筑",10,Q,77.34792453477195,72.59812030045762,"2013-07-26T14:05:55.833Z"
18332617815833,"大连",0,"教育建筑",18,T,81.47569061530493,0.2177194388096203,"2013-07-26T14:05:55.833Z"
18332617903711,"广州",0,"办公建筑",31,D,51.85825084513671,13.60710950097155,"2013-07-26T14:05:55.833Z"
18332617958555,"深圳",3,"商业建筑",5,H,22.181374031472675,119.76001810254823,"2013-07-26T14:05:55.833Z"
18332618020454,"济南",3,"公寓",-65,L,84.49607030736806,29.93095171443135,"2013-07-26T14:05:55.834Z"
18332618075939,"北京",2,"住宅",-29,J,86.61660177436184,39.20847527640485,"2013-07-26T14:05:55.834Z"
18332618130141,"菏泽",0,"医卫建筑",24,J,70.57574551258345,121.21977908377244,"2013-07-26T14:05:55.834Z"
18332618184343,"徐州",2,"办公建筑",31,W,0.10129771041097524,153.40533210345387,"2013-07-26T14:05:55.834Z"
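Each line holds nine comma-separated fields in the order given by the header. The string fields (area, category, when) are wrapped in double quotes; the numbers and the single-letter code are not. The indexing client below generates its documents directly. If you instead want to read these files back, a small splitter is enough. The Java sketch below assumes, as the generator guarantees, that quoted values never contain a double quote themselves. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter for one line of the building_files_*.csv data. */
final class BuildingCsvLine {

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // quotes delimit a value and are dropped
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String header = "id,area,building_type,category,temperature,code,latitude,longitude,when";
        // ASCII stand-ins replace the Chinese area and category values of the real data
        String line = "18332617097417,\"area\",2,\"category\",61,N,5.160762478343409,"
                + "62.92919119315037,\"2013-07-26T14:05:55.832Z\"";
        List<String> names = split(header);
        List<String> values = split(line);
        for (int i = 0; i < names.size(); i++) {
            System.out.println(names.get(i) + " = " + values.get(i));
        }
    }
}
```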
Next, we index data into the SolrCloud cluster built above, using the following simple client:
package org.shirdrn.solr.indexing;

import java.io.IOException;
import java.net.MalformedURLException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.shirdrn.solr.data.BuildingSampleGenerator;
import org.shirdrn.solr.data.BuildingSampleGenerator.Coordinate;

public class CloudSolrClient {

    private CloudSolrServer cloudSolrServer;

    // Connects through ZooKeeper: CloudSolrServer reads the cluster state
    // (live nodes, shards) from ZooKeeper instead of using a fixed Solr URL.
    public synchronized void open(final String zkHost, final String defaultCollection,
            int zkClientTimeout, final int zkConnectTimeout) {
        if (cloudSolrServer == null) {
            try {
                cloudSolrServer = new CloudSolrServer(zkHost);
                cloudSolrServer.setDefaultCollection(defaultCollection);
                cloudSolrServer.setZkClientTimeout(zkClientTimeout);
                cloudSolrServer.setZkConnectTimeout(zkConnectTimeout);
            } catch (MalformedURLException e) {
                System.out.println("The URL of zkHost is not correct!! Its form must be as below:\n zkHost:port");
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // Adds one document. Committing after every single document is very slow,
    // so the commit is done once, in commitAndClose().
    public void addDoc(long id, String area, int buildingType, String category,
            int temperature, String code, double latitude, double longitude, String when) {
        try {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("area", area);
            doc.addField("building_type", buildingType);
            doc.addField("category", category);
            doc.addField("temperature", temperature);
            doc.addField("code", code);
            doc.addField("latitude", latitude);
            doc.addField("longitude", longitude);
            doc.addField("when", when);
            cloudSolrServer.add(doc);
        } catch (SolrServerException e) {
            System.err.println("Add docs Exception !!!");
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println("Unknown Exception!!!!!");
            e.printStackTrace();
        }
    }

    // Commits everything added so far, then releases the ZooKeeper connection.
    public void commitAndClose() {
        if (cloudSolrServer == null) {
            return; // open() failed
        }
        try {
            cloudSolrServer.commit();
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            cloudSolrServer.shutdown();
        }
    }

    public static void main(String[] args) {
        final String zkHost = "master:2188";
        final String defaultCollection = "mycollection";
        final int zkClientTimeout = 20000;
        final int zkConnectTimeout = 1000;

        CloudSolrClient client = new CloudSolrClient();
        client.open(zkHost, defaultCollection, zkClientTimeout, zkConnectTimeout);

        BuildingSampleGenerator gen = new BuildingSampleGenerator();
        final DateFormat df = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

        for (int i = 0; i < 10000; i++) {
            long id = gen.genId();
            String area = gen.genArea();
            int buildingType = gen.genBuildingType();
            String category = gen.genBuildingCategory();
            int temperature = gen.genTemperature();
            String code = gen.genCode();
            Coordinate coord = gen.genCoordinate();
            double latitude = coord.getLatitude();
            double longitude = coord.getLongitude();
            String when = df.format(new Date());
            client.addDoc(id, area, buildingType, category, temperature, code, latitude, longitude, when);
        }

        client.commitAndClose(); // one commit for the whole batch
    }

}
After indexing, the SolrCloud admin page (or the servers themselves) shows that the indexed data is distributed fairly evenly across the shards.
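Why does the data spread out so evenly? In Solr 4.x, a collection created with numShards uses, as far as I know, the default compositeId router. That router hashes each document's unique key (here the id field) with 32-bit MurmurHash3 and gives each shard a contiguous slice of the 32-bit hash range. The Java sketch below illustrates the idea with a standard MurmurHash3_x86_32 and a simplified equal split of the range. It does not model Solr's exact range boundaries or composite keys such as a!b, so its shard numbers are not guaranteed to match a real cluster. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Simplified illustration of hash-based document routing (not Solr's actual code). */
final class ShardRoutingSketch {

    /** Standard MurmurHash3 x86 32-bit hash of a byte array. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int roundedEnd = data.length & ~3; // length rounded down to a multiple of 4
        for (int i = 0; i < roundedEnd; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int tail = 0; // the remaining 1 to 3 bytes
        switch (data.length & 3) {
            case 3:
                tail = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                tail |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                tail |= data[roundedEnd] & 0xff;
                tail *= c1;
                tail = Integer.rotateLeft(tail, 15);
                tail *= c2;
                h ^= tail;
                break;
            default:
                break;
        }
        h ^= data.length; // finalization mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps a signed 32-bit hash onto one of numShards equal, contiguous ranges (0-based). */
    static int shardIndex(int hash, int numShards) {
        long offset = (long) hash - Integer.MIN_VALUE; // 0 .. 2^32 - 1
        return (int) ((offset * numShards) >>> 32);
    }

    static String shardFor(String id, int numShards) {
        int hash = murmur3_32(id.getBytes(StandardCharsets.UTF_8), 0);
        return "shard" + (shardIndex(hash, numShards) + 1);
    }

    public static void main(String[] args) {
        int numShards = 3;
        int[] counts = new int[numShards];
        long firstId = 18332617097417L; // first id in the sample data above
        for (long id = firstId; id < firstId + 30000; id++) {
            int hash = murmur3_32(Long.toString(id).getBytes(StandardCharsets.UTF_8), 0);
            counts[shardIndex(hash, numShards)]++;
        }
        for (int i = 0; i < numShards; i++) {
            System.out.println("shard" + (i + 1) + ": " + counts[i]);
        }
    }
}
```

Running main hashes 30,000 consecutive ids and counts how many fall into each of 3 shards. The counts come out close to 10,000 each, in line with the even spread observed above.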
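The replicas of each shard can also be managed from the web admin page. For example, if a shard has too many replicas, you can remove (unload) one of them on the page. Unloading deletes only the replica's metadata from the information maintained by the ZooKeeper cluster. The replica's data on that node is not deleted; the replica simply goes offline and can no longer serve requests.
Searching data
We can run a search with the following conditions: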
http://master:8888/solr-cloud/mycollection/select?q=北京 纽约&fl=*&fq=category:公寓&fq=building_type:2&start=0&rows=10
The search results are as follows:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">570</int>
  </lst>
  <result name="response" numFound="201568" start="0" maxScore="1.5322487">
    <doc>
      <long name="id">37109751480918</long>
      <long name="_version_">1442164237143113728</long>
    </doc>
    <doc>
      <long name="id">37126929150371</long>
      <long name="_version_">1442164255154503680</long>
    </doc>
    <doc>
      <long name="id">37445266827945</long>
      <long name="_version_">1442164588949798912</long>
    </doc>
    <doc>
      <long name="id">37611390043867</long>
      <long name="_version_">1442164763138195456</long>
    </doc>
    <doc>
      <long name="id">37892268870281</long>
      <long name="_version_">1442165057653833728</long>
    </doc>
    <doc>
      <long name="id">89820941817153</long>
      <long name="_version_">1442219517734289408</long>
    </doc>
    <doc>
      <long name="id">89825667635450</long>
      <long name="_version_">1442219522665742336</long>
    </doc>
    <doc>
      <long name="id">89830029550692</long>
      <long name="_version_">1442219527207124993</long>
    </doc>
    <doc>
      <long name="id">93932235463589</long>
      <long name="_version_">1442223828610580480</long>
    </doc>
    <doc>
      <long name="id">93938975733467</long>
      <long name="_version_">1442223835684274177</long>
    </doc>
  </result>
</response>
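The query URL above contains a space and Chinese characters. A browser encodes these automatically, but a request built in code must URL-encode each parameter explicitly. The Java sketch below uses the JDK's URLEncoder, which turns a space into "+"; the servlet container decodes that back to a space. The helper name SelectUrl is hypothetical. The query and the category filter are the ones from the example, written as Unicode escapes so the source stays ASCII.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds a URL-encoded /select request. */
final class SelectUrl {

    static String build(String collectionUrl, String q, List<String> filterQueries, int start, int rows) {
        StringBuilder sb = new StringBuilder(collectionUrl)
                .append("/select?q=").append(encode(q))
                .append("&fl=*");
        for (String fq : filterQueries) {
            sb.append("&fq=").append(encode(fq)); // one fq parameter per filter query
        }
        return sb.append("&start=").append(start).append("&rows=").append(rows).toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String q = "\u5317\u4eac \u7ebd\u7ea6"; // the two-term query from the example URL
        List<String> fqs = Arrays.asList("category:\u516c\u5bd3", "building_type:2");
        System.out.println(build("http://master:8888/solr-cloud/mycollection", q, fqs, 0, 10));
    }
}
```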
The corresponding log entries look like this:
2013-08-05 18:38:26.814 [http-bio-8888-exec-228] INFO org.apache.solr.core.SolrCore – [mycollection_shard1_0_replica2] webapp=/solr-cloud path=/select params={NOW=1375699145633&shard.url=10.95.3.62:8888/solr-cloud/mycollection_shard1_0_replica1/|10.95.3.61:8888/solr-cloud/mycollection_shard1_0_replica3/&fl=id,score&start=0&q=北京+纽约&distrib=false&wt=javabin&isShard=true&fsv=true&fq=category:公寓&fq=building_type:2&version=2&rows=10} hits=41529 status=0 QTime=102

2013-08-05 18:39:06.203 [http-bio-8888-exec-507] INFO org.apache.solr.core.SolrCore – [mycollection_shard3_replica1] webapp=/solr-cloud path=/select params={fl=*&start=0&q=北京+纽约&fq=category:公寓&fq=building_type:2&rows=10} hits=201568 status=0 QTime=570
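The two log entries play different roles:

- The first is a per-shard sub-request of this query (isShard=true, distrib=false, fl=id,score).
- The second is the top-level request, which reports the total hits=201568.

As I understand Solr's distributed search, the node that receives the query first asks every shard only for the ids and scores of its best start+rows documents. It then merges those lists and fetches the stored fields of the winners. The Java sketch below models only the merge step and assumes each shard's list is already sorted by descending score. The names are hypothetical. Ties are broken by id purely to keep the result deterministic; Solr has its own tie-breaking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified sketch of merging per-shard top hits (not Solr's implementation). */
final class TopKMerge {

    static final class Hit {
        final long id;
        final float score;

        Hit(long id, float score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + ":" + score;
        }
    }

    // Each shard contributes at most start + rows hits; the requested page is
    // cut from the combined list after re-sorting by descending score.
    static List<Hit> merge(List<List<Hit>> perShard, int start, int rows) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShard) {
            all.addAll(shardHits.subList(0, Math.min(shardHits.size(), start + rows)));
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingLong(h -> h.id)); // tie-break on id only for determinism
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    public static void main(String[] args) {
        List<Hit> shard1 = Arrays.asList(new Hit(11, 1.53f), new Hit(12, 0.90f));
        List<Hit> shard2 = Arrays.asList(new Hit(21, 1.20f), new Hit(22, 1.10f));
        List<Hit> shard3 = Arrays.asList(new Hit(31, 1.40f));
        System.out.println(merge(Arrays.asList(shard1, shard2, shard3), 0, 3));
    }
}
```

In the usage example, the global top three come from all three shards (ids 11, 31 and 21). This is why each shard must return a full page of candidates rather than a share of it.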
Related issues
1. While 4 nodes were registered in the ZooKeeper cluster, I tried to create a Collection with the following command:
[hadoop@slave1 multicore]$ curl 'http://slave1:8888/solr-cloud/admin/collections?action=CREATE&name=tinycollection&numShards=2&replicationFactor=3'
The request failed with this exception:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">81</int>
  </lst>
  <str name="Operation createcollection caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
  <lst name="exception">
    <str name="msg">Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
    <int name="rspCode">400</int>
  </lst>
  <lst name="error">
    <str name="msg">Cannot create collection tinycollection. Value of maxShardsPerNode is 1, and the number of live nodes is 4. This allows a maximum of 4 to be created. Value of numShards is 2 and value of replicationFactor is 3. This requires 6 shards to be created (higher than the allowed number)</str>
    <int name="code">400</int>
  </lst>
</response>
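The exception message explains the failure:

- 4 nodes are available, but I asked for 2 shards with a replication factor of 3, that is, 2 × 3 = 6 cores.
- Because maxShardsPerNode is 1, each node can host only one of them, so at least 6 nodes would be required.

One fix is to lower the replication factor, for example to replicationFactor=2, meaning two copies of each shard in total (the leader and one other replica). The request then needs only 4 cores, and creating the Collection no longer fails.

Alternatively, passing a larger maxShardsPerNode in the CREATE request (for example maxShardsPerNode=2) allows each node to host more than one core.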
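The rule behind this error is simple arithmetic. A request needs numShards × replicationFactor cores, and the cluster allows at most live nodes × maxShardsPerNode. The small Java sketch below mirrors the numbers in the message. The class is hypothetical and is not Solr's code.

```java
/** Hypothetical sketch of the capacity rule reported by the CREATE error above. */
final class CreateCapacityCheck {

    // Every shard needs replicationFactor copies, leader included
    static int requiredCores(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    // Upper bound on the cores this collection may place on the cluster
    static int allowedCores(int liveNodes, int maxShardsPerNode) {
        return liveNodes * maxShardsPerNode;
    }

    static boolean canCreate(int numShards, int replicationFactor, int liveNodes, int maxShardsPerNode) {
        return requiredCores(numShards, replicationFactor) <= allowedCores(liveNodes, maxShardsPerNode);
    }

    public static void main(String[] args) {
        System.out.println(canCreate(2, 3, 4, 1)); // false: 6 cores needed, 4 allowed
        System.out.println(canCreate(2, 2, 4, 1)); // true: 4 cores needed, 4 allowed
        System.out.println(canCreate(2, 3, 4, 2)); // true: 6 cores needed, 8 allowed
    }
}
```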