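This is a performance test of a single service deployed on a cloud platform. The service is one link in the overall chain of business-logic services.
Server background
The overall service architecture is as follows:
Database: PostgreSQL
Backend framework: Spring Boot + embedded Tomcat + Java
Deployment environment: a well-known cloud platform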
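Performance testing method
Test tool:
JMeter
Approach: use JMeter and create a separate test plan for each API under test. Each test plan has two variables:
- number of threads
- data volume
Each test plan is run with 10, 20, 50, 100, and 200 concurrent threads, and with roughly 10,000, 50,000, 100,000, 200,000, and 500,000 records in the database. Combining the five thread counts with the five data volumes gives 25 sets of results. The table below lists the levels of each variable: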
Data volume (x 10,000 records) | Threads | Concurrent run time (minutes) |
1 | 10 | 5 |
5 | 20 | 5 |
10 | 50 | 5 |
20 | 100 | 5 |
50 | 200 | 5 |
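The 25 result sets come from pairing every data volume with every thread count. The sketch below only enumerates those combinations from the levels in the table; the class and method names are hypothetical and not part of the original test setup.

```java
import java.util.ArrayList;
import java.util.List;

final class TestMatrix {
    // Levels copied from the table: data volume in units of 10,000 records.
    static final int[] DATA_VOLUMES_TEN_THOUSANDS = {1, 5, 10, 20, 50};
    static final int[] THREAD_COUNTS = {10, 20, 50, 100, 200};
    static final int DURATION_MINUTES = 5;

    /** One run: a data volume paired with a thread count. */
    static final class Run {
        final int records;
        final int threads;

        Run(int records, int threads) {
            this.records = records;
            this.threads = threads;
        }
    }

    /** Builds the full cross product of data volumes and thread counts. */
    static List<Run> allRuns() {
        List<Run> runs = new ArrayList<>();
        for (int volume : DATA_VOLUMES_TEN_THOUSANDS) {
            for (int threads : THREAD_COUNTS) {
                runs.add(new Run(volume * 10_000, threads));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        for (Run run : allRuns()) {
            System.out.printf("records=%d threads=%d duration=%dmin%n",
                    run.records, run.threads, DURATION_MINUTES);
        }
    }
}
```

Each printed line corresponds to one JMeter run, with the thread count going into ThreadGroup.num_threads and the duration into ThreadGroup.duration (in seconds).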
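In JMeter you can set the number of threads to start, the run duration, the test data, and so on. The interface looks like this:
Below is an example of a test plan's XML: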
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="3.2" jmeter="3.3 r1808647">
<hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Get by customerNumber with markets defaultAddress" enabled="true">
<stringProp name="TestPlan.comments"></stringProp>
<boolProp name="TestPlan.functional_mode">false</boolProp>
<boolProp name="TestPlan.serialize_threadgroups">true</boolProp>
<elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="TestPlan.user_define_classpath"></stringProp>
</TestPlan>
<hashTree>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Get by customerNumber with markets defaultAddress" enabled="true">
<stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
<elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
<boolProp name="LoopController.continue_forever">false</boolProp>
<intProp name="LoopController.loops">-1</intProp>
</elementProp>
<stringProp name="ThreadGroup.num_threads">100</stringProp>
<stringProp name="ThreadGroup.ramp_time">1</stringProp>
<longProp name="ThreadGroup.start_time">1515376034000</longProp>
<longProp name="ThreadGroup.end_time">1515376034000</longProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">300</stringProp>
<stringProp name="ThreadGroup.delay"></stringProp>
</ThreadGroup>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="" elementType="Header">
<stringProp name="Header.name">hybris-tenant</stringProp>
<stringProp name="Header.value">tenantIT</stringProp>
</elementProp>
<elementProp name="" elementType="Header">
<stringProp name="Header.name">hybris-user</stringProp>
<stringProp name="Header.value">jmeter</stringProp>
</elementProp>
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Get by customerNumber with markets defaultAddress" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="HTTPSampler.domain"></stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol"></stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">requesturl</stringProp>
<stringProp name="HTTPSampler.method">GET</stringProp>
<boolProp name="HTTPSampler.follow_redirects">false</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">false</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<ResultCollector guiclass="ObsoleteGui" testclass="ResultCollector" testname="Monitor Results" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>false</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<threadCounts>true</threadCounts>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
<ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="true">
<boolProp name="ResultCollector.error_logging">true</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>false</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<threadCounts>true</threadCounts>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="StatVisualizer" testclass="ResultCollector" testname="Aggregate Report" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>false</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<threadCounts>true</threadCounts>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
<ResultCollector guiclass="ObsoleteGui" testclass="ResultCollector" testname="Monitor Results" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>false</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<threadCounts>true</threadCounts>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
<WorkBench guiclass="WorkBenchGui" testclass="WorkBench" testname="WorkBench" enabled="true">
<boolProp name="WorkBench.save">true</boolProp>
</WorkBench>
<hashTree/>
</hashTree>
</jmeterTestPlan>
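The test results are exported as an Excel document. Below is an example of one result:
Error % is the error rate: the proportion of all requests sent to the server that returned an error.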
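As a small worked example of that definition, the sketch below computes the error rate from a failed-request count and a total-request count. The class name and the numbers in main are hypothetical and are not taken from the actual results.

```java
final class ErrorRate {
    /** Percentage of requests that failed, in the range 0 to 100. */
    static double errorPercent(long failedRequests, long totalRequests) {
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        if (failedRequests < 0 || failedRequests > totalRequests) {
            throw new IllegalArgumentException("failedRequests must be between 0 and totalRequests");
        }
        return 100.0 * failedRequests / totalRequests;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, for illustration only.
        System.out.printf("Error %%: %.2f%n", errorPercent(25, 1000));
    }
}
```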
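Test results and analysis
The original goal of this test was to uncover potential performance problems in the code and the implementation logic. In practice, before getting anywhere near that kind of problem, the test was blocked by bottlenecks and problems in the framework and the environment. Three main problems were found: the first two in the deployed cloud environment, the third in the local environment. The following diagram illustrates them:
The parts marked in red in the diagram above are the problems, and each mark sits where the error occurred. Each problem is analysed below.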
Problem 1: the service crashes and restarts
The error message is as follows:
Failed to make HTTP request to '/admin/health' on port 8080
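The cause is closely tied to the cloud environment the service is deployed in. By default the platform provides a feature for monitoring server status (the monitor process in the diagram above), known as a health check. The platform opens a connection to your server (configurable as TCP or HTTP) to determine whether the server is available or has gone down.
This relies on Spring Boot Actuator, which exposes a health API when a Spring Boot service starts; for this service the path ends in /admin/health. The cloud platform's health check calls this endpoint to decide whether the service is healthy, as shown below:
If the endpoint does not respond within a certain time, the platform concludes that the server has a problem and triggers its crash-restart mechanism: it forcibly stops the service and then starts it again.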
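One plausible way for the health endpoint to miss its deadline under load is that the health request waits behind the business requests for a free worker thread. That is an assumption rather than something measured in this test. The sketch below simulates it with a plain thread pool; it uses neither Tomcat nor the platform's real health checker, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HealthCheckSimulation {

    /**
     * Returns true if a probe submitted to a pool of workerThreads threads is
     * answered within timeoutMillis while busyRequests slow requests are
     * already occupying the pool.
     */
    static boolean probeAnswersInTime(int workerThreads, int busyRequests, long timeoutMillis) {
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < busyRequests; i++) {
                workers.submit(() -> {
                    release.await(); // stands in for a slow request under load
                    return null;
                });
            }
            // The probe is queued like any other request.
            Future<String> probe = workers.submit(() -> "UP");
            try {
                probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the platform would count this as a failed health check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                return false;
            }
        } finally {
            release.countDown();
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool saturated: answered=" + probeAnswersInTime(2, 2, 300));
        System.out.println("one thread free: answered=" + probeAnswersInTime(2, 1, 2000));
    }
}
```

In this model the probe only fails when every worker is busy, which suggests checking whether the health check timeout and the service's thread capacity are reasonable for the load being generated.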
Problem 2: not enough Tomcat JDBC connections
The error message is as follows:
[EL Info]: query: 2018-01-22 10:20:52.064--UnitOfWork(269647005)--Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-nio-8080-exec-1505] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:4; busy:4; idle:0; lastwait:30000].
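This problem originates in Tomcat, specifically its JDBC connection pool (org.apache.tomcat.jdbc.pool). Spring Boot embeds the Tomcat container and also ships a default database connection pool configuration. When the service is deployed on the cloud platform, that default sets both the maximum and the minimum number of connections to 4, which the error message confirms: Unable to fetch a connection in 30 seconds, none available[size:4; busy:4; idle:0; lastwait:30000].
The following code can also be found in the Spring source: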
public class DataSourceConfigurer extends PooledServiceConnectorConfigurer<DataSource, DataSourceConfig> {
    private MapServiceConnectionConfigurer<DataSource, MapServiceConnectorConfig> mapServiceConnectionConfigurer = new MapServiceConnectionConfigurer();

    public DataSourceConfigurer() {
    }

    public DataSource configure(DataSource dataSource, DataSourceConfig config) {
        if (config == null) {
            config = new DataSourceConfig(new PoolConfig(4, 30000), (ConnectionConfig)null);
        }
        this.configureConnection(dataSource, config);
        this.configureConnectionProperties(dataSource, config);
        return (DataSource)super.configure(dataSource, config);
    }
    ...
}
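The code speaks for itself: when no configuration is passed in, the pool is created with PoolConfig(4, 30000), which matches the size of 4 and the 30000 ms wait in the error message.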
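To see why a pool of 4 fails so quickly under 100 or more request threads, here is a minimal simulation of a bounded pool with a maximum wait. It is a sketch built on java.util.concurrent.Semaphore, not the Tomcat JDBC pool's actual implementation, and the class name TinyPool is hypothetical. The wait is shortened from 30000 ms so the demo finishes quickly.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class TinyPool {
    private final Semaphore permits;
    private final long maxWaitMillis;

    TinyPool(int maxTotal, long maxWaitMillis) {
        this.permits = new Semaphore(maxTotal, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Borrows a connection slot; returns false if none frees up within maxWaitMillis. */
    boolean borrow() {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a previously borrowed slot to the pool. */
    void giveBack() {
        permits.release();
    }

    /** Number of slots currently free. */
    int idle() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(4, 100); // 4 slots, as in PoolConfig(4, 30000)
        for (int i = 0; i < 4; i++) {
            pool.borrow();
        }
        // All four slots are busy, so the next caller waits and then gives up.
        System.out.println("fifth borrow succeeded: " + pool.borrow());
        pool.giveBack();
        System.out.println("borrow after one release succeeded: " + pool.borrow());
    }
}
```

With every slot busy, each extra caller blocks for the full wait and then fails, which is the situation the PoolExhaustedException above reports.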
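Problem 3: PostgreSQL too many clients
This error was found locally, because in the local environment the web container's database connection pool is no longer limited to 4. The error message is as follows: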
[EL Info]: query: 2018-01-22 16:30:41.721--UnitOfWork(222405369)--Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
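This error is returned by the PostgreSQL database, and the message is clear: the database has run out of connections because all client slots are taken.
PostgreSQL's default maximum number of connections (max_connections) is typically 100, as described in its official documentation.
Tomcat, by default, processes up to 200 requests concurrently (its maxThreads setting; see the Tomcat documentation for details). So when the number of concurrent requests is within Tomcat's limit but above the database's connection limit, this problem appears. The lesson is that when configuring a server we should keep the maximum concurrency supported by each component in balance. Here the configuration should satisfy:
connection pool size * number of server instances <= database connection limit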
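The inequality can be turned into a quick sizing check. The sketch below is hypothetical helper code, not part of the service. The reservedConnections parameter is an added assumption: PostgreSQL keeps a few connections back for superusers (superuser_reserved_connections, 3 by default), so you may want to subtract them. Pass 0 to get the plain formula above.

```java
final class PoolSizing {
    /**
     * Largest per-instance pool size such that
     * poolSize * instances <= maxConnections - reservedConnections.
     */
    static int maxPoolSizePerInstance(int maxConnections, int reservedConnections, int instances) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        int usable = maxConnections - reservedConnections;
        if (usable < 0) {
            throw new IllegalArgumentException("reservedConnections exceeds maxConnections");
        }
        return usable / instances; // integer division rounds down, keeping the product within the limit
    }

    /** Checks the inequality: poolSize * instances <= maxConnections. */
    static boolean fits(int poolSize, int instances, int maxConnections) {
        return (long) poolSize * instances <= maxConnections;
    }

    public static void main(String[] args) {
        // 100 is the PostgreSQL default mentioned above; 4 instances is a hypothetical deployment.
        System.out.println("max pool size per instance: " + maxPoolSizePerInstance(100, 3, 4));
        // A single instance allowed 200 connections against a limit of 100 does not fit.
        System.out.println("200 connections vs limit 100 fits: " + fits(200, 1, 100));
    }
}
```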