Web Crawler Development: Common Problems and Solutions




Common problems and solutions:


1. Crawler error: Max retries exceeded with url


Full error message:


requests.exceptions.SSLError: HTTPSConnectionPool(host='www.qiushibaike.com', port=443): Max retries exceeded with url: /imgrank/page/4/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1124)')))


Analysis of causes and solutions


1) Too many HTTP connections were opened and never closed



Solution:

Set the number of retries, and close surplus connections before sending the request.

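import requests
from requests.adapters import HTTPAdapter

url = "https://www.qiushibaike.com/imgrank/page/4/"  # the URL you need
s = requests.Session()
# Reassigning requests.adapters.DEFAULT_RETRIES has no effect on adapters; mount one with max_retries instead
s.mount("http://", HTTPAdapter(max_retries=5))  # raise the retry count
s.mount("https://", HTTPAdapter(max_retries=5))
s.headers["Connection"] = "close"  # s.keep_alive is ignored by current requests; this header disables keep-alive
s.get(url)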


2) Requests were too frequent, so access was blocked



Solution:

Use a proxy.

A site for finding proxies: http://ip.zdaye.com/shanghai_ip.html#Free

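import requests
s = requests.Session()
url = "https://mail.163.com/"
s.proxies = {"https": "47.100.104.247:8080", "http": "36.248.10.47:8080"}
header = {"User-Agent": "Mozilla/5.0"}  # placeholder value; the original left header undefined
s.headers.update(header)  # update() keeps the session's default headers
s.get(url)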
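Since the cause here is request frequency, spacing requests out can reduce how often a block is triggered in the first place. This is a different technique from the proxy approach above, shown only as a minimal sketch: the `Throttle` class, its parameter names, and the default delays are all assumptions made for illustration.

```python
import random
import time


class Throttle:
    """Enforce a minimum, slightly randomized gap between consecutive requests."""

    def __init__(self, min_delay=1.0, jitter=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds that must pass between two requests
        self.jitter = jitter        # upper bound of the random extra delay
        self._clock = clock         # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None           # time of the previous request, None before the first one

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            gap = self.min_delay + random.uniform(0, self.jitter)
            remaining = gap - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a crawl loop, calling `throttle.wait()` immediately before each `s.get(url)` would apply the delay; suitable values depend on the target site.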



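Notes on using proxies:

  • Proxies come in two types, http and https, and they must not be mixed up; using an http proxy for https traffic also raises the error above;
  • The proxies are passed in as a dictionary, as in the example above; each value can be written as "47.100.104.247:8080" or with an explicit scheme such as "https://47.100.104.247:8080" (the scheme in the value describes how to connect to the proxy server itself, so it has to match what that server actually speaks);
  • If the proxy is unavailable, the same error is raised.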
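The dictionary format described in the notes above can be sketched as a small helper. The function names `normalize_proxy` and `build_proxies` are hypothetical, and defaulting to the `http` scheme for bare `host:port` values is an assumption chosen for this example.

```python
def normalize_proxy(address: str, default_scheme: str = "http") -> str:
    """Return the proxy address with an explicit scheme in front of host:port."""
    if "://" in address:
        return address  # already carries a scheme, leave it untouched
    return f"{default_scheme}://{address}"


def build_proxies(http_proxy: str, https_proxy: str) -> dict:
    """Build the dictionary that requests expects in Session.proxies or proxies=."""
    return {
        "http": normalize_proxy(http_proxy),    # proxy used for http:// target URLs
        "https": normalize_proxy(https_proxy),  # proxy used for https:// target URLs
    }


proxies = build_proxies("36.248.10.47:8080", "47.100.104.247:8080")
print(proxies)
```

The dictionary keys select a proxy by the scheme of the target URL, while the scheme inside each value says how to talk to the proxy itself.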


The following checks whether a proxy is usable:

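import requests
s = requests.Session()
url = "https://mail.163.com/"
s.headers["Connection"] = "close"  # replaces s.keep_alive = False, which current requests ignores
s.proxies = {"https": "47.100.104.247:8080", "http": "36.248.10.47:8080"}
s.headers.update({"User-Agent": "Mozilla/5.0"})  # placeholder; the original used an undefined header
r = s.get(url, timeout=10)  # a timeout keeps a dead proxy from hanging the check
print(r.status_code)  # a usable proxy returns normally; an unusable one raises the error above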

2. Code error: AttributeError: 'NoneType' object has no attribute 'extend'


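Full error message:

AttributeError: 'NoneType' object has no attribute 'extend', or AttributeError: 'NoneType' object has no attribute 'append'

Cause: list.extend and list.append modify the list in place and have no return value, that is, they return None. Calling extend or append again on that None raises the error above.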



Incorrect usage:

Assigning the return value of extend to aa turns aa into None (NoneType).



Correct usage:



Call the method directly and do not use its return value.
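The original code for these two cases is not included in the text, so the following is a reconstruction of the pattern rather than the author's exact snippet. The variable names `bb`, `result`, and `message` are assumptions.

```python
aa = [1, 2]
bb = [3, 4]

# Incorrect: extend() returns None, so result is None rather than a list.
result = aa.copy().extend(bb)
try:
    result.extend([5])  # calling a list method on None fails
except AttributeError as exc:
    message = str(exc)
    print(message)

# Correct: call extend()/append() for their side effect and keep using aa itself.
aa.extend(bb)
aa.append(5)
print(aa)
```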

3. Anti-crawling symptom: when crawling with Python, the request returns only an HTML page that consists of a block of JS code:

<html><script>
var arg1='5C0F6F00234C266ACFF04557E2629D01A1F6EEF5';
var _0x4818=['\x63\x73\x4b\x48\x77\x71\x4d\x49','\x5a\x73\x4b\x4a\x77\x72\x38\x56\x65\x41\x73\x79','\x55\x63\x4b\x69\x4e\x38\x4f\x2f\x77\x70\x6c\x77\x4d\x41\x3d\x3d','\x4a\x52\x38\x43\x54\x67\x3d\x3d','\x59\x73\x4f\x6e\x62\x53\x45\x51\x77\x37\x6f\x7a\x77\x71\x5a\x4b\x65\x73\x4b\x55\x77\x37\x6b\x77\x58\x38\x4f\x52\x49\x51\x3d\x3d','\x77\x37\x6f\x56\x53\x38\x4f\x53\x77\x6f\x50\x43\x6c\x33\x6a\x43\x68\x4d\x4b\x68\x77\x36\x48\x44\x6c\x73\x4b\x58\x77\x34\x73\x2f\x59\x73\x4f\x47','\x66\x77\x56\x6d\x49\x31\x41\x74\x77\x70\x6c\x61\x59\x38\x4f\x74\x77\x35\x63\x4e\x66\x53\x67\x70\x77\x36\x4d\x3d','\x4f\x63\x4f\x4e\x77\x72\x6a\x43\x71\x73\x4b\x78\x54\x47\x54\x43\x68\x73\x4f\x6a\x45\x57\x45\x38\x50\x63\x4f\x63\x4a\x38\x4b\x36','\x55\x38\x4b\x35\x4c\x63\x4f\x74\x77\x70\x56\x30\x45\x4d\x4f\x6b\x77\x34\x37\x44\x72\x4d\x4f\x58','\x48\x4d\x4f\x32\x77\x6f\x48\x43\x69\x4d\x4b\x39\x53\x6c\x58\x43\x6c\x63\x4f\x6f\x43\x31\x6b\x3d','\x61\x73\x4b\x49\x77\x71\x4d\x44\x64\x67\x4d\x75\x50\x73\x4f\x4b\x42\x4d\x4b\x63\x77\x72\x72\x43\x74\x6b\x4c\x44\x72\x4d\x4b\x42\x77\x36\x34\x64','\x77\x71\x49\x6d\x4d\x54\x30\x74\x77\x36\x52\x4e\x77\x35\x6b\x3d','\x44\x4d\x4b\x63\x55\x30\x4a\x6d\x55\x77\x55\x76','\x56\x6a\x48\x44\x6c\x4d\x4f\x48\x56\x63\x4f\x4e\x58\x33\x66\x44\x69\x63\x4b\x4a\x48\x51\x3d\x3d','\x77\x71\x68\x42\x48\x38\x4b\x6e\x77\x34\x54\x44\x68\x53\x44\x44\x67\x4d\x4f\x64\x77\x72\x6a\x43\x6e\x63\x4f\x57\x77\x70\x68\x68\x4e\x38\x4b\x43\x47\x63\x4b\x71\x77\x36\x64\x48\x41\x55\x35\x2b\x77\x72\x67\x32\x4a\x63\x4b\x61\x77\x34\x49\x45\x4a\x63\x4f\x63\x77\x72\x52\x4a\x77\x6f\x5a\x30\x77\x71\x46\x39\x59\x67\x41\x56','\x64\x7a\x64\x32\x77\x35\x62\x44\x6d\x33\x6a\x44\x70\x73\x4b\x33\x77\x70\x59\x3d','\x77\x34\x50\x44\x67\x63\x4b\x58\x77\x6f\x33\x43\x6b\x63\x4b\x4c\x77\x72\x35\x71\x77\x72\x59\x3d','\x77\x72\x4a\x4f\x54\x63\x4f\x51\x57\x4d\x4f\x67','\x77\x71\x54\x44\x76\x63\x4f\x6a\x77\x34\x34\x37\x77\x72\x34\x3d','\x77\x35\x58\x44\x71\x73\x4b\x68\x4d\x46\x31\x2f','\x77\x72\x41\x79\x48\x73\x4f\x66\x77\x
70\x70\x63','\x4a\x33\x64\x56\x50\x63\x4f\x78\x4c\x67\x3d\x3d','\x77\x72\x64\x48\x77\x37\x70\x39\x5a\x77\x3d\x3d','\x77\x34\x72\x44\x6f\x38\x4b\x6d\x4e\x45\x77\x3d','\x49\x4d\x4b\x41\x55\x6b\x42\x74','\x77\x36\x62\x44\x72\x63\x4b\x51\x77\x70\x56\x48\x77\x70\x4e\x51\x77\x71\x55\x3d','\x64\x38\x4f\x73\x57\x68\x41\x55\x77\x37\x59\x7a\x77\x72\x55\x3d','\x77\x71\x6e\x43\x6b\x73\x4f\x65\x65\x7a\x72\x44\x68\x77\x3d\x3d','\x55\x73\x4b\x6e\x49\x4d\x4b\x57\x56\x38\x4b\x2f','\x77\x34\x7a\x44\x6f\x63\x4b\x38\x4e\x55\x5a\x76','\x63\x38\x4f\x78\x5a\x68\x41\x4a\x77\x36\x73\x6b\x77\x71\x4a\x6a','\x50\x63\x4b\x49\x77\x34\x6e\x43\x6b\x6b\x56\x62','\x4b\x48\x67\x6f\x64\x4d\x4f\x32\x56\x51\x3d\x3d','\x77\x70\x73\x6d\x77\x71\x76\x44\x6e\x47\x46\x71','\x77\x71\x4c\x44\x74\x38\x4f\x6b\x77\x34\x63\x3d','\x77\x37\x77\x31\x77\x34\x50\x43\x70\x73\x4f\x34\x77\x71\x41\x3d','\x77\x71\x39\x46\x52\x73\x4f\x71\x57\x4d\x4f\x71','\x62\x79\x42\x68\x77\x37\x72\x44\x6d\x33\x34\x3d','\x4c\x48\x67\x2b\x53\x38\x4f\x74\x54\x77\x3d\x3d','\x77\x71\x68\x4f\x77\x37\x31\x35\x64\x73\x4f\x48','\x55\x38\x4f\x37\x56\x73\x4f\x30\x77\x71\x76\x44\x76\x63\x4b\x75\x4b\x73\x4f\x71\x58\x38\x4b\x72','\x59\x69\x74\x74\x77\x35\x44\x44\x6e\x57\x6e\x44\x72\x41\x3d\x3d','\x59\x4d\x4b\x49\x77\x71\x55\x55\x66\x67\x49\x6b','\x61\x42\x37\x44\x6c\x4d\x4f\x44\x54\x51\x3d\x3d','\x77\x70\x66\x44\x68\x38\x4f\x72\x77\x36\x6b\x6b','\x77\x37\x76\x43\x71\x4d\x4f\x72\x59\x38\x4b\x41\x56\x6b\x35\x4f\x77\x70\x6e\x43\x75\x38\x4f\x61\x58\x73\x4b\x5a\x50\x33\x44\x43\x6c\x63\x4b\x79\x77\x36\x48\x44\x72\x51\x3d\x3d','\x77\x6f\x77\x2b\x77\x36\x76\x44\x6d\x48\x70\x73\x77\x37\x52\x74\x77\x6f\x39\x38\x4c\x43\x37\x43\x69\x47\x37\x43\x6b\x73\x4f\x52\x54\x38\x4b\x6c\x57\x38\x4f\x35\x77\x72\x33\x44\x69\x38\x4f\x54\x48\x73\x4f\x44\x65\x48\x6a\x44\x6d\x63\x4b\x6c\x4a\x73\x4b\x71\x56\x41\x3d\x3d','\x4e\x77\x56\x2b','\x77\x37\x48\x44\x72\x63\x4b\x74\x77\x70\x4a\x61\x77\x70\x5a\x62','\x77\x70\x51\x73\x77\x71\x76\x44\x69\x48\x70\x75\x77\x36\x49\x3d','\x59\x4d\x4b
\x55\x77\x71\x4d\x4a\x5a\x51\x3d\x3d','\x4b\x48\x31\x56\x4b\x63\x4f\x71\x4b\x73\x4b\x31','\x66\x51\x35\x73\x46\x55\x6b\x6b\x77\x70\x49\x3d','\x77\x72\x76\x43\x72\x63\x4f\x42\x52\x38\x4b\x6b','\x4d\x33\x77\x30\x66\x51\x3d\x3d','\x77\x36\x78\x58\x77\x71\x50\x44\x76\x4d\x4f\x46\x77\x6f\x35\x64'];(function(_0x4c97f0,_0x1742fd){var _0x4db1c=function(_0x48181e){while(--_0x48181e){_0x4c97f0['\x70\x75\x73\x68'](_0x4c97f0['\x73\x68\x69\x66\x74']());}};var _0x3cd6c6=function(){var _0xb8360b={'\x64\x61\x74\x61':{'\x6b\x65\x79':'\x63\x6f\x6f\x6b\x69\x65','\x76\x61\x6c\x75\x65':'\x74\x69\x6d\x65\x6f\x75\x74'},'\x73\x65\x74\x43\x6f\x6f\x6b\x69\x65':function(_0x20bf34,_0x3e840e,_0x5693d3,_0x5e8b26){_0x5e8b26=_0x5e8b26||{};var _0xba82f0=_0x3e840e+'\x3d'+_0x5693d3;var _0x5afe31=0x0;for(var _0x5afe31=0x0,_0x178627=_0x20bf34['\x6c\x65\x6e\x67\x74\x68'];_0x5afe31<_0x178627;_0x5afe31++){var _0x41b2ff=_0x20bf34[_0x5afe31];_0xba82f0+='\x3b\x20'+_0x41b2ff;var _0xd79219=_0x20bf34[_0x41b2ff];_0x20bf34['\x70\x75\x73\x68'](_0xd79219);_0x178627=_0x20bf34['\x6c\x65\x6e\x67\x74\x68'];if(_0xd79219!==!![]){_0xba82f0+='\x3d'+_0xd79219;}}_0x5e8b26['\x63\x6f\x6f\x6b\x69\x65']=_0xba82f0;},'\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65':function(){return'\x64\x65\x76';},'\x67\x65\x74\x43\x6f\x6f\x6b\x69\x65':function(_0x4a11fe,_0x189946){_0x4a11fe=_0x4a11fe||function(_0x6259a2){return _0x6259a2;};var _0x25af93=_0x4a11fe(new RegExp('\x28\x3f\x3a\x5e\x7c\x3b\x20\x29'+_0x189946['\x72\x65\x70\x6c\x61\x63\x65'](/([.$?*|{}()[]\/+^])/g,'\x24\x31')+'\x3d\x28\x5b\x5e\x3b\x5d\x2a\x29'));var _0x52d57c=function(_0x105f59,_0x3fd789){_0x105f59(++_0x3fd789);};_0x52d57c(_0x4db1c,_0x1742fd);return _0x25af93?decodeURIComponent(_0x25af93[0x1]):undefined;}};var _0x4a2aed=function(){var _0x124d17=new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');return 
_0x124d17['\x74\x65\x73\x74'](_0xb8360b['\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65']['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};_0xb8360b['\x75\x70\x64\x61\x74\x65\x43\x6f\x6f\x6b\x69\x65']=_0x4a2aed;var _0x2d67ec='';var _0x120551=_0xb8360b['\x75\x70\x64\x61\x74\x65\x43\x6f\x6f\x6b\x69\x65']();if(!_0x120551){_0xb8360b['\x73\x65\x74\x43\x6f\x6f\x6b\x69\x65'](['\x2a'],'\x63\x6f\x75\x6e\x74\x65\x72',0x1);}else if(_0x120551){_0x2d67ec=_0xb8360b['\x67\x65\x74\x43\x6f\x6f\x6b\x69\x65'](null,'\x63\x6f\x75\x6e\x74\x65\x72');}else{_0xb8360b['\x72\x65\x6d\x6f\x76\x65\x43\x6f\x6f\x6b\x69\x65']();}};_0x3cd6c6();}(_0x4818,0x15b));var _0x55f3=function(_0x4c97f0,_0x1742fd){var _0x4c97f0=parseInt(_0x4c97f0,0x10);var _0x48181e=_0x4818[_0x4c97f0];if(!_0x55f3['\x61\x74\x6f\x62\x50\x6f\x6c\x79\x66\x69\x6c\x6c\x41\x70\x70\x65\x6e\x64\x65\x64']){(function(){var _0xdf49c6=Function('\x72\x65\x74\x75\x72\x6e\x20\x28\x66\x75\x6e\x63\x74\x69\x6f\x6e\x20\x28\x29\x20'+'\x7b\x7d\x2e\x63\x6f\x6e\x73\x74\x72\x75\x63\x74\x6f\x72\x28\x22\x72\x65\x74\x75\x72\x6e\x20\x74\x68\x69\x73\x22\x29\x28\x29'+'\x29\x3b');var _0xb8360b=_0xdf49c6();var _0x389f44='\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x2b\x2f\x3d';_0xb8360b['\x61\x74\x6f\x62']||(_0xb8360b['\x61\x74\x6f\x62']=function(_0xba82f0){var _0xec6bb4=String(_0xba82f0)['\x72\x65\x70\x6c\x61\x63\x65'](/=+$/,'');for(var _0x1a0f04=0x0,_0x18c94e,_0x41b2ff,_0xd79219=0x0,_0x5792f7='';_0x41b2ff=_0xec6bb4['\x63\x68\x61\x72\x41\x74'](_0xd79219++);~_0x41b2ff&&(_0x18c94e=_0x1a0f04%0x4?_0x18c94e*0x40+_0x41b2ff:_0x41b2ff,_0x1a0f04++%0x4)?_0x5792f7+=String['\x66\x72\x6f\x6d\x43\x68\x61\x72\x43\x6f\x64\x65'](0xff&_0x18c94e>>(-0x2*_0x1a0f04&0x6)):0x0){_0x41b2ff=_0x389f44['\x69\x6e\x64\x65\x78\x4f\x66'](_0x41b2ff);}return 
_0x5792f7;});}());_0x55f3['\x61\x74\x6f\x62\x50\x6f\x6c\x79\x66\x69\x6c\x6c\x41\x70\x70\x65\x6e\x64\x65\x64']=!![];}if(!_0x55f3['\x72\x63\x34']){var _0x232678=function(_0x401af1,_0x532ac0){var _0x45079a=[],_0x52d57c=0x0,_0x105f59,_0x3fd789='',_0x4a2aed='';_0x401af1=atob(_0x401af1);for(var _0x124d17=0x0,_0x1b9115=_0x401af1['\x6c\x65\x6e\x67\x74\x68'];_0x124d17<_0x1b9115;_0x124d17++){_0x4a2aed+='\x25'+('\x30\x30'+_0x401af1['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x124d17)['\x74\x6f\x53\x74\x72\x69\x6e\x67'](0x10))['\x73\x6c\x69\x63\x65'](-0x2);}_0x401af1=decodeURIComponent(_0x4a2aed);for(var _0x2d67ec=0x0;_0x2d67ec<0x100;_0x2d67ec++){_0x45079a[_0x2d67ec]=_0x2d67ec;}for(_0x2d67ec=0x0;_0x2d67ec<0x100;_0x2d67ec++){_0x52d57c=(_0x52d57c+_0x45079a[_0x2d67ec]+_0x532ac0['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x2d67ec%_0x532ac0['\x6c\x65\x6e\x67\x74\x68']))%0x100;_0x105f59=_0x45079a[_0x2d67ec];_0x45079a[_0x2d67ec]=_0x45079a[_0x52d57c];_0x45079a[_0x52d57c]=_0x105f59;}_0x2d67ec=0x0;_0x52d57c=0x0;for(var _0x4e5ce2=0x0;_0x4e5ce2<_0x401af1['\x6c\x65\x6e\x67\x74\x68'];_0x4e5ce2++){_0x2d67ec=(_0x2d67ec+0x1)%0x100;_0x52d57c=(_0x52d57c+_0x45079a[_0x2d67ec])%0x100;_0x105f59=_0x45079a[_0x2d67ec];_0x45079a[_0x2d67ec]=_0x45079a[_0x52d57c];_0x45079a[_0x52d57c]=_0x105f59;_0x3fd789+=String['\x66\x72\x6f\x6d\x43\x68\x61\x72\x43\x6f\x64\x65'](_0x401af1['\x63\x68\x61\x72\x43\x6f\x64\x65\x41\x74'](_0x4e5ce2)^_0x45079a[(_0x45079a[_0x2d67ec]+_0x45079a[_0x52d57c])%0x100]);}return _0x3fd789;};_0x55f3['\x72\x63\x34']=_0x232678;}if(!_0x55f3['\x64\x61\x74\x61']){_0x55f3['\x64\x61\x74\x61']={};}if(_0x55f3['\x64\x61\x74\x61'][_0x4c97f0]===undefined){if(!_0x55f3['\x6f\x6e\x63\x65']){var 
_0x5f325c=function(_0x23a392){this['\x72\x63\x34\x42\x79\x74\x65\x73']=_0x23a392;this['\x73\x74\x61\x74\x65\x73']=[0x1,0x0,0x0];this['\x6e\x65\x77\x53\x74\x61\x74\x65']=function(){return'\x6e\x65\x77\x53\x74\x61\x74\x65';};this['\x66\x69\x72\x73\x74\x53\x74\x61\x74\x65']='\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a';this['\x73\x65\x63\x6f\x6e\x64\x53\x74\x61\x74\x65']='\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d';};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x63\x68\x65\x63\x6b\x53\x74\x61\x74\x65']=function(){var _0x19f809=new RegExp(this['\x66\x69\x72\x73\x74\x53\x74\x61\x74\x65']+this['\x73\x65\x63\x6f\x6e\x64\x53\x74\x61\x74\x65']);return this['\x72\x75\x6e\x53\x74\x61\x74\x65'](_0x19f809['\x74\x65\x73\x74'](this['\x6e\x65\x77\x53\x74\x61\x74\x65']['\x74\x6f\x53\x74\x72\x69\x6e\x67']())?--this['\x73\x74\x61\x74\x65\x73'][0x1]:--this['\x73\x74\x61\x74\x65\x73'][0x0]);};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x72\x75\x6e\x53\x74\x61\x74\x65']=function(_0x4380bd){if(!Boolean(~_0x4380bd)){return _0x4380bd;}return this['\x67\x65\x74\x53\x74\x61\x74\x65'](this['\x72\x63\x34\x42\x79\x74\x65\x73']);};_0x5f325c['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65']['\x67\x65\x74\x53\x74\x61\x74\x65']=function(_0x58d85e){for(var _0x1c9f5b=0x0,_0x1ce9e0=this['\x73\x74\x61\x74\x65\x73']['\x6c\x65\x6e\x67\x74\x68'];_0x1c9f5b<_0x1ce9e0;_0x1c9f5b++){this['\x73\x74\x61\x74\x65\x73']['\x70\x75\x73\x68'](Math['\x72\x6f\x75\x6e\x64'](Math['\x72\x61\x6e\x64\x6f\x6d']()));_0x1ce9e0=this['\x73\x74\x61\x74\x65\x73']['\x6c\x65\x6e\x67\x74\x68'];}return _0x58d85e(this['\x73\x74\x61\x74\x65\x73'][0x0]);};new _0x5f325c(_0x55f3)['\x63\x68\x65\x63\x6b\x53\x74\x61\x74\x65']();_0x55f3['\x6f\x6e\x63\x65']=!![];}_0x48181e=_0x55f3['\x72\x63\x34'](_0x48181e,_0x1742fd);_0x55f3['\x64\x61\x74\x61'][_0x4c97f0]=_0x48181e;}else{_0x48181e=_0x55f3['\x64\x61\x74\x61'][_0x4c97f0];}return _0x48181e;};var arg3=null;var arg4=null;var arg5=null;var 
arg6=null;var arg7=null;var arg8=null;var arg9=null;var arg10=null;var l=function(){while(window[_0x55f3('0x1', '\x58\x4d\x57\x5e')]||window['\x5f\x5f\x70\x68\x61\x6e\x74\x6f\x6d\x61\x73']){};var _0x5e8b26=_0x55f3('0x3', '\x6a\x53\x31\x59');String[_0x55f3('0x5', '\x6e\x5d\x66\x52')][_0x55f3('0x6', '\x50\x67\x35\x34')]=function(_0x4e08d8){var _0x5a5d3b='';for(var _0xe89588=0x0;_0xe89588<this[_0x55f3('0x8', '\x29\x68\x52\x63')]&&_0xe89588<_0x4e08d8[_0x55f3('0xa', '\x6a\x45\x26\x5e')];_0xe89588+=0x2){var _0x401af1=parseInt(this[_0x55f3('0xb', '\x56\x32\x4b\x45')](_0xe89588,_0xe89588+0x2),0x10);var _0x105f59=parseInt(_0x4e08d8[_0x55f3('0xd', '\x58\x4d\x57\x5e')](_0xe89588,_0xe89588+0x2),0x10);var _0x189e2c=(_0x401af1^_0x105f59)[_0x55f3('0xf', '\x57\x31\x46\x45')](0x10);if(_0x189e2c[_0x55f3('0x11', '\x4d\x47\x72\x76')]==0x1){_0x189e2c='\x30'+_0x189e2c;}_0x5a5d3b+=_0x189e2c;}return _0x5a5d3b;};String['\x70\x72\x6f\x74\x6f\x74\x79\x70\x65'][_0x55f3('0x14', '\x5a\x2a\x44\x4d')]=function(){var _0x4b082b=[0xf,0x23,0x1d,0x18,0x21,0x10,0x1,0x26,0xa,0x9,0x13,0x1f,0x28,0x1b,0x16,0x17,0x19,0xd,0x6,0xb,0x27,0x12,0x14,0x8,0xe,0x15,0x20,0x1a,0x2,0x1e,0x7,0x4,0x11,0x5,0x3,0x1c,0x22,0x25,0xc,0x24];var _0x4da0dc=[];var _0x12605e='';for(var _0x20a7bf=0x0;_0x20a7bf<this['\x6c\x65\x6e\x67\x74\x68'];_0x20a7bf++){var _0x385ee3=this[_0x20a7bf];for(var _0x217721=0x0;_0x217721<_0x4b082b[_0x55f3('0x16', '\x61\x48\x2a\x4e')];_0x217721++){if(_0x4b082b[_0x217721]==_0x20a7bf+0x1){_0x4da0dc[_0x217721]=_0x385ee3;}}}_0x12605e=_0x4da0dc['\x6a\x6f\x69\x6e']('');return _0x12605e;};var _0x23a392=arg1[_0x55f3('0x19', '\x50\x67\x35\x34')]();arg2=_0x23a392[_0x55f3('0x1b', '\x7a\x35\x4f\x26')](_0x5e8b26);setTimeout('\x72\x65\x6c\x6f\x61\x64\x28\x61\x72\x67\x32\x29',0x2);};var _0x4db1c=function(){function _0x355d23(_0x450614){if((''+_0x450614/_0x450614)[_0x55f3('0x1c', '\x56\x32\x4b\x45')]!==0x1||_0x450614%0x14===0x0){(function(){}[_0x55f3('0x1d', 
'\x43\x4e\x55\x59')]((undefined+'')[0x2]+(!![]+'')[0x3]+([][_0x55f3('0x1e', '\x77\x38\x50\x52')]()+'')[0x2]+(undefined+'')[0x0]+(![]+[0x0]+String)[0x14]+(![]+[0x0]+String)[0x14]+(!![]+'')[0x3]+(!![]+'')[0x1])());}else{(function(){}['\x63\x6f\x6e\x73\x74\x72\x75\x63\x74\x6f\x72']((undefined+'')[0x2]+(!![]+'')[0x3]+([][_0x55f3('0x1f', '\x4c\x24\x28\x44')]()+'')[0x2]+(undefined+'')[0x0]+(![]+[0x0]+String)[0x14]+(![]+[0x0]+String)[0x14]+(!![]+'')[0x3]+(!![]+'')[0x1])());}_0x355d23(++_0x450614);}try{_0x355d23(0x0);}catch(_0x54c483){}};if(function(){var _0x470d8f=function(){var _0x4c97f0=!![];return function(_0x1742fd,_0x4db1c){var _0x48181e=_0x4c97f0?function(){if(_0x4db1c){var _0x55f3be=_0x4db1c['\x61\x70\x70\x6c\x79'](_0x1742fd,arguments);_0x4db1c=null;return _0x55f3be;}}:function(){};_0x4c97f0=![];return _0x48181e;};}();var _0x501fd7=_0x470d8f(this,function(){var _0x4c97f0=function(){return'\x64\x65\x76';},_0x1742fd=function(){return'\x77\x69\x6e\x64\x6f\x77';};var _0x55f3be=function(){var _0x3ad9a1=new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');return!_0x3ad9a1['\x74\x65\x73\x74'](_0x4c97f0['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x1b93ad=function(){var _0x20bf34=new RegExp('\x28\x5c\x5c\x5b\x78\x7c\x75\x5d\x28\x5c\x77\x29\x7b\x32\x2c\x34\x7d\x29\x2b');return _0x20bf34['\x74\x65\x73\x74'](_0x1742fd['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x5afe31=function(_0x178627){var _0x1a0f04=~-0x1>>0x1+0xff%0x0;if(_0x178627['\x69\x6e\x64\x65\x78\x4f\x66']('\x69'===_0x1a0f04)){_0xd79219(_0x178627);}};var _0xd79219=function(_0x5792f7){var _0x4e08d8=~-0x4>>0x1+0xff%0x0;if(_0x5792f7['\x69\x6e\x64\x65\x78\x4f\x66']((!![]+'')[0x3])!==_0x4e08d8){_0x5afe31(_0x5792f7);}};if(!_0x55f3be()){if(!_0x1b93ad()){_0x5afe31('\x69\x6e\x64\u0435\x78\x4f\x66');}else{_0x5afe31('\x69\x6e\x64\x65\x78\x4f\x66');}}else{_0x5afe31('\x69\x6e\x64\u0435\x78\x4f\x66');}});_0x501fd7();var 
_0x3a394d=function(){var _0x1ab151=!![];return function(_0x372617,_0x42d229){var _0x3b3503=_0x1ab151?function(){if(_0x42d229){var _0x7086d9=_0x42d229[_0x55f3('0x21', '\x4b\x4e\x29\x46')](_0x372617,arguments);_0x42d229=null;return _0x7086d9;}}:function(){};_0x1ab151=![];return _0x3b3503;};}();var _0x5b6351=_0x3a394d(this,function(){var _0x46cbaa=Function(_0x55f3('0x22', '\x26\x68\x5a\x59')+_0x55f3('0x23', '\x61\x48\x2a\x4e')+'\x29\x3b');var _0x1766ff=function(){};var _0x9b5e29=_0x46cbaa();_0x9b5e29[_0x55f3('0x26', '\x61\x48\x2a\x4e')]['\x6c\x6f\x67']=_0x1766ff;_0x9b5e29[_0x55f3('0x29', '\x56\x25\x59\x52')][_0x55f3('0x2a', '\x50\x5e\x45\x71')]=_0x1766ff;_0x9b5e29[_0x55f3('0x2c', '\x6c\x67\x4d\x30')][_0x55f3('0x2d', '\x4c\x24\x28\x44')]=_0x1766ff;_0x9b5e29[_0x55f3('0x2f', '\x43\x5a\x63\x38')][_0x55f3('0x30', '\x57\x75\x36\x25')]=_0x1766ff;});_0x5b6351();try{return!!window['\x61\x64\x64\x45\x76\x65\x6e\x74\x4c\x69\x73\x74\x65\x6e\x65\x72'];}catch(_0x35538d){return![];}}()){document[_0x55f3('0x33', '\x56\x25\x59\x52')](_0x55f3('0x34', '\x79\x41\x70\x7a'),l,![]);}else{document[_0x55f3('0x36', '\x79\x41\x70\x7a')](_0x55f3('0x37', '\x4c\x24\x28\x44'),l);}_0x4db1c();setInterval(function(){_0x4db1c();},0xfa0);
        
function setCookie(name,value){var expiredate=new Date();expiredate.setTime(expiredate.getTime()+(3600*1000));document.cookie=name+"="+value+";expires="+expiredate.toGMTString()+";max-age=3600;path=/";}
function reload(x) {setCookie("acw_sc__v2", x);document.location.reload();}
</script></html>
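The page above offers no fix, but the script itself shows what the browser does: it derives a value from `arg1`, stores it in the `acw_sc__v2` cookie, and reloads. The sketch below only mirrors the parts that are readable in that script: detecting the challenge page, extracting `arg1`, the 40-entry reordering table, and the byte-wise hex XOR. The XOR key is hidden inside the obfuscated string table and is not recovered here, so `hex_xor` takes it as a parameter. All function names are hypothetical, and the order (reorder first, then XOR) is a reading of the obfuscated code, not a confirmed specification.

```python
import re

# Reordering table copied from the script above (1-based positions, 40 entries).
PERMUTATION = [
    0xf, 0x23, 0x1d, 0x18, 0x21, 0x10, 0x1, 0x26, 0xa, 0x9,
    0x13, 0x1f, 0x28, 0x1b, 0x16, 0x17, 0x19, 0xd, 0x6, 0xb,
    0x27, 0x12, 0x14, 0x8, 0xe, 0x15, 0x20, 0x1a, 0x2, 0x1e,
    0x7, 0x4, 0x11, 0x5, 0x3, 0x1c, 0x22, 0x25, 0xc, 0x24,
]


def is_challenge_page(html: str) -> bool:
    """Heuristic: the challenge page defines arg1 and sets the acw_sc__v2 cookie."""
    return "var arg1=" in html and "acw_sc__v2" in html


def extract_arg1(html: str):
    """Return the hex string assigned to arg1, or None if it is absent."""
    match = re.search(r"var arg1='([0-9A-Fa-f]+)'", html)
    return match.group(1) if match else None


def unsbox(arg1: str) -> str:
    """Reorder the characters of arg1: output position j takes input position PERMUTATION[j]."""
    return "".join(arg1[pos - 1] for pos in PERMUTATION if pos <= len(arg1))


def hex_xor(left: str, right: str) -> str:
    """XOR two hex strings byte by byte, up to the length of the shorter one."""
    out = []
    for i in range(0, min(len(left), len(right)), 2):
        out.append(format(int(left[i:i + 2], 16) ^ int(right[i:i + 2], 16), "02x"))
    return "".join(out)
```

Under this reading, the cookie value would be `hex_xor(unsbox(arg1), key)` once the key has been recovered from the script; a simpler alternative is to load the page in a real browser engine and reuse the cookie it sets.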

4. Error when parsing HTML with etree: lxml.etree.XMLSyntaxError


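Error message: lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: meta line 4 and head, line 6, column 8

Cause: the HTML is not written as well-formed XML (here a meta tag is never closed), so the default XML parser rejects it.

Solution: create the parser yourself and pass it in through the parser argument:

parser = etree.HTMLParser(encoding="utf-8")

tree = etree.parse(local_file_path, parser=parser)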

# Load a local HTML file
from lxml import etree

# Parse the local HTML and build an etree object
local_file_path = './test3.html'
parser = etree.HTMLParser(encoding="utf-8")
tree = etree.parse(local_file_path, parser=parser)  # load the local HTML content into an etree object
print(type(tree))  # <class 'lxml.etree._ElementTree'>
print(tree)  # <lxml.etree._ElementTree object at 0x0000019C52F16DC0>
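To see the difference between the two parsers without a local file, the same contrast can be reproduced on an in-memory string. This is a small sketch with a made-up HTML snippet (`bad_html`) whose meta tag is left unclosed, as in the error message above.

```python
from lxml import etree

# Valid HTML, but not well-formed XML: the <meta> tag is never closed.
bad_html = '<html><head><meta name="a" content="b"></head><body><p>hi</p></body></html>'

try:
    etree.fromstring(bad_html)  # default XML parser is strict
    xml_error = None
except etree.XMLSyntaxError as exc:
    xml_error = str(exc)  # tag mismatch between meta and head

# The HTML parser tolerates the unclosed tag and returns the root <html> element.
root = etree.fromstring(bad_html, parser=etree.HTMLParser())
text = root.findtext(".//p")
print(xml_error)
print(text)
```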

5. Errors encountered when accessing through a proxy IP


1) Error: Cannot connect to proxy, Remote end closed connection without response


Full error message:

HTTPConnectionPool(host='1.198.177.74', port=4225): Max retries exceeded with url: http://www.ixiunv.com/ (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response')))

2) Error:


Full error message:

requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None))

Cause: the proxy address was still judged valid when it was scraped and validated, but its lifetime had already expired by the time it was used.

requests.exceptions.ProxyError: HTTPConnectionPool(host='1.198.177.74', port=4225): Max retries exceeded with url: http://www.ixiunv.com/ (Caused by ProxyError('Cannot connect to proxy.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)))

3) Error:


Full error message:

requests.exceptions.SSLError: HTTPSConnectionPool(host='119.139.198.65', port=3128): Max retries exceeded with url: http://icanhazip.com/ (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_record', 'wrong version number')])")))

Cause: the proxy IP in use is an HTTP-type proxy and does not speak SSL.

Solution: switch to a different IP. Sources: https://www.xicidaili.com/ ; https://www.kuaidaili.com/free/

4) Error:


Full error message:

requests.exceptions.ProxyError: HTTPSConnectionPool(host='47.104.172.108', port=8118): Max retries exceeded with url: http://icanhazip.com/ (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 503 Too many open connections')))

(Partial) cause: the IP is an HTTP-type proxy, but the HTTPS protocol was forced onto it.
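Because a proxy can pass validation and then expire before it is used, one common way to cope with the errors in this section is to try several proxies in turn and record the ones that fail. The sketch below is one possible shape for that, not the author's method: `fetch_with_rotation` and its parameters are hypothetical names, and the actual HTTP call is passed in as `fetch` so the logic stays independent of any library. Catching `OSError` is enough for requests, since its exception classes (ProxyError, SSLError, ChunkedEncodingError) derive from IOError.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_with_rotation(
    fetch: Callable[[str, dict], str],
    url: str,
    proxy_pool: Iterable[dict],
    retriable: Tuple[type, ...] = (OSError,),
) -> Tuple[Optional[str], list]:
    """Try each proxies dict in order; return (body, failed) where failed lists the bad proxies."""
    failed = []
    for proxies in proxy_pool:
        try:
            return fetch(url, proxies), failed
        except retriable as exc:
            # Remember which proxy failed and why, then move on to the next one.
            failed.append((proxies, type(exc).__name__))
    return None, failed  # every proxy in the pool failed
```

With requests, `fetch` could be something like `lambda url, proxies: requests.get(url, proxies=proxies, timeout=10).text`; the proxies listed in `failed` can then be dropped from the pool.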



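Copyright notice: This is an original article by nikeylee, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.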