Setup used:
Anaconda 4.2.0 (64-bit)
Python 3.5.2
The program:
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.request import Request
import random
import requests
import time

def get_ip_list(obj):
    # Each proxy on xicidaili sits in a <tr class="odd"> row; the IP is in the
    # second <td> and the port in the third.
    ip_text = obj.findAll('tr', {'class': 'odd'})
    ip_list = []
    for i in range(len(ip_text)):
        ip_tag = ip_text[i].findAll('td')
        ip_port = ip_tag[1].get_text() + ':' + ip_tag[2].get_text()
        ip_list.append(ip_port)
    print("Collected {} IPs".format(len(ip_list)))
    return ip_list

def get_random_ip(bsObj):
    # Pick one proxy at random and wrap it in the dict format requests expects.
    ip_list = get_ip_list(bsObj)
    random_ip = "http://" + random.choice(ip_list)
    proxy_ip = {"http": random_ip}
    return proxy_ip

while True:
    # Fetch the proxy list page with a browser-like User-Agent.
    url = 'http://www.xicidaili.com/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36'
    }
    request = Request(url, headers=headers)
    response = urlopen(request)
    bsObj = BeautifulSoup(response, 'lxml')

    # Choose a random proxy from the page and use it for the real request.
    random_ip = get_random_ip(bsObj)
    print(random_ip)
    hz_url = 'http://**************'
    proxies = random_ip
    hz_r = requests.get(hz_url, proxies=proxies)
    print(hz_r.status_code)
    print(hz_r.text)
    time.sleep(3)
I won't post a screenshot of the output here. Some of the scraped IPs don't work, so the program will eventually stop with an error; when that happens, just run it again and it will continue (so it still needs a human to babysit it).
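One way to avoid those manual restarts is to wrap the proxied request in a try/except and simply pick another IP when one fails. A minimal sketch under the same setup as above (the timeout value and the target_url placeholder are my assumptions, not part of the original program):

# Sketch: keep looping and swap in a fresh random proxy whenever a request fails.
target_url = 'http://**************'  # placeholder for the real site, as in the original
while True:
    request = Request(url, headers=headers)
    bsObj = BeautifulSoup(urlopen(request), 'lxml')
    proxies = get_random_ip(bsObj)
    try:
        r = requests.get(target_url, proxies=proxies, timeout=5)  # assumed 5 s timeout
        print(r.status_code)
        print(r.text)
    except requests.exceptions.RequestException as e:
        # The proxy is dead or too slow; log it and let the loop try the next one.
        print("Proxy failed, picking another:", e)
    time.sleep(3)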
Copyright notice: this is an original article by shangxiaqiusuo1, released under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.