Web Scraping: Action Chains, XPath, and Captcha-Solving Platforms

Contents

I. Action chains
II. XPath
III. Using a captcha-solving platform

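I. Action chains

Action chains simulate holding the mouse button down and dragging, or clicking at a specific position on an element. They are mainly used to get past captchas, slider captchas in particular.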

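Action chain methods:

Method	Effect
click(on_element=None)	Click the left mouse button
click_and_hold(on_element=None)	Press the left mouse button without releasing it
context_click(on_element=None)	Click the right mouse button
double_click(on_element=None)	Double-click the left mouse button
drag_and_drop(source, target)	Drag the source element onto the target element, then release
drag_and_drop_by_offset(source, xoffset, yoffset)	Drag the source element by the given offset, then release
key_down(value, element=None)	Press a key on the keyboard without releasing it
key_up(value, element=None)	Release a key
move_by_offset(xoffset, yoffset)	Move the mouse by an offset from its current position
move_to_element(to_element)	Move the mouse onto an element
move_to_element_with_offset(to_element, xoffset, yoffset)	Move the mouse to an offset relative to an element (measured from the top-left corner in older Selenium releases; newer releases measure from the element's center)
perform()	Run every action queued on the chain (normally called once all actions have been added)
release(on_element=None)	Release the held left mouse button on an element
send_keys(*keys_to_send)	Send keys to the element that currently has focus
send_keys_to_element(element, *keys_to_send)	Send keys to the given element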

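Using an action chain:

1. Create the action chain object from the driver, and locate the elements it will act on.

2. Add actions to the chain; they run in the order in which they were added.

3. Once every action has been added, call perform() to run them.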

# Action chain module
from selenium.webdriver import ActionChains
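The three steps above come down to "queue first, run on perform()". The following is a toy model of that behavior only; `ToyActionChain` is a made-up class for illustration and is not how Selenium implements `ActionChains`. It needs no browser, so it can be run as-is.

```python
class ToyActionChain:
    """Toy stand-in for an action chain: queues actions, runs them on perform()."""

    def __init__(self):
        self._queue = []   # actions waiting to run, in insertion order
        self.log = []      # names of the actions that have actually executed

    def _add(self, name):
        self._queue.append(lambda: self.log.append(name))
        return self        # returning self is what makes chained calls possible

    def click_and_hold(self):
        return self._add("click_and_hold")

    def move_by_offset(self, x, y):
        return self._add(f"move_by_offset({x}, {y})")

    def release(self):
        return self._add("release")

    def perform(self):
        for action in self._queue:
            action()
        self._queue.clear()


chain = ToyActionChain().click_and_hold().move_by_offset(20, 0).release()
queued_before_perform = list(chain.log)   # still empty: nothing has run yet
chain.perform()
print(queued_before_perform, chain.log)
```

The point to take away is that adding an action has no visible effect by itself; forgetting the final perform() call is a common reason a chain appears to do nothing.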

Simulating a slider captcha

from selenium import webdriver
from selenium.webdriver import ActionChains
import time
from selenium.webdriver.common.by import By

driver = webdriver.Edge()
driver.get('http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
driver.implicitly_wait(10)  # implicit wait: element lookups retry for up to 10 seconds
driver.maximize_window()

try:
    driver.switch_to.frame('iframeResult')  # the demo lives inside the iframeResult frame
    source = driver.find_element(By.ID, 'draggable')
    target = driver.find_element(By.ID, 'droppable')

    # Approach 1: queue every action on one chain, then run them in order
    # actions = ActionChains(driver)         # create the action chain object
    # actions.drag_and_drop(source, target)  # shortcut for the three steps below

    actions = ActionChains(driver)
    actions.click_and_hold(source)   # press and hold the left button on source
    actions.move_to_element(target)  # move the pointer onto target while holding
    actions.release()                # release the button to drop source there
    actions.perform()                # nothing runs until perform() is called
    time.sleep(10)  # keep the window open long enough to see the result

finally:
    driver.close()

from selenium import webdriver
from selenium.webdriver import ActionChains
import time
from selenium.webdriver.common.by import By

driver = webdriver.Edge()
driver.get('http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
driver.implicitly_wait(10)  # implicit wait: element lookups retry for up to 10 seconds
driver.maximize_window()

try:
    driver.switch_to.frame('iframeResult')  # the demo lives inside the iframeResult frame
    source = driver.find_element(By.ID, 'draggable')
    target = driver.find_element(By.ID, 'droppable')

    # Approach 2: a separate chain per move, so each move can use its own offset

    ActionChains(driver).click_and_hold(source).perform()
    distance = target.location['x'] - source.location['x']  # x-axis distance between the two elements
    track = 0
    while track < distance:
        step = min(20, distance - track)  # clamp the last move so it does not overshoot
        ActionChains(driver).move_by_offset(xoffset=step, yoffset=0).perform()
        track += step  # distance covered so far; only used for the loop condition
    ActionChains(driver).release().perform()

    time.sleep(10)  # keep the window open long enough to see the result

finally:
    driver.close()
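Approach 2 says each move can use a different displacement, but the loop uses the same 20 px every time. As a minimal sketch of how varying steps could be produced, the helper below splits a distance into shrinking integer steps that add up exactly to that distance. The name `build_track` and its parameters (`first_step`, `decay`, `min_step`) are assumptions made for illustration; they are not part of Selenium.

```python
def build_track(distance, first_step=30, decay=0.8, min_step=2):
    """Split `distance` pixels into shrinking integer steps that sum to `distance`."""
    steps = []
    remaining = int(distance)
    step = float(first_step)
    while remaining > 0:
        move = max(min_step, int(step))   # never drop below min_step
        move = min(move, remaining)       # never overshoot the target
        steps.append(move)
        remaining -= move
        step *= decay                     # each step is shorter than the last
    return steps


track = build_track(100)
print(track, sum(track))
```

Each value in the returned list could then be passed as xoffset to move_by_offset, one chain per value, in place of the fixed 20.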

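II. XPath

Parsing libraries generally provide their own methods for searching tags, and most of them also support CSS selectors and XPath.

XPath is a language for finding information in XML documents.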

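Symbol	Meaning
tag name	Selects tags with that name, such as div, p, a
/	Selects direct children of the current node
//	Selects descendants of the current node at any depth
.	The current node
..	The parent node
@	Selects an attribute, as in @id='xxx' or @href='www.baidu.com'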
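To see the symbols in the table at work without a browser, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath (enough for `/`, `//`, `.`, `..` and attribute tests). The sample document and its ids are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<div id='main'><p>direct</p><span><p>nested</p></span></div>"
    "<div id='side'><a href='www.baidu.com'>link</a></div>"
    "</body></html>"
)

main = doc.find(".//div[@id='main']")                 # // with an @id attribute test
children = [p.text for p in main.findall("./p")]      # / : direct children only
descendants = [p.text for p in main.findall(".//p")]  # // : any depth below main
parent = doc.find(".//a/..")                          # .. : the parent of the a tag
href = doc.find(".//a").get("href")                   # attribute value read with .get()

print(children, descendants, parent.get("id"), href)
```

ElementTree cannot return an attribute value directly from a path such as //a/@href, which is why the sketch reads it with .get(); fuller XPath engines can.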

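For example:

//*[@id="cnblogs_post_body"]/p[9]/strong
1. Select the element whose id is cnblogs_post_body: //*[@id="cnblogs_post_body"]
2. Among its direct children, take the ninth p tag (XPath indexes start at 1): /p[9]
3. Under that p tag, take the strong tag: /strong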
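The same expression can be tried against a small document. This is a sketch under assumptions: the ten-paragraph page is generated here purely for illustration, and the standard-library `xml.etree.ElementTree` stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Build a made-up page: ten paragraphs, each containing one strong tag
paragraphs = "".join(
    f"<p>para {i} <strong>bold {i}</strong></p>" for i in range(1, 11)
)
doc = ET.fromstring(
    f"<html><body><div id='cnblogs_post_body'>{paragraphs}</div></body></html>"
)

# Full XPath would be //*[@id="cnblogs_post_body"]/p[9]/strong;
# ElementTree needs a leading "." because the search is relative to `doc`.
node = doc.find(".//*[@id='cnblogs_post_body']/p[9]/strong")
print(node.text)
```

Because p[9] counts from 1, the match is the strong tag inside the ninth paragraph, not the tenth.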

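III. Using a captcha-solving platform

Cracking captchas

Simple combinations of digits and letters can be handled with image recognition (off-the-shelf Python modules exist), but the success rate is low.

The alternative is a third-party captcha-solving platform. These are paid services: you send the captcha image, and the platform returns the recognized text.

For example:

Chaojiying (超级鹰)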

from selenium import webdriver
from selenium.webdriver.common.by import By
from PIL import Image
from chaojiying import ChaojiyingClient  # local client module for the platform (not shown here)

bro = webdriver.Chrome()
bro.get('http://www.chaojiying.com/user/login/')
bro.maximize_window()

try:
    bro.save_screenshot('main.png')  # screenshot of the current page
    img = bro.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/div/img')
    location = img.location
    size = img.size
    print(location)
    print(size)
    # Crop box for Pillow: (left, upper, right, lower) of the captcha inside the screenshot
    box = (
        int(location['x']),
        int(location['y']),
        int(location['x'] + size['width']),
        int(location['y'] + size['height']),
    )
    page = Image.open('./main.png')  # open the full screenshot
    captcha = page.crop(box)         # cut out the captcha
    captcha.save('code.png')         # the small cropped image

    # Replace the placeholders with your own account details
    chaojiying = ChaojiyingClient('your_username', 'your_password', 'your_soft_id')
    with open('code.png', 'rb') as f:  # send the cropped captcha that was just saved
        im = f.read()
    print(chaojiying.PostPic(im, 1902))  # 1902 is the captcha type code

except Exception as exc:
    print('captcha recognition failed:', exc)  # report the error instead of hiding it
finally:
    bro.close()
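One thing that can go wrong with the crop step: on a display with scaling enabled, the screenshot may contain more pixels than the coordinates Selenium reports, so the box lands in the wrong place. The sketch below isolates the box arithmetic in a helper and adds a scale factor. The function name `crop_box`, the `scale` parameter, and the sample numbers are assumptions for illustration; in a real session the factor could be read with something like driver.execute_script("return window.devicePixelRatio").

```python
def crop_box(location, size, scale=1.0):
    """Return a Pillow-style (left, upper, right, lower) box for an element."""
    left = location["x"] * scale
    upper = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    lower = (location["y"] + size["height"]) * scale
    return tuple(int(round(v)) for v in (left, upper, right, lower))


# Hypothetical values shaped like Selenium's element.location and element.size
location = {"x": 100, "y": 250}
size = {"width": 180, "height": 50}

box_plain = crop_box(location, size)              # no display scaling
box_scaled = crop_box(location, size, scale=1.5)  # display scaled to 150%
print(box_plain, box_scaled)
```

The returned tuple can be passed straight to Image.crop. Another option that avoids the arithmetic altogether may be to call the element's own screenshot('code.png') method, which saves just that element.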

Original link: https://blog.csdn.net/kdq18486588014/article/details/126146640
