Scraping part of a page's HTML with Selenium





1. Start the webdriver

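from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for By.ID in step 3

driver = webdriver.Chrome()
base_url = "https://movie.douban.com/subject/26100958/"
driver.get(base_url)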



2. Get the complete HTML source

print(driver.page_source)
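
The snippet above only prints the page source. If the goal is to keep it on disk, a minimal sketch might look like the following. The helper name `save_html` and the file name `page.html` are hypothetical, and writing UTF-8 is an assumption, not something the original post specifies.

```python
from pathlib import Path


def save_html(html: str, path: str) -> Path:
    """Write an HTML string to `path` as UTF-8 and return the resolved path."""
    target = Path(path)
    target.write_text(html, encoding="utf-8")
    return target.resolve()


# With the driver from step 1 (not executed here, since it needs a browser):
# save_html(driver.page_source, "page.html")
```

The same helper can be reused for a single node by passing `elem.get_property("outerHTML")` instead of `driver.page_source`.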



3. Get part of the HTML, plus other element properties

# Locate the node
elem = driver.find_element(By.ID, "info")

htm_dat = elem.get_property("outerHTML")
print('HTML source of the node:', htm_dat)
htm_name = elem.get_property("nodeName")
print('Node name:', htm_name)
htm_type = elem.get_property("nodeType")
print('Node type:', htm_type)
htm_ght = elem.get_property("clientHeight")
print('Client height of the node:', htm_ght)
htm_dth = elem.get_property("clientWidth")
print('Client width of the node:', htm_dth)
htm_node_name = elem.get_property("parentNode").get_property("nodeName")
print("Name of the node's parent:", htm_node_name)
# nextElementSibling rather than nextSibling: the latter may be a text node, which has no outerHTML
htm_next_htm = elem.get_property("nextElementSibling").get_property("outerHTML")
print('HTML source of the next sibling element:', htm_next_htm)
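
The repeated `get_property` calls above can be collected into one dictionary. The following is only a sketch: `element_summary` and `DEFAULT_PROPS` are hypothetical names, and the helper assumes nothing about the element beyond a `get_property(name)` method, which Selenium's `WebElement` provides.

```python
from typing import Any, Dict, Iterable

# DOM properties read in the walkthrough above
DEFAULT_PROPS = ("nodeName", "nodeType", "clientHeight", "clientWidth", "outerHTML")


def element_summary(elem: Any, names: Iterable[str] = DEFAULT_PROPS) -> Dict[str, Any]:
    """Read several DOM properties from one element into a dict."""
    return {name: elem.get_property(name) for name in names}


# With the driver from step 1 (not executed here, since it needs a browser):
# info = element_summary(driver.find_element(By.ID, "info"))
# print(info["outerHTML"])
```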



Alternative 1: BeautifulSoup

Install: pip3 install beautifulsoup4
	 pip install lxml
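
The post names BeautifulSoup without showing how to use it. One possible way to pull a single node out of an HTML string is sketched below. `extract_by_id` and the sample markup are hypothetical, and the `"lxml"` parser is assumed to be installed as in the commands above.

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_by_id(html: str, element_id: str, parser: str = "lxml") -> Optional[str]:
    """Return the outer HTML of the element with the given id, or None if absent."""
    soup = BeautifulSoup(html, parser)
    node = soup.find(id=element_id)
    return str(node) if node is not None else None


# Made-up sample markup; with Selenium one would pass driver.page_source instead.
sample = '<html><body><div id="info"><span>Director</span></div><p>other</p></body></html>'
print(extract_by_id(sample, "info"))
```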



Alternative 2: etree.HTML() from the lxml library

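Install: pip install lxml
	 pip install bs4        (not needed for etree.HTML(); only for BeautifulSoup)
	 pip install html5lib   (not needed for etree.HTML(); an optional BeautifulSoup parser)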



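Copyright notice: this is an original article by Peter_cat0, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.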