Official site: https://2021.naacl.org/program/accepted/
Paper List
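The papers accepted to NAACL 2021 were scraped with a crawler and saved to the Excel file below.
Total: 528 papers
Get the Excel file: Link
Extraction code: 2021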
Quick search tool
A small Python script that searches the paper list by keyword and looks up the PDF links on arXiv
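The search URL that the script below requests can be assembled safely on its own. The following is a minimal sketch, not part of the original script: `build_search_url` and `ARXIV_SEARCH` are hypothetical names, and the query parameters are copied from the URL string used in `get_pdf`. It relies on `urllib.parse.urlencode` so that spaces and characters such as `&` in a title do not break the query string.

```python
from urllib.parse import urlencode

# Base address taken from the URL used in get_pdf; the constant name is an assumption.
ARXIV_SEARCH = "https://arxiv.org/search/"


def build_search_url(title, size=50):
    """Return the arXiv search URL for a paper title (hypothetical helper)."""
    params = {
        "query": title,
        "searchtype": "all",
        "abstracts": "show",
        "order": "-announced_date_first",
        "size": size,
    }
    # urlencode escapes spaces as "+" and reserved characters as %XX.
    return ARXIV_SEARCH + "?" + urlencode(params)


print(build_search_url("Few-Shot Text Classification"))
```

Building the URL this way keeps the network request separate from the string handling, so the encoding can be checked without contacting arXiv.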
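import requests
import pandas as pd
from lxml import etree
from urllib.parse import quote_plus

def get_pdf(key):
    """Search arXiv for `key` and print the text and first link of each result."""
    url_format = "https://arxiv.org/search/?query={}&searchtype=all&abstracts=show&order=-announced_date_first&size=50"
    # Escape the title so spaces and characters such as "&" do not break the query
    rep = requests.get(url_format.format(quote_plus(key)), timeout=30)
    body = etree.HTML(rep.content)
    # This paragraph is expected only on the "no results" page
    ols = body.xpath(r'//*[@id="main-container"]/div[2]/p[1]/text()')
    if ols:
        print("Sorry, your query for all: {} produced no results.".format(key))
    else:
        ols = body.xpath(r'//*[@id="main-container"]/div[2]/ol/li')
        for ol in ols:
            texts = ol.xpath(r'./p[1]/text()')
            links = ol.xpath(r'./div/p/span/a[1]/@href')
            # Skip entries that lack either field instead of raising IndexError
            if texts and links:
                print("[PDF]:", texts[0].replace("\n", "").replace(" ", ""), links[0])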
# Print every title that contains one of the given keywords
def Search_domain_print(key_list, df, withPdf=False):
    # Lower-case and de-duplicate the keywords so matching is case-insensitive
    keys = set([key.lower() for key in key_list])
    for key in keys:
        count = 0
        for i in df["title"].values.tolist():
            if key in i.lower():
                count = count + 1
                print("[{}]-[{}]:{}".format(key, count, i))
                if withPdf:
                    get_pdf(i)
        print()
if __name__ == '__main__':
    key_list = ["Text Classification",
                # "Sentiment Analysis",
                # "Knowledge Graph",
                ]
    # Setting withPdf=True also searches arXiv for each matching title, but it is much slower.
    # You can also call get_pdf("title") on its own to look up a single paper.
    excel = pd.read_excel('data/NAACL2021 Paper List.xlsx')
    Search_domain_print(key_list, excel, withPdf=False)
Result:
Result when fetching PDFs:
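The keyword matching done in `Search_domain_print` can also be separated from the printing, which makes it easier to reuse or check. The following is a minimal sketch under that assumption: `match_titles` is a hypothetical helper that is not part of the original script, and the sample titles are made up for illustration.

```python
def match_titles(titles, key_list):
    """Group titles by the keywords they contain (case-insensitive substring match)."""
    matches = {}
    # De-duplicate keywords after lower-casing; sort for a stable order.
    for key in sorted({k.lower() for k in key_list}):
        # Skip non-string cells (for example, empty spreadsheet rows).
        matches[key] = [t for t in titles if isinstance(t, str) and key in t.lower()]
    return matches


sample_titles = [  # made-up titles, for illustration only
    "A Toy Approach to Text Classification",
    "Knowledge Graph Completion Example",
    "Another text classification baseline",
]

results = match_titles(sample_titles, ["Text Classification", "Knowledge Graph"])
for key, hits in results.items():
    print("[{}]-[{}]".format(key, len(hits)))
    for title in hits:
        print("   ", title)
```

With the spreadsheet loaded through pandas, the title column could be passed as `df["title"].tolist()`.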
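Copyright notice: This is an original article by qq_35891520, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.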