Crawler Error Log

Post author:xfxia
Post published: September 9, 2023
Post category: Other

Error when saving to the MySQL database:

1. 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \'

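This means the data being written to the database is malformed: the pages can be scraped, but the result cannot be stored. The error message prints out what was scraped, and there you can see that one item is empty and cannot be inserted.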
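
Instead of reading the empty field out of the error message, it can be checked before the insert. The following is only a minimal sketch: `find_empty_fields` is a hypothetical helper that is not part of the original spider, and a plain dict with made-up values stands in for the scraped item.

```python
def find_empty_fields(item):
    """Return the names of fields that are None, empty, or only whitespace."""
    empty = []
    for name, value in dict(item).items():
        if value is None:
            empty.append(name)
        elif isinstance(value, str) and not value.strip():
            empty.append(name)
        elif isinstance(value, (list, tuple)) and not any(str(v).strip() for v in value):
            # .getall() returns a list, so an empty list means nothing matched
            empty.append(name)
    return empty


# Made-up example data; a plain dict stands in for the scraped item.
scraped = {"title": [], "url": "https://example.com/news/1", "body": "  "}
print(find_empty_fields(scraped))
```

Calling something like this in the spider or in the pipeline, and logging the result, would show which field came back empty before MySQL ever sees it.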

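Possible reasons for an empty field:

1. The value was never scraped: either the selector is wrong, or the site's layout is inconsistent, so some pages return a value and others do not.

2. The format is wrong, so the value cannot be stored. (This was the cause this time.)

3. The MySQL side is wrong because I have not practiced it enough.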

    def parse_pages(self, response):
        item = NewsItem()
        title_1 = response.xpath('//section[@id="title-news"]/h2/strong/text()').getall()
        print(response.xpath('//section[@id="title-news"]/h2/strong/text()').getall())
        if title_1 == []:
            title_2= response.xpath('//span[@class="new-title"]/text()').getall()
            item['title'] = title_2
        else:
            item['title'] = title_1

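The content of title_2 was not getting through, so I printed title_2 and found that it really is a list, which has to be converted to a string. It also has extra whitespace at the beginning and end, which might likewise be treated as a bad format and raise an error; if problems remain, strip the whitespace as well.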

    def parse_pages(self, response):
        item = NewsItem()
        title_1 = response.xpath('//section[@id="title-news"]/h2/strong/text()').getall()
        print(response.xpath('//section[@id="title-news"]/h2/strong/text()').getall())
        if title_1 == []:
            title_2= response.xpath('//span[@class="new-title"]/text()').getall()
            title_2 = ''.join(title_2)  # added line: join the list into a single string
            # print(response.xpath('//span[@class="news-title"]/text()').getall())
            # print("111111")
            item['title'] = title_2
        else:
            item['title'] = title_1
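
The fix above only joins the list; the whitespace stripping is mentioned but not shown. A minimal sketch of doing both in one step follows. `clean_title` is a hypothetical helper name, not something from the original spider, and the sample strings are made up.

```python
def clean_title(parts):
    """Join the text nodes returned by .getall() and trim surrounding whitespace."""
    return "".join(parts).strip()


# Made-up sample values shaped like the output of .getall()
print(repr(clean_title(["\n   Some headline  \n"])))
print(repr(clean_title([])))
```

Since title_1 also comes from `.getall()` and is therefore a list too, the same helper could presumably be applied in both branches, for example `item['title'] = clean_title(title_1)`.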

2022.6.24: gaokao scores are out~

Probably nobody else makes this mistake, but it bothered me for two or three weeks, so I am writing it down.

Error while scraping content:

1. 'method' object is not subscriptable

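The cause is a wrong or missing bracket: a method was indexed with square brackets instead of being called with parentheses, or the call parentheses were left out before indexing the result (for example `.getall[0]` instead of `.getall()[0]`).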
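
The error is easy to reproduce outside Scrapy. In this small sketch, `FakeResponse` is a hypothetical stand-in class written only for illustration (it is not Scrapy's response object), and the XPath string is arbitrary.

```python
class FakeResponse:
    """Hypothetical stand-in for a response object; not Scrapy's class."""

    def xpath(self, query):
        # Pretend every query matches these two text nodes.
        return ["first match", "second match"]


response = FakeResponse()

try:
    response.xpath["//h2/text()"]  # square brackets index the method itself
except TypeError as exc:
    message = str(exc)
    print(message)

# Correct: call the method with parentheses, then index the returned list.
first = response.xpath("//h2/text()")[0]
print(first)
```

When this error shows up, the line named in the traceback is the place to look for a `[` that should be a `(`, or for a method name followed directly by an index.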

Original link: https://blog.csdn.net/m0_63991785/article/details/125448212
