接着上篇博客继续往下走。上篇博客地址:
一、更新代码
vim ITtest.py
import os
import urllib
import urllib.request  # required: `import urllib` alone does not expose urllib.request in Python 3

import scrapy
from scrapy.http.response.html import HtmlResponse  # HtmlResponse type (kept for reference/typing)
from scrapy.selector.unified import SelectorList  # SelectorList type (kept for reference/typing)

from qiushi.items import QiushiItem  # QiushiItem defined in the qiushi project's items.py
class IttestSpider(scrapy.Spider):
    """Crawl the text section of qiushibaike.com.

    For each joke on a page, yield a ``QiushiItem`` (avatar URL, author,
    content) and download the author's avatar into ``../img/``.  Follows
    the "next page" pagination link until it runs out.
    """

    name = 'ITtest'
    allowed_domains = ['www.qiushibaike.com']
    start_urls = ['https://www.qiushibaike.com/text/page/1/']
    bash_domain = "https://www.qiushibaike.com"

    def parse(self, response):
        """Parse one listing page: yield items, download avatars, follow pagination."""
        # Each joke lives in its own <div> under the old-style column container.
        body = response.xpath('//div[@class="col1 old-style-col1"]/div')

        # Ensure the image directory exists ONCE per page (the original
        # re-checked it for every item).  makedirs(exist_ok=True) is
        # race-free, unlike exists() + mkdir().
        path_dir = os.path.join(os.path.dirname(os.getcwd()), 'img')
        os.makedirs(path_dir, exist_ok=True)

        for duanzhi in body:
            # Avatar URL is scheme-relative (starts with //), hence the
            # 'http:' prefix added at download time below.
            touxiang = duanzhi.xpath('.//div//@src').get()
            neirong = duanzhi.xpath('.//div[@class="content"]//text()').getall()
            neirong = "".join(neirong).strip()
            # Guard against a missing <h2>: .get() may return None, and the
            # original's .get().strip() would raise AttributeError.
            zuozhe = duanzhi.xpath('.//div//h2/text()').get()
            if zuozhe:
                zuozhe = zuozhe.strip()
            item = QiushiItem(头像=touxiang, 作者=zuozhe, 内容=neirong)

            if zuozhe and touxiang:
                print(zuozhe, touxiang)
                file_path = os.path.join(path_dir, zuozhe + '.jpg')
                if not os.path.exists(file_path):
                    print(file_path)
                    # urlretrieve downloads the remote image straight to a
                    # local file and creates that file itself — the original
                    # os.mknod() call was redundant and can fail without
                    # privileges on some systems.
                    urllib.request.urlretrieve('http:' + touxiang, file_path)
            yield item

        # Follow the "next page" link until there is none.
        next_url = response.xpath("//ul[@class='pagination']/li[last()]/a/@href").get()
        if next_url:
            yield scrapy.Request(self.bash_domain + next_url, callback=self.parse)
二、再次爬虫
scrapy crawl ITtest
三、查看爬取数据
四、打包压缩传输到windows机器中
zip -r img.zip img/
查看img文件
本文地址:https://blog.csdn.net/qq_37377136/article/details/107239874