Getting Started with Scrapy Crawlers in Python: A Code Walkthrough

        item['tags'] = []
        # Flatten tags into a list of tag_name values
        for tag in post.get('tags', []):
            item['tags'].append(tag['tag_name'])
        items.append(item)
    return items
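The tag handling above can be tried outside the spider. The following is a minimal sketch, assuming each post is a dict whose optional 'tags' entry is a list of dicts with a 'tag_name' key, as the loop implies. The function name extract_tag_names and the sample data are invented for illustration and are not part of the original project.

```python
def extract_tag_names(post):
    """Return the tag_name values of one post dict.

    Hypothetical helper: same result as the loop in parse,
    written as a list comprehension.
    """
    # A missing 'tags' key yields an empty list instead of a KeyError.
    return [tag['tag_name'] for tag in post.get('tags', [])]


# Made-up sample data shaped like the posts handled in parse.
sample_post = {'tags': [{'tag_name': 'portrait'}, {'tag_name': 'street'}]}
print(extract_tag_names(sample_post))  # ['portrait', 'street']
print(extract_tag_names({}))           # []
```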

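Of course, you could skip the pipeline and do this work directly in parse with the same result; the pipeline simply keeps the structure clearer. Scrapy also ships the more full-featured FilesPipeline and ImagesPipeline. process_item is called for every item scraped, and the open_spider and close_spider methods can be overridden to handle whatever needs to happen when the spider opens and closes.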
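To make the three hooks concrete, here is a minimal sketch of a pipeline that uses all of them. It is not the project's own pipeline: the class name TagStatsPipeline and its counting logic are assumptions made for illustration, and it relies only on the classic method signatures that take a spider argument. A pipeline is a plain Python class, so the sketch can be exercised without a running crawl.

```python
import logging

logger = logging.getLogger(__name__)


class TagStatsPipeline:
    """Hypothetical pipeline that tallies tags across one crawl."""

    def open_spider(self, spider):
        # Called once when the spider opens: set up per-run state.
        self.item_count = 0
        self.tag_counts = {}

    def process_item(self, item, spider):
        # Called for every scraped item. Returning the item passes it on
        # to the next pipeline; raising DropItem would discard it instead.
        self.item_count += 1
        for name in item.get('tags', []):
            self.tag_counts[name] = self.tag_counts.get(name, 0) + 1
        return item

    def close_spider(self, spider):
        # Called once when the spider closes: report what was collected.
        logger.info('%d items, %d distinct tags',
                    self.item_count, len(self.tag_counts))


# Exercise the hooks by hand, in the order a crawl would call them.
pipeline = TagStatsPipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({'tags': ['portrait', 'street']}, spider=None)
pipeline.process_item({'tags': ['street']}, spider=None)
pipeline.close_spider(spider=None)
print(pipeline.tag_counts)  # {'portrait': 1, 'street': 2}
```

Keeping the state on the instance works because one pipeline object lives for the whole crawl.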

Note: a pipeline must be registered in the project before it takes effect. Add the following to settings.py:

ITEM_PIPELINES = {
    'tuchong.pipelines.TuchongPipeline': 300,  # pipeline path: run priority (lower numbers run first)
}
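The priority number only matters once more than one pipeline is registered. The sketch below shows the resulting order, assuming a second entry, tuchong.pipelines.ExamplePipeline, which is invented here and does not exist in the project. Sorting the dict by value reproduces the rule that items pass through lower-numbered pipelines first; by convention the numbers fall in the 0 to 1000 range.

```python
# Hypothetical settings: the ExamplePipeline entry is invented for illustration.
ITEM_PIPELINES = {
    'tuchong.pipelines.ExamplePipeline': 800,
    'tuchong.pipelines.TuchongPipeline': 300,
}

# Items visit the pipelines in ascending order of the priority number.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
# ['tuchong.pipelines.TuchongPipeline', 'tuchong.pipelines.ExamplePipeline']
```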

4. Running

Return to the cmder command line, change into the project directory, and enter the command:

scrapy crawl photo

The terminal prints all crawl results and debug messages, and finishes with a list of statistics for the run, for example:

[scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 491,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 10224,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 11, 27, 7, 20, 24, 414201),
 'item_dropped_count': 5,
 'item_dropped_reasons_count/DropItem': 5,
 'item_scraped_count': 15,
 'log_count/DEBUG': 18,
 'log_count/INFO': 8,
 'log_count/WARNING': 5,
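A few of these counters can be combined into a quick health check. The sketch below copies four values from the sample output and assumes that item_scraped_count counts only the items that made it through every pipeline, so scraped plus dropped gives the total handed to the pipelines. The variable names are chosen for illustration.

```python
# Counters copied from the sample stats output above.
stats = {
    'item_scraped_count': 15,
    'item_dropped_count': 5,
    'downloader/request_count': 2,
    'downloader/response_status_count/200': 2,
}

# Assumption: scraped and dropped items are counted separately.
processed = stats['item_scraped_count'] + stats['item_dropped_count']
drop_ratio = stats['item_dropped_count'] / processed

# Every request should have produced a 200 response.
all_ok = (stats['downloader/response_status_count/200']
          == stats['downloader/request_count'])

print(f'{processed} items processed, {drop_ratio:.0%} dropped, '
      f'all responses 200: {all_ok}')
# 20 items processed, 25% dropped, all responses 200: True
```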
 	
Article title: Getting Started with Scrapy Crawlers in Python: A Code Walkthrough
URL: http://www.17bianji.com/lsqh/39298.html