A Simple Python Selenium Crawler for Downloading Songs for Free



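Recently I've noticed that more and more song downloads require payment. That's good for protecting copyright, but sometimes you still want to find a loophole. I happen to be learning Python right now, so I wrote a simple crawler to scrape the online music of a certain music player.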

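The main idea is to scrape the URL of the audio source file from the playback page. The program takes the user's input and returns a list of matching songs. Because the online site relies heavily on JavaScript, requests is of little help, and I was too lazy to parse the JS by hand, so I brought out the big gun: Selenium.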

The code is short and fairly rough; a GUI could be added later.

Step 1:

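Open the Kugou homepage and press F12 to inspect the elements. Through selenium.webdriver, locate the search box (the element with the send_input class) and pass it the user's input with send_keys(), then call click() on the search button to get the list of search results. A JS redirect happens at this point; reading webdriver.current_url afterwards is enough to pick up the new page. One thing to remember: the argument you pass in must first be decoded to Unicode (.decode('gb18030') has the same effect), otherwise any Chinese characters will come out garbled. This applies to Python 2, where the input arrives as a byte string. (Speaking as someone who was deeply troubled by this.)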
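A minimal sketch of Step 1 with the current Selenium API, under a few assumptions: the search box really carries the CSS class send_input (as described above), the button locator is a hypothetical XPath you would copy from the F12 inspector, and instead of reading current_url right away the sketch waits for the JS redirect with WebDriverWait and expected_conditions.url_changes. In Python 3, str is already Unicode, so the gb18030 decode is only needed when the keyword arrives as bytes.

```python
def to_text(keyword, encoding="gb18030"):
    """Return keyword as str, decoding bytes first (Python 3 analogue of .decode('gb18030'))."""
    if isinstance(keyword, bytes):
        return keyword.decode(encoding)
    return keyword


def search_songs(driver, keyword, button_xpath, box_class="send_input"):
    """Type keyword into the search box, click search, and return the URL after the JS redirect.

    button_xpath is a hypothetical locator; box_class assumes the send_input class from the text.
    """
    # Imported lazily so to_text() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    start_url = driver.current_url
    driver.find_element(By.CLASS_NAME, box_class).send_keys(to_text(keyword))
    driver.find_element(By.XPATH, button_xpath).click()
    # Poll for up to 10 seconds until the redirect has changed the URL.
    WebDriverWait(driver, 10).until(EC.url_changes(start_url))
    return driver.current_url
```

Waiting on the URL change avoids reading current_url before the redirect has finished.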

Step 2:

Step 3:

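Once on the playback page, locate the link to the audio source file with XPath (FirePath comes highly recommended; it's a fantastic XPath tool). It turns out that JavaScript renders this source link as well, so reading the src attribute directly returns an empty value. So I stuck with webdriver: the browser it drives executes the JS, and once rendering has finished, reading src yields the song URL, which can then be downloaded with urllib's urlretrieve().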
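The "wait until JS has filled in src" step can be sketched as a custom WebDriverWait condition. This is an illustration, not the author's code: the helper names and the audio XPath are hypothetical, and the download uses Python 3's urllib.request.urlretrieve (Python 2 had urllib.urlretrieve). WebDriverWait calls the condition repeatedly until it returns a truthy value and, by default, ignores NoSuchElementException while the element is not on the page yet.

```python
import urllib.request


def rendered_src(xpath):
    """Build a WebDriverWait condition that returns the element's src once JS has set it."""
    def condition(driver):
        # "xpath" is the value of Selenium's By.XPATH, written literally to avoid an import here.
        src = driver.find_element("xpath", xpath).get_attribute("src")
        return src or False  # empty or None means the link has not been rendered yet
    return condition


def download_song(driver, audio_xpath, filename, timeout=15):
    """Hypothetical helper: wait for the rendered song URL, then save it to filename."""
    from selenium.webdriver.support.ui import WebDriverWait

    song_url = WebDriverWait(driver, timeout).until(rendered_src(audio_xpath))
    urllib.request.urlretrieve(song_url, filename)
    return song_url
```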

The code is as follows:

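Selenium is a powerful browser automation testing framework. It runs directly in the browser and simulates user actions. Selenium currently supports mainstream browsers such as IE, Firefox, and Chrome, as well as headless browsers like PhantomJS, and Selenium + PhantomJS is currently a very popular crawler setup.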
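For readers who want the headless setup mentioned above, here is a minimal sketch. PhantomJS is no longer supported by Selenium 4, so this assumes headless Chrome with a matching ChromeDriver instead; the --headless=new flag assumes a recent Chrome, and the helper names and window size are hypothetical choices.

```python
def headless_chrome_args(window_size=(1280, 800)):
    """Command-line flags for a headless Chrome session (hypothetical defaults)."""
    width, height = window_size
    return ["--headless=new", f"--window-size={width},{height}"]


def make_headless_driver():
    """Start Chrome without a visible window; requires Selenium and ChromeDriver."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the flags in a separate function makes it easy to drop --headless=new while debugging and watch the browser work.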

#coding=utf-8
from selenium.webdriver.remote.webelement import WebElement
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.by import By
import time
import urllib  # Python 2: urllib.urlretrieve; Python 3: urllib.request.urlretrieve

# Song name
mname = ''

# Wait for the JS redirect to finish
def wait(driver):
    # find_element(By.TAG_NAME, ...) works in Selenium 3 and 4;
    # find_element_by_tag_name() was removed in Selenium 4.3
    elem = driver.find_element(By.TAG_NAME, 'html')
    count = 0
    while True:
        count += 1
        if count > 20:
            print('chao shi le')  # pinyin for "timed out"
            return
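The loop above gives up after 20 checks. A minimal sketch of the same polling pattern as a reusable helper follows; the function names are hypothetical. The staleness check mirrors how Selenium's own expected_conditions.staleness_of works: once a redirect replaces the page, touching the old html element raises StaleElementReferenceException.

```python
import time


def wait_until(condition, max_checks=20, interval=0.5):
    """Call condition() until it returns a truthy value; return it, or None after max_checks."""
    for _ in range(max_checks):
        result = condition()
        if result:
            return result
        time.sleep(interval)
    print("timed out")
    return None


def html_went_stale(old_html):
    """True once old_html has been detached from the page (e.g. by a JS redirect)."""
    from selenium.common.exceptions import StaleElementReferenceException

    try:
        old_html.is_enabled()  # any call on a detached element raises
        return False
    except StaleElementReferenceException:
        return True
```

Usage: grab old_html = driver.find_element(By.TAG_NAME, 'html') before clicking, then call wait_until(lambda: html_went_stale(old_html)).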


Article title: A Simple Python Selenium Crawler for Downloading Songs for Free
URL: http://www.17bianji.com/lsqh/38267.html