For the data itself, I scraped real-time precipitation figures for each province.
Without further ado, let's get hands-on.
Main text
f-string:
url_a= f'http://www.weather.com.cn/weather1dn/101{a}0101.shtml'
An f-string marks replacement fields with curly braces {}; the expression to substitute is written directly inside them.
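As a small illustration of the substitution, the sketch below builds the URL for a couple of codes. The helper name `build_url` is hypothetical (the article's code inlines the f-string), and the `:02d` format spec at the end is an alternative way to zero-pad one-digit codes, not something the original code uses.

```python
# Hypothetical helper for illustration; the article's code inlines the f-string.
def build_url(a: int) -> str:
    # {a} is replaced by the value of a when the string is evaluated
    return f'http://www.weather.com.cn/weather1dn/101{a}0101.shtml'


print(build_url(10))
print(build_url(34))

# A format spec can follow the expression inside the braces.
# Padding to two digits yields the same text as writing the extra "0" by hand.
code = 5
same = f'1010{code}0101' == f'101{code:02d}0101'
print(same)
```

With zero-padding, one-digit and two-digit codes can share a single URL template.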
Pair each city with its precipitation value, store the pairs in a dictionary, then print it.
Code:
from lxml import etree
from selenium import webdriver
import re

city = ['' for n in range(34)]  # city names
rain = ['' for n in range(34)]  # precipitation values, kept as strings
rain_item = []                  # the same values converted to float

# Open Chrome. The executable_path argument was removed in newer Selenium 4
# releases; webdriver.Chrome() finds chromedriver on PATH in the same way.
driver = webdriver.Chrome()

for a in range(1, 35):
    if a < 5:     # municipalities
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0100.shtml'
    elif a < 10:  # remaining one-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0101.shtml'
    else:         # two-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/101{a}0101.shtml'
    driver.get(url_a)  # open the page
    resp_text = driver.page_source
    page_html = etree.HTML(resp_text)
    city_list = page_html.xpath('/html/body/div[4]/div[2]/a')[0]  # city name via XPath
    rain_list = page_html.xpath('//*[@id="weatherChart"]/div[2]/p[5]')[0]  # precipitation via XPath
    city[a-1] = city_list.text  # store the city name
    rain[a-1] = re.findall(r"\d+\.?\d*", rain_list.text)[0]  # keep only the number

driver.quit()  # close the browser once scraping is done

d = dict(zip(city, rain))  # merge the city and precipitation lists into a dictionary
for k, v in d.items():     # convert str to float
    rain_item.append(float(v))
print(d)
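The number extraction and the pairing step can be tried without a browser. This is a minimal sketch, assuming made-up sample strings in place of the scraped element text; the real page text and city names will differ.

```python
import re

# Hypothetical sample data standing in for the text of the scraped elements.
cities = ['Beijing', 'Shanghai']
raw_rain = ['rain 0.5mm', 'rain 12mm']

# Same pattern as in the scraper: digits, an optional dot, then optional digits.
pattern = r"\d+\.?\d*"
rain = [re.findall(pattern, text)[0] for text in raw_rain]

# zip pairs the two lists position by position; dict turns the pairs into a mapping.
d = dict(zip(cities, rain))
print(d)
```

Note that `re.findall` returns an empty list when the text contains no number, so indexing with `[0]` raises `IndexError` in that case; a real scraper may want to check for that.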
When processing the scraped content, the data type can cause errors: the scraped values are of type str, while sorting and arithmetic need numbers, so the values have to be converted to float first.
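A short sketch of why the conversion matters, using made-up values: strings sort character by character, so the order is wrong for numbers, and `sum` on strings raises `TypeError`.

```python
# Hypothetical values, stored as strings the way the scraper returns them.
d = {'A': '9.5', 'B': '10', 'C': '0.3'}

# Sorting strings compares characters, so '10' lands before '9.5'.
as_text = sorted(d.values())
print(as_text)

# Convert once, then sort and aggregate on numbers.
as_float = {k: float(v) for k, v in d.items()}
ranked = sorted(as_float.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
print(sum(as_float.values()) / len(as_float))
```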
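This scraping method simulates a user opening the page, so the browser window is actually shown on screen. Partway through the experiment, China Weather (weather.com.cn) changed its URLs, and on the old URLs the data for some cities failed to display. After refreshing the page the data showed up normally, so simulating a click on the refresh button is one way to avoid the error. Since I later found the new URLs, that workaround was dropped.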
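For reference, the refresh workaround could look roughly like the sketch below. It is not the author's code: it swaps the simulated mouse click for WebDriver's `refresh()` method, and the function name `get_with_refresh`, the `has_data` callback, and the retry and wait parameters are all assumptions made for illustration.

```python
import time


def get_with_refresh(driver, url, has_data, max_refreshes=2, wait_seconds=1.0):
    """Open url and refresh while has_data(page_source) is false.

    driver is any object with get(), refresh() and page_source, such as a
    Selenium WebDriver. Returns the page source, or None if the data never
    appears within max_refreshes refreshes.
    """
    driver.get(url)
    for attempt in range(max_refreshes + 1):
        source = driver.page_source
        if has_data(source):
            return source
        if attempt < max_refreshes:
            driver.refresh()          # reload the page
            time.sleep(wait_seconds)  # give the page time to render
    return None


# Possible usage with a real browser (assumed check on the chart element id):
# html = get_with_refresh(driver, url_a, lambda s: 'weatherChart' in s)
```

Returning `None` instead of raising lets the caller decide whether to skip the city or stop the run.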
Code for the statistics and the bar chart:
# -*- coding: utf-8 -*-
import re
import statistics

import matplotlib
import matplotlib.pyplot as plt
from lxml import etree
from selenium import webdriver

matplotlib.rc("font", family='YouYuan')  # font that can render the Chinese city names

city = ['' for n in range(34)]  # city names
rain = ['' for n in range(34)]  # precipitation values, kept as strings

# Open Chrome. The executable_path argument was removed in newer Selenium 4
# releases; webdriver.Chrome() finds chromedriver on PATH in the same way.
driver = webdriver.Chrome()

for a in range(1, 35):
    if a < 5:     # municipalities
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0100.shtml'
    elif a < 10:  # remaining one-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0101.shtml'
    else:         # two-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/101{a}0101.shtml'
    driver.get(url_a)  # open the page
    resp_text = driver.page_source
    page_html = etree.HTML(resp_text)
    city_list = page_html.xpath('/html/body/div[4]/div[2]/a')[0]  # city name via XPath
    rain_list = page_html.xpath('//*[@id="weatherChart"]/div[2]/p[5]')[0]  # precipitation via XPath
    city[a-1] = city_list.text  # store the city name
    rain[a-1] = re.findall(r"\d+\.?\d*", rain_list.text)[0]  # keep only the number

driver.quit()  # close the browser once scraping is done

d = dict(zip(city, rain))  # merge the city and precipitation lists into a dictionary
rain_item = [float(v) for v in d.values()]  # str to float

# The median must be taken over the numeric values, not a position in the unsorted list
medium = statistics.median(rain_item)
medium_text = "Median: " + str(medium)

average_rain = sum(rain_item) / len(rain_item)
average_text = "Mean: " + str(average_rain)

max_rain = max(rain_item)  # maximum
min_rain = min(rain_item)  # minimum
city_min = [k for k, v in d.items() if float(v) == min_rain]
min_text = "Cities with the least precipitation: " + str(city_min) + " Minimum: " + str(min_rain)
city_max = [k for k, v in d.items() if float(v) == max_rain]
max_text = "Cities with the most precipitation: " + str(city_max) + " Maximum: " + str(max_rain)

plt.bar(range(len(d)), rain_item, align='center')
plt.xticks(range(len(d)), list(d.keys()))
plt.xlabel('City', fontsize=20)
plt.ylabel('Precipitation', fontsize=20)
plt.text(0, 12, average_text, fontsize=6)
plt.text(0, 13, medium_text, fontsize=6)
plt.text(0, 14, max_text, fontsize=6)
plt.text(0, 15, min_text, fontsize=6)
plt.show()
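The GUI is built with the tkinter library. A Button provides the interaction: it calls the get function written below, which reads the user's input and loops over the dictionary; if the city name was entered correctly, the local precipitation is shown in the window. Code: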
# -*- coding: utf-8 -*-
import re
import tkinter as tk

from lxml import etree
from selenium import webdriver

city = ['' for n in range(34)]  # city names
rain = ['' for n in range(34)]  # precipitation values, kept as strings

# Open Chrome. The executable_path argument was removed in newer Selenium 4
# releases; webdriver.Chrome() finds chromedriver on PATH in the same way.
driver = webdriver.Chrome()

for a in range(1, 35):
    if a < 5:     # municipalities
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0100.shtml'
    elif a < 10:  # remaining one-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/1010{a}0101.shtml'
    else:         # two-digit codes
        url_a = f'http://www.weather.com.cn/weather1dn/101{a}0101.shtml'
    driver.get(url_a)  # open the page
    resp_text = driver.page_source
    page_html = etree.HTML(resp_text)
    city_list = page_html.xpath('/html/body/div[4]/div[2]/a')[0]  # city name via XPath
    rain_list = page_html.xpath('//*[@id="weatherChart"]/div[2]/p[5]')[0]  # precipitation via XPath
    city[a-1] = city_list.text  # store the city name
    rain[a-1] = re.findall(r"\d+\.?\d*", rain_list.text)[0]  # keep only the number

driver.quit()  # close the browser once scraping is done

d = dict(zip(city, rain))  # merge the city and precipitation lists into a dictionary

root = tk.Tk()
root.title('Precipitation lookup')
root.geometry('500x200')


def get():
    values = entry.get()  # text typed by the user
    for k, v in d.items():
        if k == values:   # the city exists: show its precipitation
            label = tk.Label(root, text=v + 'mm')
            label.pack()


frame = tk.Frame(root)
frame.pack()
u1 = tk.StringVar()
entry = tk.Entry(frame, width=20, textvariable=u1, relief="sunken")
entry.pack(side="left")
frame1 = tk.Frame(root)
frame1.pack()
btn1 = tk.Button(frame1, text="Query", width=20, height=1, relief=tk.GROOVE, command=get)
btn1.pack(side="left")
root.mainloop()
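The lookup inside get can also be kept separate from the widgets, which makes it testable without opening a window. A minimal sketch under assumptions: the function name `lookup_rain`, the whitespace stripping, and the "City not found" message are illustrative additions, not part of the code above.

```python
def lookup_rain(d, name):
    """Return the display text for a city name, given the city -> value dict."""
    name = name.strip()    # ignore accidental spaces around the input
    value = d.get(name)    # direct dictionary lookup, no loop needed
    if value is None:
        return 'City not found'
    return f'{value}mm'


# Hypothetical data in the same shape as the scraped dictionary.
sample = {'Beijing': '0.5', 'Shanghai': '12'}
print(lookup_rain(sample, ' Beijing '))
print(lookup_rain(sample, 'Atlantis'))
```

In the GUI, the button callback would then only need to pass `entry.get()` to this function and put the returned text into a label.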
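Closing remarks
While scraping weather data, I found that only China Weather (weather.com.cn) offers precipitation figures for every province, which suggests that open data in China still has a long way to go.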