Can I scrape the raw data from highcharts.js?

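The data is in a script tag. You can get the script tag using bs4 and a regex. You could also extract the data itself with a regex, but I like to use js2xml to parse the JavaScript into an XML tree: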

    from bs4 import BeautifulSoup
    import requests
    import re
    import js2xml

    # Download the page and parse its HTML
    soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")

    # Find the <script> tag whose text contains the Highcharts.Chart constructor call
    script = soup.find("script", text=re.compile("Highcharts.Chart")).text
    # script = soup.find("script", text=re.compile("precipchartcontainer")).text if you want precipitation data

    # Parse the JavaScript into an XML tree and print it
    parsed = js2xml.parse(script)
    print(js2xml.pretty_print(parsed))

That gives you:

    <program>
      <functioncall>
        <function>
          <identifier name="$"/>
        </function>
        <arguments>
          <funcexpr>
            <identifier/>
            <parameters/>
            <body>
              <var name="chart"/>
              <functioncall>
                <function>
                  <dotaccessor>
                    <object>
                      <functioncall>
                        <function>
                          <identifier name="$"/>
                        </function>
                        <arguments>
                          <identifier name="document"/>
                        </arguments>
                      </functioncall>
                    </object>
                    <property>
                      <identifier name="ready"/>
                    </property>
                  </dotaccessor>
                </function>
                <arguments>
                  <funcexpr>
                    <identifier/>
                    <parameters/>
                    <body>
                      <assign operator="=">
                        <left>
                          <identifier name="chart"/>
                        </left>
                        <right>
                          <new>
                            <dotaccessor>
                              <object>
                                <identifier name="Highcharts"/>
                              </object>
                              <property>
                                <identifier name="Chart"/>
                              </property>
                            </dotaccessor>
                            <arguments>
                              <object>
                                <property name="chart">
                                  <object>
                                    <property name="renderTo">
                                      <string>tempchartcontainer</string>
                                    </property>
                                    <property name="type">
                                      <string>spline</string>
                                    </property>
                                  </object>
                                </property>
                                <property name="credits">
                                  <object>
                                    <property name="enabled">
                                      <boolean>false</boolean>
                                    </property>
                                  </object>
                                </property>
                                <property name="colors">
                                  <array>
                                    <string>#FF8533</string>
                                    <string>#4572A7</string>
                                  </array>
                                </property>
                                <property name="title">
                                  <object>
                                    <property name="text">
                                      <string>Average Temperature (°c) Graph for Brussels</string>
                                    </property>
                                  </object>
                                </property>
                                <property name="xAxis">
                                  <object>
                                    <property name="categories">
                                      <array>
                                        <string>January</string>
                                        <string>February</string>
                                        <string>March</string>
                                        <string>April</string>
                                        <string>May</string>
                                        <string>June</string>
                                        <string>July</string>
                                        <string>August</string>
                                        <string>September</string>
                                        <string>October</string>
                                        <string>November</string>
                                        <string>December</string>
                                      </array>
                                    </property>
                                    <property name="labels">
                                      <object>
                                        <property name="rotation">
                                          <number value="270"/>
                                        </property>
                                        <property name="y">
                                          <number value="40"/>
                                        </property>
                                      </object>
                                    </property>
                                  </object>
                                </property>
                                <property name="yAxis">
                                  <object>
                                    <property name="title">
                                      <object>
                                        <property name="text">
                                          <string>Temperature (°c)</string>
                                        </property>
                                      </object>
                                    </property>
                                  </object>
                                </property>
                                <property name="tooltip">
                                  <object>
                                    <property name="enabled">
                                      <boolean>true</boolean>
                                    </property>
                                  </object>
                                </property>
                                <property name="plotOptions">
                                  <object>
                                    <property name="spline">
                                      <object>
                                        <property name="dataLabels">
                                          <object>
                                            <property name="enabled">
                                              <boolean>true</boolean>
                                            </property>
                                          </object>
                                        </property>
                                        <property name="enableMouseTracking">
                                          <boolean>false</boolean>
                                        </property>
                                      </object>
                                    </property>
                                  </object>
                                </property>
                                <property name="series">
                                  <array>
                                    <object>
                                      <property name="name">
                                        <string>Average High Temp (°c)</string>
                                      </property>
                                      <property name="color">
                                        <string>#FF8533</string>
                                      </property>
                                      <property name="data">
                                        <array>
                                          <number value="6"/>
                                          <number value="8"/>
                                          <number value="11"/>
                                          <number value="14"/>
                                          <number value="19"/>
                                          <number value="21"/>
                                          <number value="23"/>
                                          <number value="23"/>
                                          <number value="19"/>
                                          <number value="15"/>
                                          <number value="9"/>
                                          <number value="6"/>
                                        </array>
                                      </property>
                                    </object>
                                    <object>
                                      <property name="name">
                                        <string>Average Low Temp (°c)</string>
                                      </property>
                                      <property name="color">
                                        <string>#4572A7</string>
                                      </property>
                                      <property name="data">
                                        <array>
                                          <number value="2"/>
                                          <number value="2"/>
                                          <number value="4"/>
                                          <number value="6"/>
                                          <number value="10"/>
                                          <number value="12"/>
                                          <number value="14"/>
                                          <number value="14"/>
                                          <number value="11"/>
                                          <number value="8"/>
                                          <number value="5"/>
                                          <number value="2"/>
                                        </array>
                                      </property>
                                    </object>
                                  </array>
                                </property>
                              </object>
                            </arguments>
                          </new>
                        </right>
                      </assign>
                    </body>
                  </funcexpr>
                </arguments>
              </functioncall>
            </body>
          </funcexpr>
        </arguments>
      </functioncall>
    </program>
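The values you want sit at predictable paths in this tree: each series' numbers are the `value` attributes of `number` elements under `property[@name='data']/array`, and each series' label is the `string` under `property[@name='name']`. As a quick, hedged check that those paths pick out the right nodes, the sketch below queries an abridged copy of the tree (only the `series` branch, first three values per series) with Python's standard-library `xml.etree.ElementTree` instead of lxml, so it runs without js2xml installed. ElementTree understands only a subset of XPath, so attributes and text are read from the matched elements rather than selected with `@value` or `text()`.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the js2xml output above: only the "series" branch is kept,
# and each data array is cut down to its first three values.
XML = """
<program>
  <property name="series">
    <array>
      <object>
        <property name="name"><string>Average High Temp (°c)</string></property>
        <property name="data">
          <array><number value="6"/><number value="8"/><number value="11"/></array>
        </property>
      </object>
      <object>
        <property name="name"><string>Average Low Temp (°c)</string></property>
        <property name="data">
          <array><number value="2"/><number value="2"/><number value="4"/></array>
        </property>
      </object>
    </array>
  </property>
</program>
"""

root = ET.fromstring(XML.strip())

# ElementTree supports predicates such as [@name='data'], but not "@value"
# or "text()" steps, so the attribute and text are read off the elements.
names = [s.text for s in root.findall(".//property[@name='name']/string")]
data = [
    [num.get("value") for num in prop.findall("./array/number")]
    for prop in root.findall(".//property[@name='data']")
]

print(names)  # ['Average High Temp (°c)', 'Average Low Temp (°c)']
print(data)   # [['6', '8', '11'], ['2', '2', '4']]
```

With the real js2xml tree (an lxml element), the same paths can be queried directly with `.xpath()`, including the `@value` step.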

So, to get all of the data:

    In [28]: from bs4 import BeautifulSoup
    In [29]: import requests
    In [30]: import re
    In [31]: import js2xml
    In [32]: from itertools import repeat
    In [33]: from pprint import pprint as pp
    In [34]: soup = BeautifulSoup(requests.get("http://www.worldweatheronline.com/brussels-weather-averages/be.aspx").content, "html.parser")
    In [35]: script = soup.find("script", text=re.compile("Highcharts.Chart")).text
    In [36]: parsed = js2xml.parse(script)
    In [37]: data = [d.xpath(".//array/number/@value") for d in parsed.xpath("//property[@name='data']")]
    In [38]: categories = parsed.xpath("//property[@name='categories']//string/text()")
    In [39]: output = list(zip(repeat(categories), data))
    In [40]: pp(output)
    [(['January',
       'February',
       'March',
       'April',
       'May',
       'June',
       'July',
       'August',
       'September',
       'October',
       'November',
       'December'],
      ['6', '8', '11', '14', '19', '21', '23', '23', '19', '15', '9', '6']),
     (['January',
       'February',
       'March',
       'April',
       'May',
       'June',
       'July',
       'August',
       'September',
       'October',
       'November',
       'December'],
      ['2', '2', '4', '6', '10', '12', '14', '14', '11', '8', '5', '2'])]
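The `(categories, values)` pairs above still hold strings, because the XPath `@value` query returns attribute text. If you would rather look values up by series and month, a small post-processing step is enough. This is a minimal sketch: it copies the lists from the output above and hard-codes the two series names from the tree instead of querying them. In the live session they could probably be fetched with an XPath such as `//property[@name='series']//property[@name='name']/string/text()`, though that query is untested here.

```python
# Values copied from the pp(output) result above; the series names are
# taken from the "name" properties in the js2xml tree (hard-coded here).
categories = ['January', 'February', 'March', 'April', 'May', 'June',
              'July', 'August', 'September', 'October', 'November', 'December']
data = [
    ['6', '8', '11', '14', '19', '21', '23', '23', '19', '15', '9', '6'],
    ['2', '2', '4', '6', '10', '12', '14', '14', '11', '8', '5', '2'],
]
names = ['Average High Temp (°c)', 'Average Low Temp (°c)']

# Pair each series name with its values, then each month with one value,
# converting the attribute strings to ints along the way.
by_series = {
    name: {month: int(value) for month, value in zip(categories, values)}
    for name, values in zip(names, data)
}

print(by_series['Average High Temp (°c)']['July'])    # 23
print(by_series['Average Low Temp (°c)']['January'])  # 2
```

A dict keyed by month avoids repeating the category list for every series, which is what `repeat(categories)` does in the session above.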

As I said, you could just use a regex, but I find js2xml more reliable, since things like stray whitespace won't break it.
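For comparison, here is a minimal regex-only sketch of the same extraction. It runs on a hypothetical, abridged stand-in for the page's script, since the real page's exact JavaScript formatting is not shown here. The `\s*` pieces tolerate uneven spacing, but the patterns still depend on details such as single-quoted series names and a `data` key followed directly by a flat array, which is the kind of fragility a real parser like js2xml avoids.

```python
import re

# Hypothetical, abridged stand-in for the page's <script> text; the spacing
# is deliberately inconsistent to exercise the \s* parts of the patterns.
SCRIPT = """
chart = new Highcharts.Chart({
    series: [{
        name: 'Average High Temp (°c)',
        color: '#FF8533',
        data: [6, 8, 11, 14, 19, 21, 23, 23, 19, 15, 9, 6]
    }, {
        name : 'Average Low Temp (°c)',
        color: '#4572A7',
        data :[2,2,4,6,10,12,14,14,11,8,5,2]
    }]
});
"""

# Assumes names are single-quoted and each data value is a flat array of numbers.
NAME_RE = re.compile(r"name\s*:\s*'([^']*)'")
DATA_RE = re.compile(r"data\s*:\s*\[([^\]]*)\]")

names = NAME_RE.findall(SCRIPT)
data = [
    [int(v) for v in re.split(r"\s*,\s*", body.strip())]
    for body in DATA_RE.findall(SCRIPT)
]

print(names)
print(data)
```

Note that `data\s*:` does not match keys like `dataLabels`, because the colon must follow `data` after optional whitespace; switching the page to double quotes, however, would silently break `NAME_RE`.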
