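This answer is similar to this one, but the initial URL and the Tableau base URL are different. The process is essentially the same, so here are the steps in detail: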
The chart is generated in JS from the result of this API call:
POST https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID
The SESSION_ID parameter (among other values) is located in the tsConfigContainer textarea, which is used to build the URL of the iframe.
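As a stdlib-only sketch of that lookup (no BeautifulSoup), the snippet below pulls the JSON out of the textarea and builds the bootstrapSession URL. The names TsConfigParser and bootstrap_url are hypothetical, and it assumes the JSON keys are vizql_root and sessionid, as used in this answer's code.

```python
import json
from html.parser import HTMLParser


class TsConfigParser(HTMLParser):
    """Collects the text inside <textarea id="tsConfigContainer">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "tsConfigContainer":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        # With the default convert_charrefs=True, entities like &quot; arrive decoded
        if self._inside:
            self.chunks.append(data)


def bootstrap_url(html, host="https://tableau.ons.org.br"):
    """Build the bootstrapSession URL from vizql_root and sessionid (assumed keys)."""
    parser = TsConfigParser()
    parser.feed(html)
    config = json.loads("".join(parser.chunks))
    return f'{host}{config["vizql_root"]}/bootstrapSession/sessions/{config["sessionid"]}'
```

Feed it the HTML of the embedded view page (for example `r.text` from the GET request) and POST to the returned URL.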
Starting from https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima?:embed=y&:showAppBanner=false&:showShareOptions=true&:display_count=no&:showVizHome=no:
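- there is a textarea with id tsConfigContainer containing a bunch of JSON values
- extract the session id (sessionid) and the root path (vizql_root)
- make a POST on https://tableau.ons.org.br/ROOT_PATH/bootstrapSession/sessions/SESSION_ID with the sheetId as form data
- extract the JSON from the result (the result itself is not JSON)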
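For the last step, the response looks like `<digits>;{json}<digits>;{json}`. A regex with greedy `.*` works, but it can break if the payload contains newlines (`re.MULTILINE` does not make `.` match them). A minimal alternative sketch decodes each object with `json.JSONDecoder.raw_decode` right after each numeric prefix; it does not rely on what the number means (it is not checked here). The name split_bootstrap_response is hypothetical.

```python
import json
import re

_PREFIX = re.compile(r"(\d+);")


def split_bootstrap_response(text):
    """Split a '<digits>;{json}<digits>;{json}...' payload into decoded objects."""
    decoder = json.JSONDecoder()
    text = text.strip()
    objects = []
    pos = 0
    while pos < len(text):
        match = _PREFIX.match(text, pos)
        if match is None:
            raise ValueError(f"expected '<digits>;' at offset {pos}")
        # raw_decode parses one JSON value starting at match.end()
        # and returns it with the index where it stopped
        obj, pos = decoder.raw_decode(text, match.end())
        objects.append(obj)
    return objects
```

Usage: `info, data = split_bootstrap_response(r.text)` gives the same two objects as the regex groups.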
Code:
import requests
from bs4 import BeautifulSoup
import json
import re

url = "https://tableau.ons.org.br/t/ONS_Publico/views/DemandaMxima/HistricoDemandaMxima"

# Load the embedded view page
r = requests.get(
    url,
    params={
        ":embed": "y",
        ":showAppBanner": "false",
        ":showShareOptions": "true",
        ":display_count": "no",
        ":showVizHome": "no",
    },
)
soup = BeautifulSoup(r.text, "html.parser")

# The tsConfigContainer textarea holds the session configuration as JSON
tableauData = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)

# Build the bootstrapSession URL from vizql_root and sessionid
dataUrl = f'https://tableau.ons.org.br{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# POST the sheet id as form data
r = requests.post(dataUrl, data={
    "sheet_id": tableauData["sheetId"],
})

# The response is "<digits>;{json}<digits>;{json}", not plain JSON
dataReg = re.search(r"\d+;({.*})\d+;({.*})", r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

print(data["secondaryInfo"]["presModelMap"]["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"]["dataSegments"]["0"]["dataColumns"])
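To inspect the printed `dataColumns` more easily, a small sketch can walk the same key path and group the values by type. It assumes each column entry has `dataType` and `dataValues` keys; check this against your own output. The names DATA_COLUMNS_PATH, get_data_columns and columns_by_type are hypothetical.

```python
# Key path copied from the print() call in the code above
DATA_COLUMNS_PATH = (
    "secondaryInfo", "presModelMap", "dataDictionary", "presModelHolder",
    "genDataDictionaryPresModel", "dataSegments", "0", "dataColumns",
)


def get_data_columns(data):
    """Follow DATA_COLUMNS_PATH through the nested dicts and return the list."""
    node = data
    for key in DATA_COLUMNS_PATH:
        node = node[key]
    return node


def columns_by_type(data_columns):
    """Map each dataType (assumed key) to its dataValues (assumed key)."""
    return {col["dataType"]: col["dataValues"] for col in data_columns}
```

Usage: `columns_by_type(get_data_columns(data))`.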


