
How do I deal with the multi-level column names downloaded with yfinance?


Download all tickers into a single dataframe with single-level column headers

Option 1

  • When data is downloaded for a single ticker, the returned dataframe has single-level column names, but no ticker column.
  • This option downloads the data for each ticker, adds a ticker column, and builds a single dataframe from all the desired tickers.

    import yfinance as yf
    import pandas as pd

    tickerStrings = ['AAPL', 'MSFT']
    df_list = list()
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period='2d')
        data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
        df_list.append(data)

    # combine all dataframes into a single dataframe
    df = pd.concat(df_list)

    # save to csv
    df.to_csv('ticker.csv')
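
As a variation on Option 1, the ticker can be attached while concatenating instead of inside the loop. The sketch below is an illustration rather than part of the original answer. It assumes small hand-made frames in place of the yf.download results so that it runs offline (the make_frame helper and its prices are invented), and it relies on the keys and names handling of pd.concat.

```python
import pandas as pd


def make_frame(start_price: float) -> pd.DataFrame:
    """Build a tiny single-level frame standing in for one yf.download result (made-up prices)."""
    dates = pd.to_datetime(["2021-01-04", "2021-01-05"])
    return pd.DataFrame(
        {
            "Open": [start_price, start_price + 1.0],
            "Close": [start_price + 0.5, start_price + 1.5],
        },
        index=pd.Index(dates, name="Date"),
    )


# one frame per ticker, keyed by the ticker symbol
frames = {"AAPL": make_frame(100.0), "MSFT": make_frame(200.0)}

# the dict keys become an outer index level named Ticker;
# reset_index then turns that level into an ordinary column
combined = pd.concat(frames, names=["Ticker", "Date"]).reset_index(level="Ticker")
print(combined)
```

The result has the same shape as the loop version: Date stays as the index and each row carries its ticker in a regular column.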

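Option 2

  • Download all the tickers at once, then stack the ticker level into rows

    • group_by='Ticker' puts the ticker at level=0 of the column names

    import yfinance as yf

    tickerStrings = ['AAPL', 'MSFT']
    df = yf.download(tickerStrings, group_by='Ticker', period='2d')
    # move the ticker level from the columns into a regular 'Ticker' column
    df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)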


Read a yfinance csv already stored with multi-level column names

  • If you wish to keep, and read in, a file with a multi-level column index, use the following code, which returns the dataframe to its original form.

    import pandas as pd

    df = pd.read_csv('test.csv', header=[0, 1])
    df.drop([0], axis=0, inplace=True)  # drop this row because it only has one column with Date in it
    date_col = ('Unnamed: 0_level_0', 'Unnamed: 0_level_1')  # name pandas gives the unlabeled first column
    df[date_col] = pd.to_datetime(df[date_col], format='%Y-%m-%d')  # convert the first column to a datetime
    df.set_index(date_col, inplace=True)  # set the first column as the index
    df.index.name = None  # remove the index name

  • The issue is that tickerStrings is a list of tickers, which results in a final dataframe with multi-level column names

                        AAPL                                                      MSFT
                        Open      High       Low     Close  Adj Close     Volume  Open  High  Low  Close  Adj Close  Volume
    Date
    1980-12-12  0.513393  0.515625  0.513393  0.513393   0.405683  117258400   NaN   NaN  NaN    NaN        NaN     NaN
    1980-12-15  0.488839  0.488839  0.486607  0.486607   0.384517   43971200   NaN   NaN  NaN    NaN        NaN     NaN
    1980-12-16  0.453125  0.453125  0.450893  0.450893   0.356296   26432000   NaN   NaN  NaN    NaN        NaN     NaN
    1980-12-17  0.462054  0.464286  0.462054  0.462054   0.365115   21610400   NaN   NaN  NaN    NaN        NaN     NaN
    1980-12-18  0.475446  0.477679  0.475446  0.475446   0.375698   18362400   NaN   NaN  NaN    NaN        NaN     NaN

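  • When this is saved to a csv, it looks like the following example, and reading it back naively produces a dataframe like the one you are having trouble with.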

    ,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
    ,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
    Date,,,,,,,,,,,,
    1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
    1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
    1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,
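
A shorter way to read such a file may be to let read_csv rebuild the index as well. The sketch below is an alternative to the snippet above, not the original author's method. It assumes a hand-made two-ticker frame with invented values and an in-memory buffer in place of test.csv, and it relies on the header, index_col and parse_dates arguments of pd.read_csv.

```python
import io

import pandas as pd

# hand-made frame shaped like a multi-ticker download (values are made up)
columns = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["Open", "Close"]])
index = pd.Index(pd.to_datetime(["2021-01-04", "2021-01-05"]), name="Date")
original = pd.DataFrame(
    [[1.0, 1.5, 10.0, 10.5], [2.0, 2.5, 20.0, 20.5]],
    index=index,
    columns=columns,
)

# write to an in-memory buffer instead of test.csv so the sketch leaves no file behind
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds the two column levels; index_col=0 with parse_dates=True
# turns the first column back into a datetime index
restored = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=True)
print(restored)
```

With index_col=0 the row that contains only Date is treated as the index name, so it does not need to be dropped by hand.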


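Flatten multi-level columns to a single level and add a ticker column

  • If the ticker symbols are at level=0 (the top) of the column names

    • This is the case when group_by='Ticker' is used

    df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

  • If the ticker symbols are at level=1 (the bottom) of the column names

    df.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)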
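
The two cases above differ only in the level passed to stack, so they can be wrapped in one function. The sketch below assumes a hypothetical helper named flatten_ticker_columns and a small made-up frame in place of real yfinance output. Depending on the pandas version, stack may order the resulting rows and columns differently, so the example does not rely on a particular order.

```python
import pandas as pd


def flatten_ticker_columns(df: pd.DataFrame, ticker_level: int) -> pd.DataFrame:
    """Move the ticker level of two-level columns into a 'Ticker' column (hypothetical helper)."""
    stacked = df.stack(level=ticker_level)  # the ticker level becomes the inner row index
    return stacked.rename_axis(["Date", "Ticker"]).reset_index(level=1)


# made-up frame with the ticker at level=0, the layout group_by='Ticker' is described as producing
columns = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["Open", "Close"]])
index = pd.to_datetime(["2021-01-04", "2021-01-05"])
by_ticker = pd.DataFrame(
    [[1.0, 1.5, 10.0, 10.5], [2.0, 2.5, 20.0, 20.5]],
    index=index,
    columns=columns,
)

# the same data with the column levels swapped, so the ticker sits at level=1
by_field = by_ticker.swaplevel(axis=1)

flat_top = flatten_ticker_columns(by_ticker, ticker_level=0)
flat_bottom = flatten_ticker_columns(by_field, ticker_level=1)
print(flat_top)
```

Both calls should yield one row per date and ticker, with Open, Close and Ticker as ordinary columns.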


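Download each ticker and save it to a separate file

  • I would recommend downloading and saving each ticker individually, as follows:

    import yfinance as yf
    import pandas as pd

    prd = '2d'   # example value; the original snippet leaves prd undefined
    intv = '1d'  # example value; the original snippet leaves intv undefined

    tickerStrings = ['AAPL', 'MSFT']
    for ticker in tickerStrings:
        data = yf.download(ticker, group_by="Ticker", period=prd, interval=intv)
        data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
        data.to_csv(f'ticker_{ticker}.csv')  # ticker_AAPL.csv for example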

  • data will look like

                    Open      High       Low     Close  Adj Close      Volume  ticker
    Date
    1986-03-13  0.088542  0.101562  0.088542  0.097222   0.062205  1031788800    MSFT
    1986-03-14  0.097222  0.102431  0.097222  0.100694   0.064427   308160000    MSFT
    1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065537   133171200    MSFT
    1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063871    67766400    MSFT
    1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062760    47894400    MSFT

  • The resulting csv will look like

    Date,Open,High,Low,Close,Adj Close,Volume,ticker
    1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
    1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
    1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
    1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
    1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT

Read in the multiple files saved in the previous section and create a single dataframe

    import pandas as pd
    from pathlib import Path

    # set the path to the files
    p = Path('c:/path_to_files')

    # find the files
    files = list(p.glob('ticker_*.csv'))

    # read the files into a dataframe
    df_list = list()
    for file in files:
        df_list.append(pd.read_csv(file))

    # combine dataframes
    df = pd.concat(df_list)
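
The save and read steps can be tried end to end without a network connection. The sketch below is only an illustration: it assumes a temporary directory in place of c:/path_to_files and small invented frames in place of downloaded data. The index_col and parse_dates arguments are additions that are not in the code above; they restore Date as a datetime index instead of leaving it as a plain text column.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)  # temporary stand-in for the real data folder

    # write one small made-up file per ticker, mirroring the ticker_{ticker}.csv naming
    for ticker, price in [("AAPL", 100.0), ("MSFT", 200.0)]:
        frame = pd.DataFrame(
            {"Open": [price, price + 1.0], "ticker": ticker},
            index=pd.Index(pd.to_datetime(["2021-01-04", "2021-01-05"]), name="Date"),
        )
        frame.to_csv(folder / f"ticker_{ticker}.csv")

    # read every file back, restoring Date as a datetime index
    files = sorted(folder.glob("ticker_*.csv"))
    df = pd.concat(
        [pd.read_csv(file, index_col="Date", parse_dates=True) for file in files]
    )

print(df)
```

Sorting the file list keeps the order of the combined rows reproducible, since glob makes no promise about ordering.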

