Empirical asset pricing (EAP) has been released on GitHub. The author will cover the package's usage (documentation) in detail in a series of CSDN posts.
GitHub: whyecofiliter/EAP (empirical asset pricing)
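The investment factor has grown popular since Fama and French (2015) introduced it. It also appears in the HXZ model (Hou et al., 2015), and Zhang (2017) extends it to the ICAPM. Before it became popular, Titman et al. (2004) were among the early researchers of this factor, using abnormal capital investment as the proxy variable. In the subsequent literature, however, most studies use the asset growth rate as the proxy, including Fama and French (2015) and Hou et al. (2015). In developed markets the investment factor is negatively related to future returns, while in most developing countries this relation is reported to be closer. For the Chinese market, most of the literature finds no significant investment effect (Guo et al., 2017; Qiao, 2019; Liu et al., 2019).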
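In this demo, the annual asset growth rate is used as the proxy variable for the investment factor. It is computed from financial data and derived financial ratios. The dataset starts in January 2004 and is collected from the CSMAR database. Warning: do not use the dataset in this demo for any commercial purpose.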
# %% import package
import os
import sys

import pandas as pd

# make the package modules in the parent directory importable
sys.path.append(os.path.abspath(".."))
# %% import data
# Monthly stock returns in the Chinese securities market
# Paths are relative to the working directory; adjust them to where the HDF5 files live
month_return = pd.read_hdf('./data/month_return.h5', key='month_return')
company_data = pd.read_hdf('./data/last_filter_pe.h5', key='data')
Apply some preprocessing to the data.
# %% preprocessing data
# lead the monthly return by one month for each stock,
# so that each row carries the return of the following month
# emrwd is the next-month return including dividends
month_return['emrwd'] = month_return.groupby(['Stkcd'])['Mretwd'].shift(-1)
# emrnd is the next-month return excluding dividends
month_return['emrnd'] = month_return.groupby(['Stkcd'])['Mretnd'].shift(-1)
# keep A-share stocks only (Markettype codes 1, 4, 16)
month_return = month_return[month_return['Markettype'].isin([1, 4, 16])]
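# %% flag the stocks whose market cap is at or above the 30th percentile
# in each month, i.e. everything except the smallest 30% (the tail stocks)
def percentile(stocks):
    return stocks >= stocks.quantile(q=.3)

# transform keeps the original row index, so the flag aligns with month_return
month_return['cap'] = month_return.groupby(['Trdmnt'])['Msmvttl'].transform(percentile)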
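The two preprocessing steps above can be checked on a toy panel. This is a minimal sketch, not part of the original demo: the column names mirror the demo's, but the DataFrame `toy` and all its values are made up for illustration.

```python
import pandas as pd

# Toy monthly panel; the values are invented for illustration only.
toy = pd.DataFrame({
    'Stkcd':   [1, 1, 1, 2, 2, 2],
    'Trdmnt':  ['2004-01', '2004-02', '2004-03'] * 2,
    'Mretwd':  [0.01, 0.02, 0.03, -0.01, 0.00, 0.04],
    'Msmvttl': [100.0, 110.0, 120.0, 50.0, 55.0, 60.0],
})

# Next-month return within each stock. The last month of every stock has
# no successor, so its lead return is NaN.
toy['emrwd'] = toy.groupby('Stkcd')['Mretwd'].shift(-1)

# Flag stocks at or above the 30th percentile of market cap in each month.
toy['cap'] = toy.groupby('Trdmnt')['Msmvttl'].transform(
    lambda s: s >= s.quantile(0.3)
)

print(toy)
```

With only two stocks per month, the larger stock is flagged and the smaller one is not, and the final month of each stock drops out once rows with a missing lead return are removed.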
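The annual asset growth rate serves as the proxy variable for the investment factor. It is computed from financial data and derived financial ratios.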
# %% calculate the total asset
# asset = debt + equity
# debt = company_value - market_value
# equity = market_value / PB
company_data['debt'] = company_data['EV1'] - company_data['MarketValue']
company_data['equity'] = company_data['MarketValue'] / company_data['PBV1A']
company_data['asset'] = company_data['debt'] + company_data['equity']
# asset growth rate: 12-row change in assets within each stock, scaled by current assets
# (assumes one row per stock per month, sorted by date)
company_data['asset_growth_rate'] = company_data.groupby('Symbol')['asset'].diff(12) / company_data['asset']
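The growth-rate line divides the 12-month change by the current asset value, whereas asset growth is more commonly scaled by the lagged value. The sketch below compares the two on a single hypothetical firm; the DataFrame `toy` and the column names `growth_current` and `growth_lagged` are assumptions introduced here, not part of the demo.

```python
import pandas as pd

# Toy panel: 24 monthly observations for one hypothetical firm whose
# total assets step up from 100 to 125 after the first year.
dates = pd.date_range('2003-01-01', periods=24, freq='MS')
toy = pd.DataFrame({
    'Symbol': 1,
    'TradingDate': dates,
    'asset': [100.0] * 12 + [125.0] * 12,
})

# 12-month change in total assets within each firm.
change = toy.groupby('Symbol')['asset'].diff(12)

# Scaled by current assets, as in the demo: (A_t - A_{t-12}) / A_t.
toy['growth_current'] = change / toy['asset']

# Scaled by assets 12 months earlier: (A_t - A_{t-12}) / A_{t-12}.
toy['growth_lagged'] = change / toy.groupby('Symbol')['asset'].shift(12)

print(toy.tail(3))
```

Both versions rank firms in the same order whenever assets are positive, so portfolio sorts are unaffected by the choice; only the reported magnitude of the growth rate differs.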
Further preprocessing: build the merge keys and join the two tables.
# %% prepare merge data
from pandas.tseries.offsets import MonthBegin

month_return['Stkcd_merge'] = month_return['Stkcd'].astype(dtype='string')
month_return['Date_merge'] = pd.to_datetime(month_return['Trdmnt'])
#month_return['Yearmonth'] = month_return['Date_merge'].map(lambda x : 1000*x.year + x.month)
#month_return['Date_merge'] += MonthEnd()

company_data['Stkcd_merge'] = company_data['Symbol'].dropna().astype(dtype='int').astype(dtype='string')
company_data['Date_merge'] = pd.to_datetime(company_data['TradingDate'])
#company_data['Yearmonth'] = company_data['Date_merge'].map(lambda x : 1000*x.year + x.month)
# roll each date forward to the first day of the following month
company_data['Date_merge'] += MonthBegin()

# %% dataset starts from '2000-01'
company_data = company_data[company_data['Date_merge'] >= '2000-01']
month_return = month_return[month_return['Date_merge'] >= '2000-01']
return_company = pd.merge(company_data, month_return, on=['Stkcd_merge', 'Date_merge'])
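The date alignment is the subtle part of this merge, so here is a small sketch of what `MonthBegin()` does to the keys. The two DataFrames `company` and `returns` and their values are hypothetical; only the key names follow the demo.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical firm data observed late in the month.
company = pd.DataFrame({
    'Stkcd_merge': ['1', '1'],
    'Date_merge': pd.to_datetime(['2004-01-30', '2004-02-27']),
    'asset_growth_rate': [0.10, 0.12],
})
# Hypothetical monthly returns; 'YYYY-MM' parses to the first day of the month.
returns = pd.DataFrame({
    'Stkcd_merge': ['1', '1', '1'],
    'Date_merge': pd.to_datetime(['2004-01', '2004-02', '2004-03']),
    'Mretwd': [0.01, 0.02, 0.03],
})

# Adding MonthBegin() rolls a date forward to the first day of the NEXT
# month, so firm data from January is paired with the February return row.
company['Date_merge'] = company['Date_merge'] + MonthBegin()

merged = pd.merge(company, returns, on=['Stkcd_merge', 'Date_merge'])
print(merged)
```

The default inner join keeps only stock-month pairs present in both tables, which is why the January return row has no match here.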
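Two datasets are constructed. Dataset #1 excludes the tail stocks (the smallest 30% by market capitalization in each month), while dataset #2 includes them. A univariate and a bivariate analysis are run on each.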
# %% construct test_data for bivariate analysis
# dataset 1 : tail stocks excluded, size x asset growth bivariate sort
from portfolio_analysis import Bivariate, Univariate
import numpy as np

# keep stocks at or above the 30th percentile of market cap in each month
# and with at least 10 trading days in the month
test_data_1 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
test_data_1 = test_data_1[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_1 = test_data_1[(test_data_1['Date_merge'] >= '2004-01-01') & (test_data_1['Date_merge'] <= '2019-12-01')]

# Univariate analysis (number=9 yields the 10 groups shown in the output)
uni_1 = Univariate(np.array(test_data_1[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_1.summary_and_test()
uni_1.print_summary_by_time()
uni_1.print_summary()

====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|  Group  |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |   10  |  Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.011 | 0.012 | 0.013 | 0.013 | 0.015 | 0.014 | 0.013 | 0.015 | 0.015 | 0.016 | 0.005 |
|  T-Test | 1.393 | 1.655 | 1.783 | 1.879 | 2.054 | 1.985 | 1.955 | 2.162 | 2.064 | 2.152 | 1.907 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================

# Bivariate analysis (number=4 yields the 5 x 5 groups shown in the output)
bi_1 = Bivariate(np.array(test_data_1), number=4)
bi_1.average_by_time()
bi_1.summary_and_test()
bi_1.print_summary_by_time()
bi_1.print_summary()

==============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group |   1    |   2    |   3    |   4    |   5    |  Diff |
+-------+--------+--------+--------+--------+--------+-------+
|   1   | 0.015  | 0.017  | 0.018  | 0.018  |  0.02  | 0.005 |
|       | 1.848  | 2.119  | 2.336  | 2.404  | 2.482  | 1.985 |
|   2   | 0.012  | 0.014  | 0.017  | 0.015  | 0.019  | 0.007 |
|       | 1.509  | 1.784  | 2.301  | 1.984  | 2.434  |  2.8  |
|   3   |  0.01  | 0.012  | 0.015  | 0.014  | 0.014  | 0.004 |
|       | 1.314  | 1.695  | 2.026  | 1.884  | 1.912  | 1.862 |
|   4   | 0.009  |  0.01  | 0.011  | 0.013  | 0.015  | 0.006 |
|       | 1.194  | 1.507  | 1.579  | 1.831  | 2.009  |  2.45 |
|   5   | 0.007  |  0.01  | 0.011  | 0.014  | 0.012  | 0.005 |
|       |  1.03  | 1.517  | 1.685  | 2.106  | 1.749  |  1.7  |
|  Diff | -0.008 | -0.007 | -0.008 | -0.005 | -0.007 |  0.0  |
|       | -1.902 | -1.646 | -1.897 | -1.213 | -1.771 | 0.088 |
+-------+--------+--------+--------+--------+--------+-------+
==============================================================
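The results for dataset #1 are consistent with the literature. In the univariate analysis the difference return is insignificant, since its t-value (1.907) is below 2.3. In the bivariate analysis the difference returns are largely insignificant as well: three of the five entries in the Diff column have t-values below 2.3. This suggests that the investment factor does not deliver excess returns.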
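To make the mechanics behind such a univariate table concrete, here is a minimal sketch in plain pandas rather than the package's `Univariate` class. The function `univariate_sort`, its arguments, and the simulated data are assumptions for illustration; the package may differ in details such as breakpoints, weighting, or standard-error adjustments.

```python
import numpy as np
import pandas as pd

def univariate_sort(df, ret, char, date, groups=10):
    """Equal-weighted sort on `char`: mean return and plain t-statistic
    per group and for the high-minus-low difference."""
    df = df.dropna(subset=[ret, char]).copy()
    # Assign each stock to a quantile group within its month (0 = lowest).
    df['group'] = df.groupby(date)[char].transform(
        lambda s: pd.qcut(s, groups, labels=False)
    ).astype(int)
    # Average return of each group in each month: months x groups.
    by_time = df.groupby([date, 'group'])[ret].mean().unstack('group')
    by_time['diff'] = by_time[groups - 1] - by_time[0]
    mean = by_time.mean()
    # Simple t-statistic of the time-series mean (no autocorrelation adjustment).
    tstat = mean / (by_time.std(ddof=1) / np.sqrt(by_time.count()))
    return pd.DataFrame({'mean': mean, 't': tstat})

# Simulated panel: 60 months x 100 stocks, returns unrelated to the
# characteristic by construction.
rng = np.random.default_rng(0)
months = pd.date_range('2004-01-01', periods=60, freq='MS')
sim = pd.DataFrame({
    'date': np.repeat(months, 100),
    'char': rng.normal(size=6000),
    'ret': rng.normal(0.01, 0.1, size=6000),
})
result = univariate_sort(sim, ret='ret', char='char', date='date')
print(result.round(3))
```

The `diff` row plays the role of the Diff column in the package output: its mean is the average high-minus-low return, and its t-statistic is what gets compared against the significance threshold.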
# %% construct test_data for bivariate analysis
# dataset 2 : tail stocks included, size x asset growth bivariate sort
from portfolio_analysis import Bivariate, Univariate
import numpy as np

# keep all stocks with at least 10 trading days in the month (no size filter)
test_data_2 = return_company[return_company['Ndaytrd']>=10]
test_data_2 = test_data_2[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_2 = test_data_2[(test_data_2['Date_merge'] >= '2004-01-01') & (test_data_2['Date_merge'] <= '2019-12-01')]

# Univariate analysis (number=9 yields the 10 groups shown in the output)
uni_2 = Univariate(np.array(test_data_2[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_2.summary_and_test()
uni_2.print_summary_by_time()
uni_2.print_summary()

====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|  Group  |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |   10  |  Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.016 | 0.017 | 0.017 | 0.018 | 0.001 |
|  T-Test | 2.052 | 2.204 | 2.301 | 2.303 | 2.323 |  2.33 | 2.249 | 2.392 | 2.283 | 2.411 | 0.313 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================

# Bivariate analysis (number=4 yields the 5 x 5 groups shown in the output)
bi_2 = Bivariate(np.array(test_data_2), number=4)
bi_2.average_by_time()
bi_2.summary_and_test()
bi_2.print_summary_by_time()
bi_2.print_summary()

==============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group |   1    |   2    |   3    |   4    |   5    |  Diff |
+-------+--------+--------+--------+--------+--------+-------+
|   1   | 0.027  | 0.026  | 0.027  | 0.027  | 0.027  |  0.0  |
|       | 3.113  |  3.25  | 3.257  | 3.312  | 3.372  | 0.079 |
|   2   | 0.015  | 0.019  |  0.02  | 0.021  | 0.021  | 0.006 |
|       | 1.885  | 2.331  | 2.482  | 2.706  | 2.674  | 2.551 |
|   3   | 0.012  | 0.014  | 0.017  | 0.015  | 0.017  | 0.005 |
|       | 1.561  | 1.788  | 2.198  | 2.067  | 2.286  | 2.264 |
|   4   | 0.009  |  0.01  | 0.013  | 0.013  | 0.015  | 0.005 |
|       | 1.271  | 1.475  | 1.745  | 1.888  | 1.999  | 2.397 |
|   5   | 0.007  | 0.011  |  0.01  | 0.012  | 0.013  | 0.006 |
|       | 0.987  | 1.729  | 1.582  | 1.882  |  1.83  |  2.2  |
|  Diff | -0.02  | -0.015 | -0.017 | -0.014 | -0.014 | 0.006 |
|       | -4.431 | -3.522 | -3.695 | -3.205 | -3.197 | 1.813 |
+-------+--------+--------+--------+--------+--------+-------+
==============================================================
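The results for dataset #2 are also consistent with the literature. In the univariate analysis the difference return is insignificant, since its t-value (0.313) is far below 2.3. In the bivariate analysis the difference returns are largely insignificant: three of the five entries in the Diff column have t-values below 2.3. This again suggests that the investment factor does not deliver excess returns.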
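For readers who want to see how a 5 x 5 table of this shape can be produced, the sketch below implements an independent double sort in plain pandas. It is not the package's `Bivariate` class: the function `bivariate_sort`, the choice of an independent (rather than dependent) sort, equal weighting, and the simulated data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def bivariate_sort(df, ret, first, second, date, groups=5):
    """Independent double sort: mean return and plain t-statistic for each
    (first, second) cell, plus high-minus-low along `second`."""
    df = df.dropna(subset=[ret, first, second]).copy()
    # Quantile groups for each variable, formed separately within each month.
    for col, name in [(first, 'g1'), (second, 'g2')]:
        df[name] = df.groupby(date)[col].transform(
            lambda s: pd.qcut(s, groups, labels=False)
        ).astype(int)
    # Monthly average return per cell; rows (date, g1), columns g2.
    by_time = df.groupby([date, 'g1', 'g2'])[ret].mean().unstack('g2')
    # High-minus-low along the second variable, within each g1 group and month.
    by_time['diff'] = by_time[groups - 1] - by_time[0]
    cells = by_time.groupby(level='g1')
    mean = cells.mean()
    tstat = mean / (cells.std(ddof=1) / np.sqrt(cells.count()))
    return mean, tstat

# Simulated panel: 60 months x 500 stocks with no built-in return pattern.
rng = np.random.default_rng(1)
months = pd.date_range('2004-01-01', periods=60, freq='MS')
n = 60 * 500
sim = pd.DataFrame({
    'date': np.repeat(months, 500),
    'size': rng.lognormal(size=n),
    'growth': rng.normal(size=n),
    'ret': rng.normal(0.01, 0.1, size=n),
})
mean, tstat = bivariate_sort(sim, ret='ret', first='size', second='growth', date='date')
print(mean.round(3))
print(tstat.round(3))
```

Each row of `mean` corresponds to one group of the first sort variable, and the `diff` column holds the high-minus-low return along the second variable within that group, which is the quantity whose t-value is judged against the threshold.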



