$$f(\mathbf{x}) = f_M(\mathbf{x}) = \sum_{m=1}^{M} h(\mathbf{x};\theta_m)$$
where $h(\mathbf{x};\theta_m)$ is the $m$-th tree, $\theta_m$ is the parameter of the $m$-th tree, and $M$ is the number of decision trees.
Forward stagewise algorithm:
First set the initial boosting tree $f_0(\mathbf{x}) = 0$. The boosting tree model at step $m$ is $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$, where $\theta_m$ is the parameter of the $m$-th tree to be estimated. Given samples $(\mathbf{x}_i, \bar{y}_i)$, $\theta_m$ is found by
$$\hat{\theta}_m = \arg\min_{\theta_m} E[L(y, f_m(\mathbf{x}))] = \arg\min_{\theta_m} \sum_{i=1}^{N} L\left(\bar{y}_i, f_m(\mathbf{x}_i)\right)$$
For squared-error loss, the loss function can be rewritten as follows:
$$L(\bar{y}, f_m(\mathbf{x})) = (\bar{y} - f_m(\mathbf{x}))^2 = (\bar{y} - f_{m-1}(\mathbf{x}) - h_m(\mathbf{x};\theta_m))^2 = (r - h_m(\mathbf{x};\theta_m))^2, \quad r = \bar{y} - f_{m-1}(\mathbf{x})$$
where $r$ is the residual of the model from the previous step; that is, $h_m(\cdot)$ fits the residual of $f_{m-1}(\cdot)$.
Algorithm for the regression boosting tree
Input: training set $D = \{(\mathbf{x}_1, \bar{y}_1), \dots, (\mathbf{x}_n, \bar{y}_n)\}$
Output: $f_M(\mathbf{x})$
Algorithm:
1. Initialize $f_0(\mathbf{x}) = 0$.
2. For $m = 1, 2, \dots, M$:
   - Compute the residuals $r_{mi} = \bar{y}_i - f_{m-1}(\mathbf{x}_i)$ and build the training set $\{(\mathbf{x}_1, r_{m1}), \dots, (\mathbf{x}_n, r_{mn})\}$.
   - Fit a new regression tree to the previous step's residuals, obtaining $h_m(\mathbf{x};\theta_m)$.
   - Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
3. Output the model $f_M(\mathbf{x}) = \sum_{m=1}^{M} h(\mathbf{x};\theta_m)$.
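A minimal sketch of this procedure in Python, assuming scikit-learn's `DecisionTreeRegressor` as the base learner (function names and hyperparameters here are illustrative, not from the original text):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting_tree(X, y, M=100, max_depth=3):
    """Regression boosting tree: each tree fits the residual of the previous model."""
    trees = []
    f = np.zeros(len(y))              # f_0(x) = 0
    for m in range(M):
        r = y - f                     # residual r_mi = y_i - f_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        trees.append(tree)
        f += tree.predict(X)          # f_m(x) = f_{m-1}(x) + h_m(x; theta_m)
    return trees

def predict(trees, X):
    # f_M(x) = sum over the M trees
    return np.sum([t.predict(X) for t in trees], axis=0)
```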
Gradient Boosting Tree

Taylor-expanding the loss function gives an approximate representation:
$$L(\bar{y}, f_m(\mathbf{x})) = L(\bar{y}, f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)) \approx L(\bar{y}, f_{m-1}(\mathbf{x})) + \frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}\, h_m(\mathbf{x};\theta_m)$$
Taking $h_m(\mathbf{x};\theta_m) = -\frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}$ makes the loss decrease.
Gradient boosting tree algorithm:
Input: training set $D = \{(\mathbf{x}_1, \bar{y}_1), \dots, (\mathbf{x}_n, \bar{y}_n)\}$
Output: $f_M(\mathbf{x})$
Algorithm:
1. Initialize $f_0(\mathbf{x}) = 0$.
2. For $m = 1, 2, \dots, M$:
   - Compute the negative gradients $r_{mi} = -\frac{\partial L(\bar{y}_i, f_{m-1}(\mathbf{x}_i))}{\partial f_{m-1}(\mathbf{x}_i)}$ and build the training set $\{(\mathbf{x}_1, r_{m1}), \dots, (\mathbf{x}_n, r_{mn})\}$.
   - Fit a regression tree to the $r_{mi}$, obtaining $h_m(\mathbf{x};\theta_m)$.
   - Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
3. Output the model $f_M(\mathbf{x}) = \sum_{m=1}^{M} h_m(\mathbf{x};\theta_m)$.
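The only change from the residual version is that each tree now fits the negative gradient of the loss. A sketch for absolute-error loss $L(\bar{y}, f) = |\bar{y} - f|$, whose negative gradient is $\mathrm{sign}(\bar{y} - f)$; the shrinkage factor `lr` is an extra, commonly used refinement not part of the derivation above:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, M=100, max_depth=3, lr=0.1):
    """Gradient boosting tree: each tree fits r_mi = -dL/df_{m-1}(x_i).
    For L = |y - f| the negative gradient is sign(y - f)."""
    trees = []
    f = np.zeros(len(y))                       # f_0(x) = 0
    for m in range(M):
        r = np.sign(y - f)                     # negative gradient of |y - f|
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        trees.append(tree)
        f += lr * tree.predict(X)              # f_m = f_{m-1} + lr * h_m
    return trees
```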
xgboost

The loss function at step $m$ of xgboost is defined as:
$$L(\theta_m) = \sum_{i=1}^{N} L\left(\bar{y}_i, f_m(\mathbf{x}_i)\right) + \Omega\left(h_m(\mathbf{x};\theta_m)\right) = \sum_{i=1}^{N} L\left(\bar{y}_i, f_{m-1}(\mathbf{x}_i) + h_m(\mathbf{x}_i;\theta_m)\right) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$
where $\Omega(\cdot)$ is the regularization term: $T$ is the number of leaf nodes of the $m$-th tree and $w_j$ is the output value of leaf node $j$. The regularizer encourages trees with few leaf nodes whose output values avoid extremes.
By the second-order Taylor expansion
$$f(x+\Delta x) \simeq f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2$$
we obtain:
$$L(\theta_m) = \sum_{i=1}^{N} L\left(\bar{y}_i, f_m(\mathbf{x}_i)\right) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2 \approx \sum_{i=1}^{N}\left[L\left(\bar{y}_i, f_{m-1}(\mathbf{x}_i)\right) + g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$
where $\hat{y}_i^{\langle m-1 \rangle} = f_{m-1}(\mathbf{x}_i)$, $g_i = \frac{\partial L(\bar{y}_i,\, \hat{y}_i^{\langle m-1 \rangle})}{\partial \hat{y}_i^{\langle m-1 \rangle}}$, and $h_i = \frac{\partial^2 L(\bar{y}_i,\, \hat{y}_i^{\langle m-1 \rangle})}{\partial \left(\hat{y}_i^{\langle m-1 \rangle}\right)^2}$.
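For instance, with squared-error loss the two derivatives have a simple closed form (a worked special case added for illustration):
$$L\left(\bar{y}_i, \hat{y}_i^{\langle m-1 \rangle}\right) = \left(\bar{y}_i - \hat{y}_i^{\langle m-1 \rangle}\right)^2 \;\Rightarrow\; g_i = -2\left(\bar{y}_i - \hat{y}_i^{\langle m-1 \rangle}\right), \qquad h_i = 2$$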
Dropping the constant term, this simplifies to:
$$L(\theta_m) = \sum_{i=1}^{N}\left[g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$$
$$= \sum_{j=1}^{T}\left[\sum_{\mathbf{x}_i \in I_j}\left(g_i w_j + \frac{1}{2} h_i w_j^2\right) + \frac{1}{2}\lambda w_j^2\right] + \gamma T$$
$$= \sum_{j=1}^{T}\left[w_j \sum_{\mathbf{x}_i \in I_j} g_i + \frac{1}{2} w_j^2 \left(\sum_{\mathbf{x}_i \in I_j} h_i + \lambda\right)\right] + \gamma T$$
where $\mathbf{x}_i \in I_j$ means that $\mathbf{x}_i$ falls into the $j$-th leaf node.
Setting $\frac{\partial L}{\partial w_j} = 0$ gives:
$$w_j^* = -\frac{\sum_{\mathbf{x}_i \in I_j} g_i}{\lambda + \sum_{\mathbf{x}_i \in I_j} h_i} \quad (1)$$
$$L(w^*) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{\mathbf{x}_i \in I_j} g_i\right)^2}{\lambda + \sum_{\mathbf{x}_i \in I_j} h_i} + \gamma T \quad (2)$$
The loss function determines how a node is split. Suppose that, for some feature and feature value, the node's samples are split into two sets $I_L$ and $I_R$; the split gain is defined as:
$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \quad (3)$$
$$G_L = \sum_{\mathbf{x}_i \in I_L} g_i, \qquad G_R = \sum_{\mathbf{x}_i \in I_R} g_i, \qquad H_L = \sum_{\mathbf{x}_i \in I_L} h_i, \qquad H_R = \sum_{\mathbf{x}_i \in I_R} h_i$$
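A minimal sketch of evaluating one candidate split with equations (1) and (3), given per-sample gradients `g` and Hessians `h` (all names and the example values are illustrative):

```python
import numpy as np

def leaf_weight(g, h, lam):
    # Equation (1): w* = -sum(g) / (lambda + sum(h))
    return -g.sum() / (lam + h.sum())

def split_gain(g, h, left_mask, lam, gamma):
    # Equation (3): gain of splitting a node into I_L and I_R
    GL, HL = g[left_mask].sum(), h[left_mask].sum()
    GR, HR = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (GL**2 / (HL + lam)
                  + GR**2 / (HR + lam)
                  - (GL + GR)**2 / (HL + HR + lam)) - gamma

# Example: squared-error loss on a node holding 4 samples
y, f_prev = np.array([1.0, 2.0, 3.0, 4.0]), np.zeros(4)
g = -2 * (y - f_prev)                  # g_i for L = (y - f)^2
h = np.full(4, 2.0)                    # h_i = 2
left = np.array([True, True, False, False])
print(split_gain(g, h, left, lam=1.0, gamma=0.0))
```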
xgboost algorithm:
Input: training set $\{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\}$
Output: $f_M(\mathbf{x})$
Algorithm:
1. Initialize $f_0(\mathbf{x}) = 0$.
2. For $m = 1, \dots, M$:
   - Using the split gain defined in (3), find the best split feature and split value and split the node.
   - Compute the leaf node outputs by (1), obtaining the $m$-th tree $h_m(\mathbf{x};\theta_m)$.
   - Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m)$.
3. Output the model $f_M(\mathbf{x})$.
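In the `xgboost` Python package, the $\gamma$ and $\lambda$ above correspond, as far as I can tell, to the `gamma` and `reg_lambda` parameters; a minimal usage sketch:

```python
import xgboost as xgb

# gamma: penalty per leaf (gamma * T); reg_lambda: L2 penalty on leaf weights
model = xgb.XGBRegressor(n_estimators=100, max_depth=6,
                         gamma=0.0, reg_lambda=1.0, learning_rate=0.1)
# model.fit(X_train, y_train); model.predict(X_test)
```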
LambdaMART

LambdaMART optimizes a ranking metric (NDCG) with gradient boosting. The metric and the $\lambda$-gradients are:
$$dcg@T = \sum_{i=1}^{T}\frac{2^{label(i)}-1}{\log(i+1)}$$
$$ndcg@T = \frac{dcg@T}{maxDcg@T}$$
$$\Delta NDCG(i,j) = \left|ndcg(originalSequence) - ndcg(sequenceAfterSwap(i,j))\right|$$
$$\lambda_{ij} = \frac{-\sigma}{1+e^{\sigma(s_i-s_j)}}\left|\Delta NDCG(i,j)\right|$$
$$\lambda_i = \sum_{j:\, label(i) > label(j)} \lambda_{ij} - \sum_{j:\, label(j) > label(i)} \lambda_{ij}$$
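A small sketch of these quantities for a single query's ranked list ($\sigma = 1$ by default; natural log is assumed, matching the formula as written):

```python
import numpy as np

def dcg_at_t(labels, T):
    # dcg@T = sum_{i=1}^{T} (2^label(i) - 1) / log(i + 1)
    top = np.asarray(labels[:T], dtype=float)
    i = np.arange(1, len(top) + 1)
    return np.sum((2.0**top - 1) / np.log(i + 1))

def ndcg_at_t(labels, T):
    ideal = dcg_at_t(sorted(labels, reverse=True), T)
    return dcg_at_t(labels, T) / ideal if ideal > 0 else 0.0

def delta_ndcg(labels, i, j, T):
    # |ndcg(original) - ndcg(after swapping positions i and j)|
    swapped = list(labels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(ndcg_at_t(labels, T) - ndcg_at_t(swapped, T))

def lambda_ij(s_i, s_j, dndcg, sigma=1.0):
    # lambda_ij = -sigma / (1 + exp(sigma * (s_i - s_j))) * |delta NDCG(i, j)|
    return -sigma / (1.0 + np.exp(sigma * (s_i - s_j))) * dndcg
```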
Input: number of trees $M$, training set $\{(\mathbf{x}_1, label_1), \dots, (\mathbf{x}_n, label_n)\}$, learning rate $\eta$
Output: $f_M(\mathbf{x})$
Algorithm:
1. Initialize $f_0(\mathbf{x}) = 0$.
2. For $m = 1, \dots, M$ (one round is sketched in code after this list):
   - Compute the first derivative, i.e. the $\lambda$-gradient of each $\mathbf{x}_i$, denoted $y_i = \lambda_i$, forming $\{(\mathbf{x}_1, \lambda_1), \dots, (\mathbf{x}_n, \lambda_n)\}$.
   - Compute the second derivative $w_i = \frac{\partial y_i}{\partial f_{m-1}(\mathbf{x}_i)}$, used for the Newton-step optimization below.
   - Fit a regression tree to the $\lambda_i$, obtaining the tree $h_m(\mathbf{x})$.
   - Compute the leaf node values $\gamma_{mk} = \frac{\sum_{\mathbf{x}_i \in I_k} y_i}{\sum_{\mathbf{x}_i \in I_k} w_i}$.
   - Update the model: $f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x})$.
   - Rescore each sample as $f_m(\mathbf{x}_i)$ and re-sort the training set by the new scores.
3. Output the model $f_M(\mathbf{x})$.
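A sketch of one boosting round under the steps above; `lambdas` and `ws` are assumed to be precomputed per-sample $\lambda_i$ and $w_i$ values, and the learning rate $\eta$ scales the Newton-step leaf values (an illustrative reading of the algorithm, not a reference implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lambdamart_round(X, lambdas, ws, f_prev, eta=0.1, max_depth=4):
    """One LambdaMART round: fit a tree to the lambda-gradients, then replace each
    leaf's value with the Newton step gamma_mk = sum(y_i) / sum(w_i)."""
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, lambdas)
    leaf_of = tree.apply(X)                      # leaf index of each sample
    gamma = {}
    for k in np.unique(leaf_of):
        mask = leaf_of == k
        denom = ws[mask].sum()
        gamma[k] = lambdas[mask].sum() / denom if denom != 0 else 0.0
    # f_m(x_i) = f_{m-1}(x_i) + eta * gamma_{leaf(x_i)}; re-sort by these scores next
    return f_prev + eta * np.array([gamma[k] for k in leaf_of])
```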
To be updated...



