LambdaMART installation: LambdaMART in Python

Boosting Tree

f(\mathbf{x}) = f_M(\mathbf{x}) = \sum_{m=1}^{M} h(\mathbf{x};\theta_m)

where h(x; θ_m) is the m-th tree, θ_m is the parameter of the m-th tree, and M is the number of trees.

Forward stagewise algorithm:

Start from the initial boosting tree f_0(x) = 0. The model at step m is f_m(x) = f_{m-1}(x) + h_m(x; θ_m), where θ_m is the parameter of the m-th tree to be determined. Given samples (x_i, ȳ_i), θ_m is found by

\hat{\theta}_m = \arg\min_{\theta_m} E[L(y, f_m(\mathbf{x}))] = \arg\min_{\theta_m} \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i))

With squared loss, the loss function admits the following derivation:

L(\bar{y}, f_m(\mathbf{x})) = (\bar{y} - f_m(\mathbf{x}))^2

= (\bar{y} - f_{m-1}(\mathbf{x}) - h_m(\mathbf{x};\theta_m))^2 = (r - h_m(\mathbf{x};\theta_m))^2, \quad r = \bar{y} - f_{m-1}(\mathbf{x})

where r is the residual left by the previous model, so h_m(·) fits the residual of f_{m-1}(·).

Algorithm for the regression boosting tree

Input: training set D = \{(\mathbf{x}_1, \bar{y}_1), \ldots, (\mathbf{x}_n, \bar{y}_n)\}

Output: f_M(\mathbf{x})

Algorithm:

Initialize f_0(\mathbf{x}) = 0. For m = 1, 2, \ldots, M:

1. Compute the residuals r_{mi} = \bar{y}_i - f_{m-1}(\mathbf{x}_i) and build the training set \{(\mathbf{x}_1, r_{m1}), \ldots, (\mathbf{x}_n, r_{mn})\}.
2. Fit a new regression tree to these residuals, obtaining h_m(\mathbf{x};\theta_m).
3. Update the model: f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m).

The final model is f_M(\mathbf{x}) = \sum_{m=1}^{M} h(\mathbf{x};\theta_m).
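The steps above can be sketched in Python. This is a minimal illustration (the `BoostingTreeRegressor` class name is this sketch's own, not a library API) using scikit-learn's `DecisionTreeRegressor` as the base learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostingTreeRegressor:
    """Regression boosting tree: each new tree fits the current residuals."""
    def __init__(self, n_trees=100, max_depth=3):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.trees = []

    def fit(self, X, y):
        f = np.zeros(len(y))          # f_0(x) = 0
        for _ in range(self.n_trees):
            r = y - f                 # residuals r_mi
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, r)            # h_m fits the residuals
            f += tree.predict(X)      # f_m = f_{m-1} + h_m
            self.trees.append(tree)
        return self

    def predict(self, X):
        # f_M(x) = sum of all trees
        return sum(t.predict(X) for t in self.trees)
```

On training data the residuals shrink at every round, so the fit error drops quickly with the number of trees.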

Gradient Boosting Tree

Applying a first-order Taylor expansion to the loss function gives the approximation:

L(\bar{y}, f_m(\mathbf{x})) = L(\bar{y}, f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m))

\approx L(\bar{y}, f_{m-1}(\mathbf{x})) + \frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})}\, h_m(\mathbf{x};\theta_m)

Choosing h_m(\mathbf{x};\theta_m) = -\frac{\partial L(\bar{y}, f_{m-1}(\mathbf{x}))}{\partial f_{m-1}(\mathbf{x})} therefore decreases the loss.

Gradient boosting tree algorithm:

Input: training set D = \{(\mathbf{x}_1, \bar{y}_1), \ldots, (\mathbf{x}_n, \bar{y}_n)\}

Output: f_M(\mathbf{x})

Algorithm:

Initialize f_0(\mathbf{x}) = 0. For m = 1, 2, \ldots, M:

1. Compute the negative gradients r_{mi} = -\frac{\partial L(\bar{y}_i, f_{m-1}(\mathbf{x}_i))}{\partial f_{m-1}(\mathbf{x}_i)} and build the set \{(\mathbf{x}_1, r_{m1}), \ldots, (\mathbf{x}_n, r_{mn})\}.
2. Fit a regression tree to the r_{mi}, obtaining h_m(\mathbf{x};\theta_m).
3. Update the model: f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m).

The final model is f_M(\mathbf{x}) = \sum_{m=1}^{M} h_m(\mathbf{x};\theta_m).
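The generic version can be sketched with a pluggable loss gradient; only the target the trees fit changes (function names here are this sketch's own, again with scikit-learn's `DecisionTreeRegressor` as base learner):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, neg_gradient, n_trees=100, max_depth=3):
    """Generic gradient boosting: each tree fits the negative gradient
    of the loss at the current predictions f_{m-1}."""
    trees = []
    f = np.zeros(len(y))                     # f_0(x) = 0
    for _ in range(n_trees):
        r = neg_gradient(y, f)               # r_mi = -dL/df_{m-1}
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        f += tree.predict(X)                 # f_m = f_{m-1} + h_m
        trees.append(tree)
    return trees

# For squared loss L = (y - f)^2 / 2 the negative gradient is the residual:
squared_loss_grad = lambda y, f: y - f
# For absolute loss L = |y - f| it is the sign of the residual:
abs_loss_grad = lambda y, f: np.sign(y - f)
```

With `squared_loss_grad` this reduces exactly to the residual-fitting boosting tree of the previous section.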

xgboost

The loss function of xgboost at step m is defined as:

L(\theta_m) = \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i)) + \Omega(h_m(\mathbf{x};\theta_m))

= \sum_{i=1}^{N} L(\bar{y}_i, f_{m-1}(\mathbf{x}_i) + h_m(\mathbf{x}_i;\theta_m)) + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

where Ω(·) is the regularization term: T is the number of leaves of the m-th tree and w_j is the output value of leaf j. The regularizer encourages trees with few leaves whose leaf outputs avoid extreme values.

Using the second-order Taylor expansion

f(x + \Delta x) \simeq f(x) + f'(x)\,\Delta x + \frac{1}{2} f''(x)\,\Delta x^2

we obtain:

L(\theta_m) = \sum_{i=1}^{N} L(\bar{y}_i, f_m(\mathbf{x}_i)) + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2

\approx \sum_{i=1}^{N}\left[L(\bar{y}_i, f_{m-1}(\mathbf{x}_i)) + g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2

where

\hat{y}_i^{\langle m-1\rangle} = f_{m-1}(\mathbf{x}_i), \quad g_i = \frac{\partial L(\bar{y}_i, \hat{y}_i^{\langle m-1\rangle})}{\partial \hat{y}_i^{\langle m-1\rangle}}, \quad h_i = \frac{\partial^2 L(\bar{y}_i, \hat{y}_i^{\langle m-1\rangle})}{\partial (\hat{y}_i^{\langle m-1\rangle})^2}

Dropping the constant term simplifies this to:

L(\theta_m) = \sum_{i=1}^{N}\left[g_i\, h_m(\mathbf{x}_i;\theta_m) + \frac{1}{2} h_i\, h_m(\mathbf{x}_i;\theta_m)^2\right] + \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2

= \sum_{j=1}^{T}\left[\sum_{\mathbf{x}_i \in I_j}\left(g_i w_j + \frac{1}{2} h_i w_j^2\right) + \frac{1}{2}\lambda w_j^2\right] + \gamma T

= \sum_{j=1}^{T}\left[w_j \sum_{\mathbf{x}_i \in I_j} g_i + \frac{1}{2} w_j^2 \left(\sum_{\mathbf{x}_i \in I_j} h_i + \lambda\right)\right] + \gamma T

where \mathbf{x}_i \in I_j means that \mathbf{x}_i falls into leaf j.

Setting \frac{\partial L}{\partial w_j} = 0 gives:

w_j^* = -\frac{\sum_{\mathbf{x}_i \in I_j} g_i}{\lambda + \sum_{\mathbf{x}_i \in I_j} h_i} \quad (1)

L(w_j^*) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{\mathbf{x}_i \in I_j} g_i\right)^2}{\lambda + \sum_{\mathbf{x}_i \in I_j} h_i} + \gamma T \quad (2)
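Equations (1) and (2) translate directly into code given the per-sample gradients and Hessians; a minimal sketch (`leaf_weight` and `structure_score` are names chosen here, not xgboost API):

```python
import numpy as np

def leaf_weight(g, h, lam):
    """Optimal leaf output w_j* from eq. (1): g, h hold the first- and
    second-order gradients of the samples falling into this leaf."""
    return -g.sum() / (lam + h.sum())

def structure_score(leaves, lam, gamma):
    """Minimal loss of a fixed tree structure from eq. (2).
    `leaves` is a list of (g, h) array pairs, one pair per leaf."""
    score = sum((g.sum() ** 2) / (lam + h.sum()) for g, h in leaves)
    return -0.5 * score + gamma * len(leaves)
```

Lower `structure_score` means a better tree structure, which is exactly what the split rule below exploits.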

The split rule follows from this loss function: suppose some feature and feature value split the node's samples into two sets I_L and I_R; the gain of the split is defined as:

Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \quad (3)

G_L = \sum_{\mathbf{x}_i \in I_L} g_i, \quad G_R = \sum_{\mathbf{x}_i \in I_R} g_i, \quad H_L = \sum_{\mathbf{x}_i \in I_L} h_i, \quad H_R = \sum_{\mathbf{x}_i \in I_R} h_i
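The gain (3) can likewise be computed from the per-sample g_i and h_i; a small sketch with a hypothetical `split_gain` helper:

```python
import numpy as np

def split_gain(g, h, mask, lam, gamma):
    """Gain of splitting a node, eq. (3). `g`, `h` hold the node's
    gradients/Hessians; `mask` is True for samples going left (I_L)."""
    GL, HL = g[mask].sum(), h[mask].sum()
    GR, HR = g[~mask].sum(), h[~mask].sum()
    return 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                  - (GL + GR)**2 / (HL + HR + lam)) - gamma
```

A split is only worth taking when this value is positive; γ thus acts as a minimum-gain threshold for growing the tree.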

xgboost algorithm:

Input: training set \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}

Output: f_M(\mathbf{x})

Algorithm:

Initialize f_0(\mathbf{x}) = 0. For m = 1, \ldots, M:

1. Using the split gain (3), find the best split feature and split value at each node.
2. Compute the leaf outputs by (1), obtaining the m-th tree h_m(\mathbf{x};\theta_m).
3. Update the model: f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + h_m(\mathbf{x};\theta_m).

The final model is f_M(\mathbf{x}).
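Putting (1) and (3) together, a toy exact-greedy tree builder might look as follows. This is an illustrative sketch, not xgboost's actual implementation; for squared loss L = (y - f)^2/2 the inputs would be g_i = f_{m-1}(x_i) - y_i and h_i = 1:

```python
import numpy as np

def build_tree(X, g, h, lam=1.0, gamma=0.0, depth=2):
    """Exact greedy xgboost-style tree: choose splits by gain (3),
    set leaf outputs by (1). Returns a nested dict."""
    best = {"gain": 0.0}
    if depth > 0:
        for f in range(X.shape[1]):
            for t in np.unique(X[:, f])[1:]:      # skip smallest: no empty side
                m = X[:, f] < t
                GL, HL = g[m].sum(), h[m].sum()
                GR, HR = g[~m].sum(), h[~m].sum()
                gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                              - (GL + GR)**2 / (HL + HR + lam)) - gamma
                if gain > best["gain"]:
                    best = {"gain": gain, "feat": f, "thr": t, "mask": m}
    if "feat" not in best:                        # no positive-gain split: leaf, eq. (1)
        return {"leaf": -g.sum() / (lam + h.sum())}
    m = best["mask"]
    return {"feat": best["feat"], "thr": best["thr"],
            "left":  build_tree(X[m], g[m], h[m], lam, gamma, depth - 1),
            "right": build_tree(X[~m], g[~m], h[~m], lam, gamma, depth - 1)}

def tree_predict(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] < node["thr"] else node["right"]
    return node["leaf"]
```

Real implementations add histogram binning, sparsity handling, and column subsampling on top of this greedy core.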

LambdaMART

dcg@T = \sum_{i=1}^{T} \frac{2^{label(i)} - 1}{\log(i+1)}

ndcg@T = \frac{dcg@T}{maxDcg@T}

\Delta NDCG(i,j) = \left|\, ndcg(\text{original sequence}) - ndcg(\text{sequence after swapping } i, j)\,\right|

\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}}\, |\Delta NDCG(i,j)|

\lambda_i = \sum_{j:\, label(i) > label(j)} \lambda_{ij} - \sum_{j:\, label(j) > label(i)} \lambda_{ij}
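For a single query, the λ computation above can be sketched as follows (pure NumPy; `dcg_at`, `delta_ndcg`, and `lambdas` are this sketch's own helpers, and log base 2 is used since the base cancels in the NDCG ratio):

```python
import numpy as np

def dcg_at(labels_in_rank_order, T):
    """DCG@T of labels already listed in ranked order."""
    ranks = np.arange(1, min(T, len(labels_in_rank_order)) + 1)
    gains = 2.0 ** np.asarray(labels_in_rank_order[:len(ranks)]) - 1
    return np.sum(gains / np.log2(ranks + 1))

def delta_ndcg(labels, order, i, j, T):
    """|NDCG@T change| when documents i and j swap ranking positions."""
    ideal = dcg_at(np.sort(labels)[::-1], T)
    before = dcg_at(labels[order], T) / ideal
    swapped = order.copy()
    a, b = np.where(order == i)[0][0], np.where(order == j)[0][0]
    swapped[a], swapped[b] = swapped[b], swapped[a]
    after = dcg_at(labels[swapped], T) / ideal
    return abs(before - after)

def lambdas(scores, labels, sigma=1.0, T=10):
    """lambda_i for every document of one query, per the formulas above."""
    order = np.argsort(-scores)          # current ranking by score
    lam = np.zeros(len(scores))
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:    # i should rank above j
                l_ij = -sigma / (1 + np.exp(sigma * (scores[i] - scores[j])))
                l_ij *= delta_ndcg(labels, order, i, j, T)
                lam[i] += l_ij           # first sum in the lambda_i formula
                lam[j] -= l_ij           # second sum, with opposite sign
    return lam
```

A document ranked below a less relevant one receives a λ pushing its score up, and vice versa, which is what the trees below are trained to fit.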

Input: number of trees M, training set \{(\mathbf{x}_1, label_1), \ldots, (\mathbf{x}_n, label_n)\}, learning rate η

Output: f_M(\mathbf{x})

Algorithm:

Initialize f_0(\mathbf{x}) = 0. For m = 1, \ldots, M:

1. Compute the first-order terms, i.e. the λ gradient of each \mathbf{x}_i, written y_i = \lambda_i, forming \{(\mathbf{x}_1, \lambda_1), \ldots, (\mathbf{x}_n, \lambda_n)\}.
2. Compute the second-order terms w_i = \frac{\partial y_i}{\partial f_{m-1}(\mathbf{x}_i)}, used below for the Newton step.
3. Fit a regression tree to the \lambda_i, obtaining the tree h_m(\mathbf{x}).
4. Compute each leaf's value \gamma_{mk} = \frac{\sum_{\mathbf{x}_i \in I_k} y_i}{\sum_{\mathbf{x}_i \in I_k} w_i}.
5. Update the model: f_m(\mathbf{x}) = f_{m-1}(\mathbf{x}) + \eta\, h_m(\mathbf{x}).
6. Rescore the data as f_m(\mathbf{x}_i) and re-sort the training set by the new scores.

The final model is f_M(\mathbf{x}).

To be updated...

Reprinted from www.mshxw.com. Original article: https://www.mshxw.com/it/783385.html