栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 软件开发 > 后端开发 > Python

sklearn中的逻辑回归

Python 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

sklearn中的逻辑回归

1.sklearn中与逻辑回归有关的三个类

sklearn中,lr相关的代码在linear_model模块中,查看linear_model的__init__文件,内容如下

__all__ = ['ARDRegression',
           'BayesianRidge',
           'ElasticNet',
           'ElasticNetCV',
           'Hinge',
           'Huber',
           'HuberRegressor',
           'Lars',
           'LarsCV',
           'Lasso',
           'LassoCV',
           'LassoLars',
           'LassoLarsCV',
           'LassoLarsIC',
           'LinearRegression',
           'Log',
           'LogisticRegression',
           'LogisticRegressionCV',
           'ModifiedHuber',
           'MultiTaskElasticNet',
           'MultiTaskElasticNetCV',
           'MultiTaskLasso',
           'MultiTaskLassoCV',
           'OrthogonalMatchingPursuit',
           'OrthogonalMatchingPursuitCV',
           'PassiveAggressiveClassifier',
           'PassiveAggressiveRegressor',
           'Perceptron',
           'Ridge',
           'RidgeCV',
           'RidgeClassifier',
           'RidgeClassifierCV',
           'SGDClassifier',
           'SGDRegressor',
           'SquaredLoss',
           'TheilSenRegressor',
           'enet_path',
           'lars_path',
           'lars_path_gram',
           'lasso_path',
           'logistic_regression_path',
           'orthogonal_mp',
           'orthogonal_mp_gram',
           'ridge_regression',
           'RANSACRegressor']

可以看出来,与lr有关的部分一共有三个类:

LogisticRegression
LogisticRegressionCV
logistic_regression_path
2.logistic_regression_path
@deprecated('logistic_regression_path was deprecated in version 0.21 and '
            'will be removed in version 0.23.0')
def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
                             max_iter=100, tol=1e-4, verbose=0,
                             solver='lbfgs', coef=None,
                             class_weight=None, dual=False, penalty='l2',
                             intercept_scaling=1., multi_class='auto',
                             random_state=None, check_input=True,
                             max_squared_sum=None, sample_weight=None,
                             l1_ratio=None):
    """Compute a Logistic Regression model for a list of regularization
    parameters.

    This is an implementation that uses the result of the previous model
    to speed up computations along the set of solutions, making it faster
    than sequentially calling LogisticRegression for the different parameters.
    Note that there will be no speedup with liblinear solver, since it does
    not handle warm-starting.

    .. deprecated:: 0.21
        ``logistic_regression_path`` was deprecated in version 0.21 and will
        be removed in 0.23.

    Read more in the :ref:`User Guide `.

首先可以看到,logistic_regression_path将会在0.23.0版本移出。
由注释说明不难看出,logistic_regression_path主要是基于之前的训练结果用于训练加速。同时特别强调,如果用的是liblinear求解器,logistic_regression_path不能加快速度,因为liblinear不能处理warm-starting场景。

3.LogisticRegression
class LogisticRegression(baseEstimator, LinearClassifierMixin,
                         SparseCoefMixin):
    """
    Logistic Regression (aka logit, MaxEnt) classifier.

    In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
    scheme if the 'multi_class' option is set to 'ovr', and uses the
    cross-entropy loss if the 'multi_class' option is set to 'multinomial'.
    (Currently the 'multinomial' option is supported only by the 'lbfgs',
    'sag', 'saga' and 'newton-cg' solvers.)

    This class implements regularized logistic regression using the
    'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note
    that regularization is applied by default**. It can handle both dense
    and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit
    floats for optimal performance; any other input format will be converted
    (and copied).

    The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
    with primal formulation, or no regularization. The 'liblinear' solver
    supports both L1 and L2 regularization, with a dual formulation only for
    the L2 penalty. The Elastic-Net regularization is only supported by the
    'saga' solver.

    Read more in the :ref:`User Guide `.
...
    def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='lbfgs', max_iter=100,
                 multi_class='auto', verbose=0, warm_start=False, n_jobs=None,
                 l1_ratio=None):

        self.penalty = penalty
        self.dual = dual
        self.tol = tol
        self.C = C
        self.fit_intercept = fit_intercept
        self.intercept_scaling = intercept_scaling
        self.class_weight = class_weight
        self.random_state = random_state
        self.solver = solver
        self.max_iter = max_iter
        self.multi_class = multi_class
        self.verbose = verbose
        self.warm_start = warm_start
        self.n_jobs = n_jobs
        self.l1_ratio = l1_ratio

看看注释里头提到的几个关键点:
1.首先对于多分类,如果multi_class设置的是ovr,那么采用的是one-vs-rest的方式。如果设置的是multinomial,会使用cross-entropy loss。
2.具体的优化求解方法包括’liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’。‘newton-cg’, ‘sag’, and ‘lbfgs’支持L2正则,liblinear支持L1与L2,Elastic-Net正则只能使用’saga’。

关心一下初始化方法中的相关参数:
1.dual:选择目标函数是原始形式还是对偶形式,默认false。
2.tol:停止迭代的标准,默认1e-4。
3.C:正则化系数,默认1.0
4.solver:最优化求解方法,默认lbfgs
5.max_iter: 最大迭代次数,默认100。

4.LogisticRegressionCV
class LogisticRegressionCV(LogisticRegression, baseEstimator,
                           LinearClassifierMixin):
    """Logistic Regression CV (aka logit, MaxEnt) classifier.

    See glossary entry for :term:`cross-validation estimator`.

    This class implements logistic regression using liblinear, newton-cg, sag
    of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2
    regularization with primal formulation. The liblinear solver supports both
    L1 and L2 regularization, with a dual formulation only for the L2 penalty.
    Elastic-Net penalty is only supported by the saga solver.

    For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter
    is selected by the cross-validator
    :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed
    using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs'
    solvers can warm-start the coefficients (see :term:`Glossary`).

LogisticRegressionCV的用法与LogisticRegression基本相当,不一样在于LogisticRegressionCV通过cross validation,来选择正则化参数C,同时通过l1_ratios来选择l1正则与l2正则的组合。

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/619215.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号