Using gridsearch over a "parallel" pipeline to get the best model

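Pipeline supports None in its steps (the list of estimators), which lets you switch parts of the pipeline on or off.

You can set one of the pipeline's named_steps to None, so that the estimator is not used, by including that setting in the parameters passed to GridSearchCV.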

Suppose you want to try both PCA and TruncatedSVD.

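    from sklearn import decomposition
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    pca = decomposition.PCA()
    svd = decomposition.TruncatedSVD()
    svm = SVC()
    n_components = [20, 40, 64]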

Add svd to the pipeline:

    pipe = Pipeline(steps=[('pca', pca), ('svd', svd), ('svm', svm)])

    # Change params_grid: instead of a dict, make it a list of dicts.
    # In the first element pass `svd = None`, and in the second `pca = None`.
    params_grid = [{
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'pca__n_components': n_components,
        'svd': [None],
    }, {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'pca': [None],
        'svd__n_components': n_components,
        'svd__algorithm': ['randomized'],
    }]

Now just pass the pipeline object to GridSearchCV:

    grd = GridSearchCV(pipe, param_grid=params_grid)

Calling grd.fit() searches over both elements of the params_grid list, expanding all the values of one element at a time.
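To make the "one element at a time" behaviour concrete, here is a minimal pure-Python sketch of how a list of grids expands into individual candidates. The helper expand_grid is hypothetical and written only for illustration. scikit-learn performs this expansion internally, and as far as I know sklearn.model_selection.ParameterGrid gives the same result.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict (or list of dicts) of value lists into candidate settings.

    Hypothetical helper for illustration; each dict is expanded on its own,
    so values from different dicts are never mixed.
    """
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    candidates = []
    for grid in grids:
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


n_components = [20, 40, 64]
params_grid = [{
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'pca__n_components': n_components,
    'svd': [None],
}, {
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
    'pca': [None],
    'svd__n_components': n_components,
    'svd__algorithm': ['randomized'],
}]

candidates = expand_grid(params_grid)
svd_disabled = sum('svd' in c for c in candidates)  # candidates from the first dict
pca_disabled = sum('pca' in c for c in candidates)  # candidates from the second dict
print(len(candidates), svd_disabled, pca_disabled)  # prints: 96 48 48
```

Each dict contributes 4 x 2 x 2 x 3 = 48 settings, so the search fits 96 candidates per cross-validation split, and no candidate has both pca and svd switched off.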

Simplification if the parameter names are the same

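If both estimators in the "OR" share the same parameter names, as here, where PCA and TruncatedSVD both have n_components (or if you only want to search over this parameter), this can be simplified to: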

    # Here I have changed the step name to `preprocessor`
    pipe = Pipeline(steps=[('preprocessor', pca), ('svm', svm)])

    # Now assign both estimators to `preprocessor` as below:
    params_grid = {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'preprocessor': [pca, svd],
        'preprocessor__n_components': n_components,
    }
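As a rough end-to-end sketch of the simplified form, the snippet below fits a deliberately small grid and reads back which preprocessor won. The digits dataset, the 300-sample subset, cv=3 and the reduced value lists are all assumptions chosen only to keep the run short. They are not part of the original answer.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data: a small slice of the digits dataset (64 features).
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# The PCA() here is only a placeholder; the grid swaps the step itself.
pipe = Pipeline(steps=[('preprocessor', PCA()), ('svm', SVC())])

params_grid = {
    'preprocessor': [PCA(), TruncatedSVD()],   # the "OR" between estimators
    'preprocessor__n_components': [10, 20],    # shared parameter name
    'svm__C': [1, 10],
}

grd = GridSearchCV(pipe, param_grid=params_grid, cv=3)
grd.fit(X, y)

# best_params_ holds the winning estimator object for the swapped step.
best_name = type(grd.best_params_['preprocessor']).__name__
print(best_name, grd.best_params_['preprocessor__n_components'])
```

The step is replaced as a whole, so preprocessor__n_components is applied to whichever estimator is currently assigned to preprocessor.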

Generalization of this scheme

We can write a function that automatically populates the param_grid to be supplied to GridSearchCV with the appropriate values:

    import itertools

    def make_param_grids(steps, param_grids):
        final_params = []

        # itertools.product builds every combination, so that
        # (pca OR svd) AND (svm OR rf) becomes ->
        # (pca, svm), (pca, rf), (svd, svm), (svd, rf)
        for estimator_names in itertools.product(*steps.values()):
            current_grid = {}

            # step_name and estimator_name should correspond,
            # i.e. preprocessor must be from pca and select.
            for step_name, estimator_name in zip(steps.keys(), estimator_names):
                for param, value in param_grids[estimator_name].items():
                    if param == 'object':
                        # Set the actual estimator in the pipeline
                        current_grid[step_name] = [value]
                    else:
                        # Set the parameters of the estimator above
                        current_grid[step_name + '__' + param] = value

            # Append this dictionary to the final params
            final_params.append(current_grid)

        return final_params

And use this function with any number of transformers and estimators:

    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest
    from sklearn.svm import SVC

    # Add all the estimators you want to "OR" under a single key:
    # use OR between `pca` and `select`,
    # use OR between `svm` and `rf`.
    # Different keys are evaluated as serial steps in the pipeline.
    pipeline_steps = {'preprocessor': ['pca', 'select'],
                      'classifier': ['svm', 'rf']}

    # Fill in the parameters to be searched in this dict
    all_param_grids = {
        'svm': {'object': SVC(), 'C': [0.1, 0.2]},
        'rf': {'object': RandomForestClassifier(), 'n_estimators': [10, 20]},
        'pca': {'object': PCA(), 'n_components': [10, 20]},
        'select': {'object': SelectKBest(), 'k': [5, 10]},
    }

    # Call the function on the variables declared above
    param_grids_list = make_param_grids(pipeline_steps, all_param_grids)
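To see what such a function returns, here is a self-contained sketch that uses placeholder strings in place of the estimator objects. That is an assumption made so the snippet runs without scikit-learn; in real use the 'object' entries would be estimator instances. The function body is a Python 3 rendering of the one described above.

```python
import itertools


def make_param_grids(steps, param_grids):
    """Build one grid dict per combination of estimators across the steps."""
    final_params = []
    for estimator_names in itertools.product(*steps.values()):
        current_grid = {}
        for step_name, estimator_name in zip(steps.keys(), estimator_names):
            for param, value in param_grids[estimator_name].items():
                if param == 'object':
                    # The estimator itself becomes a one-element choice list
                    current_grid[step_name] = [value]
                else:
                    # Its parameters are prefixed with the step name
                    current_grid[step_name + '__' + param] = value
        final_params.append(current_grid)
    return final_params


pipeline_steps = {'preprocessor': ['pca', 'select'],
                  'classifier': ['svm', 'rf']}

# Placeholder strings stand in for SVC(), PCA(), etc. (illustration only).
all_param_grids = {
    'svm': {'object': 'SVC()', 'C': [0.1, 0.2]},
    'rf': {'object': 'RandomForestClassifier()', 'n_estimators': [10, 20]},
    'pca': {'object': 'PCA()', 'n_components': [10, 20]},
    'select': {'object': 'SelectKBest()', 'k': [5, 10]},
}

grids = make_param_grids(pipeline_steps, all_param_grids)
for grid in grids:
    print(grid)
```

This yields four grid dicts, in the order (pca, svm), (pca, rf), (select, svm), (select, rf). Each dict contains only the parameters of the two estimators it pairs, so a parameter is never applied to an estimator that lacks it.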

Now initialize the pipeline object with the step names used in pipeline_steps above:

    # The PCA() and SVC() used here are just to initialize the pipeline;
    # the actual estimators will come from our `param_grids_list`.
    pipe = Pipeline(steps=[('preprocessor', PCA()), ('classifier', SVC())])

Finally, create the GridSearchCV object and fit the data:

    grd = GridSearchCV(pipe, param_grid=param_grids_list)
    grd.fit(X, y)
