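Pipeline supports None in its steps (the list of estimators), which lets you switch off parts of the pipeline.

To skip an estimator, set the corresponding entry in the pipeline's named_steps to None through the parameters you pass to GridSearchCV.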
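To make the effect of setting a step to None concrete, here is a minimal sketch. The random data, the step names and the n_components values are assumptions chosen only for illustration. Recent scikit-learn versions also accept the string 'passthrough' in place of None.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data only: 40 samples, 5 features, two classes.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(40, 5))
y_demo = np.array([0, 1] * 20)

demo_pipe = Pipeline(steps=[
    ('pca', PCA(n_components=2)),
    ('svd', TruncatedSVD(n_components=2)),
    ('svm', SVC()),
])

# Disable the 'svd' step. GridSearchCV does the same thing through
# set_params when a grid contains 'svd': [None].
demo_pipe.set_params(svd=None)
demo_pipe.fit(X_demo, y_demo)

svd_step = demo_pipe.named_steps['svd']    # None: the step is skipped
predictions = demo_pipe.predict(X_demo)    # PCA output feeds SVC directly
```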
Let's assume you want to use PCA and TruncatedSVD.

    from sklearn import decomposition
    from sklearn.svm import SVC

    pca = decomposition.PCA()
    svd = decomposition.TruncatedSVD()
    svm = SVC()
    n_components = [20, 40, 64]
Add svd to the pipeline:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    pipe = Pipeline(steps=[('pca', pca), ('svd', svd), ('svm', svm)])

    # params_grid is now a list of dicts instead of a single dict.
    # The first dict sets svd to None, the second sets pca to None.
    params_grid = [{
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'pca__n_components': n_components,
        'svd': [None],
    }, {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'pca': [None],
        'svd__n_components': n_components,
        'svd__algorithm': ['randomized'],
    }]

Now just pass the pipeline object to GridSearchCV:

    grd = GridSearchCV(pipe, param_grid=params_grid)
Calling grd.fit() searches over both elements of the params_grid list, one dict at a time, trying every combination of the values within each dict.
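The "one dict at a time" behaviour can be sketched in plain Python. The helper expand_grids below is hypothetical and written only for illustration; scikit-learn ships its own ParameterGrid class for this kind of expansion. The point is that each dict is expanded on its own, so no candidate ever mixes settings from the two dicts.

```python
from itertools import product

def expand_grids(grids):
    """Yield one {param: value} dict per candidate, grid by grid."""
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the value lists inside this one dict.
        for values in product(*(grid[name] for name in names)):
            yield dict(zip(names, values))

n_components = [20, 40, 64]
svm_grid = {
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['linear', 'rbf'],
    'svm__gamma': [0.001, 0.0001],
}
params_grid = [
    {**svm_grid, 'pca__n_components': n_components, 'svd': [None]},
    {**svm_grid, 'pca': [None], 'svd__n_components': n_components,
     'svd__algorithm': ['randomized']},
]

candidates = list(expand_grids(params_grid))
print(len(candidates))  # 48 candidates from each dict, 96 in total
```

Every candidate disables exactly one of the two transformers, which is what makes the list of dicts behave like "pca OR svd".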
Simplification when the parameter names are the same

If both estimators in your "OR" share the same parameter name, as here where PCA and TruncatedSVD both have n_components (or if that is the only parameter you want to search over), this can be simplified to:

    # The step is now named 'preprocessor'
    pipe = Pipeline(steps=[('preprocessor', pca), ('svm', svm)])

    # Assign both estimators to 'preprocessor':
    params_grid = {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear', 'rbf'],
        'svm__gamma': [0.001, 0.0001],
        'preprocessor': [pca, svd],
        'preprocessor__n_components': n_components,
    }

Generalizing this scheme
We can write a function that automatically builds the param_grid to pass to GridSearchCV from the appropriate values:

    import itertools

    def make_param_grids(steps, param_grids):
        final_params = []

        # itertools.product generates every combination, so that
        # (pca OR svd) AND (svm OR rf) becomes
        # (pca, svm), (pca, rf), (svd, svm), (svd, rf)
        for estimator_names in itertools.product(*steps.values()):
            current_grid = {}

            # step_name and estimator_name must correspond,
            # i.e. preprocessor must be one of pca and select.
            for step_name, estimator_name in zip(steps.keys(), estimator_names):
                for param, value in param_grids[estimator_name].items():
                    if param == 'object':
                        # Set the actual estimator in the pipeline
                        current_grid[step_name] = [value]
                    else:
                        # Set the parameters of that estimator
                        current_grid[step_name + '__' + param] = value

            # Append this dictionary to the final list
            final_params.append(current_grid)

        return final_params

Use this function with any number of transformers and estimators:
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest

    # Put all the estimators you want to "OR" under a single key:
    # OR between 'pca' and 'select', OR between 'svm' and 'rf'.
    # Different keys are evaluated as consecutive steps in the pipeline.
    pipeline_steps = {'preprocessor': ['pca', 'select'],
                      'classifier': ['svm', 'rf']}

    # Fill in the parameters to be searched in this dict
    all_param_grids = {
        'svm': {'object': SVC(), 'C': [0.1, 0.2]},
        'rf': {'object': RandomForestClassifier(), 'n_estimators': [10, 20]},
        'pca': {'object': PCA(), 'n_components': [10, 20]},
        'select': {'object': SelectKBest(), 'k': [5, 10]},
    }

    # Call the function on the variables declared above
    param_grids_list = make_param_grids(pipeline_steps, all_param_grids)

Now initialize a pipeline object with the same step names as in pipeline_steps above:

    # PCA() and SVC() here only initialize the pipeline;
    # the actual estimators come from param_grids_list
    pipe = Pipeline(steps=[('preprocessor', PCA()), ('classifier', SVC())])

Finally, create the GridSearchCV object and fit the data:

    grd = GridSearchCV(pipe, param_grid=param_grids_list)
    grd.fit(X, y)
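For a quick end-to-end check, the single-dict variant in which the estimator itself is a searched parameter can be run on a small dataset. This is only a sketch: the digits data, the 300-sample subset, the reduced grid and cv=3 are assumptions chosen to keep the run short, not part of the approach described above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small subset so the search finishes quickly (illustrative choice).
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# The PCA() here is only a placeholder; the grid swaps the step.
pipe = Pipeline(steps=[('preprocessor', PCA()), ('svm', SVC())])

param_grid = {
    'preprocessor': [PCA(), TruncatedSVD()],   # the "OR" between estimators
    'preprocessor__n_components': [10, 20],    # parameter shared by both
    'svm__C': [1, 10],
}

grd = GridSearchCV(pipe, param_grid=param_grid, cv=3)
grd.fit(X, y)

best_preprocessor = type(grd.best_params_['preprocessor']).__name__
n_candidates = len(grd.cv_results_['params'])  # 2 * 2 * 2 combinations
print(best_preprocessor, grd.best_score_)
```

After fitting, grd.best_params_['preprocessor'] holds the winning estimator object, so you can see which side of the "OR" was selected.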



