Hyperopt: Parallel Computation with MongoDB

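Hyperopt is a third-party Python library for hyperparameter optimization. I recently found that it can use MongoDB to run the search in parallel, so I looked into it a little and am recording and sharing what I found.
I won't cover installing MongoDB; just follow the linked guide:
Steps for installing MongoDB on Ubuntu
Once it is installed, start mongo and try the official demo: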

import math
from hyperopt import fmin, tpe, hp
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')
best = fmin(math.sin, hp.uniform('x', -2, 2), trials=trials, algo=tpe.suggest, max_evals=10)

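In the code above, a MongoTrials instance is assigned to the variable trials. Its first argument points to the mongo process, using the database 'foo_db' and the collection 'jobs'. 'exp_key' identifies the experiment. (Changing this argument means a new experiment: the search runs again instead of picking up results already stored in the database.)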
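To make the layout of that connection string concrete, here is a small sketch that splits it into its parts. The helper `split_mongo_url` is hypothetical and only illustrates the structure; it is not part of hyperopt.

```python
from urllib.parse import urlparse


def split_mongo_url(url):
    """Split 'mongo://host:port/db/collection' into its parts (illustrative helper)."""
    parsed = urlparse(url)
    db, collection = parsed.path.strip('/').split('/', 1)
    return {
        'host': parsed.hostname,
        'port': parsed.port,
        'db': db,
        'collection': collection,
    }


parts = split_mongo_url('mongo://localhost:1234/foo_db/jobs')
print(parts)
```

Note that the worker command used later in this post takes only the host, port and database (`localhost:1234/foo_db`), without the collection name.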
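When you actually run the demo, fmin blocks. This is because MongoTrials presents itself to fmin as an asynchronous trials object: when a new search point (parameter combination) comes up, fmin does not evaluate the objective function itself but waits for another process to do that work.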
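The hyperopt-mongo-worker script does exactly this job. Open a new shell and run:
hyperopt-mongo-worker --mongo=localhost:1234/foo_db --poll-interval=0.1
The first argument is the mongo address and the second is the polling interval. Because the demo is so simple, we get an optimal x very quickly.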
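The division of labour between the blocked fmin call and the worker can be pictured with a plain queue. The sketch below is only an analogy built on the standard library, not how hyperopt is implemented: the queue stands in for the 'jobs' collection, the search points are fixed by hand rather than suggested by an algorithm, and all names are made up for illustration.

```python
import math
import queue
import threading

jobs = queue.Queue()  # stands in for the MongoDB 'jobs' collection
results = {}          # stands in for the losses written back to the database


def worker():
    """Take queued points, evaluate the objective, and store the loss."""
    while True:
        x = jobs.get()
        if x is None:  # sentinel: no more work
            break
        results[x] = math.sin(x)


# The "driver" only enqueues points; it never calls math.sin itself.
points = [-2.0, -1.5, -1.0, 0.0, 1.0, 2.0]
for x in points:
    jobs.put(x)
jobs.put(None)

t = threading.Thread(target=worker)
t.start()
t.join()

best_x = min(results, key=results.get)
print(best_x, results[best_x])
```

If no worker is ever started, the queued points are never evaluated, which matches the blocking behaviour described above.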
But the demo above is too simple; we want to replace math.sin with a model of our own. Take a random forest as an example:

import hyperopt.mongoexp
import pandas as pd
import numpy as np

from hyperopt import fmin, tpe, hp, space_eval, pyll, rand, anneal
from hyperopt.mongoexp import MongoTrials
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score  # missing in the original listing
from sklearn.model_selection import cross_val_score, cross_val_predict, train_test_split

# train_x and train_y are assumed to be loaded elsewhere;
# the original post does not show that step.


def randomforest(args):
    class_weight = args['class_weight']
    criterion = args['criterion']
    min_impurity_split = args['min_impurity_split']
    n_estimators = args['n_estimators']
    min_samples_leaf = args['min_samples_leaf']
    min_samples_split = args['min_samples_split']

    estim = RandomForestClassifier(
        n_estimators=n_estimators,
        class_weight=class_weight,
        criterion=criterion,
        min_impurity_decrease=min_impurity_split,
        min_samples_leaf=min_samples_leaf,
        min_samples_split=min_samples_split
    )

    y_pred = cross_val_predict(estim, train_x, train_y, cv=3)
    metric = f1_score(train_y, y_pred)
    # fmin minimizes, so return the negated score
    return -metric


space = {
    'class_weight': hp.choice('class_weight', [None, 'balanced']),
    'criterion': hp.choice('criterion', ['gini', 'entropy']),
    'min_impurity_split': hp.lognormal('min_impurity_split', 1e-10, 1e-4) * 1e-7,
    'min_samples_leaf': hp.randint('min_samples_leaf', 10) + 1,
    # +2 rather than +1: scikit-learn requires an integer min_samples_split >= 2
    'min_samples_split': hp.randint('min_samples_split', 10) + 2,
    'n_estimators': hp.randint('n_estimators', 950) + 50
}

if __name__ == '__main__':
    trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp2')
    best = fmin(fn=randomforest, space=space, algo=rand.suggest, max_evals=100, trials=trials)
    print(best)
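The `hp.randint(label, upper) + offset` entries in the search space shift an integer drawn from [0, upper) by a constant. A minimal sketch of the resulting ranges, using plain arithmetic only; the helper name `shifted_randint_bounds` is made up for illustration:

```python
def shifted_randint_bounds(upper, offset):
    """Inclusive bounds of randint(upper) + offset, where randint draws from [0, upper)."""
    return offset, upper - 1 + offset


print(shifted_randint_bounds(950, 50))  # n_estimators
print(shifted_randint_bounds(10, 1))    # min_samples_leaf
print(shifted_randint_bounds(10, 2))    # an offset of 2 keeps min_samples_split >= 2
```

The last line matters because scikit-learn rejects an integer min_samples_split below 2, so an offset of 1 would occasionally produce an invalid trial.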

Unfortunately, running this random forest script fails with an attribute error: the worker cannot find randomforest.

AttributeError: Can't get attribute 'randomforest' on 
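A likely explanation, assuming the objective is shipped to the worker through Python's pickle machinery: pickle stores a plain function as a reference (module name plus function name), not as code. A function defined in the driver script lives in `__main__`, and inside the worker process `__main__` is the worker script, which has no `randomforest`. The sketch below shows the by-reference behaviour with the standard library only, using `json.dumps` as a stand-in function.

```python
import json
import pickle

payload = pickle.dumps(json.dumps)

# The payload holds only the names needed to look the function up again,
# not the function's code.
print(payload)
print(b'json' in payload, b'dumps' in payload)

# Unpickling imports the module and fetches the attribute by name.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

This is why the lookup only succeeds when the receiving process can import a module of the same name that contains the function.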
After some googling, I found a workaround suggested by other users: first move the objective function into a separate script, for example:

 
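# hyperopt_model.py
# -*- coding: utf-8 -*-
import pandas as pd
from hyperopt import hp
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# 'xxxxx.csv' is the placeholder file name from the original post;
# the first column is the label, the remaining columns are features.
df = pd.read_csv('xxxxx.csv', header=0)
y, X = df[df.columns[0]], df[df.columns[1:]]

# The driver script refers to this as hyperopt_model.space, so it has to live here.
# Entries follow the earlier listing; 'max_features' is an assumption,
# because the original post does not show its definition.
space = {
    'class_weight': hp.choice('class_weight', [None, 'balanced']),
    'criterion': hp.choice('criterion', ['gini', 'entropy']),
    'max_features': hp.choice('max_features', ['sqrt', 'log2']),
    'min_impurity_split': hp.lognormal('min_impurity_split', 1e-10, 1e-4) * 1e-7,
    'min_samples_leaf': hp.randint('min_samples_leaf', 10) + 1,
    # +2 rather than +1: scikit-learn requires an integer min_samples_split >= 2
    'min_samples_split': hp.randint('min_samples_split', 10) + 2,
    'n_estimators': hp.randint('n_estimators', 950) + 50
}


def randomforest(args):
    clf = RandomForestClassifier(
        class_weight=args['class_weight'],
        criterion=args['criterion'],
        max_features=args['max_features'],
        min_samples_leaf=args['min_samples_leaf'],
        # min_impurity_split was removed in scikit-learn 1.0;
        # min_impurity_decrease is used instead, as in the first listing
        min_impurity_decrease=args['min_impurity_split'],
        min_samples_split=args['min_samples_split'],
        n_estimators=args['n_estimators'],
        random_state=1
    )
    y_pred = cross_val_predict(clf, X, y, cv=3)
    metric = accuracy_score(y, y_pred)
    # fmin minimizes, so return the negated score
    return -metric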


 
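Save this script as hyperopt_model.py and add its location to the PYTHONPATH environment variable, then update the driver script from earlier accordingly: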
 export PYTHONPATH="${PYTHONPATH}:"

 
import hyperopt_model
from hyperopt import fmin, rand
from hyperopt.mongoexp import MongoTrials

if __name__ == '__main__':
    trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp2')
    best = fmin(fn=hyperopt_model.randomforest, space=hyperopt_model.space,
                algo=rand.suggest, max_evals=100, trials=trials)
    print(best)
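Since the worker has to import hyperopt_model as well, it can help to check importability from the same shell in which the worker will be started. A minimal sketch using only the standard library; the helper name `is_importable` is made up for illustration:

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the module on the current import path."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable('json'))            # a standard-library module, for comparison
print(is_importable('hyperopt_model'))  # depends on PYTHONPATH in this shell
```

If the second check prints False, the PYTHONPATH export has not taken effect in that shell.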


 
After that, run hyperopt-mongo-worker again and everything works. Overall run time dropped by roughly 50%.
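Assuming the speed-up comes from running more than one worker against the same database (the post does not say how many were used), the command lines for several workers can be generated as below. The helper `worker_commands` is hypothetical and only builds the argument lists; it does not start anything. The two flags are the ones shown earlier in this post.

```python
def worker_commands(mongo, n_workers, poll_interval=0.1):
    """Build one hyperopt-mongo-worker command line per worker (nothing is executed)."""
    base = [
        'hyperopt-mongo-worker',
        f'--mongo={mongo}',
        f'--poll-interval={poll_interval}',
    ]
    return [list(base) for _ in range(n_workers)]


for cmd in worker_commands('localhost:1234/foo_db', n_workers=2):
    print(' '.join(cmd))
```

Each printed line can be run in its own shell, or passed to `subprocess.Popen` as a list.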

 
I also tried managing these two processes with a process pool (code below), but some errors remain unresolved. If anyone knows a better approach, please let me know. Thanks!

 
# coding: utf-8
import sys
import logging

import hyperopt.mongoexp  # needed so that hyperopt.mongoexp.main_worker resolves
import hyperopt_model
from multiprocessing import Pool, Process
from hyperopt import fmin, tpe, hp, rand
from hyperopt.mongoexp import MongoTrials


def task1():
    # Worker process: mirrors what the hyperopt-mongo-worker script does.
    # main_worker() parses command-line options, so --mongo and --poll-interval
    # have to be passed when this script is launched.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    print('task1 running')
    sys.exit(hyperopt.mongoexp.main_worker())


def task2(msg):
    # Driver process: queues search points and waits for the worker's results.
    trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp3')
    best = fmin(fn=hyperopt_model.randomforest, space=hyperopt_model.space,
                algo=rand.suggest, max_evals=100, trials=trials)
    print(msg)
    print('task2 is running')
    return best


if __name__ == '__main__':
    pool = Pool(processes=4)
    p = Process(target=task1)
    p.start()
    ret = pool.apply_async(task2, args=(1,))
    pool.close()
    pool.join()
    p.join()
    print('processes done, result:')
    print(ret.get())
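One alternative worth trying, not verified against the errors mentioned above, is to start the worker command as an ordinary child process with `subprocess` and stop it once the driver has finished. In the sketch below, `run_with_worker` is a hypothetical helper, and both the worker and the driver are replaced by stand-ins so that the code runs without MongoDB or hyperopt.

```python
import subprocess
import sys


def run_with_worker(worker_cmd, driver):
    """Start worker_cmd as a child process, run driver(), then stop the worker."""
    worker = subprocess.Popen(worker_cmd)
    try:
        return driver()
    finally:
        worker.terminate()
        worker.wait()


# Stand-ins: a sleeping child instead of hyperopt-mongo-worker,
# and a trivial callable instead of the fmin call.
fake_worker = [sys.executable, '-c', 'import time; time.sleep(60)']
result = run_with_worker(fake_worker, lambda: 'driver finished')
print(result)
```

In real use, `worker_cmd` would be the hyperopt-mongo-worker command line from earlier and `driver` a function wrapping the fmin call.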