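Contents
Building the Environment
Step 1: Initial Setup
Installing Homebrew
Installing Pyenv
Step 2: Building the Development Environment
Installing Multiple Python Versions
Setting Up a Virtual Environment
Step 3: Finishing the Python Development Environment
Training and Testing
Step 1: Download the Source Code
Step 2: Prepare the Training Data
Step 3: Train the Model
Step 4: Test the Model
References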
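While studying the book 《深度学习推荐系统》 (Deep Learning Recommender Systems), I wanted to deepen my understanding, so I took the test program mentioned in the DIEN paper as a starting point and built a complete model training and testing environment on my macOS work laptop.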
Building the Environment
My current macOS development environment is:
macOS Monterey 12.1
Xcode Command Line Tools (xcode-select version 2392)
iTerm2
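Step 1: Initial Setup
Installing Homebrew
The preferred route is to install from the official Homebrew site. If the installation takes too long or a download fails, try the following instead: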
# 1. Download the install script (if wget is not installed, "curl -fsSLO <same URL>" also works)
wget https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh
# 2. In the script, point the following two variables at a mirror in China
HOMEBREW_BREW_DEFAULT_GIT_REMOTE="https://mirrors.aliyun.com/homebrew/brew.git"
HOMEBREW_CORE_DEFAULT_GIT_REMOTE="https://mirrors.aliyun.com/homebrew/homebrew-core.git"
# 3. Install Homebrew
bash install.sh
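The variable replacement in step 2 can also be scripted instead of edited by hand. The following is only a minimal Python sketch: it assumes the downloaded install.sh assigns each of the two variables on its own line in the form NAME="value", and the helper name use_mirror is hypothetical, not part of Homebrew.

```python
import re

# Mirror URLs copied from the steps above.
MIRRORS = {
    "HOMEBREW_BREW_DEFAULT_GIT_REMOTE":
        "https://mirrors.aliyun.com/homebrew/brew.git",
    "HOMEBREW_CORE_DEFAULT_GIT_REMOTE":
        "https://mirrors.aliyun.com/homebrew/homebrew-core.git",
}


def use_mirror(script_text, mirrors=MIRRORS):
    """Return script_text with each NAME="..." assignment pointed at its mirror.

    Assumption: every variable is assigned on its own line as NAME="value".
    Lines that do not match are left untouched.
    """
    for name, url in mirrors.items():
        pattern = r'^(\s*%s=)"[^"]*"' % re.escape(name)
        # A function replacement avoids any backslash handling in the URL.
        script_text = re.sub(
            pattern,
            lambda m, u=url: m.group(1) + '"%s"' % u,
            script_text,
            flags=re.MULTILINE,
        )
    return script_text
```

The text of install.sh can be read, passed through use_mirror, and written back before running "bash install.sh"; it is worth checking the result by eye, since the script layout may change between Homebrew releases.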
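If Homebrew is already installed but "brew update" fails, switch to the Alibaba Cloud Homebrew mirror. The steps are described on the Homebrew page of the Alibaba open-source mirror site (阿里巴巴开源镜像站).
Installing Pyenv
For background, see the official pyenv site.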
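# 1. Install pyenv and the pyenv-virtualenv plugin
brew install pyenv
brew install pyenv-virtualenv
# 2. Add the settings to "~/.bash_profile" so that every new shell picks them up
echo 'PATH=$(pyenv root)/shims:$PATH' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bash_profile
# 3. Apply the settings to the current shell
source ~/.bash_profile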
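Step 2: Building the Development Environment
TF1 and TF2 depend on quite different Python versions. To make it easy to try several Python versions (for example, Python 2.7.18 vs. 3.9.10), build the Python environments under Pyenv; with virtualenv, each directory can even be pinned to its own Python version.
Installing Multiple Python Versions
# If downloading from python.org is too slow, pre-populate pyenv's local cache instead.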
cd ~/.pyenv/
mkdir cache
cd cache
wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tar.xz
wget https://www.python.org/ftp/python/3.9.10/Python-3.9.10.tar.xz
# Install Python
pyenv install 2.7.18
pyenv install 3.9.10
Python is compiled from source during installation, and the build may fail with an error like this:
python-build: use zlib from xcode sdk
BUILD FAILED (OS X 12.1 using python-build 20180424)
clang: error: unsupported option '-V -Wno-objc-signed-char-bool-implicit-int-conversion'
clang: error: unknown argument '-qversion'; did you mean '--version'?
clang: error: invalid version number in 'MACOSX_DEPLOYMENT_TARGET=12.1'
make: *** No targets specified and no makefile found. Stop.
The cause is the installed version of the Xcode Command Line Tools; reinstall them as follows:
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
References:
BUILD FAILED (OS X 11.0.1 using python-build 20180424) · Issue #1738 · pyenv/pyenv · GitHub
Technical Note TN2339: Building from the Command Line with Xcode FAQ
Home · pyenv/pyenv Wiki · GitHub
Common build problems · pyenv/pyenv Wiki · GitHub
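Setting Up a Virtual Environment
Taking Python 2.7.18 as the example, my-py2-workshop is the working directory.
# 1. Create the virtual environment
pyenv virtualenv 2.7.18 my-py2
# 2. Create the working directory
mkdir ~/my-py2-workshop
# 3. Set the application-specific virtual environment for the working directory
cd ~/my-py2-workshop
pyenv local my-py2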
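To confirm that "pyenv local" took effect, the interpreter version can be checked from inside the working directory. This is a minimal sketch; EXPECTED and is_expected_version are illustrative names chosen here, and the snippet is written to run under both Python 2.7 and Python 3.

```python
import sys

EXPECTED = (2, 7, 18)  # the version used for my-py2 in this walkthrough


def is_expected_version(version_info=None, expected=EXPECTED):
    """True if the first three fields of version_info equal expected."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(expected)


# Report the interpreter that is actually running this script.
print("running %d.%d.%d, matches expected: %s"
      % (tuple(sys.version_info[:3]) + (is_expected_version(),)))
```

Run from ~/my-py2-workshop it should report a match; run from any other directory it reports whichever Python is the default there.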
References:
Creating virtual environments with Pyenv – Rob Allen's DevNotes
How to use pyenv to run multiple versions of Python on a Mac | Opensource.com
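Step 3: Finishing the Python Development Environment
After the steps above, the my-py2-workshop directory has a Python 2.7.18 environment. To reduce the amount of script adaptation needed later, make sure pip is upgraded to version 20.3.4 (the last release that supports Python 2.7). If the installed version is lower, run: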
pip install --upgrade pip
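Whether the upgrade is needed can be decided from the output of "pip --version". The sketch below assumes that output begins with "pip X.Y.Z"; parse_pip_version and needs_upgrade are hypothetical helper names, not part of pip.

```python
import re

TARGET = (20, 3, 4)  # the pip version named in the text


def parse_pip_version(output):
    """Extract (major, minor, patch) from the output of `pip --version`.

    A missing patch component is treated as 0.
    """
    match = re.match(r"pip (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized output: %r" % output)
    return tuple(int(part or 0) for part in match.groups())


def needs_upgrade(output, target=TARGET):
    """True if the reported pip version is lower than target."""
    return parse_pip_version(output) < target
```

The input string can be obtained with subprocess.check_output(["pip", "--version"]), decoded to text first when running under Python 3.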
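If pip installs are too slow, try switching to a mirror in China, for example: Simple Index.
This completes the development environment. Python 3.9.10 is set up the same way and is not repeated here.
Summary: Python 2.7.18 is used inside the my-py2-workshop directory; all other directories keep the system default Python version.
Training and Testing
Step 1: Download the Source Code
# 1. Switch to the working directory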
cd ~/my-py2-workshop
# 2. Download the ZIP file directly (1f314d1 on 18 Jan 2019, commit 1f314d16aa1700ee02777e6163fb8ca94e3d2810)
wget https://github.com/mouna99/dien/archive/refs/heads/master.zip
unzip master.zip
Step 2: Prepare the Training Data
# 1. Enter the DIEN directory
cd ~/my-py2-workshop/dien-master
# 2. To shorten the wait, use only part of the data
tar -jxvf data1.tar.gz
head -n100000 data1/reviews-info > reviews-info
tar -jxvf data2.tar.gz
mv data2/item-info .
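The "head -n100000" truncation above can also be done from Python, which is convenient when trying several subset sizes. This is a minimal sketch; take_head is a hypothetical helper name, and the default of 100000 lines simply mirrors the command above.

```python
import itertools


def take_head(src_path, dst_path, n=100000):
    """Copy the first n lines of src_path to dst_path; return lines written.

    Files are handled as bytes so no assumption is made about encoding.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice stops after n lines without reading the rest of the file.
        for line in itertools.islice(src, n):
            dst.write(line)
            written += 1
    return written
```

For example, take_head("data1/reviews-info", "reviews-info") writes the same subset as the shell command; a smaller n shortens the later steps further at the cost of less training data.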
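Notes:
README.md has detailed steps; the data here is prepared with "method 2".
In fact only two data files are required, reviews-info and item-info; the later steps generate the other files that depend on them.
For the content format of the Amazon product data, see Amazon review data.
Following prepare_data.sh, run these Python scripts:
# Comment out lines 98 and 99 of process_data.py so that process_meta() and process_reviews() are not run.
python script/process_data.py
# Then run the following steps in order
python script/local_aggretor.py
python script/split_by_user.py
python script/generate_voc.py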
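Commenting out the two lines of process_data.py can be scripted so the change is repeatable after a fresh download. The sketch below is an illustration only: comment_out_lines is a hypothetical helper, and it assumes the targeted lines are not the only statements in their block (otherwise the result would not be valid Python).

```python
def comment_out_lines(source, line_numbers):
    """Prefix the given 1-based line numbers of source with '# '.

    Indentation is preserved; lines that are blank or already
    commented are left unchanged.
    """
    targets = set(line_numbers)
    result = []
    for number, line in enumerate(source.splitlines(True), start=1):
        stripped = line.lstrip()
        if number in targets and stripped.strip() and not stripped.startswith("#"):
            indent = line[:len(line) - len(stripped)]
            line = indent + "# " + stripped
        result.append(line)
    return "".join(result)
```

Applied to the text of script/process_data.py with line numbers [98, 99], it reproduces the manual edit described above; the line numbers refer to the commit named in Step 1 and may differ in other revisions.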
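Step 3: Train the Model
Make sure the following modules are installed:
pip install numpy==1.16.6
pip install tensorflow==1.15.0
pip install protobuf==3.17.3
pip install keras==2.8.0
The TensorFlow 1.4 version mentioned in README.md is too old, and building an environment for it is very difficult. TensorFlow 1.15.0, from the final TF1 release series, is used instead; its official documentation is also easier to look up at https://www.tensorflow.org/versions/r1.15/api_docs .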
python script/train.py train DNN
A few errors may come up along the way; each has a simple fix.
# Error 1
Traceback (most recent call last):
  File "script/train.py", line 4, in <module>
    from model import *
  File "script/model.py", line 6, in <module>
    from rnn import dynamic_rnn
  File "script/rnn.py", line 45, in <module>
    _like_rnncell = rnn_cell_impl._like_rnncell
AttributeError: 'module' object has no attribute '_like_rnncell'
# Fix: change line 45 of script/rnn.py
"_like_rnncell = rnn_cell_impl._like_rnncell" --> "_like_rnncell = rnn_cell_impl.assert_like_rnncell"
# Error 2
Traceback (most recent call last):
  File "script/train.py", line 4, in <module>
    from model import *
  File "script/model.py", line 7, in <module>
    from utils import *
  File "script/utils.py", line 3, in <module>
    from tensorflow.python.ops.rnn_cell_impl import _Linear
ImportError: cannot import name _Linear
# Fix: change line 3 of script/utils.py
"from tensorflow.python.ops.rnn_cell_impl import _Linear" --> "from tensorflow.contrib.rnn.python.ops.core_rnn_cell import _Linear"
# Error 3
Traceback (most recent call last):
  File "script/train.py", line 4, in <module>
    from model import *
  File "script/model.py", line 7, in <module>
    from utils import *
  File "script/utils.py", line 9, in <module>
    from keras import backend as K
  File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/__init__.py", line 22, in <module>
    from keras import distribute
  File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/distribute/__init__.py", line 18, in <module>
    from keras.distribute import sidecar_evaluator
  File "~/.pyenv/versions/my-py2/lib/python2.7/site-packages/keras/distribute/sidecar_evaluator.py", line 180
    f'No checkpoints appear to be found after {_CHECKPOINT_TIMEOUT_SEC} '
# Fix: delete line 9 of script/utils.py
"from keras import backend as K"
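The three fixes above are plain text substitutions, so they can be collected in one place and re-applied after a fresh download. The following is a minimal sketch rather than a definitive patch tool: FIXES and apply_fixes are hypothetical names, the replacement strings are copied from the fixes above, and matching is done on exact text instead of line numbers.

```python
# (old, new) pairs per file, copied from the three fixes above.
FIXES = {
    "script/rnn.py": [
        ("_like_rnncell = rnn_cell_impl._like_rnncell",
         "_like_rnncell = rnn_cell_impl.assert_like_rnncell"),
    ],
    "script/utils.py": [
        ("from tensorflow.python.ops.rnn_cell_impl import _Linear",
         "from tensorflow.contrib.rnn.python.ops.core_rnn_cell import _Linear"),
        # Removing the import together with its newline deletes the line.
        ("from keras import backend as K\n", ""),
    ],
}


def apply_fixes(source, fixes):
    """Apply each (old, new) replacement once; return (text, matched olds)."""
    applied = []
    for old, new in fixes:
        if old in source:
            source = source.replace(old, new, 1)
            applied.append(old)
    return source, applied
```

For each path in FIXES, the file text can be read, passed to apply_fixes with FIXES[path], and written back. An empty list in the second return value means nothing matched, which usually indicates the file was already patched or comes from a different revision.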
Step 4: Test the Model
python script/train.py test DNN
References
Python 2.7 is gradually being phased out, but it is still an indispensable base environment here.
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at Release process - pip documentation v22.1.dev0
The official documentation can speed up learning:
TensorFlow 1.15: https://www.tensorflow.org/versions/r1.15/api_docs/python/tf
NumPy Reference: NumPy v1.16 Manual
The Python Debugger: section 26.2, pdb, in the Python 2.7.18 documentation



