1. Documentation
Chinese translation of the official Spark machine learning library (MLlib) guide
https://blog.csdn.net/liulingyuan6/article/details/53582300
Spark tutorial from the Xiamen University Database Laboratory
http://mocom.xmu.edu.cn/article/show/5858ab782b2730e00d70fa08/0/1
Concepts:
WOE (Weight of Evidence)
https://blog.csdn.net/shenxiaoming77/article/details/78771698
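As a rough illustration of the WOE concept, the sketch below computes a per-bin value as ln(share of non-events in the bin / share of events in the bin). The sign convention is an assumption (some references invert the ratio), and the function name `woe_by_bin` and the sample data are hypothetical, not taken from the linked article.

```python
import math
from collections import Counter


def woe_by_bin(bins, labels):
    """Return {bin: WOE}, with WOE = ln(non-event share / event share).

    `labels` holds 1 for an event (e.g. default) and 0 for a non-event.
    Assumption: every bin contains at least one event and one non-event;
    otherwise the logarithm is undefined and a ValueError is raised.
    """
    events = Counter()
    non_events = Counter()
    for b, y in zip(bins, labels):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    result = {}
    for b in sorted(set(events) | set(non_events)):
        if events[b] == 0 or non_events[b] == 0:
            raise ValueError(f"bin {b!r} has an empty class; merge or smooth it first")
        event_share = events[b] / total_events
        non_event_share = non_events[b] / total_non_events
        result[b] = math.log(non_event_share / event_share)
    return result


# Hypothetical sample: two bins with three observations each.
sample_bins = ["a", "a", "a", "b", "b", "b"]
sample_labels = [0, 0, 1, 0, 1, 1]
woe = woe_by_bin(sample_bins, sample_labels)
print(woe)
```

With this convention, a positive value means the bin holds relatively more non-events than the population as a whole, and a negative value means relatively more events.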
2. Hands-on
/spark-submit
--master yarn --deploy-mode cluster --queue tempo-queue --name MINE-688dc
--files
--executor-memory 2g --executor-cores 2 --driver-memory 2g --driver-cores 2 --num-executors 2
--class com.meritdata.tempo.mine.server.executor.Executor
spark-internal hdfs://*/688dc.xml
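The same submission can be assembled programmatically, which makes the individual options easier to vary. The sketch below only builds and prints the argument list; it does not run anything. The helper `build_spark_submit` is hypothetical, the bare `spark-submit` executable name assumes the script is on the PATH, and `--files` is omitted because the notes do not record its value.

```python
import shlex


def build_spark_submit(options, main_class, app_resource, app_args):
    """Return a spark-submit argument list built from an options mapping."""
    cmd = ["spark-submit"]  # assumption: spark-submit is on the PATH
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += ["--class", main_class, app_resource, *app_args]
    return cmd


# Option values copied from the command in these notes.
options = {
    "master": "yarn",
    "deploy-mode": "cluster",
    "queue": "tempo-queue",
    "name": "MINE-688dc",
    "executor-memory": "2g",
    "executor-cores": 2,
    "driver-memory": "2g",
    "driver-cores": 2,
    "num-executors": 2,
}

cmd = build_spark_submit(
    options,
    main_class="com.meritdata.tempo.mine.server.executor.Executor",
    app_resource="spark-internal",
    app_args=["hdfs://*/688dc.xml"],  # path kept exactly as elided in the notes
)

# shlex.join quotes arguments such as the '*' in the HDFS path for safe shell display.
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd)` would launch the job without going through a shell, so the `*` in the path would not be expanded locally.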
3. Optimization
pip3 list
Package       Version
------------- ----------
APScheduler   3.5.3
asn1crypto    0.24.0
bcrypt        3.1.4
certifi       2018.10.15
cffi          1.11.5
chardet       3.0.4
Click         7.0
cryptography  2.3.1
dnspython     1.15.0
fire          0.1.3
idna          2.7
IPy           0.83
pexpect       4.6.0
pip           19.0.3
ply           3.11
prettytable   0.7.2
psutil        5.4.8
ptyprocess    0.6.0
pyasn1        0.4.4
pycparser     2.19
pycryptodomex 3.8.1
PyNaCl        1.3.0
pysmi         0.3.4
pysnmp        4.4.6
pytz          2018.7
requests      2.20.0
setuptools    39.0.1
six           1.11.0
tzlocal       1.5.1
urllib3       1.24.1