pycurl
Introduction
Installation
# Install pycurl
pip install pycurl
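# Install phantomjs
1) Download the macOS build of phantomjs from the official site (http://phantomjs.org/download.html).
2) Unzip the download and move the extracted phantomjs-2.1.1-macosx folder to any directory you like.
# Configure the environment variable (add the phantomjs-2.1.1-macosx/bin directory to PATH), then check the version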
phantomjs --version
# Install pyspider
pip install pyspider
# Verify the pyspider installation
pyspider all
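# Check whether pyspider is running (lsof lists the process listening on port 25555)
lsof -i:25555
# Kill the process (replace 14211 with the PID that lsof reports)
kill -9 14211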
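The lsof check above can also be done from Python. The following is a minimal sketch, assuming the service listens on localhost; the function name is_port_open is an illustrative choice, not part of pyspider.

```python
import socket


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 25555 is the port inspected with lsof in the notes above.
    print("port 25555 open:", is_port_open(25555))
```

Unlike lsof, this only reports whether the port accepts connections; it does not give the PID needed for kill.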
Additional notes:
Installation reference: https://www.jianshu.com/p/e37603bc70c7
Common problems
Problem 1: SyntaxError: invalid syntax
Traceback (most recent call last):
  File "/usr/local/bin/pyspider", line 5, in <module>
    from pyspider.run import main
  File "/usr/local/lib/python3.7/site-packages/pyspider/run.py", line 231
    async=True, get_object=False, no_input=False):
        ^
SyntaxError: invalid syntax
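Analysis
The pyspider source uses async as a parameter name, but async is a reserved keyword from Python 3.7 onward, so the module no longer parses. References on this name conflict: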
https://blog.csdn.net/qq_26261381/article/details/86514138
https://www.jianshu.com/p/a0042a636229
Solution
Files to modify (rename async wherever it is used as a variable or parameter name):
/usr/local/lib/python3.7/site-packages/pyspider/run.py
/usr/local/lib/python3.7/site-packages/pyspider/webui/app.py
/usr/local/lib/python3.7/site-packages/pyspider/fetcher/tornado_fetcher.py
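The rename can be illustrated with a rough sketch. The replacement name async_mode and the helper rename_async are assumptions made for this example; the notes above do not prescribe a particular name. The regular expression is deliberately simple and also rewrites matches inside strings and comments, so review the result by hand before saving any file.

```python
import keyword
import re

# Matches the bare word `async` unless it introduces `async def/with/for`.
_ASYNC_IDENTIFIER = re.compile(r"\basync\b(?!\s+(?:def|with|for)\b)")


def rename_async(source: str, new_name: str = "async_mode") -> str:
    """Replace `async` used as an identifier with new_name (a hypothetical name)."""
    return _ASYNC_IDENTIFIER.sub(new_name, source)


if __name__ == "__main__":
    # On Python 3.7 and later, async is a reserved keyword.
    print(keyword.iskeyword("async"))
    # The offending line from run.py shown in the traceback above.
    print(rename_async("async=True, get_object=False, no_input=False):"))
```

Applying the same name consistently across all three files matters, because callers pass the parameter by keyword.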
Problem 2: libcurl link-time version (7.64.1) is older than compile-time version
ImportError: pycurl: libcurl link-time version (7.64.1) is older than compile-time version (7.65.3)
Analysis
https://www.cjjjs.com/article/201841813540391
Check the curl version (only the relevant output is shown)
curl -V
curl 7.65.3 (x86_64-apple-darwin13.4.0) libcurl/7.65.3 OpenSSL/1.1.1d zlib/1.2.11 libssh2/1.8.2
Locate the libcurl.* files on the system
/usr/lib/libcurl.dylib
/usr/lib/libcurl.4.dylib
/usr/lib/libcurl.3.dylib
/Users/apple/opt/anaconda3/lib/libcurl.dylib
/Users/apple/opt/anaconda3/pkgs/libcurl-7.65.3-h051b688_0/lib/libcurl.dylib
/System/Volumes/Data/Users/apple/opt/anaconda3/lib/libcurl.dylib
/System/Volumes/Data/Users/apple/opt/anaconda3/pkgs/libcurl-7.65.3-h051b688_0/lib/libcurl.dylib
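The two versions in the ImportError are the libcurl that pycurl was built against (compile-time) and the libcurl actually loaded when pycurl is imported (link-time). As a small sketch, the helper below (parse_versions, a name chosen for this example) extracts both from the error text so they can be compared as tuples.

```python
import re
from typing import Tuple

_VERSION_PAIR = re.compile(
    r"link-time version \(([\d.]+)\) is older than compile-time version \(([\d.]+)\)"
)


def parse_versions(message: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (link_time, compile_time) as tuples of ints parsed from the error text."""
    match = _VERSION_PAIR.search(message)
    if match is None:
        raise ValueError("not a pycurl link-time/compile-time version error")
    link_time, compile_time = (
        tuple(int(part) for part in version.split("."))
        for version in match.groups()
    )
    return link_time, compile_time


if __name__ == "__main__":
    error = (
        "ImportError: pycurl: libcurl link-time version (7.64.1) "
        "is older than compile-time version (7.65.3)"
    )
    link_time, compile_time = parse_versions(error)
    print(link_time, compile_time, link_time < compile_time)
```

In this case the loaded library (7.64.1) is older than the one used at build time (7.65.3), which matches the file listing: 7.65.3 comes from anaconda, while the libraries under /usr/lib are the system copies.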
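Solution 1: uninstall and reinstall pycurl
# First confirm which Python interpreter runs the script, then use that interpreter's pip to uninstall and reinstall.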
/usr/bin/python -m pip list
/usr/bin/python -m pip uninstall pycurl
/usr/bin/python -m pip install pycurl
or
pip uninstall pycurl
pip install pycurl
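Running pip through a specific interpreter, as in the commands above, guarantees that the package lands in that interpreter's site-packages. A minimal sketch of the same idea from inside Python follows; pip_command is a helper name invented for this example.

```python
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # sys.executable is the absolute path of the running interpreter.
    print("interpreter:", sys.executable)
    print("uninstall:", " ".join(pip_command("uninstall", "pycurl")))
    print("install:  ", " ".join(pip_command("install", "pycurl")))
```

The printed commands can be pasted into a shell, or passed to subprocess.run if the steps are scripted.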
Start pyspider again to check the result
# Start pyspider
pyspider all
The same error is still reported, so Solution 1 does not work:
ImportError: pycurl: libcurl link-time version (7.64.1) is older than compile-time version (7.65.3)
Solution 2: uninstall pycurl, then recompile and install it (recommended)
# Recompile and install
pip3 install pycurl --compile --no-cache-dir
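Verify which directory Python loads the library from
After deleting libcurl.4.dylib from /usr/lib, the following error is reported:
import pycurl # type: ignore
ImportError: dlopen(/usr/local/lib/python3.7/site-packages/pycurl.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libcurl.4.dylib
  Referenced from: /usr/local/lib/python3.7/site-packages/pycurl.cpython-37m-darwin.so
  Reason: image not found
# Import pycurl in the Python interpreter
>>> import pycurl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pycurl'
Analysis:
The error shows up everywhere because the module being imported is a shared library file, so a single failure affects many places. It is raised while importing pycurl, and the message says that the link-time and compile-time libcurl versions do not match: pycurl was compiled against libcurl 7.65.3 but loads libcurl 7.64.1 at run time. I had compiled and installed a newer curl from source, while the curl that Python picks up is an older one installed earlier. The mismatch is therefore a conflict between the newly compiled curl and the curl that Python's pycurl uses.
Why does this message appear? The curl used by Python was installed as a prebuilt binary, whereas the new one was compiled from source, so it is not surprising that the link-time and compile-time versions differ. This pins down the scenario and narrows the problem to this library.
Approach
- Reinstall pycurl: failed!
- Recompile pycurl.cpython-37m-darwin.so
Conclusion:
Python does load the library file from the site-packages directory.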
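The conclusion above can be checked without deleting any system library. The sketch below uses importlib.util.find_spec, which locates a top-level module without importing it, so it still works when import pycurl itself fails at load time. The helper name module_origin is an assumption for this example.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Prints the path of the compiled extension, or None if pycurl is not installed.
    print("pycurl:", module_origin("pycurl"))
    print("json:  ", module_origin("json"))
```

On the setup described in these notes, the pycurl path would be expected to point into /usr/local/lib/python3.7/site-packages, matching the dlopen error above.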
tesserocr
Introduction
Installation
# Install imagemagick
brew install imagemagick
Output after a successful installation
==> Caveats
==> libffi
libffi is keg-only, which means it was not symlinked into /usr/local,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.
For compilers to find libffi you may need to set:
  export LDFLAGS="-L/usr/local/opt/libffi/lib"
  export CPPFLAGS="-I/usr/local/opt/libffi/include"
==> python@3.8
Python has been installed as
  /usr/local/opt/python@3.8/bin/python3
You can install Python packages with
  /usr/local/opt/python@3.8/bin/pip3 install <package>
They will install into the site-package directory
  /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages
See: https://docs.brew.sh/Homebrew-and-Python
python@3.8 is keg-only, which means it was not symlinked into /usr/local,
because this is an alternate version of another formula.
If you need to have python@3.8 first in your PATH run:
  echo 'export PATH="/usr/local/opt/python@3.8/bin:$PATH"' >> ~/.zshrc
For compilers to find python@3.8 you may need to set:
  export LDFLAGS="-L/usr/local/opt/python@3.8/lib"
==> glib
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
==> docbook
To use the DocBook package in your XML toolchain,
you need to add the following to your ~/.bashrc:
  export XML_CATALOG_FILES="/usr/local/etc/xml/catalog"
==> gnu-getopt
gnu-getopt is keg-only, which means it was not symlinked into /usr/local,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.
If you need to have gnu-getopt first in your PATH run:
  echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zshrc
Bash completion has been installed to:
  /usr/local/opt/gnu-getopt/etc/bash_completion.d
==> libtool
In order to prevent conflicts with Apple's own libtool we have prepended a "g"
so, you have instead: glibtool and glibtoolize.
# Install tesseract
brew install tesseract-lang
# Install tesserocr and pillow
pip install tesserocr pillow
# Verify the installation
## Test image URL
https://raw.githubusercontent.com/Python3WebSpider/TestTess/master/image.png
tesseract image.png result -l eng && cat result.txt
# Show details for the formula: dependencies, current version, caveats, and so on
brew info tesseract
Output of the tesseract command
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
Python3WebSpider
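Parameters:
First argument: the image file name
Second argument: the base name of the output file (tesseract appends .txt)
Third argument, -l: the language pack to use; eng means English
cat: prints the result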
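The same command line can be driven from Python through subprocess. This is a minimal sketch under the assumption that the tesseract binary is on PATH and that image.png has been downloaded to the working directory; the helper names tesseract_command and run_ocr are invented for this example and are not part of tesserocr.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image: str, output_base: str, lang: str = "eng") -> List[str]:
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


def run_ocr(image: str, output_base: str = "result", lang: str = "eng") -> str:
    """Run tesseract and return the recognized text from <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    # tesseract writes its result to <output_base>.txt, as in `cat result.txt`.
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

Calling run_ocr("image.png") mirrors the shell command above: it runs tesseract with the eng language pack and reads back result.txt.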
Common problems
Installing tesseract with brew on macOS reports an invalid option: --all-languages
https://blog.csdn.net/weixin_40368256/article/details/100624099
brew install tesseract --all-languages (failed)
RedisDump
Introduction
Installation
# Install command (prerequisite: install Ruby first)
sudo gem install redis-dump
# Verify the installation
redis-dump
redis-load
Flask
Introduction
Installing the Flask web library
# Install command
pip install flask
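Verify the installation
from flask import Flask

app = Flask(__name__)

# Serve a plain-text greeting at the site root.
@app.route('/')
def hello():
    return 'Hello World!'

if __name__ == '__main__':
    # Start Flask's development server (http://127.0.0.1:5000 by default).
    app.run()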
Tornado
Introduction
Installing the Tornado web library
# Install command
pip install tornado
Verify the installation
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    # Respond to GET / with a plain-text greeting.
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler)
    ])

if __name__ == "__main__":
    app = make_app()
    # Listen on port 8888, then run the event loop until interrupted.
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
mitmproxy
Introduction
Installing mitmproxy for app crawling
# Install command
pip install mitmproxy
Verify the installation
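The notes leave this verification step empty. One possible check, offered here as a sketch rather than the author's method, is to confirm that the three command-line tools shipped with the mitmproxy package (mitmproxy, mitmdump, mitmweb) are on PATH. The helper name missing_commands is an assumption for this example.

```python
import shutil
from typing import Iterable, List

# Command-line tools installed by the mitmproxy package.
MITM_COMMANDS = ("mitmproxy", "mitmdump", "mitmweb")


def missing_commands(commands: Iterable[str]) -> List[str]:
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(MITM_COMMANDS)
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all mitmproxy commands are available")
```

An empty result only shows that the executables are reachable; it does not confirm that the proxy certificates are set up.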
node
Introduction
Installing Node
# Install command
brew install node
Output after a successful installation
==> Caveats
==> icu4c
icu4c is keg-only, which means it was not symlinked into /usr/local,
because macOS provides libicucore.dylib (but nothing else).
If you need to have icu4c first in your PATH run:
  echo 'export PATH="/usr/local/opt/icu4c/bin:$PATH"' >> ~/.zshrc
  echo 'export PATH="/usr/local/opt/icu4c/sbin:$PATH"' >> ~/.zshrc
For compilers to find icu4c you may need to set:
  export LDFLAGS="-L/usr/local/opt/icu4c/lib"
  export CPPFLAGS="-I/usr/local/opt/icu4c/include"
==> node
Bash completion has been installed to:
  /usr/local/etc/bash_completion.d
Verify the installation
node -v
npm -v
appium
Introduction
Installing Appium for app crawling
# Install command
npm install -g appium
Verify the installation