Using PycURL on Windows
PycURL – A Python Interface To The cURL library — PycURL 7.45.1 documentation
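Requests is the usual choice for HTTP work in Python, but PycURL is built on libcurl, so it can be faster and gives finer-grained control, which is sometimes handy. So I'm going to try it out.
Unfortunately, there is apparently no official binary for Windows. That said, I'd rather not build it from source either.
A search turned up the site below, which appears to distribute binaries, so I'll use that.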
Python Extension Packages for Windows - Christoph Gohlke
My environment is shown below, so I download "pycurl-7.45.1-cp38-cp38-win_amd64.whl".
$ python -VV
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]
$ pip install pycurl-7.45.1-cp38-cp38-win_amd64.whl
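The "cp38" and "win_amd64" parts of the wheel filename are compatibility tags for the interpreter and the platform. As a rough sketch (the helper name wheel_tags is my own, and it assumes a CPython interpreter), they can be derived from the running Python like this. On the 64-bit CPython 3.8 for Windows shown above, I would expect it to report cp38 and win_amd64.

```python
import sys
import sysconfig


def wheel_tags():
    """Return (python_tag, platform_tag) roughly as they appear in wheel filenames."""
    # Assumption: CPython, whose tag is "cp" + major + minor (e.g. "cp38").
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # sysconfig.get_platform() returns a string such as "win-amd64";
    # wheel filenames use "_" in place of "-" and ".".
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


if __name__ == "__main__":
    print("-".join(wheel_tags()))
```

Pick the wheel whose filename contains both tags for your own interpreter.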
Here is a sample that works, for a start.
from datetime import datetime, timedelta
import pycurl
from io import BytesIO


def report(curl):
    print("Performance report:")
    print("-----------------------------------------------------------------------")
    print("EFFECTIVE_URL : {}".format(curl.getinfo(pycurl.EFFECTIVE_URL)))
    print("RESPONSE_CODE : {}".format(curl.getinfo(pycurl.RESPONSE_CODE)))
    print("SIZE_DOWNLOAD : {}".format(curl.getinfo(pycurl.SIZE_DOWNLOAD)))
    print("NAMELOOKUP_TIME : {}".format(curl.getinfo(pycurl.NAMELOOKUP_TIME)))
    print("CONNECT_TIME : {} {}".format(curl.getinfo(pycurl.CONNECT_TIME), curl.getinfo(pycurl.CONNECT_TIME)-curl.getinfo(pycurl.NAMELOOKUP_TIME)))
    # APPCONNECT : ssl_handshake_done
    print("APPCONNECT_TIME : {} {}".format(curl.getinfo(pycurl.APPCONNECT_TIME), curl.getinfo(pycurl.APPCONNECT_TIME)-curl.getinfo(pycurl.CONNECT_TIME)))
    # Time to HTTP GET done
    # https://curl.se/libcurl/c/CURLINFO_PRETRANSFER_TIME.html
    print("PRETRANSFER_TIME : {} {}".format(curl.getinfo(pycurl.PRETRANSFER_TIME), curl.getinfo(pycurl.PRETRANSFER_TIME)-curl.getinfo(pycurl.APPCONNECT_TIME)))
    # STARTTRANSFER : TTFB(time to first byte)
    print("STARTTRANSFER_TIME : {} {}".format(curl.getinfo(pycurl.STARTTRANSFER_TIME), curl.getinfo(pycurl.STARTTRANSFER_TIME)-curl.getinfo(pycurl.PRETRANSFER_TIME)))
    print("TOTAL_TIME : {} {}".format(curl.getinfo(pycurl.TOTAL_TIME), curl.getinfo(pycurl.TOTAL_TIME)-curl.getinfo(pycurl.STARTTRANSFER_TIME)))
    print("REDIRECT_TIME : {}".format(curl.getinfo(pycurl.REDIRECT_TIME)))
    print()


def _curl_debug(type, data):
    # CURLINFO_TEXT = 0,
    # CURLINFO_HEADER_IN,    /* 1 */
    # CURLINFO_HEADER_OUT,   /* 2 */
    # CURLINFO_DATA_IN,      /* 3 */
    # CURLINFO_DATA_OUT,     /* 4 */
    # CURLINFO_SSL_DATA_IN,  /* 5 */
    # CURLINFO_SSL_DATA_OUT, /* 6 */
    type_str = ('*', '<', '>', '{', '}', '<<', '>>')
    msg = None
    if type == 3 or type == 4:
        msg = "[{} bytes data]".format(len(data))
    else:
        msg = data.decode('utf-8').strip()
    print("{} {} {}".format(datetime.now(), type_str[type], msg))


buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(pycurl.URL, 'http://pycurl.io/docs/latest/index.html')
curl.setopt(pycurl.WRITEDATA, buffer)
curl.setopt(pycurl.FOLLOWLOCATION, True)
curl.setopt(pycurl.VERBOSE, True)
curl.setopt(pycurl.DEBUGFUNCTION, _curl_debug)
curl.perform()
print()
report(curl)
curl.close()
$ python a.py
2022-05-06 09:09:49.740667 * Trying 192.30.252.154:80...
2022-05-06 09:09:49.914677 * Connected to pycurl.io (192.30.252.154) port 80 (#0)
2022-05-06 09:09:49.914677 > GET /docs/latest/index.html HTTP/1.1
Host: pycurl.io
User-Agent: PycURL/7.45.1 libcurl/7.80.0 Schannel zlib/1.2.11 zstd/1.5.2 c-ares/1.18.1 libssh2/1.10.0
Accept: */*
2022-05-06 09:09:50.093687 * Mark bundle as not supporting multiuse
2022-05-06 09:09:50.094688 < HTTP/1.1 200 OK
2022-05-06 09:09:50.094688 < Server: GitHub.com
2022-05-06 09:09:50.094688 < Date: Fri, 06 May 2022 00:09:49 GMT
2022-05-06 09:09:50.094688 < Content-Type: text/html; charset=utf-8
2022-05-06 09:09:50.094688 < Content-Length: 22758
2022-05-06 09:09:50.094688 < Vary: Accept-Encoding
2022-05-06 09:09:50.094688 < Last-Modified: Sun, 13 Mar 2022 07:25:32 GMT
2022-05-06 09:09:50.094688 < Vary: Accept-Encoding
2022-05-06 09:09:50.094688 < Access-Control-Allow-Origin: *
2022-05-06 09:09:50.094688 < ETag: "622d9c6c-58e6"
2022-05-06 09:09:50.094688 < expires: Fri, 06 May 2022 00:19:49 GMT
2022-05-06 09:09:50.094688 < Cache-Control: max-age=600
2022-05-06 09:09:50.094688 < Accept-Ranges: bytes
2022-05-06 09:09:50.094688 < x-proxy-cache: MISS
2022-05-06 09:09:50.094688 < X-GitHub-Request-Id: E3E7:3DA7:4CD618:744035:6274674D
2022-05-06 09:09:50.094688 <
2022-05-06 09:09:50.094688 { [984 bytes data]
2022-05-06 09:09:50.094688 { [12924 bytes data]
2022-05-06 09:09:50.268697 { [5744 bytes data]
2022-05-06 09:09:50.268697 { [3106 bytes data]
2022-05-06 09:09:50.268697 * Connection #0 to host pycurl.io left intact

Performance report:
-----------------------------------------------------------------------
EFFECTIVE_URL : http://pycurl.io/docs/latest/index.html
RESPONSE_CODE : 200
SIZE_DOWNLOAD : 22758.0
NAMELOOKUP_TIME : 0.003169
CONNECT_TIME : 0.177756 0.174587
APPCONNECT_TIME : 0.0 -0.177756
PRETRANSFER_TIME : 0.17809 0.17809
STARTTRANSFER_TIME : 0.356996 0.17890599999999998
TOTAL_TIME : 0.531489 0.174493
REDIRECT_TIME : 0.0
Nice.
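One wrinkle in the report: APPCONNECT_TIME came back as 0.0 for this plain HTTP request (presumably because no TLS handshake took place), so its delta is negative and the PRETRANSFER_TIME delta is really just the elapsed time. A minimal sketch of one way to handle that, assuming the timers are cumulative and that a value smaller than the previous milestone means "not reached". The names MILESTONES, SAMPLE and phase_durations are my own, and SAMPLE just copies the values from the output above so the block runs without PycURL.

```python
# Milestones in the order a transfer is expected to reach them.
MILESTONES = (
    "NAMELOOKUP_TIME",
    "CONNECT_TIME",
    "APPCONNECT_TIME",
    "PRETRANSFER_TIME",
    "STARTTRANSFER_TIME",
    "TOTAL_TIME",
)

# Values copied from the report above. With a live handle the dict could be
# built as: {name: curl.getinfo(getattr(pycurl, name)) for name in MILESTONES}
SAMPLE = {
    "NAMELOOKUP_TIME": 0.003169,
    "CONNECT_TIME": 0.177756,
    "APPCONNECT_TIME": 0.0,
    "PRETRANSFER_TIME": 0.17809,
    "STARTTRANSFER_TIME": 0.356996,
    "TOTAL_TIME": 0.531489,
}


def phase_durations(timings):
    """Return [(name, elapsed, delta)], skipping milestones that look unset."""
    rows = []
    previous = 0.0
    for name in MILESTONES:
        elapsed = timings[name]
        # Assumption: a cumulative timer below the previous milestone
        # (e.g. APPCONNECT_TIME == 0.0 over plain HTTP) was never reached.
        if elapsed < previous:
            continue
        rows.append((name, elapsed, elapsed - previous))
        previous = elapsed
    return rows


if __name__ == "__main__":
    for name, elapsed, delta in phase_durations(SAMPLE):
        print("{:<19}: {:.6f} (+{:.6f})".format(name, elapsed, delta))
```

With this, each delta is measured against the last milestone that was actually reached, so the per-phase figures should add up to TOTAL_TIME.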