Using PycURL on Windows

PycURL – A Python Interface To The cURL library — PycURL 7.45.1 documentation

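For HTTP work in Python, Requests is the usual choice, but PycURL is built on libcurl, so it can be faster and gives finer-grained control, which is sometimes handy. So I'll give it a try.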

Unfortunately, there is apparently no official binary for Windows. Then again, building from source is more hassle than I'd like.

A quick search turns up the following site, which appears to distribute binaries, so I'll use those.

Python Extension Packages for Windows - Christoph Gohlke

My environment is as shown below, so I download "pycurl-7.45.1-cp38-cp38-win_amd64.whl".

$ python -VV
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]
$ pip install pycurl-7.45.1-cp38-cp38-win_amd64.whl
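Picking the right file comes down to matching the interpreter version and platform in the wheel's filename. Below is a minimal sketch that builds the expected filename from the running interpreter. The helper name `wheel_filename` is my own, and it assumes CPython 3.8 or later, where the ABI tag is the same as the Python tag.

```python
import sys
import sysconfig


def wheel_filename(name, version, py_version=None, platform=None):
    """Build the CPython wheel filename expected for an interpreter.

    Hypothetical helper: assumes CPython 3.8+ (ABI tag == Python tag).
    """
    major, minor = py_version or sys.version_info[:2]
    platform = platform or sysconfig.get_platform()  # e.g. "win-amd64"
    tag = "cp{}{}".format(major, minor)
    # Wheel platform tags use underscores instead of "-" and "."
    plat_tag = platform.replace("-", "_").replace(".", "_")
    return "{}-{}-{}-{}-{}.whl".format(name, version, tag, tag, plat_tag)


# Filename to look for on the download page for the current interpreter
print(wheel_filename("pycurl", "7.45.1"))
```

On the Python 3.8 64-bit environment above, this should give the same filename as the one I downloaded.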

Here is a sample that works, for a start.

from datetime import datetime
import pycurl
from io import BytesIO

def report(curl):
    print("Performance report:")
    print("-----------------------------------------------------------------------")
    print("EFFECTIVE_URL      : {}".format(curl.getinfo(pycurl.EFFECTIVE_URL)))
    print("RESPONSE_CODE      : {}".format(curl.getinfo(pycurl.RESPONSE_CODE)))
    print("SIZE_DOWNLOAD      : {}".format(curl.getinfo(pycurl.SIZE_DOWNLOAD)))

    print("NAMELOOKUP_TIME    : {}".format(curl.getinfo(pycurl.NAMELOOKUP_TIME)))
    print("CONNECT_TIME       : {} {}".format(curl.getinfo(pycurl.CONNECT_TIME), curl.getinfo(pycurl.CONNECT_TIME)-curl.getinfo(pycurl.NAMELOOKUP_TIME)))
    # APPCONNECT : SSL/TLS handshake done (0.0 for plain HTTP, so this delta goes negative)
    print("APPCONNECT_TIME    : {} {}".format(curl.getinfo(pycurl.APPCONNECT_TIME), curl.getinfo(pycurl.APPCONNECT_TIME)-curl.getinfo(pycurl.CONNECT_TIME)))
    # PRETRANSFER : time until just before the transfer begins
    # https://curl.se/libcurl/c/CURLINFO_PRETRANSFER_TIME.html
    print("PRETRANSFER_TIME   : {} {}".format(curl.getinfo(pycurl.PRETRANSFER_TIME), curl.getinfo(pycurl.PRETRANSFER_TIME)-curl.getinfo(pycurl.APPCONNECT_TIME)))
    # STARTTRANSFER : TTFB(time to first byte)
    print("STARTTRANSFER_TIME : {} {}".format(curl.getinfo(pycurl.STARTTRANSFER_TIME), curl.getinfo(pycurl.STARTTRANSFER_TIME)-curl.getinfo(pycurl.PRETRANSFER_TIME)))
    print("TOTAL_TIME         : {} {}".format(curl.getinfo(pycurl.TOTAL_TIME), curl.getinfo(pycurl.TOTAL_TIME)-curl.getinfo(pycurl.STARTTRANSFER_TIME)))
    print("REDIRECT_TIME      : {}".format(curl.getinfo(pycurl.REDIRECT_TIME)))
    print()

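def _curl_debug(debug_type, data):
    # debug_type is one of libcurl's curl_infotype values:
    # CURLINFO_TEXT = 0,
    # CURLINFO_HEADER_IN,    /* 1 */
    # CURLINFO_HEADER_OUT,   /* 2 */
    # CURLINFO_DATA_IN,      /* 3 */
    # CURLINFO_DATA_OUT,     /* 4 */
    # CURLINFO_SSL_DATA_IN,  /* 5 */
    # CURLINFO_SSL_DATA_OUT, /* 6 */

    # Prefix for each curl_infotype, indexed by its value
    type_str = ('*', '<', '>', '{', '}', '<<', '>>')
    if debug_type in (3, 4, 5, 6):
        # Body and SSL payloads are binary, so report only their size
        msg = "[{} bytes data]".format(len(data))
    else:
        # Text and header lines; replace undecodable bytes instead of raising
        msg = data.decode('utf-8', errors='replace').strip()

    print("{} {} {}".format(datetime.now(), type_str[debug_type], msg))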

buffer = BytesIO()
curl = pycurl.Curl()
curl.setopt(pycurl.URL, 'http://pycurl.io/docs/latest/index.html')
curl.setopt(pycurl.WRITEDATA, buffer)
curl.setopt(pycurl.FOLLOWLOCATION, True)
curl.setopt(pycurl.VERBOSE, True)
curl.setopt(pycurl.DEBUGFUNCTION, _curl_debug)
curl.perform()
print()
report(curl)
curl.close()
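
The sample collects the response body in `buffer` but never reads it back. Here is a minimal sketch of decoding it using the charset from the Content-Type header. The helper names `charset_from_content_type` and `decode_body` are my own, and falling back to UTF-8 is an assumption, not something PycURL does for you.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode raw response bytes, falling back to `default` (assumption)."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Charset name unknown to Python; fall back to the default
        return raw.decode(default, errors="replace")


print(decode_body(b"<html></html>", "text/html; charset=utf-8"))
```

In the sample above, this would probably be called as `decode_body(buffer.getvalue(), curl.getinfo(pycurl.CONTENT_TYPE))` after `curl.perform()` and before `curl.close()`.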

$ python a.py
2022-05-06 09:09:49.740667 * Trying 192.30.252.154:80...
2022-05-06 09:09:49.914677 * Connected to pycurl.io (192.30.252.154) port 80 (#0)
2022-05-06 09:09:49.914677 > GET /docs/latest/index.html HTTP/1.1
Host: pycurl.io
User-Agent: PycURL/7.45.1 libcurl/7.80.0 Schannel zlib/1.2.11 zstd/1.5.2 c-ares/1.18.1 libssh2/1.10.0
Accept: */*
2022-05-06 09:09:50.093687 * Mark bundle as not supporting multiuse
2022-05-06 09:09:50.094688 < HTTP/1.1 200 OK
2022-05-06 09:09:50.094688 < Server: GitHub.com
2022-05-06 09:09:50.094688 < Date: Fri, 06 May 2022 00:09:49 GMT
2022-05-06 09:09:50.094688 < Content-Type: text/html; charset=utf-8
2022-05-06 09:09:50.094688 < Content-Length: 22758
2022-05-06 09:09:50.094688 < Vary: Accept-Encoding
2022-05-06 09:09:50.094688 < Last-Modified: Sun, 13 Mar 2022 07:25:32 GMT
2022-05-06 09:09:50.094688 < Vary: Accept-Encoding
2022-05-06 09:09:50.094688 < Access-Control-Allow-Origin: *
2022-05-06 09:09:50.094688 < ETag: "622d9c6c-58e6"
2022-05-06 09:09:50.094688 < expires: Fri, 06 May 2022 00:19:49 GMT
2022-05-06 09:09:50.094688 < Cache-Control: max-age=600
2022-05-06 09:09:50.094688 < Accept-Ranges: bytes
2022-05-06 09:09:50.094688 < x-proxy-cache: MISS
2022-05-06 09:09:50.094688 < X-GitHub-Request-Id: E3E7:3DA7:4CD618:744035:6274674D
2022-05-06 09:09:50.094688 <
2022-05-06 09:09:50.094688 { [984 bytes data]
2022-05-06 09:09:50.094688 { [12924 bytes data]
2022-05-06 09:09:50.268697 { [5744 bytes data]
2022-05-06 09:09:50.268697 { [3106 bytes data]
2022-05-06 09:09:50.268697 * Connection #0 to host pycurl.io left intact

Performance report:
-----------------------------------------------------------------------
EFFECTIVE_URL      : http://pycurl.io/docs/latest/index.html
RESPONSE_CODE      : 200
SIZE_DOWNLOAD      : 22758.0
NAMELOOKUP_TIME    : 0.003169
CONNECT_TIME       : 0.177756 0.174587
APPCONNECT_TIME    : 0.0 -0.177756
PRETRANSFER_TIME   : 0.17809 0.17809
STARTTRANSFER_TIME : 0.356996 0.17890599999999998
TOTAL_TIME         : 0.531489 0.174493
REDIRECT_TIME      : 0.0

Nice.
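
One thing to note in the report: APPCONNECT_TIME is 0.0 because this was a plain HTTP request, so its delta comes out negative and the PRETRANSFER_TIME delta ends up measured from zero. Below is a rough sketch of computing per-phase durations while skipping phases that did not happen. The helper `phase_durations` is my own, and treating a value of 0.0 as "phase did not occur" is an assumption.

```python
# Cumulative timers in the order libcurl reaches them
PHASES = (
    "NAMELOOKUP_TIME",
    "CONNECT_TIME",
    "APPCONNECT_TIME",
    "PRETRANSFER_TIME",
    "STARTTRANSFER_TIME",
    "TOTAL_TIME",
)


def phase_durations(timings):
    """Turn cumulative timings into per-phase durations.

    A phase whose timer is 0.0 is assumed not to have occurred and is
    reported as None; the next phase is measured from the last real one.
    """
    durations = {}
    previous = 0.0
    for name in PHASES:
        value = timings.get(name, 0.0)
        if value <= 0.0:
            durations[name] = None
            continue
        durations[name] = value - previous
        previous = value
    return durations


# Values copied from the report above
sample = {
    "NAMELOOKUP_TIME": 0.003169,
    "CONNECT_TIME": 0.177756,
    "APPCONNECT_TIME": 0.0,
    "PRETRANSFER_TIME": 0.17809,
    "STARTTRANSFER_TIME": 0.356996,
    "TOTAL_TIME": 0.531489,
}
for name, seconds in phase_durations(sample).items():
    print("{:<19}: {}".format(name, seconds))
```

With a live handle, the input dict could be built with `{name: curl.getinfo(getattr(pycurl, name)) for name in PHASES}` before calling `curl.close()`.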