HIBP Downloader documentation

This is a CLI tool that efficiently downloads a local copy of the pwned password hash data from the very awesome HIBP Pwned Passwords API endpoint. It combines multiprocessing, async processing, local caching, content ETags and HTTP/2 connection pooling to make the download about as fast as seems possible in Python.
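As a rough illustration of how content ETags avoid re-downloading unchanged data, the sketch below shows the general HTTP conditional-request pattern. It is not taken from the tool's source: the function names `conditional_headers` and `classify_response` are hypothetical, and the ETag value is made up.

```python
from typing import Optional


def conditional_headers(cached_etag: Optional[str]) -> dict:
    """Headers for a prefix request; send If-None-Match only when an ETag is cached."""
    return {"If-None-Match": cached_etag} if cached_etag else {}


def classify_response(status: int) -> str:
    """Map an HTTP status code to an action for the local copy of a prefix block."""
    if status == 304:  # Not Modified: the cached block is still current
        return "keep-cached"
    if status == 200:  # new or changed content: store the body and the new ETag
        return "store-new"
    return "retry-later"  # this sketch treats anything else as a failed download


# Made-up ETag value, for illustration only.
print(conditional_headers('W/"0x8DB91"'))
print(classify_response(304))
```

A server that honours `If-None-Match` answers 304 with an empty body when the block is unchanged, so only changed blocks cost bandwidth.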

Features

  • Download a hash-prefix content block only when its content has changed.
  • Start, stop and re-start the data-collection process without loss of data already collected.
  • Query clear-text values against the pwned password data set and return the results.
  • Generate a single text file with pwned password hash values in-order, similar to PwnedPasswordsDownloader from the HIBP team.
  • Per-prefix file metadata in JSON format for easy data reuse.
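To make the hash-prefix idea concrete, here is a minimal sketch of how a clear-text value maps onto a prefix block. It assumes the HIBP range format, where a block is keyed by the first 5 hex characters of the upper-case SHA-1 digest and each line reads `SUFFIX:COUNT`. The helper names `split_sha1` and `count_in_block` are hypothetical, the sample counts are made up, and the tool's own on-disk layout may differ.

```python
import hashlib


def split_sha1(password: str) -> tuple:
    """Return (5-char prefix, 35-char suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_block(block_text: str, suffix: str) -> int:
    """Look up a suffix in one prefix block of 'SUFFIX:COUNT' lines; 0 if absent."""
    for line in block_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


# Illustrative block content; the counts are made up, not real HIBP data.
sample_block = "\n".join([
    "0018A45C4D1DEF81644B54AB7F969B88D65:1",
    "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42",
])

prefix, suffix = split_sha1("password")
print(prefix, count_in_block(sample_block, suffix))
```

Only the block for one prefix has to be read to answer a query, which is why a per-prefix local copy is convenient for lookups.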

Install

pip install --upgrade hibp-downloader

Usage

[Screenshot: hibp-downloader help output (screenshot-help.png)]

Performance

Sample download activity log from a host with 12 cores on a 45 Mbit/s DSL connection.

2023-07-31T03:22:45+1000 | INFO | hibp-downloader | prefix=e585f source=[lc:265201 et:0 rc:722148 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~71005H/s] runtime=2.33hr download=11748.0MB
2023-07-31T03:22:48+1000 | INFO | hibp-downloader | prefix=e5877 source=[lc:265201 et:0 rc:722268 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70998H/s] runtime=2.33hr download=11750.0MB
2023-07-31T03:22:50+1000 | INFO | hibp-downloader | prefix=f5837 source=[lc:265201 et:0 rc:722388 ro:3 xx:0] runtime_rate=[11.2MBit/s 86req/s ~70992H/s] runtime=2.33hr download=11751.9MB

  • 86 requests per second to api.pwnedpasswords.com
  • 265,201 prefix files from (lc) local-cache; 722,388 from (rc) remote-cache; 3 from (ro) remote-origin; 0 failed (xx) download
  • an estimated ~70k hash values downloaded per second
  • 11.5GB (11,751MB) downloaded in 2.3 hours (a full dataset download takes ~3.5 hours)
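The figures above can be cross-checked against the last log line with some simple arithmetic. The sketch below assumes 16^5 prefix blocks in total (5 hex characters per prefix), that MB in the log means 10^6 bytes, and that a full run fetches every prefix remotely at the same request rate. These assumptions are not stated in the log itself.

```python
TOTAL_PREFIXES = 16 ** 5  # 5 hex characters -> 1,048,576 prefix blocks

# Values copied from the last sample log line.
local_cache, remote_cache, remote_origin = 265_201, 722_388, 3
runtime_s = 2.33 * 3600
downloaded_mb = 11_751.9

remote_requests = remote_cache + remote_origin
processed = local_cache + remote_requests

req_per_s = remote_requests / runtime_s
mbit_per_s = downloaded_mb * 8 / runtime_s
# Assumption: a full run requests every prefix at the observed rate.
full_run_hr = TOTAL_PREFIXES / req_per_s / 3600

print(f"processed {processed:,} of {TOTAL_PREFIXES:,} prefixes "
      f"({processed / TOTAL_PREFIXES:.1%})")
print(f"{req_per_s:.0f} req/s, {mbit_per_s:.1f} Mbit/s, "
      f"full run about {full_run_hr:.1f} hr")
```

Under these assumptions the derived request rate and bandwidth agree with the logged `86req/s` and `11.2MBit/s`, and the extrapolated full run lands close to the ~3.5 hours quoted above.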

Project

All rights reserved.

License

  • BSD-3-Clause - see LICENSE file for details.