|
| 1 | +# Saving a web page to an HTTP Archive |
| 2 | + |
| 3 | +An HTTP Archive file captures the full details of a series of HTTP requests and responses as JSON. |
| 4 | + |
| 5 | +The `shot-scraper har` command can save a `*.har.zip` file that contains both that JSON data and the content of any assets that were loaded by the page. |
| 6 | +```bash |
| 7 | +shot-scraper har https://datasette.io/ |
| 8 | +``` |
| 9 | +This will save to `datasette-io.har.zip`. You can use `-o` to specify a filename: |
| 10 | +```bash |
| 11 | +shot-scraper har https://datasette.io/tutorials/learn-sql \ |
| 12 | + -o learn-sql.har.zip |
| 13 | +``` |
| 14 | +You can list the contents of a `.har.zip` file using `unzip -l`: |
| 15 | +```bash |
| 16 | +unzip -l datasette-io.har.zip |
| 17 | +``` |
| 18 | +``` |
| 19 | +Archive: datasette-io.har.zip |
| 20 | + Length Date Time Name |
| 21 | +--------- ---------- ----- ---- |
| 22 | + 39067 02-13-2025 10:33 41824dbd0c51f584faf0e2c4e88de01b8a5dcdcd.html |
| 23 | + 4052 02-13-2025 10:33 34972651f161f0396c697c65ef9aaeb2c9ac50c4.css |
| 24 | + 2501 02-13-2025 10:33 9f612e71165058f0046d8bf8fec12af7eb15f39d.css |
| 25 | + 10916 02-13-2025 10:33 2737174596eafba6e249022203c324605f023cdd.svg |
| 26 | + 5557 02-13-2025 10:33 427504aa6ef5a8786f90fb2de636133b3fc6d1fe.js |
| 27 | + 1393 02-13-2025 10:33 25c68a82b654c9d844c604565dab4785161ef697.js |
| 28 | + 1170 02-13-2025 10:33 31c073551ef5c84324073edfc7b118f81ce9a7d2.svg |
| 29 | + 1158 02-13-2025 10:33 1e0c64af7e6a4712f5e7d1917d9555bbc3d01f7a.svg |
| 30 | + 1161 02-13-2025 10:33 ec8282b36a166d63fae4c04166bb81f945660435.svg |
| 31 | + 3373 02-13-2025 10:33 5f85a11ef89c0e3f237c8e926c1cb66727182102.svg |
| 32 | + 1134 02-13-2025 10:33 3b9d8109b919dfe9393dab2376fe03267dadc1f1.svg |
| 33 | + 31670 02-13-2025 10:33 469f0d28af6c026dcae8c81731e2b0484aeac92c.jpeg |
| 34 | + 1157 02-13-2025 10:33 b7786336bfce38a9677d26dc9ef468bb1ed45e19.svg |
| 35 | + 50494 02-13-2025 10:33 har.har |
| 36 | +--------- ------- |
| 37 | + 154803 14 files |
| 38 | +``` |
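The JSON inside the archive can also be inspected programmatically. The following is a minimal sketch, not part of `shot-scraper` itself: it assumes the archive contains a member named `har.har` (as in the listing above) and that the JSON follows the standard HAR layout of `log.entries[].request` and `log.entries[].response`. The function name `summarize_har` is hypothetical.

```python
import json
import sys
import zipfile


def summarize_har(zip_path, har_name="har.har"):
    """Return (method, status, url) tuples for each entry in the HAR JSON.

    har_name defaults to "har.har", the member name seen in the
    unzip listing; adjust it if your archive differs (assumption).
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Read the JSON member directly, without extracting to disk
        with archive.open(har_name) as fp:
            har = json.load(fp)
    return [
        (
            entry["request"]["method"],
            entry["response"]["status"],
            entry["request"]["url"],
        )
        for entry in har["log"]["entries"]
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python summarize_har.py datasette-io.har.zip
    for method, status, url in summarize_har(sys.argv[1]):
        print(method, status, url)
```

Run against `datasette-io.har.zip`, this should print one line per recorded request, which can be a quick way to check which assets a page loaded.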
| 39 | + |
| 40 | +## `shot-scraper har --help` |
| 41 | + |
| 42 | +Full `--help` for this command: |
| 43 | + |
| 44 | +<!-- [[[cog |
| 45 | +import cog |
| 46 | +from shot_scraper import cli |
| 47 | +from click.testing import CliRunner |
| 48 | +runner = CliRunner() |
| 49 | +result = runner.invoke(cli.cli, ["har", "--help"]) |
| 50 | +help = result.output.replace("Usage: cli", "Usage: shot-scraper") |
| 51 | +cog.out( |
| 52 | + "```\n{}\n```\n".format(help.strip()) |
| 53 | +) |
| 54 | +]]] --> |
| 55 | +``` |
| 56 | +Usage: shot-scraper har [OPTIONS] URL |
| 57 | + |
| 58 | + Record a HAR file for the specified page |
| 59 | + |
| 60 | + Usage: |
| 61 | + |
| 62 | + shot-scraper har https://datasette.io/ |
| 63 | + |
| 64 | +Options: |
| 65 | + -a, --auth FILENAME Path to JSON authentication context file |
| 66 | + -o, --output FILE HAR filename |
| 67 | + --timeout INTEGER Wait this many milliseconds before failing |
| 68 | + --log-console Write console.log() to stderr |
| 69 | + --fail Fail with an error code if a page returns an HTTP error |
| 70 | + --skip Skip pages that return HTTP errors |
| 71 | + --bypass-csp Bypass Content-Security-Policy |
| 72 | + --auth-password TEXT Password for HTTP Basic authentication |
| 73 | + --auth-username TEXT Username for HTTP Basic authentication |
| 74 | + --help Show this message and exit. |
| 75 | +``` |
| 76 | +<!-- [[[end]]] --> |