
Chinese documentation: README_zh.md

payload

ruoyi_talk · v1.0.0 · MIT

Downloads the articles you actually care about.

payload is stage 2 of the ruoyi_talk knowledge pipeline. It reads the article queue produced by an upstream source (e.g. fairing), fetches full content locally — PDFs for papers, Markdown for blog posts — and manages a searchable local knowledge base. An interactive shell lets you browse, fetch, star, and open articles without leaving the terminal.

upstream (fairing / any producer)
  └─ QUEUE_DIR/payload_queue.json

payload (this project)
  ├─ reads   QUEUE_DIR/payload_queue.json
  ├─ tracks  PAYLOAD_DATA_DIR/downloaded.jsonl
  │                          /failed.jsonl
  └─ writes  KNOWLEDGE_DIR/
               <slug>.pdf / .md
               preferred/<slug>.pdf / .md
               index.json
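
The data flow above can be sketched in a few lines of Python. This is a minimal illustration, not payload's actual code: it assumes payload_queue.json is a JSON array of objects, that each record in downloaded.jsonl is a JSON object on its own line, and that both carry an "article_id" field. The function names are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_done_ids(jsonl_path: Path) -> set[str]:
    """Collect article IDs from a JSON Lines file; a missing file yields no IDs."""
    if not jsonl_path.exists():
        return set()
    ids: set[str] = set()
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                ids.add(json.loads(line)["article_id"])  # assumed field name
    return ids


def pending_articles(queue_dir: Path, data_dir: Path) -> list[dict]:
    """Return queued articles that have no record in downloaded.jsonl."""
    queue_file = queue_dir / "payload_queue.json"
    queue = json.loads(queue_file.read_text(encoding="utf-8"))  # assumed: a JSON array
    done = load_done_ids(data_dir / "downloaded.jsonl")
    return [article for article in queue if article["article_id"] not in done]
```

Keeping the state files separate from the queue means the upstream producer can rewrite the queue at any time without losing track of what has already been fetched.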

Quick Start

git clone https://github.com/JiekerTime/payload.git
cd payload

# macOS / Linux
python3 -m venv .venv && source .venv/bin/activate
./run.sh

# Windows
.\run.bat

run.sh / run.bat create the virtualenv, install dependencies, and launch the interactive shell automatically.


Configuration (.env)

Copy .env.example to .env and set at least QUEUE_DIR:

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| QUEUE_DIR | Yes | | Directory containing payload_queue.json |
| PAYLOAD_DATA_DIR | No | ~/Documents/payload | payload's own state files |
| KNOWLEDGE_DIR | No | ~/files/OneDrive/ruoyi_knowledge | Downloaded articles and index |
| FIRECRAWL_API_KEY | No | | Enables high-fidelity web → Markdown (optional) |
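
To make the required/default rules concrete, here is a rough sketch of how these variables could be resolved. The `Settings` class and `load_settings` function are hypothetical names for illustration; payload's own loader may differ, and reading the .env file itself is left out.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    queue_dir: Path
    data_dir: Path
    knowledge_dir: Path
    firecrawl_api_key: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Resolve settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    queue_dir = env.get("QUEUE_DIR")
    if not queue_dir:
        # QUEUE_DIR is the only required variable.
        raise ValueError("QUEUE_DIR must be set")
    return Settings(
        queue_dir=Path(queue_dir).expanduser(),
        data_dir=Path(env.get("PAYLOAD_DATA_DIR") or "~/Documents/payload").expanduser(),
        knowledge_dir=Path(env.get("KNOWLEDGE_DIR") or "~/files/OneDrive/ruoyi_knowledge").expanduser(),
        # An empty or missing key means the optional Firecrawl path is disabled.
        firecrawl_api_key=env.get("FIRECRAWL_API_KEY") or None,
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test, for example `load_settings({"QUEUE_DIR": "/tmp/queue"})`.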

Shell Commands

Launch the shell: python main.py (or ./run.sh)

Non-interactive: python main.py run [--dry-run]

| Shortcut | Command | Description |
| --- | --- | --- |
| \r | run [--dry-run] | Fetch all pending articles from the queue |
| \ls | list | Browse queue with pagination; [1-N] to fetch, [q] to quit |
| \fs | search <kw…> | Filter queue by title keyword; select to fetch |
| \dl | download <id> | Fetch a queued article by its 16-char article ID |
| \rt | retry | Re-fetch any previously fetched article (overwrite) |
| \f | failed | Show all failed fetch attempts |
| \l | log [N] | Show fetch history (default: last 20) |
| \i | index [kw] | Browse knowledge index · [N] star/unstar · o[N] open · d[N] delete |
| \fv | fav | Show preferred (★) articles |
| \o | open <id> | Open a downloaded article with the system default app |
| \st | stats | Knowledge base and queue statistics |
| \e | env [set K V] | View or update .env variables |
| \li | license | Show MIT license |
| \h / \? | shortcuts | Show this help |
| \q | quit | Exit |

Knowledge index interactions (\i)

Inside \i and \fv:

| Input | Action |
| --- | --- |
| [1-N] | Toggle ★ preferred for that row |
| o[N] | Open file with system default app |
| d[N] | Delete article (file + index entry); confirm required |
| [n] / [p] | Next / previous page |
| [q] | Quit |

Marking an article as preferred moves its file to KNOWLEDGE_DIR/preferred/. Unmarking moves it back to KNOWLEDGE_DIR/.
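
The move described above amounts to relocating one file between two directories. The following is a small sketch of that idea only: `toggle_preferred` is a hypothetical helper, and any matching update to index.json that payload performs is deliberately omitted.

```python
import shutil
from pathlib import Path


def toggle_preferred(path: Path, knowledge_dir: Path) -> Path:
    """Move a file into or out of knowledge_dir/preferred/ and return its new path."""
    preferred_dir = knowledge_dir / "preferred"
    if path.parent == preferred_dir:
        # Already starred: move it back to the top level of the knowledge base.
        target = knowledge_dir / path.name
    else:
        # Not starred yet: make sure preferred/ exists, then move the file in.
        preferred_dir.mkdir(parents=True, exist_ok=True)
        target = preferred_dir / path.name
    shutil.move(str(path), str(target))
    return target
```

Calling the helper twice on the same article returns the file to its original location, which mirrors the star/unstar toggle in the shell.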


URL Handlers

| Domain | Output | Notes |
| --- | --- | --- |
| arxiv.org/abs/* | PDF | Full paper |
| arxiv.org/pdf/* | PDF | Direct PDF |
| Any web page | Markdown | Firecrawl if key set, else requests + markdownify |
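
Both arXiv patterns end in the same PDF, so an abstract link can be normalised to its PDF link before downloading. The function below is one possible way to do that, assuming the usual arxiv.org/abs/<id> and arxiv.org/pdf/<id> URL layout; `arxiv_pdf_url` is a hypothetical name and not necessarily how payload's handler works.

```python
from urllib.parse import urlparse


def arxiv_pdf_url(url: str) -> str:
    """Map an arXiv abstract or PDF URL to a canonical PDF URL."""
    parts = urlparse(url)
    if parts.netloc.lower() not in {"arxiv.org", "www.arxiv.org"}:
        raise ValueError(f"not an arXiv URL: {url}")
    path = parts.path
    if path.startswith("/abs/"):
        paper_id = path[len("/abs/"):]
    elif path.startswith("/pdf/"):
        paper_id = path[len("/pdf/"):]
        if paper_id.endswith(".pdf"):
            # Older links carry a .pdf suffix; drop it to get the bare ID.
            paper_id = paper_id[: -len(".pdf")]
    else:
        raise ValueError(f"unsupported arXiv path: {path}")
    if not paper_id:
        raise ValueError(f"missing paper ID: {url}")
    return f"https://arxiv.org/pdf/{paper_id}"
```

Any URL that fails this check would fall through to the generic web page handler in the last row of the table.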

Web pages: footnote anchor links are rewritten to local #anchor references. Images are downloaded to <article_id>/images/ and paths updated in the Markdown.
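
As an illustration of the footnote rewrite, the sketch below turns absolute links that point back into the same page into bare #anchor links. It assumes inline Markdown links of the form [text](url#anchor); `localize_anchors` is a hypothetical helper, and payload's actual rewriting rules may be broader.

```python
import re


def localize_anchors(markdown: str, page_url: str) -> str:
    """Rewrite links to page_url#anchor as local #anchor links; leave other links alone."""
    base = page_url.split("#", 1)[0]  # ignore any fragment on the page URL itself
    pattern = re.compile(r"\]\(" + re.escape(base) + r"(#[^)\s]+)\)")
    return pattern.sub(r"](\1)", markdown)
```

With the page URL https://example.com/post, a link such as [1](https://example.com/post#fn1) becomes [1](#fn1), so footnotes keep working when the Markdown file is read offline.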


Adding a Handler

# payload/handlers/myhandler.py
from pathlib import Path

from payload.handlers.base import BaseHandler, DownloadResult


class MyHandler(BaseHandler):
    # URL patterns this handler is responsible for
    patterns = ["example.com"]

    def download(self, url: str, dest_dir: Path, article: dict) -> DownloadResult:
        # Fetch the content and write it to a file under dest_dir;
        # `dest` below stands for the path of that file.
        ...
        return DownloadResult(
            article_id=article["article_id"],
            url=url, path=dest, source="example", format="md",
        )

Register the new handler in payload/router.py, ahead of WebHandler, which otherwise handles any web page.
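
Why the order matters can be shown with a toy first-match router. Everything here is a stand-in written for illustration: the `Handler`, `ExampleHandler`, and `CatchAllHandler` classes and the `route` function are assumptions, not the contents of payload/router.py, and matching on the host name alone is a simplification.

```python
from typing import List, Sequence
from urllib.parse import urlparse


class Handler:
    """Stand-in base class: matches a URL when its host equals or ends with a pattern."""

    patterns: List[str] = []

    def matches(self, url: str) -> bool:
        host = urlparse(url).netloc.lower()
        return any(host == p or host.endswith("." + p) for p in self.patterns)


class ExampleHandler(Handler):
    patterns = ["example.com"]


class CatchAllHandler(Handler):
    """Plays the role of a generic web handler: accepts every URL."""

    def matches(self, url: str) -> bool:
        return True


# Specific handlers come first; the catch-all goes last.
HANDLERS: List[Handler] = [ExampleHandler(), CatchAllHandler()]


def route(url: str, handlers: Sequence[Handler] = HANDLERS) -> Handler:
    """Return the first handler that accepts the URL."""
    for handler in handlers:
        if handler.matches(url):
            return handler
    raise LookupError(f"no handler for {url}")
```

In a first-match scheme like this one, a catch-all placed ahead of `ExampleHandler` would claim every URL, and the specific handler would never run.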


Running Tests

pip install pytest
pytest tests/ -v

License

MIT © JiekerTime (若呓)

About

From signal to substance — fetches full article content from the queue into a searchable local knowledge base: PDFs for papers, Markdown for the web.
