A command-line tool to extract data from Armenia's Statistical Committee database (https://statbank.armstat.am/pxweb/en/).
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
List all available tables (indicators) in the database:
python3 -m statbank_parser.cli list-indicators > indicators.jsonl
Output is in JSONL format (one JSON object per line) with fields:
- id: Unique table identifier (e.g., "EF-NA-01")
- title: Table title
- url: Direct URL to the table
- topic: Main category
- subtopics: List of subcategories
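Because each line of indicators.jsonl is an independent JSON object, the file can be filtered without loading a special library. The following is a minimal sketch, assuming the field names listed above; the helper names `iter_indicators` and `filter_by_topic` are hypothetical and not part of statbank_parser.

```python
import json
from typing import Iterable, Iterator


def iter_indicators(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_by_topic(records: Iterable[dict], keyword: str) -> list[dict]:
    """Keep records whose topic or any subtopic contains keyword (case-insensitive)."""
    needle = keyword.lower()
    matches = []
    for rec in records:
        # Assumed shape: "topic" is a string, "subtopics" is a list of strings.
        haystack = [rec.get("topic") or ""] + list(rec.get("subtopics") or [])
        if any(needle in text.lower() for text in haystack):
            matches.append(rec)
    return matches
```

A typical use would be to open indicators.jsonl with `encoding="utf-8"`, pass the file object to `iter_indicators`, and print the `id` and `url` of every record returned by `filter_by_topic(..., "national accounts")`.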
See what file formats are available for a specific table:
python3 -m statbank_parser.cli list-formats --url "TABLE_URL"
Example:
python3 -m statbank_parser.cli list-formats --url "https://statbank.armstat.am/pxweb/en/ArmStatBank/ArmStatBank__1%20Econnomy%20and%20finance__15%20National%20Accounts/EF-NA-01.px/"
Extract data from a table:
python3 -m statbank_parser.cli get-data --url "TABLE_URL" --output data.csv
Example:
python3 -m statbank_parser.cli get-data --url "https://statbank.armstat.am/pxweb/en/ArmStatBank/ArmStatBank__1%20Econnomy%20and%20finance__15%20National%20Accounts/EF-NA-01.px/" --output national_accounts.csv
For JSON output:
python3 -m statbank_parser.cli get-data --url "TABLE_URL" --output data.json
- ✅ List Indicators: Recursively traverses the database to find all tables
- ✅ Extract Metadata: Captures table ID, topic, and subtopic hierarchy
- ✅ List Formats: Shows available download formats for any table
- ✅ Data Extraction: Parses HTML tables and exports to CSV or JSON
- ✅ JSONL Output: Efficient line-by-line JSON format for indicators
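The indicator list and the data extraction command can be combined to fetch many tables in one run. The sketch below only builds the `get-data` command lines from a saved indicators.jsonl, using the `--url` and `--output` options shown earlier. The helpers `output_name` and `build_commands`, and the `data` output directory, are assumptions made for illustration.

```python
import json
import re
import sys
from pathlib import Path


def output_name(table_id: str, suffix: str = ".csv") -> str:
    """Turn a table id such as "EF-NA-01" into a safe file name."""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", table_id).strip("_")
    return (safe or "table") + suffix


def build_commands(jsonl_lines, out_dir="data"):
    """Return one get-data argument list per indicator record."""
    commands = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines in the JSONL file
        rec = json.loads(line)
        target = Path(out_dir) / output_name(rec["id"])
        commands.append([
            sys.executable, "-m", "statbank_parser.cli", "get-data",
            "--url", rec["url"], "--output", str(target),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists. Reusing a saved indicators.jsonl avoids repeating the full traversal on every batch run.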
The tool scrapes the PXWeb interface by:
- Traversing the Tree: Simulates ASP.NET __doPostBack events to navigate categories
- Extracting Data: Fetches table pages, extracts __VIEWSTATE, selects all variables, and submits forms
- Parsing Results: Uses BeautifulSoup and Pandas to parse HTML tables
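To illustrate the postback step: an ASP.NET WebForms page carries its state in hidden inputs such as __VIEWSTATE, and a `__doPostBack(target, argument)` call submits the form with __EVENTTARGET and __EVENTARGUMENT set. The sketch below shows one way to rebuild that payload. It is not the code in scraper.py: it uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and `HiddenFieldParser` and `build_postback` are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build the form payload a __doPostBack(target, argument) call would submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)  # carries __VIEWSTATE and other hidden state
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload
```

The resulting dictionary would be sent as a form-encoded POST to the same page URL, and the response parsed again for the next step. The control name passed as `event_target` has to be read from the page itself.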
Run the test suite:
python -m unittest discover tests -v
See tests/README.md for more details.
- The --format option for downloading specific file formats (Excel, PX, etc.) is not currently functional due to the complexity of the ASP.NET postback mechanism
- The default HTML parsing works correctly and provides all data in CSV/JSON format
- Traversing all indicators can take several minutes due to the large number of tables
statbank-parser/
├── statbank_parser/
│ ├── __init__.py
│ ├── cli.py # Command-line interface
│ └── scraper.py # Web scraping logic
├── tests/
│ ├── __init__.py
│ ├── test_cli.py # CLI tests
│ └── test_scraper.py # Scraper tests
├── requirements.txt
├── setup.py
└── README.md
License: MIT