This script exports data from a Postgres database to a Hugging Face dataset in Parquet format.
Install dependencies:
Navigate to this directory and install the required Python packages.
pip install -r requirements.txt
Set environment variables:
The script uses environment variables for database credentials. You can set them in your shell or use a .env file.
export DB_USER="your_db_user"
export DB_PASSWORD="your_db_password"
export DB_HOST="localhost"
export DB_PORT="5432"
export DB_NAME="your_db_name"
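For orientation, here is a minimal sketch of how a script could turn these five variables into a Postgres connection URL. It is an assumption about the approach, not the actual code in export.py: the helper name build_db_url and the REQUIRED_VARS tuple are hypothetical, and only the variable names come from this README.

```python
import os
from urllib.parse import quote_plus

# Variable names taken from this README; everything else is illustrative.
REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME")


def build_db_url(env=None):
    """Build a postgresql:// URL from environment-style settings (hypothetical helper)."""
    env = os.environ if env is None else env

    # Fail early with a clear message instead of a confusing connection error.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Percent-encode the user and password so characters like '@' or ':' stay valid in a URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )
```

Checking for missing variables up front makes a forgotten export easy to spot before any database connection is attempted.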
Run the script from the root of the repository:
python export.py
The script creates a directory at the specified output path containing the dataset in Parquet format. If --output_dir is not provided, it saves to a directory named dataset in the current working directory.
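The default described above could be wired up roughly as follows. This is a sketch under assumptions: only the --output_dir flag and its dataset default come from this README, while the function names parse_args and resolve_output_dir are hypothetical and may not match export.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options (hypothetical helper, not necessarily what export.py does)."""
    parser = argparse.ArgumentParser(
        description="Export Postgres data to a Parquet dataset."
    )
    parser.add_argument(
        "--output_dir",
        default="dataset",  # default named in this README
        help="Directory to write the Parquet dataset to.",
    )
    return parser.parse_args(argv)


def resolve_output_dir(args):
    """Resolve the output path; a relative path is taken from the current working directory."""
    return Path(args.output_dir).resolve()
```

With this shape, running python export.py --output_dir /tmp/my_dataset would write to the given path (the path here is only an example), and omitting the flag falls back to dataset under the directory the script is run from.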