- Run Foundry DevTools
- Download datasets from Foundry
- Keep copies of datasets compressed as zip archives
- Provide an API to download datasets (via the Docker network)
- Config file for mapping names to dataset RIDs
- Redis channel for communication within the Docker network
- API for dataset management / provisioning
- Docker network for communication with other containers
/dataset/versions
- Returns all available versions of a dataset
/dataset/download
- Trigger the download of a dataset from Foundry
/dataset/unzip
- Trigger unzip of one or multiple datasets
/dataset/zip
- Trigger zip of one or multiple datasets
/dataset/delete_unzipped
- Trigger deletion of one or multiple unzipped dataset files
/dataset/delete_zipped
- Trigger deletion of one or multiple zipped dataset files
/dataset/delete
- Trigger deletion of a dataset (both zipped and unzipped files)
/dataset/list
- Returns a list of all available datasets and their versions
/dataset/info
- Returns information about one or multiple datasets
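As an illustration, the snippet below queries /dataset/info for two datasets from another container on the Docker network, reusing the request/response pattern of the /dataset/get test script at the end of this page. This is a minimal sketch: the payload shape {"names": [...]} for /dataset/info, the "type": "final" end-of-stream marker, and the helper name fetch_dataset_info are assumptions, not a documented schema.

import asyncio
import json

import websockets

# Hypothetical client for /dataset/info; only the host name, port and endpoint
# path come from this page, the message format is assumed.
INFO_URL = "ws://project-fdt-container:8000/dataset/info"

async def fetch_dataset_info(names: list[str]) -> list[dict]:
    messages = []
    async with websockets.connect(INFO_URL) as websocket:
        await websocket.send(json.dumps({"names": names}))
        async for raw in websocket:
            message = json.loads(raw)
            messages.append(message)
            # Assumed convention: a message with type "final" closes the stream
            # (same pattern as the /dataset/get test script below).
            if message.get("type") == "final":
                break
    return messages

if __name__ == "__main__":
    for info in asyncio.run(fetch_dataset_info(["Customer Demographics", "Product Catalog"])):
        print(info)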
- External API requests /dataset/versions for a dataset (the name is given by the API; a mapping inside the foundry-dev-tools container resolves the name to its RID)
- External API checks whether the dataset is new enough.
  If new enough:
  - Subscribe to the Redis channel for dataset zip updates
  - Request /dataset/unzip for the dataset
  - Wait for the unzip completion message
  If not new enough:
  - Subscribe to the Redis channel for dataset downloads
  - Request /dataset/download for the dataset
  - Wait for the download completion message
  - Request /dataset/get for the dataset
  - Request /dataset/zip for the dataset
  - Request /dataset/delete_unzipped for the dataset
- Continue with the follow-up operations (e.g. loading the dataset into the database) and so on, as sketched below.
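The sketch below shows how a client might implement this flow, combining the Redis subscription with calls to the WebSocket endpoints. Treat it as a hypothetical outline: the channel names ("dataset_downloads", "dataset_zip_updates"), the Redis service name "redis", and the completion-message fields "name"/"status" are assumptions; only the endpoint paths, the host name and the request pattern come from this page.

import asyncio
import json

import redis.asyncio as redis
import websockets

WS_BASE = "ws://project-fdt-container:8000"

async def call_endpoint(path: str, name: str) -> None:
    # Fire a request against one of the container's WebSocket endpoints and
    # drain the stream until the (assumed) final message arrives.
    async with websockets.connect(f"{WS_BASE}{path}") as ws:
        await ws.send(json.dumps({"names": [name]}))
        async for raw in ws:
            if json.loads(raw).get("type") == "final":
                break

async def subscribe(channel: str):
    # Subscribe before triggering the operation so no update is missed.
    client = redis.Redis(host="redis", port=6379)  # assumed Redis service name
    pubsub = client.pubsub()
    await pubsub.subscribe(channel)
    return client, pubsub

async def wait_for_completion(pubsub, name: str) -> None:
    # Block until a completion message for this dataset appears on the channel.
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        payload = json.loads(message["data"])  # assumed message format
        if payload.get("name") == name and payload.get("status") == "done":
            return

async def refresh_dataset(name: str, is_new_enough: bool) -> None:
    if is_new_enough:
        client, pubsub = await subscribe("dataset_zip_updates")
        await call_endpoint("/dataset/unzip", name)
        await wait_for_completion(pubsub, name)
    else:
        client, pubsub = await subscribe("dataset_downloads")
        await call_endpoint("/dataset/download", name)
        await wait_for_completion(pubsub, name)
        await call_endpoint("/dataset/get", name)
        await call_endpoint("/dataset/zip", name)
        await call_endpoint("/dataset/delete_unzipped", name)
    await client.aclose()

asyncio.run(refresh_dataset("Customer Demographics", is_new_enough=False))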
Step 1: Create a .secrets directory and add a foundry_dev_tools.toml file with the following content. This file is passed to foundry-dev-tools as your configuration.
[credentials]
domain="palantir.company.com"
jwt="eyJhbGciOiJIUzI1NiIs..."
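If you want to verify the configuration from inside the container, a minimal check like the one below works, assuming the secret is mounted by Docker Compose at its default path /run/secrets/fdt_config (see Step 3); the function name and the error handling are illustrative only.

import tomllib  # Python 3.11+
from pathlib import Path

# Assumed mount point for the fdt_config secret declared in docker-compose.yml.
CONFIG_PATH = Path("/run/secrets/fdt_config")

def load_fdt_config(path: Path = CONFIG_PATH) -> dict:
    with path.open("rb") as fh:
        config = tomllib.load(fh)
    credentials = config.get("credentials", {})
    # Fail early if the keys shown in Step 1 are missing.
    missing = [key for key in ("domain", "jwt") if key not in credentials]
    if missing:
        raise ValueError(f"foundry_dev_tools.toml is missing: {missing}")
    return config

if __name__ == "__main__":
    print(load_fdt_config()["credentials"]["domain"])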
Step 2: Also add a foundry_datasets.toml file to the .secrets directory, which maps names to dataset RIDs. Throughout the container, datasets are always referred to by their names, so this mapping is essential. You only need to map the UUID part of the RID; the prefix ri.foundry.main.dataset. is automatically prepended to the UUID.
"prefix"="ri.foundry.main.dataset."
[datasets]
"Customer Demographics"="12a3b4c5-d6e7-8f90-1a2b-3c4d5e6f7g8h"
"Transaction History"="98f7e6d5-c4b3-2a10-9f8e-7d6c5b4a3210"
"Product Catalog"="a1b2c3d4-e5f6-7890-a1b2-c3d4e5f67890"
"Supplier Inventory"="z9y8x7w6-v5u4-3210-z9y8-x7w6v5u4t3s2"
Step 3: Add the following parts to your docker-compose.yml:
# Network to connect with other containers and the internet, if not exposing a port
networks:
  api--fdt-container_net:
    driver: bridge
# Secrets for the config and datasets
secrets:
  fdt_config:
    file: .secrets/foundry_dev_tools.toml
  fdt_datasets:
    file: .secrets/foundry_datasets.toml
# The Foundry DevTools Container
services:
  fdt-container:
    container_name: project-fdt-container
    build:
      context: ./foundry-dev-tools-container
      dockerfile: Dockerfile
    restart: always
    networks: # internal port 8000 - only expose a port if you want to access it outside of the Docker network
      - api--fdt-container_net
    volumes:
      - ./foundry-dev-tools-container/t3_code:/app/t3_code # allow for code adjustments
      - ./foundry-dev-tools-container/datasets:/app/datasets # persistent dataset storage
      - ./foundry-dev-tools-container/.vscode-server:/root/.vscode-server # faster access to the container
    environment:
      - PYTHONPATH=/app/t3_code
    secrets:
      - fdt_config
      - fdt_datasets
    stop_grace_period: 0s
The following script can be used to test the WebSocket API of the Foundry DevTools Container. If you have not exposed a port, make sure to run it from a container attached to a Docker network that can reach the project-fdt-container service.
import asyncio
import json

import websockets

DATASET_URL = "ws://project-fdt-container:8000/dataset/get"  # If you expose the port: ws://localhost:8000/dataset/get
DATASET_NAMES = ["Customer Demographics", "Transaction History"]

async def test_websocket():
    try:
        async with websockets.connect(DATASET_URL) as websocket:
            print("Connected to WebSocket")
            # Send initial request with DATASET_NAMES
            initial_request = {"names": DATASET_NAMES}
            await websocket.send(json.dumps(initial_request))
            print(f"Sent initial request: {initial_request}")
            # Listen for responses
            async for message in websocket:
                response = json.loads(message)
                print(f"Received: {response}")
                # A message with type "final" marks the end of the stream
                if response.get("type") == "final":
                    break
    except Exception as e:
        print(f"Error: {e}")

asyncio.run(test_websocket())
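Note that the script only needs the third-party websockets package (pip install websockets); asyncio and json ship with the Python standard library.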