This project is the core of a Science Data System (SDS).
Our goal is that users only need to modify config.json to define the data products stored in the SDS; everything else should be mission-agnostic.
Install the following before you begin:
- AWS CLI
- Node.js
- Docker
The code in this repository takes the form of an AWS CDK project. It provides the architecture for:
- An HTTPS API to upload files to an S3 bucket (in development)
- An S3 bucket to contain uploaded files
- An HTTPS API to query and download files from the S3 bucket (in development)
- A Lambda function that inserts file metadata into an OpenSearch instance
- A Cognito User Pool that keeps track of who can access the restricted APIs.
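As a rough illustration of the metadata step, the sketch below turns an S3 "ObjectCreated" event notification into documents that could then be indexed in OpenSearch. It is hypothetical: the function name and the document fields are assumptions, not this repository's schema, and the actual OpenSearch call is left out. It does rely on two real properties of S3 event notifications: the `eventName` prefix identifies the event type, and object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def build_metadata_documents(event: dict) -> list[dict]:
    """Turn an S3 event notification into documents to index.

    Hypothetical sketch: the document fields below are illustrative and are
    not the project's actual OpenSearch schema.
    """
    documents = []
    for record in event.get("Records", []):
        # Only index newly created objects; skip deletions and other events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3_info = record["s3"]
        # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
        key = unquote_plus(s3_info["object"]["key"])
        documents.append(
            {
                "bucket": s3_info["bucket"]["name"],
                "key": key,
                "filename": key.rsplit("/", 1)[-1],
                "size_bytes": s3_info["object"].get("size"),
                "event_time": record.get("eventTime"),
            }
        )
    return documents
```

In a Lambda handler, each returned document would then be written to the OpenSearch index with whichever client the project uses.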
The development environment uses GitHub Codespaces to ensure that we're all using the proper libraries as we develop and deploy.
Everyone gets 50 free hours of GitHub Codespaces time per month. Alternatively, your organization can pay for additional usage.
To start a new development environment, click the "Code" button in the upper right of the repository page, then select "Codespaces".
If you are running locally, you will need to install the AWS CDK and Poetry, then use Poetry to install the Python requirements.
To set up versioning:
poetry self add poetry-dynamic-versioning
To install without the extras:
poetry install
To install all extras:
poetry install --all-extras
Either install command uses the versions pinned in poetry.lock, ensuring that everyone works with consistent dependencies. Poetry also provides a virtual environment, which you will need to activate:
poetry shell
If you are running in Codespaces, this should already be done.
You may also need to set the CDK_DEFAULT_ACCOUNT environment variable.
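When you run `cdk synth` or `cdk deploy`, the CDK CLI usually fills in CDK_DEFAULT_ACCOUNT from your active AWS credentials, so setting it by hand matters mainly when that lookup does not happen. The helper below is hypothetical, assuming the CDK app reads the account from this variable; it fails early with a clear message instead of a confusing deployment error. It checks only that the value is a 12-digit AWS account ID.

```python
import os
import re
from typing import Mapping, Optional

# AWS account IDs are 12-digit strings.
_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def resolve_deploy_account(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the account ID from CDK_DEFAULT_ACCOUNT, failing loudly if unusable.

    Hypothetical helper: the project's CDK app may read this variable differently.
    """
    env = os.environ if env is None else env
    account = env.get("CDK_DEFAULT_ACCOUNT", "").strip()
    if not account:
        raise RuntimeError(
            "CDK_DEFAULT_ACCOUNT is not set; export it before running cdk, "
            "e.g. export CDK_DEFAULT_ACCOUNT=123456789012"
        )
    if not _ACCOUNT_ID.fullmatch(account):
        raise ValueError(f"CDK_DEFAULT_ACCOUNT={account!r} is not a 12-digit AWS account ID")
    return account
```

In a CDK app, the result could be passed as `account=` to `aws_cdk.Environment`.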
NOTE-- For new AWS users, make sure the AWS Cloud Development Kit (CDK) is installed:
nvm use <version>
npm install -g aws-cdk
NOTE-- If this is a brand-new AWS account (IMPORTANT: new account, not new user), then you'll need to bootstrap your account to allow CDK deployment with the command:
cdk bootstrap
If the 'cdk bootstrap' command fails, run it again with -v (verbose) for more information.
Codespaces also comes with a fully functional virtual desktop. To open it, click the "ports" tab and then "open in new browser". The default password is "vscode".
The "scripts" folder contains a Python script you can use to call the APIs. It is completely independent of the rest of the project, so you can pull this single file out and run it anywhere; it depends only on basic Python libraries.
Unfortunately, you currently need to hard-code the Lambda API URL and the Cognito App Client at the top of the file after every build. I hope to find a better way to automate this in the future.
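A minimal sketch of what the top of such a script might look like, assuming it uses only the standard library's urllib. `API_URL`, `COGNITO_CLIENT_ID`, and the `instrument` query parameter are placeholders invented for illustration, not the script's actual names:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: replace these with the values from each new build.
API_URL = "https://example.com"
COGNITO_CLIENT_ID = "replace-with-app-client-id"  # needed only for authenticated calls (not shown)


def build_query_request(params: dict, api_url: str = API_URL) -> Request:
    """Build (but do not send) a GET request to the public /query endpoint."""
    url = f"{api_url.rstrip('/')}/query"
    if params:
        # urlencode escapes values so they are safe to put in the query string.
        url += "?" + urlencode(params)
    return Request(url, method="GET")
```

Passing the result to `urllib.request.urlopen` would send the request; building it separately keeps the URL logic easy to check.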
Some tasks may need to be run by administrators or updated outside of the infrastructure as code. They are documented here to help future admins with the required steps.
There are public endpoints and private/team endpoints. The private/team endpoints allow IMAP team members to access data files that may not be publicly released, instrument job logs, etc.
The public URLs are all located at the root level, while the private URLs have an additional path prefix indicating a restricted endpoint. Examples:
- /query: URL to query for publicly released files.
- /authorized/query: URL to query for private/team files in addition to public files. Uses OAuth2-style authentication, which is primarily used by humans on the team websites.
- /api-key/query: URL to query for private/team files in addition to public files. Uses API keys for authentication, which is primarily used by automated scripts at team institutions for data access.
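Because the three variants differ only by a path prefix and a credential, a client can choose between them in one place. The sketch below takes the prefixes from the endpoint list; the header names (a bearer token in `Authorization` for the OAuth2-style route, `x-api-key` for API keys) and the `build_request` name are assumptions, since this README does not say which headers the authorizers expect:

```python
from typing import Optional
from urllib.request import Request

# Path prefixes from the endpoint list: public URLs live at the root level.
PREFIXES = {"public": "", "authorized": "/authorized", "api-key": "/api-key"}


def build_request(
    api_url: str, endpoint: str, auth: str = "public", credential: Optional[str] = None
) -> Request:
    """Build (but do not send) a GET request for the chosen access level."""
    if auth not in PREFIXES:
        raise ValueError(f"unknown auth mode {auth!r}; expected one of {sorted(PREFIXES)}")
    if auth != "public" and not credential:
        raise ValueError(f"auth mode {auth!r} requires a credential")

    url = f"{api_url.rstrip('/')}{PREFIXES[auth]}/{endpoint.lstrip('/')}"
    headers = {}
    if auth == "authorized":
        # Assumption: the OAuth2-style route accepts a Cognito token as a bearer token.
        headers["Authorization"] = f"Bearer {credential}"
    elif auth == "api-key":
        # Assumption: the API-key route reads the key from an "x-api-key" header.
        headers["x-api-key"] = credential
    return Request(url, headers=headers, method="GET")
```

Note that urllib normalizes header names with `str.capitalize`, so the key header is stored as `X-api-key`; HTTP header names are case-insensitive, so this does not change what the server receives.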
Keys are managed with a script located at sds_data_manager/lambda_code/authorization.
The script can add, remove, and list the current keys. To add a key, provide the name and e-mail address of the associated user or account; the script returns an API key that you can then give to the external user for access.
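The script's interface is not shown in this README, so the following is only a minimal in-memory sketch of the add/remove/list workflow, assuming keys are random tokens mapped to an owner's name and e-mail. The class and method names are hypothetical, and the real script presumably persists keys somewhere other than a Python dict:

```python
import secrets


class ApiKeyRegistry:
    """In-memory stand-in for the key store used by the management script.

    Hypothetical: the real script's storage backend and function names are
    not described in this README.
    """

    def __init__(self) -> None:
        # Maps each API key to the name and e-mail of the user or account it was issued to.
        self._keys: dict[str, dict[str, str]] = {}

    def add_key(self, name: str, email: str) -> str:
        """Issue a new random key for the given owner and return it."""
        key = secrets.token_urlsafe(32)  # cryptographically strong, URL-safe token
        self._keys[key] = {"name": name, "email": email}
        return key

    def remove_key(self, key: str) -> bool:
        """Revoke a key; return True if it existed."""
        return self._keys.pop(key, None) is not None

    def list_keys(self) -> list[dict[str, str]]:
        """List owners with truncated keys so full secrets are not printed."""
        return [{"key": key[:6] + "...", **owner} for key, owner in self._keys.items()]
```

For example, `ApiKeyRegistry().add_key("Jane Doe", "jane@example.edu")` returns the key to hand to the external user; `list_keys` deliberately shows only a prefix of each key.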