# SEL-735 Meter and SCADA Data Pipeline

This repository contains a set of Bash scripts designed to automate the retrieval and organization of event data from SEL-735 meters, the synchronization of SCADA data between directories, and the archival of data to a dedicated remote server.

## Pipeline Overview

Each of the following scripts is executed separately and has its own config file.

1. **`data_pipeline.sh`**

   Handles fetching and organizing raw event data from SEL-735 meters via FTP:
   - Connects to the meter
   - Downloads new event data
   - Organizes the directory structure and creates metadata
   - Adds checksums
   - Compresses raw data into `.zip`
   - Generates a `.message` file to be ingested by [data-streams-das-mqtt-pub](https://github.com/acep-uaf/data-streams-das-mqtt-pub)

1. **`sync-scada-data.sh`**

   Synchronizes SCADA data from a source directory to a destination directory (see the sketch after this list):
   - Supports syncing data over a configurable number of past months
   - **TODO**: Exclude the current day's data to avoid syncing partially written files.

1. **`archive_pipeline.sh`**

   Transfers downloaded and processed meter data to a dedicated server:
   - Uses `rsync` to transfer data to the remote server
   - Automatically triggers a cleanup script if enabled via config
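
As a rough illustration of the month-window behavior in `sync-scada-data.sh`, here is a minimal sketch. The `YYYY-MM` directory layout, variable names, and config values are assumptions for illustration, not the script's confirmed internals:

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- paths, layout, and values are hypothetical.
src="/data/scada"          # hypothetical source root
dest="/mnt/backup/scada"   # hypothetical destination root
months_back=3              # e.g. a value read from scada_config.yml

# Sync one YYYY-MM directory per month in the window (GNU date;
# end-of-month edge cases ignored for brevity).
for ((i = 0; i < months_back; i++)); do
  month=$(date -d "-${i} month" +%Y-%m)
  [ -d "$src/$month" ] && rsync -av "$src/$month/" "$dest/$month/"
done
```

A real implementation would also need to skip the current day's files, per the TODO above.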

## Installation

1. Ensure you have the following before running the pipeline:
   - Unix-like environment (Linux, macOS, or a Unix-like Windows terminal)
   - FTP credentials for the meter
   - Meter configuration
   - The following tools installed: `lftp`, `yq`, `zip`, `rsync`, `jq` (see the install sketch below)

1. Clone the repository:

   ```bash
   git clone git@github.com:acep-uaf/camio-meter-streams.git
   cd camio-meter-streams/cli_meter
   ```
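
If any of the required tools are missing, they can usually be installed from the distribution's package manager. A sketch for Debian/Ubuntu (package names assumed; availability varies by release):

```bash
# Debian/Ubuntu example; package names and availability are assumptions.
sudo apt-get update
sudo apt-get install -y lftp zip rsync jq
# Depending on the distro, yq may need to come from its upstream project
# (e.g. the Go implementation at https://github.com/mikefarah/yq) rather
# than apt; which yq implementation the scripts expect is not stated here.
```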

## Configuration

Each script uses its own YAML configuration file located in the `config/` directory.

1. **Navigate to the config directory and copy the example configuration files:**

   ```bash
   cd config
   cp config.yml.example config.yml
   cp archive_config.yml.example archive_config.yml
   cp scada_config.yml.example scada_config.yml
   ```

1. **Update each configuration file:**
   - `config.yml` — used by `data_pipeline.sh`
   - `archive_config.yml` — used by `archive_pipeline.sh`
   - `scada_config.yml` — used by `sync-scada-data.sh`

1. **Secure the configuration files** so that only the owner can read and write:

   ```bash
   chmod 600 config.yml archive_config.yml scada_config.yml
   ```
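
Since `yq` is a prerequisite, the scripts presumably read settings out of these YAML files at runtime. A minimal sketch of pulling values (key names are hypothetical, not the actual schema; syntax shown for the Go version of `yq`):

```bash
# Hypothetical keys -- check the .example files for the real schema.
ftp_host=$(yq '.ftp.host' config/config.yml)
retention=$(yq '.retention_days' config/archive_config.yml)
echo "Meter FTP host: $ftp_host (retention: $retention days)"
```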

## Usage

This pipeline can be used in two ways:
1. **Manually**, by executing the scripts directly from the command line
1. **Automatically**, by running the scripts as scheduled systemd services managed through Chef

### Automated Execution via systemd and Chef

In production environments, each pipeline script is run automatically using a dedicated `systemd` **service** and **timer** pair, configured through custom default attributes defined in the Chef cookbook.

Each configuration file has a corresponding Chef data bag that defines its values; all configuration data is centrally managed through Chef data bags and vaults. To make changes, update the appropriate Chef-managed data bags and cookbooks.

**Cookbooks**:
- [acep-camio-streams](https://github.com/acep-devops/acep-camio-streams/tree/main) - installs and configures the server
- [acep-devops-chef](https://github.com/acep-devops/acep-devops-chef/tree/main)
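
On a Chef-managed host, the resulting units can be inspected with standard `systemd` tooling. The unit names below are hypothetical; check the cookbook attributes for the real ones:

```bash
# Unit names are hypothetical examples.
systemctl list-timers                    # show all scheduled timers
systemctl status data_pipeline.timer     # one timer's schedule and state
journalctl -u data_pipeline.service -e   # recent logs for the service
```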
73 | 83 |
|
74 | | - After the `data_pipeline` script completes, execute the `archive_pipeline` script from the `cli_meter` directory. The script requires a configuration file specified via the `-c/--config` flag. |
| 84 | +### Manual Execution |
| 85 | +To run the data pipeline and then transfer data to the target server: |

1. **Data Pipeline (Event Data)**

   ```bash
   ./data_pipeline.sh -c config/config.yml
   ```

1. **Sync SCADA Data**

   ```bash
   ./sync-scada-data.sh -c config/scada_config.yml
   ```

1. **Archive Pipeline**

   ```bash
   ./archive_pipeline.sh -c config/archive_config.yml
   ```

   **Note:** `rsync` uses the `--exclude` flag to skip the `working/` directory so that only complete files are transferred (see the sketch after this list).

1. **Run the Cleanup Process (Conditional)**

   The cleanup script removes outdated event files based on the retention period specified in the configuration file.

   If `enable_cleanup` is set to `true` in `archive_config.yml`, `cleanup.sh` runs automatically after `archive_pipeline.sh`. Otherwise, you can run it manually:

   ```bash
   ./cleanup.sh -c config/archive_config.yml
   ```

   **Note:** Ensure `archive_config.yml` specifies retention periods for each directory.
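
To make the transfer and cleanup steps concrete, here is a minimal sketch of the kind of commands they run. The paths, variables, and exact flag set are assumptions, not the scripts' confirmed internals:

```bash
# Illustrative sketch only -- not the actual pipeline code.
SRC="/data/events/"                      # hypothetical local data root
DEST="user@archive-host:/data/events/"   # hypothetical rsync destination
RETENTION_DAYS=90                        # hypothetical retention period

# Archive step: transfer everything except the in-progress working/
# directory, so only complete files reach the server.
rsync -av --exclude 'working/' "$SRC" "$DEST"

# Cleanup step: delete local files older than the retention period.
find "$SRC" -type f -mtime +"$RETENTION_DAYS" -delete
```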

## How to Stop the Pipeline