Commit 074828b

Merge pull request #5 from KhaosResearch/v1.1
[On progress] Major changes for v1.1
2 parents: fa2d342 + 0a096dc

20 files changed: +366 −786 lines

.gitignore

Lines changed: 12 additions & 1 deletion
```diff
@@ -37,11 +37,22 @@ auxiliar_scripts/
 # Images
 *.png
 images/
+!static/*.png

 # Rasters
 *.tif
 *.jp2

 *.egg-info

-.python-version
+.python-version
+
+requirements.txt
+*.laz
+*.las
+*.xml
+
+1.1/
+1.2/
+
+*.json
```
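The interesting part of this `.gitignore` change is the negated pattern: `*.png` ignores every PNG, and the later `!static/*.png` re-includes the ones under `static/` (where the new README image lives). In gitignore matching, the last matching pattern wins. As a rough illustration of that precedence (a sketch only, not git's full matching algorithm — git also handles directory exclusions and anchoring differently), using `fnmatch`:

```python
# Sketch of .gitignore "last matching rule wins" precedence, NOT git's real
# matcher. Each rule is (pattern, ignored?); a negated rule maps to ignored=False.
from fnmatch import fnmatch

RULES = [("*.png", True), ("static/*.png", False)]  # i.e. "*.png" then "!static/*.png"

def is_ignored(path: str) -> bool:
    ignored = False
    for pattern, ignore in RULES:
        # Match against full path and bare filename, as gitignore patterns
        # without a slash apply at any level.
        if fnmatch(path, pattern) or fnmatch(path.rsplit("/", 1)[-1], pattern):
            ignored = ignore  # last matching rule wins
    return ignored

print(is_ignored("images/plot.png"))  # True  (caught by *.png)
print(is_ignored("static/logo.png"))  # False (re-included by !static/*.png)
```

Note that git only re-includes a file if its parent directory is not itself excluded, which is why `!static/*.png` works here while a pattern under the ignored `images/` directory would not.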

LICENSE

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2022 Khaos Research
+Copyright (c) 2024 Khaos Research

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
```

README.md

Lines changed: 24 additions & 6 deletions
```diff
@@ -1,4 +1,7 @@
-# LandCoverPy, a scalable land cover classification workflow
+# LandCoverPy, a scalable land cover/land use classification workflow
+
+![lebanon_second_level_classification](https://github.com/KhaosResearch/landcoverpy/blob/v1.1/static/lebanon_example.png)
+
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7462308.svg)](https://doi.org/10.5281/zenodo.7462308)

 A scalable land cover classification workflow aimed to be able to scale to cover the Mediterranean bassin.
@@ -12,15 +15,30 @@ A research article describing the methodology followed on this workflow can be f
 > Journal of Big Data 10, 91 (2023). doi: [10.1186/s40537-023-00770-z](https://doi.org/10.1186/s40537-023-00770-z)

 ## Installation
-From PyPi:
-`$ python -m pip install landcoverpy`

-From source:
-`$ make install`
+Currently, the package is not available on PyPI, so you need to install it from the source code. To do so, you can clone the repository and install it using pip:
+
+```bash
+git clone https://github.com/KhaosResearch/landcoverpy.git
+cd landcoverpy
+pip install .
+```
+
+For development purposes, you can install the package in editable mode:
+
+```bash
+pip install -e .
+```
+
+In the future, the package will be available on PyPI, so you will be able to install it using pip:
+
+```bash
+pip install landcoverpy
+```

 ## Usage

-Usage examples can be found at the [notebooks](notebooks) folder.
+An usage example can be found at the [main usage notebook](notebooks/main_usage.ipynb).

 ## License
 This project is licensed under the MIT license. See the [LICENSE](LICENSE) file for more info.
```

notebooks/explain_model.ipynb

Lines changed: 1 addition & 1 deletion
```diff
@@ -503,7 +503,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.10.12"
 },
 "vscode": {
 "interpreter": {
```
Lines changed: 28 additions & 17 deletions
```diff
@@ -24,7 +24,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Postprocess generated dataset, called `dataset.csv` (which has been generated in MinIO)"
+"### Train models using the generated dataset. \n",
+"\n",
+"The data will be automatically loaded from MinIO, which is also automatically saved in MinIO when using `workflow(execution_mode=ExecutionMode.TRAINING)`"
 ]
 },
 {
@@ -33,19 +35,15 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from landcoverpy.data_postprocessing import postprocess_dataset\n",
-"\n",
-"input_dataset = \"dataset.csv\"\n",
-"land_cover_dataset = \"dataset_postprocessed.csv\"\n",
-"\n",
-"postprocess_dataset(input_dataset, land_cover_dataset)"
+"from landcoverpy.model_training import train_model_land_cover\n",
+"train_model_land_cover(n_jobs = 1)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Train model using postprocessed dataset"
+"Now, use the generated model (also saved and loaded from MinIO) to predict a specific tile using the `ExecutionMode.LAND_COVER_PREDICTION` mode."
 ]
 },
 {
@@ -54,27 +52,40 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from landcoverpy.model_training import train_model_land_cover\n",
-"\n",
-"train_model_land_cover(land_cover_dataset, n_jobs = 1)"
+"workflow(execution_mode=ExecutionMode.LAND_COVER_PREDICTION, tiles_to_predict=[\"36SYC\"])"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Get the tiles in Spain and classify them using the trained model"
+"If you want to make second level predictions, you can train a new model that will learn from SL_PROPERTY labels from specified classes."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
-"from landcoverpy.utilities.aoi_tiles import get_list_of_tiles_in_iberian_peninsula\n",
-"\n",
-"workflow(execution_mode=ExecutionMode.LAND_COVER_PREDICTION, tiles_to_predict=get_list_of_tiles_in_iberian_peninsula())"
+"from landcoverpy.model_training import train_second_level_models\n",
+"train_second_level_models(\"dataset.csv\", [\"bareSoil\", \"closedForest\", \"herbaceousVegetation\", \"openForest\", \"shrubland\", \"water\", \"wetland\"], n_jobs=1)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Use the mode `ExecutionMode.SECOND_LEVEL_PREDICTION` to predict the second level labels of the tile using the previous land cover predictions. They will be reclassified using new models trained with the SL_PROPERTY labels in a hierarchical way."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"workflow(ExecutionMode.SECOND_LEVEL_PREDICTION,tiles_to_predict=[\"36SYC\"])"
 ]
 }
 ],
@@ -94,7 +105,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.6"
+"version": "3.8.15"
 },
 "orig_nbformat": 4,
 "vscode": {
```
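The notebook changes above introduce hierarchical, two-level classification: a first-level land cover model labels each pixel, and per-class second-level models (trained on the `SL_PROPERTY` labels) refine pixels of selected classes. The sketch below illustrates that control flow only; the function, model, and feature names are hypothetical stand-ins, and landcoverpy's real implementation lives in `landcoverpy.model_training` and `landcoverpy.workflow`.

```python
# Illustrative sketch of two-level (hierarchical) classification. All names
# here are hypothetical; this is NOT the landcoverpy API.

def predict_two_levels(pixels, first_level_model, second_level_models):
    """Assign a first-level class, then refine it with a per-class model
    when one exists for that class."""
    results = []
    for px in pixels:
        fl = first_level_model(px)              # first-level label, e.g. "closedForest"
        sl_model = second_level_models.get(fl)  # only selected classes are refined
        sl = sl_model(px) if sl_model else None
        results.append((fl, sl))
    return results

# Toy models keyed on made-up spectral features, just to exercise the flow.
first_level = lambda px: "closedForest" if px["ndvi"] > 0.6 else "bareSoil"
second_level = {
    "closedForest": lambda px: "evergreen" if px["b11"] < 0.2 else "deciduous",
}

print(predict_two_levels(
    [{"ndvi": 0.8, "b11": 0.1}, {"ndvi": 0.3, "b11": 0.5}],
    first_level, second_level))
# [('closedForest', 'evergreen'), ('bareSoil', None)]
```

This mirrors the two notebook steps: `ExecutionMode.LAND_COVER_PREDICTION` produces the first-level map, and `ExecutionMode.SECOND_LEVEL_PREDICTION` reclassifies only the pixels whose first-level class has a trained second-level model.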

scripts/cloud_coverage.py

Lines changed: 0 additions & 5 deletions
```diff
@@ -2,7 +2,6 @@
 import json
 from datetime import datetime
 from os.path import join
-from typing import List
 from pymongo.collection import Collection

 import pandas as pd
@@ -97,10 +96,6 @@ def _get_tiles_in_geojson(sentinel_api: SentinelAPI, geojson_path: str):
 def compute_cloud_coverage(geojson_path: str):
     """
     Computes the cloud coverage for a list of tiles included in a geojson.
-
-    Parameters:
-        countries (List[str]): List of countries included.
-
     """

     mongo_client = MongoConnection()
```

scripts/download_tiles.py

Lines changed: 0 additions & 36 deletions
This file was deleted.

scripts/execution_times.py

Lines changed: 11 additions & 14 deletions
```diff
@@ -14,7 +14,7 @@
 from landcoverpy.mongo import MongoConnection
 from landcoverpy.minio import MinioConnection
 from landcoverpy.utilities.aoi_tiles import get_list_of_tiles_in_iberian_peninsula
-from landcoverpy.utilities.geometries import _group_polygons_by_tile, _kmz_to_geojson
+from landcoverpy.utilities.geometries import _group_validated_data_points_by_tile, _kmz_to_geojson, _csv_to_geojson
 from landcoverpy.utilities.utils import get_products_by_tile_and_date, get_season_dict
 from landcoverpy.workflow import _process_tile

@@ -66,23 +66,20 @@ def time_training_dataset(client: Client = None):
     tiles = get_list_of_tiles_in_iberian_peninsula()
     tile = random.choice(tiles)

-    geojson_files = []
-    for data_class in glob(join(settings.DB_DIR, "*.kmz")):
-        if not Path.exists(Path(data_class.replace("kmz","geojson"))):
-            print(f"Parsing database to geojson: {data_class}")
-            _kmz_to_geojson(data_class)
+    data_file = settings.DB_FILE
+    if data_file.endswith(".kmz"):
+        data_file = _kmz_to_geojson(data_file)
+    if data_file.endswith(".csv"):
+        data_file = _csv_to_geojson(data_file, sep=',')

-    for data_class in glob(join(settings.DB_DIR, "*.geojson")):
-        print(f"Working with database {data_class}")
-        geojson_files.append(data_class)
-    polygons_per_tile = _group_polygons_by_tile(*geojson_files)
+    polygons_per_tile = _group_validated_data_points_by_tile(data_file)

     metadata_filename = "metadata.json"
-    metadata_filepath = join(settings.TMP_DIR, settings.LAND_COVER_MODEL_FOLDER, metadata_filename)
+    metadata_filepath = join(settings.TMP_DIR, "land-cover", metadata_filename)

     minio.fget_object(
         bucket_name=settings.MINIO_BUCKET_MODELS,
-        object_name=join(settings.MINIO_DATA_FOLDER_NAME, settings.LAND_COVER_MODEL_FOLDER, metadata_filename),
+        object_name=join(settings.MINIO_DATA_FOLDER_NAME, "land-cover", metadata_filename),
         file_path=metadata_filepath,
     )

@@ -129,11 +126,11 @@ def time_predicting_tile(client: Client = None):
     # For predictions, read the rasters used in "metadata.json".
     metadata_filename = "metadata.json"
-    metadata_filepath = join(settings.TMP_DIR, settings.LAND_COVER_MODEL_FOLDER, metadata_filename)
+    metadata_filepath = join(settings.TMP_DIR, "land-cover", metadata_filename)

     minio.fget_object(
         bucket_name=settings.MINIO_BUCKET_MODELS,
-        object_name=join(settings.MINIO_DATA_FOLDER_NAME, settings.LAND_COVER_MODEL_FOLDER, metadata_filename),
+        object_name=join(settings.MINIO_DATA_FOLDER_NAME, "land-cover", metadata_filename),
         file_path=metadata_filepath,
     )
```
scripts/jeffreys_matusita.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -108,7 +108,7 @@ def jeffreys_matusita_analysis(
     )

     if is_forest:
-        class_column = "forest_type"
+        class_column = settings.SL_PROPERTY
         df = df.drop("class",axis=1)
     else:
         class_column = "class"
```

setup.cfg

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[metadata]
22
name = landcoverpy
3-
version = 1.0.0
3+
version = 1.1
44
author = Antonio Manuel Burgueño Romero
55
author_email = ambrbr[at]uma.es
66
description = A scalable land cover classification workflow using Big Data techniques with Sentinel-2 data.
@@ -13,7 +13,7 @@ project_urls =
1313
Bug Tracker = https://github.com/KhaosResearch/landcoverpy/issues
1414
classifiers =
1515
Programming Language :: Python :: 3 :: Only
16-
Programming Language :: Python :: 3.10
16+
Programming Language :: Python :: 3.8
1717
Development Status:: 5 - Production/Stable
1818
License :: OSI Approved :: MIT License
1919
Topic :: Scientific/Engineering
@@ -42,7 +42,7 @@ install_requires =
4242
seaborn==0.12.0
4343
sentinelsat==1.1.1
4444
Shapely==1.8.4
45-
greensenti==0.5.1
45+
greensenti@git+https://github.com/KhaosResearch/greensenti.git@v0.5.1
4646
dask==2022.04.1
4747
distributed==2022.4.1
4848
blosc==1.10.2
