Improved metadata #240

Closed
wants to merge 40 commits into from
40 commits
21338d3
Template for Clay V1 model (#221)
srmsoumya Apr 18, 2024
74c65ad
Remove Pixelify from Decoder
Apr 19, 2024
71645e9
Merge branch 'main' into dev
Apr 19, 2024
d5ed36f
Merge branch 'dev' into multinode-training
Apr 19, 2024
b6f31ab
Modify lightning module for ClayMAE
Apr 19, 2024
e6a276e
Use lightning config.yaml
Apr 20, 2024
7b31a09
Modify dataset & datamodule to read metadata.yaml & npz files instead…
Apr 20, 2024
261cd14
Modify the model to handle EODataset
Apr 21, 2024
e0e31a6
Pass metadata path to both ClayDataModule & ClayMAEModule
Apr 21, 2024
3bcf7e8
Add model variants tiny, small, medium, large
Apr 21, 2024
801ddea
Lr 1e-3 to 1e-5
Apr 21, 2024
fa9edcf
Add teacher encoder
Apr 21, 2024
d0d9544
Pass band order as argument to get rgb for teacher model. Replace tra…
Apr 21, 2024
9587232
Add 0.75 weight to reconstruction loss
Apr 21, 2024
880cbf7
Pass rgb indices in the metadata.yaml file
Apr 22, 2024
afe1460
Don't add bias for decoder side of Dynamic Embedding
Apr 23, 2024
40af48a
Freeze the teacher model on start of every train epoch
Apr 23, 2024
b68c5f9
pre-commit fix lint errors
weiji14 Apr 23, 2024
a317860
:heavy_plus_sign: Add timm
weiji14 Apr 24, 2024
4aa9fc0
:mute: Silence set_float32_matmul_precision tip and a print statement
weiji14 Apr 24, 2024
1e4ea81
Modify config & datamodule to run on multi-gpu mode
Apr 24, 2024
80a0778
add rec_loss & rep_loss to the logger
Apr 25, 2024
c4c8e8d
Add a temporary env file for v1 runs
Apr 25, 2024
701f412
Fix lr to 1e-5
Apr 25, 2024
7f8e633
Add multinode sbatch script
Apr 25, 2024
6df5aca
Sampler to load data as batches of sensor (#233)
srmsoumya Apr 26, 2024
cc605c6
Add a sampler to load multi sensor data
Apr 26, 2024
11fe5a8
Write collate, Sampler returns one element, use BatchSampler
Apr 26, 2024
819c25e
Simplify names of landsat
Apr 26, 2024
757132f
Sampler returns batches of input, use_distributed_sampler: False in l…
Apr 27, 2024
9628f86
Add support to provide images of different size as input to the model
Apr 27, 2024
f0efcb4
Add jsonargparse[signatures]>=4.27.7 to env yaml, required by lightni…
Apr 30, 2024
8f69719
Improved metadata
yellowcap Apr 30, 2024
0b15e83
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 30, 2024
8f6386e
Update configs/metadata.yaml
yellowcap Apr 30, 2024
698b692
Update configs/metadata.yaml
yellowcap Apr 30, 2024
e04338b
Update configs/metadata.yaml
yellowcap Apr 30, 2024
4c0c846
Update statistics on each band for each platform
yellowcap Apr 30, 2024
6107496
Fixed metadata by excluding nodata
yellowcap Apr 30, 2024
5e9ccfe
Replace legacy `np.random.shuffle` call with `np.random.Generator`
weiji14 Apr 30, 2024
3,458 changes: 2,083 additions & 1,375 deletions conda-lock.yml

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions configs/config.yaml
@@ -0,0 +1,65 @@
# lightning.pytorch==2.1.2
seed_everything: 42
data:
  data_dir: data
  size: 224
  metadata_path: configs/metadata.yaml
  batch_size: 96
  num_workers: 192
model:
  model_size: base
  mask_ratio: 0.75
  norm_pix_loss: False
  patch_size: 16
  shuffle: True
  metadata_path: configs/metadata.yaml
  teacher: vit_base_patch16_224.dino
  lr: 1e-5
  wd: 0.05
  b1: 0.9
  b2: 0.95
  embeddings_level: mean
trainer:
  accelerator: auto
  strategy: ddp # ddp_find_unused_parameters_true
  devices: auto
  num_nodes: 1
  precision: bf16-mixed
  log_every_n_steps: 50
  max_epochs: 100
  accumulate_grad_batches: 2
  default_root_dir: s3://clay-model-ckpt/v0.3.4/
  val_check_interval: 0.5
  fast_dev_run: False
  num_sanity_val_steps: 0
  use_distributed_sampler: False
  # logger:
  #   - class_path: lightning.pytorch.loggers.CSVLogger
  #     init_args:
  #       save_dir: log_dir
  #       name: testv0.3.4
  logger:
    - class_path: lightning.pytorch.loggers.WandbLogger
      init_args:
        entity: developmentseed
        project: clay
        log_model: false
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        dirpath: s3://clay-model-ckpt/v0.3.4/
        auto_insert_metric_name: False
        filename: mae_v0.3.4_epoch-{epoch:02d}_val-loss-{val/loss:.4f}
        monitor: val/loss
        mode: min
        save_last: True
        save_top_k: 2
        save_weights_only: False
        verbose: True
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: step
    # - class_path: src.callbacks.LogIntermediatePredictions
  plugins:
    - class_path: lightning.pytorch.plugins.io.AsyncCheckpointIO
ckpt_path: null
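A few quantities implied by this config can be worked out by hand. The sketch below is an illustration only, not code from this PR, and it rests on stated assumptions: square chips split into non-overlapping patches, MAE-style masking that keeps `int(N * (1 - mask_ratio))` patches visible (a common convention, not confirmed for Clay), and `batch_size` being per process. Because `devices: auto` resolves at runtime, the 8 GPUs per node used below is purely hypothetical.

```python
# Back-of-the-envelope quantities implied by configs/config.yaml.
# Assumptions (not taken from the Clay code): square chips, non-overlapping
# ViT patches, MAE-style masking that keeps int(N * (1 - mask_ratio)) patches
# visible, and batch_size counted per process.


def patch_grid(size: int, patch_size: int) -> int:
    """Number of non-overlapping patches in a size x size chip."""
    if size % patch_size != 0:
        raise ValueError("size must be divisible by patch_size")
    return (size // patch_size) ** 2


def masked_counts(num_patches: int, mask_ratio: float) -> tuple[int, int]:
    """Return (visible, masked) patch counts for a given mask ratio."""
    visible = int(num_patches * (1 - mask_ratio))
    return visible, num_patches - visible


def effective_batch(batch_size: int, accumulate: int, devices: int, num_nodes: int) -> int:
    """Samples contributing to one optimizer step under DDP plus gradient accumulation."""
    return batch_size * accumulate * devices * num_nodes


n = patch_grid(224, 16)                    # size: 224, patch_size: 16
visible, masked = masked_counts(n, 0.75)   # mask_ratio: 0.75
print(n, visible, masked)                  # 196 49 147

# batch_size: 96, accumulate_grad_batches: 2, num_nodes: 1;
# devices=8 is a hypothetical GPU count, since the config uses "auto".
print(effective_batch(96, 2, devices=8, num_nodes=1))  # 1536
```

With `use_distributed_sampler: False` and a custom batch sampler, how `batch_size` maps onto each process depends on that sampler, so the effective batch figure should be checked against the datamodule rather than taken as given.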
186 changes: 186 additions & 0 deletions configs/metadata.yaml
@@ -0,0 +1,186 @@
sentinel-2-l2a:
  band_order:
    - blue
    - green
    - red
    - rededge1
    - rededge2
    - rededge3
    - nir
    - nir08
    - swir16
    - swir22
  rgb_indices:
    - 2
    - 1
    - 0
  gsd: 10
  bands:
    mean:
      blue: 1105.7905443429436
      green: 1355.677465018305
      red: 1552.3042139595739
      rededge1: 1887.99582727287
      rededge2: 2422.6797035254617
      rededge3: 2630.9284178917605
      nir: 2743.8657175279195
      nir08: 2785.1764946917183
      swir16: 2388.5889652492897
      swir22: 1835.1844161965867
    std:
      blue: 1809.331169111925
      green: 1757.0167343635235
      red: 1888.0897412466225
      rededge1: 1870.5861740445393
      rededge2: 1732.9353863509612
      rededge3: 1697.8050746180984
      nir: 1742.9970821305676
      nir08: 1648.9436499521603
      swir16: 1470.1173655417583
      swir22: 1379.8726451562047
    wavelength:
      blue: 0.493
      green: 0.56
      red: 0.665
      rededge1: 0.704
      rededge2: 0.74
      rededge3: 0.783
      nir: 0.842
      nir08: 0.865
      swir16: 1.61
      swir22: 2.19
landsat-c2l1:
  band_order:
    - red
    - green
    - blue
    - nir08
    - swir16
    - swir22
Comment on lines +57 to +59
Contributor

Why is the near-infrared band for Landsat called nir08? Band 8 on Landsat 8/9 is the panchromatic band. Is this to match with Sentinel-2? Maybe we could follow the HLS naming scheme at https://lpdaac.usgs.gov/data/get-started-data/collection-overview/missions/harmonized-landsat-sentinel-2-hls-overview/#hls-spectral-bands if we want to keep the same band name. I.e. use NIR Broad, NIR Narrow, SWIR 1, SWIR 2.

Member Author

The naming comes as-is from the underlying STAC catalog. We have this baked into the code in 2-3 places now. So let's not change it right now and make this more consistent in a future iteration.

Example:

https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l1/items/LC09_L1GT_173239_20240430_20240430_02_T2

Contributor

Ah, I see. Let's keep this as is for now, then.

  rgb_indices:
    - 2
    - 1
    - 0
  gsd: 30
  bands:
    mean:
      red: 10678.76484118414
      green: 10563.786161933183
      blue: 11083.12981654811
      nir08: 14792.687949021101
      swir16: 12276.474680884123
      swir22: 10114.366217187642
    std:
      red: 6025.007860775774
      green: 5411.132912550983
      blue: 5468.117975597713
      nir08: 6746.1849042811145
      swir16: 5897.5555548894135
      swir22: 4850.891246371683
    wavelength:
      blue: 0.48
      green: 0.56
      red: 0.65
      nir08: 0.86
      swir16: 1.6
      swir22: 2.2
landsat-c2l2-sr:
  band_order:
    - red
    - green
    - blue
    - nir08
    - swir16
    - swir22
  rgb_indices:
    - 2
    - 1
    - 0
  gsd: 30
  bands:
    mean:
      red: 13705.219514393912
      green: 13310.059206095915
      blue: 12474.550493458444
      nir08: 17801.61402457603
      swir16: 14615.177702523612
      swir22: 12701.944126287925
    std:
      red: 9578.492926737901
      green: 9408.33708239741
      blue: 10144.81158292442
      nir08: 8277.685363276674
      swir16: 5300.695343235831
      swir22: 4522.903361585379
    wavelength:
      blue: 0.48
      green: 0.56
      red: 0.65
      nir08: 0.86
      swir16: 1.6
      swir22: 2.2
naip:
  band_order:
    - red
    - green
    - blue
    - nir
  rgb_indices:
    - 0
    - 1
    - 2
  gsd: 1.0
  bands:
    mean:
      red: 110.1608291887418
      green: 115.41097838702503
      blue: 98.15365538896357
      nir: 139.0418563491658
    std:
      red: 47.2342043248675
      green: 39.82007274804654
      blue: 35.438855511259206
      nir: 49.86587804525244
    wavelength:
      blue: 0.48
      green: 0.56
      red: 0.65
      nir: 0.842
linz:
  band_order:
    - red
    - green
    - blue
  rgb_indices:
    - 0
    - 1
    - 2
  gsd: 0.5
  bands:
    mean:
      red: 89.96022450545784
      green: 99.46474319207113
      blue: 89.51493029863082
    std:
      red: 41.833805519370436
      green: 36.96485641824399
      blue: 31.456326328599612
    wavelength:
      blue: 0.465
      green: 0.555
      red: 0.635
sentinel-1-rtc:
  band_order:
    - vv
    - vh
  gsd: 10
Contributor

weiji14 Apr 30, 2024

Note that there is a subtle difference between Ground Sampling Distance (GSD) and pixel spacing (see e.g. https://mapscaping.com/understanding-ground-sampling-distance-gsd). The more technically correct term here would be pixel spacing, as Sentinel-1 images are delivered with 10m pixels, but the actual ground sampling distance is closer to 20m (xref https://sentinel.esa.int/web/sentinel/technical-guides/sentinel-1-sar/products-algorithms/level-1-algorithms/ground-range-detected/iw).

Member Author

Very good point. Maybe this could be renamed to pixel spacing. I think this is what the model needs to know most here: what size a single pixel represents. The GSD would be good additional information, but it would have to be specified per band for multiple systems. For instance, for Sentinel-2 we resample everything to a pixel spacing of 10m, although some bands have 20m GSD. So let's keep it as is for now and update the naming later?

Contributor

Sure, we can keep the naming for now (otherwise the model code needs to be updated too).

  bands:
    mean:
      vv: 0.12327302601895333
      vh: 0.02733734212053424
    std:
      vv: 1.4921548122554626
      vh: 0.1221826279224903
    wavelength:
      vv: 55465.76
      vh: 55465.76
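To show how a platform entry in `configs/metadata.yaml` might be consumed, here is a minimal sketch assuming two things the diff suggests but does not spell out: that `bands.mean` and `bands.std` are applied as a per-band standardization `(x - mean) / std`, and that `rgb_indices` are positions within `band_order` listed in red, green, blue order (as the Sentinel-2, NAIP and LINZ entries imply). The helper names `normalize_pixel` and `rgb_band_names` are hypothetical, and a real pipeline would vectorize this over whole arrays. Under that reading, the Landsat entries' `2, 1, 0` would select blue, green, red, so those indices may be worth double-checking.

```python
# Minimal sketch of consuming one platform entry from configs/metadata.yaml.
# The dict is a hand-copied excerpt (first three Sentinel-2 bands) so the
# example runs without PyYAML or python-box; it is not the full file.
SENTINEL2 = {
    "band_order": ["blue", "green", "red"],
    "rgb_indices": [2, 1, 0],
    "bands": {
        "mean": {"blue": 1105.7905443429436, "green": 1355.677465018305, "red": 1552.3042139595739},
        "std": {"blue": 1809.331169111925, "green": 1757.0167343635235, "red": 1888.0897412466225},
    },
}


def normalize_pixel(values, entry):
    """Hypothetical helper: standardize one pixel, (x - mean) / std, in band_order."""
    order = entry["band_order"]
    if len(values) != len(order):
        raise ValueError("expected one value per band in band_order")
    means, stds = entry["bands"]["mean"], entry["bands"]["std"]
    return [(x - means[b]) / stds[b] for x, b in zip(values, order)]


def rgb_band_names(entry):
    """Hypothetical helper: band names selected by rgb_indices, read as indices into band_order."""
    return [entry["band_order"][i] for i in entry["rgb_indices"]]


print(rgb_band_names(SENTINEL2))  # ['red', 'green', 'blue']

landsat = {"band_order": ["red", "green", "blue", "nir08", "swir16", "swir22"], "rgb_indices": [2, 1, 0]}
print(rgb_band_names(landsat))  # ['blue', 'green', 'red']

# A pixel sitting exactly at the per-band mean standardizes to zeros.
print(normalize_pixel([1105.7905443429436, 1355.677465018305, 1552.3042139595739], SENTINEL2))  # [0.0, 0.0, 0.0]
```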
2 changes: 2 additions & 0 deletions environment.yml
@@ -14,6 +14,7 @@ dependencies:
- lightning~=2.1.0
- matplotlib-base~=3.8.2
- planetary-computer~=1.0.0
- python-box~=7.1.0
- pytorch~=2.1.0 # [osx]
- pytorch~=2.1.0 *cuda12* # [linux]
- python~=3.11.0
@@ -22,6 +23,7 @@ dependencies:
- scikit-image~=0.22.0
- scikit-learn~=1.4.0
- stackstac~=0.5.0
- timm~=0.9.16
- torchdata~=0.7.1
- transformers~=4.35.2
- typeshed-client~=2.4.0
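The physical units in `configs/metadata.yaml` can be sanity-checked with a little arithmetic. This sketch assumes wavelengths are stored in micrometres (consistent with Sentinel-2 blue at 0.493) and uses the Sentinel-1 C-band centre frequency of 5.405 GHz, which reproduces the `55465.76` stored for `vv` and `vh`. Following the review discussion, it treats `gsd` as pixel spacing and multiplies it by `size: 224` from `configs/config.yaml` to get the ground width of one chip. The function names are illustrative only.

```python
# Sanity checks on the physical units in configs/metadata.yaml.
# Assumptions: wavelengths are in micrometres, Sentinel-1 operates at a
# 5.405 GHz centre frequency, and every chip is 224 pixels per side.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_um(frequency_ghz: float) -> float:
    """Convert a radar centre frequency in GHz to a wavelength in micrometres."""
    return SPEED_OF_LIGHT_M_PER_S / (frequency_ghz * 1e9) * 1e6


def chip_footprint_m(size_px: int, pixel_spacing_m: float) -> float:
    """Ground width of one chip side, treating the `gsd` value as pixel spacing."""
    return size_px * pixel_spacing_m


print(round(wavelength_um(5.405), 2))  # 55465.76

for platform, gsd in {"sentinel-2-l2a": 10, "landsat-c2l1": 30, "naip": 1.0, "linz": 0.5}.items():
    print(platform, chip_footprint_m(224, gsd))
```

The footprints range from 112 m per side for LINZ to 6720 m for Landsat, which illustrates why the model needs the pixel spacing of each input.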