This repository was archived by the owner on Jun 22, 2022. It is now read-only.

Commit 2486bc9

Author: Kamil A. Kaczmarek
Dev solution 5 (#99)
* added parametrized loss weights
* Update pipelines.py: random code was pasted in pipelines
* Update postprocessing.py: bug fix in crop/pad
* Update postprocessing.py
* two_unets
* Update neptune.yaml: dropped local data paths
* pull request fixes
* two specialist unets pipeline added
* Update neptune.yaml
* two unets pipeline added
* Update pipeline_config.py: added globals for specialists
* corrections in the neptune.yaml
* fixes for unet_specialists
* Improve scoring (#54)
  * propose a new (hopefully faster) method of computing the score
  * Update metrics.py
  * corrections
* Update callbacks.py: hot-fixed averager update bug
* Update utils.py: submission generation fix
* Bug fix in pipelines; assertion that checks outputs length added (#62)
  * assertion message corrected
  * corrected order of elements in the assertion
* weighted segmentation loss added (#60)
  * formatting
  * Update neptune.yaml
  * Update pipeline_config.py
  * Update validation.py
  * naming refactor
  * Update models.py
  * refactor
  * removed specialists, dropped contour-touching
* Dev patching (#61)
  * init
  * added new postprocessing
  * patching works
  * added test-time augmentation
  * cropping bugs fixed
  * fixed callbacks volatile error, updated config, dropped debug from main
  * dropped loader pickling
  * added pad-if-smaller
  * added more augmentation to the patching seq
  * added mosaic padding to loaders, updated augmentations
  * added dev mode, updated config, added specialists with patching
  * fixed mosaic loader bug
  * Update main.py: dropped debug saving
  * updated postprocessing, added fixes to patching
  * updated postprocessing, added dev mode, fixed loaders, changed mask preprocessing to get full masks and internal contours
  * fixed mosaic for larger patches, adjusted min blob size in postprocessing
  * pipelines with specialists and multi with patching are working, dropped 0-channel load from loaders, minor fixes in the loss definition
  * added small random crops/pads, fixed pipelines for no-patching mode, added simple validation mode
  * added artifact images to train
  * added global seeding
  * fixed checkerboard effect
  * added normalization
  * added blur to augmentations, added wireframe of the scaling pipeline, reverted to vanilla postprocessing
  * added trainable rescaling loop
  * fixed contour regeneration bug
  * refactored contour generation, upgraded contour generation in rescaling, cleaned pipelines
  * added dev and simple CV models, added caching to the inference pipeline
  * added stain deconvolution
  * fixed image loading for grey images
  * fixed normalization of patches
  * moved stand-alone notebooks to a dir, dropped specialists, refactored pipelines
  * fixed pipelines, updated configs
  * added Kaggle notebooks, small refactor in pipelines, preprocessing clean-up
  * Update augmentation.py
  * corrections in configs
  * imports optimized, removed plot_list function from utils.py
  * corrections
  * bug fix
  * added color_seq_RGB
  * Update neptune.yaml
* drop_big_artifacts (#67)
* Dev external data (#68)
  * added generation of metadata and corresponding target masks for external datasets, updated configs
  * updated augmentation
  * fixed train/valid split for the VGG clustering version
  * fixed train/valid split on clusters with external data
  * optimized imports, dropped plot_list() from utils.py
  * added color_seq_RGB
  * corrected best_configs
  * bug fix
* added dummy load save to base transformer and dropped redundant stuff… (#69)
  * added dummy load/save to the base transformer, dropped redundant stuff, added chunking
  * Update postprocessing.py
  * Update preprocessing.py
* Dev stage2 (#74)
  * added run end-to-end with configs, added competition_stage parameter
  * added postprocessing dev to the pipeline
  * Update neptune_rescaled_patched.yaml
  * Update neptune_size_estimator.yaml
  * Update run_end_to_end.sh
* final solution init commit (#98)
1 parent: 57c34be


60 files changed: +5849 additions, -4786 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -19,7 +19,6 @@ offline_job.log
 target/
 devbook.ipynb
 devbook_local.ipynb
-neptune_local.yaml
 
 # Distribution / packaging
 .Python

README.md

Lines changed: 6 additions & 82 deletions
@@ -1,88 +1,12 @@
 # Data Science Bowl 2018: open solution
 
-This is an open solution to the [Data Science Bowl 2018](https://www.kaggle.com/c/data-science-bowl-2018) based on the [winning solution](https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741) from topcoders.
+This is an open solution to the [Data Science Bowl 2018](https://www.kaggle.com/c/data-science-bowl-2018).
 
-## Goal
-Implement the winning solution described by topcoders and reproduce their results.
+## Goals
+1) Deliver an open, ready-to-use and extendable solution to this competition. This solution should - by itself - establish a solid benchmark, as well as provide a good base for your custom ideas and experiments.
+2) Encourage more Kagglers to start working on the Data Science Bowl, test their ideas and learn advanced data science.
 
-## Disclaimer
-In this open-source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script :wink:.
-
-## Results
-`0.577` **Local CV**
-
-`0.457` **Stage 1 LB**
-
-# Solution write-up
-## Preprocessing
-* An overlay of the binary masks is produced for each image
-* Borders are produced using dilated watershed lines
-* Normalization as on ImageNet
-
-Differences from the topcoders solution:
-* Border width doesn't depend on nuclei size
-
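To make the border step above concrete, here is a minimal sketch that derives a border channel from labeled instance masks by marking pixels covered by two or more dilated nuclei. This is an editor's illustration using scikit-image, not code from this repository; in particular, the border width here is fixed rather than scaled to nucleus size:

import numpy as np
from skimage.morphology import dilation, square

def touching_borders(labels, width=3):
    # labels: 2-D int array, 0 = background, 1..N = nucleus ids.
    # A pixel reached by the dilation of two or more nuclei lies on a
    # separation line between touching instances.
    coverage = np.zeros(labels.shape, dtype=np.int32)
    for lab in range(1, labels.max() + 1):
        coverage += dilation(labels == lab, square(width)).astype(np.int32)
    return coverage >= 2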
-## Augmentations
-* Flips u/d and l/r
-* Rotations with symmetric padding
-* Piecewise affine transformation
-* Perspective transform
-* Inverting colors
-* Contrast normalization
-* Elastic transformation
-* Adding a random value to pixels (elementwise and uniformly, in RGB and HSV)
-* Multiplying pixels by a random value (elementwise and uniformly, in RGB and HSV)
-* Channel shuffle
-* Gaussian, average and median blurring
-* Sharpen, emboss
-
-Differences from the topcoders solution:
-* No color-to-gray and gray-to-color
-* We don't know how often and how strong these augmentations were, or whether they were combined as OneOf or SomeOf, etc.
-
-## Network
-* U-Net with pretrained Resnet101 or Resnet152 encoders
-* First network with softmax activation and 3 channels: [background, masks - borders, borders], for predicting borders
-* Second network with sigmoid activation and 2 channels: [masks, borders], for predicting full masks
-
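To illustrate the two output heads described above (the model code itself is not part of this diff), here is a PyTorch sketch of the final layers; feat and the head names are assumptions, not identifiers from the repository:

import torch
import torch.nn as nn

feat = 64  # hypothetical number of decoder output channels

# 1st network: 3-way softmax over [background, masks - borders, borders].
border_head = nn.Conv2d(feat, 3, kernel_size=1)
# 2nd network: 2 independent sigmoid channels [masks, borders].
mask_head = nn.Conv2d(feat, 2, kernel_size=1)

x = torch.randn(1, feat, 256, 256)                   # decoder feature map
border_probs = torch.softmax(border_head(x), dim=1)  # sums to 1 per pixel
mask_probs = torch.sigmoid(mask_head(x))             # independent per channel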
-## Training
-* Adam optimizer
-* Initial lr 1e-4
-* Batch size of 36 (2 GPUs) or 72 (4 GPUs)
-* Training on random crops of size 256x256
-* Inference on full images padded to the minimal size that fits the network (i.e. dimensions must be divisible by 64)
-* TTA (flips, rotations)
-
-Differences from the topcoders solution:
-* No info about inference in the write-up; maybe it was done using a sliding window rather than on full images.
-* Larger batch size.
-
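The pad-to-a-multiple-of-64 rule above is easy to get wrong, so here is a minimal NumPy sketch (an editor's illustration, not repository code):

import numpy as np

def pad_to_multiple(image, multiple=64):
    # Pad H and W up to the next multiple of `multiple`, so the image
    # survives the network's repeated 2x down-sampling steps.
    h, w = image.shape[:2]
    pad_h = -(-h // multiple) * multiple - h  # ceil division
    pad_w = -(-w // multiple) * multiple - w
    pads = [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)]
    pads += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pads, mode='reflect')

After prediction, the padding is cropped away to recover the original image size.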
-## Loss function
-* 1st network: Cross Entropy with Dice (not on background)
-* 2nd network: BCE with Dice
-* Averaging the Dice loss over the number of classes didn't change the results
-
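As an illustration of the losses named above (not this repository's implementation), BCE with Dice can be sketched in PyTorch like this:

import torch
import torch.nn.functional as F

def soft_dice_loss(probs, targets, eps=1.0):
    # Soft Dice over all pixels; probs and targets are floats in [0, 1].
    num = 2.0 * (probs * targets).sum() + eps
    den = probs.sum() + targets.sum() + eps
    return 1.0 - num / den

def bce_dice_loss(logits, targets, bce_weight=1.0, dice_weight=1.0):
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    dice = soft_dice_loss(torch.sigmoid(logits), targets)
    return bce_weight * bce + dice_weight * dice

The 1st network's loss would swap the BCE term for multi-class cross entropy and compute Dice only on the non-background channels.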
-## Postprocessing
-* Different thresholds on the masks (2nd network) are used for retrieving seeds and final masks
-* Seeds for the watershed are calculated as masks (2nd network) minus borders (1st network)
-* Small mask instances and seeds are dropped
-* Watershed using the labeled seeds as markers and the masks (2nd network) as the mask
-
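A compact sketch of the seeded-watershed recipe above, using scipy and scikit-image; this is an editor's illustration, and the threshold values stand in for the "different thresholds" mentioned in the first bullet:

import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import remove_small_objects
from skimage.segmentation import watershed

def label_instances(mask_prob, border_prob,
                    seed_thr=0.6, mask_thr=0.5, min_size=20):
    # Seeds = confident mask pixels minus predicted borders.
    mask = remove_small_objects(mask_prob > mask_thr, min_size)
    seeds = remove_small_objects(
        (mask_prob > seed_thr) & ~(border_prob > 0.5), min_size)
    markers, _ = ndi.label(seeds)
    # Grow the labeled seeds back out to the full mask.
    return watershed(-mask_prob, markers, mask=mask)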
-## External data
-We included data from:
-* https://nucleisegmentationbenchmark.weebly.com/dataset.html
-* https://data.broadinstitute.org/bbbc/BBBC020/
-* https://zenodo.org/record/1175282#.W0So1RgwhhG
-* custom-made images without nuclei on them
-
-But, up to now, including external data has not improved our score.
-
-## Not implemented from the topcoders solution
-* 2nd-level model
-* model ensembling
-
-
-
-# Installation
+## Installation
 Check the [Installation page](https://github.com/neptune-ml/data-science-bowl-2018/wiki/Installation) on our Wiki for instructions.
 
 #### Fast track:
@@ -102,4 +26,4 @@ There are several ways to seek help:
 3. You can submit an [issue](https://github.com/neptune-ml/data-science-bowl-2018/issues) directly in this repo.
 
 ## Contributing
-Check [CONTRIBUTING](CONTRIBUTING.md) for more information.
+Check [CONTRIBUTING](CONTRIBUTING.md) for more information.

augmentation.py

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
import numpy as np
from imgaug import augmenters as iaa


# Geometric augmentations: one or two of flip/rotate/crop-pad,
# plus a piecewise-affine deformation.
affine_seq = iaa.Sequential([
    # General
    iaa.SomeOf((1, 2),
               [iaa.Fliplr(0.5),
                iaa.Flipud(0.5),
                iaa.Affine(rotate=(0, 360),
                           translate_percent=(-0.1, 0.1)),
                iaa.CropAndPad(percent=(-0.25, 0.25), pad_cval=0)
                ]),
    # Deformations
    iaa.PiecewiseAffine(scale=(0.00, 0.06))
], random_order=True)

# Intensity-only augmentations, safe for single-channel images.
color_seq = iaa.Sequential([
    iaa.Sometimes(0.5, iaa.OneOf([iaa.AverageBlur(k=((5, 11), (5, 11))),
                                  iaa.AdditiveGaussianNoise(scale=0.05 * 255, per_channel=0.5)
                                  ]))
], random_order=True)

# Color augmentations for RGB images: shift one or two of the H/S/V or
# R/G/B channels, then optionally blur or add noise.
color_seq_RGB = iaa.Sequential([
    iaa.SomeOf((1, 2),
               [iaa.Sequential([
                   iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                   iaa.WithChannels(0, iaa.Add((0, 100))),   # hue shift
                   iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.Sequential([
                    iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                    iaa.WithChannels(1, iaa.Add((0, 100))),  # saturation shift
                    iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.Sequential([
                    iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                    iaa.WithChannels(2, iaa.Add((0, 100))),  # value shift
                    iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.WithChannels(0, iaa.Add((0, 100))),      # red shift
                iaa.WithChannels(1, iaa.Add((0, 100))),      # green shift
                iaa.WithChannels(2, iaa.Add((0, 100)))]      # blue shift
               ),
    iaa.Sometimes(0.5, iaa.OneOf([iaa.AverageBlur(k=((5, 11), (5, 11))),
                                  iaa.AdditiveGaussianNoise(scale=0.05 * 255, per_channel=0.5)])
                  )
], random_order=True)


def patching_seq(crop_size):
    """Augmentations for training patches: rotate, crop to a fixed size,
    flip, then mild crop/pad and piecewise-affine deformation.
    Note: only the height of `crop_size` is used; square patches are assumed.
    """
    h, w = crop_size

    seq = iaa.Sequential([
        iaa.Affine(rotate=(0, 360)),
        CropFixed(px=h),
        iaa.Fliplr(0.5),
        iaa.Flipud(0.5),
        iaa.Sometimes(0.5, iaa.CropAndPad(percent=(-0.1, 0.1), pad_cval=0)),
        iaa.Sometimes(0.5, iaa.PiecewiseAffine(scale=(0.02, 0.06)))
    ], random_order=False)
    return seq


class CropFixed(iaa.Augmenter):
    """Crop (or zero-pad, if the image is smaller) to a fixed px-by-px size.

    Each dimension larger than `px` is cropped at a random offset; each
    dimension smaller than or equal to `px` is padded with zeros.
    """

    def __init__(self, px=None, name=None, deterministic=False, random_state=None):
        super(CropFixed, self).__init__(name=name, deterministic=deterministic, random_state=random_state)
        self.px = px

    def _augment_images(self, images, random_state, parents, hooks):
        result = []
        seeds = random_state.randint(0, 10 ** 6, (len(images),))
        for i, image in enumerate(images):
            seed = seeds[i]
            image_cr = self._random_crop_or_pad(seed, image)
            result.append(image_cr)
        return result

    def _augment_keypoints(self, keypoints_on_images, random_state, parents, hooks):
        # Keypoints are not used in this pipeline; pass them through unchanged.
        return keypoints_on_images

    def _random_crop_or_pad(self, seed, image):
        height, width = image.shape[:2]

        if height <= self.px and width > self.px:
            image_processed = self._random_crop(seed, image, crop_h=False, crop_w=True)
            image_processed = self._pad(image_processed)
        elif height > self.px and width <= self.px:
            image_processed = self._random_crop(seed, image, crop_h=True, crop_w=False)
            image_processed = self._pad(image_processed)
        elif height <= self.px and width <= self.px:
            image_processed = self._pad(image)
        else:
            image_processed = self._random_crop(seed, image, crop_h=True, crop_w=True)
        return image_processed

    def _random_crop(self, seed, image, crop_h=True, crop_w=True):
        height, width = image.shape[:2]

        # Offsets are seeded so that an image and its mask, augmented with
        # the same seed, receive the same crop window.
        if crop_h:
            np.random.seed(seed)
            crop_top = np.random.randint(height - self.px)
            crop_bottom = crop_top + self.px
        else:
            crop_top, crop_bottom = (0, height)

        if crop_w:
            np.random.seed(seed + 1)
            crop_left = np.random.randint(width - self.px)
            crop_right = crop_left + self.px
        else:
            crop_left, crop_right = (0, width)

        if len(image.shape) == 2:
            image_cropped = image[crop_top:crop_bottom, crop_left:crop_right]
        else:
            image_cropped = image[crop_top:crop_bottom, crop_left:crop_right, :]
        return image_cropped

    def _pad(self, image):
        # Zero-pad the bottom/right edges up to px in each dimension.
        if len(image.shape) == 2:
            height, width = image.shape
            image_padded = np.zeros((max(height, self.px), max(width, self.px))).astype(np.uint8)
            image_padded[:height, :width] = image
        else:
            height, width, channels = image.shape
            image_padded = np.zeros((max(height, self.px), max(width, self.px), channels)).astype(np.uint8)
            image_padded[:height, :width, :] = image
        return image_padded

    def get_parameters(self):
        return []
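A short usage sketch for the sequences above (an editor's illustration): to_deterministic() freezes the sampled parameters, so an image and its mask receive identical geometric transforms:

import numpy as np
from augmentation import affine_seq  # the module shown above

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (300, 400), dtype=np.uint8) * 255

seq_det = affine_seq.to_deterministic()  # freeze sampled parameters
image_aug = seq_det.augment_image(image)
mask_aug = seq_det.augment_image(mask)   # same flips/rotations as the image

For binary masks, deformations such as PiecewiseAffine interpolate pixel values, so a real pipeline would re-threshold the augmented mask.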
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
project-key: DSB

name: dsb_open_solution
tags: [solution_5]

metric:
  channel: 'Final Validation Score'
  goal: maximize

# Comment out if not in Cloud Environment
pip-requirements-file: requirements.txt

exclude:
  - .git
  - .idea
  - .ipynb_checkpoints
  - output
  - imgs
  - neptune.log
  - offline_job.log
  - notebooks

parameters:
  # Cloud Environment
  data_dir: /public/dsb_2018_data/
  meta_dir: /public/dsb_2018_data/
  external_data_dirs: /public/dsb_2018_data/external_data/
  masks_overlayed_dir: /public/dsb_2018_data/masks_overlayed/
  contours_overlayed_dir: /public/dsb_2018_data/contours_overlayed/
  centers_overlayed_dir: /public/dsb_2018_data/centers_overlayed/
  experiment_dir: /output/dsb/experiments/

  # Local Environment
  # data_dir: /path/to/data
  # meta_dir: /path/to/data
  # external_data_dirs: /path/to/external/data
  # masks_overlayed_dir: /path/to/masks_overlayed
  # contours_overlayed_dir: /path/to/contours_overlayed
  # centers_overlayed_dir: /path/to/centers_overlayed
  # experiment_dir: /path/to/work/dir

  # General parameters
  valid_category_ids: '[0, 1]'
  overwrite: 0
  num_workers: 4
  load_in_memory: 1
  pin_memory: 1
  use_patching: 1
  patching_stride: 256

  # Image parameters (size estimator)
  size_estimator__image_h: 512
  size_estimator__image_w: 512
  size_estimator__image_channels: 1

  # U-Net parameters (size estimator)
  size_estimator__nr_unet_outputs: 3
  size_estimator__n_filters: 16
  size_estimator__conv_kernel: 3
  size_estimator__pool_kernel: 3
  size_estimator__pool_stride: 2
  size_estimator__repeat_blocks: 4

  # U-Net loss weights (size estimator)
  size_estimator__mask: 0.75
  size_estimator__contour: 1.0
  size_estimator__center: 0.25
  size_estimator__bce_mask: 1.0
  size_estimator__dice_mask: 1.0
  size_estimator__bce_contour: 1.0
  size_estimator__dice_contour: 1.0
  size_estimator__bce_center: 1.0
  size_estimator__dice_center: 1.0

  # Image parameters (multi-output)
  image_h: 512
  image_w: 512
  image_channels: 1

  # U-Net parameters (multi-output)
  nr_unet_outputs: 3
  n_filters: 16
  conv_kernel: 3
  pool_kernel: 3
  pool_stride: 2
  repeat_blocks: 4

  # U-Net loss weights (multi-output)
  mask: 0.75
  contour: 1.0
  center: 0.25
  bce_mask: 1.0
  dice_mask: 1.0
  bce_contour: 1.0
  dice_contour: 1.0
  bce_center: 1.0
  dice_center: 1.0

  # Training schedule
  epochs_nr: 1000
  batch_size_train: 4
  batch_size_inference: 4
  lr: 0.0002
  momentum: 0.9
  gamma: 1.0
  patience: 50

  # Regularization
  use_batch_norm: 1
  l2_reg_conv: 0.00005
  l2_reg_dense: 0.0
  dropout_conv: 0.1
  dropout_dense: 0.0

  # Postprocessing
  threshold: 0.5
  min_nuclei_size: 20
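This is a neptune.ml experiment configuration; the parameters block is what the pipeline consumes. A minimal sketch of reading the loss weights with PyYAML (an editor's illustration; the file name neptune.yaml is an assumption, since the diff hides the file name):

import yaml

with open('neptune.yaml') as f:  # assumed file name
    config = yaml.safe_load(f)

params = config['parameters']
weights = {key: params[key] for key in
           ('mask', 'contour', 'center',
            'bce_mask', 'dice_mask', 'bce_contour',
            'dice_contour', 'bce_center', 'dice_center')}
print(weights['mask'])  # 0.75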
