This repository was archived by the owner on Jun 22, 2022. It is now read-only.

Commit 2486bc9

Author: Kamil A. Kaczmarek
Dev solution 5 (#99)
* added parametrized loss weights
* Update pipelines.py: random code was pasted in pipelines
* Update postprocessing.py: bug fix in crop/pad
* Update postprocessing.py
* two_unets
* Update neptune.yaml: dropped local data paths
* pull request fixes
* two specialist unets pipeline added
* Update neptune.yaml
* two unets pipeline added
* Update pipeline_config.py: added globals for specialists
* corrections in the neptune.yaml
* fixes for unet_specialists
* Improve scoring (#54)
  * propose a new (hopefully faster) method of computing the score
  * Update metrics.py
  * corrections
* Update callbacks.py: hot-fixed averager update bug
* Update utils.py: submission generation fix
* Bug fix in pipelines; assertion that checks outputs length added (#62)
  * assertion message corrected
  * corrected order of elements in the assertion
* weighted segmentation loss added (#60)
  * formatting
  * Update neptune.yaml
  * Update pipeline_config.py
  * Update validation.py
  * naming refactor
  * Update models.py
  * refactor
  * removed specialists, dropped contour-touching
* Dev patching (#61)
  * init
  * added new postprocessing
  * patching works
  * added test-time augmentation
  * cropping bugs fixed
  * fixed callbacks volatile error, updated config, dropped debug from main
  * dropped loader pickling
  * added pad-if-smaller
  * added more augmentation to the patching seq
  * added mosaic padding to loaders, updated augmentations
  * added dev mode, updated config, added specialists with patching
  * fixed mosaic loader bug
  * Update main.py: dropped debug saving
  * updated postprocessing, added fixes to patching
  * updated postprocessing, added dev mode, fixed loaders, changed mask preprocessing to get full masks and internal contours
  * fixed mosaic for larger patches, adjusted min blob size in postprocessing
  * pipelines with specialists and multi with patching are working, dropped 0-channel load from loaders, minor fixes in the loss definition
  * added small random crops/pads, fixed pipelines for no-patching mode, added simple validation mode
  * added artifact images to train
  * added global seeding
  * fixed checkerboard effect
  * added normalization
  * added blur to augmentations, added wireframe of the scaling pipeline, reverted to vanilla postprocessing
  * added trainable rescaling loop
  * fixed contour regeneration bug
  * refactored contour generation, upgraded contour generation in rescaling, cleaned pipelines
  * added dev and simple CV models, added caching to the inference pipeline
  * added stain deconvolution
  * fixed image loading for grey images
  * fixed normalization of patches
  * moved stand-alone notebooks to a dir, dropped specialists, refactored pipelines
  * fixed pipelines, updated configs
  * added Kaggle notebooks, small refactor in pipelines, preprocessing clean-up
  * Update augmentation.py
  * corrections in configs
  * imports optimized, removed plot_list function from utils.py
  * corrections
  * bug fix
  * added color_seq_RGB
  * Update neptune.yaml
* drop_big_artifacts (#67)
* Dev external data (#68)
  * added generation of metadata and corresponding target masks for external datasets, updated configs
  * updated augmentation
  * fixed train/valid split for the VGG clustering version
  * fixed train/valid split on clusters with external data
  * optimized imports, dropped plot_list() from utils.py
  * added color_seq_RGB
  * corrected best_configs
  * bug fix
* added dummy load save to base transformer and dropped redundant stuff… (#69)
  * added dummy load/save to the base transformer, dropped redundant stuff, added chunking
  * Update postprocessing.py
  * Update preprocessing.py
* Dev stage2 (#74)
  * added run end-to-end with configs, added competition_stage parameter
  * added postprocessing dev to the pipeline
  * Update neptune_rescaled_patched.yaml
  * Update neptune_size_estimator.yaml
  * Update run_end_to_end.sh
* final solution init commit (#98)
1 parent: 57c34be


60 files changed: +5849 additions, -4786 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -19,7 +19,6 @@ offline_job.log
 target/
 devbook.ipynb
 devbook_local.ipynb
-neptune_local.yaml
 
 # Distribution / packaging
 .Python

README.md

Lines changed: 6 additions & 82 deletions
@@ -1,88 +1,12 @@
 # Data Science Bowl 2018: open solution
 
-This is an open solution to the [Data Science Bowl 2018](https://www.kaggle.com/c/data-science-bowl-2018) based on the [winning solution](https://www.kaggle.com/c/data-science-bowl-2018/discussion/54741) from topcoders.
+This is an open solution to the [Data Science Bowl 2018](https://www.kaggle.com/c/data-science-bowl-2018).
 
-## Goal
-Implement the winning solution described by topcoders and reproduce their results.
+## Goals
+1) Deliver an open, ready-to-use and extendable solution to this competition. This solution should - by itself - establish a solid benchmark, as well as provide a good base for your custom ideas and experiments.
+2) Encourage more Kagglers to start working on the Data Science Bowl, test their ideas and learn advanced data science.
 
-## Disclaimer
-In this open-source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script :wink:.
-
-## Results
-`0.577` **Local CV**
-
-`0.457` **Stage 1 LB**
-
-# Solution write-up
-## Preprocessing
-* An overlay of the binary masks is produced for each image
-* Borders are produced using dilated watershed lines
-* Normalization as on ImageNet
-
-Differences from the topcoders solution:
-* Border width doesn't depend on nuclei size
-
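To make the border step above concrete, here is a minimal sketch that derives a border channel from labeled instance masks by marking pixels covered by two or more dilated nuclei. This is an editor's illustration using scikit-image, not code from this repository; in particular, the border width here is fixed rather than scaled to nucleus size:

import numpy as np
from skimage.morphology import dilation, square

def touching_borders(labels, width=3):
    # labels: 2-D int array, 0 = background, 1..N = nucleus ids.
    # A pixel reached by the dilation of two or more nuclei lies on a
    # separation line between touching instances.
    coverage = np.zeros(labels.shape, dtype=np.int32)
    for lab in range(1, labels.max() + 1):
        coverage += dilation(labels == lab, square(width)).astype(np.int32)
    return coverage >= 2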
-## Augmentations
-* Flips u/d and l/r
-* Rotations with symmetric padding
-* Piecewise affine transformation
-* Perspective transform
-* Inverting colors
-* Contrast normalization
-* Elastic transformation
-* Adding a random value to pixels (elementwise and uniformly, in RGB and HSV)
-* Multiplying pixels by a random value (elementwise and uniformly, in RGB and HSV)
-* Channel shuffle
-* Gaussian, average and median blurring
-* Sharpen, emboss
-
-Differences from the topcoders solution:
-* No color-to-gray and gray-to-color
-* We don't know how often and how strong these augmentations were, or whether they were combined as OneOf or SomeOf, etc.
-
-## Network
-* U-Net with pretrained Resnet101 or Resnet152 encoders
-* First network with softmax activation and 3 channels: [background, masks - borders, borders], for predicting borders
-* Second network with sigmoid activation and 2 channels: [masks, borders], for predicting full masks
-
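To illustrate the two output heads described above (the model code itself is not part of this diff), here is a PyTorch sketch of the final layers; feat and the head names are assumptions, not identifiers from the repository:

import torch
import torch.nn as nn

feat = 64  # hypothetical number of decoder output channels

# 1st network: 3-way softmax over [background, masks - borders, borders].
border_head = nn.Conv2d(feat, 3, kernel_size=1)
# 2nd network: 2 independent sigmoid channels [masks, borders].
mask_head = nn.Conv2d(feat, 2, kernel_size=1)

x = torch.randn(1, feat, 256, 256)                   # decoder feature map
border_probs = torch.softmax(border_head(x), dim=1)  # sums to 1 per pixel
mask_probs = torch.sigmoid(mask_head(x))             # independent per channel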
-## Training
-* Adam optimizer
-* Initial lr 1e-4
-* Batch size of 36 (2 GPUs) or 72 (4 GPUs)
-* Training on random crops of size 256x256
-* Inference on full images padded to the minimal size that fits the network (i.e. dimensions must be divisible by 64)
-* TTA (flips, rotations)
-
-Differences from the topcoders solution:
-* No info about inference in the write-up; maybe it was done using a sliding window rather than on full images.
-* Larger batch size.
-
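The pad-to-a-multiple-of-64 rule above is easy to get wrong, so here is a minimal NumPy sketch (an editor's illustration, not repository code):

import numpy as np

def pad_to_multiple(image, multiple=64):
    # Pad H and W up to the next multiple of `multiple`, so the image
    # survives the network's repeated 2x down-sampling steps.
    h, w = image.shape[:2]
    pad_h = -(-h // multiple) * multiple - h  # ceil division
    pad_w = -(-w // multiple) * multiple - w
    pads = [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)]
    pads += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pads, mode='reflect')

After prediction, the padding is cropped away to recover the original image size.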
-## Loss function
-* 1st network: Cross Entropy with Dice (not on background)
-* 2nd network: BCE with Dice
-* Averaging the Dice loss over the number of classes didn't change the results
-
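As an illustration of the losses named above (not this repository's implementation), BCE with Dice can be sketched in PyTorch like this:

import torch
import torch.nn.functional as F

def soft_dice_loss(probs, targets, eps=1.0):
    # Soft Dice over all pixels; probs and targets are floats in [0, 1].
    num = 2.0 * (probs * targets).sum() + eps
    den = probs.sum() + targets.sum() + eps
    return 1.0 - num / den

def bce_dice_loss(logits, targets, bce_weight=1.0, dice_weight=1.0):
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    dice = soft_dice_loss(torch.sigmoid(logits), targets)
    return bce_weight * bce + dice_weight * dice

The 1st network's loss would swap the BCE term for multi-class cross entropy and compute Dice only on the non-background channels.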
-## Postprocessing
-* Different thresholds on the masks (2nd network) are used for retrieving seeds and final masks
-* Seeds for the watershed are calculated as masks (2nd network) minus borders (1st network)
-* Small mask instances and seeds are dropped
-* Watershed using the labeled seeds as markers and the masks (2nd network) as the mask
-
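A compact sketch of the seeded-watershed recipe above, using scipy and scikit-image; this is an editor's illustration, and the threshold values stand in for the "different thresholds" mentioned in the first bullet:

import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import remove_small_objects
from skimage.segmentation import watershed

def label_instances(mask_prob, border_prob,
                    seed_thr=0.6, mask_thr=0.5, min_size=20):
    # Seeds = confident mask pixels minus predicted borders.
    mask = remove_small_objects(mask_prob > mask_thr, min_size)
    seeds = remove_small_objects(
        (mask_prob > seed_thr) & ~(border_prob > 0.5), min_size)
    markers, _ = ndi.label(seeds)
    # Grow the labeled seeds back out to the full mask.
    return watershed(-mask_prob, markers, mask=mask)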
-## External data
-We included data from:
-* https://nucleisegmentationbenchmark.weebly.com/dataset.html
-* https://data.broadinstitute.org/bbbc/BBBC020/
-* https://zenodo.org/record/1175282#.W0So1RgwhhG
-* custom-made images without nuclei on them
-
-But, up to now, including external data has not improved our score.
-
-## Not implemented from the topcoders solution
-* 2nd-level model
-* model ensembling
-
-
-
-# Installation
+## Installation
 Check the [Installation page](https://github.com/neptune-ml/data-science-bowl-2018/wiki/Installation) on our Wiki for instructions.
 
 #### Fast track:
@@ -102,4 +26,4 @@ There are several ways to seek help:
 3. You can submit an [issue](https://github.com/neptune-ml/data-science-bowl-2018/issues) directly in this repo.
 
 ## Contributing
-Check [CONTRIBUTING](CONTRIBUTING.md) for more information.
+Check [CONTRIBUTING](CONTRIBUTING.md) for more information.

augmentation.py

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
import numpy as np
from imgaug import augmenters as iaa


# Geometric augmentations: one or two of flip/rotate/crop-pad,
# plus a piecewise-affine deformation.
affine_seq = iaa.Sequential([
    # General
    iaa.SomeOf((1, 2),
               [iaa.Fliplr(0.5),
                iaa.Flipud(0.5),
                iaa.Affine(rotate=(0, 360),
                           translate_percent=(-0.1, 0.1)),
                iaa.CropAndPad(percent=(-0.25, 0.25), pad_cval=0)
                ]),
    # Deformations
    iaa.PiecewiseAffine(scale=(0.00, 0.06))
], random_order=True)

# Intensity-only augmentations, safe for single-channel images.
color_seq = iaa.Sequential([
    iaa.Sometimes(0.5, iaa.OneOf([iaa.AverageBlur(k=((5, 11), (5, 11))),
                                  iaa.AdditiveGaussianNoise(scale=0.05 * 255, per_channel=0.5)
                                  ]))
], random_order=True)

# Color augmentations for RGB images: shift one or two of the H/S/V or
# R/G/B channels, then optionally blur or add noise.
color_seq_RGB = iaa.Sequential([
    iaa.SomeOf((1, 2),
               [iaa.Sequential([
                   iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                   iaa.WithChannels(0, iaa.Add((0, 100))),   # hue shift
                   iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.Sequential([
                    iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                    iaa.WithChannels(1, iaa.Add((0, 100))),  # saturation shift
                    iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.Sequential([
                    iaa.ChangeColorspace(from_colorspace="RGB", to_colorspace="HSV"),
                    iaa.WithChannels(2, iaa.Add((0, 100))),  # value shift
                    iaa.ChangeColorspace(from_colorspace="HSV", to_colorspace="RGB")]),
                iaa.WithChannels(0, iaa.Add((0, 100))),      # red shift
                iaa.WithChannels(1, iaa.Add((0, 100))),      # green shift
                iaa.WithChannels(2, iaa.Add((0, 100)))]      # blue shift
               ),
    iaa.Sometimes(0.5, iaa.OneOf([iaa.AverageBlur(k=((5, 11), (5, 11))),
                                  iaa.AdditiveGaussianNoise(scale=0.05 * 255, per_channel=0.5)])
                  )
], random_order=True)


def patching_seq(crop_size):
    """Augmentations for training patches: rotate, crop to a fixed size,
    flip, then mild crop/pad and piecewise-affine deformation.
    Note: only the height of `crop_size` is used; square patches are assumed.
    """
    h, w = crop_size

    seq = iaa.Sequential([
        iaa.Affine(rotate=(0, 360)),
        CropFixed(px=h),
        iaa.Fliplr(0.5),
        iaa.Flipud(0.5),
        iaa.Sometimes(0.5, iaa.CropAndPad(percent=(-0.1, 0.1), pad_cval=0)),
        iaa.Sometimes(0.5, iaa.PiecewiseAffine(scale=(0.02, 0.06)))
    ], random_order=False)
    return seq


class CropFixed(iaa.Augmenter):
    """Crop (or zero-pad, if the image is smaller) to a fixed px-by-px size.

    Each dimension larger than `px` is cropped at a random offset; each
    dimension smaller than or equal to `px` is padded with zeros.
    """

    def __init__(self, px=None, name=None, deterministic=False, random_state=None):
        super(CropFixed, self).__init__(name=name, deterministic=deterministic, random_state=random_state)
        self.px = px

    def _augment_images(self, images, random_state, parents, hooks):
        result = []
        seeds = random_state.randint(0, 10 ** 6, (len(images),))
        for i, image in enumerate(images):
            seed = seeds[i]
            image_cr = self._random_crop_or_pad(seed, image)
            result.append(image_cr)
        return result

    def _augment_keypoints(self, keypoints_on_images, random_state, parents, hooks):
        # Keypoints are not used in this pipeline; pass them through unchanged.
        return keypoints_on_images

    def _random_crop_or_pad(self, seed, image):
        height, width = image.shape[:2]

        if height <= self.px and width > self.px:
            image_processed = self._random_crop(seed, image, crop_h=False, crop_w=True)
            image_processed = self._pad(image_processed)
        elif height > self.px and width <= self.px:
            image_processed = self._random_crop(seed, image, crop_h=True, crop_w=False)
            image_processed = self._pad(image_processed)
        elif height <= self.px and width <= self.px:
            image_processed = self._pad(image)
        else:
            image_processed = self._random_crop(seed, image, crop_h=True, crop_w=True)
        return image_processed

    def _random_crop(self, seed, image, crop_h=True, crop_w=True):
        height, width = image.shape[:2]

        # Offsets are seeded so that an image and its mask, augmented with
        # the same seed, receive the same crop window.
        if crop_h:
            np.random.seed(seed)
            crop_top = np.random.randint(height - self.px)
            crop_bottom = crop_top + self.px
        else:
            crop_top, crop_bottom = (0, height)

        if crop_w:
            np.random.seed(seed + 1)
            crop_left = np.random.randint(width - self.px)
            crop_right = crop_left + self.px
        else:
            crop_left, crop_right = (0, width)

        if len(image.shape) == 2:
            image_cropped = image[crop_top:crop_bottom, crop_left:crop_right]
        else:
            image_cropped = image[crop_top:crop_bottom, crop_left:crop_right, :]
        return image_cropped

    def _pad(self, image):
        # Zero-pad the bottom/right edges up to px in each dimension.
        if len(image.shape) == 2:
            height, width = image.shape
            image_padded = np.zeros((max(height, self.px), max(width, self.px))).astype(np.uint8)
            image_padded[:height, :width] = image
        else:
            height, width, channels = image.shape
            image_padded = np.zeros((max(height, self.px), max(width, self.px), channels)).astype(np.uint8)
            image_padded[:height, :width, :] = image
        return image_padded

    def get_parameters(self):
        return []
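A short usage sketch for the sequences above (an editor's illustration): to_deterministic() freezes the sampled parameters, so an image and its mask receive identical geometric transforms:

import numpy as np
from augmentation import affine_seq  # the module shown above

image = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (300, 400), dtype=np.uint8) * 255

seq_det = affine_seq.to_deterministic()  # freeze sampled parameters
image_aug = seq_det.augment_image(image)
mask_aug = seq_det.augment_image(mask)   # same flips/rotations as the image

For binary masks, deformations such as PiecewiseAffine interpolate pixel values, so a real pipeline would re-threshold the augmented mask.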
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
project-key: DSB

name: dsb_open_solution
tags: [solution_5]

metric:
  channel: 'Final Validation Score'
  goal: maximize

# Comment out if not in Cloud Environment
pip-requirements-file: requirements.txt

exclude:
  - .git
  - .idea
  - .ipynb_checkpoints
  - output
  - imgs
  - neptune.log
  - offline_job.log
  - notebooks

parameters:
  # Cloud Environment
  data_dir: /public/dsb_2018_data/
  meta_dir: /public/dsb_2018_data/
  external_data_dirs: /public/dsb_2018_data/external_data/
  masks_overlayed_dir: /public/dsb_2018_data/masks_overlayed/
  contours_overlayed_dir: /public/dsb_2018_data/contours_overlayed/
  centers_overlayed_dir: /public/dsb_2018_data/centers_overlayed/
  experiment_dir: /output/dsb/experiments/

  # Local Environment
  # data_dir: /path/to/data
  # meta_dir: /path/to/data
  # external_data_dirs: /path/to/external/data
  # masks_overlayed_dir: /path/to/masks_overlayed
  # contours_overlayed_dir: /path/to/contours_overlayed
  # centers_overlayed_dir: /path/to/centers_overlayed
  # experiment_dir: /path/to/work/dir

  # General parameters
  valid_category_ids: '[0, 1]'
  overwrite: 0
  num_workers: 4
  load_in_memory: 1
  pin_memory: 1
  use_patching: 1
  patching_stride: 256

  # Image parameters (size estimator)
  size_estimator__image_h: 512
  size_estimator__image_w: 512
  size_estimator__image_channels: 1

  # U-Net parameters (size estimator)
  size_estimator__nr_unet_outputs: 3
  size_estimator__n_filters: 16
  size_estimator__conv_kernel: 3
  size_estimator__pool_kernel: 3
  size_estimator__pool_stride: 2
  size_estimator__repeat_blocks: 4

  # U-Net loss weights (size estimator)
  size_estimator__mask: 0.75
  size_estimator__contour: 1.0
  size_estimator__center: 0.25
  size_estimator__bce_mask: 1.0
  size_estimator__dice_mask: 1.0
  size_estimator__bce_contour: 1.0
  size_estimator__dice_contour: 1.0
  size_estimator__bce_center: 1.0
  size_estimator__dice_center: 1.0

  # Image parameters (multi-output)
  image_h: 512
  image_w: 512
  image_channels: 1

  # U-Net parameters (multi-output)
  nr_unet_outputs: 3
  n_filters: 16
  conv_kernel: 3
  pool_kernel: 3
  pool_stride: 2
  repeat_blocks: 4

  # U-Net loss weights (multi-output)
  mask: 0.75
  contour: 1.0
  center: 0.25
  bce_mask: 1.0
  dice_mask: 1.0
  bce_contour: 1.0
  dice_contour: 1.0
  bce_center: 1.0
  dice_center: 1.0

  # Training schedule
  epochs_nr: 1000
  batch_size_train: 4
  batch_size_inference: 4
  lr: 0.0002
  momentum: 0.9
  gamma: 1.0
  patience: 50

  # Regularization
  use_batch_norm: 1
  l2_reg_conv: 0.00005
  l2_reg_dense: 0.0
  dropout_conv: 0.1
  dropout_dense: 0.0

  # Postprocessing
  threshold: 0.5
  min_nuclei_size: 20
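This is a neptune.ml experiment configuration; the parameters block is what the pipeline consumes. A minimal sketch of reading the loss weights with PyYAML (an editor's illustration; the file name neptune.yaml is an assumption, since the diff hides the file name):

import yaml

with open('neptune.yaml') as f:  # assumed file name
    config = yaml.safe_load(f)

params = config['parameters']
weights = {key: params[key] for key in
           ('mask', 'contour', 'center',
            'bce_mask', 'dice_mask', 'bce_contour',
            'dice_contour', 'bce_center', 'dice_center')}
print(weights['mask'])  # 0.75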
