Image segmentation

In this example, we employ a segmentation pipeline that involves the following steps. First, an archive of images is downloaded from Zenodo. Then, a single file is extracted from the archive; it is a fluorescence microscopy image of cell nuclei. Next, the image is segmented using Otsu thresholding. Finally, the segmentation result is written to a file.

Afterwards, we will modify one of the pipeline stages so that Li thresholding is employed instead of Otsu thresholding. We demonstrate how repype automatically recognizes this change and re-runs only the relevant parts of the pipeline.

Task specifications

First, we create a file examples/segmentation/task.yml that specifies our segmentation task:

[3]:

%cat examples/segmentation/task.yml

runnable: true

pipeline:
  - tests.test_repype.Download
  - tests.test_repype.Unzip
  - tests.test_repype.Segmentation
  - tests.test_repype.Output

config:
  download:
    url: https://zenodo.org/record/3362976/files/B2.zip

scopes:
  segmentation: 'seg/%s.png'

input_ids:
  - B2--W00026--P00001--Z00000--T00000--dapi.tif

The specification above consists of multiple sections:

The runnable property is set to true, which marks the task as one to be run.
The pipeline is specified by listing the stages it consists of. The order in which the stages are listed does not matter; it is determined automatically.
The config section is optional and consists of one or more subsections, one for each stage that is configured. The names of the subsections correspond to the names of the stage classes, converted to lowercase with words separated by dashes (e.g., Download becomes download). Here, the url of the tests.test_repype.Download stage is configured.
The scopes section defines how file paths are resolved. In this example, a scope segmentation is defined. Within that scope, paths are resolved to seg/%s.png (relative to the task directory) by substituting %s with a given input identifier. This is the path of the file to which the segmentation result is written.
The input_ids section defines the identifiers of the input data of the task. The pipeline is run once, independently, for each input identifier. In this example, these are the names of the files to be extracted from the archive downloaded from Zenodo.
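Two of the mechanisms described in this list can be illustrated in plain Python: deriving the stage order from what each stage consumes and produces, and resolving a scope pattern for one input identifier. This is a hypothetical sketch, not repype's implementation. Only the declaration of the segmentation stage (inputs ['image'], outputs ['segmentation']) appears in this example; the input and output names assumed for the other stages are made up for illustration, and resolving relative to the task directory is an assumption based on the file path used later on.

```python
import os
from graphlib import TopologicalSorter

# Hypothetical stage declarations, deliberately listed in the "wrong" order.
# Only the segmentation entry mirrors this example; the other names are assumed.
STAGES = {
    'output': {'inputs': ['segmentation'], 'outputs': []},
    'segmentation': {'inputs': ['image'], 'outputs': ['segmentation']},
    'unzip': {'inputs': ['archive'], 'outputs': ['image']},
    'download': {'inputs': [], 'outputs': ['archive']},
}

def order_stages(stages):
    """Order stages so that each one runs after the stages producing its inputs."""
    # Map every produced datum to the stage that produces it
    producers = {out: name for name, s in stages.items() for out in s['outputs']}
    # Each stage depends on the producers of its inputs
    graph = {name: {producers[i] for i in s['inputs']} for name, s in stages.items()}
    return list(TopologicalSorter(graph).static_order())

def resolve_scope(task_dir, pattern, input_id):
    """Resolve a scope pattern such as 'seg/%s.png' for one input identifier."""
    return os.path.join(task_dir, pattern % input_id)

print(order_stages(STAGES))
print(resolve_scope('examples/segmentation', 'seg/%s.png',
                    'B2--W00026--P00001--Z00000--T00000--dapi.tif'))
```

Because the stages form a chain here, the topological order is unique: download, unzip, segmentation, output.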

We then create a second task examples/segmentation/sigma=2/task.yml:

[4]:

%cat examples/segmentation/sigma=2/task.yml

config:
  segmentation:
    sigma: 2

Since the directory of the previously defined task examples/segmentation/task.yml is a parent directory of the second task examples/segmentation/sigma=2/task.yml, the latter is a variant (also called a sub-task) of the first task. This means that the second task inherits the specification of its parent, with one adaptation: the hyperparameter sigma of the segmentation stage is set to 2.
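The inheritance of the config section can be pictured as overlaying the variant's config onto its parent's. The following is a minimal sketch assuming a recursive dictionary merge; merge_config is a hypothetical helper, and repype's actual inheritance rules (for instance for the pipeline or scopes sections) may differ.

```python
import copy

def merge_config(parent, child):
    """Recursively overlay a variant's config onto its parent's config."""
    merged = copy.deepcopy(parent)  # leave the parent config untouched
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides define this subsection: merge it key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # New key, or a plain value: the variant wins
            merged[key] = copy.deepcopy(value)
    return merged

parent = {'download': {'url': 'https://zenodo.org/record/3362976/files/B2.zip'}}
child = {'segmentation': {'sigma': 2}}
print(merge_config(parent, child))
```

Under this assumption, the variant ends up with the parent's download url plus its own segmentation sigma.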

We can now verify our task configurations as follows:

[5]:

!command python -m repype examples/segmentation


2 task(s) selected for running
DRY RUN: use "--run" to run the tasks instead

Selected tasks:
- /tmp/tmpwyh2_tz7/examples/segmentation (incomplete)
- /tmp/tmpwyh2_tz7/examples/segmentation/sigma=2 (incomplete)

The argument examples/segmentation is the path of the root task directory. The output above shows that two tasks were found and loaded, and that neither of them has been completed yet. Since --run was not specified, nothing is executed (dry run).

Batch processing

[6]:

!command python -m repype examples/segmentation --run


2 task(s) selected for running

  (1/2) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation
  Starting from scratch

    (1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif

  Results have been stored ✅

  (2/2) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation/sigma=2
  Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)

    (1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif

  Results have been stored ✅

When the command-line interface of repype is run as above, progress is reported on standard output. Both tasks have been processed. Notably, the pipeline was executed from the start for the first task, whereas for the second task the results of the first task were re-used and execution was skipped up to the segmentation stage. This makes sense: the second task only changes a segmentation parameter, so re-running the download and unzip stages would have been redundant.
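How repype decides where to pick up is not shown in this example. One way to reason about it is sketched below: fingerprint each stage's configuration and re-run from the first stage whose fingerprint differs, since every later stage depends on that stage's output. This is a hypothetical illustration; fingerprint and first_stage_to_rerun are made-up helpers, and a real implementation would also have to account for changes to the stage code itself, not only to its configuration.

```python
import hashlib
import json

def fingerprint(stage_id, config):
    """Hash a stage identifier with its (JSON-serializable) config for compact storage."""
    payload = json.dumps({'stage': stage_id, 'config': config}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def first_stage_to_rerun(stages, previous_config, new_config):
    """Return the first stage whose fingerprint changed, or None if nothing changed.

    Once a stage changes, all later stages must re-run as well, because they
    depend on its output; hence only the first change matters.
    """
    for stage_id in stages:
        old = fingerprint(stage_id, previous_config.get(stage_id, {}))
        new = fingerprint(stage_id, new_config.get(stage_id, {}))
        if old != new:
            return stage_id
    return None

stages = ['download', 'unzip', 'segmentation', 'output']
parent_cfg = {'download': {'url': 'https://zenodo.org/record/3362976/files/B2.zip'}}
variant_cfg = dict(parent_cfg, segmentation={'sigma': 2})

print(first_stage_to_rerun(stages, parent_cfg, variant_cfg))   # segmentation
print(first_stage_to_rerun(stages, variant_cfg, variant_cfg))  # None
```

The second call corresponds to running repype again with nothing changed: there is no stage to re-run.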

Running repype again now selects no tasks, because there is nothing left to do:

[7]:

!command python -m repype examples/segmentation


0 task(s) selected for running
DRY RUN: use "--run" to run the tasks instead

We can inspect the segmentation result, which has been written to a file:

[8]:

from IPython.display import Image, display

display(Image(filename = 'examples/segmentation/seg/B2--W00026--P00001--Z00000--T00000--dapi.tif.png'))

[Image: segmentation result for B2--W00026--P00001--Z00000--T00000--dapi.tif (Otsu thresholding)]

In addition, the run times of the individual stages of the pipeline can be inspected:

[9]:

import repype.benchmark
repype.benchmark.Benchmark('examples/segmentation/times.csv').df

[9]:

              B2--W00026--P00001--Z00000--T00000--dapi.tif
download      2.486398
unzip         0.089115
segmentation  0.029838
output        0.019188

The values are reported in seconds.
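To see which stage dominates the run time, the values from the table above can be expressed as shares of the total. This sketch copies the numbers into a plain dictionary instead of reading times.csv, so it runs without repype.

```python
# Run times in seconds, copied from the table above (single input file)
times = {
    'download': 2.486398,
    'unzip': 0.089115,
    'segmentation': 0.029838,
    'output': 0.019188,
}

total = sum(times.values())
for stage, seconds in times.items():
    # Stage name, absolute time, and share of the total run time
    print(f'{stage:<13} {seconds:8.3f} s  {seconds / total:6.1%}')
print(f'{"total":<13} {total:8.3f} s')
```

With these numbers, the download stage accounts for about 95% of the roughly 2.6 s total, which is why re-using its result for the sigma=2 variant pays off. With the DataFrame returned by repype.benchmark.Benchmark, the same shares could presumably be obtained as df / df.sum().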

Modification of the pipeline

We now create a modified implementation of the segmentation stage that uses Li thresholding instead:

[10]:

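import repype.stage

class Segmentation(repype.stage.Stage):

    inputs = ['image']
    outputs = ['segmentation']

    def process(self, image, pipeline, config, status = None):
        import skimage.filters
        import skimage.util

        # Smooth the image (sigma defaults to 1 unless configured otherwise)
        image = skimage.filters.gaussian(image, sigma = config.get('sigma', 1.))

        # Li thresholding replaces the Otsu thresholding of the original stage
        threshold = skimage.filters.threshold_li(image)

        # Convert the binary mask to an 8-bit image (values 0 and 255)
        return dict(
            segmentation = skimage.util.img_as_ubyte(image > threshold)
        )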

Then, we replace the original segmentation stage of the pipeline with our modified implementation.

Info:

In real-world use, we would modify the pipeline by altering the examples/segmentation/task.yml file or one of the stage implementations directly. For demonstration purposes, however, it is more convenient to leave the files unchanged and make the change programmatically.

To do this, we override the create_pipeline method of the Task class:

[11]:

import repype.task

class Task(repype.task.Task):

    def create_pipeline(self, *args, **kwargs):
        pipeline = super().create_pipeline(*args, **kwargs)
        pipeline.stages[pipeline.find('segmentation')] = Segmentation()
        return pipeline

We are now ready to re-run the pipeline with the modified segmentation stage, which our Task subclass injects. This time, we also restrict the computation to the task examples/segmentation, i.e., we skip its variant:

[12]:

import repype.cli

main = repype.cli.main('examples/segmentation', run = True, task_cls = Task, tasks=['examples/segmentation'])
await main();


1 task(s) selected for running

  (1/1) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation
  Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)

    (1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif

  Results have been stored ✅

As expected, the pipeline was re-run only from the segmentation stage onward; the results of the download and unzip stages were re-used.

Finally, we verify that a different segmentation result has been obtained now:

[13]:

display(Image(filename = 'examples/segmentation/seg/B2--W00026--P00001--Z00000--T00000--dapi.tif.png'))

[Image: segmentation result for B2--W00026--P00001--Z00000--T00000--dapi.tif (Li thresholding)]

Running tasks programmatically

For more fine-grained control, we can first load the tasks before running them:

[14]:

import repype.batch

batch = repype.batch.Batch(task_cls = Task)
batch.load('examples/segmentation')

print('Loaded tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.contexts))

print('Pending tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.pending))

Loaded tasks:
- examples/segmentation
- examples/segmentation/sigma=2
Pending tasks:
- examples/segmentation/sigma=2

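Only the variant examples/segmentation/sigma=2 is pending: examples/segmentation has just been re-run with the modified segmentation stage, whereas the variant was last computed with the original one.

Let us inspect the hyperparameters of the examples/segmentation/sigma=2 task: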

[15]:

rc = batch.context('examples/segmentation/sigma=2')
print(rc.config.yaml)

download:
  url: 'https://zenodo.org/record/3362976/files/B2.zip'
segmentation:
  sigma: 2

Now let's modify the hyperparameters programmatically:

[16]:

rc.config['segmentation/sigma'] = 5.
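The key 'segmentation/sigma' presumably addresses the entry sigma inside the segmentation subsection, matching the YAML printed above. The class below is a hypothetical minimal sketch of such slash-separated access on a nested dictionary; it is not repype's config class.

```python
class NestedConfig:
    """Hypothetical config wrapper addressing nested keys as 'section/key'."""

    def __init__(self, entries=None):
        self.entries = entries if entries is not None else {}

    def __getitem__(self, path):
        node = self.entries
        for key in path.split('/'):
            node = node[key]  # raises KeyError for unknown paths
        return node

    def __setitem__(self, path, value):
        *parents, leaf = path.split('/')
        node = self.entries
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate levels as needed
        node[leaf] = value

config = NestedConfig({'segmentation': {'sigma': 2}})
config['segmentation/sigma'] = 5.
print(config['segmentation/sigma'])  # 5.0
print(config.entries)
```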

And now we run the task with the modified hyperparameters:

[17]:

import repype.cli
import repype.status

with repype.status.create() as status:
    async with repype.cli.StatusReaderConsoleAdapter(status.filepath, blocking = True):
        rc.run(status = status)

Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)

  (1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif

Results have been stored ✅

The argument blocking = True is required so that status updates are shown live on standard output while the blocking call rc.run is still running.

Info:

Besides changing the hyperparameters, the run context rc can also be used to modify the pipeline via rc.pipeline, or to inspect the task directly via rc.task.

With respect to the modified hyperparameters, the task is now reported as completed:

[18]:

bool(rc.task.is_pending(rc.pipeline, rc.config))

[18]:

False

However, the batch still considers the task pending, because it was completed with hyperparameters that differ from those specified in its task.yml file:

[19]:

print('Pending tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.pending))

Pending tasks:
- examples/segmentation/sigma=2
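The two answers differ only in which configuration the stored results are compared against. The sketch below illustrates this under a hypothetical bookkeeping scheme in which completion is recorded as a digest of the configuration; config_digest and is_pending are made-up helpers, and repype's actual check also takes the pipeline into account (as the Li-thresholding example showed).

```python
import hashlib
import json

def config_digest(config):
    """Digest of a config dict that does not depend on key order."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

def is_pending(stored_digest, config):
    """A task is pending unless its results were stored for exactly this config."""
    return stored_digest != config_digest(config)

yaml_config = {
    'download': {'url': 'https://zenodo.org/record/3362976/files/B2.zip'},
    'segmentation': {'sigma': 2},
}
modified_config = {**yaml_config, 'segmentation': {'sigma': 5.}}

# The results were last stored for the modified hyperparameters (sigma = 5)
stored = config_digest(modified_config)

print(is_pending(stored, modified_config))  # False: matches what was stored
print(is_pending(stored, yaml_config))      # True: task.yml still says sigma = 2
```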