Image segmentation
In this example, we employ a segmentation pipeline that consists of the following steps. First, an archive of images is downloaded from Zenodo. Then, a single file is extracted from the archive. The file is a fluorescence microscopy image of cell nuclei, and image segmentation is performed using Otsu thresholding. Finally, the result of the segmentation is written to a file.
Afterwards, we will modify one of the pipeline stages, so that Li thresholding is employed instead of Otsu thresholding. We demonstrate how repype automatically recognizes such changes and re-runs only the relevant parts of the pipeline.
Task specifications
We first create a file examples/segmentation/task.yml that specifies our segmentation task:
[3]:
%cat examples/segmentation/task.yml
runnable: true
pipeline:
  - tests.test_repype.Download
  - tests.test_repype.Unzip
  - tests.test_repype.Segmentation
  - tests.test_repype.Output
config:
  download:
    url: https://zenodo.org/record/3362976/files/B2.zip
scopes:
  segmentation: 'seg/%s.png'
input_ids:
  - B2--W00026--P00001--Z00000--T00000--dapi.tif
The specification above consists of multiple sections:
- The runnable property is set to true, so that the task becomes runnable.
- The pipeline is specified by listing the stages that it consists of. The order of the stages does not matter and is determined automatically.
- The config section is optional and consists of one or more subsections, one for each stage of the pipeline. The names of the subsections correspond to the names of the stage classes, with words separated by dashes. Here, the url of the tests.test_repype.Download stage is configured.
- The scopes section defines how file paths are resolved. In this example, a scope segmentation is defined. Within that scope, paths are resolved to seg/%s.png by substituting a given input identifier for %s. This is the path of the file that the result of the segmentation is written to.
- The input_ids section defines the identifiers of the input data of the task. The pipeline is run once, and independently, for each input identifier. In this example, these are the names of the files to be extracted from the archive downloaded from Zenodo.
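The automatic ordering of the stages can be pictured as a topological sort over the fields that each stage consumes and produces. The following is a minimal sketch using only the Python standard library; it is not repype's actual implementation. The inputs and outputs of the segmentation stage are those of the stage implementation shown later in this example, whereas the field names assumed for the other three stages (archive, image) are hypothetical.

```python
from graphlib import TopologicalSorter

# Inputs and outputs per stage, deliberately listed out of order. Only the
# segmentation entry is taken from this example; the field names of the other
# stages are assumptions made for illustration.
stages = {
    'output': {'inputs': ['segmentation'], 'outputs': []},
    'segmentation': {'inputs': ['image'], 'outputs': ['segmentation']},
    'unzip': {'inputs': ['archive'], 'outputs': ['image']},
    'download': {'inputs': [], 'outputs': ['archive']},
}


def order_stages(stages):
    # Map each field to the stage that produces it.
    producers = {
        field: name
        for name, spec in stages.items()
        for field in spec['outputs']
    }
    # A stage depends on the producers of all of its inputs.
    graph = {
        name: {producers[field] for field in spec['inputs']}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(order_stages(stages))
# ['download', 'unzip', 'segmentation', 'output']
```

Because each stage in this sketch depends on exactly one predecessor, the resulting order is unique regardless of the order in which the stages are listed.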
We then create a second task examples/segmentation/sigma=2/task.yml:
[4]:
%cat examples/segmentation/sigma=2/task.yml
config:
  segmentation:
    sigma: 2
Since the directory of the previously defined task examples/segmentation/task.yml is a parent of the directory of the second task examples/segmentation/sigma=2/task.yml, the latter is a variant (or sub-task) of the first task. This means that the second task inherits the specification of its parent, with one adaptation: The hyperparameter sigma of the segmentation stage is set to 2.
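The inheritance between a task and its variant can be thought of as a recursive merge of two nested dictionaries, where the values of the variant take precedence. The sketch below illustrates this idea with plain Python; the function merge_specs is a hypothetical helper, and repype's exact merging rules (for instance for lists such as pipeline or input_ids) are not covered here.

```python
import copy


def merge_specs(parent, child):
    """Return a copy of `parent` in which values from `child` take precedence."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_specs(merged[key], value)
        else:
            # Otherwise, the value of the child replaces that of the parent.
            merged[key] = copy.deepcopy(value)
    return merged


parent_spec = {
    'runnable': True,
    'config': {
        'download': {'url': 'https://zenodo.org/record/3362976/files/B2.zip'},
    },
}
child_spec = {'config': {'segmentation': {'sigma': 2}}}

variant = merge_specs(parent_spec, child_spec)
print(variant['config'])
```

The merged configuration contains both the download URL of the parent and the sigma value of the variant, while the specification of the parent itself remains unchanged.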
We can now verify our task configurations as follows:
[5]:
!command python -m repype examples/segmentation
2 task(s) selected for running
DRY RUN: use "--run" to run the tasks instead
Selected tasks:
- /tmp/tmpwyh2_tz7/examples/segmentation (incomplete)
- /tmp/tmpwyh2_tz7/examples/segmentation/sigma=2 (incomplete)
The argument examples/segmentation is the path of the root task directory. The output above shows that 2 tasks were found and loaded.
Batch processing
[6]:
!command python -m repype examples/segmentation --run
2 task(s) selected for running
(1/2) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation
Starting from scratch
(1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif
Results have been stored ✅
(2/2) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation/sigma=2
Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)
(1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif
Results have been stored ✅
When running the command-line interface of repype as above, the progress can be followed on the standard output. Both tasks have been processed. Notably, the pipeline was executed from the start for the first task, whereas for the second task the results of the first task were re-used, and the execution of the pipeline was skipped up to the segmentation stage. This makes sense, because only a segmentation parameter differs in the second task, so re-running the download and unzip stages would have been redundant.
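The decision where to pick up the pipeline can be illustrated by comparing the per-stage configurations of two tasks in pipeline order. This is a simplified sketch under the assumption that only hyperparameters are compared; the helper first_stage_to_run is hypothetical and does not reflect repype's internals.

```python
def first_stage_to_run(stage_order, previous_config, current_config):
    """Return the first stage whose configuration differs, or None."""
    for stage in stage_order:
        if previous_config.get(stage, {}) != current_config.get(stage, {}):
            return stage
    return None


stage_order = ['download', 'unzip', 'segmentation', 'output']
url = 'https://zenodo.org/record/3362976/files/B2.zip'
parent_config = {'download': {'url': url}}
variant_config = {'download': {'url': url}, 'segmentation': {'sigma': 2}}

start = first_stage_to_run(stage_order, parent_config, variant_config)
print(start)
# segmentation

# All stages from the first changed one onwards need to be executed again.
print(stage_order[stage_order.index(start):])
# ['segmentation', 'output']
```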
When running repype again now, both tasks are skipped, because there is nothing more to be done:
[7]:
!command python -m repype examples/segmentation
0 task(s) selected for running
DRY RUN: use "--run" to run the tasks instead
We can inspect the result of the segmentation, which has been written to a file:
[8]:
display(Image(filename = 'examples/segmentation/seg/B2--W00026--P00001--Z00000--T00000--dapi.tif.png'))
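The file path used above follows from the segmentation scope defined in the task specification. As a small illustration, the substitution can be reproduced with Python's %-formatting; the function resolve and the assumption that paths are relative to the task directory are ours, made only for this sketch.

```python
scopes = {'segmentation': 'seg/%s.png'}


def resolve(scope, input_id, root='examples/segmentation'):
    # Substitute the input identifier for %s, then prepend the task directory.
    return f'{root}/{scopes[scope] % input_id}'


print(resolve('segmentation', 'B2--W00026--P00001--Z00000--T00000--dapi.tif'))
# examples/segmentation/seg/B2--W00026--P00001--Z00000--T00000--dapi.tif.png
```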
In addition, the run times of the individual stages of the pipeline can be inspected:
[9]:
import repype.benchmark
repype.benchmark.Benchmark('examples/segmentation/times.csv').df
[9]:
| | B2--W00026--P00001--Z00000--T00000--dapi.tif |
|---|---|
| download | 2.486398 |
| unzip | 0.089115 |
| segmentation | 0.029838 |
| output | 0.019188 |
The values are reported in seconds.
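To put these numbers into perspective, the share of each stage in the total run time can be computed directly from the values in the table above. The sketch below copies the values into a plain dictionary instead of reading them from the benchmark object.

```python
# Run times in seconds, copied from the table above.
times = {
    'download': 2.486398,
    'unzip': 0.089115,
    'segmentation': 0.029838,
    'output': 0.019188,
}

total = sum(times.values())
shares = {stage: seconds / total for stage, seconds in times.items()}
slowest = max(times, key=times.get)

print(f'total: {total:.3f} s, slowest stage: {slowest}')
for stage, share in shares.items():
    print(f'{stage:>12}: {share:6.1%}')
```

The download stage accounts for the bulk of the run time, which is why re-using its results in the variant task pays off.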
Modification of the pipeline
We create a modified implementation of the segmentation stage by employing Li thresholding:
[10]:
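import repype.stage

class Segmentation(repype.stage.Stage):

    inputs = ['image']
    outputs = ['segmentation']

    def process(self, image, pipeline, config, status = None):
        # Import the submodules explicitly, so that this also works with
        # scikit-image versions that do not load submodules lazily
        import skimage.filters
        import skimage.util
        # Smooth the image before thresholding (sigma defaults to 1)
        image = skimage.filters.gaussian(image, sigma = config.get('sigma', 1.))
        # Li thresholding is used here instead of Otsu thresholding
        threshold = skimage.filters.threshold_li(image)
        return dict(
            segmentation = skimage.util.img_as_ubyte(image > threshold)
        )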
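For intuition, Li thresholding iteratively searches for the threshold that minimizes the cross-entropy between the image and its binarized version. The following pure-Python sketch of the iteration operates on a flat list of positive intensities; it is meant as an illustration and is not the implementation used by skimage.filters.threshold_li. The function name, the tolerance, and the toy data are assumptions.

```python
import math


def li_threshold(values, tolerance=0.5, max_iterations=100):
    """Iterative minimum cross-entropy threshold for positive intensities."""
    threshold = sum(values) / len(values)  # initial guess: the global mean
    for _ in range(max_iterations):
        background = [v for v in values if v <= threshold]
        foreground = [v for v in values if v > threshold]
        if not background or not foreground:
            break  # all values fall on one side, nothing left to refine
        mean_back = sum(background) / len(background)
        mean_fore = sum(foreground) / len(foreground)
        # Update rule of the iterative minimum cross-entropy method.
        updated = (mean_fore - mean_back) / (math.log(mean_fore) - math.log(mean_back))
        if abs(updated - threshold) < tolerance:
            return updated
        threshold = updated
    return threshold


# Toy data: five dark background values and five bright foreground values.
intensities = [10, 12, 11, 9, 13, 200, 190, 210, 205, 195]
threshold = li_threshold(intensities)
mask = [v > threshold for v in intensities]
print(mask)
```

On this toy data, the threshold settles between the two groups of intensities, so that exactly the five bright values are marked as foreground.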
Then, we replace the original segmentation stage of the pipeline by our modified implementation.
Info:
In real-world use, we would modify the pipeline by altering the examples/segmentation/task.yml file or one of the stage implementations directly. However, for demonstration purposes, it is more convenient to leave files unchanged, and instead do it programmatically.
To do this, we override the create_pipeline method of the Task class:
[11]:
import repype.task

class Task(repype.task.Task):

    def create_pipeline(self, *args, **kwargs):
        pipeline = super().create_pipeline(*args, **kwargs)
        pipeline.stages[pipeline.find('segmentation')] = Segmentation()
        return pipeline
Now we are ready to re-run the pipeline with our modified segmentation stage, which is put in place by our Task implementation. In addition, we restrict the computations to the task examples/segmentation this time, i.e. we skip its variant:
[12]:
import repype.cli
main = repype.cli.main('examples/segmentation', run = True, task_cls = Task, tasks=['examples/segmentation'])
await main();
1 task(s) selected for running
(1/1) Entering task: /tmp/tmpwyh2_tz7/examples/segmentation
Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)
(1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif
Results have been stored ✅
As expected, the pipeline has been re-run only from the segmentation stage onwards.
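How repype detects that a stage implementation has changed is not shown in this example. One plausible approach, given here purely as an assumption, is to store a fingerprint of the implementation together with the results and to compare it on the next run. The sketch below derives such a fingerprint from the compiled code of a function; the names fingerprint, segment_v1, and segment_v2 are hypothetical.

```python
import hashlib


def fingerprint(func):
    """Digest of a function's compiled code, constants, and referenced names."""
    code = func.__code__
    payload = b'|'.join([
        code.co_code,
        repr(code.co_consts).encode(),
        repr(code.co_names).encode(),
    ])
    return hashlib.sha1(payload).hexdigest()


def segment_v1(value):
    return value > 100


def segment_v2(value):
    return value > 60


# Fingerprint stored alongside the results of the previous run.
stored = fingerprint(segment_v1)

print(fingerprint(segment_v1) == stored)  # unchanged implementation
print(fingerprint(segment_v2) == stored)  # modified implementation
```

An unchanged implementation reproduces the stored fingerprint, whereas the modified one does not, which would trigger a re-run from that stage onwards.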
Finally, we verify that a different segmentation result has been obtained now:
[13]:
display(Image(filename = 'examples/segmentation/seg/B2--W00026--P00001--Z00000--T00000--dapi.tif.png'))
Running tasks programmatically
For more fine-grained control, we can first load the tasks before running them:
[14]:
import repype.batch
batch = repype.batch.Batch(task_cls = Task)
batch.load('examples/segmentation')
print('Loaded tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.contexts))
print('Pending tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.pending))
Loaded tasks:
- examples/segmentation
- examples/segmentation/sigma=2
Pending tasks:
- examples/segmentation/sigma=2
We can inspect the hyperparameters of the examples/segmentation/sigma=2 task:
[15]:
rc = batch.context('examples/segmentation/sigma=2')
print(rc.config.yaml)
download:
  url: 'https://zenodo.org/record/3362976/files/B2.zip'
segmentation:
  sigma: 2
Now let's modify the hyperparameters programmatically:
[16]:
rc.config['segmentation/sigma'] = 5.
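The key segmentation/sigma addresses the sigma entry inside the segmentation section of the configuration. The way such a slash-separated key maps onto nested sections can be sketched with plain dictionaries; set_by_path is a hypothetical helper and not part of repype.

```python
def set_by_path(config, path, value, separator='/'):
    """Assign `value` at a nested key given as a separator-joined path."""
    *parents, leaf = path.split(separator)
    node = config
    for key in parents:
        # Descend into the section, creating it if it does not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value


config = {
    'download': {'url': 'https://zenodo.org/record/3362976/files/B2.zip'},
    'segmentation': {'sigma': 2},
}
set_by_path(config, 'segmentation/sigma', 5.0)
print(config['segmentation'])
# {'sigma': 5.0}
```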
And now we run the task with the modified hyperparameters:
[17]:
import repype.cli
import repype.status

with repype.status.create() as status:
    async with repype.cli.StatusReaderConsoleAdapter(status.filepath, blocking = True):
        rc.run(status = status)
Picking up from: /tmp/tmpwyh2_tz7/examples/segmentation (segmentation)
(1/1) Processing: B2--W00026--P00001--Z00000--T00000--dapi.tif
Results have been stored ✅
The argument blocking = True is necessary to get live status updates on the standard output while the blocking operation rc.run is still running.
Info:
Besides changing the hyperparameters, the run context rc can be used to modify the pipeline via rc.pipeline, or to inspect the task directly via rc.task.
The task will now be reported as completed, with respect to the modified hyperparameters:
[18]:
bool(rc.task.is_pending(rc.pipeline, rc.config))
[18]:
False
However, the batch still considers the task as pending, since it has been completed with hyperparameters that differ from those specified in the task.yml file:
[19]:
print('Pending tasks:')
print('\n'.join(f'- {rc.task.path}' for rc in batch.pending))
Pending tasks:
- examples/segmentation/sigma=2
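The difference between the two checks can be illustrated by comparing digests of configurations. In this sketch, we assume that a task stores a digest of the configuration it was completed with, and that a task counts as pending whenever that digest differs from the digest of the configuration in question. The functions config_digest and is_pending below are hypothetical stand-ins and not repype's API.

```python
import hashlib
import json


def config_digest(config):
    # Serialize with sorted keys, so that equal configurations give equal digests.
    return hashlib.sha1(json.dumps(config, sort_keys=True).encode()).hexdigest()


def is_pending(completed_digest, config):
    return completed_digest != config_digest(config)


url = 'https://zenodo.org/record/3362976/files/B2.zip'
file_config = {'download': {'url': url}, 'segmentation': {'sigma': 2}}
modified_config = {'download': {'url': url}, 'segmentation': {'sigma': 5.0}}

# The task has been completed with the modified hyperparameters.
completed_digest = config_digest(modified_config)

print(is_pending(completed_digest, modified_config))  # check against the modified config
print(is_pending(completed_digest, file_config))      # check against the task.yml config
```

The first check reports the task as completed, while the second reports it as pending, which mirrors the behavior of rc.task.is_pending and batch.pending observed above.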