Getting Started
On the EO DataHub, workflows are defined in the form of OGC Application Packages. Each package requires a script written in Common Workflow Language (CWL): a YAML-formatted file that defines a service as a sequence of steps, including the inputs and outputs of the workflow as a whole as well as those of each step within it. Most CWL files reference Docker images that specify the processing performed in each step of the workflow.
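To make that structure concrete, here is a minimal sketch of an application package containing one Workflow and one CommandLineTool, built as a Python dictionary and printed as JSON. Everything named in it (the `resize-workflow` id, the `url` and `size` inputs, `convert.sh` and the `example/convert:latest` image) is a hypothetical placeholder, and the `$graph` layout is a common packaging convention rather than something this guide prescribes. JSON is, for practical purposes, a subset of YAML, so CWL tooling should generally accept the printed output as a .cwl file.

```python
import json


def build_workflow() -> dict:
    """Return a minimal CWL application package as a Python dict.

    All ids, input names, the command and the Docker image are
    hypothetical placeholders.
    """
    workflow = {
        "class": "Workflow",
        "id": "resize-workflow",  # hypothetical workflow id
        # Global inputs, supplied when the workflow is executed.
        "inputs": {"url": "string", "size": "string"},
        # Global outputs, each wired to an output of a step.
        "outputs": {
            "results": {"type": "Directory", "outputSource": "convert/results"}
        },
        # Each step runs a tool and maps global inputs onto the tool's inputs.
        "steps": {
            "convert": {
                "run": "#convert",
                "in": {"url": "url", "size": "size"},
                "out": ["results"],
            }
        },
    }
    tool = {
        "class": "CommandLineTool",
        "id": "convert",
        # Docker image holding the processing code (hypothetical name).
        "requirements": {"DockerRequirement": {"dockerPull": "example/convert:latest"}},
        "baseCommand": ["convert.sh"],  # hypothetical entry point
        "inputs": {
            "url": {"type": "string", "inputBinding": {"position": 1}},
            "size": {"type": "string", "inputBinding": {"position": 2}},
        },
        # Collect the whole working directory (image plus STAC files).
        "outputs": {"results": {"type": "Directory", "outputBinding": {"glob": "."}}},
    }
    return {"cwlVersion": "v1.2", "$graph": [workflow, tool]}


if __name__ == "__main__":
    print(json.dumps(build_workflow(), indent=2))
```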
An example CWL script defining a Hub-compliant workflow can be found here. This script takes three inputs: the URL of an image file, a function name and a scale factor, given as a percentage. It then applies the chosen function to the image; in this example, the function resizes the image by the given scale factor. The resulting image is output along with a STAC catalogue describing the resulting image asset.
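The example script's source is not reproduced here, but a percentage scale factor is easy to misinterpret (for instance, passing 50 where 0.5 was expected). The sketch below shows one way a step might turn the scale input into target dimensions; `parse_percent`, `scaled_size` and the rounding rule are assumptions, not the example script's code.

```python
from typing import Tuple, Union


def parse_percent(value: Union[str, float]) -> float:
    """Accept a scale factor such as "50", "50%" or 50 and return 50.0."""
    if isinstance(value, str):
        value = value.strip().rstrip("%")
    percent = float(value)
    if percent <= 0:
        raise ValueError(f"scale factor must be positive, got {percent}")
    return percent


def scaled_size(width: int, height: int, scale: Union[str, float]) -> Tuple[int, int]:
    """Target dimensions for resizing width x height by a percentage."""
    factor = parse_percent(scale) / 100.0
    # Round to the nearest pixel, but never shrink a side below 1 pixel.
    return max(1, round(width * factor)), max(1, round(height * factor))


print(scaled_size(800, 600, "50%"))  # (400, 300)
print(scaled_size(3, 3, 10))         # (1, 1)
```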
Best practices for creating application packages can be found here; they include requirements that a workflow must meet in order to work correctly with the Workflow Runner API. Follow these practices when deciding what types of inputs and outputs your workflow will produce, and how the workflow handles those outputs on completion.
Specific restrictions of note which must be followed:
Below is an example of defining resource requirements for a workflow step. Here the user specifies a limit of 1 CPU core and 512 MiB of RAM (CWL expresses `ramMax` in mebibytes). Resource requirements can be set for each step in your workflow and increased as required.
- class: CommandLineTool
  id: convert
  requirements:
    ResourceRequirement:
      coresMax: 1
      ramMax: 512
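Since CWL documents can also be expressed as JSON, per-step limits can be set programmatically when generating a package. The helper below, `with_resources`, is a hypothetical sketch rather than part of any Hub tooling. It sets the same `coresMax` and `ramMax` fields as the YAML above and replaces any existing ResourceRequirement wholesale, so fields such as `ramMin` are dropped.

```python
import json
from typing import Any, Dict


def with_resources(tool: Dict[str, Any], cores_max: int, ram_max_mib: int) -> Dict[str, Any]:
    """Return a copy of a CWL CommandLineTool with a ResourceRequirement set.

    CWL allows `requirements` as a map keyed by class name or as a list of
    objects with a `class` field; both forms are handled.
    """
    limits = {"coresMax": cores_max, "ramMax": ram_max_mib}  # ramMax is in MiB
    updated = dict(tool)
    reqs = tool.get("requirements", {})
    if isinstance(reqs, list):
        # List form: drop any existing ResourceRequirement, then append ours.
        updated["requirements"] = [
            r for r in reqs if r.get("class") != "ResourceRequirement"
        ] + [{"class": "ResourceRequirement", **limits}]
    else:
        # Map form, as in the YAML example.
        updated["requirements"] = {**reqs, "ResourceRequirement": limits}
    return updated


step = {"class": "CommandLineTool", "id": "convert"}
print(json.dumps(with_resources(step, cores_max=1, ram_max_mib=512), indent=2))
```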
The Workflow Runner also includes two built-in steps that are run automatically, when required, for each workflow.
An example of a STAC Catalog, catalog.json, generated by a workflow:
{
  "stac_version": "1.0.0",
  "id": "catalog",
  "type": "Catalog",
  "description": "Root catalog",
  "links": [
    {
      "type": "application/geo+json",
      "rel": "item",
      "href": "item.json"
    },
    {
      "type": "application/json",
      "rel": "self",
      "href": "catalog.json"
    }
  ]
}
An example of a STAC Item (a GeoJSON Feature), item.json, generated by the same workflow:
{
  "stac_version": "1.0.0",
  "id": " item -1728909682.980245290",
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [-180, -90],
        [-180, 90],
        [180, 90],
        [180, -90],
        [-180, -90]
      ]
    ]
  },
  "properties": {
    "created": "2024-10-14T12:41:22.980Z",
    "datetime": "2024-10-14T12:41:22.980Z",
    "updated": "2024-10-14T12:41:22.980Z"
  },
  "bbox": [-180, -90, 180, 90],
  "assets": {
    " item": {
      "type": "image/png",
      "roles": ["data"],
      "href": "item.png",
      "file:size": 19133
    }
  },
  "links": [
    {
      "type": "application/json",
      "rel": "parent",
      "href": "catalog.json"
    },
    {
      "type": "application/geo+json",
      "rel": "self",
      "href": "item.json"
    },
    {
      "type": "application/json",
      "rel": "root",
      "href": "catalog.json"
    }
  ]
}
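A workflow step has to write these two files itself. The sketch below produces a catalog.json and item.json pair linked exactly like the examples above, for a PNG already written to the output directory. The function names, the `image` asset key and the item ID you pass in are assumptions, and the whole-globe footprint is copied from the example; replace it with your data's real footprint where one is known.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional, Tuple


def build_stac_outputs(item_id: str, asset_href: str, asset_size: int,
                       now: Optional[datetime] = None) -> Tuple[Dict, Dict]:
    """Return (catalog, item) dicts linked the same way as the examples."""
    if now is None:
        now = datetime.now(timezone.utc)
    # e.g. 2024-10-14T12:41:22.980Z, matching the example timestamps
    stamp = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    item = {
        "stac_version": "1.0.0",
        "id": item_id,
        "type": "Feature",
        # Whole-globe footprint copied from the example item.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-180, -90], [-180, 90], [180, 90], [180, -90], [-180, -90]]],
        },
        "properties": {"created": stamp, "datetime": stamp, "updated": stamp},
        "bbox": [-180, -90, 180, 90],
        "assets": {
            # "image" is a hypothetical asset key
            "image": {"type": "image/png", "roles": ["data"],
                      "href": asset_href, "file:size": asset_size},
        },
        "links": [
            {"type": "application/json", "rel": "parent", "href": "catalog.json"},
            {"type": "application/geo+json", "rel": "self", "href": "item.json"},
            {"type": "application/json", "rel": "root", "href": "catalog.json"},
        ],
    }
    catalog = {
        "stac_version": "1.0.0",
        "id": "catalog",
        "type": "Catalog",
        "description": "Root catalog",
        "links": [
            {"type": "application/geo+json", "rel": "item", "href": "item.json"},
            {"type": "application/json", "rel": "self", "href": "catalog.json"},
        ],
    }
    return catalog, item


def write_stac_outputs(out_dir: str, item_id: str, asset_name: str = "item.png") -> None:
    """Write catalog.json and item.json next to an existing PNG in out_dir."""
    out = Path(out_dir)
    size = (out / asset_name).stat().st_size  # value for "file:size"
    catalog, item = build_stac_outputs(item_id, asset_name, size)
    (out / "catalog.json").write_text(json.dumps(catalog, indent=2))
    (out / "item.json").write_text(json.dumps(item, indent=2))
```

A step would call `write_stac_outputs(out_dir, "resized-image")` once its image is in place, then expose the directory as a workflow output.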
It is vital that these files are generated correctly and captured in the workflow outputs, as the Workflow Runner follows the links in them to locate your outputs and harvest them into the Resource Catalogue.
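The Workflow Runner's harvesting code is not shown in this guide. As a rough illustration of why the links matter, the sketch below walks `child` and `item` links from catalog.json and lists the asset files it can reach; a file not reachable this way would presumably not be found either, so this doubles as a local self-check. `iter_asset_hrefs` and `load_local` are hypothetical helpers, and only relative hrefs are handled.

```python
import json
import posixpath
from pathlib import Path
from typing import Callable, Dict, Iterator


def load_local(href: str) -> Dict:
    """Default loader: read a STAC JSON document from the local filesystem."""
    return json.loads(Path(href).read_text())


def iter_asset_hrefs(catalog_href: str,
                     load: Callable[[str], Dict] = load_local) -> Iterator[str]:
    """Follow `child` and `item` links from a catalog and yield asset hrefs.

    Only relative hrefs are supported; each is resolved against the
    directory of the document that contains it.
    """
    doc = load(catalog_href)
    base = posixpath.dirname(catalog_href)
    for link in doc.get("links", []):
        if link.get("rel") not in ("child", "item"):
            continue  # skip self/root/parent so the walk does not loop
        target = posixpath.normpath(posixpath.join(base, link["href"]))
        if link["rel"] == "child":
            # Child catalogs and collections are walked recursively.
            yield from iter_asset_hrefs(target, load)
            continue
        item = load(target)
        item_dir = posixpath.dirname(target)
        for asset in item.get("assets", {}).values():
            yield posixpath.normpath(posixpath.join(item_dir, asset["href"]))
```

Run from the output directory, `list(iter_asset_hrefs("catalog.json"))` should list every file you expect to be harvested.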
Note that while a STAC Catalog and Item are required outputs, because the STAGEOUT step uses them to gather the outputs, you do not need to provide a STAC Collection. If you omit one, the STAGEOUT step will generate a Collection automatically, using the jobID of the execution to build the Collection ID `col_<jobID>`.
If you do provide STAC Collections in your outputs and link to them from your catalog.json file, they will be harvested as they are, and the STAGEOUT step will not generate a new Collection.
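If you prefer to control the Collection yourself, one way to restructure the two-file output shown earlier is sketched below. `add_collection` is a hypothetical helper; the Collection fields it fills in (`description`, `license`, `extent`, `links`) are those STAC 1.0.0 requires, while the `proprietary` licence value and the single-directory layout are assumptions. The resulting collection.json must be written alongside the other files and captured in your workflow outputs as well.

```python
import copy
from typing import Dict, Tuple


def add_collection(catalog: Dict, item: Dict, collection_id: str,
                   description: str = "Workflow outputs") -> Tuple[Dict, Dict, Dict]:
    """Insert a STAC Collection between a root catalog and a single item.

    Assumes catalog.json, collection.json and item.json share one directory.
    Returns new (catalog, collection, item) dicts; the inputs are not modified.
    """
    catalog, item = copy.deepcopy(catalog), copy.deepcopy(item)
    when = item["properties"]["datetime"]
    collection = {
        "stac_version": "1.0.0",
        "type": "Collection",
        "id": collection_id,
        "description": description,
        "license": "proprietary",  # assumption: use the licence of your data
        "extent": {
            "spatial": {"bbox": [item["bbox"]]},
            "temporal": {"interval": [[when, when]]},
        },
        "links": [
            {"type": "application/json", "rel": "root", "href": "catalog.json"},
            {"type": "application/json", "rel": "parent", "href": "catalog.json"},
            {"type": "application/json", "rel": "self", "href": "collection.json"},
            {"type": "application/geo+json", "rel": "item", "href": "item.json"},
        ],
    }
    # The catalog now links to the collection rather than directly to the item.
    catalog["links"] = [link for link in catalog["links"] if link["rel"] != "item"]
    catalog["links"].append(
        {"type": "application/json", "rel": "child", "href": "collection.json"})
    # The item records its collection and points its parent link at it.
    item["collection"] = collection_id
    item["links"] = [link for link in item["links"]
                     if link["rel"] not in ("parent", "collection")]
    item["links"] += [
        {"type": "application/json", "rel": "parent", "href": "collection.json"},
        {"type": "application/json", "rel": "collection", "href": "collection.json"},
    ]
    return catalog, collection, item
```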
The Workflow Runner will harvest your STAC Catalog exactly as found in the outputs, including the IDs of your Catalog, Collections and Items. This means the location of your data in your workspace catalogue is deterministic, so choose IDs that are meaningful to you: they determine where you will find your outputs in your section of the Resource Catalogue. Alternatively, you can leave your Catalog and Collection IDs blank, as in the example below, to have the STAGEOUT step generate them from the jobID of your execution. In this case your Catalog and Collection IDs will be rewritten to `cat-<jobID>` and `col-<jobID>` respectively.
{
  "stac_version": "1.0.0",
  "id": "",
  "type": "Catalog",
  "description": "Root catalog",
  "links": [
    {
      "type": "application/geo+json",
      "rel": "item",
      "href": "item.json"
    },
    {
      "type": "application/json",
      "rel": "self",
      "href": "catalog.json"
    }
  ]
}
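To check ahead of time where outputs will land, the renaming rule for blank IDs described above can be reproduced locally. This is only an illustration of that documented behaviour, not STAGEOUT's code; `expected_id` is a hypothetical name and the job ID in the usage line is a made-up value.

```python
from typing import Dict

# ID prefixes the STAGEOUT step is documented to use for blank IDs.
BLANK_ID_PREFIXES = {"Catalog": "cat", "Collection": "col"}


def expected_id(doc: Dict, job_id: str) -> str:
    """Predict the ID a Catalog or Collection will be harvested under."""
    prefix = BLANK_ID_PREFIXES.get(doc.get("type"))
    if prefix is not None and not doc.get("id"):
        return f"{prefix}-{job_id}"
    return doc["id"]  # non-blank IDs are harvested unchanged


print(expected_id({"type": "Catalog", "id": ""}, "example-job"))  # cat-example-job
```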