Getting Started
The final step of each workflow execution is the STAGEOUT, which the Workflow Runner executes automatically. This step handles the files generated by the workflow and prepares them for ingestion into the workspace Object Store and Resource Catalogue sub-catalog.
The STAGEOUT determines which files to harvest by first locating the catalog.json file, which the workflow execution must generate. It then parses this file and identifies any STAC Collections, Items and Assets it contains. Your workflow does not need to generate STAC Collections: if you prefer, you can link your STAC Items directly from the STAC Catalog without including any Collections. A Collection is added automatically by the STAGEOUT, with the ID `col_<jobID>`, where `<jobID>` is the ID of the job that generated these outputs.
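As a rough illustration of a catalog.json that links straight to a STAC Item with no Collection, the sketch below writes a minimal STAC 1.0.0 Catalog and one Item using only the Python standard library. The function name `write_minimal_catalog`, the directory layout, the IDs, the placeholder datetime and the asset file name are all hypothetical choices for this example; only the requirement that the workflow produces a catalog.json comes from this page.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_catalog(out_dir: str, catalog_id: str, item_id: str, asset_file: str) -> Path:
    """Write a STAC Catalog that links directly to one Item, with no Collection.

    Hypothetical helper: layout, IDs and asset file name are illustrative only.
    """
    root = Path(out_dir)
    item_dir = root / item_id
    item_dir.mkdir(parents=True, exist_ok=True)

    item = {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # no footprint in this sketch, so "bbox" can be omitted
        "properties": {"datetime": "2024-01-01T00:00:00Z"},  # placeholder timestamp
        "links": [
            {"rel": "root", "href": "../catalog.json", "type": "application/json"},
            {"rel": "parent", "href": "../catalog.json", "type": "application/json"},
        ],
        # Relative asset href; the workflow itself is assumed to write this file.
        "assets": {"data": {"href": f"./{asset_file}", "roles": ["data"]}},
    }
    (item_dir / f"{item_id}.json").write_text(json.dumps(item, indent=2))

    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": catalog_id,
        "description": "Example workflow outputs",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
            # An "item" link straight from the Catalog: no Collection in between.
            {"rel": "item", "href": f"./{item_id}/{item_id}.json", "type": "application/geo+json"},
        ],
    }
    catalog_path = root / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_minimal_catalog(tmp, "my-catalog", "my-item", "result.tif")
        print(path.read_text())
```

Relative hrefs are used throughout so the output directory stays self-contained; the STAGEOUT is described as correcting links and Asset hrefs afterwards, so the exact form it expects may differ from this sketch.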
The STAGEOUT also rewrites some of the links in the STAC Catalog so that they resolve correctly, and updates any Asset links so that the Assets are served from the Object Store via HTTPS.
The files are then exported to a workspace-specific directory in the Object Store, determined by the workspace that invoked the workflow execution. After this, the outputs are harvested into that workspace's sub-catalog within the Resource Catalogue. Workflow outputs are currently saved to a new sub-catalog, named according to the ID of the generated catalog.json file, within the “processing-results” sub-catalog of the workspace catalog. You can then view these outputs through the Resource Catalogue API, for example at `/api/catalogue/stac/catalogs/user/catalogs/<workspace>/catalogs/processing-results/...`. You can also access the outputs directly from the Object Store, either with an S3 client or over HTTPS, for example `https://<workspace>.prod.eodatahub-workspaces.org.uk/files/workspaces-eodhp-prod/processing-results/...`.
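If you script access to your results, a small helper that fills in the workspace name can avoid typos in these long paths. This is a minimal sketch: the function name `processing_results_prefixes` is hypothetical, and it builds only the prefixes taken from the example patterns on this page. The segments after `processing-results/` are not specified here and are left to you, and other environments may use a different host than the `prod` one shown.

```python
from __future__ import annotations


def processing_results_prefixes(workspace: str) -> dict[str, str]:
    """Return the Resource Catalogue API path prefix and the Object Store HTTPS
    prefix for a workspace's processing results (hypothetical helper).

    Only the documented prefixes are built; everything after
    processing-results/ must be appended by the caller.
    """
    api_path = (
        f"/api/catalogue/stac/catalogs/user/catalogs/{workspace}"
        "/catalogs/processing-results/"
    )
    https_url = (
        f"https://{workspace}.prod.eodatahub-workspaces.org.uk"
        "/files/workspaces-eodhp-prod/processing-results/"
    )
    return {"catalogue_api_path": api_path, "object_store_https": https_url}


if __name__ == "__main__":
    for name, value in processing_results_prefixes("my-workspace").items():
        print(name, value)
```

The API path is relative, so it still needs the Resource Catalogue host in front of it, and both locations require an authenticated request.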
By default, the STAC Catalog, Collections and Items harvested from workflow outputs keep the IDs set in the files the workflow generates. This means that if a workflow generates STAC outputs with the same IDs on every run, each execution overwrites the previous data in the catalogue. This lets you run the same workflow regularly with fixed output IDs so that the generated catalog always holds the latest outputs, which is useful if you want a single source of truth that always contains the most recent data.
If you instead want your STAC Collections and Items to be added to the same Catalog alongside previous results, you need to assign them unique IDs to avoid overwriting, for example by adding a timestamp or a UUID as a suffix.
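Both suffix approaches can be sketched in a few lines of standard-library Python. The helper name `unique_stac_id` and the timestamp format are assumptions made for this example, not a convention required by the Workflow Runner.

```python
import uuid
from datetime import datetime, timezone


def unique_stac_id(base: str, use_uuid: bool = False) -> str:
    """Append a UTC timestamp (default) or a random UUID4 to a base STAC ID.

    Hypothetical helper. Timestamps have one-second resolution, so two runs
    started in the same second would produce the same ID; the UUID suffix
    avoids that at the cost of less readable IDs.
    """
    if use_uuid:
        return f"{base}-{uuid.uuid4()}"
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base}-{stamp}"


if __name__ == "__main__":
    print(unique_stac_id("ndvi-collection"))
    print(unique_stac_id("ndvi-item", use_uuid=True))
```

A timestamp suffix keeps IDs sortable by run time, while a UUID guarantees uniqueness even for concurrent runs.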
If you would rather have the Workflow Runner STAGEOUT assign IDs to your Catalog and Collection based on the jobID of the job that generated the outputs, leave these IDs blank by setting them to "", and they will be rewritten as `cat_<jobID>` and `col_<jobID>` respectively. Note that this only works for a single Collection output: several Collections with blank IDs would all be assigned the same `col_<jobID>` ID, and their data would be overwritten.
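A minimal sketch of outputs that rely on this behaviour is shown below: a Catalog and exactly one Collection, both with `"id": ""`. The helper name `write_blank_id_outputs`, the directory layout and the extent values are hypothetical placeholders; the rewriting to `cat_<jobID>` and `col_<jobID>` is performed by the STAGEOUT, not by this code.

```python
import json
import tempfile
from pathlib import Path


def write_blank_id_outputs(out_dir: str) -> Path:
    """Write a Catalog and a single Collection whose IDs are left as "".

    Hypothetical helper; per the documented STAGEOUT behaviour these IDs are
    rewritten to cat_<jobID> and col_<jobID>. Extent values are placeholders.
    """
    root = Path(out_dir)
    col_dir = root / "collection"
    col_dir.mkdir(parents=True, exist_ok=True)

    collection = {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": "",  # blank: expected to be rewritten as col_<jobID>
        "description": "Example workflow output collection",
        "license": "proprietary",
        "extent": {
            "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
            "temporal": {"interval": [["2024-01-01T00:00:00Z", None]]},
        },
        "links": [
            {"rel": "root", "href": "../catalog.json", "type": "application/json"},
            {"rel": "parent", "href": "../catalog.json", "type": "application/json"},
        ],
    }
    (col_dir / "collection.json").write_text(json.dumps(collection, indent=2))

    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "",  # blank: expected to be rewritten as cat_<jobID>
        "description": "Example workflow outputs",
        "links": [
            {"rel": "root", "href": "./catalog.json", "type": "application/json"},
            # Exactly one child Collection, since blank IDs only work for a single one.
            {"rel": "child", "href": "./collection/collection.json", "type": "application/json"},
        ],
    }
    catalog_path = root / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    return catalog_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_blank_id_outputs(tmp).read_text())
```

Any Items added under this Collection would still keep whatever IDs you give them, so the uniqueness considerations for Items still apply.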
By default, any data within a workspace catalog is private to users who are members of the workspace, i.e. those who can extract tokens scoped to that workspace. Therefore, only users with access to the workspace that called the workflow are able to access the results. You will need to be authenticated before attempting to view the results, whether using the API or a browser. The same goes for any HTTP requests when trying to access the data in the Object Store, such as the generated asset files.
You can publish data held in a private workspace data store, in both the Object Store and the Resource Catalogue, using the Data Loading functionality offered in the Workspaces UI.