=================
Process generator
=================

This is an experimental feature and an unofficial extension to the CWL
standards.

A process generator is a CWL Process type that executes a concrete CWL
process (CommandLineTool, Workflow or ExpressionTool) which produces
CWL files as output, then executes the CWL that was generated.

The intention is to have a formalized way to express a pre-processing
or bootstrapping step in which a CWL description is generated by
another program (such as from a template, or conversion from another
workflow language).

The ProcessGenerator is a subtype of CWL process, so it must define
its inputs and outputs.  The "run" field is similar to the "run" field
of a workflow step -- it specifies a tool to run that will create new
CWL as output.

.. code:: yaml

   - name: ProcessGenerator
     type: record
     inVocab: true
     extends: cwl:Process
     documentRoot: true
     fields:
       - name: class
         jsonldPredicate:
           "_id": "@type"
           "_type": "@vocab"
         type: string
       - name: run
         type: [string, cwl:Process]
         jsonldPredicate:
           _id: "cwl:run"
           _type: "@id"
           subscope: run
         doc: |
           Specifies the process to run.


Process generator example (pytoolgen.cwl):

.. code:: yaml

   #!/usr/bin/env cwl-runner
   cwlVersion: v1.0
   $namespaces:
     cwltool: "http://commonwl.org/cwltool#"
   class: cwltool:ProcessGenerator
   inputs:
     script: string
     dir: Directory
   outputs: {}
   run:
     class: CommandLineTool
     inputs:
       script: string
       dir: Directory
     outputs:
       runProcess:
         type: File
         outputBinding:
           glob: main.cwl
     requirements:
       InlineJavascriptRequirement: {}
       cwltool:LoadListingRequirement:
         loadListing: shallow_listing
       InitialWorkDirRequirement:
         listing: |
           ${
            var v = inputs.dir.listing;
            v.push({entryname: "inp.py", entry: inputs.script});
            return v;
           }
     arguments: [python, inp.py]
     stdout: main.cwl


The process generator has two required inputs: "script" and "dir".  It
runs the command line tool listed inline in "run" with the input
object, which is required to have those parameters.  Note: the input
object may contain additional parameters which are intended for the
generated CWL when it is executed.

The command line tool populates the working directory using
InitialWorkDirRequirement.  It uses the listing from 'dir' and adds a
new file literal called "inp.py" which contains the text from the
input parameter "script".  Then it runs "python inp.py".
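
As a rough illustration of that staging step, the Python sketch below
builds a working directory in a similar way: it copies the top-level
entries of a directory and adds a file named "inp.py".  The helper
``stage_workdir`` and its arguments are hypothetical and not part of
cwltool, and a real runner may stage files differently (for example
with symlinks or container mounts rather than copies).

.. code:: python

   import os
   import shutil
   import tempfile


   def stage_workdir(source_dir, script_text):
       """Hypothetical helper: copy the entries of source_dir into a new
       temporary directory and add "inp.py" containing script_text."""
       workdir = tempfile.mkdtemp()
       for name in os.listdir(source_dir):
           src = os.path.join(source_dir, name)
           dst = os.path.join(workdir, name)
           if os.path.isdir(src):
               shutil.copytree(src, dst)
           else:
               shutil.copy(src, dst)
       # Counterpart of v.push({entryname: "inp.py", entry: inputs.script})
       with open(os.path.join(workdir, "inp.py"), "w") as handle:
           handle.write(script_text)
       return workdir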

The output of this command line tool is the File parameter
"runProcess".  In this example, the "inp.py" script, when run, is
expected to print the CWL description to standard output, which will
be captured in the "runProcess" output parameter.

Next, the ProcessGenerator will load the file in the "runProcess"
parameter, which in this example is "main.cwl".  Finally, it will
execute the loaded process with the input object that was originally
provided to the process generator.

The output of the generated process is used as the output for
ProcessGenerator as a whole.
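
The two phases described above can be modelled in a few lines of
Python.  This is only a conceptual sketch: ``run_process_generator``,
``fake_generator`` and ``fake_loader`` are hypothetical names standing
in for the runner's internals, and none of them are cwltool APIs.

.. code:: python

   def run_process_generator(run_generator, load_process, job):
       """Conceptual model: generate a process, then execute it."""
       # Phase 1: run the tool given in "run"; it must produce "runProcess".
       generator_outputs = run_generator(job)
       generated_file = generator_outputs["runProcess"]
       # Phase 2: load the generated CWL and run it with the same input object.
       process = load_process(generated_file)
       return process(job)


   def fake_generator(job):
       # Stand-in for the inline CommandLineTool writing main.cwl.
       return {"runProcess": "main.cwl"}


   def fake_loader(path):
       # Stand-in for loading main.cwl; the "process" reports the "zing" input.
       return lambda job: {"echoed": job["zing"]}


   result = run_process_generator(
       fake_generator, fake_loader, {"script": "...", "zing": "blurf"}
   )

The point of the sketch is that a single input object is used twice:
once by the generator and once by the process it generated.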


Here's an example (zing.cwl) that uses pytoolgen.cwl.

.. code:: yaml

   #!/usr/bin/env cwltool
   {cwl:tool: pytoolgen.cwl, script: {$include: "#attachment-1"}, dir: {class: Directory, location: .}}
   --- |
   import os
   import sys
   print("""
   cwlVersion: v1.0
   class: CommandLineTool
   inputs:
     zing: string
   outputs: {}
   arguments: [echo, $(inputs.zing)]
   """)

The first line ``#!/usr/bin/env cwltool`` means that this file can be
given the executable bit (+x) and then run directly.

This is a multi-part YAML file.  The first section is a CWL input
object.

The input object uses "cwl:tool" to indicate that this input object
should be used as input to execute "pytoolgen.cwl".

The parameter ``script: {$include: "#attachment-1"}`` takes the text
from the second part of the file (following the YAML division marker
``--- |``) and assigns it as a string value to "script".
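
To make the two-part layout concrete, here is a simplified Python
sketch that separates the first part of such a file from the attached
text.  It is an illustration only: cwltool relies on a real YAML parser
rather than on line splitting, and ``split_attachment`` is a
hypothetical helper.

.. code:: python

   def split_attachment(text, marker="--- |"):
       """Split text into (first_part, attachment) at the first marker line."""
       lines = text.splitlines(keepends=True)
       for index, line in enumerate(lines):
           if line.rstrip() == marker:
               return "".join(lines[:index]), "".join(lines[index + 1:])
       # No marker found: the whole text is the first part.
       return text, ""


   head, attachment = split_attachment(
       "{script: {$include: '#attachment-1'}}\n--- |\nimport os\nprint('hi')\n"
   )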

The "dir" parameter is not doing much in this example, but by
capturing the whole directory it allows the Python script to refer to
files in the current directory.

In this example the script trivially prints CWL as a string, but it
could of course do something much more complex: generate code from a
template, select among several possible workflows based on the input,
convert from another workflow language, etc.
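
As a slightly less trivial sketch, a generator script could build the
description as a data structure and serialize it, instead of printing
a fixed string.  The example below writes JSON, which CWL accepts
alongside YAML.  The function ``make_echo_tool`` and the parameter
names passed to it are made up for illustration.

.. code:: python

   import json
   import sys


   def make_echo_tool(param_names):
       """Return a CommandLineTool description that echoes its string inputs."""
       return {
           "cwlVersion": "v1.0",
           "class": "CommandLineTool",
           "inputs": {name: "string" for name in param_names},
           "outputs": {},
           "arguments": ["echo"] + ["$(inputs.%s)" % name for name in param_names],
       }


   if __name__ == "__main__":
       # Print the generated description to standard output, where the
       # process generator in this example expects to find it.
       json.dump(make_echo_tool(["zing"]), sys.stdout, indent=2)

Calling ``make_echo_tool(["zing"])`` yields the same tool as the string
printed in zing.cwl; passing more names adds more inputs and arguments.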

When this is executed, the following steps happen:

#. pytoolgen.cwl is loaded and executed with the first part of the
   file as the input object.

#. The "script" parameter contains the contents of the second part.
   The inline command line tool creates a file called "inp.py" with
   the contents of "script".

#. The inline command line tool runs python on "inp.py" and collects
   the output, which is a CWL description for a trivial "echo" tool.

#. The process generator loads the generated CWL description and
   executes it with any additional parameters given in the input
   object or on the command line.


Example runs
------------

Note: this requires the ``cwltool`` flags ``--enable-ext`` and
``--enable-dev``.

You can set these with the ``CWLTOOL_OPTIONS`` environment variable:

.. code::

   $ export CWLTOOL_OPTIONS="--enable-dev --enable-ext"

   $ ./zing.cwl
   INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
   INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
   INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] /tmp/8sez90gb$ python \
       inp.py > /tmp/8sez90gb/main.cwl
   INFO [job d3626216-d7d8-4322-bc21-4d469634cc9a] completed success
   usage: ./zing.cwl [-h] --zing ZING [job_order]
   ./zing.cwl: error: the following arguments are required: --zing


.. code::

   $ ./zing.cwl --zing blurf
   INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
   INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
   INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] /tmp/ujff239o$ python \
       inp.py > /tmp/ujff239o/main.cwl
   INFO [job a580b69d-2b88-4268-904e-ed105ba7c85e] completed success
   INFO [job main.cwl] /tmp/f_7bxncq$ echo \
       blurf
   blurf
   INFO [job main.cwl] completed success
   {
       "runProcess": {
           "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
           "basename": "main.cwl",
           "class": "File",
           "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
           "size": 111,
           "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
       }
   }
   INFO Final process status is success

.. code::

   $ echo "zing: zoop" > job.yml
   $ ./zing.cwl job.yml
   INFO /home/peter/work/cwltool/venv3/bin/cwltool 3.1.20211112163758
   INFO Resolved './zing.cwl' to 'file:///home/peter/work/cwltool/tests/wf/generator/zing.cwl'
   INFO [job 9073a083-dc79-4719-8762-1c024480605c] /tmp/meeo3d19$ python \
       inp.py > /tmp/meeo3d19/main.cwl
   INFO [job 9073a083-dc79-4719-8762-1c024480605c] completed success
   INFO [job main.cwl] /tmp/2pqdz5nq$ echo \
       zoop
   zoop
   INFO [job main.cwl] completed success
   {
       "runProcess": {
           "location": "file:///home/peter/work/cwltool/tests/wf/generator/main.cwl",
           "basename": "main.cwl",
           "class": "File",
           "checksum": "sha1$8c160b680fb2cededef3228a53425e595b8cdf48",
           "size": 111,
           "path": "/home/peter/work/cwltool/tests/wf/generator/main.cwl"
       }
   }
   INFO Final process status is success
