Monitoring a Job via R-GMA

After a job has been submitted and has reached a resource broker, that resource broker tracks its progress. The resource broker periodically publishes the status of the job via R-GMA, and publishes changes of state immediately. These data are then available to grid users through the R-GMA CLI or API. A query can either return the current state of a job or stream updates as they are published.

To issue an R-GMA query you must use a node on which the R-GMA client is installed and correctly configured (pointing to your local R-GMA server), as is the case on worker nodes and user interfaces. You must also have a valid grid certificate. To test that R-GMA is installed and configured correctly, run the client check:

$ rgma-client-check

*** Running R-GMA client tests on host.domain ***

Checking C API: Success
Checking C++ API: Success
Checking Command-line API: Success
Checking Java API: Success
Checking Python API: Success

*** R-GMA client test successful ***
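
If the client check is run from a script, its output can be scanned for any API that did not pass. The following is a minimal sketch in Python, not part of R-GMA itself. It assumes every result line has the form "Checking <name>: <status>" as in the output above, and it treats any status other than "Success" as a failure; the exact wording of a failure line is an assumption. The function name failed_checks is hypothetical.

```python
import re

def failed_checks(output):
    """Return the names of the APIs whose check did not report 'Success'.

    Assumes result lines look like 'Checking <name>: <status>'; the wording
    of a failing status is not known here, so anything other than 'Success'
    is counted as a failure.
    """
    failures = []
    for line in output.splitlines():
        match = re.match(r"\s*Checking (.+?): (.+?)\s*$", line)
        if match and match.group(2) != "Success":
            failures.append(match.group(1))
    return failures

# Sample text copied from the rgma-client-check output shown above.
sample = """*** Running R-GMA client tests on host.domain ***

Checking C API: Success
Checking C++ API: Success
Checking Command-line API: Success
Checking Java API: Success
Checking Python API: Success

*** R-GMA client test successful ***
"""

print(failed_checks(sample))  # prints []
```

An empty list means every listed API passed, so a script could go on to issue queries only in that case.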

      
      

Retrieving the status of a job via R-GMA

The status of a job can be retrieved via one of the R-GMA APIs, the R-GMA browser or the R-GMA CLI. The examples below use the R-GMA CLI.

Obtaining the current status of one or all of your jobs

$ rgma -c "select * from JobStatusRaw where Job_id = 'myjobid'"
+---------+-----+-------+-----------------------------+--------------------------------------------------+--------------------+------------------------------------------------------+----------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
| Job_id  | VO  | State | StateInfo                   | Owner                                            | BKServer           | NetworkServer                                        | Destination                            | Condor_ID | StateEnterTime | DoneCode | UIHost | MeasurementDate | MeasurementTime |
+---------+-----+-------+-----------------------------+--------------------------------------------------+--------------------+------------------------------------------------------+----------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
| myjobid | cms | Done  | Job terminated successfully | /C=XX/O=XXX/OU=Personal Certificate/L=XXX/CN=XXX | lb104.cern.ch:9000 | https://128.142.160.93:7443/glite_wms_wmproxy_server | host.domain:2119/jobmanager-lcgpbs-cms | 123456    | 1205368802     | NULL     | None   | 2008-03-13      | 00:40:02        |
+---------+-----+-------+-----------------------------+--------------------------------------------------+--------------------+------------------------------------------------------+----------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
1 rows
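
In the sample row above, the StateEnterTime value 1205368802 corresponds to 2008-03-13 00:40:02 UTC when read as seconds since the Unix epoch, which matches the MeasurementDate and MeasurementTime columns of that row. On the assumption that StateEnterTime is always a Unix timestamp in seconds, a short Python sketch for converting it to a readable time could look like this (the function name state_enter_time_utc is hypothetical):

```python
from datetime import datetime, timezone

def state_enter_time_utc(epoch_seconds):
    """Convert a StateEnterTime value to a timezone-aware UTC datetime.

    Assumes the value is seconds since the Unix epoch, which is consistent
    with the sample rows but is not confirmed by the text.
    """
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)

# Value taken from the sample row above.
print(state_enter_time_utc("1205368802").isoformat())
# prints 2008-03-13T00:40:02+00:00
```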

or

$ rgma -c "select * from JobStatusRaw where Owner = 'mydn'"
+----------------------------------+-----+---------+-------------------------------+-------+--------------------+-------------------------------------------------------+-------------------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
| Job_id                           | VO  | State   | StateInfo                     | Owner | BKServer           | NetworkServer                                         | Destination                                     | Condor_ID | StateEnterTime | DoneCode | UIHost | MeasurementDate | MeasurementTime |
+----------------------------------+-----+---------+-------------------------------+-------+--------------------+-------------------------------------------------------+-------------------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
| https://lb104.cern.ch:9000/XXXX1 | ops | Cleared | user retrieved output sandbox | mydn  | lb104.cern.ch:9000 | https://128.142.173.154:7443/glite_wms_wmproxy_server | hera-ce0.desy.de:2119/jobmanager-lcgpbs-default | 123456    | 1205484784     | NULL     | None   | 2008-03-14      | 08:53:04        |
| https://lb104.cern.ch:9000/XXXX2 | ops | Cleared | user retrieved output sandbox | mydn  | lb104.cern.ch:9000 | https://128.142.173.154:7443/glite_wms_wmproxy_server | dangus.itpa.lt:2119/jobmanager-lcgpbs-sdj       | 123457    | 1205394654     | NULL     | None   | 2008-03-13      | 07:50:54        |
| https://lb105.cern.ch:9000/XXXX3 | ops | Waiting | None                          | mydn  | lb105.cern.ch:9000 | https://128.142.160.94:7443/glite_wms_wmproxy_server  | None                                            | 0         | 1205493727     | NULL     | None   | 2008-03-14      | 11:22:07        |
| https://lb105.cern.ch:9000/XXXX4 | ops | Aborted | request expired               | mydn  | lb105.cern.ch:9000 | https://128.142.160.94:7443/glite_wms_wmproxy_server  | None                                            | 0         | 1205413078     | NULL     | None   | 2008-03-13      | 12:57:58        |
+----------------------------------+-----+---------+-------------------------------+-------+--------------------+-------------------------------------------------------+-------------------------------------------------+-----------+----------------+----------+--------+-----------------+-----------------+
4 rows
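
When the CLI output is processed by a script rather than read by eye, the table can be turned into one record per job. The sketch below is an illustration in Python, not an R-GMA facility. It assumes the output uses the bordered layout shown above and that no cell value contains a "|" character. The function name parse_rgma_table is hypothetical, and the sample text is an abridged copy of the table above with only three of its columns.

```python
from collections import Counter

def parse_rgma_table(text):
    """Parse a bordered CLI table into a list of dicts keyed by column name.

    Lines that do not start with '|' (borders, the row count, blank lines)
    are skipped. The first '|' line is taken as the header row.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows

# Abridged sample: three columns and two rows from the output above.
sample = """\
+----------------------------------+---------+-------------------------------+
| Job_id                           | State   | StateInfo                     |
+----------------------------------+---------+-------------------------------+
| https://lb104.cern.ch:9000/XXXX1 | Cleared | user retrieved output sandbox |
| https://lb105.cern.ch:9000/XXXX3 | Waiting | None                          |
+----------------------------------+---------+-------------------------------+
2 rows
"""

jobs = parse_rgma_table(sample)
# Count how many jobs are in each state.
print(dict(Counter(job["State"] for job in jobs)))
# prints {'Cleared': 1, 'Waiting': 1}
```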

Streaming data about one or all of your jobs

You can specify how long the query should stream data. The timeout value can be given in seconds, minutes, hours or days.

$ rgma -c "set query continuous" -c "set timeout 20 minutes" -c "select * from JobStatusRaw where Job_id = 'myjobid'"

or

$ rgma -c "set query continuous" -c "set timeout 20 minutes" -c "select * from JobStatusRaw where Owner = 'mydn'"
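
If streaming queries are launched from a script, the command line can be assembled programmatically. The following Python sketch only builds the argument list for the two commands shown above; it does not run anything. The function name build_stream_command and its parameters are hypothetical. It accepts only the plural unit names listed above, since other spellings are not confirmed by this text, and it rejects values containing a single quote rather than guessing how the CLI escapes them.

```python
VALID_UNITS = ("seconds", "minutes", "hours", "days")
VALID_COLUMNS = ("Job_id", "Owner")

def build_stream_command(column, value, timeout, unit="minutes"):
    """Build the rgma argument list for a continuous JobStatusRaw query.

    Mirrors the commands shown above: one -c option to make the query
    continuous, one to set the timeout, and one for the select statement.
    """
    if column not in VALID_COLUMNS:
        raise ValueError("column must be one of %s" % (VALID_COLUMNS,))
    if unit not in VALID_UNITS:
        raise ValueError("unit must be one of %s" % (VALID_UNITS,))
    if "'" in value:
        raise ValueError("value must not contain a single quote")
    query = "select * from JobStatusRaw where %s = '%s'" % (column, value)
    return [
        "rgma",
        "-c", "set query continuous",
        "-c", "set timeout %d %s" % (timeout, unit),
        "-c", query,
    ]

# Placeholder job id, as in the examples above.
command = build_stream_command("Job_id", "myjobid", 20)
print(command)
```

On a node where the R-GMA client is installed, the resulting list could be passed to Python's subprocess.run to start the streaming query.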