Notes about Grid - Part 2: Using the Grid (060604 ver. 2.0)
*****************************************************************
(a summary of my personal knowledge of the Grid computing environment)

In these notes I describe the procedure I followed in order to use the Grid (http://en.wikipedia.org/wiki/Grid_computing) to submit jobs. The information I have used is taken from the web, mostly from http://grid-it.cnaf.infn.it and from its links (in particular https://edms.cern.ch/document/454439/2 and http://grid-it.cnaf.infn.it/fileadmin/users/grid-experience/grid-experience.html). Some other reference links are given below: please note that the information they contain doesn't always describe the up-to-date status of the Grid computing environment.

This file lists some of the commands which allow you to "USE" the Grid. Before reading it you should have read the file grid-HOWTO-1.txt, which describes the installation procedure.

Manual pages of the software listed in this document are usually available by typing commands like

[bongi@alterra bongi]$ man NAME_OF_THE_COMMAND

or

[bongi@alterra bongi]$ NAME_OF_THE_COMMAND -h

or even

[bongi@alterra bongi]$ NAME_OF_THE_COMMAND --help

This can be a valuable source of information, especially since different pieces of software often follow different conventions for the same kind of command line argument.

Lots of acronyms are involved when speaking about the Grid. I have tried to clarify their meaning below where needed: the page http://grid-it.cnaf.infn.it/fileadmin/users/dictionary/dictionary.html contains some of them and can be useful too.

These instructions refer to my Linux Red Hat 9 computing system: I can guarantee neither that these are the best steps to follow, nor that they work with other Linux distributions, but they can serve as a rough guide. I'm not a Grid expert, but if you need to contact me about this (sort of) guide you can find me at bongi@fi.infn.it.
Good luck and have fun!
*****************************************************************


USING THE GRID
--------------

After everything is properly set up, you can finally access the Grid.

N.B. The following commands are available in a shell where the UIPnP initialization script has been executed. If you decided not to make it start automatically at login (e.g. I don't make it start automatically), be sure to execute it each time in the shell you are using (see the end of section 1 of grid-HOWTO-1.txt):

[bongi@alterra bongi]$ source /afs/infn.it/project/infngrid/ui/UIPnP/UIPnP-AFS-Client.sh

-----------------------------------------------------------------

*********
*   1   * Generating a proxy certificate.
*********

A proxy certificate is a time-limited certificate that you need in order to access computing resources on the Grid. It can be generated in the following way:

[bongi@alterra bongi]$ voms-proxy-init -voms pamela
Wrong ownership on file: /afs/infn.it/project/infngrid/ui/UIPnP/files/opt/glite/etc/vomses
Expected: either (0,0) or (UID, GID) = (0, 0)
Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it
Enter GRID pass phrase:
Your proxy is valid until Mon May 29 09:28:54 2006
Creating temporary proxy ....................................................... Done
Contacting voms.cnaf.infn.it:15013 [/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it] "pamela"
Warning: Wrong ownership on file: /afs/infn.it/project/infngrid/ui/UIPnP/files/opt/glite/etc/vomses
Expected: either (0,0) or (UID, GID) = (0, 0)
Done
Creating proxy .................................. Done
Your proxy is valid until Mon May 29 09:28:54 2006

The required "GRID pass phrase" is the "PEM pass phrase" you set when you created the .pem files (see grid-HOWTO-1.txt, section 2). Please note that the "Warning" messages about wrong file ownership are the normal behavior of the command when used via AFS (as stated in /afs/infn.it/project/infngrid/ui/UIPnP/README).
The proxy certificate usually lasts 12 hours. Its validity can be checked with: [bongi@alterra bongi]$ voms-proxy-info -all subject : /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it/CN=proxy issuer : /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it identity : /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it type : proxy strength : 512 bits path : /tmp/x509up_u502 timeleft : 11:35:08 === VO pamela extension information === VO : pamela subject : /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it issuer : /C=IT/O=INFN/OU=Host/L=CNAF/CN=voms.cnaf.infn.it attribute : /pamela/Role=NULL/Capability=NULL timeleft : 11:35:11 and it can be destroyed before its standard expiry time with: [bongi@alterra bongi]$ voms-proxy-destroy If your proxy certificate expires while a job is still running, the job will be aborted. This can be avoided by making a "myproxy" server automatically renew your certificate before it expires. The command is: [bongi@alterra bongi]$ myproxy-init -d -n Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it Enter GRID pass phrase for this identity: Creating proxy ..................................................................................... Done Proxy Verify OK Your proxy is valid until: Mon May 29 10:46:48 2006 A proxy valid for 168 hours (7.0 days) for user /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it now exists on myproxy.cnaf.infn.it. and it allows your jobs to stay on the Grid for 7 days. 
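Since an expired proxy causes running jobs to be aborted, it can be handy to check the remaining lifetime from a script before submitting anything. The following is only a sketch under assumptions: the function names are hypothetical (they are not part of the VOMS tools), and the parsing assumes that the output of "voms-proxy-info -all" contains a "timeleft : HH:MM:SS" line, as in the listing above.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of the Grid tools.

# Convert an HH:MM:SS string (hours may exceed 24) into seconds.
timeleft_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Read "voms-proxy-info -all" style text on stdin and print the first
# "timeleft" value (the proxy one, listed before the VO extension one).
first_timeleft() {
    awk '$1 == "timeleft" { print $3; exit }'
}

# Succeed if the proxy described on stdin has at least $1 seconds left.
proxy_has_at_least() {
    left=$(first_timeleft)
    [ -n "$left" ] && [ "$(timeleft_to_seconds "$left")" -ge "$1" ]
}
```

A possible use, with an arbitrary 2 hour threshold, would be: voms-proxy-info -all | proxy_has_at_least 7200 || echo "Proxy about to expire"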
To check the validity of the myproxy credential or to destroy it, the available commands are respectively:

[bongi@alterra bongi]$ myproxy-info -d
username: /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it
owner: /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it
timeleft: 167:59:24 (7.0 days)

[bongi@alterra bongi]$ myproxy-destroy -d
Default MyProxy credential for user /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it was successfully removed.

-----------------------------------------------------------------

*********
*   2   * Submitting a job.
*********

The information which is needed in order to submit a job has to be written in a .jdl (Job Description Language) text file. A minimal .jdl file looks like this:

[bongi@alterra bongi]$ more mytest.jdl
# --> THIS IS A COMMENT
Executable = "/bin/echo";
Arguments = "Ciriciao!";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = "";
OutputSandbox = {"std.out","std.err"};

The syntax consists of statements ended by a semicolon, like:

ATTRIBUTE = VALUE;

where VALUE may be a single value, a list of values or a complex expression. Each value is embedded in double quotes; lists of values are enclosed in curly braces and separated by commas; comments must be preceded by a # character; no blank characters or tabs should follow the semicolon at the end of a line.

The "Executable" attribute contains the path+filename of the job to be executed: any command line argument has to be specified by means of the "Arguments" attribute. The standard output and error of the job are redirected into the files specified in the "StdOutput" and "StdError" fields. "InputSandbox" and "OutputSandbox" specify the files which have to be copied from your User Interface to the Grid Worker Node (WN) where the job actually runs, and vice versa. They are intended for relatively small files (a few tens of megabytes at most)
like executables, scripts, standard input and standard output. Neither of them can contain two files with the same name (even if in different paths!) as when transferred they would overwrite each other. Executables which are transferred by the InputSandbox lose their "executable flag", so a "chmod +x EXECUTABLE" command should be performed (typically by a script) on the WN. If not specified by the user in the .jdl file, the Computing Element (CE, usually a PC farm) where the job is executed is chosen automatically. The list of the CEs which are available to your virtual organization, and which comply with the specified .jdl attributes can be obtained with: [bongi@alterra bongi]$ edg-job-list-match mytest.jdl Selected Virtual Organisation name (from proxy certificate extension): pamela Connecting to host egee-rb-01.cnaf.infn.it, port 7772 *************************************************************************** COMPUTING ELEMENT IDs LIST The following CE(s) matching your job requirements have been found: *CEId* ce2.egee.unile.it:2119/jobmanager-lcgpbs-grid gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-grid gridce.sns.it:2119/jobmanager-lcgpbs-grid spaci01.na.infn.it:2119/jobmanager-lcglsf-grid grid012.ct.infn.it:2119/jobmanager-lcglsf-short grid012.ct.infn.it:2119/jobmanager-lcglsf-long gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-gridit grid012.ct.infn.it:2119/jobmanager-lcglsf-infinite gridce.pi.infn.it:2119/jobmanager-lcgpbs-grid spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-grid atlasce01.na.infn.it:2119/jobmanager-lcgpbs-grid griditce01.na.infn.it:2119/jobmanager-lcgpbs-grid grid0.fe.infn.it:2119/jobmanager-lcgpbs-grid grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-long grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-infinite grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-short gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long 
***************************************************************************

The command used to submit a job is:

[bongi@alterra bongi]$ edg-job-submit -o jobId_list.txt mytest.jdl
**** Warning: UI_CAN_NOT_EXECUTE ****
Unable to execute "Python Tkinter Graphical":
Unable to load library.

Selected Virtual Organisation name (from proxy certificate extension): pamela
Connecting to host egee-rb-01.cnaf.infn.it, port 7772
Logging to host egee-rb-01.cnaf.infn.it, port 9002

================================ edg-job-submit Success =====================================
 The job has been successfully submitted to the Network Server.
 Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

 - https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A

 The edg_jobId has been saved in the following file:
 /home/bongi/work/pamela/grid/test/mytest.jdl_jobId
=============================================================================================

where the "-o FILENAME" option writes the job identifier (the "https://LB_SERVER_ADDRESS:PORT/UNIQUE_ID" string, whose address is that of the Logging and Bookkeeping (LB) server which tracks the job, here egee-rb-01.cnaf.infn.it, and not that of a Computing Element) into a text file, in order to keep track of it. Please note that the "Warning" messages are the normal behavior of the command when used via AFS (as stated in /afs/infn.it/project/infngrid/ui/UIPnP/README).
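The JDL syntax rules described in this section (double-quoted values, lists in curly braces, nothing after the final semicolon) are easy to get wrong by hand, so a small generator can help. The sketch below is only an illustration under assumptions: make_jdl is a hypothetical helper, not one of the edg-* tools, and it covers just the attributes used in mytest.jdl, for the case in which the executable is a script shipped through the InputSandbox.

```shell
#!/bin/sh
# Sketch only: make_jdl is a hypothetical helper, not part of the edg-* tools.

# Print a minimal .jdl for a script shipped through the InputSandbox.
#   $1 = script to run on the WN
#   $2 = optional argument string for the script
make_jdl() {
    script=$1
    args=${2:-}
    # printf ends each statement exactly at the semicolon, since no
    # blank characters or tabs may follow it.
    printf 'Executable = "%s";\n' "$(basename "$script")"
    if [ -n "$args" ]; then
        printf 'Arguments = "%s";\n' "$args"
    fi
    printf 'StdOutput = "std.out";\n'
    printf 'StdError = "std.err";\n'
    printf 'InputSandbox = {"%s"};\n' "$script"
    printf 'OutputSandbox = {"std.out","std.err"};\n'
}
```

For example, "make_jdl myscript.sh > mytest_script.jdl" would write a file that can then be passed to edg-job-submit as shown above.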
Once a job has been submitted, its status can be checked with:

[bongi@alterra bongi]$ edg-job-status -i jobId_list.txt
------------------------------------------------------------------
1 : https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A
2 : https://egee-rb-01.cnaf.infn.it:9000/vji-ZOdf2vRzlDQbi19m6Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-2]all:

The result is something like:

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A
Current Status:     Running
Status Reason:      Job successfully submitted to Globus
Destination:        ce2.egee.unile.it:2119/jobmanager-lcgpbs-grid
reached on:         Thu May 18 16:16:35 2006
*************************************************************

where the "Current Status" flag can be:

SUBMITTED   submission logged in the Logging and Bookkeeping Service (LB)
WAIT        job match making for resources
READY       job being sent to executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORT       job aborted by middleware, check reason

When a job is "Done", you can retrieve the output with:

[bongi@alterra bongi]$ edg-job-get-output -i jobId_list.txt

Retrieving files from host: egee-rb-01.cnaf.infn.it ( for https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A )

*********************************************************************************
                        JOB GET OUTPUT OUTCOME

 Output sandbox files for the job:
 - https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A
 have been successfully retrieved and stored in the directory:
 /home/bongi/work/pamela/grid/GRIDjobOutput/bongi_L04vN1skGuyxYJTTc1zR6A

*********************************************************************************

The output will be put in the directory chosen in the $HOME/.globus/AFS-UIPnP.conf file. In my case the output was:

[bongi@alterra bongi_L04vN1skGuyxYJTTc1zR6A]$ more std.*
::::::::::::::
std.err
::::::::::::::
::::::::::::::
std.out
::::::::::::::
Ciriciao!

The job output has to be retrieved within a certain period (the standard seems to be 10 days), since it will be automatically deleted after that.

To cancel a job you can use this command:

[bongi@alterra bongi]$ edg-job-cancel -i jobId_list.txt
------------------------------------------------------------------
1 : https://egee-rb-01.cnaf.infn.it:9000/L04vN1skGuyxYJTTc1zR6A
2 : https://egee-rb-01.cnaf.infn.it:9000/vji-ZOdf2vRzlDQbi19m6Q
a : all
q : quit
------------------------------------------------------------------
Choose one or more edg_jobId(s) in the list - [1-2]all:2

Are you sure you want to remove specified job(s)? [y/n]n :y

============================= edg-job-cancel Success ==============================
 The cancellation request has been successfully submitted for the following job(s):

 - https://egee-rb-01.cnaf.infn.it:9000/vji-ZOdf2vRzlDQbi19m6Q
=====================================================================================

If a job submission runs into problems, the following command can help to understand what is going on:

[bongi@alterra bongi]$ edg-job-get-logging-info -v 1 JOBID
[...]
Event: Abort
- host = egee-rb-01.cnaf.infn.it
- reason = Failure while executing job wrapper
- source = LogMonitor
- src_instance = unique
- timestamp = Wed May 24 10:52:14 2006
- user = /C=IT/O=INFN/OU=Personal Certificate/L=Firenze/CN=Massimo Bongi/Email=bongi@fi.infn.it
[...]

A usual way to execute a program is by means of a shell script.
In this case the .jdl could look like this:

[bongi@alterra test]$ more mytest_script.jdl
Executable = "myscript.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"myscript.sh"};
OutputSandbox = {"std.out","std.err"};

If you used this simple script:

[bongi@alterra bongi]$ more myscript.sh
#!/bin/sh
/bin/echo "Ciriciao from script!"
/bin/ls thisfiledoesntexist.here

you would get this output:

[bongi@alterra bongi_6uVNNExCW06CYRGE_5yBig]$ more std.*
::::::::::::::
std.err
::::::::::::::
/bin/ls: thisfiledoesntexist.here: No such file or directory
::::::::::::::
std.out
::::::::::::::
Ciriciao from script!

A more useful script could print the hostname and the date, so as to keep track of where and when the job has been executed:

#!/bin/sh
#
echo -n "Start date is: "
date
echo "-------------"
echo "Job running on WN: "`hostname`
echo "-------------"
#[...]
echo -n "End date is: "
date

Other valuable sources of examples, hints and strategies about submitting jobs (e.g. setting environment variables, compiling on the WN) and about the Grid in general can be found at:

http://grid-it.cnaf.infn.it/fileadmin/users/job-submission/job_submission.html
https://edms.cern.ch/file/454439/LCG-2-UserGuide.html
http://grid-it.cnaf.infn.it/fileadmin/users/grid-experience/grid-experience.html
https://grid-it.cnaf.infn.it/cdsagenda/fullAgenda.php?ida=a0440

while a complete review of the JDL language and its attributes can be found here:

http://server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_2.pdf

-----------------------------------------------------------------

*********
*   3   * Interactive jobs.
*********

I have also tried "interactive" jobs, but it seems they are not supported by every CE (on some of them they simply won't start). In any case, as far as I understand, "interactive" doesn't mean that you can get a full remote shell on the WN, but just that the standard input of your executable script is redirected to your shell on the user interface.
In particular I have not been able to get an interactive PAW or ROOT session (but I cannot exclude this is just due to my limited knowledge of the Grid!), but only to run "interactive" scripts like: [bongi@alterra bongi]$ more myscript_interactive.sh #!/bin/sh echo -n "Please tell me your name: " read name echo "That is all, $name." echo "Bye bye." exit 0 The .jdl file in this case is something like: [bongi@alterra bongi]$ more mytest_interactive.jdl JobType = "Interactive"; Executable = "myscript_interactive.sh"; InputSandbox = "myscript_interactive.sh"; Anyway PAW or ROOT can be easily used in batch mode (paw -b MACRO_NAME and root -b MACRO_NAME) to run macros on the Grid. A problem I had with interactive jobs was about my /etc/hosts file, which contained a line like: [root@alterra root]# more /etc/hosts # Do not remove the following line, or various programs # that require network functionality will fail. 127.0.0.1 alterra.fi.infn.it alterra localhost.localdomain localhost In order to get interactive jobs to work I had to put there my IP address: [root@alterra root]# more /etc/hosts # Do not remove the following line, or various programs # that require network functionality will fail. 193.206.190.12 alterra.fi.infn.it alterra localhost.localdomain localhost ----------------------------------------------------------------- ********* * 4 * Grid resources. 
********* A list of the Computing Elements (CEs) which are available to a Virtual Organization can be obtained with: [bongi@alterra bongi]$ lcg-infosites --vo pamela ce **************************************************************** These are the related data for pamela: (in terms of queues and CPUs) **************************************************************** #CPU Free Total Jobs Running Waiting ComputingElement ---------------------------------------------------------- 10 3 7 7 0 gridce.sns.it:2119/jobmanager-lcgpbs-grid 38 2 34 16 18 gridce.pi.infn.it:2119/jobmanager-lcgpbs-grid 8 0 25 8 17 grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-long 134 1 74 56 18 gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long 8 0 16 11 5 grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-short 36 24 48 8 40 grid0.fe.infn.it:2119/jobmanager-lcgpbs-grid 134 1 3 3 0 gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short 4 3 1 1 0 gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-grid 8 6 0 0 0 grid001.ts.infn.it:2119/jobmanager-lcglsf-grid 8 0 16 9 7 grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-infinite 22 1 0 0 0 grid002.ca.infn.it:2119/jobmanager-lcgpbs-grid 134 1 320 67 253 gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite 34 1 18 2 16 griditce01.na.infn.it:2119/jobmanager-lcgpbs-grid 28 10 0 0 0 ce2.egee.unile.it:2119/jobmanager-lcgpbs-grid 120 85 0 0 0 spaci01.na.infn.it:2119/jobmanager-lcglsf-grid 3 0 3 2 1 spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-grid 32 0 68 0 68 atlasce01.na.infn.it:2119/jobmanager-lcgpbs-grid 10 1 4 0 4 gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-gridit With more verbose output we can get additional information on each of them, like: [bongi@alterra bongi]$ lcg-infosites --vo pamela -v 2 ce ************************************************************** These are the related data for pamela: (in terms of CEs) ************************************************************** RAMMemory Operating System System Version Processor CE Name 
------------------------------------------------------------------------------------------------------------------------- 1024 ScientificLinuxCERN SL Xeon atlasce01.na.infn.it 4101216 ScientificLinuxCERN SL Itanium2 ce2.egee.unile.it 2048 ScientificLinuxCERN SL PIV grid-ce.lns.infn.it 1024 ScientificLinuxCERN SL Xeon grid0.fe.infn.it 1024 SLC 3 252 grid001.ts.infn.it 1024 ScientificLinuxCern SL PIV grid002.ca.infn.it 1024 ScientificLinuxCERN 3 PIII gridba2.ba.infn.it 1024 ScientificLinuxCERN SL Athlon gridce.ilc.cnr.it 1024 ScientificLinuxCERN 3 Opteron gridce.pi.infn.it 1024 ScientificLinuxCERN SL Opteron gridce.sns.it 1024 ScientificLinuxCERN SL PIV gridit-ce-001.cnaf.infn.it 1024 ScientificLinuxCERN 3 Xeon griditce01.na.infn.it 4122160 RedHatEnterpriseAS Baboon Itanium2 spaci01.na.infn.it 639504 ScientificLinuxCERN SL PIV spacin-ce1.dma.unina.it A similar command gives the list of the available Storage Elements (SEs): [bongi@alterra bongi]$ lcg-infosites --vo pamela se ************************************************************** These are the related data for pamela: (in terms of SE) ************************************************************** Avail Space(Kb) Used Space(Kb) Type SEs ---------------------------------------------------------- 62843880 8181900 n.a gridse.sns.it 357894928 130386336 n.a cmsse1.pi.infn.it 8102552 1694368 n.a gridse.pi.infn.it 495993888 3385392 n.a grid-se.lns.infn.it 274384284 458016512 n.a gridba6.ba.infn.it 1025997692 28843652 n.a grid2.fe.infn.it 398112136 2014492 n.a gridse.ilc.cnr.it 2295834880 46829312 n.a grid002.ts.infn.it 887450456 921887208 n.a grid007g.cnaf.infn.it 274384284 458016512 n.a gridba6.ba.infn.it 454108080 37719956 n.a grid003.ca.infn.it 47465872 62073748 n.a se01-lcg.cr.cnaf.infn.it 110943892 801657812 n.a griditse01.na.infn.it 17451280 13331928 n.a ce1.egee.unile.it 2201842691 3636748704 n.a pccms2.cmsfarm1.ba.infn.it 75330000 800000 n.a pccms5.cmsfarm1.ba.infn.it 41748812 24704308 n.a spaci02.na.infn.it 
2717399980 19772620 n.a atlasse01.na.infn.it If your data are on a certain SE, it is usually a good idea to run your jobs on a "close" CE (typically in the same network domain): [bongi@alterra bongi]$ lcg-infosites --vo pamela closeSE Name of the CE: gridce.sns.it:2119/jobmanager-lcgpbs-grid Name of the close SE: gridse.sns.it Name of the CE: gridce.pi.infn.it:2119/jobmanager-lcgpbs-grid Name of the close SE: gridse.pi.infn.it Name of the close SE: cmsse1.pi.infn.it Name of the CE: grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-long Name of the close SE: grid-se.lns.infn.it Name of the CE: gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long Name of the close SE: gridba6.ba.infn.it Name of the close SE: pccms5.cmsfarm1.ba.infn.it Name of the close SE: pccms2.cmsfarm1.ba.infn.it Name of the CE: grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-short Name of the close SE: grid-se.lns.infn.it Name of the CE: grid0.fe.infn.it:2119/jobmanager-lcgpbs-grid Name of the close SE: grid2.fe.infn.it Name of the CE: gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short Name of the close SE: gridba6.ba.infn.it Name of the close SE: pccms5.cmsfarm1.ba.infn.it Name of the close SE: pccms2.cmsfarm1.ba.infn.it Name of the CE: gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-grid Name of the close SE: gridse.ilc.cnr.it Name of the CE: grid001.ts.infn.it:2119/jobmanager-lcglsf-grid Name of the close SE: grid002.ts.infn.it Name of the CE: grid-ce.lns.infn.it:2119/jobmanager-lcgpbs-infinite Name of the close SE: grid-se.lns.infn.it Name of the CE: grid002.ca.infn.it:2119/jobmanager-lcgpbs-grid Name of the close SE: grid003.ca.infn.it Name of the CE: gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite Name of the close SE: gridba6.ba.infn.it Name of the close SE: pccms5.cmsfarm1.ba.infn.it Name of the close SE: pccms2.cmsfarm1.ba.infn.it Name of the CE: griditce01.na.infn.it:2119/jobmanager-lcgpbs-grid Name of the close SE: griditse01.na.infn.it Name of the CE: ce2.egee.unile.it:2119/jobmanager-lcgpbs-grid Name of the 
close SE: ce1.egee.unile.it

Name of the CE: spaci01.na.infn.it:2119/jobmanager-lcglsf-grid
Name of the close SE: spaci02.na.infn.it

Name of the CE: spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-grid
Name of the close SE: gridba6.ba.infn.it

Name of the CE: atlasce01.na.infn.it:2119/jobmanager-lcgpbs-grid
Name of the close SE: atlasse01.na.infn.it

Name of the CE: gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-gridit
Name of the close SE: grid007g.cnaf.infn.it

The choice of the best CE for executing your job is done automatically (also considering the "closeness" of the SE which contains data used by the job). The user can specify requirements about CEs in the .jdl file by means of more or less elaborate selections, for example:

Requirements = other.CEId=="gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-gridit";

which asks for a particular CE, or:

Requirements = RegExp("cnaf.infn.it",other.CEId);

which selects any CE whose name contains the string "cnaf.infn.it". A list of the CE attributes which can be used for the selection can be obtained with:

[bongi@alterra bongi]$ lcg-info --list-attrs

lcg-info can also be used for other tasks, as in this example:

[bongi@alterra bongi]$ lcg-info --list-ce --attrs 'RunningJobs,WaitingJobs' --query 'CE=gridit-ce-001.cnaf*'

Several websites exist which provide information about Grid resources and/or submitted jobs. A list can be found at the end of this file.

-----------------------------------------------------------------

*********
*   5   * Handling data on the Grid.
*********

The following section is based mainly on Chapter 7 of https://edms.cern.ch/document/454439/2: it is recommended that you read it in order to get more complete information on how data are managed on the Grid (things are a bit more complicated than the way I present them here...). As usual, useful information about commands can be obtained from their help or man pages.

A file inside a Storage Element (SE) can be identified in four different ways.
By its Grid Unique IDentifier (GUID), like:

guid:UNIQUE_STRING
e.g. guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d

by its Storage URL (SURL), also known as Physical File Name (PFN):

(sfn|srm)://SE_HOSTNAME/SOME_STRING
e.g. sfn://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
e.g. srm://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2004-09-15/file24e3227a-cb1b-4826-9e5c-07dfb9f257a6

by its Transport URL (TURL):

PROTOCOL://SOME_STRING
e.g. gsiftp://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d

or finally by its Logical File Name (LFN), or Alias:

lfn:/grid/VO_NAME/PATH_TO/FILENAME
e.g. lfn:/grid/pamela/bongi_test/empty.file

The association between the different names is managed by a "Catalog". A Grid file has a unique GUID, and a SURL which tells its location on the SE it physically resides in: one or more LFNs can be associated with it. On the other hand, the same file can be present on more than one SE (i.e. many "replicas" can exist): in this case they have different SURLs, but the same GUID and LFN. The LFN is the normal way a user should refer to a file: LFNs can be viewed as symbolic links (with "user friendly" filenames) in a "virtual" filesystem /grid/VO_NAME/PATH_TO/FILENAME, which point to the real files stored on some SE.

There are two sets of "data handling" commands which an average user should know: the lfc-* and the lcg-* commands. The first set operates just on the Catalog (that is to say on LFN filenames, not on physical files), while the second one operates both on the physical file replicas and on the Catalog.

The lfc-* commands are:

lfc-chmod        Change access mode of a LFC file/directory.
lfc-chown        Change owner and group of a LFC file/directory.
lfc-delcomment   Delete the comment associated with a file/directory.
lfc-getacl       Get file/directory access control lists.
lfc-ln           Make a symbolic link to a file/directory.
lfc-ls           List file/directory entries in a directory.
lfc-mkdir        Create directory.
lfc-rename       Rename a file/directory.
lfc-rm           Remove a file/directory.
lfc-setacl       Set file/directory access control lists.
lfc-setcomment   Add/replace a comment.

Actually, the only ones among them which are really useful to the user are probably lfc-ls, lfc-mkdir, lfc-rename and lfc-rm (in particular lfc-mkdir is the only way to create a LFN directory). Please note that lfc-rm removes the LFN of a file from the Catalog, NOT the physical file from the SE.

For example, this command lists the /grid top directory:

[bongi@alterra bongi]$ lfc-ls -l /grid/
drwxrwxr-x    1 root 101 0 Feb 17 18:29 argo
drwxrwxr-x   93 root 102 0 May 18 15:51 babar
drwxrwxr-x  340 root 103 0 May 22 12:52 bio
drwxrwxr-x  217 root 104 0 Feb 09 20:16 cdf
drwxrwxr-x    5 root 105 0 Feb 09 20:20 compchem
drwxrwxr-x    5 root 106 0 May 11 15:29 enea
drwxrwxr-x    1 root 117 0 Feb 09 21:14 euchina
drwxrwxr-x    1 root 116 0 Feb 09 21:10 eumed
drwxrwxr-x 1318 root 107 0 Apr 27 12:32 gridit
drwxrwxr-x  241 root 108 0 Apr 06 16:29 inaf
drwxrwxr-x 1121 root 109 0 May 22 12:43 infngrid
drwxrwxr-x 5897 root 110 0 May 15 11:04 ingv
drwxrwxr-x    1 root 111 0 Feb 09 20:52 libi
drwxrwxr-x    2 root 112 0 May 22 14:50 pamela
drwxrwxr-x    4 root 113 0 Mar 17 18:00 planck
drwxrwxr-x    1 root 114 0 Feb 09 21:03 theophys
drwxrwxr-x   10 root 115 0 Feb 09 21:06 virgo

The following commands create a directory (users can obviously read and write only in their VO directory) and set a comment on it:

[bongi@alterra bongi]$ lfc-mkdir /grid/pamela/bongi_test
[bongi@alterra bongi]$ lfc-setcomment /grid/pamela/bongi_test "Just to see if it works\!"
[bongi@alterra bongi]$ lfc-ls -l --comment /grid/pamela/
drwxrwxr-x   0 156  112 0 May 22 14:50 bongi_test Just to see if it works!
drwxrwxr-x 372 root 112 0 Feb 09 20:59 generated

About lfc-ls, please note that the -R option for recursive listing is available, but it should not be used since it is a very expensive operation.
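Because lfc-mkdir is the only way to create a LFN directory, a script that uploads files typically has to make sure that the directory exists first. What follows is a minimal sketch under assumptions: it assumes (not verified here for every version) that lfc-ls returns a non-zero exit status when the entry is missing; ensure_lfn_dir and the LFC_LS / LFC_MKDIR variables are hypothetical names, introduced only so that the real commands can be swapped out when trying the function without Grid access.

```shell
#!/bin/sh
# Sketch only: ensure_lfn_dir, LFC_LS and LFC_MKDIR are hypothetical names.
LFC_LS=${LFC_LS:-lfc-ls}
LFC_MKDIR=${LFC_MKDIR:-lfc-mkdir}

# Create the LFN directory $1 unless it is already listed in the Catalog.
# Assumption: lfc-ls exits with a non-zero status when the entry is missing.
ensure_lfn_dir() {
    if "$LFC_LS" "$1" >/dev/null 2>&1; then
        echo "exists: $1"
    else
        "$LFC_MKDIR" "$1" && echo "created: $1"
    fi
}
```

With a valid proxy, "ensure_lfn_dir /grid/pamela/bongi_test" would then either report the existing directory or create it.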
The lcg-* commands are:

Replica Management

lcg-cp   Copies a Grid file to a local destination (download).
lcg-cr   Copies a file to a SE and registers the file in the catalog (upload).
lcg-del  Deletes one file (either one replica or all replicas).
lcg-rep  Copies a file from one SE to another SE and registers it in the catalog (replicate).
lcg-gt   Gets the TURL for a given SURL and transfer protocol.
lcg-sd   Sets file status to "Done" for a given SURL in a Storage Resource Manager request.

File Catalog Interaction

lcg-aa   Adds an alias in the catalog for a given GUID.
lcg-ra   Removes an alias in the catalog for a given GUID.
lcg-rf   Registers in the catalog a file residing on an SE.
lcg-uf   Unregisters in the catalog a file residing on an SE.
lcg-la   Lists the LFNs for a given LFN, GUID or SURL.
lcg-lg   Gets the GUID for a given LFN or SURL.
lcg-lr   Lists the SURLs for a given LFN, GUID or SURL.

For instance, this command copies the local file file:///home/bongi/work/pamela/grid/test/empty.file from my user interface to the default SE of my VO (i.e. grid007g.cnaf.infn.it), and registers it in the standard LFN location (i.e. /grid/VO_NAME/generated/YYYY-MM-DD), with a standard (horrible) LFN (i.e.
file-1d670e29-a51c-4bca-8226-41d9c0cedf17): [bongi@alterra bongi]$ lcg-cr -v file:///home/bongi/work/pamela/grid/test/empty.file Using grid catalog type: lfc Using grid catalog : lfcserver.cnaf.infn.it Source URL: file:///home/bongi/work/pamela/grid/test/empty.file File size: 0 VO name: pamela Destination specified: grid007g.cnaf.infn.it Destination URL for copy: gsiftp://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file65899533-a105-45ae-b3af-fc2af92a6914 # streams: 1 # set timeout to 0 seconds Alias registered in Catalog: lfn:/grid/pamela/generated/2006-05-22/file-1d670e29-a51c-4bca-8226-41d9c0cedf17 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst Transfer took 2700 ms Destination URL registered in Catalog: sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file65899533-a105-45ae-b3af-fc2af92a6914 guid:2d31d8f9-cdcf-4124-92f8-cf2053682331 [bongi@alterra bongi]$ lfc-ls -l /grid/pamela/generated/2006-05-22 -rw-rw-r-- 1 156 112 0 May 22 15:24 file-1d670e29-a51c-4bca-8226-41d9c0cedf17 A smarter LFN can be specified in this way: bongi@alterra bongi]$ lcg-cr -v -l /grid/pamela/bongi_test/empty.file file:///home/bongi/work/pamela/grid/test/empty.file Using grid catalog type: lfc Using grid catalog : lfcserver.cnaf.infn.it Source URL: file:///home/bongi/work/pamela/grid/test/empty.file File size: 0 VO name: pamela Destination specified: grid007g.cnaf.infn.it Destination URL for copy: gsiftp://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c # streams: 1 # set timeout to 0 seconds Alias registered in Catalog: lfn:/grid/pamela/bongi_test/empty.file 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst Transfer took 1080 ms Destination URL registered in Catalog: sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c guid:0bb23632-619e-4bd5-84d1-d72561b96a6f [bongi@alterra bongi]$ lfc-ls -l /grid/pamela/bongi_test -rw-rw-r-- 1 156 112 0 May 22 
15:38 empty.file

If I need a copy (a replica) of the file in a different SE, I
can use:

[bongi@alterra bongi]$ lcg-rep -v -d gridba6.ba.infn.it lfn:/grid/pamela/bongi_test/empty.file
Using grid catalog type: lfc
Using grid catalog : lfcserver.cnaf.infn.it
Source URL: lfn:/grid/pamela/bongi_test/empty.file
File size: 0
VO name: pamela
Destination specified: gridba6.ba.infn.it
Source URL for copy: gsiftp://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
Destination URL for copy: gsiftp://gridba6.ba.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file6cfe4b94-bca2-493b-a1cd-bfcd52185944
# streams: 1
# set timeout to 0
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 5930 ms
Destination URL registered in LRC: sfn://gridba6.ba.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file6cfe4b94-bca2-493b-a1cd-bfcd52185944

Now I can use the three lcg-l* commands to list the files I have
just created. This one lists the replicas of a certain LFN on all
the SEs:

[bongi@alterra bongi]$ lcg-lr lfn:/grid/pamela/bongi_test/empty.file
sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
sfn://gridba6.ba.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file6cfe4b94-bca2-493b-a1cd-bfcd52185944

The same result can also be obtained if the SURL is specified:

[bongi@alterra bongi]$ lcg-lr sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
sfn://gridba6.ba.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file6cfe4b94-bca2-493b-a1cd-bfcd52185944

or the GUID:

[bongi@alterra bongi]$ lcg-lr guid:0bb23632-619e-4bd5-84d1-d72561b96a6f
sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
sfn://gridba6.ba.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file6cfe4b94-bca2-493b-a1cd-bfcd52185944

The next commands show the GUID which is associated with an LFN
or with a SURL:

[bongi@alterra bongi]$ lcg-lg lfn:/grid/pamela/bongi_test/empty.file
guid:0bb23632-619e-4bd5-84d1-d72561b96a6f

[bongi@alterra bongi]$ lcg-lg sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
guid:0bb23632-619e-4bd5-84d1-d72561b96a6f

Finally, these commands list the (one or more) LFNs of a file:

[bongi@alterra bongi]$ lcg-la lfn:/grid/pamela/bongi_test/empty.file
lfn:/grid/pamela/bongi_test/empty.file

[bongi@alterra bongi]$ lcg-la sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
lfn:/grid/pamela/bongi_test/empty.file

[bongi@alterra bongi]$ lcg-la guid:0bb23632-619e-4bd5-84d1-d72561b96a6f
lfn:/grid/pamela/bongi_test/empty.file

In order to download a file from the Grid, the following command
can be used:

[bongi@alterra bongi]$ lcg-cp -v lfn:/grid/pamela/bongi_test/empty.file file:/home/bongi/tmp/empty.file
Using grid catalog type: lfc
Using grid catalog : lfcserver.cnaf.infn.it
Source URL: lfn:/grid/pamela/bongi_test/empty.file
File size: 0
VO name: pamela
Source URL for copy: gsiftp://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/file89283155-18f8-474b-841f-171ecbdc7f9c
Destination URL: file:/home/bongi/tmp/empty.file
# streams: 1
# set timeout to 0 (seconds)
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 1060 ms

while the next one shows how to delete it (the -a option removes
all its replicas):

[bongi@alterra bongi]$ lcg-del -v -a lfn:/grid/pamela/bongi_test/empty.file
VO name: pamela
set timeout to 0 seconds

-----------------------------------------------------------------

*********
*   6   * Jobs and data.
*********

Typically a job needs some input files to work on and produces
output files, which have to be read from / written to an SE.
There are two common ways to accomplish this:

- the user can transfer the input files from the SE to the worker
  node where the job actually runs (by means of lcg-cp commands
  run by a script), use them locally, and then transfer the output
  back to the SE (by means of lcg-cr commands);

- the executable running on the worker node can directly access
  input and output files over the Grid (a specific Application
  Programming Interface called Grid File Access Library (GFAL)
  exists for this task, see Example F.0.0.2 in
  https://edms.cern.ch/document/454439/2).

An example of the first kind of strategy is reported here.
I have copied this file:

[bongi@alterra bongi]$ more test.file
I'm inside the GRID!

into the Grid and registered it in the Catalog with an LFN. Then
I have replicated it in three other SEs:

[bongi@alterra bongi]$ lcg-lr lfn:/grid/pamela/bongi_test/test.file
sfn://ce1.egee.unile.it/flatfiles/SE00/pamela/generated/2006-05-24/file797705b4-1f6a-49d5-baea-9adba8ab7113
sfn://grid005.ct.infn.it/flatfiles/pamela/generated/2006-05-23/file41faca90-3bbd-41b2-9526-e7a304e5a642
sfn://grid007g.cnaf.infn.it/flatfiles/SE00/pamela/generated/2006-05-22/filed81e3c08-5b2a-46cd-ae9f-1f740de373b6
sfn://gridse.pi.infn.it/flatfiles/SE00/pamela/generated/2006-05-24/filefe9ade03-a273-4df9-80b2-86ad2f9aadef

Now I can run a job which is going to use this file, by specifying
it in the "InputData" attribute (which requires the
"DataAccessProtocol" attribute to be specified too):

[bongi@alterra bongi]$ more mytest_storage.jdl
Executable = "myscript_storage.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = "myscript_storage.sh";
OutputSandbox = {"std.out","std.err"};
InputData = "lfn:/grid/pamela/bongi_test/test.file";
DataAccessProtocol = {"rfio","gsiftp","gsidcap"};

The "InputData" attribute will cause the job to be run on a CE
which is close to one of the SEs that contain the file.
The executable consists of this script:

[bongi@alterra bongi]$ more myscript_storage.sh
#!/bin/sh
echo -n "Start date is: "
date
echo "-------------"
echo "Job running on WN: "`hostname`
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfcserver.cnaf.infn.it
export LCG_GFAL_VO=pamela
echo "*************"
pwd
echo "*************"
ls
echo "*************"
lcg-cp -v lfn:/grid/pamela/bongi_test/test.file file:$PWD/test.file
echo "*************"
ls
echo "*************"
more test.file
echo "-------------"
echo -n "End date is: "
date
exit 0

The exported shell variables make sure that the worker node knows
which Catalog has to be used for my VO. After the job has run, its
output looks like this:

[bongi@alterra bongi]$ more std.out
::::::::::::::
Start date is: Wed May 24 12:03:24 CEST 2006
-------------
Job running on WN: n58.unile.it
*************
/home/pamela002/globus-tmp.n58.20182.0/WMS_n58_020655_https_3a_2f_2fegee-rb-01.cnaf.infn.it_3a9000_2fufQT4MAJXwTtdADxYEZpAQ
*************
myscript_storage.sh
std.err
std.out
*************
Using grid catalog type: lfc
Using grid catalog : lfcserver.cnaf.infn.it
Source URL: lfn:/grid/pamela/bongi_test/test.file
File size: 23
VO name: pamela
Source URL for copy: gsiftp://ce1.egee.unile.it/flatfiles/SE00/pamela/generated/2006-05-24/file797705b4-1f6a-49d5-baea-9adba8ab7113
Destination URL: file:/home/pamela002/globus-tmp.n58.20182.0/WMS_n58_020655_https_3a_2f_2fegee-rb-01.cnaf.infn.it_3a9000_2fufQT4MAJXwTtdADxYEZpAQ/test.file
# streams: 1
# set timeout to 0 (seconds)
Transfer took 1009 ms
*************
myscript_storage.sh
std.err
std.out
test.file
*************
::::::::::::::
test.file
::::::::::::::
I'm inside the GRID!

Writing output files on an SE and registering them in the Catalog
can be done by means of the "OutputData" attribute of the .jdl
file.
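As an alternative, following the first strategy described at the
beginning of this section, the script itself can upload its output
with lcg-cr once the processing step is over. What follows is only
a minimal sketch of such a wrapper, to be taken as an illustration
rather than a tested recipe: the function names (run_cmd, stage_in,
stage_out, main), the DRY_RUN switch, the output file name and its
LFN are all hypothetical, and the "cp" line is just a placeholder
for the real work. The lcg-cp and lcg-cr options are the same ones
used earlier in these notes.

```shell
#!/bin/sh
# Sketch of a job wrapper: stage-in, process, stage-out.
# run_cmd, stage_in, stage_out, main, DRY_RUN and the output.file
# names are hypothetical, chosen only for this example.

# Same Catalog settings exported by myscript_storage.sh.
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfcserver.cnaf.infn.it
export LCG_GFAL_VO=pamela

# DRY_RUN=1 (default here): print the Grid commands instead of running
# them, so the script can be tried where the lcg-* tools are missing.
DRY_RUN=${DRY_RUN:-1}

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Download the file registered as LFN $1 into ./$2
stage_in() {
    run_cmd lcg-cp -v "$1" "file:$PWD/$2"
}

# Upload ./$1 and register it in the Catalog with the LFN path $2
# (passed to -l in the same form used earlier in these notes).
stage_out() {
    run_cmd lcg-cr -v -l "$2" "file://$PWD/$1"
}

main() {
    stage_in lfn:/grid/pamela/bongi_test/test.file test.file || return 1
    if [ "$DRY_RUN" != "1" ]; then
        # Placeholder for the real processing step.
        cp test.file output.file || return 1
    fi
    stage_out output.file /grid/pamela/bongi_test/output.file || return 1
}

main
```

With DRY_RUN=1 the two Grid commands are only printed, which is a
cheap way to check the paths before submitting the job; DRY_RUN=0
has to be set in order to really execute them on the worker node.
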
The following .jdl file is a template which shows how to use
(almost) all the Job Description Language attributes, including
"OutputData":

[bongi@alterra bongi]$ more full_example.jdl
[
Type = "Job";
JobType = "Normal";
Executable = "EXECUTABLE";
Arguments = "-COMMAND -LINE -OPTIONS";
StdInput = "STD.IN";
StdOutput = "STD.OUT";
StdError = "STD.ERR";
InputSandbox = { "NAME.SH", "NAME.EXE" };
OutputSandbox = { "STD.OUT", "STD.ERR" };
Environment = "ENV_VARIABLE=VALUE";
DataAccessProtocol = {"rfio", "gsiftp", "gsidcap"};
InputData = {
    "lfn:/grid/VO/PATH_TO_DATA/IN.FILENAME.1",
    "lfn:/grid/VO/PATH_TO_DATA/IN.FILENAME.2",
    "lfn:/grid/VO/PATH_TO_DATA/IN.FILENAME.3",
    "guid:0bb23632-619e-4bd5-84d1-d72561b96a6f"
};
OutputSE = "HOSTNAME.HOSTDOMAIN";
OutputData = {
    [
    OutputFile = "OUT.FILENAME.1";
    LogicalFileName = "lfn:/grid/VO/PATH_TO_DATA/OUT.FILENAME.1";
    ],
    [
    OutputFile = "OUT.FILENAME.2";
    LogicalFileName = "lfn:/grid/VO/PATH_TO_DATA/OUT.FILENAME.2";
    ],
    [
    OutputFile = "OUT.FILENAME.3";
    LogicalFileName = "lfn:/grid/VO/PATH_TO_DATA/OUT.FILENAME.3";
    ]
};
RetryCount = 3;
MyProxyServer = "myproxy.cnaf.infn.it";
Requirements = (other.CEId=="gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-gridit") && RegExp("cnaf.infn.it",other.CEId);
]

-----------------------------------------------------------------

*********
*   7   * Grid on the web.
*********

For quick reference, I list here some links about the Grid.

***********************************
Websites containing information about using the Grid:

INFN-CNAF Grid site: step by step procedure to use the Grid
http://grid-it.cnaf.infn.it

LHC Computing Grid Middleware ver.
2 User Guide: one of the main sources of info
https://edms.cern.ch/document/454439/2

LCG website
http://lcg.web.cern.ch/LCG/

Grid Experience with EDG2 and LCG-1: a lot of info here too, in
particular there are two chapters about error messages
http://grid-it.cnaf.infn.it/fileadmin/users/grid-experience/grid-experience.html

Introduction to grid computing
https://grid-it.cnaf.infn.it/cdsagenda/fullAgenda.php?ida=a0440

Job Submission Tutorial
http://grid-it.cnaf.infn.it/fileadmin/users/job-submission/job_submission.html

EU DataGrid Job Description Language (JDL) Guide: all the JDL
attributes
http://server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0142-0_2.pdf

EU DataGrid Tutorial
https://edms.cern.ch/document/393671/1

Graphical user interface for job submission (does it really
exist???)
http://server11.infn.it/workload-grid/

GFAL man pages
http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html

***********************************
Websites displaying information about the status of the Grid:

GridIce
http://gridice4.cnaf.infn.it:50080/gridice/site/site.php

LCG2 Real Time Monitor: Java applet to monitor jobs
http://gridportal.hep.ph.ic.ac.uk/rtm/

GIIS Monitor
http://goc.grid.sinica.edu.tw/gstat/

Grid Operations Centre Database
https://goc.grid-support.ac.uk/gridsite/gocdb2/

List of INFN Grid services
http://grid-it.cnaf.infn.it/index.php?gridservices&type=1

INFN Grid Calendar: shows which CEs are down because of faults or
maintenance
https://grid-it.cnaf.infn.it/support/calendar/index.php

INFN Grid Downtime Advices
https://grid-it.cnaf.infn.it/support/calendar/show_dtimes.php

Deployment status and plan for INFN Grid
http://grid-it.cnaf.infn.it/index.php?deployment&type=1