Tutorial for setting up Rose/Cylc in order to run JULES on CEDA JASMIN
    (last updated 2 August 2023)

        1. First, complete the JASMIN setup at the following site:
          https://help.jasmin.ac.uk/category/158-getting-started
          If you have connection problems to ‘login1.jasmin’, see this page: https://help.jasmin.ac.uk/article/848-login-problems
        2. Request group workspace privilege and make sure that you receive group workspace privilege for ‘jules’ at this site: https://accounts.jasmin.ac.uk
          You might need to wait until you have a JASMIN account before requesting this privilege.
          Granting of this privilege can take some time (e.g., several days). You should receive an email when you have been granted the privilege for this group workspace.
          After you receive that email, you can confirm your access to the group workspace by running the command ls -ltr /gws/nopw/j04/jules from sci1.jasmin or cylc1.jasmin (skip ahead to step 4 if you need connection details). With successful access to the GWS, you should see a list of the subdirectories of the jules GWS. A minimal sketch of this check follows below.
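          A minimal sketch of the access check described above, run from sci1.jasmin or cylc1.jasmin (the path is the jules GWS path given in this step):
          # a long listing of subdirectories indicates that your GWS access has been granted
          ls -ltr /gws/nopw/j04/jules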
        3. Ask your project leader for help in getting an MOSRS account.
        4. Once you do the initial JASMIN setup and once you have an MOSRS account, do the steps: (a)  ‘Configuring your own laptop or desktop machine’, (b) ‘Logging on to JASMIN’, and (c) ‘Modify configuration files on cylc1.jasmin’  (skipping the ‘Running the suite’ tutorial),  in the following guide:
          https://code.metoffice.gov.uk/trac/jules/wiki/RoseJULESonJASMIN
          When you do the ‘Modify configuration files on cylc1.jasmin’ section of that guide, use cylc1.jasmin rather than sci1.jasmin or sci2.jasmin; if you previously made changes that pointed at sci1.jasmin or sci2.jasmin, revert them to cylc1.jasmin. We now use cylc1.jasmin because the graphical user interface and X-windows have been enabled on it, and the GUI and X-windows are necessary for this work.
          Once you have done all of this, you should be able to ssh -AX from your laptop to cylc1.jasmin without entering your password (see the sketch below). This works best on campus at the University of Reading. Using a VPN off campus should also work, as long as the network setup identified in step #1 (above) recognizes you as a ‘.ac.uk’ client.
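          A minimal sketch of the login sequence, assuming the standard two-hop route via login1 and that your JASMIN ssh key is already loaded into your ssh agent (<username> is your JASMIN username):
          # from your laptop or desktop, with agent and X forwarding:
          ssh -AX <username>@login1.jasmin.ac.uk
          # then, from the login node, hop to the Cylc server:
          ssh -AX <username>@cylc1.jasmin.ac.uk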
        5. If you are a University of Reading undergraduate or postgraduate student, then you might not be able to use the VPN. Furthermore, new campus hosts do not get an external DNS record that JASMIN can verify. An alternative is to use the Linux Managed Desktop Service or to ssh to the University of Reading’s externally visible ssh server (arc-ssh), both of which have public DNS records. To access these systems from off campus, users need to put in a request on the Self Service Portal.
        6. The ability to log in to cylc1.jasmin via login1.jasmin (as in step #4, above) is often useful, since access to login1.jasmin also means access to xfer1.jasmin and xfer2.jasmin. These xfer* transfer nodes are useful if you want to transfer files from JASMIN to your local computer. Sometimes, however, the connection to login1.jasmin is not working, or you don’t have working access to the VPN or the Linux Managed Desktop service. In that case, it can be useful to connect to cylc1.jasmin via login2.jasmin instead, since login2.jasmin doesn’t require the VPN or the Linux Managed Desktop (a possible route is sketched below). However, using login1.jasmin through the University of Reading’s servers is CEDA JASMIN’s preferred method of connection, and if you use login2.jasmin, some of the JASMIN capabilities (like full data-transfer privileges) might not be available.
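          A possible login2 route (a sketch, assuming your ssh client supports the -J jump-host option and that agent/X forwarding are set up as in step #4):
          # hop through login2.jasmin.ac.uk directly to the Cylc server:
          ssh -AX -J <username>@login2.jasmin.ac.uk <username>@cylc1.jasmin.ac.uk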
        7. Once you have your jules group workspace privilege, log in to cylc1.jasmin.ac.uk and type rosie go at the command line. In the rosie go GUI, search for suite u-al752; you should use the ‘trunk’ version of this suite. Check out this suite by clicking on it once and selecting the button for checking out a suite. Once it has finished downloading the suite, you can quit rosie go. You won’t be able to commit any changes to this suite, since you don’t have permission. However, if you make a copy of this suite (by pushing the copy button instead of the check-out button), you will get a new suite with a different suite number for which you do have permission to commit changes. You don’t need to make a different-numbered copy of the suite right now, as we will do that in step #21 of this tutorial. (The equivalent command-line steps are sketched below.) This suite, u-al752, was originally developed by Karina Williams (Met Office) and Anna Harper (U. Exeter), and it runs JULES at (up to) 75 different FLUXNET2015 sites around the world. Karina and Anna’s Trac page for this suite provides example plots generated on MONSooN that can be downloaded and viewed. Patrick McGuire (U. Reading) ported this suite from MONSooN to CEDA JASMIN.
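          A possible command-line alternative to the rosie go GUI (a sketch, assuming the rosie command-line tools are set up on cylc1.jasmin as per step #4):
          # check out the trunk of the suite into ~/roses/u-al752 (read-only for you):
          rosie checkout u-al752
          # later (step #21), make your own editable copy with a new suite number:
          rosie copy u-al752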
        8. In the ~/roses/u-al752 directory, you can view the various Rose/Cylc setup files like suite.rc, rose-suite.conf, rose-suite.info, site/suite.rc.CEDA_JASMIN, app/jules/rose-app.conf, and app/fcm_make/file/fcm-make.cfg . You can use vi or more to view these files. Study the files for a little while. Note that all the JULES namelists are condensed to the file: app/jules/rose-app.conf .
        9. To find out about the JULES/FLUXNET suite settings, the first file to look at is rose-suite.conf, where we can set the number of spinups, state whether we want prescribed datasets, give the paths to the output and plots folders, the path to the fluxnet datasets, etc. The second file to check is suite.rc, where the list of fluxnet sites and the choices of using prescribed datasets, TRIFFID, phenology and soil carbon pools are set. The version of the FLUXNET dataset used by the suite is also given in suite.rc. It is important to realize that the second file (suite.rc) reads in the values defined in the first file (rose-suite.conf). The third file to check is app/jules/rose-app.conf , which contains all the namelists and is where the parameters for the JULES run are set. More information on the configuration of prescribed datasets for fluxnet sites (ancillaries, driving datasets, lai, etc.) is given in app/jules/opt/ . By searching for keywords in the files above (in the recommended order), many questions can be answered (a search sketch is given after the link below). The parameters in rose-app.conf can be looked up on the JULES user guide website for the relevant JULES version (here, version 7.3). For instance, ‘ignition_method’ is documented at this link:
          http://jules-lsm.github.io/vn7.3/namelists/jules_vegetation.nml.html#JULES_VEGETATION::ignition_method
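          A sketch of searching the suite files for a namelist parameter, using ignition_method as an example:
          cd ~/roses/u-al752
          # search all the suite files recursively, printing file names and line numbers
          grep -rn "ignition_method" .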
        10. Now change your rose-suite.conf file with vi so that it has your own username in place of myusername (in two different places):
          OUTPUT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/jules_output'
          PLOT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/peg/plots'
          Also, change the location:
          LOCATION='CEDA_JASMIN'
          and the suite_data:
          SUITE_DATA='/gws/nopw/j04/jules/pmcguire/fluxnet/kwilliam/suite_data'
          and the JULES archive location (it is important to use jules.x_tr instead of jules.xm_tr):
          JULES_FCM='fcm:jules.x_tr'
        11. The current policy at CEDA JASMIN (November 2020) is that:
          “For testing new workflows and for new JASMIN users, the testing queue test should be used…” See:
          https://help.jasmin.ac.uk/article/4881-lotus-queues
          So, to follow this policy, if you are a new user or if you are testing this workflow, in the file ~/roses/u-al752/site/suite.rc.CEDA_JASMIN , you can change:
          [[JULES_CEDA_JASMIN]]
          inherit = None, JASMIN_LOTUS
          [[[directives]]]
          --partition = short-serial-4hr
          --account=short4hr
          --constraint = "amd"
          to:
          [[JULES_CEDA_JASMIN]]
          inherit = None, JASMIN_LOTUS
          [[[directives]]]
          --partition = test
          --constraint = "intel"
          and:
          [[PLOTTING_CEDA_JASMIN]]
          [[[directives]]]
          --time = 08:00:00
          --partition = short-serial
          to:
          [[PLOTTING_CEDA_JASMIN]]
          [[[directives]]]
          --time = 04:00:00
          --partition = test
        12. JASMIN has switched its batch queues from LSF to SLURM. The batch-queue information for this u-al752 suite has been updated accordingly, and the MPI library settings for SLURM for JULES on JASMIN were updated in ~/roses/u-al752/site/suite.rc.CEDA_JASMIN .
        13. Currently, the short-serial-4hr queue is used, with the constraint set to the AMD processor type. Sometimes the short-serial-4hr queue’s waiting time is long, but switching to the short-serial queue doesn’t do much to reduce the queuing time. We have switched to the AMD processor type since it is no longer essential to keep the Intel processor constraint, as it was before; the short-serial-4hr queue has mostly (or exclusively) AMD processors, so the AMD constraint suits that queue. Previously, if JULES was run on a different processor type than the one it was compiled on, there could be issues with the compiler-optimization flags and the processor’s instruction set. That issue was fixed in the summer of 2023, and a background compile of JULES on the cylc1 VM now runs properly on the AMD batch nodes.
        14. In the ~/roses/u-al752 directory, type rose edit, and explore the suite in the GUI this time, but don’t make any changes and don’t press run. Then quit rose edit.
        15. In the ~/roses/u-al752 directory, type rose suite-run. This will pop up the gcylc window, and you should see your jobs being submitted in sequence. The fcm_make task should take about 10 minutes to complete. The various JULES tasks can take up to 1-2 hours to complete, and then there is the Python plotting routine, which runs when all the JULES tasks have finished. Wait for all of this to finish; you can go away for a while, or overnight if necessary. If it is still not finished (or if something has failed), you can right-mouse-click on the various entries and look at the job.err or job.out log files, etc. If the X-windows connection is cut off or the session ends unexpectedly, you can go back to the ~/roses/u-al752 directory and type rose sgc to view the gcylc GUI again. These commands are sketched below.
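          A minimal sketch of the commands for this step:
          cd ~/roses/u-al752
          # build and launch the suite; the gcylc window should appear
          rose suite-run
          # if the gcylc window is lost (e.g., the X session drops), reopen it with:
          rose sgc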
        16. You should now be able to study the NetCDF output files (with ncinfo or python2.7, etc.; if you use python2.7, you might need to use sci1.jasmin or sci2.jasmin instead of cylc1.jasmin in order to get the proper Python libraries working). The output files are wherever you set them to be in your rose-suite.conf file, e.g. in:
          OUTPUT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/jules_output'
          The plots are where you set them to be in your rose-suite.conf file, e.g. in:
          PLOT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/peg/plots'
          You can view these PDF plots by first typing module load jaspy/2.7 and then typing display filename.pdf. Using jaspy version 2.7 is important, since otherwise fcm doesn’t work, and fcm is needed in order to run some Rose/Cylc suites. You can also use ncview to study the NetCDF files. (These viewing commands are sketched after the example-plot list below.)
          You can compare your PDF plots to those available at the FLUXNET TRAC site listed above.
          Here are a few examples from the FLUXNET TRAC site; for the most up-to-date ones and for other plotted variables, check at the FLUXNET TRAC site.
          FLUXNET EXAMPLE PLOT: Available Soil Moisture (top layer, JULES model only)
          FLUXNET EXAMPLE PLOT: Latent Heat Flux (LE)
          FLUXNET EXAMPLE PLOT: Sensible Heat Flux (SH)
          FLUXNET EXAMPLE PLOT: Gross Primary Production
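          A sketch of the viewing commands mentioned above (the filenames are illustrative placeholders):
          # load the jaspy 2.7 environment so that fcm and the viewing tools work
          module load jaspy/2.7
          # view a PDF plot over X-windows
          display filename.pdf
          # summarize or browse a NetCDF output file
          ncinfo output_file.nc
          ncview output_file.nc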
        17. You can also look at the log files in:
          ~/cylc-run/u-al752/log/job/1
          There is one jules log subdirectory for each site, e.g.:
          ~/cylc-run/u-al752/log/job/1/jules_at_neu_presc0
          In that subdirectory, the log files are
          ~/cylc-run/u-al752/log/job/1/jules_at_neu_presc0/01/job.err
          ~/cylc-run/u-al752/log/job/1/jules_at_neu_presc0/01/job.out
          There are also log files for the fcm_make app:
          ~/cylc-run/u-al752/log/job/1/fcm_make/01/job.err
          ~/cylc-run/u-al752/log/job/1/fcm_make/01/job.out
          There are also log files for the make_plots app:
          ~/cylc-run/u-al752/log/job/1/make_plots/01/job.err
          ~/cylc-run/u-al752/log/job/1/make_plots/01/job.out
          You can view these with vi or more. If anything in the log files suggests that something went wrong, you should investigate; a quick way to scan the logs is sketched below.
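          A quick way to scan all the job.err files for problems (a sketch; adjust the search pattern as needed):
          # list any job.err files under cycle 1 that mention "error" (case-insensitive)
          grep -il "error" ~/cylc-run/u-al752/log/job/1/*/*/job.err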
        18. You might want to transfer some PDF plots (for example) from CEDA JASMIN to your own computer. One simple way to do this is to pull them from CEDA JASMIN to your own Linux or Macintosh machine,
          by first typing commands similar to these on CEDA JASMIN:
          mkdir ~/run11a_plots
          cp -pr /work/scratch-pw2/myusername/fluxnet/run11a/peg/plots ~/run11a_plots
          The files first have to be copied from the scratch disk to your home directory (or, alternatively, to a group workspace), since transfers to external computers don’t work from the scratch disk.
          Then type these commands on your own machine:
          mkdir run11a_plots
          scp -pr xfer1.jasmin.ac.uk:run11a_plots run11a_plots
        19. The plotting (as of 24 September 2018) doesn’t work for GPP for the US_WCr and ZM_Mon sites on CEDA JASMIN. These two sites have been disabled automatically. We think that the cause of this problem is that a new version of iris for Python (2.1) was installed in July, overriding the previous version (1.13). The ‘aggregate_by’ function in the new version of iris on JASMIN doesn’t seem to handle the multi-year gaps in the GPP data for US_WCr or ZM_Mon very well.
        20. You can look at the Python plotting code (as defined in your rose-suite.conf file); see the sketch after this step. It is copied to:
          ~/cylc-run/u-al752/bin/fluxnet_evaluation.py
          The JULES source code (FORTRAN) is copied in the subdirectories of:
          ~/cylc-run/u-al752/share/fcm_make/preprocess/
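          A quick way to browse these copies (a sketch; the paths are those listed above, and the .F90 suffix for the Fortran sources is an assumption):
          # read the plotting script without risking edits
          view ~/cylc-run/u-al752/bin/fluxnet_evaluation.py
          # list a few of the preprocessed JULES Fortran source files
          find ~/cylc-run/u-al752/share/fcm_make/preprocess/ -name "*.[Ff]90" | head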
        21. The next part of the tutorial is to make a modification to your suite and then use fcm to check your changes in to MOSRS. It can be a trivial modification, like changing your plotting path in rose-suite.conf .
          But first (since you don’t have permission to commit changes to the original suite u-al752 trunk version), you need to use rosie go to check out a copy of the suite (with a new suite number) to your account on JASMIN. There should be a copy/duplicate button in rosie go for checking out a copy of the suite with a new suite number. Push that button, and quit rosie go after it has finished. Note the new suite number in your roses directory.
          Then you can make a change to the plotting path, for example.
          Next, you can do fcm diff -r HEAD in your roses/suite-number directory. It should show the changes you made.
          Finally, you can check in (a.k.a. ‘commit’) these changes to MOSRS with fcm ci in your roses/suite-number directory. You need to enter a commit log message before fcm ci will let you proceed; the sequence is sketched below. Some general information about version control can be found (for example) at: https://svnbook.red-bean.com/en/1.8/svn.basic.version-control-basics.html .
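          A sketch of the check-in sequence, assuming your new copy is u-XXXXX (substitute your actual suite number):
          cd ~/roses/u-XXXXX
          # review the local changes against the repository
          fcm diff -r HEAD
          # commit; an editor opens so that you can enter the log message
          fcm ci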