Configuration files
The config file is where you set parameters affecting how you run the pipeline. The default config contains the following:
[data]
vis = ''
[fields]
bpassfield = ''
fluxfield = ''
phasecalfield = ''
targetfields = ''
extrafields = ''
[slurm] # See processMeerKAT.py -h for documentation
nodes = 1
ntasks_per_node = 8
plane = 1
mem = 232 # Use this many GB of memory (per node)
partition = 'Main' # SLURM partition to use
exclude = '' # SLURM nodes to exclude
time = '12:00:00'
submit = False
container = '/idia/software/containers/casa-6.5.0-modular.sif'
mpi_wrapper = 'mpirun'
name = ''
dependencies = ''
account = 'b03-idia-ag'
reservation = ''
modules = ['openmpi/4.0.3']
verbose = False
precal_scripts = [('calc_refant.py',False,''), ('partition.py',True,'')]
postcal_scripts = [('concat.py',False,''), ('plotcal_spw.py', False, ''), ('selfcal_part1.py',True,''), ('selfcal_part2.py',False,''), ('science_image.py', True, '')]
scripts = [ ('validate_input.py',False,''),
('flag_round_1.py',True,''),
('calc_refant.py',False,''),
('setjy.py',True,''),
('xx_yy_solve.py',False,''),
('xx_yy_apply.py',True,''),
('flag_round_2.py',True,''),
('xx_yy_solve.py',False,''),
('xx_yy_apply.py',True,''),
('split.py',True,''),
('quick_tclean.py',True,'')]
[crosscal]
minbaselines = 4 # Minimum number of baselines to use while calibrating
chanbin = 1 # Number of channels to average before calibration (during partition)
width = 1 # Number of channels to (further) average after calibration (during split)
timeavg = '8s' # Time interval to average after calibration (during split)
createmms = True # Create MMS (True) or MS (False) for cross-calibration during partition
keepmms = True # Output MMS (True) or MS (False) during split
spw = '*:880~933MHz,*:960~1010MHz,*:1010~1060MHz,*:1060~1110MHz,*:1110~1163MHz,*:1299~1350MHz,*:1350~1400MHz,*:1400~1450MHz,*:1450~1500MHz,*:1500~1524MHz,*:1630~1680MHz' # Spectral window / frequencies to extract for MMS
nspw = 11 # Number of spectral windows to split into
calcrefant = False # Calculate reference antenna in program (overwrites 'refant')
refant = 'm059' # Reference antenna name / number
standard = 'Stevens-Reynolds 2016'# Flux density standard for setjy
badants = [] # List of bad antenna numbers (to flag)
badfreqranges = [ '933~960MHz', # List of bad frequency ranges (to flag)
'1163~1299MHz',
'1524~1630MHz']
[run] # Internal variables for pipeline execution
continue = True
dopol = False
If you’re also performing self-calibration (option [-2 --do2GC]
- see here), the default config will also contain the [selfcal]
section:
[selfcal]
nloops = 2 # Number of clean + bdsf loops.
loop = 0 # If nonzero, adds this number to nloops to name images or continue previous run
cell = '1.5arcsec'
robust = -0.5
imsize = [6144, 6144]
wprojplanes = 512
niter = [10000, 50000, 50000]
threshold = ['0.5mJy', 10, 10] # After loop 0, S/N values if >= 1.0, otherwise Jy
nterms = 2 # Number of taylor terms
gridder = 'wproject'
deconvolver = 'mtmfs'
calmode = ['','p'] # '' to skip solving (will also exclude mask for this loop), 'p' for phase-only and 'ap' for amplitude and phase
solint = ['','1min']
uvrange = '' # uv range cutoff for gaincal
flag = True # Flag residual column after selfcal?
gaintype = 'G' # Use 'T' for polarisation on linear feeds (e.g. MeerKAT)
discard_nloops = 0 # Discard this many selfcal solutions (e.g. from quick and dirty image) during subsequent loops (only considers when calmode !='')
outlier_threshold = 0.0 # S/N values if >= 1.0, otherwise Jy
outlier_radius = 0.0 # Radius in degrees for identifying outliers in RACS
If you’re also performing science imaging (option [-I --science_image]
- see here), the default config will also conatin the [image]
section:
[image]
cell = '1.5arcsec'
robust = -0.5
imsize = [6144, 6144]
wprojplanes = 512
niter = 50000
threshold = 10 # S/N value if >= 1.0 and rmsmap != '', otherwise Jy
multiscale = [0, 5, 10, 15]
nterms = 2 # Number of taylor terms
gridder = 'wproject'
deconvolver = 'mtmfs'
restoringbeam = ''
stokes = 'I'
pbthreshold = 0.1 # Threshold below which to mask the PB for PB correction
mask = ''
rmsmap = ''
outlierfile = ''
If you do not perform either self-calibration or science imaging, the script related to those steps will be stripped from your config. Similarly calc_refant
will be stripped from your config if calcrefant=False
(the default), in which case m059
will be used as a good default reference antenna, or the pipeline will output a warning if this antenna doesn’t exist in your input MS, and will reommend other good reference antennas.
When the pipeline is run, the contents of your config file are copied to the hidden file .config.tmp
and each python script reads the parameters from this file as it is run. This way, the user cannot easily break the pipeline during the time it is running. This means changing the [slurm]
section in your config file will have no effect unless you once again run processMeerKAT.py -R
.
Polarisation config
If you select [-P --dopol]
during the [-B --build]
step of the pipeline, your config file be similar to the above, except xy_yx_solve
and xy_yx_apply
replace the 2nd call of the xx_yy_solve
and xx_yy_apply
scripts. Furthermore, in the [run]
section, dopol=True
will be set, which will cause all four correlations to be included during the initial partition
step (or the pipeline will output a warning during the [-B --build]
step if only two correlations exist). Additionally, we recommend setting gaintype = 'T'
in the [selfcal]
section, if performing self-calibration for polarisation processing.
Manually selecting field IDs
As discussed here, the pipeline selects field IDs for you by default, during the [-B --build]
step. However, these can be overwritten after this step by editing the [fields]
section of your config file and manually selecting field IDs. Both field names (i.e. strings) and field IDs (i.e. strings of integers) are supported, although field names are preferable.
Example use cases for manually selecting field IDs include when your input MS has mislabelled (or missing) INTENT
, and when multiple flux/bandpass calibrators were used (or labelled as such via their INTENT
), but where the default is less preferred (as the pipeline chooses the one with the most scans by default). For example, you may have two scans on field J0408-6545
, but only one on J1939-6342
, but the latter is still preferred as its model is more well constrained. Lastly, you may wish to remove extrafields
to reduce processing time, if you have no interest in calibrating these fields and producing quick-look images of them.
SPW Splitting
Version 1.1 introduced spectral window (SPW) splitting, where each separate SPW is processed concurrently. Here we discuss all that is relevant to this functionality.
This mode is the default mode, invoked by selecting a value greater than one with the config parameter nspw
. This mode is recommended for most use cases, when the input dataset is TB in size, and the number of scans is typical (tens of scans). When the input data (after pre-processing, including selection of spw
) is small (tens to hundreds of GB), or when the number of scans is very large (hundreds), the single MS mode (see MS only) is recommended. Alternatively, for such a use case, the user could set nspw=1
, resulting in the previous behaviour from version 1.0, but with potential polarisation calibration and flux scale issues.
When selecting nspw
> 1 and passing in a single SPW range via the spw
config parameter (e.g. spw=*:880~1680MHz
), the data will split into nspw
SPWs during the [-R --run]
step, which will be equally-sized frequency ranges encompassing the frequency range provided. The calibration algorithm within each SPW remains the same. However, the resulting Stokes I calibration when nspw
> 1 is improved overall, as it accounts for the intrinsic spectrum of the phase calibrator, which would otherwise be treated as flat across the entire spw
. Furthermore, the polarisation calibration is improved, since each SPW, within which the phase calibrator’s leakages and Stokes Q and U are assumed to be constant, is solved separately, which accounts for the wideband polarisation structure.
Setting spw
and nspw
The SPWs into which the pipeline splits and independently (and concurrently) processes, is determined by the final values stored in the config parameter spw
. Multiple SPWs are listed as comma-seperated values, but will only be considered as separate SPWs when nspw
is equal to the number of comma-separated SPWs given by the spw
parameter. If nspw=1
and spw
contains comma-separated values, only a single SPW will be processed, and the comma-separated values will be passed into the CASA tasks that set the spw
parameter. If nspw
is not equal to the number of comma-separated SPWs, the pipeline will output a warning and set nspw
in your config to be equal to the number of comma-separated SPWs.
Therefore, the SPWs can be manually set to ranges that will be separately (and concurrently) processed, such as manual frequency ranges that avoid regions of persistent RFI. We recommend the following default for this purpose:
nspw=11
spw = '*:880~933MHz,*:960~1010MHz,*:1010~1060MHz,*:1060~1110MHz,*:1110~1163MHz,*:1299~1350MHz,*:1350~1400MHz,*:1400~1450MHz,*:1450~1500MHz,*:1500~1524MHz,*:1630~1680MHz'
This also enables you to set badfreqranges = []
, as the SPWs have already been constructed to avoid these regions of persistent RFI.
Alternatively, if spw
is equal to a single value (i.e. not a comma-separated list), the pipeline can split the frequency range given by spw
into nspw
equal ranges, which is done during the [-R --run]
step. This is the default behaviour when spw
is not manually set. By default, nspw
is set to 16 during the [-B --build]
step, and then updated to 12 during the [-R --run]
step, since 4 SPWs are completely encompassed by the default flagging mask given by the config parameter badfreqranges
, which has a default value of ['933~960MHz', '1163~1299MHz','1524~1630MHz']
. So after the [-B --build]
step, the default spw
is *:860~1680MHz
, and during the [-R --run]
step, the pipeline will remove *:1167.5~1218.75MHz, *:1218.75~1270.0MHz, *:1526.25~1577.5MHz
and *:1577.5~1628.75MHz
, so that the final default spw
is
*:860.0~911.25MHz,*:911.25~962.5MHz,*:962.5~1013.75MHz,*:1013.75~1065.0MHz,*:1065.0~1116.25MHz,*:1116.25~1167.5MHz,*:1270.0~1321.25MHz,*:1321.25~1372.5MHz,*:1372.5~1423.75MHz,*:1423.75~1475.0MHz,*:1475.0~1526.25MHz,*:1628.75~1680.0MHz
Any frequency unit can be used for each comma-separated spw
, such as GHz, kHz, or no unit, which selects channel indices. However, the removal of SPWs that are encompassed by the RFI mask will only be removed when using MHz
.
After updating the config’s SPWs, the [-R --run]
step creates directories for each of the SPWs, named as 880~933MHz, 960~1010MHz
, etc. It then copies your config file into each of them but overwrites spw
to be the single SPW that will be processed, and nspw=1
. Furthermore, if you have not requested all 32 tasks per node (i.e. each SPW will be processed by an entire node) with the config parameter ntasks_per_node=32
, it will overwrite the config parameter mem
with the integet part of the previous value divided by nspw/2
. Lastly, it will set precal_script
and postcal_scripts
to empty lists (see below).
Each SPW directory created during the [-R --run]
step will be processed independently and concurrently, after the initial partition job that runs over the single input MS (see below). This is achieved by running an instance of the pipeline within each SPW directory, with nspw=1
, following the same calibration recipe within each separate SPW as within version 1.0.
Pre-cal and post-cal scripts
When nspw
> 1, the config parameter scripts
refers to the separate scripts that will be run as single jobs within each SPW directory. Any scripts that should be run within the top-level working directory (i.e. above the SPW directories) are stored in precal_scripts
, and postcal_scripts
, respectively run before and after running the scripts within each SPW directory. By default, precal_scripts = [('calc_refant.py',False,''), ('partition.py',True,'')]
, and postcal_scripts = [('concat.py',False,''), ('plotcal_spw.py', False, ''), ('selfcal_part1.py',True,''), ('selfcal_part2.py',False,''), ('science_image.py', True, '')]
, although calc_refant.py
will be stripped if calcrefant=False
, and the selfcal and imaging scripts will be stripped if you do not select these routines during the [-B --build]
step with [-2 --do2GC]
and [-I --science_image]
, respectively. The partition.py
and concat.py
steps are discussed below.
The steps following concat
are run over the concatenated dataset that spans all of the SPWs. Alternatively, these scripts (e.g. for self-carlibation and science imaging) can be appended to the end of scripts
, which will then be run on the targets split from each of the SPWs. Similar to the behaviour during the cross-calibration steps, self-calibrating separately over SPWs will partially solve for frequency dependence of your gain, albeit with a loss of signal-to-noise (compared to the full-band self-calibration run after concatenating). After doing this, concat can be run, and then one final science imaging step over the whole (concatenated) band. Alternatively, you could perform a custom combination of your SPW images, such as a weighted average image, assuming you’ve matched the resolution with a common restoringbeam
and/or uvtaper
.
During the [-R --run]
step, if nspw=1
and precal_scripts
or postcal_scripts
are not empty, a warning will be output, and these lists will be respectively prepended and appended to scripts
, and then set to empty lists.
Partition
The initial partition, given by the partiton.py
script (by default the last script in precal_scripts
), which reads your input MS and partitions out your selected spw
into an MMS, is run as a SLURM array job when nspw
> 1. This step allows multiple concurrently-running partitions over several SPWs, but limited to < 200 cores, meaning that sometimes not every SPW is partitioned concurrently. After the first SPW is partitioned, the first SPW launches an instance of the pipeline, and so on until the last SPW is processing. This behaviour is a special case that only works when partition.py
is the last script within precal_scripts
, otherwise the processing of every SPW waits until the last script within precal_scripts
has been run.
Bash jobScripts
When using nspw
> 1, the behaviour of the bash scripts within the jobScripts
directory (symlinked from the working directory) is different. Each bash script will iterate through the SPW directories and display the output for that SPW, running each SPW directory’s instance of that bash script, which remains the same as above. Since there are more than 100 jobs by default, ./summary.sh
will display only the running or failed jobs, and will not display completed or pending jobs, whereas ./fullSummary.sh
will display all of the jobs. Additionally, when precal_scripts
and postcal_scripts
are not empty lists, there will be a version of each of these jobScripts
starting with allSPW_
. These correspond to the pipeline jobs that are run at top-level directory over all SPWs, which is also displayed when calling the other jobScripts
. For example, when running ./summary.sh -X
(where -X
outputs ones line per job) during the middle of your processing with nspw=2
, you will see something similar to the following:
jcollier@slurm-login:/scratch/users/jcollier/MIGHTEE/nspw2$ ./summary.sh -X
SPW #1: /scratch/users/jcollier/MIGHTEE/nspw2/1350~1375MHz
JobID JobName Partition Elapsed NNodes NTasks NCPUS MaxDiskRead MaxDiskWrite NodeList TotalCPU CPUTime MaxRSS State ExitCode
--------------- --------------- ---------- ---------- ------ ------ ----- ------------ ------------ -------------------- ---------- ---------- ---------- ---------- --------
1838779 setjy Main 00:02:31 1 4 compute-039 00:00:00 00:10:04 RUNNING 0:0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SPW #2: /scratch/users/jcollier/MIGHTEE/nspw2/1375~1400MHz
JobID JobName Partition Elapsed NNodes NTasks NCPUS MaxDiskRead MaxDiskWrite NodeList TotalCPU CPUTime MaxRSS State ExitCode
--------------- --------------- ---------- ---------- ------ ------ ----- ------------ ------------ -------------------- ---------- ---------- ---------- ---------- --------
1838788 flag_round_1 Main 00:06:22 1 4 compute-059 00:00:00 00:25:28 RUNNING 0:0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All SPWs: /scratch/users/jcollier/MIGHTEE/nspw2
JobID JobName Partition Elapsed NNodes NTasks NCPUS MaxDiskRead MaxDiskWrite NodeList TotalCPU CPUTime MaxRSS State ExitCode
--------------- --------------- ---------- ---------- ------ ------ ----- ------------ ------------ -------------------- ---------- ---------- ---------- ---------- --------
1838776_0 partition Main 00:03:49 1 8 compute-058 00:00:00 00:30:32 COMPLETED 0:0
1838776_1 partition Main 00:03:19 1 8 compute-018 00:00:00 00:26:32 COMPLETED 0:0
1838797 concat Main 00:02:13 1 1 compute-020 00:00:00 00:00:00 PENDING 0:0
1838798 plotcal_spw Main 00:00:19 1 1 compute-020 00:00:00 00:00:00 PENDING 0:0
Concatenation and further imaging
After all of the SPWs have completed, irrespective of their completion status, the scripts within postcal_scripts
are run at the top level directory (i.e. above the SPW directories). By default, the first of these is concat.py
, which concatenates any output MMSs/MSs and quick-look images. By default these are the split calibrator and target fields, and their quick-look images.
If the config parameter keepmms=True
, virtualconcat
will be run for each field that has been split into its own MMS, resulting in an MMS with nspw
x nscans
sub-MSs. If keepmms=False
, concat
is run over each field that has been split into its own MS, resulting in a single concatenated MS. In both cases, the resulting MMS/MS will now contain multiple SPWs, given by nspw
(assuming that all SPWs successfully completed processing). These can then be imaged over the whole concatenated frequency range via further image processing, such as self-calibration and science imaging.
Furthermore, each split field’s quick-look image is concatenated together into a quick-look continuum cube. If any SPWs failed to split out a field or create an image for that field, they will be excluded from the (image and MMS/MS) concatenation.