5. Compute environment¶
5.1. User account¶
User accounts are created during the initial registration step in the UPEX portal. At this point the account can only be used for UPEX itself. If the user account is associated with an accepted and scheduled proposal, the account is upgraded 4 weeks before the user's first scheduled beamtime. For the first early user period, the time between the upgrade of the accounts and the start of the experiment can be shorter. The upgraded account allows the user to access additional services such as the online safety training, the metadata catalog, and the computing infrastructure of the European XFEL.
By default upgraded user accounts are kept in this state for 1 year after the user’s last beamtime. An extension can be requested by the PI.
On-site guest WLAN (WiFi) is provided for all users. For users with eduroam accounts provided by their home institute, access is straightforward. Users without an eduroam account must complete a registration procedure to obtain guest access for a limited time period. After connecting to the XFEL-Guest network (also when using a network patch cable) and opening a web browser, the user will be able to register for usage of the guest network. The registration is valid for 10 days and 5 devices.
5.1.1. Tools¶
At different stages of the proposal, users are granted access to different services:
Stage | Access provided | Comments |
---|---|---|
Proposal submission | Access to User portal (UPEX) | |
Approval of proposal and scheduling | Lightweight account | ca. 2 months before beam-time start |
Preparation phase | Access to Metadata catalog and beamtime store filesystem. LDAP account upgraded for members of all accepted proposals. | First-time users: once A-form is submitted and accepted. Deadline for A-form submission is normally 4 weeks before beam-time start. |
Beam time | Access to catalogs and dedicated online and offline services | |
Data analysis | Access to catalogs and shared offline computing resources, initially limited to a 1 year period after beamtime. | |
Importantly, first-time users should aim for a timely A-form submission. This ensures a time window of several weeks prior to the start of their beam time during which access to the Maxwell computing resources and the associated storage system (GPFS) is already granted. An additional benefit of such access is that users can work with example data beforehand and get accustomed to the peculiarities of EuXFEL data and workflows.
5.2. Online cluster¶
During beam time, exclusive access to a dedicated online cluster (ONC) is available only to the experiment team members and instrument support staff.
European XFEL aims to keep the software provided on the ONC identical to that available on the offline cluster (which is the Maxwell cluster).
5.2.1. Online cluster nodes in SASE 1¶
Beamtime in SASE 1 is shared between the FXE and the SPB/SFX instruments, with alternating shifts: when the FXE shift stops, the SPB/SFX shift starts, and vice versa.
Within SASE1, one node is reserved for the SPB/SFX experiments (sa1-onc-spb), and one node is reserved for the FXE experiments (sa1-onc-fxe). These can be used by the respective groups at any time during the experiment period (i.e. during shifts and between shifts).
Both the SPB/SFX and the FXE users have shared access to another 7 nodes. The default expectation is that those nodes are used during the users' shift, and usage stops at the end of the shift (so that the other experiment can start using the machines during their shift). These are sa1-onc-01, sa1-onc-02, sa1-onc-03, sa1-onc-04, sa1-onc-05, sa1-onc-06 and sa1-ong-01.
Overview of available nodes and usage policy:
name | purpose |
---|---|
sa1-onc-spb | reserved for SPB/SFX |
sa1-onc-fxe | reserved for FXE |
sa1-onc-01 to sa1-onc-06 | shared between FXE and SPB/SFX; use only during shifts |
sa1-ong-01 | shared between FXE and SPB/SFX; GPU: Tesla V100 (16GB) |
These nodes do not have access to the Internet.
The prefix sa1-onc- in the node names stands for SAse1-ONlineCluster.
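If you want to confirm the GPU on the shared GPU node, a minimal check is to run the standard NVIDIA tool remotely (node name as listed above; shown only as an illustration, assuming you are logged in to an online cluster access machine):
ssh sa1-ong-01 nvidia-smi   # prints the Tesla V100 status if the GPU is available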
5.2.2. Online cluster nodes in SASE 2¶
Beamtime in SASE 2 is shared between the MID and the HED instruments, with alternating shifts: when the MID shift stops, the HED shift starts, and vice versa.
Within SASE2, one node is reserved for the MID experiments (sa2-onc-mid), and one node is reserved for the HED experiments (sa2-onc-hed). These can be used by the respective groups at any time during the experiment period (i.e. during shifts and between shifts).
Both the MID and the HED users have shared access to another 7 nodes. The default expectation is that those nodes are used during the users' shift, and usage stops at the end of the shift (so that the other experiment can start using the machines during their shift). These are sa2-onc-01, sa2-onc-02, sa2-onc-03, sa2-onc-04, sa2-onc-05, sa2-onc-06 and sa2-ong-01.
Overview of available nodes and usage policy:
name | purpose |
---|---|
sa2-onc-mid | reserved for MID |
sa2-onc-hed | reserved for HED |
sa2-onc-01 to sa2-onc-06 | shared between MID and HED; use only during shifts |
sa2-ong-01 | shared between MID and HED; GPU: Tesla V100 (16GB) |
These nodes do not have access to the Internet.
The prefix sa2-onc- in the node names stands for SAse2-ONlineCluster.
5.2.3. Online cluster nodes in SASE 3¶
Beamtime in SASE 3 is shared between the SQS and the SCS instruments, with alternating shifts: when the SQS shift stops, the SCS shift starts, and vice versa.
Within SASE3, one node is reserved for the SCS experiments (sa3-onc-scs), and one node is reserved for the SQS experiments (sa3-onc-sqs). These can be used by the respective groups at any time during the experiment period (i.e. during and between shifts).
Both SASE3 instrument users have shared access to another 7 nodes. The default expectation is that those nodes are used during the users' shift, and usage stops at the end of the shift (so that the other experiment can start using the machines during their shift). These are sa3-onc-01, sa3-onc-02, sa3-onc-03, sa3-onc-04, sa3-onc-05, sa3-onc-06 and sa3-ong-01.
Overview of available nodes and usage policy:
name | purpose |
---|---|
sa3-onc-scs | reserved for SCS |
sa3-onc-sqs | reserved for SQS |
sa3-onc-01 to sa3-onc-06 | shared between SCS and SQS; use only during shifts |
sa3-ong-01 | shared between SCS and SQS; GPU: Tesla V100 (16GB) |
These nodes do not have access to the Internet.
The prefix sa3-onc- in the node names stands for SAse3-ONlineCluster.
Note that the usage policy on shared nodes is not strictly enforced. Scientists across instruments should liaise to agree on any usage other than that specified here.
5.2.4. Access to online cluster¶
The ONC can only be accessed from workstations (Linux Ubuntu 16.04) in the control hutch or from dedicated access workstations located in the XFEL headquarters building on levels 1 and 2 (marked with an X in the maps below).
Location of the ONC workstations:
- Workstations at Level 1
- Workstation at Level 2
From these access computers, one can ssh directly into the online cluster nodes and also to the Maxwell cluster (see Offline cluster). The X display is forwarded automatically in both cases.
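As a minimal illustration (host names as listed above; pick the node matching your instrument):
ssh sa1-onc-01             # log in to a shared SASE 1 online cluster node
ssh max-display.desy.de    # or reach the Maxwell (offline) cluster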
Direct Internet access from the online cluster is not possible.
5.2.5. Storage¶
The following storage resources are available on the Online user cluster:
- raw: data stored by the DAQ (data cache) - not accessible (access planned via a reader service in the long run)
- usr: beamtime store. Users can upload files, data or scripts to this folder for use during the beamtime. This folder is mounted from and thus immediately synchronised with a corresponding folder in the offline cluster. There is not a lot of space here (5TB).
- proc: can contain data processed by dedicated pipelines (e.g. calibrated data). Not used at the moment (May 2019).
- scratch: folder where users can write temporary data, e.g. the output of customized calibration pipelines. This folder is intended for large amounts of processed data. If the processed data is small in volume, it is recommended to use usr.
Access to data storage is possible via the same path as on the Maxwell cluster:
/gpfs/exfel/exp/<instrument>/<instrument_cycle>/p<proposal_id>/(raw|usr|proc|scratch)
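For example, the usr folder of a hypothetical proposal 2416 at SPB/SFX in a hypothetical cycle 201901 would be reached as (values shown only for illustration):
ls /gpfs/exfel/exp/SPB/201901/p002416/usr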
Folder | Permission | Quota | Retention |
---|---|---|---|
raw | None | None | Data migrated to offline storage, then removed |
usr | Read/write | 5TB | Immediately synced with Maxwell cluster |
proc | Read | None | Data removed after migration |
scratch | Read/write | None | Data removed when space is needed |
To simplify access to files, symbolic links are in place that recreate the file structure as it is visible on the online cluster.
5.2.6. Access to data on the online cluster¶
Currently, no access to data files is possible from the online cluster: the raw directory is not readable and the proc directory is not populated with files.
Online analysis tools running on the online cluster thus have to be fed the currently recorded data through the Karabo Bridge.
File-based post-processing thus has to take place on the offline (=Maxwell) cluster after the files have been transferred at the end of a run. There is a delay of several minutes for this (depending on run length and the overall load on the data transfer system).
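A quick way to check whether a run's files have already appeared on Maxwell is to list the corresponding run directory under raw (hypothetical proposal path and run number, assuming the usual rNNNN run-folder naming):
ls /gpfs/exfel/exp/SPB/201901/p002416/raw/r0042/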
5.2.7. Home directory warning¶
The home directory /home/<username> for each user on the online cluster is not shared with the home directory /home/<username> on the offline (=Maxwell) cluster. The home directory within the online cluster is shared across all nodes of the online cluster, and the home directory within the offline cluster is shared across all nodes of the offline cluster.
To share files between the online and the offline cluster, the
/gpfs/exfel/exp/<instrument>/<instrument_cycle>/p<proposal_id>/usr
directory should be used: the files stored here show up in
both the online and offline cluster, and are accessible to the whole
group of users of this proposal.
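For example, to make a small analysis script available on both clusters (hypothetical proposal path and file name, shown only as a sketch):
cp my_analysis.py /gpfs/exfel/exp/SPB/201901/p002416/usr/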
5.3. Offline cluster¶
The Maxwell cluster at DESY is available for data processing and analysis during and after the experiment. Users are welcome and encouraged to make themselves familiar with the Maxwell cluster and its environment well in advance of the beam time.
In the context of European XFEL experiments, the Maxwell cluster is also referred to as the “offline” cluster. Despite this name, you can connect to the internet from Maxwell. It is offline in that it can’t stream data directly from the experiments, unlike the “online cluster”.
5.3.1. Getting access¶
When a proposal is accepted, the main proposer will be asked to fill out the "A-form" which, in addition to information on the final selection of samples to be brought to the experiment, also contains a list of all the experiment's participants. At the time of submission of the A-form, all participants must have an active account in UPEX. This is the prerequisite for getting access to the facility's computing and data resources. After submission of the A-form, additional participants can be granted access to the experiment's data by PI request.
Users have access to:
- HPC cluster
- beamtime store, data repository and scratch space
- web based tools
5.3.2. Graphical login¶
To use Maxwell with a remote desktop, you can either:
- Go to https://max-display.desy.de:3443 in a web browser
- Or install FastX and connect to max-display.desy.de
5.3.4. SSH access¶
ssh username@max-display.desy.de
Replace username with your EuXFEL username.
Unlike most of the cluster, max-display is directly accessible from outside the DESY/EuXFEL network.
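Standard tools such as scp can also be used over this host to move files, assuming such transfers are permitted for your account; a sketch with a hypothetical proposal path and file name:
scp username@max-display.desy.de:/gpfs/exfel/exp/SPB/201901/p002416/usr/results.h5 .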
5.3.5. Running jobs¶
When you log in, you are on a ‘login node’, shared with lots of other people. You can try things out and run small computations here, but it’s bad practice to run anything for a long time or use many CPUs on a login node.
To run a bigger job, you should submit it to SLURM, our queueing system. If you can define your job in a script, you can submit it like this:
sbatch -p upex -t 8:00:00 myscript.sh
- -p specifies the 'partition' to use. External users should use upex, while EuXFEL staff use exfel.
- -t specifies a time limit: 8:00:00 means 8 hours. If your job doesn't finish in this time, it will be killed. The default is 1 hour, and the maximum is 2 weeks.
- Your script should start with a 'shebang', a line like #!/usr/bin/bash pointing to the interpreter it should run in, e.g.:
#!/usr/bin/bash
echo "Job started at $(date) on $(hostname)"

# To use the 'module' command, source this script first:
source /usr/share/Modules/init/bash
module load exfel exfel_anaconda3

python -c "print(9 * 6)"
To see your running and pending jobs, run:
squeue -u $USER
Once a job starts, a file like slurm-4192693.out
will be created - the
number is the job ID. This contains the text output of the script, which you
would see if you ran it in a terminal. The programs you run will probably also
write data files.
SLURM is a powerful tool, and this is a deliberately brief introduction. If you are submitting a lot of jobs, it’s worth spending some time exploring what it can do.
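A few other standard SLURM commands are often useful; the job ID and time limit below are only examples:
scancel 4192693                       # cancel a job by its ID
sacct -j 4192693                      # show accounting information for a finished job
srun -p upex -t 1:00:00 --pty bash    # request an interactive session on a compute node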
5.3.5.1. During beamtime¶
During your beamtime, a few nodes are reserved so that your group can run some jobs promptly even if there’s a backlog. To use your reservation, add an extra option when submitting your jobs:
sbatch --reservation=upex_002416 ...
Replace the number with your proposal number, padded to 6 digits.
You can check the details of your reservation like this:
scontrol show res upex_002416
The output of this command tells you the period when the reservation is valid, the reserved nodes, and which usernames are allowed to submit jobs for it:
[@max-exfl001]~/reservation% scontrol show res upex_002416
ReservationName=upex_002416 StartTime=2019-03-07T23:05:00 EndTime=2019-03-11T14:00:00 Duration=3-14:55:00
Nodes=max-exfl[034-035,057,166] NodeCnt=4 CoreCnt=156 Features=(null) PartitionName=upex Flags=IGNORE_JOBS
TRES=cpu=312
Users=bob,fred,sally Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
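The reservation option can be combined with the usual partition and time-limit options; a sketch with a hypothetical script name and time limit:
sbatch -p upex --reservation=upex_002416 -t 4:00:00 myscript.sh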
5.3.6. Software available¶
The EuXFEL data analysis group provides a number of relevant tools, described in Data analysis software. In particular, a Python environment with relevant modules can be loaded by running:
module load exfel exfel_anaconda3
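After loading the module, you could verify that the environment is active like this (a sketch; the exact path and version will differ):
which python      # should point at the exfel_anaconda3 installation
python --version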
5.3.7. Storage¶
Users will be given a single experiment folder per beam time (not per user) through which all data will be accessible, e.g.:
/gpfs/exfel/exp/<instrument>/<instrument_cycle>/p<proposal_id>/(raw|usr|proc|scratch)
Storage | Quota | Permission | Lifetime | Comments |
---|---|---|---|---|
raw | None | Read | 2 months | Fast-accessible raw data |
usr | 5TB | Read/Write | 24 months | User data, results |
proc | None | Read | 6 months | Processed data, e.g. calibrated |
scratch | None | Read/Write | 6 months | Temporary data (lifetime not guaranteed) |
5.3.8. Synchronisation¶
The data in the raw
directories are moved from the
online cluster (at the experiment) to the offline (Maxwell) cluster as
follows:
- When the run stops (the user presses the button), the data is flagged as ready to be copied to the Maxwell cluster and queued to a copy service (provided by DESY). The data will be copied without the user noticing.
- Once the data is copied, the data is 'switched' and becomes available on the offline cluster.
- The precise time at which this switch happens after the user presses the button cannot be predicted: if the data has already been copied (in the background), it could be instantaneous; otherwise the copy process needs to finish first.
- The actual copying process (before the switch) could take anything from minutes to hours, and will depend on (i) the size of the data and (ii) how busy the (DESY) copying queue is.
- The usr folder is mounted from the Maxwell cluster, and thus always identical between the online and offline systems. However, it is not optimised for dealing with large files and thus potentially slow for larger files. There is a quota of 5TB.
5.4. Running containers¶
Singularity is available on both the online and offline cluster. It can be used to run containers built with Singularity or Docker.
Running containers with Docker is experimental, and there are some complications with filesystem permissions. We recommend using Singularity to run your containers, but if you need Docker, it is available.
- On the online cluster, Docker needs to be enabled for your account. Please email it-support@xfel.eu to request it.
- On the offline cluster, Docker only works on nodes allocated for SLURM jobs (see Running jobs), not on login nodes.
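As a minimal sketch of pulling and running a public image with Singularity (the image name is only an example; on the online cluster, which has no Internet access, the image file would need to be copied in first, e.g. via the usr folder):
singularity pull docker://ubuntu:22.04                   # creates ubuntu_22.04.sif in the current directory
singularity exec ubuntu_22.04.sif cat /etc/os-release    # run a command inside the container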