1. General

1.1 Where can I find European XFEL policies?

The European XFEL policies can be found under: Policies

1.2 What is the main difference between the current and new data policies?

The primary difference is the new data policy's emphasis (starting in 2025) on compliance with the FAIR principles: Findability, Accessibility, Interoperability, and Reusability. For example:

  • Current Policy: Long-term storage is allowed only for RAW data and non-reproducible PROC data.
  • New Policy: Introduces concepts to enhance FAIRness: data management plans to help simplify analyses and ensure that published data is FAIR; auxiliary data; and data reduction strategies, including support for dedicated long-term storage of curated, reduced data volumes. It also promotes storing RAW data on tape and selectively publishing high-quality datasets.

1.3 What is a data management plan?

A data management plan (DMP) is defined as the set of strategies and measures for collecting and handling scientific data and metadata throughout their lifecycle.

1.4 Where can I find European XFEL Data Analysis User Documentation?

The European XFEL Data Analysis User Documentation can be found under: EuXFEL Data Analysis User Documentation

1.5 How can I access the data in a proposal?

The data stored under a proposal can be accessed via:

  • Maxwell cluster; for further information, see: Offline data analysis
  • Globus service, endpoint: EuXFEL Data Store
  • SFTP, sftp.xfel.eu (use interactive login)

1.6 How can I find out where my data is stored?

The data location is shown in the Repository tab on the proposal page in myMdC.
A green box indicates that the RAW data from a given run is present in the repository specified in the first row.

1.7 What is the current retention policy for RAW data?

The Data Retention Policy (effective since 2017) specifies:

  • Fast storage (GPFS): RAW data is kept for two months without guaranteed retention.
  • Commodity disks (dCache): RAW data is kept for six months.
  • Tape storage (archive): RAW data is preserved for 5 years (striving for 10 years).
    More detailed information about data retention can be found under: Data retention policy

1.8 Why has data been kept on disk longer than the retention policy specifies?

We have retained RAW data on disk for more than five years to support complex analyses and researchers facing challenges in analyzing and publishing data.
However, due to storage capacity limitations, we can no longer continue this practice.

1.9 What is the difference between USR and SCRATCH spaces?

  • USR Space: For important data that cannot be easily reproduced and must not be lost. It is backed up but limited in size.
  • SCRATCH Space: For temporary storage and short-term data analysis. It is not persistent, and data in SCRATCH is not intended for long-term retention.
    Data in the SCRATCH folder can be removed at any time without notification if space is needed. Based on the fair share policy, the oldest data from proposals consuming the most space will have a higher chance of being removed first. There is no backup.

1.10 What steps should I take to avoid accidental data loss in SCRATCH space?

Critical data should be:

  • Copied to USR space, or
  • Transferred to external storage (e.g., home institute storage) via the Globus or SFTP service.

1.11 What should I know about RED data storage (RED box)?

RED data storage is dedicated to user data analysis during the embargo period. Key information includes:

  • Size limit: max(10% of RAW data size; min(50 TB; RAW data size))
  • Timeline: Data must be finalized within 6 months after beamtime, then it becomes read-only
  • Data selection for the RED box: A selection of reduced raw data, facility-processed data, and/or user-processed data in a format supported by the facility
  • Storage duration: Kept on high-performance storage during the embargo period, then moved to OPEN data storage under the Scientific Data Policy rules
  • Data access: On request of the PI, data (fully or partially) can be made open before the default embargo period (e.g., for journal publication)
  • Data immutability: Once opened, the dataset becomes immutable
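
The size limit above can be worked through as a short calculation. This is only a sketch: it reads the rule as max(10% of RAW; min(50 TB; RAW)), and the function name `red_box_limit_tb` and its parameters are hypothetical, not part of any facility tool.

```python
def red_box_limit_tb(raw_size_tb: float,
                     cap_tb: float = 50.0,
                     fraction: float = 0.10) -> float:
    """Return the RED box size limit in TB for a given RAW data size in TB.

    Assumed reading of the rule: max(10% of RAW; min(50 TB; RAW)).
    """
    if raw_size_tb < 0:
        raise ValueError("RAW data size must not be negative")
    return max(fraction * raw_size_tb, min(cap_tb, raw_size_tb))


# Small proposals may keep up to their full RAW size, mid-sized ones are
# capped at 50 TB, and very large ones get 10% of their RAW size.
for raw in (20.0, 200.0, 1000.0):
    print(f"RAW {raw:7.1f} TB -> RED limit {red_box_limit_tb(raw):6.1f} TB")
```

Under this reading, the 10% term only takes over once the RAW data exceeds 500 TB.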

1.12 What is the purpose of the data-archiving notification email?

For proposals that are close to passing their embargo period, we send data-archiving notification emails.
These emails are sent six weeks before the expected date of data archiving, as published in the General tab of a proposal in myMdC.
Key points about the archival process:

  • We provide an option to reduce the data during the archival process.
    The reduced data can remain available on disk as open data.
  • Copies of data in the RAW format and non-reproducible PROC data
    will remain in the tape archive, but the disk copies will be removed.
  • The SCRATCH space assigned to this proposal will be discarded.
  • Files in the USR folder will remain in place but will become read-only.
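
To estimate when a notification email is due, the six-week lead time can be subtracted from the archiving date shown in myMdC. The sketch below assumes you copy that date by hand; the function name and the example date are purely illustrative.

```python
from datetime import date, timedelta

# Lead time stated in the FAQ: emails go out six weeks before archiving.
NOTIFICATION_LEAD = timedelta(weeks=6)


def notification_date(archiving_date: date) -> date:
    """Return the date on which the notification email is expected."""
    return archiving_date - NOTIFICATION_LEAD


# Illustrative archiving date only, not taken from a real proposal.
print(notification_date(date(2025, 7, 1)))  # 2025-05-20
```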

1.13 How do we determine whether data can be retained on disk after a data-archiving notification email?

We evaluate each case individually, and the decision depends on why you want to retain the data and the type of data you wish to preserve.

Please contact the data management team by email.

1.14 I received a data archiving notification email for my proposal, but my data are small. What will happen to them?

The size of the data is not the only factor considered when deciding whether to keep RAW data on disk.
We prioritize retaining only high-quality RAW data.
For instance, if you have 5 TB of RAW data but 1 TB is of poor quality, or if certain data are found to be uninteresting during analysis, those data should be removed.
Another factor is the ongoing interest in the data. Data that are not being actively analyzed have no reason to remain online – regardless of the data size.
SCRATCH data for the proposal will be permanently removed, USR directories will be set to read-only, and PROC data will be archived on tape if they are not reproducible.

1.15 The proposal data has been archived, but I still see the files in the folders. Why?

The visibility of files in the namespace does not mean that the actual data is accessible.
We retain metadata in the namespace to provide an overview of the folder content. If the data is not accessible, a Permission Error will be displayed when attempting to access the files. To check where your data is located, please go to the Repository tab of your proposal in myMdC.
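
From an analysis script, a quick way to tell a visible-but-archived file from an accessible one is to try reading a single byte. This is a minimal sketch that assumes the failure surfaces in Python as `PermissionError`; the helper name is hypothetical, and the Repository tab in myMdC remains the authoritative source.

```python
def is_data_accessible(path) -> bool:
    """Return True if at least the first byte of the file can be read.

    Listing a directory only touches namespace metadata, so the file
    must actually be opened to find out whether its content is on disk.
    """
    try:
        with open(path, "rb") as handle:
            handle.read(1)
    except PermissionError:
        # Assumed behaviour for files whose disk copy has been removed.
        return False
    return True
```

A missing file still raises `FileNotFoundError`, so a mistyped path is not mistaken for archived data.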

1.16 How do I determine the correct instrument cycle code for my proposal?

The instrument cycle code follows the format YYYYTX, where YYYY is the current year, T represents the proposal type, and X is the cycle number.
Use the following codes for T:

  • 0 or 1 – User proposals (UPEX)
  • 2 – In-house research (UPEX)
  • 3 – Instrument and tunnel commissioning
  • 4 – Test stands
  • 5 – Instrument examples
  • 6 – Laboratories

For example, a user proposal submitted in 2025 for its first cycle would be 202501.
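
The format can also be expressed as a small helper that builds and parses cycle codes. This is a sketch, not a facility tool: the function names are hypothetical, and it assumes that both T and X are single digits, as the YYYYTX pattern suggests.

```python
# Proposal types as listed above (T in YYYYTX).
PROPOSAL_TYPES = {
    0: "User proposals (UPEX)",
    1: "User proposals (UPEX)",
    2: "In-house research (UPEX)",
    3: "Instrument and tunnel commissioning",
    4: "Test stands",
    5: "Instrument examples",
    6: "Laboratories",
}


def build_cycle_code(year: int, proposal_type: int, cycle: int) -> str:
    """Build a YYYYTX code, e.g. (2025, 0, 1) -> '202501'."""
    if not 1000 <= year <= 9999:
        raise ValueError("year must have four digits")
    if proposal_type not in PROPOSAL_TYPES:
        raise ValueError(f"unknown proposal type: {proposal_type}")
    if not 0 <= cycle <= 9:  # assumption: X is a single digit
        raise ValueError("cycle must be a single digit")
    return f"{year}{proposal_type}{cycle}"


def parse_cycle_code(code: str) -> tuple:
    """Split a YYYYTX code into (year, proposal type description, cycle)."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected six digits in the form YYYYTX")
    proposal_type = int(code[4])
    if proposal_type not in PROPOSAL_TYPES:
        raise ValueError(f"unknown proposal type: {proposal_type}")
    return int(code[:4]), PROPOSAL_TYPES[proposal_type], int(code[5])


print(build_cycle_code(2025, 0, 1))  # 202501
print(parse_cycle_code("202531"))    # (2025, 'Instrument and tunnel commissioning', 1)
```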

3. Example Proposals

3.1 What are example proposals?

Example proposals are sample datasets provided to users to help them understand the data structure, methods, and analysis of the scientific data collected at the facility. These proposals mimic real user proposals in metadata and low-level filesystem structure.

3.2 How to create an example proposal?

An instrument scientist should request the creation of an example proposal via email to it-support@xfel.eu. To create the example proposal, the ITDM representative then:

  • Copies data into the example proposal space.
  • Modifies metadata associated with the files.
  • Creates a metadata structure in myMdC, cloning details like run type, sample, and train information from the original proposal.

    This ensures that the example proposal mirrors the relevant structural and contextual elements of the original data, enabling users to explore it effectively.

3.3 Who can request example proposals?

Any instrument may request one or more example proposals under the XMPL instrument. For the procedure, see 3.2: How to create an example proposal?

3.4 Who is responsible for creating and managing the data in example proposals?

An example proposal is internally managed by the respective instruments and support groups.

  • Instrument scientists: Create the data content in the USR or SCRATCH space and request that raw data be copied by the ITDM representative.
  • Data managers: Add analysis code, Python notebooks, and auxiliary data for example data analysis. They also curate and update the proposal data.

3.5 How is raw data for the example proposals obtained?

Raw data for the example proposals is copied from existing raw data collected at the facility. Data managers must specify which raw data should be copied by sending a list of the original proposal and run numbers via email to it-support@xfel.eu. Additional raw data can also be added to an existing example proposal if needed.

3.6 How should processed data for example proposals be handled?

Processed data are generated directly from raw data using myMdC.
This ensures calibration reports are available and the example is more realistic.

3.7 What additional resources should data managers of example proposals provide?

Data managers of an example proposal should include, among other analysis-related resources:

  • Example analysis code.
  • Python notebooks.
  • Auxiliary data needed for analysis, stored in the USR or SCRATCH folders.

3.8 Can users practice analysis with example proposal data?

Yes, the example proposals contain sample data and code that can be used to practice data analysis tasks.
However, example proposal storage is not intended to serve as a scratch space for training analysis skills or working with example data. Instead, users should use their currently scheduled proposal space or external storage for such activities.

3.9 Can example proposals be used for analysis code development?

No. USR and SCRATCH spaces within example proposals should not be used for:

  • Testing code.
  • Running analysis and saving its output there.

4. Commissioning Proposals

4.1 How often are the instrument cycles changed for commissioning proposals?

Instrument cycles are changed every six months.

4.2 Why should the PI validate the cloneable status of the commissioning proposal?

The cloneable status of the commissioning proposal must be reviewed to ensure that the metadata cloned for the next instrument cycle is correct.

4.3 What are the cloning options for commissioning proposals?

There are three cloning options:

  • Simple Clone: Clones the title, experiment team, PI (Principal Investigator), MP (Main Proposer), and Local Contact.
  • Advanced Clone: Clones all elements of the Simple Clone plus runtypes and sample types.
  • Not Clonable: The proposal will not be cloned for the next instrument cycle.
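
The relationship between the three options can be summarized as a small lookup. This is a conceptual sketch only: the enum, the field names, and the function are hypothetical and do not reflect how myMdC stores or applies the clone setting.

```python
from enum import Enum


class CloneOption(Enum):
    SIMPLE = "Simple Clone"
    ADVANCED = "Advanced Clone"
    NOT_CLONABLE = "Not Clonable"


# Elements carried over by a Simple Clone, as listed in the FAQ.
SIMPLE_FIELDS = ("title", "experiment team", "PI", "MP", "local contact")
# Extra elements carried over by an Advanced Clone.
ADVANCED_EXTRA_FIELDS = ("runtypes", "sample types")


def cloned_fields(option: CloneOption) -> tuple:
    """Return the elements copied to the next instrument cycle."""
    if option is CloneOption.SIMPLE:
        return SIMPLE_FIELDS
    if option is CloneOption.ADVANCED:
        return SIMPLE_FIELDS + ADVANCED_EXTRA_FIELDS
    return ()  # Not Clonable: nothing is carried over


for option in CloneOption:
    print(f"{option.value}: {', '.join(cloned_fields(option)) or 'nothing cloned'}")
```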

4.4 What happens if a member of the commissioning proposal’s experiment team is no longer affiliated with the European XFEL?

  • If the person still collaborates with the experiment team, they may need to retain access to the data.
  • If they no longer need access, their account should be removed from the new proposal team by the Principal Investigator (PI).

4.5 Why is it necessary to assess runs of a proposal?

The assessment helps free up storage by removing uninteresting and low-quality data, ensuring efficient resource management.

4.6 What will happen to runs of commissioning proposals that have not been assessed after the announced deadline?

Raw data from commissioning proposal runs that have not been assessed by the announced deadline will be marked as “Not interesting” and permanently deleted from all storage shortly after the deadline.