4. Candidate frame lists

4.1. Introduction

In some XFEL experiments, only a small fraction of the data is scientifically interesting (for example, the frames in which the beam hit the sample). To use our data storage facilities efficiently, we would like to filter the calibrated data and only store the interesting parts. (The full calibrated data set will be available for a short time only.) The raw data will be stored in full for now, but in the future it may be necessary to filter this data as well.

We are therefore working towards running analysis tools during experiments to decide which frames are potentially interesting. (A frame is the unit of data coming from one detector exposure, across all detector modules. This corresponds to one X-ray pulse.) These tools should err on the side of including borderline frames, which may later be rejected by more specific algorithms.

4.2. Invitation to share list of interesting frames

We invite users to provide candidate frame lists (from their own hit finding and/or indexing efforts) for their experiments and runs. This supports the development of these tools and helps us use storage space effectively.

Given such a list of frames from users, it may be possible to store calibrated files (containing only these interesting frames) for a longer time than the calibrated files for all frames can be kept.

This will save disk space for the facility, and help users by keeping calibrated data available for longer for further processing.

A method of uploading/submitting these files is in preparation. For now, please email the files to your instrument contact.

4.3. The file format

This document describes a simple file format for such tools to declare which frames should be preserved and processed for later analysis. Specifically, it describes version 1.0 of the format.

The scope of this file is assumed to be one run and one detector.

4.4. Overall file format

Candidate frame list files are UTF-8 encoded text, with lines terminated by a single line-feed character (Unix style line endings).

Each file has a header section and a data section, separated by an empty line. No empty lines are allowed within the header or data sections. The file ends with a single newline.
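The layout rules above can be checked with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name split_sections is hypothetical, and the sketch assumes the data section contains at least one row, since the format description does not say how an empty data section should look.

```python
def split_sections(text: str) -> tuple[list[str], list[str]]:
    """Split the decoded file contents into (header_lines, data_lines).

    Hypothetical helper: enforces LF line endings, one trailing newline,
    and a single empty line separating header and data.
    """
    if "\r" in text:
        raise ValueError("expected Unix line endings (LF only)")
    if not text.endswith("\n"):
        raise ValueError("file must end with a newline")

    # Drop the final newline, then split at the first empty line.
    header, separator, data = text[:-1].partition("\n\n")
    if not separator:
        raise ValueError("no empty line separating header and data")

    header_lines = header.split("\n")
    data_lines = data.split("\n")
    # Any remaining empty string means an extra empty line somewhere.
    if "" in header_lines or "" in data_lines:
        raise ValueError("empty lines are not allowed within a section")
    return header_lines, data_lines
```

To obtain the text, a reader could open the file with encoding="utf-8" and newline="" so that Python does not silently translate line endings before this check runs.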

4.5. Header section

The first line of the file is exactly xfel.eu candidate-frame-list v1.0. The version number within this will change if the format changes: incrementing the first part indicates incompatible changes, and incrementing the second part indicates compatible changes, such as defining a new header.

Code reading these files should inspect the version number in this line, and fail if it cannot process that file version.
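As an illustration, a reader might check the first line as sketched below. The names FIRST_LINE, SUPPORTED_MAJOR and check_version are hypothetical. Accepting any minor version under a supported major version is one reading of "compatible changes" and is an assumption of this sketch, not a requirement stated by the format.

```python
import re

# Pattern for the first line; the version is captured as two integers.
FIRST_LINE = re.compile(r"xfel\.eu candidate-frame-list v(\d+)\.(\d+)")
SUPPORTED_MAJOR = 1  # assumption: this reader only understands v1.x


def check_version(first_line: str) -> tuple[int, int]:
    """Return (major, minor), or raise ValueError if the file is unsupported."""
    match = FIRST_LINE.fullmatch(first_line)
    if match is None:
        raise ValueError(f"not a candidate frame list: {first_line!r}")
    major, minor = int(match[1]), int(match[2])
    if major != SUPPORTED_MAJOR:
        # A different first part signals incompatible changes, so fail early.
        raise ValueError(f"unsupported format version {major}.{minor}")
    return major, minor
```

A stricter reader could also reject minor versions it does not know; which behaviour is preferable depends on how the tool handles headers it does not recognise.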

After the first line, the header may include comments, beginning each comment line with a # character. Comments are ignored by code reading the file, but their use is encouraged to document how the candidate frame list was generated: e.g. which tool and which version produced the file, along with any relevant input parameters.

No other headers are currently defined.

As described above, a blank line terminates the header section.

4.6. Data section

The data is in CSV format with no header row. There are two columns: train ID and pulse ID. The rows are not required to be in sorted order because trains may be dispatched to parallel workers which finish processing out of order.
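A minimal sketch of reading the data section with Python's standard csv module follows. The function name parse_data_section is hypothetical, and the sketch assumes both columns are plain decimal integers.

```python
import csv
import io


def parse_data_section(data_text: str) -> list[tuple[int, int]]:
    """Parse the data section into (train_id, pulse_id) pairs, in file order."""
    frames = []
    for row in csv.reader(io.StringIO(data_text)):
        if len(row) != 2:
            raise ValueError(f"expected 2 columns (train ID, pulse ID), got {row!r}")
        train_id, pulse_id = int(row[0]), int(row[1])
        frames.append((train_id, pulse_id))
    return frames
```

Because rows may arrive in any order, a consumer that needs chronological order can call sorted() on the result, which orders the pairs by train ID and then by pulse ID.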

4.6.1. Pulse ID

There are three ways to identify a frame within a train. In some situations these appear to be the same, but they are different, and it’s vital that all tools use the same one: pulse ID.

  • Pulse ID: consistently identifies a pulse in chronological order. This is image.pulseId in the data coming from Karabo.
  • Cell ID: identifies where a value was stored in the detector hardware before it was read out. These are consistent, but not necessarily in chronological order, because the detector may go back to the start and replace some values.
  • Index in the data array: easy to compute, but not consistent, because any filtering changes the positions of data. Different tools also disagree on whether indexes start at 0 or 1.
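The difference between a pulse ID and an array index can be made concrete with a small sketch. Here pulse_ids_in_train stands in for the pulse ID values stored alongside the frames of one train (for example, values read from image.pulseId); the function name and the sample values are hypothetical.

```python
def positions_for_pulses(pulse_ids_in_train, wanted_pulse_ids):
    """Return the array positions of frames whose pulse ID is wanted.

    The lookup goes through the stored pulse IDs, so it stays correct
    even when earlier filtering has shifted the array positions.
    """
    wanted = set(wanted_pulse_ids)
    return [i for i, pid in enumerate(pulse_ids_in_train) if pid in wanted]


# After some filtering, only four frames of this (made-up) train remain.
stored = [0, 4, 8, 12]
positions = positions_for_pulses(stored, [4, 12])  # positions 1 and 3
```

In this example the frame with pulse ID 4 sits at position 1, so using the pulse ID directly as an index would select the wrong frame.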

4.7. Example

A complete file may look like this:

xfel.eu candidate-frame-list v1.0
# Generated by acme-hitfinder 0.1
# Threshold=7
# SPB proposal 900099, run 22


Of course, real files will probably be much longer than this.
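For tools that produce candidate frame lists, the sketch below assembles the text of a version 1.0 file from a list of (train ID, pulse ID) pairs. The function name format_frame_list and the sample values are hypothetical, and refusing an empty list is an assumption of this sketch, since the format does not describe an empty data section.

```python
def format_frame_list(frames, comments=()):
    """Return the full text of a v1.0 candidate frame list file."""
    frames = list(frames)
    if not frames:
        # Assumption: the format does not define an empty data section.
        raise ValueError("at least one frame is required")

    lines = ["xfel.eu candidate-frame-list v1.0"]
    for comment in comments:
        if "\n" in comment or "\r" in comment:
            raise ValueError("each comment must fit on a single line")
        lines.append(f"# {comment}")
    lines.append("")  # the empty line that ends the header section
    lines.extend(f"{int(train_id)},{int(pulse_id)}" for train_id, pulse_id in frames)
    return "\n".join(lines) + "\n"  # the file ends with a single newline


text = format_frame_list(
    [(1000, 2), (1000, 7)],  # made-up train and pulse IDs
    comments=["Generated by example-tool 0.1"],
)
```

When saving the result, opening the file with encoding="utf-8" and newline="\n" keeps the Unix line endings intact on every platform.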