4. Candidate frame lists
In some XFEL experiments, only a small fraction of the data is scientifically interesting (for example, the frames in which the beam hit the sample). To use our data storage facilities efficiently, we would like to filter the calibrated data and store only the interesting parts; the full calibrated data set will be available for a short time only. The raw data will be stored in full for now, but in the future it may be necessary to filter it as well.
We are therefore working towards running analysis tools during experiments to decide which frames are potentially interesting. (A frame is the unit of data from one detector exposure, across all detector modules; it corresponds to one X-ray pulse.) These tools should err on the side of including borderline frames, which more specific algorithms may reject later.
4.3. The file format
This document describes a simple file format for such tools to declare which frames should be preserved and processed for later analysis. Specifically, it describes version 1.0 of the format.
Each file is assumed to cover a single run and a single detector.
4.4. Overall file format
Candidate frame list files are UTF-8 encoded text, with lines terminated by a single line-feed character (Unix style line endings).
Each file has a header section and a data section, separated by an empty line. No empty lines are allowed within the header or data sections. The file ends with a single newline.
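These layout rules can be checked mechanically. The following is a minimal sketch, assuming the whole file fits in memory and is passed in as raw bytes. The function name `split_sections` is hypothetical and is not part of any XFEL library.

```python
from typing import List, Tuple


def split_sections(raw: bytes) -> Tuple[List[str], List[str]]:
    """Split a candidate frame list into (header_lines, data_lines).

    Raises ValueError (or its subclass UnicodeDecodeError) if the file
    breaks the layout rules of the format.
    """
    text = raw.decode("utf-8")  # files must be UTF-8 encoded
    if "\r" in text:
        raise ValueError("only LF (Unix) line endings are allowed")
    if not text.endswith("\n"):
        raise ValueError("file must end with a newline")
    lines = text[:-1].split("\n")  # drop the final newline, then split
    try:
        sep = lines.index("")  # the first empty line ends the header
    except ValueError:
        raise ValueError("no empty line between header and data") from None
    header, data = lines[:sep], lines[sep + 1:]
    if not header:
        raise ValueError("header section is empty")
    if "" in data:
        # Also catches extra newlines at the very end of the file.
        raise ValueError("empty line inside the data section")
    return header, data
```

Decoding with `"utf-8"` makes a wrongly encoded file fail early rather than producing mangled comments further down.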
4.5. Header section
The first line of the file is exactly:

    xfel.eu candidate-frame-list v1.0
The version number in this line will change if the format changes:
- incrementing the first part indicates incompatible changes, and
- incrementing the second part indicates compatible changes, such as defining a new header.
Code reading these files should inspect the version number in this line, and fail if it cannot process that file version.
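A reader might apply this rule as in the sketch below. It assumes that both parts of the version are plain decimal integers, which the text does not state explicitly. The names `check_version` and `SUPPORTED_MAJOR` are hypothetical. Because a change in the second part is defined as compatible, the sketch accepts any 1.x file and rejects only a different first part.

```python
import re
from typing import Tuple

# Assumed shape of the first line: "xfel.eu candidate-frame-list vMAJOR.MINOR".
VERSION_LINE = re.compile(r"xfel\.eu candidate-frame-list v(\d+)\.(\d+)")
SUPPORTED_MAJOR = 1  # this reader understands every 1.x version


def check_version(first_line: str) -> Tuple[int, int]:
    """Return (major, minor), or raise ValueError if the file can't be read."""
    match = VERSION_LINE.fullmatch(first_line)
    if match is None:
        raise ValueError(f"not a candidate frame list: {first_line!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if major != SUPPORTED_MAJOR:
        # A different first part means incompatible changes to the format.
        raise ValueError(f"unsupported format version {major}.{minor}")
    return major, minor
```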
After the first line, the header may include comments, each on its own line beginning with a # character. Comments are ignored by code reading the file, but their use is encouraged to document how the candidate frame list was generated: for example, which tool and which version produced the file, along with any relevant input parameters.
No other headers are currently defined.
As described above, a blank line terminates the header section.
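The sketch below shows one way of separating comments from any other header lines. It assumes the version line has already been checked. The function name `parse_header` is hypothetical. Version 1.0 defines no headers other than comments, but a later 1.x version may add some (a compatible change). For that reason, the sketch returns unrecognised lines to the caller rather than rejecting them.

```python
from typing import List, Tuple


def parse_header(header: List[str]) -> Tuple[List[str], List[str]]:
    """Return (comments, unrecognised) from the header lines.

    header[0] is the version line, assumed to have been checked already.
    """
    comments: List[str] = []
    unrecognised: List[str] = []
    for line in header[1:]:
        if line.startswith("#"):
            # Keep the comment text without the marker and surrounding spaces.
            comments.append(line[1:].strip())
        else:
            # Not defined in v1.0; may be a header from a later 1.x version.
            unrecognised.append(line)
    return comments, unrecognised
```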
4.6. Data section
The data is in CSV format with no header row. There are two columns: train ID and pulse ID. The rows are not required to be in sorted order because trains may be dispatched to parallel workers which finish processing out of order.
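Reading the data section needs nothing beyond the standard `csv` module. This is a minimal sketch: the function name `parse_data` is hypothetical. The final sort is a convenience chosen here, not something the format requires.

```python
import csv
from typing import Iterable, List, Tuple


def parse_data(data_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse data-section lines into sorted (train_id, pulse_id) pairs."""
    frames: List[Tuple[int, int]] = []
    for row in csv.reader(data_lines):
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {row!r}")
        train_id, pulse_id = (int(value) for value in row)
        frames.append((train_id, pulse_id))
    # Rows may be written in any order (e.g. by parallel workers),
    # so sort them here for easier downstream use.
    frames.sort()
    return frames
```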
4.6.1. Pulse ID
There are three ways to identify a frame within a train. In some situations they appear identical, but they are not, and it is vital that all tools use the same one: pulse ID.
- Pulse ID: consistently identifies a pulse in chronological order. This is image.pulseId in the data coming from Karabo.
- Cell ID: identifies where a value was stored in the detector hardware before it was read out. These are consistent, but not necessarily in chronological order, because the detector may go back to the start and replace some values.
- Index in the data array: easy to compute, but not consistent, because any filtering changes the positions of data. Different tools also disagree on whether indexes start at 0 or 1.
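To illustrate why pulse ID is the key, the sketch below locates candidate frames by matching pulse IDs instead of trusting array positions. It assumes the pulse ID of every frame in one train has already been read into a list (for instance from image.pulseId). The function name `candidate_indices` is hypothetical.

```python
from typing import List, Set, Tuple


def candidate_indices(
    train_id: int,
    pulse_ids: List[int],
    candidates: Set[Tuple[int, int]],
) -> List[int]:
    """Return the 0-based array indices of candidate frames in one train.

    pulse_ids[i] is the pulse ID of the frame stored at array index i.
    Matching on pulse ID stays correct even if frames were filtered out
    or reordered, which would silently break raw array indices.
    """
    return [i for i, pid in enumerate(pulse_ids) if (train_id, pid) in candidates]
```

For example, if earlier filtering left only pulses 0, 3, 24 and 63 in train 987654321, the candidate with pulse ID 24 now sits at array index 2, not 24.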
A complete file may look like this:
    xfel.eu candidate-frame-list v1.0
    # Generated by acme-hitfinder 0.1
    # Threshold=7
    # SPB proposal 900099, run 22

    987654321,0
    987654321,24
    987654322,3
    987654322,63
Of course, real files will probably be much longer than this.
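A tool producing such files could use something like the sketch below. The function names `format_candidate_list` and `write_candidate_list` are hypothetical. The sketch writes a space after each # marker, matching the example above. Passing `newline="\n"` to `open()` stops Python from translating line endings to `\r\n` on Windows, which would break the LF-only rule.

```python
from typing import Iterable, Tuple


def format_candidate_list(
    frames: Iterable[Tuple[int, int]], comments: Iterable[str] = ()
) -> str:
    """Render (train_id, pulse_id) pairs as a version 1.0 candidate frame list."""
    lines = ["xfel.eu candidate-frame-list v1.0"]
    for comment in comments:
        if "\n" in comment:
            raise ValueError("each comment must fit on one line")
        lines.append(f"# {comment}")
    lines.append("")  # the blank line ends the header section
    lines.extend(f"{train_id},{pulse_id}" for train_id, pulse_id in frames)
    return "\n".join(lines) + "\n"  # LF endings, single final newline


def write_candidate_list(path: str, frames, comments=()) -> None:
    # newline="\n" disables platform newline translation when writing.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(format_candidate_list(frames, comments))
```

Called with the four frames and three comments from the example, `format_candidate_list` reproduces the file shown above.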