6. Data analysis software

6.1. Software to access and inspect data

6.1.1. Karabo Data

karabo_data is a Python library for accessing and working with data produced at European XFEL. It can:

  • Conveniently access data from an experimental run, which is often spread across dozens of ‘sequence’ files.
  • Read data into pandas and xarray, two popular Python libraries which support powerful, efficient data analysis.
  • Assemble image data for multi-module detectors like LPD and AGIPD, using geometry files in different formats.
  • Stream data from files over a ZeroMQ socket. This stream can then be read with the Karabo Bridge Clients, to test live-processing tools with data from real experiments.
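
As a minimal sketch of the first two points, the snippet below opens a run and summarises it. To the best of our knowledge `RunDirectory`, `train_ids`, `all_sources` and `get_array` are part of karabo_data; the helper names `open_run` and `summarise_run`, the run path, and the source and key names are placeholders chosen for illustration only.

```python
def open_run(run_path):
    """Open all sequence files of one run as a single object."""
    # Imported lazily so summarise_run() can be used without karabo_data.
    from karabo_data import RunDirectory
    return RunDirectory(run_path)


def summarise_run(run):
    """Count the trains and list the sources of an opened run."""
    return {
        "n_trains": len(run.train_ids),
        "sources": sorted(run.all_sources),
    }


# Usage (placeholders, adapt to your proposal and run):
# run = open_run("/path/to/raw/r0001")
# print(summarise_run(run))
# arr = run.get_array("SOURCE_NAME", "key.name")  # labelled xarray array
```

`summarise_run` only reads two attributes, so it can be tried out on any object that provides them before pointing it at real data.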

6.1.2. Karabo Bridge Clients

We provide client libraries in Python and C++ to receive data from the Karabo Bridge, allowing users to integrate their tools with the Karabo framework and receive live data during an experiment run.
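
The following sketch shows how a tool might consume a few trains through the Python client. It assumes that `karabo_bridge.Client.next()` returns a `(data, metadata)` pair of dictionaries keyed by source name, and that each metadata entry carries the train ID under `"timestamp.tid"`; the endpoint in the usage note and the helper name `collect_train_ids` are hypothetical.

```python
def collect_train_ids(client, n_trains):
    """Receive n_trains trains from a bridge client and return their train IDs."""
    train_ids = []
    for _ in range(n_trains):
        # next() blocks until a train arrives.
        data, metadata = client.next()
        if metadata:
            # All sources in one train share the train ID, so take the first.
            first = next(iter(metadata.values()))
            train_ids.append(first["timestamp.tid"])
    return train_ids


# Usage (placeholder endpoint):
# from karabo_bridge import Client
# client = Client("tcp://hostname:port")
# print(collect_train_ids(client, 5))
```

Because the function only relies on `next()`, the same code can be pointed at a live bridge or at a stream replayed from files.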

6.1.3. HDF5 command line tools

We hope most users will be able to access data through existing tools such as Karabo Data, or by converting it to standard formats such as CXI. But if none of these options work for you, you may need to look directly at the HDF5 files.

Basic HDF5 command line tools are available by default:

h52gif         h5copy         h5fc-64        h5perf_serial  h5unjam
h5c++          h5debug        h5import       h5redeploy
h5c++-64       h5diff         h5jam          h5repack
h5cc           h5dump         h5ls           h5repart
h5cc-64        h5fc           h5mkgrp        h5stat

You can inspect HDF5 files in the terminal with the h5glance tool, available in the exfel_anaconda3 module (see Access on online and Maxwell cluster).

The hdfview interactive viewer is available in the xray module (see https://confluence.desy.de/display/IS/hdfview). To use it, first run module load xray.
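
If you prefer to inspect a file from Python, a short sketch along the following lines may help. It assumes the h5py library is available in your environment, which is not stated above; `list_datasets` and `list_datasets_in_file` are illustrative helpers, not part of any European XFEL tool.

```python
def list_datasets(h5_group):
    """Return {path: shape} for every dataset below an open h5py File or Group."""
    found = {}

    def visit(name, obj):
        # Datasets have a shape; groups do not.
        if hasattr(obj, "shape"):
            found[name] = obj.shape

    h5_group.visititems(visit)
    return found


def list_datasets_in_file(path):
    """Open an HDF5 file read-only and list its datasets."""
    import h5py  # imported lazily; assumed to be installed
    with h5py.File(path, "r") as f:
        return list_datasets(f)


# Usage (placeholder file name):
# for name, shape in list_datasets_in_file("file.h5").items():
#     print(name, shape)
```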

6.2. Software with scientific purposes

6.2.1. karaboFAI

karaboFAI is a tool that provides on-line (real-time, as fast as the calibration pipeline) and off-line data analysis and visualization for experiments at European XFEL that require azimuthal integration of diffraction data acquired with 2D detectors. It works with AGIPD, LPD, JUNGFRAU and FastCCD detectors.

_images/karaboFAI-LPD_azimuthal_integration.png
_images/karaboFAI-ImageTool.png
_images/karaboFAI-ROI.png

6.2.2. Karabo Data Interactive

Karabo Data Interactive adds a user interface for viewing detector images inside a Jupyter notebook. The plots are interactive and can be zoomed and panned. Users can also pick which trains and pulses to plot, as well as which type of image to plot from the data file (e.g. data, mask, or gain), as seen here:

_images/karabo_data_interactive_example.png

Fig. 6.1 Example image showing the karabo data interactive widget and plot layout

We recommend using this application from the pre-installed path on the online and offline clusters (see Access on online and Maxwell cluster).

For more information about karabo_data_interactive check the source code.

6.2.3. GeoAssembler

GeoAssembler is a tool for calibrating AGIPD detector geometry. It can be seen as an alternative to the calibration mode of CrystFEL’s hdfsee. The calibration can either refine an existing starting geometry or build a completely new one. In the latter case, the initial geometry places all modules 29 px apart from each other, with a 4 px gap between ASICs within a module.

Two graphical user interfaces support the geometry calibration: a Qt-based interface and a Jupyter-notebook-based interface.
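
To make these initial conditions concrete, the gap arithmetic can be sketched as follows. The 29 px and 4 px gaps are the values given above; the layout numbers (8 ASIC tiles of 64 px along a module, modules 128 px wide, 4 modules stacked per quadrant) are assumptions about AGIPD made for illustration, and `start_offsets` is not a GeoAssembler function.

```python
def start_offsets(n_items, size_px, gap_px):
    """Start pixel of each of n_items laid end to end with a fixed gap."""
    return [i * (size_px + gap_px) for i in range(n_items)]


ASIC_GAP_PX = 4     # from the text above
MODULE_GAP_PX = 29  # from the text above

# Assumed layout: 8 tiles of 64 px per module, 4 modules of 128 px per quadrant.
asic_starts = start_offsets(8, 64, ASIC_GAP_PX)
module_starts = start_offsets(4, 128, MODULE_GAP_PX)

# Length of one module including the gaps between its ASICs.
module_length = asic_starts[-1] + 64

print(asic_starts)
print(module_starts)
print(module_length)
```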

Using the Qt GUI

We recommend using this GUI application through the pre-installed path on the online and offline clusters (see Access on online and Maxwell cluster). The command is:

geoAssembler

The following optional arguments can be set via the command line:

-h, --help

Show help about these options

-nb, --notebook

Do not start the GUI; create a notebook instead

-nb_dir

Set default directory to save notebooks

-nb_file

Set file name of the notebook

-r <run_dir>, --run <run_dir>

The path to a run folder

-g <geomfile>, --geometry <geomfile>

Path to a CrystFEL format geometry file

-c <clen>, --clen <clen>

Detector distance [m]

-e <energy>, --energy <energy>

Photon energy [eV]

-l <min> <max>, --level <min> <max>

Display range for plotting
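
When geoAssembler is launched from a script, the options above can be assembled programmatically. The sketch below only uses the long options listed in this section; the helper `build_geoassembler_command` and all argument values are hypothetical placeholders.

```python
import shlex


def build_geoassembler_command(run_dir=None, geometry=None, clen=None,
                               energy=None, level=None, notebook=False):
    """Build the geoAssembler argument list from optional settings."""
    cmd = ["geoAssembler"]
    if notebook:
        cmd.append("--notebook")
    if run_dir is not None:
        cmd += ["--run", str(run_dir)]
    if geometry is not None:
        cmd += ["--geometry", str(geometry)]
    if clen is not None:
        cmd += ["--clen", str(clen)]        # detector distance [m]
    if energy is not None:
        cmd += ["--energy", str(energy)]    # photon energy [eV]
    if level is not None:
        low, high = level
        cmd += ["--level", str(low), str(high)]
    return cmd


# Placeholder values, not recommendations.
cmd = build_geoassembler_command(run_dir="/path/to/run", clen=0.2,
                                 energy=9300, level=(0, 1500))
print(" ".join(shlex.quote(arg) for arg in cmd))

# To launch the tool on the cluster:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.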

_images/geoAssembler.png

If no run directory has been preselected using the -r/--run option, a directory can be set by clicking the Run-dir button. Train IDs can be selected once a run has been chosen. Images can be displayed pulse by pulse or, if the signal is too weak or noisy, as a maximum or mean across the entire train. To do so, select the Max or Mean button instead of the default Sel #. After an image number or function has been selected, the image can be assembled using the Assemble button. Optionally, a pre-defined geometry file can be loaded using the Load button.

After the image is displayed, quadrants can be selected by clicking on them and moved using the Ctrl+arrow-up/down/left/right key combination. The Draw Helper Objects button adds circles that can help to align quadrants. The radii of the circles can be adjusted using the radius spin box in the top left.

Once the quadrants have been positioned, a geometry file can be saved using the Save button.

Calibration Using Jupyter

The -nb/--notebook flag creates a Jupyter notebook in the home directory. This notebook is self-explanatory.

Dependencies

If you do not want to or cannot use the xfel module and prefer to install the tool yourself, the following Python packages should be available:

  • numpy
  • cfelpyutils
  • pyqtgraph
  • matplotlib
  • ipywidgets
  • pyqt5
  • pyFAI

6.2.4. XasTim

A toolchain for real-time data analysis and visualization of XAS (X-ray Absorption Spectroscopy) experiments using the TIM (Transmission Intensity Monitor) device.

See XasTim in the SCS toolbox for more details.

6.2.5. Cheetah

The CFEL group at DESY provides Cheetah on the Maxwell cluster. See their documentation for how to get started using it.

6.3. Access on online and Maxwell cluster

The tools described above are readily available on both the online cluster and the Maxwell cluster. We recommend using the pre-installed applications available in the XFEL-specific Anaconda3 distribution:

module load exfel exfel_anaconda3

This will provide access to all these libraries and tools for data analysis:

  • karabo_data
  • karaboFAI
  • karabo_bridge
  • karabo_data_interactive
  • geoAssembler
  • pyFAI
  • xas-tim-view
  • karabo-bridge-record
  • karabo-bridge-replay
  • h5glance
  • karabo-data-validate
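
After loading the module, a quick check such as the one below can confirm which of these tools are visible in the current session. The split of the list into importable Python packages and command-line programs is our assumption, and `find_missing` is an illustrative helper.

```python
import importlib.util
import shutil


def find_missing(python_packages, commands):
    """Return the names that cannot be imported or found on PATH."""
    missing = []
    for name in python_packages:
        # find_spec returns None if a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in commands:
        # which returns None if no executable of that name is on PATH.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


# Assumed split between packages and commands.
packages = ["karabo_data", "karabo_bridge", "pyFAI"]
commands = ["h5glance", "geoAssembler", "karabo-bridge-record",
            "karabo-bridge-replay", "karabo-data-validate"]
print("Missing:", find_missing(packages, commands))
```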

There are many other applications and libraries available on the Maxwell cluster, maintained by DESY. There is a listing at: https://confluence.desy.de/display/IS/Alphabetical+List+of+Packages

Not all of this software is on the online cluster (maintained by EuXFEL).

6.4. Adding extra software

Both the offline cluster and the online cluster environments provide a set of data analysis tools. If an experiment requires additional analysis packages or applications, this requirement should be discussed and agreed in advance. CAS (Control and Analysis Software) keeps in active contact with the authors of scientific data analysis packages such as Mosflm and XDS, and maintains the availability of these tools.

In addition, users can bring their own tools and install them in their user space. These tools are then available for immediate use in both the offline and online environments.

Users can also set aside space within an experiment group to develop these tools further, even during the experiment.

In general, users are encouraged to share progress on such data analysis development with European XFEL. Wider collaborations are also welcome and can be hosted and coordinated by CAS. In this case, dedicated space will be created and managed for the collaborative project.