Wednesday 2 August 2017

Massive Eiger data sets (i.e. > 100k frames) - some practical recommendations

The vast majority of Eiger data sets are sensible, i.e. measured much as they would be with a Pilatus but written in a different file format. There is, however, the equally reasonable approach of mounting a crystal, spinning it in a relatively low-transmission beam, collecting until it is dead, then deciding post mortem when the experiment was finished. This can end up producing very large data sets, with one master file and (say) over 1,000 HDF5 data files.

A clue: this may not end well - for a start you could have > 200GB of data....
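As a sanity check on that "> 200GB" figure, a rough back-of-envelope calculation (assuming around 2 MB per compressed frame, an illustrative figure that varies a lot with detector model and how strongly the crystal diffracts):

```shell
# Rough data-volume estimate for a long Eiger collection
# (2 MB/frame is an assumed typical compressed size, not a fixed number)
frames=100800
mb_per_frame=2
echo "$((frames * mb_per_frame / 1024)) GB"   # ~196 GB
```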

Anyhow, some hints.

  1. ulimit -n 4096
  2. run on a machine with a lot of memory, or
  3. xia2 image=/path/to/master.h5:1:100800:3600
The first increases the OS limit on the number of file handles a process may hold open. You will probably need a lot of RAM for steps like indexing (because there are a lot of reflections), or you can split the data into evenly sized chunks with the syntax above (e.g. 1:360000:3600 to split 100 turns into one-turn chunks). Chunking will make the processing much swifter, though the scaling will still take a little while.
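Putting the hints together, a minimal session might look like the following (the frame counts assume 100 turns of 3600 images each, as in the example above; adjust for your own collection):

```shell
# Raise the per-process open-file limit first: with 1,000+ HDF5 data
# files, the default limit is easily exceeded
ulimit -n 4096

# Build the image= argument as start:end:chunk - here 100 turns of
# 3600 frames, processed in one-turn chunks (values are illustrative)
turns=100
per_turn=3600
echo "image=/path/to/master.h5:1:$((turns * per_turn)):${per_turn}"
# -> image=/path/to/master.h5:1:360000:3600
```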

Whatever happens, with xia2 this is unlikely to be fast... if you are using the 3d, 3di or 3dii pipelines, support for the Neggia plugin with XDS will appear in the next release, alongside DIALS 1.7, which is due real soon now...
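For reference, the Neggia plugin is hooked into XDS by pointing the LIB= keyword in XDS.INP at the shared library, along the lines of the fragment below (the path is illustrative, and once the release lands xia2 should write this line for you):

```
LIB=/path/to/dectris-neggia.so
```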