Wednesday, 2 August 2017

Massive Eiger data sets (i.e. > 100k frames) - some practical recommendations

The vast majority of Eiger data sets are sensible, i.e. measured as they would be with a Pilatus but written in a different file format. There is, however, the perfectly sensible approach of putting a crystal on, spinning it in a relatively low transmission beam, collecting until it is dead, and then deciding post mortem when the experiment should have finished. This can end up producing very large data sets, with one master file and (say) over 1,000 HDF5 data files.

A clue: this may not end well - for a start you could have > 200 GB of data...

Anyhow, some hints.

  1. ulimit -n 4096 
  2. run on a machine with a lot of memory or
  3. xia2 image=/path/to/master.h5:1:100800:3600
The first increases the OS limit on the number of file handles a process is allowed to have open. After that you will either need a lot of RAM, e.g. for indexing (because there are a lot of reflections), or need to split the data into evenly sized chunks with the syntax above (e.g. 1:360000:3600 if you did 100 turns, split into one-turn chunks). Chunking makes the processing much swifter, though the scaling will still take a little while.
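To make the two hints above concrete, here is a minimal Python sketch. It is not part of xia2: the function names, the `headroom` allowance and the assumption that chunks are contiguous, inclusive image ranges are all mine, chosen for illustration. The first function checks whether the current soft file-handle limit would cover a given number of data files; the second works out which image ranges a start:end:chunk specification would correspond to under that assumption.

```python
import resource  # Unix-only standard library module


def handle_limit_ok(n_data_files, soft_limit=None, headroom=64):
    """Return True if the soft open-file limit covers all the data files.

    headroom is an assumed allowance for the master file, log files and
    anything else the process keeps open; it is not a xia2 setting.
    """
    if soft_limit is None:
        # Query the limit for the current process (what `ulimit -n` reports).
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= n_data_files + headroom


def chunk_ranges(first, last, chunk_size):
    """Split images first..last into contiguous, inclusive (start, end) ranges.

    Assumes each chunk simply follows on from the previous one; the final
    chunk is shortened if the range does not divide evenly.
    """
    return [
        (start, min(start + chunk_size - 1, last))
        for start in range(first, last + 1, chunk_size)
    ]


if __name__ == "__main__":
    print("limit OK for 1000 data files:", handle_limit_ok(1000))
    for start, end in chunk_ranges(1, 100800, 3600):
        print(start, end)
```

With 3600 images per turn, 1:100800:3600 works out to 28 one-turn chunks and 1:360000:3600 to 100, which is a quick sanity check before launching a long job.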

Whatever happens, with xia2 this is unlikely to be fast... If you are using the 3d/i/ii pipeline, support for the neggia plugin with XDS will appear in the next release, with DIALS 1.7, which is due real soon now...

Friday, 7 July 2017

Eiger data sets > 10k frames

Turns out xia2 would potentially fail with this - see

https://github.com/xia2/xia2/issues/155

Now fixed - will be updated in next release of DIALS (1.6.4 probably)

Meanwhile will probably set out to replace other jiffies used in DIALS with CCTBX code [link] to make the system more future proof... if anyone has a student in need of a project, please get in touch!

Friday, 12 May 2017

Processing data to higher resolution than diffraction

If you are using xia2.small_molecule, the default behaviour is to process every reflection and report the resolution limits observed, but to include all reflections, however weak, in the output data. If you subsequently look at the I/sig(I) statistics, and the data were processed beyond the resolution to which the crystal actually diffracted, you may see a substantial number of reflections with low or negative I/sig(I).


This is to be expected - these are essentially a population of noisy zero values, the noise a result of statistical errors from background subtraction. These should have no impact on refinement.
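A toy simulation shows why such a population looks the way it does. This is only a sketch under stated assumptions, not how xia2 or DIALS integrate reflections: the true intensity is zero, the counts under the peak are drawn from a normal approximation to Poisson noise around a known background level, and the function name and parameters are hypothetical.

```python
import random


def simulate_i_over_sigma(n_reflections=10000, background=50.0, seed=0):
    """Simulate I/sig(I) for reflections whose true intensity is zero.

    Assumes the background level is known exactly and that the counts in
    the peak region are normally distributed with variance equal to the
    background (a normal approximation to Poisson counting noise).
    """
    rng = random.Random(seed)
    sigma = background ** 0.5
    ratios = []
    for _ in range(n_reflections):
        peak_counts = rng.gauss(background, sigma)
        intensity = peak_counts - background  # background-subtracted
        ratios.append(intensity / sigma)
    return ratios


if __name__ == "__main__":
    ratios = simulate_i_over_sigma()
    negative = sum(1 for r in ratios if r < 0)
    print("mean I/sig(I):", sum(ratios) / len(ratios))
    print("fraction negative:", negative / len(ratios))
```

In this model the values scatter symmetrically about zero, so roughly half of them come out negative, which is the kind of distribution described above.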

If this is not the behaviour you want, set "keep_all_reflections=False" on the command line and only those data considered to be present will be included in the output.