Wednesday, 2 August 2017

Massive Eiger data sets (i.e. > 100k frames) - some practical recommendations

The vast majority of Eiger data sets are sensible, i.e. measured as they would be with a Pilatus, just in a different file format. There is, however, another perfectly sensible approach: put a crystal on, spin it in a relatively low-transmission beam and collect until the crystal is dead, then decide post mortem where the experiment should have finished. This can end up with very large data sets, with one master file and (say) over 1,000 HDF5 data files.

A clue: this may not end well. For a start, you could have > 200 GB of data...

Anyhow, some hints.

  1. ulimit -n 4096 
  2. run on a machine with a lot of memory or
  3. xia2 image=/path/to/master.h5:1:100800:3600
The first increases the operating system limit on the number of file handles a process may have open at once, which matters when one data set is spread over more than 1,000 HDF5 files. You will probably need a lot of RAM for e.g. indexing (because there are a lot of reflections); alternatively, split the data into evenly sized chunks with the syntax above (e.g. 1:360000:3600 if you did 100 turns, giving one-turn chunks). Splitting makes the processing much faster, though the scaling will still take a little while.
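For a sense of what the start:end:chunk syntax above implies, the hypothetical helper below lists the image ranges that a given triple splits into. It is not xia2 code: it assumes each chunk is a contiguous block of `chunk` images counted from `first`, with a shorter final chunk if the range does not divide evenly, which is consistent with the one-turn-per-chunk example (3600 images per turn, i.e. 0.1° per image).

```python
def chunk_ranges(first, last, chunk):
    """Split the inclusive image range first..last into consecutive blocks of
    `chunk` images. The final block is shorter if the range does not divide
    evenly. Hypothetical helper for illustration, not part of xia2."""
    if chunk < 1 or last < first:
        raise ValueError("need chunk >= 1 and last >= first")
    ranges = []
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)  # inclusive end of this block
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    turns = chunk_ranges(1, 360000, 3600)
    print(len(turns), turns[0], turns[-1])
```

For example, chunk_ranges(1, 360000, 3600) gives 100 one-turn chunks running from (1, 3600) to (356401, 360000), and the range 1:100800:3600 used in the command above gives 28 chunks.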

Whatever happens, with xia2 this is unlikely to be fast. If you are using the 3d, 3di or 3dii pipelines, support for the neggia plugin with XDS will appear in the next release, with DIALS 1.7, which is due real soon now...

Friday, 7 July 2017

Eiger data sets > 10k frames

It turns out that xia2 could fail with this; see

https://github.com/xia2/xia2/issues/155

This is now fixed, and the fix will be included in the next release of DIALS (probably 1.6.4).

Meanwhile, I will probably set out to replace the other jiffies used in DIALS with CCTBX code [link] to make the system more future-proof... If anyone has a student in need of a project, please get in touch!

Friday, 12 May 2017

Processing data to higher resolution than diffraction

If you are using xia2.small_molecule, the default behaviour is to process every reflection and report the observed resolution limits, but to include all reflections, however weak, in the output data. If you then look at the I/sig(I) statistics and the diffraction extends to a lower resolution than the full set of recorded reflections, you may see a substantial number of reflections with low or negative I/sig(I).

This is to be expected: these are essentially a population of noisy zero values, the noise coming from the statistical error in the background subtraction. They should have no impact on refinement.
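To see why reflections with zero true intensity give a spread of positive and negative I/sig(I), here is a toy simulation. It is not how xia2 or DIALS integrate: the background level per pixel, the sizes of the peak and background regions, and the Gaussian approximation to Poisson counting statistics are all assumptions chosen for illustration.

```python
import math
import random


def simulate_absent_reflections(n, bg_per_pixel=10.0, n_peak=50, n_bg=200, seed=0):
    """Toy model of n reflections whose true intensity is zero.

    The peak region contains only background; the background level is
    estimated from a separate region of n_bg pixels and subtracted. Counts use
    a Gaussian approximation to Poisson statistics (variance equal to mean).
    Returns a list of (I, sigma_I) pairs.
    """
    rng = random.Random(seed)
    peak_mean = bg_per_pixel * n_peak
    bg_mean = bg_per_pixel * n_bg
    scale = n_peak / n_bg  # background pixels scaled to the peak area
    pairs = []
    for _ in range(n):
        peak_counts = rng.gauss(peak_mean, math.sqrt(peak_mean))
        bg_counts = rng.gauss(bg_mean, math.sqrt(bg_mean))
        intensity = peak_counts - scale * bg_counts
        # propagate counting errors from the peak and the background estimate
        variance = peak_counts + scale ** 2 * bg_counts
        pairs.append((intensity, math.sqrt(variance)))
    return pairs


def i_over_sigma_summary(pairs):
    """Return (mean I/sig(I), fraction of reflections with I/sig(I) < 0)."""
    ratios = [i / s for i, s in pairs]
    mean_ratio = sum(ratios) / len(ratios)
    frac_negative = sum(1 for r in ratios if r < 0) / len(ratios)
    return mean_ratio, frac_negative


if __name__ == "__main__":
    print(i_over_sigma_summary(simulate_absent_reflections(10000)))
```

With the defaults, the mean I/sig(I) should come out close to zero, with roughly half of the reflections negative: exactly the population of noisy zero values described above.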

If this is not the behaviour you want, set keep_all_reflections=False on the command line and only the data considered to be present will be included in the output.

Friday, 8 July 2016

xia2 command syntax: now changed

As per the previous post, the command syntax change branch has now been merged. If you get your xia2 from git, your next git pull will be a big one, and you may find that e.g. -resolution, -atom etc. no longer work. The functionality is still there, however: you just need to use atom=Se (say), anomalous=true or d_min=1.6 instead. Full documentation of the changes will follow.

The pipeline options -3d etc. remain, but are now deprecated: they still work, but you will get a note saying "please use pipeline=3d".
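If you have scripts that still use the old arguments, a small wrapper along the lines below could translate them before calling xia2. This is a hypothetical sketch, not part of xia2: the mapping covers only the examples mentioned in this post (-atom to atom=, -resolution to d_min=, -3d to pipeline=3d), plus -3di, -3dii and -dials, which are assumed to map to pipeline names of the same form. It also assumes each legacy option takes exactly one value.

```python
# Hypothetical mapping based only on the examples in this post; not exhaustive
# and not part of xia2.
VALUED_OPTIONS = {"-atom": "atom", "-resolution": "d_min"}
PIPELINE_FLAGS = {"-3d": "3d", "-3di": "3di", "-3dii": "3dii", "-dials": "dials"}


def to_phil_args(argv):
    """Rewrite legacy xia2-style arguments as Phil key=value arguments."""
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in PIPELINE_FLAGS:
            out.append("pipeline=" + PIPELINE_FLAGS[arg])
            i += 1
        elif arg in VALUED_OPTIONS:
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} expects a value")
            out.append(f"{VALUED_OPTIONS[arg]}={argv[i + 1]}")
            i += 2
        else:
            # already Phil-style (key=value) or something else: pass through
            out.append(arg)
            i += 1
    return out


if __name__ == "__main__":
    import sys
    print(" ".join(["xia2"] + to_phil_args(sys.argv[1:])))
```

For example, to_phil_args(["-3d", "-atom", "Se", "-resolution", "1.6", "image=/data/x_master.h5"]) returns ["pipeline=3d", "atom=Se", "d_min=1.6", "image=/data/x_master.h5"] (the image path is made up).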

If you find any bugs as a side effect of this, please contact xia2.support@gmail.com or add an issue at

https://github.com/xia2/xia2/issues

Monday, 4 July 2016

Please be aware: changing command syntax

At the moment we use a mix of command syntaxes, some using Phil (e.g. unit_cell=a,b,c,al,be,ga) and some using legacy arguments: we plan to remove the latter completely in favour of the former; see

https://github.com/xia2/xia2/issues/42

The intention is to make using xia2 in an automated environment more straightforward, and to make it much simpler to maintain much more complete, automatically generated documentation of all the options.

This will be a feature of the 1.3 release of DIALS.

Thursday, 5 May 2016

Eiger HDF5 data, with xia2

A common question over the past couple of months has been when xia2 will work natively with Eiger HDF5 data. Well, with the DIALS 1.2 release, now available from dials.github.io, it does. Your mileage may vary, however: there is still a certain amount of variation in the file formats, but I think we have tested with all of the machines out there in the wild.

This is the result of a substantial amount of hard work from the DIALS and xia2 development teams: thank you all.

Friday, 18 March 2016

xia2 -dials spotfinding output

The latest xia2/dials nightly builds contain new output for the spotfinding step of xia2 -dials:

-------------------- Spotfinding SWEEP1 --------------------
102684 spots found on 1800 images (max 1854 / bin)
*****   **                             * * * ******  *     *
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************

1                         image                         1800

These show the number of strong spots found on the images during spot finding: each column is a bin of consecutive images, with its height scaled to the largest bin (the "max ... / bin" in the header). They may provide insight into data processing problems due to e.g. sample misalignment. This is a rather nice example, and you may see more variation than this with more typical data, but if you see a clear minimum anywhere in the set (i.e. the counts tend to 0), do not be surprised if the processing fails.
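A histogram of this kind can be sketched from a list of per-image spot counts. This is not the DIALS implementation: the 60 columns and 10 rows match the appearance of the output above, but the rounding rule and the 10% threshold used to flag near-empty bins in weak_bins are assumptions for illustration.

```python
def bin_totals(counts, n_bins=60):
    """Sum per-image spot counts over n_bins contiguous groups of images."""
    n = len(counts)
    if n == 0:
        raise ValueError("no images")
    n_bins = min(n_bins, n)
    return [sum(counts[k * n // n_bins:(k + 1) * n // n_bins]) for k in range(n_bins)]


def spot_histogram(counts, n_bins=60, n_rows=10):
    """Render binned spot counts as rows of '*'; the largest bin fills every row."""
    totals = bin_totals(counts, n_bins)
    peak = max(totals)
    lines = [f"{sum(counts)} spots found on {len(counts)} images (max {peak} / bin)"]
    for row in range(n_rows, 0, -1):
        # fill a cell when the bin's scaled height, rounded half up, reaches this row
        lines.append("".join(
            "*" if peak > 0 and 2 * t * n_rows >= (2 * row - 1) * peak else " "
            for t in totals))
    return "\n".join(lines)


def weak_bins(counts, n_bins=60, fraction=0.1):
    """Indices of bins whose total is below `fraction` of the largest bin."""
    totals = bin_totals(counts, n_bins)
    peak = max(totals)
    return [k for k, t in enumerate(totals) if t < fraction * peak]


if __name__ == "__main__":
    # made-up counts: 1800 images with no spots on images 901-1200
    demo = [50] * 900 + [0] * 300 + [50] * 600
    print(spot_histogram(demo))
    print("near-empty bins:", weak_bins(demo))
```

With the made-up counts in the demo, the histogram shows a gap of ten empty columns, and weak_bins reports bins 30 to 39: the kind of clear minimum that suggests processing may fail.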

For Diamond Light Source users, this is very closely related to the per-image analysis performed during data collection and should provide a similar level of insight. It is no accident that similar output is now visible in the summary from DIALS spot finding:

Histogram of per-image spot count for imageset 0:
102684 spots found on 1800 images (max 1854 / bin)
*****   **                             * * * ******  *     *
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
1                         image                         1800