Friday 18 March 2016

xia2 -dials spotfinding output

The latest xia2/DIALS nightly builds contain new output for the spotfinding step of xia2 -dials:

-------------------- Spotfinding SWEEP1 --------------------
102684 spots found on 1800 images (max 1854 / bin)
*****   **                             * * * ******  *     *
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************

1                         image                         1800

These plots show the number of strong spots found on the images during spot finding, and may give insight into data processing problems caused by, e.g., sample misalignment. This is rather a nice example; with more typical data you may see rather more variation than this. However, if you see a clear minimum anywhere in the set (i.e. the counts tend to 0), do not be surprised if the processing fails.
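To show how a plot like this can be read, here is a minimal Python sketch that builds a similar ASCII histogram from a list of per-image spot counts. Consecutive images are summed into a fixed number of columns, and a row is filled wherever a column reaches that fraction of the largest bin. This is not the DIALS code: the function names, the row-threshold rule and the "below 10% of the median bin" test for a clear minimum are assumptions made for illustration, so the output will not match DIALS character for character.

```python
from typing import List, Sequence


def bin_counts(per_image: Sequence[int], width: int = 60) -> List[int]:
    """Sum per-image spot counts into at most `width` bins of consecutive images.

    Assumes `per_image` is non-empty (hypothetical helper, not a DIALS API).
    """
    n = len(per_image)
    nbins = min(width, n)
    bins = [0] * nbins
    for i, count in enumerate(per_image):
        # Map 0-based image index i onto a bin index in [0, nbins).
        bins[i * nbins // n] += count
    return bins


def ascii_histogram(per_image: Sequence[int], width: int = 60, height: int = 10) -> str:
    """Render a header line plus `height` rows of '*' (assumed threshold rule)."""
    bins = bin_counts(per_image, width)
    peak = max(bins)
    lines = [f"{sum(bins)} spots found on {len(per_image)} images (max {peak} / bin)"]
    for row in range(height, 0, -1):
        # A column gets a '*' in this row if it reaches row/height of the peak.
        threshold = peak * row / height
        lines.append("".join("*" if b > 0 and b >= threshold else " " for b in bins))
    return "\n".join(lines)


def low_bins(per_image: Sequence[int], width: int = 60, fraction: float = 0.1) -> List[int]:
    """Return 0-based indices of bins below `fraction` of the median bin count."""
    bins = bin_counts(per_image, width)
    median = sorted(bins)[len(bins) // 2]
    return [i for i, b in enumerate(bins) if b < fraction * median]
```

For example, with 57 spots on every one of 1800 images except a 60-image gap at images 901 to 960, `low_bins` returns `[30, 31]`, the two columns covering the gap, which would show up as a hole right down to the bottom row of the plot.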

For Diamond Light Source users, this is very closely related to the per-image analysis performed during data collection and should provide a similar level of insight. It is no accident that similar output is now visible in the summary from DIALS spot finding:

Histogram of per-image spot count for imageset 0:
102684 spots found on 1800 images (max 1854 / bin)
*****   **                             * * * ******  *     *
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
************************************************************
1                         image                         1800

Processing compressed data

Even on today's machines, X-ray diffraction data can still be big. However, pixel array detector data also compress well with, e.g., gzip: say 20:1 for CBF files from a Pilatus detector. If only they could be processed while still compressed...

... well, they can be. In fact, xia2 runs just fine with compressed images:

xia2 -atom Zn image=/Volumes/DATA/data/thermc_1_0001.cbf.gz:1:1800

and this also works with XDS. For this thermolysin data set, which was kept on an external USB3 drive, the whole xia2 job was about 20% faster using the compressed data than the raw images. Worth considering, as it also saved about 95% of the storage space.
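A 20:1 compression ratio means the compressed files take 1/20 of the original space, i.e. a saving of 1 - 1/20 = 95%, in line with the figures above. The plain `gzip *.cbf` command does the job (replacing each file with a `.cbf.gz`), but if you want to measure the saving as you go, here is a minimal Python sketch using only the standard library. The function names and the `*.cbf` pattern are illustrative assumptions, not part of xia2; by default the originals are kept so nothing is lost until you have checked the results.

```python
import gzip
import shutil
from pathlib import Path


def gzip_images(directory, pattern="*.cbf", keep_originals=True):
    """Gzip every file matching `pattern` in `directory`.

    Returns (raw_bytes, gz_bytes) summed over all files. Hypothetical helper.
    """
    raw_total = 0
    gz_total = 0
    for path in sorted(Path(directory).glob(pattern)):
        target = path.with_name(path.name + ".gz")
        # Stream the file through gzip so large images are not held in memory.
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        raw_total += path.stat().st_size
        gz_total += target.stat().st_size
        if not keep_originals:
            path.unlink()
    return raw_total, gz_total


def saving(raw_bytes, gz_bytes):
    """Return (compression ratio, fraction of storage saved)."""
    return raw_bytes / gz_bytes, 1 - gz_bytes / raw_bytes
```

For example, `saving(20, 1)` gives a ratio of 20 and a fraction saved of 0.95.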

The effect could be more substantial for even bigger data sets, as here all 1800 images fit in the cache. If they had not, the time saving would be greater, as the compressed data could still be cached but the raw data could not.

Worth thinking about next time you complain about how much storage X-ray data takes up.

YMMV: not tested on non-CBF images, and some software may not support this, ...
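If you have scripts of your own that read the image files directly, one way to cope with both raw and gzipped files is to sniff the two-byte gzip magic number (0x1f 0x8b) rather than trusting the file extension. This is a minimal sketch with hypothetical helper names; it is not how xia2, DIALS or XDS handle compressed images internally.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_image(path):
    """Open an image for binary reading, decompressing on the fly if gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")


def read_image_bytes(path):
    """Return the (decompressed) contents of an image file."""
    with open_image(path) as fh:
        return fh.read()
```

Because the check looks at the file contents, a gzipped file that has lost its `.gz` suffix is still read correctly.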