Tech Meeting Minutes 20100603
Present: Graeme, Sam, Andrew, Wahid
- LHCb upload problems at Glasgow. Sam has discovered a strong correlation between upload failures and network traffic peaks. The network traffic seems to be internal, so why it affects outbound gridftp is still a mystery.
- Durham have a ticket about torque queue ACLs: https://gus.fzk.de/ws/ticket_info.php?ticket=58692. ACTION: Graeme to try to get David Grellscheid to fix this. [Addendum - with the power of X-site login, Sam dealt with it, thanks!]
- Glasgow were hit by very high load on a DPM disk server overnight. It caused two directory race-condition problems and many failed transfers. Worryingly, this seems strongly correlated with SL5. Sam thinks it might have something to do with xfs on SL5.
- Squid at ECDF: Wahid reports that access to the Lancaster squid is now working.
- CREAM: Still waiting for extra servers from the systems team. Need to discuss the recommended prologue/epilogue scripts which CREAM on SGE would like the batch system to have. ACTION: Andy to investigate and raise with Orlando.
- StoRM + GPFS: ACTION: Wahid to chase up the addition of LOCALGROUPDISK in ToA (http://savannah.cern.ch/bugs/?67701).
- ECDF will have 150TB in StoRM. We aim to get them an official ATLAS datashare. ADC will not send data to sites with less than 25TB (soon to be 40TB) in MC/DATADISK. Graeme's initial suggestion: MCDISK, DATADISK 50TB; PRODISK 10TB; SCRATCHDISK 20TB; LOCALGROUPDISK 18TB; HOTDISK 2TB.
- Jose is working on this, but needs to make sure he submits the initial job with Role=pilot.
- Glasgow's cooling off period is about to end...
- Meeting next week: http://indico.cern.ch/conferenceDisplay.py?confId=97220.
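
As a quick sanity check on the ECDF space token split suggested above, the figures can be summed and compared against the ADC thresholds. This is only a sketch: it assumes "MCDISK, DATADISK 50TB" means 50TB for each token, and that the ADC minimum applies to each of MCDISK and DATADISK separately. The variable names are illustrative and not taken from any ATLAS tool.

```python
# Proposed ECDF space token sizes in TB (Graeme's initial suggestion).
# Assumption: MCDISK and DATADISK get 50TB each.
allocation_tb = {
    "MCDISK": 50,
    "DATADISK": 50,
    "PRODISK": 10,
    "SCRATCHDISK": 20,
    "LOCALGROUPDISK": 18,
    "HOTDISK": 2,
}

TOTAL_TB = 150          # total storage ECDF expects to have
ADC_MIN_TB = 25         # current ADC minimum for MC/DATADISK
ADC_MIN_SOON_TB = 40    # minimum expected to apply soon

# Sum of all tokens should not exceed the available total.
total = sum(allocation_tb.values())

# Assumption: the threshold is checked per token, which is the stricter reading.
data_tokens = ("MCDISK", "DATADISK")
meets_current = all(allocation_tb[t] >= ADC_MIN_TB for t in data_tokens)
meets_future = all(allocation_tb[t] >= ADC_MIN_SOON_TB for t in data_tokens)

print(f"allocated {total}TB of {TOTAL_TB}TB")
print(f"meets {ADC_MIN_TB}TB minimum: {meets_current}")
print(f"meets {ADC_MIN_SOON_TB}TB minimum: {meets_future}")
```

Under these assumptions the split uses the full 150TB and clears both the current and the upcoming threshold.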