Tech Meeting Minutes 20110106
Present: Graeme, Sam, David, Peter, Stuart, Andy
- DPM daemon died (GGUS 65861). Peter has been wanting to find out how often this happens (it has also been seen with the gatekeeper daemons). However, an auto-restart via cfengine is now being considered to limit the impact. ACTION.
- Maui stopped scheduling anything just before Christmas (fixed by Stuart). The scheduler is also making odd decisions about the balance between ATLAS and LHCb work: it seems to run them 1:2 instead of 2:1. Two problems were identified: the priority assigned for time spent waiting in the queue, and a fairshare integration time that is too short. The configuration isn't doing what we really want. Rather than keep fiddling, we need to work out what we do want. Idea: increase the FS integration time to match the longest jobs (48 hours) and adjust the ATLAS shares to reflect workload instead of management.
- Some trouble with ATLAS releases (again), but it was eventually fixed. cvmfs!
- Durham have a glite-APEL node now. Publishing is working now that the RAL APEL service is back up.
- ECDF is close to switching to APEL, but at the moment the glite-MON is hitting out-of-memory errors, which need to be looked at before the database is transferred across.
- Next meeting F2F, Jan 14 in Edinburgh.
- Andy will poll who is coming so that somewhere can be booked for lunch.
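
Note on the Maui fairshare item above: a rough sketch of why a short fairshare integration time can distort the ATLAS/LHCb balance. It is modelled loosely on the idea behind Maui's FSINTERVAL, FSDEPTH and FSDECAY parameters (usage summed over recent windows with a decay weight); it is not Maui's actual implementation, and the function name, window length and CPU-hour figures are invented for illustration only.

```python
def fairshare_usage(per_window_usage, depth, decay):
    """Return each group's decay-weighted share of usage.

    per_window_usage: list of {group: cpu_hours}, most recent window first.
    depth: number of windows to include (hypothetical stand-in for FSDEPTH).
    decay: weight multiplier per window of age (hypothetical stand-in for FSDECAY).
    """
    weighted = {}
    for age, window in enumerate(per_window_usage[:depth]):
        weight = decay ** age
        for group, hours in window.items():
            weighted[group] = weighted.get(group, 0.0) + weight * hours
    total = sum(weighted.values())
    if total == 0:
        return {group: 0.0 for group in weighted}
    return {group: hours / total for group, hours in weighted.items()}


# Invented figures: four 12-hour windows, most recent first.
# ATLAS used most of the farm until a batch of long jobs finished.
history = [
    {"atlas": 10, "lhcb": 90},
    {"atlas": 80, "lhcb": 20},
    {"atlas": 80, "lhcb": 20},
    {"atlas": 80, "lhcb": 20},
]

short_view = fairshare_usage(history, depth=1, decay=1.0)  # 12 hours of history
long_view = fairshare_usage(history, depth=4, decay=1.0)   # 48 hours of history
print(short_view["atlas"], long_view["atlas"])
```

With only the most recent window, the scheduler sees ATLAS as having used very little and LHCb a lot; with the full 48 hours it sees the opposite. A window shorter than the longest jobs can therefore make fairshare priorities swing depending on when long jobs happen to finish.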