Tech Meeting Minutes 20101202


Present: Sam, Mark, David, Peter, Wahid, Andy

http://indico.cern.ch/conferenceDisplay.py?confId=114761

Backlink: http://www.scotgrid.ac.uk/wiki/index.php/ScotGrid_Technical_Meetings

Hot Topics

   * Glasgow
         o svr026, the new CREAM CE, reported memory issues. Memtest was run on the server and found no issues. svr026 will return to production on 02/12/2010.
         o WMS MyProxy ticket (GGUS 63640): reassigned to RAL to investigate hostname issues on their side. Status: "Waiting for reply".
         o svr023 is matching non-existent CEs (GGUS 63931). David found an old BDII setting for this machine in YAIM. He will find out where the setting is written and reconfigure it.
         o svr014 needs its CREAM installation upgraded once svr026 is back in full production.
         
   * Durham
         o BDII not yet changed to publish GRIDPP tag. Peter will update this shortly.


Actions and Deployment

   * Glasgow
         o glite-APEL is now done. David has a few notes to add to the official instructions, which he will post on the blog. Other sites now have the green light to go ahead. ACTION.
         o Disk deployment: not yet done, but in final testing. Half of disks 62-71 have now been fully tested.
         o Once deployed, this frees up enough space to tackle:
               + Migration of SL4 disk servers to SL5
               + Restructuring partitions on the smaller servers to 10TB, closer to the 15TB partitions on the new servers.
               + The draining involved is also good for data distribution across servers.


   * ECDF
         o glite-APEL: Andy has requested a VM for this.
               + Clarified that all the batch-specific publisher tweaks are on the CE.
               + Work on this install will start shortly.

         o CREAM: a software patch needs to be installed to allow full service testing. It should be installed this week or early next week.
         o ATLAS Analysis:
               + New disk is deployed
               + Production has moved to the new disk servers
               + Wahid did a small analysis test (30 jobs)
                     # This saturated the outbound links of the disk servers, so their additional interfaces need to be brought up (each server has 4x1Gb links).
                     # The NFS server was under very heavy load; the ATLAS software is now being reinstalled on a new machine.
   * Durham
         o Will try to do the glite-APEL upgrade early next week.



AOB

   * Graeme noted a major ATLAS downtime at RAL on 6-7 Dec. This is a good time for sites to take downtime if needed.
         o Mark thinks Glasgow might do so, to deal with heat flow issues. 
   * Stuart is chasing up stuck jobs in CREAM; all are ATLAS pilots.
   * Graeme noted an impending security challenge. ACTION: review the procedure document.
   * Peter to commence certificate upgrades at Durham.
   * A third CREAM CE will be installed at Glasgow on svr008 once svr026 and svr014 are back in full production.