Tech Meeting Minutes 20101202
From ScotGrid
Present: Sam, Mark, David, Peter, Wahid, Andy
http://indico.cern.ch/conferenceDisplay.py?confId=114761
Backlink: http://www.scotgrid.ac.uk/wiki/index.php/ScotGrid_Technical_Meetings
Hot Topics
* Glasgow
o svr026, the new CREAM CE, was reporting memory issues. Memtest was run on the server and found no issues. svr026 will re-enter production on 02/12/2010.
o WMS MyProxy ticket (GGUS 63640) - reassigned to RAL to investigate the hostname issues there. Status: "Waiting for reply".
o svr023 is matching non-existent CEs (GGUS 63931). David found an old BDII setting for this machine in YAIM. He will work out where it gets set and reconfigure it.
o svr014 at Glasgow requires its CREAM install to be upgraded once svr026 is back in full production.
* Durham
o BDII not yet changed to publish GRIDPP tag. Peter will update this shortly.
Actions and Deployment
* Glasgow
o glite-APEL now done. David has a few notes to add to the official instructions which he will put in the blog. Now green to go at other sites. ACTION.
o Disk deployment - not yet done, but in final tests. Half of disks 62-71 are now completely tested.
o This then frees up enough space to tackle:
+ Migration of SL4 disk servers to SL5
+ Restructure partitions on smaller servers to 10TB, closer to the 15TB partitions on the new servers.
+ Draining is good for data distribution across servers!
* ECDF
o glite-APEL: Andy has requested a VM for this.
+ Clarified that all the batch specific publisher tweaks are on the CE.
+ Work to commence shortly for this install.
o CREAM: a software patch needs to be installed to allow full service testing. The patch should be installed this week or early next week.
o ATLAS Analysis:
+ New disk is deployed
+ Production has moved to the new disk servers
+ Wahid did a small analysis test (30 jobs)
# Saturated links out of disk servers, so need to get their additional interfaces working (each server has 4x1Gb links)
# NFS server was under very heavy load - now reinstalling ATLAS software on a new machine
* Durham
o Will try to do glite-APEL upgrade early next week.
AOB
* Graeme noted major ATLAS downtime at RAL, 6-7 Dec. Good time for sites to take downtime if needed.
o Mark thinks Glasgow might do so, to deal with heat flow issues.
* Stuart is chasing up stuck jobs in CREAM - all ATLAS pilots.
* Graeme noted impending security challenge. ACTION to review procedure document.
* Peter to commence certificate upgrades at Durham.
* Third CREAM CE to be installed at Glasgow on svr008 once svr026 and svr014 are back in full production.
