Tech Meeting Minutes 20110224
From ScotGrid
Present: Graeme, Mike, Wahid, Andy, David, Sam, Michael, Stuart
http://indico.cern.ch/conferenceDisplay.py?confId=129019
Backlink: http://www.scotgrid.ac.uk/wiki/index.php/ScotGrid_Technical_Meetings
Site Issues
ECDF
- Sunday outages:
- Middleware racks 'blew up'. Rack switches needed to be replaced.
- Disk pool server suffered hardware errors from Saturday night.
- Server was reporting a CPU 'missing' - required firmware update and reboot.
- Monday was a planned outage for power work. Coincidence...?
- Lots of racks had been covered in dust, but actual intervention went ok.
- CREAM CE
- Load spiked and lots of stuck jobs from mw.
- Lease manager is trying to clean out dead jobs - might need manual purge of db.
- Stuart: Maybe the same issue seen at Glasgow (BLAH loses jobs). Tuning of MySQL helps. But maybe this is SGE related?
Durham
- Bad audio so could not hear Mike.
- Please look at tickets and ask for help if necessary.
Glasgow
- Decommissioned last lcg-CE (svr021).
- Took slightly longer because of extra publishing which was done through this machine.
- Suffered a downtime hit when the machine was in final draining - could have been avoided by taking it out of the GOC earlier.
- No objections from GridPP for correcting the availability figures.
- Network
- Colin changed network path to avoid the bad module, but lost connectivity because of a misconfiguration (now fixed).
- Looks less congested
- Test again next Tuesday.
- ATLAS Sonar
- Alessandra/Sam tuning SACK - YAIM tuning for disk servers is very old and probably not optimal.
- Will need to inject more transfers.
- Moving SL4 DPM nodes to SL5.
- Register for storage workshop!
- Jamie Ferguson is working on some improved monitoring.
- Have a second ARC CE to test LCMAPS. Will run other ARC CE for Andrej/ATLAS.
Other Topics
glexec
- Andy will discuss with Orlando (glexec has been security audited twice).
- Glasgow should look at ARGUS in a spare moment(!).
Whole Node Queues
- Graeme will submit a bug to condor people for option passing to CREAM.
AOB
- Next meeting: Feb 24, http://indico.cern.ch/conferenceDisplay.py?confId=129019.
