Tech Meeting Minutes 20110609
Present: Peter, Wahid, Andy, Sam, David, Mark
Agenda Location: http://indico.cern.ch/conferenceDisplay.py?confId=138124
Gold Star to Edinburgh for no tickets.
Gold Star to Durham for getting stuff sorted (TCPNUT)
Bronze Star to Glasgow.
Major events: finally got to start finishing the installation of all the new nodes. Now at 288 cores up, with 884 job slots currently.
Republished all the APEL for the last 6 months with the Encrypted DNs mode activated (which is annoyingly fiddly, APEL being what it is).
Then hit a gap publisher issue, but this seems to have been fixed.
CREAM install started yesterday, and is the priority.
Towards the end of the month, plan to do SL5 upgrade.
There is no Mike around at the moment.
A series of problems with the pheno LFC being overloaded occurred. Seems to have been fixed by increasing the number of concurrent threads.
Seem to have seen some issues with proxy availability for running jobs (this is the ticket against Glasgow re: sara, jet)
We get some scp failures exactly every 30 minutes on the CE. Looking into recurrent jobs that could cause this.
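One way to narrow down which recurrent job lines up with the failures is to fold the failure timestamps onto a 30 minute period and see where in that period they cluster. The following is only a minimal sketch, assuming the failure times have already been pulled out of the CE logs as datetime objects; the function name `dominant_phase`, the one minute bucket size and the sample times are all hypothetical, not taken from the actual CE.

```python
from collections import Counter
from datetime import datetime


def dominant_phase(timestamps, period_s=1800, bucket_s=60):
    """Fold timestamps onto a fixed period and find the busiest phase bucket.

    Returns (phase_seconds, fraction): the start of the most populated
    bucket within the period, and the fraction of events that fall in it.
    The period must divide 86400 so that phases line up across days.
    """
    if not timestamps:
        return None, 0.0
    buckets = Counter()
    for ts in timestamps:
        seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
        phase = seconds_into_day % period_s
        # Round down to the start of the bucket this phase falls in.
        buckets[(phase // bucket_s) * bucket_s] += 1
    phase, count = buckets.most_common(1)[0]
    return phase, count / len(timestamps)


# Hypothetical failure times: four on a 30 minute rhythm, one stray.
failures = [
    datetime(2011, 6, 9, 10, 7, 12),
    datetime(2011, 6, 9, 10, 37, 15),
    datetime(2011, 6, 9, 11, 7, 9),
    datetime(2011, 6, 9, 11, 37, 20),
    datetime(2011, 6, 9, 11, 50, 0),
]
phase, fraction = dominant_phase(failures)
print(f"most failures fall {phase // 60} min into each half hour ({fraction:.0%})")
```

A phase of 7 minutes here would point at anything scheduled at minutes 7 and 37 of the hour. Fixed buckets can split a cluster that straddles a bucket boundary, so a widened bucket size is worth trying if the result looks weak.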
- Mark is looking into wildcard cert suggestion/query by Peter.
TCP testing; incoming ports situation is different to what Peter had been told, but we now know the truth. We're now able to do some tests against Brian's stuff at RAL (talk to Brian)
1 month without a ticket!
ECDF is currently running on string and cardboard!
ECDF is at risk due to the GPFS downtime for fschecking. ECDF grid stuff has been decoupled from GPFS as much as possible (last minute issues yesterday), and most of the tests seem to be passing today, and LHCb jobs are running. ATLAS jobs are not running, though - Wahid is talking to Peter Love about this. We were also blacklisted on spacetokens, but that was due to an overly "keen" shifter.
Next week the actual power cut is due, but we'll try to be up until then.
Wahid is even now talking to Peter about getting the queues reactivated for ATLAS.
History of the Glasgow problem with David Winn: David has submitted a number of jobs via a number of WMSen to a number of sites. As far as we know, jobs from our WMS are running happily at us and Imperial, as are jobs from the Imperial WMS. We can't see any jobs successfully running at SARA (from any WMS). Investigation is ongoing.
svr016 has been causing us grief. maui has been increasingly unreliable. Wireshark + tcpdump showed a lot of additional traffic (RCP + NFS) from our VM ARC CE, so we turned it off. We're going to move the ARC CE to a real box.
We're also procuring a new box for the batch system, as we've also seen some seg-faults from maui and torque on it (and the box is as old as the original cluster).
The powercut has been moved to August. We're planning on downtime for this period.
Action Peter: IPPP automated restart of services with cfengine.
* Peter has done this for the important services. Closed.
Action: Decommission lcg-CEs: ongoing Durham, done ECDF.
Action: Upgrades to SL5: Ongoing GLA, ECDF, Durham. (Just storage at GLA, ECDF).
Proposed that we have another meeting in Jan 2012 as with the previous one.
This may be complicated by the lack of NeSC after July, but we're sure we can find a suitable venue.
Mark will be in CERN on the 23rd and Edinburgh next week, so we will have guest hosts for the meeting the next two weeks (we're hoping to acquire Lembit Opik for the 23rd.)
Chat window Log
[11:00:09] David Crooks joined
[10:57:31] David Crooks Hi Peter
[10:57:57] Peter Grandi hi david we are both early
[11:00:10] Peter Grandi joined
[11:01:26] Andrew Washbrook joined
[11:01:26] Sam Skipsey We're just waiting for Mark to escape from the ATLAS meeting...
[11:02:44] Andrew Washbrook hi guys the atlas meeting is still going on
[11:02:53] Wahid Bhimji joined
[11:03:18] Mark Mitchell joined
[11:03:48] Peter Grandi can hear your
[11:05:25] Andrew Washbrook got me im afraid
[11:05:47] Andrew Washbrook are you still pushing your royalist agenda?
[11:06:36] Mark Mitchell Absolutely I want an OBE
[11:08:24] Peter Grandi http://goc-accounting.grid-support.ac.uk/rss/UKI-SCOTGRID-DURHAM_Sync.html
[11:12:43] Wahid Bhimji its easy to get no ticket when you run no jobs
[11:13:07] Peter Grandi EGI-DEV SITE-NULL
[11:14:26] Peter Grandi National University of Lower Lothian?
[11:14:37] Mark Mitchell
[11:19:02] Peter Grandi BTW I have sent around to our computational phenomenlogists the link to the Edinburgh workshop,
[11:21:03] Wahid Bhimji um.. well registration ought to be closed now ... did you do it a while ago . it is a bit of Atlas meeting....
[11:23:35] Peter Grandi no, I had not realized it might be of interest to our computational phenomenologists, I thought it mostly a technology thing. But there seems to be a significant physics content.
[11:26:58] Peter Grandi thanks!
[11:27:16] Peter Grandi left
[11:27:18] Andrew Washbrook left
[11:27:18] Mark Mitchell left