Tech Meeting Minutes 20111006

ScotGrid Tech Meeting 6 October 2011


Chair: Mark
Minutes: Sam
Peons: Wahid, David C, Andy Washbrook + tan, Mike
Agenda: http://indico.cern.ch/conferenceDisplay.py?confId=148164
- Edinburgh.

Mark taunted Andy about missing GridPP27 at CERN due to the whole getting-married thing.

One ticket against ECDF. It's the CVMFS-migration ticket across the UK.
ECDF is in a transitional state on this.

Andy: The only thing that was a little bit of an issue was the inclusion of sudo as a requirement. As far as we can tell, it doesn't seem to be a showstopper, so we can probably work around it. We still need to do some testing.

Sam noted the existence of the new CVMFS "rollout" process on the GridPP wiki for this. The absolute deadline for this according to ATLAS is the end of the year.

Mark: We'll keep going on with CVMFS at Glasgow, too.

- Durham

Mark: The COD ticketed the ROD because one of the tickets against Durham had now reached a month old, with a suppressed alarm on the ROD panel. This meant they had to discuss suspending the site. It's important to note that this was a failure of the operational process, not of the site per se.
Mark manfully managed to wrestle Durham aside from this fate, Sam made a working SE for Durham, and we're slowly backing away from the edge.

The most important thing here is to get the CreamCE up and running, as the affected ticket is associated with it. If it is at all humanly possible, it should be up before the end of tomorrow.
Alastair Dewhurst has closed the active tickets against Durham and replaced them with a nice shiny new one for ease of management.

(Then we can concentrate on the dark data on the SE and problems with the NFS mount.)

Mike is even at this moment booting the VM for the CreamCE.

Mike has asked the network guys to reset the counters on the network interfaces for the grid.

The SE has been brought up as a brand-new UMD 1.8.1 DPM, and seems happy. It also has a hostcert signed by the new UK eScience CA "2B" trust anchor.
ATLAS is now reallocating and retransferring data back to the SE. We imagine it will be less than about 20TB, and probably less than 10.

The last time we managed to break the Durham network was with a large data transfer by a single user, so it'll be interesting to see what happens with the ATLAS resubscription.

- Glasgow.

Like everyone else, we have one open ticket concerning CVMFS.

Dave: Mostly this week we've been dealing with the instabilities from last week. We've been seeing an increase in spiky network traffic, which Stuart+Mark have isolated to a single unit in the stack in 141. This also seems to have had knock-on effects on torque+maui stability, both of which have needed babying.

We've also managed to get the Windows 7 "connect to server" thing to talk to our DPM WebDAV instance on svr025, although it doesn't support certificate authentication…

Mark: Also managed to fix cacti at Glasgow, which has been helpful in isolating our problem with the network (it turns out that setting it to use 64bit counters isn't *quite* enough to get it to work with >1Gbit transfers). The damaged unit in the stack is full of nonsense and filling up all its buffers.
The power work issues are still being investigated.
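
As a rough illustration of why counter width matters for the cacti graphs mentioned above: an interface byte counter that wraps more than once between polls cannot be reconstructed afterwards. The sketch below only estimates wrap times. The 300-second poll interval and the link speeds are assumptions for illustration, not figures from the meeting, and it does not cover whatever else was needed to fix cacti at Glasgow.

```python
def seconds_to_wrap(counter_bits: int, link_bits_per_s: float) -> float:
    """Time for a byte (octet) counter to wrap at a sustained line rate."""
    max_bytes = 2 ** counter_bits
    bytes_per_s = link_bits_per_s / 8
    return max_bytes / bytes_per_s


POLL_INTERVAL_S = 300  # assumed polling interval, not taken from the minutes

for bits in (32, 64):
    for gbit in (1, 10):  # assumed link speeds
        t = seconds_to_wrap(bits, gbit * 1e9)
        verdict = "ok" if t > POLL_INTERVAL_S else "wraps within one poll"
        print(f"{bits}-bit counter at {gbit} Gbit/s wraps after {t:.3g} s ({verdict})")
```

Under these assumptions a 32-bit counter on a saturated 1 Gbit/s link wraps in well under a minute, which is why 64-bit counters are a necessary first step at these rates.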


AOB.
Can people get any outstanding CHEP abstracts in?
If anyone wants to go to Taiwan or Iceland, they could also submit abstracts there.

Planning for expenditure for the GridPP funding tranche should be done forthwith.
Wahid will be sending some things through from Edinburgh concerning this and some ideas Phil has had.

Mark suggests that Durham should probably buy more disk, or maybe a low-end 1Gbit switch (6248 series Dell) that supports 10Gbit expansion ports. The problem is that the 10Gbit modules are £200 + the £6000 for the unit.

Dr David Crooks, Esq. will be chairing the meeting next week.

- Andy asked about network monitoring.
Mark just wants a rough figure for the amount of raw data the collaboration is pushing.
JANet think that we might be pushing about 7Gbit/s, but we'd like to actually have a rough measure ourselves for how much data we move around. There are various options available for separating our data from others, or improving our latency and contention ratio, but we need to be able to argue for them with real data.
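
One way to get the kind of rough figure Mark is asking for is to sample interface byte counters twice and convert the difference to a rate. This is a minimal sketch under stated assumptions: the site names and counter values are hypothetical, and the function tolerates at most one counter wrap between samples.

```python
def rate_gbit_per_s(first_bytes, second_bytes, interval_s, counter_bits=64):
    """Average rate between two byte-counter samples, allowing one wrap."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Modular subtraction recovers the delta even if the counter wrapped once.
    delta = (second_bytes - first_bytes) % (2 ** counter_bits)
    return delta * 8 / interval_s / 1e9


INTERVAL_S = 300  # assumed time between the two samples

# Hypothetical (start, end) byte counters per site, for illustration only.
samples = {
    "site_a": (0, 37_500_000_000),
    "site_b": (1_000, 75_000_001_000),
}

per_site = {
    name: rate_gbit_per_s(start, end, INTERVAL_S)
    for name, (start, end) in samples.items()
}
total_gbit = sum(per_site.values())

for name, rate in per_site.items():
    print(f"{name}: {rate:.2f} Gbit/s")
print(f"total: {total_gbit:.2f} Gbit/s")
```

Summing per-site rates like this gives an average over the sampling interval, so it will understate short peaks. It is meant as a sanity check against the JANet estimate rather than a replacement for proper monitoring.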

Chat log:
[10:58:34] Mark Mitchell joined
[10:59:39] Wahid Bhimji joined
[11:01:00] David Crooks joined
[11:01:08] Andrew Washbrook joined
[11:01:48] Andrew Washbrook are we holding a one minute slience for Steve Jobs?
[11:02:13] Sam Skipsey Yes, this is the inaugural iSilence.
[11:02:19] Andrew Washbrook very good!
[11:03:45] Wahid Bhimji oh no - what an honour
[11:06:51] Wahid Bhimji not absolute deadline
[11:07:12] IPPP1 UofDurham joined
[11:07:18] David Crooks left
[11:08:25] David Crooks joined
[11:15:40] Wahid Bhimji in the world?
[11:16:08] Wahid Bhimji some of the early adopter sites must have it in production (not to belittle the achievement)
[11:21:33] Wahid Bhimji at the moment we have 2.1 T on PRODIDISK at ecdf and 1.5 for hot
[11:21:49] Wahid Bhimji but we allow for 16 in PROD and 4.4 in HOT
[11:27:15] Wahid Bhimji For CHEP are we already supposed to put in travel requests to robin (before hearing about abstract accpeted etc.)
[11:27:29] Sam Skipsey It's all a bit political, Wahid.
[11:27:52] Sam Skipsey I think you can indicate to Jeremy an intention to go, first, and see how stressed he seems.
[11:28:40] Wahid Bhimji Well I replied to Petes email re "who's going?"
[11:28:47] Andrew Washbrook I sent an email to Pete - is that good enough?
[11:29:31] Sam Skipsey Yeah.
[11:29:39] Sam Skipsey It'll just make the travel budget explode.
[11:29:40] Sam Skipsey
[11:29:55] Andrew Washbrook i will go by boat
[11:30:13] Sam Skipsey Kayak.
[11:35:59] Wahid Bhimji its co- timed with wlcg ech
[11:36:08] Wahid Bhimji tech - isn't it - so that might save money
[11:36:42] Sam Skipsey It's always co-timed with WLCG-tech.
[11:36:54] Sam Skipsey The problem is that GridPP doesn't fund international travel for WLCG Tech.
[11:36:58] Sam Skipsey Never have.
[11:37:04] Wahid Bhimji what - did I miss it in Taiwan.... or was that it
[11:37:12] Wahid Bhimji (this one I thought had seperate days )
[11:37:29] Sam Skipsey Yeah, so Prague worked the same way as New York is going to.
[11:37:36] Andrew Washbrook yep
[11:37:39] Sam Skipsey GridPP was prepared to only pay for the CHEP bit.
[11:37:48] Wahid Bhimji wasn't DESY a tech
[11:40:16] IPPP1 UofDurham cream02.dur.scotgrid.ac.uk is now up...
[11:40:51] Wahid Bhimji cheers gotto go - thanks very much for all info