Tech Meeting Minutes 20110804
Present: Andy, David, Sam, Mark, Mike, Stuart
Hurrah, no tickets.
Mike is mikeless. Nothing really to report.
Mark: Mike received all of the equipment that PG had ordered (all the
IPPP stuff) and is installing it. Durham seems okay at present!
Andy: nothing exciting. Some upkeep - going to try CVMFS at ECDF. Met
with Orlando, and they were overwhelmingly positive. Will try on test nodes
first. Looking into the squid deployment now.
Had 6000 pilots yesterday, but that did not seem to break anything.
CreamCE excitement. A few times we've had a "Too Many Open Files"
error - too many open connections between WN/CE (10000!).
The CREAM support guys were a little dismissive - they think it is a local problem.
(This is on the prerelease "improved" version.)
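One way to see how close a process is to the "Too Many Open Files" limit is to compare its open file descriptor count (sockets count as descriptors too) with its soft limit. The following is only a rough diagnostic sketch for a Linux node, reading /proc; the function names are made up for illustration and nothing here is specific to CREAM.

```python
import os


def count_open_fds(pid):
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of a process, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("no 'Max open files' entry found")


if __name__ == "__main__":
    # Inspect this process; pass another PID to inspect a different one
    # (needs permission to read that process's /proc entries).
    pid = os.getpid()
    print("open fds:", count_open_fds(pid), "soft limit:", soft_nofile_limit(pid))
```

Running this periodically against the suspect PID would show whether the count creeps towards the limit or jumps under load.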
Mark noted that programmers tend to think that the infrastructure is
the problem, and infrastructure people think that the code is the problem.
Andy noted that the 4 SGE sites all seem to have issues. Could we
coordinate with Imperial, Lancs, et al?
Andy asked what's the distinction between UMD and EMI - what should we
Stuart: Fundamentally, as a WLCG GridPP Site, we should be following
their recommendations (if we knew what they were). The EMI stuff is
"gLite 3.3". The UMD contains the EMI stuff that has been through
staged rollout, and therefore the UMD has had more testing. EGI expect
that WLCG, when they recommend one of them, will recommend the UMD.
In terms of what we should be doing, it is slightly ambiguous. Stuart
is installing some EMI stuff with the expectation of moving over to
Known and stable is still gLite 3.2.
If you want a newer package, then go for the UMD.
The group expressed their confidence in the high quality of Grid Middleware.
Last Tuesday there was a power cut. We did some internal maintenance work when we brought it up, which caused some unexpected complications. We're now at 20 Gbit 141-243d.
Planned power outage Sunday-Weds (not inclusive).
Sam: CVMFS activated on nodes001 - 005, fine at the moment, but it's
helped possibly by the increased bandwidth 243-141, and some squid
Stuart: the student is working on gqSub for ARC. We had a failure of
the physical ARC node here, so we had to reinstall on a new node.
On Steve Lloyd's metrics page (the HEPSPEC06 usage page), you can see that Glasgow has a discrepancy of around 50% between ATLAS's and APEL's records for consumed resources. This appears to be because of poor interaction between ATLAS's measures for CPU use and CPU scaling.
For May (month 5), ATLAS think Durham did a lot more work than APEL thinks they did. The ratio in the APEL/ATLAS column should be ~1.
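As a rough illustration of the check being described, the sketch below computes the APEL/ATLAS ratio per site and flags anything that is not close to 1. The site names, the figures and the 10% tolerance are all invented for the example; they are not taken from the metrics page.

```python
def apel_atlas_ratio(apel_hs06_hours, atlas_hs06_hours):
    """Return APEL/ATLAS consumed work, or None if ATLAS reports no work."""
    if atlas_hs06_hours == 0:
        return None
    return apel_hs06_hours / atlas_hs06_hours


def flag_discrepancies(records, tolerance=0.1):
    """Return {site: ratio} for sites whose ratio differs from 1 by more than tolerance."""
    flagged = {}
    for site, (apel, atlas) in records.items():
        ratio = apel_atlas_ratio(apel, atlas)
        if ratio is None or abs(ratio - 1.0) > tolerance:
            flagged[site] = ratio
    return flagged


# Hypothetical (APEL, ATLAS) HS06-hour figures, made up for illustration.
example = {"SiteA": (1000.0, 1020.0), "SiteB": (500.0, 1000.0)}
print(flag_discrepancies(example))  # {'SiteB': 0.5}
```

A ratio well below 1, as for the made-up SiteB, means ATLAS believes the site did more work than APEL has recorded.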
This will come up at Ops team when Stuart has fully investigated.
Weds for Glasgow will be exceptionally busy. Mark & GlasgowPPEAndy
have managed to pick up responsibility for Kelvin Building networking.
In an ideal world, we'd be up by midday on Weds, but things have a habit of not being ideal.
Mark will inform people if we are still going to be busy by Thurs.
Since we have the power cut, we don't have much to do on Tues 9th. So, we propose a Grand ScotGrid Meeting at Edinburgh.
If Mike could also make it, then we can have a full f2f and discussion. (All agreed! The one caveat is working out how Ops meeting attendance will work. Provisionally: meet up at 10am at Waverley Station.
Andy will confirm it's okay by the end of today.)