Tech Meeting Minutes 20121201
ScotGrid Technical Meeting 12 January 2012
Mark - Chair
Sam - Minutes
Gareth Roy ("the new guy")
No contact from Mike@Durham so far.
Andy: The BDII fell over on Christmas Eve, but we caught it before anyone noticed. One of our older CEs broke on Tuesday, but that is less of an issue (it's an old SL4 box). CVMFS is doing fine.
Mark: Have made Andy's requested amendments to the Quarterly Report.
GGUS Tickets: svr018 FTS failure (GGUS78153). This appears to have been a temporary problem caused by the network issues on the link between 141 and 243d. (The overall failure rate for the storage has been consistently elevated since that issue, but is still no more than 1%.)
disk043 is a box with known issues, which has been RDONLY for several months because of the load problems it was having. It recently showed up with corrupt files for ATLAS; we're checksumming it now (but, being read-only, it can't hold any new, important files).
Network blip yesterday: At around 4pm, nodes in 141 essentially stopped responding. Resetting the stack in 141 didn't fix matters, but offlining all of the nodes in 141 seemed to improve stability. Dave is currently experimenting, slowly, with unlinking small numbers of nodes in 141 to see at what point we remain stable. It looks like the issue is related to the ongoing state of affairs with the Nortel stack in 141.
9th February ScotGrid F2F: Wahid can't make it (due to TEGs etc). The afternoon trip to Auchentoshan will probably be funded.
DRI: Pricing for Dell servers has changed, so the recommendation for the test/monitoring boxes will be altered. It will probably be an R410, but there will be an email confirming this. The idea is that Glasgow and Edinburgh would each have both perfSONAR and GridMon clients, while Durham would have GridMon (this is a microcosm of the general plan for large sites to run both, and small sites to run GridMon). The data from this monitoring will, amongst other things, be provided to JANET to improve their network planning for the UK. Note that there are price cuts across the hardware market, so it is worth looking at multiple vendors (Cisco, for example).
- Chat Log
[11:01:29] Wahid Bhimji I see there is a rival SCOgrid meeting going on at same time
[11:02:03] Andrew Washbrook joined
[11:02:03] Wahid Bhimji joined
[11:02:05] Mark Mitchell joined
[11:02:07] David Crooks joined
[11:03:04] Gareth Roy joined
[11:03:42] Andrew Washbrook welcome Gareth!
[11:10:34] Mark Mitchell The magic number is 2147483647
[11:26:07] Andrew Washbrook left
[11:26:08] David Crooks left
[11:26:10] Wahid Bhimji left
[11:26:11] Mark Mitchell left
[11:26:28] Gareth Roy left