Tech Meeting Minutes 20110512
- Durham ticket - is waiting on user resubmitting jobs.
Peter: the user in question is Peter Richardson, who says he doesn't have any problems. Just waiting to close it for completeness.
- Gridwise mostly okay. Some trouble with the usual non-grid VMware stuff - so frequent that it must be some issue with the install. One VM went to 200% CPU load - logged in to bring it down and back up, which resolved it. Have made a lot of progress - previously, the VM disks were extremely fragmented and exported over NFS; now they are on local disks and not fragmented. Going to do the same for the Grid VMs, but the load is lower on them (other than the lcg-CE).
(No tickets - gold star!)
- Andy Washbrook: ECDF systems team busy this week - phase 2 kit has arrived. Outages planned from Wed 8th to 16th June for the install, extended from the original plan to include the annual one-day power outage. Should be able to run through most of the downtime (other than the power outage), but this needs some proactive preparation. Some configuration changes made this morning: new CE -> BDII, and rationalisation of the gridmapdir in anticipation of ARGUS/glexec etc.
- Wahid: currently fighting with Dell over one of our disk servers being rather unstable from time to time. Dell still insist that we must upgrade the firmware before they'll look at other issues, but progressing.
(Mark: when Dell visit next week, we can bring it up with them.)
- (two tickets)
- Power outage on the 29th -> possibly related to the transfer issues with disk058 on the 5th. On the 141 switch stack, we're getting buffer memory errors, which we think are causing issues with the rest of the network. We're going to take the entire site down and bring it up in a specific order to see if we can resolve the issue.
- It has been proposed that, in July, there will be more power work done to resolve the power issues permanently. We have a plan B if our shutdown-and-up does not resolve the issues - we have a source for (borrowed) replacement kit.
- Tickets: issues with svr026, which are ongoing and being looked at by Stuart.
- Transfer issue ticket, which might be resolvable.
- Dell visiting Glasgow on the 19th. Anyone who wants to come is welcome.
Dell are coming to tell us how awesome they are. (Wahid and Andy were at the Lancaster meeting which preceded this. They do buy a good lunch!)
- Peter - has done some on-site tests of the slightly modified bandwidth tests. On campus and within physics we get nominal 1Gbit. On some machines (not Peter's) there's a network limit. But there seems to be no problem within IPPP/Durham. The next step is to do the wider testing on the WAN (with Brian). The network guys don't want Peter to run any tests on the WAN before exams have finished (which shows confidence!).
ATLAS only use PD2P to copy data sets. This seems to not fill up big disks (see Glasgow).
- Peter: suppose I were to run an rfio server on a desktop, could things use it? All experimental software would be on an NFS share, and then, for our local users, there would be a shared NFS scratch area. Slightly concerned about load on the NFS server if lots of people used that simultaneously. Ideally, they'd use normal grid access methods - they don't want to do that because of the authentication time overhead for gridftp transfers. Could we use rfio to export an fs? (Wahid suggested using Chirp.) Peter also brought up the Amazing Brian Bockelman xrootd transfer thing. Peter will email the list, as this deserves some discussion and thought.
- Chat window:
[11:01:48] Wahid Bhimji great !
[11:02:47] Peter Grandi also, had some personal downtime: authentic mexican cuisine came with authentic "montezuma revenge"
[11:06:38] Andrew Washbrook sounds nasty!
[11:08:24] Peter Grandi sort of incovenient and messy, reminds me of a lot of Java sw or certain shell scripts
[11:09:24] Peter Grandi BTW as to Java sw Evo instead I sort of like, and now that Skype is a Microsoft property IK hope more people in our environment will use it.
[11:18:11] Wahid Bhimji yes
[11:23:19] Wahid Bhimji indeed I was going to say your users might like posix
[11:24:12] Wahid Bhimji http://www.cse.nd.edu/~ccl/software/manuals/chirp.html
[11:24:21] Wahid Bhimji not sure that is the best documentation
[11:28:58] Wahid Bhimji bye then
[11:29:32] Wahid Bhimji no meeting next week I guess
[11:29:44] Wahid Bhimji ok
[11:29:46] Wahid Bhimji bye