Tech Meeting Minutes 20121705
ScotGrid Technical Meeting 17 May 2012
- Present: Sam (minutes), Dave C (chair), Gareth, Mike J, Wahid B
- Mike noted that he's continuing to have issues with the site post-A/C. The nodes have updated inconsistently.
There was an A/C failure, requiring the site to be taken down quickly. There were many ancillary problems as a result (including the "torque overwrites /dev/null with an empty file" problem), and the worker nodes which have updated have done so inconsistently (many package resolution issues). Torque has been flaky since the site was taken down. Some of the nodes appear to be generally working, but others are missing multiple packages (lcg_util and the data management packages).
Pragmatically, it might be necessary to offline the non-working nodes over the weekend.
Mike discussed the possibility of moving to the UMD-WN and starting from scratch. Generally, the feeling was that it was better to stick with the "tried and tested" glite/lcg WN release for the time being.
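The missing-package problem described above could be checked per node with something like the following. This is only a minimal sketch, assuming rpm-based worker nodes: the package names lcg_util and lfc are the ones that came up in the meeting, while the function name missing_packages and the REQUIRED_PKGS variable are hypothetical and would need adapting to the real list of data management packages.

```shell
#!/bin/bash
# Assumption: these are the packages to check for. lcg_util and lfc are the
# ones named in the discussion; extend the list as needed.
REQUIRED_PKGS="lcg_util lfc"

# Print the name of each required package that rpm reports as not installed.
# Prints nothing if every package in REQUIRED_PKGS is present.
missing_packages() {
    local pkg
    for pkg in $REQUIRED_PKGS; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

Running missing_packages on each worker node (for example over ssh) and collecting the nodes that print anything would give a candidate list for offlining, e.g. with pbsnodes -o, until the packages are sorted out.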
ECDF: Nothing much new - Wahid and Andy are both busy with CHEP stuff. There has been no repeat of the offlining caused by the CVMFS slowness, but a different glitch did occur, failing with a lost heartbeat. ATLAS SAM tests are currently failing for everyone's SRM, so we're not worried.
na62.vo.gridpp.ac.uk was having issues with FTS for svr018. The situation was resolved by fixing the srmv2.2 publishing on svr018 - oddly, it hadn't been updated by the yaim configuration.
Dave & Gareth have been working on the exciting new Interlagos nodes. Our HEPSPECs are lower than were promised - we see a HEPSPEC of about 360 per node. We now have what seems to be the "official" line from Dell, which is that to see stable (and good) performance you need to use SL6. There's a BIOS update which is supposed to help, but it is proving problematic in itself.
Sam is getting grumpy at DPM for CHEP stuff.
It would be helpful if we could migrate to SL6, as Wahid is also getting indications from Orlando that ECDF will be moving to SL6 at some point soon.
Gareth is also adding some more EMI/UMD CEs.
- Chat log
[10:59:28] Mike Johnson joined
[10:59:47] David Crooks joined
[10:59:47] Gareth Roy joined
[11:01:34] Wahid Bhimji joined
[11:02:52] Wahid Bhimji hello
[11:02:57] Wahid Bhimji ok
[11:05:01] Sam Skipsey node001:~# yum search lcg_util
Loaded plugins: kernel-module, versionlock
Excluding Packages from SL 5 base
Finished
Excluding Packages from SL 5 security updates
Finished
Reading version lock configuration
glite-WN-version.x86_64 : Version tag/marker for glite-WN
lcg_util.i386 : org.glite.data.dm-util
lcg_util.x86_64 : org.glite.data.dm-util
lcg_util-py25.i386 : org.glite.data.dm-util-py25
lcg_util-py25.x86_64 : org.glite.data.dm-util-py25
lcg_util-py26.i386 : org.glite.data.dm-util-py26
lcg_util-py26.x86_64 : org.glite.data.dm-util-py26
[11:05:07] Sam Skipsey is what I get on a node
[11:05:18] Sam Skipsey (note lcg_util so I misspoke)
[11:08:25] David Crooks https://ggus.eu/ws/ticket_info.php?ticket=82214
[11:09:42] Sam Skipsey lfc-1.8.2-2sec.sl5.x86_64 : CLI for LCG File Catalogue
Repo : installed
Matched from:
Filename : /opt/lcg/bin/lfc-mkdir
[11:09:50] Sam Skipsey so, try installing that package
[11:09:59] David Crooks https://ggus.eu/ws/ticket_info.php?ticket=82276
[11:15:30] Sam Skipsey yum whatprovides */lfc-mkdir
[11:21:53] Wahid Bhimji well its not so bad SL6
[11:22:19] Wahid Bhimji might be possible at some point in the not so distant future
[11:24:15] Mike Johnson well I've made lfc-mkdir appear. Which of course isn't to say it works.
[11:24:32] Mike Johnson but moving repos over and yum install lfc seemed to work
[11:24:54] Mike Johnson This is where someone tells me it's the wrong lfc-mkdir or something
[11:25:20] Sam Skipsey It should be fine....
[11:25:47] Mike Johnson hope sp
[11:28:22] Wahid Bhimji left
[11:28:23] Gareth Roy left