Tech Meeting Minutes 20110908
ScotGrid Tech Meeting
Andy: too many issues to address with the virtual instance (the CREAM virtual CE) - don't want to have to restart it daily. Has been doing some low-level Squid + cvmfs testing; results seem reasonable so far.
Orlando had some queries: 1) the package dependency of cvmfs on sudo. It seems to work fine without it? (We think cvmfs uses sudo to drop down to the cvmfs user.) We need to test this out fully, on a test WN. We also need to advertise the squid properly - Wahid was talking to Alastair about this.
Wahid: Alessandro said he'd follow up the squid issues with a ticket. We still can't contact the Glasgow squid - which we'd like to be able to do (currently we fail over to Lancs).
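For reference, failover ordering of this kind is expressed in the cvmfs client's proxy list: semicolons separate failover groups, and pipes separate load-balanced proxies within a group. A minimal sketch (the hostnames are illustrative placeholders, not our actual squids):

```shell
# /etc/cvmfs/default.local (sketch - hostnames are placeholders)
# Try the local Glasgow squid first; only fail over to the Lancs squid
# if every proxy in the first group is unreachable.
CVMFS_HTTP_PROXY="http://squid.example.gla.ac.uk:3128;http://squid.example.lancs.ac.uk:3128"
```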
Dave: Yes, we need to look at that.
Andy: UK CREAM + SGE meeting. This was useful, especially with Stuart present and being awesome. We can put in some GGUS tickets about it. Slides have been sent to Mark for use at GridPP27.
Wahid: in the ATLAS meeting just now, it was announced that sites blacklisted for more than 20% of their time will be asked to explain themselves (each month). ECDF is getting a bit close to this (partly due to the RAL issue). The same metric is being used for the alpha/beta/gamma/delta site classification, which decides data pre-placement. You can check your site's downtime here: http://hammercloud.cern.ch/atlas/autoexclusion/ but there is no good list of the site classifications available.
- Glasgow:
Dave: Glasgow has been fairly quiet. The CREAM CEs got quite grumpy, with a build-up of BLAH registry files (in the back-up location) which BUpdater then spends ages clearing out. This has been a rolling problem since the power-down for the power work, and came to a head early this week when all 3 CEs decided to clean themselves out at once (with a concomitant lack of response to anything else).
(This is with CREAM 3.2.8-2, gLite 3.2.)
Stuart is pondering/threatening to rewrite BLAH to use a proper DB backend, rather than flat files/directory structures.
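To illustrate why a proper DB backend helps here: with a table instead of a directory of flat files, the clean-out that currently stalls the CEs becomes a single indexed DELETE rather than a long directory walk. A minimal sketch using SQLite - the schema, column names and status encoding are all invented for illustration, not BLAH's actual layout:

```python
import sqlite3

def open_registry(path=":memory:"):
    # One table replaces the per-job flat files of the BLAH registry.
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        blah_id TEXT PRIMARY KEY,
        lrms_id TEXT,
        status  INTEGER,
        updated REAL)""")
    return conn

def record_job(conn, blah_id, lrms_id, status, now):
    # Upsert keeps exactly one row per job.
    conn.execute("INSERT OR REPLACE INTO jobs VALUES (?,?,?,?)",
                 (blah_id, lrms_id, status, now))

def purge_finished(conn, older_than):
    # status 4 = 'completed' in this sketch's invented encoding.
    # Purging old entries is one statement, not a scan of thousands of files.
    cur = conn.execute("DELETE FROM jobs WHERE status = 4 AND updated < ?",
                       (older_than,))
    conn.commit()
    return cur.rowcount
```

The point of the design is that BUpdater's periodic clean-out stops being proportional to the number of registry files on disk.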
- Sam has just returned from being on the Data Management panel at ACAT 2011.
1) Caching vs pre-placement was a big discussion point on the panel (viz. the STAR one-click transfer, cvmfs, the xrootd global redirector). Caching is a necessary evil if your data management problem is sufficiently complex - but you need to pre-seed data, especially if dataset hotness is volatile (to prevent seed sites from being DDoSed at the start of the caching process). There are many issues with cache management that need to be properly addressed, however - for example, the need to stop a single user trashing your cache with one monster-sized dataset transfer - although these tend to have existing solutions.
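The single-user cache-trashing problem above has a standard shape: bound each user's share of the cache so one monster transfer can't evict everyone else's working set. A toy sketch (the class, the 25% share cap and the size units are all illustrative assumptions, not anything from the panel):

```python
from collections import OrderedDict

class QuotaLRUCache:
    """LRU dataset cache with a per-user share cap (sketch).

    A user whose cached datasets would exceed their share gets refused
    admission, instead of being allowed to evict other users' data.
    """
    def __init__(self, capacity, max_user_fraction=0.25):
        self.capacity = capacity
        self.max_user = capacity * max_user_fraction
        self.entries = OrderedDict()   # dataset name -> (size, owner)
        self.used = 0
        self.per_user = {}

    def admit(self, name, size, user):
        # Refuse outright if this would push the user over their share.
        if self.per_user.get(user, 0) + size > self.max_user:
            return False
        # Otherwise evict least-recently-admitted entries until it fits.
        while self.used + size > self.capacity and self.entries:
            _, (osize, ouser) = self.entries.popitem(last=False)
            self.used -= osize
            self.per_user[ouser] -= osize
        if self.used + size > self.capacity:
            return False
        self.entries[name] = (size, user)
        self.used += size
        self.per_user[user] = self.per_user.get(user, 0) + size
        return True
```

A real implementation would also track recency on reads and age out quota; this only shows the admission-control idea.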
2) Data preservation! We're not aware of any particularly organised efforts to concretely preserve and archive data from the WLCG experiments.
3) Will future experiments even be able to use Grids (versus monolithic "Cloud" data centres)? Butters' Law empirically shows that network (back-haul) capacity doubles every 6-9 months. But the data requirements of the next-generation experiments could be an order of magnitude larger than the WLCG's, and there will be more than one of them!
4) GPUs (and GPUs on the grid).
5) Multicore! (From a data management perspective, we don't think that 48-core nodes running 48 jobs per node will be sustainable in terms of transfer rates - experiments *must* move to a multithreaded/multiprocess model before we get that far, and we're at the tipping point in the current procurement cycle. For future experiments, we suspect that Map/Reduce-style data-aware job parallelisation and placement is the only sensible approach.)
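The 48-jobs-per-node worry in point 5 is easy to put numbers on: aggregate per-node I/O demand scales with the number of independent job streams. The figures below are illustrative assumptions, not measurements from the meeting:

```python
def node_io_demand_mb_s(jobs_per_node, mb_s_per_job):
    # Independent single-core jobs each pull their own input stream,
    # so demand scales linearly with job count.
    return jobs_per_node * mb_s_per_job

# 48 independent jobs each streaming an assumed 5 MB/s want 240 MB/s,
# roughly double what a 1 Gb/s NIC (~125 MB/s) can deliver. A multiprocess
# layout (say 6 processes x 8 threads sharing input) cuts the number of
# independent streams by 8x, which is the point being made above.
```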
Durham (Mike, in absentia): another power issue; he is spending his life turning things back on in the right order. (We at Glasgow empathise.)
AOB: Good luck to the happy husband-to-be next week!