Tech Meeting Minutes 20110127

From ScotGrid

Present: Graeme, Mike, Wahid, Andy, David, Sam, Mark, Peter

http://indico.cern.ch/conferenceDisplay.py?confId=124890

Backlink: http://www.scotgrid.ac.uk/wiki/index.php/ScotGrid_Technical_Meetings

Hot Topics

  • Glasgow disk server intervention went well. New cables were supplied; these now seat properly in the 10GE cards, making both ports accessible.
  • Graeme declared the files as 'suspicious', but Cedric clarified that the DA tools do not look at this replica attribute, so declaring them has no practical effect. An API to clear the attribute would also be needed.
    • These interventions need to be handled flexibly, e.g., a long downtime for one server is still better managed by draining it.
  • There was some discussion about why downtimes for individual service components make little sense: essentially, the site needs to manage everything beyond the interface layer (WNs, disk servers, etc.).
  • GGUS Tickets:
    • Durham ILC authentication 65923 - Peter needs to check the VOMS setup (the server DN should be /C=DE/O=GermanGrid/OU=DESY/CN=host/grid-voms.desy.de). N.B. The CIC portal is the authoritative source for VO information: https://cic.gridops.org/ (select the VO tab, then the specific VO from the drop-down menu).
    • Glasgow WMS MyProxy issues 66080 - seems to be solved, but the user is also having trouble with the RAL WMS servers.
    • Glasgow LHCb uploads 66203 - asked for more information, which has not yet been forthcoming. The LHCb dashboard is difficult to use. Ticket marked as "waiting for reply".
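For the Durham ILC ticket, a quick local check of the VOMS trust file can help. The following is a minimal sketch, assuming the site uses a VOMS `.lsc` file whose first non-blank line holds the VOMS server's subject DN. The typical location would be something like `/etc/grid-security/vomsdir/ilc/grid-voms.desy.de.lsc`, but that path is an assumption. The function names are hypothetical; only the expected DN comes from the minutes.

```python
from pathlib import Path

# Expected VOMS server host DN for ILC, as recorded in the minutes.
EXPECTED_DN = "/C=DE/O=GermanGrid/OU=DESY/CN=host/grid-voms.desy.de"


def lsc_subject_dn(lsc_path):
    """Return the first non-blank line of a .lsc file (assumed to be the server subject DN)."""
    for line in Path(lsc_path).read_text().splitlines():
        line = line.strip()
        if line:
            return line
    return None


def check_lsc(lsc_path, expected=EXPECTED_DN):
    """Return True if the .lsc file's subject DN matches the expected DN exactly."""
    return lsc_subject_dn(lsc_path) == expected
```

Usage would be `check_lsc("/etc/grid-security/vomsdir/ilc/grid-voms.desy.de.lsc")` (hypothetical path). An exact string match is used deliberately, because a single differing RDN in the DN is enough to break authentication.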

Action Review

  • ECDF CREAM CE
    • Graeme sent test jobs, which worked, so the queue is now configured for production.
    • The issue with failing SAM tests is resolved: an extra parameter needed to be passed in the job submission script.
    • LHCb seeing 444444 waiting jobs - fixed by adding SSL certificate authentication to the info script; this is now needed to query the SGE queues at ECDF.
    • Plan to set up a second CREAM CE and decommission both lcg-CEs.
  • ECDF Analysis
    • Queue share split between prod and analysis is 50/50, which is ok for a T2.
    • Wahid ran some analysis jobs, which worked well (no s/w area overload). HammerCloud tests don't seem to work as well. ACTION: Subscribe more data to get a better balance across servers. Run HammerCloud with various job limits to determine the correct cap. Want to get PD2P switched on for Edinburgh.
  • Durham
    • Auto restart of daemons and central syslog are on the task list.
    • ACTION: Need to do the CREAM CE! Somehow this action was dropped. Timescale: progress by next week?
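The ECDF "444444 waiting jobs" symptom is worth catching early. The following is a hedged sketch, assuming that 444444 is the default `GlueCEStateWaitingJobs` value published when the dynamic info provider cannot read the batch system, and that the info system output is available as LDIF-style `attribute: value` text. The function names are hypothetical and are not part of any gLite tool.

```python
# Assumption: 444444 is the default GlueCEStateWaitingJobs value published
# when the dynamic info provider fails to query the batch system.
PLACEHOLDER_WAITING = 444444


def waiting_jobs(ldif_text):
    """Collect GlueCEStateWaitingJobs values from LDIF-style 'attr: value' lines."""
    values = []
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "GlueCEStateWaitingJobs":
            values.append(int(value.strip()))
    return values


def provider_looks_broken(ldif_text):
    """True if any queue publishes the placeholder instead of a real count."""
    return PLACEHOLDER_WAITING in waiting_jobs(ldif_text)
```

This check would flag the situation LHCb saw at ECDF before the info script was given certificate authentication. It only looks at the published value, so it cannot tell why the provider failed.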

AOB

  • David: Replacement power supplies for the Viglen nodes have started to arrive. Two nodes should be able to come online fresh, and the others can be taken out of read-only (r/o) status.