Day one opened with the keynote speaker, David De Roure, speaking about a recent service, “myExperiment”. myExperiment is essentially a social virtual research environment which also includes a repository of research methods. The environment mandates as little as possible, so researchers can take a flexible approach to what they include. The repository consists of “packs”, which contain not just a workflow but also any number of attached objects that relate in some way to that workflow. David also described what he called the six ‘Rs’ of research object behaviour – that is, research objects must be replayable, repeatable, reproducible, reusable, repurposable, and reliable. He then added a seventh ‘R’: ‘referenceable’. The environment is currently in its second generation, where the key characteristic is re-use of an increasing pool of tools etc., and provenance analytics are coming into play. In the anticipated third generation, they are looking at the global re-use of tools, data, and methods in what they call ‘radical sharing’. myExperiment was summed up as an evolving social infrastructure for sharing.
Session one of the day covered two main areas of interest, the first of which was Research Data. Matthias Razum spoke about eSciDoc, a portal which sits on top of data and facilitates sharing and collaboration. In eSciDoc, the researcher creates an experiment, and as the researcher carries out the actual experiment, all data goes straight into eSciDoc. From there the researcher can perform the analysis, visualisation, etc. Mark Hedges then spoke about the gathering of data and made several interesting points, the first being that researcher practices vary, but there are commonalities. For example, many researchers keep their data on their desktop, process the data, output the data (in the form of a publication), then submit some data to a databank – and what is not submitted to a databank is lost. Mark mentioned the OPM, the Open Provenance Model, and discussed the Model’s concepts of node types and the relationships between nodes, called edges, which are stored using RELS-EXT. Finally, Julien Jomier from Kitware spoke about the work being done on collecting medical imaging and its related data in a system called MIDAS. He spoke a little about the structure of the system as well as some of the challenges in this area, such as massive datasets, the many different formats, and the lack of standardisation across the formats’ metadata schemas.
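As a rough illustration of the node and edge concepts Mark described, here is a minimal sketch of an OPM-style provenance graph in Python. The three node types and five edge names follow the Open Provenance Model; the `ProvenanceGraph` class, its methods, and the example identifiers are my own assumptions for illustration and are not taken from the talk. The sketch keeps everything in memory and does not attempt the RELS-EXT storage that was mentioned.

```python
from dataclasses import dataclass, field

# The three node types defined by the Open Provenance Model.
NODE_TYPES = {"artifact", "process", "agent"}

# OPM edge names, mapped to the (source, target) node types they connect.
EDGE_TYPES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory graph; not part of any OPM toolkit."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source, edge name, target)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, source, edge_name, target):
        # Reject edges that connect the wrong kinds of node.
        expected = EDGE_TYPES[edge_name]
        actual = (self.nodes[source], self.nodes[target])
        if actual != expected:
            raise ValueError(f"{edge_name} expects {expected}, got {actual}")
        self.edges.append((source, edge_name, target))

    def lineage(self, node_id):
        """Return every node reached by following edges back from node_id."""
        seen = []
        stack = [node_id]
        while stack:
            current = stack.pop()
            for source, _, target in self.edges:
                if source == current and target not in seen:
                    seen.append(target)
                    stack.append(target)
        return seen


# Example identifiers are made up for illustration.
graph = ProvenanceGraph()
graph.add_node("raw_data", "artifact")
graph.add_node("publication", "artifact")
graph.add_node("analysis", "process")
graph.add_node("researcher", "agent")
graph.add_edge("publication", "wasGeneratedBy", "analysis")
graph.add_edge("analysis", "used", "raw_data")
graph.add_edge("analysis", "wasControlledBy", "researcher")

publication_lineage = graph.lineage("publication")
```

Following the edges back from the publication reaches the analysis, the raw data, and the researcher, which is the kind of trail that is lost when data never leaves the desktop.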
Session two also covered two main areas, one of which was Administrative Systems. Sally Rumsey of Oxford spoke first on the blurring of the boundaries between the IR and the research information registry. During this presentation she spoke about the research activity data registry, which does not store the actual data, only the metadata. She also mentioned the entities that are used, including Person (and persona), Project, Funding Body, Organisational Unit, etc., along with the other controlled vocabularies that are used. Les Carr from Southampton then presented on Research Assessment and its impact on the openness agenda. He mentioned CRIS (Current Research Information Systems), which seem to be becoming quite widespread. Along with CRIS he also highlighted CERIF, a standard data schema for interoperating between systems. CERIF-ed IRs may contain many separate datasets, all linked by explicit relationships. He explained how, prior to adopting CERIF, their EPrints repository held funder information simply as added metadata, and how the EPrints repository now contains funders as their own objects – for example, a paper links to affiliated projects instead of just naming them. To finish up the session, Arnoud Jippes from the Netherlands spoke about NARCIS and research information services on a national scale. The Netherlands has a very unified approach – all universities use the same CRIS, called METIS. All IRs are harvested by NARCIS, as well as by the National Library. He also mentioned that all the repositories use the “DAI” – the Digital Author Identifier.
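To make the difference between naming a funder in metadata and linking to a funder object a little more concrete, here is a minimal sketch in Python. It is not CERIF itself and not the EPrints data model: the `Entity` and `Registry` classes, the role names `output_of` and `funded_by`, and the example records are all hypothetical, chosen only to show entities joined by explicit relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    kind: str  # e.g. "paper", "project", "funder"
    name: str


class Registry:
    """Hypothetical store of entities joined by explicit, named links."""

    def __init__(self):
        self.entities = {}  # entity id -> Entity
        self.links = []     # (source id, role, target id)

    def add(self, entity):
        self.entities[entity.id] = entity

    def link(self, source_id, role, target_id):
        # Both ends must already exist as objects in their own right.
        for entity_id in (source_id, target_id):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.links.append((source_id, role, target_id))

    def sources(self, role, target_id):
        """Entities that point at target_id with the given role."""
        return [self.entities[s] for s, r, t in self.links
                if r == role and t == target_id]

    def papers_funded_by(self, funder_id):
        # Walk funder -> projects -> papers via the explicit links.
        papers = []
        for project in self.sources("funded_by", funder_id):
            papers.extend(self.sources("output_of", project.id))
        return papers


# Example records are made up for illustration.
registry = Registry()
registry.add(Entity("f1", "funder", "Example Funding Body"))
registry.add(Entity("pr1", "project", "Example Project"))
registry.add(Entity("p1", "paper", "Paper A"))
registry.add(Entity("p2", "paper", "Paper B"))
registry.link("pr1", "funded_by", "f1")
registry.link("p1", "output_of", "pr1")
registry.link("p2", "output_of", "pr1")

funded_papers = [paper.name for paper in registry.papers_funded_by("f1")]
```

With the funder held only as a text field on each paper, answering "which papers did this funder support?" means matching strings; with linked objects it becomes a walk along the relationships.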
The last session of the day was on Repository Frameworks. Stephen Abrams spoke first on the work being done by the California Digital Library (CDL) and University of California San Diego Supercomputer Center. It is a collaboration between CDL and 10 campuses and several peer institutions. They are looking at how to manage and add value to a body of trusted digital content. Their approach is to build complexity through composition, not addition – in other words they are taking a very modular approach. They keep everything as small blocks which are easier to replace when they outlive their usefulness. The implementation project which Stephen focussed on is the Merritt project, which is aiming to consolidate 140TB of existing content. Tom Cramer from Stanford then spoke about the Hydra project, a collaboration with the Universities of Hull and Virginia and DuraSpace. He mentioned how IRs are well established, but there has been no framework for producing applications that interface between IRs and users. They have identified shared functions including deposit, manage, search, etc. Their approach has been “one body, many heads”, which is a component approach – including components such as Fedora, ActiveFedora, Solr, and Blacklight, along with a Hydra Plugin, Services, and a portal called Hydrangea. They have recognised that no single institution can resource the development of a full range of solutions on its own, so they have also taken a “one body, many heads” approach to the community, which shares the load. Finally, Alex Wade from Microsoft Research spoke on the tools that Microsoft Research has been developing to assist with dataset lifecycle management, including Pivot and Zentity.
Copyright Caroline Drury, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>
