CAIRSS

Two ANDS events and what they mean for CAIRSS

Copyright Peter Sefton, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

Over the last two weeks I attended a couple of ANDS-sponsored events. The first was a week-long Bootcamp held at ANU, designed to induct people from ANDS partner institutions hosting ANDS-funded projects into the mysteries of ANDS and to help us establish professional networks. The second was the ARDCPIP round table, which was part of the consultation process for the ANDS-funded Australian Research Data Commons Party Identifier Project. That project aims to provide a single ID for every researcher in Australia (with opt-out for those who don’t want one), and to help researchers match their IDs across other services such as Thomson Researcher ID or the forthcoming ORCID service.

My travel for both of these events was paid for by ANDS, but in my CAIRSS strategist role I tried to bring in the concerns of the repository community in Australia, and talk about some of the lessons we have learned.

Bootcamp

From a CAIRSS perspective these are the main points I brought home from the bootcamp.

The biggest issue, for me, was that organisations have to make sure they meet their own needs. That might sound like it’s stating the obvious, but Simon Porter of The University of Melbourne pointed out at both meetings that building something to ANDS specifications (such as a metadata store for research data) without looking at broad value to the organisation is bound to fail: once the ANDS funding dries up, why would you maintain that ANDS-oriented service?

I’ve been making the same point, with a slightly different emphasis. I think that unless we develop models for managing research data, and feeding descriptions out to the commons, that are useful to all the internal stakeholders, then we risk what happened in many early institutional repository developments: they grew up in libraries with an Open Access agenda but failed to integrate with offices of research. There are some exceptions, but broadly speaking, in Australia repository managers report to us that integration with the research office’s systems could be better. ANDS projects feeding metadata to the Australian Research Data Commons risk repeating this unless we stop and think about what metadata will create lasting value.

Sure, open access to data is A Good Thing, but we all know in CAIRSS that when the Australian Government cooks up a new reporting scheme, that’s what you’re going to be eating, no matter how tasty or nutritious the Open Access fare. So, when we go into these deals with ANDS designed to seed the data commons, let’s make sure that we have enough management metadata about research data to be able to comply with the next RQF/ERA style repository exercise, and make sure we can prove to funding bodies like ARC and NHMRC that there are data management plans in place and we actually know where our data collections reside.

Models for research data management abound. There are many variants in the way organisations have responded to the research data management challenge, ranging from the almost indiscernible in some institutions, to various configurations of collaboration between offices of research, IT, library and records management. The group we had at Bootcamp wasn’t representative of all Australian universities, but one of the stand-out stories for me was the way Griffith have configured their services for data management and eResearch. Griffith bootcamper Heidi Perrett was mobbed by others desperate to get their hands on Griffith materials such as the online Data Management Plan Tutorial, a kind of web-based wizard for generating data management plans. Just to give you an idea of the Griffith commitment in this area, Heidi comes from a Research Data Management Services team which lists 15 members in the uni phone book.

Griffith are part of a consortium with the University of Melbourne and QUT working on a metadata stores project which is developing in parallel to the one my team is building for ANDS. In all three organisations the model for storing metadata about research data is to have a central ‘hub’ with contributions coming from many different systems and services. The hub is going to be a semantic-web application, meaning it looks at metadata purely in terms of statements about resources; the resources themselves will live elsewhere.
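To make ‘statements about resources’ a little more concrete, here is a minimal Python sketch of a hub that holds nothing but subject/predicate/object statements. It is an illustration only, not the design of either metadata stores project: the class name MetadataHub, the predicate names and every example.edu URI are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A statement is a (subject, predicate, object) triple.
# All URIs and predicate names below are hypothetical.
Triple = Tuple[str, str, str]


class MetadataHub:
    """Stores statements about resources, never the resources themselves."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, List[Triple]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one statement contributed by some source system."""
        self._by_subject[subject].append((subject, predicate, obj))

    def describe(self, subject: str) -> List[Triple]:
        """Return every statement the hub holds about one resource."""
        return list(self._by_subject.get(subject, []))


hub = MetadataHub()
collection = "http://example.edu/collection/42"

# Statements that might arrive from different systems and services.
hub.add(collection, "title", "Reef survey 2009")
hub.add(collection, "storedAt", "http://storage.example.edu/reef-2009")
hub.add(collection, "managedBy", "http://example.edu/party/john-smith")

for statement in hub.describe(collection):
    print(statement)
```

The data set itself never enters the hub; the ‘storedAt’ statement only points at wherever it lives.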

I was able to talk a little about the Newcastle model, which was developed by Vicki Picasso and team. It has a central metadata store managed by the library, which will watch an institutional data storage service and respond to what Vicki dubbed ‘institutional triggers’, including events in the grants management system run by the research office. Interestingly, there is a tie-in at Newcastle with the records management office.

Caroline Drury, the CAIRSS/ANDS liaison officer (who is employed by USQ using ANDS money), is available to consult with CAIRSS stakeholders about institutional models. As a rule of thumb, I’d say the larger the institution, the more likely it is that a central metadata store will have a ‘registry’ focus, with stuff being harvested from many sources, while in smaller institutions there is more chance that a more repository-like metadata store might be appropriate. Contact CAIRSS if you would like to get in touch with Caroline.

Community is important. We know this from the ARROW / APSR / RUBRIC days: networks formed through those projects have persisted and form the basis for new collaboration.

To bootstrap networking, the bootcamp started with an organised ‘speed dating’ session where we stood up in two lines, and did a kind of getting-to-know-each-other bush-dance1, where we’d talk for two minutes, then move on. I think this was pretty effective at giving us an idea of the breadth of the group, but meeting twenty-ish other people all in one go is a bit overwhelming. It might have been better to spread the process out, or at least encourage us to take notes. I forgot who I was supposed to get back to about DSpace, along with about half the names, and so on.

On the final day we talked about how to maintain the community. I think the experience from services like CAIRSS is pretty clear: a mailing list (what librarians like to call an eList) is the best vehicle for ongoing conversation in diffuse groups; it’s very hard to get the critical mass you need to keep a discussion forum going. Lists need careful management though. On the CAIRSS list Kate Watson encourages us to look for a balance between posts from CAIRSS-central and group participation; we try for a sense of community where people are able to participate and not feel shy about asking questions2.

Blogs, wikis and document sharing sites are useful, but most of the group will use the list as the notification service for all of those. I asked for a blog-site for ANDS projects after discussion with my collaborators at USQ and Newcastle, as I think individual project blogs will lack the critical mass to be really useful. But maybe what we really need is an ANDS aggregator, which could be as simple as a Delicious feed.

At least one regular meetup session has been sparked by the networking at bootcamp: Gabrielle Gardner from UTS has started a Sydney chapter of what she calls the ‘surviving RIF-CS support group’3.

The semantic web is coming. OK, so maybe it’s coming on a rather slow boat. But some of the basic semantic web principles of linked data are becoming really important; ANDS obviously thinks so, as they brought in Anne Cregan to talk about the semantic web. A few technically oriented people I spoke to saw her talk as one of the highlights; it served to introduce them to some of the basic concepts of this thing they’d heard about but not understood. Me, I would have liked to see the theoretical stuff Anne talked about related more closely to the metadata-about-data that ANDS is encouraging us to share. I tried to cover some of what this linked data approach to metadata repositories/registries would look like in my talk at the second ANDS event. Which brings us to the ARDCPIP4.

ARDCPIP

The Australian Research Data Commons will be exposed to the world using Research Data Australia. RDA has an information model with 4 classes of thing:

Records by Classification:

  • Collections (1123)

    Where a collection is a useful grouping of physical or digital items.

  • Parties (329)

    Where a party is a person or organisation that has some relationship to a collection, service, activity, or party.

  • Services (2)

    Where a service is a mechanism for gaining some kind of access to or information about a collection (or items within a collection).

  • Activities (2)

    Where an activity is an undertaking or process related to the creation, update, or maintenance of a collection.

http://services.ands.org.au/home/orca/rda/ (as at 2010-06-22)

Obviously the Party infrastructure is about managing the second class of thing on that list: Parties. You can read all about the NLA’s project at the project site. The NLA team, Basil Dewhurst and Natasha Simons, presented an overview of the service, and some of the tools they are planning to make available to institutions to manage name identities.

My input to the round table was very much in line with CAIRSS concerns. I posted the notes for my talk over on my blog. Simon Porter covered many of the institutional issues, and Rebecca Parker gave a summary of the NicNames project (ppt) and the complexities of names and identities. This is important stuff for those of us implementing services.

Data quality is far from assured in the Research Data Commons, particularly with the current enthusiasm at ANDS for getting stuff up and visible ASAP. The problem with this is that the Collections/Parties/Services/Activities model is very much dependent on having relations between the classes of thing in the model. But if the things are imprecisely identified then the whole model is compromised. In a typical IR here in Australia, parties are very imprecisely identified by strings, meaning that if you tried to enter collection descriptions into the repository and then push them through to ANDS, the RDA model would not be able to distinguish between John and Joan Smith when they both publish as Smith, J, nor correctly bundle together the citations that belong to each of those two or more individuals.
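The Smith, J problem is easy to show in a few lines of Python. This is only a sketch with made-up records and made-up example.edu URIs, not real repository data: grouping by a name string merges two people into one, while grouping by an identifier keeps them apart.

```python
from collections import defaultdict

# Hypothetical records where the creator is only a display string.
string_records = [
    {"title": "Coral bleaching dataset", "creator": "Smith, J"},
    {"title": "Dialect recordings", "creator": "Smith, J"},
]

# The same records, with an invented HTTP URI identifying each party.
uri_records = [
    {"title": "Coral bleaching dataset",
     "creator": "http://example.edu/party/john-smith"},
    {"title": "Dialect recordings",
     "creator": "http://example.edu/party/joan-smith"},
]


def group_by_creator(records):
    """Group record titles under whatever value identifies the creator."""
    groups = defaultdict(list)
    for record in records:
        groups[record["creator"]].append(record["title"])
    return dict(groups)


by_string = group_by_creator(string_records)
by_uri = group_by_creator(uri_records)

print(len(by_string), "party when identified by string")
print(len(by_uri), "parties when identified by URI")
```

With strings, John’s and Joan’s collections end up under a single ‘party’; with URIs each collection stays attached to the right person.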

Against this background I talked about a project we are working on at ADFI at USQ to build a local service which can be used to provide name authority linking services: building software for Linked Authority-Control. I talked about how we are planning to work with Vicki Picasso’s team at the University of Newcastle to sort out all the names in their VITAL repository so that all the Newcastle parties have a proper ID, in the form of an HTTP URI. This URI can be fed both to the People Australia service, which is part of the Party Infrastructure, and to ANDS, so that the two can be matched up in Research Data Australia.

One of the suggestions I made to ANDS at this meeting was that if people are able to develop interfaces with the NLA’s party services, so parties defined locally flow through to People Australia, then they should not have to feed the same data to ARDC; the ARDC system could quite easily do a look-up at People Australia for name IDs that it doesn’t recognise.
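As a rough sketch of that look-up idea, assuming nothing about how either real system works: a registry that, when it meets a party ID it doesn’t recognise, asks a party service and remembers the answer. The PARTY_SERVICE dictionary stands in for a service like People Australia, and the class, function and URI names are all hypothetical.

```python
# Stand-in for a party service; a real one would be queried over the
# network. The contents and the URI are invented for this example.
PARTY_SERVICE = {
    "http://example.edu/party/joan-smith": {"name": "Joan Smith"},
}


def lookup_party_service(party_id):
    """Return the party description, or None if the service has no match."""
    return PARTY_SERVICE.get(party_id)


class CommonsRegistry:
    """Resolves party IDs, falling back to a look-up for unknown ones."""

    def __init__(self, lookup):
        self._known = {}
        self._lookup = lookup

    def resolve(self, party_id):
        if party_id not in self._known:
            found = self._lookup(party_id)
            if found is None:
                return None  # nobody knows this ID
            self._known[party_id] = found  # remember for next time
        return self._known[party_id]


registry = CommonsRegistry(lookup_party_service)
print(registry.resolve("http://example.edu/party/joan-smith"))
print(registry.resolve("http://example.edu/party/unknown"))
```

The point of the design is that the institution only has to describe the party once, at the party service; the commons pulls what it needs on demand.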

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project.


1 Sans largerphones

2 The ANDS-general list has maybe not got this balance right yet. It was formed as a vehicle for announcements (which would usually be called ANDS-announce or similar), and when people misunderstood and started to use it for discussion we were told it was for ANDS-to-many communication, not many-to-many conversation. It seems to be turning into more of a discussion now, but I’m wary of posting there, having been blocked once.

3 I’d say there’s no room for complacency at this stage, as there is not enough evidence that RIF-CS sufferers have a good long-term prognosis.

4 Nobody except Rebecca Parker from Swinburne even tried to say Ard-K-pip; it’s an acronym that reminds me of trying to pronounce the periodic table, which is how I memorised it at high school. H! He! LiBeBCNOFNe, NaMgAlSiPSClArKCa.
