CAIRSS

Open Repositories 2010 – Update – McCallum

Today we reached the point in the Open Repositories 2010 conference where the general sessions ceased and the user group sessions began. The user group sessions that I attended were very practical and slightly more technical than the general sessions. One talk included an excellent live presentation using DSpace where the audience were able to log in and participate.

Statistics

I have heard some good discussions back home regarding the relevance and accuracy of repository statistics, and have personally experienced persistence issues where statistics have disappeared during the administration and upgrading of software. I decided to attend the talk named "DSpace 1.6 usage statistics, what can it do for you?" (Ben Bosman) in the hope of discovering some new facts about statistics which I could share.

It was noted during the talk that 36% of DSpace users requested features relating to statistics. The latest version of DSpace (1.6.x) now includes a sophisticated statistics package, which was a contribution by @MIRE. It is worth noting that the logging and collection of the statistics ships with DSpace for free; however, the statistics visualisation package, which filters and displays the statistics in a browser, can be purchased from @MIRE. It is also possible to create your own visualisation software by querying the stored statistics directly.

DSpace can trigger a statistics event in four ways: a community home page visit, a collection home page visit, an item visit or a bitstream download. When an event is triggered, the information that is stored includes (but is not limited to) the IP address of the visitor, the referring web site, the longitude and latitude of the visitor, the continent code (e.g. Asia) and the country code (e.g. Japan).
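To make the shape of such an event concrete, here is a minimal sketch in Python. It is not the DSpace implementation: the class names, field names and the two-letter codes are all my own assumptions, chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    """The four triggers mentioned in the talk (names are hypothetical)."""
    COMMUNITY_VIEW = "community"
    COLLECTION_VIEW = "collection"
    ITEM_VIEW = "item"
    BITSTREAM_DOWNLOAD = "bitstream"


@dataclass(frozen=True)
class UsageEvent:
    """One stored statistics event; field names are illustrative only."""
    event_type: EventType
    ip: str
    referrer: Optional[str]  # None when the visitor arrived directly
    latitude: float
    longitude: float
    continent_code: str      # assumed two-letter form, e.g. "AS" for Asia
    country_code: str        # assumed two-letter form, e.g. "JP" for Japan


# A made-up download event from a visitor located in Japan.
event = UsageEvent(
    event_type=EventType.BITSTREAM_DOWNLOAD,
    ip="192.0.2.10",  # address from the reserved documentation range
    referrer="http://example.org/search",
    latitude=35.68,
    longitude=139.69,
    continent_code="AS",
    country_code="JP",
)
print(event.event_type.value, event.country_code)
```

Whatever the real storage format looks like, keeping the event type alongside the location fields is what makes it possible to ask questions such as "downloads per country" later on.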

These events are stored in a raw format using XML. The information is then indexed using Solr, which makes indexing and retrieval fast and efficient. This architecture allows the Solr component to run on a separate machine, saving CPU and RAM on the repository server and allowing the Solr machine to be load balanced if required.
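Because the index sits behind Solr's HTTP interface, a home-grown report can start as nothing more than a query URL. The sketch below builds a faceted query that would count events per country. The host, port, core name and the field names `type` and `countryCode` are assumptions on my part and would need to be checked against the schema of the actual installation; only the standard Solr parameters (`q`, `rows`, `facet`, `facet.field`, `wt`) are taken as given.

```python
from urllib.parse import urlencode

# Assumed location of a separate Solr machine; purely illustrative.
SOLR_BASE = "http://solr.example.org:8983/solr/statistics"


def build_country_facet_url(base: str, event_type: str) -> str:
    """Build a Solr select URL that facets matching events by country.

    The field names "type" and "countryCode" are hypothetical.
    """
    params = {
        "q": f"type:{event_type}",     # restrict to one kind of event
        "rows": 0,                     # only the facet counts are wanted
        "facet": "true",
        "facet.field": "countryCode",  # one count per country
        "wt": "json",                  # ask for a JSON response
    }
    return f"{base}/select?{urlencode(params)}"


url = build_country_facet_url(SOLR_BASE, "bitstream")
print(url)
```

The function only builds the URL and does not contact a server, so it can be tried without a running Solr instance. Fetching the URL and drawing a chart from the returned counts would be the next step.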

When we think of statistics, the first things that come to mind are traffic, hits and popularity. However, statistics based on metadata and bitstreams allow us to measure the growth of the repository over time and to analyse the repository based on information about the objects and their associated bitstreams. The collection of statistics in this system is highly configurable.
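As a rough illustration of the "growth over time" idea, the sketch below turns a list of deposit dates into a running total per month. The function name and the sample dates are made up; in practice the dates would come from the repository's own metadata.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_growth(deposit_dates):
    """Return [(YYYY-MM, running total of items)] sorted by month."""
    per_month = Counter(d.strftime("%Y-%m") for d in deposit_dates)
    months = sorted(per_month)  # "YYYY-MM" strings sort chronologically
    totals = accumulate(per_month[m] for m in months)
    return list(zip(months, totals))


# Made-up sample data: four items deposited over three months.
deposits = [
    date(2010, 1, 5),
    date(2010, 1, 20),
    date(2010, 3, 2),
    date(2010, 2, 11),
]
growth = cumulative_growth(deposits)
print(growth)  # [('2010-01', 2), ('2010-02', 3), ('2010-03', 4)]
```

Months with no deposits are simply absent from the result; a chart that needs an unbroken time axis would have to fill those gaps in.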

Having rich data in any form is a good thing, I guess. Even if the traffic and popularity of single objects is not your goal, this type of granular statistics collection, on any platform, may add value one day when justifying the existence, history or importance of your repository.

Fedora

I gathered some notes on Fedora, as it is widely used in Australian repositories. Below is a guide/roadmap of Fedora for the near future.

Fedora 3.3 was released on December the 18th. New features include the ability to ingest and provide local content using the file:// address. We were advised that the RESTful API in this version is no longer experimental and is now ready for use. RELS-INT relationships are also supported.

Fedora 3.4 is coming soon, with no date set at this point. The release candidate is out now, and users are able to assist in the testing of this release. This version includes size attribute improvements, scaling of logging without restarting and 25 bug fixes.

Fedora 3.5 will pave the way for 4.0 and has the following features pencilled in: a RESTful relationship API (current versions use SOAP to manage relationships), as well as the use of Spring and OSGi, enabling the community to contribute code more efficiently. Improvements to the Fedora command line interface were also mentioned.

Copyright Tim McCallum, 2010. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. < http://creativecommons.org/licenses/by-sa/2.5/au/ >

This post was written in OpenOffice.org, using templates and tools provided by the Integrated Content Environment project and published to WordPress using The Fascinator.
