<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CAIRSS &#187; Open Repositories 09</title>
	<atom:link href="http://cairss.caul.edu.au/blog/category/open-repositories-09/feed/" rel="self" type="application/rss+xml" />
	<link>http://cairss.caul.edu.au/blog</link>
	<description>The primary function of CAIRSS is to offer support for Repository Managers in the higher education sector in Australia.</description>
	<lastBuildDate>Wed, 21 Dec 2011 01:55:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Open Repositories Conference 09 Part 2</title>
		<link>http://cairss.caul.edu.au/blog/2009/06/10/open-repositories-conference-09-part-2/</link>
		<comments>http://cairss.caul.edu.au/blog/2009/06/10/open-repositories-conference-09-part-2/#comments</comments>
		<pubDate>Wed, 10 Jun 2009 03:38:13 +0000</pubDate>
		<dc:creator>tpmccallum</dc:creator>
				<category><![CDATA[Open Repositories 09]]></category>
		<category><![CDATA[The Fascinator]]></category>

		<guid isPermaLink="false">http://caulcairss.wordpress.com/2009/06/10/open-repositories-conference-09-part-2/</guid>
		<description><![CDATA[Open Repositories Conference 09 Part 2]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id1">Performance</a></li>
<li><a href="#id2">The Fascinator</a></li>
<li><a href="#id3">Normalization</a></li>
</ul>
</div>
<div>
<h1><a id="id1" name="id1"><!--id1--></a>Performance</h1>
<p>I learned of a multi-disciplinary search engine for academically relevant web resources called <a href="http://www.base-search.net/">BASE</a> during some recreation/mingling time. I spoke to a couple of developers about two important issues relating to repository systems/solutions. The first issue was performance. We discussed how BASE uses a search technology called <a href="http://www.microsoft.com/enterprisesearch/en/us/fast-customer.aspx">FAST</a>. I believe that Microsoft acquired <span class="spCh spChx201c">&#x201C;</span>FAST Search &amp; Transfer<span class="spCh spChx201d">&#x201D;</span> in 2008; the product is now known as <span class="spCh spChx201c">&#x201C;</span><a href="http://www.microsoft.com/enterprisesearch/en/us/fast.aspx">FAST ESP from Microsoft</a><span class="spCh spChx201d">&#x201D;</span>. BASE currently holds over 20 million records from <a href="http://base.ub.uni-bielefeld.de/en/about_sources_map.php?menu=2&amp;submenu=1">1265 sources</a> around the globe and is contributing to the <span class="spCh spChx201c">&#x201C;</span>Digital Repository Infrastructure Vision for European Research<span class="spCh spChx201d">&#x201D;</span> (DRIVER).</p>
<p>The average search that I did on generic topics produced about 150,000 hits out of 20,084,184 items, all in fractions of a second. Very impressive performance. I have never tested with item counts this large, and certainly have not seen results like this with even a fraction of the content. It appears that BASE holds metadata, full text and precise bibliographic data, and uses OAI-PMH for harvesting. I searched for quite a while to get a data stream such as a PDF served via the BASE URL but was redirected every time. I am therefore assuming that there are no data streams stored locally (metadata only). Please correct me if I am wrong about this.</p>
<h1><a id="id2" name="id2"><!--id2--></a>The Fascinator</h1>
<p>I do not wish to make any performance comparisons at all with BASE, as <a href="http://fascinator.usq.edu.au/">The Fascinator</a> has only been tested with a tiny number of records compared to BASE. The interesting point that I would like to raise is that The Fascinator can not only harvest and provide metadata, but can also harvest and store data stream content locally. It can be configured to do this in two ways. The first way is to have it engage directly with Fedora and harvest metadata as well as data streams using Fedora&#8217;s APIs. The second way is to configure The Fascinator to harvest using OAI-ORE; if there are references to data streams in the resource maps, they will be downloaded and stored locally along with the metadata it was configured to harvest at the time. The <a href="http://www.usq.edu.au/">University of Southern Queensland</a>, in conjunction with the <a href="http://cairss.caul.edu.au/www/">CAIRSS project</a>, is getting ready to carry out a nationwide harvest called the <span class="spCh spChx201c">&#x201C;</span>Australian University Repository Census<span class="spCh spChx201d">&#x201D;</span> (<a href="http://aust-repos-census.usq.edu.au/the-fascinator/">AURC</a>). This harvest will be carried out using The Fascinator software.</p>
<h1><a id="id3" name="id3"><!--id3--></a>Normalization</h1>
<p>As I mentioned above, there was another important issue that was brought up in our casual conversation: normalization. It appears that this is a problem for everyone in the repository space and in harvesting projects. Before the conference I was throwing a <span class="spCh spChx201c">&#x201C;</span>developer challenge<span class="spCh spChx201d">&#x201D;</span> idea around in my head about creating an application, well, more of a web service really, that would harvest a repository&#8217;s metadata and then display it in a web browser, pointing out obvious mistakes first, followed by suggestions for normalization (all the while linking back to the item, so that the user could organize the editing of that item). I talked to Oliver Lucido briefly (I could not discuss it with Peter as he was a judge for the challenge). We came to the conclusion that this is pretty much what we are doing with <a href="http://cairss.caul.edu.au/www/aust-repos-census/aust-repos-census.htm">AURC</a> using The Fascinator. This being my first conference, I was unsure about how much conference content I would miss out on by trying to code something up for 2 out of the 4 days&#8230; so that idea kind of died.</p>
<p>Now that I am back, I am revisiting that idea and wondering if it is possible to put together some pieces that exist already and combine them with some software (plagiarism detection style) in the hope of creating a web service that is capable of pointing out problems with normalization on an institution-by-institution basis, giving suggestions for conforming with other institutions and/or repairing internal normalization issues. I think ultimately the best solution would be for each individual institution to be able to see and repair normalization issues in house.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://cairss.caul.edu.au/blog/2009/06/10/open-repositories-conference-09-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Open Repositories Conference 09</title>
		<link>http://cairss.caul.edu.au/blog/2009/05/28/open-repositories-conference-09/</link>
		<comments>http://cairss.caul.edu.au/blog/2009/05/28/open-repositories-conference-09/#comments</comments>
		<pubDate>Thu, 28 May 2009 03:16:52 +0000</pubDate>
		<dc:creator>caulcairss</dc:creator>
				<category><![CDATA[DSpace]]></category>
		<category><![CDATA[Fedora]]></category>
		<category><![CDATA[Microsoft Research]]></category>
		<category><![CDATA[Open Repositories 09]]></category>
		<category><![CDATA[Squire]]></category>
		<category><![CDATA[The Fascinator]]></category>

		<guid isPermaLink="false">http://caulcairss.wordpress.com/2009/05/28/open-repositories-conference-09/</guid>
		<description><![CDATA[Open Repositories Conference 09]]></description>
			<content:encoded><![CDATA[<div>
<div class="page-toc">
<ul>
<li><a href="#id1">General Overview</a></li>
<li><a href="#id2">Participants and Sponsors</a></li>
<li><a href="#id3">Microsoft Research</a></li>
<li><a href="#id4">DSpace</a></li>
<li><a href="#id5">DuraSpace Organization</a></li>
<li><a href="#id6">@MIRE</a></li>
<li><a href="#id7">Fedora</a></li>
<li><a href="#id8">Poster Presentations</a></li>
<li><a href="#id9">Squire</a></li>
<li><a href="#id10">The Fascinator</a></li>
<li><a href="#id11">Photographs</a></li>
<li><a href="#id12">Wrap up</a></li>
</ul>
</div>
<div>
<h1><a id="id1" name="id1"><!--id1--></a>General Overview</h1>
<p>This year&#8217;s Open Repositories Conference was held in Atlanta, Georgia, USA. This year marked the 4<sup>th</sup> year for this annual international conference. The conference was held at the Georgia Institute of Technology Hotel and Conference Center.</p>
<h1><a id="id2" name="id2"><!--id2--></a>Participants and Sponsors</h1>
<p>The conference had representatives from organizations including DSpace, EPrints, Fedora, VTLS, JISC, Microsoft Research, Sun Microsystems, @MIRE and NSF (National Science Foundation).</p>
<h1><a id="id3" name="id3"><!--id3--></a>Microsoft Research</h1>
<p>It was made quite clear to me throughout the conference that Microsoft Research were looking at carrying out research and development and were not concerned with directly profiting from their involvement. There were several open discussions during workshops about how they would best create plug-in functionality for their products that would enable their users to interact with repositories. There was a lot of constructive conversation around how SWORD would be integrated with new Microsoft Research products/plug-ins. There were good discussions about whether the processing and converting of documents and metadata should be done on the client, as a web service, or handled directly by the repository software. The main challenge that I could see with doing this is deciding how much freedom to give to the user. Do they simply click a button upon completion of their work, or does the software allow them to interact at quite a low level with regard to metadata and file types, allowing them to review their work in the different formats before the final submission? I am assuming that if a researcher has spent several years writing and researching, they would have a substantial amount of time to put the final touches on the master document to make sure that it rendered correctly in HTML and PDF. It would be amazing if we could write software that would handle everything behind the scenes; perhaps eventually we will arrive at this point.</p>
<h1><a id="id4" name="id4"><!--id4--></a>DSpace</h1>
<p>Where is DSpace heading? Version 2.0 can be expected in early 2010.</p>
<p>In the meantime, 1.6 will be released as a stepping stone to 2.0 and will include bug fixes (due October 2009).</p>
<p>I ran into Kim Shepherd from the Library Consortium of New Zealand on my way out of Atlanta. Kim is a DSpace committer, and we had a good conversation about DSpace 2.0 amongst other things. I will be sure to keep in touch and keep an eye on future development.</p>
<h1><a id="id5" name="id5"><!--id5--></a>DuraSpace Organization</h1>
<p>DuraSpace is the organization formed by the DSpace Foundation and Fedora Commons joining forces. The first technology to emerge from DuraSpace will be a product called DuraCloud. DuraCloud consists of a complete hosting service using DuraSpace partners (commercial cloud providers). While DuraSpace is offering a cloud computing solution as a service, it is possible to download the code and create a cloud computing solution inside your own institution.</p>
<p>Components used by DuraSpace are Akubra (a pluggable file storage interface), Mulgara (a semantic store), and DuraCloud.</p>
<p>DuraSpace expects that more components will be considered for use as they are discovered.</p>
<h1><a id="id6" name="id6"><!--id6--></a>@MIRE</h1>
<p>I took a bit of time to talk to Bram Luyten from <a href="http://www.atmire.com/">@MIRE</a>. From what I understand, @MIRE is a commercial company that works very closely with the developers of DSpace, and their staff include DSpace committers. @MIRE provide services including preparing and implementing repository solutions, technical assistance, bug fixes, customizations and a support service for the DSpace product.</p>
<p>As I understand it, DSpace ships with a BSD license and is therefore very open to this sort of interaction and collaboration with a commercial company. To me this seems to be a fairly good approach to a repository solution, as it allows the flexibility of using an open source product with the option to request immediate assistance and support, at a price, should you need it.</p>
<h1><a id="id7" name="id7"><!--id7--></a>Fedora</h1>
<p>The Fedora project wants to shift to using <a href="http://www.fedora-commons.org/confluence/display/AKUBRA/Akubra+Project">Akubra</a> to replace the old Fedora storage interface. The Akubra API is not turned on by default in <a href="http://expertvoices.nsdl.org/hatcheck/2009/05/11/fedora-32-now-available/">Fedora 3.2</a>; it is hoped that developers will take an interest in it over time. This will allow the new technology to be tested and implemented gradually.</p>
<p>An interesting feature of Fedora 3.2 that was mentioned is that you are now able to run multiple Fedora instances within one Tomcat instance. This is a topic that I have heard raised a few times over the last couple of years.</p>
<h1><a id="id8" name="id8"><!--id8--></a>Poster Presentations</h1>
<h1><a id="id9" name="id9"><!--id9--></a>Squire</h1>
<p>The poster sessions included Squire. Squire, as you probably already know, is the Java version of the VTLS product VALET. It was developed with ARROW funding. It appears that VTLS has recently taken an interest in this product and it is possible that they will develop it further. Whether or not it remains open source remains to be seen.</p>
<p><span style="display:block;"><a><img class="fr1" style="border:0;vertical-align:top;" src="http://caulcairss.files.wordpress.com/2009/05/6894c119.png" alt="graphics1" height="842" /></a></span></p>
<h1><a id="id10" name="id10"><!--id10--></a>The Fascinator</h1>
<p>This poster was presented by Peter Sefton. The Fascinator is an Apache <a href="http://lucene.apache.org/solr/">Solr</a> front end to the <a href="http://www.fedora-commons.org/">Fedora Commons</a> repository; I am again guessing that most of you already know that. You can find out more about <a href="http://fascinator.usq.edu.au/">The Fascinator here</a>.</p>
<p><span style="display:block;"><a><img class="fr1" style="border:0;vertical-align:top;" src="http://caulcairss.files.wordpress.com/2009/05/71d4a4c6_552x782.jpg" alt="graphics2" height="782" /></a></span></p>
<p>You can find a full list of the Open Repositories <a href="https://or09.library.gatech.edu/poster.php">Poster Sessions here</a>.</p>
<h1><a id="id11" name="id11"><!--id11--></a>Photographs</h1>
<p>The Open Repositories organisers have provided a <a href="https://or09.library.gatech.edu/photos.php">Flickr slide show</a> of the entire conference. You will see Peter and me in the <span class="spCh spChx201c">“</span>Minute Madness Poster Presentations<span class="spCh spChx201d">”</span> as well as the two of us discussing the finer points of our posters in the ballroom.</p>
<h1><a id="id12" name="id12"><!--id12--></a>Wrap up</h1>
<p>I found that I got just as much information out of talking to people casually as I did during the formal presentations. I met so many people that I have a big job of going through my notes and contacting them all.</p>
<p>In my opinion there was a definite trend towards having distributed systems rather than a single repository. There were even discussions about repository performance and how running just the database components on separate servers produced a marked increase in repository performance. I was surprised at how many people are using open source products and building their own applications over the top. Very few used fully proprietary solutions. One of the many examples of this would be a Ruby on Rails application that incorporated Fedora using JRuby and, of course, one of the most impressive, our very own <span class="spCh spChx201c">“</span>The Fascinator<span class="spCh spChx201d">”</span>, complete with multi-portal creation, a harvesting framework, Solr indexing and a security model, as well as installers for Linux, Mac OS and Windows. Oliver Lucido has also recently created a screencast of the new <a href="http://www.screencast.com/users/lucido/folders/Jing/media/649c76b1-4205-4d83-8261-d7189438d48b">desktop feature</a>. Peter&#8217;s presentation went down really well; he got quite a few laughs with some witty humor. Overall I had a great time and can&#8217;t wait until next time.</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://cairss.caul.edu.au/blog/2009/05/28/open-repositories-conference-09/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

