Can we further subdivide nuances of open access?
July 15, 2009
The Public Knowledge Project 2009 conference ultimately made me re-think the way that open access (OA) is defined and subdivided.
The current subdivision is dichotomous. Open access is subdivided into the gratis and the libre models as described by Peter Suber in his Open Access Newsletter, where gratis OA refers to access without price barriers alone, while libre OA involves the removal of price and at least some permission barriers. I perceive this to be a hierarchy of use, where gratis OA is less usable as permissions for further use of these items are not clear.
The concept of the hierarchy was echoed in a workshop that I attended at the PKP Scholarly Publishing Conference on Lemon8-XML (L8X). One of the speakers, MJ Suhonos, underscored that not all document dissemination formats are created equal. If one compares an XML-encoded article to the same article available in PDF, one sees that the XML-encoded article enables enhanced access to the content. The strength is in the modularity of the XML, which enables the content to be labeled and described explicitly in a standardized way. The usefulness of XML can be illustrated using the example of citations. In a PDF, the citations sit lumped with the rest of the document and cannot be reliably harvested or parsed as discrete citations because, to a machine, they appear identical to the text of the article. In XML, the citations are denoted as citations and hence can be parsed and analyzed as such.
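To make the citation example concrete, here is a minimal sketch in Python of what "parsing citations as citations" can look like. The element names (ref-list, ref, citation and so on) and the sample data are my own assumptions for illustration; they are loosely modelled on journal article markup and are not taken from Lemon8-XML output.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment. Element names and contents are
# illustrative assumptions, not the schema that Lemon8-XML produces.
ARTICLE_XML = """
<article>
  <body>
    <p>Body text that mentions earlier work (Author 2007; Writer 2008).</p>
  </body>
  <ref-list>
    <ref id="r1">
      <citation>
        <surname>Author</surname>
        <year>2007</year>
        <article-title>First sample title</article-title>
        <source>Sample Journal A</source>
      </citation>
    </ref>
    <ref id="r2">
      <citation>
        <surname>Writer</surname>
        <year>2008</year>
        <article-title>Second sample title</article-title>
        <source>Sample Journal B</source>
      </citation>
    </ref>
  </ref-list>
</article>
"""


def extract_citations(xml_text):
    """Return one dictionary per <ref> element found in the article."""
    root = ET.fromstring(xml_text)
    citations = []
    for ref in root.iter("ref"):
        citation = ref.find("citation")
        if citation is None:
            continue  # skip references that carry no structured citation
        citations.append({
            "id": ref.get("id"),
            "surname": citation.findtext("surname"),
            "year": citation.findtext("year"),
            "title": citation.findtext("article-title"),
            "source": citation.findtext("source"),
        })
    return citations


for entry in extract_citations(ARTICLE_XML):
    print(entry["id"], entry["surname"], entry["year"], entry["source"])
```

Because each citation is explicitly labeled, the body text never has to be guessed at. The same few lines would work on any article that follows the same (assumed) markup, which is exactly what a PDF cannot offer.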
One can’t help but imagine a world where every document has semantically encoded citations! We would not need to rely on ISI and Scopus anymore (or pay the Crossref fees). Everyone would have equal access to citation harvesting and analysis. (Two years ago, a Scopus vendor told me their indexing rejection rate was approximately 80%…talk about an elite society!) XML markup could enable global barrier-free citation analysis, where elite membership would no longer be necessary.
In this same L8X session, Juan Pablo Alperin discussed other benefits of XML markup besides the infinite possibilities of enhanced bibliometric analysis. He asked us to imagine the benefits of discovering collaboration networks, where enhanced author markup, for example, would enable us to see which institutions collaborate with one another. Enhanced document discovery would also be a benefit, where the availability of complete metadata means that we can find related works in many ways, such as by the same author, on the same subject, in the same journal, or from the same publisher.
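As a rough illustration of the collaboration network idea, the sketch below counts how often pairs of institutions appear together on the same article. The corpus, the contrib and aff element names, and the institution names are all hypothetical; the point is only that affiliation markup makes this kind of counting trivial.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical corpus with author affiliations marked up explicitly.
# Element names and institutions are assumptions for illustration only.
CORPUS_XML = """
<corpus>
  <article id="a1">
    <contrib><name>Author A</name><aff>University X</aff></contrib>
    <contrib><name>Author B</name><aff>University Y</aff></contrib>
  </article>
  <article id="a2">
    <contrib><name>Author C</name><aff>University X</aff></contrib>
    <contrib><name>Author D</name><aff>University Y</aff></contrib>
    <contrib><name>Author E</name><aff>University Z</aff></contrib>
  </article>
  <article id="a3">
    <contrib><name>Author F</name><aff>University Z</aff></contrib>
  </article>
</corpus>
"""


def institution_pairs(xml_text):
    """Count co-occurrences of institution pairs across articles."""
    root = ET.fromstring(xml_text)
    pairs = Counter()
    for article in root.iter("article"):
        # Collect each institution once per article, in a stable order.
        affiliations = sorted({
            contrib.findtext("aff")
            for contrib in article.iter("contrib")
            if contrib.findtext("aff")
        })
        for first, second in combinations(affiliations, 2):
            pairs[(first, second)] += 1
    return pairs


for (first, second), count in institution_pairs(CORPUS_XML).most_common():
    print(first, "<->", second, ":", count)
```

The resulting pair counts are the weighted edges of a collaboration graph, ready to be handed to a visualization or network analysis tool of one's choice.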
While we are already seeing some of these benefits in Google Scholar, not all articles are marked up in a way to be able to fully benefit from what Google Scholar has to offer.
We see, then, that there is a divide between articles that are static in nature, like PDFs, and articles that are marked up in such a way that all their components have meaning associated with them. I argue, then, that articles that are not marked up in XML are less usable than those that are, just as research that is available as gratis OA is somewhat less usable than libre OA.
The PKP team recognized the benefits of XML early on and responded by creating the Lemon8-XML software. They recognize the need for equal semantic exposure for all scholarship and have created a tool that puts this ability within everyone’s reach.
The Lemon8 software enables an editor to upload an article, and takes them step by step through marking up that document in XML while shielding them from the gory details. Lemon8 identifies document metadata such as title and author and, among other features, searches multiple databases to help verify citations by automatically suggesting additional data in a user-friendly way. Article markup is still not a quick venture, but if editors were to incorporate Lemon8 into their workflow, it could actually save them time, as it would greatly reduce the time it takes to verify citations while at the same time enabling their semantic markup.
I am excited to learn that integration of Lemon8 into the Open Journal Systems software is on the development roadmap for the Public Knowledge Project, and am looking forward to working with this added functionality.
What comes after cloud computing?
May 7, 2009
On Tuesday May 5, my colleague Jeff Newman and I presented The Library in the Cloud at the TRY Conference. In the days leading up to the presentation, I was asking myself why I had agreed to add yet another item to my “to do” list which seems to continually grow longer instead of becoming more manageable.
After the presentation, however, I realized that I was glad to be part of the show. First off, I got a front row seat to Jeff’s part of the talk…and boy did he blow me away as a presenter! He’s an absolute natural…I wouldn’t be surprised if he’s doing the keynote circuit in a few years.
I was also grateful for an audience member’s question. It was something along the lines of “what comes after cloud computing?”.
During the talk I remember discussing the benefits of cloud computing, and one of them was shared standards. I think that we still have much further to go in this respect. The ability to share and aggregate information via RSS feeds is great, but I would love to see more semantic interoperability taking place. Creating a Friend of a Friend (FOAF) profile is still a challenge, and I’m finding that I still have to re-enter much of my information in too many places.
While I do see the cloud continuing to enable more information to be created and made available online, I hope that further interoperability between cloud-based platforms develops so that information can be mined and shared much more efficiently and creatively.
PRONOM
March 25, 2009
Paving the way towards sustainability of electronic records is PRONOM, an online registry of technical information.
An initiative of the National Archives (the UK government’s official archive in Surrey), the PRONOM registry was “originally developed to support the accession and long-term preservation of electronic records”. The National Archives have graciously made this valuable resource available to all.
As described on the site, “PRONOM holds information about file formats, and the software products which can process (read, write, identify etc) each format. Information related to the file formats, such as documentation about them, their compression types, character encoding schemes and intellectual property rights is also held.”
When browsing the site, I was pleased to find that in addition to a simple search, one can search by file format, vendor, software, lifecycle, migration pathway and PRONOM unique identifier. The search also allows you to find file formats by extension, and to search for software that can process files with a particular extension (or file format name). An online submission form is available to encourage user contributions and to help keep the registry current.
What an important step towards tackling the challenges of digital preservation!
SIMILE makes me smile…
January 8, 2009
I’ve been researching semantic web applications and have finally set aside some time to try out some of the wonderful applications developed by the folks at MIT.
The SIMILE project creates open source applications that allow users to “access, manage, visualize and reuse digital assets”. I recently downloaded and installed the Seek add-on for the Thunderbird e-mail application.
Seek essentially allows me to toggle a view in Thunderbird that enables me to browse my email in multiple faceted views that I can add or subtract, and even allows me to view and sort through threads.
This add-on has changed the way I tackle my email, especially when I become overwhelmed after coming back from a conference or vacation. By clicking on a facet for a certain person’s email address, for example, I can quickly view and track all their correspondence in one screen. It is also really helpful to find that one missing email that you know is lurking somewhere in a folder but is just not coming up in your searches.
Check out the Google Code location for the project to see updates and a great instructional video. Congratulations to the SIMILE team, and especially David François Huynh, for an extremely useful tool.
Next on my list for trying out SIMILE projects is Longwell, a web-based RDF-powered highly-configurable faceted browser.
FSOSS 2008
October 26, 2008
I attended the Free Software and Open Source Symposium October 23-24, 2008. As always, it was a rewarding experience, and I intend to go back next year. For the low price of only $50 (early registration), a delegate has access to two days of sessions, several workshops, and an assortment of goodies including a tee shirt, lunches and a reception. All sessions are recorded and available on-line for viewing after the conference.
I am definitely going to try out the TikiWiki CMS/Groupware application. It has a very handy database tool that enables the user to create databases through an easy web interface. Web forms with customized fields can also be easily created to populate your databases. I like the TikiWiki philosophy where each release comes with every available add-on, which can then be enabled or disabled through site administration pages. This ensures that all modules are updated at the time of a new release, and saves one from having to go module-hunting when new functionality is required.
I was happy to see that FSOSS featured a session on open access. Leslie Chan discussed the convergence of open access with open source. His session reminded us of the significance of the open source contribution to the open access revolution. John Willinsky was visionary in realizing that a major barrier to publishing journals on-line barrier-free was the cost of creating journal publishing software. His Open Journal Systems project has enabled over 2000 journals worldwide to make journal content available on-line, most of it without barriers to access. Open source projects like his are contributing to the steady increase of peer-reviewed scholarship freely available on-line.
Marcus Bornfreund was absolutely swarmed with questions after his talk on Creative Commons and creative copyright licensing. His session helped to bring home the message that assigning a Creative Commons license to a work does not limit one’s ability to make a commercial profit from said work. The CC license only sets the base standard for all who have not made alternate arrangements with the copyright holder. It is necessary to remember that once a Creative Commons license is assigned to a work, any further arrangements made with respect to that work cannot be exclusive. Marcus will be speaking at York University with Professor Pina D’Agostino on November 19th about copyright in the academy.
CARL IR Meeting at Access 2008
October 2, 2008
The Canadian Association of Research Libraries hosted an Institutional Repository meeting in Hamilton, Ontario, on Wednesday October 1st, 2008 to coincide with Access 2008, and I was happy to be in attendance.
This was an extremely worthwhile meeting where participants were able to trade stories of their successes, challenges, and plans for the future. With over 40 of us in the room, it took almost the entire duration of the meeting to do a round of introductions discussing individual repositories. I sincerely hope that this becomes an annual event!
There were three major themes that emerged from the meeting:
Theme 1 – DSpace is commonly used, with most institutions running version 1.4
Most of the repositories in attendance were hosted using DSpace software. I was very surprised to hear that most of the DSpace hosted repositories were versions of the 1.4 release, and that only two institutions had migrated to the 1.5 release. I was slightly relieved because we finally completed our migration to 1.5 and I thought that we were behind!
As a result, the Manakin XML interface layer for DSpace was also not being used. We were the only ones to have a production version of Manakin running.
Reasons cited for not migrating to 1.5 included:
- customizations made to DSpace 1.4 will take a lot of programming time to move over to 1.5
- certain plug-ins and enhancements that are in heavy use in 1.4 have not yet been made available for 1.5
- administrators are evaluating other platforms and are not willing to invest the time in upgrading to 1.5 if they end up switching platforms
- programmers are hard to find, train and retain
Please visit my follow-up post to this section that elaborates on these observations.
Theme 2 – Electronic Theses and Dissertations (ETDs)
ETDs were discussed at length as they are very popular and make up a sizable percentage of most Canadian repository content. Only one institution has mandated electronic thesis deposit, but many have effective relationships with their respective Graduate Faculties where procedures have been established to enable the depositing of theses into repositories on an ongoing basis.
Copyright has been tackled in many ways: seeking legal advice from campus legal counsel, sending letters to alumni, taking out an ad in institutional alumni magazines, and re-writing agreements to be signed by current graduates.
The availability of past ProQuest theses was discussed, but common problems were echoed: poor quality scans for certain year ranges, ProQuest MARC records not tying to digital copies of theses by filename, lack of OCR, and the need to remove signature pages have all slowed down the workflow to ingest these items.
Theme 3 – Scholarly Communications Programs
Many of the participating institutions are hosting outreach programs to discuss Scholarly Communications challenges with faculty. Efforts include hosting speaker events and creating websites/supporting materials.
The DSpace Course
August 28, 2008
Congratulations to the JISC-funded Repositories Support Project for The DSpace Course initiative! A colleague brought this to my attention today and I immediately took a look.
The DSpace course is a set of 20 modules that can be used to create a custom course. Released under a Creative Commons “by-attribution share-alike” license, both PDFs and PowerPoint/Word files are provided for easy remixing and reuse.
The course modules cover topics such as configuration, customization and management. It is an incredibly valuable training resource, especially for new hires that have the skills but are not familiar with DSpace. It will save me weeks of training time in the future by quickly demystifying DSpace and making it more approachable. To further enhance the learning experience, an accompanying live CD has also been released which allows DSpace to be run and used without having to install it on a server.
My thanks to Stuart Lewis and Chris Yates for all their hard work. My printer is humming in the background as I’m looking forward to ploughing through the course on my home commute.
DSpace and Fedora join forces
July 30, 2008
I’m very happy to see an announcement regarding the upcoming collaborative relationship between DSpace and Fedora.
In repository circles, DSpace has the reputation of being relatively easy to get off the ground which has contributed to its popularity, while Fedora is considered to be more of an investment to assemble and roll out, but has the benefits of increased modularity. I am hoping that this collaboration will bring forward the best of both platforms to the point where it will not matter which product is being used. We’re all in it for the same reasons after all!
Scopus’ friendly searching interface
July 10, 2008
If you have not yet had a chance to play with the Scopus interface, it’s well worth a look. Through their extensive research, Scopus has made the use of facets approachable and intuitive. Just like in Erik Hatcher’s Collex platform, users can choose to either limit or exclude multiple facets within specific categories in their searches.
I’m glad that the Scopus folks made it down to demo the service as I had a chance to ask them some questions about their content. I’m always thinking about how to promote and include the journals that we’re hosting at York through Open Journal Systems software, and so I asked one of the three Scopus representatives about the process involved. It turns out that there is an on-line form through which one can suggest a title. Applications are reviewed once per year. This year’s deadline is September 1st, 2008. It was disappointing to hear that the rejection rate is 65% because being accessible through this interface would enable Scopus users to more deliberately discover a journal’s content.
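For readers who have not used this style of searching, the "limit or exclude" behaviour can be sketched in a few lines of Python. This is only my own illustration of the general technique, with made-up records and field names; it says nothing about how Scopus or Collex implement facets internally.

```python
# Hypothetical search results; field names and values are assumptions.
RECORDS = [
    {"title": "Article 1", "subject": "Medicine", "year": 2007, "doctype": "Article"},
    {"title": "Article 2", "subject": "Physics", "year": 2008, "doctype": "Review"},
    {"title": "Article 3", "subject": "Medicine", "year": 2008, "doctype": "Review"},
    {"title": "Article 4", "subject": "Chemistry", "year": 2008, "doctype": "Article"},
]


def apply_facets(records, limit=None, exclude=None):
    """Keep records matching every 'limit' facet and no 'exclude' facet.

    Both arguments map a field name to a set of selected values, so
    several values can be chosen within one category.
    """
    limit = limit or {}
    exclude = exclude or {}
    results = []
    for record in records:
        # A record must carry one of the chosen values for each limited field.
        if any(record.get(field) not in values for field, values in limit.items()):
            continue
        # A record is dropped if it carries any excluded value.
        if any(record.get(field) in values for field, values in exclude.items()):
            continue
        results.append(record)
    return results


# Limit to 2008, but exclude anything in Physics.
for hit in apply_facets(RECORDS, limit={"year": {2008}}, exclude={"subject": {"Physics"}}):
    print(hit["title"])
```

Keeping limits and exclusions as separate dictionaries mirrors what the user sees on screen: two independent sets of choices per category that can be added or removed without rebuilding the whole query.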
Another huge benefit to journals indexed by Scopus is the Journal Analyzer function. This tool allows a journal to track its citations back to 1996. The Journal Analyzer also allows users to select up to 10 journals and compare their performance next to each other on the same graph. This could be a useful visual accompaniment to a grant application.
ELPUB 2008
June 27, 2008
I am really inspired by some of the sessions I attended at ELPUB 2008 in Toronto June 25-27.
Publishers at York University will soon be introduced to the WebCite service. It is a self-archiving service for web URLs, allowing users to request that a web page be archived. A successful request results in a permanent link that can be used to cite that snapshot of the web location at that particular time in perpetuity, allowing authors to use web URLs in their bibliographies with confidence.
Gunther Eysenbach’s paper discussing the WebCite service can be found here.
I am also looking forward to indexing some of York University’s Encoded Archival Description (EAD) files with California Digital Library’s eXtensible Text Framework (XTF). I attended the pre-conference workshop and am now happily playing with XTF on my laptop.