The Public Knowledge Project 2009 conference ultimately made me re-think the way that open access (OA) is defined and subdivided.

The current subdivision is dichotomous. Open access is subdivided into the gratis and libre models, as described by Peter Suber in his Open Access Newsletter: gratis OA removes price barriers alone, while libre OA removes price barriers and at least some permission barriers. I perceive this as a hierarchy of use, in which gratis OA is less usable because the permissions for further use of the items are not clear.

The concept of the hierarchy was echoed in a workshop that I attended at the PKP Scholarly Publishing Conference on Lemon8-XML (L8X). One of the speakers, MJ Suhonos, underscored that not all document dissemination formats are created equal. If we compare an XML-encoded article to the same article in PDF, we see that the XML-encoded article enables enhanced access to the content. The strength is in the modularity of XML, which allows the content to be labeled and described explicitly in a standardized way. Citations are a good example. In a PDF, the citations sit lumped in with the rest of the document and cannot be reliably harvested or parsed as discrete citations, because to a machine they look identical to the text of the article. In XML, the citations are marked as citations and hence can be parsed and analyzed as such.
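To make the citation example concrete, here is a minimal sketch in Python. The element names (`ref-list`, `ref`, `author`, `year`, `title`) and the sample data are my own placeholders for illustration; they are not the schema that Lemon8-XML actually produces. The point is only that once citations are tagged, pulling them out takes a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup with placeholder data (not a real schema).
ARTICLE_XML = """
<article>
  <body>
    <p>Body text that mentions earlier work (Author, 2001).</p>
  </body>
  <ref-list>
    <ref id="r1">
      <author>Author, A.</author>
      <year>2001</year>
      <title>A placeholder title</title>
    </ref>
    <ref id="r2">
      <author>Writer, B.</author>
      <year>2005</year>
      <title>Another placeholder title</title>
    </ref>
  </ref-list>
</article>
""".strip()


def extract_citations(xml_text):
    """Return one dict per <ref> element found in the document."""
    root = ET.fromstring(xml_text)
    citations = []
    for ref in root.iter("ref"):
        citations.append({
            "id": ref.get("id"),
            "author": ref.findtext("author"),
            "year": ref.findtext("year"),
            "title": ref.findtext("title"),
        })
    return citations


citations = extract_citations(ARTICLE_XML)
for c in citations:
    print(c["id"], c["author"], c["year"], c["title"])
```

The in-text mention "(Author, 2001)" in the body is never mistaken for a reference, because only the tagged `ref` elements are collected. Doing the same with a PDF would mean guessing where the reference list starts and where each entry ends.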

One can’t help but imagine a world where every document has semantically encoded citations! We would not need to rely on ISI and Scopus anymore (or pay the Crossref fees). Everyone would have equal access to citation harvesting and analysis. (Two years ago, a Scopus vendor told me their indexing rejection rate was approximately 80%… talk about an elite society!) XML markup could enable global barrier-free citation analysis, where elite membership would no longer be necessary.

In this same L8X session, Juan Pablo Alperin discussed other benefits of XML markup besides the many possibilities of enhanced bibliometric analysis. He asked us to imagine the benefits of discovering collaboration networks, where enhanced author markup, for example, would let us see which institutions collaborate with which. Enhanced document discovery would also be a benefit: with complete metadata available, we can find related works in many ways, such as by the same author, on the same subject, in the same journal, or from the same publisher.
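The collaboration-network idea can be sketched the same way. Again, the markup below (`contrib`, `name`, `aff`) and the institution names are hypothetical placeholders of mine, not anything shown in the session. Assuming each author carries an affiliation tag, counting which institutions appear together on articles is straightforward.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical author markup with placeholder names and institutions.
ARTICLES_XML = """
<articles>
  <article id="a1">
    <contrib><name>Author One</name><aff>University A</aff></contrib>
    <contrib><name>Author Two</name><aff>University B</aff></contrib>
  </article>
  <article id="a2">
    <contrib><name>Author Three</name><aff>University B</aff></contrib>
    <contrib><name>Author Four</name><aff>University A</aff></contrib>
    <contrib><name>Author Five</name><aff>Institute C</aff></contrib>
  </article>
</articles>
""".strip()


def institution_pairs(xml_text):
    """Count how many articles each pair of institutions shares."""
    root = ET.fromstring(xml_text)
    pairs = Counter()
    for article in root.iter("article"):
        # Distinct affiliations on this article, sorted so that each
        # pair is always recorded in the same order.
        affs = sorted({
            c.findtext("aff")
            for c in article.iter("contrib")
            if c.findtext("aff")
        })
        for pair in combinations(affs, 2):
            pairs[pair] += 1
    return pairs


pairs = institution_pairs(ARTICLES_XML)
for (first, second), count in pairs.most_common():
    print(first, "+", second, ":", count)
```

With this sample data, University A and University B show up together on two articles, and each of them shares one article with Institute C. Those counts are the edges of a simple collaboration graph.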

While we are already seeing some of these benefits in Google Scholar, not all articles are marked up in a way that lets them benefit fully from what Google Scholar has to offer.

We see, then, that there is a divide between articles that are static in nature, like PDFs, and articles that are marked up so that all their components have meaning associated with them. I argue that articles not marked up in XML are less usable than those that are, just as research available as gratis OA is somewhat less usable than libre OA.

The PKP team recognized the benefits of XML early on and responded by creating the Lemon8-XML software. They see the need for equal semantic exposure for all scholarship and have created a tool that puts this ability within everyone’s reach.

The Lemon8 software lets an editor upload an article and takes them step by step through marking up that document in XML, while shielding them from the gory details. Lemon8 identifies document metadata such as title and author and, among other features, searches multiple databases to help verify citations, automatically suggesting additional data in a user-friendly way. Article markup is still not a quick venture, but if editors were to incorporate Lemon8 into their workflow, it could actually save them time: it would greatly reduce the time it takes to verify citations while also enabling their semantic markup.

I am excited to learn that integration of Lemon8 into the Open Journal Systems software is on the development roadmap for the Public Knowledge Project, and am looking forward to working with this added functionality.

SIMILE makes me smile…

January 8, 2009

I’ve been researching semantic web applications and have finally set aside some time to try out some of the wonderful applications developed by the folks at MIT.

The SIMILE project creates open source applications that allow users to “access, manage, visualize and reuse digital assets”.  I recently downloaded and installed the Seek add-on for the Thunderbird e-mail application.

Seek adds a view to Thunderbird that I can toggle on to browse my email through multiple facets, which I can add or remove, and it even lets me view and sort through threads.

This add-on has changed the way I tackle my email, especially when I am overwhelmed after coming back from a conference or vacation. By clicking on a facet for a certain person’s email address, for example, I can quickly view and track all their correspondence on one screen. It is also really helpful for finding that one missing email that you know is lurking somewhere in a folder but is just not coming up in your searches.

Check out the Google Code location for the project to see updates and a great instructional video. Congratulations to the SIMILE team, and especially David François Huynh, for an extremely useful tool.

Next on my list for trying out SIMILE projects is Longwell, a web-based RDF-powered highly-configurable faceted browser.