The Public Knowledge Project 2009 conference ultimately made me re-think the way that open access (OA) is defined and subdivided.

The current subdivision is dichotomous.  Open access is divided into gratis and libre models, as described by Peter Suber in his Open Access Newsletter: gratis OA removes price barriers alone, while libre OA removes price barriers and at least some permission barriers.  I see this as a hierarchy of use, in which gratis OA is less usable because the permissions for further use of these items are not clear.

The concept of a hierarchy was echoed in a workshop on Lemon8-XML (L8X) that I attended at the PKP Scholarly Publishing Conference.  One of the speakers, MJ Suhonos, underscored that not all document dissemination formats are created equal.  If we compare an XML-encoded article to the same article in PDF, we see that the XML version enables enhanced access to the content.  Its strength lies in the modularity of XML, which allows each part of the content to be labeled and described explicitly in a standardized way.  Citations illustrate the point well.  In a PDF, the citations are lumped in with the rest of the text and cannot be reliably harvested or parsed as discrete citations, because to a machine they look no different from the body of the article.  In XML, the citations are marked up as citations and can therefore be parsed and analyzed as such.
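To make the citation example concrete, here is a minimal sketch of what "parsed as such" can mean in practice.  It assumes an article whose references are tagged with element names loosely modeled on the NLM/JATS tag set (`ref-list`, `ref`, `element-citation`, `surname`, `article-title`, `source`, `year`); the sample article, its authors, and its titles are placeholders, and `extract_citations` is a hypothetical helper, not part of Lemon8-XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical article; tag names loosely follow the NLM/JATS tag set.
# Authors, titles, and journals are placeholders, not real references.
ARTICLE = """
<article>
  <body><p>As earlier studies show [1, 2], markup matters.</p></body>
  <back>
    <ref-list>
      <ref id="r1">
        <element-citation publication-type="journal">
          <person-group person-group-type="author">
            <name><surname>Doe</surname><given-names>J</given-names></name>
            <name><surname>Roe</surname><given-names>R</given-names></name>
          </person-group>
          <article-title>An Example Study</article-title>
          <source>Journal of Examples</source>
          <year>2007</year>
        </element-citation>
      </ref>
      <ref id="r2">
        <element-citation publication-type="journal">
          <person-group person-group-type="author">
            <name><surname>Smith</surname><given-names>A</given-names></name>
          </person-group>
          <article-title>Another Example</article-title>
          <source>Example Quarterly</source>
          <year>2008</year>
        </element-citation>
      </ref>
    </ref-list>
  </back>
</article>
""".strip()


def extract_citations(xml_text):
    """Return one dict per <ref>, read straight from its labeled fields."""
    root = ET.fromstring(xml_text)
    citations = []
    for ref in root.iter("ref"):
        cite = ref.find("element-citation")
        if cite is None:  # skip refs without a structured citation
            continue
        citations.append({
            "id": ref.get("id"),
            "authors": [n.findtext("surname") for n in cite.iter("name")],
            "title": cite.findtext("article-title"),
            "source": cite.findtext("source"),
            "year": cite.findtext("year"),
        })
    return citations


for c in extract_citations(ARTICLE):
    print(c["id"], ", ".join(c["authors"]), c["year"], c["title"])
```

Because every field is explicitly labeled, the same loop works on any article that uses these tags.  Getting the same result from a PDF would require heuristic guessing about where the reference list starts and how each entry is laid out.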

One can’t help but imagine a world where every document has semantically encoded citations!  We would no longer need to rely on ISI and Scopus (or pay Crossref fees).  Everyone would have equal access to citation harvesting and analysis.  (Two years ago, a Scopus vendor told me their indexing rejection rate was approximately 80%…talk about an elite society!)  XML markup could enable global, barrier-free citation analysis in which elite membership would no longer be necessary.
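As a rough illustration of what open citation analysis could look like once citations are harvested, the sketch below counts how many articles cite each work.  It assumes the citations have already been extracted into simple (first-author surname, year, title) records; the data, the `normalize` rule, and the `citation_counts` helper are all hypothetical, and real-world citation matching is considerably harder than folding case and whitespace.

```python
from collections import Counter

# Hypothetical harvested citations, grouped by citing article.
# Each record is (first-author surname, year, title); data is invented.
HARVESTED = {
    "article-1": [("Doe", "2007", "An Example Study"),
                  ("Smith", "2008", "Another Example")],
    "article-2": [("doe", "2007", "An example  study "),
                  ("Lee", "2006", "A Third Example")],
    "article-3": [("DOE", "2007", "An Example Study")],
}


def normalize(citation):
    """Fold case and collapse whitespace so variant forms of one work match."""
    surname, year, title = citation
    return (surname.strip().casefold(),
            year.strip(),
            " ".join(title.split()).casefold())


def citation_counts(harvested):
    """Count how many citing articles reference each (normalized) work."""
    counts = Counter()
    for refs in harvested.values():
        # A set means an article citing the same work twice counts once.
        counts.update({normalize(c) for c in refs})
    return counts


for work, n in citation_counts(HARVESTED).most_common():
    print(n, work)
```

The point of the sketch is that, with clean markup, this kind of tally needs nothing more than the articles themselves.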

In the same L8X session, Juan Pablo Alperin discussed other benefits of XML markup beyond the infinite possibilities for enhanced bibliometric analysis.  He asked us to imagine discovering collaboration networks, where enhanced author markup, for example, would let us see which institutions collaborate with which.  Document discovery would also improve: with complete metadata, we could find related works in many ways, such as by the same author, on the same subject, in the same journal, or from the same publisher.
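The collaboration-network idea can be sketched in a few lines once author affiliations are marked up.  This assumes a simplified, hypothetical markup in which each `<contrib>` carries its own `<aff>` element (real JATS more often links authors to affiliations through `<xref>` references), and the institution names are placeholders.  The hypothetical `institution_links` helper counts how often each pair of institutions appears on the same article.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical, simplified author markup: each <contrib> has its own <aff>.
ARTICLES = [
    """<article><contrib-group>
         <contrib><surname>Doe</surname><aff>University A</aff></contrib>
         <contrib><surname>Roe</surname><aff>University B</aff></contrib>
       </contrib-group></article>""",
    """<article><contrib-group>
         <contrib><surname>Lee</surname><aff>University B</aff></contrib>
         <contrib><surname>Kim</surname><aff>University A</aff></contrib>
         <contrib><surname>Poe</surname><aff>Institute C</aff></contrib>
       </contrib-group></article>""",
]


def institution_links(xml_docs):
    """Count how often each pair of institutions co-occurs on an article."""
    links = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        institutions = {aff.text.strip() for aff in root.iter("aff") if aff.text}
        # sorted() gives each pair a stable order, so (A, B) and (B, A) merge.
        for pair in combinations(sorted(institutions), 2):
            links[pair] += 1
    return links


for (a, b), n in institution_links(ARTICLES).most_common():
    print(f"{a} <-> {b}: {n}")
```

Each counted pair is an edge in a collaboration graph; the same approach would work for authors, journals, or publishers if those fields are marked up.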

While we are already seeing some of these benefits in Google Scholar, not all articles are marked up in a way that lets them take full advantage of what Google Scholar has to offer.

We see, then, a divide between articles that are static in nature, such as PDFs, and articles marked up so that each of their components carries meaning.  I argue that articles not marked up in XML are less usable than those that are, just as research available as gratis OA is somewhat less usable than libre OA.

The PKP team recognized the benefits of XML early on and responded by creating the Lemon8-XML software.  They see the need for equal semantic exposure for all scholarship and have created a tool that puts this ability within everyone’s reach.

The Lemon8 software lets an editor upload an article and then walks them step by step through marking it up in XML, shielding them from the gory details.  Lemon8 identifies document metadata such as title and author and, among other features, searches multiple databases to help verify citations, automatically suggesting additional data in a user-friendly way.  Article markup is still not quick, but if editors incorporated Lemon8 into their workflow, it could actually save them time: it would greatly reduce the time needed to verify citations while producing semantic markup at the same time.

I am excited to learn that integration of Lemon8 into the Open Journal Systems software is on the development roadmap for the Public Knowledge Project, and am looking forward to working with this added functionality.

On Tuesday, May 5, my colleague Jeff Newman and I presented The Library in the Cloud at the TRY Conference.  In the days leading up to the presentation, I kept asking myself why I had agreed to add yet another item to my “to do” list, which seems to keep growing longer instead of more manageable.

After the presentation, however, I realized that I was glad to be part of the show.  First off, I got a front row seat to Jeff’s part of the talk…and boy did he blow me away as a presenter!  He’s an absolute natural…I wouldn’t be surprised if he’s doing the keynote circuit in a few years.

I was also grateful for an audience member’s question.  It was something along the lines of “What comes after cloud computing?”

During the talk, I remember discussing the benefits of cloud computing, one of which was shared standards.  I think we still have much further to go in this respect.  The ability to share and aggregate information via RSS feeds is great, but I would love to see more semantic interoperability.  Creating a Friend of a Friend (FOAF) profile is still a challenge, and I find that I am still re-entering much of my information in too many places.

While I do see the cloud continuing to enable more information to be created and made available online, I hope that greater interoperability develops between cloud-based platforms so that information can be mined and shared much more efficiently and creatively.