The Public Knowledge Project 2009 conference ultimately made me re-think the way that open access (OA) is defined and subdivided.

The current subdivision is dichotomous.  Open access is subdivided into the gratis and the libre models as described by Peter Suber in his Open Access Newsletter, where gratis OA refers to access without price barriers alone, while libre OA involves the removal of price and at least some permission barriers.  I perceive this to be a hierarchy of use, where gratis OA is less usable as permissions for further use of these items are not clear.

The concept of the hierarchy was echoed in a workshop on Lemon8-XML (L8X) that I attended at the PKP Scholarly Publishing Conference. One of the speakers, MJ Suhonos, underscored that not all document dissemination formats are created equal. If we compare an XML-encoded article to the same article in PDF, we see that the XML-encoded article enables enhanced access to the content. The strength is in the modularity of XML, which enables the content to be labeled and described explicitly in a standardized way. Citations are a good example. In a PDF, the citations sit lumped together with the rest of the document and cannot be reliably harvested or parsed as discrete citations, because to a machine they look identical to the text of the article. In XML, the citations are denoted as citations and hence can be parsed and analyzed as such.
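To make the citation example concrete, here is a minimal sketch in Python. The element names (`ref-list`, `ref`, `author`, `year`, `title`, `source`) and the sample article are my own illustrative assumptions, not the actual schema that Lemon8-XML produces. The point is only that once citations are labeled, a few lines of standard-library code can pull them out as structured records.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are assumed for illustration only.
ARTICLE_XML = """
<article>
  <body>
    <p>Earlier work (Smith, 2004) describes the same effect.</p>
  </body>
  <ref-list>
    <ref id="r1">
      <author>Smith, J.</author>
      <year>2004</year>
      <title>An example article</title>
      <source>Journal of Examples</source>
    </ref>
    <ref id="r2">
      <author>Lee, K.</author>
      <year>2007</year>
      <title>Another example article</title>
      <source>Example Review</source>
    </ref>
  </ref-list>
</article>
"""


def extract_citations(xml_text):
    """Return one dict per <ref> element found in the document."""
    root = ET.fromstring(xml_text.strip())
    citations = []
    for ref in root.iter("ref"):
        citations.append({
            "id": ref.get("id"),
            "author": ref.findtext("author"),
            "year": ref.findtext("year"),
            "title": ref.findtext("title"),
            "source": ref.findtext("source"),
        })
    return citations


if __name__ == "__main__":
    for citation in extract_citations(ARTICLE_XML):
        print(citation["author"], citation["year"], citation["source"])
```

Notice that the in-text mention "(Smith, 2004)" in the body is never mistaken for a reference, because only the labeled `ref` elements are read. With a PDF, a harvester would have to guess where the reference list starts and where each entry ends.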

One can’t help but imagine a world where every document has semantically encoded citations! We would not need to rely on ISI and Scopus anymore (or pay the Crossref fees). Everyone would have equal access to citation harvesting and analysis. (Two years ago, a Scopus vendor told me their indexing rejection rate was approximately 80%…talk about an elite society!) XML markup could enable global barrier-free citation analysis, where elite membership would no longer be necessary.

In this same L8X session, Juan Pablo Alperin discussed other benefits of XML markup besides the many possibilities for enhanced bibliometric analysis. He asked us to imagine the benefits of discovering collaboration networks, where enhanced author markup, for example, would enable us to see which institutions collaborate with one another. Enhanced document discovery would also be a benefit: with complete metadata available, we can find related works in many ways, such as by the same author, on the same subject, in the same journal, or from the same publisher.
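The collaboration-network idea can be sketched the same way. This is a rough illustration, assuming each author element carries an `affiliation` child. The element names and the three fictional universities are hypothetical and do not come from the L8X session. The sketch counts how often two institutions appear together on the same article.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical author markup with made-up institutions, for illustration only.
ARTICLES_XML = """
<articles>
  <article>
    <author><name>Author One</name><affiliation>University A</affiliation></author>
    <author><name>Author Two</name><affiliation>University B</affiliation></author>
  </article>
  <article>
    <author><name>Author One</name><affiliation>University A</affiliation></author>
    <author><name>Author Three</name><affiliation>University B</affiliation></author>
    <author><name>Author Four</name><affiliation>University C</affiliation></author>
  </article>
  <article>
    <author><name>Author Four</name><affiliation>University C</affiliation></author>
  </article>
</articles>
"""


def institution_pairs(xml_text):
    """Count how many articles each pair of institutions shares."""
    root = ET.fromstring(xml_text.strip())
    pairs = Counter()
    for article in root.iter("article"):
        # Use a set so an institution is counted once per article,
        # then sort so each pair has a single canonical ordering.
        institutions = sorted({
            author.findtext("affiliation")
            for author in article.iter("author")
            if author.findtext("affiliation")
        })
        for pair in combinations(institutions, 2):
            pairs[pair] += 1
    return pairs


if __name__ == "__main__":
    for (first, second), count in institution_pairs(ARTICLES_XML).most_common():
        print(first, "+", second, ":", count, "shared article(s)")
```

The resulting counts are the edge weights of a simple collaboration graph. None of this is possible unless the affiliation is marked up as an affiliation in the first place.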

While we are already seeing some of these benefits in Google Scholar, not all articles are marked up in a way that lets them benefit fully from what Google Scholar has to offer.

We see, then, that there is a divide between articles that are static in nature, like PDFs, and articles that are marked up in such a way that all their components have meaning associated with them. I argue, then, that articles that are not marked up in XML are less usable than those that are, just as research that is available as gratis OA is somewhat less usable than libre OA.

The PKP team recognized the benefits of XML early on and responded by creating the Lemon8-XML software. They recognize the need for equal semantic exposure for all scholarship and have created a tool that puts this ability within everyone’s reach.

The Lemon8 software enables an editor to upload an article and takes them step by step through marking up that document in XML while shielding them from the gory details. Lemon8 identifies document metadata such as title and author and, among other features, searches multiple databases to help verify citations by automatically suggesting additional data in a user-friendly way. Article markup is still not a quick venture, but if editors were to incorporate Lemon8 into their workflow, it could actually save them time, as it would greatly reduce the time it takes to verify citations while at the same time enabling their semantic markup.

I am excited to learn that integration of Lemon8 into the Open Journal Systems software is on the development roadmap for the Public Knowledge Project, and am looking forward to working with this added functionality.

If you have not yet had a chance to play with the Scopus interface, it’s well worth a look. Through their extensive research, Scopus has made the use of facets approachable and intuitive. Just like in Erik Hatcher’s Collex platform, users can choose to either limit or exclude multiple facets within specific categories in their searches.

I’m glad that the Scopus folks made it down to demo the service, as I had a chance to ask them some questions about their content. I’m always thinking about how to promote and include the journals that we’re hosting at York through the Open Journal Systems software, and so I asked one of the three Scopus representatives about the process involved. It turns out that there is an on-line form through which one can suggest a title. Applications are reviewed once per year. This year’s deadline is September 1st, 2008. It was disappointing to hear that the rejection rate is 65%, because being accessible through this interface would enable Scopus users to more deliberately discover a journal’s content.

Another huge benefit to journals indexed by Scopus is the Journal Analyzer function.  This tool allows a journal to track its citations back to 1996.  The Journal Analyzer also allows users to select up to 10 journals and compare their performance next to each other on the same graph.  This could be a useful visual accompaniment to a grant application.

I was reading Peter Suber’s July 08 Open Access Newsletter, and it’s enough to make my head spin…there are so many developments posted on Open Access News that I just can’t keep up anymore. It’s fantastic that “hot” stories are now tagged and that the feed to these stories can be subscribed to here.

I was struck by a particular point Peter raised:  that the availability of funds to pay for access [to research] does not scale to keep pace with the growth of published knowledge.

It made me think about the format problem.  I’ve been hearing it mentioned over and over again, this question: why are we so attached to packaging our research into a journal format?

Is it the brand name that we’re so attached to?  If we’re looking for quality, do we simply just seek out the Prada of journals?  Does not the research stand up for itself, just like a consumer good has to? If your Vuitton luggage falls apart after one trip down the baggage conveyor belt, does the fancy brand matter anymore?

Maybe it’s about lack of time. Who has the time to compare the quality of consumer goods…we’ve all purchased a generic brand at one point or another that greatly underperformed. To protect against that disappointment, it’s just easier to pay a little more for the name brand version. Perhaps we adopt a similar mentality with research?

This worries me a bit.  The fact that research output volumes are multiplying so quickly…is it not in a way working against the cause?…is it not further fueling the demand for these high impact journals to exist? Is it not so much easier to simply save time by trusting the name brand research?

In my mind, the solution lies in the metrics…where citations and downloads can be measured and compared to the opinions of the elite groups of peer reviewers that decide which articles are Vuittons and which are simply generic.   I can’t help but predict that once a more unified and unprejudiced method of tracking impact appears, brand names just won’t matter.  The quality of an item of research will simply stand up for itself, visible for all to see, no longer in need of being sold under a designer label.

Stevan Harnad, in his article “Validating research performance metrics against peer rankings,” makes a solid argument for using both technometrics and peer review to strengthen the validity of both.

Harnad mentions Citebase, a scientometric search engine that reference-links nearly 500,000 papers and ranks papers and authors based on citation counts, download counts and other metrics.

Harnad cites different factors contributing to the OA impact advantage:

(1) early access

(2) quality bias – higher quality articles are more likely to be made OA

(3) quality advantage – higher quality articles benefit more from being made OA

(4) usage advantage – OA articles are accessible more quickly and easily

(5) competitive advantage

Useful quote: “Citation counts of OA articles are on average twice as high as those that are not.”
Harnad & Brody 2004 doi: 10.3354/esep00076
Hajjem et al. 2005 http://eprints.ecs.soton.ac.uk/11688/