Time and Tide

There are benefits to posting quickly after an event (you get the word out fast) and maybe there are benefits to being so swamped with work that you get time to mull. It has been nearly a month since lod-lam NZ happened in Wellington, and the session on vocabularies on the afternoon of Thursday, Dec 1 was one I had been looking forward to for some time. Tim Wray (a PhD student from the University of Wollongong) wrote in an email “I was wondering if you like to contribute your part – particularly your humanist / social perspective on the issue of vocabularies and alignment”. Tim is going to explain the discussion in that session from a computational linguistics point of view and his own perspective, so this post is food for the culture vultures and semi-technical cake eaters of the GLAM sector keen on linked open data.

These were the topics the conversation ranged through in the vocabularies session:

  • curatorial questions around selecting vocabularies
  • vocabulary as cultural artefact
  • cultural questions around automating vocabularies
  • roles of curation and linguistic computation in aligning vocabularies

The discussion that afternoon started solidly thanks to Stuart Yeates from the New Zealand Electronic Text Centre (NZETC), who led the discussion and called for input on which vocabularies people might want to use in their linked open data. You can see his post Metadata vocabularies LODLAM NZ cares about on his blog Open Source Exile and the breakdown of the vocabularies shouted out quickly by the group. A quick glance at these is telling of the New Zealand cultural context, the Trans-Tasman common areas of interest and the strong influence and immense value of the work still being done by the Library of Congress in the USA.

The point about semantics (and one that needs many and more qualified perspectives to answer well) is that what can be interpreted literally is not always meaningful culturally. The title of this blog post was intentionally and intuitively drawn from an old adage. Language and its usage (like tides) ebb and flow, and vocabularies provided via linked open data need in some way to allow for this shifting of meaning over time while still being able to assert levels of accuracy at a point in time. The only word I could find to describe this was attenuation: that is, how long the note sounds and, in the sense of semantics, how the signal in the original meaning lingers in some way over time. Concepts like tide marks, water measures, lunar calendars and sea currents, all means to triangulate and test for smooth sailing, play into this idea of how to programme for linked open data; so do concepts like scales, tonal and atonal music, harmony, dissonance, assonance, signals, and top, middle and low notes. Does the cake seem too thick with metaphorical icing? Not quite.

A while back I watched a presentation Open data for the cultural masses – Mapping and the Europeana Semantic Layer by Guus Schreiber on Amalgame and the alignment of vocabularies and he seemed to be talking about levels of accuracy in asserting vocabulary alignments. Forgive me, I was listening with one ear whilst working, but I got so excited by what I seemed to be hearing (with a non computer science ear) that I sent the link along to colleagues at the Australian National Data Service and to one of the directors there who is a computational linguist – Andrew Treloar. There are several of that ilk in the ANDS team, Adrian Burton is a technologist and linguist and so is Nick Nicholas. Next year is going to be hot stuff for the @andsdata team on the linked open data front. The linked open data services for the Party (researchers and research organisations) and Research Activity (projects, grants, funds etc) Infrastructures are going into action and include a vocabulary service for the Australian and New Zealand Standard Research Classification (ANZSRC) – so watch that space for some Trans-Tasmanian action.

The group began a brief discussion about how people choose vocabularies and the librarians in the room all had ‘interesting’ or ‘old fashioned’ looks on their faces. Perhaps it is so obvious to library and information workers that vocabularies are social, political, cultural and so on, that the choice to use one is not just based on information theory but also on cultural theory – an act of data curation. Tim asked if ontologies were cultural artefacts and there was a resounding ‘yes’ from within the group. I’m not sure who was louder: Adrian Kingston, Stuart Yeates, Chris Todd or Sydney Shep. It is gratifying that those passionate responses came from a museum technologist, a computer scientist in the digital humanities, a leading cataloguer and an academic in library and information studies. Maybe because the GLAM sector has used ontologies, both schematic and semantic, for such a long time, for some of the humanities computing people in the room this was all ‘understood’ and we’re on to the next challenge: how to make this work? It seemed powerful to me in any event that culture, linked open data, vocabularies and curation were combined in a discussion, and that in that discussion people felt strongly about resolving these questions from each domain effectively and together. This is what I quietly call digital cultural heritage – working out the design – driving technological design with cultural questions, testing technological techniques out against cultural questions and on and on. Judgements about which ontology or standard to follow are the bread and butter of a great deal of the information practices in GLAM sector work by web developers, programmers, registrars, curators, cataloguers, and archivists. The next challenge is linked open data and how to use, reuse and align vocabularies to expose cultural collection data in expected and new ways, to new audiences and minds. In two words, awesome challenge.

The discussion at the end around the capacity to use Kupu – a Māori thesaurus developed by the National Library of New Zealand – was informative and instructive. Years ago a subject database called ‘KUPU’ was one of the few means to begin to access resources described using Māori provided by the National Library of New Zealand (the subject access in the database was built around a thesaurus called He puna kupu Māori : hei tohu-ā-kupu ; an indexing thesaurus in the Māori language). Precious few means were available at that time (1990s) to provide access to resources in Māori and English with appropriate Māori intellectual access points. More information tools have been developed since, and we learned from Chris Todd from the National Library that Kupu needs to be updated and needs more work to ensure that the intellectual access the thesaurus provides is both meaningful and culturally appropriate. So the tides have shifted, and in a positive way, and yet they are undiminished: the demand remains for enriched Māori intellectual access to information resources. The group were voting keenly for the Māori Subject Headings (maintained by the National Library of New Zealand) to be linked open data and further thoughts were exchanged around the use of name authorities.

Designs of ornamentation on Maori rafters. Nos 13, 14, 15. (1890s). Herbert William Williams 1860-1937. Alexander Turnbull Library, National Library of New Zealand

Time and tide wait for no-one… so this is an urging to keep in mind a thought that Michael Lascarides offered in his keynote presentation at the National Digital Forum, to be mindful of the new and better challenges for the GLAM community to work on: the ‘Why | What | How | What If/Then’ questions of culture, linked open data and vocabularies.

Note: kupu used as a noun means ‘word or vocabulary’ and upoko used as a noun means ‘head’ in English; meanings provided by Māori Dictionary online


The International Linked Open Data in Libraries, Archives, and Museums Summit (“LOD-LAM”) will convene leaders in their respective areas of expertise from the humanities and sciences to catalyze practical, actionable approaches to publishing Linked Open Data, specifically:

Identify the tools and techniques for publishing and working with Linked Open Data.
Draft precedents and policy for licensing and copyright considerations regarding the publishing of library, archive, and museum metadata.
Publish definitions and promote use cases that will give LAM staff the tools they need to advocate for Linked Open Data in their institutions.

Where and when?

The LOD-LAM Summit will take place June 2-3, 2011 in San Francisco, CA.

How will the Summit be organized?

LOD-LAM will utilize the Open Space Technology meeting format, designed to give this group of expert innovators the time and space to freely identify and address as a group the most pressing issues related to forwarding Linked Open Data in libraries, archives, and museums. This format involves an initial session in which the participants collaboratively create the agenda for breakout sessions for the first day. Because the LOD-LAM Summit is action-oriented, a similar process happens on the second day, but with a focus on actionable items, documentation, and collaboration over the short term period of the next year. The meeting is based on the two primary principles of passion and responsibility: passion to jump in and play an active role; and responsibility to lead, and follow through with action. No papers will be submitted or read, no plenaries given, and everyone will participate.

In essence, Open Space puts a focus on convening passionate players across multiple disciplines to address one specific question or theme; in this case the question is “How do we expand international adoption of Linked Open Data amongst Libraries, Archives, and Museums?”

Is it open to all?

Unfortunately, we can only accommodate about 50 people, so we are seeking representative candidates from a broad range of institutions from around the world with diverse levels of leadership and technical expertise. We hope to hold future meetings at various locations around the world that will be open to more participants. All summit proceedings will be open and published in real time.

Who should attend?

The ideal candidate may be a programmer, administrator, lawyer, LAM professional, or any number of things, but will have at least a working understanding of Linked Open Data if not some direct experience with the technology or policies involved. Participants will have the authority in their position to implement policy or technology, or influence decision makers in their institution or sector. We’ll be looking for people that have organized others in their field around Linked Open Data and will have a wide sphere of influence. We seek to have at least 25% of participating institutions contribute to a working use case, so the ideal candidate will be able to contribute to that goal.

How much does it cost?

Thanks to the generous support of our funders and sponsors, there is no cost for attending the meeting. Limited travel grants will be available.

How do I apply?

We will be accepting applications beginning at 8am PST, February 1, 2011, and closing at 5pm PST, February 28, 2011. Participants will be selected and notified by 5pm PST, March 7.

Who are the organizers?

Jon Voss (@jonvoss), Founder, LookBackMaps, principal organizer/facilitator.
Kris Carpenter Negulescu, Director of Web Group, Internet Archive, project manager

And special thanks to our Organizing Committee:
Lisa Goddard (@lisagoddard), Acting Associate University Librarian for Information Technology, Memorial University Libraries.
Martin Kalfatovic (@UDCMRK), Assistant Director, Digital Services Division at Smithsonian Institution Libraries and the Deputy Project Director of the Biodiversity Heritage Library.
Mark Matienzo (@anarchivist), Digital Archivist in Manuscripts and Archives at the Yale University Library.
Mia Ridge (@mia_out), Lead Web Developer & Technical Architect, Science Museum/NMSI (UK)
Tim Sherratt (@wragge), National Museum of Australia & University of Canberra
MacKenzie Smith, Research Director, MIT Libraries.
Adrian Stevenson (@adrianstevenson), Research Officer, UKOLN; Project Manager, LOCAH Linked Data Project.
John Wilbanks (@wilbanks), VP of Science, Director of Science Commons, Creative Commons.

Proposed: a 4-star classification-scheme for linked open cultural metadata

One of the outcomes of last week’s LOD-LAM Summit was a draft document proposing a new way to assess the openness/usefulness of linked data for the LAM community. This is a work in progress, but is already provoking interesting debate on our options as we try to create a shared strategy. Here’s what the document looks like today, and we welcome your comments, questions and feedback as we work towards version 1.0.



A 4 star classification-scheme for linked open cultural metadata

Publishing openly licensed data on the Web and contributing to the Linked Open Data ecosystem can have a number of benefits for libraries, archives and museums.

Driving users to your online content (e.g., by improved search engine optimization);
Enabling new scholarship that can only be done with open data;
Allowing the creation of new services for discovery;
Stimulating collaboration in the library, archives and museums world and beyond.

In order to achieve these benefits libraries, museums and archives are faced with decisions about releasing their metadata under various open terms. To be open and useful as linked data requires deliberate design choices and systems must be built from the beginning with openness and utility in mind. To be useful for third parties, all metadata made available online must be published under a clear rights statement.

This 4-star classification system arranges those rights statements (e.g. licenses or waivers) that comply with the relevant conditions (2-11) of the open knowledge definition (version 1.1) by order of openness and usefulness: the more stars, the more open the metadata is and the easier it is to use in a linked data context. Libraries, archives and museums wanting to contribute to the Linked Open Data ecosystem should strive to make their metadata available under the most open instrument that they are comfortable with that maximizes the data’s usefulness to the community.

Note: This system assumes that libraries, archives and museums have the required rights over the metadata to make it available under the waivers and licenses listed below. If the metadata you want to make available includes external data (for example vocabularies) you may be constrained by contract or copyright to release the data under one of the licenses below.
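
As a compact summary of the ordering this draft proposes, the four levels can be written down as a small lookup. This is only an illustrative sketch: the function name `star_rating`, the `linkback_attribution` flag and the short license labels are assumptions made for the example, not identifiers defined by the scheme, and license versions are ignored.

```python
from typing import Optional

# Short labels for the instruments named in the draft scheme.
# The labels themselves are an assumption made for this sketch.
PUBLIC_DOMAIN = {"CC0", "ODC-PDDL", "Public Domain Mark"}
ATTRIBUTION = {"CC-BY", "ODC-BY"}
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODbL"}


def star_rating(rights: str, linkback_attribution: bool = False) -> Optional[int]:
    """Return the number of stars the draft scheme gives a rights statement.

    `linkback_attribution` only matters for attribution licenses: it says
    whether the licensor accepts a linkback as sufficient attribution.
    Returns None for anything the scheme does not list.
    """
    if rights in PUBLIC_DOMAIN:
        return 4
    if rights in ATTRIBUTION:
        return 3 if linkback_attribution else 2
    if rights in SHARE_ALIKE:
        return 1
    return None


if __name__ == "__main__":
    for label, linkback in [("CC0", False), ("CC-BY", True), ("CC-BY", False), ("ODC-ODbL", False)]:
        print(label, linkback, star_rating(label, linkback))
```

Returning None rather than zero for unlisted statements is a deliberate choice in this sketch: the scheme only ranks instruments that already meet the open knowledge definition, so anything else is outside it rather than at the bottom of it.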

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is not contingent on anything
metadata can be combined with any other metadata set (including closed metadata sets)

as a provider:

you are waiving all rights over your metadata so it can be most easily reused
you can specify whether and how you would like acknowledgement (attribution or citation, and by what mechanism) from users of your metadata, but it will not be legally binding

This option is considered best since it requires the least action by the user to reuse the data, and to link or integrate the data with other data. It supports the creation of new services by both non-commercial and commercial parties (e.g. search engines), encourages innovation, and maximizes the value of the library, archive or museum’s investment in creating the metadata.

★★★ Attribution License (CC-BY / ODC-BY) when the licensor considers linkbacks to meet the attribution requirement

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution by linkback to the data source
metadata can be combined with any other metadata set, including closed metadata sets, as long as the attribution link is retained

as a provider:

you get attribution whenever your data is used

This option meets the definition of openness, but constrains the user of the data by requiring them to provide attribution (in the legal sense, which is not the same as citation in the scholarly sense). Here, attribution is satisfied by a simple, standard Web mechanism from the new data product or service. By using standard practice such as a linkback, attribution is satisfied without requiring the user to discover which attribution method is required and how to implement it for each dataset reused. Note that there are other methods of satisfying a legal attribution requirement (see below) but here we propose a specific mechanism that would minimize the effort needed to use the data if the LAM community collectively agrees to it. Also note that even this simple (ideally shared) attribution method could prevent some applications of linked data if linkbacks are required by many datasets from many sources.

★★ Attribution License (CC-BY / ODC-BY) with another form of attribution

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution in a way specified by the provider
metadata can be combined with any other metadata set (including closed metadata sets)

as a data provider:

you get attribution whenever your data is used by the method you specify

This option meets the definition of openness in the same way as the linkback attribution option, but requires the user to provide attribution in some way other than a linkback, as specified by the data provider. The provider could specify an equally simple mechanism (e.g. by retention of another field, such as ‘creator’ from the original metadata record) or a more complex mechanism (e.g. a scholarly citation in a Web page connected to the new data product or service). The disadvantage of this option is that the user must discover what mechanism is wanted by the particular data provider and how to comply with it, potentially needing a different mechanism for each dataset reused. For large-scale open data integration (e.g. mashups) this option is difficult to implement.

★ Attribution Share-Alike License (CC-BY-SA/ODC-ODbL)

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution in a way specified by the provider
metadata can only be combined with data that allows re-distributions under the terms of this license

as a provider:

you get attribution whenever your data is used
you only allow use of your data by entities that also make their data available for open reuse under exactly the same license

This option meets the definition of openness but potentially limits reuse of data if more than one dataset is reused and each dataset has an associated Share-Alike license. Under a Share-Alike license, the only way to legally combine two datasets is if they share exactly the same SA license, since most SA licenses require that reused data be redistributed under exactly the same license. If the source datasets had different Share-Alike licenses originally (e.g. CC-BY-SA and ODC-ODbL) then there is no way for the user to comply with the requirements of both source data licenses, so this option only allows users to link or integrate data distributed under one particular SA license (or one SA license and any of the other license or waiver options above). In the LAM domain, where significant value is created by combining datasets, the Share-Alike license requirement severely reduces the utility of a dataset.
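
The combination rule described in the paragraph above can be sketched as a simple check: a set of datasets can be integrated only if they carry at most one distinct Share-Alike license between them. This is a rough illustration of the reasoning in this draft, not legal guidance. The function names and license labels are hypothetical, and the sketch ignores license versions and any compatibility provisions that particular licenses may contain.

```python
from typing import Iterable, Set

# Share-Alike instruments named in the draft; the labels are an assumption.
SHARE_ALIKE = frozenset({"CC-BY-SA", "ODC-ODbL"})


def share_alike_licenses(licenses: Iterable[str]) -> Set[str]:
    """Return the distinct Share-Alike licenses found among the source datasets."""
    return {name for name in licenses if name in SHARE_ALIKE}


def can_combine(licenses: Iterable[str]) -> bool:
    """Apply the draft's rule: combining is possible only when the sources
    use at most one distinct Share-Alike license. Public domain waivers and
    attribution licenses do not restrict the combination in this model."""
    return len(share_alike_licenses(licenses)) <= 1


if __name__ == "__main__":
    # One SA license mixed with more open instruments: allowed by the rule.
    print(can_combine(["CC0", "CC-BY", "CC-BY-SA"]))  # True
    # Two different SA licenses: neither redistribution requirement can be met.
    print(can_combine(["CC-BY-SA", "ODC-ODbL"]))  # False
```

When a single Share-Alike license is present, the combined result would itself have to be redistributed under that license, which is why one such dataset in a mashup effectively sets the terms for the whole.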

