Jan 10 2012

Recently Published Reports

I’ve just seen two recently published reports that will certainly be of interest, and thought I’d share them here:

The Stanford Linked Data Workshop Technology Plan. “If instantiated at several institutions, will demonstrate to end users the value of the Linked Data approach to recording machine operable facts about the products of teaching, learning, and research. The most noteworthy advantage of the Linked Open Data approach is that it allows the recorded facts, in turn, to become the basis for new discovery environments.” Personally, I love their push for CC0, and I also really liked their call to publish early and often rather than wait until things are perfect, along with their recommended workflows. [Thanks to Jerry Persons for feeding input from the LODLAM Summit into the Stanford working group and to this technology plan. Thanks to Rachel Frick for flagging this with #LODLAM on Twitter]

Proceedings of the 1st International Workshop on Semantic Digital Archives, Berlin, Sept. 29, 2011. A fantastic collection of papers from this workshop, and a really good preface that summarizes the meeting, each of the papers, and the growing presence of Linked Data in libraries, archives and museums. [Thanks to Johan Oomen for flagging this with #LODLAM on Twitter]


Jan 3 2012

LODLAM-NZ Round Up

The following was cross-posted on the Open Knowledge Foundation blog on 12/20/2011.

I recently traveled to Wellington, New Zealand to take part in the National Digital Forum of New Zealand (#ndf2011), which was held at the national museum of New Zealand, Te Papa. Following the conference, the amazing team at Digital NZ hosted and organized a Linked Open Data in Libraries, Archives & Museums unconference (#lodlam). The two events were well attended by Kiwis as well as a large number of international attendees from Australia and a few from as far away as the US, UK and Germany.

When it comes to innovative digital initiatives in cultural heritage, the rest of the world has been looking to New Zealand and Australia for some time. Federated metadata exchange and search have been happening across institutions in projects like Digital NZ and Trove. I was able to learn more about the Digital NZ APIs as well as those from Museum Victoria, Powerhouse Museum, and State Records New South Wales. In fact, the remarkable proliferation of APIs in Australasia has allowed us to consider the possibilities of using Linked Open Data to harvest and build upon data held in databases at multiple institutions.

Given the extent to which tools for opening access to data have been developed here, I was surprised by the level of frustration that exists around copyright issues. There’s a clear sense that government is moving too slowly in making materials available to the public with open licensing. We talked a lot about the idea of separately licensing metadata and assets (i.e. information about a photo vs the digital copy of the photo), as has been happening across Europe and increasingly the United States. There are strong advocates within the GLAM sector (galleries, libraries, archives & museums) here, and demonstrating use cases utilizing openly licensed metadata will go far in helping to move those conversations forward with policy makers.

To that end, a session was convened to explore the possibilities of an international LODLAM project focused on World War I, the centennial commemoration of which is fast approaching. The Civil War Data 150 project we’ve been slowly moving forward in the US may provide a rough framework to build from. At least half a dozen libraries, archives and museums have already expressed interest in participating in a WWI project. First steps may be identifying openly licensed datasets to be contributed, key vocabularies and ontologies to apply, and ideas for visualizations that would leverage the use of Linked Open Data. For anything to happen here, someone will need to take the lead in organizing (not me, we’re still trying to build some tools around the Civil War Data 150 concept!). Good notes were posted on the LODLAM blog about the conversation and how to convene future conversations. Anyone who gets involved with this, please spread the word and keep the LODLAM community apprised of your progress and ways to contribute.

We also had a workshop on using Google Refine by Carlos Arroyo from the Powerhouse Museum, with props to the FreeYourMetadata crew. Some lively sessions dug into just what Linked Data is and how it works, along with some of its pitfalls and potential. Another session explored the importance and potential of local vocabularies, and how they can contribute to Linked Data implementations. One great example was the vocabularies surrounding Maori artifacts (Taonga) at Te Papa, and how publishing those datasets can help other museums around the world better describe and provide digital access to Maori collections.

As I’ve attended various LODLAM meetups since June, I’ve noticed clear momentum from one to the next as these conversations progress rapidly, with those further along helping those of us just learning. After LODLAM-DC I realized the importance of including library, archive, and museum vendors in all of these gatherings. At LODLAM-NZ I could see the potential of bringing together developers in the GLAM sector and those using Linked Data in commercial settings. In places like San Francisco, where commercial interests are already leading the charge on Linked Data (which is not a bad thing) and there’s an active Semantic Web developer community, the GLAM sector may be playing catch-up. But the sheer number of datasets potentially available as open data from the GLAM sector, together with expertise in managing massive amounts of structured data, creates a space ripe for collaboration and experimentation, and these lines will continue to blur.


Dec 18 2011

Time and Tide

There are benefits to posting quickly after an event (you get the word out fast), and maybe there are benefits to being so swamped with work that you get time to mull. It has been nearly a month since LODLAM-NZ happened in Wellington, and the Thursday afternoon session on vocabularies (Dec 1) was one I had been looking forward to for some time. Tim Wray (a PhD student at the University of Wollongong) wrote in an email “I was wondering if you like to contribute your part – particularly your humanist / social perspective on the issue of vocabularies and alignment”. Tim is going to explain the discussion in that session from a computational linguistics point of view and his own perspective, so this post is food for the culture vultures and semi-technical cake eaters of the GLAM sector keen on linked open data.

Francesco Miglionico - Sailing Ship Ploughing the Waves | scarygami | CC BY-SA 2.0  | http://www.flickr.com/photos/scarygami/4035896864/

These were the topics the conversation ranged through in the vocabularies session:

  • curatorial questions around selecting vocabularies
  • vocabulary as cultural artefact
  • cultural questions around automating vocabularies
  • roles of curation and linguistic computation in aligning vocabularies

The discussion that afternoon started solidly thanks to Stuart Yeates from the New Zealand Electronic Text Centre (NZETC), who led the discussion and called for input on what vocabularies people might want to use in their linked open data. You can see his post Metadata vocabularies LODLAM NZ cares about on his blog Open Source Exile, along with the breakdown of the vocabularies the group quickly shouted out. A quick glance at these says a lot about the New Zealand cultural context, the Trans-Tasman common areas of interest, and the strong influence and immense value of the work still being done by the Library of Congress in the USA.

The point about semantics (and one that needs many more, and more qualified, perspectives to answer well) is that what can be interpreted literally is not always meaningful culturally. The title of this post was chosen deliberately, drawing on an old adage. Language and its usage, like tides, ebb and flow, and vocabularies provided via linked open data need somehow to allow for this shifting of meaning over time while still being able to assert levels of accuracy at a point in time. The only word I could find to describe this was attenuation: how long a note sounds or, in semantic terms, how long the signal of the original meaning lingers over time. Concepts like tide marks, water measures, lunar calendars and sea currents, all means of triangulating a position and testing for smooth sailing, play into this idea of how to programme for linked open data; so do concepts like scales, tonal and atonal music, harmony, dissonance, assonance, signals, and top, middle and low notes. Does the cake seem too thick with metaphorical icing? Not quite.

Cawan Cake | n-o-n-o | CC BY 2.0 | http://www.flickr.com/photos/n-o-n-o/3280580620/

A while back I watched Guus Schreiber’s presentation Open data for the cultural masses – Mapping and the Europeana Semantic Layer, on Amalgame and the alignment of vocabularies, and he seemed to be talking about levels of accuracy in asserting vocabulary alignments. Forgive me, I was listening with one ear whilst working, but I got so excited by what I seemed to be hearing (with a non-computer-science ear) that I sent the link along to colleagues at the Australian National Data Service and to one of the directors there who is a computational linguist, Andrew Treloar. There are several of that ilk on the ANDS team: Adrian Burton is a technologist and linguist, and so is Nick Nicholas. Next year is going to be hot stuff for the @andsdata team on the linked open data front. The linked open data services for the Party (researchers and research organisations) and Research Activity (projects, grants, funds etc.) infrastructures are going into action and include a vocabulary service for the Australian and New Zealand Standard Research Classification (ANZSRC), so watch that space for some Trans-Tasman action.
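To make the idea of graded alignment a little more concrete: the W3C SKOS vocabulary already distinguishes stronger and weaker mapping claims, from skos:exactMatch through skos:closeMatch to the hierarchical skos:broadMatch/skos:narrowMatch and the looser skos:relatedMatch. The Python sketch below is not Amalgame’s method; it simply ranks those properties (the numeric ranks are an assumption, not part of SKOS) and filters a handful of hypothetical example.org alignments by a minimum strength.

```python
# SKOS namespace (W3C Simple Knowledge Organization System).
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed ranking of SKOS mapping properties, strongest claim first.
# SKOS defines the properties but not these numbers; they are illustrative.
STRENGTH = {
    SKOS + "exactMatch": 3,
    SKOS + "closeMatch": 2,
    SKOS + "broadMatch": 1,
    SKOS + "narrowMatch": 1,
    SKOS + "relatedMatch": 0,
}

# Hypothetical alignments from a local vocabulary to an external one.
EX = "http://example.org/"
ALIGNMENTS = [
    (EX + "local/ship", SKOS + "exactMatch", EX + "ext/ship"),
    (EX + "local/vessel", SKOS + "closeMatch", EX + "ext/ship"),
    (EX + "local/sailing-ship", SKOS + "broadMatch", EX + "ext/ship"),
    (EX + "local/tide", SKOS + "relatedMatch", EX + "ext/moon"),
]

def aligned_at_least(alignments, minimum):
    """Return (local, external) pairs whose mapping strength is >= minimum."""
    return [(s, o) for s, p, o in alignments if STRENGTH.get(p, -1) >= minimum]

for local, external in aligned_at_least(ALIGNMENTS, 2):
    print(local, "->", external)
```

Lowering the threshold trades precision for coverage, which is one rough way to think about how far an alignment can be trusted at a given point in time.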

bee | CC BY-NC 2.0 | http://www.flickr.com/photos/bee/2765331138/

The group began a brief discussion about how people choose vocabularies, and the librarians in the room all had ‘interesting’ or ‘old-fashioned’ looks on their faces. Perhaps it is so obvious to library and information workers that vocabularies are social, political, cultural and so on that the choice to use one is based not just on information theory but also on cultural theory: an act of data curation. Tim asked if ontologies were cultural artefacts and there was a resounding ‘yes’ from within the group. I’m not sure who was louder, Adrian Kingston, Stuart Yeates, Chris Todd or Sydney Shep. It is gratifying that those passionate responses came from a museum technologist, a computer scientist in the digital humanities, a leading cataloguer and an academic in library and information studies. Maybe it is because the GLAM sector has used ontologies, both schematic and semantic, for such a long time, and because for some of the humanities computing people in the room this was all ‘understood’, that we’re on to the next challenge: how to make this work? In any event, it seemed powerful to me that culture, linked open data, vocabularies and curation were combined in one discussion, and that people felt strongly about resolving these questions from each domain, effectively and together. This is what I quietly call digital cultural heritage: working out the design, driving technological design with cultural questions, testing technological techniques against cultural questions, and on and on. Judgements about which ontology or standard to follow are the bread and butter of a great deal of the information practice in GLAM sector work by web developers, programmers, registrars, curators, cataloguers and archivists. The next challenge is linked open data: how to use, reuse and align vocabularies to expose cultural collection data in expected and new ways, to new audiences and minds. In two words, awesome challenge.

The discussion at the end, around the capacity to use Kupu, a Māori thesaurus developed by the National Library of New Zealand, was informative and instructive. Years ago a subject database called ‘KUPU’, provided by the National Library of New Zealand, was one of the few means to begin to access resources described in Māori (subject access in the database was built around a thesaurus called He puna kupu Māori : hei tohu-ā-kupu ; an indexing thesaurus in the Māori language). Precious few means were available at that time (the 1990s) to provide access to resources in Māori and English with appropriate Māori intellectual access points. More information tools have since been developed, and we learned from Chris Todd of the National Library that Kupu needs to be updated and needs more work to ensure that the intellectual access the thesaurus provides is both meaningful and culturally appropriate. So the tides have shifted, and in a positive way, yet they are undiminished: the demand for enriched Māori intellectual access to information resources remains. The group were voting keenly for the Māori Subject Headings (maintained by the National Library of New Zealand) to be published as linked open data, and further thoughts were exchanged around the use of name authorities.


Designs of ornamentation on Maori rafters. Nos 13, 14, 15. (1890s). Herbert William Williams 1860-1937. Alexander Turnbull Library, National Library of New Zealand

Time and tide wait for no one… so this is an urging to keep in mind a thought Michael Lascarides offered in his keynote presentation at the National Digital Forum: be mindful of the new and better challenges for the GLAM community to work on, the ‘Why | What | How | What If/Then’ questions of culture, linked open data and vocabularies.

Note: used as nouns, kupu means ‘word’ or ‘vocabulary’ and upoko means ‘head’ in English (meanings from the Māori Dictionary online).


Dec 5 2011

The Web of Assertions

I attended the LODLAM meeting last week in Wellington, New Zealand as a relative newcomer to the concept of linked open data.

One of the questions the meeting tried to answer was “What are the best use cases for linked data?” or, put another way, “What use case is compelling enough to warrant our investment in creating linked data?”

One comparison made* was that of Charles Darwin making his notes on the voyage of The Beagle. Did Darwin record merely a list of points in a tabular notebook? No, he recorded his observations in journal form, and his ideas were slowly organised over years. In his journal of assertions, Darwin found connections and common themes emerging over time.

Linked open data is a series of assertions in the form of subject/predicate/object. Ideally these assertions are made not within isolated sets of data but across open sets of data on the web. Complex topics are made up of assertions of varying degrees of authority, and in that sense linked open data is a better representation of knowledge on the web.
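As a rough illustration of assertions carrying varying degrees of authority, here is a minimal Python sketch that stores triples alongside a source and a confidence score and ranks answers by that score. The ex: names, the sources and the 0 to 1 scores are all hypothetical; the London birthplace is a deliberately made-up weak claim, and real linked data would use full URIs and a provenance vocabulary rather than a tuple field.

```python
from collections import namedtuple

# One assertion: a subject/predicate/object triple plus who asserted it and a
# hypothetical confidence score between 0 and 1 (the scoring is an assumption).
Assertion = namedtuple("Assertion", "subject predicate obj source confidence")

assertions = [
    Assertion("ex:Darwin", "ex:sailedOn", "ex:HMS_Beagle", "ex:journal", 0.95),
    Assertion("ex:Darwin", "ex:bornIn", "ex:Shrewsbury", "ex:journal", 0.9),
    # Deliberately weak, made-up claim from a less authoritative source.
    Assertion("ex:Darwin", "ex:bornIn", "ex:London", "ex:webPage", 0.2),
]

def objects(assertions, subject, predicate, min_confidence=0.0):
    """Objects asserted for (subject, predicate), strongest assertion first."""
    hits = [a for a in assertions
            if a.subject == subject and a.predicate == predicate
            and a.confidence >= min_confidence]
    return [a.obj for a in sorted(hits, key=lambda a: a.confidence, reverse=True)]

print(objects(assertions, "ex:Darwin", "ex:bornIn"))       # ['ex:Shrewsbury', 'ex:London']
print(objects(assertions, "ex:Darwin", "ex:bornIn", 0.5))  # ['ex:Shrewsbury']
```

Keeping the weak assertion rather than deleting it lets a consumer decide how much authority it needs, which is closer to Darwin’s journal than to a cleaned-up table.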

There are problems with each of the three parts of the linked open data triple. The subject may require disambiguation: which “Hamilton” are we referring to? We need some consistency in the language used for the predicate: how do we describe relationships between people? The objects that we refer to may not be trustworthy: how do we choose sets of data we are confident referring to? Perhaps these problems with linked open data aren’t problems after all; it’s a system that reflects the uncertainties of the real world.
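The subject problem is usually tackled by pointing at a URI rather than a string. In the sketch below, placeholder example.org URIs stand in for entries in a shared authority file; a label lookup for “Hamilton” returns several candidates, and extra properties narrow it to the one URI a triple can then use unambiguously.

```python
# Placeholder URIs standing in for entries in a shared authority file.
ENTITIES = {
    "http://example.org/place/hamilton-nz": {"label": "Hamilton", "type": "Place", "country": "NZ"},
    "http://example.org/place/hamilton-on": {"label": "Hamilton", "type": "Place", "country": "CA"},
    "http://example.org/person/hamilton-a": {"label": "Hamilton", "type": "Person", "country": None},
}

def candidates(label, **constraints):
    """URIs whose label matches and whose properties satisfy every constraint."""
    return sorted(
        uri for uri, props in ENTITIES.items()
        if props["label"] == label
        and all(props.get(key) == value for key, value in constraints.items())
    )

print(len(candidates("Hamilton")))                         # 3: the string alone is ambiguous
print(candidates("Hamilton", type="Place", country="NZ"))  # ['http://example.org/place/hamilton-nz']
```

Once a URI is chosen, every triple about that Hamilton refers to the URI, so the ambiguity is resolved once rather than every time the string is read.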

This reminded me of Aaron Cope’s paper “The Interpretation of Bias”. By reverse-geocoding photo locations in Flickr, maps could be drawn which better represented the disputed boundaries of each named place.
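The general idea can be sketched in a few lines of Python: group photo locations by the place name each was reverse-geocoded to, then draw a shape around each group. The convex hull used here (Andrew’s monotone chain) is a deliberately simple stand-in, not the technique Flickr actually used, and the photos, place names and (x, y) coordinates are made up.

```python
from collections import defaultdict

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it repeats the other chain's first point.
    return lower[:-1] + upper[:-1]

# Hypothetical photos: (x, y) location plus the place name it was geocoded to.
photos = [
    ((0, 0), "Northside"), ((2, 0), "Northside"), ((1, 1), "Northside"),
    ((2, 2), "Northside"), ((0, 2), "Northside"),
    ((3, 0), "Riverside"), ((4, 1), "Riverside"), ((3, 2), "Riverside"),
]

by_place = defaultdict(list)
for point, place in photos:
    by_place[place].append(point)

shapes = {place: convex_hull(pts) for place, pts in by_place.items()}
print(shapes["Northside"])  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Where the shapes for two names overlap, the overlap marks the kind of grey edge that a single official boundary would hide.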

In the same way, linked open data may give us a way to better describe grey knowledge – all of the objects that are grey around the edges.

* please comment if you recall who made this comparison


Nov 30 2011

LODLAM-NZ grid


Thanks @fogonwater!


Nov 30 2011

Anzac WW100 session notes LODLAM NZ

These notes are rough and may not have captured everything. The session started with a discussion about URL structures, and whether and how to use regimental or service numbers in those URLs. Regimental numbers are not always unique, and in Australia some First World War servicemen and women were not issued numbers. There are also duplicates, and some individuals were allocated multiple numbers.
The Auckland War Memorial Museum’s Cenotaph project has delivered a web page for each serviceman and woman. Many include links to digitised service records at the archives. There is good metadata about place, people, things and ships.

With place names, we need to share a vocabulary to describe the places and link them to geocodes. Auckland War Memorial Museum also has some 1,200 links from Wikipedia to its collection, especially to the official histories.
There was discussion about the limits of using latitude and longitude, as battles sometimes took place over large areas. It is possible to use a regional scale or bounding boxes to specify an area. The metaphor of place was mentioned: for instance, Heaven is a place with no latitude and longitude. Kiwi troops in WW2 used “Hitler” as a word for the enemy, and for where the enemy was. Some place names are specific to a time, e.g. Western Front, Baby 700, Quinn’s Post. How do you take that particularity and scale it up to the concepts? How do you show the links? There is no need to make new vocabularies; we can use what already exists.
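A bounding box is the simplest way to record a battle as an area rather than a point. The sketch below computes the smallest box around a set of recorded points and tests whether another point falls inside it; the coordinates are placeholders, not real battle positions, and the class assumes a box never crosses the 180° meridian.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    """Axis-aligned box in decimal degrees.

    Assumes the box does not cross the antimeridian (180 degrees longitude).
    """
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)

def bbox_of(points):
    """Smallest box enclosing a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return BBox(min(lats), min(lons), max(lats), max(lons))

# Placeholder coordinates, NOT real positions, for one hypothetical engagement.
points = [(10.0, 20.0), (10.5, 20.4), (10.2, 20.9)]
battle = {"name": "Hypothetical engagement", "extent": bbox_of(points)}

print(battle["extent"])
print(battle["extent"].contains(10.3, 20.5))  # True
print(battle["extent"].contains(11.0, 20.5))  # False
```

A box says nothing about metaphorical places like the ones above, which is one reason a shared place vocabulary still matters alongside the geocodes.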

Jon talked about the process followed with the American Civil War project, and said it was useful to ask: What vocabularies can we use? What apps are we envisaging, and how will people use them? How do we get the datasets together? There is policy work to be done here within organisations.
For the Civil War work, they use Freebase as a Rosetta Stone, mapping places to it and using things like Library of Congress Subject Headings. The suggestion is to start simple, with geocodes. Start with the easiest things, maybe battles? Consider what it would look like to map a campaign, and how that map would be used.
Then we started to talk about how we match up work on the two sides of the Tasman. Do we use Wikipedia? Make a commitment to publish semantically marked-up data. We don’t have to use Wikipedia; look at how what we have compares to Freebase, and publish data as a CSV file so that others can use it. (As a sidebar, though this was not discussed, we have to remember to tell people when we publish this material.)

The discussion turned back to URLs for a moment, while we looked again at a structure someone had proposed: http://domain/ww1/NZ/person/identifier. One thought was to go with the dumbest possible format for a persistent URL, and that the URL above might be better used as a URI. Do we need a tool to help small organisations link what they have with what the larger organisations are doing?
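One way to read the ‘dumbest possible format’ suggestion is to mint opaque sequential identifiers under the proposed path and keep the service number as data rather than as the key. This hypothetical Python sketch does that; `example.org` stands in for the unspecified domain, the record keys and service number are invented, and a real minter would need durable storage to be truly persistent.

```python
import itertools

class UriMinter:
    """Hands out stable, opaque URIs, one per record (hypothetical sketch).

    The path follows the structure floated in the session,
    http://domain/ww1/NZ/person/identifier; `base` stands in for `domain`.
    A real service would persist the mapping (e.g. in a database) so that
    identifiers survive restarts; this in-memory dict does not.
    """

    def __init__(self, base: str, country: str, kind: str):
        self.prefix = f"{base.rstrip('/')}/ww1/{country}/{kind}/"
        self._ids = {}
        self._next = itertools.count(1)

    def uri_for(self, record_key: str) -> str:
        # Key on the institution's own record key, not the service number,
        # since service numbers can be missing, duplicated or multiple.
        if record_key not in self._ids:
            self._ids[record_key] = next(self._next)
        return self.prefix + str(self._ids[record_key])

minter = UriMinter("http://example.org", "NZ", "person")

# Two hypothetical people sharing a (made-up) service number still get
# distinct URIs, and asking again for the same record returns the same URI.
records = [{"key": "rec-A", "service_number": "12/345"},
           {"key": "rec-B", "service_number": "12/345"}]
for r in records:
    print(minter.uri_for(r["key"]), r["service_number"])
print(minter.uri_for("rec-A"))  # http://example.org/ww1/NZ/person/1
```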

Working out where to start is an issue, and there was discussion about how we could use the Commonwealth War Graves Commission data. The key thing is a persistent URL. An organisation can go a long way down the track without talking to others, given an internal commitment to persistent URIs. People, places, events and ships can all have a page, and then we can be ready to start linking. We have our own subjects, and out in the world are the objects; we just have to agree on the verbs. First publish the data and make it available, then work on linking. What happens if we don’t do this? To have any chance of guiding what happens to our data, we need to be working to get it out and to work with other organisations in this WW100 domain. We also need to let people do things with the data: make it available and let people use and re-use it. Put it within some sort of context, and make use of the community of interest that we have here (that is, the organisations represented at this meeting, as well as others who have also expressed interest on both sides of the Tasman). Once we have data in a CSV file, other people can make the RDF.
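The point that others can make the RDF once data is in a CSV file can be illustrated with a minimal converter from CSV rows to N-Triples, a line-based W3C RDF serialisation. The column names, base URI and serviceNumber predicate are hypothetical; foaf:name is a real FOAF property. In practice an RDF library such as rdflib would handle serialisation more robustly; this sketch only escapes backslashes, quotes and line breaks.

```python
import csv
import io

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
SERVICE_NUMBER = "http://example.org/vocab/serviceNumber"  # hypothetical predicate

def literal(value):
    """Quote a string as an N-Triples literal, escaping common special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def csv_to_ntriples(csv_text, base):
    """Turn rows with id, name and optional service_number columns into triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}{row['id']}>"
        lines.append(f"{subject} <{FOAF_NAME}> {literal(row['name'])} .")
        if row.get("service_number"):
            lines.append(f"{subject} <{SERVICE_NUMBER}> {literal(row['service_number'])} .")
    return "\n".join(lines)

# Made-up sample rows; the second has no service number.
sample = """id,name,service_number
1,Example Person,12/345
2,"Another ""Quoted"" Person",
"""
print(csv_to_ntriples(sample, "http://example.org/ww1/NZ/person/"))
```

Publishing the CSV and letting others run something like this keeps the first step cheap for smaller organisations.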

We need to work out a way to keep in contact. Do we need to set up a Google Group?


Oct 14 2011

LODLAM SF Summit Raw Session Notes


Thanks to Kris Carpenter Negulescu and the team at the Internet Archive, as well as Asa Letourneau, we’re adding an archive of session notes recorded on easel pads during the SF Summit. The images are available through the Public Record Office Victoria Flickr set, and you can also download a Word doc or .rtf version of the transcriptions. The transcriptions are numbered, though I don’t think we’ve mapped the images to the numbered transcriptions; doing so would be helpful.


Oct 11 2011

LODLAM Videos & Presentations

There have been a few requests lately for LODLAM presentations, so I wanted to consolidate some of the resources people have pointed out over on the listserv. This is by no means exhaustive, so please feel free to add more in the comments!

Recent Video:

I gave this presentation at the Smithsonian Institution as part of LODLAM-DC. It’s a general intro to the concepts, and we did it specifically to add to our forthcoming Summit proceedings toolbox. The slides are available there too; feel free to use them.

The FreeYourMetadata.org crew just released a video as well as a series of screencasts targeted at librarians and metadata managers, providing some easy ways to start exploring publishing metadata as Linked Data.

Presentations:

Here’s one hot off the press, Dublin Core 2011 Keynote Presentation: Towards Linked Data for Libraries, Archives & Museums.

Of course, a lot of people are using Slideshare to host their presentations, and if you’re doing so, please use the lodlam tag to describe your presentation, just like the #lodlam hashtag for Twitter. Just by doing a quick search for lodlam, you’ll see a good number show up. You’ll pick up a few more if you search for lod-lam (I so regret ever using that hyphen!).

Finally, Jodi Schneider pointed out this fantastic resource of presentations spanning several years, from the W3C LLD incubator.

Thanks to everyone for sharing your presentations and pointing out the ones you’re seeing out in the wild!


Oct 1 2011

LODLAM on the Radio

Mia Ridge was featured on the September 27 Outriders radio show, “BBC Radio 5 live’s programme dedicated to exploring the frontiers of the web.” In addition to giving a rundown on what Linked Open Data in libraries, archives, and museums entails, she was also able to promote the upcoming LODLAM-London meetup with Open Knowledge Foundation, and the Metadata Licensing Clinic.

You can download the podcast or stream it here; just look for the 27 September show, “Linked to Relaxation.”


Sep 28 2011

Free Your Metadata

I just found out about the fabulous Free Your Metadata project this week, and am very excited to see these kinds of actionable workshops popping up around the world. Big ups to these guys for developing screencasts that show how people can use free and open source tools to create Linked Data from library, archive and museum metadata now!

Seb Chan posted an interview with Seth van Hooland with the catchy title: Things clever people do with your data #65535: Introducing ‘Free Your Metadata’ which is well worth a read.