A few weeks ago, Jon Voss and I had the pleasure of speaking at the ‘Libraries, Media & The Semantic Web’ event hosted by the BBC Academy, along with folks from the New York Times, the BBC, Google in the guise of Schema.org, and KONA. The event was organised by the Lotico London Semantic Web Group. I’ve written a fairly comprehensive post about the event over on the Linking Lives blog, including videos of all the talks, for those who want to read or hear more.
The Melbourne LODLAM event is shaping up. Put a slot in your diary for Tuesday 17th April!
The day will start with a series of lightning talks (5-10 mins) from people active in the field (including Mia Ridge @mia_out), then move to a structured discussion of practical applications in the Victorian and national context (including collaboration around WWI/ANZAC material). More details to come closer to the date.
Lightning talks are being arranged. If you have a project underway, please get in touch with Eleanor Whitworth @elewhitworth – the more the merrier!
Session details are: 9.30am – 1.00pm (lunch provided)
Date: Tuesday 17th April
Place: Melbourne Museum, Carlton Gardens
RSVP: by 10th April to Eleanor Whitworth, Senior Arts Officer/Content Curator, Culture Victoria (Monday – Wednesday)
Email: eleanor[dot]whitworth[at]dpc[dot]vic[dot]gov[dot]au or @elewhitworth
Australian politics might dominate the landscape in Canberra during the day and politicians swell the bars in the evening, but linked open data helps anyone to make good connections!
The Canberra Linked Open Data – Libraries, Archives, Museums (LODLAM) minibar will be held on Tuesday 27th March, 2012 from 5.30-6.30pm. We will meet in the Fellows Bar and Cafe, University House at the Australian National University.
If you are local to Canberra and work on metadata or web development in the library, archives, museum and gallery world, are a gov2 enthusiast, or are attending the Australasian Digital Humanities 2012 conference, you may like to come along to the LODLAM minibar and meet your peers. The Fellows Bar at University House is about a 5-minute walk from the Shine Dome, where the conference is being held.
The event is a means to:
- Get to know each other – let’s all get a drink from the bar and do some introductions
- Get some shared understanding – let’s collate some information about what people are doing, ask questions and do some quick brainstorming
Afterwards, LODLAM attendees may like to head out to dinner in smaller groups to continue the conversation about linked open data (and perhaps digital humanities uses of LOD too).
We had about 18 people gather together to talk linked open data – libraries, archives, museums. From the University of Queensland, Anna Gerber and Kerry Kilner; from the Australian War Memorial, Roby Van Dyk, Adam Bell and Liz Holcolmbe; from the University of Melbourne eScholarship Research Centre, Gavan McCarthy; from the University of Western Sydney, Peter Sefton; from Deakin University, Deb Verhoeven; from Victoria University of Wellington, Sydney Shep; from Auckland War Memorial Museum, Russell Briggs; and last but definitely not least, Mia Ridge, PhD candidate at the Open University (UK), and Oyvind Eide, PhD candidate at King’s College, London (UK). There were a handful of others, but I think the pong from the scratch and sniff ice cream stickers was affecting my capacity to memorise who was there… if I’ve missed you, feel free to let me know or correct me.
The upshot: we shared our interests, questions and potential projects, and a desire to regroup. Here’s the takeaway:
- A number of people in the group (from Australia) are working on HuNI (Humanities Networked Infrastructure), a NeCTAR-funded virtual laboratory project that aims to start in May and will run for 2 years. Linked data is going to be a key aspect of the project, which is being led out of Deakin University.
- There is another NeCTAR-funded research tools project, Aust-ESE, which will involve linked data and is led out of the University of Queensland.
- Anna Gerber talked about how the ITEE eResearch Group at the University of Queensland has been focusing efforts on the use of RDF and linked data in its open annotation work.
- Gavan McCarthy talked about how Melbourne eScholarship has been using linked data in their projects.
- Peter Sefton talked about how he’s been interested in, and working with, linked data in his application development work.
- The Trans-Tasman ‘museums folks’ talked about an ongoing and strengthening collaboration around WWI data, so that they can contribute to the WWI centenary commemorations in 2014.
- A Melbourne #lodlam date was set for 17th April. More information is coming: check with @elyw or @elewhitworth, or watch for blog posts soon.
- A Brisbane #lodlam date was mooted for 26th August, to coincide with a possible THATCamp and the 2012 International Council on Archives Congress. Check with @wragge and @annagerber, or watch for more blog posts soon.
- There was a clear idea for a Sydney #lodlam event in late October/early November, aligned with eResearch Australasia 2012, with 3 sessions: a tech session, a content session and a mixed session, so that all parties (developers, scholars, collection managers, etc.) can get their heads around the work space. Check with @1n9r1d @dfflanders @richardlehane, or watch for more blog posts soon.
That’s all folks! See you at the next #lodlam Australian Style!
So what is LODLAM Australian style? Does it mean our linked open data will have a particular twang that we all know and love? Will a fantastic dictionary of Australian slang finally impart to the world of searchers and researchers the cultural subtleties between saying AC/DC or acker dacker, or help people understand that someone wearing bathers, swimmers, budgie smugglers or togs was wearing a swimsuit? Oh… the joys of the amazing connections linked open data is going to offer, and that’s just the slang. Think about all the different (but almost similar) ways events, places, objects and people are referred to – it’s so spooky, possums, it can make a girl dizzy!
In November last year DigitalNZ hosted a LOD-LAM summit in Wellington, New Zealand. A small contingent of Aussies who were in Wellington for the National Digital Forum stayed on for a day to attend the summit. It was a day of great exchange and collective understanding and, better still, some rattling of chains into action. The word is that WWI and the ANZACs are going to drive some Trans-Tasman collaboration around linked open data, and that a number of eResearch projects around Australia will have linked open data at their core, allied to cultural datasets curated by researchers in the Australian scholarly community. So perhaps in a year’s time both cultural collection and scholarly datasets might be up and linked… let’s see.
Some of us are keen to run a series of LODLAM events in Australia to build the conversation and wider understanding and also look at opportunities to “do something” together. So here’s what’s happening so far:
You tell us: here’s a short straw poll. Even better, leave your name and email so we can get in touch.
The more we know about your interests, the better! There are murmurs that a LODLAM event may slide into the National Digital Forum 2012 in New Zealand in November too.
Here’s a short video from Europeana which is a nice intro to Linked Open Data.
Of course, they’re not just making cute videos over there at Europeana. They’ve published metadata for 2.4 million objects under a CC0 license, with millions more on the way.
Please note that the smaller afternoon session has already filled up and there is now a wait list, but we still have places open for the morning plenary session.
LOD-LAM-NYC: A Day of Linked Data Discussion & Activities for the NY Metropolitan Area
Thurs, Feb 23, 9:00am-6:00pm
There is no fee to attend, but registration is required.
Following the success of the LOD-LAM Summit (http://lod-lam.net/summit/) in June 2011, discussions of Cultural Heritage Linked Data have continued at a variety of regional LOD-LAM (Linked Open Data for Libraries, Archives, and Museums) events. These events, characterized by their “Unconference” style and focus on cutting-edge Semantic Web technologies, have continued to further the goals defined in the World Wide Web Consortium’s Library Linked Data Incubator Report and the various outputs of the Stanford Linked Data Workshop.
Continuing this conversation, we would like to announce LOD-LAM-NYC, two related events that add up to a day of Linked Data discussions for the Cultural Heritage Sector in the NY Metropolitan Area on February 23, 2012. The event will comprise two separate sessions: a morning plenary and a smaller afternoon “hands-on” workshop. While these events are being offered free of charge, separate registration is required for each (see below for links).
This event, co-organized by METRO, The New York Public Library’s NYPL Labs, and New York University, sponsored by METRO, and hosted by NYPL, will accommodate 175 attendees for the morning sessions. The afternoon workshop will be smaller, with space for up to 40 participants.
Learn more & register at http://www.metro.org/en/art/488/.
We live in a world of silos. Silos of data. Silos of culture. Linked Open Data aims to tear down these silos and create unity among the collections, their data and their meaning. The World Museum awaits us.
It comes as no surprise that I begin this post with such Romantic allusions. Our discussions of vocabularies, as technical behemoths and cultural artefacts, were lively and florid at a recent gathering of researchers and library and museum professionals at LODLAM-NZ. Metaphors of time and tide, depicted beautifully in this companion post by Ingrid Mason, highlight the expressive power of vocabularies and how their meaning shifts over time and across cultures. I present a very broad technical perspective on the matter, beginning with a metaphor for what I believe represents the current state of digital cultural heritage: a world of silos.
Among these silos lie the vocabularies that describe their collections and lend meaning to their objects. Originally employed to assist cataloguers and disambiguate terms, vocabularies have grown to encompass rich semantic information, often pertaining to the needs of the institution, its collection or its creator communities. Vocabularies themselves are cultural artefacts, representing a snapshot of sense making. Like the objects they describe, vocabularies can depict a range of substance from Cold War paranoia to escapist and consumerist Disneyfication. Inherent within them are the world views, biases and focal points of their creators. An object’s source vocabulary should always be recorded as a significant part of its provenance. Welcome to the recursive hell of meta-meta-data.
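Recording the source vocabulary as provenance can be as simple as never storing a bare label on its own. The following is a minimal sketch, assuming a hypothetical record layout: the class names, the "Example Museum Thesaurus" and its identifiers are invented for illustration and do not come from any real collection system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectTerm:
    """A subject assigned to an object, carrying where the term came from."""
    label: str
    vocabulary: str          # name of the source vocabulary (hypothetical here)
    vocabulary_version: str  # which snapshot of the vocabulary the term was taken from
    concept_id: str          # the term's identifier inside that vocabulary


@dataclass
class CollectionObject:
    title: str
    subjects: list[SubjectTerm]

    def vocabularies_used(self) -> set[str]:
        """List the source vocabularies (and versions) behind this object's description."""
        return {f"{s.vocabulary} ({s.vocabulary_version})" for s in self.subjects}


# Hypothetical object described with a hypothetical thesaurus term.
obj = CollectionObject(
    title="Woven basket",
    subjects=[SubjectTerm("baskets", "Example Museum Thesaurus", "2011", "emt-0042")],
)
print(obj.vocabularies_used())
```

Keeping the version alongside the vocabulary name matters because, as the text argues, a vocabulary is a snapshot of sense making: the same label may not mean the same thing in a later edition.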
Within the context of the museum, vocabularies form the backbone from which collection descriptions are tagged, catalogued or categorised. But there are many vocabularies, and the World Museum needs a universal language. LODLAM-NZ embraced the idea of a universal language with enthusiasm, but also understood the immense technical challenges that come with vocabulary alignment and, in many cases, natural language processing in general. However, if done successfully, alignment does a few great things: it normalises the labels we assign to objects, so that a unity of inferencing, reasoning and understanding can occur across vast swathes of collections; it can provide semantic context to those labels, for even deeper, more compelling relations among the objects; and it can be used to disambiguate otherwise flat or non-semantic meta-data, such as small free-text fields and social tags.
Vocabulary alignment is the process of putting two vocabularies side-by-side, finding the best matches, and joining the dots.
In many cases, alignment is straightforward: a simple string match on the aligned terms could be sufficient to create a match. However, aligning can require a lot more intuition. For example, ceremonial exchange from the Australian Museum’s thesaurus could map to the ceremonies, exchange and gift concepts from the Getty’s Art and Architecture Thesaurus. This necessary one-to-many relation, along with other possible quirks and anomalies such as missing terms, semantic differences in how terms are used and interpreted, and the general English-language bias of many natural language processing tools, makes such a task fraught with difficulty, especially when alignment occurs across vocabularies that address specific cultural groups.
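To make the contrast concrete, here is a minimal sketch of the "simple string match" case plus a naive word-overlap fallback. The term lists are hypothetical stand-ins loosely echoing the example above, not real extracts from either thesaurus, and word overlap is a deliberately crude technique chosen for illustration, not how any production alignment tool works.

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def align(source: list[str], target: list[str]) -> dict[str, list[str]]:
    """Align each source term to target terms.

    First try an exact match on normalised labels. If that fails, fall back to
    every target term sharing at least one word with the source term, which can
    produce a one-to-many mapping (or miss related concepts entirely).
    """
    exact = {normalise(t): t for t in target}
    result: dict[str, list[str]] = {}
    for term in source:
        key = normalise(term)
        if key in exact:
            result[term] = [exact[key]]
        else:
            words = set(key.split())
            result[term] = [t for t in target if words & set(normalise(t).split())]
    return result


# Hypothetical term lists, for illustration only.
museum_terms = ["Baskets", "Ceremonial exchange"]
aat_terms = ["baskets", "ceremonies", "exchange", "gift"]

print(align(museum_terms, aat_terms))
# {'Baskets': ['baskets'], 'Ceremonial exchange': ['exchange']}
```

The exact match handles "Baskets" fine, but word overlap only finds exchange for ceremonial exchange and misses ceremonies and gift, which is precisely where the text says intuition is needed.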
The challenges of alignment are compounded when the source terms come from non-semantic sources, such as unstructured free text (labels, descriptions and comments) and user tags. Let’s say, for example, that someone has tagged an object with the term gold. Could they mean “this object is made of gold” or “this object has a golden colour”? The Getty’s Art and Architecture Thesaurus has the term gold in both senses of the word. We could use a tool called SenseRelate::AllWords to find the correct WordNet concept (based on the context of an object’s description label), but then we need to align the WordNet gold to the AAT’s gold. Performing these two computations in a pipeline significantly increases the risk that the tag is misinterpreted or, even worse, that its original meaning and intention are skewed or lost altogether. Vocabulary alignment, if not done correctly, has the potential to dilute, skew or destroy the meaning of its terms.
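The two-stage pipeline described above can be sketched as follows. This is not SenseRelate::AllWords: it swaps in a simplified Lesk-style word-overlap heuristic, and the glosses, sense keys, target labels and stage accuracies are all hypothetical. The last function shows why chaining stages raises the risk: if the stages fail independently, the chance a tag survives intact is the product of the stage accuracies.

```python
from math import prod

# Hypothetical glosses for two senses of "gold"; not taken from WordNet or the AAT.
GOLD_SENSES = {
    "gold/material": "precious yellow metal element used for jewellery coins ornament",
    "gold/colour": "deep yellow colour hue paint pigment shade",
}

# Hypothetical mapping from our sense keys to target-vocabulary concept labels.
SENSE_TO_TARGET = {
    "gold/material": "gold (metal)",
    "gold/colour": "gold (color)",
}


def pick_sense(description: str, senses: dict[str, str]) -> str:
    """Stage 1 (simplified Lesk): pick the sense whose gloss shares most words with the context."""
    context = set(description.lower().split())
    return max(senses, key=lambda s: len(context & set(senses[s].split())))


def align_tag(description: str) -> str:
    """Stage 2: map the chosen sense to the target vocabulary."""
    return SENSE_TO_TARGET[pick_sense(description, GOLD_SENSES)]


def pipeline_accuracy(stage_accuracies: list[float]) -> float:
    """Chance a tag passes every stage intact, assuming stages fail independently."""
    return prod(stage_accuracies)


print(align_tag("a ring made of gold metal with a small stone"))  # gold (metal)
print(align_tag("painted panel with a gold colour background"))   # gold (color)
# Hypothetical stage accuracies of 80% and 85% leave only about 68% end to end.
print(round(pipeline_accuracy([0.80, 0.85]), 2))                  # 0.68
```

Each stage on its own looks respectable, but the end-to-end figure is lower than either, and an error in stage 1 is silently carried into stage 2.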
Over the past few years, elaborate algorithms have been developed to address these alignment challenges. However, they often don’t work on the unpredictable and highly heterogeneous nature of cultural data-sets, or their performance differs across and even within vocabularies. And when things do go wrong, problems are often hard to diagnose and even more difficult to solve.
But researchers have brought humans back into the equation. The idea is that, within the alignment process, machines do the heavy lifting on simple, straightforward natural language processing tasks, while humans fine-tune the steps of the process until they are satisfied with the results. This paper, by van Ossenbruggen et al., describes what they call interactive alignment. Their Amalgame tool allows humans to make judgements about the nature of the vocabularies being aligned, fine-tune parameters, and analyse, select or discard matching results. This mixed-initiative approach empowers both computers and humans to solve tough problems. After all, vocabularies (or ontologies, in the computer science realm), while encoded in bits and bytes, are only realised in the minds of their creators, their users and, conversely, the people who interact with the objects.
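The division of labour in interactive alignment can be sketched in a few lines. This is not Amalgame or its API; it is a minimal illustration, assuming a character-similarity matcher (Python's `difflib.SequenceMatcher`) as the machine step, a tunable threshold as the parameter a human adjusts, and a plain callback standing in for the human reviewer. The terms and the set of approved pairs are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Candidate:
    source: str
    target: str
    score: float


def propose(source: list[str], target: list[str], threshold: float) -> list[Candidate]:
    """Machine step: score every pair by character similarity, keep those at or above
    a threshold the human can tune, best matches first."""
    found = []
    for s in source:
        for t in target:
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                found.append(Candidate(s, t, score))
    return sorted(found, key=lambda c: c.score, reverse=True)


def review(candidates: list[Candidate], accept: Callable[[Candidate], bool]) -> list[Candidate]:
    """Human step: keep only the candidates the reviewer accepts."""
    return [c for c in candidates if accept(c)]


# Hypothetical terms and reviewer decisions, for illustration only.
source_terms = ["Baskets", "Ceremonial exchange"]
target_terms = ["basket", "ceremonies", "exchange", "gift"]
candidates = propose(source_terms, target_terms, threshold=0.5)

approved = {("Baskets", "basket"), ("Ceremonial exchange", "ceremonies"),
            ("Ceremonial exchange", "exchange")}
kept = review(candidates, lambda c: (c.source, c.target) in approved)
for c in kept:
    print(f"{c.source} -> {c.target} ({c.score:.2f})")
```

Note that the machine never proposes gift for ceremonial exchange: in a real interactive workflow the reviewer would see that gap and adjust the matcher or add the mapping by hand, which is the point of keeping humans in the loop.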
The concepts of meaning, understanding and encoding, and the crucial differences between them, seeded a reflective discussion at LODLAM-NZ. Even setting the technical issues aside, how can we ensure accurate alignment that preserves the sense making of the objects from both their custodians and their creator communities? What vocabularies do we use, what vocabularies should we align to, and why? What are the dangers of doing this? We could not find the answers to these questions. To steal an anecdote from Michael Lascarides, the best we could do was create better questions and, more importantly, a broader understanding of alignment on both its technical and social dimensions.
I’ve just seen two recently published reports that will certainly be of interest and thought I’d share here:
The Stanford Linked Data Workshop Technology Plan. “If instantiated at several institutions, will demonstrate to end users the value of the Linked Data approach to recording machine operable facts about the products of teaching, learning, and research. The most noteworthy advantage of the Linked Open Data approach is that it allows the recorded facts, in turn, to become the basis for new discovery environments.” Personally, I love their push for CC0, and I also really liked their push to publish early and often rather than waiting until things are perfect, along with their recommended workflows. [Thanks to Jerry Persons for feeding input from the LODLAM Summit into the Stanford working group and into this technology plan. Thanks to Rachel Frick for flagging this with #LODLAM on Twitter]
Proceedings of the 1st International Workshop on Semantic Digital Archives, Berlin, Sept. 29, 2011. A fantastic collection of papers from this workshop, with a really good preface that summarizes the meeting, each of the papers, and the growing presence of Linked Data in libraries, archives and museums. [Thanks to Johan Oomen for flagging this with #LODLAM on Twitter]
The following was cross posted on the Open Knowledge Foundation blog on 12/20/2011.
I recently traveled to Wellington, New Zealand to take part in the National Digital Forum of New Zealand (#ndf2011), which was held at the national museum of New Zealand, Te Papa. Following the conference, the amazing team at Digital NZ hosted and organized a Linked Open Data in Libraries, Archives & Museums unconference (#lodlam). The two events were well attended by Kiwis as well as a large number of international attendees from Australia, and a few from as far as the US, UK and Germany.
When it comes to innovative digital initiatives in cultural heritage, the rest of the world has been looking to New Zealand and Australia for some time. Federated metadata exchange and search have been happening across institutions in projects like Digital NZ and Trove. I was able to learn more about the Digital NZ APIs as well as those from Museum Victoria, Powerhouse Museum, and State Records New South Wales. In fact, the remarkable proliferation of APIs in Australasia has allowed us to consider the possibilities of using Linked Open Data to harvest and build upon data held in databases across multiple institutions.
Given the extent to which tools for opening access to data have been developed here, I was surprised by the level of frustration that exists around copyright issues. There’s a clear sense that government is moving too slowly in making materials available to the public with open licensing. We talked a lot about the idea of separately licensing metadata and assets (i.e. information about a photo vs the digital copy of the photo), as has been happening across Europe and increasingly the United States. There are strong advocates within the GLAM sector (galleries, libraries, archives & museums) here, and demonstrating use cases utilizing openly licensed metadata will go far in helping to move those conversations forward with policy makers.
To that end, a session was convened to explore the possibilities of an international LODLAM project focused on World War I, the centennial commemoration of which is fast approaching. The Civil War Data 150 project we’ve been slowly moving forward in the US may provide a rough framework to build from. At least half a dozen libraries, archives and museums have already expressed interest in participating in a WWI project. First steps may be identifying openly licensed datasets to be contributed, key vocabularies and ontologies to apply, and ideas for visualizations that would leverage Linked Open Data. For anything to happen here, someone will need to take the lead in organizing (not me, we’re still trying to build some tools around the Civil War Data 150 concept!). Good notes were posted on the LODLAM blog about the conversation and how to convene future conversations. Anyone who gets involved with this, please spread the word and keep the LODLAM community apprised of your progress and ways to contribute.
We also had a workshop on using Google Refine by Carlos Arroyo from the Powerhouse Museum, with props to the FreeYourMetadata crew. Some lively sessions dug into what Linked Data is and how it works, along with some of its pitfalls and potential. Another session explored the importance and potential of local vocabularies, and how they can contribute to Linked Data implementations. One great example was the vocabularies surrounding Maori artifacts (Taonga) at Te Papa, and how publishing those datasets can help other museums around the world better describe and provide digital access to Maori collections.
As I’ve attended various LODLAM meetups since June, I’ve noticed clear momentum from one to another as these conversations progress rapidly, with those further along helping those of us just learning. After LODLAM-DC I realized the importance of including library, archive, and museum vendors in all of these gatherings. At LODLAM-NZ I could see the potential of bringing together developers in the GLAM sector and those utilizing Linked Data in commercial settings. In places like San Francisco, where commercial interests are already leading the charge on Linked Data (which is not a bad thing) and there’s an active Semantic Web developer community, the GLAM sector may be playing catchup. But the sheer number of datasets potentially available as open data coming from the GLAM sector, together with the expertise of managing massive amounts of structured data, creates a space ripe for collaboration and experimentation, and these lines will continue to blur.
There are benefits to posting quickly after an event (you get the word out fast), and maybe there are benefits to being so swamped with work that you get time to mull. It has been nearly a month since LOD-LAM NZ happened in Wellington, and the session on vocabularies on the Thursday afternoon, 1 December, was one I’d been waiting for for some time. Tim Wray, a PhD student at the University of Wollongong, wrote in an email “I was wondering if you like to contribute your part – particularly your humanist / social perspective on the issue of vocabularies and alignment”. Tim is going to explain the discussion in that session from a computational linguistics point of view and his own perspective, so this post is food for the culture vultures and semi-technical cake eaters of the GLAM sector keen on linked open data.
These were the topics the conversation ranged through in the vocabularies session:
- curatorial questions around selecting vocabularies
- vocabulary as cultural artefact
- cultural questions around automating vocabularies
- roles of curation and linguistic computation in aligning vocabularies
The discussion that afternoon started solidly thanks to Stuart Yeates from the New Zealand Electronic Text Centre (NZETC), who led the discussion and called for input on what vocabularies people might want to use in their linked open data. You can see the breakdown of the vocabularies the group quickly shouted out in his post Metadata vocabularies LODLAM NZ cares about on his blog Open Source Exile. A quick glance at these is telling of the New Zealand cultural context, the Trans-Tasman common areas of interest, and the strong influence and immense value of the work still being done by the Library of Congress in the USA.
The point about semantics (and one that needs many more, and more qualified, perspectives to answer well) is that what can be interpreted literally is not always meaningful culturally. The title of this blog post intentionally draws on an old adage. Language and its usage, like tides, ebb and flow, and vocabularies provided via linked open data need to allow in some way for this shifting of meaning over time, while still being able to assert levels of accuracy at a point in time. The only word I could find to describe this was attenuation: how long a note sounds or, in the sense of semantics, how long the signal of the original meaning lingers over time. Concepts like tide marks, water measures, lunar calendars and sea currents, all means of triangulating and testing for smooth sailing, play into this idea of how to programme for linked open data; so do concepts like scales, tonal and atonal music, harmony, dissonance, assonance, signals, and top, middle and low notes. Does the cake seem too thick with metaphorical icing? Not quite.
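One way to read "asserting levels of accuracy at a point in time" in data terms is to never overwrite an alignment, but to stamp each assertion with a confidence and a date, so that an older meaning stays on record while a newer one takes over. The following is a minimal sketch under that assumption: the record layout, the "wireless" example and its confidences and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One asserted alignment, with how sure we were and when we said it."""
    source: str
    target: str
    confidence: float
    asserted_on: date


def mapping_as_of(mappings: list[Mapping], source: str, when: date) -> Optional[Mapping]:
    """Return the latest mapping for `source` asserted on or before `when`, or None."""
    earlier = [m for m in mappings if m.source == source and m.asserted_on <= when]
    return max(earlier, key=lambda m: m.asserted_on, default=None)


# Hypothetical assertions: the same word drifting to a new sense over time.
history = [
    Mapping("wireless", "radio receivers", 0.9, date(1990, 1, 1)),
    Mapping("wireless", "wireless networks", 0.8, date(2011, 1, 1)),
]

print(mapping_as_of(history, "wireless", date(2000, 6, 1)).target)  # radio receivers
print(mapping_as_of(history, "wireless", date(2012, 6, 1)).target)  # wireless networks
```

Because the 1990 assertion is kept rather than replaced, a description made in that era can still be read with the meaning it was given at the time, which is one small, concrete form of the "attenuation" idea.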
A while back I watched a presentation, Open data for the cultural masses – Mapping and the Europeana Semantic Layer, by Guus Schreiber on Amalgame and the alignment of vocabularies, and he seemed to be talking about levels of accuracy in asserting vocabulary alignments. Forgive me, I was listening with one ear whilst working, but I got so excited by what I seemed to be hearing (with a non-computer-science ear) that I sent the link along to colleagues at the Australian National Data Service, including one of the directors there, Andrew Treloar, who is a computational linguist. There are several of that ilk in the ANDS team: Adrian Burton is a technologist and linguist, and so is Nick Nicholas. Next year is going to be hot stuff for the @andsdata team on the linked open data front. The linked open data services for the Party (researchers and research organisations) and Research Activity (projects, grants, funds etc.) Infrastructures are going into action and include a vocabulary service for the Australian and New Zealand Standard Research Classification (ANZSRC), so watch that space for some Trans-Tasmanian action.
The group began a brief discussion about how people choose vocabularies, and the librarians in the room all had ‘interesting’ or ‘old fashioned’ looks on their faces. Perhaps it is so obvious to library and information workers that vocabularies are social, political, cultural and so on that the choice to use one rests not just on information theory but also on cultural theory: an act of data curation. Tim asked if ontologies were cultural artefacts and there was a resounding ‘yes’ from within the group. I’m not sure who was louder, Adrian Kingston, Stuart Yeates, Chris Todd or Sydney Shep. It is gratifying that those passionate responses came from a museum technologist, a computer scientist in the digital humanities, a leading cataloguer and an academic in library and information studies. Maybe because the GLAM sector has used ontologies, both schematic and semantic, for such a long time, for some of the humanities computing people in the room this was all ‘understood’ and we’re on to the next challenge: how to make this work? In any event, it felt powerful to me that culture, linked open data, vocabularies and curation were combined in one discussion, and that in that discussion people felt strongly about resolving these questions effectively and together, from each domain. This is what I quietly call digital cultural heritage: working out the design, driving technological design with cultural questions, testing technological techniques against cultural questions, and on and on. Judgements about which ontology or standard to follow are the bread and butter of a great deal of the information practice in the GLAM sector, for web developers, programmers, registrars, curators, cataloguers and archivists. The next challenge is linked open data, and how to use, reuse and align vocabularies to expose cultural collection data in expected and new ways, to new audiences and minds. In two words, awesome challenge.
The discussion at the end around the capacity to use Kupu, a Māori thesaurus developed by the National Library of New Zealand, was informative and instructive. Years ago a subject database called ‘KUPU’, provided by the National Library of New Zealand, was one of the few means of accessing resources described in Māori (subject access in the database was built around a thesaurus called He puna kupu Māori : hei tohu-ā-kupu ; an indexing thesaurus in the Māori language). Precious few means were available at that time (the 1990s) to provide access to resources in Māori and English with appropriate Māori intellectual access points. More information tools have been developed since, and we learned from Chris Todd of the National Library that Kupu needs updating and more work to ensure that the intellectual access the thesaurus provides is both meaningful and culturally appropriate. So the tides have shifted, and in a positive way, yet the demand for enriched Māori intellectual access to information resources remains undiminished. The group were voting keenly for the Māori Subject Headings (maintained by the National Library of New Zealand) to be published as linked open data, and further thoughts were exchanged around the use of name authorities.
Time and tide wait for no one… so this is an urging to keep in mind a thought Michael Lascarides offered in his keynote presentation at the National Digital Forum: be mindful of the new and better challenges for the GLAM community to work on, the ‘Why | What | How | What If/Then’ questions of culture, linked open data and vocabularies.
Note: used as a noun, kupu means ‘word’ or ‘vocabulary’ and upoko means ‘head’ in English (meanings provided by the Māori Dictionary online).