Jul 17 2012

LODLAM posse for THATCamp Brisbane?

Digital humanities plotting by Anna Gerber and others has blossomed into the organisation of a THATCamp in sunny Brisvegas, Queensland, Australia. We’ve been keen to have a LODLAM event in any state or territory in Australia that can rustle up space and interest in talking about and testing our linked open data.

So, for anyone in Brisbane on Saturday 25th August 2012 who would like to be part of the THATCamp action, it is entirely possible that linked open data enthusiasts from the GLAM sector will appear to thrash out their ideas and test some of their code. Queensland GLAMMers and Digital Humanities folk who want to be part of that, get in touch: thatcampbne [at] gmail [dot] com

Jun 19 2012

Melbourne LOD-LAM gets serious

Well, that’s perhaps an overstatement, but we are pursuing the ‘practical and pragmatic applications’ approach and running two follow-up sessions to our April event:

Tuesday 31st July
Venue: TBC
1.30 – 3.00pm – Place names
3.30 – 5.00pm – ANZAC material

People already working with linked data in each area will be present, and the sessions will canvass opportunities to work together, with regard to both linked open data proper and linked data more generally. They will be relevant for both technical and programming staff.

RSVP by 16th July: Eleanor Whitworth, Senior Arts Officer/Content Curator, Culture Victoria (Monday – Wednesday), email: eleanor[dot]whitworth[at]dpc[dot]vic[dot]gov[dot]au or @elewhitworth

Jun 18 2012

Linked Data: A Personal View from Jerry Persons

This piece inaugurates an occasional series by or about linked data practitioners that will be published here on LODLAM.net and cross-posted on the Digital Library Federation blog. The first post in the series is a personal reflection on the linked data landscape written by Jerry Persons, technology analyst at Knowledge Motifs, Chief Information Architect emeritus at Stanford, and author of the CLIR-commissioned Literature survey in support of Stanford Linked Data Workshop.

The ecosystem in which both library-generated metadata and vendor-generated search environments are players has changed radically with unprecedented swiftness:

Richard Wallis (late of Talis, now OCLC) recently summarized these trends in terms of web-wide factors in his post A data 7th wave approaching:

With the advent of many data associated advances, variously labelled Big Data, Social Networking, Open Data, Cloud Services, Linked Data, Microformats, Microdata, Semantic Web, Enterprise Data, it is now venturing beyond those closed systems into the wider world.

Well this is nothing new, you might say, these trends have been around for a while – why does this constitute the seventh wave of which you foretell?


It is precisely because these trends have been around for a while, and are starting to mature and influence each other, that they are building to form something really significant ….

Indeed, those in pursuit of a broader-than-library take on what’s going on in the web-wide world of structured data should take advantage of Richard’s experience, which includes a deep understanding of libraries gained as a member of the Talis library systems group and spans the company’s evolution toward its present-day provision of Kasabi, “a startup business spun out from and backed by Talis. Our aim is to unlock the value in the World’s data by enabling new business models for producers and consumers of structured data at all scales.” Among his posts and presentations worth close review are those available at his Data Liberate site, for example:

  • Create data not records
  • Libraries through the linked data telescope
  • Who will be mostly right – Wikidata, Schema.org

My own views on the potential benefits to be had from a rapidly evolving web that is increasingly dominated by well-structured and well-curated data were shaped in large part by exposure to the vision, concepts, and people involved in a set of antecedents to the current flurry of activity and developments. The thread leads from a turn-of-the-century piece written by Danny Hillis, through his Applied Minds and Metaweb companies, to Freebase and John Giannandrea, and onward from there to the recent Wall Street Journal interview with Amit Singhal and the subsequent discussions surrounding Knowledge Graph and things not strings:

Hillis: With the knowledge web, humanity’s accumulated store of information will become more accessible, more manageable, and more useful. Anyone who wants to learn will be able to find the best and the most meaningful explanations of what they want to know. Anyone with something to teach will have a way to reach those who want to learn. Teachers will move beyond their present role as dispensers of information and become guides, mentors, facilitators, and authors. The knowledge web will make us all smarter. The knowledge web is an idea whose time has come. Hillis, W. Daniel. “Aristotle”: (The knowledge web), 2000, published in The Edge (138) in 2004.

Freebase: A new company founded by a longtime technologist is setting out to create a vast public database intended to be read by computers rather than people, paving the way for a more automated Internet in which machines will routinely share information. Markoff, John. Start-up aims for database to automate web searching. NYT (9 March 2007).

Giannandrea: Freebase is an open database of the world’s information, built by a global community and free for anyone to query, contribute to, and build applications on. … Part of what makes this open database unique is that it spans domains, but requires that a particular topic exist only once in Freebase. Thus freebase is an identity database with a user contributed schema which spans multiple domains. For example, Arnold Schwarzenegger may appear in a movie database as an actor, a political database as a governor, and in a bodybuilder database as Mr. Universe. In Freebase, however, there is only one topic for Arnold Schwarzenegger that brings all these facets together. The unified topic is a single reconciled identity, which makes it easier to find and contribute information about the linked world we live in. Giannandrea, John. Freebase: an open, writable database of the world’s information (a one-hour lecture delivered in October 2008).

[Amit Singhal] said in a recent interview that the search engine [Google] will better match search queries with a database containing hundreds of millions of “entities” (people, places and things) which the company has quietly amassed in the past two years. Semantic search can help associate different words with one another. Efrati, Amir. Google gives search a refresh. WSJ (15 March 2012).

Knowledge Graph: [W]e’re focused on comprehensive breadth and depth. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on what people search for, and what we find out on the web. Britt, Phil. Google unveils knowledge graph. (24 May 2012).

Taken together, these and other suggestive developments in the linked-data ecosystem represent a confluence of tools, data, and methodologies of sufficient potential to warrant efforts that pursue:

new opportunities for addressing the traditional and prevailing problems of too many silos of content, too many disparate modes of search and access, and too little precision and too much ambiguity in search results in the extreme environments of academic information resources intended to support and report on the research and teaching in large research enterprises. Keller, Michael A. Linked data: a way out of the information chaos and toward the semantic web. EDUCAUSE Review 46 (4): July/August 2011.

Such opportunities are inextricably bound up with linked data’s potential for (1) reshaping the infrastructure that supports web-wide management of information, knowledge, and data, and (2) fueling unprecedented improvements in the efficiency and efficacy of navigation and discovery capabilities. It’s long past being a matter of if; now it’s about when. The game that’s afoot is about finding roles that libraries can play in aiding and abetting the creation of an increasingly dense tapestry of facts and links woven together from the flows of intellectual resources that the global academic community consumes and produces in the course of its research, teaching, and learning.

Apr 23 2012

Melbourne LODLAM event

On April 17th, approximately 35 people from a range of sectors, including memory organisations, tertiary institutions and government departments, gathered at the Melbourne Museum. It was a lively session and, in keeping with the focus on “practical and pragmatic applications and opportunities for sectors to work together”, concluded with agreement to continue discussions and to work on two LODLAM projects: Victorian place names and World War 1.

Lightning talks by Mia Ridge, Peter Neish (Victorian Parliamentary Library), Conal Tuohy (HuNI), Helen Morgan (eResearch, University of Melbourne) and Adam Bell (Australian War Memorial) got the ball rolling. A spontaneous Melbourne-San Francisco Skype-in with Jon Voss and Simon Sherrin started the general discussion.

A detailed write-up from notes taken by myself and Ely Wallis is now up at Culture Victoria.

Big thanks go to Mia Ridge, Ely Wallis and Ingrid Mason for their insights and planning for what will continue to be an active space…  With, we anticipate, more muffins…

Melbourne LODLAM muffin remnants

Apr 12 2012

Radically Open Cultural Heritage Data at SXSW Interactive 2012

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with our fellow LODLAMers Jon Voss, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were well chuffed that Mashable.com picked up on it as one of ’22 SXSW Panels You Can’t Up This Year’.

I’ve written about our session and a few of the other sessions over on the UK Discovery blog for those wanting the full lowdown.

Sep 16 2011

LODLAM-DC session matrix

Sept. 16 LODLAM-DC Sessions

Rooms from left to right are Main (Screen), Main (quilt), 3rd Fl Team, 3rd Floor Conf.

I also think that Kevin Ford added a session in that open slot on using id.loc.gov.

Jul 20 2011

Persistent Object Identifiers (POID)

Last month I learned, through an email from Andrew Treloar @atreloar (Australian National Data Service – ANDS), about a workshop held in the Netherlands to discuss ideas around persistent identifiers. Organised by Knowledge Exchange, the seminar paid:

“attention to the usage of PIDs for publications, and increasingly for data, and for combinations of text, media and data. Also the relation with Author Identifiers was discussed. Standardisation and specifications for transparency between systems was addressed.  In break out sessions participants discussed the benefits and challenges in operating multiple persistent identifier systems and the relation of persistent identifiers to Linked Data.”

Numbered | howtodesign | CC BY-NC-ND

This grabbed my attention because of some of the discussions both semantic and technical at #lodlam back in May and some of the architectural conundrums facing linked open data enthusiasts.

“more than 40 experts involved in various Persistent Object Identifier (POID) communities met for a Knowledge Exchange seminar to discuss the challenges and opportunities involved in interoperability between multiple PID-systems.  Three major systems – Handle, URN:NBN and DOI – presented their current state of affairs and examples of their systems in practice….”

The presentations from this seminar are online and provide some food for thought for the techies thinking around how to set up IDs in linked open data systems.
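Whichever PID system is chosen, a linked data system ultimately has to express the identifier as an HTTP URI that clients can dereference. The Python sketch below assumes identifiers written with informal `doi:` and `hdl:` prefixes and rewrites them against the public DOI (doi.org) and Handle (hdl.handle.net) proxy resolvers. URN:NBN is deliberately left out because it is resolved through national services, and the sketch does not percent-encode the unusual characters some DOIs contain; the Handle value used in the example is hypothetical.

```python
# Public proxy resolvers for two of the three PID systems discussed.
# URN:NBN is omitted: resolution is run by national libraries.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def pid_to_uri(pid: str) -> str:
    """Turn a prefixed identifier such as 'doi:10.1000/182' into an HTTP URI.

    Assumes an informal '<scheme>:<value>' form; no percent-encoding is done.
    """
    scheme, sep, value = pid.partition(":")
    if not sep or scheme.lower() not in RESOLVERS:
        raise ValueError(f"unsupported identifier: {pid!r}")
    return RESOLVERS[scheme.lower()] + value


print(pid_to_uri("doi:10.1000/182"))   # https://doi.org/10.1000/182
print(pid_to_uri("hdl:1234/5678"))     # hypothetical Handle
```

Mapping every PID onto a resolver URI keeps the identifier system's own guarantees while giving linked data consumers a single, uniform way to follow links.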

So I figure this community, if it isn’t already aware of this discussion, might like to be. I know that many of those undertaking ANDS-funded projects are trying to get their heads around which identifier systems to use, and a heap of documentation has been made available on the ANDS website to support them. There is information to guide newcomers into the area of system identifiers; there are several pages designed to inform the newbie, the familiar and the expert on persistent identifiers; and there is a focused page on DOIs (Digital Object Identifiers).

If you’re interested to know more about the Party Infrastructure soon to be launched in Australia through the National Library of Australia, keep your eye on the NLA Party Infrastructure project wiki.

I hope some of this information comes in handy!

Ingrid @1n9r1d

Jun 6 2011

Proposed: a 4-star classification-scheme for linked open cultural metadata

One of the outcomes of last week’s LOD-LAM Summit was a draft document proposing a new way to assess the openness/usefulness of linked data for the LAM community. This is a work in progress, but is already provoking interesting debate on our options as we try to create a shared strategy. Here’s what the document looks like today, and we welcome your comments, questions and feedback as we work towards version 1.0.



A 4 star classification-scheme for linked open cultural metadata

Publishing openly licensed data on the Web and contributing to the Linked Open Data ecosystem can have a number of benefits for libraries, archives and museums.

  1. Driving users to your online content (e.g., by improved search engine optimization);
  2. Enabling new scholarship that can only be done with open data;
  3. Allowing the creation of new services for discovery;
  4. Stimulating collaboration in the library, archives and museums world and beyond.

In order to achieve these benefits, libraries, museums and archives are faced with decisions about releasing their metadata under various open terms. To be open and useful as linked data requires deliberate design choices, and systems must be built from the beginning with openness and utility in mind. To be useful for third parties, all metadata made available online must be published under a clear rights statement.

This 4-star classification system arranges those rights statements (e.g. licenses or waivers) that comply with the relevant conditions (2-11) of the Open Knowledge Definition (version 1.1) in order of openness and usefulness: the more stars, the more open the metadata and the easier it is to use in a linked data context. Libraries, archives and museums wanting to contribute to the Linked Open Data ecosystem should strive to make their metadata available under the most open instrument they are comfortable with, so as to maximize the data’s usefulness to the community.

Note: This system assumes that libraries, archives and museums have the required rights over the metadata to make it available under the waivers and licenses listed below. If the metadata you want to make available includes external data (for example, vocabularies), contract or copyright may constrain which of the licenses below you can release it under.
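To make the ordering concrete, here is a minimal Python sketch that maps a rights statement to its star rating under the tiers described below. The short labels (`"CC0"`, `"ODC-BY"` and so on) and the `attribution` parameter are assumptions made for illustration, not identifiers defined by the draft; anything the draft does not list is rejected rather than rated.

```python
from typing import Optional

# Informal labels for the rights statements named in the draft tiers
# (not official license-registry identifiers).
PUBLIC_DOMAIN = {"CC0", "ODC-PDDL", "PDM"}
ATTRIBUTION = {"CC-BY", "ODC-BY"}
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODbL"}


def star_rating(rights: str, attribution: Optional[str] = None) -> int:
    """Return the draft scheme's star rating (1-4) for a rights statement.

    `attribution` only matters for attribution licenses: "linkback" earns
    three stars, any other provider-specified mechanism earns two.
    """
    if rights in PUBLIC_DOMAIN:
        return 4
    if rights in ATTRIBUTION:
        return 3 if attribution == "linkback" else 2
    if rights in SHARE_ALIKE:
        return 1
    # Rights statements the draft does not list are left unrated.
    raise ValueError(f"{rights!r} is not covered by the 4-star scheme")


print(star_rating("CC0"))                       # 4
print(star_rating("CC-BY", "linkback"))         # 3
print(star_rating("ODC-BY", "creator field"))   # 2
print(star_rating("ODC-ODbL"))                  # 1
```

Note that the same license (CC-BY or ODC-BY) lands in two different tiers; what separates them is the attribution mechanism the provider asks for, which is why it is a separate input here.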

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is not contingent on anything
  • metadata can be combined with any other metadata set (including closed metadata sets)

as a provider:

  • you are waiving all rights over your metadata so it can be most easily reused
  • you can specify whether and how you would like acknowledgement (attribution or citation, and by what mechanism) from users of your metadata, but it will not be legally binding

This option is considered best since it requires the least action by the user to reuse the data, and to link or integrate the data with other data. It supports the creation of new services by both non-commercial and commercial parties (e.g. search engines), encourages innovation, and maximizes the value of the library, archive or museum’s investment in creating the metadata.

★★★ Attribution License (CC-BY / ODC-BY) when the licensor considers linkbacks to meet the attribution requirement

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution by linkback to the data source
  • metadata can be combined with any other metadata set, including closed metadata sets, as long as the attribution link is retained

as a provider:

  • you get attribution whenever your data is used

This option meets the definition of openness, but constrains the user of the data by requiring them to provide attribution (in the legal sense, which is not the same as citation in the scholarly sense). Here, attribution is satisfied by a simple, standard Web mechanism from the new data product or service. By using standard practice such as a linkback, attribution is satisfied without requiring the user to discover which attribution method is required and how to implement it for each dataset reused. Note that there are other methods of satisfying a legal attribution requirement (see below) but here we propose a specific mechanism that would minimize the effort needed to use the data if the LAM community collectively agrees to it. Also note that even this simple (ideally shared) attribution method could prevent some applications of linked data if linkbacks are required by many datasets from many sources.
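A linkback is attractive precisely because it can be applied mechanically to every reused record. The Python sketch below illustrates that idea under a loudly stated assumption: the linkback is carried in a `dcterms:source` field holding the source dataset's URI. The draft does not name a property, and the field choice, helper names and example URI are all hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: the linkback lives in a "dcterms:source" field holding the
# source dataset's URI. The draft scheme does not prescribe a property.
LINKBACK_FIELD = "dcterms:source"

Record = Dict[str, str]


def add_linkback(records: List[Record], source_uri: str) -> List[Record]:
    """Return copies of reused records, each carrying a linkback to its source."""
    return [{**record, LINKBACK_FIELD: source_uri} for record in records]


def linkbacks_retained(records: List[Record], source_uri: str) -> bool:
    """Check that every reused record still points back to the source dataset."""
    return all(record.get(LINKBACK_FIELD) == source_uri for record in records)


source = "http://example.org/datasets/place-names"  # hypothetical URI
reused = add_linkback([{"dcterms:title": "Ballarat"}], source)
print(linkbacks_retained(reused, source))  # True
```

Because one rule covers every dataset, a consumer never has to look up a per-provider attribution method. The cost noted above still applies: a record merged from many sources would need a linkback for each of them.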

★★ Attribution License (CC-BY / ODC-BY) with another form of attribution

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can be combined with any other metadata set (including closed metadata sets)

as a provider:

  • you get attribution whenever your data is used by the method you specify

This option meets the definition of openness in the same way as the linkback attribution option, but requires the user to provide attribution in some way other than a linkback, as specified by the data provider. The provider could specify an equally simple mechanism (e.g. retention of another field, such as ‘creator’, from the original metadata record) or a more complex one (e.g. a scholarly citation in a Web page connected to the new data product or service). The disadvantage of this option is that the user must discover what mechanism the particular data provider wants and how to comply with it, potentially needing a different mechanism for each dataset reused. For large-scale open data integration (e.g. mashups) this option is difficult to implement.

★ Attribution Share-Alike License (CC-BY-SA / ODC-ODbL)

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can only be combined with data that allows redistribution under the terms of this license

as a provider:

  • you get attribution whenever your data is used
  • you only allow use of your data by entities that also make their data available for open reuse under exactly the same license

This option meets the definition of openness but potentially limits reuse of data when more than one dataset is reused and each has an associated Share-Alike license. Under a Share-Alike license, the only way to legally combine two datasets is if they share exactly the same SA license, since most SA licenses require that reused data be redistributed under exactly the same license. If the source datasets had different Share-Alike licenses originally (e.g. CC-BY-SA and ODC-ODbL), there is no way for the user to comply with the requirements of both source data licenses, so this option only allows users to link or integrate data distributed under one particular SA license (or under one SA license and any of the other license or waiver options above). In the LAM domain, where significant value is created by combining datasets, the Share-Alike requirement severely reduces the utility of a dataset.
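The combination rule in the paragraph above can be sketched as a simple check: any mix of two- to four-star datasets plus at most one distinct Share-Alike license can be combined, while two different SA licenses cannot. This minimal Python sketch follows the draft's reasoning only; the labels are informal, and it ignores the separate attribution obligations each dataset still carries.

```python
from typing import Iterable

# Informal labels following the draft tiers (not registry identifiers).
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODbL"}
PERMISSIVE = {"CC0", "ODC-PDDL", "PDM", "CC-BY", "ODC-BY"}


def can_combine(licenses: Iterable[str]) -> bool:
    """True if datasets under these rights statements can be merged.

    Per the draft: any number of 2-4 star datasets, plus at most one
    distinct Share-Alike license. Attribution duties are not checked.
    """
    distinct = set(licenses)
    unknown = distinct - SHARE_ALIKE - PERMISSIVE
    if unknown:
        raise ValueError(f"not covered by the scheme: {sorted(unknown)}")
    return len(distinct & SHARE_ALIKE) <= 1


print(can_combine(["CC0", "CC-BY", "ODC-ODbL"]))  # True
print(can_combine(["CC-BY-SA", "CC-BY-SA"]))      # True
print(can_combine(["CC-BY-SA", "ODC-ODbL"]))      # False
```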


Jun 3 2011

Users, uses, service

Yesterday at LOD-LAM we talked about users and what users might want to do with data (and thus what we could create for users from LOD). Here’s the mind map of that:
user verbs

Jun 3 2011

Library Linked Data cloud – a teaser

Following up on this afternoon’s dork shorts, with Tom Baker presenting W3C’s Library Linked Data incubator, and Adrian Pohl telling us about the really useful ckan.net, here’s a graphical rendering of library linked datasets on the CKAN LLD group, courtesy of William Waites:

Snapshot CKAN LLD group

We plan to keep updating this and use it in one of our incubator’s deliverables (draft in progress here; comments welcome, as for the main LLD report draft). The idea is to get something closer to our community than the ever-growing general LOD cloud.

So, we’ve got a start for the “L” in LAM. But where are As and Ms? C’mon!!!

I guess augmenting such a library graph with the published datasets from museums and archives could be a first work item for a W3C community group on LOD-LAM…