Experimenting with the British Library’s Data Content

University of Bristol, Monday 21st March

On 21st March, the University of Bristol hosted the British Library Labs in a day-long event that explored the British Library’s Digital Collections. Armed only with a free GW4 (GW4 is the alliance between Cardiff, Exeter, Bristol and Bath Universities) notebook and pen, and a dream of creating a sentient life-form that would combine Shakespeare’s way with words with the suave elegance of Cary Grant (an AI fantasy that I like to call Eamon Holmes-bot), I settled down for a day of terrific presentations, insights, and thought-provoking discussion.

The purpose of the day was both to showcase some of the projects that had used the British Library’s digital content in innovative ways and to encourage delegates to share their ideas with the British Library and work with it on new projects in the future. And, if there is one thing I took away from this event, it is that the British Library really want to work with you. According to a rough estimate, the BL holds around 180 million items, of which between 1% and 2% have been digitised. Now, depending on which way you look at it, either that means there is still a lot of digitising left to be done, or that the BL are already sitting on a lot of digital content – a quick calculation reveals that to be 3,600,000 items, at the top end of the scale. The BL are so keen to work with researchers, artists and entrepreneurs on this material because not only does it promote the BL as an institution and their resources, but it also helps them to make sense of this vast amount of digital data.
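For anyone who wants to check that quick calculation, here is a minimal sketch in Python. The 180 million total and the 1-2% range are the rough estimates quoted above, not official BL statistics, and the function name `digitised_range` is my own invention for illustration.

```python
# Rough estimates quoted at the event, not official BL statistics.
TOTAL_ITEMS = 180_000_000
LOW_PERCENT = 1
HIGH_PERCENT = 2


def digitised_range(total, low_percent, high_percent):
    """Return the (low, high) number of digitised items for a percentage range."""
    # Integer arithmetic avoids floating-point rounding on large counts.
    return total * low_percent // 100, total * high_percent // 100


low_items, high_items = digitised_range(TOTAL_ITEMS, LOW_PERCENT, HIGH_PERCENT)
print(f"Between {low_items:,} and {high_items:,} items digitised")
```

The lower bound, for what it is worth, is still 1,800,000 items.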

One such project was Cardiff Digital Network’s very own Illustration Archive (illustrationarchive.cardiff.ac.uk). Led by CDN member, Professor Julia Thomas, the project attempted to make sense of the illustrations found in the ‘BL’s Million’ – the name used to refer to the million images that were taken from the 65,000 books that Microsoft digitised for the BL around 2008. Illustration, unlike verbal text (which can be OCR-ed relatively easily), poses a vast challenge for both computer scientists and humanities researchers because without tagging (a process that requires human agency) these images are not searchable. The Illustration Archive, then, as fellow CDN member and software developer on the project, Ian Harvey, discussed in his presentation at this event, makes these illustrations searchable online by using crowdsourcing within a framework that combines machine learning with machine vision, in a ‘positive feedback-loop’. The importance of this work lies in how it can help us to understand our literary history more fully, as illustrations were a significant part of literary production. Neglecting them digitally, because computers do not deal very well with images, means we lose a whole other dimension of how these works created meaning.

One of the ways that the BL supports and engages scholars is through public-engagement activities like competitions and awards. The BL Labs Competition asks researchers to use the BL’s digital content in creative, exciting and inspiring new ways. The winners of the Competition then work at the BL for five months before showcasing their work at the annual Labs symposium. The BL Labs Awards, meanwhile, recognise work that has already been completed using this content. In 2015, one of the winners was Dr Adam Crymble from the University of Hertfordshire. Crymble used the ‘BL’s Million’ to experiment with crowdsourcing via a bespoke 1980s-style arcade cabinet. Called Crowdsource Arcade, the cabinet was loaded with games that would help with the tagging process. According to Crymble, ‘this project takes the crowdsourcing experience off the web and puts it into a replica machine, replete with joysticks and plastic shiny buttons. This old interface put to new uses acknowledges that people increasingly associate their computers with work and by providing a digital experience that doesn’t feel like a computer, we can tap into energy currently reserved for play.’ One of the games you can play on the machine is Art Treachery, where you play the ‘art thief’ and use a torch to find pieces of art that you have been asked to steal from a gallery whilst being chased by robot guards. It sounds tremendous.

Much like the work of BL Labs itself. What struck me most about the day in Bristol was not just the amount of digital data the BL has, but the friendliness of the Labs team, their openness to sharing ideas and their willingness to collaborate. The deadline for this year’s competition is 11th April. I had better start working on my proposal for Eamon Holmes-bot.


Illustration Archive: http://illustrationarchive.cardiff.ac.uk/
BL Labs Competition: http://labs.bl.uk/british+library+competition
BL Labs Awards: http://labs.bl.uk/british+library+awards
Crowdsource Arcade: http://goo.gl/nfg9d5

–Michael Goodman
