Exploring Digitised Official Publications: Large-scale Text Analysis of “Britain and UK Handbooks”

By Lucy Havens

As a Digital Library Research Intern at the National Library of Scotland (NLS), I’ve contributed to several tools for digitally exploring NLS collections. On the NLS Data Foundry website, the NLS publishes its digitised collections and collections data, as well as tools for exploring that data. Prior to my internship, the NLS had published tools for exploring its map collection and geospatial datasets. Now, its tools include Jupyter Notebooks for conducting text analysis. I created five Jupyter Notebooks for five collections on the NLS Data Foundry which explore digitised text and metadata.

What is a Jupyter Notebook?

A Jupyter Notebook is an interactive document that can display text, images, code, and data visualisations. Jupyter Notebooks have become popular in data science work because they facilitate easy documentation of data cleaning, analysis, and exploration. Explanatory text can describe what code will do and comment on the significance of the results of code after it runs. Live code can create and display text, images, tables, and charts. The data on which code runs can be sourced from a file, a folder of multiple files, or from a URL (an online data source). Even if you don’t have the Jupyter Notebook software (which is free and open source!) downloaded to your computer, you can still interact with Jupyter Notebooks using MyBinder, which runs the Notebooks in an Internet browser.

Why use a Jupyter Notebook?

Jupyter Notebooks are useful for exploring collections as data, helping new research questions to be asked that complement close readings of text with distant readings. Using a coding language such as Python (which I used as NLS Digital Library Research Intern), linguistic patterns such as word occurrences and diversity of word choice can be measured across thousands of sentences in a matter of seconds. Thanks to the technical fields of computational linguistics and natural language processing, developers have created libraries of code that make it easy to reuse code that answers common questions in text analysis. One such library of code is the Natural Language Toolkit, often referred to by its abbreviation, NLTK.
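To give a feel for what measuring "diversity of word choice" can look like, here is a minimal sketch in plain Python. It is not the code from the NLS Notebooks: the function names and the sample sentence are made up for illustration, and it uses a simple regular expression in place of a library tokeniser such as the ones NLTK provides.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_diversity(tokens):
    """Ratio of unique words to total words (the type-token ratio)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Made-up sample sentence, not taken from the Handbooks
sample = "Coal mining declined while mining for oil expanded."
tokens = tokenize(sample)
print(len(tokens), lexical_diversity(tokens))
```

A value close to 1 means few words are repeated; long texts usually score much lower than short ones, so comparisons work best between texts of similar length.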

Jupyter Notebooks are useful for writing code to explore digitised collections because they support NLS collections data in achieving the FAIR data principles:

1. Findability – Jupyter Notebooks can be assigned a digital object identifier (DOI) to facilitate their findability. The Jupyter Notebooks on the NLS Data Foundry have DOIs and are also available on GitHub, a platform for creating and sharing open source coding tools.

2. Accessibility – As mentioned previously, a Jupyter Notebook is an open source software platform that anyone can download to their own computer, or that can be run in an Internet browser.

3. Interoperability – Jupyter Notebooks do not depend on a particular operating system or Internet browser (they can be run with Windows, macOS, Linux, etc.; and in Chrome, Firefox, Safari, etc.). Furthermore, the Jupyter Notebooks on the NLS Data Foundry include links to their data sources (which are .TXT files on the Data Foundry). Every data source has licence information that provides guidance on how to use and cite the data.

4. Reuse – Jupyter Notebooks promote the reuse of NLS collections data because the Notebooks can be edited live in a browser, whether using MyBinder online or a local version of the software downloaded to your computer. You can edit both the explanatory text and code of a Jupyter Notebook, making it easier to write code even if you don’t have prior programming experience.

Exploring Britain and UK Handbooks in a Jupyter Notebook

One of the five collections I used as a data source during my time as NLS Digital Library Research Intern is the Britain and UK Handbooks. The Handbooks dataset I used contains digitised text from official publications that report statistical information on Great Britain and the United Kingdom between 1954 and 2005. In the Exploring Britain and UK Handbooks Jupyter Notebook, I organise the data exploration process into four sections: Preparation, Data Cleaning and Standardisation, Summary Statistics, and Exploratory Analysis. The Notebook serves as both a tutorial for people who would like to write code to analyse digitised text, and a starting point for research on the Britain and UK Handbooks.

In Preparation, I load the files of digitised Britain and UK Handbooks available on the Data Foundry’s website.

Image 1: Excerpt of the Preparation section from Exploring Britain and UK Handbooks
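As a rough illustration of this loading step, the sketch below reads every .txt file in a folder into a Python dictionary. It is an assumption about how such a step could be written, not an excerpt from the Notebook: the function name and the file names are hypothetical, and two throwaway files stand in for Handbooks downloaded from the Data Foundry.

```python
from pathlib import Path
import tempfile

def load_texts(folder):
    """Read every .txt file in a folder into a dict of {filename: text}."""
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts

# Two throwaway files stand in for downloaded Handbooks (hypothetical names)
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "handbook_1954.txt").write_text("Britain 1954", encoding="utf-8")
    Path(folder, "handbook_1955.txt").write_text("Britain 1955", encoding="utf-8")
    handbooks = load_texts(folder)

print(list(handbooks))
```

With real data, you would pass the path of the folder where you unzipped the downloaded dataset instead of a temporary folder.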

In Data Cleaning and Standardisation, I create several subsets of the data that normalise the text as appropriate for different types of text analysis. For example, to analyse the vocabulary of a dataset (e.g. lexical diversity, word frequencies), all the words of a text source should be lowercased so that “Mining” and “mining” are considered the same word. In computational linguistics, this process is called casefolding.

Image 2: Excerpt of the Data Cleaning and Standardisation section from Exploring Britain and UK Handbooks
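Casefolding itself takes only a line or two of Python. The sketch below is illustrative rather than the Notebook's own code: the function name and sample sentence are made up, and a simple regular expression stands in for a proper tokeniser.

```python
import re

def casefold_tokens(text):
    """Return the words of a text, casefolded so that case variants match."""
    return [word.casefold() for word in re.findall(r"[A-Za-z]+", text)]

# Made-up sample sentence, not taken from the Handbooks
tokens = casefold_tokens("Mining towns grew as mining expanded.")
print(tokens.count("mining"))
```

Without casefolding, "Mining" and "mining" would be counted as two different words, each occurring once.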

In Summary Statistics, I calculate and visualise the frequency of select words in the Handbooks.

Image 3: A data visualisation from the Summary Statistics section from Exploring Britain and UK Handbooks
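Counting how often a handful of chosen words occur could be sketched as follows, using only Python's standard library. The function name, the word list, and the sample text are all made up for illustration; the Notebook itself may count and plot frequencies differently.

```python
from collections import Counter
import re

def count_selected(text, selected):
    """Count how often each selected word occurs in a text, ignoring case."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word: counts[word] for word in selected}

# Made-up sample text, not taken from the Handbooks
text = "Coal output fell. Oil output rose, and coal exports fell too."
print(count_selected(text, ["coal", "oil", "gas"]))
```

A Counter returns 0 for words it has never seen, so a selected word that does not appear in the text is reported with a count of zero rather than causing an error.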

In Exploratory Analysis, I group the Handbooks by the decade in which they were published and compare the occurrences of select words in the Handbooks over time.

Image 4: A data visualisation from the Exploratory Analysis section from Exploring Britain and UK Handbooks
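Grouping by decade can be done by rounding each publication year down to the nearest ten and adding up word counts within each group. The sketch below assumes the texts are held in a dictionary keyed by publication year; that structure, the function names, and the sample texts are hypothetical and are not drawn from the Notebook or the Handbooks.

```python
from collections import defaultdict
import re

def decade_of(year):
    """Round a year down to the start of its decade, e.g. 1958 -> 1950."""
    return (year // 10) * 10

def counts_by_decade(texts_by_year, word):
    """Total occurrences of a word per decade, ignoring case."""
    totals = defaultdict(int)
    for year, text in texts_by_year.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[decade_of(year)] += tokens.count(word)
    return dict(sorted(totals.items()))

# Made-up sample texts keyed by publication year
texts_by_year = {
    1954: "Coal and steel. Coal again.",
    1958: "Coal exports.",
    1975: "North Sea oil, not coal.",
}
print(counts_by_decade(texts_by_year, "coal"))
```

Because decades contain different amounts of text, raw counts like these are usually worth dividing by the total number of words per decade before drawing conclusions.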

For More Information

If you’re interested in learning more about using Jupyter Notebooks for large-scale analysis of official publications (or other digitised and digital collections), here are a few resources to get you started:

• National Library of Scotland’s Data Foundry (see the Tools page for Exploring Britain and UK Handbooks and four other Jupyter Notebooks)

• Tim Sherratt’s introduction to working with Jupyter Notebooks for analysing gallery, library, archive, and museum collections, along with many other Notebooks that comprise his GLAM Workbench

• The CERL blog post about the NLS Data Foundry supporting scholarship that approaches “collections as data”

• The NLS Digital Scholarship’s collections-as-data GitHub repository that contains all five Jupyter Notebooks created for the NLS Data Foundry

By Lucy Havens 23 September 2020

Public Petitions to Parliament 1833-1918

The National Library of Scotland has set up trial access to Public Petitions to Parliament, 1833-1918 and this is available to registered users until 1st October 2020. There is a link for feedback provided. Please let us know if you find the resource useful and that will assist us in our decision to subscribe.

The ‘Public Petitions to Parliament’ is part of the U.K. Parliamentary Papers resource, focusing on the Select Committee on Public Petitions in the years 1833 to 1918. It includes descriptive records for every one of the over 900,000 petitions accepted by Parliament and the full text of each petition that the Committee transcribed.

OECD digital hub for coronavirus (COVID-19) content

For the next few months, all content related to the global coronavirus crisis is fully accessible for all users on the OECD iLibrary.

A trusted source of timely, reliable information, OECD is making all content related to coronavirus COVID-19 easily accessible via a dedicated digital hub – www.oecd.org/coronavirus.  Updated on a daily basis, the site provides a single entry point for OECD’s latest evidence, analysis and advice on economic and social policy responses to the pandemic.

All of the OECD iLibrary content is freely available to anyone who has a residential address in Scotland and is registered with the National Library of Scotland. There is guidance on how to register and access this resource.

Archiving Scotland’s response to COVID-19

If you look at traditional media such as newspapers and magazines just now, it often feels like everything is about coronavirus. As you would expect, the National Library of Scotland will collect the newspapers, official publications and magazines that appear during the pandemic and, when they are published, the inevitable books that will chronicle this period.

Just now though we are trying to collect the websites and webpages that document the impact of COVID-19 on Scotland and how the nation has reacted. We are collecting everything from official Scottish Government advice to blogs and social media. This will become a permanent resource on COVID-19 and Scotland as well as the wider United Kingdom that will be available long after the pages we collect have disappeared from the internet.

My colleague Trevor Thomson is one of the team doing this. For the last few weeks, and no doubt for many weeks to come, Trevor has been at home bent over a red hot laptop identifying and capturing hundreds of websites relating to the pandemic. Trevor explains what we have been doing and why below.

By early March 2020 it was apparent that the coronavirus (COVID-19) outbreak was going to affect Scottish society in substantial ways. As with many national events, a great deal of material has been produced online that addresses all aspects of the pandemic. The output is vast, but the Library has been striving to collect web-based material representative of the coverage and gather it together in one place in the UK Web Archive.

One of the great aspects of collecting online material, of course, is that it is available anywhere there is broadband hooked up to a PC or laptop. It is also a saving grace that the means of collecting and tagging URLs for the web archive is also available online – and access to the software is not restricted to physical presence in a particular institution or building. It is therefore a perfect job for working from home.

The first change the virus caused to our lives in Scotland was the cancellation of sports and theatre as it became clear that large public gatherings were likely to lead to the infection spreading more quickly. If you follow the arc of the collecting you will see we targeted for collection the websites of theatres and other cultural institutions as well as the governing bodies for sport as they began to react to the virus. We then targeted coverage of these cancellations in local and national newspapers and on the news pages of the BBC and STV.

As social isolation, social distancing and the lockdown were introduced, the focus of the collecting changed to capture the radical effects of staying at home. We targeted online information issued by local authorities on school closures and other matters, by transport providers and by places of worship, as well as the reactions and advice issued by the Scottish Government. A selection of business reaction from employers, and advice and support emanating from chambers of commerce, was also targeted for collection. Volunteers and charities have done admirable work responding to the needs of our most vulnerable citizens, and their online presence, often in the form of social media, has been and will continue to be captured.

The greatest impact of the outbreak has been on the health service treating people who have contracted the virus. We have collected material, information and advice issued online by the NHS and social care partnerships throughout the country. Scotland also has a notable medical research response and this has been reflected in the collecting. More hidden impacts of lockdown such as the strains on families and mental health have also been targeted for collection.

As with most activities at this time, it has been a communal activity across the Library. Colleagues with expertise in an area have identified websites and collated lists whilst others have input these selections into the UK Web Archive so they can be collected. By the end of 21st April 2020, 2,176 individual URLs had been identified for collection based on their relevance to documenting the COVID-19 outbreak, and this work of course continues.

In due course the full results of this project will be presented as a focused collection alongside broader collections on the coronavirus and its impact on the United Kingdom in the UK Web Archive which can be found at http://www.webarchive.org.uk

This post originally appeared on the National Library of Scotland Blog

National Library of Scotland launches Data Foundry

As part of its Digital Scholarship service, the National Library of Scotland has launched a website for its data collections.

The new Data Foundry site presents Library collections as data in a machine-readable format, widening the scope for digital research and analysis.

Techniques like content mining and image analysis can now be carried out using the Library’s collections. The site features more than 70GB of data, including digitised text and images, metadata collections, map data and organisational data.

Digitised Library collections available as data through the site include some great official publications collections with more to follow.

Datasets from more Library material, such as British military lists, audiovisual collections and web archives, are also planned for publication as the site is regularly updated.

The National Library of Scotland: what’s in it for you?

Originally posted on the Local Government Information Unit Scotland Blog

As an information service, LGiU Scotland is committed to maximising access to quality information for those working in local government – even if it’s not directly from us! LGiU Scotland’s Hannah Muirhead met up with Fiona Laing, Official Publications Curator at the National Library of Scotland, to explore how elected members and others working in local government might benefit from the library’s vast and quite underused information archive.

Although it may look like a solid block of stone from the outside, the National Library of Scotland is one of the most extraordinary buildings in Scotland. Behind those walls is a gateway to 120 miles of shelves which store 30 million items. What this means is that it is extremely unlikely that the library can’t be useful to you in some way. Whether you are trying to understand a historic policy change, get to grips with something scientific, economic, environmental, cultural, or political; or find out more about a local area, community, industry, or hobby – there’s probably something at the National Library of Scotland that will be of use.

Visit to National Library of Scotland’s Map Collections 20th Oct 2.30pm

CILIP’s Government Information Group has organised a visit to the National Library of Scotland Map Library on the 20th Oct.

The National Library of Scotland Map Library is one of the ten largest map collections in the world, holding around 2 million maps, as well as atlases, gazetteers, and a growing collection of digital map datasets. This visit will include a talk describing the main highlights of the map collections, their users, and the growing ways the content is delivered online through http://maps.nls.uk, as well as a brief tour to view the maps themselves and the storage facilities. Chris Fleet is Map Curator at the Library, where he has worked since 1994, with particular responsibilities for digital mapping.

This visit is open to SWOP members and their colleagues

Reserve your place here

State Papers online now available from the National Library of Scotland

If you are resident in Scotland you can apply for a reader’s card online to gain access to this digital manuscript database covering 200 years of British history, from the reign of Henry VIII to the end of the reign of Queen Anne, describing both domestic and overseas activity and events. This resource contains over 3 million manuscript pages and over 500,000 fully searchable calendar and catalogue entries, with links between the two types of document when they are related. The resource also includes introductory essays, research tools, and an image library.

Explore the amazing resources at the National Library of Scotland at Kelvin Hall

If you haven’t been to the National Library of Scotland’s new premises at Kelvin Hall it is certainly worth a visit for the Moving Image archive alone.

However, as well as film, Kelvin Hall opens up access to ALL of the National Library of Scotland’s electronic legal deposit material. This includes journal articles, e-books and archived websites.

Content which the UK’s legal deposit libraries can harvest via the UK Web domain crawl includes millions of sites from the ‘.uk’ domain and a great many from other UK-based domains, such as ‘.scot’ or ‘.london’.

View and search UK content from the web using the Legal deposit UK Web Archive access tool available at the National Library of Scotland.

Social media content, such as from Twitter and Facebook, is covered by the legislation, so legal deposit libraries can collect it too.

Electronic legal deposit material is available via the main catalogue. For access to journal articles, choose the ‘e-articles (legal deposit)’ option from the ‘quick limits’ dropdown menu. Other legal deposit material is indicated by a prompt in the catalogue record for each item asking you to accept the legal deposit terms of access and use.

You don’t need a library card to access any of these resources, so just walk in and explore them for yourself.

#LibrariesMatter because…

In the lead up to the local government election in May CILIP in Scotland will be campaigning for libraries across Scotland and showing why #LibrariesMatter.  SWOP members can help with this campaign.

If you would like to know more and become more involved take a few minutes to visit:

http://www.cilips.org.uk/advocacy-campaigns/campaigns/libraries-matter/help-us-show-librariesmatter/
Share your posters with us!

Introducing the SWOP Business Committee – Fiona Laing (Chair)

I have been a part of the Scottish Working Forum on Official Publications for many years and have found being a member of the group of great benefit to my work at the National Library. I was secretary of the group from 2009 to 2013, when I replaced Fiona McParland as Chair.

Being part of this group has given me an invaluable network of colleagues to learn from, share experiences and challenges with, and hopefully find some solutions that benefit us all.

The SWOP Business Committee is the driving force of the Group. It takes forward issues that members have raised at the meetings, and plans training and outreach events.

The more people and expertise we have on the Business Committee the lighter the load. A huge amount of knowledge in official publications is not essential. We need people to assist with social media and planning events. An enthusiasm for sharing information and helping others goes a long way.

Government information should be accessible to all. Unfortunately we are all overloaded with the amount of information that is out there. Anything that can improve access to this material should be applauded and supported. SWOP is free to join and can also provide excellent CPD opportunities.

In my role as Official Publications Curator at the National Library of Scotland I have responsibility for ensuring that the Library is collecting government publications, in print and digital, from the UK and Ireland, the Commonwealth countries and also Intergovernmental Organisations such as the United Nations and the OECD. I suggest subsets of the collection for in-house digitisation, explore funding opportunities for external digitisation projects, and actively promote the collection, of around 2 million items, at every opportunity. I work closely with the Scottish Parliament, Scottish Government, its Agencies and NDPBs to ensure that best practice in publishing is followed and their material is deposited with the National Library of Scotland and made accessible to everyone who needs to consult it.

Digital content in the National Library of Scotland

Review of NLS Digital content event by Stephanie Longmuir, Information Specialist, NBS:

“I recently spent a fascinating afternoon at the National Library of Scotland (NLS), which included a tour of the building, behind-the-scenes access to the collections and reading rooms, and a presentation about a digitisation project.

Fiona Laing, Curator of the Official Publications collection, started the tour off in the NLS exhibition space, currently highlighting their holdings and interesting items relating to plague in Scotland. The information walls also displayed statistics about how big the library is, what collections they have and what types of material they cover. More information can be found here: http://www.nls.uk/collections

It is staggering in comparison with our library, which includes approximately 30,000 items relating to the construction industry!

We then went behind the scenes to one of the floors to see examples of items readers may request to view and how they are stored, organised, reserved and distributed, as well as learning about the current security, safety and conservation measures in place. We also discovered more about the role of a legal deposit library and how the agents obtain items. It was astounding to think this vast space held only a fraction of the entire collection. After this we headed back up to one of the meeting rooms to find out more about a digitisation project from Jan Usher, a Social Sciences Curator. This project aimed to digitise the House of Lords papers from 1806 to 2000. The project remit had undergone several stops and starts and changes along the way, but they are nearly there, with a soft launch imminent.

The final part of the afternoon was spent exploring the vast digital collections which are available on their website. Some of the resources are only available to those who have registered for a reader ticket and have an address in Scotland, but there are several worth exploring which are available as open access resources.

The visit was very useful to gain an understanding of current issues facing legal deposit libraries, as well as collating digital publications and websites.”

Stephanie Longmuir,
Information Specialist, NBS

Stephanie is a chartered and revalidated librarian who works as an Information Specialist at RIBA Enterprises in Newcastle upon Tyne. She works as part of a team of technical editors for The Construction Information Service, a specialist subscription product for architects, engineers and other construction industry professionals. This product is run as a joint venture with IHS, which is based in Bracknell.

This visit was organised on behalf of the CILIP Government Information Group