The SKOS-HASSET project had several technical objectives and deliverables:
- to create SKOS-HASSET by applying RDF to an existing, well-respected and well-used thesaurus (HASSET)
- to bring HASSET and ELSST into a single framework at database level
- to improve and update HASSET’s online user-facing webpages, hosting SKOS-HASSET and using open source technologies wherever possible
- to extend ELSST’s online management interface (http://elsst.esds.ac.uk/) to facilitate the release of new versions of the thesaurus products
Following agreement from the JISC in October 2012, the second and fourth of these original technical objectives were refined and amended, so that two wide-ranging objectives became three more specific ones. This was done in response to a changed requirement landscape and to pave the way for further, more in-depth development work, for which we have received additional funding.
Rather than bringing both HASSET and ELSST together on a single platform and tweaking the ELSST management interface, it was agreed to:
- test the alignment of the HASSET and ELSST hierarchies by injecting HASSET terms into ELSST and testing that the combined hierarchies work
- establish a version control system for new releases
- release a new version of ELSST, testing the mechanism
These actions will provide us with a good, solid base on which we can entirely re-imagine the management interface and underlying data, rather than tweaking an existing system.
SKOS-HASSET was created, validated and released online on 26 February 2013. We used Pubby as the publication tool; a previous blog post from Darren Bell describes the work undertaken to achieve this objective.
The product is available as genericode, Turtle and RDF.
The HASSET web pages have been extended and enhanced with new, SKOS-related information and a browsable version of the thesaurus. This HASSET browser (in beta at present) obtains its data from WCF REST services, which supply JSON objects built from the relevant database queries. Select boxes have been used to generate the human-browsable structure for HASSET. Initially, a proof of concept was set up using ASP.NET Web Forms. This was further developed to enhance the users’ experience, by allowing searches within the terms, while also protecting the Archive’s intellectual property. These new and updated pages were released on 27 March 2013. Feedback from users is welcomed.
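As a rough illustration of how a browser of this kind can turn service output into select boxes, the sketch below groups a flat JSON list of terms by parent, so that each select box can be filled with the children of the term chosen in the previous one. The JSON shape and the field names (id, label, parentId) are assumptions made for this example; the actual service contract is not described in this post.

```python
import json
from collections import defaultdict

# Hypothetical payload: the field names and terms are invented for illustration.
SAMPLE = """[
  {"id": 1, "label": "EDUCATION", "parentId": null},
  {"id": 2, "label": "SECONDARY EDUCATION", "parentId": 1},
  {"id": 3, "label": "PRIMARY EDUCATION", "parentId": 1},
  {"id": 4, "label": "HEALTH", "parentId": null}
]"""


def children_by_parent(payload):
    """Index the terms in a JSON payload by the id of their parent term."""
    index = defaultdict(list)
    for term in json.loads(payload):
        index[term["parentId"]].append(term)
    for options in index.values():
        # Alphabetical order is easier to scan in a select box.
        options.sort(key=lambda term: term["label"])
    return dict(index)


def options_for(index, parent_id=None):
    """Return (value, text) pairs for the select box below parent_id.

    parent_id=None gives the top-level terms.
    """
    return [(term["id"], term["label"]) for term in index.get(parent_id, [])]


if __name__ == "__main__":
    index = children_by_parent(SAMPLE)
    print(options_for(index))     # options for the first select box
    print(options_for(index, 1))  # options shown after choosing EDUCATION
```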
An online licence form for requests to download and use the entire thesaurus is also being developed. We expect to release this within 2013.
Alignment of hierarchies
Information development and technical development work combined to achieve this objective.
Our project officers, Lorna Balkan and Suzanne Barbalet, compared all the hierarchies within HASSET and ELSST. Those which differ were thoroughly investigated, with all the history and log files consulted and the extent of the issues identified. The following results were found:
- terms that are in HASSET but not in ELSST:
  - the majority of these will remain but will not be deemed to be ‘core’ terms;
  - those considered to have international applicability have been added to the ELSST comments file for discussion with CESSDA colleagues
- terms that are in ELSST but not in HASSET:
  - these were more crucial as they could have skewed the ‘core’ hierarchies;
  - the majority of these were methodological terms; however, a small number were concepts that had been deleted from HASSET (but not yet from ELSST) in order to maintain the currency and relevance of the thesaurus. After investigation and consultation with European colleagues, it was decided that these terms should in fact be proposed as deletions from both products. This will require official international agreement; to expedite this, these terms have been added to the ELSST comments file as suggestions for deletion.
Technical systems have been established to monitor any differences between the two products, using SQL Server Reporting Services. Ten reports have been set up, with alerts, to check that the hierarchies remain in alignment from now until their inclusion in a single application.
Additionally, systems have been established at the database level to identify all terms shared between the two products (known as ‘core’ terms).
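The kind of comparison involved can be sketched with simple set operations. This is not the Reporting Services implementation: it assumes each thesaurus is available as a mapping from a term to its broader (parent) term, and the sample terms are invented for the example.

```python
def compare_hierarchies(hasset, elsst):
    """Compare two thesauri given as {term: broader_term_or_None} mappings."""
    hasset_terms, elsst_terms = set(hasset), set(elsst)
    # 'Core' terms are those shared between the two products.
    core = hasset_terms & elsst_terms
    return {
        "hasset_only": sorted(hasset_terms - elsst_terms),
        "elsst_only": sorted(elsst_terms - hasset_terms),
        "core": sorted(core),
        # Shared terms that sit under different parents break the alignment.
        "misaligned": sorted(t for t in core if hasset[t] != elsst[t]),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not real HASSET or ELSST content.
    hasset = {
        "EDUCATION": None,
        "SECONDARY EDUCATION": "EDUCATION",
        "ACADEMIES": "SECONDARY EDUCATION",
        "TRUANCY": "EDUCATION",
    }
    elsst = {
        "EDUCATION": None,
        "SECONDARY EDUCATION": "EDUCATION",
        "TRUANCY": "SECONDARY EDUCATION",
        "SAMPLING": None,
    }
    for name, terms in compare_hierarchies(hasset, elsst).items():
        print(name, terms)
```

A non-empty "elsst_only" or "misaligned" list is the sort of condition an alert would be raised on, since either could skew the shared hierarchies.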
A version control system has been established for both HASSET and ELSST. The following principles are being followed:
- All terms are date-stamped
- All changes to terms are recorded, no matter how small, and stored in the HASSET history file. The details of the user who made the changes are also recorded
- All version information is available to the project team via a SQL Server Reporting Services dynamic interface
- Live versions of the thesaurus products are made available at regular, agreed intervals:
  - ELSST is released annually, with major increments (1.00, 2.00 etc.); minor increments are not expected, but provision has been made for them in the first year
  - SKOS-HASSET as an external product is released quarterly, with minor increments and annually as a major increment (1.00, 1.01, 1.02, 1.03, 2.00 etc.)
  - HASSET is constantly updated and available for use for indexing internally
- SKOS-HASSET and ELSST annual version numbers will match
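The numbering scheme described in these principles can be expressed as a small helper. This is a sketch of the scheme only, not the internal release procedure; the function name and the way the version string is parsed are assumptions.

```python
def next_version(current, release="minor"):
    """Return the version number that follows current.

    release="minor" models a quarterly increment (1.00 -> 1.01);
    release="major" models an annual increment (1.03 -> 2.00).
    """
    major, minor = (int(part) for part in current.split("."))
    if release == "major":
        return f"{major + 1}.00"
    if release == "minor":
        return f"{major}.{minor + 1:02d}"
    raise ValueError(f"unknown release type: {release!r}")


if __name__ == "__main__":
    version = "1.00"
    history = [version]
    # Three quarterly releases followed by the annual one.
    for kind in ("minor", "minor", "minor", "major"):
        version = next_version(version, kind)
        history.append(version)
    print(history)  # ['1.00', '1.01', '1.02', '1.03', '2.00']
```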
In order to test and implement this system, a previously released version of HASSET was identified and version control was applied to it as version 1.00. SKOS-HASSET version 1.00 was then released on 26 February 2013, and a second release (version 2.00) followed on 25 March 2013.
From this point on, the pattern of quarterly releases begins, with the next SKOS-HASSET version (2.01) due in the second quarter of 2013. A formal, internal procedure for managing these releases has been established.
Release of new version of ELSST
All existing ELSST translators and IP owners have been contacted and kept informed of all developments.
A new version of ELSST, including 136 ‘core’ terms agreed to have international applicability, was released on 25 March 2013. This is version 2.00, bringing all the versions of the thesaurus products in line. Version control will be applied at the table level.
All our technical objectives have now been completed and we are ready to move forward with our new and improved thesaurus products. We are looking forward to taking this work further by entirely re-developing the management interfaces, which will give us, our international ELSST colleagues and the users of our thesauri improved and enhanced applications.