The #SKOS-HASSET project has been up and running now for 7 weeks. In those 7 weeks we have started work on the majority of the work packages – and have made some good progress. Future blog posts will describe what we’re doing in more detail. We’d like to take the opportunity right now, however, to explain a bit more about the project’s aims and objectives, what we will be delivering and how we will measure success.
Aims and objectives
Thesauri are controlled, hierarchical lists of keywords. They describe the relationships between terms and also sometimes provide contextual notes. They are used in many online services, from library catalogues and data services to search engines. Specialist thesauri are extremely important as, often, they are the most comprehensive descriptive resource available for a particular subject. The Humanities And Social Science Electronic Thesaurus (HASSET) is an example of this.
HASSET is owned by the University of Essex. HASSET and its multilingual sister product, the European Language Social Science Thesaurus (ELSST), are used globally by other social science archives and libraries. This project aims to extend HASSET’s usage by:
- applying SKOS to HASSET;
- improving its online presence;
- testing its automated indexing applications.
SKOS (Simple Knowledge Organization System) is a W3C language designed to represent thesauri and other classification resources. It encodes these products in a standardised way using RDF (the Resource Description Framework), making their structures comparable and easier for other systems to interact with.
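To give a flavour of what "applying SKOS" means in practice, here is a minimal sketch that writes a couple of thesaurus terms out as SKOS in RDF's Turtle syntax. It is an illustration only, not the project's conversion code: the `Concept` class, the `to_turtle` function, the `example.org` base URI and the two sample terms are all assumptions made up for this example, and the terms are not taken from HASSET. The `skos:` properties and the SKOS namespace are the standard ones.

```python
from dataclasses import dataclass, field

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"  # standard SKOS namespace
BASE = "http://example.org/thesaurus/"            # placeholder base URI (assumption)


@dataclass
class Concept:
    """One thesaurus term plus the relationships a thesaurus records."""
    ident: str                                    # local identifier, e.g. "c2"
    pref_label: str                               # preferred term
    alt_labels: list = field(default_factory=list)  # non-preferred synonyms
    broader: list = field(default_factory=list)     # identifiers of parent terms
    related: list = field(default_factory=list)     # identifiers of associated terms
    scope_note: str = ""                          # optional contextual note


def literal(text):
    """Quote a string as an English-language Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_turtle(concepts):
    """Serialise concepts as Turtle, one skos:Concept per thesaurus term."""
    lines = [f"@prefix skos: <{SKOS_NS}> .", f"@prefix ex: <{BASE}> .", ""]
    for c in concepts:
        props = ["a skos:Concept", f"skos:prefLabel {literal(c.pref_label)}"]
        props += [f"skos:altLabel {literal(a)}" for a in c.alt_labels]
        props += [f"skos:broader ex:{b}" for b in c.broader]
        props += [f"skos:related ex:{r}" for r in c.related]
        if c.scope_note:
            props.append(f"skos:scopeNote {literal(c.scope_note)}")
        lines.append(f"ex:{c.ident} " + " ;\n    ".join(props) + " .")
        lines.append("")
    return "\n".join(lines)


# Illustrative terms only (not real HASSET entries).
sample = [
    Concept("c1", "SCHOOLS"),
    Concept("c2", "PRIMARY SCHOOLS",
            alt_labels=["ELEMENTARY SCHOOLS"], broader=["c1"]),
]
print(to_turtle(sample))
```

Because every term becomes a `skos:Concept` with the same small set of properties, any service that understands SKOS can read the hierarchy without knowing how the thesaurus is stored internally.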
It is also vital that thesauri stay current. Like dictionaries, thesauri describe the changing world around them; thesaurus-creators work hard to ensure that their products are up-to-date. Applying SKOS to HASSET will mean both that its terms will be more easily available to other services and also that, potentially, HASSET itself will be in a better position to be updated via the inclusion of new hierarchies.
The project’s second aim, that of improving the thesaurus’s online presence, will enhance both the existing management interface and the user-facing pages. HASSET’s existing, browseable tree structure will be joined by a downloadable version of SKOS-HASSET.
The third aim of this project is to test SKOS-HASSET’s automated indexing capacity. SKOS-HASSET will be taken as the terminology source for an automatic indexing tool and applied to question text, abstracts and publications from the Archive’s collection. The results will be compared to the gold standard of manual indexing carried out by human indexers.
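As a rough idea of what using a thesaurus as a "terminology source" can look like, the sketch below does the simplest possible kind of automatic indexing: it looks for thesaurus labels in a piece of text and returns the matching preferred terms. This is a deliberately naive phrase-matching approach chosen for illustration, not the tool the project will use; the function names, the tiny vocabulary and the sample question are all made up for the example.

```python
import re


def build_matcher(labels):
    """Compile one case-insensitive regex for all labels, longest first,
    so that "primary schools" wins over the shorter "schools"."""
    ordered = sorted(labels, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(label) for label in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def index_text(text, label_to_term):
    """Return the set of preferred terms whose labels occur in the text.

    label_to_term maps every label (preferred or synonym) to the preferred
    term it stands for.
    """
    if not label_to_term:
        return set()
    lookup = {label.lower(): term for label, term in label_to_term.items()}
    matcher = build_matcher(label_to_term)
    return {lookup[m.group(0).lower()] for m in matcher.finditer(text)}


# Illustrative vocabulary only (not real HASSET entries).
vocabulary = {
    "unemployment": "UNEMPLOYMENT",
    "joblessness": "UNEMPLOYMENT",      # synonym pointing at the preferred term
    "primary schools": "PRIMARY SCHOOLS",
    "schools": "SCHOOLS",
}

question = "Has joblessness affected attendance at primary schools in your area?"
print(sorted(index_text(question, vocabulary)))
```

Real indexing tools go well beyond exact phrase matching (for example by handling word forms and ranking candidate terms), which is part of why the results need to be checked against human indexing.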
User guidance will be created and a webinar held, showcasing the new product and its uses. Throughout the project, we will communicate with interested external parties and seek their views. We are keen to hear from anyone with an interest in SKOS or automated keyword application. Please do join our HASSET-THESAURUS JISCmail list or contact the project directly.
We will be creating a number of new outputs and improving some existing systems:
|Output / Outcome Type||Description|
|Tool||SKOS-HASSET, the HASSET thesaurus with SKOS applied. To be released online.|
|Case study||Exemplar describing how SKOS-HASSET was tested for its automatic indexing capabilities, sharing knowledge with JISC and the wider community about how this can be done. The exemplar will include step-by-step notes on what was done, along with screenshots. To be posted online.|
|Report||Report evaluating the techniques used in applying SKOS-HASSET to automated indexing for social science data products. This report will capture the knowledge built during the creation of the exemplar and share it with JISC and the wider community. To be posted online.|
|Report||Report reviewing the licensing options available for the thesaurus and recommending the route to be taken. To be posted online.|
|Technical design||Unified Database, with all hierarchies sitting on the same platform.|
|User interface||An updated and improved set of live, user-facing thesaurus webpages, including the publication of SKOS-HASSET.|
|User interface||Extended thesaurus management interface which contains the functionality to release thesaurus versions.|
|Content||A5 promotional leaflet for SKOS-HASSET. To be printed and posted online.|
|Event||Webinar showcasing SKOS-HASSET, in which the work of the project and the usability of the SKOS-HASSET product will be demonstrated to the wider community. The webinar will be held live and also recorded.|
|User manual||User guidance for SKOS-HASSET. To be posted online.|
|Report||Final report, reviewing and assessing all tasks, activities, techniques and lessons learned. To be shared with JISC, and to be posted online in redacted form once approved.|
Critical success factors
We’ve also identified some critical success factors. They are as follows:
SKOS-HASSET:
- it may be used as a keyword repository within automated indexing;
- the file may be updated easily and quickly and as necessary, when new terms are added to the thesaurus;
- it will be possible for the application of SKOS to be extended to include other hierarchies in the Archive, specifically, ELSST (tested using the combined, core hierarchies only);
- the results of applying SKOS have been reviewed, written up and shared with the wider community via the blog, the online case study/exemplar and the webinar.
AUTOMATED INDEXING EXEMPLAR:
- HASSET terms have been applied automatically to all the selected texts using at least two systems and compared with the gold standard at term level;
- the results have been reviewed, written up and shared with the wider community via the blog, the online case study/exemplar and the webinar.
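One possible way to compare automatically assigned terms with the gold standard "at term level" is to count which terms the two sets share and report precision, recall and F1. The sketch below assumes those measures; this post does not say which measures the project will actually use, and the function name, system names and term sets are invented for illustration.

```python
def score_terms(auto_terms, gold_terms):
    """Compare automatically assigned terms with human-assigned ones."""
    auto, gold = set(auto_terms), set(gold_terms)
    hits = auto & gold
    # Precision: share of automatic terms that the human indexer also chose.
    precision = len(hits) / len(auto) if auto else 0.0
    # Recall: share of the human indexer's terms that were found automatically.
    recall = len(hits) / len(gold) if gold else 0.0
    # F1: harmonic mean of the two, 0.0 when nothing matched.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "missed": sorted(gold - auto),     # in the gold standard, not found
        "spurious": sorted(auto - gold),   # found, not in the gold standard
    }


# Made-up example: one document, its gold-standard terms, and the output
# of two hypothetical indexing systems.
gold = {"UNEMPLOYMENT", "PRIMARY SCHOOLS", "SCHOOL ATTENDANCE", "TRUANCY"}
systems = {
    "system_a": {"UNEMPLOYMENT", "PRIMARY SCHOOLS", "HOUSING"},
    "system_b": {"UNEMPLOYMENT", "TRUANCY", "SCHOOL ATTENDANCE"},
}

for name, terms in systems.items():
    result = score_terms(terms, gold)
    print(name, {k: round(v, 2) if isinstance(v, float) else v
                 for k, v in result.items()})
```

Listing the missed and spurious terms alongside the scores makes it easier to see where a system disagrees with the human indexer, not just how often.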
UNIFIED DATABASE:
- all Archive-based thesaurus tables are sitting on the same platform;
- all terms are tagged with the correct identifiers.
USER-FACING THESAURUS WEBPAGES:
- SKOS-HASSET is available online;
- HASSET’s browseable tree structure has been reviewed and improved;
- open source technologies have been used or information about how to apply open source technologies to the pages has been included.
THESAURUS MANAGEMENT INTERFACE:
- a single management interface is available for the administration of HASSET/ELSST;
- the management interface includes the functionality to update the thesaurus and to release new versions;
- open source technologies have been used or information about how to apply open source technologies to the pages has been included;
- information about the management interface has been written up and shared with the wider community.
Thanks for reading! More posts to follow on the precise work packages, our timings, budget and risks, evaluation methodologies, text/data mining and the application of SKOS to HASSET.