Lucy Bell and Mahmoud El-Haj, members of the SKOS-HASSET project team, recently presented the project's work at the University of Essex's 2012 Language and Computation Day, held on 4 October 2012. The University's Language and Computation Group is an inter-disciplinary research group whose members are drawn from several departments, including Language and Linguistics and Computer Science and Electronic Engineering, as well as from the UK Data Archive. The group organises inter-disciplinary meetings and invites external speakers to the University.
Lucy presented an overview of the project, titled "SKOS-HASSET: a project at the UK Data Archive". The presentation described the project's headline objectives: applying SKOS to HASSET, testing the result through automatic indexing, investigating licensing and improving the user interfaces. Lucy also summarised the progress made so far. We are now halfway through the contract and have achieved the following:
- SKOS has been applied to HASSET
- A system for the re-application of SKOS to other hierarchies has been established internally
- The texts have been prepared for the automated indexing case study and two corpora (catalogue records and SQB questionnaires) have already been automatically indexed
- Work on a gold standard of manually indexed questions is under way (almost 21,000 questions have been indexed so far)
- An evaluation timetable, incorporating both automatic and manual evaluation of the automatic indexing results, has been drawn up
- The research on licences is well under way and the licensing report is expected soon
- Initial requirements for user and management interfaces have been drafted
- The project has been promoted through this blog and at conferences (see earlier blog posts!)
Mahmoud presented a thorough review of the data mining work undertaken in Work Package 2: Keyword Indexing with a SKOS Version of HASSET Thesaurus. His presentation described how automatic indexing is being applied to the four corpora we are targeting: catalogue records; SQB questionnaires; full-text case studies and support guides; and questions/variables taken from Nesstar.
Many interesting questions were fielded, and suggestions were made about how to extend work in this area once the project ends. A short debate was held about whether automatic indexing could be applied to the data within the Archive's collections; however, it was concluded that copyright, disclosure and data protection concerns would not permit this. Colleagues from other faculties also offered good ideas for extending HASSET, such as using automatic indexing to suggest new terms. These ideas will be examined further as part of the project's sustainability work.