Networked Knowledge Organization Systems and Services

The 5th European Networked Knowledge Organization Systems (NKOS) Workshop

Workshop at the 10th ECDL Conference, Alicante, Spain

September 21, 2006


Successful submissions to the NKOS workshop at ECDL 2006


2. Lauser, B., Sini, M., Liang, A., Keizer, J. and Katz, S.
From AGROVOC to the Agricultural Ontology Service / Concept Server. An OWL model for creating ontologies in the agricultural domain.

FAO, Rome, Italy
Boris.Lauser@fao.org
4.5. (longer paper);

This paper illustrates the conversion from a traditional thesaurus in agriculture (AGROVOC) to a new system, the Agricultural Ontology Service Concept Server (AOS/CS). The Concept Server will serve as a multilingual repository of concepts in the agricultural domain providing ontological relationships and a rich, semantically sound terminology. The Food and Agriculture Organization recently developed the underlying model for this new system in the Web ontology language OWL. In this paper, we describe the purpose of this conversion and the use of OWL and highlight in particular the core features of the developed OWL model. We go on to explain how it evolves and differs from the traditional thesaurus approach. [Full Word Version]

Keywords

Ontologies, Thesauri, Semantic Web, OWL, Classification Schemes, Metadata, AGROVOC, AOS



4. Nicholson, D. and McCulloch, E.
HILT Phase III: Design requirements of an SRW-compliant Terminologies Mapping Pilot.

HILT, Centre for Digital Library Research, Glasgow, UK
d.m.nicholson@strath.ac.uk
9.5. (slightly longer);

Beginning of paper:

As Zeng and Chan (2004) note, interoperability of knowledge organisation systems (KOS) is a key issue in today's networked environment. It is an issue likely to impact, in time, on the semantic web vision (Berners-Lee et al, 2001), but is more usually tackled at present in an information retrieval context. Information services employ a plethora of different subject schemes to describe their resources. In some cases, they use recognised standards, in others 'in-house' or even uncontrolled schemes. Either way, the practice acts as a barrier to effective cross-searching by subject over distributed information services. The issue has attracted a good deal of interest in recent years. Potential solutions proposed include linking or switching between schemes, mapping, derivation/modelling (see for example Doerr, 2001; Chan and Zeng, 2002), and automatic or semi-automatic classification (see for example Koch and Vizine-Goetz, 1998; Godby et al, 1999; Ardo, 2004). CARMEN (2000), LIMBER (2000), Renardus (2002), and MACS (2005) are amongst a range of recent projects that have tackled the problem, and key international players such as OCLC (http://www.oclc.org/) have also done relevant work (see http://www.oclc.org/productworks/terminologiespilot.htm).... [Full Word Version]



6. Price, S., Lykke Nielsen, M. and Delcambre, L.
The Feasibility of Using the Semantic Components Model for Indexing Documents in Digital Libraries.

CS, Portland State Univ., OR, USA
prices@cs.pdx.edu
9.5. (pdf);

Finding one or more documents that exactly answer a targeted information need often fails, especially in digital libraries that lack the hyperlink structure that is so successfully exploited by the PageRank algorithm. In the absence of extensive hyperlinks, successfully matching document requests to document content is essential. Assignment of keywords using human intellectual processes is expensive and prone to inconsistency. Automated full-text indexing is less expensive but requires the searcher to anticipate the language used in relevant documents. We have developed a new model, which we call Semantic Components, that leverages expert knowledge about how information is organized and expressed within the domain and is intended to facilitate precise searching in domain-specific libraries. We hypothesize that semantic component indexing will yield improved search results over automated full-text indexing and that indexing using this model will be faster (and therefore cheaper) and more consistent than human keyword assignment. [Full PDF Version]



12. Miles, A.
SKOS: Preparing for Standardization.

Description

The Simple Knowledge Organisation System (SKOS) is a formal language for representing controlled structured vocabularies. SKOS is an application of the Resource Description Framework (RDF), and provides a basis for using controlled structured vocabularies within distributed and decentralised applications.

SKOS was initially published as deliverables of the SWAD-Europe project [1,2,3,4,5] and subsequently, after further development within the context of the W3C's Semantic Web Best Practices and Deployment Working Group, as W3C Working Drafts [6,7,8,9,10]. There is support for a W3C Recommendation Track specification based on this work. This presentation therefore focuses entirely on a discussion of the appropriate scope for a SKOS W3C Recommendation.

It is proposed that the purpose of SKOS be defined as the formal representation of controlled structured vocabularies intended for use within information retrieval applications. Within this general statement of purpose, it is proposed that a clearly defined set of requirements be established by:

  1. The set of representation styles that must be supported.
  2. The set of query types and query evaluation strategies that must be supported.

By "representation style" is meant for example the conventions for the layout of a thesaurus on visual media as defined in ISO 2788:1986 and BS 8723-2.

By "query type" is meant for example atomic queries, or compound queries where multiple references to vocabulary units may be composed in some way. Note that a query type is an abstract concept, and is independent of the various means by which an application may implement this functionality in a user-interface.

By "query evaluation strategy" is meant for example the direct evaluation of a query against an index, or an expanded evaluation of a query where the expansion is achieved according to some expansion algorithm, and where relevance metrics are derived in order to rank query results.

By supporting a representation style is meant that it is possible to generate a representation in the given style from a formal SKOS representation via an entirely automated process. By supporting a query type and query evaluation strategy is meant that the formal SKOS representation of an index over a collection of items using a controlled vocabulary is sufficient to enable the implementation of the given query type and query evaluation strategy within an information retrieval application.

A number of representation styles are reviewed, indicating which of these are not supported by SKOS at the time of writing, and discussing options for enabling support. In particular, known issues in the representation of thesauri are presented, including the annotation of non-preferred terms, thesauri that employ UF+ and UFO links, and the use of node labels and arrays.

A classification of query types is proposed, integrating the retrieval functionalities normally associated with a classification scheme and the functionalities normally associated with a thesaurus within a unified framework, and in particular providing a basis for solving the ambiguity that currently exists where searchers and/or indexers may coordinate vocabulary units.

This classification is based on the notion of a compound query, which consists of multiple components where each component is a reference to a conceptual unit in a controlled structured vocabulary, and where a component may be required, optional or required-absent. An atomic query is simply a compound query with a single required component.

The notion of a compound query may be extended to allow each component to consist of either a reference to a vocabulary unit, or a coordination of references, thus overcoming the inherent ambiguity in the meaning of the keyword AND as used in some currently deployed user interfaces, and solving the problem of false hits in coordinated searches. It is proposed that it be a requirement for SKOS to support this extended notion of a compound query.
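The extended compound query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of SKOS or of the proposal itself: the names CompoundQuery, Coordination, component_matches and evaluate, the toy index, and the choice to rank by the number of matched optional components are all hypothetical.

```python
from dataclasses import dataclass, field

# A coordination is a set of concept identifiers combined into one index
# entry or one query component; a single concept is a one-element set.
Coordination = frozenset


@dataclass
class CompoundQuery:
    required: list = field(default_factory=list)         # each must match
    optional: list = field(default_factory=list)         # only affect ranking
    required_absent: list = field(default_factory=list)  # none may match


def component_matches(component, index_entries):
    """A component matches if a single index entry contains all its concepts."""
    return any(component <= entry for entry in index_entries)


def evaluate(query, index):
    """Return (document, score) pairs; score counts matched optional components."""
    results = []
    for doc, entries in index.items():
        if not all(component_matches(c, entries) for c in query.required):
            continue
        if any(component_matches(c, entries) for c in query.required_absent):
            continue
        score = sum(component_matches(c, entries) for c in query.optional)
        results.append((doc, score))
    # Best score first, ties broken by document identifier.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical index: doc2 is about steel pipes and copper wire.
index = {
    "doc1": [Coordination({"copper", "pipes"})],
    "doc2": [Coordination({"steel", "pipes"}), Coordination({"copper", "wire"})],
}

# Two separate required components: doc2 is returned as a false hit.
loose = CompoundQuery(required=[Coordination({"copper"}), Coordination({"pipes"})])
# One coordinated component: only doc1 is about copper pipes.
strict = CompoundQuery(required=[Coordination({"copper", "pipes"})])

print(evaluate(loose, index))
print(evaluate(strict, index))
```

In this sketch an atomic query is simply a CompoundQuery with one single-concept required component, and the difference between the loose and the strict query is the ambiguity of AND that the abstract refers to.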

A number of query evaluation strategies are reviewed, including some simple algorithms for achieving query expansion and relevance metrics. The aim is to further an understanding of (1) how these strategies can be used to maximise the value and utility of information retrieval applications based on controlled structured vocabularies and manual or semi-automated indexing, (2) where a particular strategy is most appropriate, and (3) how each strategy can be implemented above an underlying SKOS/RDF representation of an index. This latter point is fundamental to establishing the sufficiency of the SKOS representation framework.
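As a rough illustration of an expanded evaluation with a relevance metric, the following Python sketch expands a query concept along narrower links and ranks documents by a weight that decays with distance. The decay factor, the depth limit, the function names and the toy data are assumptions made for this example; the abstract does not specify which expansion algorithms or metrics are reviewed.

```python
def expand(concept, narrower, decay=0.5, max_depth=2):
    """Map each concept reachable via narrower links to a weight in (0, 1]."""
    weights = {concept: 1.0}
    frontier = [concept]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for child in narrower.get(current, []):
                if child not in weights:
                    weights[child] = decay ** depth
                    next_frontier.append(child)
        frontier = next_frontier
    return weights


def search(concept, narrower, index, **kwargs):
    """Rank documents by the best weight among their index concepts."""
    weights = expand(concept, narrower, **kwargs)
    scored = []
    for doc, concepts in index.items():
        score = max((weights.get(c, 0.0) for c in concepts), default=0.0)
        if score > 0:
            scored.append((doc, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical hierarchy (in SKOS terms, skos:narrower links) and index.
narrower = {"mammals": ["cats", "dogs"], "cats": ["siamese cats"]}
index = {
    "d1": {"mammals"},
    "d2": {"cats"},
    "d3": {"siamese cats"},
    "d4": {"birds"},
}

print(search("mammals", narrower, index))               # expanded evaluation
print(search("mammals", narrower, index, max_depth=0))  # direct evaluation
```

Setting max_depth to 0 reduces the expanded evaluation to a direct evaluation of the query against the index, so both strategies run over the same underlying representation.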

It is briefly discussed how query expansion strategies can make the problems of change management in controlled vocabularies and of semantic mapping between controlled vocabularies tractable, and it is suggested that declarative support for solutions to these problems be in scope for SKOS.

[1] SWAD-Europe Deliverable 8.1 - "An RDF Schema for Thesauri (SKOS-Core 1.0 Guide)"
http://www.w3.org/2001/sw/Europe/reports/thes/8.1/

[2] SWAD-Europe Deliverable 8.3 - "RDF Encoding of Multilingual Thesauri"
http://www.w3.org/2001/sw/Europe/reports/thes/8.3/

[3] SWAD-Europe Deliverable 8.4 - "Inter-Thesaurus Mapping"
http://www.w3.org/2001/sw/Europe/reports/thes/8.4/

[4] SWAD-Europe Deliverable 8.5 - "RDF Encoding of Classification Schemes"
http://www.w3.org/2001/sw/Europe/reports/thes/8.5/

[5] SWAD-Europe Deliverable 8.8 - "Migrating Thesauri to the Semantic Web (SKOS-Core 1.0 Guidelines for Migration)"
http://www.w3.org/2001/sw/Europe/reports/thes/8.8/

[6] SKOS Core Guide W3C First Public Working Draft
http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20050510

[7] SKOS Core Vocabulary Specification W3C First Public Working Draft
http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20050510

[8] Quick Guide to Publishing a Thesaurus on the Semantic Web W3C First Public Working Draft
http://www.w3.org/TR/2005/WD-swbp-thesaurus-pubguide-20050517

[9] SKOS Core Guide W3C Second Public Working Draft
http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102

[10] SKOS Core Vocabulary Specification W3C Second Public Working Draft
http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20051102

CCLRC - Rutherford Appleton Laboratory, Didcot, UK
a.j.miles@rl.ac.uk
11.5. (mail)



13. Kruk, R.S., Haslhofer, B., Piotrowski, Westerski, A. and Woroniecki, T.
The Role of Ontologies in Semantic Digital Libraries. [Full PDF Version]

DERI, National Univ. of Ireland, Galway et al
bernhard.haslhofer@researchstudio.at
11.5. Sem. Web session (pdf)



14. Voss, J.
Knowledge Organization with Wikipedia: Joining the Free Encyclopaedia and Digital Libraries.

Extended Abstract

Five years after it was founded almost unintentionally, the free-content encyclopaedia Wikipedia is still growing with astonishing success. Many thousands of volunteers have created 4 million articles in more than 100 languages, and wikipedia.org is one of the 20 most visited websites worldwide, an impact libraries can only dream of (according to Alexa (2006), archive.org is ranked around 120 and loc.gov around 1,200, to mention only the most popular library sites). While libraries work with elaborate rules and experts, there is no dedicated management in Wikipedia. Everyone can directly edit almost any article, and standards emerge only by means of self-organization. This makes Wikipedia less reliable, while libraries apparently provide objective and accurate information. Nevertheless, many online searchers prefer Wikipedia as a first reference. But the differences are not that wide: libraries and Wikipedia both aim to collect and arrange knowledge and try to make it accessible to everyone with information needs. Despite their different methods, they share common goals, so cooperation between libraries and Wikipedia makes sense.

In this presentation, several strategies for connecting Wikipedia and digital libraries are shown. In making Wikipedia part of a digital library (or the other way round), knowledge organisation systems play an especially important role. Since 2005 there has been a cooperation between the German Wikipedia and the German National Library that involves the use of the Personennamendatei (PND) name authority file (Hengel and Pfeifer, 2005). Around 100,000 biographic articles in the German Wikipedia are equipped with metadata about persons, which contain a PND number in around 20,000 cases. This number generates a link through which Wikipedia users can navigate directly to library catalogues to find publications by or about a specific person. A reciprocal link to Wikipedia articles is planned, and the method could also be extended to other authority files and maybe even subject headings (Voss, 2005). Subject indexing in Wikipedia is handled with so-called categories. In fact, this system of tagging Wikipedia articles is the first collaborative tagging system with multiple hierarchical relationships: a collaboratively created thesaurus (Voss, 2006). Based on these categories, mappings between Wikipedia and other information systems could also be established. Methods of thesaurus and ontology matching will help to obtain concordances once legal restrictions are resolved. First experiments in Wikipedia show that indexing Wikipedia articles with a foreign classification is not suitable, but German Wikipedia's categories in the field of library and information science could successfully be mapped to the JITA Classification System of Library and Information Science.

Besides categories, Wikipedia articles themselves can be used directly to index other resources. Wikipedia contains many articles about complex concepts but also articles about explicit entities like people, organisations, places and so on. Each article is identified by a unique name, so Wikipedia can also be seen as a controlled vocabulary. Homonyms are handled with disambiguation pages (http://en.wikipedia.org/wiki/Wikipedia:Disambiguation) that list all meanings of a word with links to the corresponding articles, and synonyms are joined with redirects (http://en.wikipedia.org/wiki/Wikipedia:Redirect) which link to preferred terms. Wikipedia is also the first strictly hypertextual encyclopaedia. Methods of network analysis and data mining will provide networks of concepts that can be used for browsing and mapping knowledge. An extension of MediaWiki (the software Wikipedia runs on) adds typed links and supports RDF (Völkel et al, 2006); this makes it possible to create semantic networks with a wiki and may integrate Wikipedia into the promised Semantic Web. Besides normal hyperlinks between Wikipedia articles, there are specific links to other databases that can be used for integrated services. These special links mostly contain a unique identifier per article. Examples are ISBN and ISSN numbers, laws, patent numbers, digital object identifiers and links to the Internet Movie Database (IMDb). A third type of link connects Wikipedias in different languages (the different language versions of Wikipedia are mostly independent and have different highlights and specialities).
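The view of Wikipedia as a controlled vocabulary can be illustrated with a small Python sketch in which redirects map synonyms to preferred terms and disambiguation pages list the candidate meanings of a homonym. The function name resolve and the three lookup tables are hypothetical stand-ins built by hand for this example; they do not use or imitate any MediaWiki interface.

```python
def resolve(term, articles, redirects, disambiguations):
    """Return the candidate preferred terms (article titles) for a term."""
    seen = set()
    # Follow redirects (synonym -> preferred term), guarding against loops.
    while term in redirects and term not in seen:
        seen.add(term)
        term = redirects[term]
    if term in disambiguations:
        return list(disambiguations[term])  # homonym: several meanings
    if term in articles:
        return [term]                       # unique preferred term
    return []                               # not in the vocabulary


# Hypothetical sample data.
articles = {"Mercury (planet)", "Mercury (element)", "Digital library"}
redirects = {
    "Quicksilver": "Mercury (element)",
    "Digital libraries": "Digital library",
}
disambiguations = {"Mercury": ["Mercury (planet)", "Mercury (element)"]}

print(resolve("Digital libraries", articles, redirects, disambiguations))
print(resolve("Mercury", articles, redirects, disambiguations))
```

An indexing tool built on this idea would store the resolved article title as the index term and ask the indexer to choose whenever more than one candidate is returned.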

Wikipedia provides a vast number of possibilities to connect its knowledge structure with other systems, especially digital libraries. However, the wiki paradigm, with no firm rules and directions, may be unfamiliar. Its self-organization allows flexible and quick solutions; virtually everything can be changed at any time, but if no one is willing to work on a specific task voluntarily, it will not get done. Also essential to Wikipedia is its restriction to free content. All textual content is licensed under the GNU Free Documentation License (GFDL), which allows anyone to use, modify and republish the content as long as authors are named and derivative works are published under the same license. Keeping this in mind, Wikipedia content can be used in portals, catalogue enrichment and other contexts. Connections with other databases facilitate browsing structures that span multiple information systems. The various prospects for collaboration have barely been surveyed yet.

References

Alexa.com (2006): Traffic Rankings. http://www.alexa.com/site/ds/top_500 (accessed May, 2006)

Hengel, Christel and Pfeifer, Barbara (2005): Kooperation der Personennamendatei (PND) mit Wikipedia. In: Dialog mit Bibliotheken, volume 17, number 3, pages 18-24.

Völkel, Max; Krötzsch, Markus; Vrandecic, Denny; Haller, Heiko; Studer, Rudi (2006): Semantic Wikipedia. In: Proceedings of the 15th international conference on World Wide Web, May 2006. http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_id=1055

Voss, Jakob (2005): Metadata with Personendaten and beyond. In: Proceedings of the first Wikimania conference, August 2005. http://meta.wikimedia.org/wiki/Transwiki:Wikimania05/Paper-JV2

Voss, Jakob (2006): Collaborative thesaurus tagging the Wikipedia way. April 2006. http://arxiv.org/abs/cs/0604036

About the Author

Jakob Voss studied computer science and library science at Humboldt-University, Berlin. He is a member of the board of Wikimedia Germany and has been involved in the German Wikipedia since 2002.

Humboldt Univ., Berlin, Germany (?)
jakob.voss@nichtich.de
11.5.



15. Terwilliger, J. F., Delcambre, L.M.L. and Logan, J.
User-Oriented Knowledge Organization Systems.

Overview

Data, as it resides in most databases or file systems, is incomprehensible to the average person. This problem arises largely because database schemas are not designed to be user-friendly; rather, they are intended to be efficient for storage and retrieval (Figure 1). Some databases serve as the back-end for sophisticated, domain-specific applications, such as electronic medical records or other clinical software (Figure 2). These software applications are obviously designed to be used by domain experts. We want to reuse the structure and content of the user interface for this kind of software application (used to capture data) to describe the data and allow domain experts who are not computer scientists to query the underlying data using this familiar structure for their data. In addition, we want to support users that would like to transform data in such databases into whatever form is necessary to perform a task, such as a statistical study. [Full Word Version]

CS, Portland State et al, OR, USA
jterwill@cecs.pdx.edu
11.5. (long)



17. Falquet, G., Nerima, L., Mottaz Jiang, C-L. and Ziswiler, J-C.
Integrating Semi-formal Knowledge Organization Structures.

ABSTRACT

In recent years, we have been working on hyperbook structures for building digital libraries. A hyperbook is made of a domain ontology, containing the most important concepts of the field or subject in question, and of information fragments linked to the ontology's concepts. Fragments are chunks of text that serve primarily to define a concept, but they can also describe different aspects of the concept or contain examples, references, etc. Optionally, links between fragments and concepts can be typed. The digital library is built by aligning the different hyperbook ontologies, which identifies equivalent and similar concepts. The aim is to create an extended view of each hyperbook in the form of a virtual document that provides readers with supplementary information found in the other hyperbooks, such as additional examples, term definitions, more detailed or more general information, etc.

Much in the spirit of Marshall and Shipman, who point out that "The difficulty of knowledge acquisition, representation and reasoning has a long history of being underestimated", the aim of inventing hyperbooks is to build a knowledge organization structure that is as easy to construct as a weakly structured KOS (for instance glossaries or metadata-annotated models like learning objects), but has a stronger semantic structure that can be used for the integration process.

Many research communities have proposed writing full-fledged ontologies, which result in a KOS with a strong semantic structure. Such ontologies may support logical reasoning, which is likely to be more difficult with a hyperbook structure that contains just a small domain ontology and textual fragments. On the other hand, ontologies built according to specifications like those of the RDF/OWL family are time-consuming to construct and suitable only for homogeneous domains. For instance, it might be possible to create an OWL ontology describing all elements of a house, but it seems nearly impossible to write an ontology about the United Nations under the OWL specification.

Nevertheless, we found evidence in several examples that the hyperbook structure is suitable for integrating hyperbooks into a digital library of hyperbooks. However, concepts must be linked to representative fragments that define, describe, show examples of, or simply refer to the concept.

Last year, we tried to integrate two hyperbooks about agricultural politics made by domain specialists. A fully automatic integration approach made it possible to identify relations indicating equivalent and similar concepts.

Last winter, we had graduate students in a computer science course model hyperbooks about the topics of the course. We found a clear difference when comparing the students' hyperbooks with those built by domain specialists. Students found appropriate concepts, but ultimately did not take much care in selecting the fragments. This is probably because we provided them with slides from the course presentation and with selected publications on the course topics, so fragments were not easy to find and write. Domain specialists can take advantage of documents from their daily work, so it might be easier for them to create well-made hyperbooks. We conclude that hyperbook creation is fastest when adequate material that can easily be fragmented already exists in a knowledge base. In particular, glossaries or similar KOS might be the best starting point for the construction of hyperbooks.

We propose the following integration process to assemble the digital library. First, we compute semantic similarities between the concepts of the hyperbooks. The mapping approach relies on both conceptual structure comparison (based on word matching, semantic neighbourhood matching and the positions in the "is-a" and "part-of" hierarchies) and fragment comparison. The existence of semantic similarity between fragments increases the concepts' similarity. Secondly, the weighted similarity links are used to generate a reading interface for an extended hyperbook by presenting the book content within its semantic context. We built a prototype to generate virtual documents from formal hyperbooks and to apply filtering, organization and assembling mechanisms. To avoid the information overload that attaching every kind of link to the initial hyperbook would cause, we designed a graphical user interface generator that produces expand-in-place links for larger textual fragments, which are shown to users after they activate the corresponding link.
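The first step, in which fragment similarity raises the similarity of the concepts, might be sketched as follows. This is a simplified illustration, not the authors' algorithm: Jaccard word overlap stands in for the unspecified matching measures, hierarchy positions are left out, and the weights, function names and sample concepts are all assumptions.

```python
def jaccard(a, b):
    """Overlap of two collections as |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def words(text):
    return text.lower().split()


def concept_similarity(c1, c2, w_label=0.4, w_neigh=0.3, w_frag=0.3):
    """Weighted mix of label, neighbourhood and fragment similarity."""
    label = jaccard(words(c1["label"]), words(c2["label"]))
    neigh = jaccard(c1["neighbours"], c2["neighbours"])
    # Use the best-matching pair of fragments attached to the two concepts.
    frag = max(
        (jaccard(words(f1), words(f2))
         for f1 in c1["fragments"] for f2 in c2["fragments"]),
        default=0.0,
    )
    return w_label * label + w_neigh * neigh + w_frag * frag


# Hypothetical concepts from two hyperbooks.
a = {"label": "crop rotation", "neighbours": {"soil", "farming"},
     "fragments": ["rotating crops between seasons keeps soil fertile"]}
b = {"label": "rotation of crops", "neighbours": {"soil", "agriculture"},
     "fragments": ["rotating crops keeps soil fertile"]}
# Same concept as b, but with an unrelated fragment attached.
c = dict(b, fragments=["tax law"])

print(concept_similarity(a, b))
print(concept_similarity(a, c))
```

With these sample values the pair (a, b) scores higher than (a, c), because the two pairs differ only in how well their fragments match.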

In the example with graduate students, it was more difficult to find appropriate similarities in a fully automatic integration process than with the hyperbooks built by domain experts. In this case, we need an alternative way to validate the determined relations. In social navigation and social bookmarking, when a user follows a link or bookmarks a page, this action increases the score (or weight) of that link or page. In this way, everyone can benefit from everyone else's experiences, discoveries, etc. We use a similar principle to establish more reliable semantic relations between ontologies when concepts are not well described. It proceeds as follows:

When a user reads a hyperbook, the system automatically proposes links to (or expansion in place of) fragments from other hyperbooks according to the alignment described above. Then, the user can do three things:

1 and 2 reinforce the value of the similarity of the concepts involved in the link inference (2 is stronger), 3 weakens the similarity value. The underlying hypothesis is that the user will immediately see what is completely irrelevant.
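The reinforcement rule can be sketched as a simple weight update. The three user actions are identified here only by their numbers, and the step sizes, the clamping to the range 0 to 1 and the name apply_feedback are assumptions made for illustration; the only property taken from the text is that actions 1 and 2 raise the similarity (2 more strongly) and action 3 lowers it.

```python
# Hypothetical step sizes: action 2 reinforces more strongly than action 1,
# action 3 weakens the link.
STEP = {1: 0.05, 2: 0.10, 3: -0.10}


def apply_feedback(similarity, action):
    """Return the updated similarity value, kept within [0.0, 1.0]."""
    if action not in STEP:
        raise ValueError(f"unknown action: {action!r}")
    return min(1.0, max(0.0, similarity + STEP[action]))


similarity = 0.5
for action in (1, 2, 3):
    print(action, apply_feedback(similarity, action))
```

Over many readers, links that are repeatedly weakened drift towards zero and can be dropped from the proposals, which is the social-navigation effect the abstract describes.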

Future work consists of adding functions to "create your own book". Such a tool would contain link proposals that the user can include or not, or introduce a very simple "language" to express inter-book queries to create dynamically derived fragments, e.g. "Find examples of this concept in other books" or "find more descriptions of this concept". Here again, the user should be able to accept or reject answers.

Centre universitaire d'informatique CUI, University of Geneva, Switzerland
Jean-Claude.Ziswiler@cui.unige.ch
11.5.



18. Tudhope, D.
A tentative typology of KOS: towards a KOS of KOS?

The NKOS community has had a longstanding aim to describe the different kinds of Knowledge Organisation Systems, as relevant to networked terminology services, and relate them to similar knowledge schemes in other information disciplines. An initial taxonomy of KOS was one of the first items posted on the NKOS website. However, progress since then has been sporadic and there is a need to further advance this agenda. This proposal aims to take some further steps down this route. The presentation will review and delineate different types of KOS and discuss appropriate roles for KOS in the Semantic Web. It will attempt to build on previous NKOS work in this area, in particular by discussing use contexts for KOS at a level high enough to allow some rough comparison with other disciplines contributing to the Semantic Web. [Full Word Version]

University of Glamorgan Wales, UK
dstudhope@glam.ac.uk
11.5. (rtf)



19. Lykke Nielsen, M., Eslau, A.G.
Indexing challenges in work place information retrieval. [Full PDF Version]

Royal School of Library and Information Science, Aalborg, Denmark
MLN@db.dk
15.5. accepted delay (pdf)



Content by: Traugott Koch of UKOLN.
Page last revised on: 08-Jun-2006