Kal Ahmed

Related Topics: XML Magazine

Introducing Topic Maps

Topic maps are a standard way of representing the complex relationships that often exist between the pieces of information that we use in day-to-day business processes. This article begins by discussing what topic maps are, what they can do, and what people are currently using them for. However, my main goal is to introduce the basic concepts of topic maps and their representation in XML.

The Missing Link in Information Management
Almost all XML vocabularies are designed with a single purpose: to describe information in a way that enables automated processing. We use XML to describe document structures so our documents can be rendered as HTML, WML, PDF, or some other presentation format. We also use XML so that business systems can interchange data reliably. Both the ability to render content to different output formats and the reliability of data interchange arise from a predefined agreement about how individual pieces of markup describe the information they wrap.

There's no doubt that information markup is of immense use in automated processes, but alone it can't describe the relationship between different resources or between an information resource and the subject or subjects that the information resource describes. In many systems it's these relationships that enable human beings to make sense of and organize the information they need to work with. In some respects, then, these relationships are the missing link between manageable and unmanageable information.

Topic Maps: Subject-Oriented Markup
If the current set of information-oriented vocabularies based on XML isn't sufficient for information management purposes, then what's needed is an alternative view of the information. Topic maps provide one such view.

In a topic map, rather than focus on the information, you focus on what the information is about. The markup you create is subject oriented rather than information oriented. The fundamental unit of a topic map is a topic, an electronically processible stand-in for a thing that doesn't necessarily have to be processible itself. For example, a topic can represent a person, a building, or a concept, but it can also be used to represent processible things such as Web pages, files, or database cells. In the topic map approach the thing a topic represents is called the subject of the topic.

A single document may describe many different subjects and may establish relationships between those subjects. Using topic map constructs, a topic map author can describe the relationships between the subjects described by different topics and also point to resources that provide information related to the topic in some way. In addition to topics, the two other basic topic map constructs that make this possible are associations, which relate topics to each other, and occurrences, which connect topics to resources that contain some information related to the topic. These three basic concepts are shown in Figure 1.

A Brief History of Topic Maps
The first standard to describe the topic map approach was published by ISO early in 2000 as ISO 13250:2000. In addition to describing the paradigm itself, ISO 13250 describes an interchange syntax making use of SGML Architectural Forms and link structures defined by the HyTime standard (ISO 10744). The interchange syntax described in this first edition of the ISO topic map specification is highly flexible, enabling users to derive their own DTDs for describing their own information structures, and allowing the use of the full range of multimedia linking structures defined by HyTime.

Soon after the publication of ISO 13250, a separate group called TopicMaps.Org was formed to create a topic map interchange standard based on the ISO standard, but with its syntax expressed in XML rather than SGML and with a linking mechanism using the W3C XLink standard rather than HyTime. The result was the XML Topic Maps (XTM) specification, published early in 2001. While some topic map applications support a restricted subset of the ISO 13250 interchange syntax, almost all those available today support XTM 1.0. For this reason (and because this is an XML magazine!), I'll only describe XTM interchange syntax.

Applications of Topic Maps
Before getting into the technical detail about how topic maps are constructed, let's briefly survey the state of the art with regard to the use of topic maps. Current applications can be broken down into three major purposes:

  1. A way to improve accessibility to information
  2. A flexible, extensible data structure for standalone applications
  3. An integration layer between existing applications

Topic maps for information accessibility
This is by far the largest category of current applications of topic maps. Topic maps are being used as the structuring paradigm for portals to all sorts of information. Because they're subject oriented, they provide an intuitive way for users to find their way around large sets of information. Using a topic map, the information provider can create a high-level overview of the concepts covered by the documents. Users can then navigate the overview to find the subject area of interest before accessing the related documents. You can also go the other way - from a document to a list of all subjects in the document and from there to other documents related to the same subject(s). Because all of this structure is explicit in the topic map, navigating from subject to document and back again can be done without requiring the user to construct search terms.

Many commercial topic map tools focus on making information more findable and presenting navigation structures for topic maps. Thus, once you have your information structured using topic maps, it's possible to get a configurable, off-the-shelf application to do the work of presenting that structure to the end users.

Topic maps for programmers
A topic map is a data-driven structure. As you'll see, to create new categories of things and relationships between things, it isn't necessary to modify a fixed schema. Instead, you simply add new data to the topic map. Currently, the use of topic maps for representing data structures is hampered by the lack of a common API for accessing and modifying topic map information, but work is under way to define such an API.

Topic maps for integration
The basic structures of a topic map - topics, their relationship to each other, and their relationship to information resources - can be used to represent information harvested from a wide variety of business systems, from databases through content management systems to workflow systems. By making use of a single standard data structure and by identifying the common business subjects that each of these systems describe, it's possible to create a unified topic map view of the business and its processes.

Topic Maps in Detail
In this section I'll describe the fundamental concepts of topic maps and their expression in the syntax defined by the XTM standard.

Topics
Topics are the building blocks of topic maps. Listing 1 shows a minimal topic map (not quite the smallest possible as the <topic> element is actually optional). (Note: All XML listings in this article are slightly abridged. The full source code for each can be downloaded at www.sys-con.com/xml/sourcec.cfm.) The root element of the document is the <topicMap> element. The document uses the XTM 1.0 namespace (http://www.topicmaps.org/xtm/1.0/) and also declares the XLink namespace, which will be used for all linking inside the topic map and to resources external to the topic map. A <topicMap> element may contain any number of <topic>, <association>, and <mergeMap> elements. In this example there's a single topic that has a value only for its (required) ID attribute.
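The listing itself is not reproduced in this copy of the article. As a rough sketch of the kind of minimal topic map described above, assuming XTM 1.0 syntax and a made-up topic ID ("my-topic"), the document might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a minimal XTM 1.0 topic map: one topic that has only its required id. -->
<!-- The id value "my-topic" is a hypothetical placeholder, not taken from the article. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="my-topic"/>
</topicMap>
```

Removing the <topic> element would leave an empty but still valid topic map, which is the "smallest possible" case mentioned above.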

Subject identity
As already discussed, a topic represents a subject that may be some otherwise unprocessible thing. A topic identifies precisely which subject it represents by means of subject identifiers. A subject identifier is some resource that either:

  • Is the subject that the topic represents
    or
  • Describes the subject that the topic represents

    A subject identifier that describes the subject the topic represents is also known as a subject indicator. In XTM such a subject identifier is required to be addressable with a URI. As an example of the difference between the two uses of a URI for identifying a subject, consider the URI http://tm4j.org/. If you visit this address using a standard Web browser, you'll find a Web page describing the TM4J Project, an open source project developing topic map processing tools and applications. So this page can be considered a resource that describes the subject "The TM4J Project". The URI is also the root URI for the TM4J Project's Web site, so this same URI could also be considered a resource that is the subject "The TM4J Project Web site". These are two distinct usages of the same URI and so have different forms of markup in XTM, as shown in Listing 2.

    The URIs used to define the identity of the topic's subject are always contained within the <subjectIdentity> element. In Listing 2 the <topic> with the ID "tm4j-site" represents the TM4J Project's Web site. Because the URI actually points to the subject that the topic represents, the <resourceRef> element is used. However, the <topic> with the ID "tm4j-project" represents the TM4J Project itself, so the URI points to a resource that describes the subject and the <subjectIndicatorRef> element is used instead.
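Since the listing is not included in this copy, the following is a sketch of how the two usages might be marked up, based on the description above. The topic IDs and the URI come from the text; everything else is a plain reading of the XTM 1.0 syntax rather than the author's original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- The URI IS the subject: this topic represents the Web site itself. -->
  <topic id="tm4j-site">
    <subjectIdentity>
      <resourceRef xlink:href="http://tm4j.org/"/>
    </subjectIdentity>
  </topic>

  <!-- The URI DESCRIBES the subject: this topic represents the project. -->
  <topic id="tm4j-project">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://tm4j.org/"/>
    </subjectIdentity>
  </topic>

</topicMap>
```

The only difference between the two topics is the element that carries the URI, and that difference is what tells a processor how to interpret the address.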

    Topic names
    The examples we've seen so far say nothing about the subjects they describe. In a topic map the information related to a subject can be provided by either names or occurrences. A topic may have any number of base names, each specified by a separate <baseName> element that contains a <baseNameString> element containing the text of the topic name. Each base name may have any number of variant names, which may be either text or pointers to other resources. This enables the use of icons or multimedia clips for the naming of topics. To distinguish variant names, each variant may declare any number of parameters, each of which is actually just a reference to a topic. The XTM standard provides topics that can be used as parameters to represent sortable variants of a base name. Listing 3 shows this usage - note that the reference to the topic defined by the XTM standard simply makes use of the XLink ability to address an element in another document entirely. This means that it's possible to reuse topics defined in other topic maps without needing to copy the topics into your own topic map.
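As an illustration of base names and variants, here is a sketch of a topic with one base name and one sortable variant. The name strings are invented for the example, and the address used for the "sort" topic (core.xtm#sort under the XTM 1.0 namespace URI) is given from memory of the XTM 1.0 published subjects, so it should be checked against the specification before use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="tm4j-project">
    <baseName>
      <baseNameString>The TM4J Project</baseNameString>
      <!-- A variant of the base name intended for sorting. -->
      <variant>
        <parameters>
          <!-- Reference to a topic in another document (assumed address). -->
          <topicRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
        </parameters>
        <variantName>
          <!-- Illustrative sort key; a resourceRef could point to an icon instead. -->
          <resourceData>tm4j project, the</resourceData>
        </variantName>
      </variant>
    </baseName>
  </topic>
</topicMap>
```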

    Occurrences
    Occurrences are typed links that point from the topic to related information resources. A topic may have any number of occurrences, each pointing to a single resource. An occurrence may be typed by referencing a topic that defines the type. This reference is made using a <topicRef> element contained within an <instanceOf> element for the <occurrence> being typed. As with the parameters, the topic we reference can even be in a different topic map document! Listing 4 shows the markup for an occurrence.

    In addition to pointing to external resources, an occurrence can simply wrap inline data. The current version of the XTM specification limits such inline resources to PCDATA only. To specify an inline resource, use the <resourceData> element in place of the <resourceRef> element, with the occurrence data specified as the <resourceData> element content.
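The two forms of occurrence can be sketched side by side as follows. This is not the article's own listing; the typing topics "homepage" and "description" are hypothetical IDs introduced only for the example, and the inline text paraphrases the project description given earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Topics that define the occurrence types (hypothetical IDs). -->
  <topic id="homepage"/>
  <topic id="description"/>

  <topic id="tm4j-project">
    <!-- Occurrence pointing to an external resource. -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
      <resourceRef xlink:href="http://tm4j.org/"/>
    </occurrence>
    <!-- Occurrence wrapping inline character data. -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <resourceData>An open source project developing topic map tools.</resourceData>
    </occurrence>
  </topic>

</topicMap>
```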

    Topic types
    Just as occurrences can be typed by referencing a topic that defines the type, so topics themselves can be typed by referencing other topics. As with occurrences, the reference to the topic that defines the type is made using a <topicRef> element contained within an <instanceOf> element. However, a topic may have any number of types, so the XTM DTD allows the <instanceOf> element to appear any number of times as a child of the <topic> element. This flexibility allows us to create multiple classification systems and still use the same mechanism (typing) to classify topics. Listing 5 shows a topic with a single type.
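A sketch of topic typing follows. It uses two types rather than one to show that <instanceOf> may repeat; the typing topics "open-source-project" and "java-software" are hypothetical IDs chosen for illustration, not names from the article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Topics used as types (hypothetical IDs). -->
  <topic id="open-source-project"/>
  <topic id="java-software"/>

  <!-- One topic classified in two independent ways. -->
  <topic id="tm4j-project">
    <instanceOf><topicRef xlink:href="#open-source-project"/></instanceOf>
    <instanceOf><topicRef xlink:href="#java-software"/></instanceOf>
  </topic>

</topicMap>
```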

    Associations
    Associations are the mechanism by which a topic map author can express the relationship between topics. An association can be regarded as a grouping of topics that are in some way related. An association consists of one or more roles, each of which may be played by one or more topics. For example, the statement "Kal Ahmed is a member of the TM4J Project" can be expressed in a topic map as an association between the topic representing "Kal Ahmed" and a topic representing "the TM4J Project". In XTM an association is created using the <association> element. Each role in the association is defined using a separate <member> element that contains any number of <topicRef> elements that point to the topics that play that role in the association.

    A very simple, uninformative association is shown in Listing 6. It's "uninformative" because it tells us only that there is some sort of relationship between the topics with the IDs "kal" and "tm4j-project". To express the nature of the relationship, we must add a type to the association. Once again, this typing is achieved by reference to a topic that defines the association type, using the <instanceOf> element. As with occurrences, an association can be typed by only one topic. Listing 7 shows this additional type information.

    With Listing 7 we now know there is a relationship between the topic "kal" and the topic "tm4j-project" and that the relationship is "member-of". But we don't know which is the individual and which is the group. Is "kal" a member of "tm4j-project" or is it vice versa? In topic maps, associations are groupings without any direction. Instead, each topic plays a specific role in the association, and it's this role that defines the nature of a topic's participation in the association.

    In XTM syntax each <member> element has an optional <roleSpec> element that contains a <topicRef> element pointing to the topic that defines the role. To disambiguate the association in Listing 7, we need to create topics for the concept of "individual" and "group" and reference these topics from the <member> elements of the association. This is shown in Listing 8.
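Putting the pieces together, a typed association with explicit roles might be sketched as below. The topic IDs "kal", "tm4j-project", "member-of", "individual", and "group" follow the names used in the text; the listing is a reconstruction from that description, not the article's original.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Players, association type, and role-defining topics. -->
  <topic id="kal"/>
  <topic id="tm4j-project"/>
  <topic id="member-of"/>
  <topic id="individual"/>
  <topic id="group"/>

  <association>
    <!-- The nature of the relationship. -->
    <instanceOf><topicRef xlink:href="#member-of"/></instanceOf>
    <!-- "kal" plays the role of the individual. -->
    <member>
      <roleSpec><topicRef xlink:href="#individual"/></roleSpec>
      <topicRef xlink:href="#kal"/>
    </member>
    <!-- "tm4j-project" plays the role of the group. -->
    <member>
      <roleSpec><topicRef xlink:href="#group"/></roleSpec>
      <topicRef xlink:href="#tm4j-project"/>
    </member>
  </association>

</topicMap>
```

Dropping the <roleSpec> elements gives the typed but ambiguous form, and dropping <instanceOf> as well gives the uninformative grouping described earlier.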

    Scope
    The scope mechanism of topic maps enables any information provided about a topic to be qualified by defining a context within which the information is valid. Scope may be used to define several different perspectives on the same set of information. Some suitable applications of scope include:

  • Language perspectives: You can use scope to differentiate English names and occurrences of a topic from French or German names and occurrences.
  • Audience perspectives: Scope may be used to separate "beginner" resources from "intermediate" or "advanced" resources, thus enabling different sets of information to be presented to users of different levels.
  • Time perspectives: Some properties change over time - for example, many cities in the world have had different names at different times in their history.
  • Author perspectives: Scope can be used to track who said what; names and occurrences scoped by their source can be used by readers of the topic map to determine which pieces of information to trust.

    Because scope is such a powerful and general concept, topic maps don't have a predefined set of scopes, but instead allow authors to construct them using topics as the building blocks. Scope may be applied to names (as a child of the <baseName> element), occurrences (as a child of the <occurrence> element), and associations (as a child of the <association> element).

    Listing 9 shows the application of scope for differentiating between the English and German names for "The TM4J Project" and for hiding information for programmers (the API Javadoc) from other kinds of users. Note that simply specifying this scoping information is only one part of effective use of scope - the other lies in the application itself. The topic map paradigm has few rules about exactly how scope is applied and allows applications to define their own processing models for deciding how scope controls what a reader of the topic map does and doesn't see.
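A sketch of scoped names and a scoped occurrence along these lines is shown below. The scoping topics ("english", "german", "programmer"), the occurrence type ("api-doc"), the German name string, and the example.org address are all placeholders assumed for illustration; the real Javadoc location is not given in the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Topics used as scoping themes and as an occurrence type (hypothetical IDs). -->
  <topic id="english"/>
  <topic id="german"/>
  <topic id="programmer"/>
  <topic id="api-doc"/>

  <topic id="tm4j-project">
    <!-- Name valid in the English-language context. -->
    <baseName>
      <scope><topicRef xlink:href="#english"/></scope>
      <baseNameString>The TM4J Project</baseNameString>
    </baseName>
    <!-- Name valid in the German-language context (illustrative translation). -->
    <baseName>
      <scope><topicRef xlink:href="#german"/></scope>
      <baseNameString>Das TM4J-Projekt</baseNameString>
    </baseName>
    <!-- Occurrence intended only for programmers; the URL is a placeholder. -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#api-doc"/></instanceOf>
      <scope><topicRef xlink:href="#programmer"/></scope>
      <resourceRef xlink:href="http://example.org/tm4j/javadoc/"/>
    </occurrence>
  </topic>

</topicMap>
```

Whether a reader outside the "programmer" context sees the Javadoc occurrence at all is a decision for the application that processes the map.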

    Merging
    We've now covered the fundamental structures required to build a single standalone topic map. However, the "hidden" power of the topic map approach is that there's a simple, well-defined process by which two or more topic maps can be combined to enable a topic map reader to seamlessly view all the information provided by those topic maps simultaneously. The basic rule for topic map merging can be expressed as follows:

    Whenever two or more topic maps are merged, the result is a single topic map in which all topics that the processor determines to represent the same subject are merged and duplicates are eliminated.

    But how does a processor determine that two topics are about the same thing? The topic map paradigm defines three ways:
    1.   If the two topics have the same resource as their subject, then they are about the same thing.
    2.   If two topics use the same resource to describe their subject, then they are about the same thing.
    3.   If two topics have the same string value for a base name, and if the two base names are in the same scope, then the topics are about the same thing.

    (In the last of these rules, "the same scope" means a scope consisting of the same collection of topics.)
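To make the second rule concrete, consider two independently authored topic maps, sketched here as two separate documents. The topic IDs, the name, the "homepage" occurrence type, and the split into two files are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- First topic map: supplies a name for the project. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="tm4j">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://tm4j.org/"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>The TM4J Project</baseNameString>
    </baseName>
  </topic>
</topicMap>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Second topic map: different id, same subject indicator, adds an occurrence. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="homepage"/>
  <topic id="project-42">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://tm4j.org/"/>
    </subjectIdentity>
    <occurrence>
      <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
      <resourceRef xlink:href="http://tm4j.org/"/>
    </occurrence>
  </topic>
</topicMap>
```

Because both topics use the same resource to describe their subject, a processor merging these maps would treat "tm4j" and "project-42" as one topic, even though their IDs differ and neither author knew of the other's map.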

    In addition to these basic rules, a processor may use any other means to determine whether two topics are about the same thing. For example, a processor with sufficient background knowledge could determine that a topic with the base name "United Kingdom" and a topic with the base name "Royaume Uni" refer to the same subject.

    The process of merging two topic maps may be initiated by one of the following means:

  • Merging of topic maps using <mergeMap>: In this case a topic map processor must retrieve the topic map document pointed to by the <mergeMap> element and merge it with the topic map containing the <mergeMap> element. This enables modularization of topic maps by allowing one topic map to "include" another and build on the subjects it provides.
  • Merging of topic maps using <topicRef>: A topic map may contain a <topicRef> to a topic in another topic map document. In this case a topic map processor must retrieve the topic map document containing the referenced topic and merge it with the topic map containing the reference. This requirement ensures that whenever a topic is referenced from another topic map, the processing application always gets the full context of the referenced topic.
  • Manual: A topic map application may allow the user to choose to merge any set of topic maps to create a single unified view of them all.
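The two markup-driven triggers can be sketched in a single topic map as follows. The example.org addresses and the file names people.xtm and roles.xtm are placeholders, as are the topic IDs assumed to exist inside those documents.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">

  <!-- Explicit merge: the processor retrieves and merges this map (placeholder URL). -->
  <mergeMap xlink:href="http://example.org/maps/people.xtm"/>

  <topic id="tm4j-project"/>

  <association>
    <member>
      <!-- Implicit merge: a reference to a topic in another document causes -->
      <!-- that document to be retrieved and merged as well (placeholder URL). -->
      <roleSpec>
        <topicRef xlink:href="http://example.org/maps/roles.xtm#individual"/>
      </roleSpec>
      <!-- "kal" is assumed to be defined in the merged people.xtm. -->
      <topicRef xlink:href="http://example.org/maps/people.xtm#kal"/>
    </member>
    <member>
      <topicRef xlink:href="#tm4j-project"/>
    </member>
  </association>

</topicMap>
```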

    During merging, whenever a processor has determined that two topics are about the same thing, the topics must be replaced by a single merged topic. The names and occurrences of the resulting topic are the union of the names and occurrences of the topics merged, and the resulting topic replaces the merged topics wherever those topics are referenced (as types, players in associations, members of a scope, or as association role specifications).

    The merge process enables a set of topic maps to be created independently by both human beings and automated processes and then combined in a well-defined manner. By combining the merge process with the concept of well-known subject identities, an environment can be created in which human beings can share their knowledge of a domain and contribute their insights on subjects previously identified by automated processing. This approach is actually used by the TM4J.org Web site, where the site itself is constructed from a merge of topic maps created by human beings, a topic map generated from the XML sources of the user documentation, and a topic map generated from the Javadoc comments in the source code. The fully merged Web site enables a user browsing a class definition (from the Javadoc) to see on the same Web page a link to related FAQ entries or documentation.

    Conclusions
    The topic map paradigm is a powerful, subject-oriented approach to structuring sets of information. The building blocks of topic maps are topics that have names and point to occurrences of the subject in external resources. Associations relate topics to each other. Each topic in an association is the player of a defined role. Topics, occurrences, and associations can all be typed by other topics. In addition, topic names, occurrences, and associations can be specified to be valid only in a given context by the use of a scope, which is defined by a collection of topics.

    Further Reading

  • ISO/IEC 13250, 2nd ed: www.y12.doe.gov/sgml/sc34/document/0322_files/iso13250-2nd-ed-v2.pdf
  • XML Topic Maps 1.0: www.topicmaps.org/xtm/1.0
  • topicmapmail list (Infoloom): www.infoloom.com/mailman/listinfo/topicmapmail
  • Web resources for topic maps (Open Directory Project): http://dmoz.org/Computers/Artificial_Intelligence/Knowledge_Representation/Topic_Maps/
  • Park, J., and Hunting, S., eds. (2002). XML Topic Maps. Addison-Wesley
  • Ancha, S., Cousins, J., et al. (2001). Professional Java XML. Wrox
  • Dodds, D., Watt, A., et al. (2001). Professional XML Meta Data. Wrox

    Kal Ahmed, an independent consultant specializing in XML technologies and information management, was a member of the TopicMaps.Org consortium during the creation of the XTM 1.0 specification. The main developer of the TM4J suite of open-source topic map tools as well as the creator and maintainer of TMTab, a freely available tool for creating topic maps, he has spoken and written widely on the subject of topic maps. More details about TM4J and TMTab can be found at www.techquila.com.
