In the beginning there was a problem: cascading drop downs, or more specifically the lack of them in MOSS 2007. Although there were many valiant attempts to introduce such functionality, none of them really hit the spot. So, for SharePoint 2010, the good folks at Microsoft thought a bit harder about the problem. Rather than add the elusive cascading drop down functionality, a new concept was born: the Managed Metadata service. After all, the main reason for cascading drop downs was to drive user selection of metadata through a hierarchical structure. With the Managed Metadata service (or the term store as it’s also known), there was no need for this indirection; users could simply navigate the hierarchy and select the terms they needed using a single column. You have to admit it’s a much nicer solution than a page full of drop downs.
With SharePoint 2013 the Managed Metadata service has really come to the forefront. Having a user accessible hierarchical metadata repository opens up a world of additional possibilities and we’ve now got metadata based navigation. Term store terms can represent pages within a site and configuration attached to those terms is used to dynamically generate page contents. Now (using the tried and tested example of a shopping site) we can load our product catalog into the term store and create a few template pages and we’re done.
As well as these much trumpeted uses of the term store, another use is social tagging. Since SharePoint 2010, users have been able to add their own tags to content within SharePoint. It’s been possible to follow social tags and easily find other content with the same tag. (I say easily – it’s easy once you get the whole thing hooked up to Search properly! See this post for my struggles). Social tagging is useful because it makes it easier for users to find content by allowing them to add context that can’t be inferred by a search engine. A great start but social tagging is disconnected from the Managed Metadata column that we might use when addressing the cascading drop downs problem and that brings me to my point (Finally!): Are we missing a trick when it comes to our use of managed metadata?
Why we need metadata
To address this question let’s start by examining why we need metadata at all. In my opinion there are two reasons. The first is to support business processes – this would include usages such as document approval status or associated client ID. The second is to make content findable. Just as social tagging allows users to add context that is separate from the document, so metadata provides the same opportunities at the document level. Of course this is the case for all field types available in SharePoint, from a text field containing an item title through to a Business Data Catalog drop down that allows a selection to be made from a line-of-business system. In fact, it’s this additional context that’s used to populate dynamically generated topic pages in SharePoint 2013, and in my opinion we’re not using it as well as we could when it comes to the Managed Metadata column.
So why is the Managed Metadata column different?
It’s different because it’s hierarchical. It’s different because it has greater potential to improve the way we find information. When looking for information we tend to adopt one of two modes: we either search, Google style, by typing keywords into a box and hoping the ranking algorithm is attuned to our needs, or we browse, by navigating through a well-known hierarchy to locate the item that we need (think Dewey Decimal, Yellow Pages or the file system on your PC). We alternate between these modes depending on how clear we are on what we’re trying to locate. So you see how the hierarchical nature of the term store lends itself to locating information via the browse approach? But there’s more. The single hierarchy approach is so last season. One of the requirements of the browse approach is that the hierarchy must be well known, so we end up in the tricky situation of trying to create a single hierarchy that makes sense to everybody. With the term store we can have many hierarchies – each hierarchy can represent an alternative approach to categorising information that meets the needs of a particular audience. We’re not quite in the semantic web utopia of the ontology but we’re getting there. Now we can browse through a hierarchy that makes sense to us to find what we need. In SharePoint 2013 we’re seeing the introduction of this idea via metadata navigation – it’s not really about shopping carts, it’s about context based browsing.
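To make the "many hierarchies" idea a little more concrete, here is a toy sketch in Python. It is not the term store API; the term set names, the terms and the `paths_to` helper are all made up for illustration. It simply shows the same category being reachable by different routes depending on which hierarchy an audience prefers to browse.

```python
# Toy model only: two hypothetical term sets that organise the same
# content in different ways. None of these names come from SharePoint.
TERM_SETS = {
    "By department": {
        "Sales": {"Proposals": {}, "Contracts": {}},
        "Legal": {"Contracts": {}},
    },
    "By document type": {
        "Contracts": {},
        "Proposals": {},
    },
}


def paths_to(term, tree, prefix=()):
    """Yield every path from the top of a hierarchy down to the given term."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == term:
            yield path
        # Keep walking: the same label may appear deeper in the tree.
        yield from paths_to(term, children, path)


for set_name, tree in TERM_SETS.items():
    print(set_name, list(paths_to("Contracts", tree)))
```

Somebody who thinks in departments finds contracts under Sales or Legal; somebody who thinks in document types finds them at the top level. Neither hierarchy has to make sense to everybody.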
How does it work with search?
So far so good. We’re making great use of the hierarchical nature of the term store to allow us to browse information. What happens if we want to search for information though? We go to the search box, type in a term from our hierarchy and get back a load of results. We very quickly see that many of the results returned have not been tagged with our term and those that have don’t feature prominently in the result set. We can change our query a bit to find only results that match our tag but that presents another problem – the ranking is arbitrary, and documents that aren’t tagged with our term but feature the term text prominently elsewhere are not returned. Surely we’re missing something here?
Whirlwind tour of SharePoint Search
There is a better way but before we go there we need to understand how search works in SharePoint. It goes a little something like this:
- The crawler processes content, extracting data in the form of crawled properties. So, for example, the title of a document will be stored in one crawled property whereas the user that created the content will be stored in another. Depending on the type of document, other metadata may be extracted, such as language, creation date etc.
- The aim of the crawling process is to create two structures. The first is the search index, which contains text based, searchable crawled property data. The second is the property database, which contains crawled properties that have been designated as metadata (this designation is done via the crawled property/managed property mapping process). The mapping of crawled properties to managed properties is known as the search schema and, as we’ll see, this plays an important role in our overall search experience.
The important thing to understand is that when we perform a search, generally we’re using the search index. The results of our search are governed by a ranking model that uses data stored in each of the crawled properties and the search schema to determine how relevant a given item is to the query performed. For example, if the title crawled property is mapped to the title managed property and a match is found, it’ll achieve a high ranking because the ranking model assigns a high weight to the title managed property (and therefore its associated crawled properties). The result of a search index query is a result set containing items, each with a numerical ranking calculated based on the ranking model.
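The weighting idea can be sketched in a few lines of Python. This is a deliberately crude stand-in, not SharePoint’s ranking model: the `WEIGHTS` values, the `Item` class and the `index_search` function are all assumptions made up for the example. The only point is that a match in a heavily weighted property (the title) outranks a match somewhere less important (the body).

```python
from dataclasses import dataclass

# Hypothetical weights standing in for a ranking model. The real model
# is far more sophisticated; these numbers are invented.
WEIGHTS = {"title": 5.0, "body": 1.0}


@dataclass
class Item:
    title: str
    body: str


def score(item, query):
    """Add up the weight of every property whose text contains the query."""
    q = query.lower()
    total = 0.0
    for prop, weight in WEIGHTS.items():
        if q in getattr(item, prop).lower():
            total += weight
    return total


def index_search(items, query):
    """Return (score, item) pairs for matching items, best match first."""
    scored = [(score(item, query), item) for item in items]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


ITEMS = [
    Item("Budget notes", "Some thoughts about the proposal"),
    Item("Proposal for Contoso", "Pricing and timescales"),
    Item("Holiday rota", "Who is off in August"),
]

for rank, item in index_search(ITEMS, "proposal"):
    print(rank, item.title)
```

The item with the query text in its title comes back first, the one that only mentions it in the body comes second, and the unrelated item is not returned at all.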
You’ll notice that earlier I said that generally when you perform a search you’re using the search index. That’s not always the case. If you search for content containing a particular managed property value, you’re using the property database instead. Effectively you’re saying “search this specific set of crawled properties”. A property database search doesn’t rank results since there are no criteria on which to rank them; we only have one managed property. Either the property value exists or it doesn’t. As a consequence, the result of a property search is a list of results all with the same rank.
So this explains what we see when we try to search for a term. If we search for the term text, we’re searching the index and all occurrences of the text influence the ranking. The fact that it matches a term store term is ignored. If we do a property search for a term store term, sure enough we get back only items that have that term attached to them but the ranking is arbitrary. While the property search approach works well if there are a small number of items tagged with the same term, it gets painful as the number increases. Finding documents that have a particular social tag is useful if there are 10 or 20 but if there are a few hundred or even a few thousand it’s useless.
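Here is the property search side of the story as a toy Python sketch. Again, this is an illustration under my own assumptions rather than anything SharePoint exposes: `property_search`, the `tags` property and the document names are all hypothetical. Notice that an item either has the value or it doesn’t, so every hit gets the same rank.

```python
def property_search(items, prop, value):
    """Return (rank, name) for every item holding the value in the property.

    There is nothing to rank on, so every match gets the same rank and the
    order is simply whatever order the items happen to be stored in.
    """
    return [(1.0, item["name"]) for item in items if value in item.get(prop, [])]


# Hypothetical documents: every other one is tagged "Contracts".
DOCS = [
    {"name": f"doc{i}", "tags": ["Contracts"] if i % 2 == 0 else ["Proposals"]}
    for i in range(6)
]

for rank, name in property_search(DOCS, "tags", "Contracts"):
    print(rank, name)
```

With three matches that flat list is fine. Change `range(6)` to `range(6000)` and you get three thousand results with nothing to tell you which one to open first.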
How do we improve the situation?
There are three ways to make better use of the term store to locate tagged information.
1. Term set Granularity
Having a term property search return 10 or 20 results is manageable. Users can look through these results to determine which is most suited to their purpose. If there are thousands of results we have a problem. The goal is therefore to ensure that the number of items tagged with a particular term is relatively low. That means developing and maintaining a term set with an appropriate degree of granularity. It’s much the same as putting all of your documents in a single folder on your hard drive called ‘Documents’: it would be easy to find what you needed when you had a few but over time, as the number increased, it would be time consuming and ultimately unmanageable. At that point you’d possibly add sub-folders to further categorise the documents or restructure your filing structure in some other way.
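One way to keep an eye on granularity is to count how many items carry each term and flag the ones that have outgrown a manageable size. The Python below is a sketch of that bookkeeping only; where the tagging data comes from is left out, and the limit of 20 and the `overloaded_terms` helper are assumptions for the example.

```python
from collections import Counter

# Assumed threshold, taken from the "10 or 20 results is manageable" rule of thumb.
MAX_ITEMS_PER_TERM = 20


def overloaded_terms(tagged_items, limit=MAX_ITEMS_PER_TERM):
    """Return {term: item count} for every term used on more than `limit` items."""
    counts = Counter(term for terms in tagged_items.values() for term in terms)
    return {term: n for term, n in counts.items() if n > limit}


# Hypothetical data: 25 documents all filed under one catch-all term.
TAGGED = {f"doc{i}": ["Documents"] for i in range(25)}
TAGGED["doc0"] = ["Documents", "Proposals"]

print(overloaded_terms(TAGGED))
```

Any term that shows up in the output is a candidate for splitting into child terms, much like adding sub-folders.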
2. Term based refinement
As we’ve seen, property searches are not ranked and are therefore useless for finding a specific document in a large corpus containing the same property values. However, the property database has another important use – it allows us to refine the results of an index query based on managed property values. So for example, we can search the index for all documents containing particular keywords and then refine this result set to only those documents that were tagged with our term store term. This approach is common on shopping sites such as Amazon, where a product search will yield a long list of results and a refinement panel that allows us to select particular features or aspects of the product that we’re interested in.
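The search-then-refine flow looks something like this in Python. It is a toy model under my own assumptions, not the SharePoint refinement panel: `keyword_search`, `refiner_counts`, `refine` and the sample documents are all hypothetical.

```python
from collections import Counter


def keyword_search(docs, keyword):
    """Index-style query: every document whose text mentions the keyword."""
    k = keyword.lower()
    return [doc for doc in docs if k in doc["text"].lower()]


def refiner_counts(results):
    """Count how many results carry each term, as a refinement panel would show."""
    return Counter(term for doc in results for term in doc["terms"])


def refine(results, term):
    """Narrow an existing result set to documents tagged with the chosen term."""
    return [doc for doc in results if term in doc["terms"]]


DOCS = [
    {"name": "Contoso proposal", "text": "pricing proposal for Contoso", "terms": ["Sales", "Proposals"]},
    {"name": "Fabrikam proposal", "text": "draft proposal for Fabrikam", "terms": ["Sales"]},
    {"name": "Proposal template", "text": "blank proposal template", "terms": ["Templates"]},
    {"name": "Holiday rota", "text": "who is off in August", "terms": ["HR"]},
]

results = keyword_search(DOCS, "proposal")
print(refiner_counts(results))
print([doc["name"] for doc in refine(results, "Sales")])
```

The keyword query does the broad work, the counts tell the user which terms are worth clicking, and the term only ever narrows a result set that has already been found.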
3. Prioritising term ranking
By default, term text is treated like any other text attached to a document. It’s extracted into a particular crawled property and added to the search index. If we search for the text of a term, the document will feature in the result set somewhere regardless of whether the term actually features in the document itself. As we saw, however, there is no special treatment of term text. Above we looked at ranking models and how they can be used to give particular metadata greater influence when it comes to determining ranking. By developing a custom ranking model that assigns a priority to term data we can make term searches more useful.
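To show what prioritising term data means in practice, here is a toy comparison in Python. It is not how a SharePoint ranking model is defined or deployed; the scoring function, the `DEFAULT` and `BOOSTED` weights and the documents are assumptions invented for the sketch. The same query is ranked twice: once with term text treated like any other text, and once with the term property given extra weight.

```python
def score(doc, query, weights):
    """Weighted count of query hits in the body text and in the attached terms."""
    q = query.lower()
    body_hits = doc["body"].lower().count(q)
    term_hits = sum(1 for term in doc["terms"] if term.lower() == q)
    return weights["body"] * body_hits + weights["terms"] * term_hits


def rank(docs, query, weights):
    """Return matching document names, highest score first."""
    scored = [(score(doc, query, weights), doc["name"]) for doc in docs]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for s, name in ordered if s > 0]


DOCS = [
    {"name": "Meeting notes", "body": "contract, contract and more contract talk", "terms": []},
    {"name": "Signed agreement", "body": "the contract as signed", "terms": ["Contract"]},
]

DEFAULT = {"body": 1.0, "terms": 1.0}   # term text is just more text
BOOSTED = {"body": 1.0, "terms": 10.0}  # hypothetical custom model favouring terms

print(rank(DOCS, "contract", DEFAULT))
print(rank(DOCS, "contract", BOOSTED))
```

With the default weights the chatty meeting notes win on sheer repetition. With the boosted weights the document that somebody deliberately tagged comes out on top, which is usually what the person searching for a term actually wants.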
Although we have three techniques that we can use to make better use of the term store to make content easier to locate, these approaches each contribute to a wider strategy. No single tactic will be effective in every case. For example, prioritising term ranking is pretty useless if every document is tagged with the same few terms. Conversely, prioritising term ranking works well with term based refinement, where users can return a list of documents tagged with one (or more) terms and refine the result set by selecting from a list of other terms.
Is anybody actually going to build a shopping system in SharePoint? I doubt it. SharePoint is a tool for collating all of an organization’s knowledge and making it easily accessible to the right people at the right time. All of an organization’s knowledge is quite a lot though! We very quickly get into information overload territory and the problem isn’t, “where can we put all this information?”, it’s, “where did I put that proposal document last month?”. Making information findable is a critical factor in any SharePoint deployment. There are many aspects to this, starting from a sound information architecture and carrying on all the way through to an intuitive and feature rich search interface. When the rubber hits the road and we start looking at how this can be implemented in SharePoint, Managed Metadata and the term store have a huge role to play. This article examines the functionality available and highlights the strengths and weaknesses of the platform as well as providing a few suggestions for making best use of the term store in search.