Recently, I released a cool new feature for the react-search-refiners SPFx sample: the ability to use Natural Language Processing (NLP) services like Microsoft LUIS to enhance the user’s search query by recognizing relevant keywords. This technique can be particularly useful to improve your search results in SharePoint. Here is how it works…
The complete sample is available here:
“Tag your documents with taxonomy metadata, they will appear first in the results”: simply not true
Tagging your documents with managed metadata is often the first advice given to improve SharePoint search. However, as Mikael Svenson pointed out in an article a couple of years ago, it doesn’t guarantee at all that they will appear first in search results. With tagging alone, the main search benefits are that corresponding values are automatically generated in refiners, and that full-text searches are also performed against those values, which may eventually surface relevant documents. The first benefit only applies if you use react-search-refiners or a custom-built Web Part, since refiners are not available in the default modern Web Parts. Therefore, the only improvement concerns full-text search. Unfortunately, by default, the auto-generated “owstaxId_<xxx>” managed properties for taxonomy values, as well as other custom managed properties, are assigned to the ‘Context 0‘ weight group, which appears to be far from the most important group when results are ranked…
To deal with this issue, and more generally to improve the relevance of search results, there are some existing solutions we can already use to enhance the SharePoint Online search engine:
- Change the search engine ranking model: Microsoft provides an official Ranking Model Tuning App to customize search results ranking. The documentation indicates it is not supported for SharePoint Online, but the app is still available in the store. Personally, I’ve never implemented it and I consider it a last resort, for very specific requirements.
- Use the XRANK KQL operator in search queries to boost result ranking under specific conditions (e.g. managed properties associated with taxonomy values). Unlike native Web Parts, this technique can be used in the modern search experience page but, honestly, do you think your users will write a query like this in the real world? Nope.
- Change managed property weight groups: as suggested in the previously mentioned article, you can also change the weight groups of managed properties in the search schema to match those of well-known properties like Title, Author, etc. This way, matching values in these properties will show up higher in results.
- Use SharePoint query rules: another underestimated feature from the on-premises world. Query rules can update the search query dynamically under certain conditions, like a detected word or pattern. They can be combined with the XRANK operator as well. Unless you use custom Web Parts, query rules can’t be used with the default components or the new search experience (note that the react-search-refiners SPFx sample supports query rules as well as promoted results).
Wait… the SharePoint search engine can actually find everything!
If you’ve just smiled at this one, think twice ;). Instead of tweaking the SharePoint search engine itself to improve search results, why not just send it a more accurate search query?
With the right search query, the SharePoint search engine can actually surface any document in your sites. The challenge is rather how to build the most relevant query, knowing that your users can’t build it themselves since they know neither the available managed properties nor the KQL syntax (the XRANK operator, for instance). This is where NLP services like LUIS come into play.
This is the feature showcased in the react-search-refiners sample. The idea is quite simple: clean and interpret the user’s input query by recognizing relevant keywords, then send them directly to the search engine as a new query to improve search results. For instance, the transformed search query for the user input “I’m looking for HR vacation policy” would be “HR vacation policy”.
Of course, this is a very simple demonstration, but the key idea behind all this is the entity extraction capability. Once you are able to recognize relevant keywords in a query matching a specific intent, you can easily build a more accurate query. For example, you can:
- Match taxonomy values and build a query with dynamic ranking using the XRANK operator.
- Map a specific intent to an arbitrary search query.
- As a first audit step, just use LUIS to collect search query logs, see what users are looking for, and accumulate a relevant amount of data for future use.
It is up to you to define what ‘relevant keywords‘ means for your needs and to transform them into an accurate query with your own logic, as you would do with SharePoint query rules.
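As an illustration of that “own logic”, here is a minimal sketch that maps a recognized intent to a query-building rule, much like a query rule would. It assumes a simplified recognition result (an intent name plus extracted keywords), not the raw LUIS response. “PnP.SearchByKeywords” is the intent used by the sample; “HR.FindPolicy” and its `ContentType:Policy` restriction are hypothetical examples.

```typescript
// Simplified recognition result (assumed shape, not the raw LUIS response).
interface Recognition {
  intent: string;
  keywords: string[];
}

// A rule turns a recognition result into a KQL query, similar to a query rule.
type QueryRule = (r: Recognition) => string;

const rules: Record<string, QueryRule> = {
  // Generic behavior, as in the sample: send the keywords as-is.
  "PnP.SearchByKeywords": (r) => r.keywords.join(" "),
  // Hypothetical intent mapped to an arbitrary, more restrictive query.
  "HR.FindPolicy": (r) => `${r.keywords.join(" ")} ContentType:Policy`,
};

function transformQuery(rawQuery: string, r: Recognition): string {
  const rule = rules[r.intent];
  // Fall back to the raw user input when the intent is unknown or nothing was extracted.
  if (!rule || r.keywords.length === 0) {
    return rawQuery;
  }
  return rule(r);
}
```

For example, `transformQuery("I’m looking for HR vacation policy", { intent: "PnP.SearchByKeywords", keywords: ["HR vacation policy"] })` returns `"HR vacation policy"`. Keeping the rules in a lookup table makes it easy to add intents as you discover them in the endpoint logs.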
How it works
This sample provides the basic building blocks to create your own entity recognition logic. Here, to stay generic, keywords are simply extracted using the prebuilt LUIS entity ‘keyPhrase‘, and the transformation just sends them straight to the search engine.
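The extraction step can be sketched as follows. This assumes the LUIS v2 prediction response shape, where prebuilt key phrases are returned as entities of type `builtin.keyPhrase` with `startIndex`/`endIndex` positions; check the exact shape against the LUIS version you use. Only the fields used here are typed, and `keyPhrasesToQuery` is a hypothetical helper, not the sample’s code.

```typescript
// Subset of a LUIS v2 prediction response (assumed shape, only the fields used here).
interface LuisEntity {
  entity: string;     // recognized text
  type: string;       // e.g. "builtin.keyPhrase"
  startIndex: number; // position of the first character in the query
  endIndex: number;   // position of the last character in the query
}

interface LuisResponse {
  query: string;
  topScoringIntent?: { intent: string; score: number };
  entities: LuisEntity[];
}

// Keeps key phrases in the order they appear in the query and joins them into a new query.
function keyPhrasesToQuery(response: LuisResponse): string {
  const phrases = response.entities
    .filter((e) => e.type === "builtin.keyPhrase")
    .sort((a, b) => a.startIndex - b.startIndex)
    .map((e) => e.entity);
  // Fall back to the original query if no key phrase was recognized.
  return phrases.length > 0 ? phrases.join(" ") : response.query;
}
```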
First, you have to create a model from the LUIS portal. The model can be created from a declarative JSON file included in the sample. Since NLP depends heavily on the particularities of each language, I’ve created one model per language (FR & EN).
In this sample, a single generic intent, “PnP.SearchByKeywords”, is used. In the real world, you might consider creating more specific intents depending on your context and what you are trying to achieve. However, in my opinion and for search purposes, starting with only one intent and refining into more specific intents as you go is a good strategy. This can be done simply by inspecting user queries in the LUIS endpoint logs.
Entities are used to recognize and extract specific parts of an utterance. LUIS provides several entity types, as well as prebuilt entities for common use cases. Here I use the prebuilt ‘keyPhrase‘ entity, which is an easy way to start recognizing relevant keywords in a query. Its only disadvantage: this entity can’t be trained, meaning you can’t improve recognition with your own utterances and keywords if the default output is not satisfying. For better results tailored to your context, I suggest creating your own ‘simple’ entity (e.g. PnP.Keyword) and training it by submitting sample utterances or reviewing user input queries, manually mapping keywords to the entity for each reviewed utterance.
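Once such a custom entity exists, the query logic can prefer it and fall back to ‘keyPhrase‘ when it recognizes nothing. A minimal sketch, assuming the LUIS v2 entity shape (`entity`, `type`, `startIndex`) and the example entity name PnP.Keyword from the text; `extractKeywords` is a hypothetical helper.

```typescript
// Minimal subset of a LUIS v2 entity (assumed shape).
interface Entity {
  entity: string;
  type: string;
  startIndex: number;
}

// Prefers the custom, trainable simple entity; falls back to the prebuilt keyPhrase entity.
// "PnP.Keyword" is the example name from the text, not an entity shipped with the sample.
function extractKeywords(entities: Entity[], customEntity = "PnP.Keyword"): string[] {
  const byType = (type: string): string[] =>
    entities
      .filter((e) => e.type === type)
      .sort((a, b) => a.startIndex - b.startIndex)
      .map((e) => e.entity);
  const custom = byType(customEntity);
  return custom.length > 0 ? custom : byType("builtin.keyPhrase");
}
```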
Publish your model
The last step is to publish your application, either to the staging or to the production slot. The search box Web Part configuration lets you select which endpoint to use (staging/production).
The ‘Starter_Key‘ corresponds to the authoring key and is created by default with your application. It allows you to manage your LUIS model programmatically (create intents, entities, etc.). To consume the LUIS service and send requests from the search box Web Part, you will need to create a LUIS endpoint key in Azure ($$$). Be careful: authoring and endpoint keys are not available in the same Azure regions and may use different base service URLs (e.g. eastus.api.cognitive.microsoft.com for the ‘East US’ region).
During the publish process, you can also enable the Bing Spell Check service. Although it is not mandatory, this service can be very useful to clean user queries before sending them to LUIS for recognition. Using it requires creating an associated key in Azure as well ($$$).
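The slot, region, endpoint key and optional spell check all end up in the prediction request URL. The sketch below assumes the LUIS v2.0 REST endpoint format (`/luis/v2.0/apps/{appId}` with the `subscription-key`, `staging`, `spellCheck` and `bing-spell-check-subscription-key` query parameters) as I remember it; double-check it against the current LUIS documentation. The region, app id and keys are placeholders, and `buildLuisUrl` is a hypothetical helper.

```typescript
interface LuisEndpointOptions {
  region: string;             // region of the endpoint key, e.g. "eastus"
  appId: string;              // LUIS application id (placeholder)
  endpointKey: string;        // endpoint key created in Azure, not the authoring key
  staging: boolean;           // true to query the staging slot instead of production
  bingSpellCheckKey?: string; // optional: enables Bing spell check before recognition
}

// Builds a LUIS v2.0 prediction URL (assumed format, see lead-in).
function buildLuisUrl(query: string, o: LuisEndpointOptions): string {
  const params: [string, string][] = [
    ["subscription-key", o.endpointKey],
    ["verbose", "true"],
  ];
  if (o.staging) {
    params.push(["staging", "true"]);
  }
  if (o.bingSpellCheckKey) {
    params.push(["spellCheck", "true"]);
    params.push(["bing-spell-check-subscription-key", o.bingSpellCheckKey]);
  }
  params.push(["q", query]);
  const qs = params.map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`).join("&");
  return `https://${o.region}.api.cognitive.microsoft.com/luis/v2.0/apps/${o.appId}?${qs}`;
}
```

Since this URL carries the endpoint and spell check keys, it is best built server side rather than in the Web Part.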
Build the query enhancement service
An Azure Function implements the search query enhancement logic. The function is written in TypeScript and was started from the boilerplate project available here. I’ve already written a blog post about the benefits of TypeScript functions in an SPFx context, so I won’t repeat myself here. Briefly, using an Azure Function avoids making REST calls to LUIS and other services directly from the Web Part, reducing code complexity by decoupling components. It also avoids exposing the Cognitive Services API keys in the browser.
As you will see in the sample, the logic itself is very basic and just sends the recognized entities as a new search query. Nothing sensational. As I said before, it is up to you to build your own logic for your context; you now have all the building blocks to do so. In addition to the LUIS and Bing Spell Check services, another Microsoft Cognitive Service is used here: the ‘Text Analytics‘ service. It detects the language of the user input query so that the request can be routed to the corresponding LUIS application.
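The overall flow of such a function can be sketched as: detect the language, pick the LUIS application for that language, recognize keywords, build the new query. To stay self-contained, this sketch hides the Text Analytics and LUIS HTTP calls behind a hypothetical `Services` interface (`detectLanguage`, `recognize`); the app ids are placeholders, and this is not the sample’s actual code.

```typescript
// One LUIS application per language, as in the sample (placeholder ids).
const luisAppsByLanguage: Record<string, string> = {
  en: "<english-app-id>",
  fr: "<french-app-id>",
};

// Hypothetical abstractions: real implementations would call
// Text Analytics and LUIS over HTTP from the Azure Function.
interface Services {
  detectLanguage(text: string): Promise<string>;            // ISO 639-1 code, e.g. "en"
  recognize(appId: string, text: string): Promise<string[]>; // extracted keywords
}

async function enhanceQuery(rawQuery: string, services: Services, defaultLanguage = "en"): Promise<string> {
  const detected = await services.detectLanguage(rawQuery);
  // Route to the model for the detected language, or to the default one if unsupported.
  const appId = luisAppsByLanguage[detected] || luisAppsByLanguage[defaultLanguage];
  const keywords = await services.recognize(appId, rawQuery);
  // Basic logic, as in the sample: send the keywords as the new query.
  return keywords.length > 0 ? keywords.join(" ") : rawQuery;
}
```

Injecting the services also makes the routing logic easy to unit test without calling Azure.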
Go to the next level
Going back to the initial problem mentioned earlier in this article, and now that we have all the required building blocks, the next level is to recognize taxonomy term values (or synonyms) in the user’s raw query and boost them in the transformed search query using the XRANK operator. A possible target architecture could look like this:
The objective is to leverage taxonomy term synonyms as search keyword hints (i.e. what would users type to get documents tagged with this term?). These synonyms are then synchronized with LUIS using a ‘List‘ entity, with the term id as the normalized value (i.e. the value to use in the transformed search query). Unlike ‘keyPhrase‘, list entities are not machine-learned: recognition is performed on an exact match rather than inferred by AI. This way, if an exact term label or one of its synonyms is present in the original query, LUIS will recognize it and return it. You can then use it to build the query on the corresponding search managed property with an arbitrary weight using XRANK.
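The synchronization step can be sketched as a mapping from taxonomy terms to a list entity definition. This assumes a simplified term shape (id, label, synonyms) read from the term store by whatever means you use, and the `closedLists`-style structure (`name`, `subLists`, `canonicalForm`, `list`) that list entities use in the LUIS app JSON format as I know it; verify it against your LUIS version. `termsToListEntity` and the entity name “PnP.Taxonomy” are hypothetical.

```typescript
// Simplified taxonomy term (assumed shape, e.g. mapped from the term store).
interface Term {
  id: string;
  label: string;
  synonyms: string[];
}

// List entity structure as used in the LUIS app JSON (assumed, see lead-in).
interface SubList {
  canonicalForm: string; // normalized value returned by LUIS: the term id
  list: string[];        // exact-match values: label and synonyms
}

interface ClosedList {
  name: string;
  subLists: SubList[];
}

function termsToListEntity(name: string, terms: Term[]): ClosedList {
  return {
    name,
    subLists: terms.map((t) => ({
      canonicalForm: t.id,
      // Label plus synonyms, lowercased and deduplicated.
      list: [t.label]
        .concat(t.synonyms)
        .map((s) => s.toLowerCase())
        .filter((s, i, all) => all.indexOf(s) === i),
    })),
  };
}
```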
Having implemented this for one of my clients, combined with the ‘keyPhrase‘ entity, I can say it works pretty well. The main disadvantage of this approach is the lack of lemmatization and stemming support. Because the ‘List‘ entity relies on exact matches, term variations like plurals are not recognized. You can work around this either by manually adding all possible synonyms, including plurals and other forms (which can be very tedious for a large taxonomy), or by generating these variations at run time or at sync time.
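Generating variations at sync time could look like the sketch below, under a deliberately naive assumption: English-only plural rules applied to the last word of each synonym. `pluralize` and `expandVariations` are hypothetical helpers; a real implementation would rely on a proper inflection or lemmatization library, with rules for each supported language (FR & EN here). The expanded values would then feed the list entity.

```typescript
// Naive English pluralization, for illustration only (no irregular forms,
// and words that are already plural are not detected).
function pluralize(word: string): string {
  if (/(s|x|z|ch|sh)$/i.test(word)) return word + "es";
  if (/[^aeiou]y$/i.test(word)) return word.slice(0, -1) + "ies";
  return word + "s";
}

// Adds a plural form of the last word of each synonym ("vacation policy" -> "vacation policies"),
// keeping the original values and removing duplicates.
function expandVariations(synonyms: string[]): string[] {
  const result: string[] = [];
  for (const s of synonyms) {
    const words = s.split(" ");
    const plural = words.slice(0, -1).concat(pluralize(words[words.length - 1])).join(" ");
    for (const v of [s, plural]) {
      if (result.indexOf(v) === -1) result.push(v);
    }
  }
  return result;
}
```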
If the term is not recognized as a taxonomy label or synonym, we simply fall back to the ‘keyPhrase‘ entity to match against the taxonomy search managed properties.
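Putting recognition and fallback together, the transformed query can be sketched as: key phrases as the match expression, and an XRANK rank expression built from recognized term ids, or from the key phrases when no term was recognized. Assumptions: simplified inputs (term ids already read from the list entity resolution, key phrases already extracted), a taxonomy managed property name passed in by the caller (the one used in the usage note is hypothetical), and an arbitrary constant boost (`cb`) of 100. `buildBoostedQuery` is a hypothetical helper.

```typescript
interface Recognized {
  termIds: string[];    // normalized values returned by the List entity (term ids)
  keyPhrases: string[]; // values returned by the prebuilt keyPhrase entity
}

// KQL syntax: <match expression> XRANK(cb=<boost>) <rank expression>.
// The match expression decides which items are returned; the rank expression only boosts them.
function buildBoostedQuery(rawQuery: string, r: Recognized, taxonomyProperty: string, boost = 100): string {
  // Match expression: the extracted key phrases, or the raw input if none were found.
  const match = r.keyPhrases.length > 0 ? r.keyPhrases.join(" ") : rawQuery;
  // Rank expression: recognized term ids first, otherwise fall back to the key phrases.
  const rankValues = r.termIds.length > 0 ? r.termIds : r.keyPhrases;
  if (rankValues.length === 0) {
    return match;
  }
  const rank = rankValues.map((v) => `${taxonomyProperty}:"${v}"`).join(" OR ");
  return `(${match}) XRANK(cb=${boost}) (${rank})`;
}
```

For example, with a recognized term id `t1` and the key phrase “hr vacation policy”, `buildBoostedQuery(raw, r, "owstaxIdDocCategory")` returns `(hr vacation policy) XRANK(cb=100) (owstaxIdDocCategory:"t1")`. Using XRANK rather than a plain property restriction keeps documents that are not tagged in the results, just ranked lower.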
With this addition, you can easily experiment with the benefits of NLP services and AI to improve your search queries using LUIS and other Microsoft Cognitive Services. If you are not ready to implement your own transformation logic yet, you can start by using LUIS as a search query logging tool, accumulating enough data to see what users are looking for in your portal. Then you can decide how to improve their results.
Also, with this approach, you can use this service not only with traditional Web Parts but also with other channels, like chatbots ;).