Suppose you are looking to buy some books by Michael Lewis. Imagine walking into a market hall that is completely filled with books, newspapers, journals and other publications, all mixed together in a gigantic, unordered, unstructured pile. Trying to find anything in that mess is next to impossible, right?
Well, welcome to the way we more or less search for information online. It is not really all that different. Semantic search could be one of the ways to give us a better search experience. This article discusses some of the benefits of semantic search and names the biggest challenges that need to be overcome to enjoy those benefits.
Regarding the market hall example you might protest:
"Yes, but don't we have Google in real life?"
Indeed, we have Google and that would be the equivalent of someone walking up to you, telling you to take it easy and not to worry since apparently text search is available on the entire contents of the market hall. Of course if you're not easily intimidated you could give that a try.
Using the keywords 'Michael Lewis' in a text search would probably get you a really long list of results. These could be anything from newspaper clippings and articles to other books that just happen to contain these two keywords.
Furthermore, how would we get from a search result on a computer to physically retrieving the actual publication from the pile? In short, this is not the way to go.
Obviously, the content needs more structure for it to be easily retrievable. Although it sounds like a mad example, it comes pretty close to the way we look for information through online search. The major difference is that on the web we deal with millions upon millions of market halls filled with content.
Naturally, Google and other search engines spend a lot of time and energy trying to figure out which results are best to show first. To be honest, they do this quite well. There is no question that keyword-based search has been fine-tuned over the years and has been instrumental in helping users find information online.
Still, it leaves the user to sift through the different types of results to determine which one corresponds best with the user's query. Useful, relevant content may be buried under a load of higher-ranked search results.
Over the years we've become so used to this keyword-based search paradigm that we've all come up with our own strategies for finding content. For the most part, this works pretty well, but everyone has experienced frustrating moments where internet search does not yield the required results or simply takes too much time.
As the amount of online content keeps expanding, one cannot help but wonder whether search should be faster and more efficient. In our market hall example, adding very basic structure to content like author, title and content type would allow for easy retrieval of all books by Michael Lewis. This same principle can be applied to information retrieval on the web.
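To make the market hall idea a bit more concrete, here is a minimal Python sketch. It is only an illustration: the `Publication` record, the sample pile (including the review by the made-up 'Jane Doe') and both search functions are my own assumptions, not how any real search engine works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    title: str
    author: str
    content_type: str  # e.g. "book", "article", "newspaper"
    text: str


# Hypothetical sample of what lies in the pile.
PILE = [
    Publication("The Big Short", "Michael Lewis", "book",
                "A story about the housing bubble."),
    Publication("Moneyball", "Michael Lewis", "book",
                "How a baseball team used statistics."),
    Publication("Review of Moneyball", "Jane Doe", "article",
                "Michael Lewis has written a gripping book."),
    Publication("Market notes", "John Roe", "newspaper",
                "Nothing about the author in question."),
]


def keyword_search(pile, keywords):
    """Return every publication that mentions all keywords anywhere."""
    words = [w.lower() for w in keywords.split()]
    hits = []
    for pub in pile:
        haystack = f"{pub.title} {pub.author} {pub.text}".lower()
        if all(w in haystack for w in words):
            hits.append(pub)
    return hits


def structured_search(pile, author, content_type):
    """Return publications whose labelled fields match exactly."""
    return [p for p in pile
            if p.author == author and p.content_type == content_type]


keyword_hits = keyword_search(PILE, "Michael Lewis")
book_hits = structured_search(PILE, author="Michael Lewis", content_type="book")
```

In this toy pile the keyword search also returns the review article, simply because it mentions the name, while the structured search returns only the two books.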
What then is it... this semantic search?
We should be moving from finding (text) strings to finding actual things. The search for things and concepts (or entities) rather than text-based keywords is what semantic search is all about. We're not looking for the words 'Michael Lewis' but for the actual person that is Michael Lewis.
Likewise, when we look for a jaguar in the Amazon, we're not looking for (Jaguar) car parts on amazon.com! Major search engine players are beginning to tap into this.
For example, over the last couple of years Google has been building up a database called the Knowledge Graph that identifies and links together popular things and concepts. Google's Hummingbird algorithm uses this Knowledge Graph database to present actual information and facts next to the usual blue links to webpages, as the picture below shows for a search on Jennifer Lawrence.
Google Knowledge Graph in action. Searching for 'Jennifer Lawrence'
Semantic search is about understanding the questions asked. Or, to put it differently, it's about understanding the context and meaning of a question and coming up with real answers.
For example, when we are looking for Jennifer Lawrence movies, what we are really saying is: give me all things of type = 'movie' with actor = 'Jennifer Lawrence'.
A semantic search would show me the movies I've asked for, rather than displaying a bunch of links to webpages that contain the words 'Jennifer Lawrence' (thus leaving me to figure out which one has a listing of movies). Google can already do some of this, as the picture below shows us.
Google Knowledge Graph in action. Searching for 'Jennifer Lawrence movies'
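The 'type = movie, actor = Jennifer Lawrence' question can be sketched as a lookup over a tiny graph of facts. This is a toy under my own assumptions: the `TRIPLES` list and the `find_entities` helper are hypothetical and say nothing about how Google's Knowledge Graph is actually stored or queried.

```python
# Hypothetical mini knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Jennifer Lawrence", "type", "person"),
    ("The Hunger Games", "type", "movie"),
    ("The Hunger Games", "actor", "Jennifer Lawrence"),
    ("Silver Linings Playbook", "type", "movie"),
    ("Silver Linings Playbook", "actor", "Jennifer Lawrence"),
    ("Silver Linings Playbook", "actor", "Bradley Cooper"),
    ("The Hangover", "type", "movie"),
    ("The Hangover", "actor", "Bradley Cooper"),
]


def find_entities(triples, **constraints):
    """Return subjects that satisfy every predicate=object constraint."""
    facts = set(triples)
    subjects = {subject for subject, _, _ in triples}
    return sorted(
        s for s in subjects
        if all((s, predicate, obj) in facts
               for predicate, obj in constraints.items())
    )


# "Give me all things of type = 'movie' with actor = 'Jennifer Lawrence'"
movies = find_entities(TRIPLES, type="movie", actor="Jennifer Lawrence")
```

The point of the sketch is that the answer is a list of things (movies), not a list of pages that happen to contain the words.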
Of course, Google's scope is quite large, namely all of the world's public web content, and that makes building up a comprehensive database a monumental task. However, for the many sites out there that have a specific focus, building up a database of structured content around the area of interest is relatively simple. In fact, a lot of sites already have underlying structured content that they can use. This offers great potential for the use of semantic search.
Take the British Forestry Commission, for example. The commission's website holds a large collection of parks and forests that can be found in England and Scotland. This is structured information. Unfortunately, they fail to put it to good use for their search engine (Google Custom Search).
Suppose I want to browse through a list of parks in England. I enter the keywords 'england parks' which gives me the following results:
Google keyword search with search terms 'england parks'
It's obvious that the text-based search does not give satisfactory results:
- The first result is an annual report in PDF.
- The second result is information on car parking.
- Only the third result contains information about a park in England.
The text based search engine has no idea what the entered keywords actually mean and can only try to match them in the underlying content.
Alternatively, a semantic search would have given you parks in England that you can browse through based on characteristics such as location/county, habitat, activities etc., kind of like browsing through an e-commerce store. The semantic search can do all this by recognizing what the keywords 'england' and 'parks' refer to. For this it uses information that is already available in the commission's database of parks. This database just needs to be unlocked and linked to the search function.
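A rough sketch of what such a search could look like is given below. Everything in it is an assumption for illustration: the park records are made up (they are not taken from the commission's database), and `semantic_search` and `facet_counts` are hypothetical helpers.

```python
from collections import Counter

# Made-up park records standing in for a site's structured database.
PARKS = [
    {"name": "Oakdale Forest", "country": "England", "county": "Cumbria",
     "habitat": "woodland", "activities": {"walking", "cycling"}},
    {"name": "Heron Marsh", "country": "England", "county": "Norfolk",
     "habitat": "wetland", "activities": {"walking", "birdwatching"}},
    {"name": "Glen Pine", "country": "Scotland", "county": "Highland",
     "habitat": "woodland", "activities": {"walking"}},
]

# Keywords the search recognises as countries.
COUNTRY_WORDS = {"england": "England", "scotland": "Scotland"}


def semantic_search(query, parks):
    """Interpret the keywords as concepts and filter the park records."""
    words = query.lower().split()
    # Only whole words count, so 'parking' is not mistaken for 'park'.
    if not any(w in ("park", "parks") for w in words):
        return []
    countries = {COUNTRY_WORDS[w] for w in words if w in COUNTRY_WORDS}
    return [p for p in parks if not countries or p["country"] in countries]


def facet_counts(parks, field):
    """Count results per value of a field, as a webshop-style filter would."""
    return Counter(p[field] for p in parks)


results = semantic_search("england parks", PARKS)
by_habitat = facet_counts(results, "habitat")
```

The facet counts are what would let a visitor narrow the list down by habitat or county, much like filtering products in a webshop.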
Unfortunately, as I've argued before, the search function within many websites remains neglected, leaving users in despair.
Explore and discover
It's important to note that semantic search should not be considered the one and only future of search. It doesn't and shouldn't replace keyword-based search as a whole, but rather complements it where keyword-based search falls short.
For example, if you're looking for the book 'The Big Short' by Michael Lewis, keyword-based search should be able to find it quickly and easily. This is because you know exactly what you're looking for. The question is therefore easy to formulate and very precise, giving the search engine what it needs to retrieve the info.
On the other hand, discovering new relevant stuff online with a broader search term such as 'the financial crisis' can be somewhat frustrating. This is where semantic search really starts to shine.
Semantic search is really good at giving you all the information that surrounds a certain topic or thing (entity) and showing you how it all links together. This allows the user to browse through content and actively follow paths that lead to relevant information.
So, when to use semantic search may depend on the search mode you find yourself in.
Getting me dates and jobs?
Ever figured out why it is so easy to (re)find people on Facebook, even when they have relatively common names? I'm sure you have. It's all about the connections, right?
Indeed, people and things that are closest to you and your interests are shown first. Similarly, you can use LinkedIn to discover new companies and get introductions through your own personal network.
Facebook and LinkedIn can do this because they use semantic technologies. Just like Google's Knowledge Graph, Facebook and LinkedIn store people, things and concepts in a big interconnected network and offer easy discovery and retrieval through semantic search. For example, LinkedIn makes it easy to disambiguate between Robert Walters the company and Robert Walters the person.
Semantic search in LinkedIn
Not only does this added structure help find what you need, the strength of the connections within the network determines what to show first.
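Both ideas, telling same-named entities apart by type and showing the closest ones first, can be sketched in a few lines. Note that this is not how Facebook or LinkedIn actually rank results; the entity ids, the tiny network and the choice of 'number of hops' as a stand-in for connection strength are all assumptions of mine.

```python
from collections import deque

# Hypothetical entities that share a name but differ in type.
ENTITIES = [
    {"id": "p1", "name": "Robert Walters", "type": "person"},
    {"id": "c1", "name": "Robert Walters", "type": "company"},
    {"id": "p2", "name": "Robert Walters", "type": "person"},
]

# Hypothetical network: who or what is directly connected to whom.
CONNECTIONS = {
    "me": {"alice", "p2"},
    "alice": {"me", "c1"},
    "p2": {"me"},
    "c1": {"alice"},
    "p1": set(),
}


def hops(graph, start, goal):
    """Breadth-first search: number of hops from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def search(name, viewer, entity_type=None):
    """Match by name (and optionally type), closest to the viewer first."""
    matches = [e for e in ENTITIES
               if e["name"] == name
               and (entity_type is None or e["type"] == entity_type)]

    def rank(entity):
        dist = hops(CONNECTIONS, viewer, entity["id"])
        return float("inf") if dist is None else dist

    return sorted(matches, key=rank)


ranked = search("Robert Walters", viewer="me")
companies = search("Robert Walters", viewer="me", entity_type="company")
```

Here the Robert Walters I am directly connected to comes first, the company one step further away comes second, and the stranger comes last.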
So basically, you've been using semantic search for years without even knowing about it.
So it's here, it's useful... what's next?
Well, the examples given are just the beginning of the use of semantic search. The reason why we still have a long way to go is that, compared to traditional methods, semantic search is actually quite hard.
To really get semantic search going, we face a number of challenges. I'll illustrate these by going back to the market hall example.
If we want to find books by Michael Lewis, there first needs to be a mechanism that understands the question, recognizes the concepts 'book' and 'author' and somehow links these to the person 'Michael Lewis'. Secondly, the books themselves will need to be labelled and categorized (or annotated) with this information.
Thirdly, when the results to our question are shown, it needs to be clear what we are actually looking at.
In short, for semantic search to work, the following three challenges (thank you Daan for input) need to be overcome:
1) Formulating a meaningful question
2) Adding meaning to content
3) Presenting results in a meaningful way
All three steps/challenges require considerable effort by both people and computers. I'll discuss these in a series of follow-up articles on semantic search.
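To give a feel for how the three steps fit together, here is a deliberately naive sketch for the market hall question. The catalogue, the 'things by someone' question pattern and the function names are all hypothetical; real systems need far more than a regular expression to understand a question.

```python
import re

# Step 2: content that has been annotated with meaning (hypothetical data).
CATALOGUE = [
    {"type": "book", "title": "The Big Short", "author": "Michael Lewis"},
    {"type": "book", "title": "Moneyball", "author": "Michael Lewis"},
    {"type": "article", "title": "Notes on the financial crisis",
     "author": "Jane Doe"},
]


def parse_question(question):
    """Step 1: turn '<things> by <author>' into a structured query."""
    match = re.fullmatch(r"(\w+?)s? by (.+)", question.strip(),
                         flags=re.IGNORECASE)
    if match is None:
        return None  # the toy parser does not understand the question
    return {"type": match.group(1).lower(), "author": match.group(2)}


def run_query(query, catalogue):
    """Match the structured query against the annotated content."""
    return [entry for entry in catalogue
            if entry["type"] == query["type"]
            and entry["author"].lower() == query["author"].lower()]


def present(results):
    """Step 3: label each result so it is clear what is being shown."""
    return [f'{e["type"].capitalize()}: "{e["title"]}" by {e["author"]}'
            for e in results]


query = parse_question("books by Michael Lewis")
lines = present(run_query(query, CATALOGUE))
```

Each function is trivial here precisely because the question is simple and the content is already neatly labelled, which are exactly the parts that are hard in practice.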