Chapter 21. Index Parameters and Full-Text Search

When defining a mixed index, a list of parameters can be optionally specified for each property key added to the index. These parameters control how the particular key is to be indexed. JanusGraph recognizes the following index parameters. Whether these are supported depends on the configured index backend. A particular index backend might also support custom parameters in addition to the ones listed here.

21.1. Full-Text Search

When indexing string values, that is property keys with String.class data type, one has the choice to either index those as text or character strings which is controlled by the mapping parameter type.

When the value is indexed as text, the string is tokenized into a bag of words which allows the user to efficiently query for all matches that contain one or multiple words. This is commonly referred to as full-text search. When the value is indexed as a character string, the string is index "as-is" without any further analysis or tokenization. This facilitates queries looking for an exact character sequence match. This is commonly referred to as string search.

21.1.1. Full-Text Search

By default, strings are indexed as text. To make this indexing option explicit, one can define a mapping when indexing a property key as text.

mgmt = graph.openManagement()
summary = mgmt.makePropertyKey('booksummary').dataType(String.class).make()
mgmt.buildIndex('booksBySummary', Vertex.class).addKey(summary, Mapping.TEXT.asParameter()).buildMixedIndex("search")
mgmt.commit()

This is identical to a standard mixed index definition with the only addition of an extra parameter that specifies the mapping in the index - in this case Mapping.TEXT.

When a string property is indexed as text, the string value is tokenized into a bag of tokens. The exact tokenization depends on the indexing backend and its configuration. JanusGraph’s default tokenization splits the string on non-alphanumeric characters and removes any tokens with less than 2 characters. The tokenization used by an indexing backend may differ (e.g. stop words are removed) which can lead to minor differences in how full-text search queries are handled for modifications inside a transaction and committed data in the indexing backend.

When a string property is indexed as text, only full-text search predicates are supported in graph queries by the indexing backend. Full-text search is case-insensitive.

  • textContains: is true if (at least) one word inside the text string matches the query string
  • textContainsPrefix: is true if (at least) one word inside the text string begins with the query string
  • textContainsRegex: is true if (at least) one word inside the text string matches the given regular expression
import static org.janusgraph.core.attribute.Text.*
g.V().has('booksummary', textContains('unicorns'))
g.V().has('booksummary', textContainsPrefix('uni'))
g.V().has('booksummary', textContainsRegex('.*corn.*'))

String search predicates (see below) may be used in queries, but those require filtering in memory which can be very costly.

21.1.2. String Search

To index string properties as character sequences without any analysis or tokenization, specify the mapping as Mapping.STRING:

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('bookname').dataType(String.class).make()
mgmt.buildIndex('booksBySummary', Vertex.class).addKey(name, Mapping.STRING.asParameter()).buildMixedIndex("search")
mgmt.commit()

When a string mapping is configured, the string value is indexed and can be queried "as-is" - including stop words and non-letter characters. However, in this case the query must match the entire string value. Hence, the string mapping is useful when indexing short character sequences that are considered to be one token.

When a string property is indexed as string, only the following predicates are supported in graph queries by the indexing backend. String search is case-sensitive.

  • eq: if the string is identical to the query string
  • neq: if the string is different than the query string
  • textPrefix: if the string value starts with the given query string
  • textRegex: if the string value matches the given regular expression in its entirety
import static org.apache.tinkerpop.gremlin.process.traversal.P.*
import static org.janusgraph.core.attribute.Text.*
g.V().has('bookname', eq('unicorns'))
g.V().has('bookname', neq('unicorns'))
g.V().has('bookname', textPrefix('uni'))
g.V().has('bookname', textRegex('.*corn.*'))

Full-text search predicates may be used in queries, but those require filtering in memory which can be very costly.

21.1.3. Full text and string search

If you are using ElasticSearch it is possible to index properties as both text and string allowing you to use all of the predicates for exact and fuzzy matching.

mgmt = graph.openManagement()
summary = mgmt.makePropertyKey('booksummary').dataType(String.class).make()
mgmt.buildIndex('booksBySummary', Vertex.class).addKey(summary, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search")
mgmt.commit()

Note that the data will be stored in the index twice, once for exact matching and once for fuzzy matching.

21.2. Field Mapping

21.2.1. Individual Field Mapping

By default, JanusGraph will encode property keys to generate a unique field name for the property key in the mixed index. If one wants to query the mixed index directly in the external index backend can be difficult to deal with and are illegible. For this use case, the field name can be explicitly specified through a parameter.

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('bookname').dataType(String.class).make()
mgmt.buildIndex('booksBySummary', Vertex.class).addKey(name, Parameter.of('mapped-name', 'bookname')).buildMixedIndex("search")
mgmt.commit()

With this field mapping defined as a parameter, JanusGraph will use the same name for the field in the booksBySummary index created in the external index system as for the property key. Note, that it must be ensured that the given field name is unique in the index.

21.2.2. Global Field Mapping

Instead of individually adjusting the field mapping for every key added to a mixed index, one can instruct JanusGraph to always set the field name in the external index to be identical to the property key name. This is accomplished by enabling the configuration option map-name which is configured per indexing backend. If this option is enabled for a particular indexing backend, then all mixed indexes defined against said backend will use field names identical to the property key names.

However, this approach has two limitations: 1) The user has to ensure that the property key names are valid field names for the indexing backend and 2) renaming the property key will NOT rename the field name in the index which can lead to naming collisions that the user has to be aware of and avoid.

Note, that individual field mappings as described above can be used to overwrite the default name for a particular key.