Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine. Architected from the ground up for use in distributed environments where reliability and scalability are must haves, Elasticsearch gives you the ability to move easily beyond simple full-text search.
-- Elasticsearch Overview
JanusGraph supports Elasticsearch as an index backend. Here are some of the Elasticsearch features supported by JanusGraph:
- Full-Text: Supports all Text predicates to search for text properties that match a given word, prefix or regular expression.
- Geo: Supports the Geo.WITHIN condition to search for points that fall within a given circle. Only supports points for indexing and circles for querying.
- Numeric Range: Supports all numeric comparisons in Compare.
- Flexible Configuration: Supports embedded or remote operation, custom transport and discovery, and open-ended settings customization.
- TTL: Supports automatically expiring indexed elements.
- Collections: Supports indexing SET and LIST cardinality properties.
- Temporal: Nanosecond granularity temporal indexing.
Please see Appendix B, Version Compatibility for details on what versions of ES will work with JanusGraph.
Important | |
---|---|
JanusGraph currently requires Elasticsearch’s dynamic scripting feature. The |
JanusGraph supports two distinct configuration tracks for Elasticsearch. "Track" in this chapter means a set of configuration options.
- The new interface track
- The legacy track
These tracks are mutually exclusive. A configuration uses one track or the other, but not both simultaneously. The interface track is recommended over the legacy track. The interface track, introduced in 0.5.1, offers a superset of the legacy track’s functionality. The legacy track will be maintained through at least the end of the 0.5.x patch series.
Note | |
---|---|
JanusGraph’s index options start with the string " |
Tip | |
---|---|
It’s recommended that index names contain only alphanumeric lowercase characters and hyphens, and that they start with a lowercase letter. |
The interface track is activated by setting either one of the following:
# Activate the interface track with ES's Node client
index.search.elasticsearch.interface=NODE
index.search.backend=elasticsearch

# Or activate the interface with ES's TransportClient
index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.backend=elasticsearch
The NODE and TRANSPORT_CLIENT values tell JanusGraph to use either the Node or Transport client, respectively, and activate the interface configuration track. One or the other must be specified to use this track. Do not specify both in the same configuration.
Tip | |
---|---|
This chapter assumes some familiarity with the difference between Elasticsearch’s "Node client" and "Transport client". For background on these two Elasticsearch clients and their comparative tradeoffs, see Talking to Elasticsearch and Java Clients in the official Elasticsearch documentation. |
Configuration on the interface track proceeds through roughly the following steps:
- If the JanusGraph config option index.[X].conf-file is set, it’s interpreted as the name of an Elasticsearch config file and its contents are copied into the ES transport or node configuration
- Any JanusGraph config options starting with index.[X].elasticsearch.ext. are copied verbatim to the ES transport or node configuration
- Any other ES-related JanusGraph config options listed in JanusGraph’s config file are copied into their respective ES transport or node configuration settings (Chapter 12, Configuration Reference lists these options)
- script.disable_dynamic is set to false
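The steps above can be pictured as a layered merge. The following Python sketch is illustrative only and is not JanusGraph's implementation: the function name and its three dictionary arguments are hypothetical, and it assumes that a setting written by a later step replaces the same key written by an earlier step.

```python
def assemble_client_settings(conf_file_settings, ext_settings, mapped_options):
    """Merge ES client settings in the order of the steps listed above.

    All three arguments are plain dicts of ES setting name -> value.
    Assumption: a later step overwrites a key set by an earlier step.
    """
    settings = {}
    settings.update(conf_file_settings)  # step 1: contents of index.[X].conf-file
    settings.update(ext_settings)        # step 2: index.[X].elasticsearch.ext.* keys
    settings.update(mapped_options)      # step 3: other ES-related JanusGraph options
    settings["script.disable_dynamic"] = "false"  # step 4
    return settings


merged = assemble_client_settings(
    {"node.name": "alice", "cluster.name": "from-file"},
    {"node.name": "bob"},
    {"cluster.name": "prod"},
)
print(merged)
```

Under that assumption, node.name ends up as bob and cluster.name as prod, because the ext value and the mapped JanusGraph option are applied after the file contents.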
Arbitrary Elasticsearch settings can be specified through one or several of the following mechanisms.
The index.[X].conf-file option is interpreted as a path to an Elasticsearch YAML/JSON/properties file. The file must exist. If the path is relative, and the path appears in a JanusGraph properties file on disk, then the path will be interpreted relative to the directory containing the JanusGraph properties file in which it appears. The file will be opened and loaded using Elasticsearch’s ImmutableSettings.Builder.loadFromStream method. This method will attempt to guess the file content’s syntax by the filename extension; for this reason, it’s recommended that the filename end in either ".json", ".yml", ".yaml", or ".properties", as appropriate, so that ES uses the correct parser. Here’s an example configuration fragment:
index.search.backend=elasticsearch
# interface may be NODE or TRANSPORT_CLIENT
index.search.elasticsearch.interface=NODE
index.search.conf-file=/home/janusgraph/elasticsearch_client.yaml

# /home/janusgraph/elasticsearch_client.yaml
node.name: alice
JanusGraph iterates over all properties prefixed with index.[X].elasticsearch.ext., where [X] is an index name such as search. It strips the prefix from each property key. The remainder of the stripped key will be interpreted as an Elasticsearch configuration key. The value associated with the key is not modified. The stripped key and unmodified value are passed into the Elasticsearch client configuration. This allows embedding arbitrary Elasticsearch settings in JanusGraph’s properties. Here’s an example configuration fragment showing how to specify the Elasticsearch node.name setting using the ext config mechanism:
index.search.backend=elasticsearch
# interface may be NODE or TRANSPORT_CLIENT
index.search.elasticsearch.interface=NODE
index.search.elasticsearch.ext.node.name=bob
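As a rough illustration of the prefix stripping described above, here is a minimal Python sketch. It is not JanusGraph code; the function name extract_ext_settings and the use of a plain dictionary for the properties are assumptions made for the example.

```python
def extract_ext_settings(props, index_name="search"):
    """Return the ES settings embedded under index.[X].elasticsearch.ext."""
    prefix = f"index.{index_name}.elasticsearch.ext."
    return {
        key[len(prefix):]: value  # stripped key, unmodified value
        for key, value in props.items()
        if key.startswith(prefix)
    }


props = {
    "index.search.backend": "elasticsearch",
    "index.search.elasticsearch.interface": "NODE",
    "index.search.elasticsearch.ext.node.name": "bob",
}
print(extract_ext_settings(props))  # {'node.name': 'bob'}
```

Only the key with the ext prefix survives, and it reaches the ES client configuration as node.name with the value bob.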
Tip | |
---|---|
The |
After processing conf-file and ext, JanusGraph checks for the following common options. On the interface config track, JanusGraph only uses default values for index-name and health-request-timeout. If ignore-cluster-name or cluster-name is unset in JanusGraph’s configuration, then Elasticsearch’s internal defaults, any setting from conf-file, and any setting from ext apply, in that order. See Chapter 12, Configuration Reference for descriptions of these options and their accepted values.
index.[X].elasticsearch.index-name
index.[X].elasticsearch.cluster-name
index.[X].elasticsearch.ignore-cluster-name
index.[X].elasticsearch.health-request-timeout
In addition to common options described in Section 23.2.1, “Common Options”, the Transport client requires one or more hosts to which to connect. These are supplied via JanusGraph’s index.[X].hostname key. Each host or host:port pair specified here will be added to the Transport client’s round-robin list of request targets. This setting has no analog in an Elasticsearch configuration file and must be set through JanusGraph’s index.[X].hostname option. Here’s a minimal Transport client configuration that will round-robin over 10.0.0.10 on the default Elasticsearch native protocol port (9300) and 10.0.0.20 on port 7777:
index.search.backend=elasticsearch
index.search.elasticsearch.interface=TRANSPORT_CLIENT
index.search.hostname=10.0.0.10, 10.0.0.20:7777
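To make the host and host:port forms concrete, the sketch below shows one way such a value could be split into connection targets. It is an illustration under assumptions, not JanusGraph's parser: the function name parse_hostnames is hypothetical, and IPv6 literals are not handled.

```python
DEFAULT_TRANSPORT_PORT = 9300  # default ES native protocol port, per the text above


def parse_hostnames(value, default_port=DEFAULT_TRANSPORT_PORT):
    """Split a comma-separated hostname value into (host, port) tuples."""
    targets = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        targets.append((host, int(port) if sep else default_port))
    return targets


print(parse_hostnames("10.0.0.10, 10.0.0.20:7777"))
# [('10.0.0.10', 9300), ('10.0.0.20', 7777)]
```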
Furthermore, the Transport client accepts the index.[X].client-sniff option. This can be set just as effectively through the conf-file or ext mechanisms. However, it can also be controlled through this JanusGraph config option. This option exists for continuity with the legacy config track.
In addition to common options described in Section 23.2.1, “Common Options”, the Node client also respects the following JanusGraph config options. See Chapter 12, Configuration Reference for descriptions of these options and their accepted values.
index.[X].directory
index.[X].elasticsearch.ttl-interval
index.[X].elasticsearch.client-only
index.[X].elasticsearch.local-mode
index.[X].elasticsearch.load-default-node-settings
Unlike the Transport client, the Node client can be completely configured through conf-file or ext. If you provide a complete Node configuration via conf-file or ext, then none of the JanusGraph options listed above are required, and it’s fine to leave them unset in JanusGraph’s configuration. The JanusGraph options listed above are retained mainly for convenience and continuity with the legacy config track.
However, there is one unique aspect to index.[X].directory. When index.[X].directory is set for Elasticsearch, it is taken as the path to a directory which will contain the ES data, work, and logs directories. These directories are created if they don’t already exist. Furthermore, when the index.[X].directory setting appears in a JanusGraph properties file on disk and its value is a relative path, it will be interpreted relative to the directory containing that JanusGraph properties file (similar to how relative conf-file paths are handled). That’s the difference between setting JanusGraph’s index.[X].directory versus setting Elasticsearch’s path.data, path.work, and path.logs directories: relative paths for the former are based on the directory containing the JanusGraph properties file, whereas relative paths for the latter are based on the JVM’s current working directory.
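The path rule for index.[X].directory can be sketched as follows. This is a hedged illustration rather than JanusGraph's code: the function name es_paths_from_directory and the example file locations are hypothetical, POSIX-style paths are assumed, and ".." is collapsed lexically without consulting the filesystem.

```python
import posixpath
from pathlib import PurePosixPath


def es_paths_from_directory(properties_file, directory_value):
    """Resolve index.[X].directory against the properties file's directory."""
    base = PurePosixPath(directory_value)
    if not base.is_absolute():
        # Relative values are anchored at the directory holding the
        # properties file, not at the JVM's current working directory.
        base = PurePosixPath(properties_file).parent / base
    base = PurePosixPath(posixpath.normpath(str(base)))  # lexical ".." collapse
    return {name: str(base / name) for name in ("data", "work", "logs")}


print(es_paths_from_directory("/opt/janusgraph/conf/graph.properties", "../db/es"))
# {'data': '/opt/janusgraph/db/es/data', 'work': '/opt/janusgraph/db/es/work',
#  'logs': '/opt/janusgraph/db/es/logs'}
```

A relative path.data set directly on Elasticsearch would not go through this step; it is resolved against the JVM's working directory instead.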
Note that index.[X].hostname is not in the list above. The recommended way to set a list of hostnames with the Node client is to use Elasticsearch’s own config keys via ext or conf-file. See the Elasticsearch documentation on the discovery module and the transport module for relevant ES config keys. Also see Section 23.2.3.2, “Node Example: Connecting to a Remote Cluster” for an example configuration using the Elasticsearch Zen discovery module and unicast addressing.
The following JanusGraph configuration and accompanying Elasticsearch config file create a Node which uses ES’s JVM-local discovery. This means that the Node can only see other Nodes within the JVM. The Node does not listen for connections on network sockets or attempt to discover a cluster over the network. This is convenient when testing JanusGraph in a single-machine setup.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
index.search.conf-file=es_jvmlocal.yml

# es_jvmlocal.yml
node.data: true
node.client: false
node.local: true
# These paths are interpreted relative to the JVM's current working directory
path.data: es/data
path.work: es/work
path.logs: es/logs
The following configuration is similar to the one above, except it uses ext and the index.[X].directory JanusGraph setting to locate the ES work, data, and log paths. When the index.[X].directory appears in a JanusGraph properties file and is set to a relative path, that path is interpreted relative to the directory containing the JanusGraph properties file. Compare this to setting path.data, path.work, and path.logs directly, which will be interpreted relative to the current working directory of the Java VM.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
# data, work, and logs subdirectories for ES will be created in
# <directory containing this properties file>/../db/es
index.search.directory=../db/es
index.search.elasticsearch.ext.node.data=true
index.search.elasticsearch.ext.node.client=false
index.search.elasticsearch.ext.node.local=true
The following JanusGraph configuration and accompanying Elasticsearch config file create a Node which discovers its cluster by sending unicast packets to host1 on the default port and host2 on customport. The Node client will attempt to learn all members of the cluster using unicast.hosts as the initial points of contact. Since the following config sets node.data=false and node.client=true, the Node started by JanusGraph won’t store any persistent index data or attempt to become a master node. It discovers the cluster and routes requests using that information, but it doesn’t hold any important state, so it can be lost without affecting Elasticsearch’s availability or durability.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
index.search.conf-file=es_netclient.yml

# es_netclient.yml
node.data: false
node.client: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "host1", "host2:customport" ]
This configuration has the same effect as the one listed above, except using ext instead of conf-file.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
index.search.elasticsearch.ext.node.data=false
index.search.elasticsearch.ext.node.client=true
index.search.elasticsearch.ext.discovery.zen.ping.multicast.enabled=false
index.search.elasticsearch.ext.discovery.zen.ping.unicast.hosts=host1, host2:customport
This is similar to the example in the previous section, except the Node holds Elasticsearch data. This means JanusGraph’s Elasticsearch instance will be a full-fledged member of the Elasticsearch cluster, and if the process containing JanusGraph and the ES Node dies, it could affect Elasticsearch’s availability or durability. This is an uncommon configuration.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
index.search.conf-file=es_clustermember.yml

# es_clustermember.yml
node.data: true
node.client: false
node.local: false
path.data: es/data
path.work: es/work
path.logs: es/logs
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "host1", "host2:customport" ]
This configuration has the same effect as the one listed above, except using ext instead of conf-file.
index.search.backend=elasticsearch
index.search.elasticsearch.interface=NODE
index.search.elasticsearch.ext.node.data=true
index.search.elasticsearch.ext.node.client=false
index.search.elasticsearch.ext.node.local=false
# The next three paths are interpreted relative to the JVM working directory
index.search.elasticsearch.ext.path.data=es/data
index.search.elasticsearch.ext.path.work=es/work
index.search.elasticsearch.ext.path.logs=es/logs
index.search.elasticsearch.ext.discovery.zen.ping.multicast.enabled=false
index.search.elasticsearch.ext.discovery.zen.ping.unicast.hosts=host1, host2:customport
The legacy configuration track allows running either a Transport client or a Node in JVM-local discovery mode. Running a Node that discovers the cluster over network sockets is not supported.
This track is activated by omitting the index.[X].elasticsearch.interface option from JanusGraph’s configuration file.
Warning | |
---|---|
The legacy track is not recommended for new deployments. Consider using the newer |
The legacy track supports starting an Elasticsearch Node with JVM-local transport. Network transport and discovery are not supported on the legacy track. Due to this limitation, it’s only useful for running a single-node embedded ES instance, such as in testing.
Here’s an example JanusGraph configuration that starts a JVM-local Node using the legacy config track:
index.search.backend=elasticsearch
# This will create /tmp/searchindex/work, /tmp/searchindex/logs, and
# /tmp/searchindex/data
index.search.directory=/tmp/searchindex
index.search.elasticsearch.client-only=false
index.search.elasticsearch.local-mode=true
Elasticsearch will not be accessible from outside of this particular JanusGraph instance, i.e. remote connections will not be possible.
In the above configuration, the index backend is named search. Replace search by a different name to change the name of the index.
The legacy track supports the Transport client. This can connect to Elasticsearch nodes running on the same machine or a cluster of remote machines.
To use the Transport client on the legacy track, add the following JanusGraph options to the graph configuration file, where hostname lists the IP addresses of the Elasticsearch cluster nodes:
index.search.backend=elasticsearch
index.search.hostname=100.100.101.1, 100.100.101.2
index.search.elasticsearch.client-only=true
Make sure that the Elasticsearch cluster is running prior to starting a JanusGraph instance attempting to connect to it. Also ensure that the machine running JanusGraph can connect to the Elasticsearch instances over the network if the machines are physically separated. This might require setting additional configuration options which are summarized below.
In the above configuration, the index backend is named search. Replace search by a different name to change the name of the index.
This section lists the subset of ES options that are effective on the legacy configuration track. See Chapter 12, Configuration Reference for descriptions of these options and their accepted values.
index.[X].elasticsearch.index-name
index.[X].elasticsearch.cluster-name
index.[X].elasticsearch.local-mode
index.[X].elasticsearch.client-only
index.[X].elasticsearch.health-request-timeout
index.[X].conf-file
index.[X].directory
index.[X].hostname
On the legacy track, setting cluster-name automatically enables cluster name validation. Leaving cluster-name unset disables cluster name validation.
Elasticsearch does not perform authentication or authorization. A client that can connect to ES is trusted by ES. When Elasticsearch runs on an unsecured or public network, particularly the Internet, it should be deployed with some type of external security. This is generally done with a combination of firewalling and tunneling of Elasticsearch’s ports. Elasticsearch has two client-facing ports to consider:
- The HTTP REST API, usually on port 9200
- The native "transport" protocol, usually on port 9300
A client uses either one protocol/port or the other, but not both simultaneously. JanusGraph uses Elasticsearch’s two official Java clients. Each of these uses only the native "transport" protocol typically listening on port 9300. Although both of Elasticsearch’s ports should be secured when running ES on a public network, JanusGraph is only concerned with the latter port, so it’s the focus of this section. There are a couple of ways to approach security on the native "transport" protocol port:
- Tunnel ES’s native "transport" protocol: This approach can be implemented with SSL/TLS tunneling (for instance via stunnel), a VPN, or SSH port forwarding. SSL/TLS tunnels require non-trivial setup and monitoring: one or both ends of the tunnel need a certificate, and the stunnel processes need to be configured and running continuously in order for JanusGraph and Elasticsearch to communicate. The setup for most secure VPNs is likewise non-trivial. Some Elasticsearch service providers handle server-side tunnel management and provide a custom Elasticsearch transport.type to simplify the client setup. JanusGraph is compatible with these custom transports. See Section 23.2.1, “Common Options” for information on how to override the transport.type and provide arbitrary transport.* config keys to JanusGraph’s ES client.
- Add a firewall rule that allows only trusted clients to connect on Elasticsearch’s native protocol port: This is typically done at the host firewall level. This doesn’t require any configuration changes in JanusGraph or Elasticsearch, nor does it require helper processes like stunnel. It is easy to configure, but provides very weak security by itself.
Since 0.5.3, JanusGraph supports customization of the index settings it uses when creating its Elasticsearch index. The customization mechanism is based on but distinct from the ext config prefix described in Section 23.2.1.2, “Embedding ES settings with ext”. It allows setting arbitrary key-value pairs on the settings object in the Elasticsearch create index request issued by JanusGraph. Here is a non-exhaustive sample of Elasticsearch index settings that can be customized using this mechanism:
index.number_of_replicas
index.number_of_shards
index.refresh_interval
Settings customized through this mechanism are only applied when JanusGraph attempts to create its index in Elasticsearch. If JanusGraph finds that its index already exists, then it does not attempt to recreate it, and these settings have no effect.
JanusGraph iterates over all properties prefixed with index.[X].elasticsearch.create.ext., where [X] is an index name such as search. It strips the prefix from each property key. The remainder of the stripped key will be interpreted as an Elasticsearch index creation setting. The value associated with the key is not modified. The stripped key and unmodified value are passed as part of the settings object in the Elasticsearch create index request that JanusGraph issues when bootstrapping on ES. This allows embedding arbitrary index creation settings in JanusGraph’s properties. Here’s an example configuration fragment that customizes three Elasticsearch index settings using the create.ext config mechanism:
index.search.backend=elasticsearch
index.search.elasticsearch.create.ext.number_of_shards=15
index.search.elasticsearch.create.ext.number_of_replicas=3
index.search.elasticsearch.create.ext.shard.check_on_startup=true
The configuration fragment listed above takes advantage of Elasticsearch’s assumption, implemented server-side, that unqualified create index setting keys have an index. prefix. It’s also possible to spell out the index prefix explicitly. Here’s a JanusGraph config file functionally equivalent to the one listed above, except that the index. prefix before the index creation settings is explicit:
index.search.backend=elasticsearch
index.search.elasticsearch.create.ext.index.number_of_shards=15
index.search.elasticsearch.create.ext.index.number_of_replicas=3
index.search.elasticsearch.create.ext.index.shard.check_on_startup=true
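The equivalence of the qualified and unqualified spellings can be sketched in a few lines of Python. The qualification itself happens inside Elasticsearch, so the qualify function below is a hypothetical stand-in that only mimics the behavior described above; the two dictionaries hold setting keys as they would look after the create.ext. prefix has been stripped.

```python
def qualify(settings):
    """Add the implied "index." prefix to any key that lacks it."""
    return {
        (key if key.startswith("index.") else "index." + key): value
        for key, value in settings.items()
    }


# Keys as they appear after the create.ext. prefix is stripped
unqualified = {
    "number_of_shards": "15",
    "number_of_replicas": "3",
    "shard.check_on_startup": "true",
}
explicit = {
    "index.number_of_shards": "15",
    "index.number_of_replicas": "3",
    "index.shard.check_on_startup": "true",
}

print(qualify(unqualified) == qualify(explicit))  # True
```

Both spellings describe the same three index settings, provided the values match.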
Note | |
---|---|
The |
Tip | |
---|---|
The |
Check that the Elasticsearch cluster nodes are reachable on the native "transport" protocol port from the JanusGraph nodes. Check the node listen port by examining the Elasticsearch node configuration logs or using a general diagnostic utility like netstat. Check the JanusGraph configuration; try the Transport client while troubleshooting connectivity issues, since it’s easier to control which ES hosts the Transport client will use. Disable sniffing to restrict the Transport client to just the configured host list. Check that the client and server have the same major version: 0.90.x and 1.x are not compatible.
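For the first check, a plain TCP connection attempt from the JanusGraph machine is often enough to tell whether the transport port is reachable at all. The helper below is a generic sketch using Python's standard socket module; the function name is hypothetical, and a successful connect only shows that something is listening on the port, not that the client and server versions are compatible.

```python
import socket


def transport_port_reachable(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

It could be called once per configured host, for example transport_port_reachable("10.0.0.20", 7777), before digging into the JanusGraph or Elasticsearch logs.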
For bulk loading or other write-intensive applications, consider increasing Elasticsearch’s refresh interval. Refer to this discussion on how to increase the refresh interval and its impact on write performance. Note that a higher refresh interval means that it takes longer for graph mutations to become available in the index.
For additional suggestions on how to increase write performance in Elasticsearch with detailed instructions, please read this blog post.
- Please refer to the Elasticsearch homepage and available documentation for more information on Elasticsearch and how to set up an Elasticsearch cluster.