Hot questions for Spring Data Elasticsearch


Question:

I'm developing a system that will use Elasticsearch as a data repository, and I'm trying to choose the best way for my application to index and query data in Elasticsearch. The system is built on top of the Spring framework.

Is it a good choice to use Spring-data-elasticsearch (https://github.com/spring-projects/spring-data-elasticsearch)?

Or is it a good choice to use elasticsearch core libraries itself?

I need to handle nested data (inner objects), but Spring-data-elasticsearch does not currently seem to offer operations for that.

I hope I can find a solution for the question. Thanks in advance.


Answer:

Spring Data Elasticsearch supports most of the common Elasticsearch feature set, including Nested objects, Inner Objects and (recently) Parent-Child.

When you say that you want to use nested data (inner object), please be precise, because Elasticsearch has two distinct concepts: Inner Object and Nested Object.

Detailed explanation can be found at managing relationship in elasticsearch
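
To make the difference between the two concepts concrete, here is a minimal plain-Java sketch. It does not use Elasticsearch at all; the class and method names are illustrative assumptions. It only mimics the matching semantics: an inner-object array is flattened into independent per-field value lists, so two conditions can be satisfied by two different cars, whereas nested objects are matched one object at a time.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: mimics the matching semantics, not Elasticsearch itself.
final class NestedVsInnerObject {

    // A car as it would appear inside a person document.
    static final class Car {
        final String name;
        final String model;

        Car(String name, String model) {
            this.name = name;
            this.model = model;
        }
    }

    // Inner Object semantics: the array is flattened into independent value
    // lists (car.name: [...], car.model: [...]), so the two conditions may be
    // satisfied by different cars.
    static boolean matchesAsInnerObject(List<Car> cars, String name, String model) {
        boolean anyName = cars.stream().anyMatch(c -> c.name.equals(name));
        boolean anyModel = cars.stream().anyMatch(c -> c.model.equals(model));
        return anyName && anyModel;
    }

    // Nested semantics: each car is kept as its own unit, so both conditions
    // must hold for the same car.
    static boolean matchesAsNested(List<Car> cars, String name, String model) {
        return cars.stream().anyMatch(c -> c.name.equals(name) && c.model.equals(model));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(new Car("Subaru", "Impreza"), new Car("Honda", "Civic"));
        System.out.println(matchesAsInnerObject(cars, "Subaru", "Civic")); // true
        System.out.println(matchesAsNested(cars, "Subaru", "Civic"));      // false
        System.out.println(matchesAsNested(cars, "Subaru", "Impreza"));    // true
    }
}
```

If a query must never match across two different cars, the Nested type is the one to pick.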

Nested document Example

Person Entity:

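import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Document(indexName = "person", type = "user")
public class Person {

    @Id
    private String id;

    private String name;

    // Mapped as a Nested type so that each car is matched as its own unit
    @Field(type = FieldType.Nested)
    private List<Car> car;

    // setters-getters
}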

Car Entity:

public class Car {
    private String name;
    private String model;
    //setters and getters 
}

Setting up data:

Person foo = new Person();
foo.setName("Foo");
foo.setId("1");

List<Car> cars = new ArrayList<Car>();
Car subaru = new Car();
subaru.setName("Subaru");
subaru.setModel("Imprezza");
cars.add(subaru);
foo.setCar(cars);

Indexing:

IndexQuery indexQuery = new IndexQuery();
indexQuery.setId(foo.getId());
indexQuery.setObject(foo);

//creating mapping
elasticsearchTemplate.putMapping(Person.class);
//indexing document
elasticsearchTemplate.index(indexQuery);
//refresh
elasticsearchTemplate.refresh(Person.class, true);

Searching:

QueryBuilder builder = nestedQuery("car", boolQuery()
    .must(termQuery("car.name", "subaru"))
    .must(termQuery("car.model", "imprezza")));

SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(builder).build();
List<Person> persons = elasticsearchTemplate.queryForList(searchQuery, Person.class);
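
Note that the search above uses lowercase terms ("subaru") although the document was indexed with "Subaru". A term query is not analyzed, while the indexed text is lowercased by the default standard analyzer. The sketch below is a rough, dependency-free stand-in for that behaviour; `analyze` and `termMatches` are hypothetical helper names, and the tokenizer is far simpler than the real one.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of "analyzed at index time, verbatim at term-query time".
final class TermQuerySketch {

    // Rough stand-in for the standard analyzer: split on anything that is
    // not a letter or digit, then lowercase each token.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A term query compares the given term verbatim against the indexed tokens.
    static boolean termMatches(String indexedText, String term) {
        return analyze(indexedText).contains(term);
    }

    public static void main(String[] args) {
        System.out.println(termMatches("Subaru", "subaru")); // true
        System.out.println(termMatches("Subaru", "Subaru")); // false
    }
}
```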

You can find more test cases for Nested and Inner Objects in the project's Nested Object Tests.

Question:

I am using Spring Boot with the AWS Elasticsearch service, which only provides a REST interface.

Elasticsearch Rest Client is here.

Simply put, is it possible to use the REST client with Spring Data Elasticsearch?

In other words, does Spring Data Elasticsearch work with the Elasticsearch REST client?

Spring Data Elasticsearch is very easy to use, and its template provides most of the functionality that I need. With the Elasticsearch REST client I would have to implement all of that functionality myself.


Answer:

[2019 February Update]

I see now that Spring Data Elasticsearch 3.2.0 M1 supports the HTTP client (https://docs.spring.io/spring-data/elasticsearch/docs/3.2.0.M1/reference/html/#reference).

According to the documentation (which could of course change because this is not the final version, so I quote it here):

The well known TransportClient is deprecated as of Elasticsearch 7.0.0 and is expected to be removed in Elasticsearch 8.0.

2.1. High Level REST Client

The Java High Level REST Client provides a straight forward replacement for the TransportClient as it accepts and returns the very same request/response objects and therefore depends on the Elasticsearch core project. Asynchronous calls are operated upon a client managed thread pool and require a callback to be notified when the request is done.

Example 49. High Level REST Client

static class Config {

  @Bean
  RestHighLevelClient client() {

    ClientConfiguration clientConfiguration = ClientConfiguration.builder() 
      .connectedTo("localhost:9200", "localhost:9201")
      .build();

    return RestClients.create(clientConfiguration).rest(); 
  }
}

// ...

IndexRequest request = new IndexRequest("spring-data", "elasticsearch", randomID())
  .source(singletonMap("feature", "high-level-rest-client"))
  .setRefreshPolicy(IMMEDIATE);

IndexResponse response = client.index(request);
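
For orientation, the index call above ends up as an ordinary HTTP request, which is why it works against a REST-only service. The following is a minimal sketch using only the JDK's java.net.http API (Java 11+). It assumes a node at localhost:9200, the /{index}/{type}/{id} URL layout used by Elasticsearch versions that still have mapping types, and a hypothetical document id of 1. The request is only built here, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: the HTTP request that roughly corresponds to the IndexRequest above.
final class RestIndexRequestSketch {

    static HttpRequest buildIndexRequest(String host, String index, String type,
                                         String id, String json) {
        // refresh=true is the REST counterpart of the IMMEDIATE refresh policy
        URI uri = URI.create("http://" + host + "/" + index + "/" + type + "/" + id
                + "?refresh=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildIndexRequest("localhost:9200", "spring-data",
                "elasticsearch", "1", "{\"feature\":\"high-level-rest-client\"}");
        // Inspect the request instead of sending it
        System.out.println(request.method() + " " + request.uri());
    }
}
```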

[Original answer]

Currently Spring Data Elasticsearch does not support communication over the REST API; it uses the transport client.

There is a separate fork of Spring Data Elasticsearch (its author needed it for AWS, just as you do) in which the JEST library is used and communication goes over REST:

https://github.com/VanRoy/spring-data-jest

You will find an interesting discussion under the following Spring Data Elasticsearch ticket:

https://jira.spring.io/browse/DATAES-220

I think Spring Data Elasticsearch will need to migrate to REST in the future, given the statements from the Elasticsearch team that they plan to support only HTTP communication for ES.

Hope it helps.

Question:

I have a Spring Boot application with the Spring Data Elasticsearch plugin in the pom.xml. I created a document class which I'd like to index:

@Document(indexName = "operations", type = "operation")
public class OperationDocument {

    @Id
    private Long id;

    @Field(
        type = FieldType.String,
        index = FieldIndex.analyzed,
        searchAnalyzer = "standard",
        indexAnalyzer = "standard",
        store = true
    )
    private String operationName;

    @Field(
        type = FieldType.Date,
        index = FieldIndex.not_analyzed,
        store = true,
        format = DateFormat.custom, pattern = "dd.MM.yyyy hh:mm"
    )
    private Date dateUp;

    @Field(
        type = FieldType.String,
        index = FieldIndex.not_analyzed,
        store = false
    )
    private String someTransientData;

    @Field(type = FieldType.Nested)
    private List<Sector> sectors;

    // Getters and setters
}

I also created a repository for this class:

 public interface OperationDocumentRepository 
      extends ElasticsearchRepository<OperationDocument, Long> {
 }

I wrote a test that indexes three sample objects using the repository. It's pretty long, so I'll post it only if needed. The problem is that the mapping created in the ES server ignores the configuration set by the @Field annotations:

"mappings": {
  "operation": {
    "properties": {
      "operationName": {
        "type": "string"
      },
      "dateUp": {
        "type": "long"
      },
      "someTransientData": {
        "type": "string"
      },
      "sectors": {
        "properties": {
          "id": {
            "type": "long"
          },
          "sectorName": {
            "type": "string"
          }
        }
      }
    }
  }
}

There is no info about analyzers, "someTransientData" is stored and indexed, and dateUp is typed as Long instead of Date.

A sample document requested directly from the server:

 {
   "_index": "operations",
   "_type": "operation",
   "_id": "AUyUk2cY3nXeOFxdOlQW",
   "_version": 1,
   "_score": 1,
   "_source": {
     "id": null,
     "operationName": "Second Operation Name",
     "dateUp": 1428421827091,
     "someTransientData": "Do not index or store",
     "sectors": [
       {
         "id": 2,
         "sectorName": "Health Care"
       },
       {
         "id": 3,
         "sectorName": "Construction"
       }
     ]
   }
 }

I also noted that when I run the application for the second time, at startup time I get this error, only printed when the index already exists:

ERROR 19452 --- [main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.index.mapper.MergeMappingException: Merge failed with failures {[mapper [someTransientData] has different index values, mapper [someTransientData] has different tokenize values, mapper [someTransientData] has different index_analyzer, object mapping [sectors] can't be changed from non-nested to nested, mapper [operationName] has different store values, mapper [operationName] has different index_analyzer, mapper [dateUp] of different type, current_type [long], merged_type [date]]}

Is this a bug in Spring Data Elasticsearch, or am I doing something wrong?

I tried the stable version provided by Spring Boot and the latest snapshot of spring-data-elasticsearch. I also tried both the embedded Elasticsearch server provided by the plugin and an external one of the current version. I always got the same results.


Answer:

I could finally replicate and solve the problem. The fact is that I was using ElasticsearchTemplate for indexing and searching docs instead of repositories, because my business logic had become more complicated (use of aggregations, etc.).

After that, I had removed the unused OperationDocumentRepository. It seems that the repository is needed for the type mapping to be posted to the ES server on startup. I thought having the @Document class would be enough, but it isn't.

So we have two options here:

  • Keep the OperationDocumentRepository
  • Add this line to the app startup:

    elasticsearchTemplate.putMapping(OperationDocument.class);
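
Once the mapping is actually applied, the custom date pattern from the question comes into play. One detail that may be worth checking (an observation on the pattern, not something raised in the original answer): in both Joda-Time and java.time pattern syntax, hh is the 12-hour clock hour and HH the 24-hour one. The sketch below formats the dateUp value from the sample document with both patterns using only java.time; UTC is an assumption made so that the output is deterministic.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch: how the same instant renders with hh (12-hour) versus HH (24-hour).
final class DatePatternSketch {

    static String format(long epochMillis, String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC) // assumed zone, for a stable result
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long dateUp = 1428421827091L; // value taken from the sample document
        System.out.println(format(dateUp, "dd.MM.yyyy hh:mm")); // 07.04.2015 03:50
        System.out.println(format(dateUp, "dd.MM.yyyy HH:mm")); // 07.04.2015 15:50
    }
}
```

With hh and no AM/PM marker, afternoon times become ambiguous, so HH is probably what was intended.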
    

Question:

I am using Spring Data Elasticsearch 2.0.1 with Elastic version 2.2.0.

My DAO is similar to:

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;    

@Document(indexName = "myIndex")
public class MyDao {
    @Id
    private String id;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    <other fields, setters, getters omitted>
}

When I save the object to ES using a repository, the _id metadata field gets populated correctly. The getter and setter methods for the id field correctly return the value of the _id metadata field. But the id field within the _source field is null.

2 questions: 1) Why is the id field null? 2) Does it matter that the id field is null?


Answer:

Since you're letting ES generate its own IDs, i.e. you never call MyDao.setId("abcdxyz"), the _source cannot have a value in the id field.

If you generate your own IDs and call setId("yourid"), Spring Data ES will use that value as the _id of your document and also persist it into the _source.id field, which means that _source.id will not be null.

If you don't call setId(), then _source.id will be null and ES will generate its own ID. When you then call getId(), Spring Data ES will make sure to return the value of the _id field and not _source.id, since the property is annotated with @Id.

To answer your second question, it doesn't matter that the _source.id field is null, as long as you don't need to reference it in ES itself. Spring Data ES will always populate the entity's id property when mapping the JSON documents to your Java entities, even if the underlying _source.id field in ES is null.
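
The behaviour described in this answer can be pictured with a small stand-alone model. This is not Spring Data code: `Hit`, `toEntity` and the id value are hypothetical, and the sketch only illustrates that the entity's id is taken from the hit's _id metadata regardless of what _source.id holds.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: the @Id property is filled from _id, not from _source.id.
final class IdMappingSketch {

    // Simplified search hit: the _id metadata plus the _source body.
    static final class Hit {
        final String id;
        final Map<String, Object> source;

        Hit(String id, Map<String, Object> source) {
            this.id = id;
            this.source = source;
        }
    }

    // Simplified entity with a single id property.
    static final class MyDao {
        String id;

        String getId() {
            return id;
        }
    }

    // Mapping step: ignore _source.id and use the hit's _id metadata.
    static MyDao toEntity(Hit hit) {
        MyDao dao = new MyDao();
        dao.id = hit.id;
        return dao;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("id", null); // ES generated the id, so _source.id stayed null
        Hit hit = new Hit("generated-by-es", source);
        System.out.println(toEntity(hit).getId()); // generated-by-es
    }
}
```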

Question:

I'm trying to use both Spring Data JPA and Spring Data Elasticsearch on the same domain object but it doesn't work.

When I tried to run a simple test, I got the following exception:

org.springframework.data.mapping.PropertyReferenceException: No property index found for type Person!
    at org.springframework.data.mapping.PropertyPath.<init>(PropertyPath.java:75) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:327) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:307) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:270) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:241) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.query.parser.Part.<init>(Part.java:76) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.query.parser.PartTree$OrPart.<init>(PartTree.java:235) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.query.parser.PartTree$Predicate.buildTree(PartTree.java:373) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.query.parser.PartTree$Predicate.<init>(PartTree.java:353) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.query.parser.PartTree.<init>(PartTree.java:84) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.jpa.repository.query.PartTreeJpaQuery.<init>(PartTreeJpaQuery.java:61) ~[spring-data-jpa-1.9.0.RELEASE.jar:na]
    at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$CreateQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:95) ~[spring-data-jpa-1.9.0.RELEASE.jar:na]
    at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$CreateIfNotFoundQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:206) ~[spring-data-jpa-1.9.0.RELEASE.jar:na]
    at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$AbstractQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:73) ~[spring-data-jpa-1.9.0.RELEASE.jar:na]
    at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.<init>(RepositoryFactorySupport.java:408) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:206) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:251) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:237) ~[spring-data-commons-1.11.0.RELEASE.jar:na]
    at org.springframework.data.jpa.repository.support.JpaRepositoryFactoryBean.afterPropertiesSet(JpaRepositoryFactoryBean.java:92) ~[spring-data-jpa-1.9.0.RELEASE.jar:na]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1637) ~[spring-beans-4.2.1.RELEASE.jar:4.2.1.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574) ~[spring-beans-4.2.1.RELEASE.jar:4.2.1.RELEASE]
    ... 43 common frames omitted

Each of them works when the other one is disabled.

The project is based on Spring Boot 1.3.0.M5.

This is a sample project reproducing the situation:

https://github.com/izeye/spring-boot-throwaway-branches/tree/data-jpa-and-elasticsearch


Answer:

Repositories in Spring Data are datasource-agnostic, meaning that JpaRepository and ElasticsearchRepository both roll up into the Repository interface. Because of this, Spring Boot's auto-configuration causes Spring Data JPA to try to configure a bean for every repository in the project that inherits from any Spring Data Commons base repository. That includes the Elasticsearch repository, which is why JPA tries to derive a query from its index method and fails with "No property index found".

To fix this problem you need to move your JPA repository and Elasticsearch repository to separate packages and make sure to annotate your @SpringBootApplication application class with:

  • @EnableJpaRepositories
  • @EnableElasticsearchRepositories

Then you need to specify where the repositories are for each enable annotation. This ends up looking like:

@SpringBootApplication
@EnableJpaRepositories("com.izeye.throwaway.data")
@EnableElasticsearchRepositories("com.izeye.throwaway.indexing")
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

}

Then your application will be able to disambiguate which repositories are intended for which Spring Data project.

Question:

I am playing with the spring-boot-sample-data-elasticsearch project. I've changed the pom and added:

SampleElasticsearchApplicationWebXml extends SpringBootServletInitializer 

to run with Tomcat embedded. My application.properties has

spring.data.elasticsearch.http-enabled=true
spring.data.elasticsearch.local=true

I want to be able to connect to localhost:9200 in order to use elasticsearch-head or other JS client. What am I missing? Thanks, Milan


Answer:

According to this ticket, you can now simply add this line to your configuration files:

spring.data.elasticsearch.properties.http.enabled=true

Question:

I am new to Elastic Search and I am trying to implement it using Spring-data-elasticsearch.

I have fields with names such as "Transportation", "Telephone_Number" in our elastic search documents.

When I try to map my @Domain object fields with those, I don't get any data for those as I couldn't successfully map those fields.

I tried to use @Field, but was disappointed to find that it has no 'name' property for mapping to a custom field name.

I tried different variations of a getter method, but none of them seem to map to those fields.

I started wondering if there's something I'm missing here. What should a domain object field look like if it is to map to a field called something like "Transportation"?

Any help appreciated


Answer:

You can use a custom name. Spring Data ES uses Jackson, so you can use @JsonProperty("your_custom_name") to set a custom field name in the ES mapping.

For example:

@Document(indexName = "your_index_name", type = "your_type_name")
public class YourEntity {
   ....
   @JsonProperty("my_transportation")
   @Field(type = FieldType.String, searchAnalyzer = "standard", indexAnalyzer = "standard", store = true) // just for example
   private String myTransportation;
   ....
}
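
As a rough, dependency-free illustration of what a name-overriding annotation does when an entity is turned into a document, consider the sketch below. `JsonName` is a hypothetical stand-in for Jackson's @JsonProperty and `toDocument` is a toy serializer; neither reflects how Jackson is actually implemented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy serializer: an annotation overrides the key used in the document.
final class CustomFieldNameSketch {

    // Hypothetical stand-in for @JsonProperty
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface JsonName {
        String value();
    }

    static final class YourEntity {
        @JsonName("Transportation")
        String myTransportation = "bus";

        String comment = "no override";
    }

    // Uses the annotation value as the key when present, else the field name.
    static Map<String, Object> toDocument(Object entity) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            field.setAccessible(true);
            JsonName custom = field.getAnnotation(JsonName.class);
            String key = custom != null ? custom.value() : field.getName();
            try {
                doc.put(key, field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument(new YourEntity());
        System.out.println(doc.get("Transportation")); // bus
        System.out.println(doc.containsKey("myTransportation")); // false
    }
}
```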
