Rafał Kuć is a team leader and software developer. He currently works as a software architect and is a Solr and Lucene specialist. He is mainly focused on Java, but is open to any tool or programming language that helps him reach his goals faster and more easily. Rafał is also one of the founders of the solr.pl site, where he shares his knowledge and helps people with their problems. Rafał is a DZone MVB, is not an employee of DZone, and has posted 75 posts at DZone.

Solr Data Import Handler & XML – nested entities

10.07.2011

Data Import Handler is a very nice and powerful tool. This entry describes a problem I ran into recently and how I solved it.

Description of the problem

I had to index a list of products; what kind of products does not matter here. However, products can be combined into groups. In addition, every product after the first one in a group may have some data omitted, namely the data already present in the preceding products of that group. Here is an example structure (irrelevant information was omitted for readability):

<products>
  <product>
    <id>1</id>
    <name>Product 1</name>
  </product>
  <product>
    <id>2</id>
    <name>Product 2</name>
  </product>
  <group>
    <product>
      <id>3</id>
      <name>Product 3 and 4</name>
    </product>
    <product>
      <id>4</id>
    </product>
  </group>
</products>

Solution

As always, the solution comes down to a proper definition of the “entity” element, which looked as follows:

<entity processor="XPathEntityProcessor"
    forEach="/products/product | /products/group/product">
  <field column="id" xpath="//id" />
  <field column="name" xpath="//name" commonField="true" />
</entity>
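For context, here is a sketch of how this entity might sit inside a complete data-config.xml. The dataSource declaration, the entity name "product" and the file path in the url attribute are assumptions added for illustration; they were not part of the original configuration, so adjust them to your own setup.

```xml
<dataConfig>
  <!-- Assumption: the import file is read from the local file system -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Hypothetical entity name and file path; only processor, forEach
         and the two fields come from the original configuration -->
    <entity name="product"
            processor="XPathEntityProcessor"
            url="/path/to/products.xml"
            forEach="/products/product | /products/group/product">
      <field column="id" xpath="//id" />
      <field column="name" xpath="//name" commonField="true" />
    </entity>
  </document>
</dataConfig>
```

With the DataImportHandler registered in solrconfig.xml (commonly under the /dataimport path, though the name is configurable), the import would typically be started with the full-import command.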

Explanation

Thanks to this “forEach” definition, processing covers both products that do not belong to a group and those inside groups. An important attribute is “commonField”. It tells DIH that once a value for this field has been read, it should be carried over to subsequent records until a new value for the field is encountered. In practice, a record that does not define “name” gets the name from the previous record.
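To make the effect of “commonField” concrete, these are the four documents one would expect the configuration to produce from the sample file. They are written in Solr's XML update format purely for readability; DIH builds the documents internally and does not emit this XML.

```xml
<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Product 1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Product 2</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Product 3 and 4</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <!-- Not present in the source record; carried over from product 3 by commonField -->
    <field name="name">Product 3 and 4</field>
  </doc>
</add>
```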

The above solution has some limitations: the first product in each group must define the “name” field, and the order of products in the file matters. In my case, however, these limitations matched the specification of the provided import file exactly.
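The first limitation can be illustrated with a hypothetical input (not taken from the original import file) in which the first product of a group has no name. Given the carry-over behaviour of “commonField”, product 2 would most likely be indexed with the name of product 1, and the name intended for it arrives only with the next record.

```xml
<!-- Hypothetical input, not taken from the original import file -->
<products>
  <product>
    <id>1</id>
    <name>Product 1</name>
  </product>
  <group>
    <product>
      <!-- No name here: with commonField="true" this product would most
           likely be indexed with "Product 1" carried over from the previous record -->
      <id>2</id>
    </product>
    <product>
      <id>3</id>
      <name>Product 2 and 3</name>
    </product>
  </group>
</products>
```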


Published at DZone with permission of Rafał Kuć, author and DZone MVB.
