Rafal Kuc is a team leader and software developer. Right now he is a software architect and Solr and Lucene specialist. Mainly focused on Java, but open on every tool and programming language that will make the achievement of his goal easier and faster. Rafal is also one of the founders of solr.pl site where he tries to share his knowledge and help people with their problems. Rafał is a DZone MVB and is not an employee of DZone and has posted 75 posts at DZone. You can read more from them at their website. View Full User Profile

Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality

07.12.2011
| 6982 views |
  • submit to reddit

Recently, my attention caught the functionality described in the SOLR-2272 Jira ticket – functionality of SQL JOIN implemented in Solr. In today’s entry I take a look at this functionality.

Preparation

The first thing we need to do is download the code from the main branch (trunk) of Lucene/Solr SVN repository and add the the patch attached to the SOLR-2272. After building Solr we can proceed to testing.

Index structure

For testing purposes, I decided to define the following index structure:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="color" type="string" indexed="true" stored="true"/>
  <field name="parent" type="string" indexed="true" stored="true"/>
</fields>

The structure is rather simple. The id field is responsible for storing a unique identifier of the document. The name field holds the name of the document, the color is the color of the product, and the parent field is the identifier for the parent document.

Test data

File with the sample data is as follows:

<add>
 <doc>
  <field name="id">1</field>
  <field name="name">First document</field>
  <field name="color">Red</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="name">Second document</field>
  <field name="color">Yellow</field>
 </doc>
 <doc>
  <field name="id">3</field>
  <field name="name">Third document</field>
  <field name="color">Red</field>
  <field name="parent">1</field>
 </doc>
 <doc>
  <field name="id">4</field>
  <field name="name">Fourth document</field>
  <field name="color">Yellow</field>
  <field name="parent">1</field>
 </doc>
</add>

Query example

For the short tests I conducted I’ve chosen a fairly simple case. I wanted to get documents that are the parents of documents with the value Yellow in the color field. Nothing fancy, at least in the meaning of SQL. To execute this query in the standard Solr we would have to make at least two queries – first to get all documents with the appropriate value in the color field, and then their parent documents. And how it looks when talking about patched Solr 4.0 ? Take a look at the following query:

http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow

Response for the above query looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="q">{!join from=parent to=id}color:Yellow</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="color">Red</str>
    <str name="id">1</str>
    <str name="name">First document</str>
  </doc>
</result>
</response>

As you can see we got the results we wanted. But how does that query work ? As you can see in the example, apart from the standard query for a document with the given value in the field (color:Yellow) I’ve added a strange looking fragment {!join from=parent to=id}. This fragment tells Solr to show not the documents with the given field value, but those, which have the value in the id field the same as the parent field of the document that match the given query.

Verification

Let’s see if the result we got are the right ones. First let’s fetch the documents that have Yellow value in the color field. To do that, we send the following query to Solr:

http://localhost:8983/solr/select?q=color:Yellow

The response was as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="q">color:Yellow</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="color">Yellow</str>
    <str name="id">2</str>
    <str name="name">Second document</str>
  </doc>
  <doc>
    <str name="color">Yellow</str>
    <str name="id">4</str>
    <str name="name">Fourth document</str>
    <str name="parent">1</str>
  </doc>
</result>
</response>

So we got one document which have a value in the parent field – this value is 1. Document with id field equals to 1 is the following one:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="q">id:1</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="color">Red</str>
    <str name="id">1</str>
    <str name="name">First document</str>
  </doc>
</result>
</response>

As you can see, the JOIN query worked correctly ;)

A few words at the end

Keep in mind that at the moment the functionality is under development and may change. The important this is that with Solr version 4.0 we will be able to enjoy the functionality similar to SQL JOIN.

References
Published at DZone with permission of Rafał Kuć, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: