Apache Tika

Simple Photo Search with Solr and Tika

Recently we had a chance to help with a non-commercial project that included search as one of its parts. One of the assumptions, although not the key one, was...

1 reply - 4973 views - 02/21/12 by Rafał Kuć in Articles

Indexing with SolrJ

Two popular methods of indexing existing data are the Data Import Handler (DIH) and Tika (Solr Cell)/ExtractingRequestHandler. These can be used to index...

0 replies - 6764 views - 02/15/12 by Erick Erickson in Articles

Document language identification

One of the features of the latest Solr version (3.5) is the ability to identify the language of a document during indexing. In today's entry we...

0 replies - 4475 views - 02/06/12 by Rafał Kuć in Articles

Apache Tika 1.0 Solidifies Position in Content and Metadata Detection and Analysis

The 1.0 release of Apache Tika, a collection of Java libraries for the detection and extraction of structured text and metadata, has been 5 years in the making...

0 replies - 5634 views - 11/09/11 by Mitch Pronschinske in News

Accuracy and performance of Google's Compact Language Detector

To get a sense of the accuracy and performance of Google's Compact Language Detector, I ran some tests against two other packages: Apache Tika,...

1 reply - 5601 views - 10/26/11 by Michael McCandless in Articles

Inside the New Apache Solr

In 2006, Solr was donated to the Apache Software Foundation and integrated into the Lucene project. Apache Solr is an enterprise search platform that powers the...

2 replies - 11220 views - 11/16/09 by Mitch Pronschinske in Articles