mharwood in lucene

LUCENE-6747: FingerprintFilter is a new TokenFilter that outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens.
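
The "sorted and de-duplicated" concatenation described in this commit message can be illustrated with a minimal plain-Java sketch. This is not the Lucene TokenFilter API: the class name, the method name, and the separator parameter are all hypothetical, chosen only to show the idea.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not the FingerprintFilter implementation.
final class FingerprintSketch {

    // Collapses a list of tokens into one string: duplicates are dropped and
    // the remaining tokens are joined in natural String order.
    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet both de-duplicates and sorts the tokens.
        TreeSet<String> unique = new TreeSet<>(tokens);
        return String.join(String.valueOf(separator), unique);
    }
}
```

With the tokens "the", "quick", "the", "brown" and a space separator, this sketch produces the single string "brown quick the".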
LUCENE-329: Fix FuzzyQuery defaults to rank exact matches highest
LUCENE-6066: new DiversifiedTopDocsCollector in misc and PriorityQueue.remove method
New addition: LUCENE-4069 BloomFilterPostingsFormat for faster access to low-frequency terms such as primary keys.
  1. … 5 more files in changeset.
Added missing package javadocs
Javadoc fixes
New addition: LUCENE-4069 BloomFilterPostingsFormat for faster access to low-frequency terms such as primary keys.
  1. … 3 more files in changeset.
Added self to committers list
LUCENE-2306: Add NumericRangeQuery and NumericRangeFilter support to XMLQueryParser.

Jingkei Ly via Mark Harwood

  1. … 7 more files in changeset.
Initial commit of LUCENE-1486: a subclass of the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries
Fix for LUCENE-1500: new exception added to the Highlighter API to handle TokenStreams containing Tokens that exceed the given text length
Added new web application demo for contrib's XmlQueryParser.

This change involves:

* Adding Tomcat's Servlet jar into the lib directory and appropriate entry in NOTICE.txt following the lead from Solr's packaging

* Adding new "demo" directory to XmlQueryParser src directory

* Changing XMLQueryParser's build file to create a demo WAR file

* Changing the main build to include the demo WAR file (and any other future contrib/*/war files) in the binary distributions

The packaged source distribution has NOT yet been changed to add a lib directory containing servlet.jar, so building from a cut-down source distribution, as opposed to the full Subversion /trunk directory, will not build the WAR file (the XML query parser build file detects the absence of servlet.jar). Not sure whether this is a problem.

TODO:

Now that the servlet jar is available in Subversion, I would recommend that the other existing WAR file, "luceneweb.war", be changed to move much of the Java code currently embedded in JSP files into servlet .java files. This would ensure that the build system checks that the code in this application compiles cleanly against the latest Lucene APIs; otherwise any issue only becomes apparent when a user tries to run a JSP.

  1. /java/trunk/contrib/xml-query-parser/src/demo
  2. … 7 more files in changeset.
Fix for a potential NullPointerException introduced as part of the DocIdSet changes. TermsFilter no longer implemented bits(IndexReader), and the Filter base class's version of this method was changed to return null.

When dropping Lucene 2.4 in as a direct replacement for 2.3.2, my client code was getting NullPointer errors. Returning null was never part of the Filter.bits contract, so this could be a problem for others using this class.

The fix is for TermsFilter to implement bits(IndexReader). This can safely be removed in later versions because the method is deprecated going forward.

Fixed bug in FuzzyLikeThisQuery.java. Queries containing a term with no fuzzy variants caused the query construction logic to exit its loop early, producing no fuzzy variants for any subsequent term in the query string.

Added a JUnit test which recreates the problem conditions, and a fix to FuzzyLikeThisQuery that solves the issue.

Added option to allow UserQuery tag to define a different default fieldName.

The standard use case for this is where users are presented with a GUI form with multiple input boxes, each targeting a different field and allowing "lucene syntax". The XML query template behind such a form would have a <UserQuery> tag for each form field, each defined with the appropriate choice of default field name.

Added JUnit test for changing the default field name, updated the DTD for the XML query syntax and regenerated the HTML documentation.

Fixed bug parsing boolean attributes. Boolean.getBoolean(s) was being used by mistake; it reads the system property named s. The attribute value s instead needs to be parsed as the string "true" or "false".
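
The difference between the two calls can be sketched as follows. The helper class and its parseAttribute method are hypothetical names used only for illustration, and Boolean.parseBoolean is one way to parse the string; the commit message does not say which call the actual fix used.

```java
// Hypothetical illustration of the bug class; not the XML query parser's code.
final class BooleanAttributeSketch {

    // Buggy pattern: looks up a JVM system property whose NAME is the given
    // value, so it returns false unless such a property is set to "true".
    static boolean buggyParse(String value) {
        return Boolean.getBoolean(value);
    }

    // Intended behavior: interpret the attribute text itself, falling back to
    // a default when the attribute is absent or empty.
    static boolean parseAttribute(String value, boolean defaultValue) {
        if (value == null || value.trim().isEmpty()) {
            return defaultValue;
        }
        return Boolean.parseBoolean(value.trim());
    }
}
```

Unless the JVM happens to define a system property literally named "true", buggyParse("true") yields false, which is why the original attribute handling silently ignored the value supplied in the XML.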
Commit of LUCENE-794 patch - adding phrase/span query support to highlighter
Applied trejkaz's patch from https://issues.apache.org/jira/browse/LUCENE-1240 to optimise TermFilter.java and included new JUnit test
Additional thread safety around filter creation: the old code could create duplicate CachingWrapperFilters if thread 1 gets a cache miss and thread 2 also gets a cache miss before thread 1 populates the cache with its new CachingWrapperFilter.

Synchronization cost around whole method is OK here because Filter object construction should be a lightweight call.

Note: CachingWrapperFilter currently has a similar bug in bits() method but adding "synchronized" around that whole method would not be a solution there because of the cost of evaluating filter.bits and the unnecessary blocking effect this would have on threads using different readers to the thread with the lock.
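
The check-then-create race described in this commit, and the whole-method synchronization used to close it, can be sketched roughly as below. The class FilterCacheSketch, its factory parameter, and the use of a plain HashMap are assumptions made for illustration; this is not the committed code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical get-or-create cache; not the actual Lucene class.
final class FilterCacheSketch<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> factory;

    FilterCacheSketch(Function<K, V> factory) {
        this.factory = factory;
    }

    // Synchronizing the whole method makes the lookup and the insert one
    // atomic step, so two threads that miss on the same key cannot both
    // create a value. This is only reasonable when factory.apply is cheap.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = factory.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

The same trade-off noted in the message applies to the sketch: if creating the value were expensive, holding the lock for the whole method would block unrelated callers, and a finer-grained scheme would be needed.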

Added thread-safety around use of core's QueryParser.

The old XML parser constructors use a mode which synchronizes on use of the user-supplied QueryParser.

New constructors offer the alternative of passing a "defaultField" String, which is used to create a new single-use QueryParser for each parse operation.

Added toString implementation on BooleanFilter.java, provided by Jason Calabrese
Provided DTDs for core and contrib XML query syntax. The "docs" directory contains detailed documentation generated by DTDdoc from the DTDs. The ant script used to generate these docs is also included but not hooked up to the main build process due to license issues with DTDdoc.
  1. /java/trunk/contrib/xml-query-parser/docs
  2. … 12 more files in changeset.
Updated hashcode/equals to test all fields
Added new DuplicateFilter functionality to filter documents sharing a field value (e.g. primary key/url)

Also includes JUnit test and XML Query support

Exposed the MoreLikeThis "minDocFreq" property for use in MoreLikeThisQuery.java and in XML queries
Added equals/hashcode implementations to enable caching
Added hashcode and equals implementations to enable caching