DLESE Tools v1.2 |
java.lang.Object
  java.lang.Thread
    org.dlese.dpc.index.SimpleLuceneIndex
A simple wrapper API for creating, maintaining and using a Lucene index.
In order to provide transparent real-time updating, as well as access across multiple VMs, the location specified for SimpleLuceneIndex is actually a hierarchical directory structure. The location specified is a parent directory that contains an indexConfig file, used to identify the current 'working' and 'searching' indexes, and that also holds the files used by the locking mechanisms that control access. The working index is the index that gets updated via the add, remove and replace functions; its location never changes, and it is essentially the canonical Lucene index. The searching index is a new index created from the working index after each incremental update.
The basic directory structure:

\some\directory\location\
    - indexConfig
    - readLock
    - writeLock
    - indexReader
    \writer
        - index files
    \reader_date-as-long
        - index files
Directory /writer holds the working index, where all maintenance (adding, deleting, etc. of documents) takes place. Directory reader_date-as-long is the current searching index: a new reader/searcher is created by copying the writer index after it completes an update.
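The copy step described above can be sketched with java.nio.file alone. This is a minimal illustration, not the class's actual code: the helper name ReaderSnapshot and its snapshot method are hypothetical, and the sketch assumes the writer has finished its update and is not modified while the copy runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of the DLESE API.
final class ReaderSnapshot {

    /**
     * Recursively copies indexRoot/writer into a new indexRoot/reader_(timeMillis)
     * directory and returns that directory.
     */
    static Path snapshot(Path indexRoot, long timeMillis) {
        Path writer = indexRoot.resolve("writer");
        Path reader = indexRoot.resolve("reader_" + timeMillis);
        // Files.walk visits a directory before its contents, so every target
        // directory exists before files are copied into it.
        try (Stream<Path> paths = Files.walk(writer)) {
            paths.forEach(src -> {
                Path dest = reader.resolve(writer.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader;
    }
}
```

Using the current time as the suffix gives each searching index a unique directory, so a new one can be built while searches continue against the previous copy.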
The indexReader file is simply a serialized String written to disk as an object stream; together with the readLock and writeLock files, it provides synchronization for distributed users of this index.
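One way the indexReader file and a lock file could work together is sketched below. The locking protocol is not documented here, so this is an assumption: the hypothetical ReaderPointer helper takes an exclusive java.nio FileLock on writeLock while it rewrites indexReader, and it assumes the serialized String names the current reader_date-as-long directory. A fuller version would also take a lock on readLock when reading, so that a reader never sees a half-written file.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the DLESE API.
final class ReaderPointer {

    /** Serializes readerDirName to indexRoot/indexReader while holding a lock on indexRoot/writeLock. */
    static void write(Path indexRoot, String readerDirName) {
        try (FileChannel channel = FileChannel.open(indexRoot.resolve("writeLock"),
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             // Waits until no other process holds the lock; released when the try block exits.
             FileLock lock = channel.lock();
             ObjectOutputStream out = new ObjectOutputStream(
                     Files.newOutputStream(indexRoot.resolve("indexReader")))) {
            out.writeObject(readerDirName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the serialized directory name (no read lock taken in this sketch). */
    static String read(Path indexRoot) {
        try (ObjectInputStream in = new ObjectInputStream(
                Files.newInputStream(indexRoot.resolve("indexReader")))) {
            return (String) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```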
The indexConfig file is a serialized object written to disk that identifies the basic static components of a given index: the default document field, the analyzer type used when tokenizing text for the index, and a list of stop words to be used. The stop words can generally be left null, in which case Lucene's default stop word list is used. (See the Lucene Javadocs for information on analyzers.)
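A serializable object like the one described above could look as follows. The class IndexConfig and its field names are hypothetical stand-ins for whatever the real indexConfig object contains; a null stop-word list is kept as null to signal that Lucene's defaults should be used.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the indexConfig object; field names are assumptions.
final class IndexConfig implements Serializable {
    private static final long serialVersionUID = 1L;

    final String defaultField;
    final int analyzerType;
    final List<String> stopWords; // null means "use Lucene's default stop word list"

    IndexConfig(String defaultField, int analyzerType, String[] stopWords) {
        this.defaultField = defaultField;
        this.analyzerType = analyzerType;
        this.stopWords = stopWords == null ? null : new ArrayList<>(Arrays.asList(stopWords));
    }

    /** Writes this config to disk using Java serialization. */
    void save(Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a config previously written by save. */
    static IndexConfig load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (IndexConfig) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```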
Field Summary | |
---|---|
static boolean | BLOCK: Indicates update operations will be blocked until the current one returns. |
static int | DEFAULT_AND: Use to set the boolean search operator to AND. |
static int | DEFAULT_OR: Use to set the boolean search operator to OR. |
static boolean | NO_BLOCK: Indicates update operations will be allowed while others are still in progress. |
Fields inherited from class java.lang.Thread |
---|
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY |
Fields inherited from interface org.dlese.dpc.index.LuceneIndex |
---|
SIMPLE_ANALYSIS, STANDARD_ANALYSIS, STOP_ANALYSIS, WHITESPACE_ANALYSIS |
Constructor Summary |
---|
SimpleLuceneIndex(): Non-initializing constructor for the SimpleLuceneIndex. |
SimpleLuceneIndex(String indexDir): Initializes or creates an index at the given location using a default search field named "default" and a standard analyzer for indexing. |
SimpleLuceneIndex(String indexDir, String defaultField, String[] stopWords, int analyzerType): Initializes or creates an index at the given location using the default search field, additional stop words and analyzer indicated. |
Method Summary | |
---|---|
boolean | addDoc(Document doc): Adds a Document to the index. |
boolean | addDoc(Document doc, boolean block): Adds a Document to the index. |
boolean | addDocs(Document[] docs): Adds a group of Documents to the index. |
boolean | addDocs(Document[] docs, boolean block): Adds a group of Documents to the index. |
void | create(String location, String defaultDocField, int analysisType): Creates a new index if one does not exist. |
void | create(String location, String defaultDocField, String[] stopWords, int analysisType): Creates a new index if one does not exist. |
void | delete(): Deletes the index and re-initializes a new, empty one in its place. |
static String | encodeToTerm(String s): Encodes a String to an appropriate format that can be indexed as a single term using a StandardAnalyzer. |
Object | getAttribute(String key): Gets an attribute Object from this SimpleLuceneIndex. |
static String | getDateStamp(): Gets a datestamp of the current time formatted for display with logs and output. |
Document | getDocument(int n): Gets the nth document in the index. |
String | getIndexLocation(): Gets the absolute path to the directory where the index resides. |
long | getLastModifiedTime(): Gets the timestamp of the last time the index was modified by adding, deleting or changing a document. |
int | getOperator(): Gets the boolean operator that is currently being used for searches. |
String | getOperatorString(): Gets the boolean operator that is currently being used for searches as a String (AND or OR). |
IndexReader | getReader(): Gets the reader attribute of the SimpleLuceneIndex object. |
int | getTermFrequency(String term): Gets the term frequency across all fields in the index. |
int | getTermFrequency(String field, String term): Gets the term frequency in the given field. |
Map | getTermLists(): Gets a Map of Lists that contain the terms for each field in the index. |
List | getTerms(String field): Gets a list of all terms that are in the index under the given field name. |
boolean | isUpdating(): Indicates whether the index is currently being updated. |
List | listDocs(): Gets a list of all Documents in the index. |
List | listDocs(String field, String term): Gets a list of all Documents in the index that match the given term in the given field. |
List | listDocs(String field, String[] terms): Gets a list of all Documents in the index that match the given terms in the given field. |
List | listFields(): Gets a list of all fields in the index. |
List | listStopWords(): Gets a list of all stop words for this index. |
List | listTerms(): Gets a list of all terms in the index. |
void | loadToMemory(): Loads the index into RAM for possibly improved performance. |
int | numDocs(): Gets the total number of documents in the index. |
int | numDocs(String query): Gets the number of documents that match the given query. |
boolean | removeDocs(String field, String value): Removes all Documents that match the given term within the given field. |
boolean | removeDocs(String field, String[] values): Removes all Documents that match the given terms within the given field. |
boolean | removeDocs(String field, String[] values, boolean block): See removeDocs(String,String[]) for description. |
boolean | removeDocs(String field, String value, boolean block): See removeDocs(String,String) for description. |
void | run(): Main processing method for the SimpleLuceneIndex object. |
ResultDoc[] | searchDocs(String query): Performs a search over the index using the given query String, returning an ordered array of ranked matching results. |
ResultDoc[] | searchDocs(String query, String defaultField): Performs a search over the index using the given query String, returning an ordered array of ranked matching results. |
ResultDoc[] | searchDocs(String query, String defaultField, Collector collector): Performs a search over the index using the given query String, returning an ordered array of ranked matching results. |
ResultDoc[] | searchDocs(String query, String defaultField, Filter filter): Performs a search over the index using the given query String, returning an ordered array of ranked matching results. |
ResultDoc[] | searchDocs(String query, String defaultField, Filter filter, Collector collector): Performs a search over the index using the given query String, returning an ordered array of ranked matching results. |
void | setAttribute(String key, Object attribute): Sets an attribute Object that will be available for access in results. |
static void | setDebug(boolean db): Sets the debug attribute of the SimpleLuceneIndex object. |
void | setOperator(int operator): Sets the boolean operator used during searches. |
boolean | tryConfig(int AnalysisType, String defaultDocField, String[] stopWords, String indexReader): Checks for the existence of indexConfig and indexReader and for validity against the params already set, if any. |
boolean | update(String deleteField, ArrayList deleteValues, ArrayList addDocs): Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. |
boolean | update(String deleteField, ArrayList deleteValues, ArrayList addDocs, boolean block): Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. |
boolean | update(String deleteField, String[] deleteValues, Document[] addDocs): Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. |
boolean | update(String deleteField, String[] deleteValues, Document[] addDocs, boolean block): Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. |
boolean | update(String deleteField, String deleteValue, ArrayList addDocs, boolean block): See update(String, String[], Document[], boolean) for description. |
boolean | update(String deleteField, String deleteValue, Document[] addDocs, boolean block): See update(String, String[], Document[], boolean) for description. |
boolean | update(String deleteField, String deleteValue, Document addDoc, boolean block): See update(String, String[], Document[], boolean) for description. |
void | use(String location): Initializes an existing SimpleLuceneIndex by verifying the index and index config, then setting the indexReader and other internal state. |
Methods inherited from class java.lang.Thread |
---|
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getContextClassLoader, getName, getPriority, getThreadGroup, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setName, setPriority, sleep, sleep, start, stop, stop, suspend, toString, yield |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
public static final boolean BLOCK
public static final boolean NO_BLOCK
public static final int DEFAULT_OR
See Also: setOperator(int operator), getOperator(), Constant Field Values
public static final int DEFAULT_AND
See Also: setOperator(int operator), getOperator(), Constant Field Values
Constructor Detail |
public SimpleLuceneIndex()
Non-initializing constructor for the SimpleLuceneIndex. The index must then be initialized by calling use(String), create(String,String,int) or create(String,String,String[],int).
See Also: use(String), create(String,String,int), create(String,String,String[],int)
public SimpleLuceneIndex(String indexDir)
Parameters:
indexDir - The directory where the index is located or will be created.
public SimpleLuceneIndex(String indexDir, String defaultField, String[] stopWords, int analyzerType)
Parameters:
indexDir - The directory where the index is located or will be created.
defaultField - The name of the field used for default searching, for example "default".
stopWords - Additional stop words to use during indexing, or null.
analyzerType - The analyzer type used when indexing.
Method Detail |
public void delete()
public void setAttribute(String key, Object attribute)
Parameters:
key - The key used to reference the attribute.
attribute - Any Java Object.
See Also: ResultDoc, DocReader
public Object getAttribute(String key)
Parameters:
key - The key used to reference the attribute.
See Also: ResultDoc, DocReader
public boolean tryConfig(int AnalysisType, String defaultDocField, String[] stopWords, String indexReader)
To use a pre-existing index that has been copied to this location, first call this method to create the appropriate indexConfig, then call use(location) to activate it.
NOTE: Using this method does not guarantee that the parameters passed in match those used when the original index was created. A mismatch could result in a non-functioning index or unexpected query results.
Specified by: tryConfig in interface LuceneIndex
Parameters:
AnalysisType - Type of analysis.
defaultDocField - The field name used for default search (if no field is given).
stopWords - Stop words used for this index.
indexReader - The index reader.
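The validation half of tryConfig can be sketched in memory. ConfigCheck and Params are hypothetical names; this version treats stop words as an unordered set (an assumption) and, following the description above, records the parameters when no configuration exists yet and otherwise reports whether they match.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of tryConfig's validation logic, kept in memory.
final class ConfigCheck {

    /** The static parameters that must match for an existing index. */
    static final class Params {
        final int analysisType;
        final String defaultDocField;
        final Set<String> stopWords; // null = default stop words; order is ignored

        Params(int analysisType, String defaultDocField, String[] stopWords) {
            this.analysisType = analysisType;
            this.defaultDocField = defaultDocField;
            this.stopWords = stopWords == null ? null : new HashSet<>(Arrays.asList(stopWords));
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Params)) return false;
            Params p = (Params) o;
            return analysisType == p.analysisType
                    && Objects.equals(defaultDocField, p.defaultDocField)
                    && Objects.equals(stopWords, p.stopWords);
        }

        @Override
        public int hashCode() {
            return Objects.hash(analysisType, defaultDocField, stopWords);
        }
    }

    private Params stored; // null until a configuration has been recorded

    /** Records the parameters if none exist yet; otherwise returns whether they match. */
    synchronized boolean tryConfig(int analysisType, String defaultDocField, String[] stopWords) {
        Params requested = new Params(analysisType, defaultDocField, stopWords);
        if (stored == null) {
            stored = requested;
            return true;
        }
        return stored.equals(requested);
    }
}
```

As the NOTE above warns, a check like this only compares against what was recorded; it cannot detect that a copied index was originally built with different settings.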
public void use(String location) throws InvalidIndexException, IndexInitializationException
Notes:
Since the writer and reader never operate on the same index, the writer can be kept open for
the entire life-cycle of this SimpleLuceneIndex instance. The analyzer and parser
also exist for the entire life-cycle. The reader and searcher, however, must be
re-instantiated with each update of the index.
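The reader-swapping pattern in these notes can be illustrated without Lucene. In the hypothetical SwappingIndex below, a plain list stands in for the long-lived writer, and each update publishes a fresh, immutable snapshot through an AtomicReference, standing in for the re-instantiated reader and searcher; a search reads the reference once and so sees one consistent view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; strings stand in for indexed documents.
final class SwappingIndex {
    // Long-lived working data (stands in for the writer that stays open).
    private final List<String> working = new ArrayList<>();
    // Current searchable snapshot (stands in for the reader/searcher pair).
    private final AtomicReference<List<String>> reader =
            new AtomicReference<>(Collections.<String>emptyList());

    synchronized void add(String doc) {
        working.add(doc);
        // Re-create the "reader" from the updated working copy and publish it atomically.
        reader.set(Collections.unmodifiableList(new ArrayList<>(working)));
    }

    /** Counts documents containing term, using whichever snapshot is current. */
    int count(String term) {
        List<String> snapshot = reader.get();
        int n = 0;
        for (String doc : snapshot) {
            if (doc.contains(term)) n++;
        }
        return n;
    }
}
```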
Specified by: use in interface LuceneIndex
Parameters:
location - The location of the index.
Throws:
InvalidIndexException - If the index at the given location is not valid.
IndexInitializationException - If the index cannot be initialized.
public String getIndexLocation()
public void loadToMemory()
public void create(String location, String defaultDocField, int analysisType) throws IndexInitializationException
Specified by: create in interface LuceneIndex
Parameters:
location - The location of the index.
defaultDocField - The default field that is searched if none is indicated at search time.
analysisType - The analyzer type.
Throws:
IndexInitializationException - If the index already exists, one cannot be created, or another problem occurs.
public void create(String location, String defaultDocField, String[] stopWords, int analysisType) throws IndexInitializationException
Specified by: create in interface LuceneIndex
Parameters:
location - The location of the index.
defaultDocField - The default field that is searched if none is indicated at search time.
stopWords - The additional stop words used for this index, or null.
analysisType - The analyzer type.
Throws:
IndexInitializationException - If the index already exists, one cannot be created, or another problem occurs.
public ResultDoc[] searchDocs(String query)
Specified by: searchDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
See Also: setOperator(int operator), getOperator()
public ResultDoc[] searchDocs(String query, String defaultField)
Specified by: searchDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
defaultField - The default field to search in.
See Also: setOperator(int operator), getOperator()
public ResultDoc[] searchDocs(String query, String defaultField, Filter filter)
Specified by: searchDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
filter - A filter used for the search.
defaultField - The default field to search in, or null to use the pre-defined default field.
See Also: setOperator(int operator), getOperator()
public ResultDoc[] searchDocs(String query, String defaultField, Collector collector)
Specified by: searchDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
collector - A custom collector used for the search.
defaultField - The default field to search in, or null to use the pre-defined default field.
See Also: setOperator(int operator), getOperator()
public ResultDoc[] searchDocs(String query, String defaultField, Filter filter, Collector collector)
Specified by: searchDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
filter - A filter used for the search.
collector - A custom collector used for the search.
defaultField - The default field to search in, or null to use the pre-defined default field.
See Also: setOperator(int operator), getOperator()
public void setOperator(int operator)
Parameters:
operator - The new boolean operator value.
See Also: DEFAULT_OR, DEFAULT_AND
public int getOperator()
See Also: DEFAULT_OR, DEFAULT_AND
public String getOperatorString()
See Also: DEFAULT_OR, DEFAULT_AND
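The effect of the default boolean operator can be shown with a toy term matcher. This sketches the AND/OR semantics only, not Lucene's query parser; OperatorDemo and its AND/OR constants are hypothetical, since the actual values of DEFAULT_AND and DEFAULT_OR are listed only under Constant Field Values.

```java
import java.util.*;

// Hypothetical sketch of default-operator semantics over pre-tokenized documents.
final class OperatorDemo {
    // Illustrative constants; not the real DEFAULT_AND / DEFAULT_OR values.
    static final int AND = 0;
    static final int OR = 1;

    /** Returns ids of documents containing all (AND) or any (OR) of the query terms. */
    static List<String> search(Map<String, Set<String>> docs, String query, int operator) {
        String[] terms = query.trim().toLowerCase().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : docs.entrySet()) {
            int matched = 0;
            for (String t : terms) {
                if (e.getValue().contains(t)) matched++;
            }
            boolean hit = operator == AND ? matched == terms.length : matched > 0;
            if (hit) hits.add(e.getKey());
        }
        return hits;
    }

    /** Mirrors the idea of getOperatorString(): "AND" or "OR". */
    static String operatorString(int operator) {
        return operator == AND ? "AND" : "OR";
    }
}
```

With documents d1 {ocean, climate}, d2 {ocean, tides} and d3 {climate}, the query "ocean climate" matches only d1 under AND but all three under OR.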
public IndexReader getReader()
public int numDocs(String query)
Specified by: numDocs in interface LuceneIndex
Parameters:
query - The query to perform over the index.
public int numDocs()
Specified by: numDocs in interface LuceneIndex
public List listDocs()
Gets a list of all Documents in the index.
Specified by: listDocs in interface LuceneIndex
public List listDocs(String field, String term)
Gets a list of all Documents in the index that match the given term in the given field.
Specified by: listDocs in interface LuceneIndex
Parameters:
field - The field searched.
term - The term to match.
public List listDocs(String field, String[] terms)
Gets a list of all Documents in the index that match the given terms in the given field.
Parameters:
field - The field searched.
terms - The terms to match.
public List listTerms()
Specified by: listTerms in interface LuceneIndex
public List listFields()
Specified by: listFields in interface LuceneIndex
public Map getTermLists()
Specified by: getTermLists in interface LuceneIndex
public List getTerms(String field)
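Gets a list of all terms that are in the index under the given field name. Callers that cache the returned list can use getLastModifiedTime() to determine when to update the cache.
Parameters:
field - The indexed field name.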
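A cache keyed on the modification time might look like this. TermCache is hypothetical; its two functions stand in for getLastModifiedTime() and getTerms(String) so that the sketch runs without a Lucene index.

```java
import java.util.*;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache that re-fetches per-field term lists only after the index changes.
final class TermCache {
    private final LongSupplier lastModified;             // stands in for getLastModifiedTime()
    private final Function<String, List<String>> loader; // stands in for getTerms(String)
    private final Map<String, List<String>> cache = new HashMap<>();
    private long cachedAt = Long.MIN_VALUE;

    TermCache(LongSupplier lastModified, Function<String, List<String>> loader) {
        this.lastModified = lastModified;
        this.loader = loader;
    }

    synchronized List<String> terms(String field) {
        long modified = lastModified.getAsLong();
        if (modified != cachedAt) {
            // The index changed since the cache was filled: drop every field's entry.
            cache.clear();
            cachedAt = modified;
        }
        return cache.computeIfAbsent(field, loader);
    }
}
```

Clearing all fields on any change is the simplest correct policy, because the timestamp covers the whole index rather than a single field.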
public List listStopWords()
Specified by: listStopWords in interface LuceneIndex
public int getTermFrequency(String term)
Specified by: getTermFrequency in interface LuceneIndex
Parameters:
term - The term.
public int getTermFrequency(String field, String term)
Specified by: getTermFrequency in interface LuceneIndex
Parameters:
field - The field.
term - The term.
public boolean addDoc(Document doc)
Specified by: addDoc in interface LuceneIndex
Parameters:
doc - The Document to add.
public boolean addDoc(Document doc, boolean block)
Parameters:
doc - The Document to add.
block - Indicates whether to block other updates until complete.
public boolean addDocs(Document[] docs)
Specified by: addDocs in interface LuceneIndex
Parameters:
docs - The Documents to add.
public boolean addDocs(Document[] docs, boolean block)
Parameters:
docs - The Documents to add.
block - Indicates whether to block other updates until complete.
public boolean removeDocs(String field, String value)
Specified by: removeDocs in interface LuceneIndex
Parameters:
field - The field that is searched.
value - The term that is matched for deletes.
public boolean removeDocs(String field, String value, boolean block)
See removeDocs(String,String) for description. Adds the ability to control whether blocking occurs during the update.
Parameters:
field - The field that is searched.
value - The term that is matched for deletes.
block - Indicates whether or not to block other update operations.
public boolean removeDocs(String field, String[] values)
Specified by: removeDocs in interface LuceneIndex
Parameters:
field - The field that is searched.
values - The terms that are matched for deletes.
public boolean removeDocs(String field, String[] values, boolean block)
See removeDocs(String,String[]) for description. Adds the ability to control whether blocking occurs during the update.
Parameters:
field - The field that is searched.
values - The terms that are matched for deletes.
block - Indicates whether or not to block other update operations.
public boolean update(String deleteField, String[] deleteValues, Document[] addDocs, boolean block)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. Assuming the deleteField contains a unique ID for the Document, the Document may be removed by indicating the ID in the deleteValues list. To replace an entry in the index for a single item, supply the item's ID in the deleteValues list and supply the new Document for the item in the addDocs list.
Parameters:
deleteField - The field searched for deleteValues.
deleteValues - The values matched in deleteField to indicate which document(s) to delete.
addDocs - An array of Documents to add to the index.
block - Indicates whether or not to block other threads or JVMs from reading from or writing to the index during the delete/add operation.
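The delete-then-add semantics, and how they yield a replace, can be modeled in memory. UpdateModel is hypothetical, documents are plain field-to-value maps rather than Lucene Documents, and the sketch ignores blocking. Passing an item's ID in deleteValues together with a new document for that ID in addDocs replaces the item; an empty addDocs list turns the call into a pure delete.

```java
import java.util.*;

// Hypothetical in-memory model of update(deleteField, deleteValues, addDocs).
final class UpdateModel {
    final List<Map<String, String>> docs = new ArrayList<>();

    /** Removes docs whose deleteField value is in deleteValues, then appends addDocs. */
    synchronized void update(String deleteField, String[] deleteValues, List<Map<String, String>> addDocs) {
        Set<String> targets = new HashSet<>(Arrays.asList(deleteValues));
        // Step 1: delete every document whose deleteField matches one of the values.
        docs.removeIf(d -> targets.contains(d.get(deleteField)));
        // Step 2: add the new documents (possibly replacements for the deleted ones).
        docs.addAll(addDocs);
    }
}
```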
public boolean update(String deleteField, String[] deleteValues, Document[] addDocs)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description. Performs an update with blocking on.
Specified by: update in interface LuceneIndex
Parameters:
deleteField - The field searched for deleteValues.
deleteValues - Array of Strings containing the values matched in deleteField to indicate which document(s) to delete.
addDocs - Array containing Documents to add to the index.
public boolean update(String deleteField, String deleteValue, Document[] addDocs, boolean block)
See update(String, String[], Document[], boolean) for description.
Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDocs - These Docs are added to the index.
block - Block or run in the background.
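The block parameter can be pictured as the choice between waiting for an update and letting it finish in the background. The hypothetical UpdateRunner below queues updates on a single worker thread so they apply in order; when block is set, the caller waits on the Future. This is an illustrative assumption, not the mechanism SimpleLuceneIndex uses internally (the class extends Thread and has its own run() method).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: updates run on one worker; 'block' decides whether the caller waits.
final class UpdateRunner implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues the update; if block is true, waits for it to finish before returning. */
    void submit(Runnable update, boolean block) {
        Future<?> f = worker.submit(update);
        if (block) {
            try {
                f.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
    }

    /** Stops accepting work; already queued updates still run. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because the worker is single-threaded and processes tasks in submission order, a blocking call also guarantees that every earlier background update has completed.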
public boolean update(String deleteField, String deleteValue, Document addDoc, boolean block)
See update(String, String[], Document[], boolean) for description.
Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDoc - The Doc to be added to the index.
block - Block or run in the background.
public boolean update(String deleteField, String deleteValue, ArrayList addDocs, boolean block)
See update(String, String[], Document[], boolean) for description.
Parameters:
deleteField - The field searched for deleteValue.
deleteValue - Matching docs are deleted.
addDocs - These Docs are added to the index.
block - Block or run in the background.
public boolean update(String deleteField, ArrayList deleteValues, ArrayList addDocs, boolean block)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description.
Parameters:
deleteField - The field searched for deleteValues.
deleteValues - ArrayList of Strings containing the values matched in deleteField to indicate which document(s) to delete.
addDocs - An ArrayList containing Documents to add to the index.
block - Indicates whether or not to block other threads or JVMs from reading from or writing to the index during the delete/add operation.
public boolean update(String deleteField, ArrayList deleteValues, ArrayList addDocs)
Updates the index by first deleting the documents that match the value(s) indicated in deleteValues in the field deleteField, then adding the documents in addDocs. See update(String, String[], Document[], boolean) for description. Performs an update with blocking on.
Parameters:
deleteField - The field searched for deleteValues.
deleteValues - ArrayList of Strings containing the values matched in deleteField to indicate which document(s) to delete.
addDocs - An ArrayList containing Documents to add to the index.
public void run()
Specified by: run in interface Runnable
public long getLastModifiedTime()
public Document getDocument(int n)
Parameters:
n - The document number.
public boolean isUpdating()
public static String encodeToTerm(String s)
Parameters:
s - The string to encode.
public static final String getDateStamp()
public static void setDebug(boolean db)
Parameters:
db - The new debug value.