DLESE Tools
v1.2

org.dlese.dpc.index
Class FileIndexingService

java.lang.Object
  extended by org.dlese.dpc.index.FileIndexingService

public final class FileIndexingService
extends Object

Indexes files into a SimpleLuceneIndex and automatically updates the index whenever the files change. This class uses a FileIndexingServiceWriter to create the Lucene Documents that are placed in the SimpleLuceneIndex. It looks for changes made to items in a directory of files and updates the index automatically by adding, updating or deleting items as appropriate. The frequency of update checks is configurable. There should be only one instance of this class for each SimpleLuceneIndex that is being populated with this class.
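
The add/update/delete classification described above can be pictured with a small, self-contained sketch. This is not the DLESE implementation: the class ChangeScanSketch, its Changes holder and the use of modification times as the change signal are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class ChangeScanSketch {

    /** File paths grouped by the action a sync pass would take. */
    static final class Changes {
        final List<String> toAdd = new ArrayList<String>();
        final List<String> toReplace = new ArrayList<String>();
        final List<String> toDelete = new ArrayList<String>();
    }

    /**
     * Compares the modification times recorded at the last sync (indexed)
     * with the modification times seen on disk now (onDisk).
     */
    static Changes diff(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Changes changes = new Changes();
        // TreeMap copies give a stable, sorted iteration order.
        for (Map.Entry<String, Long> e : new TreeMap<String, Long>(onDisk).entrySet()) {
            Long previous = indexed.get(e.getKey());
            if (previous == null) {
                changes.toAdd.add(e.getKey());       // file is new
            } else if (!previous.equals(e.getValue())) {
                changes.toReplace.add(e.getKey());   // file was modified
            }
        }
        for (String path : new TreeMap<String, Long>(indexed).keySet()) {
            if (!onDisk.containsKey(path)) {
                changes.toDelete.add(path);          // file is no longer present
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<String, Long>();
        indexed.put("a.xml", 100L);
        indexed.put("b.xml", 100L);
        Map<String, Long> onDisk = new HashMap<String, Long>();
        onDisk.put("a.xml", 150L);
        onDisk.put("c.xml", 120L);
        Changes c = diff(indexed, onDisk);
        System.out.println("add=" + c.toAdd + " replace=" + c.toReplace
                + " delete=" + c.toDelete);
    }
}
```

The three lists loosely correspond to the counts reported by getNumRecordsToAdd(), getNumRecordsToReplace() and getNumRecordsToDelete(), although how the real class computes those values is not documented here.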

Author:
John Weatherley

Constructor Summary
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, boolean saveDeletes, String idFieldToRemove, String fileIndexingServiceDataDir)
          Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, String fileIndexingServiceDataDir)
          Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
 
Method Summary
 boolean addDirectory(File srcDir, FileIndexingServiceWriter documentWriter)
          Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
 boolean addDirectory(String sourceFileDirectory, FileIndexingServiceWriter documentWriter)
          Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
 void changeUpdateFrequency(long updateFrequency)
          Changes the frequency of reindexing to the new value.
 Object getAttribute(String key)
          Gets an attribute Object from this FileIndexingService.
static String getDateStamp()
          Returns a string for the current time and date, suitable for display in log files and output to standard out.
 ArrayList getIndexingMessages()
          Gets the last 10 indexing status messages.
 long getLastSyncTime()
          Gets the lastSyncTime attribute of the FileIndexingService object
 int getNumRecordsToAdd()
          Gets the numRecordsToAdd attribute of the FileIndexingService object
 int getNumRecordsToDelete()
          Gets the numRecordsToDelete attribute of the FileIndexingService object
 int getNumRecordsToReplace()
          Gets the numRecordsToReplace attribute of the FileIndexingService object
static String getSimpleDateStamp()
          Returns a string for the current time and date, suitable for display in log files and output to standard out.
 long getUpdateFrequency()
          Gets the updateFrequency attribute of the FileIndexingService object
 boolean isDirectoryConfigured(File srcDir)
          Determines whether the given directory is configured for indexing.
 void reindexDocs(Document[] docs, boolean reindexAll)
          Reindexes the given Documents.
 void reindexDocs(ResultDoc[] docs, boolean reindexAll)
          Reindexes the Documents in the given ResultDocs.
 int reindexDocs(String query, boolean reindexAll)
          Reindexes Documents managed by this FileIndexingService that match the given Lucene query.
 int reindexDocs(String field, String[] terms, boolean reindexAll)
          Re-indexes all documents that match the given terms within the given field.
 int reindexDocs(String field, String term, boolean reindexAll)
          Re-indexes all documents that match the given term within the given field.
 boolean removeDirectory(File srcDir)
          Removes the files in the given directory from the index.
 boolean removeDirectory(String sourceFileDirectory)
          Removes the files in the given directory from the index.
 void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter)
          Removes all documents that match the given terms within the given field.
 void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter, boolean saveDeletes)
          Removes all documents that match the given terms within the given field.
 void removeDocs(String field, String term, FileIndexingServiceWriter docWriter)
          Removes all documents that match the given term within the given field.
 void removeDocs(String field, String term, FileIndexingServiceWriter docWriter, boolean saveDeletedRecords)
          Removes all documents that match the given term within the given field.
 void setAttribute(String key, Object attribute)
          Sets an attribute Object that will be available for access in here and from the FileIndexingServiceWriters.
static void setDebug(boolean db)
          Sets the debug attribute object
 void setValidationEnabled(boolean validateFiles)
          Sets whether or not to validate the files being indexed and create a validation report, which is indexed.
 void startTester(String docRoot, String sourceFileDirectory)
          Starts a FileMoveTester iff one is not already initialized.
 void startTimerThread(long updateFrequency)
          Starts or restarts the timer thread with the given update frequency.
 void stopTester()
          Stops the FileMoveTester
 void stopTimerThread()
          Stops the indexing timer thread.
 void synchIndexWithFiles(boolean background, boolean reindexAll)
          Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FileIndexingService

public FileIndexingService(SimpleLuceneIndex index,
                           long updateFrequency,
                           String fileIndexingServiceDataDir)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.

Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency by which files are checked for updates. A negative number indicates no updates should be performed.
See Also:
setValidationEnabled(boolean validateFiles)

FileIndexingService

public FileIndexingService(SimpleLuceneIndex index,
                           long updateFrequency,
                           boolean saveDeletes,
                           String idFieldToRemove,
                           String fileIndexingServiceDataDir)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.

Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency by which files are checked for updates. A negative number indicates no updates should be performed.
saveDeletes - True to save removed documents in the index and mark them deleted, else they will be removed from the index.
idFieldToRemove - An ID field whose docs should be removed if found in duplicate.
See Also:
setValidationEnabled(boolean validateFiles)
Method Detail

setAttribute

public void setAttribute(String key,
                         Object attribute)
Sets an attribute Object that will be available for access in here and from the FileIndexingServiceWriters.

Parameters:
key - The key used to reference the attribute.
attribute - Any Java Object.
See Also:
FileIndexingServiceWriter

getAttribute

public Object getAttribute(String key)
Gets an attribute Object from this FileIndexingService.

Parameters:
key - The key used to reference the attribute.
Returns:
The Java Object that is stored under the given key or null if none exists.
See Also:
FileIndexingServiceWriter

changeUpdateFrequency

public void changeUpdateFrequency(long updateFrequency)
Changes the frequency of reindexing to the new value. Same as startTimerThread(long updateFrequency).

Parameters:
updateFrequency - The frequency by which files are checked for changes and reindexed.

startTimerThread

public void startTimerThread(long updateFrequency)
Starts or restarts the timer thread with the given update frequency. Same as changeUpdateFrequency(long updateFrequency).

Parameters:
updateFrequency - The number of seconds between index updates.
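
A start-or-restart timer of the kind described above can be sketched with java.util.Timer. This is a minimal sketch, not the class's actual code: RestartableTimerSketch and its method names are hypothetical, and treating zero (as well as a negative value) as "no updates" is an assumption.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class RestartableTimerSketch {
    private final Runnable task;
    private Timer timer;                 // null while no timer thread is running
    private long updateFrequency = -1;   // seconds between runs

    RestartableTimerSketch(Runnable task) {
        this.task = task;
    }

    /** Starts the timer, or restarts it if it is already running. */
    synchronized void start(long seconds) {
        stop();                          // cancel any previous schedule first
        updateFrequency = seconds;
        if (seconds <= 0) {
            return;                      // assumption: non-positive means "no updates"
        }
        timer = new Timer("index-sync", true);   // daemon timer thread
        long periodMillis = seconds * 1000L;
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                task.run();
            }
        }, periodMillis, periodMillis);
    }

    /** Stops the timer thread if one is running. */
    synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    synchronized boolean isRunning() {
        return timer != null;
    }

    synchronized long getUpdateFrequency() {
        return updateFrequency;
    }

    public static void main(String[] args) throws InterruptedException {
        RestartableTimerSketch sketch = new RestartableTimerSketch(new Runnable() {
            @Override
            public void run() {
                System.out.println("sync pass");
            }
        });
        sketch.start(1);
        Thread.sleep(2500);
        sketch.stop();
    }
}
```

Cancelling before rescheduling is what makes "start" and "change frequency" interchangeable in this sketch, which mirrors the statement that startTimerThread and changeUpdateFrequency are the same operation.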

stopTimerThread

public void stopTimerThread()
Stops the indexing timer thread.


setValidationEnabled

public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. If set to true, the files will be validated, otherwise they will not. Default is true.

Parameters:
validateFiles - True to validate, else false.
See Also:
FileIndexingServiceWriter.getValidationReport()

addDirectory

public boolean addDirectory(String sourceFileDirectory,
                            FileIndexingServiceWriter documentWriter)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.

Parameters:
sourceFileDirectory - The file directory that will be monitored for updates.
documentWriter - The FileIndexingServiceWriter that is used to create new Lucene Document entries for the files in the given directory.
Returns:
True if the directory was added successfully.

addDirectory

public boolean addDirectory(File srcDir,
                            FileIndexingServiceWriter documentWriter)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.

Parameters:
srcDir - The file directory that will be monitored for updates.
documentWriter - The FileIndexingServiceWriter that is used to create new Lucene Document entries for the files in the given directory.
Returns:
True if the directory was added successfully.

isDirectoryConfigured

public boolean isDirectoryConfigured(File srcDir)
Determines whether the given directory is configured for indexing.

Parameters:
srcDir - A directory of indexable files.
Returns:
True if this directory is already configured for indexing, false otherwise.

removeDirectory

public boolean removeDirectory(String sourceFileDirectory)
Removes the files in the given directory from the index. Assumes the directory was previously added to the index using the addDirectory(File,FileIndexingServiceWriter) method.

Parameters:
sourceFileDirectory - The directory of files needing to be removed from the index.
Returns:
True if the directory of files existed in the index and was removed.

removeDirectory

public boolean removeDirectory(File srcDir)
Removes the files in the given directory from the index. Assumes the directory was previously added to the index using the addDirectory(File,FileIndexingServiceWriter) method.

Parameters:
srcDir - The directory of files needing to be removed from the index.
Returns:
True if the directory of files existed in the index and was removed.
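
The "replaces the current one if one exists with the same absolute path" rule for addDirectory, together with isDirectoryConfigured and removeDirectory, suggests a registry keyed by absolute path. The sketch below shows one way that could work. DirectoryRegistrySketch is hypothetical, and a plain String stands in for the FileIndexingServiceWriter so that the example compiles on its own.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class DirectoryRegistrySketch {
    // Absolute directory path -> name of the writer responsible for it.
    private final Map<String, String> writersByPath = new LinkedHashMap<String, String>();

    /** Registers the directory, replacing any entry with the same absolute path. */
    synchronized boolean addDirectory(File dir, String writerName) {
        if (dir == null || writerName == null) {
            return false;
        }
        writersByPath.put(dir.getAbsolutePath(), writerName);
        return true;
    }

    synchronized boolean isDirectoryConfigured(File dir) {
        return dir != null && writersByPath.containsKey(dir.getAbsolutePath());
    }

    /** Returns true only if the directory was registered and is now removed. */
    synchronized boolean removeDirectory(File dir) {
        return dir != null && writersByPath.remove(dir.getAbsolutePath()) != null;
    }

    synchronized String writerFor(File dir) {
        return dir == null ? null : writersByPath.get(dir.getAbsolutePath());
    }

    synchronized int size() {
        return writersByPath.size();
    }

    public static void main(String[] args) {
        DirectoryRegistrySketch registry = new DirectoryRegistrySketch();
        File relative = new File("records");
        registry.addDirectory(relative, "writerA");
        // Same directory given as an absolute path: replaces, does not duplicate.
        registry.addDirectory(new File(relative.getAbsolutePath()), "writerB");
        System.out.println(registry.size() + " directory, writer="
                + registry.writerFor(relative));
    }
}
```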

getUpdateFrequency

public long getUpdateFrequency()
Gets the updateFrequency attribute of the FileIndexingService object

Returns:
The updateFrequency value

getLastSyncTime

public long getLastSyncTime()
Gets the lastSyncTime attribute of the FileIndexingService object

Returns:
The lastSyncTime value

getNumRecordsToDelete

public int getNumRecordsToDelete()
Gets the numRecordsToDelete attribute of the FileIndexingService object

Returns:
The numRecordsToDelete value

getNumRecordsToAdd

public int getNumRecordsToAdd()
Gets the numRecordsToAdd attribute of the FileIndexingService object

Returns:
The numRecordsToAdd value

getNumRecordsToReplace

public int getNumRecordsToReplace()
Gets the numRecordsToReplace attribute of the FileIndexingService object

Returns:
The numRecordsToReplace value

synchIndexWithFiles

public void synchIndexWithFiles(boolean background,
                                boolean reindexAll)
Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background. Any new, deleted or modified files that appear in the directories will be reflected in the index. This process may take several minutes to return depending on the number of files needing to be updated.

Parameters:
background - True to run this process as a background thread, else wait until the update is done before returning.
reindexAll - True to reindex all files, false to reindex only those files that have been modified since the last update.
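
The background flag above chooses between returning immediately and blocking until the update finishes. A minimal sketch of that choice follows. BackgroundSyncSketch and its run method are hypothetical, and returning the worker Thread is an addition made so the caller in the example can wait on it (the documented method returns void).

```java
/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class BackgroundSyncSketch {

    /**
     * Runs the sync work either on a new background thread (returning that
     * thread right away) or on the calling thread (returning null when done).
     */
    static Thread run(Runnable syncWork, boolean background) {
        if (background) {
            Thread worker = new Thread(syncWork, "index-sync");
            worker.setDaemon(true);   // do not keep the JVM alive for a sync pass
            worker.start();
            return worker;
        }
        syncWork.run();               // caller blocks until the work completes
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = new Runnable() {
            @Override
            public void run() {
                System.out.println("syncing on " + Thread.currentThread().getName());
            }
        };
        run(work, false);             // blocking call
        Thread worker = run(work, true);
        worker.join();                // wait for the background pass to finish
    }
}
```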

removeDocs

public final void removeDocs(String field,
                             String term,
                             FileIndexingServiceWriter docWriter)
Removes all documents that match the given term within the given field. Removed documents are either kept in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true") or removed from the index altogether, as determined by the saveDeletes parameter passed to the constructor. This is useful for removing a single document that is indexed with a unique ID field, or for removing a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note that this is the same as SimpleLuceneIndex.removeDocs(String,String), but it is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
term - The term that is matched for removal.
docWriter - Writer used to perform the delete.

removeDocs

public final void removeDocs(String field,
                             String[] terms,
                             FileIndexingServiceWriter docWriter)
Removes all documents that match the given terms within the given field. Removed documents are either kept in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true") or removed from the index altogether, as determined by the saveDeletes parameter passed to the constructor. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note that this is the same as SimpleLuceneIndex.removeDocs(String,String[]), but it is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
docWriter - Writer used to perform the delete.

removeDocs

public final void removeDocs(String field,
                             String term,
                             FileIndexingServiceWriter docWriter,
                             boolean saveDeletedRecords)
Removes all documents that match the given term within the given field. Removed documents are either kept in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true") or removed from the index altogether, as determined by the saveDeletedRecords parameter passed to this method. This is useful for removing a single document that is indexed with a unique ID field, or for removing a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note that this is the same as SimpleLuceneIndex.removeDocs(String,String), but it is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
term - The term that is matched for removal.
saveDeletedRecords - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
docWriter - Writer used to perform the delete.

removeDocs

public final void removeDocs(String field,
                             String[] terms,
                             FileIndexingServiceWriter docWriter,
                             boolean saveDeletes)
Removes all documents that match the given terms within the given field. Removed documents are either kept in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true") or removed from the index altogether, as determined by the saveDeletes parameter passed to this method. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note that this is the same as SimpleLuceneIndex.removeDocs(String,String[]), but it is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.

Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
saveDeletes - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
docWriter - Writer used to perform the delete.
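
The two outcomes described for removeDocs, keeping a matched document with "deleted" set to "true" or dropping it from the index, can be shown on a toy in-memory index. This is a sketch under stated assumptions: SaveDeletesSketch is hypothetical, a map of field names to values stands in for a Lucene Document, the field name "id" is invented for the example, and no docWriter is modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class SaveDeletesSketch {
    // Each map of field name to value stands in for one Lucene Document.
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    synchronized void add(String id, String docsource) {
        Map<String, String> doc = new HashMap<String, String>();
        doc.put("id", id);                 // "id" is an assumed field name
        doc.put("docsource", docsource);
        docs.add(doc);
    }

    /** Marks matching documents as deleted, or removes them entirely. */
    synchronized void removeDocs(String field, String term, boolean saveDeletes) {
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            Map<String, String> doc = it.next();
            if (!term.equals(doc.get(field))) {
                continue;                  // not a match, leave untouched
            }
            if (saveDeletes) {
                doc.put("deleted", "true"); // keep the document, flag it
            } else {
                it.remove();               // drop the document from the index
            }
        }
    }

    synchronized int size() {
        return docs.size();
    }

    synchronized int countMarkedDeleted() {
        int count = 0;
        for (Map<String, String> doc : docs) {
            if ("true".equals(doc.get("deleted"))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SaveDeletesSketch index = new SaveDeletesSketch();
        index.add("REC-1", "/records/a.xml");
        index.add("REC-2", "/records/b.xml");
        index.removeDocs("id", "REC-1", true);
        index.removeDocs("docsource", "/records/b.xml", false);
        System.out.println("size=" + index.size()
                + " markedDeleted=" + index.countMarkedDeleted());
    }
}
```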

reindexDocs

public int reindexDocs(String field,
                       String term,
                       boolean reindexAll)
Re-indexes all documents that match the given term within the given field. Requires that the file for the given document is still in its original location. If the file is not in its original location, the document is removed from the index without being updated and is not marked as deleted. This is useful for updating a single document that is indexed with a unique ID field, or for updating a group of documents matching the same term for a given field. For example, you might pass in the ID of a record that needs updating along with the ID field that it is indexed under, or the file path corresponding to a record that needs updating along with the field "docsource".

Parameters:
field - The field that is searched.
term - The term that is matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.
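
The rules above (a missing file causes removal, reindexAll forces an update, otherwise only modified files are updated) reduce to a small decision function. The sketch below is illustrative only. ReindexDecisionSketch, its Action enum and the timestamp comparison are assumptions about how such a check could be written, not the class's actual logic.

```java
/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class ReindexDecisionSketch {

    enum Action { REINDEX, SKIP, REMOVE }

    /**
     * Decides what to do with one matching document.
     *
     * @param fileExists   whether the source file is still at its original location
     * @param fileModified last-modified time of the source file
     * @param lastIndexed  time the document was last indexed
     * @param reindexAll   true to reindex regardless of modification time
     */
    static Action decide(boolean fileExists, long fileModified,
                         long lastIndexed, boolean reindexAll) {
        if (!fileExists) {
            return Action.REMOVE;     // removed without being marked as deleted
        }
        if (reindexAll || fileModified > lastIndexed) {
            return Action.REINDEX;
        }
        return Action.SKIP;           // unchanged since the last update
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 200L, 100L, false));
        System.out.println(decide(true, 50L, 100L, false));
        System.out.println(decide(true, 50L, 100L, true));
        System.out.println(decide(false, 0L, 100L, true));
    }
}
```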

reindexDocs

public int reindexDocs(String field,
                       String[] terms,
                       boolean reindexAll)
Re-indexes all documents that match the given terms within the given field. This is useful for updating multiple documents that are indexed with a unique ID field. For example you might pass in an array of IDs needing to be updated along with the ID field that it is indexed under, or an array of file paths corresponding to records that need updating along with the field "docsource."

Parameters:
field - The field that is searched.
terms - The terms that are matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.

reindexDocs

public int reindexDocs(String query,
                       boolean reindexAll)
Reindexes Documents managed by this FileIndexingService that match the given Lucene query.

Parameters:
query - A Lucene search query.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
Returns:
The number of matching documents to be updated.

reindexDocs

public void reindexDocs(Document[] docs,
                        boolean reindexAll)
Reindexes the given Documents.

Parameters:
docs - Lucene Documents from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.

reindexDocs

public void reindexDocs(ResultDoc[] docs,
                        boolean reindexAll)
Reindexes the Documents in the given ResultDocs.

Parameters:
docs - Lucene ResultDocs from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.

getIndexingMessages

public ArrayList getIndexingMessages()
Gets the last 10 indexing status messages.

Returns:
The indexingMessages.
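
Keeping only the last 10 status messages is a bounded-buffer pattern. The sketch below shows one way to do it. RecentMessagesSketch is hypothetical, and returning the messages oldest-first is an assumption, since the ordering of the real list is not documented here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

/** Hypothetical illustration only; not part of org.dlese.dpc.index. */
final class RecentMessagesSketch {
    private static final int MAX_MESSAGES = 10;
    private final Deque<String> messages = new ArrayDeque<String>();

    /** Appends a message, discarding the oldest once the limit is reached. */
    synchronized void add(String message) {
        if (messages.size() == MAX_MESSAGES) {
            messages.removeFirst();
        }
        messages.addLast(message);
    }

    /** Returns a copy of the retained messages, oldest first (an assumption). */
    synchronized ArrayList<String> snapshot() {
        return new ArrayList<String>(messages);
    }

    public static void main(String[] args) {
        RecentMessagesSketch log = new RecentMessagesSketch();
        for (int i = 0; i < 12; i++) {
            log.add("message " + i);
        }
        System.out.println(log.snapshot());
    }
}
```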

startTester

public void startTester(String docRoot,
                        String sourceFileDirectory)
Starts a FileMoveTester iff one is not already initialized. The FileMoveTester simulates moving files in and out of the sourceFile directory, for testing purposes only. Warning: FileMoveTester moves metadata files. Only use with test records!

Parameters:
docRoot - The context document root as obtained by calling getServletContext().getRealPath("/");
sourceFileDirectory - The source file directory that the FileMoveTester moves files in and out of.

stopTester

public void stopTester()
Stops the FileMoveTester


getSimpleDateStamp

public static String getSimpleDateStamp()
Returns a string for the current time and date, suitable for display in log files and output to standard out.

Returns:
The dateStamp value

getDateStamp

public static String getDateStamp()
Returns a string for the current time and date, suitable for display in log files and output to standard out.

Returns:
The dateStamp value

setDebug

public static void setDebug(boolean db)
Sets the debug attribute object

Parameters:
db - The new debug value
