DLESE Tools
v1.2

org.dlese.dpc.index.writer
Class FileIndexingServiceWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
All Implemented Interfaces:
DocWriter
Direct Known Subclasses:
ErrorFileIndexingWriter, XMLFileIndexingWriter

public abstract class FileIndexingServiceWriter
extends Object
implements DocWriter

Abstract class for creating customized Lucene Documents for different file formats such as DLESE-IMS, ADN-item, ADN-collection, etc. Concrete sub-classes may be used with a FileIndexingService to enable automatic updating of the index whenever changes in the source file are made. This class, along with the FileIndexingService, may be used with a SimpleLuceneIndex to provide simple search support over files.

Note: after creating a new concrete FileIndexingServiceWriter, add a switch in RepositoryManager, method putDirInIndex(DirInfo, String) to select it for indexing.


The Lucene fields that are created by this class are:

Author:
John Weatherley

Constructor Summary
FileIndexingServiceWriter()
           
 
Method Summary
protected  void abortIndexing()
          Aborts the indexing process by returning a null index document.
protected abstract  void addCustomFields(Document newDoc, Document existingDoc, File sourceFile)
          Adds additional custom fields that are unique to the document format being indexed.
protected  void addToAdminDefaultField(String value)
          Adds the given String to a text field referenced in the index by the field name 'admindefault'.
protected  void addToDefaultField(String value)
          Adds the given String to a text field referenced in the index by the field name 'default'.
 Document create(File source, File dir, Document existingDoc)
          Creates the Lucene Document for the given resource or returns null if unable to create.
protected abstract  void destroy()
          This method is called at the conclusion of processing and may be used for tear-down.
 Document getDeletedDoc(Document existingDoc)
          Creates a Lucene Document from an existing FileIndexingService Document by setting the field "deleted" to "true".
abstract  String getDocType()
          Returns a unique document type key for this kind of record, corresponding to the format type.
 Document getExistingDoc()
           
 FileIndexingService getFileIndexingService()
          Gets the fileIndexingService attribute of the FileIndexingServiceWriter object
abstract  String getReaderClass()
          Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
 File getSourceDir()
          Gets the sourceDir that holds the file being indexed.
 File getSourceFile()
          Gets the sourceFile that is being indexed.
protected  String getValidationReport()
          Gets a report detailing any errors found in the validation of the file, or null if no error was found.
abstract  void init(File source, Document existingDoc)
          This method is called prior to processing and may be used for any necessary set-up.
 boolean isValidationEnabled()
          Returns true if the files being indexed should be validated, otherwise false.
protected  void prtln(String s)
          Output a line of text to standard out, with datestamp, if debug is set to true.
protected  void prtlnErr(String s)
          Output a line of text to error out, with datestamp.
static void setDebug(boolean db)
          Sets the debug attribute of the FileIndexingServiceWriter object
 void setDefaultFieldName(String newDefaultFieldName)
          Sets the field name used to index the default content that was added using the addToDefaultField(String) method.
 void setFileIndexingService(FileIndexingService fileIndexingService)
          Sets the fileIndexingService attribute of the FileIndexingServiceWriter object
 void setValidationEnabled(boolean validateFiles)
          Sets whether or not to validate the files being indexed and create a validation report, which is indexed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FileIndexingServiceWriter

public FileIndexingServiceWriter()
Method Detail

getDocType

public abstract String getDocType()
                           throws Exception
Returns a unique document type key for this kind of record, corresponding to the format type, for example "adn", "dlese_ims", or "dlese_anno". The string is parsed using the Lucene StandardAnalyzer, so it must be lowercase and should not contain any stop words.

Specified by:
getDocType in interface DocWriter
Returns:
The docType String
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
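
The lowercase and stop-word constraints can be checked before a key is returned. The following is a minimal sketch, not part of DLESE Tools: the class DocTypeKeys, the method isUsable and the stop-word list are assumptions (the list is meant to resemble Lucene's default English stop words and should be verified against the Lucene version in use), and splitting on non-alphanumeric characters only approximates how StandardAnalyzer tokenizes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of DLESE Tools.
final class DocTypeKeys {
    // Assumed stop-word list; verify it against the analyzer actually in use.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
        "the", "their", "then", "there", "these", "they", "this", "to", "was",
        "will", "with"));

    private DocTypeKeys() {}

    // True if the key is non-empty, all lowercase, and none of its parts is a stop word.
    static boolean isUsable(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        if (!key.equals(key.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // Rough approximation of tokenization: split on anything that is not a-z or 0-9.
        for (String part : key.split("[^a-z0-9]+")) {
            if (STOP_WORDS.contains(part)) {
                return false;
            }
        }
        return true;
    }
}
```

A concrete getDocType() implementation could run such a check in a unit test rather than at indexing time, since the key is normally a constant.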

getReaderClass

public abstract String getReaderClass()
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".

Specified by:
getReaderClass in interface DocWriter
Returns:
The name of the DocReader.
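
Because this method returns a class name rather than a class, the caller has to resolve the name at run time. How DLESE Tools does this internally is not described here; the sketch below assumes plain Java reflection and a public no-argument constructor. The class ReaderLoader and its method instantiate are hypothetical.

```java
// Hypothetical helper, not part of DLESE Tools.
final class ReaderLoader {
    private ReaderLoader() {}

    // Loads the named class and creates an instance through its no-argument constructor.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.getDeclaredConstructor().newInstance();
    }
}
```

With the DLESE classes on the classpath, a call such as ReaderLoader.instantiate(writer.getReaderClass()) would return an object that the caller then casts to the expected reader type. A misspelled name surfaces as a ClassNotFoundException, which is one reason to return the fully qualified name exactly.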

init

public abstract void init(File source,
                          Document existingDoc)
                   throws Exception
This method is called prior to processing and may be used for any necessary set-up. This method should throw an exception with an appropriate message if an error occurs.

Parameters:
source - The source file being indexed
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
Exception - If an error occurred during set-up.

destroy

protected abstract void destroy()
This method is called at the conclusion of processing and may be used for tear-down.


addCustomFields

protected abstract void addCustomFields(Document newDoc,
                                        Document existingDoc,
                                        File sourceFile)
                                 throws Exception
Adds additional custom fields that are unique to the document format being indexed. When implementing this method, use the add method of the Document class to add a Field.

The following Lucene Field types are available for indexing with the Document:
Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- gives full control over whether the value is stored, indexed, and tokenized
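
The four factory methods differ only in the three flags that the five-argument constructor exposes directly. The enum below is a hypothetical summary of the list above (FieldKind is not a Lucene type); its flags are given in the same order as the constructor arguments store, index, tokenize.

```java
// Hypothetical summary of the field types listed above; not a Lucene class.
enum FieldKind {
    TEXT(true, true, true),        // Field.Text: stored, indexed, tokenized
    UNSTORED(false, true, true),   // Field.UnStored: not stored, indexed, tokenized
    KEYWORD(true, true, false),    // Field.Keyword: stored, indexed, not tokenized
    UNINDEXED(true, false, false); // Field.UnIndexed: stored, not indexed, not tokenized

    final boolean store;
    final boolean index;
    final boolean tokenize;

    FieldKind(boolean store, boolean index, boolean tokenize) {
        this.store = store;
        this.index = index;
        this.tokenize = tokenize;
    }
}
```

As a rule of thumb, identifiers that must match exactly fit the KEYWORD combination, free text that should be searchable and displayable fits TEXT, and large text that is only searched fits UNSTORED.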

Example code:
protected void addCustomFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
  // Index a custom value as a tokenized, indexed, stored field.
  String customContent = "Some content";
  newDoc.add(Field.Text("mycustomfield", customContent));
}

Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The sourceFile that is being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.

getSourceFile

public File getSourceFile()
Gets the sourceFile that is being indexed. Only available after create() has been called.

Returns:
The sourceFile value

getSourceDir

public File getSourceDir()
Gets the sourceDir that holds the file being indexed. Only available after create() has been called.

Returns:
The sourceDir value

getExistingDoc

public Document getExistingDoc()
Returns:
The existingDoc value

setFileIndexingService

public void setFileIndexingService(FileIndexingService fileIndexingService)
Sets the fileIndexingService attribute of the FileIndexingServiceWriter object

Parameters:
fileIndexingService - The new fileIndexingService.

getFileIndexingService

public FileIndexingService getFileIndexingService()
Gets the fileIndexingService attribute of the FileIndexingServiceWriter object

Returns:
The fileIndexingService.

setDefaultFieldName

public void setDefaultFieldName(String newDefaultFieldName)
Sets the field name used to index the default content that was added using the addToDefaultField(String) method. If this method is not called, the default field will be named "default". This method should be called only once, before addToDefaultField(String) is used for the first time.

Parameters:
newDefaultFieldName - The new defaultFieldName value.

isValidationEnabled

public boolean isValidationEnabled()
Returns true if the files being indexed should be validated, otherwise false. This method may be ignored by concrete classes if not needed.

Returns:
true if validation is enabled.

setValidationEnabled

public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. This value is set by the FileIndexingService prior to indexing. If true, the method getValidationReport() will be called, otherwise it will not.

Parameters:
validateFiles - True to validate, else false.
See Also:
getValidationReport(), FileIndexingService.setValidationEnabled(boolean validateFiles)

getValidationReport

protected String getValidationReport()
                              throws Exception
Gets a report detailing any errors found in the validation of the file, or null if no error was found. This method should be overridden by concrete classes that need to validate the underlying file before indexing. Otherwise, this default method will simply return null. This method is called after all other method calls.

Returns:
Null if no file validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If error.
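
The contract is simple: null means no errors, and any other String describes the problem. What the concrete DLESE writers actually validate is not described here; the sketch below applies the same contract to a plain XML well-formedness check using the JDK parser. The class WellFormednessCheck and its method report are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of DLESE Tools.
final class WellFormednessCheck {
    private WellFormednessCheck() {}

    // Returns null if the file parses as XML, otherwise a description of the problem.
    static String report(File sourceFile) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to standard error.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(sourceFile);
            return null;
        } catch (SAXException e) {
            return "Not well-formed XML: " + e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            return "Could not read file: " + e.getMessage();
        }
    }
}
```

An overriding getValidationReport() could delegate to such a helper with the value of getSourceFile(), returning the result unchanged so that a null still signals a clean file.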

addToDefaultField

protected void addToDefaultField(String value)
Adds the given String to a text field referenced in the index by the field name 'default'. The default field may be used in queries to quickly search for text across fields. This method should be called from the addCustomFields of implementing classes.

Parameters:
value - A text string to be added to the indexed field named 'default'.
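
One plausible reading of this method together with setDefaultFieldName(String) is that values accumulate in a single buffer that is written to the index under the configured name. The internal implementation is not shown in this documentation, so the following is only a sketch under that assumption; DefaultFieldBuffer is a hypothetical stand-in that uses plain strings instead of Lucene fields, and the exception on a late rename is this sketch's own way of enforcing the documented call order.

```java
// Hypothetical stand-in, not part of DLESE Tools.
final class DefaultFieldBuffer {
    private String fieldName = "default";
    private final StringBuilder content = new StringBuilder();

    // Assumption: renaming is only allowed before any content has been added.
    void setFieldName(String name) {
        if (content.length() > 0) {
            throw new IllegalStateException("set the field name before adding content");
        }
        fieldName = name;
    }

    // Appends a value, separated from earlier values by a single space; blank values are skipped.
    void add(String value) {
        if (value == null || value.trim().isEmpty()) {
            return;
        }
        if (content.length() > 0) {
            content.append(' ');
        }
        content.append(value.trim());
    }

    String fieldName() {
        return fieldName;
    }

    String content() {
        return content.toString();
    }
}
```

In an addCustomFields implementation, the equivalent calls would be addToDefaultField(title) followed by addToDefaultField(description), so that a query against the single default field matches text from either source.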

addToAdminDefaultField

protected void addToAdminDefaultField(String value)
Adds the given String to a text field referenced in the index by the field name 'admindefault'. The 'admindefault' field may be used in queries to quickly search for text across fields. This method should be called from the addCustomFields method of implementing classes.

Parameters:
value - A text string to be added to the indexed field named 'admindefault'.

getDeletedDoc

public Document getDeletedDoc(Document existingDoc)
                       throws Throwable
Creates a Lucene Document from an existing FileIndexingService Document by setting the field "deleted" to "true". Design note: this method should be overridden by subclasses that require more involved logic for deletes.

Parameters:
existingDoc - An existing FileIndexingService Document that currently resides in the index for the given resource.
Returns:
A Lucene FileIndexingService Document with the field "deleted" set to "true".
Throws:
Throwable - Thrown if error occurs

abortIndexing

protected void abortIndexing()
Aborts the indexing process by returning a null index document.


create

public Document create(File source,
                       File dir,
                       Document existingDoc)
                throws Throwable
Creates the Lucene Document for the given resource or returns null if unable to create. This method is called by class FileIndexingService.

Parameters:
source - The source file to be indexed
dir - The directory where the file resides
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Returns:
A Lucene Document with its fields populated, or null.
Throws:
Throwable - Thrown if error occurs
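
The abstract methods on this page suggest a template-method design in which create drives the subclass hooks. The exact call order inside the real class is not documented here, so the sketch below is an assumption: it runs init, then addCustomFields, then destroy, and returns null when abortIndexing was called. WriterSketch is a hypothetical stand-in in which a Map plays the role of the Lucene Document and a String plays the role of the source file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the real class; not part of DLESE Tools.
abstract class WriterSketch {
    private boolean aborted;

    protected abstract void init(String source) throws Exception;

    protected abstract void addCustomFields(Map<String, String> newDoc, String source) throws Exception;

    protected abstract void destroy();

    // Marks the current document so that create returns null.
    protected void abortIndexing() {
        aborted = true;
    }

    // Assumed call order: init, addCustomFields, destroy.
    final Map<String, String> create(String source) throws Exception {
        aborted = false;
        Map<String, String> doc = new LinkedHashMap<>();
        try {
            init(source);
            addCustomFields(doc, source);
        } finally {
            destroy(); // tear-down runs even if a step throws
        }
        return aborted ? null : doc;
    }
}
```

Under this reading, a subclass never calls create itself; it only fills in the hooks and, where a file should be skipped, calls abortIndexing from within them.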

prtlnErr

protected final void prtlnErr(String s)
Output a line of text to error out, with datestamp.

Parameters:
s - The text that will be output to error out.

prtln

protected final void prtln(String s)
Output a line of text to standard out, with datestamp, if debug is set to true.

Parameters:
s - The String that will be output.

setDebug

public static final void setDebug(boolean db)
Sets the debug attribute of the FileIndexingServiceWriter object

Parameters:
db - The new debug value
