DLESE Tools v1.2
java.lang.Object
  org.dlese.dpc.index.writer.FileIndexingServiceWriter
Abstract class for creating customized Lucene Documents for different file formats such as DLESE-IMS, ADN-item, ADN-collection, etc. Concrete sub-classes may be used with a FileIndexingService to enable automatic updating of the index whenever changes in the source file are made. This class, along with the FileIndexingService, may be used with a SimpleLuceneIndex to provide simple search support over files.

Note: after creating a new concrete FileIndexingServiceWriter, add a switch in RepositoryManager, method putDirInIndex(DirInfo, String), to select it for indexing.
The Lucene fields that are created by this class are:

doctype - The document format type (e.g. dlese_ims, adn, oai_dc, etc.) defined by concrete classes, with '0' appended to support wildcard searching.
readerclass - The class which is used to read typed Documents created by the concrete classes, for example "ItemDocReader".
default - The default field containing content added by concrete classes. Generally this is the field assigned in the Lucene index for default searching.
docsource - The absolute path to the file, which is used by the FileIndexingService for updating/deleting and may be used by beans or other classes that wish to have access to the source file.
docdir - The absolute path to the directory where the file resides, which is used by the FileIndexingService for updating/deleting and may be used by beans or other classes.
modtime - The file modification time, which is used by the FileIndexingService to determine whether the file has changed and needs an update, and may be used by beans or other classes that wish to query the modtime for the record.
deleted - Set to 'true' if the file or record for this document has been deleted, otherwise this field does not exist. Stored.
valid - Set to 'true' if the file or record for this document is valid, otherwise 'false'. This field may also be omitted. Not stored.
validationreport - Contains a report that provides validation information about the underlying file. This field may be omitted. Not stored.
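As a rough illustration of the file-derived fields listed above, the sketch below computes values for doctype, readerclass, docsource, docdir and modtime from a java.io.File. It does not use the Lucene or DLESE classes: a plain Map stands in for a Lucene Document, the names BaseFieldsSketch, baseFields and needsUpdate are hypothetical, and the value formats (in particular storing modtime as a raw millisecond timestamp) are assumptions rather than the library's documented behavior.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of a Lucene Document.
class BaseFieldsSketch {

    // Builds the file-derived fields described in the list above.
    static Map<String, String> baseFields(File source, String docType, String readerClass) {
        Map<String, String> fields = new LinkedHashMap<>();
        // '0' is appended to the type key, as the doctype entry describes.
        fields.put("doctype", docType + "0");
        fields.put("readerclass", readerClass);
        fields.put("docsource", source.getAbsolutePath());
        // Assumption: docdir is the absolute path of the parent directory.
        File parent = source.getAbsoluteFile().getParentFile();
        fields.put("docdir", parent == null ? "" : parent.getAbsolutePath());
        // Assumption: modtime is stored as the raw millisecond timestamp.
        fields.put("modtime", Long.toString(source.lastModified()));
        return fields;
    }

    // A file needs re-indexing when its current modtime differs from the stored one.
    static boolean needsUpdate(File source, Map<String, String> existingFields) {
        String stored = existingFields.get("modtime");
        return stored == null || !stored.equals(Long.toString(source.lastModified()));
    }

    public static void main(String[] args) {
        File f = new File("records/example.xml"); // hypothetical path
        Map<String, String> fields = baseFields(f, "adn", "ItemDocReader");
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("needs update: " + needsUpdate(f, fields));
    }
}
```

Comparing the stored modtime with the file's current modification time is one simple way a service could decide whether a record needs to be re-indexed.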
Constructor Summary | |
---|---|
FileIndexingServiceWriter() | |
Method Summary | |
---|---|
protected void | abortIndexing() - Aborts the indexing process by returning a null index document. |
protected abstract void | addCustomFields(Document newDoc, Document existingDoc, File sourceFile) - Adds additional custom fields that are unique to the document format being indexed. |
protected void | addToAdminDefaultField(String value) - Adds the given String to a text field referenced in the index by the field name 'admindefault'. |
protected void | addToDefaultField(String value) - Adds the given String to a text field referenced in the index by the field name 'default'. |
Document | create(File source, File dir, Document existingDoc) - Creates the Lucene Document for the given resource or returns null if unable to create. |
protected abstract void | destroy() - This method is called at the conclusion of processing and may be used for tear-down. |
Document | getDeletedDoc(Document existingDoc) - Creates a Lucene Document from an existing FileIndexingService Document by setting the field "deleted" to "true". |
abstract String | getDocType() - Returns a unique document type key for this kind of record, corresponding to the format type. |
Document | getExistingDoc() |
FileIndexingService | getFileIndexingService() - Gets the fileIndexingService attribute of the FileIndexingServiceWriter object. |
abstract String | getReaderClass() - Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader". |
File | getSourceDir() - Gets the sourceDir that holds the file being indexed. |
File | getSourceFile() - Gets the sourceFile that is being indexed. |
protected String | getValidationReport() - Gets a report detailing any errors found in the validation of the file, or null if no error was found. |
abstract void | init(File source, Document existingDoc) - This method is called prior to processing and may be used for any necessary set-up. |
boolean | isValidationEnabled() - Returns true if the files being indexed should be validated, otherwise false. |
protected void | prtln(String s) - Output a line of text to standard out, with datestamp, if debug is set to true. |
protected void | prtlnErr(String s) - Output a line of text to error out, with datestamp. |
static void | setDebug(boolean db) - Sets the debug attribute of the FileIndexingServiceWriter object. |
void | setDefaultFieldName(String newDefaultFieldName) - Sets the field name used to index the default content that was added using the addToDefaultField(String) method. |
void | setFileIndexingService(FileIndexingService fileIndexingService) - Sets the fileIndexingService attribute of the FileIndexingServiceWriter object. |
void | setValidationEnabled(boolean validateFiles) - Sets whether or not to validate the files being indexed and create a validation report, which is indexed. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
public FileIndexingServiceWriter()
Method Detail |
public abstract String getDocType() throws Exception
Returns a unique document type key for this kind of record, corresponding to the format type. The key is analyzed with the StandardAnalyzer, so it must be lowercase and should not contain any stop words.

Specified by:
    getDocType in interface DocWriter
Throws:
    Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public abstract String getReaderClass()
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".

Specified by:
    getReaderClass in interface DocWriter
Returns:
    The name of the DocReader.

public abstract void init(File source, Document existingDoc) throws Exception
Parameters:
    source - The source file being indexed
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
    Exception - If an error occurred during set-up.

protected abstract void destroy()

protected abstract void addCustomFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception
Adds additional custom fields that are unique to the document format being indexed. Use the add method of the Document class to add a Field.

The following Lucene Field types are available for indexing with the Document:

Field.Text(String name, String value) -- tokenized, indexed, stored
Field.UnStored(String name, String value) -- tokenized, indexed, not stored
Field.Keyword(String name, String value) -- not tokenized, indexed, stored
Field.UnIndexed(String name, String value) -- not tokenized, not indexed, stored
Field(String name, String string, boolean store, boolean index, boolean tokenize) -- gives full control over storing, indexing, and tokenizing

Example code:

    protected void addCustomFields(Document newDoc, Document existingDoc, File sourceFile) throws Exception {
        // Add one tokenized, indexed, and stored field to the new Document.
        String customContent = "Some content";
        newDoc.add(Field.Text("mycustomefield", customContent));
    }

Parameters:
    newDoc - The new Document that is being created for this resource
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
    sourceFile - The sourceFile that is being indexed
Throws:
    Exception - This method should throw an Exception with an appropriate error message if an error occurs.

public File getSourceFile()
public File getSourceDir()
public Document getExistingDoc()
public void setFileIndexingService(FileIndexingService fileIndexingService)
Parameters:
    fileIndexingService - The new fileIndexingService.

public FileIndexingService getFileIndexingService()

public void setDefaultFieldName(String newDefaultFieldName)
Sets the field name used to index the default content that was added using the addToDefaultField(String) method. If this method is not called, the default field will be named "default". This method should be called only once, prior to using the method addToDefaultField(String) for the first time.
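
The interplay between setDefaultFieldName and addToDefaultField can be pictured with the following minimal sketch. It does not use the DLESE or Lucene classes: DefaultFieldSketch is a hypothetical stand-in, joining the added values with a single space is an assumption about how the content is accumulated, and throwing an IllegalStateException on a late rename is only one possible way to enforce the "call once, before first use" rule.

```java
// Hypothetical stand-in for the default-field handling; not the DLESE implementation.
class DefaultFieldSketch {
    private String defaultFieldName = "default"; // name used when the setter is never called
    private final StringBuilder defaultContent = new StringBuilder();

    // Call once, before the first addToDefaultField call.
    void setDefaultFieldName(String newDefaultFieldName) {
        if (defaultContent.length() > 0) {
            throw new IllegalStateException("default content has already been added");
        }
        defaultFieldName = newDefaultFieldName;
    }

    // Accumulates text that will later be indexed under the default field name.
    void addToDefaultField(String value) {
        if (value == null || value.isEmpty()) {
            return;
        }
        if (defaultContent.length() > 0) {
            defaultContent.append(' '); // assumption: values are space-separated
        }
        defaultContent.append(value);
    }

    String getDefaultFieldName() { return defaultFieldName; }
    String getDefaultContent() { return defaultContent.toString(); }

    public static void main(String[] args) {
        DefaultFieldSketch writer = new DefaultFieldSketch();
        writer.setDefaultFieldName("itemdefault"); // hypothetical field name
        writer.addToDefaultField("Plate tectonics");
        writer.addToDefaultField("lesson plan");
        System.out.println(writer.getDefaultFieldName() + " = " + writer.getDefaultContent());
    }
}
```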

Parameters:
    newDefaultFieldName - The new defaultFieldName value.

public boolean isValidationEnabled()

public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. [...] FileIndexingService prior to indexing. If true, the method getValidationReport() will be called, otherwise it will not.

Parameters:
    validateFiles - True to validate, else false.
See Also:
    getValidationReport(), FileIndexingService.setValidationEnabled(boolean validateFiles)
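
To make the relationship between the validation flag, getValidationReport() and the valid and validationreport fields more concrete, here is a small sketch. It is a stand-in only: ValidationSketch and validationFields are hypothetical names, a Map replaces the Lucene Document, the emptiness check is a placeholder for real validation, and omitting both fields when validation is disabled is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in showing how the validation flag could gate the report.
class ValidationSketch {
    private boolean validateFiles = false;

    void setValidationEnabled(boolean validateFiles) { this.validateFiles = validateFiles; }
    boolean isValidationEnabled() { return validateFiles; }

    // Returns null when no error is found, mirroring getValidationReport().
    // The emptiness test is a placeholder for real validation.
    String getValidationReport(String fileContent) {
        return fileContent.trim().isEmpty() ? "File is empty" : null;
    }

    // Adds 'valid' and, when there is a report, 'validationreport' to the fields.
    Map<String, String> validationFields(String fileContent) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (!validateFiles) {
            return fields; // assumption: both fields are omitted when validation is off
        }
        String report = getValidationReport(fileContent);
        fields.put("valid", report == null ? "true" : "false");
        if (report != null) {
            fields.put("validationreport", report);
        }
        return fields;
    }

    public static void main(String[] args) {
        ValidationSketch writer = new ValidationSketch();
        writer.setValidationEnabled(true);
        System.out.println(writer.validationFields("<record/>"));
        System.out.println(writer.validationFields("   "));
    }
}
```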

protected String getValidationReport() throws Exception
Throws:
    Exception - If error.

protected void addToDefaultField(String value)
Parameters:
    value - A text string to be added to the indexed field named 'default'.

protected void addToAdminDefaultField(String value)
Parameters:
    value - A text string to be added to the indexed field named 'admindefault'.

public Document getDeletedDoc(Document existingDoc) throws Throwable
Creates a Lucene Document from an existing FileIndexingService Document by setting the field "deleted" to "true". Design note: this method should be overridden by subclasses that require more involved logic for deletes.
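
The idea of marking an existing document as deleted can be sketched as follows. This is an illustration only: a Map stands in for the Lucene Document, DeletedDocSketch is a hypothetical name, and copying the existing fields before setting "deleted" (as well as returning null for a null input) are assumptions about behavior the description does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of a Lucene Document.
class DeletedDocSketch {

    // Copies the existing fields and marks the copy as deleted.
    static Map<String, String> getDeletedDoc(Map<String, String> existingDoc) {
        if (existingDoc == null) {
            return null; // assumption: nothing to mark when no document exists
        }
        Map<String, String> deletedDoc = new LinkedHashMap<>(existingDoc);
        deletedDoc.put("deleted", "true");
        return deletedDoc;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("doctype", "adn0");
        existing.put("docsource", "/records/example.xml"); // hypothetical path
        System.out.println(getDeletedDoc(existing));
    }
}
```

A subclass needing more involved delete logic would replace this method with its own version, as the design note suggests.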

Parameters:
    existingDoc - An existing FileIndexingService Document that currently resides in the index for the given resource.
Throws:
    Throwable - Thrown if error occurs

protected void abortIndexing()

public Document create(File source, File dir, Document existingDoc) throws Throwable
Creates the Lucene Document for the given resource or returns null if unable to create. This method is called by class FileIndexingService.

Parameters:
    source - The source file to be indexed
    dir - The directory where the file resides
    existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
Throws:
    Throwable - Thrown if error occurs

protected final void prtlnErr(String s)
Parameters:
    s - The text that will be output to error out.

protected final void prtln(String s)
Parameters:
    s - The String that will be output.

public static final void setDebug(boolean db)
Parameters:
    db - The new debug value
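
Taken together, the init, addCustomFields, destroy, abortIndexing and create methods described above suggest a template-method lifecycle. The sketch below shows one way such a lifecycle could be arranged. It is not the DLESE implementation: a Map replaces the Lucene Document, WriterLifecycleSketch and ExampleWriterSketch are hypothetical classes, the "example" format and the hidden-file rule are made up, and the call order inside create is an assumption.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the writer lifecycle; a Map replaces the Lucene Document.
abstract class WriterLifecycleSketch {
    private boolean aborted;

    abstract String getDocType();
    abstract void init(File source, Map<String, String> existingDoc) throws Exception;
    abstract void addCustomFields(Map<String, String> newDoc,
                                  Map<String, String> existingDoc,
                                  File sourceFile) throws Exception;
    abstract void destroy();

    // Signals that create() should return null instead of a document.
    protected void abortIndexing() { aborted = true; }

    // Assumed call order: init, base fields, custom fields, then destroy.
    Map<String, String> create(File source, File dir, Map<String, String> existingDoc)
            throws Exception {
        aborted = false;
        try {
            init(source, existingDoc);
            Map<String, String> newDoc = new LinkedHashMap<>();
            newDoc.put("doctype", getDocType() + "0");
            newDoc.put("docsource", source.getAbsolutePath());
            newDoc.put("docdir", dir.getAbsolutePath());
            addCustomFields(newDoc, existingDoc, source);
            return aborted ? null : newDoc;
        } finally {
            destroy(); // tear-down runs even if a step throws
        }
    }
}

// A minimal concrete writer for a made-up "example" format.
class ExampleWriterSketch extends WriterLifecycleSketch {
    String getDocType() { return "example"; }

    void init(File source, Map<String, String> existingDoc) { }

    void addCustomFields(Map<String, String> newDoc,
                         Map<String, String> existingDoc,
                         File sourceFile) {
        if (sourceFile.getName().startsWith(".")) {
            abortIndexing(); // skip hidden files (illustrative rule)
            return;
        }
        newDoc.put("filename", sourceFile.getName());
    }

    void destroy() { }

    public static void main(String[] args) throws Exception {
        WriterLifecycleSketch writer = new ExampleWriterSketch();
        File dir = new File("records"); // hypothetical directory
        System.out.println(writer.create(new File(dir, "item.xml"), dir, null));
        System.out.println(writer.create(new File(dir, ".hidden"), dir, null));
    }
}
```

Keeping the fixed steps in the base class and the format-specific work in the abstract hooks means a new file format only needs a small concrete subclass.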