DLESE Tools
v1.2

org.dlese.dpc.index.writer
Class SimpleXMLFileIndexingWriter

java.lang.Object
  extended by org.dlese.dpc.index.writer.FileIndexingServiceWriter
      extended by org.dlese.dpc.index.writer.XMLFileIndexingWriter
          extended by org.dlese.dpc.index.writer.SimpleXMLFileIndexingWriter
All Implemented Interfaces:
DocWriter

public class SimpleXMLFileIndexingWriter
extends XMLFileIndexingWriter

Creates a Lucene Document from any XML file by stripping the XML tags to extract and index the content. The reader for this type of Document is XMLDocReader.

Author:
John Weatherley
See Also:
FileIndexingService, XMLDocReader

Field Summary
 
Fields inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
recordDataService, vocab
 
Constructor Summary
SimpleXMLFileIndexingWriter(String collection, String doctype, RecordDataService recordDataService)
          Constructor for the SimpleXMLFileIndexingWriter.
 
Method Summary
protected  void addFields(Document newDoc, Document existingDoc, File sourceFile)
          Adds the full content of the XML to the default search field.
protected  void destroy()
          Does nothing.
 String getCollection()
          Returns unique collection keys for the item being indexed, separated by spaces.
 String getDocType()
          Gets the docType associated with this file.
protected  String getId()
          Returns an ID for this record derived from the file name.
 String getReaderClass()
          Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".
protected  String getValidationReport()
          Gets a report detailing any errors found in the validation of the data, or null if no error was found.
 void init(File sourceFile, Document existingDoc)
          Captures the source file.
 
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
addCustomFields, getFieldContent, getFieldContent, getFieldName, getOaiModtime
 
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addToAdminDefaultField, addToDefaultField, create, getDeletedDoc, getExistingDoc, getFileIndexingService, getSourceDir, getSourceFile, isValidationEnabled, prtln, prtlnErr, setDebug, setDefaultFieldName, setFileIndexingService, setValidationEnabled
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleXMLFileIndexingWriter

public SimpleXMLFileIndexingWriter(String collection,
                                   String doctype,
                                   RecordDataService recordDataService)
Constructor for the SimpleXMLFileIndexingWriter.

Parameters:
collection - The collection associated with this file.
doctype - A document type for this type of file, such as 'oai_dc' or 'adn'.
recordDataService - Used to get data about the file.

Method Detail

getDocType

public String getDocType()
                  throws Exception
Gets the docType associated with this file. Should be overridden by sub-classes that define their own doctype.

Specified by:
getDocType in interface DocWriter
Specified by:
getDocType in class FileIndexingServiceWriter
Returns:
The docType value
Throws:
Exception - If an error occurs.

getCollection

public String getCollection()
Description copied from class: XMLFileIndexingWriter
Returns unique collection keys for the item being indexed, separated by spaces. For example, 'dcc', 'comet' or 'dwel'.

Specified by:
getCollection in class XMLFileIndexingWriter
Returns:
The collection keys

getReaderClass

public String getReaderClass()
Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".

Specified by:
getReaderClass in interface DocWriter
Specified by:
getReaderClass in class FileIndexingServiceWriter
Returns:
The String "org.dlese.dpc.index.reader.XMLDocReader".

init

public void init(File sourceFile,
                 Document existingDoc)
          throws Exception
Captures the source file.

Specified by:
init in class FileIndexingServiceWriter
Parameters:
sourceFile - The sourceFile being indexed.
existingDoc - The Document that currently exists in the index for this file.
Throws:
Exception - If an error occurs.

destroy

protected void destroy()
Does nothing.

Specified by:
destroy in class FileIndexingServiceWriter

getValidationReport

protected String getValidationReport()
                              throws Exception
Gets a report detailing any errors found in the validation of the data, or null if no error was found. This method performs schema validation over the XML.

Overrides:
getValidationReport in class FileIndexingServiceWriter
Returns:
Null if no data validation errors were found, otherwise a String that details the nature of the error.
Throws:
Exception - If an error occurs while performing the validation.
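
The null-or-report contract described above can be illustrated with the JDK's own javax.xml.validation API. This is only a sketch: the class name ValidationReportSketch, the method validationReport, and the idea of passing the schema in as a string are assumptions made for the example, and this page does not say how SimpleXMLFileIndexingWriter locates its schema or formats its report.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of DLESE Tools.
class ValidationReportSketch {

    /** Returns null if xml validates against xsd, otherwise a message describing the first error. */
    static String validationReport(String xml, String xsd) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a failure to perform validation, not a report about the record.
            throw new IllegalStateException("Schema could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null; // no validation errors found
        } catch (SAXException e) {
            return e.getMessage(); // the report: nature of the first validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"record\"><xs:complexType><xs:sequence>"
                + "<xs:element name=\"title\" type=\"xs:string\"/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        System.out.println(validationReport("<record><title>ok</title></record>", xsd));
        System.out.println(validationReport("<record><name>bad</name></record>", xsd));
    }
}
```

A caller would treat a null return as "valid" and any other value as text to log or store alongside the record.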

getId

protected String getId()
                throws Exception
Returns an ID for this record derived from the file name. Simply removes the ".xml" from the end of the filename to get the ID.

Specified by:
getId in class XMLFileIndexingWriter
Returns:
The id String
Throws:
Exception - If an error occurs.
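
The file-name rule described above amounts to a one-line string operation. The sketch below is a hypothetical stand-alone helper (IdFromFileName and idFor are names invented for this example, not DLESE Tools API). It assumes a case-sensitive match on ".xml" and leaves other file names unchanged; the actual method's handling of those cases is not documented here.

```java
import java.io.File;

// Hypothetical helper, not part of DLESE Tools: mirrors the documented rule only.
class IdFromFileName {

    /** Returns the file name with a trailing ".xml" removed, if present. */
    static String idFor(File sourceFile) {
        String name = sourceFile.getName(); // drops any directory part
        return name.endsWith(".xml")
                ? name.substring(0, name.length() - ".xml".length())
                : name; // assumption: names without ".xml" are returned as-is
    }

    public static void main(String[] args) {
        // prints "DLESE-000-000-000-001"
        System.out.println(idFor(new File("records/DLESE-000-000-000-001.xml")));
    }
}
```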

addFields

protected void addFields(Document newDoc,
                         Document existingDoc,
                         File sourceFile)
                  throws Exception
Adds the full content of the XML to the default search field. Strips the XML tags to extract the content. Will not work properly if the XML is not well-formed.

Specified by:
addFields in class XMLFileIndexingWriter
Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The source file being indexed
Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
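
To make the tag-stripping idea concrete, here is a minimal sketch that parses XML with the JDK's DOM parser and joins the text nodes with single spaces. It is not the DLESE implementation: XmlTextExtractor and extractText are hypothetical names, and skipping attribute values is an assumption of this example. Note that org.w3c.dom.Document here is the DOM type, not the Lucene Document used in the signature above. As with addFields, input that is not well-formed fails rather than yielding partial text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of DLESE Tools.
class XmlTextExtractor {

    /** Returns the element text of xml with tags removed and whitespace collapsed. */
    static String extractText(String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collect(dom.getDocumentElement(), out);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("XML is not well-formed or could not be parsed", e);
        }
    }

    /** Depth-first walk that appends each non-blank text or CDATA node, separated by one space. */
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        // Attributes are not child nodes in DOM, so their values are skipped here.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // prints "Plate tectonics geology"
        System.out.println(extractText(
                "<record><title>Plate tectonics</title><subject>geology</subject></record>"));
    }
}
```

Joining text nodes with a space keeps words from adjacent elements from running together, which matters when the result feeds a full-text search field.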
