DLESE Tools
v1.2

org.dlese.dpc.index.reader
Class XMLDocReader

java.lang.Object
  extended by org.dlese.dpc.index.reader.DocReader
      extended by org.dlese.dpc.index.reader.FileIndexingServiceDocReader
          extended by org.dlese.dpc.index.reader.XMLDocReader
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
DleseAnnoDocReader, ItemDocReader

public class XMLDocReader
extends FileIndexingServiceDocReader

Provides getter methods to read data from an XML-based Lucene Document that was created by an XMLFileIndexingWriter. The getter methods can then be accessed from (Struts) beans that need the data. This class may be extended for each document type that might be returned in a search, for example DLESE-IMS, ADN-i, ADN-c or DC. Instances of this class and its subclasses are created by ResultDoc. After implementing a new XMLDocReader, add a new switch to class ResultDoc to access it. In general, one XMLDocReader may be created for each document type that is defined in package org.dlese.dpc.index.writer.

Author:
John Weatherley
See Also:
XMLFileIndexingWriter, Serialized Form

Field Summary
protected  RecordDataService recordDataService
          DESCRIPTION
protected  MetadataVocab vocab
          DESCRIPTION
 
Fields inherited from class org.dlese.dpc.index.reader.DocReader
conf, doc, score
 
Constructor Summary
XMLDocReader(Document doc)
          Constructor that may be used programmatically to wrap a reader around a Lucene Document created by a FileIndexingServiceWriter.
XMLDocReader(Document doc, float score, ResultDocConfig conf)
          Constructor that is used by ResultDoc at search time to create a new instance.
 
Method Summary
 ArrayList getAvailableFormats()
          Gets the XML formats that are available for this item.
 String getCollectionKey()
          Gets the collection key associated with this record, for example 01.
 String[] getCollectionKeys()
          Gets the collection keys associated with this record, for example {01,02}.
 String[] getCollections()
          Gets the sets associated with this record as an array of Strings, for example dcc.
 String getDocsource()
          Gets the absolute path of the file that was used to index the Document.
protected  String getFieldName(String fieldString)
          Gets the fieldName attribute of the XMLDocReader object
 String getId()
          Gets the id attribute of the object
 String getIndexedContent()
          Gets the full text of the content that was indexed.
 String getMetadataPrefix()
          Gets the metadata prefix (format) of the file associated with this reader, for example 'dlese_ims' or 'adn'.
 String getNativeFormat()
          Gets the nativeFormat of the file associated with this reader, for example 'dlese_ims' or 'adn'.
 String getOaiDatestamp()
          Gets the oaiDatestamp in UTC format for the given record.
 String getOaiLastModifiedString()
          Gets a String representation of the OAI datestamp in readable format.
 String getReaderType()
          Gets the String 'XmlDocReader', which is the key that describes this reader type.
 String[] getSets()
          Gets the sets associated with this record as an array of Strings, for example dcc.
 String getSetString()
          Gets the collections associated with this record as a single String.
 String getValidationReport()
          Gets the validationReport for this document, or null if no validationReport was found.
 String getXml()
          Gets XML in the format native to the underlying docType.
 String getXmlFormat(String format, boolean filter)
          Gets XML in the given format.
 String getXmlStripped()
          Gets XML with no XML or DTD declaration in the format native to the underlying docType.
 boolean isValid()
          Determines whether the XML for this record is valid.
static void setXMLConversionService(XMLConversionService cs)
          Sets the XMLConversionService used by this DocReader.
 
Methods inherited from class org.dlese.dpc.index.reader.FileIndexingServiceDocReader
fileExists, getDateStamp, getDeleted, getDocDir, getDocsourceEncoded, getDoctype, getFileExists, getFileName, getFullContent, getLastModified, getLastModifiedString, getSourceFile, isDeleted, prtln, prtlnErr, setDebug
 
Methods inherited from class org.dlese.dpc.index.reader.DocReader
getDocument, getIndex, getQuery, getScore, setDoc
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

vocab

protected MetadataVocab vocab
DESCRIPTION


recordDataService

protected RecordDataService recordDataService
DESCRIPTION

Constructor Detail

XMLDocReader

public XMLDocReader(Document doc,
                    float score,
                    ResultDocConfig conf)
Constructor that is used by ResultDoc at search time to create a new instance.

Parameters:
doc - The Lucene Document that is read.
score - The rank of the Document in a set of results.
conf - Config object.

XMLDocReader

public XMLDocReader(Document doc)
Constructor that may be used programmatically to wrap a reader around a Lucene Document created by a FileIndexingServiceWriter.

Parameters:
doc - A Lucene Document created by an ItemFileIndexingWriter.
Method Detail

getReaderType

public String getReaderType()
Gets the String 'XmlDocReader', which is the key that describes this reader type. This may be used in (Struts) beans to determine which type of reader is available for a given search result and thus what data is available for display in the UI. The reader type determines which getter methods are available.

Specified by:
getReaderType in class DocReader
Returns:
The String 'XmlDocReader'.
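
A minimal sketch of how a bean might branch on the reader type key described above. The Reader interface is a hypothetical stand-in for the single DocReader method used here, and the view names ("xmlRecord", "genericRecord", "noRecord") are illustrative assumptions, not part of the DLESE API. Only the key 'XmlDocReader' comes from this documentation.

```java
/** Hypothetical sketch; not part of org.dlese.dpc. */
final class ReaderTypeSketch {

    /** Stand-in for the one DocReader method used in this example. */
    interface Reader {
        String getReaderType();
    }

    /** Picks a (hypothetical) view name based on the reader type key. */
    static String viewFor(Reader reader) {
        if (reader == null) {
            return "noRecord";
        }
        // 'XmlDocReader' is the key documented for XMLDocReader.getReaderType().
        if ("XmlDocReader".equals(reader.getReaderType())) {
            return "xmlRecord";
        }
        // Any other key falls back to a generic view.
        return "genericRecord";
    }

    public static void main(String[] args) {
        Reader xmlReader = () -> "XmlDocReader";
        Reader otherReader = () -> "SomeOtherReader";
        System.out.println(viewFor(xmlReader));
        System.out.println(viewFor(otherReader));
    }
}
```

Comparing with the literal on the left ("XmlDocReader".equals(...)) keeps the check safe if a reader returns null for its type.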

getIndexedContent

public String getIndexedContent()
Gets the full text of the content that was indexed.

Returns:
The indexedContent value.

getId

public String getId()
Gets the id attribute of the object

Returns:
The id value

getMetadataPrefix

public String getMetadataPrefix()
Gets the metadata prefix (format) of the file associated with this reader, for example 'dlese_ims' or 'adn'.

Returns:
The metadataPrefix value

getNativeFormat

public String getNativeFormat()
Gets the nativeFormat of the file associated with this reader, for example 'dlese_ims' or 'adn'. Same as getMetadataPrefix().

Returns:
The nativeFormat.

getSetString

public String getSetString()
Gets the collections associated with this record as a single String.

Returns:
The collections.

getSets

public String[] getSets()
Gets the sets associated with this record as an array of Strings, for example dcc. Assumes the set key has not been encoded using the vocab manager.

Returns:
The set(s) associated with this record.

getCollections

public String[] getCollections()
Gets the sets associated with this record as an array of Strings, for example dcc. Assumes the set key has not been encoded using the vocab manager.

Returns:
The set(s) associated with this record.

getCollectionKey

public String getCollectionKey()
Gets the collection key associated with this record, for example 01. Assumes the set key has been encoded using the vocab manager and that there is only one collection associated with this item.

Returns:
The collection to which this item belongs.

getCollectionKeys

public String[] getCollectionKeys()
Gets the collection keys associated with this record, for example {01,02}. Assumes the set key has been encoded using the vocab manager and that more than one collection is associated with this item.

Returns:
The collections to which this item belongs.

getXml

public final String getXml()
Gets XML in the format native to the underlying docType.

Returns:
The XML, or empty string if unable to process.

getXmlStripped

public final String getXmlStripped()
Gets XML with no XML or DTD declaration in the format native to the underlying docType.

Returns:
The XML, or empty string if unable to process.

getXmlFormat

public String getXmlFormat(String format,
                           boolean filter)
Gets XML in the given format. The resulting String contains XML in the given format, or an empty String if unable to process. If filter is set to true, the output has the XML declaration stripped out and, in the case of DLESE IMS, the DTD declaration commented out. Use filter=true to get XML suitable for insertion into an OAI container. Use filter=false to get the full XML including the XML and DTD declarations, if present.

Parameters:
format - The format desired.
filter - Indicates whether to filter out the XML and DTD declaration.
Returns:
XML for the given format, or an empty String if unable to process.
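
To illustrate the kind of output filter=true is described as producing, the sketch below strips a leading XML declaration and wraps a DOCTYPE declaration in a comment. This is not the library's implementation: the class XmlFilterSketch and the method filterDeclarations are hypothetical, and the regular expressions assume a simple DOCTYPE with no internal subset.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of org.dlese.dpc. */
final class XmlFilterSketch {

    // Matches a leading XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
    private static final Pattern XML_DECL =
            Pattern.compile("^\\s*<\\?xml[^>]*\\?>\\s*");

    // Matches a simple DOCTYPE declaration without an internal subset ([...]).
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>\\[]*>");

    /** Returns the XML without its declaration and with any DOCTYPE commented out. */
    static String filterDeclarations(String xml) {
        if (xml == null) {
            return "";
        }
        String withoutDecl = XML_DECL.matcher(xml).replaceFirst("");
        // $0 re-inserts the matched DOCTYPE inside an XML comment.
        return DOCTYPE.matcher(withoutDecl).replaceFirst("<!-- $0 -->");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE record SYSTEM \"record.dtd\">\n"
                + "<record/>";
        System.out.println(filterDeclarations(xml));
    }
}
```

Removing the declaration matters for OAI containers because an XML declaration is only legal at the very start of a document, so it cannot appear inside an enclosing response element.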

getAvailableFormats

public ArrayList getAvailableFormats()
Gets the XML formats that are available for this item.

Returns:
The availableFormats.

getOaiDatestamp

public String getOaiDatestamp()
Gets the oaiDatestamp in UTC format for the given record.

Returns:
The oaiDatestamp value.
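
For reference, OAI-PMH datestamps in UTC take the form YYYY-MM-DDThh:mm:ssZ. The sketch below produces and parses such a value with java.time. It assumes seconds granularity and a modern JDK; the class and method names are hypothetical, and the exact string returned by getOaiDatestamp() is not confirmed by this documentation.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch; not part of org.dlese.dpc. */
final class OaiDatestampSketch {

    /** Formats epoch milliseconds as a seconds-granularity UTC datestamp. */
    static String toUtcDatestamp(long epochMillis) {
        // Instant.toString() emits ISO-8601 in UTC; truncating drops the fraction.
        return Instant.ofEpochMilli(epochMillis)
                .truncatedTo(ChronoUnit.SECONDS)
                .toString();
    }

    /** Parses a UTC datestamp back into epoch milliseconds. */
    static long parseUtcDatestamp(String datestamp) {
        return Instant.parse(datestamp).toEpochMilli();
    }

    public static void main(String[] args) {
        String stamp = toUtcDatestamp(1_000_000_000_000L);
        System.out.println(stamp);
        System.out.println(parseUtcDatestamp(stamp));
    }
}
```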

getOaiLastModifiedString

public String getOaiLastModifiedString()
Gets a String representation of the OAI datestamp in readable format.

Returns:
The File modification time.

setXMLConversionService

public static void setXMLConversionService(XMLConversionService cs)
Sets the XMLConversionService used by this DocReader. This method should be called prior to accessing the index and using the DocReader to get XML. Only necessary if XML conversion is required such as in OAI applications.

Parameters:
cs - The new XMLConversionService.

getValidationReport

public String getValidationReport()
Gets the validationReport for this document, or null if no validationReport was found.

Returns:
The validationReport value.
See Also:
isValid()

getDocsource

public String getDocsource()
Description copied from class: FileIndexingServiceDocReader
Gets the absolute path of the file that was used to index the Document.

Overrides:
getDocsource in class FileIndexingServiceDocReader
Returns:
The absolute path to the underlying file.

isValid

public boolean isValid()
Determines whether the XML for this record is valid. To search by validity, use the field valid:[true|false] (unstored). If the XML is not valid, a validation report is available.

Returns:
True if valid, else false.
See Also:
getValidationReport()
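
A minimal sketch of how isValid() and getValidationReport() might be used together when displaying a record. The ValidatedRecord interface and the FixedRecord class are hypothetical stand-ins that mirror only the three getters used here; the message wording is illustrative.

```java
/** Hypothetical sketch; not part of org.dlese.dpc. */
final class ValidationSketch {

    /** Stand-in for the XMLDocReader getters used in this example. */
    interface ValidatedRecord {
        String getId();
        boolean isValid();
        String getValidationReport();
    }

    /** Simple fixed-value record used to exercise the sketch. */
    static final class FixedRecord implements ValidatedRecord {
        private final String id;
        private final boolean valid;
        private final String report;

        FixedRecord(String id, boolean valid, String report) {
            this.id = id;
            this.valid = valid;
            this.report = report;
        }

        public String getId() { return id; }
        public boolean isValid() { return valid; }
        public String getValidationReport() { return report; }
    }

    /** Builds a one-line status message for a record. */
    static String describe(ValidatedRecord record) {
        if (record.isValid()) {
            return record.getId() + ": valid";
        }
        // getValidationReport() is documented to return null when no report was found.
        String report = record.getValidationReport();
        return record.getId() + ": invalid"
                + (report == null ? " (no report found)" : " - " + report);
    }

    public static void main(String[] args) {
        System.out.println(describe(new FixedRecord("REC-1", true, null)));
        System.out.println(describe(new FixedRecord("REC-2", false, "missing title")));
    }
}
```

The null check is kept even for invalid records, since the documentation only states that the report is null when none was found.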

getFieldName

protected String getFieldName(String fieldString)
Gets the fieldName attribute of the XMLDocReader object

Parameters:
fieldString - DESCRIPTION
Returns:
The fieldName value
