Development Document

The goal of this project is to build an extensible, modular, and configurable XML-based repository that will house, search, and display over the World Wide Web documents encoded in TEI Lite.

What follows is an outline of the project as of 11 January 2005:

Index

Part I

Part II

Part III

Part IV

What <teiPublisher> will allow

  1. editors with limited technical knowledge to establish an XML repository for TEI Lite documents;
  2. the provision of a document management system that assists in developing ontology consistency;
  3. a set of XSL stylesheets to display documents;
  4. a rubric for the development of search/browse and results pages;
  5. a set of XML documents to be indexed and stored for efficient search and retrieval;
  6. an extensible framework with plugin architecture.
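
To make item 3 concrete, here is a minimal sketch of displaying a document through an XSL stylesheet, using only the XSLT support that ships with the JDK. The stylesheet, the class name TeiDisplaySketch, and the sample document are all hypothetical; the stylesheets actually shipped with <teiPublisher> are not reproduced in this document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TeiDisplaySketch {

    // Hypothetical stylesheet: prints the title as a heading and each
    // body paragraph as an HTML paragraph.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/'>"
      + "<html><body><h1><xsl:value-of select='//titleStmt/title'/></h1>"
      + "<xsl:for-each select='//body//p'><p><xsl:value-of select='.'/></p></xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to a TEI document held in a string and returns the HTML. */
    static String render(String teiXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(teiXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document for illustration only.
        String tei =
            "<TEI.2><teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>"
          + "<text><body><p>First paragraph.</p></body></text></TEI.2>";
        System.out.println(render(tei));
    }
}
```

In a servlet, the same transformation would typically write to the response stream instead of a string.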

What <teiPublisher> will not do

  1. Delve into ontology development, connection to knowledge bases, etc.;
  2. Scale to thousands of documents.*

* Full-text searches do not perform well with native XML databases, because XPath searches that use contains() are very slow. To overcome this problem, we have integrated Lucene into teiPublisher.
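
For illustration, the kind of query the footnote refers to might look like the sketch below. It uses the XPath API bundled with the JDK on an in-memory string, not the project's XML database; the class name and sample markup are hypothetical. Without a full-text index, a predicate such as contains() has to be tested against the text of every candidate node, which is a plausible reason for the slowness described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathContainsSketch {

    /** Returns the text of every p element whose content contains the given term. */
    static List<String> paragraphsContaining(String xml, String term) {
        // Assumption for this sketch: terms with a double quote are rejected
        // rather than escaped, to keep the query construction simple.
        if (term.indexOf('"') >= 0) {
            throw new IllegalArgumentException("Term must not contain a double quote");
        }
        String query = "//p[contains(., \"" + term + "\")]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<text><p>The whale surfaced.</p><p>Calm seas.</p></text>";
        System.out.println(paragraphsContaining(xml, "whale"));
    }
}
```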

Current Status

@ refers to internal MITH project code
{ } refers to source codebase

Java Coding

Module Name | Description | Status Complete | Implementation | Codebase | Tested | Required Changes | Help Required
I/O XML DB | Add/delete to and from XML database | 100% | Java Servlet | {org.teipublisher.actions.text} | 100%
I/O Lucene DB | Add/delete to and from Lucene indexes | 100% | Java Servlet | {org.teipublisher.actions.text} | Sample: http://aton.umd.edu:8080/teipublisher/test | No
Search XML DB | Search and retrieve document DOM using XPath | 100% | Java Servlet | {org.teipublisher.actions.text} | Currently in use @STEIN
Search Lucene DB | Search and retrieve document | 100% | Java Servlet | {org.teipublisher.actions.text} | Implemented: http://aton.umd.edu:8080/teipublisher/test | No
Backup data | Back up the XML database to a zip file | 100% | JSP Servlet | | TESTED: Pass
Wiki | Sample implementation: file I/O from the server file system | 100% | Servlet | * Requires JavaScript for preview | Access control | TESTED: Passed with exception | Remove Tidy messages.
*DOM manipulation | HTML form elements | Not required {Redundant} 0% | JavaScript | * | NA | NA
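
As an illustration of the "Backup data" module, here is a minimal sketch of packing documents into a zip archive with java.util.zip. It is not the project's JSP/servlet code: the class name is hypothetical, and a map from document name to content stands in for the XML database, whose API is not described in this document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class BackupSketch {

    /** Writes each document as one zip entry, in name order, and returns the archive bytes. */
    static byte[] toZip(Map<String, String> documents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> doc : new TreeMap<>(documents).entrySet()) {
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(doc.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    /** Lists the entry names in an archive, e.g. to check a backup after writing it. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("letter1.xml", "<TEI.2/>"); // made-up document names
        docs.put("letter2.xml", "<TEI.2/>");
        System.out.println(entryNames(toZip(docs)));
    }
}
```

A servlet could stream the archive to the response or write it to one of the configured backup locations instead of returning a byte array.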

Looks

Task | Status | URL | Help Required | Compliance
Logo | Alex/Martin | Yes
Style sheets | John/Martin | Yes | NS 7+, IE 5+

Exploration !!!

Topic | Problem/Issue | Action | Comments
JavaServer Faces | Binding the HTML form events to the server side | Further investigation required | Support is very recent, and only for Tomcat 5.0+
Greenstone Digital Library Software | Install the software and test it out | Any takers? | It is a generic tool for building repositories
Jazz | API for zooming and multiple representation | http://www.cs.umd.edu/hcil/jazz/ | If the XML analysis tool requires an interface with zoom and quick-browse features, this API can be very useful.
Lucene (Integrated) | API for indexing | Does not support XPath; however, full-text search is very quick | The simple search (full-text search) in <teiPublisher> is based on Lucene
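
The Lucene trade-off noted above can be illustrated with a toy inverted index. This is not Lucene's API, only a sketch of the general idea behind it, with a hypothetical class name: each term maps directly to the documents that contain it, so a term lookup is a single map access, but the element structure is discarded, which is why a structural XPath query cannot be answered from such an index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyInvertedIndex {

    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits the text on non-letter, non-digit characters and records each term. */
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks a single term up; no document text is scanned at query time. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("doc1", "The Whale surfaced");
        index.add("doc2", "A whale, again");
        index.add("doc3", "Calm seas");
        System.out.println(index.search("whale"));
    }
}
```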

Architecture

The application can be divided into two broad parts: an XML Analysis Tool and a Repository Management System.

XML Analysis Tool

We envision <teiPublisher> will provide a helper application to allow content developers to view the content of elements and attributes used in controlled vocabularies, and highlight semantic inconsistencies. It will also assist in selecting elements and attributes which will ultimately be searched on.

This visualization tool will be accessible to editors to check ontology consistency and mark areas of interest in the document. The analysis tool would help to define constraints on XML for both PCDATA and attribute values. This tool may be thought of as a generator of an XML schema for node sets. The work in progress on the XML analysis tool is available here.

Repository Management

The web application will contain two types of web pages: static HTML pages and dynamically generated pages.

When we speak below about website customization, we are speaking only about the latter, dynamically generated pages. Editors can create static HTML pages outside the <teiPublisher> framework and drop them into the application.

The application will use XSL stylesheets to provide interfaces for the customization of the web pages. The logic/data generated by the customization will be stored in a set of XML files. These XML files will store the following kinds of information:

Communication between Analysis and Repository Management tools

Depending upon the architecture of the analysis tool, there are two choices for how communication will be made between analysis and repository management: Tight Coupling and Loose Coupling.

In tight coupling, the XML analysis tool is part of the administrative section of the website and runs on the same JVM as the servlet container. In this scenario, selections made in the analysis tool can be displayed instantly, much like changes to the HTML of static web pages. However, the analysis process will be memory intensive, and sharing an environment with the publishing tool will make the application less robust.

In loose coupling, the XML analysis tool will run as a separate application that can be launched when needed. This will be a Java Swing application, which can have robust visualization tools to filter, select, and display XML nodes.
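
Under loose coupling the two programs share no memory, so whatever the analysis tool selects has to cross a file boundary. The sketch below shows one way that hand-off could work with the JDK's DOM and transformer APIs. The search/path element layout and the class name are hypothetical (the actual DTDs are not reproduced in this document), and strings stand in for the files on disk to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SelectionExchangeSketch {

    /** Analysis-tool side: serialize the selected XPaths as XML. */
    static String write(List<String> xpaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("search"); // hypothetical element name
            doc.appendChild(root);
            for (String xpath : xpaths) {
                Element path = doc.createElement("path"); // hypothetical element name
                path.setTextContent(xpath);
                root.appendChild(path);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not serialize selections", e);
        }
    }

    /** Repository side: read the selections back. */
    static List<String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList paths = doc.getElementsByTagName("path");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < paths.getLength(); i++) {
                result.add(paths.item(i).getTextContent());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not read selections", e);
        }
    }

    public static void main(String[] args) {
        List<String> selected = new ArrayList<>();
        selected.add("//name[@type='person']"); // made-up selections
        selected.add("//date/@value");
        String xml = write(selected);
        System.out.println(xml);
        System.out.println(read(xml));
    }
}
```

Because the only contract between the two sides is the file format, the Swing tool and the web application can be developed and run independently.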

The information can be shared between the two sections by serializing it to the set of xml files as shown in the table below:

File: config.xml
Purpose:
  1. list of other config files
  2. head/footer/navigation include locations
  3. backup locations
  4. access control patterns, e.g. {129.2.*.* allow}
DTD decided: Yes

File: search.xml
Purpose:
  1. Nodes/XPath/patterns to be searched
  2. Controlled vocab
  3. search type
DTD decided: Yes

File: browse.xml
Purpose:
  1. XPath grouped for collection
  2. Controlled vocab
DTD decided: Yes
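
As an illustration of the access control patterns kept in config.xml, the sketch below parses a rule such as "129.2.*.* allow" and tests an IPv4 address against it. It makes several assumptions not confirmed by this document: that a rule is a dotted pattern followed by an action, that "*" matches any single octet, and that "deny" is a valid action alongside "allow". The class name is hypothetical.

```java
final class AccessRuleSketch {

    private final String[] pattern; // four octets, each a number or "*"
    private final boolean allow;

    private AccessRuleSketch(String[] pattern, boolean allow) {
        this.pattern = pattern;
        this.allow = allow;
    }

    /** Parses a rule of the assumed form "<pattern> <allow|deny>". */
    static AccessRuleSketch parse(String rule) {
        String[] parts = rule.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<pattern> <allow|deny>': " + rule);
        }
        String[] octets = parts[0].split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Expected four dot-separated octets: " + parts[0]);
        }
        boolean allow;
        switch (parts[1]) {
            case "allow": allow = true; break;
            case "deny":  allow = false; break; // "deny" is an assumption
            default: throw new IllegalArgumentException("Unknown action: " + parts[1]);
        }
        return new AccessRuleSketch(octets, allow);
    }

    /** True when every octet of the address equals the pattern octet or the pattern has "*". */
    boolean matches(String ipAddress) {
        String[] octets = ipAddress.split("\\.");
        if (octets.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!pattern[i].equals("*") && !pattern[i].equals(octets[i])) {
                return false;
            }
        }
        return true;
    }

    boolean allows() {
        return allow;
    }

    public static void main(String[] args) {
        AccessRuleSketch rule = parse("129.2.*.* allow");
        System.out.println(rule.matches("129.2.14.7") && rule.allows());
        System.out.println(rule.matches("10.0.0.1"));
    }
}
```

A servlet filter could apply such rules to the address returned by the request before serving administrative pages.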

Part III

The following modules are placeholders, to be filled in as they are assigned or completed.

Procedures:

Website

layout on http://teipublisher.sourceforge.net

<teiPublisher>

Part IV

Subversion: Useful Info

[Update: 01/11/2005] We are not using SourceForge's CVS system.

We use Subversion for versioning; please consult the Subversion book online.

If you want a Subversion tutorial, Google points to many. If you are looking for a good Windows Subversion client, try TortoiseSVN {http://tortoisesvn.tigris.org/}.

Anonymous Access: (Read only)

Developer Access: (Read and Write)

Mailing List and Archives

teipublisher-devel@lists.sourceforge.net
Archive: http://sourceforge.net/mailarchive/forum.php?forum_id=36548
teipublisher-users@lists.sourceforge.net
Archive link will be added later.