The goal of this project is to build an extensible, modular, and configurable XML-based repository that will store, search, and display documents encoded in TEI Lite over the World Wide Web.
What follows is an outline of the project as it stood on 11 January 2005:
* Full-text searches do not perform well against native XML databases, because XPath searches that use contains() are very slow. To overcome this problem, we have integrated Lucene into teiPublisher.
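To illustrate the kind of query that is slow, here is a minimal sketch of a contains() search. It uses the XPath API bundled with the JDK rather than the project's XML database, purely so the example is self-contained. The class name, the sample document, and the choice of <p> as the searched element are all hypothetical and not taken from the teiPublisher codebase.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of the teiPublisher codebase.
final class XPathContainsSketch {

    // Returns the text of every <p> element whose string value contains the term.
    // The XPath engine has to visit and test each <p> in turn, which is why
    // this style of search scales poorly on large collections.
    static List<String> findParagraphs(String xml, String term) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the term as an XPath variable so quotes in it cannot break the expression.
            xpath.setXPathVariableResolver(name ->
                    "term".equals(name.getLocalPart()) ? term : null);
            NodeList hits = (NodeList) xpath.evaluate(
                    "//p[contains(., $term)]", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(hits.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    public static void main(String[] args) {
        String tei = "<TEI.2><text><body>"
                + "<p>Call me Ishmael.</p>"
                + "<p>It was the best of times.</p>"
                + "</body></text></TEI.2>";
        System.out.println(findParagraphs(tei, "Ishmael"));
    }
}
```

Note that contains() is also case-sensitive and matches raw substrings, so it offers none of the tokenizing that a full-text engine provides.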
@ refers to internal MITH project code
{ } refers to source codebase
| Module Name | Description | Status Complete | Implementation | Codebase | Tested | Required Changes | Help Required |
| I/O XML DB | Add to and delete from the XML database | 100% | Java Servlet | {org.teipublisher.actions.text} | 100% | ||
| I/O Lucene DB | Add to and delete from the Lucene indexes | 100% | Java Servlet | {org.teipublisher.actions.text} | Sample: http://aton.umd.edu:8080/teipublisher/test | No | |
| Search XML DB | Search and retrieve the document DOM using XPath | 100% | Java Servlet | {org.teipublisher.actions.text} | Currently in use @STEIN | ||
| Search Lucene DB | Search and retrieve documents | 100% | Java Servlet | {org.teipublisher.actions.text} | Implemented: http://aton.umd.edu:8080/teipublisher/test | No | |
| Backup data | Back up the XML database to a zip file | 100% | JSP | Servlet | TESTED: Pass | ||
| Wiki Sample implementation | File I/O from the Server file system | 100% | Servlet | * | Requires Javascript for preview | Access control | TESTED: Passed with Exception Remove Tidy Messages. |
| *DOM manipulation HTML Form elements | Not Required {Redundant} | 0% | Javascript | * | NA | NA | |
| Task | Status | URL | Help Required | Compliance |
| Logo | Alex/Martin | Yes | ||
| Style sheets | John/Martin | Yes | NS 7+ IE 5+ |
| Topic | Problem/Issue | Action | Comments |
| JavaServer Faces | Binding the HTML form events to the server side | Further investigation required | Support is very recent and available only for Tomcat 5.0+ |
| Greenstone Digital Library Software | Install the software and test it out | Any takers? | It is a generic tool for building repositories |
| Jazz | API for zooming and multiple representations. http://www.cs.umd.edu/hcil/jazz/ | If the XML analysis tool requires an interface with zoom and quick-browse features, this API can be very useful. | |
| Lucene (Integrated) | API for indexing | Does not support XPath; however, full-text search is very quick | The simple search (full-text search) in <teiPublisher> is based on Lucene |
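The trade-off noted for Lucene (fast full-text search, no XPath) follows from how an inverted index works. The sketch below is a toy stand-in written for illustration. It does not use the Lucene API, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index; a stand-in for illustration, NOT the Lucene API.
final class InvertedIndexSketch {

    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text into lowercase word tokens and records the document id under each.
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A single-term query is one map lookup; no document is re-read at query time.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("moby.xml", "Call me Ishmael.");
        index.add("tale.xml", "It was the best of times.");
        System.out.println(index.search("ishmael"));
    }
}
```

Because the index stores only terms and document ids, it can answer "which documents mention this word" quickly but knows nothing about element structure. That is why structural queries still go to the XML database.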
The application can be divided into two broad parts: an XML Analysis Tool and a Repository Management System.
We envision <teiPublisher> will provide a helper application to allow content developers to view the content of elements and attributes used in controlled vocabularies, and highlight semantic inconsistencies. It will also assist in selecting elements and attributes which will ultimately be searched on.
This visualization tool will be accessible to editors so that they can check ontology consistency and mark areas of interest in the document. The analysis tool would help to define constraints on the XML for both PCDATA and attribute values. This tool may be thought of as a generator of an XML schema for node sets. The work in progress on the XML analysis tool is available here.
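As a rough idea of what such an analysis could compute, the following sketch tallies the distinct values of every attribute in a document, so that variants of a controlled-vocabulary term (for example "person" and "Person") stand out. It is only an illustration under our own assumptions: the class name, the "element/@attribute" key format, and the sample markup are hypothetical and do not describe the actual tool.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the actual analysis tool.
final class AttributeValueSurvey {

    // Maps "element/@attribute" to the distinct values found, with a count for each.
    static Map<String, Map<String, Integer>> survey(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Map<String, Integer>> result = new TreeMap<>();
            // "*" matches every element in the document, in document order.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element el = (Element) elements.item(i);
                NamedNodeMap attrs = el.getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node attr = attrs.item(j);
                    String key = el.getTagName() + "/@" + attr.getNodeName();
                    result.computeIfAbsent(key, k -> new TreeMap<>())
                          .merge(attr.getNodeValue(), 1, Integer::sum);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String tei = "<body>"
                + "<name type=\"person\">Ahab</name>"
                + "<name type=\"Person\">Ishmael</name>"
                + "<name type=\"place\">Nantucket</name>"
                + "</body>";
        // Prints each attribute with its value counts; case variants appear as separate keys.
        survey(tei).forEach((attr, values) -> System.out.println(attr + " -> " + values));
    }
}
```

An editor reviewing the output would see "Person" and "person" listed as separate values of name/@type and could decide which spelling the vocabulary should allow.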
The web application will contain two types of web pages:
The application will use XSL stylesheets to provide interfaces for customizing the web pages. The logic and data generated by the customization will be stored in a set of XML files. These XML files will store the following kinds of information:
Depending upon the architecture of the analysis tool, there are two choices for how the analysis tool and the repository management system will communicate: Tight Coupling and Loose Coupling.
In tight coupling, the XML analysis tool is part of the administrative section of the website and runs in the same JVM as the servlet container. In this scenario, the selections made in the analysis tool can be displayed instantly, much like changes to the HTML of a static web page. However, the analysis will be memory intensive, and sharing an environment with the publishing tool will make the application less robust.
In loose coupling, the XML analysis tool will run as a separate application that can be launched when needed. This will be a Java Swing application, which can offer robust visualization tools to filter, select, and display XML nodes.
The information can be shared between the two sections by serializing it to the set of xml files as shown in the table below:
| Name of File | Purpose | DTD decided |
| config.xml | | Yes |
| search.xml | | Yes |
| browse.xml | | Yes |
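The hand-off through shared XML files can be sketched as follows: one side writes a file, the other reads it back. The DTDs are not reproduced on this page, so the <search> and <field> elements and the xpath attribute used below are hypothetical placeholders, as are the class and method names.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical file layout; the real DTDs for these files are not shown here.
final class SharedConfigSketch {

    // Writer side (e.g. the analysis tool): stores one <field> per selected XPath.
    static void write(Path file, List<String> xpaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("search");
            doc.appendChild(root);
            for (String xp : xpaths) {
                Element field = doc.createElement("field");
                field.setAttribute("xpath", xp);
                root.appendChild(field);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            try (OutputStream out = Files.newOutputStream(file)) {
                t.transform(new DOMSource(doc), new StreamResult(out));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not write " + file, e);
        }
    }

    // Reader side (e.g. the repository web application): loads the selections back.
    static List<String> read(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            NodeList fields = doc.getElementsByTagName("field");
            List<String> xpaths = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                xpaths.add(((Element) fields.item(i)).getAttribute("xpath"));
            }
            return xpaths;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read " + file, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("search", ".xml");
        write(file, Arrays.asList("//title", "//div[@type='chapter']/head"));
        System.out.println(read(file));
        Files.delete(file);
    }
}
```

Since the two sides share only files, this arrangement works for either coupling choice. Under loose coupling, the Swing application and the servlet container simply need access to the same directory.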
The following modules are listed here as placeholders, which will be filled in as they are assigned or completed.
layout on http://teipublisher.sourceforge.net
We use Subversion for version control; please consult the Subversion book online.
If you want a Subversion tutorial, Google points to many. If you are looking for a good Windows Subversion client, try TortoiseSVN {http://tortoisesvn.tigris.org/}.
teipublisher-devel@lists.sourceforge.net
Archive: http://sourceforge.net/mailarchive/forum.php?forum_id=36548
teipublisher-users@lists.sourceforge.net
Archive link will be added later.