A Gateway Between the World-Wide Web and PAT: Exploiting SGML Through the Web
The HyperText Markup Language (HTML) used by the World-Wide Web has limited markup and structure recognition capabilities. Only a small set of text characteristics can be represented, and few of these have any functional value beyond display. The HTML ANCHOR element supports hypertext links; however, it cannot retrieve a component of a linked document, such as a single glossary entry from a collection of several thousand entries, without resorting to programs external to HTML and the Web server. In spite of these limitations, HTML and the Web are key technologies for libraries. The Standard Generalized Markup Language (SGML) is a full-featured, standard markup language; HTML is itself defined as an SGML Document Type Definition. Ideally, it would be possible to retrieve text documents marked up with richer SGML tag sets via the World-Wide Web. This technical paper discusses how the Web can be linked to the PAT system, Open Text's search engine that supports access to SGML-encoded documents. This Web-to-PAT Gateway uses the Web's Common Gateway Interface (CGI) capability and SGML-to-HTML filter programs. After a brief overview of key technical concepts, the paper explains the operation of the Web-to-PAT Gateway, using several examples of how it is employed at the University of Virginia Libraries, including access to text files such as a Middle English collection, the Oxford English Dictionary, and the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange.
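To make the idea of an SGML-to-HTML filter concrete, the following is a minimal sketch in Python, not the Gateway's actual filter programs. It assumes a hypothetical glossary markup with ENTRY, TERM, and DEF elements and an illustrative mapping onto HTML definition-list tags; the element names, the mapping, and the function name `sgml_to_html` are all assumptions made for this example. In a gateway of the kind described, a filter like this would plausibly be applied to the search engine's SGML output before the result is returned to the Web client.

```python
import re

# Hypothetical mapping from SGML element names to HTML element names.
# These names are illustrative only; a real filter follows the DTD in use.
TAG_MAP = {
    "ENTRY": "dl",
    "TERM": "dt",
    "DEF": "dd",
}

# Matches a start or end tag, capturing the optional slash and the element name.
# Any attributes inside the tag are matched but not preserved.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def sgml_to_html(fragment: str) -> str:
    """Rewrite known SGML tags as HTML tags; drop unknown tags but keep their text."""

    def replace(match: re.Match) -> str:
        slash = match.group(1)
        name = match.group(2).upper()
        html_name = TAG_MAP.get(name)
        if html_name is None:
            return ""  # unknown element: discard the tag, leave its content
        return f"<{slash}{html_name}>"

    return TAG_PATTERN.sub(replace, fragment)


if __name__ == "__main__":
    sample = "<ENTRY><TERM>gateway</TERM><DEF>A relay.</DEF></ENTRY>"
    print(sgml_to_html(sample))
```

A regular-expression pass like this handles only simple, fully tagged fragments; SGML features such as omitted end tags or entity references would need a real parser driven by the document's DTD.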