Alice opened the door and found that it led into a small passage ... into the loveliest garden you ever saw. How she longed to get out of that dark hall and wander ... [Lewis Carroll, Alice in Wonderland]

The overarching goal of the ALICE project is to build a semantic web over well-defined subsets of the digitised literature. The consortium will draw on its collective expertise in biodiversity to improve access to the digitised literature of that scientific domain. Although the project focuses on biodiversity, the principles and practices it develops, together with much of the software and supporting infrastructure built by the consortium, are expected to transfer to other disciplines.

The ALICE project seeks to develop a portal that will facilitate access to the digitised literature. The portal must support standard searches across distributed sources, and those searches must also be made more effective by content markers added to material from sources that do not already carry them. The portal will store these content markers and present them to the user as ‘overlays’, so that the source digitised literature itself remains unmodified.

Overlays that support such queries are best created by marking up the text in XML. Before mark-up, the source content will need to be parsed into a standardised XML format, and the benefit is greatest when that content is highly atomised, that is, broken into fine-grained elements that can each be marked up and queried individually. The mark-up process itself should be automated as far as possible, but user-friendly interfaces will be needed to maximise take-up by a wide range of users with differing levels of expertise and skill. Marked-up text will be stored in a database to maximise search and retrieval speed. The texts will be version-controlled, so that mark-up can be added iteratively: automatically, as information-extraction algorithms improve, and manually, as further annotations are added by hand.
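The overlay arrangement described above, in which content markers live apart from an unmodified source text and accumulate across versions, can be illustrated with a minimal Python sketch. It assumes a standoff representation, where each marker is a character-offset span over the source text; `Annotation`, `OverlayStore`, the `taxon` and `locality` tags and the sample sentence are hypothetical illustrations, not part of the ALICE portal. Each commit produces a new immutable version, and any version can be queried or rendered as inline XML.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Annotation:
    """A single content marker (hypothetical schema) over a span of source text."""
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    tag: str     # marker type, e.g. "taxon" (illustrative)
    source: str  # "auto" for algorithmic mark-up, "manual" for hand mark-up


@dataclass
class OverlayStore:
    """Holds overlays separately from the source text, which is never modified."""
    text: str
    # Each version is an immutable tuple of annotations; version 0 has no mark-up.
    versions: list = field(default_factory=lambda: [()])

    def commit(self, new_annotations):
        """Add annotations as a new version and return its number."""
        for a in new_annotations:
            if not (0 <= a.start < a.end <= len(self.text)):
                raise ValueError(f"span out of range: {a}")
        self.versions.append(self.versions[-1] + tuple(new_annotations))
        return len(self.versions) - 1

    def query(self, tag, version=-1):
        """Return the text covered by every marker with `tag` in one version."""
        return [self.text[a.start:a.end] for a in self.versions[version] if a.tag == tag]

    def to_xml(self, version=-1):
        """Render one version as inline XML; assumes spans do not overlap."""
        root = ET.Element("doc")
        pos, last = 0, None
        for a in sorted(self.versions[version], key=lambda x: x.start):
            if a.start < pos:
                raise ValueError("overlapping spans cannot be inlined")
            gap = self.text[pos:a.start]
            # Unmarked text goes in root.text before the first element,
            # otherwise in the tail of the preceding element.
            if last is None:
                root.text = gap
            else:
                last.tail = gap
            last = ET.SubElement(root, a.tag, source=a.source)
            last.text = self.text[a.start:a.end]
            pos = a.end
        rest = self.text[pos:]
        if last is None:
            root.text = rest
        else:
            last.tail = rest
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    store = OverlayStore("Carabus nemoralis occurs in Oxford.")
    v1 = store.commit([Annotation(0, 17, "taxon", "auto")])  # automated pass
    i = store.text.index("Oxford")
    v2 = store.commit([Annotation(i, i + 6, "locality", "manual")])  # hand mark-up
    print(store.query("locality", version=v1))  # []
    print(store.query("locality"))  # ['Oxford']
    print(store.to_xml())
```

Because the markers never touch the source string, automatic and manual mark-up can be added or revised while earlier versions stay reproducible. A production system would more likely persist the versions in a database and use an established annotation model rather than in-memory tuples.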

