The Irish in Europe VRE
The Irish in Europe Project provides integrated access to four legacy databases in a customised virtual research environment. The humanities data modelling and information technology components of the IRCHSS-funded project to process and host these databases were conceived by John Keating (NUI Maynooth), assisted by Damien Gallagher.
The ‘legacy’ databases that form the core of the data presented in this first phase of the Irish in Europe Virtual Research Environment were assembled in various electronic formats and were not susceptible, as they stood, either to merging or to advanced querying. Nor were they suitable for web-hosting. To remedy this, the IRCHSS project team decided to encode the legacy data using a customised XML (eXtensible Markup Language) schema. The schema provided mechanisms for encoding personal, educational, military and professional information about individual Irish migrants. As an individual’s record typically contained personal data together with data related to one particular attribute (for example, military service), it was possible to select an appropriate minimum set of XML elements for encoding each record. IEP developed custom conversion programs in the Perl programming language to generate XML automatically from the Adobe PDF, Microsoft Word and Microsoft Excel documents that originally held the legacy data. The XML encodings were later cleansed manually and stored in custom, non-native databases that provided fast search interfaces implemented as a web service.
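The conversion step can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by the project, it reads CSV text of the kind a spreadsheet export would produce, and every element name, column name and sample value in it is a hypothetical stand-in, since the actual IEP schema is not reproduced here. It shows the "minimum set of elements" idea: a group such as military is only emitted when the row holds data for it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column groups; the real IEP schema is not given in the text.
PERSONAL_FIELDS = ("surname", "forename", "county")
MILITARY_FIELDS = ("regiment", "rank", "enlisted")
GROUPS = (("personal", PERSONAL_FIELDS), ("military", MILITARY_FIELDS))


def row_to_migrant(row):
    """Encode one spreadsheet row as a <migrant> element, skipping empty cells."""
    migrant = ET.Element("migrant")
    for group, fields in GROUPS:
        values = {f: (row.get(f) or "").strip() for f in fields}
        values = {f: v for f, v in values.items() if v}
        if not values:
            continue  # minimum element set: omit groups that carry no data
        parent = ET.SubElement(migrant, group)
        for field, value in values.items():
            ET.SubElement(parent, field).text = value
    return migrant


def csv_to_xml(csv_text):
    """Convert CSV text into a <migrants> document returned as a string."""
    root = ET.Element("migrants")
    for row in csv.DictReader(io.StringIO(csv_text)):
        root.append(row_to_migrant(row))
    return ET.tostring(root, encoding="unicode")


# Invented sample rows, for illustration only.
SAMPLE = (
    "surname,forename,county,regiment,rank,enlisted\n"
    "Example,John,Cork,Regiment A,Captain,1710\n"
    "Sample,Mary,Kerry,,,\n"
)

if __name__ == "__main__":
    print(csv_to_xml(SAMPLE))
```

In this sketch the second sample row yields a record with a personal group only, which mirrors the way a record needs no military elements when it holds no military data.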
In this arrangement, web services give external applications an XML querying methodology and return query results in a standardised XML format. Standardising and then publishing, or exposing, the web interfaces allows software developers to build customised query forms and visualisation objects independently of ongoing database redevelopment. In the Strangers to Citizens: Irish in Europe 1600-1800 virtual exhibition, curated by the IEP for the National Library, Dublin (2007-9), for example, all web services were developed and tested early in the development cycle. This ensured that the database developers and the web and interface developers could work independently. The software engineers created ‘database inspectors’, which constructed web service queries and processed the XML-encoded results. The software engineers then implemented several searching algorithms, each conforming to the web service interface, and finally selected the one best suited to use in a web environment.
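What a database inspector does on each side of the web service can be sketched briefly, again in Python for illustration only. The service address, the parameter names and the shape of the result document are all assumptions made for this example and are not the published IEP interface. The sketch builds a query URL from search criteria and turns an XML result document into plain dictionaries that a form or chart could consume.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder address; not the real IEP service location.
BASE_URL = "https://example.org/iep/search"


def build_query_url(database, **criteria):
    """Build a web service query URL; unset (None) criteria are left out."""
    params = {"db": database}
    params.update({k: v for k, v in sorted(criteria.items()) if v is not None})
    return BASE_URL + "?" + urlencode(params)


def parse_results(xml_text):
    """Turn an XML result document into a list of {field: value} dictionaries.

    Assumes a hypothetical <results><record>...</record></results> layout.
    """
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in record}
            for record in root.iter("record")]


# Invented response, standing in for what the service might return.
SAMPLE_RESPONSE = (
    "<results>"
    "<record><surname>Example</surname><county>Cork</county></record>"
    "</results>"
)

if __name__ == "__main__":
    print(build_query_url("military", county="Cork", year_from=1700))
    print(parse_results(SAMPLE_RESPONSE))
```

Because the inspector depends only on the query and result formats, the search algorithm behind the service can be replaced without changing code of this kind.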
Another advantage of the web service approach is that visualisation tools are built against a web interface rather than a particular database, so they can be reused or extended with ease. In the Strangers to Citizens virtual exhibition, for example, IEP developed highly visual and interactive map objects in which individual Irish counties or dioceses could be coloured to represent migration densities from that region during a particular period. This database inspector was used with four different databases and required minimal configuration for each deployment. Furthermore, any future database containing, for instance, diocesan-based or county-based data can use this database inspector with minimal configuration. It is necessary only to specify the database name and location, the date range, colour preferences and some scaling limiters.
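The configuration items just listed suggest how such a map inspector might pick its colours. The following Python sketch is an assumption about one reasonable approach, not a description of the Flex implementation: the class name, field names and colour values are hypothetical. A region's migrant count is clamped to the scaling limits and then binned linearly onto an ordered colour ramp.

```python
from dataclasses import dataclass


@dataclass
class InspectorConfig:
    """Hypothetical per-deployment settings for a map inspector."""
    database: str
    location: str
    date_range: tuple   # (first year, last year)
    colours: tuple      # ordered from lowest to highest density
    scale_min: int      # counts at or below this get the first colour
    scale_max: int      # counts at or above this get the last colour


def colour_for(count, config):
    """Clamp a count to the scaling limits and bin it onto the colour ramp."""
    span = config.scale_max - config.scale_min
    if span <= 0:
        return config.colours[0]
    clamped = max(config.scale_min, min(count, config.scale_max))
    fraction = (clamped - config.scale_min) / span
    index = min(int(fraction * len(config.colours)), len(config.colours) - 1)
    return config.colours[index]


def colour_map(counts, config):
    """Map each region name to the colour for its migrant count."""
    return {region: colour_for(n, config) for region, n in counts.items()}


if __name__ == "__main__":
    config = InspectorConfig(
        database="military",
        location="https://example.org/iep",  # placeholder address
        date_range=(1700, 1750),
        colours=("#ffeeee", "#ff9999", "#cc0000", "#660000"),
        scale_min=0,
        scale_max=100,
    )
    # Invented counts, for illustration only.
    print(colour_map({"Cork": 50, "Kerry": 0, "Galway": 250}, config))
```

The scaling limiters keep one unusually large region from pushing every other region into the lightest colour.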
For the Strangers to Citizens virtual exhibition, IEP used Adobe Flex 3 to develop the database inspectors. Flex 3 is a free open source framework for building web applications deployable on all major browsers, desktops and operating systems. IEP has also developed database inspectors using AJAX (asynchronous JavaScript and XML), a collection of related web development techniques used for creating interactive web applications. One advantage of using Flex over AJAX is that Flex provides a mature cross-domain policy management facility, which means that it will be possible for non-IEP developers to develop and host database inspectors on web domains other than the Irish in Europe domain and still access the databases hosted and managed by IEP.
Currently, the XML encodings are stored in a dedicated XML database using the open source eXist-db software. This database is configured to index the XML data elements with Lucene, allowing it to run complex text-matching queries with no visible slowdown. The IEP Virtual Research Environment (VRE) website uses a user interface developed in Java within a Model-View-Controller (MVC) environment that maps complex application development onto an object-based architecture. The website communicates with the XML server using web services (in this case a combination of REST and XML-RPC).
The searches provided by the IEP VRE are dynamically constructed database queries written in the open XQuery standard and communicated to the database using XML-RPC. XQuery is also used to analyse the search results, which are presented as bar charts, pie charts and population maps. An XML translation technology (XSLT) is used to convert XML records into other text-based formats and is a crucial component of the IEP VRE. It is used, for example, to translate migrant records into the view that is shown to the user within the VRE. In practice, the VRE asks the database to apply a custom-built XHTML translation to the requested migrant record and appends the result to the web page without further processing. XSLT, in combination with Apache FOP, is also used to convert records and query results into common text and spreadsheet formats such as Portable Document Format (PDF), PostScript, Comma Separated Values (CSV) and Rich Text Format (RTF).
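A dynamically constructed query of this kind might look like the following Python sketch. The collection path, element names and search criteria are hypothetical, and for brevity the example wraps the query in a REST-style URL rather than the XML-RPC call used by the VRE. The `_query` and `_howmany` parameters follow eXist-db's REST interface as the editor understands it and should be checked against the eXist-db documentation before use.

```python
from urllib.parse import urlencode


def xquery_literal(value):
    """Quote a string for XQuery: escape ampersands and double any double quotes."""
    return '"' + value.replace("&", "&amp;").replace('"', '""') + '"'


def build_xquery(collection, county=None, year_from=None, year_to=None):
    """Assemble an XQuery path expression from whichever criteria are supplied.

    Element names (migrant, personal/county, military/enlisted) are hypothetical.
    """
    predicates = []
    if county is not None:
        predicates.append("personal/county = " + xquery_literal(county))
    if year_from is not None:
        predicates.append("xs:integer(military/enlisted) ge %d" % int(year_from))
    if year_to is not None:
        predicates.append("xs:integer(military/enlisted) le %d" % int(year_to))
    query = "collection(%s)//migrant" % xquery_literal(collection)
    if predicates:
        query += "[" + " and ".join(predicates) + "]"
    return query


def rest_url(base, query, howmany=10):
    """Wrap a query in a REST-style URL (parameter names assumed, see lead-in)."""
    return base + "?" + urlencode({"_query": query, "_howmany": howmany})


if __name__ == "__main__":
    q = build_xquery("/db/iep", county="Cork", year_from=1700)
    print(q)
    # Placeholder server address.
    print(rest_url("http://localhost:8080/exist/rest/db/iep", q))
```

Quoting user-supplied values through a single helper keeps search terms that contain quotation marks or ampersands from breaking, or altering, the generated query.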
The software architecture underlying all of IEP’s software technology is flexible and designed to be interoperable. This ensures that new technologies, for example more powerful XML database engines, can be introduced without disrupting the usability of the research material. IEP has now completed the first phase of its virtual research platform. The platform is an extensible online collection of databases and software tools for creating, managing, querying and manipulating multiple datasets simultaneously. One currently available feature of the research platform is the ability to query a user-selected collection of datasets and compare graphically the migration for specific dioceses or counties. It is possible to compare these data with existing census and migration (theoretical and empirical) models appearing in the research literature. Users may also avail of data manipulation (smoothing, modelling, interpolation, etc.), table generation and plotting facilities across collections of datasets, something that has heretofore been difficult.
The platform can also acquire and build new databases related to early modern Irish migration populations that are automatically harvested from digital archives. In conjunction with colleagues in An Foras Feasa, the IRCHSS-funded humanities research institute in NUIM, IEP has implemented a modern digital archive based on Fedora Commons, an open source repository technology suitable for the creation, management, publishing and sharing of digital content. The archive also provides index-based XML document retrieval using XQuery processing provided by eXist-db, an open source database management system built entirely on XML. Additional web facilities are provided by the open source Apache web server. IEP and An Foras Feasa recently captured, segmented, transcribed, translated and encoded a 324-page eighteenth-century ledger from the Libros de gastos del colegio de Alcalá (Russell Library Maynooth, Salamanca Archives, Legajo S30, nos 1-3). The digital copy, together with software tools for inspecting it and performing basic accounting operations, is accessible via the VRE. These interoperable software tools have been developed using the community-sensitive approach described earlier, and are available with English and Spanish user interfaces.
The IRCHSS-funded IEP has demonstrated that it is possible to use modern, open source technologies to provide early modern researchers with interoperable tools for advancing historical research. Careful choice of technologies, best-practice software development paradigms and effective engineer-researcher discourse are crucial for the phased deployment of software research tools and digital archives. These developments are bringing enormous practical advantages to students of the history of migrations everywhere. With data in these flexible formats it will be possible not only to do traditional tasks more effectively but also to conceive and execute new ones.
