Overview

This document contains an overview of the components required for EMERSE to run properly.

Components

A fully functioning EMERSE system requires four main components:

  1. The EMERSE application code, running inside Apache Tomcat

  2. Apache Solr

  3. Apache ActiveMQ

  4. An Oracle database.

EMERSE can optionally make use of an LDAP server for authentication.

EMERSE application

The EMERSE application is a Java-based, standard J2EE web application. It must be deployed in a Java servlet container, such as Apache Tomcat, running inside a Java Virtual Machine (JVM). We recommend Apache Tomcat because it is the container we have the most experience with, but others, such as Jetty, could be used. Supported versions of Java, Tomcat, and Solr are listed in the installation guide.

Apache Solr

EMERSE leverages the Apache Solr project to enable searching of documents. Solr is a popular open source toolkit that enables fast searching and text retrieval by creating specialized indexes from documents. The EMERSE application uses these indexes exclusively to display, search and highlight documents. Therefore, to use EMERSE, a process needs to be implemented that will take documents from your organization (e.g., electronic health record or document repository) and have them indexed by Solr. Because this process is likely to be highly dependent on local circumstances, it is not a core part of EMERSE and will have to be set up separately. Solr supports many mechanisms for indexing, described in the EMERSE Integration and Indexing Guide.

The Solr project includes REST-based web APIs for querying, monitoring, and administration, as well as APIs for indexing documents and deleting them when necessary. The core technology underpinning most of these APIs is Apache Lucene, a set of Java libraries. EMERSE uses a mix of the Solr web API and the Lucene libraries.
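As an illustration of the web API, a search with highlighting is an ordinary HTTP GET against a core's `/select` handler, using Solr's standard `q`, `rows`, `wt`, `hl`, and `hl.fl` parameters. The sketch below only builds such a URL; the base URL, the core name `documents`, and the field name `text` are hypothetical placeholders, not EMERSE's actual configuration, and it does not show how EMERSE itself issues queries.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, highlight_field, rows=10):
    """Build a Solr /select URL that also requests highlighted snippets.

    base_url, core, and highlight_field are hypothetical placeholders;
    substitute the values from your own Solr deployment.
    """
    params = {
        "q": query,                # Solr/Lucene query syntax
        "rows": rows,              # maximum number of documents returned
        "wt": "json",              # ask for a JSON response
        "hl": "true",              # turn on highlighting
        "hl.fl": highlight_field,  # field(s) to produce snippets from
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_select_url("http://localhost:8983/solr", "documents",
                       'text:"heart failure"', "text")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) would return matching documents plus a separate `highlighting` section keyed by document ID.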

Apache ActiveMQ

ActiveMQ is an open-source message broker. It handles the asynchronous (background) processing required by a few features of the EMERSE web application. Currently it is used to delete patients from large patient lists in the background, a potentially time-consuming operation, so that the user does not have to wait for it to finish. It is also used by an internal job that maintains the Patient Solr index, to increase parallelization.
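The pattern ActiveMQ enables here is a producer/consumer split: the web request enqueues a job and returns at once, and a separate consumer does the slow work later. The sketch below is only an in-process analogy using Python's `queue.Queue` and a thread in place of a broker; it is not the ActiveMQ API, and the function names and patient-list structure are hypothetical.

```python
import queue
import threading


def worker(jobs, patient_lists, lock):
    """Consume deletion jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        list_name, to_remove = job
        with lock:
            # In a real system this is the slow part (many database rows).
            patient_lists[list_name] = [
                mrn for mrn in patient_lists[list_name] if mrn not in to_remove
            ]
        jobs.task_done()


def request_delete(jobs, list_name, mrns):
    """Called from the web request: enqueue the job and return immediately."""
    jobs.put((list_name, set(mrns)))


patient_lists = {"cohort_a": ["001", "002", "003", "004"]}
lock = threading.Lock()
jobs = queue.Queue()
consumer = threading.Thread(target=worker, args=(jobs, patient_lists, lock), daemon=True)
consumer.start()

request_delete(jobs, "cohort_a", ["002", "004"])  # user is not kept waiting
jobs.put(None)  # tell the worker to finish after the pending job
consumer.join()
print(patient_lists["cohort_a"])  # ['001', '003']
```

With a real broker, the queue survives application restarts and consumers can run in other processes, which is what makes it suitable for parallelizing jobs such as index maintenance.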

Oracle Database

The database stores audit information, user searches, patient lists, and some configuration data. The database does not contain the clinical documents or the index. Thus, searching and highlighting of documents occurs without the use of the database; rather, these features are mainly enabled through the use of Apache Solr.

Additional External Components

Indexing code

For EMERSE to search and highlight documents, they must first be sent from their source locations to Solr for indexing. At Michigan Medicine, a small Java application was created that pulls documents from various sources and then pushes them to Solr via the SolrJ API. More information on integrating your organization’s data with EMERSE is in the Integration and Indexing Guide.
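The Michigan Medicine indexer uses SolrJ; the stdlib Python sketch below does not reproduce it. It only illustrates the general shape of such a job: map each source record to a Solr document, then group documents into batches whose JSON bodies could be POSTed to a core's `/update` handler. All field names (`id`, `mrn`, `doc_date`, `source`, `text`) and source-record keys are hypothetical; use the fields your Solr schema actually defines.

```python
import json


def to_solr_doc(source_row):
    """Map one source record to a Solr document (hypothetical field names)."""
    return {
        "id": source_row["doc_id"],
        "mrn": source_row["mrn"],
        "doc_date": source_row["date"],  # Solr date fields expect ISO-8601 UTC
        "source": source_row["system"],
        "text": source_row["body"],
    }


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_update_payloads(rows, batch_size=500):
    """Return JSON array bodies, one per batch, for POSTing to /update."""
    docs = [to_solr_doc(r) for r in rows]
    return [json.dumps(batch) for batch in batches(docs, batch_size)]


rows = [
    {"doc_id": "n1", "mrn": "0001", "date": "2020-01-31T00:00:00Z",
     "system": "ehr", "body": "Patient seen for follow-up."},
    {"doc_id": "n2", "mrn": "0002", "date": "2020-02-01T00:00:00Z",
     "system": "ehr", "body": "No acute distress."},
    {"doc_id": "r1", "mrn": "0001", "date": "2020-02-03T00:00:00Z",
     "system": "radiology", "body": "Chest x-ray unremarkable."},
]
payloads = build_update_payloads(rows, batch_size=2)
print(len(payloads))  # 2
```

Batching keeps the number of HTTP round trips low; when and how often to commit is a separate tuning decision for each site.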

Batch Jobs

Patient updating

The EMERSE software requires that patients and their metadata (e.g., medical record number, date of birth, gender, race, ethnicity) related to the indexed documents be loaded into a table in the database. At Michigan Medicine this is currently handled by a daily batch job using Pentaho Data Integration (PDI). For every indexed document there should be a corresponding patient in the patient table, linked via a common medical record number (MRN).
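Because every indexed document must have a matching patient row, a batch job can usefully check for MRNs that appear on documents but not in the patient table. The sketch below uses an in-memory SQLite database as a stand-in for Oracle; the tables `patient` and `indexed_doc_mrn` (a hypothetical staging table of document IDs and MRNs, for example exported from the index) are illustrative, not EMERSE's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database; table and column
# names are hypothetical, not EMERSE's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        mrn TEXT PRIMARY KEY,
        birth_date TEXT,
        gender TEXT
    );
    CREATE TABLE indexed_doc_mrn (
        doc_id TEXT PRIMARY KEY,
        mrn TEXT NOT NULL
    );
    INSERT INTO patient VALUES ('0001', '1950-04-02', 'F');
    INSERT INTO patient VALUES ('0002', '1988-11-19', 'M');
    INSERT INTO indexed_doc_mrn VALUES ('n1', '0001');
    INSERT INTO indexed_doc_mrn VALUES ('n2', '0002');
    INSERT INTO indexed_doc_mrn VALUES ('n3', '0003');
""")


def orphaned_mrns(conn):
    """Return MRNs that appear on indexed documents but not in patient."""
    rows = conn.execute("""
        SELECT DISTINCT d.mrn
        FROM indexed_doc_mrn d
        LEFT JOIN patient p ON p.mrn = d.mrn
        WHERE p.mrn IS NULL
        ORDER BY d.mrn
    """).fetchall()
    return [mrn for (mrn,) in rows]


print(orphaned_mrns(conn))  # ['0003']
```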

Research Studies and users

For auditing and access purposes, it is helpful to be able to link a user, and a specific EMERSE session, to a study. Because most sites that support research have an electronic IRB system to keep track of studies and users, EMERSE can incorporate some of those data to validate research study information after a user authenticates. For this to work, EMERSE requires that tables describing research studies, and the users associated with those studies, be populated. Currently this is handled by a Pentaho Data Integration (PDI) job, but it will likely have to be customized for the specifics of your site and electronic IRB system.
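Once the study and user tables are populated, validating a user's choice of study reduces to a join. The sketch below assumes hypothetical tables `research_study` and `study_personnel`, and a hypothetical rule that a study must be approved and unexpired; EMERSE's real schema and validation rules may differ. SQLite again stands in for Oracle.

```python
import sqlite3
from datetime import date

# Hypothetical study/user tables; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE research_study (
        study_id TEXT PRIMARY KEY,
        irb_number TEXT,
        status TEXT,
        expiration_date TEXT  -- ISO-8601 date, so string comparison works
    );
    CREATE TABLE study_personnel (
        study_id TEXT,
        user_id TEXT,
        PRIMARY KEY (study_id, user_id)
    );
    INSERT INTO research_study VALUES ('S1', 'IRB-0001', 'APPROVED', '2030-12-31');
    INSERT INTO research_study VALUES ('S2', 'IRB-0002', 'APPROVED', '2020-06-30');
    INSERT INTO study_personnel VALUES ('S1', 'alice');
    INSERT INTO study_personnel VALUES ('S2', 'alice');
""")


def user_can_use_study(conn, user_id, study_id, today):
    """True if the user is listed on an approved study that has not expired."""
    row = conn.execute("""
        SELECT 1
        FROM research_study s
        JOIN study_personnel p ON p.study_id = s.study_id
        WHERE s.study_id = ? AND p.user_id = ?
          AND s.status = 'APPROVED'
          AND s.expiration_date >= ?
    """, (study_id, user_id, today.isoformat())).fetchone()
    return row is not None


print(user_can_use_study(conn, "alice", "S1", date(2025, 1, 15)))  # True
```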

Architecture Views

Deployment View

Figure 1. Components that comprise EMERSE, as they are deployed at Michigan Medicine

Process View

Figure 2. Diagram showing how some of the components within EMERSE interact, and how the data are incorporated into the system.