EMERSE Installation Guide

Overview

This guide provides a general overview of the components needed to run EMERSE, how to install them, and how to verify that they are running.

Supported Operating Systems

EMERSE is mainly tested on enterprise Linux systems (RHEL and SUSE), so we recommend these or their open source equivalents. Windows-based platforms should work as well, provided they are recent enough to have quality Java releases, as EMERSE and most of its components run within a Java Virtual Machine.

Pre-Requisites

Oracle

An Oracle database server is needed for EMERSE. For the purposes of getting started, Oracle makes available a free “Express Edition” that is fully functional. This free edition of Oracle supports 1 core and up to 10 GB of disk space, which should be enough to support a few users in a demonstration version. Oracle Express Edition can be downloaded here: Oracle Express-Edition

It should be possible to change the database to an open source one, although we do not recommend it at this time due to the effort it would take, and this approach is currently untested and unsupported by the EMERSE team. Further, we have optimized Oracle for the specific needs of EMERSE to ensure rapid responsiveness.

Sizing

EMERSE doesn’t place great demand on the Oracle database, so a relatively small server can be used with 10-50 GB of storage allocated for user tablespaces. Currently at Michigan Medicine we are using only 8 GB of space for production EMERSE with 600+ users and 5 years of data.

Database account

An Oracle account/schema needs to be created to hold the initial EMERSE tables; we recommend a schema named "EMERSE". The account doesn’t require a DBA role but needs to be able to create database objects such as tables, indexes, sequences and views.
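As a starting point for discussion with a DBA, the requirements above might translate into SQL along the following lines. This is a minimal sketch, not the EMERSE team's script: the function name print_account_sql, the placeholder password, the USERS tablespace, and the exact privilege list are all assumptions, and EMERSE may need privileges not listed here.

```shell
# print_account_sql SCHEMA TABLESPACE
# Prints SQL that a DBA could review and run to create the EMERSE account.
# Assumptions: the password is a placeholder, and the privilege list is a
# guess based on the object types named above (indexes on the account's own
# tables need no privilege beyond CREATE TABLE).
print_account_sql() {
    schema=$1
    tablespace=$2
    cat <<EOF
CREATE USER $schema IDENTIFIED BY "change_me"
  DEFAULT TABLESPACE $tablespace QUOTA UNLIMITED ON $tablespace;
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE, CREATE VIEW TO $schema;
EOF
}

# Example: print_account_sql EMERSE USERS > create_account.sql
```

No DBA role is granted; the account can only create objects in its own schema.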

Server

A Linux/Unix-based server is needed to run the application server and host the indexing services. This server should be connected to the highest-speed storage available. Capacity depends on the number of documents to be indexed. At Michigan Medicine, 1.5 TB is in use to host approximately 100 million documents in both TXT and HTML format on the production server. An additional 3 TB is available for index optimization, as optimization requires about 2x the size of the index. If your documents are heavily formatted (e.g., RTF instead of TXT), storage requirements may be higher.
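The figures above work out to roughly 15 KB of index per document (1.5 TB divided by 100 million documents). The following shell sketch scales that ratio to another document count and adds the 2x optimization headroom. The function names and the assumption that index size grows linearly with document count are illustrative only; real sizes depend on document length and formatting.

```shell
# Rough storage estimate derived from the Michigan Medicine figures.
# Assumption: index size grows linearly with document count.

BYTES_PER_DOC=15000   # 1.5 TB / 100 million documents (decimal units)

# estimate_index_gb DOC_COUNT -> estimated index size in GB (decimal)
estimate_index_gb() {
    echo $(( $1 * BYTES_PER_DOC / 1000000000 ))
}

# optimize_headroom_gb INDEX_GB -> extra space for index optimization (~2x)
optimize_headroom_gb() {
    echo $(( $1 * 2 ))
}

# Example: estimate_index_gb 100000000   (the Michigan Medicine document count)
```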

Installation components

The remainder of this document covers the process of installing and configuring the application server where the EMERSE application will be deployed. The following items are required:

Java Development Kit/SDK (JDK)
Apache Tomcat (Java Servlet Engine)
Apache ActiveMQ (A message broker)
Solr 6 web application
EMERSE Web Archive File (WAR file) deployment and configuration

Specifics for installing each of these follow.

Installation

Table 1. Required Software
Name	Version	Download URL
Java JDK	1.8.x	http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Apache Tomcat	8.0.45+	https://tomcat.apache.org/download-80.cgi#8.5.9
Apache ActiveMQ	5.15	http://activemq.apache.org/download.html
Apache Solr	6.0.0 (only, not higher)	http://lucene.apache.org/solr/
EMERSE WAR file	latest version	http://project-emerse.org

Java

The first step in installing EMERSE is to download and install a Java Development Kit (JDK) on the server. Java 8 or higher is required. All of the remaining components depend on the path to this JDK installation.

Note the path to the JDK installation. It will be needed in subsequent installation steps.

Linux: After downloading the Java package appropriate for the Linux server, either unpack the tar archive:

tar -xvf jdk-xxversion-linux-xnn.tar.gz

or install the RPM package:

rpm -ivh jdk-xxversion-linux-xnn.rpm
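On Linux, a quick check like the one below can confirm that the noted directory is a full JDK rather than a runtime-only installation. It is only a sketch: the function name looks_like_jdk and the example path are placeholders, not part of the EMERSE distribution.

```shell
# looks_like_jdk DIR: succeed if DIR contains both the java launcher and
# the javac compiler (a JRE alone has no javac).
looks_like_jdk() {
    [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}

# Example (the path is a placeholder):
#   looks_like_jdk /path/to/jdk_install && /path/to/jdk_install/bin/java -version
```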

Windows: After downloading, run the executable. If the target machine has an existing Java installation that you wish to maintain, unselect the “public JRE” option in the installer.

Apache Tomcat

EMERSE is packaged as a Java-based WAR file that is deployed to the Tomcat servlet engine. Tomcat depends on a Java JDK. Version 8.0.45+ is required. The Tomcat download page lists a number of components available for download. Only the "Core" software is required. The main steps to set up Tomcat are:

Download the "Core" zip or tar file. See Required Software section.
Unzip to desired directory
Ensure Tomcat uses the desired Java runtime

Operating systems often include a Java runtime that is available on the "Path" variable. We recommend explicitly using the JDK downloaded previously.

Linux:

From the binary distributions listed on the page, choose the "Core" tar file. Move the tar file to the desired installation directory and extract it using:

tar -zxvf apache-tomcat-8.0.nn.tar.gz

Edit the startup script (startup.sh) found at /path/to/tomcat/bin to point to the Java installation directory by adding:

export JAVA_HOME=/path/to/jdk_install
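As an alternative to editing startup.sh itself, Tomcat's catalina.sh sources an optional bin/setenv.sh file when one exists, which keeps local settings out of the stock scripts. This is a different technique from the one described above, shown here as a minimal sketch; the function name write_setenv and the paths are placeholders.

```shell
# write_setenv TOMCAT_DIR JDK_DIR
# Creates TOMCAT_DIR/bin/setenv.sh, which catalina.sh sources on startup.
write_setenv() {
    tomcat_dir=$1
    jdk_dir=$2
    cat > "$tomcat_dir/bin/setenv.sh" <<EOF
# Local settings picked up by catalina.sh
export JAVA_HOME="$jdk_dir"
EOF
}

# Example (placeholder paths):
#   write_setenv /path/to/tomcat /path/to/jdk_install
```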

It has also been observed that EMERSE requires a higher virtual memory limit than the default on some Linux systems, because Solr (through Lucene) memory-maps its index files. The limit can be raised with the ulimit command shown below, either run in the shell before starting Tomcat or added directly to the startup script startup.sh. Note that the -v option controls virtual memory; the open file limit is a separate setting (ulimit -n).

ulimit -v unlimited

When the virtual memory limit is set too low, an exception like the following will be thrown when Solr or EMERSE is started.

...
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
        at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
        at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:118)
        at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:85)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:132)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:63)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
...
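Before starting Tomcat or Solr, it can help to see which limits the current shell would pass on to them. The sketch below assumes a shell whose ulimit builtin supports the -v and -n options (bash does); the function names are hypothetical.

```shell
# report_limits: print the limits that a process started from this shell
# would inherit. ulimit -v is virtual memory (KB); ulimit -n is open files.
report_limits() {
    echo "virtual memory: $(ulimit -v)"
    echo "open files:     $(ulimit -n)"
}

# vmem_unrestricted: succeed only when virtual memory is unlimited.
vmem_unrestricted() {
    [ "$(ulimit -v)" = "unlimited" ]
}

# Example:
#   report_limits
#   vmem_unrestricted || echo "consider: ulimit -v unlimited"
```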

Windows:

Download the zip file from the binary distributions under the “Core” section. Unzip this file in the desired directory. This will become the Tomcat installation directory. Edit the startup.bat file found under bin to point to the directory where the JDK was installed:

set JAVA_HOME=c:\path_to\jdk_install

Starting and Stopping the Tomcat Server

To start the server use:

/path/to/tomcat/bin/startup.sh

To stop the server use:

/path/to/tomcat/bin/shutdown.sh

Confirm Tomcat installation

Point a browser to http://hostname:8080/. A Tomcat welcome screen should appear.

Apache Solr

EMERSE leverages Apache Solr for searching documents.

Download the recommended file. See the Required Software section.
Unzip to desired directory
Ensure Solr uses the desired Java runtime

The Solr version should be the exact version specified. Otherwise, the sample indexes provided may not be usable, and EMERSE may not function.

Start and Stop Solr

To start the server use:

/path/to/solr/bin/solr start

To stop the server use:

/path/to/solr/bin/solr stop

Validate installation

Using a browser, navigate to http://hostname:8983/solr to confirm Solr installation.

Solr Data directory

At this point Solr should be running, but it has no "cores" or data. Create a new directory, preferably outside the Solr installation directory, that will hold the configuration and data for each index being hosted. The remainder of this document refers to this directory as SOLR_DATA_DIR.

Initial index configuration

The EMERSE distribution includes a set of sample Solr index files containing a small set of PubMed articles, which do not contain protected health information (PHI). They provide a way to verify that the EMERSE installation is successful, but their contents should later be deleted and replaced with your organization’s real patient documents. The sample documents are linked by MRN to the patients loaded by the SQL file in the database setup.

The sample index, pubmed, should be downloaded and placed in Solr’s data directory (SOLR_DATA_DIR) on the server where EMERSE and Solr are running. Other aspects of the configuration will have to be changed from the included example files to match your local institutional needs (e.g., the metadata types included for each document).

Starting Solr with the index

To start Solr with the sample index, add the -s flag to the Solr startup command and provide the location of the Solr data directory (SOLR_DATA_DIR), which holds the index configuration.

/path/to/solr/bin/solr start -s SOLR_DATA_DIR

For example:

/path/to/solr/bin/solr start -s /applicationdata/solr6
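Solr expects the directory passed with -s to contain a solr.xml file, and it discovers cores by looking for core.properties files beneath that directory. The sketch below checks both before Solr is started. The function name check_solr_home is hypothetical, it looks only one directory level deep, and it prints directory names rather than any name set inside core.properties.

```shell
# check_solr_home DIR
# Verifies DIR has solr.xml and prints each core directory found beneath it.
# A core is taken to be any subdirectory holding a core.properties file.
check_solr_home() {
    solr_home=$1
    if [ ! -f "$solr_home/solr.xml" ]; then
        echo "missing $solr_home/solr.xml" >&2
        return 1
    fi
    for props in "$solr_home"/*/core.properties; do
        [ -f "$props" ] || continue
        basename "$(dirname "$props")"
    done
}

# Example (placeholder path):
#   check_solr_home /applicationdata/solr6
```

With the sample index in place, the output would be expected to include a line reading pubmed.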

Apache ActiveMQ

EMERSE requires ActiveMQ to enable background processing of a number of internal tasks. Installation steps are very similar to Apache Tomcat.

Download the recommended file. See the Required Software section.
Unzip to desired directory
Ensure ActiveMQ uses the desired Java runtime

Unpack the ActiveMQ distribution.

Modify JAVA_HOME:

Linux:

Modify the 'activemq' script inside the installation directory to point to Java:

export JAVA_HOME=/app/software/jdk1.8.xyz

Windows:

Modify activemq.bat inside ACTIVEMQ_HOME:

SET JAVA_HOME=c:\path_to\java_install

Starting and Stopping ActiveMQ

Startup:

./activemq start

Shutdown:

./activemq stop

Confirm ActiveMQ installation

The default port for the ActiveMQ GUI is 8161. Confirm it is up and running by pointing a web browser to:

http://hostname:8161/admin

EMERSE Install and Configuration

Database initialization

The EMERSE distribution provides a set of files containing SQL statements that create all needed database objects, along with sample data for the patients, research studies, and synonyms tables, so that the EMERSE application can start up with a default configuration. These scripts should be run as the user and schema set up for the EMERSE application (this will be set by each implementing site), and not as a system or sysdba user. The files need to be executed in a SQL query tool in the following order:

create.sql
auditTables.sql
sqlToPutBackInModel.sql
synonymsCreate.sql
lookupData.sql
patientData.sql
indexData.sql
synonyms_index_subset.sql
synonyms_subset_50k.sql
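If SQL*Plus is the query tool, one way to keep the order fixed is to generate a small wrapper script that calls each file in turn. The sketch below is an illustration only: the function name write_run_all, the credentials, and the connect string are placeholders. The WHENEVER SQLERROR line stops at the first error, so remove it if any of the supplied scripts contain statements that are expected to fail.

```shell
# write_run_all OUTFILE
# Writes a SQL*Plus wrapper that runs the EMERSE scripts in the documented
# order and stops at the first SQL error.
write_run_all() {
    {
        echo "WHENEVER SQLERROR EXIT SQL.SQLCODE"
        for f in create.sql auditTables.sql sqlToPutBackInModel.sql \
                 synonymsCreate.sql lookupData.sql patientData.sql \
                 indexData.sql synonyms_index_subset.sql synonyms_subset_50k.sql
        do
            echo "@$f"
        done
        echo "EXIT"
    } > "$1"
}

# Example (placeholder credentials and connect string), run from the
# directory that holds the scripts:
#   write_run_all run_all.sql
#   sqlplus emerse/password@//dbhost:1521/XE @run_all.sql
```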

WAR file installation

After the application server has been installed and the database has been initialized with default settings, the next step is to deploy the EMERSE WAR file. First rename the supplied WAR file to emerse.war, then copy it to the webapps directory of the Tomcat server. If Tomcat is using default settings, the WAR file will be exploded into a directory called emerse, which contains all the files needed to run the application. You will need to change the settings file to reflect the database that will be used: inside the WEB-INF/classes directory of the exploded WAR file is a file called project.properties, which contains the settings for connecting to the database.
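The rename-and-copy step can be done in a single command, sketched below as a small shell function. The function name deploy_war and the example paths are placeholders; the supplied WAR file's actual name depends on the EMERSE release.

```shell
# deploy_war WAR_FILE TOMCAT_DIR
# Copies the supplied WAR into webapps/ under the fixed name emerse.war,
# so that Tomcat serves it at the /emerse context path.
deploy_war() {
    war_file=$1
    tomcat_dir=$2
    [ -f "$war_file" ] || { echo "no such file: $war_file" >&2; return 1; }
    [ -d "$tomcat_dir/webapps" ] || { echo "no webapps dir in $tomcat_dir" >&2; return 1; }
    cp "$war_file" "$tomcat_dir/webapps/emerse.war"
}

# Example (placeholder paths):
#   deploy_war /path/to/downloaded.war /path/to/tomcat
```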

Configuration

At this point the software part of the installation is complete, but some additional configuration may be required. Minimally, this would include the URL of the Oracle database. Information on configuring EMERSE to locate the Oracle Server is located in the configuration guide.

Data Source configuration

Confirm installation

At this point EMERSE should come up, using the URL:

http://hostname:8080/emerse

If it does not, the best place to troubleshoot any issues is the log files inside the Tomcat installation:

$TOMCAT_INSTALL_PATH/logs
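When EMERSE does not come up, it can also help to confirm that each underlying service is answering on its default port. The sketch below assumes curl is installed and uses the URLs given in this guide; the function names are hypothetical. A status of 000 means no response was received, and the ActiveMQ console may answer 401 if it requires a login, which still indicates that the service is running.

```shell
# http_status URL: print the HTTP status code, or 000 if there was no response.
http_status() {
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" 2>/dev/null || true)
    echo "${code:-000}"
}

# check_services HOST: report the status of Tomcat, Solr, ActiveMQ and EMERSE.
check_services() {
    host=$1
    for url in "http://$host:8080/" "http://$host:8983/solr" \
               "http://$host:8161/admin" "http://$host:8080/emerse"
    do
        echo "$(http_status "$url") $url"
    done
}

# Example: check_services hostname
```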