EMERSE Installation Guide

Overview

This guide provides a general overview of the components needed to run EMERSE, how to install them, and how to verify that they are running. The guide gives step-by-step directions for installing everything on a virtual machine; if EMERSE will be installed in a different type of environment, use these directions as a high-level guide.

For this guide we provide four pre-built Solr indexes, as well as SQL scripts that both create and populate the database (all files are available upon request). A production instance would instead require creating the Solr indexes locally, populating the patient table locally, and additional configuration that depends on local document sources and metadata. Nevertheless, this guide should provide enough detail to understand how everything is installed, and it allows the final installation to be tested to ensure that it is functional.

Supported Operating Systems

EMERSE is mainly tested on enterprise Linux systems (RHEL and SUSE), so we recommend these or their open source equivalents. Windows-based platforms should also work, provided they are recent enough to have quality Java releases, since EMERSE and most of its components run within a Java Virtual Machine.

Planning

Personnel

To install EMERSE, someone with system administration skills is desirable. This person should be able to install and configure software, edit database tables, etc. No programming experience is needed for a default installation and configuration of EMERSE. Knowledge of servers, Java, Linux, and Apache software will be useful for this installation.

Timelines

The initial setup described in this guide can probably be done in a day or less. More time will be required for additional customizations and planning related to localized needs, document extraction and indexing, regulatory approval, etc.

Server and Storage

A Linux/Unix-based server is suggested for installing the application server and for hosting the indexing services. No specific type of storage is required, but in general the server should be connected to the fastest storage available. EMERSE performs many disk reads to retrieve documents, so read performance is important. In our experiments, SSD storage provided nearly a two-fold increase in system performance for EMERSE, most noticeably when all documents are retrieved and highlighted for a pre-defined set of patients.

Storage capacity depends on the number of documents to be indexed. At Michigan Medicine, 1.5 TB is in use on the production server to host approximately 100 million documents in both TXT and HTML format. An additional 3 TB is available for index optimization, since optimization requires about twice the size of the index. If your documents are heavily formatted (e.g., RTF instead of TXT), storage requirements may be higher.

Oracle Database Sizing

EMERSE doesn’t place great demand on the Oracle database, so a relatively small server can be used with 10-50 GB of storage allocated for user tablespaces. Currently at Michigan Medicine we are using only 8 GB of space for production EMERSE with 600+ users and 5 years of data.

Server setup

For a production install of EMERSE you will probably want to set up a Linux server based on whatever local options are available. For the purposes of this install guide, a virtual machine (VM) running Red Hat Enterprise Linux (RHEL) will be used to demonstrate one way in which this can be done. When selecting the type of Linux to use, note that the Oracle-XE software is distributed as an RPM file. This means that RPM-based systems such as Red Hat or SUSE would be good options, whereas Ubuntu, which uses DEB packages, would not be.

Note that we are only using a VM to demonstrate the installation process. We do not recommend this VM approach for a production-level installation. RHEL can be used at no cost for development purposes. To get started, download the DVD ISO file at:

https://developers.redhat.com/products/rhel/download/

To download the DVD ISO file you will need to set up an account with Red Hat and accept the terms and conditions.

Download and install the VirtualBox* application locally, which can be found at:

https://www.virtualbox.org

*(this was tested using VirtualBox version 5.2.30)

In the VirtualBox menu choose Machine → New

It will then prompt for a name, which can be anything you want (this guide uses EMERSE installation demo). Also choose Red Hat Linux (64-bit), then press Continue. For example:

Figure 1. One of the setup screens in VirtualBox

On the Memory Size screen set the memory to 4096 MB, and press Continue.

On the Hard disk screen, choose the option Create a virtual hard disk now, and press Create.

When it asks about the Hard disk file type, choose the VDI (VirtualBox Disk Image) option and press Continue.

On the Storage on physical hard disk screen, choose Dynamically allocated and press Continue.

On the File location and size screen, leave the file name as it is (it should match the VM name, e.g., EMERSE installation demo) and change the size to 15.00 GB, then press Create.

Now, in the VirtualBox application, make sure the EMERSE installation demo VM is selected and press the green Start arrow.

Once the VM starts, it will ask for the location of the Red Hat Linux DVD ISO file, so browse to the location where it was downloaded and select the file. You should see something like:

Figure 2. The screen in VirtualBox where you locate the Red Hat Linux DVD ISO file.

Once you have located the file, press Start. This will begin installation of the operating system.

The Red Hat software should start loading. You can use the up arrow to select Install Red Hat Enterprise Linux 7.4 (or whatever the current version is), then press Return.

Follow the prompt to choose the language, and then on the Installation Summary screen, do the following.

Under System, click Installation Destination. Then, under the Device Selection section, click the hard drive icon where it says "15 GiB…ATA VBOX HARDDISK". You may have to click twice until it is blue and a check mark appears over it, as shown below, then click Done.

Figure 3. Part of the screen where the hard drive is selected for the Linux installation.

Then, click where it says Software Selection, and under the Base Environment column select Server with GUI, then press Done (none of the add-ons in the right column need to be selected).

There is no actual need to have a GUI, but it is easier to work with for these installation instructions.

You may have to wait a little bit for the system to check to make sure everything can be found, and once it is ready click on the Begin Installation button.

While the installation is underway, click on the Root Password section under User Settings and enter a password, such as demouser. Then, in the User Creation section, create a user for the system, such as:

Full name: emerse
User name: emerse

Make this user administrator
password: demouser

These usernames/passwords are not needed by the EMERSE application so they can be anything.

Once the installation is complete, click on the Reboot button. After the machine has rebooted, click on Licensing Information to accept the license, then click the Finish Configuration button. A few other setting options will then be offered, and then the system is ready to use.

It may also be necessary to turn on the network within the VM so that files can be downloaded to the VM. To do this, go to Applications → System Tools → Settings → Network and change the Wired option to On.

Component Installation

The remainder of this guide covers the process of installing and configuring the application server where the EMERSE application will be deployed. Specifics for installing each of the components follow.

For some of the commands shown in this guide, specific version numbers will be shown as examples. Depending on the software component, the version number you download and install may differ slightly from these examples. Make note of these potential differences when executing the instructions on the command line to ensure that they work successfully.

Also note that most of these directions are aimed at a Linux installation, but we have provided some details for Windows for those interested in trying to install EMERSE on a Windows server.

For the purposes of this installation guide, all components will be stored in a directory called /app. In practice it can be any directory, but for consistency this guide uses /app. To create the directory, go to Applications → Favorites → Terminal and in the Terminal type:

sudo mkdir /app

After entering a sudo command you will likely have to enter your system password (demouser) and press Return. Depending on your Linux privileges, you may have to add sudo in front of most of the commands shown in this guide. If permissions still cause a problem, it may be necessary to change file/directory ownership (for example, if directories end up being owned by root). To do this, type the following in succession:
su - root
chown -R emerse /app
exit

In a few cases in the following directions it will be necessary to edit text files. There are multiple Linux programs for making these edits (e.g., vi, emacs, etc.), but a simple one available on this virtual machine is gedit, which opens files in a regular graphical text editor. To use it, open the file from the command line by typing the following, where filename.txt is the name of the file to open:

sudo gedit filename.txt

It may also be necessary to turn on the internet connection to the virtual machine so that software can be downloaded to it. To do this, go to the upper right corner of the screen and click on the battery/power icon, then click where it says Wired Off and choose Connect.

Figure 4. Turning on the internet for the virtual machine

Table 1. Required Software
Name	Version	Description	Download URL
Java JDK	11	Java Development Kit/SDK (JDK)	https://openjdk.java.net/projects/jdk/11/ (open source version) or https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html (licensed version)
Apache Tomcat	9.0.7+	Java Servlet Engine	https://tomcat.apache.org/download-90.cgi
Apache Solr	7.3.1 (only, not higher)	Indexing/Information Retrieval System	http://lucene.apache.org/solr/
Apache ActiveMQ	5.15.x	A message broker	http://activemq.apache.org/download.html
Oracle Database	11g or higher	Express Edition is fine for testing or light production use. Setup scripts will be provided by the EMERSE team upon request	http://www.oracle.com/technetwork/database/database-technologies/express-edition/overview/index.html
EMERSE WAR file	latest version	EMERSE Web Archive File (WAR file) deployment and configuration	http://project-emerse.org/download.html

Table 2. Additional Software included within the EMERSE WAR file
Name	Version	Description
Hibernate	5.2.17	Provides database-agnostic data persistence
Lucene	7.3.1	Provides search capabilities by reading Solr index files directly
Spring Security	5.0.5	Provides security services
Chartist.js	0.11.0	Used for making the demographic summary charts

Java

The first step in installing EMERSE is to download and install a Java Development Kit (JDK) on the server. Java 11 or higher is required because EMERSE is compiled with that version. However, as of this writing EMERSE does not use any features beyond those available in Java 8.

Operating systems often include a Java runtime that is available on the PATH. All of the remaining components that will be installed depend on the location of this JDK installation, so note the path to the JDK; it will be needed in subsequent installation steps.
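If you are unsure where the JDK behind the java command actually lives, a small helper can resolve it. This is a sketch under stated assumptions, not part of EMERSE: the function names are hypothetical, it relies on readlink -f from GNU coreutils (standard on RHEL and SUSE), and it assumes the usual <jdk_dir>/bin/java layout.

```shell
# Hypothetical helper: turn a path like /app/jdk1.8.0_211/bin/java into the
# directory to use for JAVA_HOME (/app/jdk1.8.0_211).
jdk_dir_from_java_bin() {
  # Drop the last two path components: "java" and then "bin"
  dirname "$(dirname "$1")"
}

# Resolve the java found on the PATH through any symlinks (for example the
# /usr/bin/java -> /etc/alternatives/java chain on RHEL) and print its home.
current_jdk_dir() {
  local java_bin
  java_bin=$(command -v java) || { echo "java not found on PATH" >&2; return 1; }
  jdk_dir_from_java_bin "$(readlink -f "$java_bin")"
}
```

Running current_jdk_dir prints the directory to note for later steps. With some OS-packaged Java 8 installs the java on the PATH resolves to a .../jre/bin/java path, in which case the result is the JRE directory rather than the JDK, so check the result before using it as JAVA_HOME.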

Linux: After downloading the JDK appropriate for the Linux server, move it to the /app/ directory and unpack it. Assuming that the file was downloaded to /home/emerse/Downloads/, move it with the following command (note that the file names below refer to Java 8, but they should be similar for Java 11):

mv /home/emerse/Downloads/jdk-8u211-linux-x64.tar.gz /app/

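Change to the /app directory and unpack it:

cd /app
tar -xvf jdk-8u211-linux-x64.tar.gz

Alternatively, if you downloaded the RPM package instead of the tar.gz archive, install it as root with the command below. Note that the RPM installs the JDK into a system directory (such as /usr/java) rather than /app, so adjust the JDK paths in later steps accordingly.

rpm -ivh jdk-8u211-linux-x64.rpm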

Windows: After downloading, run the executable. If the target workstation has an existing Java installation that you wish to maintain, unselect the “public JRE” option in the installer as shown below.

Figure 5. Setting up Java for Windows

Confirm Java installation

Confirm installation of java by typing into Terminal:

java -version

If there are multiple versions of Java installed, and the newly installed version is not set as the default, then confirm the specific installation using:

path_to_java/java -version

For example:

/app/jdk1.8.0_211/bin/java -version

You should see a message that says something like:

java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

As noted earlier, the installed version should be Java 11 or higher, so your output will differ; the output above is only an example.
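To check the version from a script rather than by eye, the version string can be parsed. Java 8 and earlier use the legacy 1.x form (1.8.0_211 is Java 8), while Java 9 and later put the major version first (11.0.3 is Java 11). A hedged sketch with hypothetical function names, assuming the first line of java -version output has the usual java version "..." or openjdk version "..." form:

```shell
# Hypothetical helper: map a Java version string to its major version.
#   legacy scheme  "1.8.0_211" -> 8
#   modern scheme  "11.0.3"    -> 11   ("11" and "11-ea" also give 11)
java_major_from_string() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Check that a java binary (default: java on the PATH) is version 11 or newer.
check_java_11() {
  local java_bin="${1:-java}" version major
  # java -version prints to stderr; pull out the quoted version string
  version=$("$java_bin" -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
  if [ -z "$version" ]; then
    echo "could not determine the version of $java_bin" >&2
    return 1
  fi
  major=$(java_major_from_string "$version")
  if [ "$major" -ge 11 ]; then
    echo "OK: $java_bin is Java $version"
  else
    echo "$java_bin is Java $version; EMERSE needs Java 11 or higher" >&2
    return 1
  fi
}
```

For example, check_java_11 /app/jdk-11.0.3/bin/java (a hypothetical install path) prints an OK line for a Java 11 JDK and returns non-zero for Java 8.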

Apache Tomcat

EMERSE is packaged as a Java-based WAR file that will be deployed to the Tomcat servlet engine, which depends on a Java JDK. Tomcat version 9.0.7 or higher is required. The Tomcat download page lists a number of components available for download; only the "Core" software is required. See the Required Software section.

Linux:

From the binary distributions listed on the page, choose the "Core" tar file (for example, the tar.gz file). Move the tar file from the Downloads directory to the desired directory for installation.

mv /home/emerse/Downloads/apache-tomcat-9.0.21.tar.gz /app/

Change to the /app directory and extract the tar file using:

tar -zxvf apache-tomcat-9.0.21.tar.gz

Ensure Tomcat uses the desired Java runtime (it may not be necessary to do anything if the environment variables already point to the correct version of Java). To specify the Java installation, edit the startup script (startup.sh) found in /path/to/tomcat/bin (for this installation guide, /app/apache-tomcat-9.0.21/bin) so that it points to the Java installation directory, as follows:

To edit:

gedit /app/apache-tomcat-9.0.21/bin/startup.sh

Add this line:

export JAVA_HOME=/path/to/jdk_install

For example,

# ----------------------------------
# Start Script for the CATALINA Server
# ----------------------------------

export JAVA_HOME=/app/jdk1.8.0_211/
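Instead of editing startup.sh, Tomcat also supports a bin/setenv.sh file: catalina.sh, which both startup.sh and shutdown.sh call, sources it if it exists, so a JAVA_HOME set there applies to both scripts and stays separate from Tomcat's own files. The function below is a hypothetical helper that writes such a file; the paths are this guide's examples.

```shell
# Hypothetical helper: create <tomcat_dir>/bin/setenv.sh containing JAVA_HOME.
# Note: this overwrites any existing setenv.sh.
write_tomcat_setenv() {
  local tomcat_dir="$1" jdk_dir="$2"
  cat > "$tomcat_dir/bin/setenv.sh" <<EOF
# Local Tomcat settings, sourced by catalina.sh at startup and shutdown
export JAVA_HOME=$jdk_dir
EOF
}

# Example (paths from this guide):
# write_tomcat_setenv /app/apache-tomcat-9.0.21 /app/jdk1.8.0_211/
```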

It has also been observed that EMERSE requires a higher virtual memory limit than the default on some Linux systems, because Lucene memory-maps the index files. The limit can be raised with the ulimit command, as below, before starting Tomcat, or the command can be added directly to the startup script startup.sh. For example,

# ----------------------------------
# Start Script for the CATALINA Server
# ----------------------------------

export JAVA_HOME=/app/jdk1.8.0_211/

ulimit -v unlimited

When the virtual memory limit is set too low, an exception like the following will be thrown when Solr or EMERSE is started.

Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
        at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
        at org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.<init>(CompressingTermVectorsReader.java:118)
        at org.apache.lucene.codecs.compressing.CompressingTermVectorsFormat.vectorsReader(CompressingTermVectorsFormat.java:85)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:132)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:96)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:63)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:53)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
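The ulimit -v setting shown above is the maximum virtual memory for processes started from that shell (reported in KiB), and Lucene's memory-mapped index files count against it. A quick pre-flight check can flag a finite limit before Tomcat or Solr is started. This is a hypothetical sketch; treating every finite value as a warning is a deliberately conservative simplification.

```shell
# Hypothetical helper: report whether the virtual memory limit is unlimited.
# Takes the limit as an argument (default: the current shell's `ulimit -v`).
vmem_limit_ok() {
  local limit="${1:-$(ulimit -v)}"
  if [ "$limit" = "unlimited" ]; then
    echo "ok: virtual memory limit is unlimited"
  else
    echo "warning: virtual memory limit is $limit KiB; consider 'ulimit -v unlimited'" >&2
    return 1
  fi
}

# Example: only start Tomcat when the current shell's limit is unlimited
# vmem_limit_ok && /app/apache-tomcat-9.0.21/bin/startup.sh
```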

Windows:

Download the zip file from binary distributions under the "Core" section. Unzip this file in a desired directory. This will become the Tomcat installed directory. Edit the startup.bat file found under bin to point to the directory where the JDK was installed.

set JAVA_HOME=C:\path_to\jdk_install

Changing the port

Since Oracle-XE will be using port 8080, change the Tomcat port from its default (8080) to 8090. Go to the tomcat/conf folder and edit the server.xml file to change 8080 to 8090. The server.xml file should be located here:

/app/apache-tomcat-9.0.21/conf/server.xml

To edit the file:

gedit /app/apache-tomcat-9.0.21/conf/server.xml

After the change, the relevant part should look something like:

<Connector port="8090" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
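The same edit can be scripted with sed, which is convenient when setting up several servers. A minimal sketch, assuming the connector's opening tag and port attribute are on the same line, as in Tomcat's default server.xml (the function name is hypothetical):

```shell
# Hypothetical helper: change the HTTP connector port in server.xml.
# -i.bak edits in place and keeps the original as server.xml.bak.
set_tomcat_http_port() {
  local server_xml="$1" old_port="$2" new_port="$3"
  sed -i.bak "s/<Connector port=\"$old_port\"/<Connector port=\"$new_port\"/" "$server_xml"
}

# Example (path from this guide):
# set_tomcat_http_port /app/apache-tomcat-9.0.21/conf/server.xml 8080 8090
```

Afterward, grep -n 'Connector port' on the same file confirms the change; restart Tomcat for the new port to take effect.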

Starting and Stopping the Tomcat Server

To start the server use:

/path/to/tomcat/bin/startup.sh

To stop the server use:

/path/to/tomcat/bin/shutdown.sh

For this installation guide it would be:

/app/apache-tomcat-9.0.21/bin/startup.sh

/app/apache-tomcat-9.0.21/bin/shutdown.sh

If you are already in the bin directory, you may need to use ./startup.sh and ./shutdown.sh instead.

Confirm Tomcat installation

Point a browser to http://hostname:8090/, or if locally installed point it to http://localhost:8090/. A Tomcat welcome screen should appear.
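The same check can be run from the command line, which is handy on a server without a browser. A hedged sketch assuming curl is installed (the function name is hypothetical): it prints the HTTP status code, and a 200 from http://localhost:8090/ indicates the Tomcat welcome page is being served.

```shell
# Hypothetical helper: print the HTTP status code for a URL.
# curl prints 000 and exits non-zero if nothing is listening;
# --max-time stops the check from hanging.
http_status() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$1"
}

# Example: expect 200 once Tomcat is up on port 8090
# http_status http://localhost:8090/
```

The helper can also be pointed at the Solr and ActiveMQ consoles later in this guide; the ActiveMQ console requires a login, so a 401 there still means its web console is up.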

Apache Solr

EMERSE uses Apache Solr for searching documents. See the Required Software section for the version to use. The Solr version must be exactly the one specified; otherwise, the sample indexes provided may not be usable, and EMERSE may not function.

After downloading the file and moving it to the desired directory, change to that directory and unpack the file as before. For example:

mv /home/emerse/Downloads/solr-7.3.1.tgz /app/

tar -zxvf solr-7.3.1.tgz

There is no need to download the source (src) files for Solr, so avoid downloading the files with src in the filenames.

Make sure that Solr is using the correct JDK. It may not be necessary to do anything if the environment variables already point to the correct version of Java. Nevertheless, you can specify the exact path by editing the file:

gedit /app/solr-7.3.1/bin/solr.in.sh

Within solr.in.sh look for the line:

#SOLR_JAVA_HOME=""

And change the line by removing the comment (#) character and specifying the exact path to the JDK, for example:

SOLR_JAVA_HOME="/app/jdk1.8.0_211/"
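For scripted installs, the same change can be made non-interactively with sed. A minimal sketch, assuming the stock solr.in.sh where the line reads #SOLR_JAVA_HOME="" (the function name is hypothetical). It uses | as the sed delimiter because the JDK path contains slashes, so the path itself must not contain | or &.

```shell
# Hypothetical helper: set (and uncomment) SOLR_JAVA_HOME in solr.in.sh.
# -i.bak keeps the original file as solr.in.sh.bak.
set_solr_java_home() {
  local solr_in="$1" jdk_dir="$2"
  sed -i.bak "s|^#*SOLR_JAVA_HOME=.*|SOLR_JAVA_HOME=\"$jdk_dir\"|" "$solr_in"
}

# Example (paths from this guide):
# set_solr_java_home /app/solr-7.3.1/bin/solr.in.sh /app/jdk1.8.0_211/
```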

Start and Stop Solr

To start the server use:

/app/solr-7.3.1/bin/solr start

To stop the server use:

/app/solr-7.3.1/bin/solr stop

Depending on your current directory, you may need to use ./solr start or bin/solr start instead of solr start.

Confirm Solr installation

Using a browser, navigate to http://hostname:8983/solr (for example, http://localhost:8983/solr) to confirm the Solr installation.

Apache ActiveMQ

EMERSE requires ActiveMQ to enable background processing of a number of internal tasks. The installation steps are very similar to those for Apache Tomcat. First, download the recommended file (see the Required Software section).

Move the file to the desired directory:

mv /home/emerse/Downloads/apache-activemq-5.15.9-bin.tar.gz /app/

From within /app, unpack the ActiveMQ distribution:

tar -xvf apache-activemq-5.15.9-bin.tar.gz

Ensure ActiveMQ uses the desired Java runtime (it may not be necessary to do anything if the environment variables already point to the correct version of Java). To specify the JDK, do the following:

Linux:

Modify the file activemq:

gedit /app/apache-activemq-5.15.9/bin/activemq

Add the line (near the beginning of the file, perhaps under the license):

export JAVA_HOME=/app/jdk1.8.0_211/

Windows:

Modify the file activemq.bat located in the bin directory under ACTIVEMQ_HOME:

SET JAVA_HOME=c:\path_to\java_install

Starting and Stopping ActiveMQ

Startup (from the bin directory, or using the full path):

./activemq start

/app/apache-activemq-5.15.9/bin/activemq start

Shutdown (from the bin directory, or using the full path):

./activemq stop

/app/apache-activemq-5.15.9/bin/activemq stop

Confirm ActiveMQ installation

The default port for the ActiveMQ GUI is 8161. Confirm it is up and running by pointing a web browser to http://hostname:8161/admin (for example, http://localhost:8161/admin). The default username/password is admin/admin.

A note about the embedded ActiveMQ

EMERSE comes with an embedded version of ActiveMQ that is off by default. We do not recommend using the embedded version, since it exists primarily to ease internal development. Instead, use a separate installation as described above. If you do want to turn the embedded version on for some reason, that can be done through Spring profiles. Example:

export CATALINA_OPTS="-Dspring.profiles.active=ldap,activemq"

Oracle Database

An Oracle database server is needed for EMERSE. For the purposes of getting started, Oracle makes available a free “Express Edition” (Oracle-XE) that is fully functional. This free edition of Oracle supports 1 core and up to 10 GB of disk space, which should be enough to support a few users in a demonstration version, or even light use in a production setting. Oracle Express can be downloaded from the Oracle Express Edition page listed in Table 1.

Any of the available Oracle Database Editions should work with EMERSE. Michigan Medicine uses the Enterprise version in the production environment, but EMERSE does not depend on any Oracle components that would require additional licensing (e.g., replication, Oracle RAC, advanced compression, etc.).

EMERSE was initially developed at Michigan Medicine for internal use, and Oracle was chosen because it was already licensed by the University. It is the only non-open source component used by EMERSE. It should theoretically be possible to change the database to an open source one, or to another database such as SQL Server, but we do not recommend it at this time due to the effort it would take. Further, this approach is currently untested and unsupported by the EMERSE team.

Instructions for installing Oracle-XE can be found here:

https://docs.oracle.com/cd/E17781_01/install.112/e18802/toc.htm#XEINL101

Download the Linux Oracle Express edition, for example "Oracle Database Express Edition 11g Release 2 for Linux x64". This will likely be found in the "Prior Release Archive".

You will need to accept the License agreement and have an account (free) set up with Oracle to download the file.

Move the file to the desired directory:

mv /home/emerse/Downloads/oracle-xe-11.2.0-1.0.x86_64.rpm.zip /app/

From within /app, unzip the file:

unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip

It will unzip into a directory called Disk1.

Install the database from the RPM (as root):

rpm -ivh /app/Disk1/oracle-xe-11.2.0-1.0.x86_64.rpm

If you encounter an error installing Oracle-XE related to swap space (e.g., "The system does not meet the minimum requirements for swap space. Based on the amount of physical memory available, Oracle Database 11g Express Edition requires 2048 MB of swap space…"), see the directions in Creating a swap file, below.

After the installation is complete, follow the directions to configure the application by issuing the following command:

/etc/init.d/oracle-xe configure

Accept all of the defaults. Any password can be entered, but for this example it may be easiest to use demouser.

Confirm Oracle-XE installation

Point your browser to:

http://localhost:8080/apex/

http://localhost:8080/apex/f?p=4950

Creating a swap file

Ignore this section if you do not encounter a swap space error when installing Oracle-XE.

If you encounter an error installing Oracle-XE due to a lack of swap space (see the warning above), you can follow the Red Hat Linux directions for increasing the swap space on your machine, focusing specifically on the section called "Creating a Swap File". Those directions are briefly summarized below.

In Terminal, type the following commands (as root, or prefixed with sudo):

dd if=/dev/zero of=/swapfile bs=1024 count=2560000

chmod 0600 /swapfile

mkswap /swapfile

swapon /swapfile

The additional swap memory should now be available. You can confirm this by typing in the Terminal:

free
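The dd command above writes count blocks of bs bytes, so bs=1024 count=2560000 creates a 2,560,000 KiB (2500 MiB) file, comfortably above the 2048 MB that Oracle-XE asks for. The hypothetical helper below derives the count from a size in MiB and prints the same four commands, so they can be reviewed before running.

```shell
# Hypothetical helper: print the swap-file commands for a size given in MiB.
# With bs=1024 each block is 1 KiB, so count = size_mib * 1024.
swapfile_commands() {
  local swap_path="$1" size_mib="$2"
  local count=$((size_mib * 1024))
  printf '%s\n' \
    "dd if=/dev/zero of=$swap_path bs=1024 count=$count" \
    "chmod 0600 $swap_path" \
    "mkswap $swap_path" \
    "swapon $swap_path"
}

# Review the output, then run it as root, e.g.:
# swapfile_commands /swapfile 2500 | sudo sh
```

Note that swapon only enables the file until the next reboot; the Red Hat directions also describe adding an /etc/fstab entry to make it permanent.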

Oracle SQL Developer

While not necessary for EMERSE, it may be useful to install the Oracle SQL Developer tool, or another database tool of your liking (e.g., DbVisualizer). The following directions are for installing the Oracle SQL Developer tool. This tool requires the Java JDK, so make sure Java is already installed. The SQL Developer software can be downloaded at:

http://www.oracle.com/technetwork/developer-tools/sql-developer/overview/index.html

Accept the license and download the Linux RPM. You can run the RPM installer from anywhere, since by default the software will be installed in the /opt/sqldeveloper/ directory.

rpm -Uhv sqldeveloper-19.1.0.094.2042.noarch.rpm

Run the setup script:

/opt/sqldeveloper/sqldeveloper.sh

When prompted, enter the full path name of the Java JDK:

/app/jdk1.8.0_211/

The SQL Developer App should then launch automatically. The installation should also create a shortcut to the app that can be reached via Applications → Programming → SQL Developer.

If launching does not work using the GUI, then go to Terminal and type:

cd /opt/sqldeveloper/

sudo ./sqldeveloper.sh

EMERSE Install and Initial Configuration

Solr indexes

We provide four indexes with the default distribution. Three of the indexes are populated with sample data to help with testing the initial setup and configuration, and the fourth contains a list of words used for spell-checking, described below.

The four indexes are:

patient
patient-slave
documents
dictionary

Do not change the names of these indexes.

Two of the indexes (patient and patient-slave) store data about the patients (not real patients in the files that we distribute), and are meant to be populated from the PATIENT table within the Oracle database. These indexes are required by EMERSE and the overall structure/definition of them should not be changed even for your own localized installation. The patient demographic data stored within them will automatically be replaced from the database tables when EMERSE is running. The two patient indexes are used to rapidly summarize details used for graphing the demographics.

A third index is called documents and holds PubMed abstracts (containing no protected health information) as placeholders for sample documents. For your production system you can optionally rename this to something that is more suitable (you would then have to update the name in the Oracle table called SOLR_INDEX, described in the Data Guide). Additionally, other aspects of the configuration for this document index will have to be changed from the included example to match your local institutional needs (e.g., the metadata types included for each document).

The fourth index is called dictionary which holds words that are used by EMERSE for spell checking of terms entered by users. This dictionary index is recommended but is not required. Additionally, no changes should need to be made to this dictionary and it can be used as-is. It is also worth pointing out that the dictionary is in an older Lucene format and not the newer Solr format. This should have no practical implications for use, except that it won’t show up in the standard Solr user interface even though the EMERSE application itself can access it.

Contact the EMERSE team for the download links to these four sample indexes.

Once the four indexes have been downloaded, move them to the default directory within Solr, /app/solr-7.3.1/server/solr/:

mv /home/emerse/Downloads/indexes/patient.zip /app/solr-7.3.1/server/solr/

mv /home/emerse/Downloads/indexes/patient-slave.zip /app/solr-7.3.1/server/solr/

mv /home/emerse/Downloads/indexes/documents.zip /app/solr-7.3.1/server/solr/

mv /home/emerse/Downloads/indexes/dictionary.zip /app/solr-7.3.1/server/solr/

Change to the /app/solr-7.3.1/server/solr/ directory and unzip the four files:

unzip patient.zip

unzip patient-slave.zip

unzip documents.zip

unzip dictionary.zip
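The moves and unzips above can be combined into one pass that first checks that all four archives are present. A minimal sketch; the function names are hypothetical, and it assumes the archives are named exactly patient.zip, patient-slave.zip, documents.zip, and dictionary.zip, as in this guide.

```shell
# The four sample indexes this guide expects (their names must not change).
EMERSE_CORES="patient patient-slave documents dictionary"

# Hypothetical helper: print each expected <core>.zip missing from a directory.
missing_index_zips() {
  local dir="$1" core
  for core in $EMERSE_CORES; do
    [ -f "$dir/$core.zip" ] || echo "$core.zip"
  done
}

# Hypothetical helper: move all four archives into dest and unzip them there,
# refusing to start if any archive is missing.
install_indexes() {
  local src="$1" dest="$2" missing core
  missing=$(missing_index_zips "$src")
  if [ -n "$missing" ]; then
    echo "missing in $src: $missing" >&2
    return 1
  fi
  for core in $EMERSE_CORES; do
    mv "$src/$core.zip" "$dest/" && (cd "$dest" && unzip -q "$core.zip") || return 1
  done
}

# Example (paths from this guide):
# install_indexes /home/emerse/Downloads/indexes /app/solr-7.3.1/server/solr
```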

For the demonstration installation the Solr indexes are placed in the default Solr directory. However, for a full production installation we recommend that these indexes be located outside the Solr installation directory, perhaps on externally connected storage. To specify a directory outside of the default location, open the solr.in.sh file (located in /app/solr-7.3.1/bin/) and edit the SOLR_HOME property. For example:
SOLR_HOME=/app/data/SOLR_DATA_DIR

Also move the solr.xml file to where the indexes are. There may already be one there from the initial Solr installation, so the original one can be removed, or renamed if you want to keep it.

mv /home/emerse/Downloads/indexes/solr.xml /app/solr-7.3.1/server/solr/

Verify the indexes

If Solr is already running, stop it first:

/app/solr-7.3.1/bin/solr stop

Then restart it:

/app/solr-7.3.1/bin/solr start

There may be times when you have several sets of indexes in various locations (e.g., for testing). In such a case it is possible to start Solr and have it point to a set of indexes not located in the default location (or in the location specified by SOLR_HOME). To do this, use the -s flag when starting Solr. For example, if the new location is /other_dir/data/SOLR_DATA_DIR, you would use:
/app/solr-7.3.1/bin/solr start -s /other_dir/data/SOLR_DATA_DIR

Visit http://localhost:8983

Check to see if the three Solr indexes (patient, patient-slave, and documents) are listed under the Core Selector button:

Figure 6. The Core Selector option in Solr

The dictionary index will not appear here since it is in an older Lucene format, not the Solr format.

Select each core to verify that the proper counts are there. For the patient and patient-slave cores, there should be 1750 "documents" (each representing a patient), and for the documents core there should be a little over 125,000 documents.

Figure 7. Verify the Patient index using the Core Selector option

Figure 8. Verify the documents index using the Core Selector option
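The same counts can be read from the command line through Solr's select handler: a query for all documents (q=*:*) with rows=0 returns only the total, in the numFound field. A hedged sketch assuming Solr is on localhost:8983 and curl is installed; the function names are hypothetical, and the sed pattern is a quick text match rather than a real JSON parser.

```shell
# Hypothetical helper: pull the numFound value out of a Solr JSON response on stdin.
solr_num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: print the document count of each sample core.
solr_core_counts() {
  local base="${1:-http://localhost:8983/solr}" core count
  for core in patient patient-slave documents; do
    count=$(curl -s --max-time 10 "$base/$core/select?q=*:*&rows=0&wt=json" | solr_num_found)
    echo "$core: ${count:-no response}"
  done
}
```

With the sample indexes, solr_core_counts should report 1750 for patient and patient-slave and a little over 125,000 for documents.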

Database initialization

Provided with the EMERSE distribution is a set of files containing SQL statements that create all needed database objects and sample data, allowing the EMERSE application to start up with a default set of database objects and sample data in the patients, research studies, synonyms and tables. These scripts should be run as the user and schema set up for the EMERSE application (this will be set by each implementing site), not as a system or sysdba user. We recommend a schema named emerse. The account doesn’t require a DBA role, but it needs to be able to create database objects such as tables, indexes, sequences, and views.

To set up the user/schema for EMERSE, do the following.

Launch SQL Developer.

In SQL Developer click on the green plus-sign icon to make a New Connection… or go to File → New… Database Connection.

Give any name to the Connection name, such as Administrator Account. The username will be system and the password will be the same password used when initially setting up Oracle. For the directions in the guide, the password demouser was used. Press Test to test the connection (it should say Status: Success if it worked). Save the Connection, and then press Connect.

Figure 9. Setting up the Administrator connection in Oracle, which will then be used to create the emerse account

Now create the actual EMERSE application user for Oracle (this is the database account used by the EMERSE application, not a user of EMERSE). In the SQL Worksheet for this connection, enter:

create user emerse identified by demouser default tablespace USERS quota unlimited on users;
grant create session to emerse;
grant create table to emerse;
grant create sequence to emerse;
grant create procedure to emerse;

Then click the Run Script icon.

If you entered the wrong password when creating the emerse account, you can change it with the following command:

alter user emerse identified by demouser;
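If SQL*Plus is available on the server, the grants can also be checked from the command line. This is a sketch under some assumptions: sqlplus is on the PATH, and the EZConnect string passed to the hypothetical list_privs helper (host, port 1521 and SERVICE_NAME) is a placeholder for your own database. The hypothetical missing_privs helper prints any of the four privileges granted above that the account lacks. Note that the account is also expected to create views; Oracle requires the CREATE VIEW privilege for that, and the grants above do not include it, so if the setup script fails with ORA-01031 (insufficient privileges), grant it and add it to REQUIRED_PRIVS.

```bash
#!/bin/bash
# Sketch: confirm the emerse account received the system privileges granted above.
# Assumes SQL*Plus (sqlplus) is installed and on the PATH.
REQUIRED_PRIVS="CREATE PROCEDURE
CREATE SEQUENCE
CREATE SESSION
CREATE TABLE"

# Prints the connected user's directly granted system privileges, one per line.
# $1 is an EZConnect string such as emerse/demouser@//localhost:1521/SERVICE_NAME
# (SERVICE_NAME is a placeholder for your database service).
list_privs() {
  sqlplus -s "$1" <<'SQL'
set heading off feedback off pagesize 0
select privilege from user_sys_privs order by privilege;
exit
SQL
}

# Reads privileges on stdin (surrounding blanks are trimmed) and prints
# each required privilege that does not appear in the input.
missing_privs() {
  local have p
  have="$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
  while IFS= read -r p; do
    [ -n "$p" ] || continue
    printf '%s\n' "$have" | grep -qxF -- "$p" || printf '%s\n' "$p"
  done <<EOF
$REQUIRED_PRIVS
EOF
}
```

For example:

list_privs 'emerse/demouser@//localhost:1521/SERVICE_NAME' | missing_privs

No output means all four privileges are present.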

Create another connection: File → New… Database Connection.

The connection name can be anything, such as EMERSE connection. For the username and password, enter the values used to create the account in the SQL above; in this case, the username is emerse and the password is demouser.

Figure 10. Setting up the emerse application connection in Oracle

Test the connection by pressing Test, and you should see Status: Success. Check the box to Save Password and then press the Save button to save this connection. Then press Connect.

At this point you should be ready to run the SQL script that sets up the database for EMERSE and loads the sample data (but not the notes themselves, which are not loaded into the database).

Contact the EMERSE team for the download link to the SQL setup script.

Unzip the file, if necessary (the actual filename may differ between versions):

unzip /home/emerse/Downloads/sql_script_compressed.zip

Execute the file in a SQL tool. For example, to run the script in SQL Developer, go to File → Open… and select the SQL file. The file should open in a Worksheet tab. Click the Run Script icon and wait for the script to finish. If a Select Connection prompt appears, choose EMERSE connection and press OK.

The script has about 140,000 lines so it will take a while to run.

EMERSE Configuration

EMERSE is configured primarily through one file, emerse.properties. This file tells EMERSE how to connect to Oracle, ActiveMQ, Solr, and LDAP (if used), where the Lucene index files are, and additional internal configuration or presentation information such as a contact email.

Contact the EMERSE team for the download link to the properties file.

After downloading, move the properties file to the home directory of the emerse user using the following command:

mv /home/emerse/Downloads/emerse.properties /home/emerse/

You will need to make some changes to the emerse.properties file to reflect your installation properties such as the URL of the Oracle database if it is running on a separate server. Information on configuring EMERSE application properties is located in the Configuration Guide. If you have followed our directions for the example setup/installation you likely will not need to change any database connection information.

If you cannot store the emerse.properties file in your home directory, you must tell EMERSE where it is by setting the system property emerse.properties.filepath in the Tomcat server's JVM settings. You can do that by creating the file setenv.sh in the bin directory of your Tomcat installation (use setenv.bat on Windows) with the following content:

/app/apache-tomcat-9.0.21/bin/setenv.sh

CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/emerse.properties"

Log Configuration

Out of the box, EMERSE logs to /app/apache-tomcat-9.0.21/logs/catalina.log (or the corresponding logs directory of your Tomcat installation). If you want to change how much is logged, or where the log goes, you can specify a log4j2 configuration file with the system property log4j.configurationFile=/path/to/file.xml, which can be added to the /app/apache-tomcat-9.0.21/bin/setenv.sh script like so:

/app/apache-tomcat-9.0.21/bin/setenv.sh

CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/emerse.properties"
CATALINA_OPTS="$CATALINA_OPTS -Dlog4j.configurationFile=/path/to/somewhere/log4j2.xml"

You don’t need to set emerse.properties.filepath in order to change the log settings; the snippet above just shows how the settings can be stacked.
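If you edit setenv.sh more than once (for example, adding the log setting after the properties path), a small helper can keep the file free of duplicate lines. The add_catalina_opt function below is hypothetical, not part of EMERSE or Tomcat; it relies only on the fact that Tomcat's catalina.sh sources bin/setenv.sh at startup, so each appended line extends CATALINA_OPTS.

```bash
#!/bin/bash
# Sketch: add a -D system property to Tomcat's bin/setenv.sh exactly once.
# Each line written has the form: CATALINA_OPTS="$CATALINA_OPTS <option>"
add_catalina_opt() {
  local setenv="$1" opt="$2"
  local line="CATALINA_OPTS=\"\$CATALINA_OPTS ${opt}\""
  # Create the file if it does not exist yet.
  touch "$setenv" || return 1
  # -x: whole-line match, -F: fixed string, so re-running adds nothing new.
  grep -qxF -- "$line" "$setenv" || printf '%s\n' "$line" >> "$setenv"
}
```

For example:

add_catalina_opt /app/apache-tomcat-9.0.21/bin/setenv.sh -Dlog4j.configurationFile=/path/to/somewhere/log4j2.xml

Restart Tomcat afterwards so the new option takes effect.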

The default log configuration file can be found at /app/apache-tomcat-9.0.21/webapps/emerse/WEB-INF/classes/log4j2.xml, or you can contact us to get the latest copy.

WAR file installation

After installing the application server and configuring the database with default settings, the next step in getting EMERSE up and running is to deploy the EMERSE WAR file.

Contact the EMERSE team for the download link to the EMERSE WAR file.

After downloading, deploy the file. Rename the supplied WAR file to emerse.war and place it in the webapps directory of the Tomcat server. Both can be done in one step with mv:

mv /home/emerse/Downloads/emerse2-4.4.war /app/apache-tomcat-9.0.12/webapps/emerse.war

If Tomcat is not running, start it:

/app/apache-tomcat-9.0.12/bin/startup.sh

If Tomcat is using default settings, the WAR file will be exploded into a number of files in a directory called emerse. This directory includes all the files needed to run the application.
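Unpacking can take a little while after the WAR is copied. The hypothetical helper below waits until the emerse directory (with its WEB-INF subdirectory) appears, assuming Tomcat's default unpackWARs setting. It only shows that the WAR was unpacked, not that the application started successfully.

```bash
#!/bin/bash
# Sketch: wait until Tomcat has unpacked a deployed WAR.
# Assumes default Tomcat settings (unpackWARs="true"), under which
# webapps/emerse.war is exploded into webapps/emerse/.
wait_for_exploded() {
  local webapps="$1" app="$2" timeout="${3:-120}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -d "${webapps}/${app}/WEB-INF" ]; then
      echo "${app} unpacked in ${webapps}"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${webapps}/${app} was not unpacked within ${timeout}s; check the Tomcat logs" >&2
  return 1
}
```

For example:

wait_for_exploded /app/apache-tomcat-9.0.12/webapps emerse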

Confirm EMERSE installation

First, it is a good idea to restart all of the components (Tomcat, Solr, ActiveMQ). Oracle should have been set up to start when the system starts, so it likely does not need to be launched separately.

To stop the main components:

/app/apache-tomcat-9.0.12/bin/shutdown.sh

/app/solr-7.3.1/bin/solr stop

/app/apache-activemq-5.15.6/bin/activemq stop

To re-start the main components:

/app/apache-tomcat-9.0.12/bin/startup.sh

/app/solr-7.3.1/bin/solr start

/app/apache-activemq-5.15.6/bin/activemq start

At this point EMERSE should be available at the URL:

http://hostname:8090/emerse

For example:

http://localhost:8090/emerse
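Tomcat can take a minute or more to deploy EMERSE after a restart, so the page may not respond immediately. The sketch below polls a URL until it answers, assuming curl is installed. It follows redirects (-L), because Tomcat typically redirects /emerse to /emerse/, and treats only a final HTTP 200 as success. The wait_for_http name is hypothetical.

```bash
#!/bin/bash
# Sketch: poll a URL until it returns HTTP 200 or TIMEOUT seconds pass.
# -L follows redirects; --max-time stops a single request from hanging the loop.
# When no response arrives at all, curl reports the status code as 000.
wait_for_http() {
  local url="$1" timeout="${2:-180}" waited=0 code=""
  while [ "$waited" -lt "$timeout" ]; do
    code="$(curl -s -L -o /dev/null --max-time 10 -w '%{http_code}' "$url")"
    if [ "$code" = "200" ]; then
      echo "$url returned HTTP 200"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "$url did not return HTTP 200 within ${timeout}s (last status: ${code:-none})" >&2
  return 1
}
```

For example:

wait_for_http http://localhost:8090/emerse 300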

If the EMERSE application does not come up, the best place to troubleshoot is the log files inside the Tomcat installation:

$TOMCAT_INSTALL_PATH/logs
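To pull likely error lines out of those logs quickly, a sketch like the following can help. It assumes only that error lines contain SEVERE (the level name used by Tomcat's own logging), ERROR (used by log4j2) or the word Exception; recent_errors is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: show the last matching error lines from each file in a log directory.
# Output lines are prefixed with the file name (grep -H).
recent_errors() {
  local logdir="${1:?usage: recent_errors LOG_DIR [LINES]}" lines="${2:-20}"
  local f
  for f in "$logdir"/*; do
    [ -f "$f" ] || continue
    grep -H -E 'SEVERE|ERROR|Exception' "$f" | tail -n "$lines"
  done
}
```

For example:

recent_errors /app/apache-tomcat-9.0.12/logs 20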

Application Diagnostics Checklist

After all the required software is installed, you can verify how the components are wired together by running the diagnostics checklist in the EMERSE app. The checklist page is available at:

http(s)://hostname<:port if applicable>/emerse/diagnostics.html

For example,

http://localhost:8090/emerse/diagnostics.html

The following components are verified on a high level:

Database: Connection to the database is established and a query on sysdate is run.
Lucene: Checks the Lucene path in the configuration file and establishes a connection.
SOLR: Establishes a connection to Solr and runs a simple query.
ActiveMQ: Establishes a connection to ActiveMQ as configured in the property files.
LDAP: Checks if the LDAP profile is provided and, if true, establishes a connection to the LDAP tree.
SOLR_DATE_FIELD: Checks whether documents in Solr have a populated field for CLINICAL_DATE, as mapped in the DOC_FIELD_EMR_INTENT table.
MRN_IN_SOLR: Checks whether the MRNs in the PATIENT table exist in the Solr patient index. The intention is to catch a mismatch between the configured field and the populated one.
SOLR_SOURCE_FIELD: Makes sure documents in Solr have a SOURCE field, and that all values of that field are mapped in the DOCUMENT_SOURCE table.
SOLR_SCHEMA: Checks to make sure the Solr schema XML files are not changed in ways that the software doesn’t expect.
LUCENE_SOLR: Checks a subset of MRNs associated with documents to make sure that the Lucene and Solr indexes match up.

Figure 11. Screen shot from the EMERSE Diagnostics page.

The Diagnostics page can also show some errors extracted from the logs. However, if the hardened profile is active, the Diagnostics page cannot show these error messages, and you will need to view them in the protected Admin page instead. The Admin page is described in the Administrator Guide.

Further testing

To test the installation a bit further, log in and try a search:

The initial username and password are provided just to help you get started. For security, change them once everything is set up.

Choose any button or type something in to get past the Attestation page.
Enter "chest pain" in the Quick Terms Box and press Find Patients.
Click on the button Move patients to Temporary Patient List.
Click the Highlight Documents button.
Click on a cell on the Overview page to see documents with the term "chest pain" highlighted on the following Summaries page.
Click on a Summary to open up an actual document with the term "chest pain" highlighted.
Check to see if the spell checker is working (which uses the dictionary index):
- Click on the Terms button and then on the Term Bundles button.
- Make a new Term Bundle by clicking on the New Term Bundle button.
- Provide a name and description for the Bundle, then press the Save button.
- Start typing a new term that is misspelled such as miocardial. You should see something like:
  Did you mean "myocardial"

If everything worked, it is a very good indication that EMERSE is running well.

Setting startup scripts

While not necessary, it can be helpful to have the system start up automatically every time the server is started or rebooted, rather than entering the startup command for each component every time. To do this, create a shell script containing the commands that launch the three components (Oracle should already be set to launch automatically as part of its installation). The script can be created in the /home/emerse/Documents/ directory, or elsewhere:

cd /home/emerse/Documents/

Create the file:

gedit emerse_startup_script.sh

Enter the following into the script (specific path names may differ between installations):

#!/bin/sh

/app/apache-activemq-5.15.6/bin/activemq start

/app/apache-tomcat-9.0.12/bin/startup.sh

/app/solr-7.3.1/bin/solr start

While not recommended, if you choose to run ActiveMQ in embedded mode, do not start ActiveMQ in this startup script: it will conflict with the embedded ActiveMQ started inside Tomcat, and EMERSE will fail to load. See details above.
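The script above starts the components without waiting for one another. A slightly more defensive variant is sketched below under stated assumptions: it must run under bash (not sh) because it tests ports with bash's /dev/tcp; ActiveMQ and Solr use their default ports (61616 and 8983); and it starts Tomcat last, on the assumption that EMERSE is best deployed once Solr and ActiveMQ are reachable. cron runs @reboot jobs with a minimal environment, so if JAVA_HOME is not defined system-wide, set it at the top of the script. If you run ActiveMQ embedded in Tomcat, delete the ActiveMQ lines.

```bash
#!/bin/bash
# Sketch of a more defensive emerse_startup_script.sh (paths as used in this guide).
# cron runs @reboot jobs with a minimal environment; if JAVA_HOME is not set
# system-wide, uncomment and adjust the next line (placeholder path).
# export JAVA_HOME=/path/to/java

ACTIVEMQ_HOME=/app/apache-activemq-5.15.6
SOLR_DIR=/app/solr-7.3.1
TOMCAT_HOME=/app/apache-tomcat-9.0.12

# Returns 0 once something accepts TCP connections on 127.0.0.1:PORT,
# or 1 after TIMEOUT seconds. Uses bash's /dev/tcp pseudo-device.
wait_for_port() {
  local port="$1" timeout="${2:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if (echo > "/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Runs the given command if its first argument is an executable file; otherwise warns.
run_if_present() {
  if [ -x "$1" ]; then
    "$@"
  else
    echo "skipping: $1 not found or not executable" >&2
    return 1
  fi
}

start_all() {
  # 61616 (ActiveMQ OpenWire) and 8983 (Solr) are the default ports.
  # Remove the ActiveMQ lines if you run ActiveMQ embedded in Tomcat.
  run_if_present "$ACTIVEMQ_HOME/bin/activemq" start &&
    { wait_for_port 61616 60 || echo "ActiveMQ is not listening on 61616 yet" >&2; }
  run_if_present "$SOLR_DIR/bin/solr" start &&
    { wait_for_port 8983 60 || echo "Solr is not listening on 8983 yet" >&2; }
  # Tomcat last, so EMERSE deploys after its dependencies are reachable (assumed order).
  run_if_present "$TOMCAT_HOME/bin/startup.sh"
}

start_all
```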

Make the script executable:

chmod +x emerse_startup_script.sh

Now add the startup script to your crontab. To edit the crontab, type:

crontab -e

If you have followed the directions in this document and installed Red Hat Linux, this will likely open the file in the vi editor. If so, type i to enter insert mode, then type the following line:

@reboot /home/emerse/Documents/emerse_startup_script.sh

Then press Esc and type :wq to save the file and exit. To verify that the crontab entry is there, type:

crontab -l

It should show the line above (beginning with @reboot). At this point, restarting the server should allow all of the required components to startup automatically.
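Editing the crontab by hand works well; if you script the installation instead, a common pattern is to pipe the current crontab through a filter and back into crontab -. The hypothetical add_cron_line filter below appends the @reboot line only if it is not already present, so the command can be re-run safely.

```bash
#!/bin/bash
# Sketch: read a crontab on stdin and write it to stdout with LINE appended,
# unless an identical line is already present.
add_cron_line() {
  local line="$1" current
  current="$(cat)"
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}
```

For example:

crontab -l 2>/dev/null | add_cron_line '@reboot /home/emerse/Documents/emerse_startup_script.sh' | crontab -

When no crontab exists yet, crontab -l prints only an error (discarded here), so the new crontab contains just the @reboot line.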

Next Steps

Once an initial implementation is complete, local customization work will need to be done including identifying document sources, proper indexing with metadata in Solr, and other configurations related to document display in EMERSE. These are detailed in other guides, such as the EMERSE Configuration and Optimization Guide and the EMERSE Data Guide. However, a good place to go to next is the EMERSE Setup Guide which will walk you through additional steps in setting up the system with your own data.

Useful Post-Implementation Details

The solrconfig.xml file is located on the Solr server, inside SOLR_HOME/documents/conf/ (where "documents" is the name of the Solr core/collection).
Server configuration data (e.g., hostname, authentication) is held in fields of the SolrDocumentService; Spring sets their values from the external property files when the server starts.
For changes made to the Solr schema.xml you will need to make corresponding changes in the EMERSE database, described in the Data Guide.
When interacting with the Solr API you may need to use a URL that points to the specific index/collection, such as https://localhost:8983/solr/documents/. To get anything back, however, you also need to specify a request handler, such as select:
```
https://localhost:8983/solr/documents/select?q=RPT_TEXT:asthma
```
As you update the underlying data, note that the patient counts shown in the user interface do not come from the patient database table; they are derived from a count of the unique medical record numbers (MRNs) in the indexed documents. This is intentional, because a patient in the database may have no documents and would therefore return no results within EMERSE. The GUI reads the patient count from the SOLR_INDEX database table (the patient_count column). The application updates this column in the background with an asynchronous batch process that runs periodically and counts the unique MRNs in the Solr index called documents. You can force an update of the counts shown in the UI with the "System Synchronization" feature in the EMERSE admin application (host:port/emerse/admin2/).
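Query values containing spaces or quotes, such as a phrase search, must be URL-encoded. The sketch below lets curl do the encoding with -G and --data-urlencode. It assumes SOLR_URL points at your Solr base URL; the default uses plain http on port 8983, so set SOLR_URL to the https address if your Solr uses TLS, as in the example above. The RPT_TEXT field comes from that example; solr_select is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: run a Solr select query against one core, letting curl URL-encode it.
# -G appends the --data-urlencode values to the URL as a query string.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

solr_select() {
  local core="$1" query="$2" rows="${3:-10}"
  curl -s -G "${SOLR_URL}/${core}/select" \
    --data-urlencode "q=${query}" \
    --data-urlencode "rows=${rows}"
}
```

For example, to return up to five documents containing the phrase "chest pain":

solr_select documents 'RPT_TEXT:"chest pain"' 5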