EMERSE Configuration and Optimization Guide

Overview

This guide describes configuration changes that affect the behavior of EMERSE, tuning of the JVMs that run the EMERSE application and Solr, and some security hardening procedures.

Configuration

Most of the default settings will not need modification, but some are specific to the deployment environment and would normally be changed. The most commonly changed settings are the database connection details, the URL of the Solr instance holding the indexed documents, and the location of the Solr data files.

Loading Configuration

EMERSE loads configuration data mainly via property file configuration provided by the Spring Framework. Multiple paths can be defined for locating a file named project.properties, and Spring will use the first one it finds.

Normally this file is located inside the EMERSE war file. Once deployed, the file is located at TOMCAT_HOME/webapps/EMERSE/WEB-INF/classes. Using this location is usually not desirable in production environments, as deploying new versions of the code will cause this file to be replaced. Alternatively, the settings can be placed in a file called emerse.properties in the home directory of the user that Tomcat runs as.
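The "first one found wins" lookup described above can be sketched as follows. This is only an illustration in Python, not EMERSE or Spring code: `first_existing` is a hypothetical helper, and the candidate list, its order, and the `/opt/tomcat` fallback are assumptions.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


# Assumed search order: the file in the Tomcat user's home directory first,
# then the copy bundled inside the deployed WAR.
tomcat_home = Path(os.environ.get("TOMCAT_HOME", "/opt/tomcat"))
candidates = [
    Path.home() / "emerse.properties",
    tomcat_home / "webapps" / "EMERSE" / "WEB-INF" / "classes" / "project.properties",
]

if __name__ == "__main__":
    print(first_existing(candidates))
```

Running a check like this as the Tomcat user is one way to see which file would be picked up under the assumed order.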

Version number

The EMERSE version number is distributed as part of the WAR file that gets deployed to the server. This is not really a configurable option, but it is being mentioned here for the sake of completeness. The number is set in the Project Object Model (POM) files that can be found in the META-INF directory within the Tomcat webapps folder.

Profiles

EMERSE uses Spring profiles to enable optional services within the application. A profile is activated by supplying a JVM system property when EMERSE starts. Currently this is only useful for activating the LDAP-related Spring Security features: if the "ldap" profile is active, EMERSE will use LDAP for user authentication. In the example below, the Tomcat startup file is modified to enable LDAP via the system property. You should not have to change or swap XML files to enable LDAP, because the appropriate XML configuration is selected at runtime by the active Spring profile.

Additional information can be found in the LDAP section below.

export CATALINA_OPTS="-Dspring.profiles.active=ldap"

Application Settings

The following settings can be modified in the project.properties file, or emerse.properties as discussed previously:

Database

ds.username	The username for the database account for the main EMERSE database.
ds.password	The password for the database account. This can be encrypted with Jasypt if desired.
ds.url	The JDBC url to connect to the EMERSE database. Example: ds.url=jdbc:oracle:thin:@databasehost:port:sid
ds.maxPoolSize	Maximum number of available connections to the database. Production systems with a moderate number of users should set this to 10-20 connections. Our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrent logged-in users and about 10 users searching at the same time. More is generally better, but a centrally managed DB may have a limit on the number of connections permitted.

Solr/Lucene

lucene.indexPath

The path on the local file system to the Solr patient indexes as well as the document index. The patient and patient-slave Solr directories/indexes would reside inside of this path, as would the actual index containing the documents. Note that this path name is to the parent directory holding all of the indexes, not to a specific index itself.

Examples:

lucene.indexPath=/app/data/indexes/

Windows:

lucene.indexPath=file://c:/somewhere

Solr.serviceURL

The URL the application will use to access the Solr instance. It is mainly used by the All Patient Search feature.

Example:

Solr.serviceURL=http://localhost:8983/solr

Solr.unifiedCollection

The path that is appended to the solr serviceURL to find the collection of patient documents.

Examples:

Solr.unifiedCollection=documents

Solr.unifiedCollection=unified

Solr.patientSearchCollection

The path appended to the solr URL where the application will find the patient index. By default this is patient-slave, a replicated copy of the patient index. This is so that the slave copy can be updated at any time. EMERSE has a background task that automatically updates these from the patient table, so no real configuration needs to be done. These two Solr cores (patient and patient-slave) were created to support the All Patient search feature in order to rapidly summarize the demographics of a search result. The EMERSE code will automatically re-index these two Solr indexes from the source (emerse.patient table) daily, but it is also possible to force re-indexing from the patient tables via http calls to the application server (see the Troubleshooting Guide for details on how to do this).

It is worth noting that the Solr patient index is replicated within EMERSE, with one serving as a backup for the other if the main one ever became corrupted. The larger, production Solr document index is not replicated in this way, mainly because it is so large. In other words, this type of 'slave' index is good practice, but may not be practical for the larger indexes.

Example:

Solr.patientSearchCollection=patient-slave

Solr.patientUpdateCollection

The path appended to the solr URL to the source patient collection for updating.

Example:

Solr.patientUpdateCollection=patient
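The collection settings above are each described as a path appended to the Solr service URL. A minimal sketch of that composition is shown below; `collection_url` is a hypothetical helper, and the exact way EMERSE joins the two values internally is an assumption.

```python
def collection_url(service_url: str, collection: str) -> str:
    """Append a collection name to the Solr service URL with exactly one slash."""
    return service_url.rstrip("/") + "/" + collection.strip("/")


# Example values modeled on the settings in this section.
SERVICE_URL = "http://localhost:8983/solr"

if __name__ == "__main__":
    for name in ("unified", "patient-slave", "patient"):
        print(collection_url(SERVICE_URL, name))
```

Opening the resulting URLs in a browser or with curl can be a quick way to confirm that each configured collection name actually exists on the Solr instance.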

Solr.username

The username used when connecting to Solr, if Solr is configured with basic auth.

Solr.password

The password used when connecting to Solr, if Solr is configured with basic auth.

ActiveMQ

In the default ActiveMQ configuration, the message broker does not require authentication and does not use SSL. In this case, a username and password are not required. If they are provided in the EMERSE configuration, they will have no effect unless authentication is enabled in ActiveMQ. Please see the ActiveMQ documentation for how to set up authentication. While securing the connections with SSL is good practice, EMERSE does not transmit any protected health information (PHI) via ActiveMQ. For details, see the Active MQ section under 'Security Hardening' later in this document.

activemq_connfactory.username	The username used to access the ActiveMQ broker.
activemq_connfactory.password	The password used to access the ActiveMQ broker.
activemq_connfactory.host	The URL the application will use to access the ActiveMQ broker. Example: activemq_connfactory.host=tcp://localhost:61616 Example with SSL configured: activemq_connfactory.host=ssl://hostname:61616
queue.result	Name of the queue used by the EMERSE application to process search results. Queue names can be anything, but they need to be unique. ActiveMQ will create them on the fly if they do not exist. Example: queue.result=CRHIX_EMERSE_RESULT_SUMMARY_IN
queue.reply	Name of the queue that receives search results. Example: queue.reply=CRHIX_EMERSE_REPLY_TEMP

Patient Lists and MRN validation

Various options are available for validating user-entered patient medical record numbers (MRNs) that are stored in the Patient database table. Note that String comparisons are performed, so it is important to make sure that any formatting or cleaning of the user-entered MRNs results in exact matches with the format of the MRNs in the table.

patientList.MRNLimit	The number of patients that can be added to a patient list. We set it to 100,000 by default.
patientList.fileSizeLimitMB	The size limit in MB of incoming CSV uploads. We currently use 10 MB.
patientList.maxErrors	The maximum number of errors that are shown when a user uploads or inserts new patients into a patient list.
patientList.deduplicateLists	Whether the system removes duplicate medical record numbers from the same patient list. A `true` value will cause EMERSE to remove duplicates before saving to a patient list. In general, it is a good idea to remove duplicates, so keeping it `true` is ideal.
patientList.pullInvalidMRNs	If set to `true`, invalid MRNs will be reported. If set to `false`, they will be silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed.
patientList.MRNFormat	A format string that can optionally be provided for the patient’s MRN. This uses the Java String format syntax. For MRNs 9 digits long, one can use: patientList.MRNFormat=%9s
patientList.MRNFormatUseZeroPad	(Requires use of MRN format) Optionally pad the MRN with leading zeroes. If this is set to `true` the numbers will be padded with leading zeroes using the `patientList.MRNFormat` described above.
patientList.stripRegexOnMRNInsert	Removes matches of the regular expression when MRNs are uploaded/entered by users. This is run after whitespace is removed from the MRN. (For instance, if the regular expression is `^0+` then `000 0045 67` would become `4567`.) This setting is important when validating MRNs against the `Patient` table. Example values: `^0+\|[-]` removes leading zeros and dashes (patientList.stripRegexOnMRNInsert=^0+\|[-]); `[-#]` removes dashes and pound signs (patientList.stripRegexOnMRNInsert=[-#]); an empty value keeps the exact value, although spaces would still be removed (patientList.stripRegexOnMRNInsert=).
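Because MRNs are compared as strings, it can help to try candidate settings on sample input before deploying them. The sketch below mimics the steps described in the table (whitespace removal, then the strip expression, then optional formatting). It is not EMERSE code: `normalize_mrn` is a hypothetical helper, the order of the formatting step relative to the strip step is an assumption, zero padding is assumed to replace the pad spaces produced by `%9s`, and the pattern is written as `^0+|[-]` on the assumption that the backslash before the pipe in the example is only an escape.

```python
import re


def normalize_mrn(raw: str, strip_regex: str = "", mrn_format: str = "",
                  zero_pad: bool = False) -> str:
    """Clean a user-entered MRN following the steps described in the table."""
    # 1. Whitespace is removed first.
    mrn = re.sub(r"\s+", "", raw)
    # 2. Matches of the strip expression are removed (empty value = keep as-is).
    if strip_regex:
        mrn = re.sub(strip_regex, "", mrn)
    # 3. Optional width formatting; "%9s" right-aligns to 9 characters.
    if mrn_format:
        mrn = mrn_format % mrn
        # Assumption: zero padding turns the leading pad spaces into zeros.
        if zero_pad:
            mrn = mrn.replace(" ", "0")
    return mrn


if __name__ == "__main__":
    print(normalize_mrn("000 0045 67", strip_regex=r"^0+"))
    print(normalize_mrn("00-12-34", strip_regex=r"^0+|[-]"))
    print(normalize_mrn("4567", mrn_format="%9s", zero_pad=True))
```

Whatever combination is chosen, the result should match the stored format of the MRNs in the Patient table character for character.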

LDAP

The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.

ldap.host	The URL of the LDAP server. Example: ldap.host=ldaps://hostname:636
ldap.userDn	The distinguished name of a user that will authenticate to the LDAP directory. Example: ldap.userDn=cn=account,ou=people,dc=med,dc=umich,dc=edu
ldap.password	Password of the user account.
ldap.uidPath	Suffix path used to search for the user in the LDAP directory. Example: ldap.uidPath=dc=med,dc=umich,dc=edu

Attestation

attestation.allowOtherAttestationReasons	If set to `true`, Quick Buttons with standard reasons are displayed to the user for selection
attestation.allowFreeTextAttestation	If set to `true`, users can enter a free text description describing their purpose of using EMERSE
attestation.showPriorAttestations	If set to `true`, the Attestation screen will display, in the table, the free-text attestation reasons previously used by the user.

Batch Updating Begin/End Dates

Our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. This led to unusual dates (e.g., “01/01/1900”) being displayed in the section of EMERSE that shows the overall date range of included documents when no date limitation was placed on the search criteria.

To circumvent this potential problem EMERSE provides two options for controlling the dates displayed to users. In general, background tasks that update the Lucene indexes would also update the date ranges for documents when all dates are selected (that is, when no date range is entered into the date range boxes in the user interface). This is so that as the index updates every night a new ‘end date’ can be shown for the date range of the documents.

This auto-update setting can be over-ridden for the start and stop dates, independently, using the properties described below. Changing this setting can, for example, allow one to have a more sensible document start date that more closely matches when the documents were being collected (without having to actually change the dates of all of the incorrect documents).

Note that changing these dates only affects the dates displayed in the date range section at the top of the screen. The actual documents will still show their original dates, and the searches will still take place based on the actual dates of the documents even if they are incorrect. Thus, if actual dates are entered by users into the start/stop date boxes, those dates will be used. If no dates are entered by users (thus, searching ‘All dates’) then the system will search across all of the documents regardless of the over-ride date shown in the UI and regardless of the document dates in the system.

batch.updateIndexMinDateFromSolrIndex	If set to true, the minimum document date is read from Solr every night and written to the `solr_index` table.
batch.updateIndexMaxDateFromSolrIndex	If set to true, the maximum document date is read from Solr every night and written to the `solr_index` table.

If one or both of these properties is set to false, then the date entered in the solr_index table is what will be used for display purposes. For more information on this table see the section on the solr_index in the Data Guide.
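The effect of the two flags can be sketched as a small function. This is an illustration only: `IndexDates` and `refresh_display_dates` are hypothetical names, and the real update is performed by the EMERSE background job against the solr_index table.

```python
from datetime import date
from typing import NamedTuple


class IndexDates(NamedTuple):
    min_date: date
    max_date: date


def refresh_display_dates(stored: IndexDates, from_solr: IndexDates,
                          update_min: bool, update_max: bool) -> IndexDates:
    """Return the dates that would be displayed after the scheduled update."""
    return IndexDates(
        # A flag set to false keeps whatever was entered in the table.
        min_date=from_solr.min_date if update_min else stored.min_date,
        max_date=from_solr.max_date if update_max else stored.max_date,
    )


if __name__ == "__main__":
    stored = IndexDates(date(2000, 1, 1), date(2020, 6, 30))
    solr = IndexDates(date(1900, 1, 1), date(2020, 7, 1))
    # Pin the start date, but let the end date follow the index.
    print(refresh_display_dates(stored, solr, update_min=False, update_max=True))
```

In this example the implausible 1900 start date reported by the index is hidden from the display, while the end date keeps advancing as new documents arrive.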

All Patient Search

search.allPatientFragmentLimit	Number of fragments/text snippets to display for preview when using All Patient Search. Example: search.allPatientFragmentLimit=100
search.facetDateRangeInterval	The All Patient Search displays a chart based on patients’ ages, grouped into intervals. This setting specifies the interval to use when displaying the chart. In general there should be no reason to change the default setting. Example: search.facetDateRangeInterval=10

Search

search.cacheInMB	Size of the cache to use for efficient searches. Larger numbers are better, provided that enough memory is available. Example: search.cacheInMB=128
search.cachedQueryCount	Each query term can be a query, so search bundles with a lot of terms need a large count for this to work efficiently. We have found 1024 to be reasonable, so it likely does not need to be changed. Example: search.cachedQueryCount=1024

Email

This is the email address to which users send feedback or report issues related to EMERSE, using the feedback options available within the application. The address is displayed to users in the About window and on the Feedback page. The system provides a default subject line, currently "EMERSE feedback", which is not configurable.

Example:

email=emerse-email-support@med.umich.edu

Timeouts

A user’s session is configured to be timed out due to inactivity. If the app is idle and does not encounter a mouse click, mouse move, mouse scroll or a keypress activity for a configured timeout setting, the application logs the user out of their session and the login page is presented. The following properties can be added to the project.properties to override the defaults.

This timeout feature does not apply to the Attestation screen, because at this point no Protected Health Information (PHI) would be displayed. Nevertheless, EMERSE would still time out based on the server timeout settings, even though the countdown window for a forced logout would not be shown to the user.

application.idle.timeout	Number of seconds to run the timer when the application is idle. The default value is 3600 if this property has not been added to the properties file. The value should be in seconds. Example: application.idle.timeout=3600
application.warn.length	Number of seconds to show the timeout warning window. The default value is 30 if this property has not been added to the properties file. Example: application.warn.length=30

Overall Patient Count

EMERSE displays the total number of patients in the system with respect to conducting an All Patient Search across all of the patients. This count is updated using the Spring Scheduler within the app itself, and should auto-update about every 30 minutes. The overall patient count is not configurable since it is derived from the data loaded into the system. Specifically, this count is based on the distinct number of MRNs that are associated with all of the documents in the Solr index. It is not based on the total number of MRNs in the Patient database table. Thus, if a patient is in the Patient table but does not have an associated document, that patient will not be counted towards the total number of patients.
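The distinction between the two counts can be illustrated with a toy data set. The values and the `overall_patient_count` helper below are made up for illustration; in EMERSE the count comes from the Solr document index.

```python
# (MRN, document text) pairs standing in for the indexed documents.
documents = [("100", "note A"), ("100", "note B"), ("200", "note C")]

# MRNs present in the Patient table; patient 300 has no documents.
patient_table = {"100", "200", "300"}


def overall_patient_count(docs) -> int:
    """Count distinct MRNs that have at least one indexed document."""
    return len({mrn for mrn, _text in docs})


if __name__ == "__main__":
    print(overall_patient_count(documents), "of", len(patient_table), "patients have documents")
```

Here the displayed total would be 2 rather than 3, because the third patient has no associated document.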

Optimization

Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.

Tomcat (EMERSE application)

To reduce the frequency of garbage collection, use the -Xmx and -Xms switches to control how the JVM sizes its heap. We recommend setting up Tomcat to use between 1 and 2 GB of memory. One way to do this is to add the following snippet to the Tomcat startup file, startup.sh.

export JAVA_OPTS="-Xmx2048m -Xms1024m"

Solr Index Optimization

Over time we have found that many document changes occur as they get updated or deleted (a deletion might be required if, for example, a document was found to be created under the wrong patient). It is possible to clear out these deleted/inactivated documents and potentially improve the performance of Solr by optimizing the index. This can be invoked manually using the Optimize button in the Solr Administration User Interface. Optimizing also reduces the number of index segments, which can also improve system performance.

During the optimization process the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that an optimization can take 10 or more hours and uses substantial computational resources, meaning that system performance might suffer for users. Thus, it might be best to run this on weekends during times of low use. At Michigan Medicine we optimize infrequently: we copy the indexes to a different server with more space, and then copy them back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.
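Before starting an optimization, the storage requirement mentioned above can be checked with a short script. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the factor of 3 is simply the upper end of the 2-3x estimate, and the index directory passed to `check` is whatever path holds the index on your server.

```python
import os
import shutil


def index_size_bytes(index_dir: str) -> int:
    """Total size of all files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_to_optimize(index_bytes: int, free_bytes: int, factor: float = 3.0) -> bool:
    """True if free space is at least `factor` times the index size."""
    return free_bytes >= factor * index_bytes


def check(index_dir: str) -> bool:
    """Compare the index size with the free space on the same volume."""
    free = shutil.disk_usage(index_dir).free
    return has_room_to_optimize(index_size_bytes(index_dir), free)
```

A call such as `check("/app/data/indexes/")` returns False when the volume is too full, in which case copying the index to a larger server first, as described above, may be the safer route.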

Solr Memory

Solr’s memory can be configured by a simple flag at startup. When working with millions of documents, we had some issues when Solr was using its default settings of 512 megabytes. Currently the EMERSE production instance at Michigan Medicine is configured to use 3 gigabytes. In the example below, 3 gigabytes are being allocated.

./solr start -m 3g

You may need to pass other flags such as -s when starting Solr, as described in the Installation Guide.

Solr Patient Index Replication Interval

EMERSE has two indexes used to keep track of patients: patient and patient-slave (the patient-slave index is the one actually used by EMERSE while it is running).

The patient index is created by copying the patients from the Patient database table over to the corresponding Solr index, patient. This is done automatically by the system once per day as a scheduled event. The schedule of the jobs can be found in the properties file. The default time set for the EMERSE distribution is 7:30 AM. This was chosen on the assumption that the patients in the database table would be updated once every night during non-peak hours. If you are fine with that time, no changes need to be made.

The properties file uses a cron-like syntax to specify the schedule, which consists of six fields separated by whitespace. The first field is the seconds, then it’s minutes, hours, day-of-the-month, the month number, and then day-of-the-week. A field can have a number in it, appropriate for the field, a star meaning every value of the field, or a question mark, meaning no restriction. A more formal description of the syntax can be found in the Spring Documentation, specifically the component regarding the Class CronSequenceGenerator.

task.updateIndexStatsViaSolr.cron	This runs a job that finds the minimum and maximum dates of the document index, along with the number of distinct MRNs in the document index. These are used for updating the date ranges displayed in EMERSE as well as the overall patient count shown when conducting an All Patient search. The default time is every hour, around minute 42. For instance, 1:42, 2:42, 3:42, etc. Default: task.updateIndexStatsViaSolr.cron=00 42 * * * ?
task.updatePatientIndex.cron	The schedule to update the Solr patient index from the patient table in the database. The default time is 7:30 AM. Default: task.updatePatientIndex.cron=00 30 7 * * ?
task.optimizePatientIndex.cron	The schedule to optimize the Solr patient index. Optimizing an index puts all the data into a single segment for operational efficiency. The default time is 7:45 AM. Default: task.optimizePatientIndex.cron=00 45 7 * * ?

If you change the scheduled time of any of these jobs, you will have to restart Tomcat for the changes to take effect.
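To double-check a schedule before restarting Tomcat, the six fields can be split and labeled in the order given above. The sketch below only names the fields and verifies the field count; it does not validate values the way Spring does, and `describe_cron` is a hypothetical helper.

```python
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a six-field schedule into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # The default patient index update schedule (7:30 AM every day).
    print(describe_cron("00 30 7 * * ?"))
```

A common mistake is to supply a five-field Unix-style expression without the leading seconds field; the length check above catches that case.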

The patient-slave Solr index gets replicated from the master patient Solr index, and it is possible to change the frequency of this replication. We currently have it set to do this every minute, and we do not see any reason that it should need to be changed. However, if you do wish to change it, this can be found in the solrconfig.xml file within the patient-slave core. The parameter to change is pollInterval. For example:

<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8983/solr/patient/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>

It is possible to force these copying and indexing events to occur on demand, which may be useful for troubleshooting or when testing with an initial setup. Details about how to do this are described in the Troubleshooting Guide.

Server memory optimization

In a deployment of EMERSE where the main application and Solr are running on the same server, each process should be given adequate memory. At some point, however, allocating memory to these processes may actually reduce performance. Most modern operating systems will cache files in all available memory. As memory is allocated to these processes, less memory will be available to the operating system to cache files.

EMERSE Search Concurrency

EMERSE enqueues user searches as they use the application, and a pool of workers concurrently pulls from that queue and executes searches. This parallel execution of searches allows fast searches to complete while long ones continue, and helps prevent a single user’s long-running query from slowing down the entire system for all users. Workers from the pool take a batch of requests from the queue at once, and process that batch sequentially. However, if the requests they are running are slow, they may take a smaller batch size, but never smaller than the minimum size. There are a few parameters in the project.properties file that control this construct, although we would not expect that these should need to be changed unless specific performance issues need to be tweaked.

search.concurrentConsumers	This controls the size of the worker pool which concurrently pulls from the queue. Default 10.
search.initialBatchSize	The smallest batch of requests a worker may take at once, no matter how slowly searches are running. Default 1.
search.maxBatchSize	The largest batch of requests a worker may take at once. This is the size of the first batch for a worker. After this, it may reduce its batch size if the searches are running slowly, and may go back up to this limit if they are running fast again. Default 7.

The dynamic adjustment of batch sizes follows the rules below, which are not configurable at this time. Each thread processes a search for a given user. A search is done for each patient (not patient/source). When the last search for the user has taken…
…< 1.5 seconds, the max batch size (7) is used for the next search
…>= 1.5 seconds and < 5 seconds, the max batch size / 2 (rounded down, so 3 in this case) is used for the next search
…>= 5 seconds, the initial batch size (1) is used for the next search
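The rules above can be written as a small function. This is a sketch of the described behavior rather than the EMERSE implementation; `next_batch_size` is a hypothetical name, and clamping the halved value to the initial batch size is an assumption based on the statement that a batch is never smaller than the minimum.

```python
def next_batch_size(last_search_seconds: float, max_batch: int = 7,
                    initial_batch: int = 1) -> int:
    """Pick the next batch size from the duration of the last search."""
    if last_search_seconds < 1.5:
        return max_batch
    if last_search_seconds < 5:
        # Integer division rounds down: 7 // 2 == 3.
        return max(initial_batch, max_batch // 2)
    return initial_batch


if __name__ == "__main__":
    for seconds in (0.2, 1.5, 4.9, 5.0, 12.0):
        print(seconds, "->", next_batch_size(seconds))
```

With the default settings a worker therefore moves between batch sizes of 7, 3, and 1 as search times change.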

Security Hardening

Solr

Solr provides a REST API that can be accessed with tools such as curl. By default this is not locked down, and it should be secured with basic authentication if the Solr ports are not firewalled from external communication.

Solr can be set up to use SSL/TLS and to require authentication with basic auth. Both of these features are supported by Solr Cloud, but EMERSE does not yet support Solr Cloud. However, the Jetty servlet engine embedded in standalone Solr can be modified to require authentication and use SSL.

Much of the Solr documentation pertains to Solr Cloud, which is NOT currently supported by EMERSE. Look for references to a single node configuration when consulting Solr documentation.

Solr SSL Setup

Changes are required in solr.in.sh, found in the bin directory under the Solr installation directory (SOLR_INSTALL_DIR). Uncomment the lines below and configure them with values appropriate to a Java keystore containing the certificate for the server.

SOLR_SSL_KEY_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=keystore password
SOLR_SSL_TRUST_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=keystore password
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

See the "Basic SSL Setup" section at the following link for more information.

https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html

Basic Auth

Basic Auth can be configured in several ways. Solr provides its own approach, but another approach uses the Jetty servlet engine bundled with Solr.

The first step is to modify the jetty.xml file inside the SOLR_INSTALL_DIR/server/etc folder, adding the following snippet inside the <Configure></Configure> tags.

<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Test Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
    </New>
  </Arg>
</Call>

After adding in the xml snippet, add a user/password combination to the file realm.properties located in SOLR_INSTALL_DIR/server/etc. If the file doesn’t exist just create a new file and add the following line to it.

solradmin:password, admin-role

In the above, the username is "solradmin" and the password is "password".

Also, the following needs to be added to the webdefault.xml file:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

For more information on configuring Jetty with Basic Authentication, see: https://www.eclipse.org/jetty/documentation/9.3.0.v20150612/configuring-security-authentication.html
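Once Basic Auth is enabled, a client has to send an Authorization header with each request. The sketch below builds such a request with the Python standard library, using the example "solradmin"/"password" credentials from the realm.properties line above. `solr_request` is a hypothetical helper, and the host, port, and the choice of the core status endpoint as a test URL are assumptions.

```python
import base64
import urllib.request


def solr_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = solr_request("http://localhost:8983/solr/admin/cores?action=STATUS",
                       "solradmin", "password")
    # Passing req to urllib.request.urlopen() would send it; an HTTP 401
    # response would indicate that the credentials were rejected.
    print(req.get_header("Authorization"))
```

Sending the same request without the header is a simple way to confirm that unauthenticated access is actually being refused.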

Active MQ

The ActiveMQ broker can be set up to use SSL/TLS and also to require authentication. The ActiveMQ webapp can also be configured to use SSL. EMERSE allows the username and password to be set via the activemq_connfactory properties described in the ActiveMQ section above.

http://activemq.apache.org/how-do-i-use-ssl.html

Configuration password

Passwords specified in the project.properties files can themselves be encrypted using Jasypt.

http://www.jasypt.org/encrypting-configuration.html

Exported Excel files

EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server inside the exploded EMERSE war file, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a shell script on the server that executes every 30 minutes and deletes files older than 60 minutes.

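#!/bin/sh
# Exit quietly if the downloads directory does not exist.
cd /PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads \
        2> /dev/null || exit 0
# Delete exported Excel files last modified more than 60 minutes ago.
find . -name "*.xlsx*" -mmin +60 -exec rm {} \;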

Admin Application

Most details related to the Admin application and Admin features can be found in the Administrator Guide. Below is a high-level summary of the Admin features.

EMERSE users that have an ADMIN role have access to the admin application located at:

http://host:port/emerse/admin2

The application has two main features: user management related to authorization, and maintenance of synonyms.

Add/Remove users

The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, select the “User with full privs” option and leave the others unchecked. The password field is required but will be ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we are not doing much with it locally yet.

Roles and Privileges

Roles and Privileges for EMERSE users can be customized. Details about how this is done can be found in the Administrator Guide.

Synonyms

The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a CSV file.

A synonyms upload currently deletes all the existing entries in the synonyms table and then loads entries from the CSV file. Thus, this option replaces the synonyms as opposed to appending new ones to the existing list.

Supporting Multiple Environments

It may be ideal to support multiple EMERSE environments such as test, dev, prod, etc. We have found that sometimes it can be difficult for users who are testing EMERSE to know what specific system they are using. To make it easier to distinguish between multiple instances of EMERSE, the system has the ability to display a small, but obvious, box in the upper right part of the screen to inform users. Having this information in a database table is useful because it can remain stable even as the application itself gets upgraded.

This information is defined in a single-row table called ENVIRONMENT_INFO:

Column Name	Description
id	This should be set to 0 and not changed.
environment	This is the environment that is active (dev, test, prod, etc). This is a free text option so it can be anything (e.g., "Development", "Testing", "Production", etc.)
display_on_ui	This is a flag that determines whether the text for environment is displayed on the screen. 1=display, 0=do not display. In general you would not display this to users in the Production system.
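The way the two columns combine can be sketched as follows. This is only an illustration of the described behavior: `environment_banner` is a hypothetical function, and the actual rendering is done by the EMERSE user interface.

```python
from typing import Optional


def environment_banner(environment: str, display_on_ui: int) -> Optional[str]:
    """Return the label to show in the corner box, or None to show nothing."""
    return environment if display_on_ui == 1 else None


if __name__ == "__main__":
    print(environment_banner("Testing", 1))     # label is shown
    print(environment_banner("Production", 0))  # nothing is shown
```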

The version number of the application (displayed when selecting the About menu) is distributed with the WAR file itself and is not contained in the database.