EMERSE Configuration and Optimization Guide

Overview

This document describes changes that can be made to modify the behavior of EMERSE, tuning of the JVMs that run the EMERSE application and Solr, and some security hardening procedures.

Configuration

Overview

Most of the default configuration will not need modification, but some settings are specific to the deployment environment and will normally need to be changed. The settings most commonly changed are database access, the URL of the Solr instance holding the indexed documents, and the location of the Solr data files.

Loading Configuration

EMERSE loads configuration data mainly via property file configuration provided by the Spring Framework. Multiple paths can be defined for locating a file named project.properties, and Spring will use the first one it finds.

Normally this file is located inside the EMERSE WAR file. Once deployed, the file is located at TOMCAT_HOME/webapps/EMERSE/WEB-INF/classes. Using this location is usually not desirable in production environments, because deploying a new version of the code replaces the file. Alternatively, the settings can be placed in a file called emerse.properties in the home directory of the user that Tomcat runs as.
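The "first file found wins" lookup described above can be sketched in plain Java. This is not EMERSE's Spring configuration: the class name, the candidate order, the CATALINA_HOME environment variable, and the /opt/tomcat fallback are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

class ConfigLocator {
    /** Returns the first candidate that exists as a regular file, if any. */
    static Optional<Path> firstExisting(List<Path> candidates) {
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Loads only the first file found; returns empty Properties when no candidate exists. */
    static Properties load(List<Path> candidates) {
        Properties props = new Properties();
        Optional<Path> found = firstExisting(candidates);
        if (found.isPresent()) {
            try (InputStream in = Files.newInputStream(found.get())) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical search order: home-directory override first, then the copy inside the deployed WAR.
        // CATALINA_HOME and the /opt/tomcat fallback are assumptions about the local Tomcat layout.
        String tomcatHome = System.getenv().getOrDefault("CATALINA_HOME", "/opt/tomcat");
        List<Path> candidates = List.of(
                Paths.get(System.getProperty("user.home"), "emerse.properties"),
                Paths.get(tomcatHome, "webapps", "EMERSE", "WEB-INF", "classes", "project.properties"));
        Properties props = load(candidates);
        System.out.println("Using " + firstExisting(candidates).map(Path::toString).orElse("(no file found)"));
        System.out.println("ds.url = " + props.getProperty("ds.url"));
    }
}
```

Running it as the Tomcat user shows which file would be picked up, which is a quick way to confirm that a home-directory override is in place.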

Profiles

EMERSE uses Spring profiles to enable optional services within the application. A profile is activated by supplying a JVM system property to EMERSE at startup. Currently this is only useful for activating the LDAP-related Spring Security features: if the "ldap" profile is active, EMERSE uses LDAP for user authentication. In the example below, the Tomcat startup file is modified to enable the LDAP profile via a system property.

export CATALINA_OPTS="-Dspring.profiles.active=ldap"
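Spring reads spring.profiles.active as a comma-separated list, so other profiles can be listed alongside ldap. The hypothetical class below shows how such a value is interpreted and can be used to confirm that the flag actually reached the JVM; it is an illustration, not Spring's own parsing code, and it matches profile names case-sensitively.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class ProfileCheck {
    /** Splits a comma-separated profile list such as "ldap" or "ldap,dev" into trimmed names. */
    static Set<String> activeProfiles(String raw) {
        if (raw == null || raw.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toSet());
    }

    /** True when "ldap" is among the active profiles (exact, case-sensitive match). */
    static boolean ldapEnabled(String raw) {
        return activeProfiles(raw).contains("ldap");
    }

    public static void main(String[] args) {
        // Reads the same system property that -Dspring.profiles.active=ldap sets.
        String raw = System.getProperty("spring.profiles.active");
        System.out.println("Active profiles: " + activeProfiles(raw));
        System.out.println("LDAP authentication profile active: " + ldapEnabled(raw));
    }
}
```

If CATALINA_OPTS already holds other options, append to it (CATALINA_OPTS="$CATALINA_OPTS -Dspring.profiles.active=ldap") rather than replacing it.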

Application Settings

The following settings can be modified in the project.properties file, or emerse.properties as discussed previously:

Database

ds.username: The username for the database account for the main EMERSE database.

ds.password: The password for the database account. This can be encrypted with Jasypt if desired.

ds.url: The JDBC URL used to connect to the EMERSE database.

Example 1. -

jdbc:oracle:thin:@databasehost:port:sid

ds.maxPoolSize: Maximum number of available connections to the database. Production systems with a moderate number of users should set this to 10-20 connections; our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrent logged-in users, with about 10 users searching at the same time. More connections are generally better, but a centrally managed database may limit the number of connections permitted.
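A quick pre-deployment check of the database settings might look like the following. The class name and the rule of flagging pools smaller than 10 connections (the low end of the range recommended above) are assumptions for illustration; EMERSE does not necessarily validate these settings itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class DataSourceSettingsCheck {
    /** Returns human-readable problems found in the ds.* settings; an empty list means none were found. */
    static List<String> check(Properties p) {
        List<String> problems = new ArrayList<>();
        for (String key : List.of("ds.username", "ds.password", "ds.url", "ds.maxPoolSize")) {
            if (p.getProperty(key, "").isBlank()) {
                problems.add("missing " + key);
            }
        }
        String url = p.getProperty("ds.url", "").trim();
        if (!url.isEmpty() && !url.startsWith("jdbc:")) {
            problems.add("ds.url should be a JDBC URL starting with jdbc:");
        }
        String pool = p.getProperty("ds.maxPoolSize", "").trim();
        if (!pool.isEmpty()) {
            try {
                int size = Integer.parseInt(pool);
                // 10 is the low end of the 10-20 range suggested for moderate use; larger is allowed.
                if (size < 10) {
                    problems.add("ds.maxPoolSize=" + size + " is below the suggested minimum of 10");
                }
            } catch (NumberFormatException e) {
                problems.add("ds.maxPoolSize is not an integer: " + pool);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        Properties p = new Properties();
        p.setProperty("ds.username", "emerse");
        p.setProperty("ds.password", "change-me");
        p.setProperty("ds.url", "jdbc:oracle:thin:@databasehost:1521:EMERSE");
        p.setProperty("ds.maxPoolSize", "5");
        check(p).forEach(System.out::println);
    }
}
```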

Solr/Lucene

lucene.indexPath: The path on the local filesystem to the Solr index containing the indexed documents.

Linux:

/app/data/indexes/unified
/app/data/indexes/pubmed

Windows:

file://c:/somewhere

solr.serviceURL: The URL the application uses to access the Solr instance. It is mainly used by the All Patient Search feature.

Example 2. -

http://localhost:8983/solr

solr.unifiedCollection: The path that is appended to solr.serviceURL to find the collection of patient documents.

Example 3. -

pubmed
unified

solr.patientSearchCollection: The path appended to the Solr URL where the application finds the patient index. By default this is patient-slave, a replicated copy of the patient index. Searches are served from the replica so that the source patient index (see solr.patientUpdateCollection) can be updated at any time. EMERSE has a background task that automatically updates the patient index from the patient table, so no real configuration is needed. Note that the Solr patient index is replicated within EMERSE, with one copy serving as a backup for the other if the main one ever becomes corrupted. The much larger production Solr document index is not replicated in this way, mainly because of its size. In other words, this type of 'slave' index is good practice, but may not be practical for the larger indexes.

Example 4. -

patient-slave

solr.patientUpdateCollection: The path appended to the Solr URL to locate the source patient collection that receives updates.

Example 5. -

patient

solr.username: The username used when connecting to Solr, if Solr is configured with basic authentication.

solr.password: The password used when connecting to Solr, if Solr is configured with basic authentication.
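Because the collection settings are paths appended to the Solr service URL, it can help to print the full URLs that result. This sketch assumes the two values are joined with a single slash; the class name is hypothetical, and the collection names come from the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SolrUrls {
    /** Joins the Solr service URL and a collection path with exactly one slash between them. */
    static String collectionUrl(String serviceUrl, String collection) {
        String base = serviceUrl.replaceAll("/+$", "");   // drop trailing slashes
        String path = collection.replaceAll("^/+", "");    // drop leading slashes
        return base + "/" + path;
    }

    public static void main(String[] args) {
        String serviceUrl = "http://localhost:8983/solr";
        // Collection names taken from the examples above; adjust to the local installation.
        Map<String, String> collections = new LinkedHashMap<>();
        collections.put("documents", "unified");
        collections.put("patient search", "patient-slave");
        collections.put("patient update", "patient");
        collections.forEach((role, name) ->
                System.out.println(role + ": " + collectionUrl(serviceUrl, name)));
    }
}
```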

ActiveMQ

In the default ActiveMQ configuration, the message broker does not require authentication and does not use SSL. In this case, a username and password are not required; if they are provided in the EMERSE configuration, they have no effect unless authentication is enabled in ActiveMQ. Please see the ActiveMQ documentation for how to set up authentication. While securing the connections with SSL is good practice, EMERSE does not transmit any protected health information (PHI) via ActiveMQ. For details, see the section on 'Security Hardening' later in this document.

activemq_connfactory.username: The username used to access the ActiveMQ broker.

activemq_connfactory.password: The password used to access the ActiveMQ broker.

activemq_connfactory.host: The URL the application uses to access the ActiveMQ broker.

Example 6. -

tcp://localhost:61616
With SSL configured:
ssl://hostname:61616
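Before starting EMERSE, a broker URL can be sanity-checked with java.net.URI. This hypothetical helper accepts only the two forms shown in the example above (tcp:// and ssl://); ActiveMQ supports other transports, such as failover URLs, which this check would reject.

```java
import java.net.URI;
import java.util.Locale;

class BrokerUrlCheck {
    /** Summarizes a broker URL of the form tcp://host:port or ssl://host:port. */
    static String describe(String brokerUrl) {
        URI uri = URI.create(brokerUrl.trim());
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        if (!scheme.equals("tcp") && !scheme.equals("ssl")) {
            throw new IllegalArgumentException("expected tcp:// or ssl://, got: " + brokerUrl);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("expected host:port, got: " + brokerUrl);
        }
        return String.format("%s:%d (%s)", uri.getHost(), uri.getPort(),
                scheme.equals("ssl") ? "SSL" : "no SSL");
    }

    public static void main(String[] args) {
        System.out.println(describe("tcp://localhost:61616"));
        System.out.println(describe("ssl://hostname:61616"));
    }
}
```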

queue.result: Name of the queue that the EMERSE application uses to process search results. The queue names can be anything, but each must be unique. ActiveMQ creates the queues on the fly if they do not exist.

Example 7. -

CRHIX_EMERSE_RESULT_SUMMARY_IN

queue.reply: Name of the queue that receives search results

Example 8. -

CRHIX_EMERSE_REPLY_TEMP

Patient List

patientList.MRNLimit: The maximum number of patients that can be added to a patient list. We set it to 100,000 by default.

patientList.fileSizeLimitMB: The size limit in MB of incoming CSV uploads. We currently use 10 MB.

patientList.maxErrors: The maximum number of errors shown when a user uploads or adds new patients to a patient list.

patientList.deduplicateLists: Controls whether duplicate medical record numbers are removed from a patient list. A true value causes EMERSE to remove duplicates before saving a patient list. In general, it is a good idea to remove duplicates, so keeping this set to true is recommended.

patientList.pullInvalidMRNs: If set to true, invalid MRNs are reported to the user. If set to false, they are silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed.

patientList.MRNFormat: An optional format string for the patient’s MRN, using Java String.format syntax. For MRNs 9 digits long, one can use the following, which right-justifies the MRN in a 9-character field and pads shorter values with leading spaces:

%9s

patientList.MRNFormatUseZeroPad: (Requires patientList.MRNFormat.) Optionally pad the MRN with leading zeroes. If this is set to true, MRNs are padded with leading zeroes to the width given by patientList.MRNFormat, described above.
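The patient list settings above interact: an MRN is formatted, optionally zero-padded, de-duplicated, and checked against the list limit. The sketch below models that sequence under stated assumptions. The class and method names are hypothetical, and implementing zero-padding by converting the leading pad spaces to zeroes is an assumption (Java does not accept "%09s", because the 0 flag is not allowed with %s), not necessarily how EMERSE does it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MrnNormalizer {
    /**
     * Applies a format such as "%9s" (right-justify in a 9-character field, padding with spaces)
     * and, if zeroPad is true, turns the leading pad spaces into zeroes.
     */
    static String format(String mrn, String pattern, boolean zeroPad) {
        String formatted = String.format(pattern, mrn.trim());
        if (!zeroPad) {
            return formatted;
        }
        int pad = 0;
        while (pad < formatted.length() && formatted.charAt(pad) == ' ') {
            pad++;
        }
        return "0".repeat(pad) + formatted.substring(pad);
    }

    /** Formats each MRN, optionally drops duplicates (keeping first occurrences), then enforces a limit. */
    static List<String> normalize(List<String> raw, String pattern, boolean zeroPad,
                                  boolean deduplicate, int limit) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String mrn : raw) {
            String formatted = format(mrn, pattern, zeroPad);
            if (deduplicate && !seen.add(formatted)) {
                continue; // duplicate after formatting, e.g. "123" and "0123" once zero-padded
            }
            result.add(formatted);
        }
        if (result.size() > limit) {
            throw new IllegalArgumentException(
                    "list has " + result.size() + " MRNs, limit is " + limit);
        }
        return result;
    }
}
```

Formatting before de-duplicating means that MRNs differing only in leading zeroes are treated as the same patient once padded.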

LDAP

The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.

ldap.host: The URL of the LDAP server.

Example 9. -

ldaps://hostname:636

ldap.userDn: The distinguished name (DN) of the account that EMERSE uses to authenticate to the LDAP directory.

Example 10. -

cn=account,ou=people,dc=med,dc=umich,dc=edu

ldap.password: The password of the account given in ldap.userDn.

ldap.uidPath: The suffix (base) path under which users are searched for in the LDAP directory.

Example 11. -

dc=med,dc=umich,dc=edu
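This guide does not spell out how EMERSE builds its user lookup from ldap.uidPath. A common pattern is to search under that base for an entry whose login attribute matches the user name. The sketch below assumes the attribute is uid (a hypothetical choice) and shows the escaping that RFC 4515 requires, so that a login such as * cannot match every entry.

```java
class LdapUserLookup {
    /** Escapes a value for use inside an LDAP search filter, following RFC 4515. */
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a filter on the uid attribute; "uid" is an assumed attribute name. */
    static String userFilter(String login) {
        return "(uid=" + escapeFilterValue(login) + ")";
    }

    public static void main(String[] args) {
        String searchBase = "dc=med,dc=umich,dc=edu"; // ldap.uidPath value from the example above
        String login = args.length > 0 ? args[0] : "jdoe"; // hypothetical login name
        System.out.println("base:   " + searchBase);
        System.out.println("filter: " + userFilter(login));
    }
}
```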

Attestation

attestation.allowOtherAttestationReasons: If set to true, Quick Buttons with standard reasons are displayed to the user for selection

attestation.allowFreeTextAttestation: If set to true, users can enter a free text description describing their purpose of using EMERSE

attestation.showPriorAttestations: If set to true, the Attestation screen displays a table of the free text attestation reasons the user has entered previously.

Batch Updating

Our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. When no date limitation was placed on the search criteria, this led to unusual dates (e.g., “01/01/1900”) being displayed in the section of EMERSE that shows the overall date range of the included documents.

To circumvent this potential problem, EMERSE provides two options for controlling the dates displayed to users. By default, the background tasks that update the Lucene indexes also update the date range shown for documents when all dates are selected (that is, when no date range is entered into the date range boxes in the user interface). This way, as the index is updated every night, a new ‘end date’ can be shown for the date range of the documents.

This automatic update can be overridden for the start and end dates independently, using the properties described below. Changing these settings can, for example, allow one to display a more sensible document start date that more closely matches when the documents were actually being collected, without having to change the dates of all the incorrect documents.

Note that changing these dates only affects the dates displayed in the date range section at the top of the screen. The documents themselves still show their original dates, and searches still take place based on the actual dates of the documents, even if those dates are incorrect. Thus, if users enter dates into the start/stop date boxes, those dates are used. If no dates are entered (that is, searching ‘All dates’), the system searches across all of the documents, regardless of the override date shown in the UI and regardless of the document dates in the system.

batch.updateIndexMinDateFromSolrIndex: If set to true, the minimum document date is refreshed from Solr every night and stored in the lucene_shards table.

batch.updateIndexMaxDateFromSolrIndex: If set to true, the maximum document date is refreshed from Solr every night and stored in the lucene_shards table.
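The effect of the two batch flags can be modelled as a small function: each bound of the displayed range is either refreshed from Solr or left at its stored value. This is a hypothetical illustration of the flag logic only; it does not read or write the lucene_shards table, and searches themselves always use the documents' actual dates.

```java
import java.time.LocalDate;

class DisplayedDateRange {
    final LocalDate start;
    final LocalDate end;

    DisplayedDateRange(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Models one nightly run: each bound is replaced by the value computed from Solr only when its
     * flag is true; otherwise the stored (possibly hand-corrected) value is kept.
     */
    static DisplayedDateRange nightlyUpdate(DisplayedDateRange stored, DisplayedDateRange fromSolr,
                                            boolean updateMin, boolean updateMax) {
        return new DisplayedDateRange(
                updateMin ? fromSolr.start : stored.start,
                updateMax ? fromSolr.end : stored.end);
    }

    @Override
    public String toString() {
        return start + " to " + end;
    }

    public static void main(String[] args) {
        // Hypothetical values: a corrected start date kept in the database, and Solr's raw min/max.
        DisplayedDateRange stored = new DisplayedDateRange(LocalDate.of(1998, 1, 1), LocalDate.of(2023, 6, 30));
        DisplayedDateRange solr = new DisplayedDateRange(LocalDate.of(1900, 1, 1), LocalDate.of(2023, 7, 1));
        // Equivalent to updateIndexMinDateFromSolrIndex=false, updateIndexMaxDateFromSolrIndex=true.
        System.out.println(nightlyUpdate(stored, solr, false, true));
    }
}
```

Setting the minimum-date flag to false while keeping the maximum-date flag true is the combination that keeps a corrected start date while still letting the end date advance nightly.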

All Patient Search

search.allPatientFragmentLimit: Number of fragments/text snippets to display for preview when using All Patient Search

Example 12. -

search.allPatientFragmentLimit=100

search.facetDateRangeInterval: The All Patient Search displays a chart of patients’ ages grouped into intervals. This setting specifies the interval used for the chart. In general there should be no reason to change the default setting.

Example 13. -

search.facetDateRangeInterval=10
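To see how the interval shapes the chart, the sketch below groups ages into fixed-width buckets. It assumes the interval is measured in years, and the class is a hypothetical illustration rather than EMERSE's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

class AgeHistogram {
    /** Counts ages into [start, start + interval) buckets keyed by the bucket's start age. */
    static Map<Integer, Integer> bucket(int[] ages, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int age : ages) {
            int start = Math.floorDiv(age, interval) * interval;
            counts.merge(start, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int interval = 10; // same value as the example setting
        int[] ages = {4, 17, 19, 33, 38, 71}; // hypothetical patient ages
        bucket(ages, interval).forEach((start, n) ->
                System.out.println(start + "-" + (start + interval - 1) + ": " + n));
    }
}
```

A smaller interval produces more, narrower bars; a larger one produces fewer, wider bars.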

Search

search.cacheInMB: Size of the cache, in MB, used to make searches more efficient. Larger values are better, as long as enough memory is available.

Example 14. -

search.cacheInMB=128

search.cachedQueryCount: Maximum number of queries held in the cache. Each term in a search can be cached as a separate query, so search bundles with many terms need a large count for caching to work efficiently. We have found 1024 to be reasonable, so it likely does not need to be changed.

Example 15. -

search.cachedQueryCount=1024
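search.cachedQueryCount bounds how many cached queries are kept. The sketch below shows a count-bounded cache with least-recently-used eviction, built on LinkedHashMap's access order. The class name and the LRU policy are assumptions for illustration, and the memory limit set by search.cacheInMB is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedQueryCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedQueryCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    /** Called after each put; evicts the least recently used entry once the count is exceeded. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a count that is too small for a large search bundle, entries for the bundle's own terms would evict each other during a single search, which is the situation a generous count avoids.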

Optimization

Overview

Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.

Solr Index Optimization

Over time, many documents change as they are updated or deleted (a deletion might be required if, for example, a document was found to have been filed under the wrong patient). Deleted or replaced documents remain in the index until its segments are merged. It is possible to clear out these deleted/inactivated documents, and potentially improve Solr's performance, by optimizing the index. This can be invoked manually using the Optimize button in the Solr Administration User Interface. Optimizing also merges the index into fewer segments, which can further improve performance.

During the optimization process, the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that an optimization can take 10 or more hours and uses substantial computational resources, so system performance might suffer for users. Thus, it might be best to run it on weekends or at other times of low use. At Michigan Medicine we optimize infrequently: we copy the indexes to a different server with more space, optimize them there, and copy them back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.
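Because the optimized index is written alongside the original, it is worth confirming free space before starting. This sketch (hypothetical class name) totals the index directory and compares it with the usable space on the same file system, using the conservative end of the 2-3 times estimate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class OptimizeSpaceCheck {
    /** Sums the sizes of all regular files under dir (the index directory). */
    static long directorySize(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                    .mapToLong(OptimizeSpaceCheck::sizeOf)
                    .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if usableBytes is at least factor times indexBytes. */
    static boolean enoughSpace(long indexBytes, long usableBytes, double factor) {
        return usableBytes >= (long) Math.ceil(indexBytes * factor);
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the Linux example for lucene.indexPath; pass another path as an argument.
        Path index = Paths.get(args.length > 0 ? args[0] : "/app/data/indexes/unified");
        long indexBytes = directorySize(index);
        long usableBytes = Files.getFileStore(index).getUsableSpace();
        // 3.0 is the conservative end of the "2-3 times the index size" estimate.
        System.out.printf("index: %,d bytes, usable: %,d bytes, enough for optimize: %b%n",
                indexBytes, usableBytes, enoughSpace(indexBytes, usableBytes, 3.0));
    }
}
```

When the optimization is done on a different server, as described above, run the check against the copy on that server.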

Tomcat (EMERSE application)

To reduce the frequency of garbage collection and heap resizing, use the -Xms (initial heap size) and -Xmx (maximum heap size) switches to control how the JVM manages its heap memory. We recommend configuring Tomcat to use between 1 and 2 GB of memory. One way to do this is to add the following line to the Tomcat startup file, startup.sh:

export JAVA_OPTS="-Xmx2048m -Xms1024m"
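To confirm how the JVM interprets -Xms and -Xmx, a small class can report the heap limits through Runtime. Compile and run it with the same options used for Tomcat (for example java -Xmx2048m -Xms1024m HeapReport). The class name is hypothetical, and maxMemory() may report slightly less than -Xmx because some garbage collectors exclude part of the heap from that figure.

```java
class HeapReport {
    /** Converts bytes to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (governed by -Xmx).
        System.out.println("max heap:            " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved by the JVM (starts around -Xms, can grow toward -Xmx).
        System.out.println("committed heap:      " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free (of committed): " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

To inspect the running Tomcat itself rather than a separate JVM, the same Runtime calls could be logged from inside the web application.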

Solr

Solr’s memory can be configured with a simple flag at startup. In the example below, 2 gigabytes are allocated. When working with millions of documents, we had some issues when Solr was using its default setting of 512 megabytes. Currently the EMERSE production instance at Michigan Medicine is configured to use 3 gigabytes.

./solr -s /app/data/solr6 -m 2g

Server memory optimization

In a deployment of EMERSE where the main application and Solr run on the same server, each process should be given adequate memory. Beyond a certain point, however, allocating more memory to these processes may actually reduce performance. Most modern operating systems use otherwise unused memory to cache files, so the more memory is allocated to these processes, the less remains for the operating system to cache files, including the index files Solr reads.

Security Hardening

Solr

Solr can be set up to use SSL/TLS. This requires changes to solr.in.sh, found in the bin directory of the Solr installation.

https://cwiki.apache.org/confluence/display/Solr/Enabling+SSL

Solr also provides a native API that can be accessed with tools such as curl. By default this API is not locked down, so it should be secured with basic authentication.
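Basic authentication sends the credentials Base64-encoded in an Authorization header. Base64 is an encoding, not encryption, which is why it should be combined with the SSL/TLS setup described above. The hypothetical helper below builds the header value from a username and password; the request URL and credentials are placeholders, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SolrBasicAuth {
    /** Builds an HTTP Basic Authorization header value (RFC 7617) from a username and password. */
    static String headerValue(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical credentials and collection; the request is constructed only, never sent.
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://localhost:8983/solr/unified/select?q=*:*&rows=0"))
                .header("Authorization", headerValue("emerse", "change-me"))
                .GET()
                .build();
        System.out.println(request.method() + " " + request.uri());
    }
}
```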

Active MQ

The ActiveMQ broker can be set up to use SSL/TLS and to require authentication. The ActiveMQ web application can also be configured to use SSL. EMERSE allows the username and password to be set via the configuration properties described in the ActiveMQ section above.

http://activemq.apache.org/how-do-i-use-ssl.html

Configuration password

Passwords specified in the project.properties files can themselves be encrypted using Jasypt.

http://www.jasypt.org/encrypting-configuration.html
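Jasypt's documentation marks encrypted values by wrapping them in ENC(...), for example ds.password=ENC(...). Assuming EMERSE uses that default convention, the hypothetical class below flags password properties that are still in plaintext. It does not decrypt anything; decryption requires the Jasypt library and its master password.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

class PasswordAudit {
    private static final String PREFIX = "ENC(";
    private static final String SUFFIX = ")";

    /** True if the value uses Jasypt's default ENC(...) wrapper. */
    static boolean isEncrypted(String value) {
        if (value == null) {
            return false;
        }
        String v = value.trim();
        return v.length() >= PREFIX.length() + SUFFIX.length()
                && v.startsWith(PREFIX) && v.endsWith(SUFFIX);
    }

    /** Lists keys ending in "password" whose values are not wrapped in ENC(...), sorted by name. */
    static List<String> plaintextPasswordKeys(Properties props) {
        List<String> keys = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (key.toLowerCase(Locale.ROOT).endsWith("password")
                    && !isEncrypted(props.getProperty(key))) {
                keys.add(key);
            }
        }
        keys.sort(null); // natural (alphabetical) order
        return keys;
    }
}
```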

Exported Excel files

EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a small custom batch job….

Admin Application

EMERSE users that have an ADMIN role have access to the admin application located at:

http://host:port/emerse/admin2

The application has two main features: user management related to authorization, and maintenance of synonyms.

Add/Remove users

The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, select the “User with full privs” option and leave the others unchecked. The password field is required but is ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we are not yet making much use of it locally.

Synonyms

The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a CSV file.

Uploading synonyms currently deletes all existing entries in the synonyms table and then loads the entries from the CSV file. Thus, this option replaces the synonyms rather than appending new ones to the existing list.
