Overview

This guide describes configuration changes that affect the behavior of EMERSE, tuning of the JVM associated with the EMERSE application and Solr, and some security hardening procedures.

Configuration

Most of the default configuration will not need modification, but some settings are specific to the deployment environment and would normally be changed. The most commonly changed settings are the URLs and credentials for the database and Solr.

Web Server

While it is not strictly required to have a web server such as Apache’s httpd or nginx in front of Tomcat, it is typical. The only special configuration EMERSE currently needs is a socket timeout of around 10 minutes, so that synonym upload works on very large synonym sets (such as those we distribute).

Solr configuration

Solr has its own configuration files in two places. The first is the installation directory, which is where you unzip the distribution, often referred to as solr-8.8.2/ in these guides. The second is the configuration of the indexes, also called "cores" in Solr. The indexes reside in the Solr "home" directory, $SOLR_HOME, which is either set by you in solr-8.8.2/bin/solr.in.sh or specified with -s when running solr start. If left unset, $SOLR_HOME is solr-8.8.2/server/solr/.

Each core has its own configuration files:

  • $SOLR_HOME/core_dir/core.properties

  • $SOLR_HOME/core_dir/conf/solrconfig.xml

  • $SOLR_HOME/core_dir/conf/managed-schema

The core.properties file defines the name of the core. It usually contains a line like name=documents or name=patient. This name doesn’t have to match the name of the directory it is in, but it’s less confusing if it does. This name determines the URL used to query the core, and how it appears in Solr’s admin app. EMERSE must be told the name of the core, not the name of the directory, since it uses the web API to talk to the core.
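
To make the distinction between directory name and core name concrete, here is a minimal Python sketch. It is only an illustration, not how EMERSE or Solr reads the file: the helper names are hypothetical, and it assumes the core URL is simply the Solr service URL followed by the core name.

```python
def core_name_from_properties(text):
    """Return the value of the `name` key in core.properties text, or None."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "name":
            return value.strip()
    return None


def core_url(service_url, core_name):
    """Join the Solr service URL and a core name (assumed URL layout)."""
    return service_url.rstrip("/") + "/" + core_name


# The directory holding this file could be called anything;
# only the name= line decides the URL.
props = "# hypothetical core.properties\nname=documents\n"
print(core_url("http://localhost:8983/solr", core_name_from_properties(props)))
```

The name returned here is the one EMERSE would need in its own settings, regardless of what core_dir is called on disk.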

The solrconfig.xml file defines the version of Lucene used to talk to the index. Changing this will require a re-index since that may change the format of the index on disk.

In addition, solrconfig.xml contains the configuration of request handlers, such as the search, spell check and query validation handlers. These and other settings can be changed without re-indexing.

EMERSE assumes a number of endpoints are defined in this file, and those themselves rely on EMERSE’s Solr plugin. To install the plugin, place it in $SOLR_HOME/lib/. (You will have to create the directory.) The definitions of the endpoints that must be present in the solrconfig.xml files are kept in "configsets". These are basically templates for a Solr core. They have the same structure as a core, but lack some files, such as the core.properties file.

The managed-schema file describes the format of documents for the core. It says what fields exist and how they should be indexed and stored. This will likely need to be modified to match your local document metadata structures, and will need to match the appropriate database tables. For more information see the Data Guide.

Loading Configuration

EMERSE loads configuration data through Spring’s "environment", which is an abstraction over a number of ways to configure processes generally, Java programs specifically, and Java web applications more specifically. It presents configuration much like process-level "environment variables", which map textual names to textual values. In Java, such a name/value pair is generally referred to as a "property". Spring’s environment searches each source of configuration in a fixed order, and the first property found wins. The order of search is roughly as follows:

  1. Servlet Configuration Parameters

  2. Servlet Context Configuration Parameters

  3. JNDI Properties

  4. Java System Properties

  5. Environment Variables

  6. The emerse.properties file

The emerse.properties file is searched for in the following order:

  1. The value of the property emerse.properties.filepath, according to Spring’s environment

  2. $HOME/emerse.properties (i.e., in the home directory of the user Tomcat runs as)

  3. The file WEB-INF/classes/project.properties inside emerse.war (or its exploded directory), or in any jar file inside the war file; that is, the file /project.properties on the classpath. (There is no such file by default, and we don’t recommend re-packaging the war to add one; this is supported only for legacy reasons.)

Generally, all EMERSE configuration is placed in the emerse.properties file, though configuration may be split across any of the six places Spring’s environment searches.
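
The "first property found wins" rule can be sketched in a few lines of Python. This is not Spring's implementation, only a model of the search order listed above; the host names, file path, and source labels are hypothetical.

```python
def resolve(name, sources):
    """Return (source_label, value) from the first source that defines `name`."""
    for label, props in sources:
        if name in props:
            return label, props[name]
    return None


# Sources in the rough search order described above (all values hypothetical).
sources = [
    ("servlet config", {}),
    ("servlet context", {}),
    ("JNDI", {"solr.serviceURL": "http://solr-a:8983/solr"}),
    ("system properties", {"emerse.properties.filepath": "/etc/emerse/emerse.properties"}),
    ("environment variables", {}),
    ("emerse.properties", {"solr.serviceURL": "http://localhost:8983/solr",
                           "ds.username": "emerse"}),
]

print(resolve("solr.serviceURL", sources))  # the JNDI value shadows the file
print(resolve("ds.username", sources))      # only the file defines this one
```

In this model, a value set through JNDI hides the same property in emerse.properties, while properties defined only in the file are still found.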

In practice, we tend to put the emerse.properties file under the home directory of the user that Tomcat runs as. However, if this is not desirable, or if multiple versions of EMERSE are running on the same server, you must specify the location of the properties file using another property, emerse.properties.filepath, specified in one of the first five ways. We tend to do this using either Java system properties or JNDI.

Java system properties are passed as arguments to the JVM, in the form -Dproperty.name=value. Since the Tomcat scripts are what actually start the JVM, you need to configure them to pass this argument, and the easiest way to do that is to set and export the CATALINA_OPTS environment variable before starting tomcat:

export CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/file"
bin/catalina.sh start

If you merely want to put the properties file somewhere other than $HOME/emerse.properties you can create a file bin/setenv.sh inside the tomcat installation and put the export statement there; catalina.sh automatically runs such a file internally if it exists. (See the comments inside catalina.sh or catalina.bat for more information.)

The downside of this is that each tomcat instance can host at most one deployment of EMERSE. To have multiple deployments of EMERSE in the same tomcat instance, you can set up each deployment with a different JNDI environment, which can give EMERSE a different location for the emerse.properties file.

The JNDI environment for a war file is configured by a file inside Tomcat’s configuration directory: conf/Catalina/localhost/emerse.xml. (The directories other than conf will need to be created.) The filename emerse.xml should match the name of the war-file in the webapps directory. Inside, add the XML:

<Context>
  <Environment name="emerse.properties.filepath"
               value="/path/to/emerse.properties"
               type="java.lang.String"/>
</Context>

You can add other properties here in the same way as well. They will take precedence over the values specified in any emerse.properties file loaded. So, if two deployments have the same configuration, except for which Solr instance or database they connect to, you could just set the solr.serviceURL or ds.url properties here.

Version number

The EMERSE version number is distributed as part of the WAR file that gets deployed to the server. This is not really a configurable option, but it is being mentioned here for the sake of completeness. It can be found in the WAR manifest, under META-INF/MANIFEST.MF. (WAR files are just zip files with the wrong extension. This is true of jar files too.) Try unzip -c emerse.war META-INF/MANIFEST.MF.
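
If you would rather read the manifest from a script than with unzip, the following Python sketch does the same thing using only the standard library. Which manifest header carries the EMERSE version is not stated here, so the function simply returns all headers of the main section; the function name is our own.

```python
import zipfile


def read_war_manifest(war_path):
    """Return the main-section headers of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(war_path) as war:
        text = war.read("META-INF/MANIFEST.MF").decode("utf-8")
    headers = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # the main section ends at the first blank line
        if line.startswith(" ") and last_key is not None:
            # a line starting with one space continues the previous value
            headers[last_key] += line[1:]
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            headers[last_key] = value.strip()
    return headers


# Usage (path is an assumption):
# for key, value in read_war_manifest("emerse.war").items():
#     print(key, "=", value)
```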

Profiles

EMERSE uses Spring profiles to enable optional services within the application. Active profiles are those listed in the value of the spring.profiles.active property in Spring’s environment. Profile names should be comma-separated.

Currently, there are two interesting profiles: ldap and no-scheduler.

ldap turns on the LDAP authentication mechanism, which must be further configured with other properties. Additional information can be found in the LDAP section below.

no-scheduler turns off all scheduled jobs typically configured with the task.*.cron properties, described below. Activating this profile is used for testing only. (The scheduler is on by default, which is why activating the no-scheduler profile turns it off.)

See the Loading Configuration section for more details about how to put spring.profiles.active into Spring’s environment.

apache-tomcat/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -Dspring.profiles.active=ldap"

emerse.properties
spring.profiles.active=ldap

Application Settings

The following are all the properties that configure EMERSE. These are looked up through Spring’s environment, as described in Loading Configuration. Generally, we set them in the emerse.properties file.

Database

ds.username

The username for the database account for the main EMERSE database.

ds.password

The password for the database account. This can be encrypted with Jasypt if desired.

ds.url

The JDBC url to connect to the EMERSE database.

Example:
ds.url=jdbc:oracle:thin:@databasehost:port:sid

ds.maxPoolSize

Maximum number of available connections to the database. Production systems with moderate usage should set this to 10-20 connections. Our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrent logged-in users, with about 10 users searching at the same time. More is generally better, but a centrally managed DB may have a limit on the number of connections permitted.

Solr/Lucene

solr.serviceURL

The URL the application will use to access the Solr instance.

Example:
solr.serviceURL=http://localhost:8983/solr

solr.username

The username used when connecting to Solr, if Solr is configured with basic auth.

solr.password

The password used when connecting to Solr, if Solr is configured with basic auth.

solr.unifiedCollection

The name of the Solr core that contains patient documents. Tends to be documents or, for older installations, unified.

solr.patientUpdateCollection

The name of the Solr core that EMERSE copies the PATIENT table to as part of a scheduled batch job each night. Typically, this is patient.

solr.patientSearchCollection

The name of the Solr core that should be used to search patients. Typically, this is patient-slave, the replica of patient.

solr.wildcard.minLength

This is the minimum number of characters in a term before the first wildcard character, *, in a search. Terms which have fewer than this are rejected as invalid. This does not apply to advanced search. Wildcard searches can be very taxing on Solr, and more so as more terms match the wildcard; enforcing a minimum prefix ensures these searches are not too taxing. The default value is 3.
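
As a rough illustration of the rule (not the actual validation code in EMERSE), the check amounts to counting the characters before the first * in a term. The function name below is hypothetical.

```python
def wildcard_prefix_ok(term, min_length=3):
    """True if `term` has no wildcard, or has at least `min_length`
    characters before its first '*'."""
    star = term.find("*")
    if star == -1:
        return True  # no wildcard, so the rule does not apply
    return star >= min_length


for term in ["card*", "car*", "ca*", "*itis", "heart"]:
    print(term, wildcard_prefix_ok(term))
```

With the default of 3, card* and car* pass while ca* and *itis are rejected.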

The patient cores were created to facilitate the graphs summarizing demographics shown in the results of an all-patient search. They are also used in filters.

It is worth noting that the Solr patient index is replicated within EMERSE, with one serving as a backup for the other if the main one ever became corrupted. The larger, production Solr document index is not replicated in this way, mainly because it is so large. In other words, this type of 'slave' index is good practice, but may not be practical for the larger indexes.

Patient Lists and MRN validation

There are various options available for validating user-entered patient medical record numbers (MRNs). MRNs in the PATIENT table are in a "canonical" form which is formatted for display to the user, and MRNs entered by the user are "cleaned up" to match the canonical form. The "formatting for display" portion is determined by patientList.MRNFormat and patientList.MRNFormatUseZeroPad. The "cleaning up" portion is covered by patientList.stripRegexOnMRNInsert.

If your MRNs are numbers, possibly with leading zeros, we suggest you pick a "canonical" form with no leading zeros. This allows you to be the most forgiving of minor errors, because you can strip leading zeros on input but still show the leading zeros when displaying MRNs to the user.

We suggest being this forgiving since MRNs may come from different systems (or go through several systems) before being sent to EMERSE, and not all of them may preserve leading zeros.

patientList.MRNLimit

The number of patients that can be added to a patient list. We set it to 100,000 by default.

patientList.maxErrors

The maximum number of errors that are shown when a user uploads or inserts new patients to a patient list. The default is 100.

patientList.deduplicateLists

Whether the system removes duplicate medical record numbers from a patient list. A true value will cause EMERSE to remove duplicates before saving to a patient list. In general, it is a good idea to remove duplicates, so keeping it true is ideal.

patientList.pullInvalidMRNs

If set to true, invalid MRNs will be reported. If set to false, they will be silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed. Default true.

patientList.MRNFormat

A format string that formats the MRN for display in the UI. Generally this is either %s or of the form %9s, where 9 can be some other number. In the former case, it leaves the input as-is. In the latter case, it pads the MRN with spaces if it is less than 9 characters long. The spaces added here can be replaced with zeros using the next option. Default %9s.

patientList.MRNFormatUseZeroPad

true to replace spaces in the MRN with zeros; false to not do that. Default true.
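
The combined effect of patientList.MRNFormat and patientList.MRNFormatUseZeroPad can be sketched as follows. This is an approximation in Python rather than EMERSE's own code; it relies on Python's % operator treating %s and %9s the same way the format strings are described above, and the function name is hypothetical.

```python
def format_mrn(mrn, mrn_format="%9s", use_zero_pad=True):
    """Format a canonical MRN for display."""
    shown = mrn_format % mrn           # "%9s" left-pads with spaces to width 9
    if use_zero_pad:
        shown = shown.replace(" ", "0")  # turn the padding into zeros
    return shown


print(format_mrn("4567"))                      # padded to 9 characters with zeros
print(format_mrn("4567", mrn_format="%s"))     # left as-is
print(repr(format_mrn("4567", use_zero_pad=False)))
```

With the defaults, a canonical MRN stored without leading zeros is still shown to the user at a fixed width.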

patientList.stripRegexOnMRNInsert

Remove matches of the regular expression when MRNs are uploaded/entered by users. This is run after whitespace is removed from the MRN. (For instance, if the regular expression is ^0+ then 000 0045 67 would become 4567.) The resulting value ought to match one in the PATIENT table. If it doesn’t, it will be discarded and reported as invalid. Example values:

  • ^0+|[-] - remove leading zeros and dashes

    patientList.stripRegexOnMRNInsert=^0+|[-]
  • [-#] - remove dashes and pound signs

    patientList.stripRegexOnMRNInsert=[-#]
  • An empty value keeps the exact value (spaces would still be removed)

    patientList.stripRegexOnMRNInsert=
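
The example values above can be tried out with a small Python sketch of the clean-up step. It assumes the order described in this section (whitespace removed first, then regex matches removed) and uses Python's re module, which behaves the same as Java regular expressions for these simple patterns; the function name is hypothetical.

```python
import re


def clean_mrn(entered, strip_regex=""):
    """Normalize a user-entered MRN toward the canonical form."""
    compact = re.sub(r"\s+", "", entered)    # whitespace is always removed first
    if not strip_regex:
        return compact                        # empty setting: keep the rest as-is
    return re.sub(strip_regex, "", compact)   # drop every match of the regex


print(clean_mrn("000 0045 67", r"^0+"))
print(clean_mrn("00-12-34", r"^0+|[-]"))
print(clean_mrn("#12-34", r"[-#]"))
print(clean_mrn(" 12 34 "))
```

The cleaned value is what would then be compared against the PATIENT table.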

LDAP

The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.

Typically, to authenticate to LDAP, you need the DN (distinguished name) of the user you want to authenticate as, and the password of that user. Since the DN of a user may not contain their username (as entered at the login screen), EMERSE authenticates to LDAP as a fixed "service user" as specified by ldap.userDn and ldap.password. This should give EMERSE the permission to then run a search for the DN of the user trying to log in. The search run is the one specified in the ldap.search property, where every instance of the text {0} in that search is replaced with the username entered on the login screen. The user record found from that search should contain a dn: entry, which is then used to authenticate the user against LDAP with the password provided on the login screen.

LDAP is only used for authentication. Authorization (permissions) is given to the user as defined by the user account in the EMERSE database that matches the username entered at the login screen. This means that users who are not in the EMERSE database will not have access. So, you must add users via the administration application; we don’t create accounts from information stored in LDAP or grant permissions based on LDAP groups.

ldap.host

The URL of the LDAP server, including the protocol (ldap:// or ldaps://) and port.

Example:
ldap.host=ldaps://hostname:636

ldap.auth.tls.gracefulShutdown

Either true or false. Default true. Only has an effect if the protocol of ldap.host is ldap://. Some LDAP systems require a graceful shutdown of TLS, or subsequent LDAP binds fail (the first login to EMERSE succeeds, but subsequent logins fail with an error complaining that TLS is already established). Others may fail if graceful shutdown is attempted at all. Set this depending on the errors you encounter when testing.

ldap.userDn

The distinguished name of the service account that EMERSE will use to conduct the user search

Example:
ldap.userDn=cn=emerse,ou=people,dc=med,dc=umich,dc=edu

ldap.password

Password of the service account

ldap.uidPath

Path of the subtree to search for the user

Example:
dc=med,dc=umich,dc=edu

ldap.search

The search to find the user based on the username typed at the login screen. Every occurrence of the text {0} will be replaced with the username typed at login.

Examples:
(uid={0})
(&(uid={0})(objectClass=user))
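
To preview what a given ldap.search value expands to for a particular username, the substitution can be sketched in Python. This is only a model of the {0} replacement described above. The escaping of special characters follows the LDAP filter syntax (RFC 4515); whether and how EMERSE escapes the username is an assumption on our part, and the function names are hypothetical.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_filter_value(value):
    """Escape a string for safe use as an LDAP filter value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def build_search_filter(ldap_search, username):
    """Replace every {0} in the configured search with the escaped username."""
    return ldap_search.replace("{0}", escape_filter_value(username))


print(build_search_filter("(uid={0})", "jdoe"))
print(build_search_filter("(&(uid={0})(objectClass=user))", "jdoe"))
```

The resulting filter can be pasted into a tool such as ldapsearch, together with the ldap.uidPath subtree, to check that it finds exactly one user record.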

Single Sign-On or Pre-Auth

Single sign-on properties are only used when the preauth profile is active. When the preauth profile is active, EMERSE will do a few things:

  1. EMERSE will automatically log in the user specified by the HTTP request without any credentials. The login is associated with the session, so only one request per session needs to specify the user, but it’s fine (and the norm) for each request to specify the user.

  2. EMERSE will no longer show the login form on the login page. The login.hint will still be displayed, along with any errors in the login process.

How the HTTP request specifies the user to be logged in is controlled by the preauth.header setting, described below.

preauth.header

The HTTP request header that contains the username of the person to be logged in. If unspecified, then HttpServletRequest#getRemoteUser() is used. Tomcat can populate this value when using AJP.
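
The choice between the header and the remote user can be modeled as below. This is a sketch of the rule as described, not EMERSE's code: the header name X-Remote-User is hypothetical, and returning no user when a configured header is absent is an assumption.

```python
def preauth_username(headers, remote_user, header_name=None):
    """Pick the pre-authenticated username for a request.

    headers:     dict of HTTP request headers
    remote_user: what getRemoteUser() would return (or None)
    header_name: value of preauth.header, or None if unspecified
    """
    if header_name:
        # HTTP header names are case-insensitive
        for key, value in headers.items():
            if key.lower() == header_name.lower():
                return value or None
        return None  # assumption: no fallback when the configured header is absent
    return remote_user or None


print(preauth_username({"X-Remote-User": "jdoe"}, None, "X-Remote-User"))
print(preauth_username({}, "jdoe"))
```

Note that when no header is configured, a header sent by the client is ignored in this model; only the remote user set by the container counts.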

preauth.loginUrl

When specified, EMERSE will show the login button on the login page, and it will redirect to this URL. This is intended to allow the user to initiate the SSO, instead of it happening before landing on an EMERSE page.

Pre-Auth Security

If you use a header name to indicate which user should be regarded as logged in, it’s very important to make sure users/attackers cannot craft HTTP requests directly to Tomcat, since otherwise they’d be able to log in as anyone they wish without credentials.

If you use AJP in Tomcat, then AJP requests can specify the REMOTE_USER which gets put in getRemoteUser() of the request, and so it has a similar problem as using request headers, but Tomcat can reject AJP requests unless they have a specific secret included.

Shibboleth

The main way you should use pre-auth is with Shibboleth, httpd, Tomcat, and the AJP protocol. Shibboleth does the actual single sign-on, while httpd, Tomcat, and the AJP protocol allow the username known by Shibboleth to be sent to Tomcat and set as the getRemoteUser() of the HTTP request.

To make httpd use AJP, specify that protocol in the reverse proxy:

LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

<Location "/emerse/">
  ProxyPass "ajp://localhost:8009/emerse/" secret=somesecret
</Location>

You’ll need a secret unless you disable the requirement in Tomcat. In Tomcat, you’ll want to use the AJP connector instead of the HTTP(S) one, and you’ll need to tell Tomcat to take authentication from the AJP protocol (by specifying tomcatAuthentication="false").

$CATALINA_BASE/conf/server.xml
<?xml version="1.0" encoding="UTF-8"?>
<Server port="8005" shutdown="SHUTDOWN">
  ...
	<Service name="Catalina">
		<Connector
				protocol="AJP/1.3"
				port="8009"
				secret="somesecret"
				tomcatAuthentication="false"
		/>
		...
	</Service>
</Server>

Specify the SAML attribute to be used as the username / REMOTE_USER, in the <ApplicationDefaults> element of shibboleth2.xml:

/etc/shibboleth/shibboleth2.xml
...
  <ApplicationDefaults REMOTE_USER="your-attribute-name-here" ...>
...

User Initiated Login

To allow users to see the EMERSE login screen and initiate login themselves, set the preauth.loginUrl to be Shibboleth’s login initiator. Make sure a Shibboleth session is optional in httpd:

<Location "/emerse/">
  AuthType shibboleth
  ShibRequestSetting requireSession false
  require shibboleth
</Location>

Then set the login location to have a redirect back to the EMERSE app, something like:

emerse.properties
preauth.loginUrl=https://your.host.edu/Shibboleth.sso/Login?target=https://your.host.edu/emerse/

Automatic Login

To automatically redirect to the SSO login without showing any EMERSE application screen, make sure the Shibboleth session is required for the entire EMERSE application in httpd:

<Location "/emerse/">
  AuthType shibboleth
  ShibRequestSetting requireSession true
  require shibboleth
</Location>

Then Shibboleth will redirect before allowing httpd to do the reverse proxy and show EMERSE’s login screen.

Attestation

attestation.allowOtherAttestationReasons

If set to true, Quick Buttons with standard reasons are displayed to the user for selection

attestation.allowFreeTextAttestation

If set to true, users can enter a free text description describing their purpose of using EMERSE

attestation.showPriorAttestations

If set to true, the Attestation screen will display a table of the free text attestation reasons previously used by the user.

Batch Updating Begin/End Dates

Every night, EMERSE runs a batch job that finds the date range of documents stored in the documents index. It stores this date range in the SOLR_INDEX table. This is used as the default date range for any search that doesn’t specify one. It is also used to validate any custom date range the user enters in the filters page, so that users don’t think they are searching documents from a time period that is not indexed at all.

However, our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. This led to unusual dates being displayed in the section of EMERSE that shows the overall date range of included documents when no date limitation was placed on the search criteria (e.g., “01/01/1900”).

To circumvent this potential problem EMERSE allows you to specify which of the start and end dates should be updated from the values found from the index itself. Typically, this means you can set a sensible START_DATE into the SOLR_INDEX table, and then tell EMERSE not to update it. Then, this start date will appear as the earliest date searchable within EMERSE.

batch.updateIndexMinDateFromSOLRIndex

If set to true, the minimum document date is read from Solr every night and written to the solr_index table.

batch.updateIndexMaxDateFromSOLRIndex

If set to true, the maximum document date is read from Solr every night and written to the solr_index table.

If one or both of these properties is set to false, then the date entered in the solr_index table is what will be used for display purposes. For more information on this table see the section on the solr_index in the Data Guide.
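
The effect of the two flags on the stored date range can be summarized with a short Python model. It is an illustration of the behavior described above, not the batch job itself; the function and parameter names are hypothetical, and the dates are made up.

```python
from datetime import date


def nightly_date_range(stored_start, stored_end, index_min, index_max,
                       *, update_min, update_max):
    """Return the (start, end) dates kept in the solr_index table after the job.

    update_min / update_max stand for batch.updateIndexMinDateFromSOLRIndex
    and batch.updateIndexMaxDateFromSOLRIndex.
    """
    start = index_min if update_min else stored_start
    end = index_max if update_max else stored_end
    return start, end


# A legacy document dated 1900 is in the index, but the start date is pinned.
print(nightly_date_range(date(1995, 1, 1), date(2020, 1, 1),
                         date(1900, 1, 1), date(2021, 6, 30),
                         update_min=False, update_max=True))
```

Here the manually entered start date survives the nightly job, while the end date continues to track the newest document in the index.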

The Find Patients function identifies a set of patients based on the search terms. Initially this function only applied to all patients in the Solr index. As of version 6.1 the capability can also apply to a Patient List, which is generally a much smaller subset of patients compared to the entire index.

search.allPatientFragmentLimit

Number of fragments/text snippets to display for preview when using All Patient Search

Example:
search.allPatientFragmentLimit=100

search.facetDateRangeInterval

The All Patient Search displays a chart of patients’ ages grouped into intervals. This setting specifies the interval to use when displaying the chart. In general there should be no reason to change the default setting.

Example:
search.facetDateRangeInterval=10

search.facets.excluded

Hide the given charts and filters. Generally this is done because the data is not available at the site. The value is a comma-separated list. The possible values are:

Value               Graph Title in UI

SEX_CD              Sex
BIRTHDATE           Current Age
RACE_CD             Race
ETHNICITY_CD        Ethnicity
DECEASED_FLAG       Vital Status
MARITAL_STATUS_CD   Marital Status
SOURCE              Document Source

Example:
search.facets.excluded=MARITAL_STATUS_CD,DECEASED_FLAG

Synonyms

The properties below are the defaults that we recommend. In general there should be no reason to change them, but they are described here for completeness.

synonymList.valueLimit

Maximum number of characters that a single synonym term string can contain. This is also limited by the settings in the database, but is used here to perform validation on the uploaded datasets to ensure they comply. The value limit should not be greater than the column width in the database, which is 255. Each database may have its own interpretation of the 255 limit, which is mainly decided by the character set of the database. Because some characters that may be included in a term could be Unicode (and not plain ASCII), they might take up additional bytes and exceed the size limit in the database. This is why we recommend 200 to be safe.
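
The gap between character count and byte count is easy to see in Python. The sketch below assumes a database that stores text as UTF-8 and measures the 255 limit in bytes, which may not match your database; the terms are artificial.

```python
def utf8_size(term):
    """Return (number of characters, number of UTF-8 bytes) for a term."""
    return len(term), len(term.encode("utf-8"))


# "\u00e9" (e with acute accent) is one character but two bytes in UTF-8.
long_term = "\u00e9" * 10 + "a" * 240    # 250 characters
safe_term = "\u00e9" * 10 + "a" * 190    # 200 characters

print(utf8_size(long_term))   # the byte count exceeds 255
print(utf8_size(safe_term))   # the byte count stays under 255
```

A limit of 200 characters leaves some room for multi-byte characters, though a term made mostly of non-ASCII characters could still exceed 255 bytes.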

Example:
synonymList.valueLimit=200

synonymList.delimiter

The delimiter used to separate the columns for a synonyms dataset. We expect this should always be a tab.

Example:
synonymList.delimiter=\t

Miscellaneous

There are several components configurable within the EMERSE menu, which is available to all users in the upper-right portion of the window. In addition, there are things you can configure for the login page.

Contact Information

Users may want to contact a local administrator with issues or feedback about EMERSE. Contact information is available to users in the upper-right menu through either the About option or the Feedback option. Both of these menu options show some hard-coded text followed by a customizable link that is defined using the two properties listed below. The About menu item contains text beginning with "Please direct feedback and issues to…" and the Feedback menu item contains text beginning with "Please send any comments or suggestions you may have about EMERSE to…". The remaining text is defined by the two properties:

contact.url

This is the URL, or the mailto URL that will direct the user to the correct resource.

Examples:
contact.url=https://link.to.help/server
contact.url=mailto:emersehelp@university.edu

contact.text

This is the text that will be displayed on the screen for the URL.

Example:
contact.text=EMERSE help desk

The resulting link would then be constructed using the two properties above to look something like:

<a href="https://link.to.help/server">EMERSE help desk</a>

login.hint

This is a small snippet of text that appears on the EMERSE login page to help users know what login credentials to enter. This is by default blank.

resources.dir

This is a directory containing web resources. Right now, only the "cover" photo on the EMERSE login page (not the admin app one) can be set. By default there is none, so you should set one. The name of the cover photo should be either cover.png or cover.jpg inside the directory specified by the setting. The photo should be around 1200 pixels tall, though the size is up to you. The browser will "zoom" in on it so it takes up the entire browser window, though the left-hand side of the image will be obscured by the login panel itself.

userGuideUrl

This is the link that contains the user guide. By default (if nothing is defined) it will link to the main user guide on the project-emerse.org website. If you have your own user guide you can link to that instead by replacing the URL.

Example:
userGuideUrl=http://project-emerse.org/documentation/user_guide.html

Timeouts

A user’s session times out after a period of inactivity. If the app is idle and does not encounter a mouse click, mouse move, mouse scroll, or keypress for the configured timeout period, the application logs the user out of their session and presents the login page. The following properties can be added to the project.properties to override the defaults.

This timeout feature does not apply to the Attestation screen, because at this point no Protected Health Information (PHI) would be displayed. Nevertheless, EMERSE would still time out based on the server timeout settings, even though the countdown window for a forced logout would not be shown to the user.

application.idle.timeout

Number of seconds to run the idle timer while the application is idle. The default value is 3600 if this property has not been added to the properties file.

Example:
application.idle.timeout=3600

application.warn.length

Number of seconds to show the timeout warning window. The default value is 30 if this property has not been added to the properties file.

Example:
application.warn.length=30

Overall Patient Count

EMERSE displays the total number of patients in the system that an All Patient Search covers. This count is updated using the Spring Scheduler within the app itself, and should auto-update about every 30 minutes. The overall patient count is not configurable since it is derived from the data loaded into the system. Specifically, this count is based on the distinct number of MRNs that are associated with all of the documents in the Solr index. It is not based on the total number of MRNs in the PATIENT database table. Thus, if a patient is in the PATIENT table but does not have an associated document, that patient will not be counted towards the total number of patients.
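
In other words, the count is a distinct count over documents, not a row count over patients. A toy Python model makes the difference visible; the field name mrn and the sample values are made up, and the real count is computed by Solr rather than in application code like this.

```python
def overall_patient_count(documents):
    """Count the distinct MRNs that appear on at least one document."""
    return len({doc["mrn"] for doc in documents})


patient_table = {"100", "200", "300"}          # three patients in the table
documents = [{"mrn": "100"}, {"mrn": "100"},   # patient 100 has two documents
             {"mrn": "200"}]                   # patient 300 has none

print(overall_patient_count(documents), "of", len(patient_table), "patients counted")
```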

The total patient count displayed in the user interface is stored in the PATIENT_COUNT column of the SOLR_INDEX table in the database. This count is refreshed periodically by a background process that retrieves the number of unique MRNs from the Solr documents index. Additional details about configuring the schedule for this process can be found within this guide in the section called 'Solr Patient Index Replication Interval'. However, the overall patient count can also be forced to refresh immediately using the 'System Synchronization' feature found within the admin application.

Network

The EMERSE networking capability has multiple configuration settings. These are all described in the Networking Guide.

Optimization

Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.

Tomcat (EMERSE application)

To reduce the frequency of garbage collection, use the -Xmx and -Xms switches to control how the JVM sizes its heap memory. We recommend setting up Tomcat to use between 1 and 2 GB of memory. You can add these arguments in the setenv.sh script in the bin directory of Tomcat. If it doesn’t exist, just create it. See Tomcat’s catalina.sh script for more details on this script.

apache-tomcat/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -Xmx2048m -Xms1024m"

Compression

You can enable compression in Tomcat by modifying tomcat’s server.xml file. In it, there should be at least one uncommented <Connector> element with a number of attributes. For example it may appear as:

apache-tomcat/conf/server.xml
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"/>

In which case, you should add compression options to make it look like:

apache-tomcat/conf/server.xml
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               compression="on"
               compressionMinSize="1024"
               compressableMimeType="text/html,text/xml,text/javascript,text/css"/>

This makes loading the EMERSE application in the browser considerably faster. If you have multiple connector elements uncommented, you may want to do this to all of them, or at least the one you are serving the main EMERSE files from. Compression can also be done at the web server instead (such as httpd or nginx).

You will need to restart Tomcat to enable this configuration. You can confirm it’s working by seeing the Content-Encoding: gzip header in responses from the server, or seeing a smaller transfer size than response size in the network tab of the browser’s developer tools. Compression will only be done on files larger than 1024 bytes (as set in the compressionMinSize attribute).

Solr Index Optimization

Over time, many documents in the index change as they are updated or deleted (a deletion might be required if, for example, a document was found to have been created under the wrong patient). You can clear out these deleted or inactivated documents, and potentially improve the performance of Solr, by optimizing the index. This can be invoked manually using the Optimize button in the Solr Administration User Interface. Optimizing also merges the index segments, which reduces their number and can also improve system performance.

During the optimization process, the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that an optimization can take 10 or more hours and uses substantial computational resources, meaning that system performance might suffer for users. Thus, it might be best to run it on weekends or at other times of low use.

At Michigan Medicine we optimize infrequently: we copy the indexes to a different server with more space, optimize them there, and then copy the indexes back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.
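Before starting an optimization, it may be worth checking that enough free space is available. The following Python sketch is one way to do that; it is not part of EMERSE or Solr. The multiplier of 3 is simply the upper end of the 2-3 times range mentioned above, and the path in the usage comment is a hypothetical example.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def required_free_bytes(index_bytes, multiplier=3):
    """Estimate the free space needed to optimize an index of the given size."""
    return index_bytes * multiplier


def can_optimize(index_dir, multiplier=3):
    """Return True if the volume holding index_dir has enough free space."""
    needed = required_free_bytes(directory_size(index_dir), multiplier)
    return shutil.disk_usage(index_dir).free >= needed


# Usage (hypothetical path to a core's index directory):
#   can_optimize("/path/to/solr_home/documents/data/index")
```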

Solr Caches

Solr has a number of caches. Two important ones which are not configured by default are the filter cache and the document cache. These caches are index-specific, and are specified in the solrconfig.xml file of each index:

solrconfig.xml
<config>
  ...

  <query>
    <filterCache class="solr.FastLRUCache"
        maxRamMB="1024"
        showItems="10"/>
    <documentCache class="solr.LRUCache"
        size="4000"
        showItems="10"/>
  </query>

  ...
</config>

The maxRamMB attribute tells Solr how large the cache is allowed to grow before entries are evicted. The size attribute tells Solr how many entries are allowed before eviction becomes necessary. Only one of the two should be specified. Note that on earlier versions of Solr 8, maxRamMB can’t be specified on the documentCache. (You’ll get an error if you do.)

We’d recommend configuring these caches both in the documents index and the patient-slave index, since both are queried by EMERSE. The documents index should have larger caches since it is used more heavily.

Not all queries that are sent to Solr are cached. We currently can cache the following queries (in the standard Lucene syntax):

1. SOURCE:XXX
2. MRN:XXX
3. the collection of "filters" specified in EMERSE as a single query

To give a sense of the size of these cached queries: at UM, we have 200 million documents, and the first kind of query takes up 20-30 MB of space for each source in the system. (We have about five sources.) The second kind tends to take 2-30 KB (about a thousand times less), and the third kind falls between those two, depending on how selective the filters as a whole are. (The more selective, the smaller.)
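These figures can be turned into a rough sizing estimate for the filter cache. The Python sketch below only restates the arithmetic; the per-entry sizes are the UM numbers quoted above and will differ for other document counts, and the function names are illustrative.

```python
def source_filter_range_mb(num_sources, low_mb=20, high_mb=30):
    """Return the (low, high) cache footprint in MB of one SOURCE filter per source."""
    return num_sources * low_mb, num_sources * high_mb


def mrn_filters_that_fit(budget_mb, kb_per_filter=30):
    """Return how many MRN filters fit in budget_mb at a worst-case size per filter."""
    return (budget_mb * 1024) // kb_per_filter


# With five sources, the SOURCE filters alone need roughly 100-150 MB.
low, high = source_filter_range_mb(5)

# Whatever remains of a 1024 MB filter cache can hold MRN filters.
remaining_mrn_filters = mrn_filters_that_fit(1024 - high)
```

Under these assumptions, a maxRamMB of 1024 leaves room for tens of thousands of MRN filters after the SOURCE filters are cached.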

Viewing Cache Statistics

You can view the usage of the caches in the Solr Admin UI. Just go to the core, click on "Plugins / Stats" then "Cache" and the types of caches should appear. Expand the "filterCache" or "documentCache" to look at that one. The showItems="#" in the XML tells the UI to show a random sample of the cached filters or documents, including their size. In addition, you can see how many cache lookups have been performed, evictions, and the hit-ratio. For the documents core’s filter cache, we have a hit ratio of above 0.95.

Solr Memory

Solr (on Linux at least) memory-maps the index files, meaning its virtual memory will be about as large as the index files on disk. This means the virtual memory size of the application can be vastly larger than physical memory. (For instance, our Solr has at least 1 TB of virtual memory.)

These memory-mapped files are not a part of the Java heap space and so don’t count toward the limit set by the -Xmx flag. In addition, the OS manages which parts of those files are actually in physical memory and which aren’t. (Depending on the tool you use to look at memory on the box, the memory used for memory-mapped files may appear used or not; the OS is free to use that memory for another purpose, so it is, in a sense, free.) Solr internally caches the results of certain queries or parts of queries, so that if they are used frequently, the search doesn’t need to be done again. These caches do reside in the Java heap space.

So, you must strike a balance between having enough Java heap memory for query caching and having enough otherwise free memory on the box so that the OS has plenty of space to cache Solr’s memory-mapped files. At the University of Michigan, we currently allocate 3 GB to Solr’s heap, and have at least 20 GB of free memory on the box for the OS to cache files.

Allocating more heap space for Solr doesn’t mean you won’t have to tweak some of Solr’s cache settings, though there is some cache-sizing based on the max heap size. We haven’t done a ton of testing with caching, so we’ll say no more on this for now.

Solr’s memory can be configured with flags to the solr start command, or set as a default when starting Solr by adding it to the solr.in.sh configuration.

./solr start -m 3g

solr-8.X.Y/bin/solr.in.sh
SOLR_HEAP=3g

You may need to pass other flags such as -s when starting Solr, as described in the Installation Guide.

If you are concerned about the performance of the garbage collector or about free memory, you can see the frequency and duration of garbage collection in Solr’s GC log, contained in SOLR_INSTALL_DIR/server/logs/solr_gc.log.

Solr Patient Index Replication

EMERSE has two indexes used to keep track of patients: patient and its replica patient-slave.

The patient index is created by copying the patients from the database table PATIENT over to the corresponding Solr index, patient. This is done automatically by the system once per day as a scheduled event. The schedule of the jobs can be found in the properties file. The default time set for the EMERSE distribution is 7:30 AM. This was chosen with the assumption that the patients in the database table would be updated once every night during non-peak hours. If you are fine with that time, no changes need to be made.

After the PATIENT table is copied over, the replica is told to replicate from the master. To do this, the index must be configured with the master’s URL, which is done inside the solrconfig.xml file of the patient-slave core. There should be a handler defined like so:

<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8983/solr/patient/replication</str>
    <!--str name="httpBasicAuthUser">username_here</str-->
    <!--str name="httpBasicAuthPassword">password_here</str-->
  </lst>
</requestHandler>

Simply adjust the masterUrl parameter as needed. Uncomment the username and password elements and adjust them if you set up basic auth on the Solr instance. If you have an SSL certificate (which you should if you use basic auth), you must use a domain name that the certificate was issued for.
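When troubleshooting replication, it can help to query the replication handler of a core directly. The sketch below builds such a URL in Python. It assumes the standard commands of Solr’s ReplicationHandler (for example details and fetchindex); the helper name is illustrative, and the host, port, and core names are taken from the example configuration above and may differ in your deployment.

```python
import urllib.parse


def replication_url(solr_base, core, command):
    """Build the URL for a ReplicationHandler command on the given core."""
    query = urllib.parse.urlencode({"command": command, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{core}/replication?{query}"


# Report the replication status of the replica core.
details_url = replication_url("http://localhost:8983/solr", "patient-slave", "details")

# Ask the replica to pull the latest index from the configured master.
fetch_url = replication_url("http://localhost:8983/solr", "patient-slave", "fetchindex")
```

The resulting URLs can be opened with curl or a browser. If basic auth is enabled on Solr, credentials must be supplied with the request.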

The properties file uses a cron-like syntax to specify the schedule, which consists of six fields separated by whitespace. The first field is the seconds, then it’s minutes, hours, day-of-the-month, the month number, and then day-of-the-week. A field can have a number in it, appropriate for the field, a star meaning every value of the field, or a question mark, meaning no restriction. A more formal description of the syntax can be found in the Spring Documentation, specifically the component regarding the Class CronSequenceGenerator.
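To make the field order concrete, the following Python sketch splits a six-field expression into named parts. It only illustrates the layout described above and does not validate values the way Spring does; the field names are labels chosen for this example.

```python
CRON_FIELDS = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Split a six-field cron expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "00 30 7 * * ?" reads as: second 00, minute 30, hour 7, every day of the
# month, every month, no restriction on the day of the week (7:30 AM daily).
schedule = parse_cron("00 30 7 * * ?")
```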

task.updateIndexStatsViaSolr.cron

This runs a job that finds the minimum and maximum dates of the document index, along with the number of distinct MRNs in the document index. These values are used for updating the date ranges displayed in EMERSE as well as the overall patient count shown when conducting an All Patient search.

The date used is not the CLINICAL_DATE (which is used for searching) but the LAST_UPDATED date. The actual Solr field name is determined by the mapping of these EMR intents in the DOC_FIELD_EMR_INTENT table.

The default time is every hour, around minute 42. For instance, 1:42, 2:42, 3:42, etc.

Default:
task.updateIndexStatsViaSolr.cron=00 42 * * * ?

task.updatePatientIndex.cron

This is the schedule for updating the Solr patient index from the PATIENT table in the database. The default time is 7:30 AM.

Default:
task.updatePatientIndex.cron=00 30 7 * * ?

task.refreshCaches.cron

This refreshes various caches inside EMERSE, such as the list of Solr fields mapped to EMR intentions. The default time is 6:30 AM.

Default:
task.refreshCaches.cron=00 30 6 * * ?

If you change the scheduled time of one of these jobs, you will have to restart Tomcat for the changes to take effect.

It is possible to force these copying and indexing events to occur on demand, which may be useful for troubleshooting or when testing an initial setup. Details about how to do this are described in the Administrator Guide.

Refresh Caches Nightly Job

task.refreshCaches.cron

This refreshes a variety of caches of static data from the database, such as the DOC_FIELD_EMR_INTENT table, along with more dynamic data, such as the set of MRNs in the PATIENT table. By default, this is done at 6:30 AM.

Default:
task.refreshCaches.cron=00 30 6 * * ?

EMERSE Search Concurrency

The Overview screen in EMERSE is computationally expensive to show. Currently, it generally takes two searches for each cell of the table, and more for the mosaic view.

To complete this work quickly and fairly among multiple concurrent users, EMERSE internally has a priority queue of batches of rows from this table. When a user goes to a page of the Overview table, EMERSE adds the rows of that page as a batch in the priority queue. Worker threads complete rows from the most-neglected batch in the queue. As rows are completed for the most-neglected batch, other batches will eventually become the most-neglected, and then rows will start being completed for those batches. In this way, roughly equal time is spent on each batch.
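The scheduling idea can be sketched in a few lines of Python. This is an illustration only, not EMERSE’s actual implementation: it runs single-threaded, and it assumes that "most-neglected" means the batch with the least accumulated work time, which the description above does not specify. The class and method names are hypothetical.

```python
import heapq
import itertools


class BatchQueue:
    """Priority queue of row batches ordered by time already spent on each."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: earlier batches first

    def add_batch(self, name, rows):
        """Queue a new batch (for example, one page of the Overview table)."""
        heapq.heappush(self._heap, (0.0, next(self._order), name, list(rows)))

    def complete_next_row(self, cost_of):
        """Finish one row of the most-neglected batch and return (name, row).

        cost_of(row) gives the time charged to the batch for that row.
        Returns None when no work is left.
        """
        if not self._heap:
            return None
        spent, order, name, rows = heapq.heappop(self._heap)
        row = rows.pop(0)
        if rows:
            # Re-queue the batch with its updated time so others can catch up.
            heapq.heappush(self._heap, (spent + cost_of(row), order, name, rows))
        return name, row


queue = BatchQueue()
queue.add_batch("alice", ["a1", "a2"])
queue.add_batch("bob", ["b1", "b2"])
completed = [queue.complete_next_row(lambda row: 1.0) for _ in range(4)]
# completed alternates between the two batches: a1, b1, a2, b2
```

In a real deployment several worker threads would pull from such a queue concurrently, which is what the overview.workers setting controls.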

There is one setting you can tweak on this search priority-queue.

overview.workers

This is the number of worker threads that process rows from the batches. It determines the number of concurrent searches that are sent to Solr. The default is 7.

Security Hardening

Solr

Solr provides a REST API that can be accessed with tools such as curl. By default, this API is not locked down, so it should be secured with basic authentication if the Solr ports are not firewalled from external traffic.

Solr can be set up to use SSL/TLS and to require authentication with basic auth. Both of these features are supported by Solr Cloud, but EMERSE does not yet support Solr Cloud. However, the Jetty servlet engine embedded in standalone Solr can be modified to require authentication and use SSL.

Much of the Solr documentation pertains to Solr Cloud, which is NOT currently supported by EMERSE. Look for references to a single node configuration when consulting Solr documentation.

Solr SSL Setup

Changes are required in solr.in.sh, found in the bin directory under the Solr installation directory. Essentially, uncomment the lines below and configure them with values appropriate to a Java keystore containing the certificate for the server.

SOLR_SSL_KEY_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=keystore password
SOLR_SSL_TRUST_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=keystore password
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

See the "Basic SSL Setup" section of the Solr documentation for more information.

Basic Auth

Basic Auth can be configured in several ways. Solr provides its own approach, but another approach uses the Jetty servlet engine bundled with Solr.

The first step is to modify the jetty.xml file inside the SOLR_INSTALL_DIR/server/etc folder, adding the following snippet inside the <Configure></Configure> tags.

  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

After adding the XML snippet, add a user/password combination to the realm.properties file located in SOLR_INSTALL_DIR/server/etc. If the file doesn’t exist, create it and add the following line to it.

solradmin:password, admin-role

In the above, the username is "solradmin" and the password is "password".

Also, the following needs to be added to the webdefault.xml file:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

For more information on configuring Jetty with Basic Authentication, see https://www.eclipse.org/jetty/documentation/9.3.0.v20150612/configuring-security-authentication.html
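Once Basic Auth is enabled, every client request to Solr needs an Authorization header. The Python sketch below shows how such a header is built, using the example credentials from realm.properties above. The helper name is illustrative, and the URL is an assumption based on Solr’s default port and its system info endpoint; adjust both to your deployment.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Example credentials from realm.properties; replace them in real deployments.
request = basic_auth_request(
    "http://localhost:8983/solr/admin/info/system", "solradmin", "password"
)

# To send it: urllib.request.urlopen(request, timeout=10)
```

Because Basic Auth only base64-encodes the credentials, it should be combined with SSL as described in the Solr SSL Setup section.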

Configuration password

Passwords specified in the project.properties files can themselves be encrypted using Jasypt.

Exported Excel files

EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server inside the exploded EMERSE war file, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a shell script on the server that executes every 30 minutes and deletes files older than 60 minutes.

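#!/bin/sh
# Delete exported Excel files that are more than 60 minutes old.
# Exit quietly if the downloads directory does not exist yet.
cd /PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads \
        2> /dev/null || exit 0
find . -type f -name "*.xlsx*" -mmin +60 -exec rm {} \;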

Admin Application

Most details related to the Admin application and Admin features can be found in the Administrator Guide. Below is a high-level summary of the Admin features.

EMERSE users that have an ADMIN role have access to the admin application located at:

The application has two main features: user management related to authorization, and maintenance of synonyms.

Add/Remove users

The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, check the “User with full privs” option and leave the others unchecked. The password field is required but will be ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we aren’t doing much with it yet locally.

Roles and Privileges

Roles and Privileges for EMERSE users can be customized. Details about how this is done can be found in the Administrator Guide.

Synonyms

The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a TSV file. Multiple datasets can be added and managed through the interface in the Admin app. More details are discussed in the Administrator Guide.

Synchronization

The admin application has an option to "synchronize" various data between the database and Solr. While this happens automatically overnight, it can be useful to force it more frequently, especially during initial system setup and testing. Details can be found in the Administrator Guide.

Supporting Multiple Environments

It may be ideal to support multiple EMERSE environments such as test, dev, prod, etc. We have found that sometimes it can be difficult for users who are testing EMERSE to know what specific system they are using. To make it easier to distinguish between multiple instances of EMERSE, the system has the ability to display a small, but obvious, box in the upper right part of the screen to inform users. Having this information in a database table is useful because it can remain stable even as the application itself gets upgraded.

This information is defined in a single-row table called ENVIRONMENT_INFO:

| Column Name | Description |
|---|---|
| id | This should be set to 0 and not changed. |
| environment | This is the environment that is active (dev, test, prod, etc.). This is a free-text field, so it can be anything (e.g., "Development", "Testing", "Production"). |
| display_on_ui | This is a flag that determines whether the text for environment should be displayed on the screen. 1=display, 0=do not display. In general, you would not display this to users in the Production system. |

The version number of the application (displayed when selecting the About menu) is distributed with the WAR file itself and is not contained in the database.