EMERSE Configuration and Optimization Guide

This guide describes modifications that can be made that impact the behavior of EMERSE, tuning of the JVM associated with the EMERSE application and Solr, and some security hardening procedures.

Configuration

Most of the default configuration will not need modification, but some settings are specific to the deployment environment and would normally be changed. The most commonly changed settings are the URLs and credentials for the database and Solr.

Web Server

While it is not strictly required to have a web server such as Apache httpd or nginx in front of Tomcat, it is typical. The only special configuration EMERSE currently needs is that the socket timeout should be set to around 10 minutes in order for synonym upload to work on very large synonym sets (such as those we distribute).

Solr configuration

Solr has its own configuration files in two places. There is the installation directory, which is where you unzip the distribution, often referred to as solr-8.8.2/ in these guides. Then there is the configuration of the indexes, also called "cores" in Solr. The indexes reside in the Solr "home" directory, $SOLR_HOME, which is either set by you in solr-8.8.2/bin/solr.in.sh or specified with -s when running solr start. If left unset, $SOLR_HOME is solr-8.8.2/server/solr/.

Each core has its own configuration files:

- $SOLR_HOME/core_dir/core.properties
- $SOLR_HOME/core_dir/conf/solrconfig.xml
- $SOLR_HOME/core_dir/conf/managed-schema

The core.properties file defines the name of the core. It usually contains a line like name=documents or name=patient. This name doesn’t have to match the name of the directory it is in, but it’s less confusing if it does. This name determines the URL used to query the core, and how it appears in Solr’s admin app. EMERSE must be told the name of the core, not the name of the directory, since it uses the web API to talk to the core.
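The relationship between the core name and its URL can be sketched as follows. This is only an illustration, not EMERSE code: the helper names read_core_name and core_url are hypothetical, and the sketch assumes the core is reached at the Solr service URL followed by the core name.

```python
from pathlib import Path


def read_core_name(core_dir):
    """Return the value of the name= line in core_dir/core.properties, or None."""
    text = (Path(core_dir) / "core.properties").read_text()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, _, value = line.partition("=")
        if key.strip() == "name":
            return value.strip()
    return None


def core_url(service_url, core_name):
    """Join the Solr service URL and the core name (not the directory name)."""
    return service_url.rstrip("/") + "/" + core_name
```

For a core directory whose core.properties contains name=documents, core_url("http://localhost:8983/solr", "documents") gives http://localhost:8983/solr/documents, whatever the directory itself is called.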

The solrconfig.xml file defines the version of Lucene used to talk to the index. Changing this will require a re-index since that may change the format of the index on disk.

In addition, solrconfig.xml contains the configuration of request handlers, such as the search, spell check and query validation handlers. These and other settings can be changed without re-indexing.

EMERSE assumes a number of endpoints are defined in this file, and those themselves rely on EMERSE’s Solr plugin. To install the plugin, place it in $SOLR_HOME/lib/. (You will have to create the directory.) The definitions of the endpoints that must be present in the solrconfig.xml files are kept in "configsets". These are basically templates for a Solr core. They have the same structure as a core, but lack some files, such as the core.properties file.

The managed-schema file describes the format of documents for the core. It says what fields exist and how they should be indexed and stored. This will likely need to be modified to match your local document metadata structures, and will need to match the appropriate database tables. For more information see the Data Guide.

Loading Configuration

EMERSE loads configuration data through Spring’s "environment", which is an abstraction over a number of ways to configure processes generally, Java programs specifically, and Java web applications more specifically. It presents configuration much like process-level "environment variables", which map textual names to textual values. The name/value pair is generally referred to as a "property" in Java. Spring’s environment searches all of these configuration sources in a fixed order, and the first property found wins. The order of search is roughly as follows:

Servlet Configuration Parameters
Servlet Context Configuration Parameters
JNDI Properties
Java System Properties
Environment Variables
The emerse.properties file

The emerse.properties file is searched for in the following order:

The value of the property emerse.properties.filepath, according to Spring’s environment
$HOME/emerse.properties (i.e., in the home directory of the user Tomcat runs as)
The file WEB-INF/classes/project.properties inside emerse.war (or its exploded directory), or in any jar file inside the war file; that is, the file /project.properties on the classpath. (There is no such file by default, and we don’t recommend re-packaging the war to add one; this is supported only for legacy reasons.)

Generally, all EMERSE configuration is placed in the emerse.properties file, though configuration may be split across any of the six places Spring’s environment searches.
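The "first property found wins" rule can be illustrated with a small sketch. This is not Spring's implementation: the function name resolve and the sample values are hypothetical, and each source is modelled as a plain dictionary listed in the search order given above.

```python
def resolve(name, sources):
    """Return (value, source_label) from the first source that defines name.

    sources is a list of (label, dict) pairs, highest precedence first.
    Returns (None, None) if no source defines the property.
    """
    for label, props in sources:
        if name in props:
            return props[name], label
    return None, None


# Hypothetical deployment: JNDI overrides one value from emerse.properties.
SOURCES = [
    ("servlet config", {}),
    ("servlet context", {}),
    ("JNDI", {"solr.serviceURL": "http://solr-b:8983/solr"}),
    ("system properties", {}),
    ("environment variables", {}),
    ("emerse.properties", {
        "solr.serviceURL": "http://localhost:8983/solr",
        "ds.username": "emerse",
    }),
]
```

Here resolve("solr.serviceURL", SOURCES) returns the JNDI value, while resolve("ds.username", SOURCES) falls through to the emerse.properties file.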

In practice, we tend to put the emerse.properties file under the home directory of the user which tomcat runs as. However, if this is not desirable, or if multiple versions of EMERSE are running on the same server, you must specify the location of the properties file using another property, emerse.properties.filepath specified in one of the first five ways. We tend to do this using either Java system properties, or JNDI.

Java system properties are passed as arguments to the JVM, in the form -Dproperty.name=value. Since the Tomcat scripts are what actually start the JVM, you need to configure them to pass this argument, and the easiest way to do that is to set and export the CATALINA_OPTS environment variable before starting tomcat:

export CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/file"
bin/catalina.sh start

If you merely want to put the properties file somewhere other than $HOME/emerse.properties you can create a file bin/setenv.sh inside the tomcat installation and put the export statement there; catalina.sh automatically runs such a file internally if it exists. (See the comments inside catalina.sh or catalina.bat for more information.)

The downside of this is that each tomcat instance can host at most one deployment of EMERSE. To have multiple deployments of EMERSE in the same tomcat instance, you can set up each deployment with a different JNDI environment, which can give EMERSE a different location for the emerse.properties file.

The JNDI environment for a war file is configured by a file inside Tomcat’s configuration directory: conf/Catalina/localhost/emerse.xml. (The directories other than conf will need to be created.) The filename emerse.xml should match the name of the war-file in the webapps directory. Inside, add the XML:

<Context>
  <Environment name="emerse.properties.filepath"
               value="/path/to/emerse.properties"
               type="java.lang.String"/>
</Context>

You can add other properties here in the same way as well. They will take precedence over the values specified in any emerse.properties file loaded. So, if two deployments have the same configuration, except for which Solr instance or database they connect to, you could just set the solr.serviceURL or ds.url properties here.

Version number

The EMERSE version number is distributed as part of the WAR file that gets deployed to the server. This is not really a configurable option, but it is being mentioned here for the sake of completeness. It can be found in the WAR manifest, under META-INF/MANIFEST.MF. (WAR files are just zip files with the wrong extension. This is true of jar files too.) Try unzip -c emerse.war META-INF/MANIFEST.MF.
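Because a WAR file is a zip archive, the manifest can also be read programmatically. The following is a minimal sketch using Python's standard zipfile module; the function name read_manifest is hypothetical, and the sketch returns the whole manifest text rather than assuming which manifest attribute holds the version.

```python
import zipfile


def read_manifest(war_path):
    """Return the text of META-INF/MANIFEST.MF from a WAR (zip) file."""
    with zipfile.ZipFile(war_path) as war:
        return war.read("META-INF/MANIFEST.MF").decode("utf-8")
```

Calling print(read_manifest("emerse.war")) shows the same content as the unzip command above.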

Profiles

EMERSE uses Spring profiles to enable optional services within the application. Active profiles are those listed in the value of the spring.profiles.active property in Spring’s environment. Profile names should be comma-separated.

Currently, there are two interesting profiles: ldap and no-scheduler.

ldap turns on the LDAP authentication mechanism, which must be further configured with other properties. Additional information can be found in the LDAP section below.

no-scheduler turns off all scheduled jobs typically configured with the task.*.cron properties, described below. Activating this profile is used for testing only. (The scheduler is on by default, which is why activating the no-scheduler profile turns it off.)

See the Loading Configuration section for more details about how to put spring.profiles.active into Spring’s environment.

apache-tomcat/bin/setenv.sh

export CATALINA_OPTS="$CATALINA_OPTS -Dspring.profiles.active=ldap"

emerse.properties

spring.profiles.active=ldap

Application Settings

The following are all the properties that configure EMERSE. These are looked up through Spring’s environment, as described in Loading Configuration. Generally, we set them in the emerse.properties file.

Database

ds.username	The username for the database account for the main EMERSE database.
ds.password	The password for the database account. This can be encrypted with Jasypt if desired.
ds.url	The JDBC url to connect to the EMERSE database. Example: ds.url=jdbc:oracle:thin:@databasehost:port:sid
ds.maxPoolSize	Maximum number of available connections to the database. Production systems with a moderate number of users should set this to 10-20 connections. Our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrently logged-in users, with about 10 users searching at the same time. More is generally better, but a centrally managed DB may have a limit on the number of connections permitted.

Solr/Lucene

solr.serviceURL	The URL the application will use to access the Solr instance. Example: solr.serviceURL=http://localhost:8983/solr
solr.username	The username used when making connection to Solr when configured with basic auth.
solr.password	The password used when making connection to Solr when configured with basic auth.
solr.documentsCollection	The name of the Solr core that contains patient documents. Tends to be `documents` or for older installations, `unified`.
solr.patientCollection	The name of the Solr core that should be used to search patients. Typically, this is `patient`.
solr.wildcard.minLength	This is the minimum number of characters in a term before the first wildcard character, `*`, in a search. Terms which have fewer than this are rejected as invalid. This does not apply to advanced search. Wildcard searches can be very taxing on Solr, and more so when more terms match the wildcard; enforcing a minimum prefix ensures they shouldn’t be too taxing. The default value is 3.
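
The minimum-prefix rule for wildcard terms can be sketched as below. This is an illustration of the rule as described for solr.wildcard.minLength, not EMERSE's actual validation code; the function name wildcard_prefix_ok is hypothetical.

```python
def wildcard_prefix_ok(term, min_length=3):
    """Check that at least min_length characters precede the first '*'.

    Terms without a wildcard are not subject to the rule.
    """
    star = term.find("*")
    if star == -1:
        return True  # no wildcard in the term
    return star >= min_length
```

With the default of 3, a term such as dia* is accepted while di* is rejected.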

The patient cores were created to facilitate the graphs summarizing demographics shown in the results of an all-patient search. They are also used in filters.

Patient Lists and MRN validation

There are various options available for validating user-entered patient medical record numbers (MRNs). MRNs in the PATIENT table are in a "canonical" form which is formatted for display to the user, and MRNs entered by the user are "cleaned up" to match the canonical form. The "formatting for display" portion is determined by patientList.MRNFormat and patientList.MRNFormatUseZeroPad. The "cleaning up" portion is covered by patientList.stripRegexOnMRNInsert.

If your MRNs are numbers, possibly with leading zeros, we suggest you pick a "canonical" form with no leading zeros. This allows you to be the most forgiving of minor errors, because you can strip leading zeros on input but still show the leading zeros when displaying the MRNs to the user.

We suggest being this forgiving since MRNs may come from different systems (or go through several systems) before being sent to EMERSE, and not all of them may preserve leading zeros.

patientList.MRNLimit	The number of patients that can be added to a patient list. We set it to 100,000 by default.
patientList.maxErrors	The maximum number of errors that are shown when user uploads or inserts new patients to a patient list. The default is 100.
patientList.deduplicateLists	Whether the system removes duplicate medical record numbers from a patient list. A `true` value will cause EMERSE to remove duplicates before saving to a patient list. In general, it is a good idea to remove duplicates, so keeping it `true` is ideal.
patientList.pullInvalidMRNs	If set to true, invalid MRNs will be reported. If set to false, they will be silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed. Default true.
patientList.MRNFormat	A format string that formats the MRN for display in the UI. Generally this is either `%s` or of the form `%9s` where 9 can be some other number. In the former case, it leaves the input as-is. In the latter case, it pads the MRN with spaces if it is shorter than 9 characters. The spaces added here can be replaced with zeros in the next option. Default `%9s`.
patientList.MRNFormatUseZeroPad	`true` to replace spaces in the MRN with zeros; `false` to not do that. Default true.
patientList.stripRegexOnMRNInsert	Remove matches of the regular expression when MRNs are uploaded/entered by users. This is run after whitespace is removed from the MRN. (For instance, if the regular expression is `^0+` then `000 0045 67` would become `4567`.) The resulting value ought to match one in the `PATIENT` table. If it doesn’t, it will be discarded and reported as invalid. Example values: `^0+\|[-]` - remove leading zeros and dashes patientList.stripRegexOnMRNInsert=^0+\|[-] `[-#]` - remove dashes and pound signs patientList.stripRegexOnMRNInsert=[-#] An empty value keeps the exact value (spaces would still be removed) patientList.stripRegexOnMRNInsert=
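
How the clean-up and display settings interact can be sketched as follows. This is only an approximation of the behavior described above, not EMERSE code: the function names clean_mrn and display_mrn are hypothetical, and Python's % formatting stands in for the format string handling in the application.

```python
import re


def clean_mrn(raw, strip_regex):
    """Remove whitespace, then remove matches of strip_regex (if non-empty)."""
    no_space = re.sub(r"\s+", "", raw)
    return re.sub(strip_regex, "", no_space) if strip_regex else no_space


def display_mrn(canonical, mrn_format="%9s", zero_pad=True):
    """Pad the MRN per mrn_format, then optionally turn the spaces into zeros."""
    shown = mrn_format % canonical
    return shown.replace(" ", "0") if zero_pad else shown
```

With the regular expression ^0+, clean_mrn("000 0045 67", "^0+") gives 4567, and display_mrn("4567") with the defaults shows it as 000004567.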

LDAP

The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.

Typically, to authenticate to LDAP, you need the DN (distinguished name) of the user you want to authenticate as, and the password of that user. Since the DN of a user may not contain their username (as entered at the login screen), EMERSE authenticates to LDAP as a fixed "service user" as specified by ldap.userDn and ldap.password. This should give EMERSE the permission to then run a search for the DN of the user trying to log in. The search run is the one specified in the ldap.search property, where every instance of the text {0} in that search is replaced with the username entered on the login screen. The user record found from that search should contain a dn: entry, which is then used to authenticate the user against LDAP with the password provided on the login screen.

LDAP is only used for authentication. Authorization (permissions) is given to the user as defined by the user account in the EMERSE database that matches the username they entered at the login screen. This means if they are not in the EMERSE database, they will not have access. So, you must add users via the administration application; we don’t create accounts from information stored in LDAP or grant permissions based on LDAP groups.

ldap.host	Example: ldap.host=ldaps://hostname:636
ldap.startTLS	Set to `true` if you need to use StartTLS. This is the `-ZZ` option when using the `ldapsearch` commandline tool. StartTLS is never used with the `ldaps` protocol.
ldap.userDn	The distinguished name of the service account that EMERSE will use to conduct the user search Example: ldap.userDn=cn=emerse,ou=people,dc=med,dc=umich,dc=edu
ldap.password	Password of the service account
ldap.uidPath	Path of the subtree to search for the user Example: dc=med,dc=umich,dc=edu
ldap.search	The search to find the user based on the username typed at the login screen. Every occurrence of the text `{0}` will be replaced with the username typed at login. Examples: (uid={0}) (&(uid={0})(objectClass=user))

On Linux, you can replicate this process on the commandline with the ldapsearch command to confirm which parameters should work. Suppose EMERSE is configured this way:

emerse.properties

ldap.host=ldaps://hostname:636

ldap.userDn=cn=emerse-service-user,ou=people,dc=med,dc=umich,dc=edu
ldap.password=emerseServiceUserPassword

ldap.uidPath=ou=people,dc=med,dc=umich,dc=edu
ldap.search=(uid={0})

And a user logs in with username someuser and password someuserpassword. EMERSE effectively executes this search command:

ldapsearch -H ldaps://hostname:636 \
   -x -D cn=emerse-service-user,ou=people,dc=med,dc=umich,dc=edu -w emerseServiceUserPassword \
   -b ou=people,dc=med,dc=umich,dc=edu \
   '(uid=someuser)' dn

The command has been broken across many lines for readability, with backslashes at the end of each line to indicate a continuation to the next line, as the shell requires. The -H option specifies the URL of the LDAP server. -x says to do a "simple" bind, as opposed to a SASL bind. (EMERSE only supports simple binds.) -D gives the user DN to bind as, and -w gives the password. (For testing, you can do -W to have the command prompt you for the password instead of entering it as an argument.) After that, -b specifies the subtree to search. The next two arguments are the search, and the attributes to return in the search results. Notice that the search was specified as (uid={0}) in the properties file, but the {0} was replaced with the username entered at the login screen (someuser). Suppose the dn of the record with uid=someuser is cn=some-user,ou=people,dc=med,dc=umich,dc=edu, then a bind on that DN with the password provided at the login page is done:

ldapsearch -H ldaps://hostname:636 \
   -x -D cn=some-user,ou=people,dc=med,dc=umich,dc=edu -w passwordEnteredAtLoginScreen \
   -n

If this second bind is successful, the user is allowed to log in. (The -n option tells ldapsearch not to actually do a search; EMERSE only does a bind here, not a search.)

You may need to pass the -ZZ option if you cannot get binds to work with correct DNs and passwords, and you’re using ldap:// not ldaps://. If that option is needed, you should set ldap.startTLS=true in the properties file.
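To test a configuration without retyping it, the two ldapsearch invocations above can be generated from the ldap.* properties. The sketch below is a convenience for troubleshooting, not part of EMERSE: the function names lookup_command and bind_command are hypothetical, the properties are assumed to be already loaded into a dictionary, and only ldapsearch options already mentioned in this section are used.

```python
import shlex


def _base(props):
    """Common start of both commands; adds -ZZ when ldap.startTLS is true."""
    cmd = ["ldapsearch", "-H", props["ldap.host"]]
    if props.get("ldap.startTLS", "false").lower() == "true":
        cmd.append("-ZZ")
    return cmd


def lookup_command(props, username):
    """Service-user search for the DN of the person logging in."""
    # Every {0} is replaced with the username; no escaping of special
    # filter characters is attempted in this sketch.
    search_filter = props["ldap.search"].replace("{0}", username)
    return _base(props) + [
        "-x", "-D", props["ldap.userDn"], "-w", props["ldap.password"],
        "-b", props["ldap.uidPath"], search_filter, "dn",
    ]


def bind_command(props, user_dn):
    """Bind-only check as the found DN; -W prompts for the user's password."""
    return _base(props) + ["-x", "-D", user_dn, "-W", "-n"]
```

Passing the result through shlex.join produces a line that can be pasted into a shell, with the search filter quoted so the parentheses are not interpreted by the shell.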

Single Sign-On or Pre-Auth

Single sign-on properties are only used when the preauth profile is active. When the preauth profile is active, EMERSE will do a few things:

EMERSE will automatically log in the user specified by the HTTP request without any credentials. The login is associated with the session, so only one request per session needs to specify the user, but it’s fine (and the norm) for each request to specify the user.
EMERSE will no longer show the login form on the login page. The login.hint will still be displayed, along with any errors in the login process.

How the HTTP request specifies the user to be logged in is controlled by the preauth.header setting, described below.

preauth.header	The HTTP request header that contains the username of the person to be logged in. If unspecified, then `HttpServletRequest#getRemoteUser()` is used. Tomcat can populate this value when using AJP.
preauth.loginUrl	When specified, EMERSE will show the login button on the login page, and it will redirect to this url. This is intended to allow the user to initiate the SSO, instead of it happening before landing on an EMERSE page.

Pre-Auth Security

If you use a header name to indicate which user should be regarded as logged in, it’s very important to make sure users/attackers cannot craft HTTP requests directly to Tomcat, since otherwise they’d be able to login as anyone they wish without credentials.

If you use AJP in Tomcat, then AJP requests can specify the REMOTE_USER which gets put in getRemoteUser() of the request, and so it has a similar problem as using request headers, but Tomcat can reject AJP requests unless they have a specific secret included.

Shibboleth

The main way you should use pre-auth is with Shibboleth, httpd, Tomcat, and the AJP protocol. Shibboleth does the actual single sign-on, while httpd, Tomcat, and the AJP protocol allow the username known by Shibboleth to be sent to Tomcat and set as the getRemoteUser() of the HTTP request.

To make httpd use ajp, just specify that protocol in the reverse-proxy:

LoadModule proxy_ajp_module modules/mod_proxy_ajp.so

<Location "/emerse/">
  ProxyPass "ajp://localhost:8009/emerse/" secret=somesecret
</Location>

You’ll need a secret unless you disable the requirement in Tomcat. In Tomcat, you’ll want to use the AJP connector instead of the HTTP(S) one, and you’ll need to tell Tomcat to take authentication from the AJP protocol (by specifying tomcatAuthentication="false").

$CATALINA_BASE/conf/server.xml

<?xml version="1.0" encoding="UTF-8"?>
<Server port="8005" shutdown="SHUTDOWN">
  ...
	<Service name="Catalina">
		<Connector
				protocol="AJP/1.3"
				port="8009"
				secret="somesecret"
				tomcatAuthentication="false"
		/>
		...
	</Service>
</Server>

Specify the SAML attribute to be used as the username / REMOTE_USER, in the <ApplicationDefaults> element of shibboleth2.xml:

/etc/shibboleth/shibboleth2.xml

...
  <ApplicationDefaults REMOTE_USER="your-attribute-name-here" ...>
...

To allow users to see the EMERSE login screen and initiate login themselves, set the preauth.loginUrl to be Shibboleth’s login initiator. Make sure a Shibboleth session is optional in httpd:

<Location "/emerse/">
  AuthType shibboleth
  ShibRequestSetting requireSession false
  require shibboleth
</Location>

Then set the login location to have a redirect back to the EMERSE app, something like:

emerse.properties

preauth.loginUrl=https://your.host.edu/Shibboleth.sso/Login?target=https://your.host.edu/emerse/

To just automatically redirect to the SSO login without seeing any EMERSE application screen, just make sure the shibboleth session is required for the entire EMERSE application in httpd:

<Location "/emerse/">
  AuthType shibboleth
  ShibRequestSetting requireSession true
  require shibboleth
</Location>

Then Shibboleth will redirect before allowing httpd to do the reverse proxy and show EMERSE’s login screen.

Attestation

attestation.allowOtherAttestationReasons	If set to `true`, Quick Buttons with standard reasons are displayed to the user for selection
attestation.allowFreeTextAttestation	If set to `true`, users can enter a free text description describing their purpose of using EMERSE
attestation.showPriorAttestations	If set to `true`, the Attestation screen will display, in the table, the free text attestation reasons previously used by the user.

Batch Updating Begin/End Dates

Every night, EMERSE runs a batch job that finds the date range of documents stored in the documents index. It stores this date range in the SOLR_INDEX table. This is used as the default date range for any search that doesn’t specify one. It is also used to validate any custom date range the user enters in the filters page, so that users don’t think they are searching documents from a time period that is not indexed at all.

However, our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. This led to unusual dates being displayed in the section of EMERSE that shows the overall date range of included documents when no date limitation was placed on the search criteria (e.g., “01/01/1900”).

To circumvent this potential problem EMERSE allows you to specify which of the start and end dates should be updated from the values found from the index itself. Typically, this means you can set a sensible START_DATE into the SOLR_INDEX table, and then tell EMERSE not to update it. Then, this start date will appear as the earliest date searchable within EMERSE.

batch.updateIndexMinDateFromSOLRIndex	If set to true, the min date of documents is read from Solr every night and written to the `solr_index` table.
batch.updateIndexMaxDateFromSOLRIndex	If set to true, the max date of documents is read from Solr every night and written to the `solr_index` table.
batch.updateINdexMaxDateFromSOLRIndexPastToday	If set to false (the default), and the max date found in the index is past today, today will be used as the end date of the index instead. A value of true means whatever max date is found in the index is set as the end date.

If one or both of these properties is set to false, then the date entered in the solr_index table is what will be used for display purposes. For more information on this table see the section on the solr_index in the Data Guide.
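The effect of these three settings on the nightly job can be sketched as a small function. This is an interpretation of the descriptions above, not the actual batch job: the function name nightly_range and its parameter names are hypothetical, and the flags are passed explicitly rather than assuming their defaults.

```python
from datetime import date


def nightly_range(stored_start, stored_end, index_min, index_max, today,
                  update_min, update_max, allow_past_today):
    """Return the (start, end) dates to keep in the solr_index table.

    stored_* are the dates currently in the table; index_* are the dates
    found in the Solr index. A date is only replaced when its flag is true.
    """
    start = index_min if update_min else stored_start
    end = stored_end
    if update_max:
        end = index_max
        if not allow_past_today and end > today:
            end = today  # clamp a future-dated document to today
    return start, end
```

For example, with update_min false and update_max true, a hand-entered start date in the table is preserved while the end date follows the index, limited to today unless dates past today are allowed.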

Find Patients (All Patient Search)

The Find Patients function is when the system seeks to identify a set of patients based on the search terms. Initially this function only applied to all patients in the Solr index. As of version 6.1 the capability can also apply to a Patient List, which is generally a much smaller subset of patients compared to the entire index.

search.allPatientFragmentLimit

Number of fragments/text snippets to display for preview when using All Patient Search

Example:

search.allPatientFragmentLimit=100

search.facetDateRangeInterval

The All Patient Search displays a chart based on patient’s age using intervals. This setting specifies the interval to use when displaying the chart. In general there should be no reason to change the default setting.

Example:

search.facetDateRangeInterval=10

search.facets.excluded

Hide the given charts and filters. Generally this should be done because the data is not available at the site. These are comma-separated values. The values are:

Value	Graph Title in UI
SEX_CD	Sex
BIRTHDATE	Current Age
RACE_CD	Race
ETHNICITY_CD	Ethnicity
DECEASED_FLAG	Vital Status
MARITAL_STATUS_CD	Marital Status
SOURCE	Document Source

Example:

search.facets.excluded=MARITAL_STATUS_CD,DECEASED_FLAG

Synonyms

The properties below are the defaults that we recommend. In general there should be no reason to change them, but they are described here for completeness.

synonymList.valueLimit

Maximum number of characters that a single synonym term string can be. This is also limited by the settings in the database, but is used here to perform validation on the uploaded datasets to ensure they comply. The value limit should not be greater than the column width in the database, which is 255. Each database may have its own interpretation on the 255 limit, which is mainly decided by the character set of the database. Because some characters that may be included in a term could be Unicode (and not plain ASCII), they might take up additional bytes and exceed the size limit in the database. This is why we recommend 200 to be safe.

Example:

synonymList.valueLimit=200
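
The gap between character count and storage size can be illustrated with a short sketch. It assumes, purely for illustration, that the database column is limited to 255 bytes and stores text as UTF-8; your database's character set and length semantics may differ. The function name fits_column is hypothetical.

```python
def fits_column(term, char_limit=200, byte_limit=255, encoding="utf-8"):
    """True if term is within the character limit and the assumed byte limit."""
    within_chars = len(term) <= char_limit
    within_bytes = len(term.encode(encoding)) <= byte_limit
    return within_chars and within_bytes


# 255 characters, but five of them take two bytes each in UTF-8: 260 bytes.
too_long = "a" * 250 + "é" * 5
# 200 characters with the same five accented letters: 205 bytes.
safe = "a" * 195 + "é" * 5
```

Under these assumptions, too_long would pass a 255-character check yet overflow the column, while safe stays well within it, which is the headroom a limit of 200 is meant to provide.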

synonymList.delimiter

The delimiter used to separate the columns for a synonyms dataset. We expect this should always be a tab.

Example:

synonymList.delimiter=\t

Miscellaneous

There are several components configurable within the EMERSE menu, which is available to all users in the upper-right portion of the window. In addition, there are things you can configure for the login page.

Contact Information

Users may want to contact a local administrator about issues or feedback about EMERSE. This can be accessed by users in the upper right menu through either the About option or the Feedback option. Both of these menu options have some hard-coded text followed by a customizable URL that can be defined using the two properties listed below. The About menu item contains text beginning with "Please direct feedback and issues to…" and the Feedback menu item contains text beginning with "Please send any comments or suggestions you may have about EMERSE to…". The remaining text is defined by the two properties:

contact.url	This is the URL, or the `mailto` URL that will direct the user to the correct resource. Examples: contact.url=https://link.to.help/server contact.url=mailto:emersehelp@university.edu
contact.text	This is the text that will be displayed on the screen for the URL. Example: contact.text=EMERSE help desk The resulting URL would then be constructed using the two properties above to look something like: `<a href="https://link.to.help/server">EMERSE help desk</a>`
login.hint	This is a small snippet of text that appears on the EMERSE login page to help users know what login credentials to enter. This is by default blank.
resources.dir	This is a directory containing web resources. Right now, only the "cover" photo on the EMERSE login page (not the admin app one) can be set. By default there is none, so you should do this. The name of the cover photo should be either `cover.png` or `cover.jpg` inside the directory specified by the setting. The photo should be around 1200 pixels tall, though the size is up to you. The browser will "zoom" in on it so it takes up the entire browser window, though the left-hand side of the image will be obscured by the login panel itself.
userGuideUrl	This is the link that contains the user guide. By default (if nothing is defined) it will link to the main user guide on the project-emerse.org website. If you have your own user guide you can link to that instead by replacing the URL. Example: userGuideUrl=http://project-emerse.org/documentation/user_guide.html

Timeouts

A user’s session is configured to be timed out due to inactivity. If the app is idle and does not encounter a mouse click, mouse move, mouse scroll or a keypress activity for a configured timeout setting, the application logs the user out of their session and the login page is presented. The following properties can be added to the project.properties to override the defaults.

This timeout feature does not apply to the Attestation screen, because at this point no Protected Health Information (PHI) would be displayed. Nevertheless, EMERSE would still time out based on the server timeout settings even though the countdown window for a forced logout would not be shown to the user.

application.idle.timeout	Number of seconds to run the timer when the application is idle. The default value is 3600 if this property has not been added to the properties file. The value should be in seconds. Example: application.idle.timeout=3600
application.warn.length	Number of seconds to show the timeout warning window. The default value is 30 if this property has not been added to the properties file. Example: application.warn.length=30

Logging

EMERSE does logging through slf4j, but provides its own implementation. A logging event is a message at a certain severity (aka log level). Log levels are given in the below table under the level option. A log is a place logging events can be written to, usually a file, but it can be a memory buffer or the standard out of the program as well. A logger is an abstract location events are sent to. Loggers are arranged in a hierarchy based on their name. A logger is a parent of another logger if its name is a prefix of the other logger’s name.

Both log events and loggers have associated severities. Each logger will write events to a configured set of logs if the severity of the event is at least as severe as the severity set for the logger. In other words, the severity of the logger is the minimum severity required by events to be logged.

The application code chooses the severity of logging events, but configuration decides the severity and logs for loggers. If the logs or severity of a logger is not explicitly set, it is inherited from the logger’s nearest parent in which it is explicitly set.

In some logging implementations, events chain up from child loggers to parent loggers so that logs set on parent loggers will also log the event even though it was already logged at a child. This is not the case here.
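The inheritance and severity rules can be sketched as follows. This is a reading of the rules above, not EMERSE's logging implementation: the function names and the logger names in the example are hypothetical, and "parent" is taken literally as a name prefix, with the root logger being the empty string.

```python
# Least severe first, so a higher index means a more severe level.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def effective_level(logger_name, configured):
    """Level of the logger itself, or of its nearest configured parent.

    configured maps logger names to explicitly set levels. The nearest
    parent is the longest configured name that is a prefix of logger_name.
    """
    parents = [name for name in configured if logger_name.startswith(name)]
    if not parents:
        return None
    return configured[max(parents, key=len)]


def should_log(event_level, logger_name, configured):
    """True if the event is at least as severe as the logger's level."""
    level = effective_level(logger_name, configured)
    if level is None:
        return False
    return LEVELS.index(event_level) >= LEVELS.index(level)


# Hypothetical configuration: root at WARN, one subtree at INFO.
EXAMPLE = {"": "WARN", "org.emerse": "INFO", "org.emerse.solr": "DEBUG"}
```

With EXAMPLE, a logger named org.emerse.web inherits INFO from org.emerse, so a DEBUG event sent to it is dropped, while any other logger falls back to the root logger's WARN.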

Logging can be controlled by property names starting with log. The general form of these properties is log.<option>.<name>=<value>. The name part names either a logger or a log depending on the option. In some cases it is not used, in which case it can be omitted. It can also be omitted if the logger named should be the root logger, which is the logger whose name is the empty string.

The options are:

Option	Meaning
directory	The value is a path to the directory where the log files are stored. If not set, this defaults to the `resources.dir` property’s value.
file	The value is the base name of the file the named log is written to.
keepDays	The value is the number of days to keep log files. Any time a logging message is written, we check to see if the logs should be roteted. If so, we also delete old log files older than this number of days. The default is 180.
pattern	The value is a pattern describing how the named log should render log messages. The syntax of the pattern is covered below.
level	The value is a log level, one of `ERROR` (most severe), `WARN`, `INFO`, `DEBUG`, or `TRACE` (least severe), to be set on the logger of the given name.
enabled	The value is a white-space separated list of log names which should be written to by the named logger. All logs you wish the logger to write to must be named in a single property; listing the property multiple times with different values will not add the values together. Logs set on parent loggers will not be logged to.

There are four logs by default:

out: this log logs to standard out
memory: this log logs to an internal memory buffer that is shown as the log in the admin app. It will only render events with a severity of WARN or greater. This is not configurable.
emerse: this log logs to a file name starting with emerse
security: this log logs to a file name starting with emerse-security

By default the root logger (whose name is the empty string so that it is a parent of all loggers) will log to the out, memory, and emerse logs. The org.emerse.security logger will log only to the security log (and thus its events will not appear in the other three).

Thus, the default configuration is effectively:

log.file.emerse   = emerse
log.file.security = emerse-security

log.pattern.security = %d %m{user} %m{session} %m{ip} %m{forwarded-ip}: %e\n
log.pattern.emerse   = %d %l %g %e\n%x
log.pattern.memory   = %d %l %g %e\n%x
log.pattern.out      = %d %l %g %e\n%x

log.enabled = out memory emerse
log.level   = ERROR

log.enabled.org.emerse.security = security
log.level.org.emerse.security   = TRACE

Log Files

All log files are created in the same directory, which is the path given by the log.directory property, or if that is not set, the ${resources.dir}/logs directory. (It is an error to not set the resources.dir property.) Log files have names that start with the prefix given by the file option described above. After this prefix comes a dash, then the ISO 8601 date, then a dot, then a number starting from zero to make the file unique, and finally the extension .log. Thus, log files will look like:

emerse-2024-09-01.0.log
emerse-security-2024-09-01.0.log

Log files can have any size, but each time the logging system initializes (such as on tomcat restart or emerse re-deploy), it will open a new file, picking a new number at the end to make the file distinct. An easy way to see which file tomcat is currently writing to is to run lsof * in the log directory. This will list which files are currently open, and which processes currently have them open. However, in general, it should be the highest-numbered file with today’s date.
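As a rough illustration of the naming scheme, the following Python sketch builds a log file name and picks the highest-numbered file for a given prefix and date from a list of names. The function names are hypothetical, and the list of names stands in for a directory listing; this is not part of EMERSE.

```python
import datetime
import re


def log_file_name(prefix, day, number):
    """Build a name such as emerse-2024-09-01.0.log."""
    return f"{prefix}-{day.isoformat()}.{number}.log"


def current_log_file(file_names, prefix, day):
    """Pick the highest-numbered file for the given prefix and date, or None."""
    pattern = re.compile(
        re.escape(f"{prefix}-{day.isoformat()}.") + r"(\d+)\.log"
    )
    numbered = []
    for name in file_names:
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1] if numbered else None


day = datetime.date(2024, 9, 1)
names = [
    "emerse-2024-09-01.0.log",
    "emerse-2024-09-01.1.log",
    "emerse-security-2024-09-01.0.log",
]
print(log_file_name("emerse", day, 0))         # emerse-2024-09-01.0.log
print(current_log_file(names, "emerse", day))  # emerse-2024-09-01.1.log
```

Matching on the full name keeps the emerse and emerse-security files apart even though one prefix starts with the other.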

Log Patterns

A log pattern can be built from the following special sequences:

Option	Meaning
%d	Write the ISO 8601 timestamp of the event (in UTC)
%e	Write the message of the log event
%l	Write the name of the severity (log level) of the log event
%g	Write the logger’s name which is the name the event was logged to
%x	Write the exception message and stack trace of the exception of the log event
%m{KEYNAME}	Write the value of named key in the MDC. The key name can be either `user`, `session`, `ip` or `forwarded-ip`, whose values are as you would expect, but specifically `user` is the user id, `session` is the HTTP session cookie, and `forwarded-ip` is the value of the corresponding header if present.

You can also write a newline as \n. Other control characters can be written similarly. Keep in mind that in a properties file, whitespace is stripped at the start and end of the value (and key). Thus, to have such whitespace logged, you must escape it as well.

Noteworthy Loggers

There are a few noteworthy loggers that get events that may be of interest during debugging.

Logger	Description
org.emerse._timing	Child loggers of this logger generally get events that describe how long an operation took, and the severity of the logging event is chosen to correspond to that time. Specifically, if the operation took at least 10 seconds it will be logged at `WARN`; faster operations are logged at progressively lower severities, but every operation is logged at `TRACE` or above.
org.emerse._timing.sql	This logger gets an event for each SQL statement we run, at a severity that depends on how long it took to execute that statement.
org.emerse._timing.jdbc	This logger gets events for certain JDBC methods at severities based on how long the method took to run.
org.emerse._timing.servlet	This logger gets events for each request to the server at severity based on the time it took to respond.

Changing the Logging Implementation

Up until now, we’ve been describing EMERSE’s native logging implementation. However, EMERSE internally uses a logging facade called slf4j, which allows you to swap out the implementation of the logging system so you can log to more than just files in a single directory. To use an alternative logging implementation, you first have to remove our implementation, and then add your own. Suppose you want to use log4j (which is what we used prior to version 7). You will need to remove our logging jar from the emerse.war file and then add the log4j jar. To do this, you can list the contents of the war file (which is just a zip), issue add and remove commands, and then deploy your modified version of the war file. The EMERSE logging jar is called emerse-log-VERSION.jar and should be in the WEB-INF/lib directory inside the war file:

$ unzip -l emerse.war
Archive:  emerse.war
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  09-17-2024 16:16   META-INF/
      166  09-17-2024 16:16   META-INF/MANIFEST.MF
        0  09-17-2024 16:16   WEB-INF/
        0  09-17-2024 16:16   WEB-INF/lib/
   161902  09-03-2024 22:35   WEB-INF/lib/HikariCP-5.0.1.jar
  1487418  09-03-2024 22:35   WEB-INF/lib/activemq-client-6.1.0.jar
    70816  09-03-2024 22:35   WEB-INF/lib/commons-logging-1.3.0.jar
    28011  09-09-2024 18:02   WEB-INF/lib/emerse-io-7.0.0.jar
    21335  09-17-2024 16:16   WEB-INF/lib/emerse-log-7.0.0.jar
    45896  09-16-2024 14:20   WEB-INF/lib/emerse-network-7.0.0.jar
   539965  09-17-2024 16:16   WEB-INF/lib/emerse-server-7.0.0.jar
    72844  09-16-2024 14:20   WEB-INF/lib/emerse-solr-search-7.0.0.jar
    56403  09-17-2024 16:16   WEB-INF/lib/emerse-sql-4.0.4.jar
    15148  09-10-2024 14:37   WEB-INF/lib/emerse-util-7.0.0.jar
    50155  09-03-2024 22:35   WEB-INF/lib/hawtbuf-1.11.jar
    58964  09-03-2024 22:35   WEB-INF/lib/jakarta.jms-api-3.1.0.jar
    67831  09-03-2024 22:35   WEB-INF/lib/slf4j-api-2.0.11.jar
    84035  09-03-2024 22:35   WEB-INF/lib/spring-security-crypto-5.8.3.jar
        0  09-03-2024 23:08   WEB-INF/classes/
       70  09-17-2024 16:16   WEB-INF/classes/build.properties
     2608  09-03-2024 16:03   WEB-INF/web.xml
     2966  09-03-2024 16:03   diagnostics.html
        0  09-17-2024 16:16   assets/
      553  09-17-2024 16:16   assets/color-palette.png
     5023  09-17-2024 16:16   assets/csv.svg
     4284  09-17-2024 16:16   assets/number-icon.png
    16939  09-17-2024 16:16   assets/tsv.svg
   821792  09-17-2024 16:16   assets/zxcvbn.js
     4286  09-17-2024 16:16   favicon.ico
      557  09-17-2024 16:16   index.jsp
  2428734  09-17-2024 16:16   main.js
  1715616  09-17-2024 16:16   main.js.map
   170049  09-17-2024 16:16   polyfills.js
   191778  09-17-2024 16:16   polyfills.js.map
     6860  09-17-2024 16:16   runtime.js
     5850  09-17-2024 16:16   runtime.js.map
   580222  09-17-2024 16:16   styles.css
   761393  09-17-2024 16:16   styles.css.map
  8005805  09-17-2024 16:16   vendor.js
  9903757  09-17-2024 16:16   vendor.js.map
        0  09-09-2024 18:38   admin/
        0  09-09-2024 18:38   admin/assets/
  2334198  09-09-2024 18:38   admin/assets/file-cabinet.jpg
     4286  09-09-2024 18:38   admin/favicon.ico
      562  09-09-2024 18:38   admin/index.jsp
  1132476  09-09-2024 18:38   admin/main.js
   796931  09-09-2024 18:38   admin/main.js.map
   170044  09-09-2024 18:38   admin/polyfills.js
   191777  09-09-2024 18:38   admin/polyfills.js.map
     6858  09-09-2024 18:38   admin/runtime.js
     5848  09-09-2024 18:38   admin/runtime.js.map
   579673  09-09-2024 18:38   admin/styles.css
   763809  09-09-2024 18:38   admin/styles.css.map
  5004009  09-09-2024 18:38   admin/vendor.js
  7050898  09-09-2024 18:38   admin/vendor.js.map
---------                     -------
 45431400                     55 files


$ zip -d emerse.war WEB-INF/lib/emerse-log-7.0.0.jar
deleting: WEB-INF/lib/emerse-log-7.0.0.jar
$ mkdir -p WEB-INF/lib
$ cp ~/Downloads/log4j-core-2.24.0.jar WEB-INF/lib/
$ zip emerse.war WEB-INF/lib/log4j-core-2.24.0.jar
  adding: WEB-INF/lib/log4j-core-2.24.0.jar (deflated 12%)

You will of course have to repeat this change every time you deploy EMERSE.

Once this is done, you can configure the new logging system as described on their website. Often, other logging systems require more than just a single jar, especially if you are not just writing to files. We can help you configure a different logging system or help you find one if you need to log things in a specific way.

Overall Patient Count (and Potential Problems)

EMERSE displays the total number of patients in the system, which is the number of patients covered by an All Patient Search. This count is updated using the Spring Scheduler within the app itself, and should auto-update about every 30 minutes. The overall patient count is not configurable since it is derived from the data loaded into the system. Specifically, this count is based on the distinct number of MRNs that are associated with all of the documents in the Solr index. It is not based on the total number of MRNs in the database table, Patient. Thus, if a patient is in the Patient table but does not have an associated document, that patient will not be counted towards the total number of patients.

The total patient count displayed in the user interface is stored in the PATIENT_COUNT column of the SOLR_INDEX table in the database. This count is refreshed periodically by a background process that retrieves the number of unique MRNs from the Solr documents index. Additional details about configuring the schedule for this process can be found within this guide in the section called Solr Patient Index Replication Interval. However, the overall patient count can also be forced to refresh immediately using the 'System Synchronization' feature found within the admin application.

There are two ways in which patient count discrepancies might occur. One is fine and expected, and the other is problematic.

Expected (Non-Problematic) Patient Count/MRN Discrepancy

In the situation described above, the patient count seen within the EMERSE application (which comes from the unique MRNs from the document index in Solr) might not match the patient count in the Solr patient index. The patient index is copied from the Patient database table. This discrepancy is not a problem since the patients coming from the database table could be a larger set of patients from an EHR system, and not all of them need an associated indexed document. If there is a desire to check the number of unique MRNs in the Solr index one can issue a query from the Solr browser application…

http://your-host:8080/solr/documents/select?indent=true&json.facet={"uniqueMRNs":"unique(MRN)"}&rows=0&q=*:*

…where the index is named documents and the host and port need to be filled out for your specific setup. If you use a different index name for your documents index just change that part in the query.

The output will look like:

{
  "responseHeader":{
    "status":0,
    "QTime":25,
    "params":{
      "json.facet":"{\"uniqueMRNs\":\"unique(MRN)\"}",
      "q":"*:*",
      "indent":"true",
      "rows":"0"}},
  "response":{"numFound":635861,"start":0,"numFoundExact":true,"docs":[]
  },
  "facets":{
    "count":635861,
    "uniqueMRNs":10000}}

The actual count of MRNs is under the facets key, listed as uniqueMRNs. That should be what EMERSE reports in the application as the number of patients, and what is stored in the SOLR_INDEX database table.
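If you would rather script this check than use a browser, the query and the extraction of the count can be sketched in Python as follows. This is a minimal sketch using only the standard library; the function names are hypothetical, `your-host:8080` is the same placeholder used above, and the sketch assumes Solr is reachable without authentication.

```python
import json
import urllib.parse
import urllib.request


def unique_mrn_url(base_url, core="documents"):
    """Build the faceting query shown above for the given Solr core."""
    params = {
        "q": "*:*",
        "rows": "0",
        "json.facet": json.dumps({"uniqueMRNs": "unique(MRN)"}),
    }
    return f"{base_url}/solr/{core}/select?" + urllib.parse.urlencode(params)


def unique_mrn_count(response_text):
    """Extract the distinct MRN count from Solr's JSON response."""
    return json.loads(response_text)["facets"]["uniqueMRNs"]


def fetch_unique_mrn_count(base_url, core="documents"):
    """Run the query against a live Solr (needs network access to Solr)."""
    with urllib.request.urlopen(unique_mrn_url(base_url, core)) as response:
        return unique_mrn_count(response.read().decode("utf-8"))


# Parsing a trimmed copy of the sample response shown above:
sample = '{"response": {"numFound": 635861}, "facets": {"count": 635861, "uniqueMRNs": 10000}}'
print(unique_mrn_count(sample))  # 10000
```

Calling `fetch_unique_mrn_count("http://your-host:8080")` would return the number to compare against the PATIENT_COUNT column, once the placeholder host and port are replaced with your own.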

Unexpected (Problematic) Patient Count/MRN Discrepancy

If there is an MRN in the Solr documents index without a corresponding MRN entry in the Solr patient index, problems can occur. This is because if that document is returned in a query the system will seek more details about the patient from the patient index. If no patient can be found, an error is likely to occur. To detect this specific type of problem, visit the Admin app, Reports → Missing Patients.

It is worth pointing out that this problem isn’t necessarily due to a discrepancy in the count of MRNs (although a count that does not match up might be a good indicator). Rather, the problem happens if the specific MRNs don’t match between the document index and the patient index, even if the overall number of unique MRNs in each index is the same.
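The distinction between matching counts and matching MRNs can be shown with a small Python example. The MRN values are made up for illustration, and the function name is hypothetical; the Missing Patients report in the Admin app remains the way to check a real system.

```python
def missing_patients(document_mrns, patient_mrns):
    """Return MRNs that have documents but no entry in the patient index."""
    return sorted(set(document_mrns) - set(patient_mrns))


# Both indexes hold three distinct MRNs, yet one document MRN has no patient.
document_mrns = ["100", "101", "102", "102"]
patient_mrns = ["100", "101", "103"]

print(len(set(document_mrns)) == len(set(patient_mrns)))  # True
print(missing_patients(document_mrns, patient_mrns))       # ['102']
```

Patient 103 has no documents, which is harmless, while MRN 102 has documents but no patient record, which is the problematic case.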

Network

The EMERSE networking capability has multiple configuration settings. These are all described in the Networking Guide.

Optimization

Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.

Tomcat (EMERSE application)

To reduce the frequency of garbage collection, use the -Xmx and -Xms switches to control how the JVM sizes its heap. We recommend setting up Tomcat to use between 1 and 2 GB of memory. You can add these arguments in the setenv.sh script in the bin directory of Tomcat. If it doesn’t exist, just create it. See Tomcat’s catalina.sh script for more details on this script.

apache-tomcat/bin/setenv.sh

export CATALINA_OPTS="$CATALINA_OPTS -Xmx2048m -Xms1024m"

Compression

You can enable compression in Tomcat by modifying tomcat’s server.xml file. In it, there should be at least one uncommented <Connector> element with a number of attributes. For example it may appear as:

apache-tomcat/conf/server.xml

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"/>

In which case, you should add compression options to make it look like:

apache-tomcat/conf/server.xml

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"
               compression="on"
               compressionMinSize="1024"
               compressibleMimeType="text/html,text/xml,text/javascript,text/css"/>

This makes loading the EMERSE application in the browser considerably faster. If you have multiple connector elements uncommented, you may want to do this to all of them, or at least the one you are serving the main EMERSE files from. Compression can also be done at the web server instead (such as HTTPD or nginx).

You will need to restart Tomcat to enable this configuration. You can confirm it’s working by seeing the Content-Encoding: gzip header in responses from the server, or seeing a smaller transfer size than response size in the network tab of the browser’s developer tools. Compression will only be done on files larger than 1024 bytes (as set in the compressionMinSize attribute).

Solr Index Optimization

Over time, many documents in the index are updated or deleted (a deletion might be required if, for example, a document was found to have been created under the wrong patient). It is possible to clear out these deleted/inactivated documents and potentially improve the performance of Solr by optimizing the index. Optimizing will also merge many smaller index segments into fewer, but larger, segments, which also generally improves performance.

During the optimization process the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that it can take many hours (possibly even a few days) to conduct an optimization, and it also uses substantial computational resources, meaning that system performance might suffer for users. Thus, it might be best to run this on weekends during times of low use. At Michigan Medicine we optimize infrequently: we copy the indexes to a different server with more space and then copy the indexes back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.

Details about the optimization process can be found at https://solr.apache.org/guide/solr/latest/configuration-guide/index-segments-merging.html

You can define the optimization details in the solrconfig.xml file. For Michigan Medicine we have configured it as:

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergedSegmentMB">100000</int>
  </mergePolicyFactory>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>

Solr Caches

Solr has a number of caches. Two important ones which are not configured by default are the filter cache and the document cache. These caches are index-specific, and are specified in the solrconfig.xml file:

solrconfig.xml

<config>
  ...

  <query>
    <filterCache class="solr.FastLRUCache"
        maxRamMB="1024"
        showItems="10"/>
    <documentCache class="solr.LRUCache"
        size="4000"
        showItems="10"/>
  </query>

  ...
</config>

The maxRamMB attribute tells Solr how large the cache should be allowed to grow to before evicting entries. The size attribute tells Solr how many entries are allowed before evicting becomes necessary. Only one should be specified, but on earlier versions of Solr 8, the maxRamMB can’t be specified on the documentCache. (You’ll get an error if you do.)

We’d recommend configuring these caches both in the documents index and the patient index, since both are queried by EMERSE. The documents index should have larger caches since it is used more heavily.

Not all queries that are sent to Solr are cached. We currently can cache the following queries (in the standard Lucene syntax):

1. SOURCE:XXX
2. MRN:XXX
3. the collection of "filters" specified in EMERSE as a single query

To get a sense of the size of the queries, at UM, we have 200 million documents, and the first kind of query takes up 20-30 MB of space for each source in the system. (We have about five sources.) The second kind tends to take 2-30 KB (about a thousand times less), and the third kind takes between those two, depending on how selective the filters as a whole are. (The more selective, the smaller.)
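The 20-30 MB figure for a per-source entry is roughly what you would expect if a cached result is held as one bit per document in the index. The arithmetic below is a back-of-the-envelope estimate under that assumption (it ignores object overhead and any more compact representation Solr may choose for small result sets); the function names are hypothetical.

```python
def bitset_bytes(doc_count):
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (doc_count + 7) // 8


def entries_that_fit(max_ram_mb, entry_bytes):
    """How many entries of the given size fit in a cache of max_ram_mb."""
    return (max_ram_mb * 1024 * 1024) // entry_bytes


per_source = bitset_bytes(200_000_000)
print(per_source)                          # 25000000 bytes, about 25 MB
print(entries_that_fit(1024, per_source))  # 42
```

So with maxRamMB="1024", a filter cache could hold on the order of forty entries of this size, or far more of the small MRN-style entries.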

Viewing Cache Statistics

You can view the usage of the caches in the Solr Admin UI. Just go to the core, click on "Plugins / Stats" then "Cache" and the types of caches should appear. Expand the "filterCache" or "documentCache" to look at that one. The showItems="#" in the XML tells the UI to show a random sample of the cached filters or documents, including their size. In addition, you can see how many cache lookups have been performed, evictions, and the hit-ratio. For the documents core’s filter cache, we have a hit ratio of above 0.95.

Solr Memory

Solr (on Linux at least) memory-maps the index files, meaning its virtual memory will be about as large as the total size of the index files. This means the virtual memory size of the application can be vastly larger than physical memory. (For instance, our Solr has at least 1 TB of virtual memory.)

These memory-mapped files are not a part of the Java heap space and so don’t count against the limit set by the -Xmx flag. In addition, the OS manages which parts of those files are actually in physical memory and which aren’t. (Depending on the tool you use to look at memory on the box, the memory used for memory-mapped files may appear used or not; the OS is free to use that memory for another purpose, so it is, in a sense, free.) Solr internally caches the results of certain queries or parts of queries, so that if they are used frequently, the search doesn’t need to be done again. These caches do reside in the Java heap space.

So, you must strike a balance by having enough Java heap memory for query caching, and enough otherwise free memory on the box so that the OS has plenty of space to cache Solr’s memory-mapped files. At University of Michigan, we currently allocate 3 GB to Solr’s heap, and have at least 20 GB of free memory on the box for the OS to cache files.

Allocating more heap space for Solr doesn’t mean you won’t have to tweak some of Solr’s cache settings, though there is some cache-sizing based on the max heap size. We haven’t done a ton of testing with caching, so we’ll say no more on this for now.

Solr’s memory can be configured with flags to the solr start command, or set as a default when starting solr by adding it to the solr.in.sh configuration.

./solr start -m 3g

solr-8.X.Y/bin/solr.in.sh

SOLR_HEAP=3g

You may need to pass other flags such as -s when starting Solr, as described in the Installation Guide.

If you are concerned with the performance of the garbage collector or free memory, you can see the frequency and duration of garbage collection in Solr’s GC log, contained in SOLR_INSTALL_DIR/server/logs/solr_gc.log.

Scheduled Tasks

EMERSE caches some data and has certain periodic work it usually needs to do. These tasks are executed on a schedule which can be changed via properties in the emerse.properties file. All tasks can be turned off by using the no-scheduler Spring profile.

The schedule of the tasks (used in the properties file) uses a cron-like syntax to specify the schedule, which consists of six fields separated by whitespace. The first field is the seconds, then it’s minutes, hours, day-of-the-month, the month number, and then day-of-the-week. A field can have a number in it, appropriate for the field, a star meaning every value of the field, or a question mark, meaning no restriction. A more formal description of the syntax can be found in the Spring Documentation, specifically the component regarding the Class CronSequenceGenerator.
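To make the field order concrete, here is a small Python sketch that splits a six-field schedule into named fields and describes the simple daily and hourly forms. It is an illustration only: it does not implement Spring's parser, the names `split_cron` and `describe` are hypothetical, and anything beyond plain numbers, `*` and `?` is reported as not simple.

```python
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Split a six-field schedule into a dict keyed by field name."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {expression!r}")
    return dict(zip(FIELD_NAMES, fields))


def describe(expression):
    """Describe schedules that run every day at a fixed time or every hour."""
    f = split_cron(expression)
    every_day = (
        f["day_of_month"] in ("*", "?")
        and f["month"] == "*"
        and f["day_of_week"] in ("*", "?")
    )
    if not every_day or not (f["second"].isdigit() and f["minute"].isdigit()):
        return "not a simple daily or hourly schedule"
    minute, second = int(f["minute"]), int(f["second"])
    if f["hour"] == "*":
        return f"every hour at minute {minute}, second {second}"
    if f["hour"].isdigit():
        return f"daily at {int(f['hour']):02d}:{minute:02d}:{second:02d}"
    return "not a simple daily or hourly schedule"


print(describe("00 30 7 * * ?"))  # daily at 07:30:00
print(describe("00 42 * * * ?"))  # every hour at minute 42, second 0
```

Note that the seconds field comes first, which differs from the five-field crontab syntax many administrators are used to.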

There are three tasks EMERSE does as described by the properties below:

task.updatePatientIndex.cron	The schedule to update the Solr patient index to match the contents of the `PATIENT` table in the database. The default time is 7:30 AM. Default: task.updatePatientIndex.cron=00 30 7 * * ?
task.refreshCaches.cron	This refreshes a variety of caches of static data from the database, such as the `DOC_FIELD_EMR_INTENT` table, along with more dynamic data, such as the set of MRNs in the `PATIENT` table. By default, this is done at 6:30 AM. Default: task.refreshCaches.cron=00 30 6 * * ?
task.updateIndexStatsViaSolr.cron	This runs a job that finds the minimum and maximum dates of the document index, along with the number of distinct MRNs in the document index, which are used for updating the date ranges displayed in EMERSE as well as the overall patient count shown when conducting an All Patient search. The date used is not the `CLINICAL_DATE` (which is used for searching) but the `LAST_UPDATED` date. The actual Solr field name is determined by the mapping of these EMR intents by the `DOC_FIELD_EMR_INTENT` table. The default time is every hour, around minute 42. For instance, 1:42, 2:42, 3:42, etc. Default: task.updateIndexStatsViaSolr.cron=00 42 * * * ?

If you change the scheduled time of this process, you will have to restart Tomcat for the changes to take effect.

It is possible to force these copying and indexing events to occur on demand, which may be useful for troubleshooting or when testing with an initial setup. Details about how to do this are described in the Administrator Guide.

EMERSE Search Concurrency

The Overview screen in EMERSE is computationally expensive to show. Currently, it generally takes two searches for each cell of the table, and more for the mosaic view.

To complete this work quickly and fairly among multiple concurrent users, EMERSE internally has a priority queue of batches of rows from this table. When a user goes to a page of the Overview table, EMERSE adds the rows of that page as a batch in the priority queue. Worker threads complete rows from the most-neglected batch in the queue. As rows are completed for the most-neglected batch, other batches will eventually become the most-neglected, and then rows will start being completed for those batches. In this way, roughly equal time is spent on each batch.
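The scheduling idea can be illustrated with a small, single-threaded Python sketch. It is not EMERSE's implementation: the guide does not define exactly how neglect is measured, so the sketch assumes that "most neglected" means "fewest rows completed so far", and the batch and row names are made up.

```python
import heapq


def process_fairly(batches):
    """Complete rows one at a time, always from the most-neglected batch.

    `batches` maps a batch name to its list of rows. "Most neglected" is
    taken here to mean "fewest rows completed so far" (an assumption).
    Returns the order in which (batch, row) pairs were completed.
    """
    # Heap entries: (rows completed so far, arrival order, batch name).
    heap = [
        (0, order, name)
        for order, (name, rows) in enumerate(batches.items())
        if rows
    ]
    heapq.heapify(heap)
    remaining = {name: list(rows) for name, rows in batches.items()}
    completed = []
    while heap:
        done, order, name = heapq.heappop(heap)
        completed.append((name, remaining[name].pop(0)))
        if remaining[name]:
            heapq.heappush(heap, (done + 1, order, name))
    return completed


order = process_fairly({"user-a": ["a1", "a2", "a3"], "user-b": ["b1", "b2"]})
print(order)
# [('user-a', 'a1'), ('user-b', 'b1'), ('user-a', 'a2'), ('user-b', 'b2'), ('user-a', 'a3')]
```

In the real system several worker threads pull from the queue at once, so the interleaving is less regular than in this sketch, but the effect is the same: no single user's page monopolizes the searches sent to Solr.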

There is one setting you can tweak on this search priority-queue.

overview.workers

This is the number of worker-threads that process rows from the batches. This determines the number of concurrent searches that are sent to Solr. Default 7.

Security Hardening

Solr

Solr also provides a REST API that can be accessed with tools such as curl. By default this is not locked down and should be secured with basic authentication if the Solr ports are not firewalled to external communication.

Solr can be set up to use SSL/TLS, and to require authentication with basic auth. Both of these features are supported by Solr Cloud, but EMERSE does not yet support Solr Cloud. However, the Jetty servlet engine embedded in standalone Solr can be modified to require authentication and use SSL.

Much of the Solr documentation pertains to Solr Cloud, which is NOT currently supported by EMERSE. Look for references to a single node configuration when consulting Solr documentation.

Solr SSL Setup

Changes are required in solr.in.sh, found in the bin directory under the Solr installation directory. Essentially, uncomment the lines below and configure them with values appropriate to a Java keystore containing the certificate for the server.

SOLR_SSL_KEY_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=keystore password
SOLR_SSL_TRUST_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=keystore password
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

See the "Basic SSL Setup" section at the following link for more information.

https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html

Basic Auth

If Basic Auth is desired, there are several ways in which Basic Auth can be configured. Solr provides its own approach, but another approach uses the Jetty servlet engine bundled with Solr.

The first step is to modify the jetty.xml file inside the SOLR_INSTALL_DIR/server/etc folder, adding the following snippet inside the <Configure></Configure> tags.

  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

After adding in the xml snippet, add a user/password combination to the file realm.properties located in SOLR_INSTALL_DIR/server/etc. If the file doesn’t exist just create a new file and add the following line to it.

solradmin:password, admin-role

In the above, the username is "solradmin" and the password is "password".

Also, the following needs to be added to the webdefaults.xml file:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

For more information on configuring Jetty with Basic Authentication, see here https://www.eclipse.org/jetty/documentation/9.3.0.v20150612/configuring-security-authentication.html
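Once Basic Auth is enabled, any script that talks to Solr directly has to send credentials. The following Python sketch shows how the Authorization header is formed from the example username and password above. It only builds the request and does not send it; the function names are hypothetical and the URL uses the placeholder host from earlier in this guide.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of the HTTP Authorization header for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def authenticated_request(url, username, password):
    """Build (but do not send) a request carrying Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


request = authenticated_request(
    "http://your-host:8080/solr/documents/select?q=*:*&rows=0",  # placeholder host
    "solradmin",
    "password",
)
print(request.get_header("Authorization"))
```

Because the header is only base64-encoded, not encrypted, Basic Auth should be combined with the SSL setup described above whenever Solr is reachable over an untrusted network.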

Configuration password

Passwords specified in the project.properties files can themselves be encrypted. Any property value that is inside the parentheses of ENC() will be decrypted when used. The decryption settings are determined by three environment variables. Note that these must be environment variables; they cannot be Java system properties or other kinds of parameters.

EMERSE_CIPHER_TRANSFORM: This names the cipher algorithm, mode, and padding to be used. The default is PBEWithMD5AndTripleDES. You can find more information here.
EMERSE_CIPHER_KEY: This is base64’d binary that is the key to the cipher. The exact format depends on the cipher, but typically it is the binary of a password, or random bytes of the length of the block size of the cipher.
EMERSE_CIPHER_PARAMETERS: This is base64’d binary that encodes the parameters of the cipher. The exact format depends on the cipher, but typically it is in a DER format.

Since the binary of key and parameters isn’t trivial to produce yourself, you can generate it from one of EMERSE’s jars inside the EMERSE war file. The easiest way to do this is to go to the exploded emerse.war file in tomcat:

cd apache-tomcat-*/webapps/emerse/WEB-INF/lib
java -cp emerse-server-*.jar org.emerse.server.Cipher -h

This will list the operations the utility can perform. You will need to invoke the key action and the parameters action to generate the values of the two environment variables. For instance, to use a password of abc and 100000 iterations for the cipher:

$ cd apache-tomcat-*/webapps/emerse/WEB-INF/lib
$ java -cp emerse-server-*.jar org.emerse.server.Cipher key
Enter Password:
abc
YWJj
$ export EMERSE_CIPHER_KEY=YWJj
$ java -cp emerse-server-*.jar org.emerse.server.Cipher parameters
Iteration count?
100000
MA8ECM4MTFHmdp5NAgMBhqA=
$ export EMERSE_CIPHER_PARAMETERS=MA8ECM4MTFHmdp5NAgMBhqA=

The two export lines should also be copied into your Tomcat startup script, before the command that starts Tomcat. Alternatively, you can set those environment variables in some other way when starting Tomcat, and it should still work. For instance, if Tomcat starts via systemd, you can specify environment variables in the service unit file. Note that the parameters include an 8-byte salt value that is randomly generated, so you won’t get the same output as above when giving the same iteration count.
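For the systemd route, the variables can be placed in a drop-in file for the Tomcat service. The sketch below prints such a drop-in from the variables currently exported in the shell, after checking that both are set and decode as base64. The function name is hypothetical, and the sketch assumes a base64 command that accepts -d (as in GNU coreutils).

```shell
# Print a systemd drop-in that passes the two cipher variables to Tomcat.
# Hypothetical helper: fails if either variable is unset or not valid base64.
emerse_cipher_dropin() {
    for name in EMERSE_CIPHER_KEY EMERSE_CIPHER_PARAMETERS; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            return 1
        fi
        if ! printf '%s' "$value" | base64 -d > /dev/null 2>&1; then
            echo "$name is not valid base64" >&2
            return 1
        fi
    done
    printf '[Service]\n'
    printf 'Environment="EMERSE_CIPHER_KEY=%s"\n' "$EMERSE_CIPHER_KEY"
    printf 'Environment="EMERSE_CIPHER_PARAMETERS=%s"\n' "$EMERSE_CIPHER_PARAMETERS"
}
```

The output could be saved as, for example, /etc/systemd/system/tomcat.service.d/emerse-cipher.conf, followed by systemctl daemon-reload and a restart of the service. The unit name tomcat.service and the file name are assumptions about your installation. Because the file contains the cipher key, it should be readable only by administrators.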

Assuming you have the two cipher environment variables exported in your shell, you can now use the utility to encrypt values so that you can put them into the properties file. Run the utility with the encrypt argument, type in each value you want encrypted, and hit return; the utility prints the encrypted form to copy into the properties file. For instance, if you have set the key and parameters to the values above, you should see the same output shown below when you encrypt the two values abc and def:

$ java -cp emerse-server-*.jar org.emerse.server.Cipher encrypt
abc
ENC(z4IXwVogejk=)
def
ENC(muYaCCwwDog=)

You can then place these in the properties file, such as in the database password property:

...
ds.password=ENC(muYaCCwwDog=)
...
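Once values are encrypted, it can be useful to check that no password is left in clear text. The sketch below lists properties whose name contains "password" but whose value is not wrapped in ENC(). The function name and the name-matching rule are assumptions for illustration; adjust the pattern to the property names used in your file.

```shell
# List "key=value" properties whose key mentions "password" (any case) and
# whose value is not of the form ENC(...). Comment lines are skipped.
# Prints the matching keys; the exit status is 1 if any were found.
find_plaintext_passwords() {
    awk -F= '
        /^[ \t]*[#!]/ { next }                  # properties-file comments
        NF < 2        { next }                  # no "=" on this line
        {
            key = $1
            value = substr($0, length($1) + 2)  # everything after the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (tolower(key) ~ /password/ && value !~ /^ENC\(.*\)$/) {
                print key
                found = 1
            }
        }
        END { exit (found ? 1 : 0) }
    ' "$1"
}
```

Running find_plaintext_passwords project.properties prints nothing and returns success when every matching property is encrypted.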

Exported Excel files

EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server inside the exploded EMERSE war file, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a shell script on the server that executes every 30 minutes and deletes files older than 60 minutes:

#!/bin/sh
cd /PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads \
        2> /dev/null || exit 0
find . -name "*.xlsx*" -mmin +60 -exec rm {} \;
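The 30-minute schedule can be handled by cron. The following sketch prints a crontab entry for the cleanup script and, optionally, installs it for the current user. The script location is a placeholder, the function names are hypothetical, and the */30 step syntax assumes a cron implementation that supports it (such as cronie or Vixie cron).

```shell
# Print a crontab entry that runs the cleanup script every 30 minutes.
# The default script path is a hypothetical placeholder.
cleanup_cron_entry() {
    script=${1:-/PATH_TO_SCRIPTS/clean_emerse_downloads.sh}
    printf '*/30 * * * * %s\n' "$script"
}

# Append the entry to the current user's crontab unless it is already there.
install_cleanup_cron() {
    entry=$(cleanup_cron_entry "$1")
    current=$(crontab -l 2>/dev/null || true)
    if printf '%s\n' "$current" | grep -Fqx -- "$entry"; then
        return 0
    fi
    {
        if [ -n "$current" ]; then
            printf '%s\n' "$current"
        fi
        printf '%s\n' "$entry"
    } | crontab -
}
```

The entry should be installed for a user that has permission to delete files in the downloads directory, typically the account that runs Tomcat.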

Admin Application

Most details related to the Admin application and Admin features can be found in the Administrator Guide. Below is a high-level summary of the Admin features.

EMERSE users that have an ADMIN role have access to the admin application located at:

http://host:port/emerse/admin2

The application has two main features: user management related to authorization, and maintenance of synonyms.

Add/Remove users

The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, select/check the “User with full privs” option and leave the others unchecked. The password field is required but will be ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we aren’t doing much with it yet locally.

Roles and Privileges

Roles and Privileges for EMERSE users can be customized. Details about how this is done can be found in the Administrator Guide.

Synonyms

The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a TSV file. Multiple datasets can be added and managed through the interface in the Admin app. More details are discussed in the Administrator Guide.

Synchronization

The admin application has an option to "synchronize" various data between the database and Solr. While this happens automatically overnight, it can be useful to force it more frequently, especially during initial system setup and testing. Details can be found in the Administrator Guide.

Supporting Multiple Environments

It may be ideal to support multiple EMERSE environments such as test, dev, prod, etc. We have found that sometimes it can be difficult for users who are testing EMERSE to know what specific system they are using. To make it easier to distinguish between multiple instances of EMERSE, the system has the ability to display a small, but obvious, box in the upper right part of the screen to inform users. Having this information in a database table is useful because it can remain stable even as the application itself gets upgraded.

This information is defined in a single-row table called ENVIRONMENT_INFO, which has the following columns:

id: This should be set to 0 and not changed.
environment: This is the environment that is active (dev, test, prod, etc.). This is free text, so it can be anything (e.g., "Development", "Testing", "Production").
display_on_ui: This is a flag that determines whether the text for environment is displayed on the screen. 1=display, 0=do not display. In general you would not display this to users in the Production system.

The version number of the application (displayed when selecting the About menu) is distributed with the WAR file itself and is not contained in the database.
