EMERSE Configuration and Optimization Guide

Overview

This guide describes configuration changes that affect the behavior of EMERSE, tuning of the JVM for the EMERSE application and Solr, and some security hardening procedures.

Configuration

Most of the default settings will not need modification, but some are specific to the deployment environment and would normally be changed. The most commonly changed settings are the URLs and credentials for the database and Solr.

Web Server

While it is not strictly required to have a web server such as Apache httpd or nginx in front of Tomcat, it is typical. The only special configuration EMERSE currently needs is a socket timeout of around 10 minutes, so that synonym upload works on very large synonym sets (such as those we distribute).

Solr configuration

Solr has its own configuration files in two places. The first is the installation directory, which is where you unzip the distribution, often referred to as solr-8.1.1/ in these guides. The second is the configuration of the indexes, also called "cores" in Solr. The indexes reside in the Solr "home" directory, $SOLR_HOME, which is either set by you in solr-8.1.1/bin/solr.in.sh or specified with -s when running solr start. If left unset, $SOLR_HOME is solr-8.1.1/server/solr/.

Each core has its own configuration files:

- $SOLR_HOME/core_dir/core.properties
- $SOLR_HOME/core_dir/conf/solrconfig.xml
- $SOLR_HOME/core_dir/conf/managed-schema

The core.properties file defines the name of the core. It usually contains a line like name=documents or name=patient. This name doesn’t have to match the name of the directory it is in, but it’s less confusing if it does. This name determines the URL used to query the core, and how it appears in Solr’s admin app. EMERSE must be told the name of the core, not the name of the directory, since it uses the web API to talk to the core.

The solrconfig.xml file defines the version of Lucene used to talk to the index. Changing this will require a re-index since that may change the format of the index on disk.

In addition, solrconfig.xml contains the configuration of request handlers, such as the search, spell check and query validation handlers. These and other settings can be changed without re-indexing.

EMERSE assumes a number of endpoints are defined in this file, and those themselves rely on EMERSE’s Solr plugin. To install the plugin, place it in $SOLR_HOME/lib/. (You will have to create the directory.) The definitions of the endpoints that must be present in the solrconfig.xml files are kept in "configsets". These are basically templates for a Solr core. They have the same structure as a core, but lack some files, such as the core.properties file.

The managed-schema file describes the format of documents for the core. It says what fields exist and how they should be indexed and stored. This will likely need to be modified to match your local document metadata structures, and will need to match the appropriate database tables. For more information see the Data Guide.

Loading Configuration

EMERSE loads configuration data through Spring’s "environment", which is an abstraction over a number of ways to configure processes generally, Java programs specifically, and Java web applications more specifically. It presents configuration much like process-level "environment variables" which map textual names to textual values. The name/value pair generally is referred to as a "property" in Java. Spring’s environment searches all of these configuration sources in a fixed order, and the first property found wins. The order of search is roughly as follows:

Servlet Configuration Parameters
Servlet Context Configuration Parameters
JNDI Properties
Java System Properties
Environment Variables
The emerse.properties file

The emerse.properties file is searched for in the following order:

The value of the property emerse.properties.filepath, according to Spring’s environment
$HOME/emerse.properties (i.e., in the home directory of the user Tomcat runs as)
The file /project.properties on the classpath, that is, WEB-INF/classes/project.properties inside emerse.war (or its exploded directory) or at the root of any jar file inside the war file. (There is no such file by default, and we don’t recommend re-packaging the war to add one; this is supported only for legacy reasons.)

Generally, all EMERSE configuration is placed in the emerse.properties file, though configuration may be split across any of the six places Spring’s environment searches.
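The "first property found wins" rule can be pictured as a first-match search over an ordered list of sources. The following Python sketch is only an illustration of that idea, not Spring’s implementation; the function name resolve and all sample values are hypothetical.

```python
# Illustrative sketch of "first property found wins" (not Spring's actual code).
# Sources appear in the search order listed above; the contents are made up.
SEARCH_ORDER = [
    ("servlet config parameters", {}),
    ("servlet context parameters", {}),
    ("JNDI properties", {"ds.url": "jdbc:oracle:thin:@jndi-host:1521:sid"}),
    ("Java system properties", {"emerse.properties.filepath": "/path/to/file"}),
    ("environment variables", {}),
    ("emerse.properties file", {
        "ds.url": "jdbc:oracle:thin:@file-host:1521:sid",
        "ds.username": "emerse",
    }),
]


def resolve(name, sources=SEARCH_ORDER):
    """Return (value, source_label) from the first source that defines name."""
    for label, properties in sources:
        if name in properties:
            return properties[name], label
    return None, None


value, origin = resolve("ds.url")
print(value, "from", origin)
```

Here ds.url is defined in two places, and the JNDI value is returned because JNDI is searched before the properties file.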

In practice, we tend to put the emerse.properties file under the home directory of the user which tomcat runs as. However, if this is not desirable, or if multiple versions of EMERSE are running on the same server, you must specify the location of the properties file using another property, emerse.properties.filepath specified in one of the first five ways. We tend to do this using either Java system properties, or JNDI.

Java system properties are passed as arguments to the JVM, in the form -Dproperty.name=value. Since the Tomcat scripts are what actually start the JVM, you need to configure them to pass this argument, and the easiest way to do that is to set and export the CATALINA_OPTS environment variable before starting tomcat:

export CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/file"
bin/catalina.sh start

If you merely want to put the properties file somewhere other than $HOME/emerse.properties you can create a file bin/setenv.sh inside the tomcat installation and put the export statement there; catalina.sh automatically runs such a file internally if it exists. (See the comments inside catalina.sh or catalina.bat for more information.)

The downside of this is that each tomcat instance can host at most one deployment of EMERSE. To have multiple deployments of EMERSE in the same tomcat instance, you can set up each deployment with a different JNDI environment, which can give EMERSE a different location for the emerse.properties file.

The JNDI environment for a war file is configured by a file inside Tomcat’s configuration directory: conf/Catalina/localhost/emerse.xml. (The directories other than conf will need to be created.) The filename emerse.xml should match the name of the war-file in the webapps directory. Inside, add the XML:

<Context>
  <Environment name="emerse.properties.filepath"
               value="/path/to/emerse.properties"
               type="java.lang.String"/>
</Context>

You can add other properties here in the same way as well. They will take precedence over the values specified in any emerse.properties file loaded. So, if two deployments have the same configuration, except for which Solr instance or database they connect to, you could just set the solr.serviceURL or ds.url properties here.

Version number

The EMERSE version number is distributed as part of the WAR file that gets deployed to the server. This is not really a configurable option, but it is being mentioned here for the sake of completeness. It can be found in the WAR manifest, under META-INF/MANIFEST.MF. (WAR files are just zip files with the wrong extension. This is true of jar files too.) Try unzip -c emerse.war META-INF/MANIFEST.MF.
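If unzip is not available, the same manifest can be read with any zip library. The sketch below uses Python’s standard zipfile module and builds a small in-memory archive as a stand-in for emerse.war; the attribute name Implementation-Version and its value are assumptions for illustration only, so check your own manifest for the actual attribute that carries the version.

```python
import io
import zipfile


def read_manifest(war_path_or_file):
    """Parse META-INF/MANIFEST.MF from a WAR (a zip archive) into a dict."""
    with zipfile.ZipFile(war_path_or_file) as war:
        text = war.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    key = None
    for line in text.splitlines():
        if line.startswith(" ") and key is not None:
            # Manifest continuation line: append to the previous value.
            attributes[key] += line[1:]
        elif ": " in line:
            key, value = line.split(": ", 1)
            attributes[key] = value
    return attributes


# Demonstration with an in-memory stand-in for emerse.war (contents are made up).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "META-INF/MANIFEST.MF",
        "Manifest-Version: 1.0\r\nImplementation-Version: 0.0.0-example\r\n",
    )
demo.seek(0)
print(read_manifest(demo))
```

With a real deployment you would pass the path of the war file instead, for example read_manifest("/path/to/emerse.war").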

Profiles

EMERSE uses Spring profiles to enable optional services within the application. Active profiles are those listed in the value of the spring.profiles.active property in Spring’s environment. Profile names should be comma-separated.

Currently, there are two interesting profiles: ldap and no-scheduler.

ldap turns on the LDAP authentication mechanism, which must be further configured with other properties. Additional information can be found in the LDAP section below.

no-scheduler turns off all scheduled jobs typically configured with the task.*.cron properties, described below. Activating this profile is used for testing only. (The scheduler is on by default, which is why activating the no-scheduler profile turns it off.)

See the Loading Configuration section for more details about how to put spring.profiles.active into Spring’s environment.

apache-tomcat/bin/setenv.sh

export CATALINA_OPTS="$CATALINA_OPTS -Dspring.profiles.active=ldap"

emerse.properties

spring.profiles.active=ldap,no-scheduler

Application Settings

The following are all the properties that configure EMERSE. These are looked up through Spring’s environment, as described in Loading Configuration. Generally, we set them in the emerse.properties file.

Database

ds.username	The username for the database account for the main EMERSE database.
ds.password	The password for the database account. This can be encrypted with Jasypt if desired.
ds.url	The JDBC url to connect to the EMERSE database. Example: ds.url=jdbc:oracle:thin:@databasehost:port:sid
ds.maxPoolSize	Maximum number of available connections to the database. Production systems with a moderate number of users should set this to 10-20 connections. Our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrent logged-in users, with about 10 users searching at the same time. More is generally better, but a centrally managed DB may have a limit on the number of connections permitted.

Solr/Lucene

solr.serviceURL	The URL the application will use to access the Solr instance. Example: solr.serviceURL=http://localhost:8983/solr
solr.username	The username used when connecting to Solr, if Solr is configured with basic auth.
solr.password	The password used when connecting to Solr, if Solr is configured with basic auth.
solr.unifiedCollection	The name of the Solr core that contains patient documents. Tends to be `documents` or for older installations, `unified`.
solr.patientUpdateCollection	The name of the Solr core that EMERSE copies the `PATIENT` table to as part of a scheduled batch job each night. Typically, this is `patient`.
solr.patientSearchCollection	The name of the Solr core that should be used to search patients. Typically, this is `patient-slave`, the replica of `patient`.
solr.wildcard.minLength	This is the minimum number of characters in a term before the first wildcard character, `*`, in a search. Terms which have fewer than this are rejected as invalid. This does not apply to advanced search. Wildcard searches can be very taxing on Solr, more taxing if more terms match the wildcard; enforcing a minimum prefix ensures these shouldn’t be too taxing. The default value is 3.

The patient cores were created to facilitate the graphs summarizing demographics shown in the results of an all-patient search. They are also used in filters.

It is worth noting that the Solr patient index is replicated within EMERSE, with one serving as a backup for the other if the main one ever became corrupted. The larger, production Solr document index is not replicated in this way, mainly because it is so large. In other words, this type of 'slave' index is good practice, but may not be practical for the larger indexes.

Patient Lists and MRN validation

There are various options available for validating user-entered patient medical record numbers (MRNs). MRNs in the PATIENT table are in a "canonical" form which is formatted for display to the user, and MRNs entered by the user are "cleaned up" to match the canonical form. The "formatting for display" portion is determined by patientList.MRNFormat and patientList.MRNFormatUseZeroPad. The "cleaning up" portion is covered by patientList.stripRegexOnMRNInsert.

If your MRNs are numbers possibly with leading zeros, we suggest you pick a "canonical" form with no leading zeros. This allows you to be the most forgiving of minor errors because you can strip leading zeros on input, but still show the leading zeros when showing the MRNs to the user.

We suggest being this forgiving since MRNs may come from different systems (or go through several systems) before being sent to EMERSE, and not all of them may preserve leading zeros.

patientList.MRNLimit	The number of patients that can be added to a patient list. We set it to 100,000 by default.
patientList.maxErrors	The maximum number of errors that are shown when a user uploads or inserts new patients to a patient list. The default is 100.
patientList.deduplicateLists	Whether the system removes duplicate medical record numbers from a patient list. A true value will cause EMERSE to remove duplicates before saving to a patient list. In general, it is a good idea to remove duplicates, so keeping it `true` is ideal.
patientList.pullInvalidMRNs	If set to true, invalid MRNs will be reported. If set to false, they will be silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed. Default true.
patientList.MRNFormat	A format string that formats the MRN for display in the UI. Generally this is either `%s` or of the form `%9s` where 9 can be some other number. In the former case, it leaves the input as-is. In the latter case, it pads the MRN with spaces if it is less than 9 characters, or only prints the first 9 characters if the input MRN is too long. The spaces added here can be replaced with zeros in the next option. Default `%9s`.
patientList.MRNFormatUseZeroPad	`true` to replace spaces in the MRN with zeros; `false` to not do that. Default true.
patientList.stripRegexOnMRNInsert	Remove matches of the regular expression when MRNs are uploaded/entered by users. This is run after whitespace is removed from the MRN. (For instance, if the regular expression is `^0+` then `000 0045 67` would become `4567`.) The resulting value ought to match one in the `PATIENT` table. If it doesn’t, it will be discarded and reported as invalid. Example values:
`^0+\|[-]` - remove leading zeros and dashes: patientList.stripRegexOnMRNInsert=^0+\|[-]
`[-#]` - remove dashes and pound signs: patientList.stripRegexOnMRNInsert=[-#]
An empty value keeps the exact value (spaces would still be removed): patientList.stripRegexOnMRNInsert=
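To make the interaction of these three settings concrete, here is a small Python sketch of the "clean up" and "format for display" steps as described above. It is an approximation for illustration, not EMERSE’s code: the function names are hypothetical, and Python’s `%` formatting and `re` module stand in for whatever formatting and regular expression engine EMERSE uses internally. Only a short MRN is shown, where padding applies.

```python
import re


def clean_mrn(raw, strip_regex):
    """Normalize a user-entered MRN: drop whitespace, then strip regex matches."""
    no_whitespace = re.sub(r"\s+", "", raw)
    if not strip_regex:
        return no_whitespace  # empty setting: only whitespace is removed
    return re.sub(strip_regex, "", no_whitespace)


def display_mrn(canonical, mrn_format="%9s", zero_pad=True):
    """Format a canonical MRN for display, optionally turning padding into zeros."""
    shown = mrn_format % canonical
    return shown.replace(" ", "0") if zero_pad else shown


canonical = clean_mrn("000 0045 67", r"^0+")
print(canonical)               # 4567
print(display_mrn(canonical))  # 000004567
```

The canonical value `4567` is what would be looked up in the `PATIENT` table, while the zero-padded form is what the user sees.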

LDAP

The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.

Typically, to authenticate to LDAP, you need the DN (distinguished name) of the user you want to authenticate as, and the password of that user. Since the DN of a user may not contain their username (as entered at the login screen), EMERSE authenticates to LDAP as a fixed "service user" as specified by ldap.userDn and ldap.password. This should give EMERSE the permission to then run a search for the DN of the user trying to log in. The search run is the one specified in the ldap.search property, where every instance of the text {0} in that search is replaced with the username entered on the login screen. The user record found from that search should contain a dn: entry, which is then used to authenticate the user against LDAP with the password provided on the login screen.

LDAP is only used for authentication; authorization (permissions) is given to the user as defined by their user account in the EMERSE database that matches the username they entered at the login screen. This means if they are not in the EMERSE database, they will not have access. So, you must add users via the administration application; we don’t create accounts from information stored in LDAP or grant permissions based on LDAP groups.

ldap.host	Example: ldap.host=ldaps://hostname:636
ldap.userDn	The distinguished name of the service account that EMERSE will use to conduct the user search. Example: ldap.userDn=cn=emerse,ou=people,dc=med,dc=umich,dc=edu
ldap.password	Password of the service account.
ldap.uidPath	Path of the subtree to search for the user. Example: ldap.uidPath=dc=med,dc=umich,dc=edu
ldap.search	The search to find the user based on the username typed at the login screen. Every occurrence of the text `{0}` will be replaced with the username typed at login. Examples: (uid={0}) (&(uid={0})(objectClass=user))
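The `{0}` substitution in ldap.search can be previewed before deploying a new search string. The Python sketch below only illustrates the textual replacement described above; the function name and the username jdoe are hypothetical, and it does not contact an LDAP server or reflect any additional processing EMERSE may apply to the username.

```python
def build_ldap_filter(search_template, username):
    """Replace every occurrence of {0} with the username from the login screen."""
    return search_template.replace("{0}", username)


print(build_ldap_filter("(uid={0})", "jdoe"))
# (uid=jdoe)
print(build_ldap_filter("(&(uid={0})(objectClass=user))", "jdoe"))
# (&(uid=jdoe)(objectClass=user))
```

The resulting filter can then be tried by hand with your usual LDAP client, bound as the service account, to confirm that it returns exactly one user record.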

Attestation

attestation.allowOtherAttestationReasons	If set to `true`, Quick Buttons with standard reasons are displayed to the user for selection
attestation.allowFreeTextAttestation	If set to `true`, users can enter a free text description describing their purpose of using EMERSE
attestation.showPriorAttestations	If set to `true`, the Attestation screen will display, in the table, the free text attestation reasons previously used by the user.

Batch Updating Begin/End Dates

Our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. This led to unusual dates being displayed in the section of EMERSE that shows the overall date range of included documents when no date limitation was placed on the search criteria (e.g., “01/01/1900”).

To circumvent this potential problem EMERSE provides two options for controlling the dates displayed to users. In general, background tasks that update the Lucene indexes would also update the date ranges for documents when all dates are selected (that is, when no date range is entered into the date range boxes in the user interface). This is so that as the index updates every night a new ‘end date’ can be shown for the date range of the documents.

This auto-update setting can be over-ridden for the start and stop dates, independently, using the properties described below. Changing this setting can, for example, allow one to have a more sensible document start date that more closely matches when the documents were being collected (without having to actually change the dates of all of the incorrect documents).

Note that changing these dates only affects the dates displayed in the date range section at the top of the screen. The actual documents will still show their original dates, and the searches will still take place based on the actual dates of the documents even if they are incorrect. Thus, if actual dates are entered by users into the start/stop date boxes, those dates will be used. If no dates are entered by users (thus, searching ‘All dates’) then the system will search across all of the documents regardless of the over-ride date shown in the UI and regardless of the document dates in the system.

batch.updateIndexMinDateFromSolrIndex	If set to true, the minimum document date is refreshed from Solr every night and stored in the `solr_index` table.
batch.updateIndexMaxDateFromSolrIndex	If set to true, the maximum document date is refreshed from Solr every night and stored in the `solr_index` table.

If one or both of these properties is set to false, then the date entered in the solr_index table is what will be used for display purposes. For more information on this table see the section on the solr_index in the Data Guide.

All Patient Search

search.allPatientFragmentLimit	Number of fragments/text snippets to display for preview when using All Patient Search Example: search.allPatientFragmentLimit=100
search.facetDateRangeInterval	The All Patient Search displays a chart based on patient’s age using intervals. This setting specifies the interval to use when displaying the chart. In general there should be no reason to change the default setting. Example: search.facetDateRangeInterval=10

Miscellaneous

There are several components configurable within the EMERSE menu, which is available to all users in the upper-right portion of the window. In addition, there are things you can configure for the login page.

Contact Information

Users may want to contact a local administrator about issues or feedback about EMERSE. This can be accessed by users in the upper right menu through either the About option or the Feedback option. Both of these menu options have some hard-coded text followed by a customizable link that can be defined using the two properties listed below. The About menu item contains text beginning with "Please direct feedback and issues to…" and the Feedback menu item contains text beginning with "Please send any comments or suggestions you may have about EMERSE to…". The remaining text is defined by the two properties:

contact.url	This is the URL, or the `mailto` URL that will direct the user to the correct resource. Examples: contact.url=https://link.to.help/server contact.url=mailto:emersehelp@university.edu
contact.text	This is the text that will be displayed on the screen for the URL. Example: contact.text=EMERSE help desk The resulting link would then be constructed using the two properties above to look something like: `<a href="https://link.to.help/server">EMERSE help desk</a>`
login.hint	This is a small snippet of text that appears on the EMERSE login page to help users know what login credentials to enter. This is by default blank.
resources.dir	This is a directory containing web resources. Right now, only the "cover" photo on the EMERSE login page (not the admin app one) can be set. By default there is none, so you should do this. The name of the cover photo should be either `cover.png` or `cover.jpg` inside the directory specified by the setting. The photo should be around 1200 pixels tall, though the size is up to you. The browser will "zoom" in on it so it takes up the entire browser window, though the left-hand side of the image will be obscured by the login panel itself.
userGuideUrl	This is the link that contains the user guide. By default (if nothing is defined) it will link to the main user guide on the project-emerse.org website. If you have your own user guide you can link to that instead by replacing the URL. Example: userGuideUrl=http://project-emerse.org/documentation/user_guide.html

Timeouts

A user’s session times out after a period of inactivity. If the app is idle and does not encounter a mouse click, mouse move, mouse scroll or keypress for the configured timeout, the application logs the user out of their session and the login page is presented. The following properties can be added to the emerse.properties file to override the defaults.

This timeout feature does not apply to the Attestation screen, because at this point no Protected Health Information (PHI) would be displayed. Nevertheless, EMERSE would still time out based on the server timeout settings even though the countdown window for a forced logout would not be shown to the user.

application.idle.timeout	Number of seconds to run the timer when the application is idle. The default value is 3600 if this property has not been added to the properties file. The value should be in seconds. Example: application.idle.timeout=3600
application.warn.length	Number of seconds to show the timeout warning window. The default value is 30 if this property has not been added to the properties file. Example: application.warn.length=30

Overall Patient Count

EMERSE displays the total number of patients in the system with respect to conducting an All Patient Search across all of the patients. This count is updated using the Spring Scheduler within the app itself, and should auto-update about every 30 minutes. The overall patient count is not configurable since it is derived from the data loaded into the system. Specifically, this count is based on the distinct number of MRNs that are associated with all of the documents in the Solr index. It is not based on the total number of MRNs in the database table, Patient. Thus, if a patient is in the Patient table but does not have an associated document, that patient will not be counted towards the total number of patients.

The total patient count displayed in the user interface is stored in the PATIENT_COUNT column of the SOLR_INDEX table in the database. This count is refreshed periodically by a background process that retrieves the number of unique MRNs from the Solr documents index. Additional details about configuring the schedule for this process can be found within this guide in the section called 'Solr Patient Index Replication'. However, the overall patient count can also be forced to refresh immediately using the 'System Synchronization' feature found within the admin application.

Optimization

Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.

Tomcat (EMERSE application)

To reduce the frequency of garbage collection, use the -Xmx and -Xms switches to control how the JVM handles its heap memory. We recommend setting up Tomcat to use between 1 and 2 GB of memory. You can add these arguments in the setenv.sh script in the bin directory of Tomcat. If it doesn’t exist, just create it. See Tomcat’s catalina.sh script for more details on this script.

apache-tomcat/bin/setenv.sh

export CATALINA_OPTS="$CATALINA_OPTS -Xmx2048m -Xms1024m"

Solr Index Optimization

Over time, many documents in the index change as they are updated or deleted (a deletion might be required if, for example, a document was found to be created under the wrong patient). It is possible to clear out these deleted/inactivated documents and potentially improve the performance of Solr by optimizing the index. This can be invoked manually using the Optimize button in the Solr Administration User Interface. Optimizing also reduces the number of index segments, which can also improve system performance.

During the optimization process the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that an optimization can take 10 or more hours and uses substantial computational resources, meaning that system performance might suffer for users. Thus, it might be best to run this on weekends during times of low use.

At Michigan Medicine we optimize infrequently. When we do, we copy the indexes to a different server with more space, and then copy them back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.

Solr Caches

Solr has a number of caches. Two important ones which are not configured by default are the filter cache and the document cache. These caches are index-specific, and are specified in the solrconfig.xml file:

solrconfig.xml

<config>
  ...

  <query>
    <filterCache class="solr.FastLRUCache"
        maxRamMB="1024"
        showItems="10"/>
    <documentCache class="solr.LRUCache"
        size="4000"
        showItems="10"/>
  </query>

  ...
</config>

The maxRamMB attribute tells Solr how large the cache should be allowed to grow to before evicting entries. The size attribute tells Solr how many entries are allowed before evicting becomes necessary. Only one should be specified, but on earlier versions of Solr 8, the maxRamMB can’t be specified on the documentCache. (You’ll get an error if you do.)

We’d recommend configuring these caches both in the documents index and the patient-slave index, since both are queried by EMERSE. The documents index should have larger caches since it is used more heavily.

Not all queries that are sent to Solr are cached. We currently can cache the following queries (in the standard Lucene syntax):

1. SOURCE:XXX
2. MRN:XXX
3. the collection of "filters" specified in EMERSE as a single query

To get a sense of the size of the queries, at UM, we have 200 million documents, and the first kind of query takes up 20-30 MB of space for each source in the system. (We have about five sources.) The second kind tends to take 2-30 KB (about a thousand times less), and the third kind takes between those two, depending on how selective the filters as a whole are. (The more selective, the smaller.)
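The figures above are consistent with a simple back-of-the-envelope model, sketched here in Python. The model is an assumption for illustration: a cached filter that matches many documents is taken to be stored as a bitset with one bit per document in the index, and a filter that matches few documents as a list of 4-byte document ids. The function names and the 5,000-document patient are hypothetical.

```python
DOCS_IN_INDEX = 200_000_000  # document count quoted above


def bitset_bytes(total_docs):
    """Size of a bitset holding one bit per document in the index."""
    return (total_docs + 7) // 8


def id_list_bytes(matching_docs, bytes_per_id=4):
    """Size of a list of matching document ids, 4 bytes each (assumed)."""
    return matching_docs * bytes_per_id


per_source = bitset_bytes(DOCS_IN_INDEX)
print(per_source / 1_000_000, "MB per SOURCE:XXX entry")            # 25.0
print(5 * per_source / 1_000_000, "MB for five sources")            # 125.0
print(id_list_bytes(5_000) / 1_000, "KB for a 5,000-document MRN")  # 20.0
```

Under these assumptions, caching one entry per source for five sources needs roughly 125 MB, which is useful to keep in mind when choosing maxRamMB for the filter cache.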

Viewing Cache Statistics

You can view the usage of the caches in the Solr Admin UI. Just go to the core, click on "Plugins / Stats" then "Cache" and the types of caches should appear. Expand the "filterCache" or "documentCache" to look at that one. The showItems="#" in the XML tells the UI to show a random sample of the cached filters or documents, including their size. In addition, you can see how many cache lookups have been performed, evictions, and the hit-ratio. For the documents core’s filter cache, we have a hit ratio of above 0.95.

Solr Memory

Solr (on Linux at least) memory maps the index files, meaning its virtual memory will be about as large as the index file size. This means the virtual memory size of the application can be vastly larger than physical memory. (For instance, our Solr has at least 1 TB of virtual memory.)

These memory-mapped files are not a part of the Java heap space and so don’t count toward the limit set by the -Xmx flag. In addition, the OS manages which parts of those files are actually in physical memory and which aren’t. (Depending on the tool you use to look at memory on the box, the memory used for memory-mapped files may appear used or not; the OS is free to use that memory for another purpose, so it is, in a sense, free.) Solr internally caches the results of certain queries or parts of queries, so that if they are used frequently, the search doesn’t need to be done again. These do reside in the Java heap space.

So, you must strike a balance by having enough Java heap memory for query caching, and enough otherwise free memory on the box so that the OS has plenty of space to cache Solr’s memory-mapped files. At University of Michigan, we currently allocate 3 GB to Solr’s heap, and have at least 20 GB of free memory on the box for the OS to cache files.

Allocating more heap space for Solr doesn’t mean you won’t have to tweak some of Solr’s cache settings, though there is some cache-sizing based on the max heap size. We haven’t done a ton of testing with caching, so we’ll say no more on this for now.

Solr’s memory can be configured with flags to the solr start command, or set as a default when starting solr by adding it to the solr.in.sh configuration.

./solr start -m 3g

solr-8.X.Y/bin/solr.in.sh

SOLR_HEAP=3g

You may need to pass other flags such as -s when starting Solr, as described in the Installation Guide.

If you are concerned with the performance of the garbage collector or free memory, you can see the frequency and duration of garbage collection in Solr’s GC log, contained in SOLR_INSTALL_DIR/server/logs/solr_gc.log.

Solr Patient Index Replication

EMERSE has two indexes used to keep track of patients: patient and its replica patient-slave.

The patient index is created by copying the patients from the database table, PATIENT over to the corresponding Solr index, patient. This is done automatically by the system once per day as a scheduled event. The schedule of the jobs can be found in the properties file. The default time set for the EMERSE distribution is 7:30 AM. This was done with the assumption that the patients in the Database table would be updated once every night during non-peak hours. If you are fine with that time, no changes need to be made.

After the PATIENT table is copied over, the replica is told to replicate from the master. To do this, the index must be configured with the master’s URL, which is done inside the solrconfig.xml file of the patient-slave core. There should be a handler defined like so:

<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8983/solr/patient/replication</str>
    <!--str name="httpBasicAuthUser">username_here</str-->
    <!--str name="httpBasicAuthPassword">password_here</str-->
  </lst>
</requestHandler>

Simply adjust the masterUrl parameter as needed. Uncomment the username and password elements and adjust them if you set up basic auth on the Solr instance. If you have an SSL certificate (which you should if you use basic auth), you must use a domain name that the certificate was issued for.

The properties file uses a cron-like syntax to specify the schedule, which consists of six fields separated by whitespace. The fields are, in order: seconds, minutes, hours, day of the month, month number, and day of the week. A field can contain a number appropriate for that field, a star meaning every value of the field, or a question mark meaning no restriction. A more formal description of the syntax can be found in the Spring documentation, specifically the entry for the class CronSequenceGenerator.

task.updateIndexStatsViaSolr.cron	This runs a job that finds the minimum and maximum dates of the document index, along with the number of distinct MRNs in the document index. These values are used to update the date ranges displayed in EMERSE as well as the overall patient count shown when conducting an All Patient search. The default is every hour at minute 42 (for instance, 1:42, 2:42, 3:42). Default: task.updateIndexStatsViaSolr.cron=00 42 * * * ?
task.updatePatientIndex.cron	The schedule to update the Solr patient index from the patient table in the database. The default time is 7:30 AM. Default: task.updatePatientIndex.cron=00 30 7 * * ?
task.optimizePatientIndex.cron	The schedule to optimize the Solr patient index. Optimizing an index puts all the data into a single segment for operational efficiency. The default time is 7:45 AM. Default: task.optimizePatientIndex.cron=00 45 7 * * ?

If you change the scheduled time of this process, you will have to restart Tomcat for the changes to take effect.
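
Because the six-field layout differs from the five-field layout of a system crontab, it can help to label the fields of an expression before putting it in the properties file. The following is a minimal sketch; describe_cron is a hypothetical helper that only splits and names the fields and does not validate their values the way Spring does.

```shell
#!/bin/sh
# Label the six whitespace-separated fields of a Spring-style cron expression.
describe_cron() {
    set -f            # disable globbing so * and ? stay literal
    set -- $1         # split the expression into positional parameters
    set +f
    if [ "$#" -ne 6 ]; then
        echo "expected 6 fields, got $#" >&2
        return 1
    fi
    printf 'second=%s minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' "$@"
}
```

Calling describe_cron '00 30 7 * * ?' prints second=00 minute=30 hour=7 day-of-month=* month=* day-of-week=?, which matches the 7:30 AM default for task.updatePatientIndex.cron.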

It is possible to force these copying and indexing events to occur on demand, which may be useful for troubleshooting or when testing with an initial setup. Details about how to do this are described in the Administrator Guide.

EMERSE Search Concurrency

The Overview screen in EMERSE is computationally expensive to show. Currently, it generally takes two searches for each cell of the table, and more for the mosaic view.

To complete this work quickly and fairly among multiple concurrent users, EMERSE internally has a "ring" of batches of rows from this table. When a user goes to a page of the Overview table, EMERSE adds the rows of that page as a batch in the ring. Worker threads go through the ring and complete one row from a batch before going on to the next batch, eventually circling back around to the start. This should guarantee fairness, in that all users who are waiting for the Overview page to load will have an equal number of rows loading.

There are a few settings you can tweak on this search ring.

ring.size	This determines the number of "slots" in the ring for a batch. If there is not a slot for a new batch, then the request to add the batch will block. Default 50.
ring.workers	This is the number of worker-threads that process rows from the batches. This determines the number of concurrent searches that are sent to Solr. Default 7.
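
The round-robin order in which rows are completed can be illustrated with a small simulation. This is a sketch of the idea only, not EMERSE's implementation: it models a single worker, the function name ring_order is hypothetical, and each argument stands for one batch given as a space-separated list of row labels.

```shell
#!/bin/sh
# Print the order in which one worker would complete rows when it takes
# one row from each batch in turn and then circles back to the start.
ring_order() {
    order=""
    while [ "$#" -gt 0 ]; do
        batch=$1
        shift
        order="$order ${batch%% *}"                  # complete the batch's first row
        case $batch in
            *" "*) set -- "$@" "${batch#* }" ;;      # rows remain: rejoin the ring at the end
        esac
    done
    printf '%s\n' "${order# }"
}
```

With three batches, ring_order "a1 a2 a3" "b1" "c1 c2" prints a1 b1 c1 a2 c2 a3: every waiting batch advances by one row per pass, regardless of how many rows it contains.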

Security Hardening

Solr

Solr also provides a REST API that can be accessed with tools such as curl. By default this is not locked down and should be secured with basic authentication if the Solr ports are not firewalled to external communication.

Solr can be set up to use SSL/TLS and to require authentication with basic auth. Both of these features are supported by Solr Cloud, but EMERSE does not yet support Solr Cloud. However, the Jetty servlet engine embedded in standalone Solr can be modified to require authentication and use SSL.

Much of the Solr documentation pertains to Solr Cloud, which is NOT currently supported by EMERSE. Look for references to a single node configuration when consulting Solr documentation.

Solr SSL Setup

Changes are required in solr.in.sh, found in the bin directory under the Solr installation directory. Uncomment the lines below and set them to values appropriate for a Java keystore containing the certificate for the server.

SOLR_SSL_KEY_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=keystore password
SOLR_SSL_TRUST_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=keystore password
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

See the "Basic SSL Setup" section at the following link for more information.

https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html
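
Once the lines are uncommented, a quick check can confirm that none were missed. The following is a minimal sketch, not part of Solr or EMERSE; the function name missing_ssl_settings is hypothetical, and it only looks for the four keystore and truststore variables listed above.

```shell
#!/bin/sh
# Print the name of each SSL setting that is absent or still commented out
# in the given solr.in.sh file ($1). Prints nothing if all four are set.
missing_ssl_settings() {
    for key in SOLR_SSL_KEY_STORE SOLR_SSL_KEY_STORE_PASSWORD \
               SOLR_SSL_TRUST_STORE SOLR_SSL_TRUST_STORE_PASSWORD; do
        grep -q "^[[:space:]]*${key}=" "$1" || printf '%s\n' "$key"
    done
}
```

Running missing_ssl_settings against the path of your solr.in.sh lists any setting that still needs attention.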

Basic Auth

Basic Auth can be configured in several ways. Solr provides its own approach, but another approach uses the Jetty servlet engine bundled with Solr.

The first step is to modify the jetty.xml file inside the SOLR_INSTALL_DIR/server/etc folder, adding the following snippet inside the <Configure></Configure> tags.

  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

After adding the XML snippet, add a user/password combination to the file realm.properties located in SOLR_INSTALL_DIR/server/etc. If the file doesn’t exist, create it and add the following line.

solradmin:password, admin-role

In the line above, the username is "solradmin", the password is "password", and the role is "admin-role".

Also, the following needs to be added to the webdefault.xml file:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

For more information on configuring Jetty with Basic Authentication, see https://www.eclipse.org/jetty/documentation/9.3.0.v20150612/configuring-security-authentication.html
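
After restarting Solr, one way to confirm that the constraint is active is to compare an anonymous request with an authenticated one. The following is a minimal sketch, assuming Solr listens on localhost:8983 and uses the example solradmin credentials from realm.properties; http_status and auth_verdict are hypothetical helper names.

```shell
#!/bin/sh
# Print only the HTTP status code of a request; all arguments go to curl.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Interpret the status seen without ($1) and with ($2) credentials.
auth_verdict() {
    if [ "$1" = "401" ] && [ "$2" = "200" ]; then
        echo "basic auth is enforced"
    elif [ "$1" = "200" ]; then
        echo "WARNING: Solr answered without credentials"
    else
        echo "unexpected result: $1 without credentials, $2 with credentials"
    fi
}

# Example (assumed host, port, and credentials):
# URL="http://localhost:8983/solr/admin/cores?action=STATUS"
# auth_verdict "$(http_status "$URL")" "$(http_status -u solradmin:password "$URL")"
```

A 401 for the anonymous request together with a 200 for the authenticated one indicates that the security constraint and the realm are both being applied.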

Configuration password

Passwords specified in the project.properties files can themselves be encrypted using Jasypt.

http://www.jasypt.org/encrypting-configuration.html

Exported Excel files

EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server inside the exploded EMERSE war file, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a shell script on the server that executes every 30 minutes and deletes files older than 60 minutes.

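#!/bin/sh
# Remove exported Excel files that are more than 60 minutes old.
# Exit quietly if the downloads directory does not exist yet.
cd /PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads \
        2> /dev/null || exit 0
# -mmin +60 matches files last modified more than 60 minutes ago.
find . -name "*.xlsx*" -mmin +60 -exec rm {} \;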

Admin Application

Most details related to the Admin application and Admin features can be found in the Administrator Guide. Below is a high-level summary of the Admin features.

EMERSE users that have an ADMIN role have access to the admin application located at:

http://host:port/emerse/admin2

The application has two main features: user management related to authorization, and maintenance of synonyms.

Add/Remove users

The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, check the “User with full privs” option and leave the others unchecked. The password field is required but will be ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we aren’t doing much with it yet locally.

Roles and Privileges

Roles and Privileges for EMERSE users can be customized. Details about how this is done can be found in the Administrator Guide.

Synonyms

The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a CSV file.

Synonyms upload currently deletes all the existing entries in the synonyms table and then loads entries from the CSV file. Thus, this option replaces synonyms rather than appending new ones to the existing list.

Synchronization

The admin application has an option to "synchronize" various data between the database and Solr. While this happens automatically overnight, it can be useful to force it more frequently, especially during initial system setup and testing. Details can be found in the Administrator Guide.

Supporting Multiple Environments

It may be ideal to support multiple EMERSE environments such as test, dev, prod, etc. We have found that sometimes it can be difficult for users who are testing EMERSE to know what specific system they are using. To make it easier to distinguish between multiple instances of EMERSE, the system has the ability to display a small, but obvious, box in the upper right part of the screen to inform users. Having this information in a database table is useful because it can remain stable even as the application itself gets upgraded.

This information is defined in a single-row table called ENVIRONMENT_INFO:

Column Name	Description
id	This should be set to 0 and not changed.
environment	This is the environment that is active (dev, test, prod, etc.). This is a free-text field, so it can be anything (e.g., "Development", "Testing", "Production").
display_on_ui	This is a flag that determines whether the text for environment is displayed on the screen. 1=display, 0=do not display. In general you would not display this to users in the Production system.

The version number of the application (displayed when selecting the About menu) is distributed with the WAR file itself and is not contained in the database.