Overview
This guide describes configuration changes that affect the behavior of EMERSE, tuning of the JVMs that run the EMERSE application and Solr, and some security hardening procedures.
Configuration
Most of the default configuration will not need modification, but some settings are specific to the deployment environment and would normally be changed. The most commonly changed settings are the URLs and credentials for the database and Solr.
Web Server
While it is not strictly required to have a web server such as Apache’s httpd or nginx in front of Tomcat, it is typical. The only special configuration EMERSE currently needs is a socket timeout of around 10 minutes, so that synonym upload works on very large synonym sets (such as those we distribute).
Solr configuration
Solr has its own configuration files in two places. There is the installation directory, which is where you unzip the distribution, often referred to as solr-8.1.1/ in these guides. Then there is the configuration of the indexes, also called "cores" in Solr. The indexes reside in the Solr "home" directory, $SOLR_HOME, which is either set by you in solr-8.1.1/bin/solr.in.sh or specified with -s when running solr start. If left unset, $SOLR_HOME is solr-8.1.1/server/solr/.
Each core has its own configuration files:
- $SOLR_HOME/core_dir/core.properties
- $SOLR_HOME/core_dir/conf/solrconfig.xml
- $SOLR_HOME/core_dir/conf/managed-schema
The core.properties file defines the name of the core. It usually contains a line like name=documents or name=patient. This name doesn’t have to match the name of the directory it is in, but it’s less confusing if it does. This name determines the URL used to query the core, and how it appears in Solr’s admin app. EMERSE must be told the name of the core, not the name of the directory, since it uses the web API to talk to the core.
The solrconfig.xml
file defines the version of Lucene used to talk to the index. Changing this will require a re-index since that may change the format of the index on disk.
In addition, solrconfig.xml
contains the configuration of request handlers, such as the search, spell check and query validation handlers. These and other settings can be changed without re-indexing.
EMERSE assumes a number of endpoints are defined in this file, and those themselves rely on EMERSE’s Solr plugin. To install the plugin, place it in $SOLR_HOME/lib/
. (You will have to create the directory.) The definitions of the endpoints that must be present in the solrconfig.xml
files are kept in "configsets". These are basically templates for a Solr core. They have the same structure, but miss some files, such as the core.properties
file.
The managed-schema
file describes the format of documents for the core. It says what fields exist and how they should be indexed and stored. This will likely need to be modified to match your local document metadata structures, and will need to match the appropriate database tables. For more information see the Data Guide.
Loading Configuration
EMERSE loads configuration data through Spring’s "environment", which is an abstraction over a number of ways to configure processes generally, Java programs specifically, and Java web applications more specifically. It presents configuration much like process-level "environment variables", which map textual names to textual values. The name/value pair is generally referred to as a "property" in Java. Spring’s environment searches all of these configuration sources in a fixed order, and the first property found wins. The order of search is roughly as follows:
- Servlet Configuration Parameters
- Servlet Context Configuration Parameters
- JNDI Properties
- Java System Properties
- Environment Variables
- The emerse.properties file
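The "first property found wins" rule can be illustrated with a minimal sketch. This is not Spring's implementation; the `resolve` function, the source labels, and the sample values are hypothetical stand-ins for what Spring would read from each of the six places listed above.

```python
from typing import Mapping, Optional, Sequence, Tuple

def resolve(name: str,
            sources: Sequence[Tuple[str, Mapping[str, str]]]) -> Optional[Tuple[str, str]]:
    """Return (source label, value) from the first source that defines name."""
    for label, props in sources:
        if name in props:
            return label, props[name]
    return None  # no source defines the property

# Hypothetical contents, listed in the search order described above.
sources = [
    ("servlet config", {}),
    ("servlet context", {}),
    ("JNDI", {"solr.serviceURL": "http://solr-a:8983/solr"}),
    ("system properties", {}),
    ("environment variables", {}),
    ("emerse.properties", {"solr.serviceURL": "http://localhost:8983/solr",
                           "ds.username": "emerse"}),
]

print(resolve("solr.serviceURL", sources))  # the JNDI value shadows the file
print(resolve("ds.username", sources))      # only the file defines this one
```

In this sketch a property set in JNDI hides the same property in the emerse.properties file, which is why a single shared file can be combined with a few per-deployment overrides.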
The emerse.properties file is searched for in the following order:
- The value of the property emerse.properties.filepath, according to Spring’s environment
- $HOME/emerse.properties (i.e., in the home directory of the user Tomcat runs as)
- The file WEB-INF/classes/project.properties inside emerse.war (or its exploded directory) or any jar file inside the war file itself. (This is to say, the file /project.properties on the classpath.) There is no such file by default, and we don’t recommend re-packaging the war to add one; this is supported only for legacy reasons.
Generally, all EMERSE configuration is placed in the emerse.properties
file, though configuration may be split across any of the six places Spring’s environment searches.
In practice, we tend to put the emerse.properties
file under the home directory of the user which tomcat runs as. However, if this is not desirable, or if multiple versions of EMERSE are running on the same server, you must specify the location of the properties file using another property, emerse.properties.filepath
specified in one of the first five ways. We tend to do this using either Java system properties, or JNDI.
Java system properties are passed as arguments to the JVM, in the form -Dproperty.name=value
. Since the Tomcat scripts are what actually start the JVM, you need to configure them to pass this argument, and the easiest way to do that is to set and export the CATALINA_OPTS
environment variable before starting tomcat:
export CATALINA_OPTS="$CATALINA_OPTS -Demerse.properties.filepath=/path/to/file"
bin/catalina.sh start
If you merely want to put the properties file somewhere other than $HOME/emerse.properties
you can create a file bin/setenv.sh
inside the tomcat installation and put the export statement there; catalina.sh
automatically runs such a file internally if it exists. (See the comments inside catalina.sh
or catalina.bat
for more information.)
The downside of this is that each tomcat instance can host at most one deployment of EMERSE. To have multiple deployments of EMERSE in the same tomcat instance, you can set up each deployment with a different JNDI environment, which can give EMERSE a different location for the emerse.properties
file.
The JNDI environment for a war file is configured by a file inside Tomcat’s configuration directory: conf/Catalina/localhost/emerse.xml
. (The directories other than conf
will need to be created.) The filename emerse.xml
should match the name of the war-file in the webapps
directory. Inside, add the XML:
<Context>
<Environment name="emerse.properties.filepath"
value="/path/to/emerse.properties"
type="java.lang.String"/>
</Context>
You can add other properties here in the same way as well. They will take precedence over the values specified in any emerse.properties
file loaded. So, if two deployments have the same configuration, except for which Solr instance or database they connect to, you could just set the solr.serviceURL
or ds.url
properties here.
Version number
The EMERSE version number is distributed as part of the WAR file that gets deployed to the server. This is not really a configurable option, but it is being mentioned here for the sake of completeness. It can be found in the WAR manifest, under META-INF/MANIFEST.MF
. (WAR files are just zip files with the wrong extension. This is true of jar files too.) Try unzip -c emerse.war META-INF/MANIFEST.MF
.
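Because a WAR file is a zip archive, the manifest can also be read programmatically. The following is a minimal sketch using Python's standard zipfile module; the function name `read_manifest` is illustrative, and which manifest header carries the EMERSE version is not stated in this guide, so the sketch simply returns every header in the main section.

```python
import zipfile

def read_manifest(war_path):
    """Return the main-section headers of META-INF/MANIFEST.MF as a dict."""
    with zipfile.ZipFile(war_path) as war:
        text = war.read("META-INF/MANIFEST.MF").decode("utf-8")
    headers = {}
    key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            headers[key] += line[1:]  # continuation of the previous header
        elif ": " in line:
            key, value = line.split(": ", 1)
            headers[key] = value
    return headers

# Usage, assuming emerse.war is in the current directory:
# for name, value in read_manifest("emerse.war").items():
#     print(name, "=", value)
```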
Profiles
EMERSE uses Spring profiles to enable optional services within the application. Active profiles are those listed in the value of the spring.profiles.active
property in Spring’s environment. Profile names should be comma-separated.
Currently, there are two interesting profiles: ldap
and no-scheduler
.
ldap turns on the LDAP authentication mechanism, which must be further configured with other properties. Additional information can be found in the LDAP section below.
no-scheduler
turns off all scheduled jobs typically configured with the task.*.cron
properties, described below. Activating this profile is used for testing only. (The scheduler is on by default, which is why activating the no-scheduler
profile turns it off.)
See the Loading Configuration section for more details about how to put spring.profiles.active
into Spring’s environment.
export CATALINA_OPTS="$CATALINA_OPTS -Dspring.profiles.active=ldap"
spring.profiles.active=ldap,no-scheduler
Application Settings
The following are all the properties that configure EMERSE. These are looked up through Spring’s environment, as described in Loading Configuration. Generally, we set them in the emerse.properties
file.
Database
ds.username |
The username for the database account for the main EMERSE database. |
ds.password |
The password for the database account. This can be encrypted with Jasypt if desired. |
ds.url |
The JDBC url to connect to the EMERSE database. Example:
ds.url=jdbc:oracle:thin:@databasehost:port:sid |
ds.maxPoolSize |
Maximum number of available connections to the database. Production systems with a moderate number of users should set this to 10-20 connections. Our implementation at Michigan Medicine would be considered moderate. We estimate that 20 connections is probably reasonable for about 20 concurrent logged-in users, with about 10 users searching at the same time. More is generally better, but a centrally managed DB may have a limit on the number of connections permitted. |
Solr/Lucene
solr.serviceURL |
The URL the application will use to access the Solr instance. Example:
solr.serviceURL=http://localhost:8983/solr |
solr.username |
The username used when making connection to Solr when configured with basic auth. |
solr.password |
The password used when making connection to Solr when configured with basic auth. |
solr.unifiedCollection |
The name of the Solr core that contains patient documents. Tends to be |
solr.patientUpdateCollection |
The name of the Solr core that EMERSE copy the |
solr.patientSearchCollection |
The name of the Solr core that should be used to search patients. Typically, this is |
solr.wildcard.minLength |
This is the minimum number of characters in a term before the first wildcard character, |
The patient cores were created to facilitate the graphs summarizing demographics shown in the results of an all-patient search. They are also used in filters.
It is worth noting that the Solr patient index is replicated within EMERSE, with one serving as a backup for the other if the main one ever became corrupted. The larger, production Solr document index is not replicated in this way, mainly because it is so large. In other words, this type of 'slave' index is good practice, but may not be practical for the larger indexes.
|
Patient Lists and MRN validation
There are various options available for validating user-entered patient medical record numbers (MRNs). MRNs in the PATIENT
table are in a "canonical" form which is formatted for display to the user, and MRNs entered by the user are "cleaned up" to match the canonical form. The "formatting for display" portion is determined by patientList.MRNFormat
and patientList.MRNFormatUseZeroPad
. The "cleaning up" portion is covered by patientList.stripRegexOnMRNInsert
.
If your MRNs are numbers possibly with leading zeros, we suggest you pick the form with no leading zeros as the "canonical" form. This allows you to be the most forgiving of minor errors, because you can strip leading zeros on input but still show the leading zeros when displaying MRNs to the user.
We suggest being this forgiving since MRNs may come from different systems (or go through several systems) before being sent to EMERSE, and not all of them may preserve leading zeros.
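The clean-up and display steps described above can be sketched as follows. This is an illustration only, not EMERSE's code: the function names, the strip expression `^0+`, and the display width of 9 are assumptions chosen for the example, and the actual format string syntax accepted by patientList.MRNFormat is not reproduced here.

```python
import re

def canonical_mrn(raw, strip_regex=r"^0+"):
    """Clean a user-entered MRN: drop whitespace first, then remove regex matches."""
    no_space = re.sub(r"\s+", "", raw)
    return re.sub(strip_regex, "", no_space)

def display_mrn(canonical, width=9):
    """Left-pad the canonical MRN with zeros for display (assumed width)."""
    return canonical.zfill(width)

entered = " 000 123 45 "
stored = canonical_mrn(entered)
print(stored, display_mrn(stored))
```

With this arrangement, an MRN typed with or without its leading zeros maps to the same stored value, while users still see the padded form.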
patientList.MRNLimit |
The number of patients that can be added to a patient list. We set it to 100,000 by default. |
patientList.maxErrors |
The maximum number of errors that are shown when user uploads or inserts new patients to a patient list. The default is 100. |
patientList.deduplicateLists |
Whether the system will allow duplicate medical record numbers on the same patient list. A true value will cause emerse to remove duplicates before saving to a patient list. In general, it is a good idea to remove duplicates, so keeping it |
patientList.pullInvalidMRNs |
If set to true, invalid MRNs will be reported. If set to false, they will be silently removed. There is a small performance improvement when they do not need to be reported back to the user, but in general it is good to let users know when invalid MRNs have been removed. Default true. |
patientList.MRNFormat |
A format string that formats the MRN for display in the UI. Generally this is either |
patientList.MRNFormatUseZeroPad |
|
patientList.stripRegexOnMRNInsert |
Remove matches of the regular expression when MRNs are uploaded/entered by users. This is run after whitespace is removed from the MRN. (For instance, if the regular expression is
|
LDAP
The following settings that configure EMERSE to use LDAP for authentication only work when the runtime profile is set to include "ldap". See the Profiles section to add this profile to the running EMERSE instance.
Typically, to authenticate to LDAP, you need the DN (distinguished name) of the user you want to authenticate as, and the password of that user. Since the DN of a user may not contain their username (as entered at the login screen), EMERSE authenticates to LDAP as a fixed "service user" as specified by ldap.userDn
and ldap.password
. This should give EMERSE the permission to then run a search for the DN of the user trying to login. The search run is the one specified in the ldap.search
property, where every instance of the text {0}
in that search is replaced with the username entered on the login screen. The user record found from that search should contain a dn:
entry, which is then used to authenticate the user against LDAP with the password provided on the login screen.
LDAP is only used for authentication. Authorization (permissions) is given to the user as defined by their user account in the EMERSE database that matches the username they entered at the login screen. This means that if they are not in the EMERSE database, they will not have access. So, you must add users via the administration application; we don’t create accounts from information stored in LDAP or grant permissions based on LDAP groups.
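The substitution of the username into the ldap.search template can be sketched as below. The function names are illustrative, and the escaping step is an added precaution following the LDAP filter escaping rules of RFC 4515; whether EMERSE itself escapes the username this way is not stated in this guide.

```python
# Characters with special meaning in an LDAP search filter (RFC 4515).
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\x00": r"\00"}

def escape_filter_value(value):
    """Escape filter metacharacters so the username is treated as literal text."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)

def build_ldap_filter(search_template, username):
    """Replace every {0} in the template with the (escaped) username."""
    return search_template.replace("{0}", escape_filter_value(username))

print(build_ldap_filter("(&(uid={0})(objectClass=user))", "jdoe"))
```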
ldap.host |
Example:
ldap.host=ldaps://hostname:636 |
ldap.userDn |
The distinguished name of the service account that EMERSE will use to conduct the user search. Example:
ldap.userDn=cn=emerse,ou=people,dc=med,dc=umich,dc=edu |
ldap.password |
Password of the service account |
ldap.uidPath |
Path of the subtree to search for the user. Example:
dc=med,dc=umich,dc=edu |
ldap.search |
The search to find the user based on the username typed at the login screen. Every occurrence the text Examples:
(uid={0}) (&(uid={0})(objectClass=user)) |
Attestation
attestation.allowOtherAttestationReasons |
If set to |
attestation.allowFreeTextAttestation |
If set to |
attestation.showPriorAttestations |
If set to |
Batch Updating Begin/End Dates
Our experience at Michigan Medicine has shown that legacy documents coming from older systems may sometimes have invalid document dates. This led to unusual dates being displayed in the section of EMERSE that shows the overall date range of included documents when no date limitation was placed on the search criteria (e.g., “01/01/1900”).
To circumvent this potential problem EMERSE provides two options for controlling the dates displayed to users. In general, background tasks that update the Lucene indexes would also update the date ranges for documents when all dates are selected (that is, when no date range is entered into the date range boxes in the user interface). This is so that as the index updates every night a new ‘end date’ can be shown for the date range of the documents.
This auto-update setting can be over-ridden for the start and stop dates, independently, using the properties described below. Changing this setting can, for example, allow one to have a more sensible document start date that more closely matches when the documents were being collected (without having to actually change the dates of all of the incorrect documents).
Note that changing these dates only affects the dates displayed in the date range section at the top of the screen. The actual documents will still show their original dates, and the searches will still take place based on the actual dates of the documents even if they are incorrect. Thus, if actual dates are entered by users into the start/stop date boxes, those dates will be used. If no dates are entered by users (thus, searching ‘All dates’) then the system will search across all of the documents regardless of the over-ride date shown in the UI and regardless of the document dates in the system.
batch.updateIndexMinDateFromSolrIndex |
If set to true, min date of documents is updated from Solr every night, which would be updated in the |
batch.updateIndexMaxDateFromSolrIndex |
If set to true, max date of documents is updated from Solr every night, which would be updated in the |
If one or both of these properties is set to false, then the date entered in the solr_index
table is what will be used for display purposes. For more information on this table see the section on the solr_index
in the Data Guide.
All Patient Search
search.allPatientFragmentLimit |
Number of fragments/text snippets to display for preview when using All Patient Search. Example:
search.allPatientFragmentLimit=100 |
search.facetDateRangeInterval |
The All Patient Search displays a chart based on patient’s age using intervals. This setting specifies the interval to use when displaying the chart. In general there should be no reason to change the default setting. Example:
search.facetDateRangeInterval=10 |
Miscellaneous
There are several components configurable within the EMERSE menu, which is available to all users in the upper-right portion of the window. In addition, there are things you can configure for the login page.
Contact Information
Users may want to contact a local administrator about issues or feedback about EMERSE. This can be accessed by users in the upper right menu through either the About option or the Feedback option. Both of these menu options have some hard-coded text followed by a customizable URL that can be defined using the two properties listed below. The About menu item contains text beginning with "Please direct feedback and issues to…" and the Feedback menu item contains text beginning with "Please send any comments or suggestions you may have about EMERSE to…". The remaining text is defined by the two properties:
contact.url |
This is the URL, or the Examples:
contact.url=https://link.to.help/server contact.url=mailto:emersehelp@university.edu |
contact.text |
This is the text that will be displayed on the screen for the URL. Example:
contact.text=EMERSE help desk The resulting URL would then be constructed using the two properties above to look something like:
|
login.hint |
This is a small snippet of text that appears on the EMERSE login page to help users know what login credentials to enter. This is by default blank. |
resources.dir |
This is a directory containing web resources. Right now, only the "cover" photo on the EMERSE login page (not the admin app one) can be set. By default there is none, so you should do this. The name of the cover photo should be either |
userGuideUrl |
This is the link that contains the user guide. By default (if nothing is defined) it will link to the main user guide on the project-emerse.org website. If you have your own user guide you can link to that instead by replacing the URL. Example:
userGuideUrl=http://project-emerse.org/documentation/user_guide.html |
Timeouts
A user’s session is configured to be timed out due to inactivity. If the app is idle and does not encounter a mouse click, mouse move, mouse scroll or a keypress activity for a configured timeout setting, the application logs the user out of their session and the login page is presented. The following properties can be added to the project.properties
to override the defaults.
This timeout feature does not apply to the Attestation screen, because at this point no Protected Health Information (PHI) would be displayed. Nevertheless, EMERSE would still timeout based on the server timeout settings even though the countdown window for a forced logout would not be shown to the user. |
application.idle.timeout |
Number of seconds to run the timer when the application is idle. The default value is 3600 if this property has not been added to the properties file. The value should be in seconds. Example:
application.idle.timeout=3600 |
application.warn.length |
Number of seconds to show the timeout warning window. The default value is 30 if this property has not been added to the properties file. Example:
application.warn.length=30 |
Overall Patient Count
EMERSE displays the total number of patients in the system with respect to conducting an All Patient Search across all of the patients. This count is updated using the Spring Scheduler within the app itself, and should auto-update about every 30 minutes. The overall patient count is not configurable since it is derived from the data loaded into the system. Specifically, this count is based on the distinct number of MRNs that are associated with all of the documents in the Solr index. It is not based on the total number of MRNs in the database table, Patient
. Thus, if a patient is in the Patient
table but does not have an associated document, that patient will not be counted towards the total number of patients.
The total patient count displayed in the user interface is stored in the PATIENT_COUNT
column of the SOLR_INDEX
table in the database. This count is refreshed periodically based on a background process that retrieves the unique number of MRNs from the Solr documents
index. Additional details about configuring the schedule for this process can be found within this guide in the section called 'Solr Patient Index Replication Interval'. However, the overall patient count can also be forced to refresh immediately using the 'System Synchronization' feature found within the admin application.
Optimization
Various components of the EMERSE system can be tweaked to enhance the user experience and yield optimum performance.
Tomcat (EMERSE application)
To reduce the frequency of garbage collection, use the -Xmx and -Xms switches to control how the JVM sizes its heap memory. We recommend setting up Tomcat to use between 1 and 2 GB of memory. You can add these arguments in the setenv.sh
script in the bin
directory of Tomcat. If it doesn’t exist, just create it. See Tomcat’s catalina.sh
script for more details on this script.
export CATALINA_OPTS="$CATALINA_OPTS -Xmx2048m -Xms1024m"
Solr Index Optimization
Over time, many documents change as they are updated or deleted (a deletion might be required if, for example, a document was found to have been created under the wrong patient). It is possible to clear out these deleted/inactivated documents and potentially improve the performance of Solr by optimizing the index. This can be invoked manually using the Optimize button in the Solr Administration User Interface. Optimizing also reduces the number of index segments, which can also improve system performance.
During the optimization process the original index is left in place while the new, optimized index is being created. This means that you will need free storage of about 2-3 times the original index’s size for optimization to proceed. Additionally, we have found that an optimization can take 10 or more hours and uses substantial computational resources, meaning that system performance might suffer for users. Thus, it might be best to run this on weekends or at other times of low use. At Michigan Medicine we optimize infrequently: we copy the indexes to a different server with more space, optimize there, and then copy the indexes back after optimization is complete. We also need to ensure that no new documents are added to the original index during this time.
Solr Caches
Solr has a number of caches. Two important ones which are not configured by default are the filter cache and the document cache. These caches are index-specific, and are specified in the solrconfig.xml
file:
<config>
...
<query>
<filterCache class="solr.FastLRUCache"
maxRamMB="1024"
showItems="10"/>
<documentCache class="solr.LRUCache"
size="4000"
showItems="10"/>
</query>
...
</config>
The maxRamMB
attribute tells Solr how large the cache should be allowed to grow to before evicting entries. The size
attribute tells Solr how many entries are allowed before evicting becomes necessary. Only one should be specified, but on earlier versions of Solr 8, the maxRamMB
can’t be specified on the documentCache
. (You’ll get an error if you do.)
We’d recommend configuring these caches both in the documents
index and the patient-slave
index, since both are queried by EMERSE. The documents
index should have larger caches since it is used more heavily.
Not all queries that are sent to Solr are cached. We currently can cache the following queries (in the standard lucene syntax):
1. SOURCE:XXX
2. MRN:XXX
3. the collection of "filters" specified in EMERSE as a single query
To get a sense of the size of the queries, at UM, we have 200 million documents, and the first kind of query takes up 20-30 MB of space for each source in the system. (We have about five sources.) The second kind tends to take 2-30 KB (about a thousand times less), and the third kind takes between those two, depending on how selective the filters as a whole are. (The more selective, the smaller.)
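A rough back-of-the-envelope check of these figures is sketched below. It assumes (this is not stated in this guide) that a large cached filter is held as a bitset with one bit per document in the index, and that a small one is held as a list of 4-byte document ids. The function names are illustrative, and the 5,000-document patient is a made-up example.

```python
def bitset_entry_mb(num_docs_in_index):
    """Size in MB of a one-bit-per-document bitset (assumed storage format)."""
    return num_docs_in_index / 8 / 1_000_000

def int_list_entry_kb(num_matching_docs):
    """Size in KB of a list of 4-byte document ids (assumed storage format)."""
    return num_matching_docs * 4 / 1_000

per_source_mb = bitset_entry_mb(200_000_000)   # one SOURCE:XXX filter
all_sources_mb = per_source_mb * 5             # about five sources
one_patient_kb = int_list_entry_kb(5_000)      # hypothetical patient with 5,000 documents

print(per_source_mb, all_sources_mb, one_patient_kb)
```

Under these assumptions a per-source filter on a 200 million document index comes to about 25 MB, which falls within the 20-30 MB range quoted above, so the five source filters together would occupy a noticeable share of a cache limited by maxRamMB.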
Viewing Cache Statistics
You can view the usage of the caches in the Solr Admin UI. Just go to the core, click on "Plugins / Stats" then "Cache" and the types of caches should appear. Expand the "filterCache" or "documentCache" to look at that one. The showItems="#"
in the XML tells the UI to show a random sample of the cached filters or documents, including their size. In addition, you can see how many cache lookups have been performed, evictions, and the hit-ratio. For the documents
core’s filter cache, we have a hit ratio of above 0.95.
Solr Memory
Solr (on Linux at least) memory maps the index files, meaning its virtual memory will be about as large as the index file size. This means the virtual memory size of the application can be vastly larger than physical memory. (For instance, our Solr has at least 1 TB of virtual memory.)
These memory-mapped files are not a part of the Java heap space and so don’t contribute toward the -Xmx
flag. In addition, the OS manages which of those files are actually in physical memory and which aren’t. (Depending on the tool you use to look at memory on the box, the memory used for memory-mapped files may appear used or not; the OS is free to use that memory for another purpose, so it is, in a sense, free.) Solr internally caches the results of certain queries or parts of queries, so that if they are used frequently, the search doesn’t need to be done again. These do reside in the Java heap space.
So, you must strike a balance by having enough Java heap memory for query caching, and enough otherwise free memory on the box so that the OS has plenty of space to cache Solr’s memory-mapped files. At University of Michigan, we currently allocate 3 GB to Solr’s heap, and have at least 20 GB of free memory on the box for the OS to cache files.
Allocating more heap space for Solr doesn’t mean you won’t have to tweak some of Solr’s cache settings, though there is some cache-sizing based on the max heap size. We haven’t done a ton of testing with caching, so we’ll say no more on this for now.
Solr’s memory can be configured with flags to the solr start
command, or set as a default when starting solr by adding it to the solr.in.sh
configuration.
./solr start -m 3g
SOLR_HEAP=3g
You may need to pass other flags such as -s when starting Solr, as described in the Installation Guide.
|
If you are concerned with the performance of the garbage collector or free memory, you can see the frequency and duration of garbage collection in Solr’s GC log, contained in SOLR_INSTALL_DIR/server/logs/solr_gc.log
.
Solr Patient Index Replication
EMERSE has two indexes used to keep track of patients: patient
and its replica patient-slave
.
The patient
index is created by copying the patients from the database table, PATIENT
over to the corresponding Solr index, patient
. This is done automatically by the system once per day as a scheduled event. The schedule of the jobs can be found in the properties file. The default time set for the EMERSE distribution is 7:30 AM. This was done with the assumption that the patients in the Database table would be updated once every night during non-peak hours. If you are fine with that time, no changes need to be made.
After the PATIENT
table is copied over, the replica is told to replicate from the master. To do this, the index must be configured with the master’s URL, which is done inside the solrconfig.xml
file of the patient-slave
core. There should be a handler defined like so:
<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
<lst name="slave">
<str name="masterUrl">http://localhost:8983/solr/patient/replication</str>
<!--str name="httpBasicAuthUser">username_here</str-->
<!--str name="httpBasicAuthPassword">password_here</str-->
</lst>
</requestHandler>
Simply adjust the masterUrl parameter as needed. Uncomment the username and password elements and adjust them if you set up basic auth on the Solr instance. If you have an SSL certificate (which you should if you use basic auth), you must use a domain name for which the certificate was issued.
The properties file uses a cron-like syntax to specify the schedule, which consists of six fields separated by whitespace. The first field is the seconds, followed by minutes, hours, day-of-the-month, the month number, and then day-of-the-week. A field can contain a number appropriate for the field, a star meaning every value of the field, or a question mark meaning no restriction. A more formal description of the syntax can be found in the Spring Documentation, specifically the component regarding the class CronSequenceGenerator.
|
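To make the six-field layout concrete, the sketch below splits a schedule string into named fields. It only labels the fields and does not validate or evaluate them the way Spring does; the field names and the function `parse_cron` are illustrative.

```python
# Field order as described above: seconds first, day-of-the-week last.
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")

def parse_cron(expression):
    """Split a six-field schedule string into a dict keyed by field name."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(fields)))
    return dict(zip(FIELD_NAMES, fields))

# A daily 7:30 AM schedule: second 00, minute 30, hour 7, every day and month.
print(parse_cron("00 30 7 * * ?"))
```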
task.updateIndexStatsViaSolr.cron |
This runs a job that finds the minimum and maximum dates of the document index, along with the number of distinct MRNs in the document index, which are used for updating the date ranges displayed in EMERSE as well as the overall patient count shown when conducting an All Patient search. The default time is every hour, around minute 42. For instance, 1:42, 2:42, 3:42, etc. Default:
task.updateIndexStatsViaSolr.cron=00 42 * * * ? |
task.updatePatientIndex.cron |
The schedule to update the Solr patient index from the patient table in the database. The default time is 7:30 AM. Default:
task.updatePatientIndex.cron=00 30 7 * * ? |
task.optimizePatientIndex.cron |
The schedule to optimize the Solr patient index. Optimizing an index puts all the data into a single segment for operational efficiency. The default time is 7:45 AM. Default:
task.optimizePatientIndex.cron=00 45 7 * * ? |
If you change the scheduled time of this process, you will have to restart Tomcat for the changes to take effect. |
It is possible to force these copying and indexing events to occur on demand, which may be useful for troubleshooting or when testing with an initial setup. Details about how to do this are described in the Administrator Guide. |
EMERSE Search Concurrency
The Overview screen in EMERSE is computationally expensive to show. Currently, it takes generally two searches for each cell of the table, more for the mosaic view.
To complete this work quickly and fairly among multiple concurrent users, EMERSE internally has a "ring" of batches of rows from this table. When a user goes to a page of the Overview table, EMERSE adds the rows of that page as a batch in the ring. Worker threads go through the ring and complete one row from a batch before going on to the next batch, eventually circling back around to the start. This should guarantee fairness, in that all users who are waiting for the Overview page to load will have an equal number of rows loading.
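The round-robin behaviour of the ring can be illustrated with a small single-threaded simulation. This is a sketch of the scheduling idea only, not EMERSE's implementation: the real system uses several worker threads and a bounded number of slots, and the function name and row labels here are made up.

```python
from collections import deque

def process_ring(batches):
    """Return the order in which rows complete when one row is taken per batch per pass."""
    ring = deque(deque(rows) for rows in batches if rows)
    completed = []
    while ring:
        batch = ring.popleft()             # next batch in the ring
        completed.append(batch.popleft())  # complete exactly one row from it
        if batch:
            ring.append(batch)             # unfinished batches rejoin at the back
    return completed

# Three users' pages with 3, 1 and 2 rows respectively.
print(process_ring([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
```

In the simulation every waiting batch advances by one row per pass, so a user with a short page is not stuck behind a user with a long one.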
There are a few settings you can tweak on this search ring.
ring.size |
This determines the number of "slots" in the ring for a batch. If there is not a slot for a new batch, then the request to add the batch will block. Default 50. |
ring.workers |
This is the number of worker-threads that process rows from the batches. This determines the number of concurrent searches that are sent to Solr. Default 7. |
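The round-robin behavior of the ring can be illustrated with a small single-threaded model. This is only a sketch of the idea described above, not EMERSE's implementation: the class `SearchRing` and its methods are hypothetical, there are no real worker threads, and a full ring is reported with `False` instead of blocking the caller.

```python
from collections import deque


class SearchRing:
    """Toy model of a ring of row batches that is served one row at a time."""

    def __init__(self, size):
        self.size = size        # plays the role of ring.size
        self.batches = deque()  # each entry is (user, deque of pending rows)

    def add_batch(self, user, rows):
        # EMERSE blocks when no slot is free; this sketch only reports it.
        if len(self.batches) >= self.size:
            return False
        if rows:
            self.batches.append((user, deque(rows)))
        return True

    def next_row(self):
        # Take one row from the batch at the head, then move that batch to
        # the tail so every waiting batch gets a turn before it is seen again.
        if not self.batches:
            return None
        user, rows = self.batches.popleft()
        row = rows.popleft()
        if rows:
            self.batches.append((user, rows))
        return user, row


ring = SearchRing(size=50)
ring.add_batch("alice", ["a1", "a2", "a3"])
ring.add_batch("bob", ["b1", "b2"])

order = []
item = ring.next_row()
while item is not None:
    order.append(item[1])
    item = ring.next_row()
print(order)  # ['a1', 'b1', 'a2', 'b2', 'a3']
```

Rows from the two users are interleaved, so neither user waits for the other's whole page to finish. In the real system, `ring.workers` threads would each take rows this way in parallel.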
Security Hardening
Solr
Solr provides a REST API that can be accessed with tools such as curl. By default this API is not locked down, so it should be secured with basic authentication if the Solr ports are not firewalled from external traffic.
Solr can be set up to use SSL/TLS and to require authentication with basic auth. Both of these features are supported by Solr Cloud, but EMERSE does not yet support Solr Cloud. However, the Jetty servlet engine embedded in standalone Solr can be modified to require authentication and use SSL.
Much of the Solr documentation pertains to Solr Cloud, which is NOT currently supported by EMERSE. Look for references to a single node configuration when consulting Solr documentation. |
Solr SSL Setup
Changes are required in solr.in.sh
found in the bin directory under the Solr_INSTALLATION
directory. Essentially, uncomment the lines below and configure them with values appropriate to a Java keystore containing the certificate for the server.
SOLR_SSL_KEY_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=keystore password
SOLR_SSL_TRUST_STORE=/path/to/keystore/my-keystore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=keystore password
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false
See the "Basic SSL Setup" section at the following link for more information.
Basic Auth
If Basic Auth is desired, there are several ways in which Basic Auth can be configured. Solr provides its own approach, but another approach uses the Jetty servlet engine bundled with Solr.
The first step is to modify the jetty.xml
file inside the SOLR_INSTALL_DIR/server/etc
folder, adding the following snippet inside the <Configure></Configure>
tags.
<Call name="addBean">
<Arg>
<New class="org.eclipse.jetty.security.HashLoginService">
<Set name="name">Test Realm</Set>
<Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
</New>
</Arg>
</Call>
After adding the XML snippet, add a user/password combination to the file realm.properties located in SOLR_INSTALL_DIR/server/etc
. If the file doesn’t exist, create it and add the following line.
solradmin:password, admin-role
In the above, the username is "solradmin" and the password is "password".
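The line above stores the password in plain text. Jetty's realm files also accept a password written as `MD5:` followed by the hex digest, so the file does not have to contain the literal password. The sketch below builds such a line; the helper name `realm_line` is hypothetical, and it assumes Jetty's `username: password,role` layout. An unsalted MD5 digest is weak protection, so file permissions on realm.properties still matter.

```python
import hashlib


def realm_line(user, password, roles):
    """Build a realm.properties entry with an MD5-digested password."""
    # Jetty's MD5 credential is the lowercase hex MD5 of the password bytes.
    digest = hashlib.md5(password.encode("iso-8859-1")).hexdigest()
    return f"{user}: MD5:{digest},{','.join(roles)}"


print(realm_line("solradmin", "password", ["admin-role"]))
```

The printed line could replace the plain-text entry shown above. Verify that a login still succeeds before relying on it.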
Also, the following needs to be added to the webdefault.xml
file:
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin-role</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
For more information on configuring Jetty with Basic Authentication, see https://www.eclipse.org/jetty/documentation/9.3.0.v20150612/configuring-security-authentication.html |
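After restarting Solr, it is worth checking that requests are actually challenged. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host, port, and URL in the usage note are assumptions (8983 is Solr's usual default port) that should be replaced with your own values.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def solr_status_code(url, user=None, password=None, timeout=10):
    """Return the HTTP status code for url, with or without credentials."""
    request = urllib.request.Request(url)
    if user is not None:
        request.add_header("Authorization", basic_auth_header(user, password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # For example, 401 when credentials are missing or wrong.
        return err.code
```

For example, `solr_status_code("http://localhost:8983/solr/admin/cores?action=STATUS")` would be expected to return 401 if the security constraint applies to that URL. The same call with `user="solradmin", password="password"` would be expected to return 200.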
Exported Excel files
EMERSE provides a function for exporting password-protected Excel files containing patient lists and associated comments/tags. These files are generated on demand by the user and stored on the EMERSE server inside the exploded EMERSE war file, with a unique download link provided to the user. Because there is no straightforward way to know when a file has been successfully downloaded, the Excel file persists on the server. We currently have a shell script on the server that executes every 30 minutes and deletes files older than 60 minutes.
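#!/bin/sh
# Move to the export directory; exit quietly if it does not exist.
cd /PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads \
2> /dev/null || exit 0
# Delete exported Excel files last modified more than 60 minutes ago.
find . -name "*.xlsx*" -mmin +60 -exec rm {} \;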
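Where a shell script is not convenient, the same cleanup can be sketched in Python. This is an illustrative equivalent rather than something shipped with EMERSE. The function name `delete_old_exports` is hypothetical, and the directory passed in is assumed to be the same downloads directory used by the script above.

```python
import time
from pathlib import Path


def delete_old_exports(directory, max_age_minutes=60, now=None):
    """Delete files matching *.xlsx* under directory older than max_age_minutes.

    Returns the deleted paths. A missing directory is treated as nothing to do.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    deleted = []
    # Collect the matches first so files are not removed while still scanning.
    for path in list(root.rglob("*.xlsx*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

It would be run on a schedule, for example every 30 minutes as with the shell script, by calling `delete_old_exports("/PATH_TO_TOMCAT_INSTALL/webapps/emerse/downloads")`.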
Admin Application
Most details related to the Admin application and Admin features can be found in the Administrator Guide. Below is a high-level summary of the Admin features.
EMERSE users that have an ADMIN role have access to the admin application located at:
The application has two main features: user management related to authorization, and maintenance of synonyms.
Add/Remove users
The Add/Remove users tab can be used to manage users of the EMERSE application. When you add new users, note that there is an expanded set of roles that can be applied to a user. For general users, check the “User with full privs” option and leave the others unchecked. The password field is required but will be ignored if security is set up to use LDAP. Although there is now a role for a “limited access” type of user, we are not yet doing much with it locally.
Roles and Privileges
Roles and Privileges for EMERSE users can be customized. Details about how this is done can be found in the Administrator Guide.
Synonyms
The Synonyms tab allows the admin user to update synonyms in the EMERSE application by uploading them from a CSV file.
Synonyms upload currently deletes all the existing entries in the synonyms table and then loads entries from the CSV file. Thus, this option replaces synonyms as opposed to appending new ones to the existing list. |
Synchronization
The admin application has an option to "synchronize" various data between the database and Solr. While this happens automatically overnight, it can be useful to force it more frequently, especially during initial system setup and testing. Details can be found in the Administrator Guide.
Supporting Multiple Environments
It may be ideal to support multiple EMERSE environments such as test, dev, prod, etc. We have found that sometimes it can be difficult for users who are testing EMERSE to know what specific system they are using. To make it easier to distinguish between multiple instances of EMERSE, the system has the ability to display a small, but obvious, box in the upper right part of the screen to inform users. Having this information in a database table is useful because it can remain stable even as the application itself gets upgraded.
This information is defined in a single-row table called ENVIRONMENT_INFO
:
Column Name |
Description |
|
This should be set to 0 and not changed. |
|
This is the environment that is active (dev, test, prod, etc.). This is free text, so it can be anything (e.g., "Development", "Testing", "Production"). |
|
This is a flag to determine if the text for |
The version number of the application (displayed when selecting the About menu) is distributed with the WAR file itself and is not contained in the database. |