Overview

EMERSE version 4.4 requires Solr/Lucene 7.3.1. If you are currently running an older version of EMERSE (which used Solr 6.0.0), you will need to upgrade Solr to this newer version. If you are performing a fresh installation of EMERSE 4.4 or higher, these directions do not apply to you.

The main tasks when migrating from Solr 6 are to download and install the new Solr binaries, migrate any customizations and configuration from the Solr 6 installation to the Solr 7 installation, and upgrade the indexes.

Additional details on many of the install scripts are contained in the Solr section of the EMERSE installation guide.

Methodology

We highly recommend familiarizing yourself with the steps involved by testing the upgrade in a test environment similar to production. This helps avoid inadvertently damaging or erasing the indexes, which, depending on their size, can take several weeks to regenerate. At Michigan, we copy the Solr data directory and the production database to another server. Generally we use rsync to copy the Solr data files between servers, and Oracle's import/export tools to copy the database.

Downloading and Installation

Download and untar/unpack Solr version 7.3.1, located here:

http://archive.apache.org/dist/lucene/solr/7.3.1/

Once downloaded, untar or unzip the archive into the desired installation directory.

cd (to download directory)
tar -xzvf solr-7.3.1.tgz

Configuration

Typical customizations that are made to the Solr install include setting up the startup scripts and configuring authentication and SSL.

Solr configuration can be made in multiple ways, but typically any changes will be found in solr.in.sh (solr.in.cmd on Windows) or in the startup script itself, solr.sh or solr.cmd (Windows). These files are found in the Solr/bin directory. Generally we recommend using the approach of changing the settings in solr.in.sh where possible, as it should simplify future upgrades by consolidating the configuration.

SOLR_HOME

SOLR_HOME, which tells Solr where to locate and store indexes, may be defined in solr.in.sh, but can also be passed on the command line via the -s switch.
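For example, either of the following points Solr at an index directory outside the installation tree (the path is illustrative, not a required location):

```shell
# In solr.in.sh (example path):
SOLR_HOME=/app/data/indexes

# Or equivalently, on the command line at startup:
# bin/solr start -s /app/data/indexes
```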

Basic Authentication

If basic authentication was configured in a previous version of Solr, there are a handful of files involved to migrate the settings. These include: realm.properties, webdefault.xml, and jetty.xml, all of which are located in the server/etc directory.

The realm.properties file contains the username/password combinations that are allowed to access the Solr API and Solr Admin application. This file can be copied from existing installation to the new installation.
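The file uses Jetty's standard realm.properties syntax, one user per line in the form username: password, role. A hypothetical entry (the username and password are placeholders) would look like the following; note that the role must match the role-name used in webdefault.xml.

```properties
solruser: secretpassword, admin-role
```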

The webdefault.xml file should have a security constraint section. This can be copied from the old to the new file, taking care to put it in the appropriate place within the xml document.

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>

In jetty.xml, look for an existing HashLoginService, which ties the realm.properties file to the Jetty startup.

<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Test Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

SSL/TLS configuration

Solr SSL configuration is typically set up in solr.in.sh. If certificates and keystores were set up in the previous installation, Solr is likely configured to find these files via the following configuration settings.

SOLR_SSL_KEY_STORE=/app/software/ssl_cert_stuff/server_keystore_postCA.jks
SOLR_SSL_KEY_STORE_PASSWORD={password}
SOLR_SSL_TRUST_STORE=/app/software/ssl_cert_stuff/server_keystore_postCA.jks
SOLR_SSL_TRUST_STORE_PASSWORD={password}
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false

Upgrading existing document index

Each major Solr release usually includes a new index format, and allows for new data types or features to be used in indexes. Taking advantage of these new features or data types requires re-indexing of the entire data set once Solr has been upgraded. However, it is also possible to upgrade an existing index by using an upgrade tool provided by Solr/Lucene.

The upgrade tool is a Java class that can be run at the command line. The jar files it needs are inside the Solr webapp itself, so it is easiest to run the tool from the directory containing those jars.

The upgrade tool is destructive. Make a full copy of your index before starting, to serve as a backup in case problems arise.
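A backup can be as simple as copying the index directory and verifying the copy. The paths below are throwaway examples created for illustration; in practice, stop Solr first (bin/solr stop) and point INDEX_DIR at the real .../data/index directory.

```shell
# Example paths only; substitute the real index directory in practice.
INDEX_DIR=$(mktemp -d)/index
mkdir -p "$INDEX_DIR"
echo "segment data" > "$INDEX_DIR/_0.cfs"   # stand-in for a Lucene segment file

# Copy the index, preserving permissions and timestamps
BACKUP_DIR="${INDEX_DIR}.bak"
cp -a "$INDEX_DIR" "$BACKUP_DIR"

# Verify the copy matches before running the destructive upgrade tool
diff -r "$INDEX_DIR" "$BACKUP_DIR" && echo "backup verified"
```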

Change directory to Solr jars:

cd SOLR_INSTALL_DIR/server/solr-webapp/webapp/WEB-INF/lib

Once in this directory, the upgrade tool can be run as follows:

/PATH_TO_JAVA/bin/java -cp lucene-core-7.3.1.jar:lucene-codecs-7.3.1.jar:lucene-backward-codecs-7.3.1.jar org.apache.lucene.index.IndexUpgrader /PATH_TO_INDEX/data/index/

Note that this process can take a long time. At Michigan Medicine, migrating a 2TB index takes approximately 24 hours.

The space needed for this operation will be about the size of the existing index. In other words, if the index is 2TB, an additional 2TB of free space will be required. Once the upgrade tool has completed, the original index files will be deleted.
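A quick sanity check before launching the tool is to compare the index size against the free space on its volume. This sketch uses a throwaway directory in place of a real index; substitute your actual index path.

```shell
# Example directory only; in practice set INDEX_DIR to the real index path.
INDEX_DIR=$(mktemp -d)
echo "segment data" > "$INDEX_DIR/_0.cfs"

# Size of the index and free space on its filesystem, both in KB
index_kb=$(du -sk "$INDEX_DIR" | awk '{print $1}')
free_kb=$(df -Pk "$INDEX_DIR" | awk 'NR==2 {print $4}')

if [ "$free_kb" -ge "$index_kb" ]; then
    echo "enough free space for the upgrade"
else
    echo "WARNING: need about ${index_kb}KB free" >&2
fi
```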

You can monitor progress of this tool by watching the size of the directory. For example:

du -h /PATH_TO_INDEX/data/index

Solr schema changes

Generally, existing Solr configuration files should not need modification to work with Solr 7, but at Michigan we noticed one change that may impact existing indexes. Some EMERSE samples were previously distributed with a customized MergePolicy. The syntax for setting up a custom merge policy has changed, so after upgrading to Solr 7, a core with this setting may fail to start when viewed in the Solr Admin tool.

The simplest way to handle this is to remove the following from solrconfig.xml. This file is located in the conf directory inside the index directory itself. For example:

/app/data/indexes/pubmed/conf
<indexConfig>
    <ramBufferSizeMB>960</ramBufferSizeMB>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicy>
    <!-- <mergeFactor>10</mergeFactor> -->
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">1</int>
      <int name="maxMergeCount">6</int>
    </mergeScheduler>
</indexConfig>

The defaults for the merge policy and scheduler are adequate, so there is no need to define them in solrconfig.xml.
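After removing the custom merge settings, the indexConfig section reduces to something like the following, keeping the ramBufferSizeMB value from the original file:

```xml
<indexConfig>
    <ramBufferSizeMB>960</ramBufferSizeMB>
</indexConfig>
```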

Patient index

The existing patient index can be updated by simply having EMERSE re-index it, since it should not be too large.

While logged in to EMERSE as an administrator, use the following URL to force a full re-index from the patient table.

http://EMERSEHOST:port/emerse/springmvc/admin/patientIndex

It may take a few minutes to get a response from the URL.

You can monitor the re-index from the Solr Admin tool by navigating to the Solr core named patient. The page shows the number of indexed documents which, when indexing is complete, should equal the number of patients in the EMERSE patient table. Once this has completed, you can force an update of the patient-slave index by running the optimize command:

http://EMERSEHOST:port/emerse/springmvc/admin/patientIndex/optimize