Accessing the Admin App

This guide covers how to use the admin app of EMERSE. You can reach the admin app directly by appending /admin2/ to the usual URL for EMERSE. For instance, if the URL for EMERSE is http://example.org/emerse/, the admin app is at http://example.org/emerse/admin2/. Only users with the ACCESS_ADMIN privilege are allowed access to the admin app.

The Users Tab

users page context menu

User Search

You can search for a user by username, first name, or last name by typing the start of any one of those three. Matches to the middle or end of a username or name will not work. For instance, if a username is emerse, a search for em will find that user, but searches for se or er will not. Press Enter to update the table.
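The prefix-only matching described above can be sketched as follows. This is an illustration rather than EMERSE's actual code: the User class is hypothetical, and ignoring case is an assumption, since the guide does not say whether the search is case-sensitive.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record holding the three searchable fields.
    username: str
    first_name: str
    last_name: str


def matches_prefix(user: User, query: str) -> bool:
    """True if the query is a prefix of the username, first name, or last name."""
    # Assumption: the comparison ignores case.
    q = query.lower()
    fields = (user.username, user.first_name, user.last_name)
    return any(f.lower().startswith(q) for f in fields)


user = User(username="emerse", first_name="Ada", last_name="Lovelace")
print(matches_prefix(user, "em"))  # True: "em" starts the username
print(matches_prefix(user, "se"))  # False: "se" only appears at the end
print(matches_prefix(user, "er"))  # False: "er" only appears in the middle
```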

Enabling Users

Users can be enabled or disabled. Disabled users cannot log in and can use neither the main EMERSE app nor the admin app. You can disable or enable a user by clicking on the three dots on their row to bring up a context menu, and selecting either the disable or the enable option.

Editing Users

To edit a user, click on the three dots of the user’s row, and select the edit option.

Editing a User

users page edit

After making changes, you must remember to click the save button; the application will not automatically save changes.

When editing a user, you can change the roles, first name, last name and username (login ID).

EMERSE verifies a user’s password in one of two ways. The password can be stored in EMERSE’s database, in which case you can set the password on this page. The password field is always empty even if this is the method of authentication, but filling it in and saving the user will change their password.

The other way to verify the password is to use LDAP. In this case, the password cannot be changed by EMERSE, and the password field is not used. To use LDAP, check the External Authentication (LDAP) checkbox and save the user. Using LDAP to authenticate requires that LDAP has been configured in emerse.properties. See the configuration documentation for how to do that.

Finally, you can select which roles the user has. Each role is listed at the end, with its description shown, not the code. Roles and privileges are described next. Changing a user’s roles doesn’t take effect until their next login. So, if they are currently logged in and need a new privilege you just assigned, they will have to log out, then log back in to get it.

Roles Tab

roles page

Changes to the table are saved immediately upon being changed. However, changes won’t take effect for currently logged-in users; they will have to log out and log back in again to see changes.

EMERSE’s authorization model uses roles and privileges. Privileges determine what actions a user can do in the system. Privileges are assigned to roles, and roles are assigned to users (as seen when editing a user in the roles tab). A role can have many assigned privileges, and a user can have many assigned roles. A user can do an action if they have the privilege from any of their assigned roles.
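As a rough sketch of this model, a user's effective privileges are the union of the privileges of all their roles. The role codes below are hypothetical; the privilege names are the ones listed in this guide. This only illustrates the rule and is not EMERSE's implementation.

```python
def effective_privileges(user_roles, role_privileges):
    """Union of the privileges of every role assigned to the user."""
    granted = set()
    for role in user_roles:
        granted |= role_privileges.get(role, set())
    return granted


def can(user_roles, role_privileges, privilege):
    """A user can do an action if any assigned role carries the privilege."""
    return privilege in effective_privileges(user_roles, role_privileges)


# Hypothetical role codes mapped to privileges from this guide.
role_privileges = {
    "ADMIN": {"ACCESS_ADMIN"},
    "COUNTS_ONLY": {"ACCESS_EMERSE", "SEARCH_ALL_PT"},
}

print(can(["ADMIN"], role_privileges, "ACCESS_EMERSE"))                 # False
print(can(["ADMIN", "COUNTS_ONLY"], role_privileges, "ACCESS_EMERSE"))  # True
```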

The roles page lists each role as a row in a table, with columns of the table being privileges. There are a lot of privileges, so you will need to scroll the table horizontally to see all of them. A role is assigned a privilege if the corresponding column’s checkbox is checked. You can click on the "Describe Privileges" button to see a description of what each privilege does.

The first column is the code of the role. The code is not shown when editing a user; the role’s description is shown instead. Codes must be unique and short. To see and edit the description of a role, click on the three dots and select the edit option.

roles page desc

You cannot delete roles or change their code through the admin app. You can add a role using the "Add Role" button, which will add a row to the table.

Privileges are given here for reference as well:

Privilege Description
 ACCESS_ADMIN

Allows the user to use all parts of this admin application.

 ACCESS_API

Allows the user to use /springmvc/ext/** endpoints. This is not currently a feature meant for consumption outside UM.

 ACCESS_EMERSE

Allows the user to use the main application at all. Without further privileges, they won’t be able to actually do anything in EMERSE.

 ATTEST_COMMON

Allows the user to attest to common attestation reasons for using EMERSE.

 ATTEST_FREE_TEXT

Allows the user to write their own reason for using EMERSE.

 ATTEST_PRIOR_REASON

Allows the user to select a previous free-text reason for attestation. This privilege is useless if the user doesn’t have ATTEST_FREE_TEXT. Giving a user this is really only a convenience to them so that they don’t have to type out the same reason over and over. However, it also clutters up the attestation table.

 ATTEST_RESEARCH_STUDY

Allows the user to attest to an active research study they are a member of. This requires that research study and study team data is loaded into EMERSE’s database through an ETL process you write.

 EXPORT_PT_LIST

Allows the user to download a patient list as a password-protected Excel file.

 NEW_PT_LIST

Allows the user to create a patient list. This includes copying a patient list, creating a new empty one, creating a patient list by combining two existing lists using the "Compare Lists" feature, and converting a temporary patient list to a saved one.

 SAVE_ALL_PT_LIST

Allows the user to create a temporary patient list from an all-patient search.

 SEARCH_ALL_PT

Allows the user to do all-patient search. This gives them the ability to see the exact count of patients that match their search. The abilities to view the demographic charts, document snippets, and the trend-over-time chart are controlled by separate privileges.

 SEARCH_NETWORK

Allows the user to search the federated network of EMERSE instances, if this EMERSE is configured to connect to it.

 UPLOAD_MRNS

Allows the user to add patients to a saved patient list or their temporary patient list by uploading MRNs directly to that list.

 VIEW_ALL_PT_CHARTS

Allows the user to view the demographic charts when doing all patient search. Useful only if the user has SEARCH_ALL_PT.

 VIEW_ALL_PT_SNIPPETS

Allows the user to see snippets of documents matching the search terms in all-patient search. This allows users to correct mistakes in their search, and see the kind of data they are matching to quickly refine their search without (likely) seeing PHI. Useful only if the user has SEARCH_ALL_PT.

 VIEW_ALL_PT_TRENDS

Allows the user to see a graph of how many patients match the search over time, in all-patient search. Useful only if the user has SEARCH_ALL_PT.
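Several descriptions above note that a privilege is only useful alongside another one. A small check like the following could flag such combinations when reviewing a user's combined privileges. It is a sketch only: the REQUIRES table encodes just the dependencies stated in the descriptions above, and the function name is hypothetical.

```python
# Dependencies taken from the privilege descriptions above.
REQUIRES = {
    "ATTEST_PRIOR_REASON": "ATTEST_FREE_TEXT",
    "VIEW_ALL_PT_CHARTS": "SEARCH_ALL_PT",
    "VIEW_ALL_PT_SNIPPETS": "SEARCH_ALL_PT",
    "VIEW_ALL_PT_TRENDS": "SEARCH_ALL_PT",
}


def ineffective_privileges(granted):
    """Privileges that are granted but whose prerequisite is missing."""
    return sorted(
        p for p in granted
        if p in REQUIRES and REQUIRES[p] not in granted
    )


granted = {
    "ACCESS_EMERSE",
    "VIEW_ALL_PT_SNIPPETS",   # needs SEARCH_ALL_PT, which is missing
    "ATTEST_FREE_TEXT",
    "ATTEST_PRIOR_REASON",    # prerequisite present, so this one is fine
}
print(ineffective_privileges(granted))  # ['VIEW_ALL_PT_SNIPPETS']
```

Because a user can hold several roles, the check is most meaningful on the union of privileges across all of a user's roles rather than on a single role.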

Privileges can be given or withheld in specific combinations to get specific effects. These effects are not always obvious, so here are a few examples:

Granted Privileges Implication

ACCESS_EMERSE
NEW_PT_LIST
SEARCH_ALL_PT
VIEW_ALL_PT_SNIPPETS
SAVE_ALL_PT_LIST

User can identify a set of patients with an All Patient Search and can see the text Summaries (VIEW_ALL_PT_SNIPPETS). User can move the list to a Temporary Patient list to search them further (SAVE_ALL_PT_LIST). User can also save the list as a Saved Patient List for searching later or sharing with another user (NEW_PT_LIST). User cannot export the list to an Excel file (EXPORT_PT_LIST).

ACCESS_EMERSE
SEARCH_ALL_PT
VIEW_ALL_PT_SNIPPETS
SAVE_ALL_PT_LIST

User can identify a set of patients with an All Patient Search and can see the text Summaries (VIEW_ALL_PT_SNIPPETS). User can move the list to a Temporary Patient list to search them further (SAVE_ALL_PT_LIST). But because the user cannot create a new list (NEW_PT_LIST), the user cannot save this list for later. Once the user logs out, the list will be gone.

ACCESS_EMERSE
SEARCH_ALL_PT

User can find a set of patients with an All Patient Search, but will not be able to see the text Summaries (VIEW_ALL_PT_SNIPPETS), and will not be able to move or save the list to search it in more detail (SAVE_ALL_PT_LIST). Essentially the user will be able to get a count from that All Patient search screen. The user will not be able to create their own lists (NEW_PT_LIST), but can still search a list of patients that was shared to them by another user. The user will not be able to add any new patients to the list that was shared to them (EDIT_PT_LIST). This is essentially a 'locked down' role that will allow them to get counts and search a list that was shared to them, but nothing else. For limited access, this could be a good configuration since an administrator could create and share a list with such a user.

ACCESS_EMERSE

The user can only search patient lists they are given permission to. This is the most locked-down role.

ACCESS_EMERSE
NEW_PT_LIST

The user can only search patient lists they are given permission to, plus create lists from MRNs they get from sources outside EMERSE.

ACCESS_ADMIN

This configures an administrator role. Administrators can add and edit users and roles, and assign roles to users. Users who have only this privilege cannot access EMERSE itself, so they cannot do any kind of search with EMERSE. However, being administrators, they can give themselves that permission.

ACCESS_API

This configures a service role. Users in this role can access the server endpoints under /emerse/springmvc/ext/**. This would be used by programs that need to access EMERSE to do searches or other automated tasks supported by the endpoints in /emerse/springmvc/ext/**.

The Synonyms Tab

EMERSE supports a type of query expansion we call "synonyms" to help make users’ searches more thorough. A synonym dataset is like a web of associated terms and phrases, connecting things such as alternative phrasings, acronyms, abbreviations, misspellings, conjugations, and related concepts to one another. After a user has added a term or phrase to their search, EMERSE will check it against activated synonyms and suggest connected terms and phrases, which the user can select and add to their search if they wish. These system-wide term suggestions can be leveraged by users but can only be updated by an administrator. There is no need to make your own synonyms dataset since we provide several that have been formatted specifically for EMERSE, including Ensembl gene names and associated diseases, the Human Phenotype Ontology, and others. Details can be found on the EMERSE website. You can download these datasets from the project-emerse.org website and then upload them on the synonyms page described here.

In addition to the system-wide Synonyms, users can make their own collection of terms using the Saved Terms feature (previously called a Term Bundle), and can incorporate system-suggested Synonyms into these collections. User-created terms can be made available to other users through the Saved Terms sharing feature, but they will not become a part of the system-suggested Synonyms unless they are added by an administrator.

To be as quick as possible when matching terms entered by users, all Synonyms are loaded into memory at the time the system starts up. This might cause a slight delay between starting the system and the availability of the datasets.

Preparing a synonyms file

To prepare a synonyms file for importing into EMERSE, you must structure the file as a three-column tab-separated values (TSV) file.

Column 1: A key value that links related concepts together. This can be anything but should be unique for each grouping of concepts.

Column 2: The term/phrase

Column 3: The Concept Type, of which there are 3. These 3 Concept Types are used for categorizing how the suggestions are displayed to users and also how the terms will be matched. They are:

Concept Type 1: Regular Synonym

This is a standard type of "synonym", although that term is used loosely since it can mean any type of meaningful relationship between the terms. No formal ontological relationships are defined. Rather, the co-occurrence of terms with the same key value implies that a relationship of some type exists.

Concept Type 2: Related Term

This is for terms that are related to the synonym they are grouped with, but usually more distantly. It provides a means for "one-way" matching. For example, in a group of connected terms (grouped by the key value), if one of the terms is "amoxicillin" (Concept Type = 1) and the related term is "antibiotic" (Concept Type = 2), then searching for "amoxicillin" would bring up "antibiotic" as a Related Term, but searching for "antibiotic" would not bring up "amoxicillin" at all.

Concept Type 3: Spelling Alternative

This is meant for misspellings of a term/phrase.

The maximum length of each synonym entry is 255 characters, not including the Key Value or the Concept Type. This constraint is set by the database, so to go beyond that limit the database would need to be modified. We recommend not exceeding about 200 characters, because we’ve noted that some Unicode characters can take up additional space in the database and can exceed the database character limit. Also note that because the file is a TSV, there should be no tabs in the terms/phrases themselves. While it should not be necessary to change the maximum length of a synonym entry or the tab delimiter used in the file, these can be changed through properties if desired. See: synonymList.valueLimit and synonymList.delimiter.
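Before uploading, it can help to check each data row against the constraints described above. The following is a minimal sketch, assuming the default tab delimiter and the default 255-character limit. The function name is hypothetical, and EMERSE's own validation may differ.

```python
MAX_TERM_LENGTH = 255                  # default database limit described above
VALID_CONCEPT_TYPES = {"1", "2", "3"}  # Regular, Related, Spelling Alternative


def validate_row(line):
    """Return a list of problems found in one data row (empty if none)."""
    columns = line.rstrip("\n").split("\t")
    # A stray tab inside a term shows up here as a fourth column.
    if len(columns) != 3:
        return [f"expected 3 tab-separated columns, found {len(columns)}"]
    _key, term, concept_type = columns
    problems = []
    if len(term) > MAX_TERM_LENGTH:
        problems.append(f"term is {len(term)} characters long")
    if concept_type not in VALID_CONCEPT_TYPES:
        problems.append(f"unknown concept type {concept_type!r}")
    return problems


print(validate_row("1021\tadenoma\t1"))  # []
print(validate_row("1021\tpolyps\t4"))   # ["unknown concept type '4'"]
```

To follow the more conservative recommendation above, lower MAX_TERM_LENGTH to 200.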

Synonyms file metadata

Additional metadata can be added to the file, which will be displayed to the users. These metadata rows must also follow the three-column TSV format and should appear at the beginning of the file, using the following formatting:

Column 1: The phrase: emerse_synonyms_metadata

Column 2: An attribute, which is case-insensitive.

Column 3: The value for the attribute.

Note that you can provide any attribute, but only specific ones are supported by EMERSE. Other attributes not supported will be stored in the database but not used.

Currently supported Attributes are as follows:

name (Required)

The name of the Synonyms dataset to be shown to users. This is case-sensitive and needs to be unique. If a dataset with the same case-sensitive name is uploaded, it will replace the prior dataset.

Example: Local health center synonyms

description (Optional)

A brief description of the dataset to help a user understand what kind of terms it contains.

Example: Local acronyms/abbreviations used at our health center

url (Optional)

A URL that points back to the original resource from where the terms were obtained. For the users, the name of the dataset will become an active link if a URL is provided.

Example: https://link-to-website.org

last_updated (Optional)

The date that the dataset was last updated. This is treated as a String, so any format is acceptable.

Example: 08/25/2011

The name attribute must appear as the first line in the dataset. The other attributes can appear in any order as long as they are not on the first line. Furthermore, the name cannot exceed 256 characters.

The beginning of an example Synonyms import file would look like:

emerse_synonyms_metadata	name	Local health center synonyms
emerse_synonyms_metadata	description	Local acronyms/abbreviations used at our health center
emerse_synonyms_metadata	url	https://link-to-website.org
emerse_synonyms_metadata	last_updated	08/25/2011
1001	endocervical type adenocarcinomas	1
1001	endocervical type adenocarcinoma	1
1001	endocervical adenocarcinomas	1
1001	endocervical adenocarcinoma	1
1001	adenocarcinoma, endocervical type	1
1001	adenocarcinoma	1
1021	adenomatoid	1
1021	adenomatous polyps	1
1021	adenomatous polyp	1
1021	adenomatous	1
1021	adenomata	1
1021	adenomas	1
1021	adenoma	1
1021	polyps	2
1021	adeomas	3
1021	adeoma	3
1021	ademoa	3
1021	ademoma	3
1021	ademona	3

In the example above, there are two groups of concepts (group 1001 and group 1021). Group 1021 has a mix of all three concept types: seven regular Synonyms (Concept Type = 1), one Related Term (Concept Type = 2), and five Spelling Alternatives (Concept Type = 3).
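A file in this layout could be generated with a short script. The sketch below assembles the lines with the name attribute first, followed by the optional attributes and then the data rows. The function name and its parameters are hypothetical, and only the attributes listed above are handled.

```python
METADATA_KEY = "emerse_synonyms_metadata"
OPTIONAL_ATTRIBUTES = ("description", "url", "last_updated")


def build_synonyms_lines(name, rows, **optional):
    """Return the lines of a synonyms TSV file, metadata first.

    rows is an iterable of (key, term, concept_type) tuples.
    """
    if len(name) > 256:
        raise ValueError("dataset name cannot exceed 256 characters")
    # The name attribute must be the first line of the file.
    lines = [f"{METADATA_KEY}\tname\t{name}"]
    for attribute in OPTIONAL_ATTRIBUTES:
        if attribute in optional:
            lines.append(f"{METADATA_KEY}\t{attribute}\t{optional[attribute]}")
    for key, term, concept_type in rows:
        lines.append(f"{key}\t{term}\t{concept_type}")
    return lines


lines = build_synonyms_lines(
    "Local health center synonyms",
    [("1021", "adenoma", 1), ("1021", "polyps", 2), ("1021", "adeoma", 3)],
    last_updated="08/25/2011",
)
print("\n".join(lines))
```

Writing the result with "\n".join(lines) to a file opened in text mode gives a file that can then be uploaded on the Synonyms tab.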

When a user enters a term, EMERSE does the following to determine which Synonyms to display:

  1. The user-entered term is matched to all Regular synonyms (Concept Type = 1) in a case-insensitive manner.

  2. For all matches, the key values are obtained (a concept may belong to more than one grouping and thus may have more than one key value).

  3. All other terms related to the key value(s) are obtained

  4. Duplicates are removed

  5. The final set is displayed to the user, organized by the three Concept Types, as shown in the figure below

Synonyms suggestions display
Figure 1. Synonyms shown for the term 'adenomatous'. Note that the terms are organized into three categories, based on the concept type: Synonyms, Related Terms, and Spelling Alternatives.

This organizational scheme of three Concept Types allows some terms to be suggested based on a user-entered term, while supporting one-way matches to reduce extraneous matching when desired. For example, if the term entered is 'adenomatous', the term 'polyps' will be displayed as a Related Term (see figure above). However, since in that grouping (group 1021 in the above example) the term 'polyps' is not a Regular concept type (i.e., it is not Concept Type 1 but rather Concept Type 2), a user-entered search term of 'polyps' will not match anything in that 1021 grouping. This can be seen in the figure below, where none of the terms in group 1021 are displayed to the user when 'polyps' is entered as a search term.

Synonyms suggestions display for the term Polyps
Figure 2. Synonyms shown for the term 'polyps'. This shows that the 'adenomatous' term is not displayed to the user since 'polyps' in the 'adenomatous' concept groups is not a Regular synonym Concept Type but rather is a Related Term Concept Type.
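The lookup steps and the one-way behavior of Related Terms can be sketched in a few lines. This illustrates the described logic and is not EMERSE's implementation: the function name is hypothetical, the rows are a subset of group 1021 from the example file, and leaving the entered term out of its own suggestions is an assumption.

```python
from collections import defaultdict

REGULAR, RELATED, SPELLING = 1, 2, 3

# (key value, term, concept type): a subset of group 1021 from the example.
ROWS = [
    ("1021", "adenomatous polyps", REGULAR),
    ("1021", "adenomatous", REGULAR),
    ("1021", "adenoma", REGULAR),
    ("1021", "polyps", RELATED),
    ("1021", "adeoma", SPELLING),
]


def suggest(term, rows):
    """Return suggested terms for a user-entered term, grouped by concept type."""
    wanted = term.lower()
    # Steps 1-2: key values of groups where the term is a Regular synonym.
    keys = {key for key, t, ctype in rows
            if ctype == REGULAR and t.lower() == wanted}
    # Steps 3-4: all other terms under those keys, de-duplicated with sets.
    by_type = defaultdict(set)
    for key, t, ctype in rows:
        if key in keys and t.lower() != wanted:
            by_type[ctype].add(t)
    # Step 5: organize the final set by concept type.
    return {ctype: sorted(terms) for ctype, terms in by_type.items()}


print(suggest("adenomatous", ROWS))
# {1: ['adenoma', 'adenomatous polyps'], 2: ['polyps'], 3: ['adeoma']}
print(suggest("polyps", ROWS))
# {} because 'polyps' is only a Related Term in this group
```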

Uploading a Synonyms File

A Synonyms file should be prepared as a three-column TSV file as described above. To upload the file, log in to the administrator page and choose the Synonyms tab. Then select Upload, and click on the Choose button to select the TSV file. After the file is selected, press Upload.

Minimal feedback is provided while the file is loaded and verified. Common problems include a file size too large for the server to handle, an upload time that exceeds the server’s timeout maximum, or a string length that exceeds what the database allows (we recommend no terms > 200 characters). A description of a few problems that might be encountered, and possible solutions, can be found in the Troubleshooting Guide.

When uploading a Synonyms file, note that if the same name is used as an existing dataset (based on the included metadata; and a case-sensitive match), the newly uploaded dataset will replace the existing dataset.

Managing the Synonyms

Multiple Synonym datasets can be loaded into EMERSE via the Admin app. After loading, a dataset will be in a Deactivated state, meaning that it will not be available or even visible to users. Clicking the Activate button will do two things: (1) it will make the dataset available to users and (2) it will invoke a background process that counts how many times each term appears in the overall set of documents. The counts resulting from this background process provide an intelligent approach to displaying the synonyms to users because it allows users to rank the terms based on how often they appear, including providing the option to hide terms that never appear in the corpus of all notes in the Solr index.

Counting the terms can take hours, depending in part on how many terms are in the dataset and also on how many documents are indexed. While the count is ongoing the synonyms will be available to users but users will not be able to change options that limit the results by count. Counts will not appear for a dataset until the entire counting process for all of the terms is complete. The counting process is refreshed only at the time a dataset is Activated. To refresh the counts, Deactivate and then Activate the dataset again. Refreshing the counts may be desirable if many new documents have been added since the counts were last conducted, which could change the frequency of the Synonyms in the dataset.

The Deactivate button retains the Synonyms dataset in the system but makes them unavailable to users. The Remove button deletes the dataset entirely from the database.

If the counting process was interrupted (for example, by the server being patched and restarted), the system should be able to recover from where it left off and complete the counts, but it must be restarted manually by pressing the Resume button, which will be displayed when an interrupted process is detected. The counting process can also be stopped manually by pressing the Stop button. When counts have been stopped before completion, the Synonyms will be available but without counts.

There are several things displayed in the table that may be useful. Some of these are described below:

Synonym Dataset Statistics

A few statistics are reported that can be helpful to understand how valuable the dataset might be for users. Some of these statistics will not be available until counting has completed for the entire dataset.

Rows/Terms

Rows are the total number of rows in the dataset. Terms are the number of distinct terms in the dataset, since a term can appear in more than one row if it is "mapped" to multiple different terms.

Rows/Terms w/ count > 0

This lists the number of Rows/Terms in the Synonyms dataset that appear in at least 1 document across the entire document index.
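The difference between Rows and Terms can be illustrated with a small calculation. In this sketch the counts dictionary stands in for the per-term document counts produced by the background counting process. The function name, the sample rows, and the counts are all hypothetical, and terms are compared as exact strings, which is an assumption.

```python
def dataset_statistics(rows, counts):
    """rows: (key, term, concept_type) tuples; counts: term -> document count."""
    terms = {term for _, term, _ in rows}
    return {
        "rows": len(rows),
        "terms": len(terms),
        "rows_with_count": sum(
            1 for _, term, _ in rows if counts.get(term, 0) > 0
        ),
        "terms_with_count": sum(
            1 for term in terms if counts.get(term, 0) > 0
        ),
    }


# "adenocarcinoma" appears in two groups, so it is two rows but one term.
rows = [
    ("1001", "adenocarcinoma", 1),
    ("1001", "endocervical adenocarcinoma", 1),
    ("2002", "adenocarcinoma", 1),
    ("2002", "ademoa", 3),
]
counts = {"adenocarcinoma": 120, "endocervical adenocarcinoma": 3, "ademoa": 0}
print(dataset_statistics(rows, counts))
# {'rows': 4, 'terms': 3, 'rows_with_count': 3, 'terms_with_count': 2}
```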

Synonym Dataset Statuses

A dataset can have several statuses, described below:

Deactivated

The dataset has been successfully loaded, but it is not available to users. Users will not even be aware that it is in the system.

Activated without Counts: XX% completed

Counting of term frequencies for the dataset has begun, but it is not complete. Once counting has begun (by pressing the Activate button), the dataset will be available to users, but counts will not be available until they are complete for the entire dataset.

Counting is stopped

Counting is no longer occurring, either because it was interrupted by a problem with the system (such as a server restart) or because it was manually stopped by the Administrator. Counting can be restarted by pressing the Resume button.

Activated with Counts

The synonym dataset is available to all users, along with the document counts.

Enqueued

Only one synonym dataset can be counted at a time, so if multiple datasets are activated in quick succession, some will be in the queue (enqueued) while waiting for counting to begin.

Synonyms management page
Figure 3. The Synonym Datasets table on the Administrator app. In this screen several datasets have already been uploaded and are Activated with the term counting completed (Status: "Activated with Counts"). Others are in various stages of counting or availability to users, described by their Status.

The Fields Tab

The fields tab in the admin interface tells EMERSE how to interpret and use the fields in Solr.

fields sidebar

On the left hand side of the fields tab, you will see groups of fields. The first two groups are labelled "Patient Fields" and "Document Fields". The former describes the fields of the patients index. This is a fixed set of fields, with fixed field names in Solr. The latter is a group of fields that describe the documents index.

Below that, you may see additional groups. These are groups of fields that are specific to a source. There will be one group for each source configured in the system. Sources will be covered more later.

Whenever multiple fields are displayed in the same section, such as columns or rows of a table, those columns or rows are ordered as the fields appear top-down in the sidebar shown here. To change that order, drag the fields by the three dots on their right-hand side. Fields cannot be dragged outside their group. A document field can be moved to a source-specific group or vice versa by changing the "source" of that field when editing it. (This will be discussed more below.)

To add a new field to a group, click the plus icon to the right of the group heading. You cannot add new patient fields.

Removing a field can be done in the edit page for the field. You cannot remove patient fields.

Solr Documents, Fields, Values, and Filters

To understand fields, their values, and filters on their values, a brief discussion of Solr, indexes, and the structure of documents is useful.

An index in Solr is a collection of documents which have been indexed, meaning you can find documents given the values they are indexed under. A more familiar example of an index in general would be the index of a textbook, which indexes the pages of the textbook by words, phrases, or concepts. Thus, you look up (say) a concept like "mesoscale convective vortex" in the index of a meteorology textbook, and it tells you the pages that talk about that concept. Solr is similar, but instead of indexing the pages of a single book, we index documents of the institution-wide medical record. Thus, the index tells us which documents a word or phrase appears in.
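The textbook analogy corresponds to what is usually called an inverted index. The toy version below maps each word to the documents that contain it. It is only meant to convey the idea: Solr's real index is far more elaborate, and the sample notes and document IDs are made up.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


# Hypothetical notes keyed by a made-up document ID.
notes = {
    "doc-1": "Patient reports flank pain. Urology consult requested.",
    "doc-2": "No flank tenderness on exam.",
    "doc-3": "Follow-up with cardiology.",
}

index = build_index(notes)
print(sorted(index["flank"]))       # ['doc-1', 'doc-2']
print(sorted(index["cardiology"]))  # ['doc-3']
```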

Solr documents differ from pages of a book in one important way: Solr documents are not just a block of text, but a block of text with metadata. Metadata associated with the document can be anything you wish to load with the document, but typically includes things like the department that produced the document, the doctor that wrote the note, etc. Each piece of metadata is placed in a different "field" alongside the main text of the document. In fact, the main text of the document is contained in just another field, so the representation of a document is uniform, and could include many large chunks of free text (but EMERSE does not do so).

Look at the below screenshot of Solr, which shows a document in JSON format:

fields solr doc

Here, we can see a number of fields written in bold text, such as DEPT or CLINICIAN, and their values Urology and Harper, Bradley. These are what we would call metadata fields. RPT_TEXT can be seen to have an HTML-formatted value which runs off the page; this is the main text of the document.

Documents in the documents index represent medical records, notes, reports, etc, whereas documents in the patient index represent patients themselves, with no main text field. See the below screenshot for the raw JSON-formatted patient document.

fields patient solr doc

The values of a field are all the values that appear for that field across all documents in the index. For instance, the values of the DEPT field are the departments. In the system pictured here, these values are human-readable; however, you may instead have department codes loaded into Solr.
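To make "the values of a field" concrete, the sketch below treats documents as plain dictionaries and collects the distinct values of one field along with how many documents carry each value. The sample documents are hypothetical apart from the field names and the two values mentioned above, and in practice Solr computes this itself.

```python
from collections import Counter

# Hypothetical documents; only metadata fields are shown.
documents = [
    {"RPT_ID": "1", "DEPT": "Urology", "CLINICIAN": "Harper, Bradley"},
    {"RPT_ID": "2", "DEPT": "Urology", "CLINICIAN": "Example, Clinician"},
    {"RPT_ID": "3", "DEPT": "Cardiology", "CLINICIAN": "Harper, Bradley"},
]


def field_values(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


print(field_values(documents, "DEPT"))
# Counter({'Urology': 2, 'Cardiology': 1})
```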

Patient Fields

fields patient edit

Clicking on a patient field allows you to customize the following things:

Label

This is how the data will be labeled in EMERSE: as titles of table columns in which it is shown, in demographic charts, and in the patient details dialog.

Field

This is never changeable for patient fields.

Type

This tells EMERSE what type of data to expect to be stored in the field. This should be DATE for BIRTHDATE and TEXT for all others.

Filter UI

This determines if users can filter on this field, and if so, what filter’s user interface should be used to select such a filter:

Filter UI Setting Description
 none

This field will not appear as an available filter for the user.

 CHECKBOXES

Each value of the field will be a checkbox in the UI. This is intended for small lists of values, usually less than 50.

 AUTOCOMP

This will show an input box where users can type a value of the field, and the input box will autocomplete the value. Users can also call up a list of all values and select from that. This is intended for when there are thousands of values for a field.

 TEXTAREA

This will show a textarea where users can write out a single value per line. There is no help in autocompleting these values, nor are they validated currently. This is useful when values are relatively unique, such as encounter IDs, document IDs, etc, and typically there are tens or hundreds of thousands of values in the field.

Show in Patient Table

This decides whether this field is shown as a column in the patient table. Typically, MRN, name, and birthdate are shown. If birthdate is shown, age is also shown in a column after it. The last columns of the table are always comment, tag, and action.

patient table example
Figure 4. The patient table

Show in Patient Demographics Dialog, etc

This decides whether the field is shown in the listed items. The demographics dialog is found by clicking on the patient’s name in the overview table reached after clicking the "highlight documents" button. The "charts" the checkbox refers to are the demographic charts shown in all-patient search, and the demographics tab for a patient list.

For the NAME patient field, unchecking this option will hide the entire name column in the overview table, thus preventing the user from viewing the demographics dialog.

demo dialog1 demo dialog2 demo charts pl demo charts all pt

Show Group as Field Value

This decides whether the field value groups are shown as the value of the field in filters and elsewhere in the application, instead of the raw Solr value. Field values and value groups are discussed later.

Document Fields

fields doc edit

Document fields describe the documents index. The fields in the documents index are very customizable; however, EMERSE needs to know which fields to use for certain special purposes. These "special roles" are listed as checkboxes. A single field can play many roles, though only one is typical. The roles are described here:

RPT_TEXT

This field should store the text of the document, and should be indexed case-insensitively.

RPT_TEXT_NOIC

This field should be a copy of the RPT_TEXT field, but indexed with different settings that make it case-sensitive.

RPT_ID

This is the unique identifier of the document in the index.

MRN

This field should hold the MRN of the patient the document belongs to. It should match the values of the MRN in the patient index.

SOURCE

This field should store the name of the "source" the document is from. More on sources below.

DOC_CONTEXT

This field stores some value that provides context to the document in the results page of all-patient search. All patient search results show only snippets of matched text in the document, hopefully not exposing a lot of PHI, and this field should similarly not show a lot of PHI but help users understand the context around the shown snippets. Often it’s the note type.

snippet doc context

CLINICAL_DATE

This field should store a relevant clinical date of the document, such as the encounter date, exam date, etc. This is used to determine the date range of the index, and shown in the snippets in all-patient search like DOC_CONTEXT.

PATIENT_ENCOUNTER_ID

This field should contain an identifier that links a collection of documents together into an "encounter" or some other greater grouping. Users can add patients to a patient list by a list of these encounter IDs in the add/upload patients tab of a patient list. The label of this field controls the wording on that page. If a filter is enabled for this field, there will be an option to add the values as a filter as well.

patient encounter upload
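Since the MRN role above expects document MRNs to match those in the patient index, a quick consistency check on data being loaded might look like the following sketch. The records are plain dictionaries with a hypothetical "MRN" key. In a real deployment the document field holding the MRN may have a different name, and the function name is an invention for illustration.

```python
def orphan_mrns(document_docs, patient_docs):
    """MRNs that appear on documents but on no patient record."""
    known = {p["MRN"] for p in patient_docs}
    return sorted({d["MRN"] for d in document_docs if d["MRN"] not in known})


# Hypothetical sample data.
patients = [{"MRN": "100001"}, {"MRN": "100002"}]
docs = [
    {"RPT_ID": "a", "MRN": "100001"},
    {"RPT_ID": "b", "MRN": "100003"},  # no matching patient
    {"RPT_ID": "c", "MRN": "100003"},
]
print(orphan_mrns(docs, patients))  # ['100003']
```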

Field Value Groups

Normally, when a field is shown in EMERSE, the value of that field is shown. However, EMERSE can group field values together, showing the group label for every value in the group, instead of the value itself. This is configured on the values tab of the field.

field values

The mapping of which value is in which group is stored in the SQL database. The first 100 values of that mapping are shown in the table on this page. Each value has a count associated with it, which is the number of documents in Solr that have that value.

Values in Solr

This is the number of values of this field in Solr. This is initially not calculated since it’s expensive. To calculate it, click CHECK.

Values in Database

This is the number of values stored in the SQL database. Click the UPDATE button to fetch all values from Solr and store them in the database. As more values appear in Solr, you’ll occasionally have to do this. This updates the count for existing values in the database, along with pulling in any new values. It does not delete any values in the database.

Values with Zero Count

This is the number of values in the database that are marked with a count of zero. If you have recently clicked the UPDATE button on the Values in Database row, then these counts should be up-to-date, and so those values in the database that have zero count never appear in any document in Solr. Thus, they are not really needed. If you wish to delete them from the database, click CLEANUP.

Groups

This is the number of groups in the database.

Values not in a Group

This is the number of values in the database that are not grouped into some group. Generally, if you are using groups at all, you want every value to be in some group, but this isn’t necessary.
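The UPDATE and CLEANUP actions described above can be pictured with a small sketch. This is not EMERSE’s implementation; the dictionaries and the function names `update_values` and `cleanup_values` are hypothetical, and the sketch assumes that a value no longer found in Solr has its count set to zero during an update.

```python
def update_values(db_counts, solr_counts):
    """Refresh counts from Solr: add new values, never delete existing ones."""
    # Assumption: values that no longer appear in Solr end up with a count of zero.
    updated = {value: 0 for value in db_counts}
    updated.update(solr_counts)
    return updated


def cleanup_values(db_counts):
    """Drop values whose count is zero, as the CLEANUP button is described to do."""
    return {value: count for value, count in db_counts.items() if count > 0}


# Hypothetical contents of the database and of Solr.
db = {"C282": 100, "C999": 7}
solr = {"C282": 123, "C147": 276}

db = update_values(db, solr)
zero_count = sum(1 for count in db.values() if count == 0)  # "Values with Zero Count"
cleaned = cleanup_values(db)
```

In this sketch, `C999` survives the update with a zero count and is only removed by the explicit cleanup step.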

The Field Value Groups Table

The table initially shows values, one value per row, and any groups that value is in. However, you can toggle this in the header of the table so the rows of the table are the groups, showing what values are in the groups. (This will not show values not in any group.)

field groups
Figure 5. The table showing groups instead of the default of values.

You may search for a value (or group in the toggled view) with the adjacent input field.

New Value/Group

Create a new row in the table by clicking this button. The new row will be in edit mode until you click CREATE to save the value/group.

Download TSV

This will download a TSV of the entire mapping.

Upload TSV

This will overwrite the entire mapping with the given TSV. The TSV is usually obtained by downloading it first, filling in the groups, then uploading.

The TSV format resembles the table:

Count   Value   Group
123     C282    Abdomen
276     C147    Extremity
276     C147    Hand
276     C147    Left Hand
18      C352    Extremity
18      C352    Knee
18      C352    Left Knee

That is, the first line of the TSV is the header. The first column is the count, the second is the value, and the third is the group that value is in. If a value should be in multiple groups, duplicate the entire row, changing the group at the end. Only the count on the first row of a particular value is used; the others are ignored.
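To make the row rules concrete, here is a minimal sketch of reading such a file outside of EMERSE, for example to check a mapping before uploading it. The function name `parse_mapping` and the sample data are hypothetical, and treating a missing or empty third column as "not in any group" is an assumption rather than documented behavior.

```python
import csv
import io


def parse_mapping(tsv_text):
    """Return (counts, groups) keyed by value from a Count/Value/Group TSV."""
    counts = {}
    groups = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # the first line is the header
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        count, value = row[0], row[1]
        group = row[2] if len(row) > 2 else ""
        if value not in counts:
            # Only the count on the first row of a value is used.
            counts[value] = int(count)
            groups[value] = []
        if group:
            groups[value].append(group)
    return counts, groups


sample = (
    "Count\tValue\tGroup\n"
    "123\tC282\tAbdomen\n"
    "276\tC147\tExtremity\n"
    "0\tC147\tHand\n"  # count on a repeated row is ignored
)
counts, groups = parse_mapping(sample)
```

Here `C147` ends up in both the Extremity and Hand groups, and its count is taken from its first row only.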

On each row of the table, there is an X in the top right corner. Clicking this will remove the value/group, disassociating it from any groups/values. If a group is ever empty, it will automatically be deleted. The UI will show empty groups until a refresh (since they are never actually stored in the database).

Clicking on the X of the pill showing the associated group/value on the row will remove the connection, but does not delete the clicked-on entity. (But again, groups that become empty will be removed from the database.)

You can load values into the database, and configure groups whenever you wish. However, they are only used if EMERSE is configured to use them. Back on the attributes tab of the field, the checkbox "Show Group as Field Value" will cause EMERSE to show the group whenever it would have shown the value.

show dept value
Figure 6. Before showing groups as field values
show dept group
Figure 7. After showing groups as field values

Notice that if a single value is mapped to multiple groups, each group is shown separated by slashes. Similarly, if a filter is set for the field, the options change to the groups:

filter dept value
Figure 8. Before showing groups as field values
filter dept group
Figure 9. After showing groups as field values

Sources

The field that plays the SOURCE special role tells EMERSE which source a document is from. A source is generally a source system of documents, such as the main EHR system, like Epic, or other more specialized systems, like SoftPathDx for Pathology, or legacy EHR systems. EMERSE allows documents from different sources to use and show Solr fields differently. For instance, the DEPT field might store department information if the document is from the main EHR source, but for the pathology source, it may store the lab instead. Thus, you would want to label the field "Department" for the main EHR, but "Lab" for pathology. To do this, EMERSE allows you to set a source for a field. Doing so will move the field to a separate group just for that source on the left-hand side. Once a field is associated with a source, it is only used on documents from that source. If a field is associated with a specific source, it cannot play a special role in EMERSE.

Sources are described as the values of the field that plays the SOURCE special role. If you go to the values tab for that field, you will get a different interface:

source values

Here are a few differences between this sources page and other field value pages:

  • Sources cannot be grouped.

  • Each source (a row in the table) has a label and (Solr) value. The label is shown in the UI; the value must match the value in Solr.

  • Each source has an "Initial Sort". This tells what column to sort the rows in the summaries table by. (The summaries table is reached after clicking on a cell of the overview table.) The field chosen should have the "Show in Summaries Table" checkbox checked so that the field is a column in the table.

    initial sort
  • If a field is marked as HTML, then its text will be rendered as HTML in the UI. (This allows showing images, tables, lists, and other rich formatting in documents.) Otherwise, it will be shown with a monospace font with newlines preserved (like in a <pre> tag).

  • Sources have an order like how fields have an order. You can re-order sources by dragging them from the triple dots in the top left corner. Source order determines the order of columns in the overview table, and the order of groups of fields in filters.

  • Finally, after you’ve made changes, to save them, you must explicitly click the save button.

Source-Specific Fields

Once sources have been defined, any document field can be made source-specific by setting the source on its attributes tab. Each source has its own group of fields in the left-hand sidebar, underneath the "Document Fields" group, which is for document fields not associated with a specific source. Setting the field’s source will move it into the source’s group, where you can now order it with respect to the other source-specific fields. Source-specific fields always appear after cross-source fields whenever both are shown (just like they are in the sidebar).

Source specific fields are only shown when showing a document from that specific source. Similarly, when loading field values from Solr, a source-specific field only looks at documents from that particular source, instead of all documents loaded into the index.

Multiple Field Mappings

A single field in Solr can be the backing Solr field of many fields in EMERSE. That is, you can create multiple fields in the fields tab, and select the same Solr Field for all of them. This allows you to take the same field, and group its values two different ways.

Suppose you have a single Solr field called IMAGE_DESC which stores a kind of code for the imaging done, which practically speaking contains two pieces of information: what was imaged, and what the imaging modality was. Even though there is only one field in Solr, we can split this into two fields in EMERSE by creating an imaged body part field and an imaging modality field, both backed by the same IMAGE_DESC field in Solr. We then group the values differently. For the imaged body part field, the groups are body parts, putting any imaging description that mentions the body part under the group for that body part. We do the same for the imaging modality field, but there the groups are the modalities.

Another way to use the ability to map the same field multiple times is to provide different levels of granularity of some hierarchy. You may have the field mapped once without grouping, and then once with grouping.

However, since a value can appear in many groups, it is possible to have all levels in a hierarchy in a single mapping by having both granular groups and coarse groups defined side-by-side. Similarly, you can define groups for imaged body parts and for imaging modality for a single EMERSE field, rather than splitting that into two fields.

The choice of which to do is mainly a matter of preference, but remember that a document must match all filters to match the search (except filters for a source the document isn’t from), and to match a filter, the document need only have one of the values selected by the user. Thus, selecting multiple values in a single filter is different from selecting a single value in each of multiple filters (even when they are mapped to the same underlying Solr field).

In other words, selecting the Ankle and CT groups in a filter on an imaging description field has the semantics of matching documents marked as either ankle or CT, whereas selecting Ankle in a filter on the imaged body part field, and selecting CT in a filter on the imaging modality field has the semantics of matching documents marked as both ankle and CT.
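The difference can be sketched in a few lines. This is only an illustration of the semantics described above, not EMERSE’s query code: the `matches` function and the field names are hypothetical, and the exception for filters on a source the document isn’t from is left out for brevity.

```python
def matches(doc_groups, filters):
    """Return True if the document satisfies every filter.

    doc_groups: field name -> set of groups the document's value maps to.
    filters:    field name -> set of groups the user selected.
    A filter matches if the document has at least one selected group (OR);
    the search matches only if every filter matches (AND).
    """
    return all(
        bool(doc_groups.get(field, set()) & selected)
        for field, selected in filters.items()
    )


# A hypothetical ankle MRI document, mapped two different ways.
one_field = {"imaging_description": {"Ankle", "MRI"}}
two_fields = {"body_part": {"Ankle"}, "modality": {"MRI"}}

# One filter with two selections: ankle OR CT, so the ankle MRI matches.
either = matches(one_field, {"imaging_description": {"Ankle", "CT"}})

# Two filters with one selection each: ankle AND CT, so it does not match.
both = matches(two_fields, {"body_part": {"Ankle"}, "modality": {"CT"}})
```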

System Synchronization

The System tab in the admin app provides a feature to help synchronize components of the system. In general this should not be needed since synchronization automatically occurs once per night. However, when installing the system or troubleshooting it may be useful to force these events to occur immediately so that changes can be verified. To invoke this, simply click on the Synchronize button. Some of these actions may take time, though.

The Synchronize action will:

  1. Copy the Patient table from the database to the master patient Solr index

  2. Replicate the master patient Solr index to the patient-slave Solr index

  3. Update document statistics as they are displayed in the UI, such as the number of patients and the date range of the documents. Note that the number of patients displayed in the UI is the number of distinct patients with at least one document in the Solr documents index, not the number of patients in the Patient table.

Additional details about how to check the progress of these synchronization steps can be found in the Troubleshooting Guide.

System Caches

The Caches page under the System tab allows you to force EMERSE to refresh its caches of certain database tables. This is normally done periodically by a "cron job" (see the Configuration Guide). However, during installation, it may be helpful to force a refresh if you made significant changes to the database, but don’t want to restart EMERSE.

Heatmap (Overview) Statistics Page

The Heatmap (now called the Overview within the application) Statistics page under the System tab contains some statistics about how EMERSE performs when running queries from the "Overview Page" in the EMERSE application. These can be especially slow to run since the queries can be complex and numerous, and the Overview is one of the most used parts of EMERSE.

The statistics are split up into a few sections. The only statistics collected are simple minimum, maximum, and average. Some statistics are point-in-time snapshots, but most are cumulative, and can be reset to zero with the "Reset Stats" button. The "Reload Stats" button allows you to update the statistics displayed; they are not live. Reloading the whole page would do the same, but the button is much faster.

The Fair Heatmap Query Scheduler Algorithm

To understand many of the statistics, it’s best to explain the internal scheduling mechanism used to run heatmap queries.

When a user views a page of patients in the Overview table, this triggers the backend to add a batch of overview-table rows to a data structure we call the fair batch heap. All of the rows of a page of the table go into the same batch.

There are then a configurable number of threads that each pull a row from the highest-priority batch in the fair batch heap, and run queries to fill out that row. If there are N columns in the table (i.e., N sources), then there are 2N queries done to produce simple counts of matching documents, or N + NC queries done for the mosaic view, where C is the number of colors in the term bundle.

After filling out the row, the thread sends the row to the browser, goes back to the fair batch heap, and re-prioritizes the batch based on the amount of time it took to fill out the row. It then starts again, pulling a row from the highest-priority batch.

The prioritization of batches is determined by a priority number maintained on the batch. Batches start with zero priority. If a row from a batch takes M milliseconds to fill out, and there are N other batches in the fair batch heap, then the batch that contained the processed row is docked M × N priority, and every other batch is given M more priority. If a batch completes but has non-zero priority, its remaining priority is distributed evenly among the remaining batches in the fair batch heap. This ensures there is no "drift" from zero of the average priority of a batch in the heap. (In the code, this is actually phrased with "penality" which is just negative priority.)
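The bookkeeping described above can be sketched as follows. This is a simplified illustration, not the EMERSE source: it uses a plain dictionary in place of a heap, ignores threading, and the names `charge`, `complete`, and `next_batch` are hypothetical.

```python
def charge(priorities, batch_id, elapsed_ms):
    """Dock a batch for the time one of its rows took and credit the others."""
    others = [b for b in priorities if b != batch_id]
    priorities[batch_id] -= elapsed_ms * len(others)  # docked M * N
    for b in others:
        priorities[b] += elapsed_ms  # each other batch gains M


def complete(priorities, batch_id):
    """Remove a finished batch and spread its leftover priority evenly."""
    leftover = priorities.pop(batch_id)
    if priorities:
        share = leftover / len(priorities)
        for b in priorities:
            priorities[b] += share


def next_batch(priorities):
    """Pick the batch with the highest priority."""
    return max(priorities, key=priorities.get)


# Three hypothetical batches, all starting at zero priority.
priorities = {"a": 0.0, "b": 0.0, "c": 0.0}
charge(priorities, "a", 50)       # a row from batch "a" took 50 ms
after_charge = dict(priorities)   # a: -100, b: +50, c: +50 (sum stays zero)
chosen = next_batch(priorities)   # no longer "a"
complete(priorities, "a")         # a's -100 is split between b and c
```

Because every charge and every completion is zero-sum, the priorities in the sketch always add up to zero, which is the "no drift" property mentioned above.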

The Statistics

Batch Heap Statistics

These statistics are a point-in-time snapshot of the number of rows in the batches in the heap. In particular, this lets you know how many pages of the overview are being processed concurrently.

Row Query Time Statistics

These statistics are cumulative since the last reset. Each data point is the duration needed to fill out a row. The unit is milliseconds.

Intra-Batch Service Delay Statistics

These statistics are cumulative since the last reset. Since batches are processed in a changing priority order, if there are a lot of batches, and if a batch is slow to process, it may be a rather long time before even one row is taken from the batch to be processed. The data points of these statistics are the durations from one row being taken from a batch to the next row being taken from the same batch. Generally, the minimum duration is very low, since at some point multiple threads will grab a row from the same batch roughly simultaneously, especially on the first batch in the heap, when all threads are inactive and grab from the one and only batch in the heap.

Cancelled Batch Jobs

These statistics are cumulative since the last reset. This tracks the number of rows remaining in a batch that was cancelled. It’s mainly here to ensure the cancel mechanism is working, and to show how much work is saved by employing a cancel mechanism. A batch is cancelled if a user changes the search, or after a minute of inactivity from the browser, in case the user closed the browser tab that initiated the search.

Batch Computation Stats

These statistics are a point-in-time snapshot. This section is actually a list of Row Query Time Statistics, but specific to each batch currently in the fair batch heap. The data points are the number of milliseconds it took to process a row from the given batch. If you suspect a single very long-running query is slowing down the system, you should be able to see that from the statistics here.
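As an illustration of how the data points for the Intra-Batch Service Delay Statistics relate to the minimum, maximum, and average that are reported, here is a small sketch. The timestamps are made up, and the function names `service_delays` and `summarize` are hypothetical; EMERSE’s own bookkeeping may differ.

```python
def service_delays(take_times_ms):
    """Durations between consecutive rows being taken from the same batch."""
    return [later - earlier
            for earlier, later in zip(take_times_ms, take_times_ms[1:])]


def summarize(data_points):
    """The only statistics collected are minimum, maximum, and average."""
    return {
        "min": min(data_points),
        "max": max(data_points),
        "avg": sum(data_points) / len(data_points),
    }


# Hypothetical times (ms) at which rows were taken from one batch: three
# threads grab rows almost at once, then the batch waits for its turn.
take_times = [0, 1, 2, 900, 1400]
delays = service_delays(take_times)
stats = summarize(delays)
```

The near-simultaneous grabs at the start produce the very low minimum, while the long waits show up in the maximum and pull up the average.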