EMERSE Data Guide

Overview

This guide covers issues surrounding the initial setup of the system, specifically the customization of patient and document data sources.

Before Getting Started

Before getting started, it is important to note a few things.

You must provide EMERSE both patient data and document data. Patient data is data describing the actual patient, such as the patient name and MRN. Document data is data describing the medical records or other records connected to a patient.

EMERSE has two places to store this imported data: (1) in the Oracle database, and (2) in Solr’s indexes. Patient documents are stored exclusively in Solr’s indexes. Patient data is stored in the Oracle database, but EMERSE periodically copies this data into Solr indexes to aid in searches.

Finally, there is data that EMERSE generates and uses on its own, such as patient lists, term bundles, audit logs and user account information. Most of this data does not need to be changed directly, because it can be managed through the user interface. This guide focuses only on the imported data described above. However, a full data dictionary is available if more details are desired.

The data for the documents and document indices are not stored within the relational database. Instead these are managed by Solr in its own data store and can be on a separate server from the Oracle database.

Note also that some changes to the database tables described here may appear in the user interface immediately, whereas others may not. If you run into trouble when making modifications, a good first step is to restart everything and then check whether the changes have taken effect. For example, if you modify the tables and then add clinical documents to the document index, those changes may not be reflected until the index has been closed and re-opened (or Solr is restarted). Similarly, patients added to the database table are typically copied to the corresponding Solr index only once each day through a scheduled job (details in the Configuration Guide). This copy can be forced to run more frequently when needed, which may be useful during initial setup and testing; see the Troubleshooting Guide for details.

Patient Demographics

PATIENT table

Table: PATIENT
Population: From external source (such as EHR)
Population Frequency: Can be variable, but once per day is reasonable

The EMERSE schema includes a patient table with medical record number (MRN), name, date of birth, and other demographic information which is displayed in the search results. Data in this table are used to display the patient name, validate user-entered or uploaded MRNs and to calculate current ages of the patients. Other demographic data are used to summarize the populations found in a search.

Although the coded demographics information is not required, some features such as the demographics breakdowns within the All Patients search feature will not work if sex, race, and ethnicity are not populated.

For all documents indexed, there must be a corresponding patient in the patient table with a medical record number (MRN) that matches the document. This should be taken into consideration when determining the frequency for updating this table.
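
As a rough illustration of this requirement, the following Python sketch finds documents whose MRN has no matching patient row. It assumes documents are available as dictionaries with an `MRN` key and that patient MRNs come from the `external_id` column; the function name and sample values are hypothetical and not part of EMERSE.

```python
def find_orphan_documents(documents, patient_mrns):
    """Return documents whose MRN has no matching patient record.

    `documents` is an iterable of dicts with an "MRN" key; `patient_mrns`
    is an iterable of external_id values from the PATIENT table.
    """
    known = set(patient_mrns)
    return [doc for doc in documents if doc.get("MRN") not in known]


# Hypothetical sample data for illustration only.
patients = ["100001", "100002"]
docs = [
    {"ID": "doc-1", "MRN": "100001"},
    {"ID": "doc-2", "MRN": "999999"},  # no such patient
]
orphans = find_orphan_documents(docs, patients)
print([d["ID"] for d in orphans])  # prints ['doc-2']
```

A check like this could be run before indexing a batch, so that missing patients are loaded first.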

Column name	Description	Required or Optional
id	Primary Key	Required
external_id	Medical Record Number	Required
first_name	First Name	Required
middle_name	Middle Name	Optional
last_name	Last Name	Required
birth_date	Birth Date — used to calculate current age	Required
sex_cd	Sex	Optional
language_cd	Language	Optional
race_cd	Race	Optional
marital_status_cd	Marital Status	Optional
religion_cd	Religion	Optional
zip_cd	ZIP code	Optional
create_date	Date the row was created. Can be used to track changes to the table.	Optional
update_date	Date the row was updated. Can be used to track changes to the table.	Optional
deleted_flag	Logical delete flag. Useful for merged patients. Valid values are 1 = yes, deleted; 0 = no, not deleted	Required
deceased_flag	Currently not used. Valid values are: 1 = yes, deceased; or 0 = no, not deceased	Required
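
The required columns and flag values in the table above can be checked before loading. The sketch below is one possible pre-load check, assuming each patient row is held as a Python dictionary keyed by column name; the function name and sample row are hypothetical.

```python
from datetime import date

# Required columns as listed in the PATIENT table above.
REQUIRED_COLUMNS = (
    "id", "external_id", "first_name", "last_name",
    "birth_date", "deleted_flag", "deceased_flag",
)
FLAG_COLUMNS = ("deleted_flag", "deceased_flag")


def validate_patient_row(row):
    """Return a list of problems found in one patient row (a dict)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if row.get(column) in (None, ""):
            problems.append(f"missing required column: {column}")
    for column in FLAG_COLUMNS:
        value = row.get(column)
        # Missing flags were already reported above; only check set values.
        if value not in (None, "", 0, 1):
            problems.append(f"{column} must be 0 or 1, got {value!r}")
    return problems


# Hypothetical row with an empty last name and an invalid flag.
row = {
    "id": 1, "external_id": "100001", "first_name": "Pat", "last_name": "",
    "birth_date": date(1970, 1, 1), "deleted_flag": 0, "deceased_flag": 2,
}
print(validate_patient_row(row))
```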

Many of the columns in the patient table use codes for their values: sex, race, ethnicity, etc. Although these values are not constrained by the database, the UI can display descriptions for them in the patient demographics areas, and they are used in the bar charts that break down sex, race, and gender in the "All Patient Search" feature. The lookup tables for these codes are:

LKP_PATIENT_RACE
LKP_PATIENT_GENDER
LKP_PATIENT_SEX
LKP_PATIENT_MARITAL_STATUS
LKP_PATIENT_RELIGION
LKP_PATIENT_ETHNICITY

These tables all have the same structure:

Column	Description
DESCRIPTION	The description that is shown in the User Interface
CODE	The coded value in the patient table

The EMERSE distribution has default codes already in the tables, but it is important to make sure that these codes match what your own local institution uses. Otherwise it will be necessary to update these tables with your local codes and text descriptions.
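
One way to spot mismatches between local codes and a lookup table is sketched below. It assumes the patient rows and lookup rows have been read into Python dictionaries; the function name and the sample codes are hypothetical and should be replaced with your institution's values.

```python
def find_unmapped_codes(patient_rows, column, lookup_rows):
    """Return the sorted codes used in `column` that the lookup table lacks.

    `lookup_rows` holds dicts with CODE and DESCRIPTION keys, mirroring the
    lookup-table structure described above.
    """
    known_codes = {row["CODE"] for row in lookup_rows}
    used_codes = {
        row[column] for row in patient_rows if row.get(column) not in (None, "")
    }
    return sorted(used_codes - known_codes)


# Hypothetical codes for illustration only.
lkp_patient_sex = [
    {"CODE": "F", "DESCRIPTION": "Female"},
    {"CODE": "M", "DESCRIPTION": "Male"},
]
patients = [
    {"external_id": "100001", "sex_cd": "F"},
    {"external_id": "100002", "sex_cd": "U"},   # not in the lookup table
    {"external_id": "100003", "sex_cd": None},  # optional column left empty
]
print(find_unmapped_codes(patients, "sex_cd", lkp_patient_sex))  # prints ['U']
```

Any code reported here would need either a new lookup row or a change to the data load.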

An example of how these codes are used within EMERSE can be found in our Virtual Machine Guide.

The data in the Patient table are automatically copied to a Solr index, by default once per day. This is detailed in the Configuration Guide.

Research Studies and Attestation

Immediately after each login, every user is required to ‘attest’ to their use of EMERSE for that session by specifying their reason for using the system. This is called the ‘Attestation’ page, and the results are stored in the SESSION_ATTESTATION table. EMERSE provides three options (configurable by a system administrator) for this attestation: (1) a free text box, (2) ‘Quick Buttons’ for choosing pre-selected options that are commonly used (for example, “Quality Improvement”, “Patient Care”, “Infection Control”, etc.), and (3) a table of research studies with which a user is associated. The additional tables RESEARCH_STUDY_ATTESTATION and ATTESTATION_OTHER may contain further information, depending on whether the user specifies a research study or another reason. The free text option would be used when no other attestation choice is reasonable. Additionally, previously used entries from the free text box will appear in the table, along with any IRB-approved studies, for the user’s convenience.

For our implementation at Michigan Medicine, we pull data on all studies in the IRB system, even if the study or person is not currently a part of EMERSE. This is because the dataset is generally small, and it makes it easier for users to validate their studies if the data are already populated once the user is given an EMERSE account.

Figure 1. Entity relationship diagram of some tables related to capturing attestation data, which occurs immediately after a user logs in.

RESEARCH_STUDY table

Table: RESEARCH_STUDY
Population: Populated from external source such as an electronic IRB system
Population Frequency: Can be variable, but once per day is reasonable

If a user is required to select their study from the table, then delays in moving IRB data to EMERSE after IRB approval can delay access for that user.

This table contains information about research studies. Using this table and RESEARCH_STUDY_MEMBER allows EMERSE to show a list of studies the end user is associated with.

Column name	Description	Required or Optional
id	Primary Key	Required
external_id	IRB study number — used to link specific studies to usage, and is very helpful for tracking research usage	Required
study_name	Name of the study	Required
principal_investigator_name	Name of the principal investigator	Required
prin_invest_org_id	id of principal investigator. Not currently used by EMERSE. This could be a user id, or email, but it is a good idea to ensure it is unique.	Optional
expiration_date	Expiration date of study. Used to determine if a user should be allowed to proceed. If the expiration date is older than the current date, the user will not be able to select it in the attestation GUI.	Required
project_status	Current project status. This is used to track where a study is in the review and approval process. Only certain study statuses allow access to EMERSE for research. The statuses that allow a study to be selected during attestation are defined in the VALID_RES_STUDY_STATUS table.	Required
last_updated	A last updated date is not used by EMERSE, but can be useful for troubleshooting and tracking changes to the table.	Optional
begin_date	This originally referred to the date the study began or should be allowed to begin. This field can be used for tracking and troubleshooting.	Optional

VALID_RES_STUDY_STATUS table

Table: VALID_RES_STUDY_STATUS
Population: By System Admin. Only needed if research studies need to be validated
Population Frequency: May only need to be done once, at the time of system setup. May need periodic updates if the source data (such as from IRB system) defining study status is changed.

EMERSE contains a simple table defining study statuses. The statuses that are initially populated in the system (loaded by the build script) are unique to Michigan Medicine; that is, they were developed locally and are implemented in our separate electronic IRB tracking system. Other implementations would need their own set of valid statuses if these are to be used to validate and approve usage for research. If the status of a research study is not in this table, EMERSE will not allow the study to be used for attestation; that is, the study would not even be displayed to the user to select.

Column name	Description	Required or Optional
status	A list of study statuses that EMERSE considers valid in terms of allowing a user to proceed. These statuses are generally defined by the IRB and are universal across studies.	Required (if research studies need to be validated before presenting them to the user)

VALID_RES_STUDY_STATUS Table Example:

Status
Exempt Approved - Initial
Approved
Not Regulated
Exempt Approved - Transitional
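
The two rules described above (expiration date and valid status) can be combined into a single check. The Python sketch below is only an illustration of that logic, not EMERSE's actual implementation; the function name, study numbers, and the "In Review" status are hypothetical.

```python
from datetime import date


def is_study_selectable(study, valid_statuses, today):
    """Apply the two attestation rules described above to one study.

    A study is selectable only if its expiration date is not older than
    `today` and its project status appears in the valid-status list.
    """
    not_expired = study["expiration_date"] >= today
    status_ok = study["project_status"] in valid_statuses
    return not_expired and status_ok


# Hypothetical data for illustration only.
valid_statuses = {"Approved", "Not Regulated"}
today = date(2024, 6, 1)
studies = [
    {"external_id": "HUM001", "expiration_date": date(2025, 1, 1),
     "project_status": "Approved"},
    {"external_id": "HUM002", "expiration_date": date(2023, 1, 1),
     "project_status": "Approved"},   # expired
    {"external_id": "HUM003", "expiration_date": date(2025, 1, 1),
     "project_status": "In Review"},  # status not in the valid list
]
selectable = [
    s["external_id"] for s in studies
    if is_study_selectable(s, valid_statuses, today)
]
print(selectable)  # prints ['HUM001']
```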

RESEARCH_STUDY_MEMBER table

Table: RESEARCH_STUDY_MEMBER
Population: Populated from external source such as an electronic IRB system
Population Frequency: Can be variable, but once per day is reasonable

This table contains information about study team members, and is related to the RESEARCH_STUDY table, described above. Each study can have one or many study team members.

This table at Michigan Medicine contains information on all study team members for all studies, whether they have an EMERSE account or not.

Column name	Description	Required or Optional
RESEARCH_STUDY_ID	Foreign key reference to row `id` in `RESEARCH_STUDY` table	Required
USER_ID	Foreign key reference to row in `LOGIN_ACCOUNT` table	Required
ROLE_NAME	A string describing a person’s role on the study team, e.g., “PI”, “Staff”, “Study Coordinator”. This can be useful when generating usage reports.	Optional
FIRST_NAME	First name of the user who is on the study. It is currently populated from the source IRB system, but it is not used at all by EMERSE. Nevertheless, it may be useful when generating reports.	Optional
LAST_NAME	Last name of the user who is on the study. It is currently populated from the source IRB system, but it is not used at all by EMERSE. Nevertheless, it may be useful when generating reports.	Optional
BEGIN_DATE	This is not currently used by EMERSE.	Optional
LAST_UPDATED	Date row was last updated	Optional
DELETED	Flag to indicate if the record has been logically deleted. 0 = false, not deleted; 1 = true, deleted.	Required
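
To show how RESEARCH_STUDY and RESEARCH_STUDY_MEMBER relate, the sketch below lists the studies for one user. It uses Python's built-in `sqlite3` module as a stand-in for the Oracle database, includes only a few of the columns, and assumes that `USER_ID` is a text value and that logically deleted memberships should be ignored. The function name and all sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE research_study (
        id INTEGER PRIMARY KEY,
        external_id TEXT,
        study_name TEXT
    );
    CREATE TABLE research_study_member (
        research_study_id INTEGER REFERENCES research_study(id),
        user_id TEXT,
        role_name TEXT,
        deleted INTEGER NOT NULL
    );
    INSERT INTO research_study VALUES (1, 'HUM001', 'Example study A');
    INSERT INTO research_study VALUES (2, 'HUM002', 'Example study B');
    INSERT INTO research_study_member VALUES (1, 'jdoe', 'PI', 0);
    INSERT INTO research_study_member VALUES (2, 'jdoe', 'Staff', 1);
    INSERT INTO research_study_member VALUES (2, 'asmith', 'Study Coordinator', 0);
""")


def studies_for_user(connection, user_id):
    """Return (external_id, study_name, role_name) for active memberships."""
    query = """
        SELECT s.external_id, s.study_name, m.role_name
        FROM research_study s
        JOIN research_study_member m ON m.research_study_id = s.id
        WHERE m.user_id = ? AND m.deleted = 0
        ORDER BY s.external_id
    """
    return connection.execute(query, (user_id,)).fetchall()


print(studies_for_user(conn, "jdoe"))
# prints [('HUM001', 'Example study A', 'PI')]
```

The second membership for `jdoe` is skipped because its `deleted` flag is 1.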

SESSION_ATTESTATION table

Table: SESSION_ATTESTATION
Population: Used internally by EMERSE
Population Frequency: In real time by EMERSE

Each time a user attests to why they are using EMERSE, a row is inserted into this table, which is one of the audit tables. Attestations related to research can be joined to the RESEARCH_ATTESTATION table. Non-research uses can be joined to ATTESTATION_OTHER.

Column name	Description	Required or Optional
id	Primary Key	N/A (populated internally by EMERSE)
type	A string indicating the top level category of attestation. `RSA` indicates session is used for research. `OTH` means other usage. Research attestations will have an associated row in `RESEARCH_ATTESTATION`. If the type is `OTH`, a row will also exist in `OTHER_ATTESTATION_REASON` when the system populates that table at the time of a user attestation after login.	N/A (populated internally by EMERSE)
User_session_id	A foreign key reference to the `USER_SESSION` table	N/A (populated internally by EMERSE)

OTHER_ATTESTATION_REASON table

Table: OTHER_ATTESTATION_REASON
Population: By System Admin. Only needed if commonly used text reasons are needed as quick buttons in the application
Population Frequency: May only need to be done once, at the time of system setup.

For non-research attestations, there is a lookup table called OTHER_ATTESTATION_REASON that lists available options. These can be configured by each institution, and may include commonly used access reasons that don’t involve research (such as quality improvement, patient care, etc). These options (other than the Free text reason) can be used to populate “quick buttons” that provide a simple way for a user to click on one of the common reasons for use.

Column name	Description	Required or Optional
USER_KEY	Text-based primary key of this table. The column name might better be thought of as 'reason key'.	Required
DESCRIPTION	The text description that will be displayed in the Quick Buttons section of the Attestation page.	Required
DELETED_FLAG	Has this reason been deleted? (0 = no; 1 = yes)	Required
DISPLAY_ORDER	Order of display in the UI. Can be any integer, but should be unique per row. The buttons are ordered by this column via sql sort. Generally start with 0,1,2, etc.	Optional

OTHER_ATTESTATION_REASON Table Example:

USER_KEY	DESCRIPTION	DISPLAY_ORDER
QI	Quality Improvement	0
RVPREPRES	Review Preparatory to Research	1
STDYDESC	Study involving only decedents (deceased patients)	2
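
The ordering of quick buttons can be pictured with the short Python sketch below. It is an illustration only, not EMERSE's code. It assumes the deleted-flag column is named `DELETED_FLAG`, that every row has a `DISPLAY_ORDER` value, and that the free text reason is stored under the key `FRETXT` and is left out of the buttons. The "OLD" row and the function name are hypothetical.

```python
def quick_buttons(reasons):
    """Return button labels for active reasons, sorted by DISPLAY_ORDER."""
    active = [
        r for r in reasons
        if r["DELETED_FLAG"] == 0 and r["USER_KEY"] != "FRETXT"
    ]
    ordered = sorted(active, key=lambda r: r["DISPLAY_ORDER"])
    return [r["DESCRIPTION"] for r in ordered]


# Rows loosely based on the example table above, plus hypothetical extras.
reasons = [
    {"USER_KEY": "RVPREPRES", "DESCRIPTION": "Review Preparatory to Research",
     "DELETED_FLAG": 0, "DISPLAY_ORDER": 1},
    {"USER_KEY": "QI", "DESCRIPTION": "Quality Improvement",
     "DELETED_FLAG": 0, "DISPLAY_ORDER": 0},
    {"USER_KEY": "OLD", "DESCRIPTION": "Retired reason",
     "DELETED_FLAG": 1, "DISPLAY_ORDER": 2},
    {"USER_KEY": "FRETXT", "DESCRIPTION": "Free text",
     "DELETED_FLAG": 0, "DISPLAY_ORDER": 3},
]
print(quick_buttons(reasons))
# prints ['Quality Improvement', 'Review Preparatory to Research']
```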

ATTESTATION_OTHER table

Table: ATTESTATION_OTHER
Population: Used internally by EMERSE
Population Frequency: Application dependent

The free text reasons that users enter are stored in a table called ATTESTATION_OTHER. This is populated by EMERSE and is not customizable by users.

Column name	Description	Required or Optional
SESSION_ATTESTATION_ID	A unique ID for the session attestation. Used for audit logging.	Required
FREE_TEXT_REASON	The free text reason that a user entered.	Required
OTHER_ATTEST_REASON_KEY	This will currently only be populated by the system with `FRETXT`.	Required

ATTESTATION_OTHER Table Example:

SESSION_ATTESTATION_ID	FREE_TEXT_REASON	OTHER_ATTEST_REASON_KEY
50208	Testing out the system	FRETXT
52060	Testing out the system	FRETXT
46051	Looking up a patient in clinic	FRETXT
71052	infection control monitoring	FRETXT
74107	cancer registry operational work	FRETXT

Clinical Documents

EMERSE search is enabled by the indexing of clinical text documents by Apache Solr. Documents in a clinical environment can come from a myriad of sources like transcription, radiology, and pathology, or from an electronic health record. Normally the structure, data, and metadata related to these documents from different sources varies considerably.

To simplify things, we configure Solr with a single document schema containing all fields from all sources. This requires that documents from different sources use fields in a consistent way. For instance, if one source uses field X for purpose Y, another source must use field X only for purpose Y as well, or not at all. Certain essential elements, such as patient MRN, clinical date, document source key, and document text, must be mapped to specific fields and should not be configured differently.

The structure of clinical documents stored in Solr is described by three tables.

DOCUMENT_SOURCE lists the data sources,
DOC_FIELD_EMR_INTENT lists the purposes of fields (such as being the document text, being the document date, or being the unique document identifier),
and finally the table DOCUMENT_FIELDS lists the fields of each data source, stating for each field, what its Solr field name is, and what the abstract purpose of the field is.

DOC_FIELD_EMR_INTENT also associates a Solr field with each abstract purpose (through the column DEFAULT_LUCENE_NAME), though these are not always used.

Figure 2. Entity relationship diagram of some tables related to clinical documents.

DOC_FIELD_EMR_INTENT table

Table: DOC_FIELD_EMR_INTENT
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

This table lists the abstract purposes of Solr fields across all document sources.

Each row is marked as required or optional. Required rows indicate that the Solr field (found in the DEFAULT_LUCENE_NAME column) must be used for that purpose across all data sources. Optional rows indicate that the Solr field name is found on DOCUMENT_FIELDS table, not in the DEFAULT_LUCENE_NAME column in this table.

You can customize the value of the DEFAULT_LUCENE_NAME column only for two rows:

CLINICAL_DATE
LAST_UPDATED

This means all Solr documents for all data sources must use the following Solr field names:

ID for the unique identifier for the document
RPT_TEXT for the text of the document (what is searched)
MRN for the medical record number of the document, linking the document to a patient in the PATIENT table
RPT_TEXT_NOIC for the case-sensitive indexed version of RPT_TEXT (NOIC = NO Ignore Case).

Name	Description	DEFAULT_LUCENE_NAME (aka the Solr field)	Required or Optional
MRN	Patient medical record number, which is a unique patient identifier	MRN	Required
RPT_ID	Unique document identifier. This must be unique across all documents and sources	ID	Required
CLINICAL_DATE	Date when the clinical event occurred. Often this would be considered the "note date" or "document date". When displayed for users within EMERSE in the Summaries section, this is the default sort column with the most recent date shown at the top.	ENCOUNTER_DATE	Required, customizable
LAST_UPDATED	Date when the document was last updated, since changes are sometimes made to documents	LAST_UPDATED	Required, customizable
RPT_TEXT	The actual text of the clinical document. This field is used by Lucene for lower-case indexing (case-insensitive searching).	RPT_TEXT	Required
RPT_TEXT_NOIC	A copy of the document text to be indexed using a case-sensitive Lucene filter (NOIC = NO Ignore Case)	RPT_TEXT_NOIC	Required
TEXT	Any generic text field. Note that a document may have multiple of these types of generic text fields (e.g., clinical service, document type, clinician name, etc). This is useful when additional metadata are associated with the document and should be displayed. If this field is also defined in the Solr configuration it can become searchable in advanced search. Otherwise, it could still potentially be used to help filter queries based on additional metadata (e.g., 'study type').	ignored	Optional
DATE	Any generic date field, since a document may have more than one kind of date associated with it. Otherwise, it could still potentially be used to help filter queries based on additional metadata	ignored	Optional
ENCOUNTER_ID	This is no longer used. It had been used for a time to search across all patients without limiting it to a set of medical record numbers.
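
A document can be checked against the required field names before it is sent to Solr. The Python sketch below assumes a document is a plain dictionary keyed by Solr field name; the function name and sample document are hypothetical. The two date field names are parameters because they are customizable. Whether RPT_TEXT_NOIC must be supplied directly or is derived inside Solr depends on the Solr configuration, so that part of the check may not apply to every setup.

```python
# Field names that the guide says cannot be customized.
FIXED_FIELDS = ("ID", "MRN", "RPT_TEXT", "RPT_TEXT_NOIC")


def missing_required_fields(doc, clinical_date_field="ENCOUNTER_DATE",
                            last_updated_field="LAST_UPDATED"):
    """Return the required Solr field names that are absent from `doc`.

    Only presence is checked, since a document may legitimately have
    empty text.
    """
    required = FIXED_FIELDS + (clinical_date_field, last_updated_field)
    return [name for name in required if name not in doc]


# Hypothetical document for illustration only.
doc = {
    "ID": "path-0001",
    "MRN": "100001",
    "RPT_TEXT": "Specimen received in formalin.",
    "ENCOUNTER_DATE": "2024-01-15T00:00:00Z",
}
print(missing_required_fields(doc))
# prints ['RPT_TEXT_NOIC', 'LAST_UPDATED']
```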

DOCUMENT_SOURCE table

Table: DOCUMENT_SOURCE
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

Each source of documents (e.g., pathology, radiology, commercial EHR, legacy EHR, etc.) is listed as a row in the document_source table. The EMERSE application searches and displays the results based on document source. Additionally, advanced search queries can leverage these source data to limit queries to a specific source (e.g., searching only pathology reports). Document sources normally differ in their format and metadata depending on the source of origin. Each row in this table corresponds to a column in the Overview display within EMERSE, and as a subset of documents when a patient is selected.

Column name	Description	Required or Optional
SOURCE_KEY	A short name or abbreviation for the document source. This field needs to be unique as it is the primary key of the table, and search results are displayed on separate tabs for each source. However, this name is not displayed to the user but instead is used to match the name of the document source as defined in the Solr configuration, in the `schema.xml` file. In other words, the `source_key` values should match the names of the sources defined in the `source` field in the Solr schema (`schema.xml`).	Required
USER_DESCRIPTION	A description for the source of documents. This field is used only internally and can be useful for system admins who set up EMERSE to provide a description of the `source_key` for easier recognition. This is not displayed to users in the UI.	Required
HTML_FLAG	If set to false (`0`), the body of the note is wrapped in a `<pre>` tag, which will help preserve line feeds for display with documents that were not originally formatted for the web. If true (`1`) then it means that the document is already in HTML format with possible formatting tags, and no `<pre>` tag is used to wrap the document.	Required
PRELIMINARY_DOC_FLAG	If it is possible that the source will have documents without text, this can be set to `1`. If it is set to `1` another column of metadata will be displayed to users in the Summaries table for that document source in the UI. The heading for that column will be "Final" and it will show a `Y` in a document row if the document is final (meaning that there is at least some text there) or a `N` in a document row if it is not final (meaning that the document is blank). Typically this flag would be set to `0`. Currently this is based on a source, and there is no setting on a document level (for example, to change the preliminary status based on whether a document is signed or not), although such a feature is possible for a future version.	Required
DISPLAY_NAME	The name of the source as it is displayed in the UI (e.g., "Pathology", "Radiology", "Main EHR").	Required
CSS_DISPLAY_PREFIX	Prefix used internally by CSS components in the UI. This can be anything, but each source must have a unique `css_display_prefix`. Additionally, it should conform to typical CSS naming conventions (e.g., no spaces, no quotation marks, etc). This is not displayed to the user; it is basically just an ID for the different tabs of the source documents.	Required
DISPLAY_ORDER	Order in which sources appear in the Overview and the tabs within the Summary results page. Each row should have a distinct display order. Start sequential numbering with `0`.	Required
EXTERNAL_SOURCE	Currently not in use. May be used if documents need to be displayed externally, for example with a PDF viewer outside the browser, and will not be displayed using SOLR’s copy of the document. (`1` = yes, external source; `0` = no, no external source). Generally this should always be set to `0`.	Required

DOCUMENT_SOURCE Table Example:

source_key	user_description	html_flag	display_name	css_display_prefix	display_order
epic	Primary EHR	1	Epic EHR	ehr	0
rad	Radiology Documents	0	Radiology	rad	1
path	Pathology Document	0	Pathology	path	2
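
The constraints described for DOCUMENT_SOURCE (a unique source_key, a unique and CSS-friendly css_display_prefix, and distinct display_order values numbered from 0) can be checked before rows are loaded into the table. The following Python sketch is one possible way to do that. The function name `check_document_sources`, the dictionary layout, and the regular expression used to decide what counts as a CSS-friendly prefix are assumptions made for illustration; they are not part of EMERSE.

```python
import re

# Assumed pattern for a CSS-friendly prefix: starts with a letter, then
# letters, digits, hyphens or underscores (no spaces or quotation marks).
CSS_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$")


def check_document_sources(rows):
    """Return a list of problems found in candidate DOCUMENT_SOURCE rows."""
    problems = []
    # These three columns must each hold a distinct value per row.
    for column in ("source_key", "css_display_prefix", "display_order"):
        values = [row[column] for row in rows]
        duplicates = sorted({v for v in values if values.count(v) > 1})
        if duplicates:
            problems.append(f"duplicate {column}: {duplicates}")
    for row in rows:
        if not CSS_PREFIX.match(row["css_display_prefix"]):
            problems.append(
                f"css_display_prefix not CSS-safe: {row['css_display_prefix']!r}"
            )
    # Sequential numbering is expected to start at 0.
    orders = sorted(row["display_order"] for row in rows)
    if orders != list(range(len(rows))):
        problems.append(f"display_order is not 0..{len(rows) - 1}: {orders}")
    return problems


# Rows copied from the example table above.
example_rows = [
    {"source_key": "epic", "css_display_prefix": "ehr", "display_order": 0},
    {"source_key": "rad", "css_display_prefix": "rad", "display_order": 1},
    {"source_key": "path", "css_display_prefix": "path", "display_order": 2},
]
print(check_document_sources(example_rows))
```

An empty list means none of the checks found a problem. The sketch does not verify that each source_key matches a source defined in the Solr schema, which still has to be confirmed against schema.xml.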

DOCUMENT_FIELDS table

Table: DOCUMENT_FIELDS
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

This table provides EMERSE with information about what fields are available in the underlying Solr index, their data type, and additional metadata. Each field indexed with Solr should exist in this table for each source system in the DOCUMENT_SOURCE table. The column EMR_INTENT is linked to the NAME column of the DOC_FIELD_EMR_INTENT mapping table. The column DOCUMENT_SOURCE_KEY is linked to the SOURCE_KEY column of the DOCUMENT_SOURCE table.

Each document source should have at least six rows in this table corresponding to the required purposes listed in DOC_FIELD_EMR_INTENT. The value of the SOLR_FIELD_NAME column of these rows should be exactly the value specified in the DEFAULT_LUCENE_NAME column of the matching row in DOC_FIELD_EMR_INTENT.
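
A quick way to confirm that every source has its required rows is to compare the EMR_INTENT values present for each source against the list of required purposes. The Python sketch below illustrates the idea. The six names in `REQUIRED_INTENTS` are an assumption, taken from the rows that all three sources share in this guide's DOCUMENT_FIELDS example; the authoritative list is whatever DOC_FIELD_EMR_INTENT marks as required in your installation. The function name `missing_intents` is likewise hypothetical.

```python
# Assumed list of required purposes, taken from the rows that every source
# shares in this guide's DOCUMENT_FIELDS example; confirm it against the
# DOC_FIELD_EMR_INTENT table before relying on it.
REQUIRED_INTENTS = {
    "MRN", "RPT_TEXT", "RPT_TEXT_NOIC", "RPT_ID", "LAST_UPDATED", "CLINICAL_DATE",
}


def missing_intents(field_rows):
    """Map each source key to the required EMR_INTENT values it lacks.

    field_rows is an iterable of (document_source_key, emr_intent) pairs.
    Sources that have every required intent are left out of the result.
    """
    seen = {}
    for source_key, intent in field_rows:
        seen.setdefault(source_key, set()).add(intent)
    return {
        source_key: sorted(REQUIRED_INTENTS - intents)
        for source_key, intents in seen.items()
        if REQUIRED_INTENTS - intents
    }


# Hypothetical input: "epic" is complete, "rad" is missing four rows.
rows = [
    ("epic", "MRN"), ("epic", "RPT_TEXT"), ("epic", "RPT_TEXT_NOIC"),
    ("epic", "RPT_ID"), ("epic", "LAST_UPDATED"), ("epic", "CLINICAL_DATE"),
    ("rad", "MRN"), ("rad", "RPT_TEXT"), ("rad", "TEXT"),
]
print(missing_intents(rows))
```

This only checks that the rows exist. It does not check that SOLR_FIELD_NAME matches DEFAULT_LUCENE_NAME, which requires the contents of DOC_FIELD_EMR_INTENT.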

Additional fields can be specified using the generic EMR_INTENT options of TEXT or DATE. (These are the optional purposes in the DOC_FIELD_EMR_INTENT table.) These additional metadata fields are used by EMERSE for display in the UI but are not used for search. However, you must configure Solr’s index schema.xml to store these fields.

The data defined here includes the document text itself as well as the metadata fields that will likely vary for each source system (e.g., authoring clinician, clinical service, document identifier, date of service). This metadata can be displayed (or hidden) in two places in the EMERSE UI:

Within the Summaries table in the EMERSE UI that shows a listing of all documents for a single patient and a specific document source (referenced in the DOCUMENT_FIELDS table with the summary_display_flag column).
Inside a small box in the EMERSE UI that shows document-specific metadata above a single document after a user drills down to view it (referenced in the DOCUMENT_FIELDS table with the display_flag column).

The metadata displayed within the EMERSE UI can, for the most part, be ordered using the display_order column of the DOCUMENT_FIELDS table. The display order applies to both places in the UI where the data can be displayed, but note that each of these two locations can be set up independently to display or not display a given metadata element. If the ordering is not specified correctly in the DOCUMENT_FIELDS table (e.g., two items are given the same ordering number), no error will occur for the user, but the actual order may be unpredictable.
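
Because duplicate ordering numbers fail silently, it can help to look for them before loading the table, and to preview which fields each of the two UI locations would show. The Python sketch below is a rough illustration of both ideas and is not how EMERSE itself is implemented. The function names and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_display_orders(field_rows):
    """Return {source_key: [display_order values used more than once]}.

    field_rows is an iterable of (document_source_key, display_order) pairs.
    """
    counts = Counter(field_rows)
    duplicates = {}
    for (source_key, order), count in counts.items():
        if count > 1:
            duplicates.setdefault(source_key, []).append(order)
    return {key: sorted(orders) for key, orders in duplicates.items()}


def visible_fields(rows, source_key, flag_column):
    """List display names for one source, sorted by display_order, where the
    chosen flag column (display_flag or summary_display_flag) is 1."""
    chosen = [
        row for row in rows
        if row["document_source_key"] == source_key and row[flag_column] == 1
    ]
    chosen.sort(key=lambda row: row["display_order"])
    return [row["display_name"] for row in chosen]


# Hypothetical rows for a single source.
rows = [
    {"document_source_key": "rad", "display_name": "Report Date",
     "display_order": 7, "display_flag": 1, "summary_display_flag": 1},
    {"document_source_key": "rad", "display_name": "Report ID",
     "display_order": 3, "display_flag": 1, "summary_display_flag": 1},
    {"document_source_key": "rad", "display_name": "Last Updated",
     "display_order": 4, "display_flag": 1, "summary_display_flag": 0},
]
print(visible_fields(rows, "rad", "summary_display_flag"))
print(visible_fields(rows, "rad", "display_flag"))
print(duplicate_display_orders([("rad", 3), ("rad", 3), ("path", 1)]))
```

The same display_order drives both lists; only the flag column decides whether a field appears in a given location.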

Column name	Description	Required or Optional
SOLR_FIELD_NAME	Name that corresponds to the Solr document field. The field names are specified in Solr’s `schema.xml` file, and the names in this column must match what is listed there. This is distinct from the Solr source name, which is defined in the `document_source_key` column of this table, described below.	Required
DATATYPE	Mainly used by the UI. Should be either `Text` or `Date` (case-sensitive)	Required
DISPLAY_ORDER	Order in which fields need to appear in the search results, either in the Summaries section of the UI or in a small box above a single displayed document. This should be unique among rows for each source but note that some elements (such as the text of the document itself) would not actually be displayed as a metadata element.	Required
DISPLAY_NAME	Name that appears in the UI	Required
EMR_INTENT	Specifies the purpose of the field. This refers to the values of the `NAME` column in the `DOC_FIELD_EMR_INTENT` table.	Required
DOCUMENT_SOURCE_KEY	Specifies the document type key from `DOCUMENT_SOURCE` table. It should match the values of the `SOURCE_KEY` column in the `DOCUMENT_SOURCE` table.	Required
DISPLAY_FLAG	Flag that controls if the field is displayed when the document is displayed. This display of metadata is in a small table above the document when an individual document is shown in the EMERSE UI, when a user drills down to view a complete document. (`1` = yes, display; `0` = no, do not display). Note also that for items such as the text of the document, that text will already be displayed and thus it should not be displayed here as a metadata element.	Required
SUMMARY_DISPLAY_FLAG	Flag that controls if the field is displayed in the search results summary page, where it would show up as a metadata column in the Summary results table. (`1` = yes, display; `0` = no, do not display). Note that `case_date` is a required field for each data source (listed in the `SOLR_FIELD_NAME` column) and it must have `SUMMARY_DISPLAY_FLAG` set to `1`; otherwise the browser will report a "no source column in index" error when using EMERSE, since that field is used as the default sorting column. Note also that items such as the text of the document should not be displayed here as a metadata element, since the Summaries section would only show the text snippets of search result 'hits'.	Required
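
Several of the rules in the table above are easy to get wrong by hand: the case-sensitive DATATYPE values, the 0/1 flags, and the requirement that case_date be shown in the Summaries table. The Python sketch below checks a single candidate row against those rules. The function name `check_field_row` and the dictionary keys are hypothetical, and comparing the field name case-insensitively is an assumption, made because this guide writes the name both as case_date and as CASE_DATE.

```python
VALID_DATATYPES = {"Text", "Date"}  # case-sensitive, per the table above


def check_field_row(row):
    """Return a list of problems found in one candidate DOCUMENT_FIELDS row."""
    problems = []
    if row["datatype"] not in VALID_DATATYPES:
        problems.append(f"datatype must be Text or Date, got {row['datatype']!r}")
    for flag in ("display_flag", "summary_display_flag"):
        if row[flag] not in (0, 1):
            problems.append(f"{flag} must be 0 or 1, got {row[flag]!r}")
    # Assumption: the field name is matched without regard to case.
    is_case_date = row["solr_field_name"].lower() == "case_date"
    if is_case_date and row["summary_display_flag"] != 1:
        problems.append("case_date must have summary_display_flag = 1")
    return problems


# Hypothetical row with two mistakes: lowercase datatype and a hidden case_date.
print(check_field_row({
    "solr_field_name": "CASE_DATE",
    "datatype": "date",
    "display_flag": 1,
    "summary_display_flag": 0,
}))
```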

DOCUMENT_FIELDS Table Example:

Shown below is an example document_fields table for three different document sources:

SOLR_FIELD_NAME	DATATYPE	DISPLAY_ORDER	DISPLAY_NAME	EMR_INTENT	DOCUMENT_SOURCE_KEY	DISPLAY_FLAG	SUMMARY_DISPLAY_FLAG
MRN	Text	0	MRN	MRN	epic	0	0
RPT_TEXT	Text	1	Report Text	RPT_TEXT	epic	0	0
RPT_TEXT_NOIC	Text	2	Report Text	RPT_TEXT_NOIC	epic	0	0
ID	Text	3	Report ID	RPT_ID	epic	1	1
LAST_UPDATED	Date	4	Last Updated	LAST_UPDATED	epic	1	0
CASE_DATE	Date	5	Case Date	CLINICAL_DATE	epic	1	1
MRN	Text	0	MRN	MRN	path	0	0
RPT_TEXT	Text	1	Report Text	RPT_TEXT	path	0	0
RPT_TEXT_NOIC	Text	2	Report Text	RPT_TEXT_NOIC	path	0	0
ID	Text	3	Report Id	RPT_ID	path	1	1
LAST_UPDATED	Date	4	Last Updated	LAST_UPDATED	path	1	1
DR_NUM	Text	5	Doctor Num	TEXT	path	1	1
CASE_DATE	Date	6	Collection Date	CLINICAL_DATE	path	1	0
MRN	Text	0	MRN	MRN	rad	0	0
RPT_TEXT	Text	1	Report Text	RPT_TEXT	rad	0	0
RPT_TEXT_NOIC	Text	2	Report Text	RPT_TEXT_NOIC	rad	0	0
ID	Text	3	Report ID	RPT_ID	rad	1	1
LAST_UPDATED	Date	4	Last Updated	LAST_UPDATED	rad	1	0
SVC_CD	Text	5	Service Code	TEXT	rad	1	0
DR_NUM	Text	6	Doctor Num	TEXT	rad	1	0
CASE_DATE	Date	7	Report Date	CLINICAL_DATE	rad	1	1

SOLR_INDEX table

Table: SOLR_INDEX
Population: Likely once at system setup, but the dates may get updated with every indexing.
Population Frequency: Variable, but usually automated.

EMERSE previously used this table to locate the available Solr/Lucene indexes, as several indexes (shards) were created to improve performance. However, we no longer use multiple indexes. For most users running EMERSE on a single server, one row in this table pointing to a single Solr/Lucene index yields adequate performance for 1-2 TB indexes with hundreds of millions of documents. Thus, only one row would be set up in this table. After indexing, EMERSE will automatically update the START_DATETIME and END_DATETIME fields with the latest date range of the indexed documents. The start and end dates in this table are used within the EMERSE UI to display the date range of the documents. The automatic updating of the start and end dates can be overridden using two parameters (see batch.updateIndexMinDateFromSolrIndex and batch.updateIndexMaxDateFromSolrIndex in the 'Batch Updating Begin/End Dates' section of the Configuration Guide).

If the need arose to break the index into smaller pieces for performance gains, we would recommend using Solr Cloud.

Column name	Description	Required or Optional
ID	The Lucene name of the index	Required
START_DATETIME	Start date of clinical documents in this shard	Required
END_DATETIME	End date of clinical documents in this shard	Required
PATIENT_COUNT	Total number of distinct MRNs found in the Solr index. Shown as the count for the "All Patient" patient list. Updated periodically by the application as a background task.	Optional

SOLR_INDEX Table Example:

ID	START_DATETIME	END_DATETIME	PATIENT_COUNT
unified	01.02.2008 00:00:00	31.12.2099 00:00:00	1223829
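
When checking the single SOLR_INDEX row by hand, it can be useful to parse the two dates and confirm that the range is sensible. The Python sketch below does this for the example row. The day.month.year format string is an assumption based on how the example values are printed; a database client may render Oracle date values differently. The function name `describe_index` and the dictionary keys are hypothetical.

```python
from datetime import datetime

# Assumed rendering of the example row (day.month.year); your database
# client may print date values in a different format.
EXAMPLE_FORMAT = "%d.%m.%Y %H:%M:%S"


def describe_index(row):
    """Summarize one SOLR_INDEX row, rejecting an inverted date range."""
    start = datetime.strptime(row["start_datetime"], EXAMPLE_FORMAT)
    end = datetime.strptime(row["end_datetime"], EXAMPLE_FORMAT)
    if end < start:
        raise ValueError("END_DATETIME is earlier than START_DATETIME")
    return (
        f"{row['id']}: {start:%Y-%m-%d} to {end:%Y-%m-%d}, "
        f"{row['patient_count']:,} patients"
    )


# Values copied from the example row above.
row = {
    "id": "unified",
    "start_datetime": "01.02.2008 00:00:00",
    "end_datetime": "31.12.2099 00:00:00",
    "patient_count": 1223829,
}
print(describe_index(row))
```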
