Overview

EMERSE uses Solr to search and retrieve documents, and all documents and document metadata are stored in the file structure of the Solr indexes. All other data are stored and retreived from a relational database. This includes patients (names, demogreaphics), patient lists, Term bundles, auditing and user information. This document highlights a few areas within the data model, but a full data dictionary is available if more detail is desired.

The data for the documents and document indices are not stored within this database. Instead these are managed by Solr in its own data store and can be on a separate server from the Oracle database.

Patient Demographics

PATIENT table

Table: PATIENT
Population: From external source (such as EHR)
Population Frequency: Can be variable, but once per day is reasonable

The EMERSE schema includes a patient table with medical record number (MRN), name, date of birth, and other demographic information which is displayed in the search results. Data in this table is used to display the patient name, validate user-entered or uploaded MRNs and to calculate current ages of the patients. Other demographic dta are used to summarize the populations found in a search.

Although the coded demographics information is not required, some features such as the demographics breakdowns within the all patient search feature will not work if sex, race, ethnicity are not populated.
For all documents indexed, there must be a corresponding patient in the patient table with a medical record number (MRN) that matches the document. This should be taken into consideration when determining the frequency for updating this table.
Column name Description Required or Optional

id

Primary Key

Required

external_id

Medical Record Number

Required

first_name

First Name

Required

middle_name

Middle Name

Optional

last_name

Last Name

Required

birth_date

Birth Date — used to calculate current age

Required

sex_cd

Sex

Optional

language_cd

Language

Optional

race_cd

Race

Optional

marital_status_cd

Marital Status

Optional

religion_cd

Religion

Optional

zip_cd

ZIP code

Optional

create_date

Date the row was created. Can be used to track changes to the table.

Optional

update_date

Date the row was updated. Can be used to track changes to the table.

Optional

deleted_flag

Logical delete flag. Useful for merged patients. Valid values are 1 = yes, deleted; 0 = no, not deleted

Required

deceased_flag

Currently not used. Valid values are: 1 = yes, deceased; or 0 = no, not deceased

Required

Many of the columns in the patient table use codes for their values- sex, race, ethnicity, etc. Although these values are not constrained by the database, the UI can display descriptions for them in the patient demographics areas, and are used in the bar charts that breakdown sex, race, gender in the "All Patient Search" feature. The lookup tables for these codes are

  • LKP_PATIENT_RACE

  • LKP_PATIENT_GENDER

  • LKP_PATIENT_SEX

  • LKP_PATIENT_MARITAL_STATUS

  • LKP_PATIENT_RELIGION

  • LKP_PATIENT_ETHNICITY

These tables all have the same structure:

Column Description

DESCRIPTION

The description that is shown in the User Interface

CODE

The coded value in the patient table

Research Studies and Attestation

Immediately after each login, every user is required to ‘attest’ to their use of EMERSE for that session by specifying their reason for using the system. This is called the ‘Attestation’ page, and the results are stored in the SESSION_ATTESTATION table. EMERSE provides three options (configurable by a system administrator) for this attestation: (1) a free text box, (2) ‘Quick Buttons’ for choosing pre-selected options that are commonl used (for example, “Quality Improvement”, “Patient Care”, “Infection Control”, etc), and (3) a table of research studies to which a user is associated. Additional tables RESEARCH_STUDY_ATTESTATION, and ATTESTION_OTHER may contain additional information depending on whether the user specifies a research study, or other reasons. The free text option would be used by users when no other attestation choices are reasonable. Additionally, previously used entries from the free text box will appear in the table, along with any IRB-approved studies, for the user’s convenience.

For our implementation at Michigan Medicine, we pull data on all studies in the IRB system, even if the study/person is not currently a part of EMERSE. This is because the dataset is generally small, and it makes it easier for users to validate their studies if the data are already populated, once the user is given an EMERSE account
ER diagram - Attestation
Figure 1. Entity relationship diagram of some tables related to capturing attestion data which occurs immediately after a user logs in.

RESEARCH_STUDY table

Table: RESEARCH_STUDY
Population: Populated from external source such as an electronic IRB system
Population Frequency: Can be variable, but once per day is reasonable

If a user is required to select his/her study from the table, then delays in moving IRB data to EMERSE after IRB approval can result in delays access for that user.

This table contains information about research studies. Using this table and RESEARCH_STUDY_MEMBER allows EMERSE to show a list of studies the end user is associated with.

Column name Description Required or Optional

id

Primary Key

Required

external_id

IRB study number — used to link specific studies to usage, and is very helpful for tracking research usage

Required

study_name

Name of the study

Required

principal_investigator_name

Name of the principal investigator

Required

prin_invest_org_id

id of principal investigator. Not currently used by EMERSE. This could be a user id, or email, but it is a good idea to ensure it is unique.

Optional

expiration_date

Expiration date of study. Used to determine if a user should be allowed to proceed. If the expiration date is older than the current date, the user will not be able to select it in the attestion GUI.

Required

project_status

Current project status. This is used to track where a study is in the review and approval process. Only certain study statuses allow access to EMERSE for research. The statuses that allow a study to be selected during attestation are defined in the VALID_RES_STUDY_STATUS table.

Required

last_updated

A last updated date is not used by EMERSE, but can be useful for troubleshooting and tracking changes to the table.

Optional

begin_date

This originally referred to the date the study began or should be allowed to begin. This field that can be used for tracking and troubleshooting.

Optional

VALID_RES_STUDY_STATUS table

Table: VALID_RES_STUDY_STATUS
Population: By System Admin. Only needed if research studies need to be validated
Population Frequency: May only need to be done once, at the time of system setup. May need periodic updates if the source data (such as from IRB system) defining study status is changed.

EMERSE contains a simple table defining study statuses. The statuses that are initially populated in the system (loaded up in the build script) are unique to Michigan Medicine (that is, they were developed locally and are implemented in our separate electronic IRB tracking system) and other implementations would have to have their own set of valid statuses if these were to be used to validate and approve usage for research. If the status of a research study is not in this table, EMERSE will not allow the study to be used for attestation; that is, the study would not even be displayed to the user to select.

Column name Description Required or Optional

status

A list of study statuses that EMERSE considers valid in terms of allowing a user to proceed. These statuses are generally defined by the IRB and are universal across studies.

Required
(if research studies need to be validated before presenting them to the user)

VALID_RES_STUDY_STATUS Table Example:

Status

Exempt Approved - Inital

Approved

Not Regulated

Exempt Approved - Tranistional

RESEARCH_STUDY_MEMBER table

Table: RESEARCH_STUDY_MEMBER
Population: Populated from external source such as an electronic IRB system
Population Frequency: Can be variable, but once per day is reasonable

This table contains information about study team members, and is related to the RESEARCH_STUDY table, described above. Each study can have one or many study team members.

This table at Michigan Medicine contains information on all study team members for all studies, whether they have an EMERSE account or not.
Column name Description Required or Optional

RESEARCH_STUDY_ID

Foreign key reference to row id in RESEARCH_STUDY table

Required

USER_ID

Foreign key reference to row in LOGIN_ACCOUNT table

Required

ROLE_NAME

A string describing a person’s role on the study team. EG. “PI”, “Staff”, “Study Coordinator”. This can be useful when generating usage reports.

Optional

FIRST_NAME

First name of the username who is on the study. It is currently populated from the source IRB system, but it is not used at all by EMERSE. Nevertheless, it may be useful when generating reports.

Optional

LAST_NAME

Last name of the username who is on the study. It is currently populated from the source IRB system, but it is not used at all by EMERSE. Nevertheless, it may be useful when generating reports.

Optional

BEGIN_DATE

This is not currently used by EMERSE.

Optional

LAST_UPDATED

Date row was last updated

Optional

DELETED

Flag to indicate if the record has been logically deleted. 0 = false, not deleted; 1 = true, deleted.

Required

SESSION_ATTESTATION table

Table: SESSION_ATTESTATION
Population: Used internally by EMERSE
Population Frequency: In real time by EMERSE

Each time a user attests to why they are using EMERSE, a row is inserted into this table, which is one of the audit tables. Attestations related to research can be joined to the RESEARCH_ATTESTION table. Non research uses can be joined to ATTESTION_OTHER.

Column name Description Required or Optional

id

Primary Key

N/A (populated internally by EMERSE)

type

A string indicating the top level category of attestation. RSA indicates session is used for research. OTH means other usage. Research attestations will have an associated row in RESEARCH_ATTESTATION. If the type is OTH, a row will also exist in OTHER_ATTESTATION_REASON when the system populates that table at the time of a user attestation after login.

N/A (populated internally by EMERSE)

User_session_id

A foreign key reference to the USER_SESSION table

N/A (populated internally by EMERSE)

OTHER_ATTESTATION_REASON table

Table: OTHER_ATTESTATION_REASON
Population: By System Admin. Only needed if commonly used text reasons are needed as quick buttons in the application
Population Frequency: May only need to be done once, at the time of system setup.

For non-research attestations, there is a lookup table called OTHER_ATTESTATION_REASON that lists available options. These can be configured by each institution, and may include commonly used access reasons that don’t involve research (such as quality improvement, patient care, etc). These options (other than the Free text reason) can be used to populate “quick buttons” that provide a simple way for a user to click on one of the common reasons for use.

Column name Description Required or Optional

USER_KEY

Text based primary key of this table. The column name might better be thought of as as 'reason key'.

Required

DESCRIPTION

The text description that will be displayed in the Quick Buttons section of the Attestation page.

Required

DELETED FLAG

Has this reason been deleted? (0 = no; 1= yes)

Required

DISPLAY_ORDER

Order of display in the UI. Can be any integer, but should be unique per row. The buttons are ordered by this column via sql sort. Generally start with 0,1,2, etc.

Optional

OTHER_ATTESTATION_REASON Table Example:

USER_KEY DESCRIPTION DELETED_FLAG DISPLAY_ORDER

QI

Quality Improvement

0

0

RVPREPRES

Review Preparatory to Research

0

1

STDYDESC

Study involving only decedents (deceased patients)

0

2

ATTESTATION_OTHER table

Table: ATTESTATION_OTHER
Population: Used internally by EMERSE
Population Frequency: Application dependent

The free text reasons that users enter are stored in a table called ATTESTATION_OTHER. This is populated by EMERSE and is not customizable by users.

Column name Description Required or Optional

SESSION_ATTESTATION_ID

A unique ID for the session attestation. Used for audit logging.

Required

FREE_TEXT_REASON

The free text reason that a user entered.

Required

OTHER_ATTEST_REASON_KEY

This will currently only be populated by the system with FRETXT.

Required

ATTESTATION_OTHER Table Example:

SESSION_ATTESTATION_ID FREE_TEXT_REASON OTHER_ATTEST_REASON_KEY

50208

Testing out the system

FRETXT

52060

Testing out the system

FRETXT

46051

Looking up a patient in clinic

FRETXT

71052

infection control monitoring

FRETXT

74107

cancer registry operational work

FRETXT

Clinical Documents

EMERSE search is enabled by the indexing of clinical text documents by Apache Solr. Documents in a clinical environment can come from a myriad of sources like transcription, Radiology, and Pathology, or from an electronic health record. Normally the structure, data, and metadata related to these documents from different sources varies considerably. To simplify things, we configure Solr with a single document schema containing a congolmerate of all fields from all sources. Common data elements, such as patient MRN, clinical date, and source primary key are used across many sources thus are mapped to the same Solr schema field. Search results for each source are displayed in a separate tab in the UI.

ER diagram - Documents
Figure 2. Entity relationship diagram of some tables related to clinical documents.

DOCUMENT_INDEX table

Table: DOCUMENT_INDEX
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

Each source of documents (e.g., pathology, radiology, commercial EHR, legacy EHR, etc.) is listed as a row in the document_index table. The EMERSE application searches and displays the results based on document source. Additionally, Advanced Search Queries within EMERSE can leverage these source data to limit queries to a specific source (e.g., searching only pathology reports). Document sources normally differ in their format and metadata depending on the source of origin. Each row in this table corresponds to a column in the “Overview” display within EMERSE, and as a subset of documents when a patient is selected.

Column name Description Required or Optional

lucene_name

Name of the document source. This field needs to be unique and search results are displayed on separate tabs for each source. However, this name is not displayed to the user but instead is used to match the name of the document source as defined in the Solr configuration, in the schema.xml file. In other words, the lucene_name should be the same names of the sources that are defined in schema.xml.

Required

user_description

Description for the source of document. This field is used only internally and can be useful for system admins who set up EMERSE to provide a a description of the lucene_name for easier recognition. This is not displayed to users in the UI.

Required

compound_key_flag

This flag is present for historic reasons and will be deprecated in future releases. Note: Lucene indexing requires that each document has a unique identifier. So, when indexing, the fields that uniquely identify a document need to be concatenated and indexed as RPT_ID. For example, when we indexed Radiology documents we used a combination of document Id and exam description to uniquely identify documents. These fields are concatenated using ‘|’ and then indexed as RPT_ID. Since it is no longer used, just add 0 for this field.

Required

display_name

The name displayed in the UI

Required

display_prefix

Prefix used by internally by CSS components in the UI. This can be anything, but each source must have a unique display_prefix. Additionally, it should conform to typical CSS naming conventions (e.g., no spaces, no quotation marks, etc). This is not displayed to the user, it is basically just an ID for the different tabs of the source documents.

Required

display_order

Order in which sources appear in the Overview and the tabs within the Summary results page. Each row should have a distinct display order. Start sequential number with 0.

Required

DOCUMENT_INDEX Table Example:

lucene_name user_description compound_key_flag default_sort_column display_name display_prefix display_order

DMI

Central transcription document

0

Case Date

CareWeb

dmi

0

Radiology

Radiology Documents

0

Report Date

Radiology

rad

1

Pathology

Pathology Document

0

Last Updated

Pathology

path

2

DOCUMENT_FIELDS table

Table: DOCUMENT_FIELDS
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

This table provides EMERSE with information about what fields are available in the underlying Solr/Lucene index, their data type, and additional metadata. Each field indexed with Solr/Lucene should exist in this table for each source system in the document_index table. The column EMR_INTENT is linked to the name field of the doc_field_emr_intent mapping table. The column DOC_INDEX_LUC_NAME is linked to the lucene_name field of the document_index table.

Each document source should contain at least six rows (see ‘EMR Intent’ and the doc_field_emr_intent table below for the six required types). One for each type as defined in document fields table. Additional fields can be specified using the generic EMR_INTENT options of TEXT or DATE. These additional/optional metadata fields are used by EMERSE for display in the UI but are not used by Solr/Lucene. Also, note that indexing of these fields depends on the Solr configuration which must define these fields for indexing.

Column name Description Required or Optional

LUCENE_NAME

Name that corresponds with the Solr document field . The names of the fields are specified in schema.xml file

Required

DATATYPE

Mainly used by the UI Should be either “Text” or “Date” (case-sensitive)

Required

DISPLAY_ORDER

Order in which fields need to appear in the search results

Required

DISPLAY_NAME

Name that appears in the UI

Required

EMR_INTENT

Specifies the intent of the field. This refers to the fields defined in the doc_field_emr_intent table.

Required

DOC_INDEX_LUC_NAME

Specifies the document type key from document_index table. It should match the lucene_name in the document_index table.

Required

DISPLAY_FLAG

Flag that controls if the field is displayed when document is displayed. This display of metadata is in a small table above the document when an individual document is shown in the EMERSE UI. (1 = yes, display; 0 = no, do not display)

Required

SUMMARY_DISPLAY_FLAG

Flag that controls if the field is displayed in search results summary page, which would show up as a metadata coulumn in the Summary results table. (1 = yes, display; 0 = no, do not display)

Required

DOCUMENT_FIELDS Table Example:

Shown below is an example document_fields table for three different document sources:

LUCENE_NAME DATATYPE DISPLAY_ORDER DISPLAY_NAME EMR_INTENT DOC_INDEX_LUC_NAME DISPLAY_FLAG SUMMARY_DISPLAY_FLAG

MRN

Text

0

MRN

MRN

DMI

0

0

RPT_TEXT

Text

1

Report Text

RPT_TEXT

DMI

0

0

RPT_TEXT_NOIC

Text

2

Report Text

RPT_TEXT_NOIC

DMI

0

0

ID

Text

3

Report ID

RPT_ID

DMI

1

1

LAST_UPDATED

Date

4

Last Updated

LAST_UPDATED

DMI

1

0

CASE_DATE

Date

5

Case Date

CLINICAL_DATE

DMI

1

1

MRN

Text

0

MRN

MRN

PATHOLOGY

0

0

RPT_TEXT

Text

1

Report Text

RPT_TEXT

PATHOLOGY

0

0

RPT_TEXT_NOIC

Text

2

Report Text

RPT_TEXT_NOIC

PATHOLOGY

0

0

ID

Text

3

Report Id

RPT_ID

PATHOLOGY

1

1

LAST_UPDATED

Date

4

Last Updated

LAST_UPDATED

PATHOLOGY

1

1

DR_NUM

Text

5

Doctor Num

TEXT

PATHOLOGY

1

1

COLLECTION_DATE

Date

6

Collection Date

CLINICAL_DATE

PATHOLOGY

1

0

MRN

Text

0

MRN

MRN

RADIOLOGY

0

0

RPT_TEXT

Text

1

Report Text

RPT_TEXT

RADIOLOGY

0

0

RPT_TEXT_NOIC

Text

2

Report Text

RPT_TEXT_NOIC

RADIOLOGY

0

0

ID

Text

3

Report ID

RPT_ID

RADIOLOGY

1

1

LAST_UPDATED

Date

4

Last Updated

LAST_UPDATED

RADIOLOGY

1

0

SVC_CD

Text

5

Service Code

TEXT

RADIOLOGY

1

0

DR_NUM

Text

6

Doctor Num

TEXT

RADIOLOGY

1

0

RPT_DATE

Date

7

Report Date

CLINICAL_DATE

RADIOLOGY

1

1

DOC_FIELD_EMR_INTENT table

Table: DOC_FIELD_EMR_INTENT
Population: Likely once at system setup
Population Frequency: May need updating as data sources change.

This is a lookup table for the column EMR_INTENT in the previously defined document_fields table. This table does not normally need to be edited. It is used by the system to help map various sources and types of data to the intended uses of those data by the system. The values contained in the name field of this table are listed below.

DOC_FIELD_EMR_INTENT is used internally by EMERSE. The first 6 items are required for the Solr/Lucene indexer to work, the next two are optional, and the final one is no longer used.
Column name Description DEFAULT_LUCENE_NAME Required or Optional

MRN

Patient medical record number, which is a unique patient identifier

MRN

Required

RPT_ID

Unique document identifier. This must be unique across all documents and sources

ID

Required

CLINICAL_DATE

Date when the clinical event occurred. Often this would be considered the “note date”

ENCOUNTER_DATE

Required

LAST_UPDATED

Date when the document was last updated, since changes are sometimes made to documents

LAST_UPDATED

Required

RPT_TEXT

The actual text of the clinical document. This field is used by Lucene for lower-case indexing (case-insensitive searching).

RPT_TEXT

Required

RPT_TEXT_NOIC

A copy of the document text to be indexed using a case-sensitive Lucene filter (NOIC = NO Ignore Case)

RPT_TEXT_NOIC

Required

TEXT

Any generic text field. Note that a document may have multiple of these types of generic text fields (e.g., clinical service, document type, clinician name, etc). This is useful when additional metadata are associated with the document and should be displayed. If this field is also defined in the Solr configuration it can become searchable. Otherwise, it could still potentially be used to help filter queries based on additional metadata (e.g., 'study type').

Optional

DATE

Any generic date field, since a document may have more than one kind of date associated with it. Otherwise, it could still potentially be used to help filter queries based on additional metadata

Optional

ENCOUNTER_ID

This is no longer used. It had been used for a time to search across all patients without limiting it to a set of medical record numbers.

No longer used

LUCENE_SHARDS table

Table: LUCENE_SHARDS
Population: Likely once at system setup, but the dates may get updated with every indexing.
Population Frequency: Variable, but usually automated.

EMERSE previously used this table to locate Solr/Lucene indexes that were available, as several indexes (shards) were created to improve performance. However, we no longer user multiple indexes. For most users running EMERSE on a single server, having one row in this table pointing to a single Solr/Lucene index yields adequate performance for 1-2TB indexes with 100’s of millions of documents. Thus, only one index needs to be defined here. After indexing, EMERSE will automatically update the START_DATETIME and END_DATETIME fields with the latest date range of the indexed documents. The Start and End dates in this table are used within the EMERSE UI to display the date range of the documents. The automatic updating of the start and end dates can be overriden using two parameters (see batch.updateIndexMinDateFromSolrIndex and batch.updateIndexMaxDateFromSolrIndex in the 'Batch Updating' section of the Configuration Guide).

If the need arose to break the index into smaller pieces for performance gains, we would recommend using Solr Cloud instead.
Column name Description Required or Optional

ID

The Lucene name of the index

Required

PARENT_DOC_INDEX

Specifies the document type key from document_index table. This used to refer to a specific shard.

Optional (Needed when using multiple shards)

START_DATETIME

Start date of clinical documents in this shard

Required

END_DATETIME

End date of clinical documents in this shard

Required

LUCENE_SHARDS Table Example:

ID PARENT_DOC_INDEX START_DATETIME END_DATETIME

unified

(null)

01.02.2008 00:00:00

31.12.2099 00:00:00