The STRIDE Anonymous Patient Cohort Discovery Tool - User Documentation
GETTING HELP: To get assistance with the STRIDE Anonymous Patient Cohort Discovery Tool you can call the IRT Help Desk at 5-8000 or send an email to stride-beta@med.stanford.edu.
The STRIDE Clinical Data Warehouse (CDW) contains clinical and demographic information on patients cared for at Lucile Packard Children’s Hospital (LPCH) and Stanford Hospital and Clinics (SHC). The intent of aggregating this information is to support Stanford University Medical Center’s (SUMC) clinical and translational research mission. Access to patient data in the STRIDE CDW for research purposes requires Stanford IRB approval.
As of April 2007, the STRIDE CDW archive includes over 71 million data elements on approximately 760,000 individual patients cared for at SUMC over the past 20 years. The CDW contains the following data, though not necessarily on all patients:
- Patient Identifiers (Name, Address, Medical Record Numbers)
- Demographics (Age, Date of Birth, Gender, Race, Ethnicity)
- ICD9-coded diagnoses (inpatient and outpatient)
- ICD9 and CPT-coded clinical procedures (inpatient and outpatient)
- Radiology reports
- Surgical Pathology reports
- Transcribed clinical documents (e.g. Discharge Summaries)
- Laboratory test results
To assist researchers in determining if the STRIDE CDW contains data needed for research studies or to help in identifying potential research patient cohorts, we have developed a computer program called the Anonymous Patient Cohort Discovery Tool. This computer program allows SUMC faculty researchers to directly search the STRIDE CDW using one or more of the following criteria:
- Demographics (Age, Gender, Race, Ethnicity)
- ICD9-coded diagnoses (inpatient and outpatient)
- ICD9 and CPT coded clinical procedures (inpatient and outpatient)
- Radiology reports
- Surgical Pathology reports
- Transcribed clinical documents (e.g. Discharge Summaries)
- Selected Laboratory test results
The STRIDE Anonymous Patient Cohort Discovery Tool determines the approximate number of patients who meet the entered criteria. It also provides basic demographic statistics on the resulting patient cohort. IT DOES NOT EXPOSE ANY DEMOGRAPHIC OR CLINICAL DATA ON INDIVIDUAL PATIENTS. Search criteria can be saved and used later as part of the STRIDE Research Chart Review process (under development), or to request clinical data sets from the STRIDE CDW for research purposes (following IRB approval).
STRIDE CDW Archive Data Used in the Beta Test
The beta test version of the STRIDE Anonymous Patient Cohort Discovery Tool searches a subset of the STRIDE CDW archive. We are still processing and parsing large amounts of clinical data for inclusion in the first service release of the STRIDE CDW, scheduled for October 2007. The beta test data consists of:
- 760,000 patients with clinical and demographic data (1994-2007)
- 3.7 million ICD-coded diagnoses (1994-2007)
- 3.2 million ICD and CPT-coded clinical procedures (1994-2007)
- 500,000 radiology reports (2004-2007)
- 79,000 pathology reports (2005-2007)
- 1 million other transcribed documents (2005-2007)
- 15.5 million laboratory test values (2005-2007)
One consequence of the difference in time range between the coded diagnosis/procedure data and the document/laboratory data used in the beta test is that searches combining these two categories of data may produce smaller than expected cohort sizes. For example, although there are 6080 patients with a diagnosis of ‘Acute Myocardial Infarction’, adding ‘elevated Total CPK’ as a laboratory test result drops the cohort size to 525. This is because the ICD-coded diagnosis data dates back to 1994, while the laboratory test data in the beta test database dates back only to 2005 (about 2.25 years).

The same phenomenon will occur when combining coded diagnosis/procedure data and clinical document searches. For example, while 1210 patients have an ICD-coded diagnosis of Malignant melanoma of the skin, if combined with searches of pathology reports for ‘skin biopsy’ AND ‘melanoma’ the cohort size drops to 240, again (in part) because of the difference in range of years for which data is available in the beta test system.
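The size of the mismatch can be checked with simple date arithmetic. The following Python sketch compares the coverage of two beta test data categories. The function name and the exact day boundaries (1 January of the start year, 1 April 2007 as the end) are assumptions made for illustration, since this document lists only years.

```python
from datetime import date

# Approximate coverage of two beta test data categories.
# The exact day boundaries are assumptions; this document gives only years.
DIAGNOSES = (date(1994, 1, 1), date(2007, 4, 1))
LAB_RESULTS = (date(2005, 1, 1), date(2007, 4, 1))


def coverage_years(date_range):
    """Return the length of a (start, end) date range in years."""
    start, end = date_range
    return (end - start).days / 365.25


for label, date_range in (("diagnoses", DIAGNOSES), ("lab results", LAB_RESULTS)):
    print(f"{label}: about {coverage_years(date_range):.2f} years")
```

A patient whose diagnosis was coded before the laboratory data begins may have no laboratory values in the beta test database at all, so that patient cannot satisfy a combined search.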
Requirements
To use the STRIDE Anonymous Patient Cohort Discovery Tool you must:
- Be a SUMC faculty member or Senior Research Staff with Stanford Principal Investigator privileges
- Be connected to the SUMC Network (School of Medicine, LPCH or SHC)
- Use a Macintosh computer running Mac OS X 10.4.9 or later, or a Windows computer running Windows XP (with Service Pack 2) or Windows Vista
- Be registered as a STRIDE Anonymous Patient Cohort Discovery Tool Beta Test participant at http://stride.stanford.edu/cohort-beta/
- Have Java JRE version 1.4.2 or later installed on your computer. Instructions for checking which version of Java is installed on your computer are available at http://www.javatester.org/version.html
Using the STRIDE Anonymous Patient Cohort Discovery Tool
To run the STRIDE Anonymous Patient Cohort Discovery Tool, go to the launch site URL that was sent to you in the STRIDE beta test registration confirmation email, read the terms of use and click on the launch button. If this is your first time using the application, you may have to wait a few minutes while the software is automatically installed on your computer. The software is self-updating. Each time it is run, it checks whether a more recent version is available and, if so, downloads and installs it on your computer.
When launched, the application will ask you to authenticate using your SUNet ID and SUNet password (Figure 1). If authentication fails, make sure that you are entering the correct SUNet ID/Password and that you have received an email from the STRIDE team confirming that you have been registered as a beta-test user. For additional assistance see the ‘Getting Help’ section at the end of this document.

Figure 1. STRIDE Anonymous Patient Cohort Discovery Tool login window
If authentication is successful, you will see the following application window (Figure 2) consisting of three main areas (designated A,B,C in the figure).

Figure 2. STRIDE Anonymous Patient Cohort Discovery Tool interface
Section A contains a list of STRIDE CDW data elements that you can use to construct patient cohort searches. Section B is where you construct patient cohort searches. Section C is where patient cohort demographics are displayed, if requested, by clicking on the ‘Graph Demographics’ button.
Search Elements (Section A):
This section of the application window consists of a series of data categories, containing STRIDE CDW search elements that may be used to construct patient cohort queries. Each category heading displays the number of data elements in that category in the beta test version of the STRIDE CDW and how many years of data in that category are present. For example, the Diagnosis category in Figure 2 contains 3.7 million ICD-coded diagnoses dating back to 1994. Please note that, other than the statement reading “Patient Demographics (760K patients w/ clinical data)”, all of the numbers listed in Section A of the application window are STRIDE beta test CDW data element counts, not patient counts.
The following is a brief description of each data category, and how it is used when creating cohort searches.
Demographics: This data category contains Current Age, Gender, Race and Ethnicity. These criteria are NOT required to create a patient cohort search. They should ONLY be used if you require that the cohort search be constrained by these criteria. To add one of these criteria to a patient cohort search, click on its name and, while holding down the mouse button, drag it to Section B. Alternatively, double-clicking on a criterion’s name adds it to Section B.
When you add Current Age, you will see the following:

The popup menu after the Patient Age label allows you to select ‘less than’, ‘equals’ or ‘is greater than’ to compare against the age you enter. You can also elect to use an age range by selecting the ‘is between’ option from the popup menu. You can then enter a start and end age, thus defining a range of ages:

Note that Current Age means just that: the patient’s age at the time that the patient cohort search is executed, not the patient’s age when, for example, a particular diagnosis, procedure or laboratory test result occurred.
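The Current Age logic can be sketched in Python as follows. This is an illustration only, not the tool's actual implementation: the function names are hypothetical, the operator strings mirror the popup menu options described above, and treating ‘is between’ as inclusive of both ends is an assumption.

```python
from datetime import date


def current_age(date_of_birth, today):
    """Age in whole years on the day the search is executed."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def age_matches(age, operator, value, upper=None):
    """Evaluate one Patient Age condition against a patient's current age."""
    if operator == "less than":
        return age < value
    if operator == "equals":
        return age == value
    if operator == "is greater than":
        return age > value
    if operator == "is between":
        # Inclusive bounds are an assumption, not documented behavior.
        return value <= age <= upper
    raise ValueError(f"unknown operator: {operator}")


# A patient born in June 1950 has not yet had a 2007 birthday on 1 April 2007.
age = current_age(date(1950, 6, 15), date(2007, 4, 1))
print(age, age_matches(age, "is between", 50, 60))
```

Because the age is computed from the execution date, the same saved search can return a different cohort when it is run again later.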
To execute a cohort search when using a typed text/numeric entry field, such as Patient Age, press the TAB key on your keyboard when finished setting up the condition. This causes the patient cohort search to re-execute. Note also that each search condition includes a Trash Can icon to its right. Clicking this icon removes that condition from the cohort query. Clicking the Trash Can icon in the top left corner of SECTION B removes ALL of the conditions from the current cohort search. This is not undoable, so be careful.
Patient Gender can be either ‘Male’ or ‘Female’.
Patient Ethnicity can be ‘Hispanic or Latino’ or ‘Not Hispanic or Latino’, designators used by SUMC hospitals.

Patient Race can be ‘Asian’, ‘Black’, ‘Hawaiian or Pacific Islander’, ‘Native American’ or ‘White’ (designators used by SUMC hospitals) and introduces the concept of OR’ing search criteria to create a compound condition. For example, adding ‘Patient Race’ to the cohort search and selecting ‘Native American’ from the popup menu results in:

You will now see a small green icon, displaying a plus sign, to the right of the condition. Clicking on this icon allows you to add an additional criterion for that condition, as in:

These two criteria will be logically OR’d to create a new, compound condition. In effect, you are specifying that patients must be EITHER ‘Native American’ OR ‘Hawaiian or Pacific Islander’ to be included in the patient cohort.
You can remove a criterion in a compound condition by clicking the small orange icon to the right of the criterion. You can add an additional criterion to the condition by again clicking the small green icon to the right of the topmost criterion.
Because conditions such as Current Age, Gender, Race, Ethnicity and Data Source apply to almost all patients, setting up a condition using these criteria does not return a patient count but instead shows ‘N/A’ where the patient count would normally appear to the right of the condition. Note that the order of criteria in a compound condition does not alter the results.
DIAGNOSIS: The International Classification of Diseases (ICD) is widely used to code patient diagnoses for reimbursement, statistical, administrative and clinical needs. SUMC uses trained coding personnel to review patient charts and abstract ICD9-CM codes following inpatient care. While neither ICD itself nor the human coding process is perfect, ICD codes are the most widely used system for capturing patient diagnoses in a standardized way. One advantage of ICD coding is that, at least theoretically, all patients that share a diagnosis are coded in the same way. ICD uses a shallow hierarchy of codes to represent a general diagnosis and more specific variants. For example Primary Pancreatic Cancer is represented in ICD as follows:
- Malignant Neoplasm of Pancreas (157)
- Malignant Neoplasm of Head of Pancreas (157.0)
- Malignant Neoplasm of Body of Pancreas (157.1)
- Malignant Neoplasm of Tail of Pancreas (157.2)
- Malignant Neoplasm of Pancreatic Duct (157.3)
- Malignant Neoplasm of Islets of Langerhans (157.4)
- Malignant Neoplasm of Other Specified Sites of Pancreas (157.8)
- Malignant Neoplasm of Pancreas, Part Unspecified (157.9)
The STRIDE Anonymous Patient Cohort Discovery Tool supports the use of ICD-coded diagnoses as search conditions. Drag the Diagnosis criterion to Section B and you will see:

Click in the text entry area of the condition and type either an ICD code (e.g. 157.) or an ICD term (e.g. Pancreatic Cancer). As you type, the application will begin searching for matching ICD terms and display them in a popup menu, as follows:

A lot is going on here behind the scenes. STRIDE will look at the text (or code) that you enter and attempt to interpret it, producing a list of suggested ICD codes that you can choose from. It supports the use of synonymy, so that entering “breast cancer” finds “breast neoplasms”. Word order and case are ignored, so that “breast cancer” and “CANCER BREAST” are equivalent. STRIDE will also attempt to display the suggested ICD codes with the most general code at the top of the list.
While searching for ICD codes, the text entry area will display a magnifying glass to the right. If unable to find a match for what is typed, no results are returned.
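One way to picture a lookup that ignores word order and case and understands synonyms is to reduce both the typed text and each ICD term to a set of normalized words. The Python sketch below is a simplified illustration and not the STRIDE lookup itself: the synonym table, the function names and the sample terms are all hypothetical.

```python
# Hypothetical synonym table mapping variants to one canonical word.
SYNONYMS = {"cancer": "neoplasm", "cancers": "neoplasm", "neoplasms": "neoplasm"}


def normalize(text):
    """Lower-case the text, map synonyms, and discard word order."""
    return frozenset(SYNONYMS.get(word, word) for word in text.lower().split())


def suggest(query, icd_terms):
    """Return the ICD terms that contain every normalized query word."""
    wanted = normalize(query)
    return [term for term in icd_terms if wanted <= normalize(term)]


terms = ["Breast neoplasms", "Lung neoplasms"]
print(suggest("CANCER BREAST", terms))
```

Using a set makes “breast cancer” and “CANCER BREAST” compare as equal, and the synonym mapping lets “cancer” match a term that says “neoplasms”.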
Click on an ICD code displayed in the popup list (or press the Enter or Return key) to select it. Selecting a general ICD code (e.g. Malignant neoplasm of the pancreas) instructs STRIDE to include patients whose disease was coded with that ICD code or ANY of its more specific (children) codes. To be precise, selecting ICD code 157 Malignant Neoplasm of the Pancreas will instruct STRIDE to also search for 157.0, 157.1, 157.2, 157.3, 157.4 etc. Unless you are sure that you only want to include patients with a very specific diagnosis (e.g. 157.3 Malignant neoplasm of the pancreatic duct), it is often a good idea to select the more general (or parent) ICD code, as this will find all child codes for the disease.
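The parent and child relationship between ICD codes can be sketched as a prefix match on the code string. This Python example is a minimal illustration under that assumption; the function name is hypothetical and the tool's real expansion logic is not described in this document.

```python
# ICD9 codes for primary pancreatic cancer, plus 158 (a neighboring
# category) to show that unrelated codes are not picked up.
KNOWN_CODES = ["157", "157.0", "157.1", "157.2", "157.3", "157.4",
               "157.8", "157.9", "158"]


def expand_code(selected, known_codes):
    """Return the selected code together with all of its more specific codes."""
    # A three-digit category such as "157" has children "157.x";
    # a code that already contains a decimal point has longer children.
    prefix = selected if "." in selected else selected + "."
    return [code for code in known_codes
            if code == selected or code.startswith(prefix)]


print(expand_code("157", KNOWN_CODES))    # the general code and its children
print(expand_code("157.3", KNOWN_CODES))  # a specific code matches only itself
```

Selecting the general code therefore widens the search, while selecting a specific code restricts it to patients coded with exactly that diagnosis.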
The small green ‘plus’ icon to the right of the condition means that this condition supports OR’d searches. This allows you to include multiple ICD codes in the condition. For example to search for patients who had Malignant neoplasm of the pancreatic duct OR Malignant neoplasm of the islets of Langerhans, one would use the OR feature to produce:

You can also use the OR search feature to include patients who have at least one of a number of diagnoses. For example if you are interested in patients with smoking-related cancers, one might create the following condition:

One can also AND search conditions. For example to determine how many patients have both ICD-coded diagnoses of Ulcerative Colitis and Colon Cancer, insert two separate diagnosis conditions, as follows:

This instructs STRIDE to include only patients with BOTH diagnoses. You will also note that STRIDE returns the number of patients meeting each criterion in a condition and (in the right lower corner of Section B), the number of patients in the cohort. This automatically updates as you add to or modify the search conditions.
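The difference between OR’d criteria inside one condition and AND’d conditions in a cohort search corresponds to set union and set intersection. The Python sketch below illustrates this with made-up patient numbers; the function names and data are hypothetical and no real patient data is involved.

```python
def or_condition(*criteria):
    """Patients matching ANY criterion in one compound condition (union)."""
    patients = set()
    for matches in criteria:
        patients |= matches
    return patients


def and_cohort(*conditions):
    """Patients matching ALL conditions in the search (intersection)."""
    if not conditions:
        return set()
    cohort = set(conditions[0])
    for matches in conditions[1:]:
        cohort &= matches
    return cohort


# Made-up patient numbers for illustration only.
pancreatic_duct = {1, 2}
islets_of_langerhans = {2, 3}
ulcerative_colitis = {3, 4, 5}
colon_cancer = {4, 5, 6}

print(or_condition(pancreatic_duct, islets_of_langerhans))
print(and_cohort(ulcerative_colitis, colon_cancer))
```

Adding a criterion to a compound condition can only keep or enlarge the set of matching patients, while adding a separate condition can only keep or shrink the cohort.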
The STRIDE Anonymous Patient Cohort Discovery Tool also allows you to search for ICD ‘E-codes’ and ‘V-codes’. Though not strictly diagnoses, these codes logically fit into the general model of using disease states (including injuries, accidents, poisonings, drug adverse effects, medical and surgical misadventures) to define patient cohorts.
ICD9-CM E-codes (external causes of injury and poisoning codes) are intended to provide data for injury research and evaluation of injury prevention strategies. E-codes capture how the injury or poisoning happened (cause), the intent (unintentional or accidental; or intentional, such as suicide or assault), and the place where the event occurred.
ICD-9-CM provides V-codes to deal with encounters for circumstances other than a disease or injury. The Supplementary Classification of Factors Influencing Health Status and Contact with Health Services (V01.0 - V84.8) is provided to deal with occasions when circumstances other than a disease or injury (codes 001-999) are recorded as a diagnosis or problem.
A good source of ICD-9-CM information and downloadable code references is:
http://www.cdc.gov/nchs/about/otheract/icd9/abticd9.htm
A comprehensive guide to ICD-9 is available at:
http://www.cdc.gov/nchs/data/icd9/icdguide.pdf
PROCEDURES: Medical and surgical procedures performed on patients are coded using ICD and/or CPT (Current Procedural Terminology – see below) codes. In general, inpatient procedures performed at SUMC are coded using ICD, while many outpatient procedures are coded using CPT. The STRIDE Anonymous Patient Cohort Discovery Tool supports integrated searching of both inpatient and outpatient ICD and CPT coded procedures, using the Procedure condition. This operates in much the same way as the Diagnosis condition, supporting ‘search as you type’, intelligent lookup of ICD and CPT codes, and the ability to OR as well as AND procedure codes.

A major difference is that procedure code lookups may return a mixture of ICD and CPT procedure codes. There is no attempt to eliminate ICD and CPT codes that represent essentially the same procedure. We are working on more elegant ways to address this issue, but for now it does allow searching for procedures without having to distinguish between inpatient and outpatient procedures. When using this condition you may wish to OR equivalent ICD and CPT codes together in a condition to ensure that you include patients who had a procedure performed as either an inpatient or an outpatient, e.g.:

CPT (Current Procedural Terminology) codes are categorized into three groups:
Category I CPT codes describe a procedure or service identified with a five-digit CPT code (e.g. 29870) and descriptor nomenclature (Arthroscopy, knee, diagnostic, with or without synovial biopsy). The inclusion of a descriptor and its associated specific five-digit identifying code number in this category of CPT codes is generally based upon the procedure being consistent with contemporary medical practice and being performed by many physicians in clinical practice in multiple locations.
Category II CPT codes are intended to facilitate data collection by coding certain services and/or test results that are agreed upon as contributing to positive health outcomes and quality patient care. This category of CPT codes is a set of optional tracking codes for performance measurement. These codes may be services that are typically included in an Evaluation and Management (E/M) service or other component part of a service and are not appropriate for Category I CPT codes.
Category III CPT codes contain a temporary set of tracking codes for new and emerging technologies. Category III CPT codes are intended to facilitate data collection on and assessment of new services and procedures. These codes are intended to be used for data collection purposes to substantiate widespread usage or in the FDA approval process.
CLINICAL DOCUMENTS:
The STRIDE Anonymous Patient Cohort Discovery Tool allows searching inside clinical documents for words or phrases, as part of a patient cohort search. If the text entered contains more than one word, a popup menu appears that allows searching for either (a) documents in which the component words are ‘Near Each Other’, independent of word order, or (b) documents that contain the individual words anywhere ‘In Same Document’. In general you should choose the default ‘Near Each Other’ option rather than the ‘In Same Document’ option. You can OR multiple document searches using the small green ‘plus’ icon to the right of the condition.

As an example, if searching within ‘Other documents’ for ‘Abdominal Pain’ a document containing these two sentences:
“The patient presented with acute onset left-sided chest pain and shortness of breath. Exam revealed no abdominal abnormalities but absent breath sounds over the left lung.”
would (appropriately) not be included if the ‘Near Each Other’ option was chosen but would be included if the ‘In Same Document’ option was chosen.
Word order, case and punctuation are ignored when searching inside documents.
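The two document search options can be sketched in Python as follows. This is a simplified illustration, not the tool's search engine: the function names are hypothetical, and the proximity window of three words is an assumption, since this document does not state how close ‘Near Each Other’ words must be.

```python
import re
from itertools import product


def tokenize(text):
    """Split text into lower-case words, ignoring case and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def in_same_document(document, phrase):
    """True if every word of the phrase occurs somewhere in the document."""
    words = set(tokenize(document))
    return all(word in words for word in tokenize(phrase))


def near_each_other(document, phrase, window=3):
    """True if all phrase words occur within `window` positions of each other.

    The window size is an assumed value chosen for illustration.
    """
    tokens = tokenize(document)
    positions = []
    for word in tokenize(phrase):
        hits = [i for i, token in enumerate(tokens) if token == word]
        if not hits:
            return False
        positions.append(hits)
    if not positions:
        return False
    # Try every combination of one position per word, in any order.
    return any(max(combo) - min(combo) <= window for combo in product(*positions))


report = ("The patient presented with acute onset left-sided chest pain and "
          "shortness of breath. Exam revealed no abdominal abnormalities but "
          "absent breath sounds over the left lung.")
print(near_each_other(report, "Abdominal Pain"))
print(in_same_document(report, "Abdominal Pain"))
```

In the sample report, ‘pain’ and ‘abdominal’ both occur but are eight words apart, so only the ‘In Same Document’ style of search matches it.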
Combinations of document searches can be useful. For example, the following example searches for surgical pathology reports containing the phrase “skin biopsy” AND the phrase “melanoma”:

The following example searches for radiology reports for ‘abdominal CT’ and ‘aortic aneurysm’:

The utility of searching within clinical documents is limited by the absence of contextual information. A set of documents may contain the phrase ‘Myocardial Infarction’, but it could appear in many different contexts (e.g. ‘a history of myocardial infarction’, ‘Father died of myocardial infarction’, ‘Rule-out myocardial infarction’, ‘Patient has no history of myocardial infarction’ etc.). In addition, this search would not include a document that states ‘The patient says that he had a heart attack two months ago’. We are working on strategies to address some of these issues.
LABORATORY TEST RESULTS: The STRIDE Anonymous Patient Cohort Discovery Tool supports using selected laboratory test results as conditions in cohort searches. We have included a subset of the laboratory tests contained in the beta version of the STRIDE CDW, to evaluate the utility and functionality of using test results as criteria for including patients in cohorts.
In addition to searching by specific criteria (e.g. a value of less than or greater than an entered number), the Cohort Search Tool also allows you to use the existence (‘Exists’) of a specific laboratory test in a patient’s STRIDE CDW record to include that patient in a cohort. For example, if interested in patients with a diagnosis of Diabetic Ketoacidosis who have (at least one) Anion Gap measurement then the following condition would be used:

SAVING AND LOADING COHORT QUERIES: You can save a STRIDE Anonymous Patient Cohort Discovery Tool query to your computer, for later use, by clicking on the ‘Save Query’ icon, which is the leftmost icon of the two orange folder icons at the top right of the application window. Previously saved queries can be loaded and then executed, by clicking the ‘Open Query’ icon to its right.

COHORT STATISTICS: Once a cohort has been identified using one or more conditions, you can view graphical statistics on the age, gender, ethnicity and county distribution of the cohort by clicking on the ‘Graph Demographics’ button in SECTION C of the application window.

Protecting Patient Privacy
In addition to never revealing individual patient identifiers or data, the STRIDE Anonymous Patient Cohort Discovery Tool uses a number of strategies to prevent “triangulation” of data that might identify an individual patient. As a consequence, total cohort sizes of less than ten patients are reported as “<10 Patients”, and individual categories in criteria results and demographics graphs are reported in increments of 5, with a small random “fuzzy” rounding factor added to each search result to further prevent triangulation.
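The reporting rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual algorithm: the function names are hypothetical, the size of the random factor (at most 2 patients) and the order of operations (add the random factor, then round to the nearest 5) are guesses, and applying the same rounding to the total cohort size is also an assumption.

```python
import random


def fuzz_count(true_count, rng, max_jitter=2):
    """Add a small random offset, then round to the nearest multiple of 5."""
    jittered = max(0, true_count + rng.randint(-max_jitter, max_jitter))
    return 5 * ((jittered + 2) // 5)


def report_cohort_size(true_count, rng):
    """Suppress small cohorts; otherwise report a fuzzed, rounded size."""
    if true_count < 10:
        return "<10 Patients"
    return f"{fuzz_count(true_count, rng)} Patients"


rng = random.Random()
print(report_cohort_size(7, rng))    # small cohorts are never shown exactly
print(report_cohort_size(523, rng))  # a multiple of 5 close to the true size
```

Because the reported number is both rounded and randomly perturbed, repeating a search with slightly different criteria does not reveal whether a single patient was added to or removed from the cohort.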
TERMS OF USE
In agreeing to participate in the April-June 2007 beta test of the STRIDE Anonymous Patient Cohort Discovery Tool, participants acknowledge that:
This is a pre-release version of the system. Data quality and completeness have not yet been fully evaluated. Not all clinical data received from SUMC hospitals is included in the database searched during the beta test. As such, you should not rely on the current beta version of the system for research purposes.
This system was not designed, nor is it intended, to support any aspect of patient care and its use for any patient care purposes is prohibited.
The results returned by the beta version of this system are for internal evaluation purposes only and should not be distributed outside of Stanford University Medical Center.
All searches executed within the system during the beta test are recorded and will be examined, as part of the system evaluation. The identity of the user is recorded along with information related to each search executed.
No person registered as a beta test user of the system will share his/her login information with any other person, for purposes of assessing this system. Only registered users who are SUMC faculty or Senior Research Staff, with Stanford Principal Investigator privileges, may use the system during the beta test period.
GETTING HELP: To get assistance with the STRIDE Anonymous Patient Cohort Discovery Tool you can call the IRT Help Desk at 5-8000 or send an email to stride-beta@med.stanford.edu. You can also view an on-line version of this document by clicking on the help icon in the top right corner of the application window.