Generation of a Public Base for Evaluation of Persistence Mechanisms of Electronic Health Records Systems Based on the openEHR Foundation Specifications

Abstract: This project aims to assemble and make publicly available a database generated according to the specifications of the openEHR Foundation, for use in studies that evaluate the persistence mechanisms of openEHR-based systems. A set of Electronic Health Records (EHRs), based on the openEHR reference model, can be generated from publicly available data in order to simulate an EHR system in production. The data consist of the APACs and AIHs made publicly available by DATASUS, without patient identification, from 2008 to 2012. Archetypes and templates were created to convert data from the DATASUS .dbf files into pseudo-EHRs, generated according to the openEHR Foundation specifications.

Team: Sergio Miranda Freire, Douglas Teodoro, Mario João Junior

Approved by the ethics committee of the Pedro Ernesto University Hospital, process number CAAE 39418314.9.0000.5259


Instructions for Installing the Database

1) Install PostgreSQL version 8.4.13 or higher

2) Run the database creation script. The example below runs the script as user postgres. Before running it, set the script variables LC_COLLATE and LC_CTYPE to match your locale.
psql -U postgres -f orbdaCreateDB.sql

3) Run the ORBDA database table creation script:
psql -U postgres -d orbda -f orbdaCreateTables.sql

4) Load the data into the database. Edit the script so that it points to the data set you downloaded:
psql -U postgres -d orbda -f importingFiles.sql
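
Steps 2 to 4 can also be chained so that a failure in one step stops the rest. The following is only a minimal sketch: the function name install_orbda and the PSQL override are illustrative and not part of the project, and it assumes the three scripts are in the current directory and already edited as described above.

```shell
#!/bin/sh
# Hypothetical helper: run installation steps 2-4 in order.
# Each command runs only if the previous one succeeded.
# PSQL can be overridden (for example PSQL=echo) to preview the commands.
PSQL=${PSQL:-psql}

install_orbda() {
    "$PSQL" -U postgres -f orbdaCreateDB.sql &&
    "$PSQL" -U postgres -d orbda -f orbdaCreateTables.sql &&
    "$PSQL" -U postgres -d orbda -f importingFiles.sql
}
```

Calling install_orbda after sourcing the file runs the three scripts as user postgres.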

Instructions to Run the Conversion Program for openEHR

The input data should be stored in a relational database and a file with the patient ids should be provided as input.
Patient ids for the whole database can be found in two files:

For subsets of the database, the two patient id files must be generated by extracting the corresponding subsets of the full id files.
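
How the subset is chosen depends on the study. As a minimal sketch, assuming each id file holds one patient id per line (an assumption, not stated by the project), the function below keeps the first N ids of a full id file. The name make_id_subset and the subset file name in the usage line are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: copy the first N ids of a full patient id file
# into a new subset file. Assumes one patient id per line.
make_id_subset() {
    full_ids=$1   # path to the full patient id file
    count=$2      # number of ids to keep
    subset=$3     # path of the subset file to write
    head -n "$count" "$full_ids" > "$subset"
}
```

For example, make_id_subset ./patientIds/apac_cnspcn.txt 1000 ./patientIds/apac_subset.txt would write the first 1000 ids to a new file, which can then be passed to the converter.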

The database connection parameters should be set in the file.

Extract the terminology.tar.gz file in “project-dir/repository/terminologies”.


The converter syntax is the following:

java -Xmx8g -Dfile.encoding=UTF-8 -cp projectdir/bin/uber-sus-openehr-builder-1.0.1-SNAPSHOT.jar br.uerj.lampada.openehr.susbuilder.EHRGenerator --patients patientids --ehr-dir outputdir --type < ehr|version|composition > --format < json|xml > [--aih]

This program converts data from a relational database to the openEHR format. It requires at least 8 GB of heap space (hence the -Xmx8g option).


  • --patients patientids

The file of patient ids to be used in the conversion.

  • --ehr-dir outputdir

The output files are written to the “ehr”, “ehrAccess”, “ehrStatus”, “contribution” and “composition” directories inside the specified outputdir.

  • --format < json|xml >

Specifies the output format. It can be either json or xml.

  • --type < ehr|version|composition >

Specifies the output type. It can be either ehr, version or composition.

  • --aih

The argument --aih is used when hospitalization data are to be converted to the openEHR format. Omit this argument when the data to be converted come from outpatient procedures.


From the project folder, run one of the following two commands:

For outpatients:
java -Xmx8g -Dfile.encoding=UTF-8 -jar "sus-openeher-builder.jar" --type ehr --format xml --patients ./patientIds/apac_cnspcn.txt --ehr-dir ./ehr

For hospitalized patients:
java -Xmx8g -Dfile.encoding=UTF-8 -jar "sus-openeher-builder.jar" --aih --type ehr --format xml --patients ./patientIds/aih_n_aih.txt --ehr-dir ./ehr

Alternatively, use the batch file, which starts one parallel job for each patient id file in the patient_dir directory.

Syntax: -< composition|version|ehr > -< json|xml > -< aih|apac > patient_dir output_dir


./ -composition -json -aih aih_dir aih_comp_output_dir
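
The batch file itself is not reproduced here. As a rough sketch of the idea only, and not the project's actual script, the function below starts one converter job per patient id file in the background and then waits for all of them. The function name convert_all and the JAVA and JAR overrides are illustrative. The jar name is copied from the commands above, and the type and format are fixed to composition and json to mirror the example invocation.

```shell
#!/bin/sh
# Hypothetical sketch: one converter job per patient id file, run in parallel.
# JAVA can be overridden (for example JAVA=echo) to preview the commands.
JAVA=${JAVA:-java}
JAR=${JAR:-sus-openeher-builder.jar}   # jar name as written in the commands above

convert_all() {
    patient_dir=$1   # directory holding the patient id files
    output_dir=$2    # directory passed to --ehr-dir
    shift 2          # any remaining arguments go to the converter, e.g. --aih
    for ids in "$patient_dir"/*; do
        [ -f "$ids" ] || continue   # skip anything that is not a regular file
        "$JAVA" -Xmx8g -Dfile.encoding=UTF-8 -jar "$JAR" "$@" \
            --type composition --format json \
            --patients "$ids" --ehr-dir "$output_dir" &
    done
    wait   # block until every background job has finished
}
```

For example, convert_all aih_dir aih_comp_output_dir --aih would convert every id file found in aih_dir. Each job is started with -Xmx8g, so the memory needed grows with the number of id files processed at once.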