Data & Informatics Portals
The CBTTC uses a variety of web-based applications to manage the collection of specimen, clinical and genomic data. Much as a smartphone app connects users with programs and information, the software platforms used by the CBTTC let researchers view and analyze the data most relevant to their research. These applications communicate through Application Programming Interfaces (APIs), which allow large amounts of data to be encrypted and shared across applications and networks. The encryption process ensures that subject identification and personal information are protected at all times. These APIs allow researchers to access data and collaborate in real time from anywhere in the world. All of the software used in the CBTTC is open-source (with the exception of the commercial Laboratory Information Management System used at CHOP) and is available on GitHub. For additional information about the informatics architecture of the CBTTC, see About The CBTTC Informatics Infrastructure below.
Cavatica is a cloud-based portal environment developed by the CBTTC to securely store, share and analyze large volumes of pediatric brain tumor genomics data and to facilitate collaboration in translational tumor research. Named for the spider in the popular children’s story Charlotte’s Web, Cavatica allows researchers and investigators to access and share a network of data, pipelines, algorithms, visualizations, and hypotheses about specific types of tumors. This online ecosystem acts as a hub for platforms including:
Harvest – Biorepository & specimen query tool
PedcBioPortal – Cancer visualization application
Data storage – Amazon’s Simple Storage Service (S3 buckets)
Data processing – Seven Bridges Genomics
Cavatica helps to solve the challenges faced by researchers working with “big data” by allowing large sets of information to be stored in a cloud environment. Access controls and protocols let users choose whether to share projects with private groups or within a common working space available to the entire scientific community. The platform is constantly evolving to keep pace with the needs of cancer researchers as technology allows for new scientific breakthroughs. Cavatica will include data from a number of sources, including the Children’s Brain Tumor Tissue Consortium (CBTTC), Pacific Pediatric Neuro-Oncology Consortium (PNOC), Stand Up to Cancer, TARGET (Therapeutically Applicable Research to Generate Effective Treatments), and TCGA (The Cancer Genome Atlas).
Scalable storage space within the cloud eliminates the need to download large volumes of data, allowing researchers to view and interact with only the data they need in real time. This framework supports continued collaboration among data scientists, statisticians, data engineers, programmers, application developers, bioinformaticians, scientists, postdoctoral researchers, and principal investigators. Following its launch in October 2016, Cavatica will become the largest clinically annotated pediatric cancer database in the world.
Harvest is an open-source data and specimen inventory tool that allows researchers to search and sort the CBTTC’s specimen data by field, including demographics, medical history, diagnosis detail, specimen inventory and genomic data availability. The CBTTC uses a custom version of the Harvest platform, which was developed by the Department of Biomedical and Health Informatics at Children’s Hospital of Philadelphia.
Harvest users can build or select their own cohorts, which allows them to search within the biorepository to identify all of the samples with certain characteristics. After a search is performed, the results are displayed in a detailed report covering each subject, their diagnoses and the availability of their specimens. Users can also browse an individual subject’s clinical data, including clinician notes, diagnosis information and pathology reports. Additionally, the Harvest platform links this data to other genomic analysis platforms, including PedcBioPortal.
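The cohort-building idea described above can be illustrated with a minimal Python sketch. This is not Harvest's actual code or API: the `Specimen` record, its field names, the sample values, and the `build_cohort` helper are all hypothetical, chosen only to show how selecting samples "with certain characteristics" amounts to filtering records on field values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    """Hypothetical de-identified inventory record (field names are assumptions)."""
    subject_id: str
    diagnosis: str
    tissue_available: bool
    has_genomic_data: bool


def build_cohort(specimens, **criteria):
    """Return the specimens whose fields equal every given criterion."""
    return [
        s for s in specimens
        if all(getattr(s, field) == value for field, value in criteria.items())
    ]


# Made-up example inventory, for illustration only.
INVENTORY = [
    Specimen("S-001", "Medulloblastoma", True, True),
    Specimen("S-002", "Ependymoma", True, False),
    Specimen("S-003", "Medulloblastoma", False, True),
    Specimen("S-004", "Medulloblastoma", True, False),
]

cohort = build_cohort(INVENTORY, diagnosis="Medulloblastoma", tissue_available=True)
cohort_ids = [s.subject_id for s in cohort]
```

In this sketch each keyword argument narrows the cohort further, which mirrors the way a user might stack search fields in a query tool.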
On the back end, Harvest Stack is an open-source, BSD-licensed toolkit for building web applications that integrate, discover, and report data. To access the Harvest platform, follow the link at https://eig.research.chop.edu/cbttc.
The Biorepository Portal and Honest Broker: Operational Technology
A foundational mission of the CBTTC is to protect subject, participant and family privacy. Subjects’ privacy is protected through the use of an electronic Honest Broker (eHB), a platform that de-identifies subject information while regulating how the data is distributed. The Biorepository Toolkit Project, another open-source software development project, provides a single unified interface for data collection and analysis.
All subject identifiers are kept in a secure, encrypted database that is accessible only to the operations team and the honest broker software. This information cannot be retrieved unless a credentialed user at the subject’s home consortium site makes a request. All information is handled in compliance with HIPAA policies and personal identification disclosure guidelines. Once the data is de-identified, it is accessible through the Biorepository Portal (BRP), which allows sites around the globe to view and analyze subject data or request additional specimens.
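As a rough illustration of the honest broker pattern described above, the following Python sketch swaps a subject identifier for an opaque research ID and releases the original only to a credentialed user at the subject's home site. It is an assumption-laden toy, not the eHB's implementation: the class name, method names, and in-memory dictionaries are hypothetical, and a real system would keep the mapping in an encrypted database.

```python
import uuid


class HonestBrokerSketch:
    """Toy honest broker: hypothetical names, in-memory storage only."""

    def __init__(self):
        # Stand-ins for the secure, encrypted identifier database.
        self._by_research_id = {}   # research_id -> (identifier, home_site)
        self._by_identity = {}      # (identifier, home_site) -> research_id

    def deidentify(self, identifier, home_site):
        """Return a stable opaque ID that reveals nothing about the subject."""
        key = (identifier, home_site)
        if key not in self._by_identity:
            research_id = uuid.uuid4().hex
            self._by_identity[key] = research_id
            self._by_research_id[research_id] = key
        return self._by_identity[key]

    def reidentify(self, research_id, requester_site, credentialed):
        """Release the identifier only to a credentialed user at the home site."""
        identifier, home_site = self._by_research_id[research_id]
        if not credentialed or requester_site != home_site:
            raise PermissionError("requester may not re-identify this subject")
        return identifier


broker = HonestBrokerSketch()
research_id = broker.deidentify("MRN-12345", "Site A")  # made-up identifier
```

Downstream tools would see only `research_id`, so data can be shared across sites without exposing the underlying identifier.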
CBTTC Data Specimen and Inventory Portal
The CBTTC Data Specimen and Inventory portal allows researchers to query and request de-identified data, clinical information and/or specimens for research purposes. The user-friendly platform is accessible via the web 24 hours a day, 7 days a week. Functionalities include reports and graphics for data usability. Designed specifically for biomedical data and built on the open-source Harvest platform, the tool was developed and is managed by the Department of Biomedical and Health Informatics (DBHi) at The Children’s Hospital of Philadelphia.
The PedcBioPortal for Cancer Genomics provides visualization, analytics, and download of large-scale cancer genomics data sets to assist researchers in understanding the molecular mechanisms of cancer and defining actionable targets.
The PedcBioPortal is an open-access resource that supports visualizations and analytics on multidimensional cancer genomics datasets. It builds upon the original cBioPortal, developed at Memorial Sloan Kettering, which was instrumental in allowing investigators and researchers to rapidly explore TCGA data with detailed plots and summary statistics. The PedcBioPortal houses the TCGA and many large adult cancer studies, but focuses on bringing in high-quality pediatric cancer datasets. In addition, it offers novel visualizations and links to other applications that support team-based translational cancer research and break down data silos. Overall, this framework fits uniquely within the CBTTC applications ecosystem to empower researchers to translate genomics data into biological insights and clinical applications.
About The CBTTC Informatics Infrastructure
CHOP CBTTC Informatics
The CBTTC Biorepository Portal (BRP), developed by the Enterprise Informatics Group (EiG) of CHOP’s Department of Biomedical and Health Informatics, facilitates secure data capture, integration, and specimen management while maintaining participant privacy. The Data and Specimen Inventory tool, used to query and request de-identified data and specimens and to promote quality control and scientific discovery, is built on the Harvest platform. Other tools integrated into the CBTTC informatics infrastructure include the Electronic Honest Broker (eHB), REDCap data capture, and the Nautilus laboratory information management system (LIMS).
CHOP Center for Biomedical Informatics Enterprise Informatics Group (EiG)
The Enterprise Informatics Group (EiG) at DBHi works to improve data quality and integrity, facilitate the re-use of research data and materials, and promote better research by supporting data and materials discovery. The EiG also unifies and standardizes research data and makes it interoperable across institutions. DBHi’s commitment to data transparency promotes efficiencies in resource utilization and minimizes compliance and security risks.
CBTTC Data Quality Assurance
Because data is always flowing into the CBTTC, quality control must also be constant. The CBTTC’s data quality is maintained through an online portal that research coordinators use to track and solve data quality issues such as entry errors, omissions, formatting errors, and incongruous data across clinical and specimen datasets. On a nightly basis, the entire dataset is checked and workloads are made available for research coordinators to view and solve. This system also allows us to track quality and overall metrics over time to ensure our accuracy is maintained as the CBTTC expands.
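The kinds of checks named above (omissions, formatting errors, and incongruous data across clinical and specimen datasets) could be sketched in Python roughly as follows. The record layout, field names, date format, and the `find_issues` function are assumptions made for illustration; they do not describe the CBTTC's actual quality-assurance code.

```python
import re

# Assumed convention for this sketch: dates are stored as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_issues(clinical, specimens):
    """Return (subject_id, description) pairs for a coordinator to review.

    clinical:  dict mapping subject_id -> clinical record (dict)
    specimens: list of specimen records, each with a "subject_id" key
    """
    issues = []
    for subject_id, record in clinical.items():
        if not record.get("diagnosis"):
            issues.append((subject_id, "omission: diagnosis is missing"))
        date = record.get("diagnosis_date", "")
        if date and not DATE_PATTERN.fullmatch(date):
            issues.append((subject_id, "format: diagnosis_date is not YYYY-MM-DD"))
    for specimen in specimens:
        if specimen["subject_id"] not in clinical:
            issues.append(
                (specimen["subject_id"], "incongruous: specimen has no clinical record")
            )
    return issues


# Made-up records, for illustration only.
clinical_records = {
    "S-001": {"diagnosis": "Ependymoma", "diagnosis_date": "2016-10-01"},
    "S-002": {"diagnosis": "", "diagnosis_date": "10/01/2016"},
}
specimen_records = [{"subject_id": "S-001"}, {"subject_id": "S-003"}]

worklist = find_issues(clinical_records, specimen_records)
```

Running a function like this on a schedule and storing the length of the resulting worklist each night would be one simple way to track quality metrics over time.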
CBTTC Software Toolkit
The Biorepository Portal (BRP): The informatics platform for research staff to seamlessly and asynchronously enter data across multiple systems. Note that this tool is only available to research staff. For more information about this open-source project, please see http://www.brptoolkit.com/.
Harvest: A framework created by DBHi for building highly interactive, data-intensive biomedical applications, enabling real-time query and data reporting, including custom data sets for export. The EiG and DBHi built a specific Harvest application to serve the query and specimen request needs of the CBTTC.
Electronic Honest Broker: A secure, non-user-facing software service that works behind the scenes to connect the BRP and other laboratory, clinical and genomics software tools. This tool addresses the onerous process of protecting participant privacy while maintaining highly complex specimen and clinical data. More information can be found in a paper published in the August 2016 Special Issue of BMC Genomics.
REDCap: Data capture and management tool developed by Vanderbilt University.
Nautilus LIMS: Laboratory Information Management System for clinical and translational research, developed in partnership with Thermo Fisher Scientific.