Informatics Portals to Accelerate Discoveries
The CBTTC uses a variety of web-based applications to manage the collection of specimen, clinical, and genomic data for accelerated discovery. Much like an application on a smartphone allows users to connect with programs and information, the software platforms used by the CBTTC allow researchers to view, compute, store, and analyze the data that is most relevant to their research efforts. All of the software used in the CBTTC is open source (with the exception of the commercial Laboratory Information Management System used at CHOP) and is available on GitHub.
The Kids First Data Resource Portal provides access to pediatric disease data previously studied in isolation and empowers discovery efforts by enabling collaborative analyses across institutions and researchers around the globe. Data from approximately 8,000 DNA and RNA samples from children affected with cancer or structural birth defects, and from their families, will be ready for analysis with the launch of the portal. That number is expected to grow to more than 30,000 over the next few years. With the CBTTC Pediatric Brain Tumor Atlas genomic data set and structural birth defects samples accessible through the cloud-based portal, the Kids First Data Resource Portal is one of the largest collections of childhood genomic and clinical data. The portal also provides resources for the patient, medical, and research communities to learn and interact, highlighting the importance of data sharing across institutions and between disease research environments. Kids First DRC GitHub
Cavatica is a cloud-based portal environment developed to securely store, share, and analyze large volumes of pediatric brain tumor genomics data and to facilitate collaboration in translational tumor research. Named for the spider in the popular children’s story Charlotte’s Web, Cavatica allows researchers and investigators to access and share a network of data, pipelines, algorithms, visualizations, and hypotheses about specific types of tumors. This online ecosystem acts as a hub for platforms including:
Cavatica helps to solve the challenges faced by researchers working with “big data” by allowing large sets of information to be stored in a cloud environment. Access controls and protocols let users choose whether to share projects with private groups or within a common working space available to the entire scientific community. The platform is constantly evolving and improving to keep pace with the needs of cancer researchers as technology allows for new scientific breakthroughs. Cavatica will include data from a number of sources, including the Children’s Brain Tumor Tissue Consortium (CBTTC), Pacific Pediatric Neuro-Oncology Consortium (PNOC), Stand Up to Cancer, TARGET (Therapeutically Applicable Research to Generate Effective Treatments), and TCGA (The Cancer Genome Atlas).
Scalable storage space within the cloud eliminates the need to download large volumes of data, allowing researchers to view and interact with only the data they need in real time. This new framework will allow for the continued collaboration of data scientists, statisticians, data engineers, programmers, application developers, bioinformaticians, scientists, postdoctoral researchers, and principal investigators. Following its launch in October 2016, Cavatica will become the largest clinically annotated pediatric cancer database on earth.
The PedcBioPortal for Cancer Genomics provides visualization, analytics, and download of large-scale cancer genomics data sets to assist researchers in understanding the molecular mechanisms of cancer and defining actionable targets. The PedcBioPortal is an open-access resource that supports visualizations and analytics on multidimensional cancer genomics datasets. It builds upon the original cBioPortal, developed at Memorial Sloan Kettering, which was instrumental in allowing investigators and researchers to rapidly explore the TCGA with detailed plots and summary statistics. The PedcBioPortal houses the TCGA and many large adult cancer studies, but focuses on bringing in high-quality pediatric cancer datasets. In addition, it has novel visualizations and links to other applications that support team-based translational cancer research and break down data silos. Overall, this framework fits uniquely within the CBTTC applications ecosystem to empower researchers to translate genomics data into biological insights and clinical applications.
CBTTC Data Specimen and Inventory Portal
The CBTTC Data Specimen and Inventory Portal allows researchers to query and request de-identified data, clinical information, and/or specimens for research purposes. The user-friendly platform is accessible via the web 24 hours a day, 7 days a week. Functionalities include reports and graphics for data usability. Designed specifically for biomedical data and built on the open-source Harvest platform, the tool was developed and is managed by the Department of Biomedical and Health Informatics (DBHi) at The Children’s Hospital of Philadelphia.
About the CBTTC Informatics Infrastructure
The Biorepository Portal and Honest Broker: Operational Technology
A foundational mission of the CBTTC is to protect subject, participant, and family privacy. Subjects’ privacy is assured through the use of an electronic Honest Broker (eHB), a platform that de-identifies subject information while regulating how the data is distributed. The Biorepository Toolkit Project, another open-source software development project, provides a single unified interface for data collection and analysis.
All subject identifiers are kept in a secure, encrypted database that is accessible only to the operations team and the honest broker software. This information cannot be retrieved unless a credentialed user at the subject’s home consortium site requests it. All information complies with HIPAA policies and personal identification disclosure guidelines. Once the data is de-identified, it is accessible through the Biorepository Portal (BRP), allowing sites around the globe to view and analyze subject data or request additional specimens.
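The honest broker pattern described above can be illustrated with a minimal Python sketch. This is not the eHB implementation or its API. The class name, the field names (mrn, name, dob), and the in-memory dictionaries are all hypothetical stand-ins for the secure, encrypted database, chosen only to show the idea: identifiers are swapped for an opaque research ID, and the reverse lookup is refused unless a credentialed user from the subject's home site asks for it.

```python
import secrets


class HonestBroker:
    """Toy in-memory honest broker (hypothetical; not the real eHB API)."""

    def __init__(self):
        # (site, subject identifier) -> opaque research ID
        self._ids = {}
        # research ID -> (site, subject identifier)
        self._reverse = {}

    def research_id(self, site, subject_identifier):
        """Return a stable opaque ID for a subject, creating one if needed."""
        key = (site, subject_identifier)
        if key not in self._ids:
            rid = secrets.token_hex(8)  # random, carries no subject information
            self._ids[key] = rid
            self._reverse[rid] = key
        return self._ids[key]

    def deidentify(self, site, record, identifier_field="mrn",
                   phi_fields=("name", "dob")):
        """Copy a record without identifying fields, keyed by research ID."""
        rid = self.research_id(site, record[identifier_field])
        clean = {
            k: v for k, v in record.items()
            if k != identifier_field and k not in phi_fields
        }
        clean["research_id"] = rid
        return clean

    def reidentify(self, research_id, requester_site, credentialed):
        """Release the identifier only to a credentialed home-site user."""
        site, identifier = self._reverse[research_id]
        if not credentialed or requester_site != site:
            raise PermissionError("requester may not re-identify this subject")
        return identifier


broker = HonestBroker()
record = {"mrn": "12345", "name": "Jane Doe", "dob": "2010-01-01",
          "diagnosis": "ependymoma"}
shared = broker.deidentify("CHOP", record)
```

In this sketch, shared holds only the diagnosis and a research ID, so it could be passed to a portal without exposing the identifier, while the mapping back to the subject stays inside the broker.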
CHOP CBTTC Informatics
The CBTTC Biorepository Portal (BRP), developed by the Enterprise Informatics Group (EiG) of CHOP’s Department of Biomedical and Health Informatics, facilitates secure data capture, integration, and specimen management while maintaining participant privacy. The Data and Specimen Inventory tool, used to query and request de-identified data and specimens and to promote quality control and scientific discovery, is built on the Harvest platform. Other tools integrated into the CBTTC informatics infrastructure include the Electronic Honest Broker (eHB), REDCap data capture, and the Nautilus laboratory information management system (LIMS).
CHOP Center for Biomedical Informatics Enterprise Informatics Group (EiG)
The Enterprise Informatics Group (EiG) at DBHi works to improve data quality and integrity, facilitate the re-use of research data and materials, and promote better research by supporting data and materials discovery. The EiG also unifies and standardizes research data and makes it interoperable across institutions. DBHi’s commitment to data transparency promotes efficiencies in resource utilization and minimizes compliance and security risks.
CBTTC Data Quality Assurance
Because data is always flowing into the CBTTC, quality control must also be constant. The CBTTC’s data quality is maintained through an online portal that research coordinators use to track and solve data quality issues such as entry errors, omissions, formatting errors, and incongruous data across clinical and specimen datasets. On a nightly basis, the entire dataset is checked and workloads are made available for research coordinators to view and solve. This system also allows us to track quality and overall metrics over time to ensure our accuracy is maintained as the CBTTC expands.
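As a rough illustration of the kind of nightly check described above, the following Python sketch scans clinical and specimen records for omissions, formatting errors, and incongruous data across the two datasets. It is a sketch under stated assumptions, not the CBTTC portal's code: the field names, the required-field list, and the ISO date format are all hypothetical.

```python
from datetime import datetime

# Hypothetical list of fields every clinical record must have.
REQUIRED_CLINICAL_FIELDS = ("subject_id", "diagnosis", "diagnosis_date")


def parse_date(value):
    """Return a date for a 'YYYY-MM-DD' string, or None if it is malformed."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def find_issues(clinical_rows, specimen_rows):
    """Return (record_id, issue) pairs for coordinators to review."""
    issues = []
    known_subjects = set()

    for row in clinical_rows:
        sid = row.get("subject_id")
        # Omissions: required fields that are absent or empty.
        for field in REQUIRED_CLINICAL_FIELDS:
            if not row.get(field):
                issues.append((sid, f"missing {field}"))
        # Formatting errors: a date that is present but not parseable.
        if row.get("diagnosis_date") and parse_date(row["diagnosis_date"]) is None:
            issues.append((sid, "malformed diagnosis_date"))
        if sid:
            known_subjects.add(sid)

    # Incongruous data: specimens that point at no clinical record.
    for row in specimen_rows:
        if row.get("subject_id") not in known_subjects:
            issues.append((row.get("specimen_id"),
                           "specimen has no matching clinical record"))
    return issues


clinical = [
    {"subject_id": "S1", "diagnosis": "ependymoma", "diagnosis_date": "2016-10-01"},
    {"subject_id": "S2", "diagnosis": "", "diagnosis_date": "10/01/2016"},
]
specimens = [
    {"specimen_id": "SP1", "subject_id": "S1"},
    {"specimen_id": "SP2", "subject_id": "S9"},
]
issues = find_issues(clinical, specimens)
```

A scheduled job could run a function like this each night and count the returned issues per run, which is one simple way to track quality metrics over time.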
CBTTC Software Toolkit
The Biorepository Portal (BRP): The informatics platform for research staff to seamlessly and asynchronously enter data across multiple systems. Note that this tool is available only to research staff. For more information about this open-source project, see http://www.brptoolkit.com/.
Harvest: A framework created by DBHi for building highly interactive, data-intensive biomedical applications, enabling real-time query and data reporting, including custom data sets for export. The EiG and DBHi built a dedicated Harvest application to serve the query and specimen request needs of the CBTTC.
Electronic Honest Broker: A secure, non-user-facing software service that works behind the scenes to connect the BRP and other laboratory, clinical, and genomics software tools. This tool eases the onerous process of protecting participant privacy while maintaining highly complex specimen and clinical data. More information can be found in a paper published in the August 2016 Special Issue of BMC Genomics.
REDCap: Data capture and management tool developed by Vanderbilt University
Nautilus LIMS: Laboratory Information Management System for clinical and translational research, developed in partnership with Thermo Fisher Scientific
Harvest – Biorepository & specimen query tool
PedcBioPortal – Cancer visualization application
Data storage – Amazon’s Simple Storage Service (S3 buckets)
Data processing – Seven Bridges Genomics