The Data Engineering for Research Unit focuses on the digitization and standardization of data, working closely with various research groups within the institution. In addition to actively participating in research projects, the unit is dedicated to creating and maintaining a solid infrastructure and environment for institutional-level data storage and management.
The unit handles a wide variety of data, including clinical information, genomics, images, and other biomedical sources. Its goal is to maximize the use of these data in research by giving researchers the tools they need. It also supports the definition of Data Management Plans (DMPs), which describe how data are used and how they flow within a project. By developing ETL (Extract, Transform, Load) processes, building structured databases, integrating data from diverse sources, and developing and maintaining specialized software, the unit provides a robust infrastructure that lets researchers make full use of the data they generate.
- Standardization of clinical data using the OMOP (Observational Medical Outcomes Partnership) model.
- Advanced visualization of clinical, genomic, and other ‘omics’ data through platforms such as cBioPortal.
- Integration of databases and platforms to ensure efficient and centralized information management.
- Development of customized electronic case report forms (eCRFs) in REDCap for research data capture and management.
- Implementation of ETL processes to automate data transfer between various sources within the research environment.
- Artificial Intelligence research for automated structuring of unstructured data.
- Development of customized applications for patient and sample tracking in clinical trials, as well as tailored solutions for research projects.
Group lead
Anna Pedrola
Biomedical Data Engineers
Benet Fité
Carlota Gozalbo
Marina Arias
Image Data Engineer
Cristina Villaseca
Data Steward
Clara Vallés
Most relevant scientific publications
- Pedrola A, Franch-Expósito S, Lahoz S, Esteban-Fabró R, Dienstmann R, Bassaganyas L, et al. PCIG: a web-based application to explore immune-genomics interactions across cancer types. Bioinformatics. 2022 Apr 15;38(8):2374–2376.
- Matos I, Villacampa G, Hierro C, Martin-Liberal J, Berché R, Pedrola A, et al. Phase I prognostic online (PIPO): A web tool to improve patient selection for oncology early phase clinical trials. Eur J Cancer. 2021 Sep;155:168–78.
- Cedres S, Assaf JD, Iranzo P, Callejo A, Pardo N, Navarro A, Martinez-Marti A, Marmolejo D, Rezqallah A, Carbonell C, Frigola J, Amat R, Pedrola A, Dienstmann R, Felip E. Efficacy of chemotherapy for malignant pleural mesothelioma according to histology in a real-world cohort. Sci Rep. 2021;11(1):21357.
- Mirallas O, Martin-Cullell B, Navarro V, Vega KS, Recuero-Borau J, Gómez-Puerto D, López-Valbuena D, de Torres CS, Andurell L, Pedrola A, Berché R. Development of a prognostic model to predict 90-day mortality in hospitalised cancer patients (PROMISE tool): a prospective observational study. Lancet Reg Health Eur. 2024.
Projects
- SYNTHIA: Synthetic Data Generation Framework for Integrated Validation of Use Cases and AI Healthcare Applications. Funded by the European Commission. 01/09/2024-31/08/2029. PI: Rodrigo Dienstmann
- DART: Building Data Rich Clinical Trials. CCE_DART is an innovative EU-funded project dedicated to delivering novel methods for the design and implementation of newer, more efficient and effective clinical trials in oncology. CancerCoreEurope: Seven Leading Cancer Centres Improve Cancer Health in Europe.
- Historia Clínica Inteligente: Transformando Notas Clínicas en Datos Estructurados (Intelligent Clinical Record: Transforming Clinical Notes into Structured Data).
- AI4Lungs: AI-Based Personalised Care for Respiratory Disease using Multi-Modal Data in Patient Stratification
- EUCANCan: a federated network of aligned and interoperable infrastructures for the homogeneous analysis, management and sharing of genomic oncology data for Personalized Medicine.
- UpSMART Accelerator Management Team – SMART Experimental Cancer Medicine Trials eNABLED.
- American Association for Cancer Research’s (AACR) Genomics Evidence Neoplasia Information Exchange (GENIE) project.