Data mining

Between 1995 and 1997, in collaboration with researchers at the Hospital "12 de Octubre", the GIB worked on a project to extract predictive rules from the clinical histories of patients with rheumatoid arthritis. To implement this system, artificial neural network, rule induction, and clustering techniques were tested.
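
The source does not say which induction algorithm the project used. As a hedged illustration of what "rule induction" means here, the sketch below implements OneR, one of the simplest rule-induction algorithms: for each attribute it maps every value to its majority class and keeps the attribute whose rules misclassify the fewest training records. The field names (`rf_positive`, `erosions`, `outcome`) and all values are hypothetical and are not study data.

```python
from collections import Counter, defaultdict

def one_r(records, target):
    """OneR rule induction: for each attribute, map every value to its most
    frequent class, then keep the attribute whose rules make the fewest errors."""
    best = None
    attributes = [a for a in records[0] if a != target]
    for attr in attributes:
        # Tally class labels for each observed value of this attribute
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[target]] += 1
        # Each value predicts its majority class (ties go to the first label seen)
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Training errors = records that do not belong to their value's majority class
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical toy records (invented fields and values, not clinical data)
records = [
    {"rf_positive": "yes", "erosions": "yes", "outcome": "poor"},
    {"rf_positive": "yes", "erosions": "no", "outcome": "poor"},
    {"rf_positive": "no", "erosions": "yes", "outcome": "good"},
    {"rf_positive": "no", "erosions": "no", "outcome": "good"},
    {"rf_positive": "yes", "erosions": "yes", "outcome": "good"},
]

attr, rules, errors = one_r(records, "outcome")
print(attr, rules, errors)  # rf_positive {'yes': 'poor', 'no': 'good'} 1
```

The learned rule ("IF rf_positive = yes THEN poor, ELSE good") is directly readable, which is the main appeal of induction methods over neural networks in a clinical setting.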

The system aimed to link clinical data with quality-of-life outcomes. To this end, more than 1,000 records were examined, of which only about 250 were selected because the remaining records were of poor quality. These data-consistency problems later led to the creation of the OntoDataClean project.
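
The source does not state which criteria were used to discard the low-quality records. The sketch below is a minimal, hypothetical example of the kind of completeness and consistency screening involved: a record is kept only if every required field is filled in, its dates are mutually consistent, and its score falls in a plausible range. The field names and the 0-100 quality-of-life scale are assumptions made for illustration.

```python
from datetime import date

# Hypothetical required fields; the source does not state the real criteria.
REQUIRED = ("patient_id", "birth_date", "diagnosis_date", "qol_score")

def is_usable(record):
    """Return True if a record passes basic completeness and consistency checks."""
    # Completeness: every required field must be present and non-empty
    if any(record.get(f) in (None, "") for f in REQUIRED):
        return False
    # Consistency: a diagnosis cannot be dated before the patient's birth
    if record["diagnosis_date"] < record["birth_date"]:
        return False
    # Range check on an assumed 0-100 quality-of-life scale
    return 0 <= record["qol_score"] <= 100

records = [
    {"patient_id": 1, "birth_date": date(1950, 3, 1), "diagnosis_date": date(1990, 6, 1), "qol_score": 62},
    {"patient_id": 2, "birth_date": date(1948, 1, 9), "diagnosis_date": date(1985, 2, 3), "qol_score": None},
    {"patient_id": 3, "birth_date": date(1960, 5, 5), "diagnosis_date": date(1955, 1, 1), "qol_score": 40},
    {"patient_id": 4, "birth_date": date(1939, 7, 7), "diagnosis_date": date(1980, 8, 8), "qol_score": 150},
]

usable = [r for r in records if is_usable(r)]
print([r["patient_id"] for r in usable])  # [1]
```

Records 2, 3, and 4 are rejected for a missing value, an impossible date order, and an out-of-range score, respectively.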

The GIB developed a distributed KDD (knowledge discovery in databases) methodology that applies domain ontologies in the stages preceding data mining. To validate the resulting model experimentally, batteries of queries were run against several heterogeneous data sources. Applying mining algorithms to the query results then gave better final results than mining the data sources separately or integrating them with traditional methods.
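
The source does not describe the mechanics of the ontology-based integration. The sketch below assumes a common pattern: each source's local fields are mapped to concepts of a shared domain ontology, with unit conversion, so that queries and mining algorithms see a single schema. The ontology, the source names (`hospital_a`, `hospital_b`), and every field name are hypothetical.

```python
# Hypothetical shared domain ontology: concept -> canonical unit
ONTOLOGY = {"Age": "years", "BodyWeight": "kg"}

# Hypothetical per-source mappings: local field -> (ontology concept, unit converter)
MAPPINGS = {
    "hospital_a": {
        "edad": ("Age", lambda v: v),
        "peso_kg": ("BodyWeight", lambda v: v),
    },
    "hospital_b": {
        "age_months": ("Age", lambda v: v / 12),
        "weight_lb": ("BodyWeight", lambda v: v * 0.45359237),
    },
}

def to_ontology(source, record):
    """Rewrite one local record in ontology terms and canonical units."""
    unified = {}
    for field, value in record.items():
        mapping = MAPPINGS[source].get(field)
        if mapping is None:
            continue  # fields with no ontology mapping are dropped
        concept, convert = mapping
        if concept not in ONTOLOGY:
            raise ValueError(f"unknown concept {concept!r}")
        unified[concept] = convert(value)
    return unified

def integrate(sources):
    """Merge records from heterogeneous sources into one ontology-based view."""
    return [to_ontology(name, rec) for name, recs in sources.items() for rec in recs]

sources = {
    "hospital_a": [{"edad": 54, "peso_kg": 70.0}],
    "hospital_b": [{"age_months": 480, "weight_lb": 150.0, "ward": "3B"}],
}
unified = integrate(sources)

# A query (or mining step) now runs against one schema instead of two
over_50 = [r for r in unified if r["Age"] >= 50]
print(unified)
print(over_50)
```

Keeping the mappings as data rather than code means a new source can be added by writing its mapping table, without changing the downstream queries or mining algorithms.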

  1. Pérez-Rey D., Anguita A., Crespo J. "OntoDataClean: Ontology-based Integration and Preprocessing of Distributed Data". Lecture Notes in Computer Science, 2006; 4345: 262-272.
  2. Pérez-Rey D., Maojo V., García-Remesal M., Alonso-Calvo R., Billhardt H., Martín-Sánchez F., Sousa A. "ONTOFUSION: Ontology-Based Integration of Genomic and Clinical Databases". Computers in Biology and Medicine, 2006; 36: 712-730.
  3. Anguita A., Pérez-Rey D., Crespo J., Maojo V. "Automatic Generation of Integration and Preprocessing Ontologies for Biomedical Sources in a Distributed Scenario". Proceedings of the 21st International Symposium on Computer-Based Medical Systems (CBMS 2008), Jyväskylä, Finland, June 17-19, 2008: 336-341.
  4. Pérez-Rey D., Anguita A., Crespo J., Maojo V. "An Ontology-based and Distributed KDD Model for Biomedical Sources". AMIA Annual Symposium Proceedings, Chicago, USA, November 2007.
  5. Sanandres-Ledesma J.A., Maojo V., Crespo J., Gómez de la Cámara A., García-Remesal M. "A Performance Comparative Analysis between Rule-Induction Algorithms. Application to Rheumatoid Arthritis". Lecture Notes in Computer Science, 2004; 3337: 224-234.
  6. Maojo V. "Domain-Specific Particularities of Data Mining: Lessons Learned". Lecture Notes in Computer Science, 2004; 3337: 235-242.
  7. Crespo J., Maojo V., Martín F. (Eds.). "Medical Data Analysis". Lecture Notes in Computer Science, 2001; 2199. Springer-Verlag.
  8. Maojo V., Crespo J., Sanandrés J., Billhardt H. "Computational Intelligence Techniques in Medical Decision Making: The Data Mining Perspective". In Jain L. et al. (Ed.), Computational Intelligence Processing in Medical Diagnosis. Springer-Verlag, 2002: 13-44.