Machine Learning-Based Dual-Target Prediction of Metal Oxide Nanoparticle Cytotoxicity: Integrating Physicochemical, Electronic, and Compositional Descriptors with Transfer Learning

Authors

  • Hafsa Batool, Department of Physics, University of Agriculture Faisalabad, Pakistan
  • Saeed Rasheed, Department of Computer Science, University of Agriculture Faisalabad, Pakistan
  • Hamda Khalid, Faculty of CS & IT, The Superior University, Lahore, Pakistan. Email: hamdakhalid@superior.edu.pk
  • Samavia Khalid, Faculty of CS & IT, The Superior University, Lahore, Pakistan. Email: samaviakhalid@superior.edu.pk

DOI:

https://doi.org/10.63163/jpehss.v4i2.1470

Keywords:

Nanotoxicity; Metal Oxide Nanoparticles; Machine Learning; XGBoost; Random Forest; Multilayer Perceptron; Transfer Learning; Compositional Embeddings; SHAP; Integrated Gradients; Dual-Target Prediction; Out-of-Distribution Generalization

Abstract

There is an urgent need for robust and scalable computational tools for nanosafety assessment, given the rapidly growing use of engineered metal oxide nanoparticles (NPs) in industry and across a wide variety of biomedical applications. Using the S2NANO MeOx_I meta-analysis dataset, which covers the in vitro cytotoxicity of 26 metal oxide NP materials in human and bacterial cells (n = 6,842 experimental records), we present a comprehensive machine learning framework for dual-target prediction of in vitro cytotoxicity. The framework addresses binary toxicity classification (Toxic/Nontoxic) and continuous cell viability regression (Viability %) simultaneously under a strict out-of-distribution (OOD) evaluation protocol, in which the test set consists of novel materials not included in the training set. We systematically compare four baseline models (XGBoost and Random Forest (RF), each trained for both tasks) with two dual-target multilayer perceptron (MLP) architectures: (A) a physicochemical-feature MLP and (B) a composition-embedding transfer learning MLP that uses 132-dimensional Magpie compositional embeddings derived from the matminer library. Exploratory data analysis revealed strong class imbalance (84.5% Nontoxic), dose values spanning eight orders of magnitude that required log-transformation, and an apparently monotonic dose-response relationship. In SHAP and Integrated Gradients analyses, two electronic descriptors of the materials, surface formation enthalpy (Hsf) and valence band maximum (Ev), consistently ranked among the most important features, alongside log10(dose). On the OOD test set, the best baseline classifier (RF) reached ROC-AUC = 0.721, whereas Model B reached ROC-AUC = 0.737. For regression, Model B achieved R² = 0.085 and RMSE = 26.7%, outperforming all baselines (RF: R² = −0.060, RMSE = 28.8%) by exploiting compositional transfer information.
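The material-level OOD protocol and the reported metrics (ROC-AUC for the Toxic/Nontoxic task, R² and RMSE for Viability %) can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' code: records are dicts with hypothetical keys `material`, `dose`, `toxic`, and `viability`; the helper names, the zero-dose floor in `log10_dose`, and the toy records are illustrative and are not S2NANO data.

```python
import math

def log10_dose(dose, floor=1e-12):
    """log10-transform a dose; `floor` is an assumed guard against zero doses."""
    return math.log10(max(dose, floor))

def split_by_material(records, test_materials):
    """OOD split: every record of a held-out material goes to the test set."""
    held_out = set(test_materials)
    train = [r for r in records if r["material"] not in held_out]
    test = [r for r in records if r["material"] in held_out]
    return train, test

def roc_auc(labels, scores):
    """ROC-AUC as the probability that a Toxic (1) record outscores a Nontoxic (0) one.

    Ties count as 0.5 (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes in the evaluation set")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error (units of y)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

# Synthetic toy records for illustration only (not S2NANO data).
records = [
    {"material": "ZnO", "dose": 100.0, "toxic": 1, "viability": 30.0},
    {"material": "ZnO", "dose": 1.0, "toxic": 0, "viability": 90.0},
    {"material": "TiO2", "dose": 50.0, "toxic": 0, "viability": 85.0},
    {"material": "Fe3O4", "dose": 200.0, "toxic": 1, "viability": 40.0},
    {"material": "Fe3O4", "dose": 0.1, "toxic": 0, "viability": 95.0},
]
train, test = split_by_material(records, ["Fe3O4"])
print(len(train), len(test), [log10_dose(r["dose"]) for r in test])
```

Splitting by material rather than by record is what makes the evaluation out-of-distribution: no measurement of a test material can leak into training. Note that a negative R², as reported for the RF regressor, simply means the model's squared error exceeds that of predicting the test-set mean.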
More importantly, on the Fe3O4 test material, Model B achieved a regression R² of +0.059, compared with −0.115 for RF. This indicates that compositional embeddings improve both generalization and predictive precision on chemically dissimilar test materials. These results establish a reproducible predictive baseline for NP nanosafety and highlight the importance of electronic and compositional descriptors beyond classical physicochemical descriptors.
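Magpie compositional embeddings, to which the abstract attributes Model B's gains, summarize a chemical formula with fraction-weighted statistics of elemental properties. matminer's Magpie preset combines 22 elemental properties with six statistics (minimum, maximum, range, mean, average deviation, mode), which yields 132 dimensions. The sketch below reproduces that idea in plain Python for only two properties (atomic number and Pauling electronegativity). It is not matminer's API: the property table is a tiny illustrative subset, `magpie_style_features` and `atomic_fractions` are hypothetical helpers, and resolving ties in the mode to the smaller value is an assumed convention.

```python
# Tiny illustrative subset: atomic number and Pauling electronegativity only.
# matminer's Magpie preset uses 22 elemental properties instead of these two.
ELEMENT_PROPS = {
    "Fe": {"Number": 26, "Electronegativity": 1.83},
    "O": {"Number": 8, "Electronegativity": 3.44},
    "Zn": {"Number": 30, "Electronegativity": 1.65},
}

def atomic_fractions(composition):
    """Convert element counts, e.g. {"Fe": 3, "O": 4}, to atomic fractions."""
    total = sum(composition.values())
    return {el: n / total for el, n in composition.items()}

def magpie_style_features(composition, props=ELEMENT_PROPS):
    """Six fraction-weighted statistics per elemental property (hypothetical helper)."""
    frac = atomic_fractions(composition)
    prop_names = sorted(next(iter(props.values())))
    feats = {}
    for name in prop_names:
        pairs = [(props[el][name], w) for el, w in frac.items()]
        values = [v for v, _ in pairs]
        mean = sum(v * w for v, w in pairs)
        feats["minimum " + name] = min(values)
        feats["maximum " + name] = max(values)
        feats["range " + name] = max(values) - min(values)
        feats["mean " + name] = mean
        # Fraction-weighted mean absolute deviation from the weighted mean.
        feats["avg_dev " + name] = sum(w * abs(v - mean) for v, w in pairs)
        # Mode: property of the most abundant element; ties resolved to the smaller value.
        top = max(w for _, w in pairs)
        feats["mode " + name] = min(v for v, w in pairs if w == top)
    return feats

fe3o4 = magpie_style_features({"Fe": 3, "O": 4})
print(len(fe3o4), round(fe3o4["mean Number"], 3), fe3o4["mode Number"])
```

Because every material is mapped into the same fixed-length space of elemental statistics, a model can relate an unseen oxide to chemically similar training materials, which is one plausible reading of why such embeddings help on held-out materials.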

Published

2026-06-23

Section

Computer Science and Information Technology