Enter any SMILES string and our ensemble ML model predicts its log(LD50) value in milliseconds, using Morgan fingerprints, RDKit descriptors, and gradient-boosted trees.
7,400+
Training Molecules
0.61
RMSE (log mol/kg)
0.84
R² Score
2048
Fingerprint Bits
Core Tool
LD50 Prediction Engine
🧪
Molecular Toxicity Predictor
ensemble model · morgan fingerprints · rdkit descriptors
Aspirin: CC(=O)Oc1cc...
Pyrene: c1ccc2c(c1)...
Ibuprofen: CC(C)Cc1cc...
Carbon Tetrachloride: C(Cl)(Cl)...
Benzoic Acid: OC(=O)c1cc...
Benzene: c1ccccc1
Enter SMILES above to preview molecular structure
Predicted log(LD50)
—
mol/kg (log scale)
LD50 (mg/kg est.)
—
rat oral, estimated
GHS Class
—
Relative Toxicity Level
—
HIGHLY TOXIC → PRACTICALLY NON-TOXIC
Model Confidence
—
Top Feature Contributions
⚠️ HIGH TOXICITY DETECTED — This compound's predicted LD50 suggests it falls in GHS Category 1–2. Handle with extreme care. Consult safety data sheets before any experimental use.
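The result fields above relate a log-scale prediction in mol/kg to an LD50 estimate in mg/kg and a GHS class. The page does not state how that conversion is done, so the following is only a minimal sketch under stated assumptions: the prediction is taken to be a negative base-10 logarithm of LD50 in mol/kg (a common convention for public rat oral LD50 datasets, but not confirmed here), the molecular weight is supplied by the caller, and the class comes from the standard GHS acute oral toxicity cut-offs. The function names are hypothetical.

```python
# GHS acute oral toxicity cut-offs: upper LD50 bound in mg/kg body weight.
GHS_ORAL_LIMITS = [
    (5.0, "Category 1"),
    (50.0, "Category 2"),
    (300.0, "Category 3"),
    (2000.0, "Category 4"),
    (5000.0, "Category 5"),
]


def ld50_mg_per_kg(log_ld50, mol_weight, negative_log=True):
    """Convert a log10-scale LD50 in mol/kg to mg/kg.

    ASSUMPTION: with negative_log=True the input is -log10(LD50 [mol/kg]).
    Pass negative_log=False if the value is a plain log10(LD50 [mol/kg]).
    mol_weight is the molecular weight in g/mol.
    """
    exponent = -log_ld50 if negative_log else log_ld50
    mol_per_kg = 10.0 ** exponent
    # mol/kg * g/mol = g/kg, then * 1000 gives mg/kg
    return mol_per_kg * mol_weight * 1000.0


def ghs_oral_category(ld50_mg_kg):
    """Return the GHS acute oral toxicity category for an LD50 in mg/kg."""
    for upper_bound, category in GHS_ORAL_LIMITS:
        if ld50_mg_kg <= upper_bound:
            return category
    return "Not classified"


# Illustrative input only: 2.0 is a made-up value, not a model output.
# 180.16 g/mol is the molecular weight of aspirin.
dose = ld50_mg_per_kg(2.0, 180.16)
print(round(dose, 1), ghs_oral_category(dose))
```

With these made-up inputs the sketch prints `1801.6 Category 4`. A warning like the one above would correspond to a result of Category 1 or Category 2, that is, an estimated LD50 of 50 mg/kg or less.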
ML Architecture
Our 4-Stage Pipeline
🔤
01
SMILES Parsing
RDKit converts SMILES strings to molecular graph objects with full atom/bond metadata
🔢
02
Featurization
Morgan Fingerprints (radius=2, 2048 bits) + 200 RDKit physicochemical descriptors
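Stages 01 and 02 can be sketched with RDKit roughly as follows. This is an illustration, not the tool's actual code: the `featurize` helper is hypothetical, and because the page does not say which 200 descriptors are used, the sketch simply computes every entry in `Descriptors.descList`, whose length depends on the installed RDKit version and may not be exactly 200.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Radius and bit count are taken from the pipeline description above.
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def featurize(smiles):
    """Return (fingerprint_bits, descriptor_values) for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # 2048-element 0/1 vector of Morgan fingerprint bits.
    bits = _MORGAN.GetFingerprintAsNumPy(mol).astype(np.float64)

    # ASSUMPTION: use all descriptors RDKit ships with, since the exact
    # set of 200 used by the tool is not stated.
    desc = np.array(
        [fn(mol) for _name, fn in Descriptors.descList], dtype=np.float64
    )
    return bits, desc


bits, desc = featurize("c1ccccc1")  # benzene, from the example list
features = np.concatenate([bits, desc])
print(bits.shape, desc.shape, features.shape)
```

Keeping the fingerprint and descriptor blocks separate until the final concatenation makes it easy to train one model on fingerprints alone and another on the combined vector.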
🤖
03
Ensemble Model
XGBoost and Random Forest predictions stacked with a Ridge regression meta-learner for robust log(LD50) prediction
Random Forest
An ensemble of 500 decision trees trained on Morgan fingerprints. Robust, interpretable, and surprisingly competitive for molecular property prediction.
0.68
RMSE
0.79
R² Score
500
Trees
2048
Features
Advanced
XGBoost
Gradient-boosted trees with learning-rate tuning, depth control, and feature subsampling. Trained on combined fingerprint + descriptor features for higher accuracy than the Random Forest.
0.63
RMSE
0.82
R² Score
1000
Estimators
2248
Features
🏆 Best
Stacked Ensemble
XGBoost + Random Forest predictions blended via a Ridge regression meta-learner. Combining complementary model strengths gives the best results of the three models.
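The stacking idea can be sketched with scikit-learn's `StackingRegressor`. Several things here are assumptions made for illustration: the data is synthetic, the tree counts are reduced so the example runs quickly, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the example has no dependencies beyond scikit-learn and NumPy. It is not the tool's training code.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge

# Synthetic stand-in for fingerprint-like features and a log-scale target.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        # Stand-in for XGBoost (ASSUMPTION, see the note above).
        ("gbt", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    # The Ridge meta-learner is fit on out-of-fold base-model predictions.
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)

stack.fit(X[:150], y[:150])
pred = stack.predict(X[150:])
print(pred.shape)
```

Note that this sketch feeds the same feature matrix to both base models, whereas the cards above list 2048 features for the Random Forest and 2248 for XGBoost. One way to reproduce that split would be to wrap each base model in a pipeline that first selects its own columns.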