Restoring Civility to Digital Discourse
A high-precision deep learning model designed to filter toxicity from online conversations with 98.01% accuracy.
The Signal in the Noise
Raw social media data is messy and heavily imbalanced. In the real world, toxic comments are rare outliers. To build a robust model, we first had to balance the scales using SMOTE (Synthetic Minority Over-sampling Technique).
Data Distribution
We synthesized minority class examples to ensure the model doesn't just memorize "safe" patterns but actively learns to identify toxicity.
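A minimal sketch of that balancing step, assuming imbalanced-learn's SMOTE (the toy feature matrix and the 95/5 split below are illustrative stand-ins, since SMOTE interpolates numeric feature vectors rather than raw text):

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Toy stand-in for the vectorized corpus: 950 safe (0) vs 50 toxic (1) comments.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = np.array([0] * 950 + [1] * 50)

# SMOTE interpolates between existing minority samples to synthesize new ones.
smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X, y)

print(Counter(y))           # Counter({0: 950, 1: 50})
print(Counter(y_balanced))  # Counter({0: 950, 1: 950})
```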
Feature Extraction
Using TF-IDF Vectorization, we transformed raw text into numerical vectors, capturing the weight of words like "stupid", "idiot", and "hate" versus neutral terms.
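A sketch of that step, assuming scikit-learn's TfidfVectorizer; the four-comment corpus and the English stop-word filter are illustrative choices, not confirmed project settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus; the real training data is far larger.
comments = [
    "you are such an idiot",
    "have a great day everyone",
    "what a stupid, hateful take",
    "thanks for sharing, very helpful",
]

# max_features=5000 matches the input width the dense layer expects.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(comments)  # sparse (n_docs, n_terms) matrix

print(X.shape)
print(vectorizer.get_feature_names_out())  # e.g. ['day', 'great', 'hateful', ...]
```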
The Architecture
A visual journey through the neural pathways. Data flows from high-dimensional vectors to a single probability score.
Dense Layer
Receives 5000 TF-IDF features. 64 neurons activated by the ReLU function.
Dropout Layer
Randomly disables 50% of neurons during training to force redundant feature learning and prevent overfitting.
Sigmoid Output
Compresses the result into a single probability score between 0 (Safe) and 1 (Toxic).
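The full stack fits in a few lines of Keras. A sketch under the assumption of a TensorFlow/Keras implementation, with optimizer and loss as reasonable defaults for binary classification rather than confirmed settings:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(5000,)),            # 5000 TF-IDF features in
    layers.Dense(64, activation="relu"),    # dense layer: 64 ReLU neurons
    layers.Dropout(0.5),                    # disable 50% of activations per step
    layers.Dense(1, activation="sigmoid"),  # one score: 0 (Safe) .. 1 (Toxic)
])

# Assumed training configuration, not confirmed by the project.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```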
Uncompromising Accuracy.
[Metric cards: Test Accuracy (98.01%) · Loss]
Learning Curve
The model converges rapidly, reaching optimal performance within 15 epochs. The narrow gap between training and validation accuracy confirms minimal overfitting.
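The curve itself can be reproduced from the Keras training history. A sketch that reuses `model` from the architecture section, with random placeholder data standing in for the balanced TF-IDF matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data shaped like the balanced TF-IDF matrix.
rng = np.random.default_rng(1)
X_train = rng.random((2000, 5000)).astype("float32")
y_train = rng.integers(0, 2, 2000)

history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=15, batch_size=32, verbose=0)

# Plot training vs. validation accuracy; a narrow gap suggests little overfitting.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```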
Performance Metrics
> output: TOXIC [0.92]
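Accuracy alone can hide class-level failures, so per-class precision and recall matter too. A hypothetical evaluation step using scikit-learn, reusing `model` from above with placeholder data standing in for a held-out test split; the 0.5 cutoff matches the sample output above, where a score of 0.92 is flagged TOXIC:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder held-out split, shaped like the model's input.
rng = np.random.default_rng(2)
X_test = rng.random((500, 5000)).astype("float32")
y_test = rng.integers(0, 2, 500)

y_prob = model.predict(X_test, verbose=0).ravel()  # sigmoid scores in [0, 1]
y_pred = (y_prob >= 0.5).astype(int)               # e.g. 0.92 -> 1 (TOXIC)

print(classification_report(y_test, y_pred, target_names=["NON-TOXIC", "TOXIC"]))
print(confusion_matrix(y_test, y_pred))
```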
Live Inference Environment
Interact with the model in real time. (Currently undergoing server maintenance.)
Demo Temporarily Locked
We are upgrading the inference API to handle higher throughput. Check back soon for the interactive playground.
Sample output:
Probability: 0.0021
Verdict: NON-TOXIC
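While the hosted demo is offline, the same verdict logic can be run locally. A sketch with `classify` as a hypothetical helper, reusing the fitted `vectorizer` and trained `model` from the sections above (and assuming the vectorizer was fitted on the full corpus, so its output width matches the model's 5000-feature input):

```python
def classify(comment: str, threshold: float = 0.5) -> dict:
    """Score one comment and mirror the demo's probability/verdict format."""
    features = vectorizer.transform([comment]).toarray()  # densify for Keras
    probability = float(model.predict(features, verbose=0)[0, 0])
    return {
        "probability": round(probability, 4),
        "verdict": "TOXIC" if probability >= threshold else "NON-TOXIC",
    }

# e.g. {'probability': 0.0021, 'verdict': 'NON-TOXIC'}
print(classify("have a wonderful day"))
```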