End-to-End Machine Learning Project

Restoring Civility to Digital Discourse.

A high-precision deep learning model designed to filter toxicity from online conversations with 98.01% accuracy.

Chapter I

The Signal in the Noise

Raw social media data is messy and heavily imbalanced. In the real world, toxic comments are rare outliers. To build a robust model, we first had to balance the scales using SMOTE (Synthetic Minority Over-sampling Technique).

Data Distribution

Before SMOTE: high class imbalance
After SMOTE: perfectly balanced

We synthesized minority class examples to ensure the model doesn't just memorize "safe" patterns but actively learns to identify toxicity.
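Below is a minimal sketch of that balancing step using imblearn's SMOTE. The stand-in data, sample counts, and filename are illustrative assumptions; the real pipeline would apply SMOTE to the vectorized comments.

balance_classes.py (illustrative sketch)
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Stand-in for the vectorized comments: 10,000 samples, roughly 5% toxic.
X, y = make_classification(n_samples=10_000, weights=[0.95], random_state=42)
print("Before SMOTE:", Counter(y))   # heavy imbalance

# Synthesize minority-class examples until both classes match in size.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE: ", Counter(y_bal))  # balanced classes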

Feature Extraction

Using TF-IDF Vectorization, we transformed raw text into numerical vectors, capturing the weight of words like "stupid", "idiot", and "hate" versus neutral terms.

hate, stupid, kill, idiot, worst, trash, ignorant, bad, + 4,992 more
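A minimal sketch of that vectorization step, capping the vocabulary at 5,000 terms to match the network's input layer; the two example comments and the filename are illustrative.

vectorize_text.py (illustrative sketch)
from sklearn.feature_extraction.text import TfidfVectorizer

# Two toy comments; the real corpus is the full comment dataset.
comments = [
    "you are a stupid idiot and i hate this",
    "thanks, this explanation was genuinely helpful",
]

# max_features=5000 matches the 5,000 inputs the network receives.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(comments)

print(X.shape)                             # (2, vocabulary size)
print(vectorizer.get_feature_names_out())  # the learned vocabulary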
Chapter II

The Architecture

A visual journey through the neural pathways. Data flows from high-dimensional vectors to a single probability score.

LAYER 01

Dense Layer

Receives 5,000 TF-IDF features; 64 neurons activated by the ReLU function.

LAYER 02

Dropout Layer

Randomly disables 50% of neurons to force redundant feature learning and prevent overfitting.

LAYER 03

Sigmoid Output

Compresses the result into a single probability score between 0 (Safe) and 1 (Toxic).
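Taken together, the three layers map onto a short Keras definition. A minimal sketch: the loss follows the binary cross-entropy named in Chapter III, while the Adam optimizer is an assumption the write-up doesn't specify.

build_model.py (illustrative sketch)
import tensorflow as tf

# Layers 01-03 as described: Dense(64, ReLU) over 5,000 TF-IDF features,
# Dropout(0.5), and a single sigmoid unit for the toxicity probability.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5000,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy per Chapter III; the Adam optimizer is an assumption.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()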

Chapter III

Uncompromising Accuracy.

Test Accuracy

98.01%
On unseen test data

Loss

0.076
Binary Crossentropy

Learning Curve

Validation accuracy: 98.01% (epochs 1 to 15)

The model converges rapidly, reaching optimal performance within 15 epochs. The narrow gap between training and validation accuracy confirms minimal overfitting.
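For completeness, a hedged sketch of the training call behind that curve: the 15-epoch budget comes from the chart above, while the batch size, validation split, and random stand-in data are assumptions. It reuses `model` from the architecture sketch.

train_model.py (illustrative sketch)
import numpy as np

# Random stand-in data shaped like the real inputs (5,000 features per
# comment); reuses `model` from the architecture sketch above.
X_train = np.random.rand(2_000, 5000).astype("float32")
y_train = np.random.randint(0, 2, size=2_000)

# 15 epochs per the learning curve; batch size and split are assumptions.
history = model.fit(X_train, y_train, epochs=15,
                    batch_size=32, validation_split=0.2)

# A narrow train/validation accuracy gap is the sign of minimal overfitting.
print(history.history["accuracy"][-1], history.history["val_accuracy"][-1])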

Performance Metrics

0.98
Precision
0.98
Recall
0.98
F1-Score
Sample Prediction:
> input: "This is the worst product ever"
> output: TOXIC [0.92]
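A sketch of how such a prediction could be produced, assuming `vectorizer` and `model` fitted as in the earlier sketches (on the full corpus, so the TF-IDF space is 5,000-dimensional); the `classify` helper, its filename, and the 0.5 decision threshold are illustrative assumptions.

predict_text.py (illustrative sketch)
def classify(text: str, threshold: float = 0.5) -> str:
    """Score one comment; assumes `vectorizer` and `model` were fitted
    on the full corpus, so the TF-IDF space is 5,000-dimensional."""
    vec = vectorizer.transform([text]).toarray()   # Keras wants dense input
    prob = float(model.predict(vec, verbose=0)[0, 0])
    verdict = "TOXIC" if prob >= threshold else "NON-TOXIC"
    return f"{verdict} [{prob:.2f}]"

print(classify("This is the worst product ever"))  # e.g. TOXIC [0.92]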

Live Inference Environment

Interact with the model in real-time. (Currently undergoing server maintenance).

Demo Temporarily Locked

We are upgrading the inference API to handle higher throughput. Check back soon for the interactive playground.

toxic_detection_v1.py
# Initializing session...
# Model loaded successfully. Weights: 420 KB
user@demo:~$ analyze_text --input "You are amazing!"
Analysis Result:
Probability: 0.0021
Verdict: NON-TOXIC
user@demo:~$ _