Restoring Civility to Digital Discourse
A high-precision deep learning model designed to filter toxicity from online conversations with 98.01% accuracy.
The Signal in the Noise
Raw social media data is messy and heavily imbalanced. In the real world, toxic comments are rare outliers. To build a robust model, we first had to balance the scales using SMOTE (Synthetic Minority Over-sampling Technique).
Data Distribution
We synthesized minority class examples to ensure the model doesn't just memorize "safe" patterns but actively learns to identify toxicity.
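A minimal sketch of that balancing step, assuming imbalanced-learn's SMOTE (the toy feature matrix and the 95/5 split below are illustrative stand-ins, since SMOTE interpolates numeric feature vectors rather than raw text):

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Toy stand-in for the vectorized corpus: 950 safe (0) vs 50 toxic (1) comments.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = np.array([0] * 950 + [1] * 50)

# SMOTE interpolates between existing minority samples to synthesize new ones.
smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X, y)

print(Counter(y))           # Counter({0: 950, 1: 50})
print(Counter(y_balanced))  # Counter({0: 950, 1: 950})
```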
Feature Extraction
Using TF-IDF Vectorization, we transformed raw text into numerical vectors, capturing the weight of words like "stupid", "idiot", and "hate" versus neutral terms.
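A sketch of that step, assuming scikit-learn's TfidfVectorizer; the four-comment corpus and the English stop-word filter are illustrative choices, not confirmed project settings:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus; the real training data is far larger.
comments = [
    "you are such an idiot",
    "have a great day everyone",
    "what a stupid, hateful take",
    "thanks for sharing, very helpful",
]

# max_features=5000 matches the input width the dense layer expects.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(comments)  # sparse (n_docs, n_terms) matrix

print(X.shape)
print(vectorizer.get_feature_names_out())  # e.g. ['day', 'great', 'hateful', ...]
```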
The Architecture
A visual journey through the neural pathways. Data flows from high-dimensional vectors to a single probability score.
Dense Layer
Receives 5000 TF-IDF features. 64 neurons activated by the ReLU function.
Dropout Layer
Randomly disables 50% of neurons during training to force redundant feature learning and prevent overfitting.
Sigmoid Output
Compresses the result into a single probability score between 0 (Safe) and 1 (Toxic).
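The full stack fits in a few lines of Keras. A sketch under the assumption of a TensorFlow/Keras implementation, with optimizer and loss as reasonable defaults for binary classification rather than confirmed settings:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(5000,)),            # 5000 TF-IDF features in
    layers.Dense(64, activation="relu"),    # dense layer: 64 ReLU neurons
    layers.Dropout(0.5),                    # disable 50% of activations per step
    layers.Dense(1, activation="sigmoid"),  # one score: 0 (Safe) .. 1 (Toxic)
])

# Assumed training configuration, not confirmed by the project.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```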
Uncompromising Accuracy.
[Metric cards: Test Accuracy (98.01%) · Loss]
Learning Curve
The model converges rapidly, reaching optimal performance within 15 epochs. The narrow gap between training and validation accuracy confirms minimal overfitting.
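The curve itself can be reproduced from the Keras training history. A sketch that reuses `model` from the architecture section, with random placeholder data standing in for the balanced TF-IDF matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data shaped like the balanced TF-IDF matrix.
rng = np.random.default_rng(1)
X_train = rng.random((2000, 5000)).astype("float32")
y_train = rng.integers(0, 2, 2000)

history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=15, batch_size=32, verbose=0)

# Plot training vs. validation accuracy; a narrow gap suggests little overfitting.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```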
Performance Metrics
> output: TOXIC [0.92]
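Accuracy alone can hide class-level failures, so per-class precision and recall matter too. A hypothetical evaluation step using scikit-learn, reusing `model` from above with placeholder data standing in for a held-out test split; the 0.5 cutoff matches the sample output above, where a score of 0.92 is flagged TOXIC:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder held-out split, shaped like the model's input.
rng = np.random.default_rng(2)
X_test = rng.random((500, 5000)).astype("float32")
y_test = rng.integers(0, 2, 500)

y_prob = model.predict(X_test, verbose=0).ravel()  # sigmoid scores in [0, 1]
y_pred = (y_prob >= 0.5).astype(int)               # e.g. 0.92 -> 1 (TOXIC)

print(classification_report(y_test, y_pred, target_names=["NON-TOXIC", "TOXIC"]))
print(confusion_matrix(y_test, y_pred))
```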
Live Inference Environment
Interact with the model in real time. (Currently undergoing server maintenance.)
Demo Temporarily Locked
We are upgrading the inference API to handle higher throughput. Check back soon for the interactive playground.
Sample output:
Probability: 0.0021
Verdict: NON-TOXIC
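While the hosted demo is offline, the same verdict logic can be run locally. A sketch with `classify` as a hypothetical helper, reusing the fitted `vectorizer` and trained `model` from the sections above (and assuming the vectorizer was fitted on the full corpus, so its output width matches the model's 5000-feature input):

```python
def classify(comment: str, threshold: float = 0.5) -> dict:
    """Score one comment and mirror the demo's probability/verdict format."""
    features = vectorizer.transform([comment]).toarray()  # densify for Keras
    probability = float(model.predict(features, verbose=0)[0, 0])
    return {
        "probability": round(probability, 4),
        "verdict": "TOXIC" if probability >= threshold else "NON-TOXIC",
    }

# e.g. {'probability': 0.0021, 'verdict': 'NON-TOXIC'}
print(classify("have a wonderful day"))
```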