Peer-Reviewed Phishing Detection Infrastructure
Born from NSF-funded research at Alabama A&M University's Cybersecurity Lab. Published in Springer. Battle-tested at DEF CON. Now available as a production API.
This infrastructure was developed under a National Science Foundation grant awarded to Alabama A&M University's Cybersecurity Laboratory, supporting rigorous academic research into email-based threat detection at scale.
The detection methodology and experimental results were peer-reviewed and published through Springer, one of the world's leading academic publishers, validating our multi-signal classification approach.
The system was presented at DEF CON in Las Vegas, the world's largest hacker conference, where the security community stress-tested it and it received recognition from practitioners.
Real-time lookup against Google's phishing and malware URL database, updated continuously with newly discovered threats.
Community-verified phishing URL intelligence, cross-referenced against our scan requests for immediate threat identification.
Parallel lookups against Spamhaus, SURBL, and URIBL blocklists to catch known spam infrastructure and malicious domains instantly.
TF-IDF ensemble classifier trained on 80,000 emails, achieving a 95% F1 score on held-out test sets across multiple phishing categories.
Deep inspection of SPF, DKIM, and DMARC authentication results, plus Reply-To spoofing and display name impersonation detection.
Shannon entropy scoring, homoglyph character substitution detection, redirect chain traversal, and typosquatting pattern matching.
Identifies Base64-encoded payloads, CSS-hidden content, misleading HTML comments, and other obfuscation techniques used to bypass filters.
WHOIS-based lookup of domain registration age. Domains registered less than 30 days ago are flagged as high risk, a consistent indicator of phishing infrastructure.
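To make the Shannon entropy scoring mentioned above concrete, here is a minimal sketch of how a per-character entropy could be computed for a domain label. The function name and sample labels are illustrative assumptions; the scoring used by the production API may differ.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical comparison: a dictionary-like label vs. a random-looking one.
for label in ("paypal", "x7f3k9q2zt1b"):
    print(f"{label}: {shannon_entropy(label):.2f} bits/char")
```

Random-looking labels tend to score higher than dictionary words, which is why entropy is a useful signal for algorithmically generated domains.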
Submit email details to analyze with all detection layers.
Powered by Scanner API · Hosted on Render free tier; the first request may take 30 seconds while the service wakes
Request Body
{
  "email_address": "support@paypa1.com",
  "email_text": "Verify your PayPal account immediately or it will be suspended",
  "email_headers": "Authentication-Results: spf=fail; dkim=fail; dmarc=fail (p=REJECT)\r\nFrom: PayPal Support <support@paypa1.com>\r\nReply-To: attacker@gmail.com"
}
Response
{
  "scan_id": "1a49b796-d242-448d-929a-dbec00106dcf",
  "scam_score": 83.38,
  "risk_level": "CRITICAL",
  "labels": ["Homoglyph Domain", "DMARC Fail", "Phishing Content", "Reply-To Mismatch"],
  "recommendations": [
    "Do not click any links in this email",
    "This email failed DMARC authentication — sender domain is spoofed",
    "Display name impersonates PayPal but was not sent from their domain"
  ],
  "content_analysis": {
    "prediction": "Phishing Email",
    "confidence": 0.986,
    "risk_score": 98.6,
    "is_phishing": true
  },
  "header_analysis": {
    "dmarc": "fail",
    "dmarc_policy": "reject",
    "reply_to_mismatch": true,
    "display_name_spoof": true,
    "spoofed_brand": "PayPal",
    "risk_score": 100.0,
    "flags": ["DMARC_FAIL", "REPLY_TO_MISMATCH", "DISPLAY_NAME_SPOOF"]
  },
  "email_verification": {
    "homoglyph_detected": true,
    "risk_score": 100.0
  }
}
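The header checks behind header_analysis can be approximated with Python's standard library. This is a rough sketch under stated assumptions: the function name and return shape are hypothetical, the Reply-To check compares domains only, and the production parser likely handles many more edge cases.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def analyze_headers(raw_headers: str) -> dict:
    """Extract SPF/DKIM/DMARC verdicts and check for a Reply-To mismatch."""
    msg = message_from_string(raw_headers)

    # Pull "spf=fail", "dkim=pass", ... pairs out of Authentication-Results.
    auth = msg.get("Authentication-Results", "")
    pairs = re.findall(r"\b(spf|dkim|dmarc)=(\w+)", auth, re.IGNORECASE)
    verdicts = {name.lower(): verdict.lower() for name, verdict in pairs}

    # Compare the domain of From against the domain of Reply-To (assumption:
    # a mismatch means different domains, not merely different mailboxes).
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()

    flags = []
    if verdicts.get("dmarc") == "fail":
        flags.append("DMARC_FAIL")
    if reply_addr and reply_domain != from_domain:
        flags.append("REPLY_TO_MISMATCH")
    return {"verdicts": verdicts, "flags": flags}


# The email_headers value from the request example above.
sample = (
    "Authentication-Results: spf=fail; dkim=fail; dmarc=fail (p=REJECT)\r\n"
    "From: PayPal Support <support@paypa1.com>\r\n"
    "Reply-To: attacker@gmail.com"
)
print(analyze_headers(sample))
```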
Code Examples
curl -X POST https://scanner-api.railway.app/api/scan \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key_here" \
-d '{"email_address": "test@example.com", "email_text": "Your message here"}'

import requests

response = requests.post(
    "https://scanner-api.railway.app/api/scan",
    headers={"X-API-Key": "your_api_key_here"},
    json={"email_address": "test@example.com", "email_text": "Your message here"},
    timeout=60,  # allow for the ~30s cold start on the first request
)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = response.json()
print(f"Risk Level: {result['risk_level']}, Score: {result['scam_score']}")

const response = await fetch('https://scanner-api.railway.app/api/scan', {
  method: 'POST',
  headers: {'Content-Type': 'application/json', 'X-API-Key': 'your_api_key_here'},
  body: JSON.stringify({email_address: 'test@example.com', email_text: 'Your message here'})
});
const result = await response.json();
console.log(`Risk Level: ${result.risk_level}, Score: ${result.scam_score}`);
Request Body
{ "urls": ["https://paypa1.com/verify", "https://secure-login.bank.xyz"] }
Response
{
  "scan_id": "f83a21bc-...",
  "urls_analyzed": 2,
  "results": [
    {
      "url": "https://paypa1.com/verify",
      "risk_score": 97.5,
      "homoglyph_detected": true,
      "safe_browsing_hit": false,
      "dnsbl_hit": false,
      "entropy_score": 4.2
    }
  ]
}
Request Body
{ "text": "Urgent: Your account has been compromised. Click here to verify now." }
Response
{ "prediction": "Phishing Email", "confidence": 0.941, "risk_score": 94.1, "is_phishing": true, "model_version": "tfidf-ensemble-v2" }
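For readers unfamiliar with the TF-IDF weighting named in model_version, the sketch below computes it in pure Python for tokenized documents. It uses the plain tf × log(N / df) variant as an assumption; the production model's exact features, smoothing, and ensemble members are not described here.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Number of documents each term appears in at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Toy corpus (hypothetical): terms shared by every document get weight 0.
corpus = [
    ["verify", "your", "account", "now"],
    ["your", "invoice", "is", "attached"],
]
for weights in tfidf(corpus):
    print(weights)
```

Terms that appear in every message, such as "your" here, carry no weight, while terms concentrated in a few messages, such as "verify", stand out to the classifier.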
Response
{
  "status": "healthy",
  "version": "2.1.0",
  "model_loaded": true,
  "uptime_seconds": 86420,
  "services": {
    "ml_classifier": "online",
    "safe_browsing": "online",
    "dnsbl": "online"
  }
}
Every field in the response has a purpose. Nothing is a black box.
{
"scan_id": "1a49b796-...", // ← unique per request
"scam_score": 83.38, // ← 0–100 composite
"risk_level": "CRITICAL", // ← SAFE/LOW/MEDIUM/HIGH/CRITICAL
"labels": [...], // ← human-readable signals
"recommendations": [...], // ← actionable guidance
"content_analysis": { // ← ML model output
"prediction": "Phishing Email",
"confidence": 0.986, // ← 0.0–1.0
"is_phishing": true
},
"header_analysis": { // ← SPF/DKIM/DMARC
"dmarc": "fail",
"reply_to_mismatch": true, // ← Reply-To ≠ From
"spoofed_brand": "PayPal", // ← detected impersonation
"flags": [...] // ← machine-readable flags
},
"email_verification": { // ← domain checks
"homoglyph_detected": true, // ← paypa1.com ≠ paypal.com
"risk_score": 100.0
}
}
A weighted average across all active detection layers, normalized to 0–100. Weights are calibrated on the 80k training corpus to minimize false positives on legitimate transactional emails.
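As an illustration of the weighted average described above, the sketch below combines per-layer risk scores into a single 0–100 value. The layer names mirror the response fields, but the weights are hypothetical placeholders, not the calibrated production values, so it will not reproduce the 83.38 shown in the example response.

```python
from typing import Dict, Optional


def composite_score(
    layer_scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average over layers that returned a score, clamped to 0-100."""
    # A layer that did not run (score is None) is excluded from the average.
    active = {name: s for name, s in layer_scores.items() if s is not None}
    total_weight = sum(weights[name] for name in active)
    if total_weight == 0:
        return 0.0
    score = sum(weights[name] * s for name, s in active.items()) / total_weight
    return round(min(100.0, max(0.0, score)), 2)


# Hypothetical weights, for illustration only.
example_weights = {"content_analysis": 0.5, "header_analysis": 0.3, "email_verification": 0.2}
example_scores = {"content_analysis": 98.6, "header_analysis": 100.0, "email_verification": None}
print(composite_score(example_scores, example_weights))
```

Dividing by the sum of active weights keeps the result on the same 0–100 scale when a layer is skipped, for example when no headers are submitted.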
SAFE (<20), LOW (20–39), MEDIUM (40–59), HIGH (60–79), CRITICAL (80+). Tier boundaries were set to match industry-standard SOC triage thresholds for email security workflows.
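The tier boundaries above translate directly into a lookup. This is a minimal sketch; because scam_score is fractional, it assumes each tier starts at its lower bound (so 39.5 is still LOW), which the page does not state explicitly.

```python
def risk_level(scam_score: float) -> str:
    """Map a 0-100 composite score to the tier names listed above."""
    if scam_score >= 80:
        return "CRITICAL"
    if scam_score >= 60:
        return "HIGH"
    if scam_score >= 40:
        return "MEDIUM"
    if scam_score >= 20:
        return "LOW"
    return "SAFE"


print(risk_level(83.38))
```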
The flags array in each sub-analysis contains uppercase string constants ideal for programmatic routing, SIEM integration, and automated quarantine rules.
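One way to use the flags for automated routing is a simple flag-to-action table. The action names and the mapping below are hypothetical examples for a mail pipeline, not something the API prescribes.

```python
# Hypothetical policy: which action each flag should trigger.
FLAG_ACTIONS = {
    "DMARC_FAIL": "quarantine",
    "DISPLAY_NAME_SPOOF": "quarantine",
    "REPLY_TO_MISMATCH": "banner",
}

# Actions ordered from least to most severe.
SEVERITY = ["deliver", "banner", "quarantine"]


def route(scan: dict) -> str:
    """Return the most severe action triggered by the header_analysis flags."""
    flags = scan.get("header_analysis", {}).get("flags", [])
    actions = [FLAG_ACTIONS.get(flag, "deliver") for flag in flags]
    return max(actions, key=SEVERITY.index, default="deliver")


print(route({"header_analysis": {"flags": ["DMARC_FAIL", "REPLY_TO_MISMATCH"]}}))
```

Unknown flags fall back to "deliver" here so that new flag names do not break existing rules; a stricter deployment might prefer to quarantine on anything unrecognized.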
The email_verification layer compares the sender domain character-by-character against a curated list of major brands, detecting substitutions like 1→l, 0→o, and Unicode lookalikes.
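The substitution check described above can be sketched as folding lookalike characters back to their plain forms and comparing the result against a brand list. The brand list and the lookalike table here are small hypothetical samples; the curated production list is not published on this page.

```python
import unicodedata
from typing import Optional

# Hypothetical sample of protected brand domains.
BRANDS = {"paypal.com", "google.com", "microsoft.com"}

# Digit substitutions plus a few Cyrillic lookalikes (a, e, o).
LOOKALIKES = str.maketrans({
    "1": "l",
    "0": "o",
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
})


def find_homoglyph_target(domain: str) -> Optional[str]:
    """Return the brand a domain imitates, or None if it imitates none."""
    domain = domain.lower()
    if domain in BRANDS:
        return None  # the genuine domain is not an impersonation
    folded = unicodedata.normalize("NFKC", domain).translate(LOOKALIKES)
    return folded if folded in BRANDS else None


print(find_homoglyph_target("paypa1.com"))
```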
The top-level labels array condenses the most significant findings into short strings suitable for display in email clients, browser extensions, or end-user dashboards without exposing raw internals.
Designed to support researchers, defenders, and builders fighting phishing at every level.
Free tier with generous rate limits for academic use, coursework, and thesis research. No credit card required. Cite us in your paper.
Free · Academic
Custom rate limits, production SLA guarantees, and dedicated support for security operations centers and enterprise email security integrations.
Custom · Production
Free access for qualifying open source security tools, email clients, and community projects that help protect users from phishing.
Free · OSS
Developed at AAMU Cybersecurity Lab · NSF Funded