Perspective

NLP API that returns the probability that text is toxic, obscene, insulting, or threatening

Perspective API (by Google's Jigsaw) uses machine learning to score text for toxicity, insults, threats, profanity, and other attributes. It returns probability scores (0-1) for each attribute, making it easy to flag or filter harmful content in comments, chat messages, or user-generated text. ⚠️ **Important: Perspective API is sunsetting and will be out of service after December 31, 2026.** Google is not offering migration support. For OpenClaw agents, Perspective was useful for content moderation in group chats or community channels. Consider alternatives like OpenAI's Moderation API or Hugging Face toxicity models going forward.
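A minimal sketch of what a scoring request looks like. The endpoint and field names follow Perspective's public AnalyzeComment format; actually sending the request would need a Google Cloud API key with the API enabled, which is omitted here.

```python
# Sketch of a Perspective AnalyzeComment request body.
# Field names follow the public API docs; this only builds the JSON
# payload -- no network call is made.

ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for an AnalyzeComment call."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
        "doNotStore": True,  # ask Google not to retain the comment
    }

body = build_request("example comment", ["TOXICITY", "INSULT"])
```

The body would be POSTed to `ENDPOINT` with `?key=YOUR_API_KEY` appended; the response carries one probability score per requested attribute.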

Tags: ml, ai, nlp

Category: Machine Learning

Use Cases

  • Score incoming group chat messages for toxicity and flag harmful content
  • Auto-moderate community channels by filtering highly toxic messages
  • Analyze comment sentiment trends on blog posts or social media

Tips

  • Migrate to OpenAI's free Moderation API before the 2026 sunset
  • If still using, batch text analysis to work within the 1 QPS limit
  • Use SEVERE_TOXICITY instead of TOXICITY for fewer false positives
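The batching tip above can be sketched as a sequential loop that sleeps between calls. `score_fn` here is a hypothetical stand-in for whatever function actually calls the API:

```python
import time

def score_batch(texts, score_fn, qps=1.0):
    """Score texts one at a time, pausing to stay under a QPS limit.

    score_fn is a placeholder for the real API call; qps=1.0 matches
    Perspective's default quota.
    """
    results = []
    interval = 1.0 / qps
    for i, text in enumerate(texts):
        if i:  # no need to sleep before the first request
            time.sleep(interval)
        results.append(score_fn(text))
    return results
```

For larger workloads you would still need a quota increase; this only keeps a small batch from tripping the default limit.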

Known Issues & Gotchas

  • Sunsetting after December 31, 2026 — do not build new projects on this API
  • Default rate limit is only 1 QPS — must request increases via Google Cloud
  • Scores are probabilities, not binary — you need to set your own threshold for what counts as toxic
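Since scores are probabilities, moderation logic needs an explicit cutoff. A minimal sketch, where the 0.8 default is an assumed value you would tune on your own data:

```python
def is_toxic(score, threshold=0.8):
    """Turn a Perspective probability into a binary moderation decision.

    The 0.8 threshold is an assumption, not a Perspective recommendation;
    lower it to catch more content, raise it to cut false positives.
    """
    return score >= threshold

# A score of 0.93 clears the 0.8 cutoff; 0.40 does not.
```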

Frequently Asked Questions

Is Perspective API still available?

Yes, but it's sunsetting. The API will remain active until December 31, 2026, and quota-increase requests will not be accepted after February 2026. Plan your migration to an alternative now.

What are good alternatives to Perspective?

OpenAI's Moderation API (free, no rate limit concerns), Hugging Face toxicity models (free inference), or Cloudflare AI content moderation. OpenAI's option is the simplest drop-in replacement.

What attributes can Perspective score?

TOXICITY, SEVERE_TOXICITY, INSULT, PROFANITY, THREAT, IDENTITY_ATTACK, and several experimental attributes. Each returns a 0-1 probability score.
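Each requested attribute comes back under `attributeScores` with a `summaryScore.value` between 0 and 1. A small parsing sketch, where the sample response is fabricated for illustration but follows the documented shape:

```python
def summary_scores(response):
    """Extract {attribute: probability} from an AnalyzeComment response."""
    return {
        attr: data["summaryScore"]["value"]
        for attr, data in response.get("attributeScores", {}).items()
    }

# Illustrative sample only -- not real API output.
sample = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.77, "type": "PROBABILITY"}},
    },
    "languages": ["en"],
}
```

Calling `summary_scores(sample)` yields one probability per attribute, ready to feed into whatever threshold logic your moderation pipeline uses.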