Content quality scoring engine #3

Open
opened 2026-06-14 18:00:04 +00:00 by glow · 0 comments

Heuristic quality scoring.

Metrics:

  • Word count (<100 = skip)
  • Link density (>0.40 = skip)
  • Headings (no headings and word count <300 = skip)
  • Sentence variance (<0.30 = flag as boilerplate)

All thresholds configurable.
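
As a rough illustration of how these four checks could fit together, here is a minimal sketch. It is not the code in extractor/quality.py: the names `QualityThresholds`, `sentence_variance`, and `assess` are hypothetical, link density is assumed to mean the fraction of words that sit inside links, and sentence variance is assumed to mean the coefficient of variation of sentence lengths in words. Only the threshold values come from the list above.

```python
import re
import statistics
from dataclasses import dataclass


@dataclass
class QualityThresholds:
    # Hypothetical config object; defaults mirror the values listed in the issue.
    min_words: int = 100
    max_link_density: float = 0.40
    min_words_without_headings: int = 300
    min_sentence_variance: float = 0.30


def sentence_variance(text: str) -> float:
    """Assumed definition: std dev of sentence lengths (in words) divided by their mean."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


def assess(text: str, link_words: int, heading_count: int, t=None) -> str:
    """Return "skip", "boilerplate", or "ok" for one extracted document."""
    t = t or QualityThresholds()
    words = len(text.split())
    if words < t.min_words:
        return "skip"
    # Assumption: link density = words inside links / total words.
    if link_words / words > t.max_link_density:
        return "skip"
    if heading_count == 0 and words < t.min_words_without_headings:
        return "skip"
    if sentence_variance(text) < t.min_sentence_variance:
        return "boilerplate"
    return "ok"
```

Passing a custom `QualityThresholds` instance overrides any of the defaults, which is one simple way to keep every threshold configurable.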

Files:

  • extractor/quality.py
  • tests/test_quality.py (12 tests)

Status: implemented and tested

Reference
glow-all/sibyl-extractor#3