Pure-Python SimHash deduplication #4

Open
opened 2026-06-14 18:00:04 +00:00 by glow · 0 comments
Owner

Pure-Python SimHash using MD5. No C extensions.

Replaces simhash-py (fails on Python 3.13).

Files:

  • extractor/dedup.py
  • tests/test_dedup.py (17 tests)

Status: implemented and tested
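
For readers unfamiliar with the technique, here is a minimal sketch of what a pure-Python SimHash built on MD5 could look like. It is not the contents of extractor/dedup.py. The function names (`simhash`, `hamming_distance`), the 64-bit fingerprint width, and the word-level tokenization are all assumptions made for illustration.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # assumed width; the real module may use a different size


def simhash(text):
    """Return a 64-bit SimHash fingerprint of `text` (hypothetical helper)."""
    weights = [0] * FINGERPRINT_BITS
    # Assumed tokenization: lowercase word tokens, each with weight 1.
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 8 bytes of the 16-byte MD5 digest as a 64-bit integer.
        h = int.from_bytes(digest[:FINGERPRINT_BITS // 8], "big")
        for i in range(FINGERPRINT_BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:  # a positive vote total sets the bit
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents would then be treated as near-duplicates when `hamming_distance(simhash(a), simhash(b))` falls at or below some threshold. The threshold itself depends on the corpus and is not stated in this issue.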

Reference
glow-all/sibyl-extractor#4