[generic] Check for valid feeds #5
When rsstube calls its generic extractor, it checks that each downloaded page returned a success (2xx) HTTP status code, but it doesn't check whether the downloaded page is actually a valid feed.
This is especially problematic because some JS-heavy web apps (such as PeerTube, Funkwhale, and Pleroma) always respond with 200 and then use client-side JS to render any error messages (such as 404s). When rsstube tries its generic extractor on these sites, it produces false positives, because every URL appears to be valid.
A fix for the false positives could be to simply disable the generic extractor on (some or all) known software, but it would be better in general to properly check if downloaded pages are actually RSS/Atom feeds.
Basic check implemented in a1475943a7.
I'd rather not pull in something like feedparser. rsstube doesn't need to read the feed, just figure out whether it is a feed. I think just checking for required elements is fine. If issues come up, I'll try to address them, but I'd rather err on the side of not discounting things that are meant to be feeds but may be missing elements or improperly formatted. The goal is to identify intended feeds, not to run them through a validity checker.
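For illustration, a lenient "is this meant to be a feed?" check along those lines could look roughly like the sketch below. This is not the code from the commit above; `looks_like_feed` and its helpers are hypothetical names, and the choice of which elements to require (`channel` for RSS/RDF, `title` for Atom) is an assumption. It uses only the standard library and never parses the document, so malformed feeds still pass as long as the root element and one expected child are present.

```python
import re

# Hypothetical helpers, not rsstube's actual implementation.

# XML declaration, processing instructions, comments, and DOCTYPE that may
# appear before the root element.
_PROLOG = re.compile(r"\s*(?:<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>)",
                     re.DOTALL | re.IGNORECASE)
# Root element of RSS 2.0 (<rss>), Atom (<feed>), or RSS 1.0 (<rdf:RDF>),
# with an optional namespace prefix.
_ROOT = re.compile(r"\s*<(?:[\w.-]+:)?(rss|feed|rdf)\b", re.IGNORECASE)


def _has_element(text, name):
    """Return True if an opening tag called `name` appears anywhere in text."""
    pattern = r"<(?:[\w.-]+:)?" + re.escape(name) + r"[\s>/]"
    return re.search(pattern, text, re.IGNORECASE) is not None


def looks_like_feed(body):
    """Guess whether a downloaded page is intended to be an RSS/Atom feed."""
    text = body.lstrip("\ufeff")  # drop a leading byte order mark, if any
    pos = 0
    # Skip everything that may legally precede the root element.
    while True:
        match = _PROLOG.match(text, pos)
        if match is None:
            break
        pos = match.end()
    root = _ROOT.match(text, pos)
    if root is None:
        return False  # e.g. an HTML app shell served with status 200
    if root.group(1).lower() == "feed":
        return _has_element(text, "title")  # Atom
    return _has_element(text, "channel")  # RSS 2.0 and RSS 1.0 (RDF)
```

With this sketch, an HTML page that a JS-heavy app serves for a nonexistent URL is rejected because its first element is `<html>`, while an RSS document that has a `<channel>` but is missing other required children is still accepted.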