Ingest podcast feeds and crowdsource hashes of both the whole files and partial sections of the downloaded audio. That should be a good start for auto-tagging dynamically inserted ads.
For non-dynamic ads, provide an interface to manually mark their start/end, and publish those marks for others. The same interface could be used to add chapters and other metadata.
Then you’d just point your podcast app to an RSS feed you self-host.
I propose Listenarr, unless this has already been taken.
Alternatively, what you're describing sounds like SponsorBlock, but for podcasts. You probably wouldn't have to rehost the actual audio files to accomplish this; you'd just need a podcast client or add-on that accepts user submissions for ad segments, and a database somewhere to host the ad-break metadata.
The biggest issue is that you'd probably be building a new podcast app or forking an existing one to do it. Also, some podcasts dynamically insert ads, so people's downloaded files could have different ad segments and times.
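To make the metadata idea a bit more concrete, here's a minimal Python sketch of what a submitted ad-segment record could look like. Everything in it is hypothetical (the `AdSegment` fields, the `accepted` helper, the vote threshold); it isn't SponsorBlock's actual schema, just one way to shape the data.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AdSegment:
    """One user-submitted ad break for one episode (hypothetical schema)."""
    episode_guid: str   # the <guid> of the episode in the RSS feed
    start: float        # seconds from the start of the audio
    end: float          # seconds from the start of the audio
    votes: int = 0      # net up-votes from other listeners


def accepted(segments, min_votes=2):
    """Return the segments with enough votes, ordered by start time."""
    trusted = [s for s in segments if s.votes >= min_votes]
    return sorted(trusted, key=lambda s: s.start)


def to_json(segments):
    """Serialise segments so a client or add-on could fetch them."""
    return json.dumps([asdict(s) for s in segments])
```

A client would fetch the accepted segments for an episode's GUID and skip those time ranges during playback, without anyone rehosting the audio.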
I thought I explained how to handle the dynamically inserted ads, but I’ll elaborate a little here.
If your Listenarr instance is part of a broader network of instances, each one may receive a unique file with different ads inserted, but the ads will typically be inserted at the same cut points in the program timeline. Listenarr would calculate the hash of the entire file, and also of sub-spans of various lengths.
If the hash of the full file is the same among instances, you know everyone is getting the same file, and any time references suggested for metadata will apply to everyone.
If the full file hash is different, Listenarr starts slicing the file up and generating hashes of subsections to help identify where the common and variant sections are. Common sections will usually be the actual content; variant sections are likely tailored ads. The broader the Listenarr network, the greater the sample size for hashes, which will help automate identification. In fact, the more granular and specific the targeting of inserted ads, the easier it will be to identify them.
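A rough sketch of that comparison in Python, with some big assumptions: fixed-size byte chunks, and inserts that happen to land on chunk boundaries. Real audio won't line up that neatly (an inserted ad shifts everything after it, and re-encoding changes the bytes), so a real version would probably need content-defined chunking or audio fingerprints instead. The function names and the toy byte strings are made up for illustration.

```python
import hashlib
from collections import Counter


def same_file(copies: list[bytes]) -> bool:
    """True if every instance downloaded byte-identical audio."""
    return len({hashlib.sha256(c).hexdigest() for c in copies}) == 1


def chunk_hashes(data: bytes, size: int) -> list[str]:
    """Hash consecutive fixed-size chunks of one downloaded file."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def classify(copies: list[bytes], size: int) -> list[list[bool]]:
    """Per copy, flag each chunk: True = seen in every copy (likely content),
    False = missing from at least one copy (likely an inserted ad)."""
    per_copy = [chunk_hashes(c, size) for c in copies]
    # Count how many copies contain each chunk hash, ignoring position.
    seen_in = Counter(h for hashes in per_copy for h in set(hashes))
    return [[seen_in[h] == len(copies) for h in hashes] for hashes in per_copy]


# Toy "downloads" from three instances, 4-byte chunks for readability.
copy_a = b"INTR" + b"AD-1" + b"BODY" + b"OUTR"
copy_b = b"INTR" + b"AD-2" + b"BODY" + b"OUTR"
copy_c = b"INTR" + b"AD-3AD-4" + b"BODY" + b"OUTR"

if not same_file([copy_a, copy_b, copy_c]):
    flags = classify([copy_a, copy_b, copy_c], size=4)
    print(flags[0])  # [True, False, True, True]
```

Matching hashes by value rather than by position is what lets the third copy, with its longer ad, still line up on the shared content.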
Once you have the file sections sufficiently hashed, tagged, and identified, you can easily stitch together a sanitised media stream into a file any podcast app can ingest.
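As a sketch of the stitching step: once the ad spans are known as time ranges, the keep-list is just their complement over the episode duration. The `keep_ranges` helper below is hypothetical, and it only computes the cut list; the actual audio cutting would be handed to an audio tool, ideally cutting on frame boundaries.

```python
def keep_ranges(duration: float, ads: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the (start, end) spans in seconds to keep, given tagged ad spans.

    Overlapping or out-of-order ad spans are tolerated, and spans are
    clamped to the range [0, duration].
    """
    keep = []
    cursor = 0.0  # end of the last ad seen so far
    for start, end in sorted(ads):
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))  # content between two ads
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))   # content after the final ad
    return keep


# A 100-second episode with two overlapping mid-roll tags and a post-roll.
print(keep_ranges(100.0, [(10, 20), (15, 30), (90, 100)]))
```

Concatenating the kept spans in order gives the sanitised stream that gets served to the podcast app.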
You could shove this function into a podcast player, but then you’d need to replicate all the existing permutations of player applications.
The beauty of the current podcast environment is it’s just RSS feeds that point to audio files in a standard way. This permits handling by a shim proxy in the middle of the transaction between the publisher and the player.
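Here's roughly what the shim could do to the feed itself, as a Python sketch using only the standard library. It rewrites each `<enclosure>` URL so the player fetches audio through the proxy. The proxy address and the `src` query parameter are placeholders I made up, and a real feed with iTunes or other namespaces would need those registered so the prefixes survive the round trip.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def rewrite_feed(rss_xml: str, proxy_base: str) -> str:
    """Point every enclosure in an RSS feed at the proxy instead of the publisher."""
    root = ET.fromstring(rss_xml)
    for enclosure in root.iter("enclosure"):
        original = enclosure.get("url")
        if original:
            # Carry the publisher's URL along so the proxy knows what to fetch.
            enclosure.set("url", f"{proxy_base}?src={quote(original, safe='')}")
    return ET.tostring(root, encoding="unicode")


feed = (
    '<rss version="2.0"><channel><title>Example Show</title>'
    '<item><title>Ep 1</title>'
    '<enclosure url="https://cdn.example/ep1.mp3" type="audio/mpeg" length="123"/>'
    '</item></channel></rss>'
)
print(rewrite_feed(feed, "https://listenarr.example/audio"))
```

The player never needs to know anything changed: it still sees a normal feed with normal enclosures, which is why this works with any existing podcast app.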
This could also be a way to better incorporate media into the fediverse. For example, the generated chapters and transcripts could be directly referenced in Lemmy and Mastodon posts.