It's coming along nicely, I hope I'll be able to release it in the next few days.
Screenshot:
How It Works:
I am a bot that generates summaries of Lemmy comments and posts.
Just mention me in a comment or post, and I will generate a summary for you.
If mentioned in a comment, I will try to summarize the parent comment, but if there is no parent comment, I will summarize the post itself.
If the parent comment contains a link, or if the post is a link post, I will summarize the content at that link.
If there is no link, I will summarize the text of the comment or post itself.
Extra Info in Comments:
Prompt Injection:
Of course it's really easy (but mostly harmless) to break it using prompt injection:
It will only be available in communities that explicitly allow it. I hope it will be useful, I'm generally very satisfied with the quality of the summaries.
Just curious because I was discussing this with someone else on here. Do you think it's possible to create a tldw bot with chatgpt for YouTube videos as well?
It is definitely possible, at least for videos that have a transcript. There are tools to download the transcript which can be fed into an LLM to be summarized.
I used GPT-4 for this post, which is miles ahead of GPT-3.5, but it would be prohibitively expensive (for me) to use it for a publicly available bot. I also asked it to generate a longer summary with subheadings instead of a TLDR.
The real question is if it is legal to programmatically download video transcripts this way. But theoretically it is entire possible, even easy.
Oh, Iβve just realized that itβs also possible if the video doesnβt have a transcript. You can download the audio and feed it into OpenAI Whisper (which is currently the best available audio transcription model), and pass the transcript to the LLM. And Whisper isnβt even too expensive.
fyi someone else launched one a day ago; please see my suggestions to them in their 'how the bot works' thread in tldrbot@lemmy.world.
tldr: i am really not a fan of automated posting of automated plausibly-not-BS-but-actually-BS, which is what LLMs tend to produce, but I realize some people are. if you really want to try it, please make it very clear to every reader that it is machine-generated before they read the rest of the comment.
Yeah, but if the thing is that I don't want my content to be sent to an AI?
Your idea is very cool, but be sure to implement a way to detect if the user that posted the comment/post doesn't have a #nobot tag in its profile description.
This is an excellent idea, and I'm not sure why people downvoted you. The bot library I used doesn't support requesting the user profile, but I'm sure it can be fetched directly from the API. I will look into implementing it!
No. Intellectual property. And that's why licenses exist.
And btw, I publish on my own server, my property, so if I don't want and state that an AI scans its content, in an automated way, it simply doesn't have the right.
Unfortunately the locally hosted models I've seen so far are way behind GPT-3.5. I would love to use one (though the compute costs might get pretty expensive), but the only realistic way to implement it currently is via the OpenAI API.
EDIT: there is also a 100 summaries / day limit I built into the bot to prevent becoming homeless because of a bot
By the way, in case it helps, I read that OpenAI does not use content submitted via API for training. Please look it up to verify, but maybe that can ease the concerns of some users.
Also, have a look at these hosted models, they should be way cheaper than OpenAI. I think that this company is related to StabilityAI and the guys from StableDiffusion and also openassistant.
I looked into using locally hosted models for some personal projects, but they're absolutely awful. ChatGPT 3.5, and especially 4 are miles above what local models can do at the moment. It's not just a different game, the local model is in the corner not playing at all. Even the Falcon instruct, which is the current best rated model on HuggingFace.