Technically it supports fewer languages than Whisper: 40 vs 99.
The main problem isn't "bother", it's training data. You need hundreds of thousands of hours of high-quality transcripts to train models like these, and that just doesn't exist for a language like Zulu.
This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages. Our approach integrates in-house proprietary and open-source datasets to refine and optimize Dolphin's performance. The mod...

I want to clarify something. Reranker is a general term that can refer to any model used for reranking. It is independent of implementation.
What you refer to

"because reranker models look at the two pieces of content simultaneously and can be fine-tuned to the domain in question. They shouldn't be used for the initial retrieval because the evaluation time is O(n²), as each combination of inputs has to be evaluated"

is a specific implementation known as a CrossEncoder, which is common for reranking models but not retrieval ones, for the reasons you described. But you can also use any other architecture.
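
To make the two-stage setup concrete, here is a minimal sketch assuming the sentence-transformers library; the checkpoint names (all-MiniLM-L6-v2, cross-encoder/ms-marco-MiniLM-L-6-v2) are just illustrative choices, not anything specific to the posts linked below. A bi-encoder retrieves candidates cheaply, then a CrossEncoder rescores each (query, document) pair jointly.

```python
# Rough sketch: bi-encoder retrieval followed by cross-encoder reranking.
# Model names are placeholders; swap in whatever checkpoints you actually use.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

docs = [
    "Rerankers score a query and a document together.",
    "Bi-encoders embed queries and documents independently.",
    "Zebras are native to Africa.",
]
query = "How do reranking models work?"

# 1) Cheap first-stage retrieval: embed everything once, compare by similarity.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = bi_encoder.encode(docs, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]
candidates = [docs[h["corpus_id"]] for h in hits]

# 2) Rerank: the cross-encoder reads query and candidate jointly, one forward
#    pass per pair, which is why it is applied only to the shortlist.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, cand) for cand in candidates])
for cand, score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {cand}")
```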


Link to Bluesky: https://bsky.app/profile/tomaarsen.com/post/3llc2jvwah22f
Some more details: https://huggingface.co/blog/train-reranker
Autotracers can't generate SVGs from text.
Claude frequently draws SVGs to illustrate things for me (I'm guessing it's in the prompt), but even though it's better at it than all the other models, it still kinda sucks. It's just a fundamentally dumb task for a pure language model, similar to the ARC-AGI benchmark: it makes more sense for a vision model, and trying to get an LLM to do it is a waste.

What is the license? The link on HF just 404s.
Very similar to Chain of Draft but seems more thorough
Recent advances in large language models have demonstrated remarkable reasoning capabilities through Chain of Thought (CoT) prompting, but often at the cost of excessive verbosity in their intermediate outputs, which increases computational overhead. We introduce Sketch-of-Thought (SoT), a novel pro...

It matches R1 on the given benchmarks. R1 has 671B params (37B activated) while this only has 32B.
Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermedia...
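
For a sense of what Chain-of-Draft-style prompting looks like in practice, here is a minimal sketch using the OpenAI Python client. The system instruction is a paraphrase of the word-limited drafting prompt the paper describes (not a verbatim quote), and the model name is just a placeholder.

```python
# Sketch of Chain-of-Draft-style prompting: ask for terse drafts instead of verbose CoT.
# Instruction wording and model name are placeholders, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()

DRAFT_INSTRUCTION = (
    "Think step by step, but keep only a minimal draft for each thinking step, "
    "at most five words per step. Return the final answer after '####'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": DRAFT_INSTRUCTION},
        {"role": "user", "content": "Jason had 20 lollipops. He gave Denny some. "
                                    "Now he has 12. How many did he give Denny?"},
    ],
)
print(response.choices[0].message.content)
# Expected shape of the output: a few very short draft lines, then '#### 8'.
```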

Atom of Thoughts (AOT): lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1! For each reasoning step, it:
1. Decomposes the question into a DAG of subquestions
2. Contracts the subquestions into a new, simpler question
3. Iterates until reaching an atomic question
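
For a rough sense of the control flow, here is a minimal sketch of that decompose/contract loop. All of the helpers (decompose_to_dag, contract, is_atomic, answer) are hypothetical placeholders standing in for LLM calls; this is not the authors' actual code.

```python
# Hypothetical sketch of the Atom of Thoughts loop described above.
# Each helper stands in for an LLM call; none of this is the paper's real implementation.

def decompose_to_dag(question: str) -> dict:
    """Split the question into subquestions plus dependency edges (LLM call in practice)."""
    raise NotImplementedError

def contract(question: str, dag: dict) -> str:
    """Fold the independent subquestions back in, yielding a simpler question (LLM call)."""
    raise NotImplementedError

def is_atomic(question: str) -> bool:
    """Check whether the question can be answered directly, with no further decomposition (LLM call)."""
    raise NotImplementedError

def answer(question: str) -> str:
    """Answer an atomic question directly (LLM call)."""
    raise NotImplementedError

def atom_of_thoughts(question: str, max_iters: int = 5) -> str:
    current = question
    for _ in range(max_iters):
        if is_atomic(current):
            break
        dag = decompose_to_dag(current)   # 1. decompose into a DAG of subquestions
        current = contract(current, dag)  # 2. contract into a new, simpler question
    return answer(current)                # 3. answer once the question is atomic
```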
