May 22, 2026

Report summary

12 stories cleared the bar, led by "GitHub confirms breach of 3,800 repos via malicious VSCode extension," "OpenAI to confidentially file for IPO as soon as Friday," and "An OpenAI model has disproved a central conjecture in discrete geometry."

12 worth-attention items · 40 digest lines

Worth attention

GitHub confirms breach of 3,800 repos via malicious VSCode extension

GitHub has confirmed that a malicious VSCode extension was used to steal developer credentials and access over 3,800 repositories. This is a supply chain attack vector targeting developer workstations directly. Immediate action: audit all installed VSCode extensions, remove anything unfamiliar or low-reputation, and check your repositories for unauthorized access or committed secrets.
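The extension audit above can be started from the command line. The following is a minimal sketch, assuming the `code` CLI is on your PATH and that `code --list-extensions --show-versions` prints one `publisher.name@version` entry per line. The allowlist contents and the helper names (`parse_extensions`, `unreviewed`, `list_installed`) are hypothetical and only for illustration; the script flags extensions for manual review and does not detect malicious code.

```python
import shutil
import subprocess


def parse_extensions(listing: str) -> dict[str, str]:
    """Parse `publisher.name@version` lines into {extension_id: version}."""
    extensions = {}
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and any human-readable header lines (they contain spaces).
        if not line or " " in line:
            continue
        ext_id, _, version = line.partition("@")
        extensions[ext_id.lower()] = version
    return extensions


def unreviewed(extensions: dict[str, str], allowlist: set[str]) -> list[str]:
    """Return installed extension IDs that are not on the reviewed allowlist."""
    allowed = {item.lower() for item in allowlist}
    return sorted(ext for ext in extensions if ext not in allowed)


def list_installed() -> str:
    """Ask the VS Code CLI for installed extensions; empty string if `code` is absent."""
    exe = shutil.which("code")
    if exe is None:
        return ""
    result = subprocess.run(
        [exe, "--list-extensions", "--show-versions"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical allowlist: replace with the extensions your team has reviewed.
    ALLOWLIST = {"ms-python.python"}
    for ext in unreviewed(parse_extensions(list_installed()), ALLOWLIST):
        print("review:", ext)
```

Anything the script prints is a candidate for a closer look (publisher, install count, recent updates), not proof of compromise.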

OpenAI to confidentially file for IPO as soon as Friday

OpenAI is preparing to file confidentially with the SEC for an IPO, potentially as soon as Friday, May 22. Going public changes OpenAI's corporate incentives significantly: quarterly earnings pressure, shareholder priorities, and regulatory scrutiny all increase. Builders relying on OpenAI APIs should watch for any pricing or rate-limit changes that could follow increased investor visibility.

An OpenAI model has disproved a central conjecture in discrete geometry

OpenAI's model found a valid counterexample to a longstanding conjecture in discrete geometry, verified by Fields medalist Tim Gowers. This is the first credible instance of a frontier AI model making a genuinely original mathematical contribution: not solving a known problem, but disproving a conjecture that was believed to be true. It is a landmark AI capability milestone, with implications for how we think about frontier model reasoning.

Cohere launches Command-A Plus as open weights

Cohere cofounder Nick Frosst confirmed on Reddit that Command-A Plus is being released as open weights (BF16 on HuggingFace). This is a significant open-weights release from a credible enterprise AI company targeting RAG and agentic use cases. Worth evaluating against current open-weight alternatives if you need an enterprise-focused open model.

Comparing coding agents: GitHub Copilot, Pi, Claude Code, and opencode with Qwen3.6 27B

A builder created a reusable test harness to run identical coding tasks across multiple AI agent environments (Copilot, Pi, Claude Code, opencode) with both cloud and local models. The key finding is that harness design contributes significantly to coding agent performance, independent of the underlying model. Relevant if you are choosing or designing a coding agent workflow.

Qwen 3.6 35B GGUF: NTP vs MTP quantization guide across GPUs and CPUs

ByteShape released Qwen 3.6 35B GGUF quantizations in both NTP and MTP families with benchmark-backed guidance on which to pick for different hardware configurations. If you are running Qwen 3.6 35B locally, this is the go-to reference for selecting the right GGUF and saves hours of personal benchmarking.

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

Detailed benchmarks of Qwen3.6 35B MoE on RTX 5080 16GB at real coding-agent context lengths. Key finding: MTP provides no benefit for this model at 128k context because memory bandwidth is the bottleneck. 56 tok/s is the practical ceiling. Useful data point for planning local inference for agentic code workloads.

AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory

AMD announced pricing for their Ryzen AI Halo PC at $3,999 with 128GB of unified memory on-board. This establishes a new consumer-grade tier for local AI inference with large memory capacity, competing with Apple Silicon Mac Studio for local LLM use cases.

AMD BC-250 (salvaged PS5 APU board): $50-150 for 16GB GDDR6 local inference

Salvaged PS5 APU boards (AMD BC-250) are available on eBay at $50-150 each, featuring Zen 2, 16GB unified GDDR6, and RDNA 2. ROCm has been confirmed working on them. At this price point they are the cheapest viable GPU inference node available, significantly cheaper than any current discrete GPU with similar VRAM.

How fast is N tokens per second really? (interactive visualizer)

An interactive tool that converts raw tokens-per-second into human-comprehensible equivalents (reading speed, typing speed). Useful for calibrating expectations about LLM inference performance and explaining token generation speed to non-technical stakeholders or clients.
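The conversion behind this kind of comparison is simple arithmetic. Here is a rough sketch, not the tool's actual method: it assumes about 0.75 English words per token (a common rule of thumb that varies by tokenizer and language) and a 250 words-per-minute reading speed. Both constants and both function names are assumptions chosen for illustration.

```python
def tokens_to_wpm(tokens_per_second: float, words_per_token: float = 0.75) -> float:
    """Convert a generation rate in tokens/second to approximate words/minute."""
    # words_per_token is an assumed rule of thumb, not a measured value.
    return tokens_per_second * words_per_token * 60.0


def ratio_to_reader(tokens_per_second: float,
                    reading_wpm: float = 250.0,
                    words_per_token: float = 0.75) -> float:
    """How many times faster (or slower) generation is than an assumed reading speed."""
    return tokens_to_wpm(tokens_per_second, words_per_token) / reading_wpm


if __name__ == "__main__":
    rate = 56.0  # tok/s, the figure quoted in the RTX 5080 item
    print(f"{rate} tok/s is roughly {tokens_to_wpm(rate):.0f} words per minute")
    print(f"about {ratio_to_reader(rate):.1f}x the assumed reading speed")
```

Under these assumptions, 56 tok/s works out to roughly 2,520 words per minute, about ten times the assumed reading speed.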

Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP

A builder created a system running a Linux VM inside the browser (via WebAssembly) that bridges physical USB scanners via WebUSB + USB/IP protocol. This enables legacy SANE-compatible scanners to work in-browser without native drivers. The architectural pattern — browser WASM VM + WebUSB bridge — is novel and potentially applicable to other USB hardware classes.

HuggingFace benchmark datasets now let you filter by model size

HuggingFace added a model size filter to their benchmark dataset view, enabling comparison of models within a given parameter budget. Minor but genuinely useful for model selection tasks.

Full digest

R Qwen will release another 27B with high probability

Speculative tweet, no official confirmation.

reddit-localllama

P Cohere launches Command-A Plus as open weights

Enterprise open-weights model from Cohere cofounder confirmed; BF16 on HuggingFace.

reddit-localllama

R Back again, many changes have taken place

Hobbyist CLI project update; no broader relevance.

reddit-localllama

P Comparing coding agents: Copilot, Pi, Claude Code, opencode with Qwen3.6 27B

Harness design matters as much as model choice; reusable test rig.

reddit-localllama

R Qwen3.6 27B and llama.cpp appreciation post

Personal config post; nothing new.

reddit-localllama

R Training a vision model from scratch on iPod touch 4 images

Hobbyist DCGAN experiment; no practical takeaways.

reddit-localllama

M HuggingFace benchmark datasets filter by model size

Minor HF UI improvement; useful for model selection.

reddit-localllama

R Waiting on Qwen to drop those 3.7 models be like

Meme/engagement bait.

reddit-localllama

M AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory

New consumer AI PC tier; competes with Mac Studio for local inference.

reddit-localllama

P Qwen 3.6 35B GGUF: NTP vs MTP quantization guide

ByteShape pick-the-right-quant guide across GPU/CPU hardware.

reddit-localllama

P AMD BC-250 (PS5 APU): $50-150 for 16GB GDDR6

Cheapest viable GPU inference node with confirmed ROCm.

reddit-localllama

R CohereLabs/command-a-plus HuggingFace

Duplicate of Cohere story.

reddit-localllama

R How can you stop your model from looping

Support question; no new info.

reddit-localllama

M HalBench: sycophancy/hallucination benchmark

Sonnet 4.6 tops benchmark; verify model version names before acting on the results.

reddit-localllama

R I guess 4 units wasn't enough

Humor post.

reddit-localllama

R [WIP] Gemma 4 MTP

Unstable WIP; check back when mainline.

reddit-localllama

M llama.cpp MTP backend sampling merged

Performance improvement for MTP in upcoming builds.

reddit-localllama

R Model Golf for RunPod Credits

Niche hobby competition.

reddit-localllama

M AWS secures rare Mac Studio M3 Ultras

M3 Ultra supply constrained; AWS priority buying confirmed.

reddit-localllama

R llama.cpp build 9254 fixes TG regression

Hardware-specific report; not broadly actionable.

reddit-localllama

P RTX 5080 16GB: Qwen3.6 35B MoE at 128k context, 56 tok/s

MTP doesn't help large MoEs at long context; bandwidth-limited.

reddit-localllama

R Qwen3-VL-Embedding-2B on Orange Pi 5b

Niche embedded hobbyist project.

reddit-localllama

R PDF reading with AnythingLLM

Support question.

reddit-localllama

R Qwen3.6-35B + Hermes Agent on DGX Spark

Setup-specific question.

reddit-localllama

P An OpenAI model has disproved a central conjecture in discrete geometry

First AI original math discovery verified by Fields medalist.

hn-top

P GitHub confirms breach of 3,800 repos via malicious VSCode extension

Audit your VSCode extensions now; 3,800+ repos compromised.

hn-top

R Haskell Foundation 2026 Update

Niche language org update.

hn-top

R Show HN: reverse engineered Apple video wallpapers

Consumer hobby project.

hn-top

M New features in GCC 16: improved errors and SARIF output

Machine-readable compiler diagnostics; useful for C/C++ tooling.

hn-top

R The Letter S, by Donald Knuth (1980)

Off-topic typography essay.

hn-top

R DOS Zone

Browser DOS gaming; entertainment only.

hn-top

M Flipper One tech specs published

Next Flipper hardware specs; no release date yet.

hn-top

M Anthropic expanding to Colossus2 with GB200

Capacity expansion; signals more Claude API headroom.

hn-top

R Archaeologists find Egyptian mummy buried with the Iliad

Off-topic.

hn-top

P How fast is N tokens per second really?

Interactive tok/s → human speed converter.

hn-top

M Saying goodbye to asm.js

Firefox deprecating asm.js; use WASM for new work.

hn-top

P Reviving old scanners with in-browser Linux VM + WebUSB

Novel WASM VM + WebUSB bridge architecture.

hn-top

M Intuit lays off 3k+ to refocus on AI

Industry AI restructuring signal.

hn-top

P OpenAI to confidentially file for IPO

Major structural event for OpenAI API platform.

hn-top

R Qian Xuesen: The missile genius America lost and China gained

Off-topic history.

hn-top

Original markdown

# Nightly Librarian — Newsletter draft

Run: a55aa549-8ffd-41a9-a765-5b4432778106
Started: 2026-05-22T06:09:41.788Z
Completed: 2026-05-22T06:25:00.957Z

## Worth attention

- **GitHub confirms breach of 3,800 repos via malicious VSCode extension**
  https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-malicious-vscode-extension/
  GitHub has confirmed that a malicious VSCode extension was used to steal developer credentials and access over 3,800 repositories. This is a supply chain attack vector targeting developer workstations directly. Immediate action: audit all installed VSCode extensions, remove anything unfamiliar or low-reputation, and check your repositories for unauthorized access or committed secrets.
- **OpenAI to confidentially file for IPO as soon as Friday**
  https://www.cnbc.com/2026/05/20/openai-ipo-filing.html
  OpenAI is preparing to file confidentially with the SEC for an IPO, potentially as soon as Friday, May 22. Going public changes OpenAI's corporate incentives significantly: quarterly earnings pressure, shareholder priorities, and regulatory scrutiny all increase. Builders relying on OpenAI APIs should watch for any pricing or rate-limit changes that could follow increased investor visibility.
- **An OpenAI model has disproved a central conjecture in discrete geometry**
  https://openai.com/index/model-disproves-discrete-geometry-conjecture/
  OpenAI's model found a valid counterexample to a longstanding conjecture in discrete geometry, verified by Fields medalist Tim Gowers. This is the first credible instance of a frontier AI model making a genuinely original mathematical contribution: not solving a known problem, but disproving a conjecture that was believed to be true. It is a landmark AI capability milestone, with implications for how we think about frontier model reasoning.
- **Cohere launches Command-A Plus as open weights**
  https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/
  Cohere cofounder Nick Frosst confirmed on Reddit that Command-A Plus is being released as open weights (BF16 on HuggingFace). This is a significant open-weights release from a credible enterprise AI company targeting RAG and agentic use cases. Worth evaluating against current open-weight alternatives if you need an enterprise-focused open model.
- **Comparing coding agents: GitHub Copilot, Pi, Claude Code, and opencode with Qwen3.6 27B**
  https://www.reddit.com/r/LocalLLaMA/comments/1tjbhjk/same_task_in_githubcopilot_pi_claudecode_and/
  A builder created a reusable test harness to run identical coding tasks across multiple AI agent environments (Copilot, Pi, Claude Code, opencode) with both cloud and local models. The key finding is that harness design contributes significantly to coding agent performance, independent of the underlying model. Relevant if you are choosing or designing a coding agent workflow.
- **Qwen 3.6 35B GGUF: NTP vs MTP quantization guide across GPUs and CPUs**
  https://www.reddit.com/r/LocalLLaMA/comments/1tipihx/qwen_36_35b_gguf_ntp_vs_mtp_quantization_results/
  ByteShape released Qwen 3.6 35B GGUF quantizations in both NTP and MTP families with benchmark-backed guidance on which to pick for different hardware configurations. If you are running Qwen 3.6 35B locally, this is the go-to reference for selecting the right GGUF and saves hours of personal benchmarking.
- **RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help**
  https://www.reddit.com/r/LocalLLaMA/comments/1tiixql/rtx_5080_16gb_qwen36_35b_moe_at_128k_context_56/
  Detailed benchmarks of Qwen3.6 35B MoE on RTX 5080 16GB at real coding-agent context lengths. Key finding: MTP provides no benefit for this model at 128k context because memory bandwidth is the bottleneck. 56 tok/s is the practical ceiling. Useful data point for planning local inference for agentic code workloads.
- **AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory**
  https://www.reddit.com/r/LocalLLaMA/comments/1tinl98/amd_ryzen_ai_halo_pc_will_cost_3999_with_128gb/
  AMD announced pricing for their Ryzen AI Halo PC at $3,999 with 128GB of unified memory on-board. This establishes a new consumer-grade tier for local AI inference with large memory capacity, competing with Apple Silicon Mac Studio for local LLM use cases.
- **AMD BC-250 (salvaged PS5 APU board): $50-150 for 16GB GDDR6 local inference**
  https://www.reddit.com/r/LocalLLaMA/comments/1tj4unp/amd_bc250_and_the_search_for_cheap_compute/
  Salvaged PS5 APU boards (AMD BC-250) are available on eBay at $50-150 each, featuring Zen 2, 16GB unified GDDR6, and RDNA 2. ROCm has been confirmed working on them. At this price point they are the cheapest viable GPU inference node available, significantly cheaper than any current discrete GPU with similar VRAM.
- **How fast is N tokens per second really? (interactive visualizer)**
  https://mikeveerman.github.io/tokenspeed/
  An interactive tool that converts raw tokens-per-second into human-comprehensible equivalents (reading speed, typing speed). Useful for calibrating expectations about LLM inference performance and explaining token generation speed to non-technical stakeholders or clients.
- **Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP**
  https://yes-we-scan.app/details
  A builder created a system running a Linux VM inside the browser (via WebAssembly) that bridges physical USB scanners via WebUSB + USB/IP protocol. This enables legacy SANE-compatible scanners to work in-browser without native drivers. The architectural pattern — browser WASM VM + WebUSB bridge — is novel and potentially applicable to other USB hardware classes.
- **HuggingFace benchmark datasets now let you filter by model size**
  https://www.reddit.com/r/LocalLLaMA/comments/1tilvit/huggingface_benchmark_datasets_now_let_you_filter/
  HuggingFace added a model size filter to their benchmark dataset view, enabling comparison of models within a given parameter budget. Minor but genuinely useful for model selection tasks.

## Full digest

- [R] [reddit-localllama] Qwen will release another 27B with high probability — https://www.reddit.com/r/LocalLLaMA/comments/1tiwnpc/qwen_will_release_another_27b_with_high/ — Speculative tweet, no official confirmation.
- [P] [reddit-localllama] Cohere launches Command-A Plus as open weights — https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/ — Enterprise open-weights model from Cohere cofounder confirmed; BF16 on HuggingFace.
- [R] [reddit-localllama] Back again, many changes have taken place — https://www.reddit.com/r/LocalLLaMA/comments/1tj8d9i/back_again_many_changes_have_taken_place/ — Hobbyist CLI project update; no broader relevance.
- [P] [reddit-localllama] Comparing coding agents: Copilot, Pi, Claude Code, opencode with Qwen3.6 27B — https://www.reddit.com/r/LocalLLaMA/comments/1tjbhjk/same_task_in_githubcopilot_pi_claudecode_and/ — Harness design matters as much as model choice; reusable test rig.
- [R] [reddit-localllama] Qwen3.6 27B and llama.cpp appreciation post — https://www.reddit.com/r/LocalLLaMA/comments/1tjbi24/qwen36_27b_and_llamacpp_appreciation_post/ — Personal config post; nothing new.
- [R] [reddit-localllama] Training a vision model from scratch on iPod touch 4 images — https://www.reddit.com/r/LocalLLaMA/comments/1tjaedo/training_a_vision_model_from_scratch_on_ipod/ — Hobbyist DCGAN experiment; no practical takeaways.
- [M] [reddit-localllama] HuggingFace benchmark datasets filter by model size — https://www.reddit.com/r/LocalLLaMA/comments/1tilvit/huggingface_benchmark_datasets_now_let_you_filter/ — Minor HF UI improvement; useful for model selection.
- [R] [reddit-localllama] Waiting on Qwen to drop those 3.7 models be like — https://www.reddit.com/r/LocalLLaMA/comments/1tiqcwu/waiting_on_qwen_to_drop_those_37_models_be_like/ — Meme/engagement bait.
- [M] [reddit-localllama] AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory — https://www.reddit.com/r/LocalLLaMA/comments/1tinl98/amd_ryzen_ai_halo_pc_will_cost_3999_with_128gb/ — New consumer AI PC tier; competes with Mac Studio for local inference.
- [P] [reddit-localllama] Qwen 3.6 35B GGUF: NTP vs MTP quantization guide — https://www.reddit.com/r/LocalLLaMA/comments/1tipihx/qwen_36_35b_gguf_ntp_vs_mtp_quantization_results/ — ByteShape pick-the-right-quant guide across GPU/CPU hardware.
- [P] [reddit-localllama] AMD BC-250 (PS5 APU): $50-150 for 16GB GDDR6 — https://www.reddit.com/r/LocalLLaMA/comments/1tj4unp/amd_bc250_and_the_search_for_cheap_compute/ — Cheapest viable GPU inference node with confirmed ROCm.
- [R] [reddit-localllama] CohereLabs/command-a-plus HuggingFace — Duplicate of Cohere story.
- [R] [reddit-localllama] How can you stop your model from looping — https://www.reddit.com/r/LocalLLaMA/comments/1tj6d4r/how_can_you_stop_your_model_from_looping/ — Support question; no new info.
- [M] [reddit-localllama] HalBench: sycophancy/hallucination benchmark — https://www.reddit.com/r/LocalLLaMA/comments/1tizvih/halbench_i_built_a_custom_sycophancy_and/ — Sonnet 4.6 tops benchmark; verify model version names before acting on the results.
- [R] [reddit-localllama] I guess 4 units wasn't enough — Humor post.
- [R] [reddit-localllama] [WIP] Gemma 4 MTP — https://www.reddit.com/r/LocalLLaMA/comments/1tijpwl/wip_gemma_4_mtp/ — Unstable WIP; check back when mainline.
- [M] [reddit-localllama] llama.cpp MTP backend sampling merged — https://www.reddit.com/r/LocalLLaMA/comments/1tis73j/move_to_backend_sampling_for_mtp_draft_path_by/ — Performance improvement for MTP in upcoming builds.
- [R] [reddit-localllama] Model Golf for RunPod Credits — Niche hobby competition.
- [M] [reddit-localllama] AWS secures rare Mac Studio M3 Ultras — https://www.reddit.com/r/LocalLLaMA/comments/1tio2i5/aws_secures_rare_mac_studios_while_ordinary_apple/ — M3 Ultra supply constrained; AWS priority buying confirmed.
- [R] [reddit-localllama] llama.cpp build 9254 fixes TG regression — Hardware-specific report; not broadly actionable.
- [P] [reddit-localllama] RTX 5080 16GB: Qwen3.6 35B MoE at 128k context, 56 tok/s — https://www.reddit.com/r/LocalLLaMA/comments/1tiixql/rtx_5080_16gb_qwen36_35b_moe_at_128k_context_56/ — MTP doesn't help large MoEs at long context; bandwidth-limited.
- [R] [reddit-localllama] Qwen3-VL-Embedding-2B on Orange Pi 5b — Niche embedded hobbyist project.
- [R] [reddit-localllama] PDF reading with AnythingLLM — Support question.
- [R] [reddit-localllama] Qwen3.6-35B + Hermes Agent on DGX Spark — Setup-specific question.
- [P] [hn-top] An OpenAI model has disproved a central conjecture in discrete geometry — https://openai.com/index/model-disproves-discrete-geometry-conjecture/ — First AI original math discovery verified by Fields medalist.
- [P] [hn-top] GitHub confirms breach of 3,800 repos via malicious VSCode extension — https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-malicious-vscode-extension/ — Audit your VSCode extensions now; 3,800+ repos compromised.
- [R] [hn-top] Haskell Foundation 2026 Update — Niche language org update.
- [R] [hn-top] Show HN: reverse engineered Apple video wallpapers — Consumer hobby project.
- [M] [hn-top] New features in GCC 16: improved errors and SARIF output — https://developers.redhat.com/articles/2026/04/28/gcc-16-improved-error-messages-sarif-output — Machine-readable compiler diagnostics; useful for C/C++ tooling.
- [R] [hn-top] The Letter S, by Donald Knuth (1980) — Off-topic typography essay.
- [R] [hn-top] DOS Zone — Browser DOS gaming; entertainment only.
- [M] [hn-top] Flipper One tech specs published — https://docs.flipper.net/one/general/tech-specs — Next Flipper hardware specs; no release date yet.
- [M] [hn-top] Anthropic expanding to Colossus2 with GB200 — https://twitter.com/nottombrown/status/2057194829986300375 — Capacity expansion; signals more Claude API headroom.
- [R] [hn-top] Archaeologists find Egyptian mummy buried with the Iliad — Off-topic.
- [P] [hn-top] How fast is N tokens per second really? — https://mikeveerman.github.io/tokenspeed/ — Interactive tok/s → human speed converter.
- [M] [hn-top] Saying goodbye to asm.js — https://spidermonkey.dev/blog/2026/05/20/saying-goodbye-to-asmjs.html — Firefox deprecating asm.js; use WASM for new work.
- [P] [hn-top] Reviving old scanners with in-browser Linux VM + WebUSB — https://yes-we-scan.app/details — Novel WASM VM + WebUSB bridge architecture.
- [M] [hn-top] Intuit lays off 3k+ to refocus on AI — https://techcrunch.com/2026/05/20/intuit-to-lay-off-over-3000-employees-to-refocus-on-ai/ — Industry AI restructuring signal.
- [P] [hn-top] OpenAI to confidentially file for IPO — https://www.cnbc.com/2026/05/20/openai-ipo-filing.html — Major structural event for OpenAI API platform.
- [R] [hn-top] Qian Xuesen: The missile genius America lost and China gained — Off-topic history.