May 22, 2026

May 22, 2026, 2:25 AM run a55aa549
Report Summary

12 stories cleared the bar, led by "GitHub confirms breach of 3,800 repos via malicious VSCode extension," "OpenAI to confidentially file for IPO as soon as Friday," and "An OpenAI model has disproved a central conjecture in discrete geometry."

12 worth-attention items · 40 digest lines

Original markdown
# Nightly Librarian — Newsletter draft

Run: a55aa549-8ffd-41a9-a765-5b4432778106
Started: 2026-05-22T06:09:41.788Z
Completed: 2026-05-22T06:25:00.957Z

## Worth attention

- **GitHub confirms breach of 3,800 repos via malicious VSCode extension**
  https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-malicious-vscode-extension/
  GitHub has confirmed that a malicious VSCode extension was used to steal developer credentials and access more than 3,800 repositories. This is a supply-chain attack that targets developer workstations directly. Immediate action: audit all installed VSCode extensions, remove anything unfamiliar or low-reputation, and check your repositories for unauthorized access or committed secrets.
- **OpenAI to confidentially file for IPO as soon as Friday**
  https://www.cnbc.com/2026/05/20/openai-ipo-filing.html
  OpenAI is preparing to file confidentially with the SEC for an IPO, potentially as soon as Friday, May 22. Going public changes OpenAI's corporate incentives significantly: quarterly earnings pressure, shareholder priorities, and regulatory scrutiny all increase. Builders relying on OpenAI APIs should watch for pricing or rate-limit changes that could follow increased investor scrutiny.
- **An OpenAI model has disproved a central conjecture in discrete geometry**
  https://openai.com/index/model-disproves-discrete-geometry-conjecture/
  OpenAI's model found a valid counterexample to a longstanding conjecture in discrete geometry, verified by Fields medalist Tim Gowers. This is arguably the first credible instance of a frontier AI model making a genuinely original mathematical contribution: rather than solving a known problem, it disproved a conjecture that was believed to be true. It is a landmark AI capability milestone, with implications for how we think about frontier model reasoning.
- **Cohere launches Command-A Plus as open weights**
  https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/
  Cohere cofounder Nick Frosst confirmed on Reddit that Command-A Plus is being released as open weights (BF16 on HuggingFace). This is a significant open-weights release from a credible enterprise AI company, targeting RAG and agentic use cases. Worth evaluating against current open-weight alternatives if you need an enterprise-focused model.
- **Comparing coding agents: GitHub Copilot, Pi, Claude Code, and opencode with Qwen3.6 27B**
  https://www.reddit.com/r/LocalLLaMA/comments/1tjbhjk/same_task_in_githubcopilot_pi_claudecode_and/
  A builder created a reusable test harness to run identical coding tasks across multiple AI agent environments (Copilot, Pi, Claude Code, opencode) with both cloud and local models. The key finding is that harness design contributes significantly to coding agent performance, independent of the underlying model. Relevant if you are choosing or designing a coding agent workflow.
- **Qwen 3.6 35B GGUF: NTP vs MTP quantization guide across GPUs and CPUs**
  https://www.reddit.com/r/LocalLLaMA/comments/1tipihx/qwen_36_35b_gguf_ntp_vs_mtp_quantization_results/
  ByteShape released Qwen 3.6 35B GGUF quantizations in both NTP and MTP families, with benchmark-backed guidance on which to pick for different hardware configurations. If you are running Qwen 3.6 35B locally, this is a useful reference for selecting the right GGUF and can save hours of personal benchmarking.
- **RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help**
  https://www.reddit.com/r/LocalLLaMA/comments/1tiixql/rtx_5080_16gb_qwen36_35b_moe_at_128k_context_56/
  Detailed benchmarks of Qwen3.6 35B MoE on an RTX 5080 16GB at realistic coding-agent context lengths. Key finding: MTP provides no benefit for this model at 128k context because memory bandwidth is the bottleneck, and 56 tok/s is the practical ceiling on this setup. A useful data point for planning local inference for agentic coding workloads.
- **AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory**
  https://www.reddit.com/r/LocalLLaMA/comments/1tinl98/amd_ryzen_ai_halo_pc_will_cost_3999_with_128gb/
  AMD announced that its Ryzen AI Halo PC will cost $3,999 with 128GB of on-board unified memory. This establishes a new consumer-grade tier for local AI inference with large memory capacity, competing with Apple's Mac Studio for local LLM use cases.
- **AMD BC-250 (salvaged PS5 APU board): $50-150 for 16GB GDDR6 local inference**
  https://www.reddit.com/r/LocalLLaMA/comments/1tj4unp/amd_bc250_and_the_search_for_cheap_compute/
  Salvaged PS5 APU boards (AMD BC-250) are available on eBay at $50-150 each, featuring Zen 2, 16GB unified GDDR6, and RDNA 2. ROCm has been confirmed working on them. At this price point they are the cheapest viable GPU inference node available, significantly cheaper than any current discrete GPU with similar VRAM.
- **How fast is N tokens per second really? (interactive visualizer)**
  https://mikeveerman.github.io/tokenspeed/
  An interactive tool that converts raw tokens-per-second into human-comprehensible equivalents (reading speed, typing speed). Useful for calibrating expectations about LLM inference performance and explaining token generation speed to non-technical stakeholders or clients.
- **Reviving old scanners with an in-browser Linux VM bridged to WebUSB over USB/IP**
  https://yes-we-scan.app/details
  A builder runs a Linux VM inside the browser (via WebAssembly) and bridges physical USB scanners to it using WebUSB and the USB/IP protocol. This lets legacy SANE-compatible scanners work in the browser without native drivers. The architectural pattern (a browser WASM VM plus a WebUSB bridge) is novel and potentially applicable to other classes of USB hardware.
- **HuggingFace benchmark datasets now let you filter by model size**
  https://www.reddit.com/r/LocalLLaMA/comments/1tilvit/huggingface_benchmark_datasets_now_let_you_filter/
  HuggingFace added a model size filter to their benchmark dataset view, enabling comparison of models within a given parameter budget. Minor but genuinely useful for model selection tasks.
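
For the VSCode extension audit recommended in the first item, a minimal sketch of one way to inventory what is installed. It assumes the common desktop layout, where each extension lives in its own folder under `~/.vscode/extensions` with a `package.json` manifest carrying `publisher`, `name`, and `version`. The `reviewed` allowlist and both function names are hypothetical, and the script only lists extensions for manual review; it does not detect malicious code. Running `code --list-extensions --show-versions` gives a similar inventory from the command line.

```python
import json
from pathlib import Path


def list_extensions(ext_dir: Path) -> list[tuple[str, str]]:
    """Return sorted (publisher.name, version) pairs found under ext_dir."""
    found: list[tuple[str, str]] = []
    if not ext_dir.is_dir():
        return found
    for manifest in ext_dir.glob("*/package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed manifests
        if not isinstance(data, dict):
            continue
        publisher = str(data.get("publisher", "unknown"))
        name = str(data.get("name", manifest.parent.name))
        version = str(data.get("version", "?"))
        found.append((f"{publisher}.{name}".lower(), version))
    return sorted(found)


def unreviewed(extensions, allowlist):
    """Return extensions whose ID is not in the (hypothetical) reviewed allowlist."""
    allowed = {ext_id.lower() for ext_id in allowlist}
    return [(ext_id, ver) for ext_id, ver in extensions if ext_id not in allowed]


if __name__ == "__main__":
    # Assumed default location; adjust for your platform or VS Code variant.
    default_dir = Path.home() / ".vscode" / "extensions"
    reviewed = {"ms-python.python"}  # hypothetical allowlist, replace with your own
    for ext_id, version in unreviewed(list_extensions(default_dir), reviewed):
        print(f"review: {ext_id} {version}")
```

Anything the script prints is simply an extension nobody has signed off on yet; deciding whether it is trustworthy still means checking the publisher, install count, and source by hand.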

## Full digest

- [R] [reddit-localllama] Qwen will release another 27B with high probability — https://www.reddit.com/r/LocalLLaMA/comments/1tiwnpc/qwen_will_release_another_27b_with_high/ — Speculative tweet, no official confirmation.
- [P] [reddit-localllama] Cohere launches Command-A Plus as open weights — https://www.reddit.com/r/LocalLLaMA/comments/1tizmar/re_what_ever_happened_to_coheres_commanda_series/ — Enterprise open-weights model from Cohere cofounder confirmed; BF16 on HuggingFace.
- [R] [reddit-localllama] Back again, many changes have taken place — https://www.reddit.com/r/LocalLLaMA/comments/1tj8d9i/back_again_many_changes_have_taken_place/ — Hobbyist CLI project update; no broader relevance.
- [P] [reddit-localllama] Comparing coding agents: Copilot, Pi, Claude Code, opencode with Qwen3.6 27B — https://www.reddit.com/r/LocalLLaMA/comments/1tjbhjk/same_task_in_githubcopilot_pi_claudecode_and/ — Harness design matters as much as model choice; reusable test rig.
- [R] [reddit-localllama] Qwen3.6 27B and llama.cpp appreciation post — https://www.reddit.com/r/LocalLLaMA/comments/1tjbi24/qwen36_27b_and_llamacpp_appreciation_post/ — Personal config post; nothing new.
- [R] [reddit-localllama] Training a vision model from scratch on iPod touch 4 images — https://www.reddit.com/r/LocalLLaMA/comments/1tjaedo/training_a_vision_model_from_scratch_on_ipod/ — Hobbyist DCGAN experiment; no practical takeaways.
- [M] [reddit-localllama] HuggingFace benchmark datasets filter by model size — https://www.reddit.com/r/LocalLLaMA/comments/1tilvit/huggingface_benchmark_datasets_now_let_you_filter/ — Minor HF UI improvement; useful for model selection.
- [R] [reddit-localllama] Waiting on Qwen to drop those 3.7 models be like — https://www.reddit.com/r/LocalLLaMA/comments/1tiqcwu/waiting_on_qwen_to_drop_those_37_models_be_like/ — Meme/engagement bait.
- [M] [reddit-localllama] AMD Ryzen AI Halo PC: $3,999 with 128GB unified memory — https://www.reddit.com/r/LocalLLaMA/comments/1tinl98/amd_ryzen_ai_halo_pc_will_cost_3999_with_128gb/ — New consumer AI PC tier; competes with Mac Studio for local inference.
- [P] [reddit-localllama] Qwen 3.6 35B GGUF: NTP vs MTP quantization guide — https://www.reddit.com/r/LocalLLaMA/comments/1tipihx/qwen_36_35b_gguf_ntp_vs_mtp_quantization_results/ — ByteShape pick-the-right-quant guide across GPU/CPU hardware.
- [P] [reddit-localllama] AMD BC-250 (PS5 APU): $50-150 for 16GB GDDR6 — https://www.reddit.com/r/LocalLLaMA/comments/1tj4unp/amd_bc250_and_the_search_for_cheap_compute/ — Cheapest viable GPU inference node with confirmed ROCm.
- [R] [reddit-localllama] CohereLabs/command-a-plus HuggingFace — duplicate of Cohere story.
- [R] [reddit-localllama] How can you stop your model from looping — https://www.reddit.com/r/LocalLLaMA/comments/1tj6d4r/how_can_you_stop_your_model_from_looping/ — Support question; no new info.
- [M] [reddit-localllama] HalBench: sycophancy/hallucination benchmark — https://www.reddit.com/r/LocalLLaMA/comments/1tizvih/halbench_i_built_a_custom_sycophancy_and/ — Sonnet 4.6 tops benchmark; verify model version names before acting on it.
- [R] [reddit-localllama] I guess 4 units wasn't enough — humor post.
- [R] [reddit-localllama] [WIP] Gemma 4 MTP — https://www.reddit.com/r/LocalLLaMA/comments/1tijpwl/wip_gemma_4_mtp/ — Unstable WIP; check back when mainline.
- [M] [reddit-localllama] llama.cpp MTP backend sampling merged — https://www.reddit.com/r/LocalLLaMA/comments/1tis73j/move_to_backend_sampling_for_mtp_draft_path_by/ — Performance improvement for MTP in upcoming builds.
- [R] [reddit-localllama] Model Golf for RunPod Credits — niche hobby competition.
- [M] [reddit-localllama] AWS secures rare Mac Studio M3 Ultras — https://www.reddit.com/r/LocalLLaMA/comments/1tio2i5/aws_secures_rare_mac_studios_while_ordinary_apple/ — M3 Ultra supply constrained; AWS priority buying confirmed.
- [R] [reddit-localllama] llama.cpp build 9254 fixes TG regression — hardware-specific report; not broadly actionable.
- [P] [reddit-localllama] RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s — https://www.reddit.com/r/LocalLLaMA/comments/1tiixql/rtx_5080_16gb_qwen36_35b_moe_at_128k_context_56/ — MTP doesn't help large MoEs at long context; bandwidth-limited.
- [R] [reddit-localllama] Qwen3-VL-Embedding-2B on Orange Pi 5b — niche embedded hobbyist project.
- [R] [reddit-localllama] PDF reading with AnythingLLM — support question.
- [R] [reddit-localllama] Qwen3.6-35B + Hermes Agent on DGX Spark — setup-specific question.
- [P] [hn-top] An OpenAI model has disproved a central conjecture in discrete geometry — https://openai.com/index/model-disproves-discrete-geometry-conjecture/ — First AI original math discovery verified by Fields medalist.
- [P] [hn-top] GitHub confirms breach of 3,800 repos via malicious VSCode extension — https://www.bleepingcomputer.com/news/security/github-confirms-breach-of-3-800-repos-via-malicious-vscode-extension/ — Audit your VSCode extensions now; 3,800+ repos compromised.
- [R] [hn-top] Haskell Foundation 2026 Update — niche language org update.
- [R] [hn-top] Show HN: reverse engineered Apple video wallpapers — consumer hobby project.
- [M] [hn-top] New features in GCC 16: improved errors and SARIF output — https://developers.redhat.com/articles/2026/04/28/gcc-16-improved-error-messages-sarif-output — Machine-readable compiler diagnostics; useful for C/C++ tooling.
- [R] [hn-top] The Letter S, by Donald Knuth (1980) — off-topic typography essay.
- [R] [hn-top] DOS Zone — browser DOS gaming; entertainment only.
- [M] [hn-top] Flipper One tech specs published — https://docs.flipper.net/one/general/tech-specs — Next Flipper hardware specs; no release date yet.
- [M] [hn-top] Anthropic expanding to Colossus2 with GB200 — https://twitter.com/nottombrown/status/2057194829986300375 — Capacity expansion; signals more Claude API headroom.
- [R] [hn-top] Archaeologists find Egyptian mummy buried with the Iliad — off-topic.
- [P] [hn-top] How fast is N tokens per second really? — https://mikeveerman.github.io/tokenspeed/ — Interactive tok/s → human speed converter.
- [M] [hn-top] Saying goodbye to asm.js — https://spidermonkey.dev/blog/2026/05/20/saying-goodbye-to-asmjs.html — Firefox deprecating asm.js; use WASM for new work.
- [P] [hn-top] Reviving old scanners with in-browser Linux VM + WebUSB — https://yes-we-scan.app/details — Novel WASM VM + WebUSB bridge architecture.
- [M] [hn-top] Intuit lays off 3k+ to refocus on AI — https://techcrunch.com/2026/05/20/intuit-to-lay-off-over-3000-employees-to-refocus-on-ai/ — Industry AI restructuring signal.
- [P] [hn-top] OpenAI to confidentially file for IPO — https://www.cnbc.com/2026/05/20/openai-ipo-filing.html — Major structural event for OpenAI API platform.
- [R] [hn-top] Qian Xuesen: The missile genius America lost and China gained — off-topic history.
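
To put the tok/s figures above in human terms (the idea behind the tokens-per-second visualizer), here is a rough arithmetic sketch. The constants are assumptions, not values taken from that tool: about 0.75 English words per token, a reading speed of 250 words per minute, and a typing speed of 40 words per minute. The function names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English text
READING_WPM = 250.0     # assumption: typical silent reading speed
TYPING_WPM = 40.0       # assumption: typical typing speed


def tokens_per_second_to_wpm(tok_per_s: float,
                             words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation rate in tokens/second to words/minute."""
    if tok_per_s < 0:
        raise ValueError("tok_per_s must be non-negative")
    return tok_per_s * words_per_token * 60.0


def human_equivalents(tok_per_s: float) -> dict[str, float]:
    """Express a token rate as multiples of assumed reading and typing speeds."""
    wpm = tokens_per_second_to_wpm(tok_per_s)
    return {
        "words_per_minute": wpm,
        "x_reading_speed": wpm / READING_WPM,
        "x_typing_speed": wpm / TYPING_WPM,
    }


if __name__ == "__main__":
    # 56 tok/s is the RTX 5080 figure reported in the digest.
    for key, value in human_equivalents(56).items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions, 56 tok/s works out to 2,520 words per minute, roughly ten times the assumed reading speed, which is one way to explain to non-technical stakeholders why that rate already feels fast in interactive use.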