• “Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
    May 5 2025
    (Disclaimer: Post written in a personal capacity. These are personal hot takes and do not in any way represent my employer's views.)

    TL;DR: I do not think we will produce high reliability methods to evaluate or monitor the safety of superintelligent systems via current research paradigms, with interpretability or otherwise. Interpretability seems a valuable tool here and remains worth investing in, as it will hopefully increase the reliability we can achieve. However, interpretability should be viewed as part of an overall portfolio of defences: a layer in a defence-in-depth strategy. It is not the one thing that will save us, and it still won’t be enough for high reliability.

    Introduction

    There's a common, often implicit, argument made in AI safety discussions: interpretability is presented as the only reliable path forward for detecting deception in advanced AI - among many other sources it was argued for in [...]

    ---

    Outline:

    (00:55) Introduction

    (02:57) High Reliability Seems Unattainable

    (05:12) Why Won't Interpretability be Reliable?

    (07:47) The Potential of Black-Box Methods

    (08:48) The Role of Interpretability

    (12:02) Conclusion

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:
    May 4th, 2025

    Source:
    https://www.lesswrong.com/posts/PwnadG4BFjaER3MGf/interpretability-will-not-reliably-find-deceptive-ai

    ---

    Narrated by TYPE III AUDIO.

    13 mins
  • “Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall” by Vladimir_Nesov
    May 3 2025
    It'll take until ~2050 to repeat the level of scaling that pretraining compute is experiencing this decade, as increasing funding can't sustain the current pace beyond ~2029 if AI doesn't deliver a transformative commercial success by then. Natural text data will also run out around that time, and there are signs that current methods of reasoning training might be mostly eliciting capabilities from the base model.

    If scaling of reasoning training doesn't bear out actual creation of new capabilities that are sufficiently general, and pretraining at ~2030 levels of compute together with the low hanging fruit of scaffolding doesn't bring AI to crucial capability thresholds, then it might take a while. Possibly decades, since training compute will be growing 3x-4x slower after 2027-2029 than it does now, and the ~6 years of scaling since the ChatGPT moment stretch to 20-25 subsequent years, not even having access to any [...]
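
To make the arithmetic behind the 20-25 year figure concrete, here is a minimal sketch (the growth rates are illustrative assumptions, not numbers taken from the post): if training compute currently grows roughly 4x per year and the growth exponent shrinks by a factor of 3-4 after ~2029, then repeating the ~6 years of post-ChatGPT scaling takes roughly 18-24 further years.

import math

# Illustrative assumptions, not figures from the post: ~4x/year compute
# growth during the fast-scaling period, slowing so that the growth
# exponent is 3-4x smaller afterwards.
current_growth_per_year = 4.0
years_of_fast_scaling = 6            # ~6 years since the ChatGPT moment
total_multiplier = current_growth_per_year ** years_of_fast_scaling

for slowdown in (3, 4):
    slow_growth_per_year = current_growth_per_year ** (1 / slowdown)
    years_needed = math.log(total_multiplier) / math.log(slow_growth_per_year)
    print(f"{slowdown}x slower growth: ~{years_needed:.0f} years to repeat the same scaling")

# Prints ~18 and ~24 years; with the slowdown starting around 2029, that lands
# in the late 2040s to early 2050s, consistent with the post's ~2050 estimate.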

    ---

    Outline:

    (01:14) Training Compute Slowdown

    (04:43) Bounded Potential of Thinking Training

    (07:43) Data Inefficiency of MoE

    The original text contained 4 footnotes which were omitted from this narration.

    ---

    First published:
    May 1st, 2025

    Source:
    https://www.lesswrong.com/posts/XiMRyQcEyKCryST8T/slowdown-after-2028-compute-rlvr-uncertainty-moe-data-wall

    ---

    Narrated by TYPE III AUDIO.

    12 mins
  • “Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis” by jeanne_, eeeee
    May 1 2025
    In this blog post, we analyse how the recent AI 2027 forecast by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean has been discussed across Chinese language platforms. We present:

    1. Our research methodology and synthesis of key findings across media artefacts
    2. A proposal for how censorship patterns may provide signal for the Chinese government's thinking about AGI and the race to superintelligence
3. A more detailed analysis of each of the nine artefacts, organised by type: Mainstream Media, Forum Discussion, Bilibili (Chinese YouTube) Videos, Personal Blogs.

Methodology

We conducted a comprehensive search across major Chinese-language platforms (including news outlets, video platforms, forums, microblogging sites, and personal blogs) to collect the media featured in this report. We supplemented this with Deep Research to identify additional sites mentioning AI 2027. Our analysis focuses primarily on content published in the first few days (4-7 April) following the report's release. More media [...]

    ---

    Outline:

    (00:58) Methodology

    (01:36) Summary

    (02:48) Censorship as Signal

    (07:29) Analysis

    (07:53) Mainstream Media

    (07:57) English Title: Doomsday Timeline is Here! Former OpenAI Researcher's 76-page Hardcore Simulation: ASI Takes Over the World in 2027, Humans Become NPCs

    (10:27) Forum Discussion

    (10:31) English Title: What do you think of former OpenAI researcher's AI 2027 predictions?

    (13:34) Bilibili Videos

(13:38) English Title: [AI 2027] A mind-expanding wargame simulation of artificial intelligence competition by a former OpenAI researcher

    (15:24) English Title: Predicting AI Development in 2027

    (17:13) Personal Blogs

    (17:16) English Title: Doomsday Timeline: AI 2027 Depicts the Arrival of Superintelligence and the Fate of Humanity Within the Decade

    (18:30) English Title: AI 2027: Expert Predictions on the Artificial Intelligence Explosion

    (21:57) English Title: AI 2027: A Science Fiction Article

    (23:16) English Title: Will AGI Take Over the World in 2027?

    (25:46) English Title: AI 2027 Prediction Report: AI May Fully Surpass Humans by 2027

    (27:05) Acknowledgements

    ---

    First published:
    April 30th, 2025

    Source:
    https://www.lesswrong.com/posts/JW7nttjTYmgWMqBaF/early-chinese-language-media-coverage-of-the-ai-2027-report

    ---

    Narrated by TYPE III AUDIO.

    28 mins
  • [Linkpost] “Jaan Tallinn’s 2024 Philanthropy Overview” by jaan
    Apr 25 2025
This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with the 2024 results.

    in 2024 my donations funded $51M worth of endpoint grants (plus $2.0M in admin overhead and philanthropic software development). this comfortably exceeded my 2024 commitment of $42M (20k times $2100.00 — the minimum price of ETH in 2024).

    this also concludes my 5-year donation pledge, but of course my philanthropy continues: eg, i’ve already made over $4M in endpoint grants in the first quarter of 2025 (not including 2024 grants that were slow to disburse), as well as pledged at least $10M to the 2025 SFF grant round.

    ---

    First published:
    April 23rd, 2025

    Source:
    https://www.lesswrong.com/posts/8ojWtREJjKmyvWdDb/jaan-tallinn-s-2024-philanthropy-overview

    Linkpost URL:
    https://jaan.info/philanthropy/#2024-results

    ---

    Narrated by TYPE III AUDIO.

    1 min
  • “Impact, agency, and taste” by benkuhn
    Apr 24 2025
    I’ve been thinking recently about what sets apart the people who’ve done the best work at Anthropic.

    You might think that the main thing that makes people really effective at research or engineering is technical ability, and among the general population that's true. Among people hired at Anthropic, though, we’ve restricted the range by screening for extremely high-percentile technical ability, so the remaining differences, while they still matter, aren’t quite as critical. Instead, people's biggest bottleneck eventually becomes their ability to get leverage—i.e., to find and execute work that has a big impact-per-hour multiplier.

    For example, here are some types of work at Anthropic that tend to have high impact-per-hour, or a high impact-per-hour ceiling when done well (of course this list is extremely non-exhaustive!):

• Improving tooling, documentation, or dev loops. A tiny amount of time fixing a papercut in the right way can save [...]

---

    Outline:

    (03:28) 1. Agency

    (03:31) Understand and work backwards from the root goal

    (05:02) Don't rely too much on permission or encouragement

    (07:49) Make success inevitable

    (09:28) 2. Taste

    (09:31) Find your angle

    (11:03) Think real hard

    (13:03) Reflect on your thinking

    ---

    First published:
    April 19th, 2025

    Source:
    https://www.lesswrong.com/posts/DiJT4qJivkjrGPFi8/impact-agency-and-taste

    ---

    Narrated by TYPE III AUDIO.

    15 mins
  • [Linkpost] “To Understand History, Keep Former Population Distributions In Mind” by Arjun Panickssery
    Apr 24 2025
    This is a link post. Guillaume Blanc has a piece in Works in Progress (I assume based on his paper) about how France's fertility declined earlier than in other European countries, and how its power waned as its relative population declined starting in the 18th century. In 1700, France had 20% of Europe's population (4% of the whole world population). Kissinger writes in Diplomacy with respect to the Versailles Peace Conference:

    Victory brought home to France the stark realization that revanche had cost it too dearly, and that it had been living off capital for nearly a century. France alone knew just how weak it had become in comparison with Germany, though nobody else, especially not America, was prepared to believe it ...

    Though France's allies insisted that its fears were exaggerated, French leaders knew better. In 1880, the French had represented 15.7 percent of Europe's population. By 1900, that [...]

    ---

    First published:
    April 23rd, 2025

    Source:
    https://www.lesswrong.com/posts/gk2aJgg7yzzTXp8HJ/to-understand-history-keep-former-population-distributions

    Linkpost URL:
    https://arjunpanickssery.substack.com/p/to-understand-history-keep-former

    ---

    Narrated by TYPE III AUDIO.

    6 mins
  • “AI-enabled coups: a small group could use AI to seize power” by Tom Davidson, Lukas Finnveden, rosehadshar
    Apr 23 2025
    We’ve written a new report on the threat of AI-enabled coups.

    I think this is a very serious risk – comparable in importance to AI takeover but much more neglected.

    In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here's a very basic threat model for AI takeover:

    1. Humanity develops superhuman AI
    2. Superhuman AI is misaligned and power-seeking
3. Superhuman AI seizes power for itself

And now here's a closely analogous threat model for AI-enabled coups:

    1. Humanity develops superhuman AI
    2. Superhuman AI is controlled by a small group
3. Superhuman AI seizes power for the small group

While the report focuses on the risk that someone seizes power over a country, I think that similar dynamics could allow someone to take over the world. In fact, if someone wanted to take over the world, their best strategy might well be to first stage an AI-enabled [...]

    ---

    Outline:

    (02:39) Summary

    (03:31) An AI workforce could be made singularly loyal to institutional leaders

    (05:04) AI could have hard-to-detect secret loyalties

    (06:46) A few people could gain exclusive access to coup-enabling AI capabilities

    (09:46) Mitigations

    (13:00) Vignette

    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    April 16th, 2025

    Source:
    https://www.lesswrong.com/posts/6kBMqrK9bREuGsrnd/ai-enabled-coups-a-small-group-could-use-ai-to-seize-power-1

    ---

    Narrated by TYPE III AUDIO.

    15 mins
  • “Accountability Sinks” by Martin Sustrik
    Apr 23 2025
    Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport on the outskirts of Amsterdam. In April 1999, a cargo of 440 of the rodents arrived on a KLM flight from Beijing, without the necessary import papers. Because of this, they could not be forwarded on to the customer in Athens. But nobody was able to correct the error and send them back either. What could be done with them? It's hard to think there wasn’t a better solution than the one that was carried out; faced with the paperwork issue, airport staff threw all 440 squirrels into an industrial shredder.

    [...]

    It turned out that the order to destroy the squirrels had come from the Dutch government's Department of Agriculture, Environment Management and Fishing. However, KLM's management, with the benefit of hindsight, said that [...]

    ---

    First published:
    April 22nd, 2025

    Source:
    https://www.lesswrong.com/posts/nYJaDnGNQGiaCBSB5/accountability-sinks

    ---

    Narrated by TYPE III AUDIO.

    29 mins