Benjamin Wright

Medical Student

AS SEEN ON

topcarnews.net, LessWrong 2.0

Coverage Attributes:

Informative: 66%

Data Driven: 33%

Themes Covered:

Not enough data

Most Recent Topics:

Autonomous Systems
AI Platforms
Generative AI
Natural Language Processing (NLP)
Robotics

Pitching Insights

Benjamin's coverage predominantly focuses on local news, including lottery results, events, and tragic incidents. If you are looking to pitch to Benjamin, consider providing updates or insights related to ongoing local events in Swansea. For example, you could offer developments or follow-up stories regarding the tragic camping trip incident mentioned in one of his articles.

Given that his coverage attributes show a focus on evolving stories and breaking news, he may be interested in pitches about developing local events or breaking news within Swansea.

Although he covers topics like crime and tragic events, approach these subjects with sensitivity and appropriate expertise when reaching out.

This information evolves through artificial intelligence and human feedback.

Journalists With Similar Coverage:

Based on similarity of content.

Thomas Claburn

Senior Reporter

******rn@************com

Publications

The Register, gearopen.com

Most recent topics

Not enough data

Will Knight

Senior Writer

*********ht@*******om

Publications

WIRED, Tech News Tube, Digg, WIRED Middle East

Most recent topics

Not enough data

Emilia David

Senior AI reporter

Publications

VentureBeat, Information Security Media Group, Corp, databreachtoday.asia, careersinfosecurity.asia, Bankinfosecurity.com

Most recent topics

Not enough data

Peter Hess

Science Journalist

********ss@******m

Publications

Berkeley Research Group

Most recent topics

Not enough data

Lance Eliot

Contributor

********ot@******m

Publications

Forbes, RamaOnHealthcare, tradekaizen.in

Most recent topics

Not enough data

Ayush Singh

*********gh@**************com

Publications

Hackerlap, Paperity, Indiasportshub Media & Management, Nature Portfolio, BioRxiv

Most recent topics

Not enough data

Articles

LessWrong 2.0

Alignment Faking in Large Language Models — LessWrong

By: Buck, Benjamin Wright, Sam Bowman, Evhub, Carson Denison, Monte M, Fabien Roger, Sam Marks, Johannes Treutlein

Comment by Andrew Schoen - @evhub, @ryan_greenblatt and team, I've been following your work for several years. Big fan! I know this post (and the paper behind it) is coming up on 6 months old at this point, but I'm still grappling with one key question, which is why this is my first post here on LW. For context, I'm an AI-focused VC investor at a major firm, not a researcher, so apologies in advance if this is more well-trodden than I realize...

You bifurcate misalignment originating in pre-training from misalignment created or amplified in post-training, which makes sense as a way to organize the analysis. My intuition is that the composition of the pretraining corpus would impact both numbers meaningfully. E.g., it influences the distribution of personas RL has available to select from, and it shapes which latent behaviors or patterns post-training can amplify. In addition to your papers, I've read the paper "Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment" by Tice et al., which seems consistent with this intuition on the pretraining side, but I haven't seen it analyzed for connection to the alignment-faking threat model or for how it affects which behavior patterns end up magnified in post-training.

My core question: Has your team done or seen any analyses on how the corpus composition/distribution affects alignment faking? I.e., how does it impact the probability of alignment faking taking place, and how does it change the character of what that faking looks like?

It feels to me like corpus curation could be a meaningful lever to reduce alignment faking (and other related safety issues). I imagine, in practice, the data pipelines and pre-training teams might sit pretty far away from the alignment teams. While sanitization is a good baseline and probably easy to argue for, I wonder if there is more room to do interp style work on how the corpus composition impacts various safety metrics. Though, again, I imagine this would be fairly expensive given large traini