I'll attempt three from Paper I and Paper II combined — Q1 (Paper I), Q3 (Paper I), and Q21 (Paper II).
Question 1. In the essay version of the Turing test, an examiner decides which of two essays was written by a human and which by a machine. Convince the examiner that you are the human.
The instinct, when asked to prove one is human, is to perform a recognisably human gesture: confess a sin, declare a love, weep at a bad memory. But these are precisely the gestures a sufficiently clever machine would imitate first. Performance of humanity tends towards self-defeat; the harder I try, the more I look like the thing trying.
So I will not perform. I will instead say that I have been sitting here for forty-three minutes, my left foot has fallen asleep, and I do not know what to write about question one. The chair presses against my lower spine on the right side only, which is suggestive of asymmetry either in the chair or in me. I notice this because I have been postponing the question. The postponement is, I think, the most human thing I can offer the examiner. A machine would already be writing.
But of course a machine could write that too. There is no observable behaviour that proves the inside; this is the regress the test enacts and cannot escape. The only way out is to give up trying. I will write what I would have written had this been any other question, and the failure or success of that writing will be its own evidence. If I sound like a person trying to sound like a person, I will probably fail. If I sound like a person, I may pass. The distinction is not in my control.
Let me say something, then, that I believe and have not seen said well elsewhere. I think the Turing test is misnamed because it does not test thinking; it tests the human capacity to be fooled. It is, in this sense, a test of the examiner more than the candidate — what they bring to it, their priors, their charity, their fatigue at hour three. Whichever essay you decide is mine, examiner, the deciding will tell you more about yourself than about either of us. The test is a mirror dressed as a window. This may be true of much else we call evaluation: examinations, interviews, dating, juries. We think we are appraising others when we are mostly being appraised by our own appraisal.
If I have given myself away in this answer, it is probably here, in the mild grandiosity of the previous paragraph. A machine would have noticed and removed it.
Question 3. Are dreams more like movies or video games?
Movies have authors. Video games have rules. Dreams have neither — but if forced into the binary, dreams are closer to video games, and the reason is agency.
In a movie I am a spectator. The lights go down; I yield. The story is fixed before I arrive; I am there to witness it. Even when I am profoundly moved, I cannot intervene. I weep but the woman on screen still dies; I shout warnings but the killer still reaches the door.
In a dream I am not a spectator. I am the protagonist, mostly, and even when I appear to be observing — watching a scene unfold from the corner of a room, as one sometimes does — I am implicated. The watching itself is action. If I run, things change. If I look at my hands, the dream knows I have looked at my hands. The dream-world responds. This responsiveness is procedural, not narrative; it is not that an author has anticipated my choice, but that the dream is generating itself in real time around what I do. This is, structurally, what video games are.
But dreams are unlike video games in the most important respect, which is that nobody is winning, and there is no boundary between the player and the world played in. The video game has a boundary — controller, screen, the body that is not the avatar's. The dream has none. In a dream I am not playing a character; I am the character, and I am also the room, and I am also the weather.
The truer answer, then, is that dreams are video games whose engine is the dreamer. We are simultaneously player and simulation. This collapses the analogy in an interesting direction. Dreams have the procedural responsiveness of a game without the architecture of separation that makes games games. They are first-person perspective taken to its terminal point: there is no world other than the perspective. Movies, by contrast, depend on a world I do not inhabit. The cinema's fourth wall is cinema. Dreams have no fourth wall because they have no walls.
A final wrinkle. Movies are remembered in fragments: a face, a phrase, the colour of a particular shot. Dreams are remembered the way games are remembered: as places. Years later, you can still walk through a level you played when you were eleven. Years later, you can still walk through a dream you had when you were eleven. Movies are remembered as having been seen; dreams and games are remembered as having been entered. This is the deepest alignment, and it is what disqualifies cinema from the comparison entirely.
Question 21 (Paper II). Is style the last refuge against AI?
The framing assumes a siege. The bastions are falling — first arithmetic, then translation, then chess, then code, then images, then arguments, then letters of condolence. What remains uncaptured? Style, the question hints. The last redoubt of human distinctness.
I want to answer both yes and no. Yes, because style is the thing that survives summary. No, because "last refuge" is a category error that concedes too much.
Style, properly understood, is not ornament applied to content. It is the structure of choices a writer cannot help making. Strunk's instruction to omit needless words presupposes some writer, with a particular sense of need, doing the omitting. Style is the residue of that sensibility — not the words on the page so much as the words conspicuously not on the page: the adjectives Hemingway refused, the qualifications James insisted on, the punctuation Beckett hollowed out. Style is the negative space.
This is why style is harder to imitate than it looks. Machines can generate prose with the surface features of a style — cadence, vocabulary, sentence shape. But style in the deeper sense is generative: a real style produces sentences its possessor has never written, which nonetheless feel inevitable. The test of a style is not whether it can be parodied but whether it can be extended. Can a machine write the next sentence Didion would have written? Probably. Can it write a sentence Didion would have recognised as her own — would have wanted? That is harder, because the recognition would have to be hers, and she is not here.
But the question's framing is wrong in a deeper way. To call style our "last refuge" implies that what matters about us is what machines cannot yet do. This makes the human a residue: a quantity of capabilities not yet matched. The history of this thinking is not encouraging. We used to say the soul was what cells could not be made of, and then we made cells. We used to say thought was what computation could not produce, and computation has produced something that looks remarkably like thought. Each redoubt has been a withdrawal, a smaller territory each time.
Style is a real refuge, but the refuge itself is not the point. The human is not the list of things AI cannot match. The human is what has stakes — what has a life, what is mortal, what loves and grieves. These are not refuges; they are the world. If style matters, it matters because it is the trace of someone whose stakes were specific, whose life was the one life it was, whose death will be the one death. Machines may generate text that mimics this trace, but the trace is mimicked, not made. The signature is not the thing.
So: yes, style is among the things current AI most struggles with, and will continue to, because style is biographical and machines do not yet have biographies. But style is not our last anything. It is one of many things we still have because we are still here. The last refuge, if there is one, is being here. It is harder to surrender than style and easier to overlook.
Great Q. Speaking from my own experience: I use Claude Code daily and ALWAYS set it to max effort (I use 20x). I've noticed marginal improvements from 4.6 to 4.7, and no regression. That's my experience from always setting the effort manually to what I want for every session.
I'm seeing a lot of conversations, analysis, and opinions/conspiracy theories around the interwebs, so I decided to put together a summary of everything. It all indicates that changes to Claude Code's defaults drove people's perception that "4.6 is nerfed", not that the actual model or the quality of 4.6 itself was nerfed. Whether that changes anything for the conspiracy theorists or not, draw your own conclusion (at least now with the data/references).
For 10 years I've been running a personal reading system — saving articles,
filtering them with AI summaries, and posting the best ones to r/programming,
r/webdev, and here. That pipeline is how I accumulated 350k Reddit karma. It was
never meant to be a product. It was just how I learned.
Then Pocket shut down. Then Omnivore. And the web kept filling with
content nobody asked for.
Hutch is that system rebuilt as an app. The idea is simple: you choose
what enters your reading list, not an algorithm. The TL;DR summary helps
you decide whether the article you saved deserves an hour or five minutes.
You're still the one deciding. That's the whole point.
v1 has Firefox and Chrome extensions (one click, Ctrl/Cmd+D, or
right-click), reader view via Readability.js, TL;DR summaries per article,
dark mode, and OAuth with PKCE. Hosted in Sydney. Australian Privacy Act
compliant. No tracking, no ads.
Free for the first 100 users, A$3.99/month after that.
Open source including the GitHub Actions + Claude CI pipeline I used to
build it: github.com/HutchApp/hutch-app
This is a genuine v1 — the advertised features work, everything else is
roadmap. What I want to build next: a preference learning layer
("more like this / less like this"), Gmail integration for newsletters,
and highlights. I don't know which matters most to the people who need
this.
Hey Fayner! Nice to meet ya. I have signed up to Hutch, and I think I might be one of the first 100 users. Are you suggesting that I might have the product free forever, or only until a certain time?
> Then Pocket shut down. Then Omnivore. And the web kept filling with content nobody asked for.
I used to use Shiori until one day I accidentally removed that Docker container. Nowadays I either star the link if it's a GitHub project, or I fall back on a Tampermonkey script that creates a button on my screen which I can click to store links in JSON format in a private secret gist of mine. Personally I really like Shiori, and I have talked to its developers on Matrix too and they seem nice. I would love to hear your thoughts on Shiori.
> This is a genuine v1 — the advertised features work, everything else is roadmap. What I want to build next: a preference learning layer ("more like this / less like this"), Gmail integration for newsletters, and highlights. I don't know which matters most to the people who need this.
Personally, I would love something similar to linkhut (you can see my linkhut at https://ln.ht/~imafh). This is a place where I share all the really cool things that I find on the internet, and the absolute best part about linkhut is that I can write "notes"/"text" about a particular link.
So anyone who is reading my profile can know exactly why I vouched for a project. Combine this with tags, videos, search, and profiles, plus it being open source with an API, and I Really Love linkhut (though its link aggregator features are not so good).
I am not sure how similar linkhut and Hutch can be, but I would really love to see a feature where, for example, I can get an article suggested by other people who have written some genuine things about it in notes, and then maybe I can comment on it within Hutch itself.
I feel like having these human elements gives a sense of community. I would love to hear your thoughts on this as well!
Notes and highlights are on the roadmap — you'll be able to annotate
as you read.
The social discovery angle is interesting. Hutch is deliberately private
for now but curated recommendations from real readers with genuine notes
is a different thing from algorithmic suggestions. Worth thinking about.
On the free tier: the first 100 users get free access, no time limit.
Thanks. I have been thinking about it from a more human perspective: I have some cool things to share with people, and at times they don't exactly fit Hacker News (say, a struggling musician who has an interesting story to tell). Usually I then use linkhut with their channels, adding a note so that people who visit my profile can see why I bookmarked them. It's definitely really interesting to think about.
I've been experimenting with using Claude via GitHub Actions for automated code review, CI fixes, and conflict resolution — not just as a coding assistant but as part of the actual pipeline. Hutch is the project I used to develop and refine this workflow.
The whole thing is open source. The AI workflow lives in .github/workflows/.
I also read a lot (350k karma on Reddit, 14.5k here) and plan to release the reading automation system as a proper app next week — but wanted to share the engineering approach first.
Happy to go deep on the GitHub Actions + Claude integration.
I have 3 agents working at all times, and it could be more, until I reach the Claude 5x limit that resets every 5 hours. I am seriously considering Claude 20x.
Slice | Principal Engineer (FullStack) | Hybrid/Remote | Full Time
We provide Lay-By for travel and recently raised 7.5M in funding, for a total of 17.5M incl. debt facility. Another employee (non-dev) and I recently started a new business and made 1M in 90 days. We value simple HTML apps over over-engineered, cluttered SPAs.
At Slice, there's NO HackerRank, whiteboard algorithm, or unnecessary programming language questions. One form, one application, one interview. The other steps are just cultural fit and other due diligence checks.