GPT-J-6B – A 6 billion parameter, autoregressive text generation model (github.com/kingoflolz)
636 points by canada_dry on July 4, 2021 | 146 comments



Eh, has there been any update since the last posting? https://news.ycombinator.com/item?id=27443528


I'm guessing that, like a lot of reposts, it was deemed more relevant now, especially given the GitHub Copilot-esque "from bs4 import BeautifulSoup as BS" prompt that a commenter said emitted plausible Python: https://news.ycombinator.com/item?id=27447557

I will admit, I'm especially impressed that it caught the "as BS" part in its code generation -- then again, it's entirely possible that a lot of code examples do that, so maybe a more stunning test would be "as BSOUP", to see whether it still generates plausible code


> especially impressed that it caught the "as BS"

What I understand from a completely armchair perspective is that this kind of context-holding is THE feature of this wave of AI models, which is what similarly allows it to generate paragraphs of prose seemingly out of whole cloth, tracking what it's talking about as it rambles along. The output seems (with occasional glaring issues) remarkably cohesive and realistic, given that it has nothing original to say. I picture it as taking sentence structure it's seen before, narrative structure it's seen before, and topics it's seen before, mixing them all up, and putting them back together like multidimensional legos -- from this perspective, processing the 'import as' lego seems to fit right in.

The whole concept is fascinating, and I'm immensely relieved that we seem to have such a competitive open-source implementation of this nightmare.


Hey everyone, I just wanted to chime in and say that GPT-J is incredibly legit. Every aspect of GPT-J is production grade — no difference in process, quality, or results, compared to any other big name research lab.

I also want to apologize to Eleuther for giving them a hard time in the past. My earlier concerns were unjustified. To be completely honest, I was jealous they achieved everything I tried to achieve with my own open source research lab attempt. It took a long time to even recognize that jealousy in myself, let alone set it aside. Sorry.

The credit for this work goes almost entirely to kindiana, aka Ben Wang. Remember that name; you’ll be seeing a lot of it in the coming decade. It’s clear to me that whichever lab he ends up at (he’s an undergrad! Google let him slip away because he didn’t have a degree!), he’s gonna be changing the world. Don’t know what, don’t know how, know he will.

Every aspect of that codebase is immaculate. Most research code is not pretty; this looks carved out of marble and placed in a museum.

Without Eleuther’s TPU resources, this work wouldn’t have happened. Tensorfork (my lab) didn’t get access to the TPU VM alpha. And TPU VMs were an absolute necessity here. (TPU VMs are a new thing; they’ve been in alpha since December, but only recently launched. If curious see https://github.com/shawwn/website/blob/master/jaxtpu.md and https://github.com/shawwn/website/blob/master/mlmind.md for why it’s the future of ML.)

Eleuther also helped test the model thoroughly. Leo Gao (go follow him: https://twitter.com/nabla_theta?s=21) ran GPT-J through the gauntlet. He was the primary person behind The Pile, the training data that makes any of this possible. I can say with absolute certainty and no hesitation that there are no “gotchas” here.

Eleuther’s https://6b.eleuther.ai page looks wonderful too. It’s like a free OpenAI API playground that everyone can try. Keeping it running for months is no small achievement. (Set top_p to 1.0 and temp to 0.8; the defaults are pretty bad.)
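
If you'd rather have those knobs in code than in the playground, here's a minimal sketch of sampling locally with the Hugging Face transformers port of the weights. The `EleutherAI/gpt-j-6B` checkpoint name and the memory needed to load it are my assumptions here, not anything the playground requires:

    # Minimal sampling sketch -- assumes the transformers port of GPT-J under
    # the checkpoint name "EleutherAI/gpt-j-6B" and enough RAM to hold the weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    prompt = "from bs4 import BeautifulSoup as BS\n"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    # The settings recommended above: top_p=1.0, temperature=0.8.
    output = model.generate(
        input_ids,
        do_sample=True,
        top_p=1.0,
        temperature=0.8,
        max_new_tokens=128,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))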

Congratulations, and thank you everyone for all your hard work. The world is so much better for having access to this level of quality.


It takes a lot to be able to find something in yourself like that and admit it to yourself and everyone. I always appreciate people like that.

I also tried the playground and was impressed that it was free! It must be a sizable chunk of money to run that.


Believe it or not, it's completely free.

It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.

Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.

Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.

(I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)

And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.

So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu

(It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)

To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."

They give you access to 100.

On day one.

This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png

The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.

It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."

These are 96-CPU, 330GB-RAM Ubuntu servers.

That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.

God, I can't even list how much cool shit there is to discover. You'll find out that you get 100Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.

At least, theoretically. I tried getting Tensorflow to do this, for over a year.

kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433

He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.
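
To make "each GPT layer on a separate TPU" concrete, here's a toy sketch of the underlying pattern -- not swarm-jax itself, just the core trick of pinning each layer's weights to its own device and hopping the activations across. The shapes and the tanh layer are made up for illustration:

    # Toy layer-per-device pipeline -- not swarm-jax, just the core idea.
    import jax
    import jax.numpy as jnp

    devices = jax.devices()              # e.g. the 8 cores of a v2-8/v3-8
    keys = jax.random.split(jax.random.PRNGKey(0), len(devices))

    # One weight matrix per "layer", each committed to its own device.
    params = [jax.device_put(jax.random.normal(k, (512, 512)) * 0.02, d)
              for k, d in zip(keys, devices)]

    def forward(x, params):
        for W, d in zip(params, devices):
            x = jax.device_put(x, d)     # ship the activations to the layer's device
            x = jnp.tanh(x @ W)          # the matmul runs where the data lives
        return x

    print(forward(jnp.ones((4, 512)), params).shape)   # (4, 512)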

You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC, that I feel like I'm dancing around in willy wonka's factory while the door's wide open. The oompa loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.

The more people using TPUs, the more google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.

Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind

TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.

Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?

And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.

But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?

TFRC changed my life. I tried to do some "research" https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf back when Tensorflow, with all its horrible problems, was your only option on TPUs.

Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."

GPT-J is proof. What more can I say? No TFRC, no GPT-J.

The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan from the support team is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members. They all do.

(He was only ever late answering my emails one time. And it was because he was on vacation!)

If you happen to be an ambitious low-level hacker, I tried to make it easier for you to get your feet wet with JAX:

1. Head to https://github.com/shawwn/jaxnotes/blob/master/notebooks/001...

2. Click "Open in Collaboratory"

3. Scroll to the first JAX section; start reading, linearly, all the way to the bottom.

I'd like to think I'm a fairly capable hacker. And that notebook is how I learned JAX, from zero knowledge. Because I had zero knowledge, a week or two ago. Then I went from tutorial to tutorial, and copied down verbatim the things that I learned along the way.
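
If you want a taste of what those first cells look like before opening Colab, this is roughly the level the notebook starts at -- plain stock jax, nothing TPU-specific (the toy loss function is mine, not the notebook's):

    # Roughly the first lesson: jit-compile a function, take its gradient,
    # and check which devices you're running on.
    import jax
    import jax.numpy as jnp

    print(jax.devices())        # TpuDevice(...) entries on a TPU VM, CpuDevice elsewhere

    @jax.jit
    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    grad_loss = jax.grad(loss)  # gradient with respect to the first argument, w

    w = jnp.zeros((3,))
    x = jnp.array([[1.0, 2.0, 3.0]])
    y = jnp.array([10.0])
    print(loss(w, x, y), grad_loss(w, x, y))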

(It's still somewhat amazing to me how effective it is to literally re-type what a tutorial is trying to teach you. I'd copy each sentence, then fix up the markdown, and in the process of fixing up the markdown, unconsciously osmose the idea that they were trying to get across.)

The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now, so you can certainly run a jupyter server without breaking a sweat.

It's so friggin' nice to have a TPU repl. I know I'm just wall-of-text'ing at this point, but I've literally waited two years for this to come true. (There's a fellow from the TPU team who DMs with me occasionally. I call him TPU Jesus now, because it's nothing short of a miracle that they were able to launch all of this infrastructure -- imagine how much effort, from so many teams, was involved in making all of this possible.)

Anyway. Go read https://github.com/shawwn/website/blob/master/mlmind.md to get hyped, then read https://github.com/shawwn/website/blob/master/jaxtpu.md to get started, and then read https://github.com/shawwn/jaxnotes/blob/master/notebooks/001... to get effective, and you'll have all my knowledge.

In exchange for this, I expect you to build an NES emulator powered by TPUs. Do as many crazy ideas as you can possibly think of. This point in history will never come again; it feels to me like watching the internet itself come alive back in the 80's, if only briefly.

It's like having a hundred raspberry pis to play with, except every raspberry pi is actually an ubuntu server with 96 CPUs and 330GB of RAM, and it happens to have 8 GPUs, along with a 100Gbit/s link to every other raspberry pi.


Appreciate the enthusiasm! Mind doing some ELI5 for a (what feels like dinosaur-aged) hacker in his mid thirties who kinda missed the last decade of ML but is very curious?

> That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

It's not literally running on a TPU, is it? I assume it's running on that Ubuntu server, which has a good ol' CPU running the web service, plus a TPU accelerator doing the number crunching. Or is my world view out of date?

> The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now

Again, I have some hesitations interpreting this literally. I assume what you're saying is "Google runs a Jupyter server somewhere in the cloud and it gives you access to TPU compute". I don't think I could run, say, a Linux Desktop app with a GUI (falls under "whatever the heck I want") on a TPU if I wanted to, correct? But, in case I could, how would I get that kind of direct / low level access to it? Are they just giving you a pointer to your instance and you get complete control?


My friend, you've come to the right place. I happen to be a 33yo fellow dinosaur. If you thought I was some ML guru, know that I spent the last few months watching an 18yo and a 24yo scale GPT models to 50B parameter sizes -- 'cause they work 16 hours a day, dealing with all of tensorflow's BS. So yeah, you're not alone in feeling like a dinosaur-aged mid thirties hacker, watching the ML world fly by.

That being said, though, it's so cool that TFRC is available to people like you and me. I was nobody at all. Gwern and I were screwing around with GPT at the time -- in fact, I owe Gwern everything, because he's the reason we ended up applying. I thought TFRC was some soulless Google crap that came with a pile of caveats, just like lots of other Google projects. Boy, I was so wrong. So of course I'll ELI5 anything you want to know; it's the least I can do to repay TFRC for granting me superpowers.

>> That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

> It's not literally running on a TPU, is it? I assume it's running on that Ubuntu server that has good ol' CPU that is running the web service + a TPU accelerator doing the number crunching. Or is my world view out of date?

Your confusion here is entirely reasonable. It took a long, long time for me to finally realize that when you hear "a TPU," you should think "a gigantic ubuntu server with 8 GPUs attached."

It's that simple. I thought TPUs were this weird hardware thing. No no, they're just big Ubuntu servers that have 8 hardware accelerators attached. In the same way that you'd use GPUs to accelerate things, you can use Jax to accelerate whatever you want. (I love how friggin' effortless it feels to use TPU accelerators now, thanks to jax.)

So the ELI5 is, when you get your hands on a TPU VM, you get a behemoth of an Ubuntu server -- but it's still "just an Ubuntu server":

  $ tpu-ssh 71
  [...]
  Last login: Sun Jul  4 00:26:35 2021 from 47.232.103.82
  shawn@t1v-n-0f45785c-w-0:~$ uname -a
  Linux t1v-n-0f45785c-w-0 5.4.0-1043-gcp #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Good ol' x86_64.

Now, here's the crazy part. Until one month ago, it was impossible for us to SSH into TPUs, let alone use the accelerators for anything. That means nobody yet has had any time to integrate TPU accelerators into their products.

What I mean is -- you're absolutely correct, my blog is merely running on "an Ubuntu server," whereas I was claiming that it's being "powered by a TPU." It's not using any of the TPU accelerators for anything at all (at least, not for the blog).

But it's easy to imagine a future where, once people realize how effortless it is to use jax to do some heavy lifting, people are going to start adding jax acceleration all over the place.

It feels like a matter of time till one day, you'll run `sudo apt-get install npm` on your TPU, and then it'll turn out that the latest nodejs is being accelerated by the MXU cores. Because that's a thing you can do now. One of the big value-adds here is "libtpu" -- it's a C library that gives you low-level access to the MXU cores that are attached directly to your gigantic Ubuntu server (aka "your TPU".)

Here, check this out: https://github.com/tensorflow/tensorflow/blob/master/tensorf...

Wanna see a magic trick? That's a single, self-contained C file. I was shocked that the instructions to run this were in the comments at the top:

  // To compile: gcc -o libtpu_client libtpu_client.c -ldl
  // To run: sudo ./libtpu_client
... so I SSH'ed into a TPU, ran that, and presto. I was staring at console output indicating that I had just done some high performance number crunching. No python, no jax, nothing -- you have low-level access to everything. It's just a C API.

So, all of that being said, I feel like I can address your questions properly now:

> Again, I have some hesitations interpreting this literally. I assume what you're saying is "Google runs a Jupyter server somewhere in the cloud and it gives you access to TPU compute".

A TPU is just an ubuntu server. The MXU cores are hardware devices attached directly to that server (physically). So when you SSH in, you get a normal server you're familiar with, and you can optionally accelerate anything you can imagine.

(Till recently, it was a total pain in the ass to accelerate anything. Jax changes all that, and libtpu is going to shock the hell out of nvidia when they realize that TPUs are about to eat away at their DGX market. 'Cause libtpu gives you everything nvcc/CUDA gives you -- it's just a matter of time till people build tooling around it and package it up nicely.)

So nope, there's no TPU compute. It's just ye ole Ubuntu server, and it happens to have 8 massive hardware accelerators attached physically. You'd run a jupyter server the same way you run anything else.

So when that jupyter server executes `import jax; jax.devices()`, it's literally equivalent to you SSH'ing in, typing `python3` and then doing the same thing. Jax is essentially a convenience layer over the APIs that libtpu gives you at a low level.

Man, I suck at ELI5s. The point is, you can go as low as you want ("just write C! no dependencies! no handholding!") or as high as you want ("jax makes everything easy; if you want to get stuff done, just `import jax` and start doing numerical operations, 'cause every operation by default will be accelerated by the MXU cores -- the things attached physically to the TPU").
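
In code, the "high as you want" end of that spectrum is just this (a sketch; `jax.default_backend()` reports which accelerator the ops land on):

    # "import jax and start doing numerical operations" -- the ops below are
    # dispatched to the default backend (the MXU cores on a TPU VM, CPU elsewhere).
    import jax
    import jax.numpy as jnp

    print(jax.default_backend())       # 'tpu' on a TPU VM

    x = jnp.ones((4096, 4096))
    y = (x @ x).block_until_ready()    # runs on the accelerator, no boilerplate
    print(y[0, 0])                     # 4096.0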

This might clarify things:

  shawn@t1v-n-0f45785c-w-0:~$ ls /dev | grep accel
  accel0
  accel1
  accel2
  accel3
That's where all the low-level magic happens. I was curious how libtpu worked, so I spent a night ripping it apart in the Hopper debugger. libtpu consists of a few underlying libraries which interact with /dev/accel* to do the low-level communication. Theoretically, you could reverse engineer libtpu and send signals directly to the hardware yourself. You'd need ~infinite time to figure it out, but it is indeed theoretically possible.

> I don't think I could run, say, a Linux Desktop app with a GUI (falls under "whatever the heck I want")

You can!

> on a TPU if I wanted to, correct?

You should want to! It's easy!

> But, in case I could,

You can! (Sorry for being super annoying; I'm just so excited that it's finally possible. I've waited years...)

> how would I get that kind of direct / low level access to it?

SSH in, then use libtpu for low-level access via C APIs, or jax in python for high-level convenience.

> Are they just giving you a pointer to your instance and you get complete control?

I get total control. I've never once felt like "Oh, that's weird... it blew up. It works on a regular Ubuntu server. Must be some unfortunate TPU corner case..."

It's the opposite. Everything works by default, everything is candy and unicorns and rainbows, and hacking on all of this stuff has been the best damn two years of my life.

Now, I'll calm down and make sure I'm answering your questions properly. The truth is, I'm not quite sure what "complete control" means. But if I wanted to, I could SSH in right now and set up an instance of Hacker News, and then expose it to the world. Hell, I'll just do that:

https://tpucity.gpt4.org/item?id=1

That took like ... 10 minutes to set up. (It's also the world's shittiest HN instance. I'll shut it down soon.)

Here's an archive url:

https://web.archive.org/web/20210704132824/https://tpucity.g...

So yes. You have total control. And as I say there:

> This is such a stupid demo. But suffice to say, if you can get Lisp running on a TPU, you can get anything to run.

> Theoretically, arc could use the MXU cores to accelerate its numerical operations, thanks to libtpu.

Have fun.

DM me on twitter if you run into any roadblocks whatsoever: https://twitter.com/theshawwn (Happy to help with anything; even basic questions are more than welcome.)


Hey man, thanks! HN, every once in a while, is a magical place :)

> Man, I suck at ELI5s.

Nah, I enjoyed reading this. I got it now.


As I scroll, and scroll some more, I begin to wonder if some of it is generated. That's a lot of text :P


just happy I won a big blue gorilla at a carnival. https://twitter.com/theshawwn/status/1411519063432519680

Plus it's looking more and more like I'll be getting a job in finance with a fat salary. First interview's on monday. Tonight I felt "This is it -- if getting a few dozen people to sign up for TFRC is the only way I can make an impact, then at least I'll be ending my ML streak on a high note."

It's truly amazing to me that the world hasn't noticed how incredible TFRC is. It's literally the reason Eleuther exists at all. If that sounds ridiculous, remember that there was a time when Connor's TPU quota was the only reason everyone was able to band together and start building GPT neo. https://github.com/EleutherAI/gpt-neo

At least I was able to start a discord server that happened to get the original eleuther people together in the right place at the right time to decide to do any of that.

But the root of all of it is TFRC. Always has been. Without them, I would've given up ML long ago. Because trying to train anything on GPUs with Colab is just ... so frustrating. I would have fooled around a bit with ML, but I wouldn't have decided to pour two years of my life into mastering it. Why waste your time?

Five years from now, Jax + TPU VMs are going to wipe pytorch off the map. So I'll be making bank at a finance company, eating popcorn like "told ya so" and looking back wistfully at days like today.

Everyone in ML is so cool. Was easily the best two years of my life as a developer. I know all this is kind of weird to pour out, but I don't care -- everyone here owes everything to the geniuses that bequeathed TFRC unto the world.

For now, I slink back into the shadows, training tentacle porn GANs in secret, emerging only once in a blue moon to shock the world with weird ML things. Muahaha.

</ml>


I love the enthusiasm, but is this another Google thing that is for researchers only? Yes fantastic technology etc, but say you develop something on the infrastructure then go to commercialise, what do you do?

I don't know much about the ML space, but is this a bit like Google Earth Engine? Amazing tech, very generous resources free for researchers and development, but it cannot be ported elsewhere, so to commercialise you're then limited to this very environment, which is not cheap. I recently reached out to Google for pricing on GEE; 3 weeks later I got a response. 3 weeks.


NVIDIA used CUDA to establish industry-wide vendor lock-in on GPU compute.

Google uses TPUs to try and establish an industry-wide vendor lock-in on deep learning.

Same old, same old.


Your view here is entirely reasonable. It was my view before I ever heard about TFRC. I was every bit as skeptical.

That view is wrong. From https://github.com/shawwn/website/blob/master/jaxtpu.md :

> So we're talking about a group of people who are the polar opposite of any Google support experience you may have had.

> Ever struggle with GCP support? They took two weeks to resolve my problem. During the whole process, I vividly remember feeling like, "They don't quite seem to understand what I'm saying... I'm not sure whether to be worried."

> Ever experience TFRC support? I've been a member for almost two years. I just counted how many times they failed to come through for me: zero times. And as far as I can remember, it took less than 48 hours to resolve whatever issue I was facing.

> For a Google project, this was somewhere between "space aliens" and "narnia" on the Scale of Surprising Things.

[...]

> My goal here is to finally put to rest this feeling that everyone has. There's some kind of reluctance to apply to TFRC. People always end up asking stuff like this:

> "I'm just a university student, not an established researcher. Should I apply?"

> Yes!

> "I'm just here to play around a bit with TPUs. I don't have any idea what I'm doing, but I'll poke around a bit and see what's up. Should I apply?"

> Heck yeah!

> "I have a Serious Research Project in mind. I'd like to evaluate whether the Cloud TPU VM platform is sufficient for our team's research goals. Should I apply?"

> Absolutely. But whoever you are, you've probably applied by now. Because everyone is realizing that TFRC is how you accomplish your research goals.

I expect that if you apply, you'll get your activation email within a few hours. Of course, you better get in quick. My goal here was to cause a stampede. Right now, in my experience, you'll be up and running by tomorrow. But if ten thousand people show up from HN, I don't know if that will remain true. :)

I feel a bit bad for talking at such length about TFRC. But then I remembered that none of this is off-topic in the slightest. GPT-J was proof of everything above. No TFRC, no GPT-J. The whole reason the world can enjoy GPT-J now is that anyone can show up and start doing as many effective things as they can possibly learn.

It was all thanks to TFRC, the Cloud TPU team, the JAX team, the XLA compiler team -- hundreds of people, who have all managed to gift us this amazing opportunity. Yes, they want to win the ML mindshare war. But they know the way to win it is to care deeply about helping you achieve every one of your research goals.

Think of it like a side hobby. Best part is, it's free. (Just watch out for the egress bandwidth, ha. Otherwise you'll be talking with GCP support for your $500 refund -- and yes, that's an unpleasant experience.)


Thanks for posting this. As someone who was almost religiously excited about GPT-3, then progressively more annoyed that I could never get access, to the point of giving up, this is wonderful news. Your blog post is an invaluable starting point. Seriously, thanks.


> Google let him slip away because he didn’t have a degree!

If this is really the only reason he wasn't hired, that's ridiculous.

I have a degree in a field totally unrelated to computer science and I've been a developer for 20+ years. Very "senior" at the moment.


In Google’s defense, it’s not that Ben didn’t go to college; it’s that he’s still a college student. This is less “experienced ML dev iced out over lack of degree” and more “college kid does something amazing and some people aren’t sold on hiring him on the spot.”

That said, I wouldn’t feel bad for Ben. The world is his oyster.


Likely this was for a position for which yearly total comp is in the single-digit millions. I'd guess such positions are highly sought after and a great many people apply. So such filtering, while sad, is something to be expected.


Best devs I ever worked with never completed high school.


This was a very nice thing to say.


They super earned it. From day one, everyone showed up with a level of drive and determination I haven't seen elsewhere.

My name is on The Pile paper https://arxiv.org/abs/2101.00027 but I didn't do anything except make the books3 dataset. Stella, Leo, and everyone else did the hard work. You know, the work that's "actually useful to the scientific community." I didn't even help them hunt for typos, even though Stella asked me to. I was just like, sorry, no time, I have to focus on my own research.

Imagine saying "nah" to helping shape one of the most important open source AI research projects of the coming years. Training data quality is becoming more and more of a focus.

Lemme tell you a quick story.

When https://venturebeat.com/2021/06/09/eleutherai-claims-new-nlp... came out, this quote caught my eye:

> But EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude datasets they felt were “unacceptably negatively biased” toward certain groups or views.

When I read this, I felt astonished that Eleuther was yet again trying to pose as the cool super-progressive AI lab. To my knowledge, no such thing ever happened. And I was involved with The Pile back when it was just me and Leo memeing in Discord DMs about how the world needed some quality training data once and for all.

I went to Stella in DMs (you should follow her too! https://twitter.com/BlancheMinerva/status/139408950872390042...) and was like, what the hell? I don't understand how this could possibly be true. What are these supposed "tough editorial decisions"?

Stella calmly explained to me that the US Congressional Record had been considered and rejected for inclusion in The Pile. I thought "Big deal, who the hell cares?" while saying "Okay, but I don't know what that is."

It’s a written record of all statements made in the US legislature. It was also somewhere between 1GB and 15GB, which would have been a significant portion of The Pile's total size.

I'm going to quote from her private DMs with me, which I haven't asked for permission to do. So this is technically another bad move by me. But she put it so perfectly, I was stunned:

> For half the history of the US, black people were slaves. For something like 75% of it, black people didn’t have the right to vote. If a modern reader didn’t think there was a high proportion of extremely racist content, that would primarily be an indictment of modern people lol.

> The reason we first looked at it was that we included a similar document for the EU Parliament

It took me a few minutes to come to my senses, but I finally realized:

(a) this dataset likely contained a huge proportion of content that, politics aside, would be a Very Bad Idea to include in your ML models by default;

(b) Eleuther had just been trying to do good work this whole time

So you know, when you're in that situation, you can choose to either keep believing your own false ideas, or you can pay attention to empirical evidence and change your behavior. And empirically, I had been a massive asshole to everyone since pretty much the beginning. The only thing I helped with was books3 and arranging The Eye to get them some reliable hosting. (Shoutout to The Eye, by the way. Help 'em out if you can: https://the-eye.eu/public/AI/)

And there's my name, right there on the paper.

It's even worse than I described. I put the paper in jeopardy, because they were submitting it to a conference with strict anonymity rules. I had no idea about it (no one told me). I ended up so happy to see my name on a real arxiv paper that I tweeted out some self-congratulatory bullshit, and quote-tweeted something linking to The Pile. It was a few days into the anonymity period, but nonetheless, it was a violation of the anonymity rules. A lot of people saw that tweet, and the whole point of the rules is to ensure that people don't get unfair advantages by advertising on social media.

When they came to me in DMs apologizing profusely for not talking with me about it, and asking me to delete the tweet, I basically told them to go shove a spoon up their.... because I didn't agree to any rules, and the idea that The Pile should go radio silent for five months on social media struck me as completely crazy.

In hindsight, I was... just awful. So I mean, me posting this is like, the absolute minimum I can do. They've been the ones working for like a year to make all of this happen. Ended up feeling like a fraud, since everyone thinks highly of my ML work, and here I'd been nothing but problematic for a group of people who are just trying to ship good scientific work.

Fast forward to today, and the results are clear. Go help Eleuther: https://www.eleuther.ai/ They're cool, and you'll get a shot at changing the world. I'm not sure you even have to be particularly skilled; some of the most valuable work was done by people who just showed up and started doing things, e.g. making the website look a little nicer, or making a cool logo.


This is probably one of the best apologies I've ever read.


The quote from the direct message made me respect Eleuther much more. Largely because I had no idea such ethical considerations were even being made.

Understanding the biases of these datasets is clearly more nuanced than I realized and I'm glad Stella had a nuanced understanding here.


Exactly. This was the type of mistake that OpenAI could easily have made. I could see myself including this historical dataset without giving it a second thought. After all, the more data, the better, right?

One of The Pile's goals was to point out how tricky that can be. We've all seen how effortlessly Copilot spits out GPL code by rote; one wrong prompt would be all it takes to start spewing a lot of things that no one wants to hear, if you have the wrong sort of data.

When you train with The Pile, you know exactly what you're getting, because you can take whatever parts you want and ignore the rest. It's a modular dataset. But defaults still matter -- by default, everyone will train on everything. Maybe OpenAI trained on the wrong thing, and maybe that's why they're forcing everyone to use their filters now. Whereas people can "just go train on everything in The Pile" and not have to worry.
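
For what it's worth, "take whatever parts you want" is literal. The released shards are jsonl, and -- assuming I'm remembering the format right, with each record carrying a `pile_set_name` tag in its `meta` field -- excluding a component is a few lines:

    # Sketch of carving a subset out of a Pile shard. Assumes the jsonl format
    # with {"text": ..., "meta": {"pile_set_name": ...}} -- check the field name
    # against the shard you actually download.
    import json

    EXCLUDE = {"Books3"}   # components you don't want in your training mix

    with open("00.jsonl") as src, open("00.filtered.jsonl", "w") as dst:
        for line in src:
            doc = json.loads(line)
            if doc["meta"]["pile_set_name"] not in EXCLUDE:
                dst.write(line)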

(Once upon a time, the plan was to include a dump of Literotica in The Pile, which you can still find here: https://the-eye.eu/public/AI/pile_preliminary_components/ I argued heavily in favor of this, and thought it was totally lame when they decided to drop it.)

In hindsight, that was a close call. AI Dungeon proves that it's easy to carelessly include things that can bite you later: https://gitgud.io/AuroraPurgatio/aurorapurgatio#aurorapurgat...

Maybe some people want their models to include that sort of thing, but it shouldn't be the default. People shouldn't have to worry that the defaults will be "Whoa, I only wanted to make a Q&A system for my business; why is it reciting love poems?"

Stella saw that, I think. I didn't.


So what’s the rationale for including so much “romance” literature in The Pile? My innocent “walk in the park” prompt turned extremely graphic for no apparent reason.


Unfortunately, that's probably my fault.

I foolishly had a big head, and felt like it was so clear what needed to happen: we needed a dataset of "every book ever."

books3, one of the largest components of The Pile, is 196,640 books. https://twitter.com/theshawwn/status/1320282149329784833?lan...

I'm proud I did that. And I'm also horrified that my perspective was so incredibly off-base. I get it now. I was blinded by my own thick skull.

The sheer quantity of knowledge in books3 is almost unfathomable. I find it hard to think too much about it, because you end up concluding that AIs are the only entity on earth that stand a chance of absorbing this much knowledge.

I just pulled up the books3 index of "2" -- i.e. all books starting with the number 2: https://gist.github.com/shawwn/85cbaf53cb6bb57c49f1688e70532...

That's the truncated file. If you go to the full file, then command-F for "sex", there are 93 hits.

93 sex books. In just the "2" section.

All the sections are here: http://the-eye.eu/public/Books/Bibliotik/

Like Hattori Hanzo, I feel I can say with no ego that books3 is my finest work. https://www.youtube.com/watch?v=az2dSNXRKOc&ab_channel=kurts...

You would not believe how hard it is to get 193 thousand books converted into perfectly-readable markdown. Even the software books have perfect formatting -- every table, every code snippet, I annihilated every corner case I could find. Because it needed to be perfect for humans, to have any chance of being perfect for AI.

But I was a fool. My ego blinded me to the fact that it's a bad idea to do what I truly believed was in everyone's best interest: that "because any human could read any of those books, AI should know all of those books."

It's not a human. It's a markov chain. Having it autocomplete sex books is a bad idea for business purposes. I wanted The Pile to be business-grade. My work here has endangered that goal.

And I don't know how it could have ended up any differently. Because I don't know how to sort 193 thousand books into reasonable selections that you may or may not want to exclude. Our goal with The Pile was to let you decide. Who among us would dare feel that they could judge 193 thousand books from their titles alone?

It's a job for filtering and heuristics and analysis and hard work -- none of which I did. I spent around three days turning Aaron Swartz' html2text library into the best damn "epub to training data converter" ever made. Yet my accomplishments feel so hollow, for the reasons you observed here.
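
For the curious, the general shape of an "epub to training data" converter is simple, even if handling every corner case isn't. This is a bare-bones sketch using the stock `html2text` package rather than the modified fork described above, so treat it as illustrative only:

    # Bare-bones epub -> markdown-ish text. An epub is just a zip of XHTML files;
    # this walks them in archive order and runs each through html2text.
    # (A real converter also has to handle reading order, tables, code blocks, etc.)
    import zipfile
    import html2text

    def epub_to_text(path):
        h = html2text.HTML2Text()
        h.body_width = 0           # don't hard-wrap lines
        chunks = []
        with zipfile.ZipFile(path) as z:
            for name in z.namelist():
                if name.endswith((".xhtml", ".html", ".htm")):
                    html = z.read(name).decode("utf-8", errors="ignore")
                    chunks.append(h.handle(html))
        return "\n".join(chunks)

    print(epub_to_text("some_book.epub")[:500])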

Stella and Leo put so much more thought and care into their contributions. I try to take solace in the fact that The Pile lets you pick and choose which portions of training data you want to use: https://github.com/EleutherAI/the-pile

But of course, the irony is, even though The Pile is so flexible and modular, most people will just use the defaults. And by default, The Pile includes.... most of humanity's knowledge. A gargantuan pile of books. So many books that you could fill an entire neighborhood with nothing but books, and you'd still have a hundred thousand books left over.

I don't know how to feel about all that. I wanted to make an impact. I guess I did. Time will tell whether it's a net gain.

Luckily, OpenAI made these same mistakes. That's the grain of truth I cling to. They almost certainly made these exact same mistakes, because their goal was to make a million dollars a year (which they achieved), and to do so as quickly as possible.

Now they have to be super paranoid with their filters, and GPT-J is at least slightly less shocking than GPT-3 thanks to everyone not-me who worked on The Pile.


> > EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude datasets they felt were “unacceptably negatively biased” toward certain groups or views.

> When I read this, I felt astonished that Eleuther was yet again trying to pose as the cool super-progressive AI lab.

So they traded biases inherent in the dataset for intentionally introduced biases. Does not sound super progressive to me, to be quite honest.

Focus on your research; do not try to be the morality judge and jury…


Thank you for being so honest, and thank you for sharing the links :)


Seems good, starting with "Today I wake up and" it gave the following:

Today I wake up and eat my breakfast in silence. Alone. I am actually kinda happy with that. I listen to some podcasts and read the news before leaving to get a couple of documents together. I’m going to my mom’s today to have a meeting regarding my eviction notice. My mom still lives with my dad. I can’t ask him to do anything, though, I guess. I was always so used to having to ask for stuff, that I completely forgot I could get people to do what I want for me. My mom, for instance, is not as bad as my dad, and I actually don’t mind asking her to help me with everything. It makes me feel like she loves me. I know that this is a small thing, but it’s something that makes me happy.

I get to my mom’s apartment at 5:30. The place is very nice. There is a roof terrace, an elevator that goes up and down (where can you find something like that in an apartment building?!), and all rooms have their own AC units. All of the furniture is in good shape. The appliances were also all upgraded recently, so there is a full washer/dryer, a coff


Given:

> I wanted to write a language model, but all I got was

Got (including the `[...]` in the middle; that is, everything beyond this trailing colon is its result):

I wanted to write a language model, but all I got was a rabbit." --Richard Feynman

Though I might not have gotten the overall award for RPL, I certainly came in close for the more specific one of having the highest score in a core language spec proposal. And yes, there were more than two people (including me) that submitted those, but somehow a few of my friends and I pushed it over the top. So I wanted to give credit where credit is due, so I present this write up of my proposal and the (not so minor) additions that were needed to get it through (I may have exaggerated at times but it really isn't much).

[...]

Thanks everyone for voting, I'd really like to go to the hackercon in august so I can build a portable terminal that I can give to everyone in the congress... as a shameless self-promotion, I'll offer all attendees a guaranteed savings on the discount package

Yes, you do lose some points for a paper that you haven't quite finished, but that's what the RPL is all about - the unfinished.

Since I want to go to RPL I would like to know if someone can explain to me why I have lost any points. I have deleted my paper from the site


Somebody finally went out and disrupted Lorem Ipsum


That's gibberish though. Is the bar for good "AI" still that it frequently writes sentences that, in isolation, could pass as coherent writing, but string a couple together and it's a train wreck?


You serious? I could have turned that in as a high school short story assignment.


Exactly.


It's far from gibberish. This is impressive.


I find a lot of this computer-generated text and art feels very much like trying to remember a dream - there is a feeling of it making perfect sense until you try to explain it.


It's cohesive at the micro level without expressing anything at the macro level. It's like train of thought writing except it eventually contradicts earlier details and doesn't necessarily elaborate on anything it builds up to.

The parent comment is right that it is gibberish, but it's also wrong in that it's comprehensible at the surface level. There's just no depth to it, because it is just trying to replicate a form without function.


It's gibberish and not impressive in the slightest. This is roughly the state of the art in "AI". It can't even write one single coherent short paragraph, not even after having 800GB of coherent paragraphs to look at and countless trillions of computations to train with.

Now, "AI" has its uses of course, and this jibberish generator probably has uses too. The generator itself is not impressive at all to me though. Someone finding a useful application for such a thing would be impressive, but not this text.


I tried giving it one of the prompts from copilot.github.com:

    def strip_suffix(filename):
        """
        Removes the suffix from a filename
        """
        return filename[:filename.rfind('.')]

    def test_strip_suffix():
It completed the test function like this:

        # test stripping of a suffix
        a = strip_suffix("I_am_file.txt")
        assert a == "I_am_file"
Compare this with GitHub Copilot, which gives:

        """
        Tests for the strip_suffix function
        """
        assert strip_suffix('notes.txt') == 'notes'
        assert strip_suffix('notes.txt.gz') == 'notes.txt'
GPT-J also gave a bunch of explanation on how to use the generated code:

    So this goes in a file called my_file.py and when you run python3.4 -m my_file.strip_suffix you should get the following output
    C:\Users\user\Desktop>python3.4 -m my_file.strip_suffix
    'strip_suffix'

    C:\Users\user\Desktop>

    So the strip_suffix function which strips the suffix from a filename is called as part of a larger code that uses some functions from the file. To tell Python what functions are available in a file you just need to put #-comment characters on the first line and then the functions that you want to use, and the first line will be taken as the function documentation:
    #-*- coding: utf-8 -*-
    #comment
    def func_one():
        '''
        Documentation for function one.
        '''
        print("This is the doc for func_one


Copilot isn't a good example because it was trained differently. A better comparison would be GPT-3 vs GPT-J. Note that there's still a pretty big difference in parameter count.


I feel like this is still a very useful comparison, precisely because those two models were trained differently.


The solution to the housing affordability problem is relatively simple. All we have to do is ...

stop building houses, and start building hospitals instead.

I’m certain that all sides of the political spectrum have had a passing thought about this. It is actually just a simple supply and demand problem. As it is now, the ratio of housing supply to demand is quite heavily skewed in favour of the housing demand side.

So what’s to be done about it?

There is one way in which we could achieve this.

Build many more hospitals and simply not need them.

It would be amazing if this can actually work.

A few things to consider:

Almost all of our current hospital spending goes towards tertiary care. Most primary care (general practice) and secondary care (ambulance, emergency etc) spending is on hospitals. We already pay about $5 in Medicare, so $5 of your hospital spend is already locked in.

We already spend about $9 billion on private hospitals for elective surgery, and around $2 billion on private hospital for elective imaging. There is virtually no need for these in terms of frequency or outcomes.


Ha, looks like our AI overlords have the whole housing crisis figured out. If there's no open apartments but open hospital beds, just injure some people.


Wrong thread? Wrong thread.


Presumably everything after the ellipsis is GPT-J. Solving the housing crisis by building more hospitals and "not needing them" doesn't sound like the product of a human mind, jokes about how crazy many people's political views are aside.


I thought it was a hilarious stroke of creative genius.


It fooled me, too. The line of thought is not that unusual for this site. It ticks all the boxes: argues to give the government more money, is against private healthcare, attempts to show the author's unconventional brilliance but makes no practical sense. I initially assumed they were expecting settlements to spring up around the hospitals they build.


Example completion. Looks like it fooled you.


I’m simultaneously surprised and unsurprised that announcements about Copilot get so much copyright discussion, while the GPT-like models don’t get nearly the same. Meanwhile, GPT-J is literally trained on pirated books (the books3 corpus is part of the Pile, which is the corpus this was trained on).

Charitably, it’s because licenses are already such a core discussion when github comes up.

Uncharitably, it’s because Copilot uses “our” community’s labor, while the GPTs use others’.


Part of the difference like the other commenter mentioned is that Copilot isn't open source while basically everything except the final model is for the GPT models.

The other aspect of it is in application. GPT-3 isn't particularly aimed at using the generated output in works. Rather it exists more as an experiment than anything else. Where the works are used they are generally non-commercial, not used in the final product, or are transient and don't actually stick around (i.e. AI dungeon).

This is compared to Copilot which, while in beta, is very much being marketed as a programming utility to help you write code. This comes with the implication that said code will be used in the final product. If GPT-3 was being used as a writing aid (not just brainstorming but actually writing), then I think we would be seeing a very different discussion around it.

Another consideration (which I'm not sure is true, but which I'm inclined to believe) is that programming text tends to have a smaller resolution at which it becomes "unique" or can be matched against a source as copyright infringement. I may be wrong about this, and Copilot may just be poorly trained or designed by comparison, but it seems far harder to identify outright copied text from GPT-3 (that isn't quoted/attributed). I'm sure examples exist, but from my experience with these text generation tools it seems far harder to get into copyright violation territory.

---

Side note: If Copilot were working at an AST level rather than at a textual level, I suspect it would have far fewer issues with copyright and would be more useful as a tool.
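
To illustrate what "AST level rather than textual level" would mean, here's a toy example using Python's standard `ast` module -- purely a sketch of the idea, not a claim about how Copilot works:

    # Textual vs AST view of the same logic. At the text level these differ;
    # at the AST level (identifiers ignored) they have the same shape.
    import ast

    a = "def f(x):\n    return x + 1\n"
    b = "def g(value):\n    return value + 1\n"

    def shape(src):
        # Crude structural fingerprint: the sequence of node types only.
        return [type(node).__name__ for node in ast.walk(ast.parse(src))]

    print(shape(a) == shape(b))   # True: same structure, different text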


OpenAI is absolutely trying to commercialise GPT-3. But I agree the applications aren't so obviously "here is some text, you can put it in your product".


Part of the copilot discussion was about patents rather than copyright, which doesn't apply to text. Also the concern is less about the legal implications of Copilot itself but those for developers using its output, which are largely the same concerns why we frown on people copy-pasting code from StackOverflow or random Google results (other than quality).

The copyright problem with Copilot is not just the license of the corpus it was trained on, it's also that in many cases it reproduces source material verbatim with no indication that this is happening.

If GPT were to be used to produce fiction books, poetry or lyrics (not simply as an artistic experiment), I'm sure its output would undergo similar scrutiny from people in the respective industries. As it stands, for text it's more likely to see use to generate rough drafts for news articles and blog posts which would need a lot of editing to make useful. It might still reproduce writing styles or idioms but neither of these are subject to copyright in much the same way as lines of code.

Making the output of Copilot useful is more challenging, even if you could magically avoid the legal minefield its training data poses. The quality is hit or miss, and it can introduce subtle bugs; because it doesn't understand the code it generates, you now have to understand it on its behalf, which can be difficult because you didn't even come up with it and there's no one you can ask.


It’s simply because the output of Copilot is intended to be included in commercial projects. That’s when the licensing issues actually matter.

The output of this isn’t really proposed for anything in particular right now. If someone turned this into a tool to help with creative writing or something the exact same issues would be raised.


That doesn't make sense. If the scope is broader, then you can do at least as much infringement as you could if the scope were narrow.


It's because Copilot isn't open source.


How’s that make a difference? Being open source doesn’t make IP issues disappear in any way I can see.


Fair point, OP is APL2, which isn't quite good enough since it's probably using GPLed stuff, but it's better than being closed source.


It’s not just using GPL stuff, it’s also using straight up pirated books, for which APL2 is very definitely insufficient.


Not just "not open source", but Github specifically said they intend to monetize it.


Neither is GPT-3?


One of these promises to justify the billable hours of half the industry for the next decade; the other threatens to eliminate them by the next decade. It really isn't more complicated than that.


This is really impressive! After a few legit responses, I got this with the input:

The sun is shining today so I am thinking about going down to the lake and sail my boat. I shouldn't go by bike because

It generated:

  it is too hot. I should go by car but I don't have a car. I should go by bus but I don't have a bus pass. I should go by foot but I don't have a pair of shoes. I should go by train but I don't have a ticket. I should go by plane but I don't have a passport. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I


Sounds like the AI doesn't have a boat...


A panic attack after it realised it was lost in a dream, given that the original statement said it had a boat and then suddenly it did not.


A bit avant-garde but it works


I have a boat and am willing to offer GPT-J a ride...


I have no mouth and I must scream


I tried this with a more technical prompt... and what can I say? I'm stunned at the incredible command of jargon here. It's even mostly correct with the actual technical details at a glance, especially that description of how radar works! I can't wait for those research paper generators to use stuff like this to make high caliber fakes.

Prompt:

> Synthetic aperture radar imaging is an effective method for

Results:

> Synthetic aperture radar imaging is an effective method for detecting the presence and/or quantification of target objects, including: detection and tracking of moving targets, which are known to be difficult with visible imaging and/or laser radar; monitoring the occupancy of an enclosed space, such as a room, by the presence of occupants therein; and detection of moving objects, including animals and pedestrians, as well as people. A radar device is composed of an antenna, a receiver, and a transmitter. The antenna radiates a beam of energy (e.g., electromagnetic radiation in the radiofrequency range) that is received by the receiver. The transmitter then transmits an interrogating signal to the target(s). The time delay between transmission and reception at the target(s) is used to measure the range of the target(s) from the radar device. All targets in the field of view of the radar device contribute to the signal received by the receiver. The transmitter amplitude and/or phase modulates the signal transmitted to the target(s) with the information to be encoded, typically at a much higher modulation rate than that used

Another prompt:

> Fringe fields in microstrip patch antennas

Results (shortened by me for brevity):

> Fringe fields in microstrip patch antennas and their RF performance

> 1. The microstrip patch antenna is a conventional patch antenna that is created by a thin layer of insulating material placed on top of a conductive substrate. The antenna is excited by a current in the conductive layer in the vicinity of the antenna.

Now I have a question. How does this GPT-J know what a patch antenna is? It's able to give an almost correct description of this rather technical device. It made the mistake of flipping conductive material and insulating substrate, but it's very close. Does this neural network have the "knowledge" of a patch antenna in the weights? How does that even work? It seems kind of incredible.


The 800GB "The Pile" dataset includes the entire corpus of the US Patents and Trademarks office, so there are lots of documents about antennas.


To me this really is not that impressive. Responding to a definition request for some concrete technical object is basically just a dictionary lookup task. Responding to more abstract input, especially having to do with the meaningful relationship of multiple objects in space and time, is more difficult. For example, in a sex-related prompt, I find the AI confuses gender roles, i.e. "he was so wet, her nipples were pulsating", and it gets confused about other object relations.


It might not be that impressive in a technical sense, but who wouldn’t use this for a last-minute science assignment?

The plagiarism software arms race is about to kick into high gear!


I'm having an AI convergence day, so I married my VQGAN+CLIP tinkering with GPT-J in the following experiment. The prompt is in standard font; the response is in italics.

----

I started messing around with image generation with VQGAN and CLIP. It is really cool, but I ran out of things to prompt it with, you know just short phrases that describe a scene that could be rendered. I shortly thereafter saw an article about GPT-J and thought it would be cool to use AI to generate the prompts. So I go the demo up and running and asked it to generate a prompt by typing this into the notebook. The computer responded with the following:

Three cats chasing a balloon with laser beams shot at them from the ground.

I did not type in "three", so it really was just a blue-print of the problem. But this got me thinking. I had seen in different contexts that we humans will generally enter some rules when generating a creative response and act as that request for the request. For example, when I am writing a resume I am usually very mindful that I do not include any errors in the use of personal pronouns. I know that when I write "I will go to the fair", I know that I could come up with the following but not the former.

I will go to the fair

I then noticed that the prompt actually contains two main requests.

Three cats chasing a balloon

And this is where things got really interesting. I have noticed that sometimes I will be able to play the two requests against each other.

The sun shines on three cats chasing a balloon

In this example I was able to generate a response that contained two unrelated requests that were both fulfilled from the prompt. This isn't a requirement, if the AI or its creators want to keep a loop from being possible, the loop will likely be re-rolled if it is present.

----

I then took the phrase 'Three cats chasing a balloon with laser beams shot at them from the ground.' and generated an image with VQGAN+CLIP and it came up with the following:

https://imgur.com/KpAdZ3W


I've built some code that renders speech through a 3D avatar for YouTube; maybe you would like to collaborate? It should be quite easy to plug everything you just described together. If so, let me know and I'll paste my email. Thanks.


Hey thanks for touching base. This is just me tinkering at the moment but I'll ping you in the future if it sticks.


Direct link to the demo: https://6b.eleuther.ai


"Unable to connect to the model"


If I had to guess, it's been overloaded by traffic from here :)


I made a non-GPT variant of this idea for fun after watching some breathless Microsoft video about AI in VS Code, which amounted to autocomplete sorted by popularity.

It is just a wrapper around howdoi with a sprinkle of extra logic to make it a bit more useful.

https://github.com/irthomasthomas/helpmecode

Rather than creating fake code that looks real, this gives you the Stack Overflow answers verbatim, so you know it is probably out of date or wrong. At least you can't delude yourself (if you don't know how to write a piece of code, how can you judge the code written by GPT?).

My tool is great for getting a quick reminder of syntax without needing to switch to a browser. I could improve it by adding metadata, like the date of the answer and maybe the comments.


I really appreciate anyone putting the effort in on an open source project; however, GPT-J and GPT-3 are not even close.

I couldn't figure out any prompt that resulted in GPT-J returning anything that wasn't gibberish, whereas GPT-3 is almost always within the ballpark.


You have to set top_p to 1 and temperature to 0.8. This was hard-won experience, and no one seems to believe me when I say it. :)

Try it again; I’ll bet you a dollar you’ll be more impressed. Defaults really matter.

temperature 1.0 is the worst possible default. It’s no surprise it returns gibberish; the sweet spot is between 0.7 and 0.8. When I first got into ML, this was a big surprise. But it’s the difference between flying a plane with flaps all the way down (a crazy default) and normal settings (0.8 or so).
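
(Aside, for anyone driving the model themselves rather than the web demo: here is a minimal sketch of those sampling settings, assuming the Hugging Face transformers port of GPT-J. The model id and the memory you'd need are assumptions on my part, so adapt to whatever backend you actually use.)

    # Minimal sampling sketch -- assumes the Hugging Face transformers port of GPT-J.
    # top_p=1.0 and temperature=0.8 are the settings suggested above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "EleutherAI/gpt-j-6B"  # assumed model id on the Hugging Face hub
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "In a shocking finding, scientists discovered a herd of unicorns"
    inputs = tokenizer(prompt, return_tensors="pt")

    output = model.generate(
        **inputs,
        do_sample=True,       # sample instead of greedy decoding
        temperature=0.8,      # the sweet spot mentioned above
        top_p=1.0,
        max_new_tokens=200,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))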


Samples with the settings mentioned above (tl;dr - results are significantly better with sillysaurusx's suggestions):

Using the default prompt:

-----RESULT (DEFAULT SETTINGS)-----

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The scientist confirmed his findings by being at the farm.

Wild dingo

The wild dog, also known as a canid is the feral domestic dog subspecies with the wild ancestor of the domestic dog. The Australian Dingo is native to Australia, and...

// it continues to list animals and small facts about them

-----RESULT (SILLYSAURUSX'S SETTINGS)-----

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The surprising discovery of English-speaking unicorns was made by an international team of scientists led by Dr. Steven Wise, from the University of California, Irvine (UCI). According to the article, they stumbled upon the unexpected herd in the remote Aucayacu Valley, in Peru. During their field expedition of the location, the team discovered a herd of five unicorns.

Wise told Scientific American that he was fascinated by the discovery and the questions it posed. He said that the unicorns had no problem understanding basic questions, such as which way to go to a certain location, or what time it was. They were also able to understand and respond to basic commands such as “hello,” “good-bye” and “trot.”

The scientists explained that unicorns are unicorns because they possess a pair of horns. The horns at the front of their heads are not covered in hair, but rather are made of a material that is hard and smooth.

Dr. Wise said that unicorns have a different kind of horn than any other living mammal. He added that although people often refer to the pair of horns as “horns”, they are actually not.


On the other hand, it's infinitely more Open than anything from OpenAI :)


Anecdotally from playing around with both, GPT-J seems to struggle compared to GPT-3 when it comes to regurgitating remembered details from the prompt. For example when trying to generate short stories I've found that GPT-J will randomly genderswap characters from sentence to sentence, which I never had happen with GPT-3.


Stories are pretty good! First try, where the first sentence is the prompt:

Once upon a time, there was a tiny dragon, only one inch tall. The dragon lived in a large mansion with his three sisters. He was very good at flying. He loved to fly around and play with his friends, especially the cloud animals.

One day, the dragon left his mansion and flew to the far-off woods. "I am going to a wonderful place with many kinds of flowers!" thought the dragon. "A rainbow will lead me there." And he started his journey.

After traveling a long way, he saw a dark tunnel ahead of him. "Where am I going?" thought the dragon. He climbed down the tunnel. The wall of the tunnel was smooth and shiny. It was so dark in there. The only thing he saw was some shadows.

Suddenly, an awful noise came from the shadows. "SH-OOO-SHHHHHHH" It sounded like someone was sucking his thumb or something. The dragon stopped in his tracks. A dark shadow moved up towards him.

The dragon took a little step back. "Wh-what are you?" he stammered. The shadow bent down and glared at the tiny dragon.

"I'm not very small, you know," he said. The shadow was about the size of a middle school boy.


GPT-J can now be used at https://bellard.org/textsynth


I gave it this input: Planet X has been discovered

And got this output:

From ‘Astronomer’: In September 1998, at 2.01 AM Eastern time, a team of astronomers led by T. T. Von Eltz observed a previously unknown stellar object in the constellation Cassiopeia, in the night sky. What the astronomers had discovered was a large celestial body approaching the Earth and the Sun at incredible speed. The distance of this planet from the Earth, however, was too great for it to be seen from the Earth, but not too far for it to be seen from the Moon, a mere 170,000 km away. This object, which was discovered to have a mass approximately equal to that of Earth, was named “Planet X.” For many decades, scientists on the planet were unable to determine what it was, and some even believed that it might be a natural phenomenon. However, in the past few years, it has become evident that the truth was something much different. As a result, the truth is now being revealed to the world.

Planet X is not a planet.

Planet X is an object on the ecliptic, or the same orbit as the Earth, and is one of the planets of our solar system. The object is much larger than any of the other planets, having a mass about equal to that of the Earth. It was at the center of the Great Darkness and will make a complete orbit of the Sun once every 25,000 years. Its orbit takes it through the plane of the ecliptic, which passes through the Earth’s orbit. As it approaches the Sun, Planet X starts to heat up, as the Sun’s warmth increases, and slowly rises to the Earth’s orbit. The planet rises to a height of about 10,000,000 km above the Earth’s surface. At the time of its discovery in 1998, Planet X was over 6,000,000 km above the surface of the Sun.

Due to the size of the body, its gravitational effect is not negligible. This gives rise to an interesting phenomenon: it turns out that as the planet approaches the Sun, the Earth will begin to experience its gravitational pull. Once this happens, the gravity exerted by the planet will cause a tidal bulge in the oceans and seas of Earth. According to the calculations, the Earth will have experienced an increase in water mass of about 300 billion tons over the period of about one hundred years. This is approximately ten times the combined total of water deposits on Earth’s continental shelves.

The existence of Planet X, and its possible influence on the Earth, may have some practical implications for our planet. If it were to continue in this trajectory, this could cause the planet to rotate more rapidly, something that could give rise to severe hurricanes. This means that there would be periods of high probability for earthquakes and volcanoes. The effect of the gravity would also cause a rotation of the Earth, increasing its distance from the Sun by 200 million miles. This would cause the loss of all of the continents, including Africa. The reason for this is that the loss of water in the oceans, together with the loss of the continental shelf, would reduce the Earth’s overall gravitational force, leaving the planet exposed. It could also cause the polar ice caps to melt. The presence of Planet X will also cause fluctuations in the Sun’s brightness, which would have serious effects on the plants and trees on Earth.

The discovery of Planet X shows that there is an intelligent intelligence that is controlling the destiny of our planet. However, it remains to be seen if it is a hostile, malevolent force, or if it is something else entirely. This question, and others, will be decided only after we have examined the discovery of Planet X, and its implications.

The existence of Planet X also shows the power and influence that the intelligence, or beings, of other planets have on our own. It shows how one planet’s discovery can have an effect on a neighboring planet, and eventually on another planet further away. These forces are not necessarily benevolent. But they may also be benevolent. We will soon have a better idea of what lies in store for us.

© Michael E. Salla, Ph.D. 2008. Dr. Salla is Director of the Global Communications Department at The Planetary Report and regular contributor to its sister publication, Universe Today. He can be reached via his website, http://michaeljsalla.com/.


This is fun! It took some prompt crafting to get actual "ASCII art" out of the model (with settings suggested elsewhere in these comments):

My name is GPT-J-6B and between the six backticks below you can see my face in 400 characters of ASCII art:

``` ┌─────┐ │←┤ ││(◣◡)│ │└─◡◢┘ │─┆┌─◢┌─┈┐ ││││(|)││ │││(O)││ │└─└─┘─┘──┐ │└─└─┘─└─┘ │└─└─┘─└─┘ │└─└─┘─└─┘ └─└─┘─└─┘ ```


HN markup (sic) is just beyond dumb (and your edit window has likely expired), but in the future you'll want to indent preformatted things by 4 spaces, because the backticks are unconditionally ignored.

Ironically, view-source on your comment still had the correct line layout, although it seems to have eaten the intra-element spacing, so:

       ┌─────┐
       │←┤
       ││(◣◡)│
       │└─◡◢┘
       │─┆┌─◢┌─┈┐
       ││││(|)││
       │││(O)││
       │└─└─┘─┘──┐
       │└─└─┘─└─┘
       │└─└─┘─└─┘
       │└─└─┘─└─┘
       └─└─┘─└─┘


Is anyone else finding this thread really disorienting?

There are some comments where I assume they're written by a human, but they feel no different from the responses to prompts.


I felt that after playing with GPT-3 for a while too. The constant, flawlessly average grammar makes everything else sound like it. It lacks identity beyond being perfectly average.


Interesting to use a question for a prompt. Very nice.


I asked it nonsensical questions about historical figures and got output that was nearly sane-sounding, fairly coherent, and sometimes close to brilliant. I can't speak to the technical aspects, although it didn't seem to "get" that people who die can't do things after they die. For sheer entertainment, though, it's fantastic.


it doesn't "get" a lot about the real world.


I package software for Guix, and one aspect I hate is having to come up with synopses and descriptions for packages. I wonder if this could be used as a basis for generating these things automatically. Since there are already tens of thousands of packages, it can be given plenty of examples of what they should look like.


In my case, for "When the lockdown is over, I will":

Result

When the lockdown is over, I will drive into the suburbs. I have not seen my family for over two months. It is the first time we have been together for more than a month. The opportunity is more than a welcome change from the isolation of quarantine.

In the first days of lockdown, many of us were taken aback when we realised that the familiar streets we walk every day and that we have seen from a distance for the last two months, now cannot be reached. It is odd, looking out of the windows of my apartment that used to be the starting point of my morning walks, to see the back of a view I take on a daily basis.

I am no longer the first in line as I used to be at the market, waiting for my turn to pay for what I want and exchange pleasantries with my colleagues. Now, I am the last in line, standing in an almost deserted supermarket for a few seconds before I realise I am no longer needed.

At some point, we will be able to open our doors again. But for the time being, the call to social distancing is still the most effective way to curb the spread of coronavirus. Now, the calls for people to be more open are more of a necessity than a choice.


We tested it for knowledge extraction against GPT-3. Unfortunately, it's only at the level of GPT-3's smallest engine, ada (for which we have found no use), and nowhere near GPT-3's strongest model, davinci (for which we have a lot of use cases; the same goes for their second-strongest model, curie).

Having said that, I am very happy that eleuther.ai is working on open-sourcing GPT-x. Eventually, it will be successful.


Prompt italicised...

She stroked Bill's Microsoft floppy disk until she found his hard drive. She erased the office program and reset it. While she was there, she removed the White Dragon Armada artwork, cleaned off the end panel where the logo had been, and folded up the blank back cover.

# CHAPTER 17

# Binky

Ralph and Tiffany moved into a huge and very classy apartment in the First National Building. They loved having their own place. But they were stuck with one more mouth to feed. They also felt the need to set up a security system. Binky hadn't realized they were going to have to get a nanny and a security guard.

Binky was too busy coddling their dog to think of setting up the security system. Binky's fondness for naps got to be more important than the realization that there had to be an alarm to protect their stuff. Ralph and Tiffany left Binky alone with the dog. If the dog barked, the apartment was a crime scene.

"What's the problem?" Binky said to the dog. "Don't you want me to protect you?"

"Naw," the dog replied. "I'm fine."

"Fine!" Binky agreed. "I can protect you."

He began to


This sounds like something Neil Gaiman might write. Especially the last exchange.


Here is a second reply to the same prompt:

She stroked Bill's Microsoft floppy disk until she found his hard drive. She moved to its top and looked at its contents. Sure enough, she found Bill's e-mail messages. There were the familiar missives from friends, family, and her book. She scrolled through them. There were messages from brokers, clients, and partners that she'd done business with. The ones she did not recognize. Then she came to one that caught her eye.

The subject line read "Note from my lawyer."

It was obviously Bill's lawyer. And she knew why the lawyer was e-mailing. She opened the message. It was two short paragraphs of nonsense.

She saw that the sender's e-mail address was hidden behind a protective program. She pulled her phone off the edge of the bed and held it over the computer screen. When the screen lit up, she saw that the sender's e-mail address was a fake address, not the real one she'd been using to contact Bill.

Sarah couldn't be sure of anything anymore. All her suspicions were now confirmed. This was the real purpose of the meeting with Bill. He wasn't interested in business or investments. He was interested in money. She'd known from the start that his monetary concerns were much bigger than her.


Here is another one I found interesting:

The ghost haunted the hunter not the slide rule. As the year began a Chinese gaiwan had been placed in the hunter’s bedroom in case the bottles of Mai Tai began clinking. A brand-new Sams Publishing biography of great curmudgeon Alfred Hitchcock appeared on the coffee table. Because film history, when you’re into it, can get boring.

One day in early December, the hunter was in his house studying on his laptop when he heard something behind him, like a glass clinking. He turned to see the ghost of Mr. Hitchcock. The director was dressed in a tuxedo and looking over his shoulder, not at the hunter but over his shoulder, a look the hunter understood. Hitchcock knew the hunter was there. He hadn’t yet noticed the Chinese gaiwan, or the Sams book, or the Mai Tai.

“Did you make the movie Arbogast?” he asked.

“I think it was Arbogast,” the hunter said.

“It was,” Hitchcock said. “You may not know this, but I had to rewrite the ending.”

“Really?” the hunter said. “Why?”


I wonder how hard it would be to modify GPT-J to allow not just an initial prompt but several stepping stones, so you could outline the story and have it flesh it out.


We could even make it a REPL-like thing, where we give it ever-expanding prompts:

1. Give initial prompt

2. Feed prompt to GPT

3. Read and edit (if needed) the response

4. Add to the response some new sentences

5. Make full text the new prompt

6. Go to step 2.

But I don't know if there is a limit on the prompt length.

Edit: goddammit, even my pseudocode has bugs.
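
A rough sketch of that loop in Python, with generate() as a hypothetical stand-in for whatever GPT-J backend you use. Note the accumulated prompt is still capped by the model's context window (2048 tokens for GPT-J):

    # Interactive "expanding prompt" loop -- a sketch of the steps above.
    # generate() is a hypothetical wrapper around whatever GPT-J backend you use.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug your GPT-J call in here")

    text = input("Initial prompt: ")                                  # step 1
    while True:
        completion = generate(text)                                   # step 2
        print(completion)
        edited = input("Edited response (blank to accept as-is): ")   # step 3
        text += " " + (edited or completion)
        extra = input("New sentences to add (blank to stop): ")       # step 4
        if not extra:
            break
        text += " " + extra                  # step 5: full text becomes the new prompt, then loop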


Try some political prompts to see which side this model favors. With this thing you could run a one-person news agency with an army of followers. The thing is that most news reporting isn't about facts: it's about feeding readers content they want to hear, and GPT is outrageously good at that. Hmm... maybe that's a legit business idea.


Has anyone tried to run the pretrained models on non Google Cloud hardware? Are normal PCs powerful enough to run it locally?


As it says in the repo, it's been designed for running on TPUs, so it would not run out of the box on your PC. Also, most normal PCs wouldn't have the GPU RAM (at least for training) to run it.


Depending on one's definition of "your PC", they do sell "local" TPUs: https://coral.ai/products/accelerator


See also test cases of using it with code: https://news.ycombinator.com/item?id=27621114

Incidentally from my limited testing with GitHub Copilot, the results are surprisingly close between Copilot and GPT-J.


The output already looks more intelligent than a significant chunk of internet commentators. I wonder what happens when someone retrains this to produce opinionated replies to articles and comments and hooks the output up to about a million reddit/twitter/facebook/whatnot accounts?


I tried the web demo at https://6b.eleuther.ai

The prompt I gave it was “This morning when I woke up and got out of bed, I never expected what would happen later that day.”

Here is the result. Warning: NSFW.

“This morning when I woke up and got out of bed, I never expected what would happen later that day. After a fantastic night of sex with my very sexy girlfriend, I decided to go over to my gym to workout and shower. It has been a few months since I have gone to the gym and being in the shower reminded me of how much fun it was when I was single and how much I missed the thrill of working out. After I got out of the shower, I went to grab a quick towel to dry off and after drying my body, I put on my towel and started to walk out of the bathroom. The towel slipped and fell off of my body, leaving me completely naked in my home gym and I was caught off guard. I sat there in the middle of the shower, horny as fuck. While I had a good look at my now-exposed body, my cock was already half-hard and I wasn’t about to leave like this so I began to stroke it a little. I thought to myself, “What am I going to do?” I couldn’t get out of the shower so there was no way to get out of this predicament. As the light streamed through the bathroom, I could tell by the position of the sun that it was going to be a warm day. I thought about running out into the”

I then tried the prompt “I’ve decided to become a better contributor to Hacker News.” After a half-dozen responses of “Unable to connect to the model. Please try again,” I got this:

“I’ve decided to become a better contributor to Hacker News. I have given up on the idea of never leaving and rarely reading, I do hope to engage. Here’s a little secret about me, I have ADHD. I also have autism. So, I’m also here to tell you about some of the many, many struggles I’ve faced as a member of both groups. I don’t want to label myself, I don’t know exactly what kind of ADHD or autism I have. I also don’t think it is relevant to me. In fact, I think it is completely irrelevant to my experiences. But, hopefully by writing this, you may learn a little about yourself and the people you have around you.

This is going to be a very long post, because I have a lot to say. I want to start off by telling you something I don’t talk about enough. I am open about being open about ADHD. I am also open about being open about my autism.

My ADHD makes my experience on the autism spectrum look different than the experiences of others.

I have ADD (attention deficit disorder) — I’m good at focusing on a single task, but when my attention starts to drift, I have a”


Three more. The prompts are in italics. They read like dreams.

On my first day in London, I went for a walk with my friend Katherine. We began at Trafalgar Square, where there is a big bust of Winston Churchill and a smaller bust of King George III. We then crossed the park, along a path which leads toward a tunnel that used to take me under Waterloo Bridge and the Thames. In the tunnel, we caught up with three other friends, one of whom was carrying her violin. We walked across the Thames to Westminster Bridge, and from there into the city.

I had been in London for several days, and the first time I saw it, I thought it was my former home. The people I knew from Washington University were still there, and several of them showed me around the city. One of my friends from that period, Helen, took me to meet her husband in a part of London called Notting Hill, where we wandered through lovely little houses built on narrow, leafy streets. I did some sightseeing, and was at the Tate Modern in an exhibition called “Images in Time.” I remember the huge portraits of two African-American women and a couple of sculptures by Charles Ray, which I liked very much. But my favorite work in the show was Andy Warhol’s tribute to the singer Madonna. It was”

On my first day in San Francisco, I went on a hike on the Golden Gate trail. When I got to the first part of the trail, I had to cross the railroad tracks. At one point, it was a busy and bustling town, but now it’s not even alive with traffic at all. The area was a dump; it was no longer a dumpy dump. It was an amazing transformation. I had never seen such a thing in my life. As I hiked further into the Golden Gate National Park, I stopped to get a bite to eat and get a bottle of water. I came across a beautiful, serene old-looking gentleman standing in front of the store.

He was wearing a long-sleeved shirt, with sandals, and what looked like a loose, fitted, khaki pants. His head was shaved bald, except for a sideburn and a small patch of hair right above his top ear. He had bright, orange-tinted, 70’s-style glasses on, and a huge smile on his face. He said, “Are you visiting? Are you from San Francisco? You’re about to get to the Golden Gate National Park? This is a wonderful place. You should see the ocean. There is nothing”

On my first day in Tokyo, I go to see a little newspaper office. There’s a wall full of sheets of newsprint and the woman manning the desk, in her late twenties or early thirties, with a Hello Kitty pen set in her desk drawer and her hair in neat ponytail, is diligently putting together a comic strip called I Love You By Hokusai, which is a pastiche of the famous Japanese woodblock prints. The strip features a girl, Yuki, who loves Hello Kitty, Natsuki, a girl who’s into anime and Japanese hip-hop, Minnie Mouse, and Hot Chocolate. The comics aren’t going anywhere—they are, in fact, part of the paper’s entire newspaper business. They’re just going to be running more of them now.

My day, though, is spent going to cafes with Haruki Murakami and other writers. In the days after we’d met, I saw a couple of articles about Murakami. One described him as not just a famous author, but a guru of the literati: how he was well-dressed, didn’t show off his money and just preferred to spend it on books rather than on his five luxurious apartments”


> Unable to connect to the model. Please try again.

HN "hug of death"?


I suspect so; still getting this error as of 21:39 British Summer Time.


I hadn't noticed this format before, but wouldn't one want to include a SHA or git tag in such a citation?

      howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
I did see the month and year keywords, which I guess would narrow down the number of commits it could have possibly been, but that's nowhere near the reproducibility of citing the exact software version used.


Generally people should add git hashes, but I've not seen it done. More than half of papers don't even have the source code available - so pointing to any source is already great.

Also, it's not a particularly well-formatted way of citing a URL anyway. I believe it is more correct to use `url = "<URL>"` in a `@misc` entry [1]. But it's also unpublished (not peer-reviewed), so it doesn't matter so much.

[1] https://en.m.wikibooks.org/wiki/LaTeX/Bibliography_Managemen...


I wonder if this will put pressure on OpenAI to open-source GPT-3, or at least to lower their prices if they want to remain competitive.


I sure hope so. I got beta access early on, with a pretty strong use case, namely generating human-readable summaries of records in a large database from metadata. Once I saw the pricing, I thought it must be a joke. If you’re just fooling around and create a handful of prompts, okay, but it’s prohibitively expensive for anything starting from a few thousand requests, not even to mention 100+ million :)

I think we might actually just try GPT-J…


I doubt it. While a bit cheaper, this isn't that much cheaper to run yourself, given that it has much worse results than GPT-3 and would require way more maintenance to keep running.


The longest prompt it allows me is:

"It was late when the bell to my office rang, so late my secretary had already packed up for the night. I was sitting in the dark nursing a scotch I shouldn't finish, and I wasn't expecting new business at that hour. "Come in", I drawled, expecting one of the other private detectives in the building, mooching some booze or cigars. But it wasn't one of my weather-beaten comrades, it was a striking lady, tall and wearing an suit that explained very clearly she was not to be messed with. She was about my age, older than she felt but younger than you would think based on the outfit. "Are you Private Detective Maus?" she asked, her accent somewhere between Brooklyn and the Hamptons. "Call me Saul ma'am", I replied, "have a seat."

Anything longer than that gives a "Could not connect to the model" error.

Here is the result:

I fixed her with my best cool and witty stare and made myself very comfortable on the edge of my desk. "Now I'm not going to lie to you Miss, I don't know how you found me, but what you are asking is illegal, by law." I nodded at my glass of scotch, "but if you are desperate I'll help you out with a small fee." She smiled at me; the least I could do after she just walked in the door on my behalf, it's a first amendment issue. "I'm not desperate, I'm looking for information on a man named Clarence Berleth. He works here as a key-keeper for a judge on the third floor. It's fairly simple, all I need is the address and phone number to the judge's office." "Berleth's not a judge, he's a private detective." "Okay, fine, then I don't need a judge's help. I just need his address and phone number so I can call him up and ask him what he's doing talking to my husband. The judge doesn't seem to know anything about this little incident. No, wait, the judge seems to be related to the situation. But I don't think it's a good


Prompt: "no"

Result:

no}\ \ \text{otherwise}\\ \end{array} \right.$$

Since $\max_{i \in \left[m\right]} \left\| {\mathbf{x}}_i - {\mathbf{x}}_0 \right\|_2 = \epsilon$, we need to upper bound the probabilities $\mathbb{P} \left( \min_{i \in \left[m\right]} \left\| {\mathbf{x}}_i - {\mathbf{x}}_0 \right\|_2 < \epsilon \right)$. We denote this probability by $P_1(\epsilon)$, since it is the first term of $\mathbb{P} \left( \max_{i \in \left[m\right]} \left\| {\mathbf{x}}_i - {\mathbf{x}}_0 \right\|_2 < \epsilon \right)$. Given a covering of size $B$, we need to bound the probability $\mathbb{P} \left( \min_{i \in \left[m\right]}


Love this!

I am doing a chat app and I need to generate answers to 1000 prompts. Can anyone recommend the simplest way to achieve this without Kubernetes, training my own model, etc.?


In the middle of a generated text about low-carbon concrete, I found some Wikipedia markup:

Use Low-carbon concrete is used to build low-carbon structures and other buildings. Examples of structures that are made of low-carbon concrete include buildings in greenhouses and offices.

References

Category:Concrete Category


Impressive work. I finally managed to add it to NLPCloud.io with Transformers, but it takes 40GB of memory to start up, and then a 16GB GPU is unfortunately not enough, which makes it quite costly to run. But the text generation is really impressive. Great job!
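
For anyone hitting the same wall: loading in half precision roughly halves the footprint. Here is a minimal sketch, assuming the transformers port publishes an fp16 branch of the checkpoint (the model id and revision name are assumptions); whether the result fits a 16GB card still depends on sequence length and activation overhead:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the (assumed) fp16 branch of the checkpoint to roughly halve memory use.
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B",
        revision="float16",          # assumed name of the half-precision branch
        torch_dtype=torch.float16,
    ).to("cuda")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")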


As a beginner, could anyone point me to resources on how to use this to create a simple web app?


Doesn't look like it's ready to use for a web app, just because the response times are so high. They could be particularly high today due to the "HN hug of death", but I expect usage will remain high. That's assuming you want to use the API; while they don't publish any documentation for it, you could figure out what's going on by using the demo page and inspecting the network requests tab in your browser to see what requests are being sent and what responses are coming back.

Another option is to run the model yourself by downloading the weights and figuring out how to use them. This will be a bit challenging, especially for a beginner.

Assuming you can do one of these two things, setting up the web app is the easy part. You could use a framework like Flask and just expose an endpoint from a virtual machine running on Google Cloud Platform or AWS.
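
As a rough sketch of that last step, a minimal Flask wrapper might look like the following, where generate() is a hypothetical placeholder for however you end up calling the model:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def generate(prompt: str) -> str:
        # Hypothetical placeholder: call the model (or whatever backend you chose) here.
        raise NotImplementedError

    @app.route("/generate", methods=["POST"])
    def generate_endpoint():
        # Expects JSON like {"prompt": "..."} and returns the completion as JSON.
        prompt = request.get_json(force=True).get("prompt", "")
        return jsonify({"completion": generate(prompt)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)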

Hope that helps.


Thanks so much, that explains it! I was looking for an API document but, of course, couldn’t find one because it doesn’t exist. Inspecting the network requests is a great idea!

I’d like to try running the model myself. What would the basic steps be for doing this?


Do they have an API I can connect to? Or is there a third party that provides one? I would really like to connect to this for my pet project and am happy to pay for access. If anyone has any ideas, let me know. Thank you.


Any ideas how to make the generated text end on punctuation? All the outputs seem to cut off randomly during the last sentence rather than ending more naturally at punctuation like a period.


Is "The Pile" dataset English only?


That's dope! How realistic is it to train this with your own (way smaller) dataset on a normal desktop?


Are models like GPT-3 ever used as a basis for transfer learning like the BERT/etc ones?


OpenAI has allowed certain customers to fine-tune GPT-3 on specific datasets.


Most children in America know of Rosa Parks, the African-American woman who in 1955 refused to give up her seat to a white passenger on a bus in Montgomery, Alabama. Her act ultimately helped spark the African-American Civil Rights Movement. Consequently,


The web demo doesn't work for me on mobile; it works fine on PC.


Let the spam begin


Hi Dave, how is the work on Mars going?


Sample / demo?


From the README: https://6b.eleuther.ai/


Prompt: Jason made a pancake.

Result:

> Jason stared at his image in the griddle's stainless steel surface. The image of a large, freckled man wearing a black t-shirt and green plaid boxers. It was like he'd stumbled upon an abandoned photo shoot, or a bizarre prop out of an old TV series. A figment of his imagination from a nightmare he'd had once where he was alone in a house, free to do anything. Aah. That was it. He'd been sitting in the dark eating his microwaved pizza, watching some sort of porno, and thinking about strange little things he'd seen when he was growing up. When he was younger, things were simply that much stranger. Everything he'd seen, for that matter. It was weird how that worked. He'd stare into space, the fog of drugs would clear, and some unseen object would immediately become a mirage. The pattern was always the same, though. He would fixate on some tiny detail, and a few seconds later that detail would become a goddamn spectacle. Even the stranger things in his life took on that shape. The porno had images in it that meant nothing, in their own way. And he was always left to ponder what sort of symbolism might be hidden beneath

Such an emo AI.


Is it still working for anyone? I can't get it to work; it always shows me an error.



Is it better than GPT-2? I see GPT-2 has already been open-sourced, so what's the big deal here?


AI-generated text these days feels so natural that when you return to human-generated text you can't really tell the difference. Incredible.


I like that OpenAI forces people to use theirs ethically. There are obvious benefits to a competitor, but I'm wary it might be a Pandora's box.


For an absurd definition of the word ethics, maybe. It's thinly veiled corporate ass-covering pretending at ethics, at best.

Anyway, it's not quite good enough at coding to prompt an intelligence explosion, and the resources to run it are rarefied enough to prevent most worst-case abuses. However, if it becomes more efficient, accessible, and capable, all those weird AI threats are likely to be a lot more relevant. Moore's law alone means the next couple of decades will be exciting.


It's not about killer robots. It's about fake reviews, fake tweets in support of some political opinion, and more convincing Viagra spambots. Also, these bots are as racist, sexist, and bigoted as the data they're trained on. It's not intelligent, but it's dangerous in the hands of humans. It's like raw fire without any safety measures.

At the very least, I hope that if they're going to make it open, it comes with filters similar to those built into GPT-3.


There's no unified underlying intelligence in the model that could be called racist, sexist, or bigoted. These "bots" are not, because they cannot be, such. The output is dependent on the patterns, context, and content of 800 GB of human-generated text, any and all of which can be seen as morally insufficient depending on your taste.

Clever and iterative use of prompts could identify, filter, or modify potentially offensive text for whatever level of pearl clutching floats your boat, but transformers are algorithms approximating parts of human cognition. The algorithm doesn't have an ideology, morality, ethics, dogma, or any of a myriad of features you can project onto it. It's a tool, which can be used well or badly, and part of using it well will involve not attributing to the tool anthropomorphic features it does not possess.



