Hacker News | ca_parody's comments

Is there a cicada in the logo?


dragonfly


a little too on the nose for me, especially given the cia's history trying to develop a dragonfly UAV [1] as a listening device. now they're seeking to be approved "flies on the wall" at the top labs in the country. given their history with abuse from tech development, i'm hoping people are approaching this with caution.

[1] https://spectrum.ieee.org/tech-history/heroic-failures/meet-...


Sesco | Quantitative Developer | Pittsburgh, PA - Onsite | Full Time

Sesco is a proprietary energy-trading fund located in Pittsburgh, PA. We are looking for quantitative developers to work on the automated trading team and help build performant, risk-adjusted trading systems. Contact sqs@sescollc.com for information.


I would also add, for data-science-esque tasks (or bulk queries in general) - COPY (query) TO is unreasonably fast - often much faster than executing the standard select (especially if that select is being executed by a driver in a slower language).


[0] https://www.postgresql.org/docs/9.2/sql-copy.html
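As an illustration, here is a minimal sketch of the trick using psycopg2's `copy_expert` (the connection handling and the query you'd run are assumptions, not anything from the comment above):

```python
import csv
import io


def copy_sql(query):
    """Wrap an arbitrary SELECT in a COPY ... TO STDOUT statement."""
    return f"COPY ({query}) TO STDOUT WITH (FORMAT csv, HEADER)"


def bulk_fetch_csv(conn, query):
    """Stream a query result as CSV via COPY, bypassing the driver's
    row-by-row fetch path. `conn` is an open psycopg2 connection."""
    buf = io.StringIO()
    with conn.cursor() as cur:
        # copy_expert is psycopg2's entry point for arbitrary COPY statements
        cur.copy_expert(copy_sql(query), buf)
    buf.seek(0)
    return list(csv.reader(buf))
```

The server serializes the whole result set in one stream, which is where the speedup over per-row fetching tends to come from.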



I am sympathetic to the practical issues & biases at play when discussing the systems we often call meritocracies, especially when those systems result in anti-meritocratic behavior.

However, what is the reader supposed to come away with here as an alternative? Equality of outcome? Should it really be that any given combination of work ethic, dedication, novelty, and a bit of luck will not land you above the mean? I personally am all for improving meritocracies so they live up to the name, but abolishing them due to their implementation failures seems silly and more importantly, the alternatives are dangerous.


Well, the traditional alternative is that you inherit your profession from your parent. In terms of political leadership this would be aristocracy, but it was pretty common for others too - men with the last name "Smith" expecting to follow their fathers in becoming smiths, for example. But of course levels of employment in different jobs are nowhere near as stable as they were back then. Not too many smiths around, and we've gone from 80+% of people working as farmers to less than 5%.

I suppose you could have everybody randomly assigned to jobs. Or you could have people list their preferences and, to the extent that there are more applicants for a given job than open slots, randomly assign them, then move the remaining applicants to their next choice.
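The lottery-with-preferences idea is essentially what matching theory calls random serial dictatorship; a minimal sketch (the job names and capacities are invented for illustration):

```python
import random


def assign_jobs(preferences, capacities, seed=0):
    """Randomly assign applicants to jobs, honoring preference order.

    preferences: dict applicant -> ordered list of acceptable jobs
    capacities:  dict job -> number of open slots

    Applicants are processed in a random order; each takes their
    most-preferred job that still has an open slot.
    """
    rng = random.Random(seed)
    remaining = dict(capacities)
    assignment = {}
    order = list(preferences)
    rng.shuffle(order)
    for person in order:
        for job in preferences[person]:
            if remaining.get(job, 0) > 0:
                assignment[person] = job
                remaining[job] -= 1
                break  # applicant placed; move to next person
    return assignment
```

Whoever is drawn earlier gets priority, so over many lotteries every applicant has the same expected chance at oversubscribed jobs.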


> what is the reader supposed to come away with here as an alternative?

Humility, for a start.


Sadly one doesn't - unless one happens to have ~300GB of RAM to fit the model into memory and a close personal friend at OpenAI who will share the learned weights with you. Training your own is an even more expensive endeavor.

Presumably this is how they are justifying the for-a-price API; "it's not like you can run it on your home computer anyway". For now, the API is private and geared towards researchers. Still a bit bollocks though.

There are plenty of wrappers [0] around GPT2 though - and those you can probably run on your home workstation.

[0] https://pypi.org/project/gpt2-client/


Has the 747 transported the most human-distance (people × km) of any make & model of transportation machine? I would imagine so...


I would guess it's actually the 737 or A320. Long-haul flights are significantly less common.


Honestly, for however much this project is either (a) a genuine archival move for the preservation of information or (b) a play for good press, all I genuinely thought when this happened is "aw shucks - wish i fixed those bugs before they zapped it onto film and flew it to santa claus".


I am a web archivist with an archival project on Svalbard that predates this GitHub initiative.

Additionally, large-scale github-specific projects like https://gharchive.org (formerly GitHub Archive) have existed for some time.

In my experience, code is more likely than not to be preserved in a stale revision, if at all.

The most common forms of preservation are (a) simple tarballing and (b) git bundles.


Forgive me - but how does this avoid a chicken-and-egg problem? Without digging through the promo copy: if you already have a program that can label your data, why train an ML model on those labels at all...


That's a really good question. I took a class with one of the professors who started Snorkel.

The way he broke it down was: you can incorporate rules either into your data or into your model. Because we want our model to be as general-purpose as possible, it turns out you can squeeze out some extra performance by including "bronze/copper"-quality data, labeled with handwritten rules, in your dataset.

You can think of the model getting an extra boost from the latent knowledge within the rules.


Their paper explains it - https://arxiv.org/abs/1711.10160

Snorkel itself has been an open source package for a while - https://github.com/snorkel-team/snorkel

This new announcement is about Snorkel Flow.


Labels are knowledge about data. If you already know some rules that work reasonably well based on your domain experience, then Snorkel lets you capture those as "labeling functions" that may not cover the whole ground or can be "noisy". Snorkel can then build a model to label your data accounting for the "noise". Combining that with some "gold" labels (done by humans), you can use the generated labels on a large data set to build a higher quality model that generalizes better. This is similar to how you can take several low quality models and by virtue of them having expertise over different parts of the data, build an "ensemble" model that performs better than any of them.
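A toy sketch of the voting idea in plain Python (this is not Snorkel's actual LabelModel, which also estimates each labeling function's accuracy; the spam/ham rules here are invented for illustration): several noisy labeling functions either vote or abstain, and the majority vote becomes the generated label.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_contains_urgent(text):
    # noisy rule: "urgent" suggests spam
    return "spam" if "urgent" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    # noisy rule: a personal greeting suggests ham
    return "ham" if text.lower().startswith("hi ") else ABSTAIN


def lf_many_exclamations(text):
    # noisy rule: lots of exclamation marks suggest spam
    return "spam" if text.count("!") >= 3 else ABSTAIN


LFS = [lf_contains_urgent, lf_has_greeting, lf_many_exclamations]


def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = Counter(v for lf in LFS if (v := lf(text)) is not ABSTAIN)
    return votes.most_common(1)[0][0] if votes else ABSTAIN
```

The generated labels are then used as (noisy) training data for a separate downstream model, which is where the generalization beyond the rules comes from.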

Imho, Snorkel-style tools ("weak supervision") are game changers for ML, though the biggies get all the press. So I'm excited to see this end-to-end direction taken by the team.


Hasn't this been done for years under names like synthetic data generation, simulation, etc.?


Not data generation. Label generation. .. but the charitable interpretation of your question is valid - we've been doing such ensembling to make higher quality models for some time now. What's new, I feel, is the structure, practice, and tooling growing around it.


Yeah, then advertise it as a tool rather than AI. The problem is that Snorkel is trying to sell snake oil on the back of the Stanford and AI names. Under the hood it is just a data generation pipeline. Remember, you can't put labels on random data. So "Not data generation. Label generation" makes no sense to me and sounds like "brown sugar".


I saw a talk on Snorkel a few years back, so I don't remember perfectly, but it seemed to be an iterative process. It's a tool for you to build and refine simple rules. If you have ingredients, a simple heuristic "<number> <units> <ingredient>" will get a lot of them, but there are tons of edge cases. With more heuristics, you might get lots of those, and so on. I think it was a tool to help you explore and iterate on those heuristic labeling functions quickly. Then you can label the stuff that's hard in a more expensive way or something. I thought of it as noisily hand-labeling sets of examples at a time rather than single examples at a time. This is all memory from a random conference talk or paper or something years ago, so take it with some big grains of salt. I do clearly remember thinking it seemed really cool at the time.
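The ingredient heuristic described above can be sketched as a single labeling function (the regex and the unit list are assumptions for illustration, not Snorkel's API):

```python
import re

# A deliberately incomplete unit list - exactly the kind of rule you'd
# iterate on as edge cases surface.
UNITS = r"(?:cups?|tbsp|tsp|g|kg|oz|ml|l)"

# Matches "<number> <unit> <word>", e.g. "2 cups flour" or "1/2 tsp salt".
INGREDIENT_RE = re.compile(rf"^\d+(?:[./]\d+)?\s+{UNITS}\s+\w+", re.IGNORECASE)


def lf_ingredient(line):
    """Label a line as an ingredient if it matches the heuristic,
    otherwise abstain (return None)."""
    return "INGREDIENT" if INGREDIENT_RE.match(line.strip()) else None
```

The point of the tooling is that you can write dozens of cheap, imperfect functions like this and let the system reconcile their disagreements, instead of hand-labeling one example at a time.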


A human will label data according to hand-rules or heuristics. What's the difference if a program labels data according to the same hand-rules or heuristics?

The downstream discriminative model's goal is to generalize via supervision.


To me, Go feels condescending to write without generics. I may just not grok the idioms - but putting users in a walled garden that the standard library has the privilege to escape from (e.g. the built-in map[K]V) just seems to go too far in not trusting the programmer. The fact that generics have taken so many years - that we're still debating which brackets ([ ], < >, or ( )) to use - is beyond me.

There is a difference between becoming C++ and allowing programmers to make fundamental abstractions without interface{} hacks.


The problem is not so much the absence of generics but the availability of crazy reflection at runtime compared to the rigid type system at compile time. There absolutely is a lack of balance here.

All these things were already pointed out 10 years ago and were met with "you don't need that with Go" and "go back to writing Java" sorts of contempt, which gave the Go community a bad reputation.

Go could have been fixed 10 years ago if Rob Pike and co had listened. They didn't want to because they thought they knew better than everybody else. Ada already solved the genericity problem while keeping things readable, with a limited form of generics as incomplete types.


As I already mentioned a couple of times, even CLU-like generics would have been a good enough solution.

Something they acknowledged not having bothered to look at initially.

https://go.googlesource.com/proposal/+/master/design/go2draf...

> We would have been well-served to spend more time with CLU and C++ concepts earlier.


Well said. As a C++ programmer I always find Go rubs me the wrong way. Like they want to make my life difficult. I think you hit the nail on the head that it feels condescending. My visceral reaction to not being able to write a standard library as performant as the provided one is “this is a toy language”. I know it’s not and I know I have to use C to do that in Python, but still.


Maps are part of the language. The standard library is written in Go. There are a few special packages (unsafe, reflect, ...) but maps are not one of those cases.


https://github.com/mratsim/Arraymancer probably has a lot of examples in its implementation


Yup. That's what popped up when I searched. Thanks for the reference though.


The CUDA kernels are stored here:

- https://github.com/mratsim/Arraymancer/blob/master/src/tenso...

With the higher order function:

- https://github.com/mratsim/Arraymancer/blob/master/src/tenso...

And their code generation:

- https://github.com/mratsim/Arraymancer/blob/master/src/tenso...

It's possible to avoid inlining C++ like they do here:

- https://github.com/jcosborn/cudanim/blob/338be782/cuda.nim#L...

but I will explore that later.


Thanks.

