Hacker News | johncip's comments

Some things cause (or risk causing) acute damage; with others, it's chronic.


I think you could make a case that charging the highest price for something that the market will bear corrects an inefficiency in the market.

http://www.investopedia.com/terms/i/inefficientmarket.asp

It's not that they started producing more units, it's that some folks (the ones using the drugs) were taking advantage of a market inefficiency (the too-low price) which, when corrected, meant greater returns for Shkreli et al.

Just as when a "retail" day trader hits on a system that enables them to make a large amount of money, they're taking advantage of an inefficiency, and, in a way, helping to correct it.

It's not pretty, but being ugly doesn't make it inaccurate.


But we're talking about a monopoly. A market inefficiency only occurs if there is competition -- a more efficient alternative.


> OCR itself is a pretty CPU intensive activity and takes a significant time to complete for many documents.

Leaving the quality part aside -- this job itself is easy to parallelize in that you can split it up by document or by page.

One option is to run each job in Lambda asynchronously, with the input being a URL to the page or the full document, and have the job call back to you with the text of the page (or put it on S3 as a text file, or add it to a message queue, or whatever works). Regarding splitting: we've been using a python wrapper + pdfium for splitting PDFs into page images on Lambda, with excellent results.
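For illustration, here's a minimal local sketch of that fan-out pattern, with a thread pool standing in for the async Lambda invocations and a stub `ocr_page` standing in for the actual OCR call (both names are made up, not a real API):

```python
# Sketch: fan out OCR work one page at a time, the same shape as
# invoking a Lambda function per page. A thread pool stands in for
# Lambda here, and ocr_page is a placeholder for the real OCR call.
from concurrent.futures import ThreadPoolExecutor

def ocr_page(page_url):
    # In the real setup this would invoke the Lambda function with the
    # page URL and collect the result via S3 or a message queue.
    return f"text of {page_url}"

page_urls = [f"s3://bucket/doc/page-{i}.png" for i in range(1, 5)]

# map() preserves input order, so pages line up with page_urls
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(ocr_page, page_urls))

document_text = "\n".join(pages)
```

The point is just that each page is an independent unit of work, so the concurrency mechanism (threads, Lambda, a queue of workers) is interchangeable.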

To make the Lambda function, you'll have to either build Tesseract (or whatever OCR engine) such that it fits into a 50MB zip, or download it while the Lambda function executes. LambCI has a set of docker containers that they've made for simulating lambda, and the "lambda:build" container makes building things easy and repeatable: https://github.com/lambci/docker-lambda. In a pinch, you can build on an Amazon Linux EC2 instance and it should work on Lambda, but you will have to be more careful about dynamic linking.

As another option: I'm not sure if it's been mentioned, but you can also try a ready-made OCR service before packaging up Tesseract, like this one: https://algorithmia.com/algorithms/ocr/SmartOCR.

So anyway, the performance part has good solutions, at least.

For fixing the accuracy: I know next to nothing about approximate string matching, but perhaps it would then be possible to do a fuzzy search over the text using something similar to a Levenshtein automaton: https://en.wikipedia.org/wiki/Levenshtein_automaton.

You may also want to take a look at this: https://en.wikipedia.org/wiki/Bag-of-words_model

More broadly, I'm sure that there are text-based document classification methods that are robust against sloppy OCR. It may just take some research on the main approaches people take to document classification -- it's not my area, but my understanding is that this is typically approached with statistical methods. Otherwise your spam filter would get defeated by typos.
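As a concrete example of that robustness (again a toy sketch, not a real classifier): comparing documents by character n-grams instead of whole words means a single OCR error only perturbs a few features rather than destroying an entire token.

```python
from collections import Counter

def char_ngrams(text, n=3):
    # Character trigrams: an OCR error ("invoice" -> "inv0ice") only
    # changes the few n-grams that overlap the bad character.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(a, b):
    # Cosine similarity between two bag-of-ngrams vectors.
    va, vb = char_ngrams(a), char_ngrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (sum(v * v for v in va.values()) ** 0.5 *
            sum(v * v for v in vb.values()) ** 0.5)
    return dot / norm if norm else 0.0
```

With this, "inv0ice number" still scores much closer to "invoice number" than to unrelated text, which is the property you'd want underneath a classifier fed sloppy OCR output.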


Literally the first sentence after the preamble is:

  I use the term “computational thinking” as shorthand for “thinking like a computer scientist.”


> The Democrats were happy to keep importing voters

I think it's important not to discount the difficulty of securing the border because of the size of it. And as you mention, American companies (like Smithfield) are happy to create incentives for border hopping because they can use the threat of calling INS to keep wages low and prevent unions from forming.

The problem with the idea of widespread voting fraud is that the data doesn't support it, at all. Bush's DOJ came up with an estimate of 0.00000132% fraudulent votes in federal elections, for instance. Note that by then it was explicitly illegal for aliens to vote in federal elections.

> We take for granted just how much Christianity has shaped our secular culture and morality.

I'd argue that across peoples and times you see more variation in theological doctrine than moral teachings in the various religions. Hinduism, for instance, is practiced by different people differently and encompasses polytheistic, monotheistic, and even atheistic traditions. But morality remains a feature, specifically the concept of karma. Sweden is by some counts 85% atheist, but I'm not aware of anyone calling it a den of iniquity. Certainly my Muslim-American friends are moral.

In other words, people tend to be recognizably moral regardless of their very divergent beliefs about other things. That points to a general human capacity for morality, rather than one specific to any one religion (that the others, even those predating it, were presumably lucky enough to develop independently).

Certainly moral codes differ. But they also differ within the same religion across time -- for instance, the modern Christian view of divorce vs. the ancient one.

Christian orthodoxy varies. Ancestor worship is common in African Christian sects. Protestant and Orthodox churches abhor the Roman Catholic practice of praying to statues. Unitarians discount the trinity. Sure, there's a common thread of Christian morals there, but I'd argue that it's the same thread you find everywhere, modulo views on homosexuality and a couple of other things.

I don't mean to be cruel but I find the assertion that America gets its morals from Christianity to be somewhat narrow in that it presupposes that Christians got theirs from on high, and ignores the similar moral teachings you find throughout the world and throughout history.

It also runs contrary to the founders' explicit intentions for the role of religion in government, and I would argue that it does a large disservice to your fellow Americans who aren't Christian.


I'm not talking about fraudulent voting by illegal immigrants, I'm talking about the fact that they eventually become citizens when amnesty rolls around, and their kids become citizens automatically by birthright.


But the children who are citizens automatically by birth, you don't think they're more predisposed to want to stay here and to improve their communities if they end up staying? Are you suggesting that their allegiances ultimately lie elsewhere? Does that extend to children of immigrants born here legally?

If so, this is the same rhetoric used to discriminate against and marginalize the early generations of immigrants in NY last century; my grandparents went through it. Not a lot of fun.

And if this is a ploy to gain sympathy for a particular party, I think that party deserves that continued support. Trust extended to the outsider begets trust as that outsider lays down roots, and those children and their children would be likely to vote the same way by gratitude or tradition. Even if it distorts the way they would vote without that influence, I think the country overall benefits when immigrant communities feel supported, looking forward, participating in society, not isolated from it. It's how our country changes, grows, adapts.


Hey, Gradescope dev here. What detaro said is on the money -- we're able to group identical short-answer responses so that they can be graded in one shot. It's not necessary to analyze the answer content for this.

Many (though certainly not all) of the instructors using Gradescope are teaching CS or Math courses with heavy enrollment. So each exam will have many submissions (even 1000+), and each submission will have a lot of short answers. Marking each one on its own is tedious, but until recently it was the state of the art for paper exams.

Instructors can and do grade essays on Gradescope, and are able to save time. But in that case the savings comes from being able to create rubrics on the fly, to change point values without re-adjusting every single marked paper, to grade across questions rather than across exams, to publish grades without having to type them all in, and so on.

There's a lot of grunt work that goes into grading, and it doesn't need to be the case :)


I may have misread it. Does that still count as AI?

Also, they've had a robot grading the GMAT essays since 1999 (http://www.800score.com/content/essay.html)


The classification of answers together is the AI.


I'd like an answer to this as well. I've only ever needed string refs, and the callback refs are noisy. I can see where the React team may not want to support both, but are string refs actually bad in some way?


String refs are bad in quite a few ways:

1. String refs are not composable. A wrapping component can’t “snoop” on a ref to a child if it already has an existing string ref. On the other hand, callback refs don’t have a single owner, so you can always compose them.

2. String refs don’t work with static analysis like Flow. Flow can’t guess the magic the framework does to make the string ref “appear” on `this.refs`, or guess its type (which could be different). Callback refs are friendlier to static analysis.

3. The owner for a string ref is determined by the currently executing component. This means that with a common “render callback” pattern (e.g. `<DataTable renderRow={this.renderRow} />`), the wrong component will own the ref (it will end up on `DataTable` instead of your component defining `renderRow`).

4. String refs force React to keep track of currently executing component. This is problematic because it makes `react` module stateful, and thus causes weird errors when `react` module is duplicated in the bundle.

This is why we want to move away from them in favor of callback refs that solve all those problems.


I think it's important people understand this. The last two points are precisely the reason String refs got moved from Preact's core into preact-compat.

Also, for the common-case usage of string refs, you can just use a helper to insert things into `this.refs`:

https://gist.github.com/developit/63e7a81a507c368f7fc0898076...


Thanks for the thorough answer, Dan.

(Had I not followed the prescription for the sake of being future-proof, I probably would have had a painful debugging session over #3. And possibly #1 too.)


I can't follow Jacques' reasoning when it comes to any of this. His first choice would have been a PHP framework (in 2014!), yet he goes on to criticize a bunch of languages which have sane == operators seemingly on the basis of his gut reaction to their very existence, while asserting things like "Python is like driving on the autobahn without guard rails" and "Lua is the sweet spot between Ruby and Python." That might be true if Python had shipped with exactly two data structures, and Ruby with zero, and neither of them had integers.

This is all before the frameworks are even discussed, which are included and dismissed for reasons just as arbitrary. Smaller frameworks like Sinatra are discarded because they can't handle the complex logic that flash cards demand, Django's out because of the whole auto safety thing, and we all know that ExpressJS didn't show up at Jacques' birthday party that one time. Play's out because Scala is unfamiliar, despite that not being a specific quality of Scala and just something that's true in general of programming languages one hasn't bothered to learn.

So the final list contains some more PHP horror shows and a few Java and Go frameworks to make things look democratic. Thankfully, Rails makes an appearance, but it feels like this is just because it was lucky enough not to get hit by a dart.

Let it be a lesson -- we all procrastinate, but when you get to the point where you're writing blog posts about dozens of languages and tools, none of which you have direct experience with, the best thing is to just pick something and do what you'd meant to do in the first place (although you will note that writing the flash card app was itself a way to procrastinate on learning Romanian). Then discuss the pros and cons of the tools you chose, once you actually know what they are.

