rwojo's comments

rwojo · on Oct 25, 2023

This package suggests building a dataset and then using LLM-assisted evaluation via GPT-3.5/4 to evaluate your RAG pipeline on the dataset. It relies heavily on GPT-4 (or an equivalent model) to provide realistic scores. How safe is that approach?

IggyMac · on Oct 25, 2023

Using LLMs to evaluate other LLMs sounds like it would be dumb, but LLMs work in mysterious ways, and I’ve found this approach useful. In the context of RAG, using an LLM to evaluate whether a context chunk is relevant to answering a question is a nice complement to vector embedding semantic similarity search. Sometimes prompting the LLM gives better results than vector similarity.
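A minimal sketch of how the two signals could be combined, not taken from the package under discussion: rank chunks by cosine similarity first, then keep only the ones a judge accepts. The names `retrieve_then_filter` and `is_relevant` are hypothetical, and `is_relevant` is a plain callable standing in for an LLM relevance prompt.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_filter(
    question_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    is_relevant: Callable[[str], bool],
    top_k: int = 3,
) -> List[str]:
    """Take the top_k chunks by vector similarity, then drop those the judge rejects.

    `is_relevant` is a hypothetical stand-in for an LLM call that answers
    "is this chunk relevant to the question?".
    """
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(question_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k] if is_relevant(text)]
```

In practice the judge would wrap a model call; keeping it as a parameter makes the ranking step testable without one.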

joewferrara · on Oct 25, 2023

Joe here. It's difficult to evaluate natural language responses that come from LLM applications: there are no hard metrics to measure performance like there are in, say, supervised machine learning tasks. For RAG, you have the response to evaluate as well as the retrieved context chunks. We found that using gpt-4 as an evaluator to measure the quality of RAG responses and the relevance of the context chunks gave similar results to using human evaluators at Tonic to do the same task. Some research also agrees that using LLMs as evaluators for natural language tasks gives similar results to using human evaluators: https://arxiv.org/abs/2306.05685

As far as whether using gpt-4 is a safe approach, the best you could ask for is that gpt-4's evaluations match those of human evaluators, and that's what both we and this research have found.
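One simple way to check that kind of match on your own data, sketched under the assumption that both the LLM and the human evaluators score each response on the same numeric scale. The function name `agreement_rate` and the scores in the usage note are made up for illustration; this is not the method used at Tonic or in the linked paper.

```python
from typing import Sequence


def agreement_rate(
    llm_scores: Sequence[float],
    human_scores: Sequence[float],
    tolerance: float = 0,
) -> float:
    """Fraction of items where the LLM score is within `tolerance` of the human score."""
    if len(llm_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    if not llm_scores:
        raise ValueError("need at least one scored item")
    matches = sum(
        1 for llm, human in zip(llm_scores, human_scores)
        if abs(llm - human) <= tolerance
    )
    return matches / len(llm_scores)
```

For example, with illustrative scores `[5, 4, 3, 1]` from the LLM and `[5, 4, 2, 1]` from humans, three of the four items match exactly, and all four match if you allow a one-point difference.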

rwojo · on Sept 28, 2021

Something not mentioned much is that you can respond to these messages that come in through a Masked Email, and your identity is hidden on the outbound messages as well.

They seamlessly integrate with the sender identity feature in Fastmail making it very clear that you are replying from the Masked Email.

From a quick analysis of the headers, I don't see anything that leaks your real identity, but of course Fastmail knows it and could reveal it if there were legal reasons to.

Overall smooth feature along with the ability to use a custom domain for portability (to a less sophisticated wildcard setup, or another provider).

rwojo · on March 3, 2020

Very interested to see this as I was about to work on the same thing!

rwojo · on April 1, 2019

Kabbage, Inc | Full-Stack, Backend, QA/SDET, iOS, DBA, Data Engineers | Atlanta, GA and New York City (NYC) | Full-time ONSITE | kabbage.com

Kabbage is a leading FinTech company changing the way small businesses solve cash-flow challenges. Fully automated and deeply connected with its 160,000+ customers, Kabbage provides access to funding in minutes, extends more than $10 million every day to small businesses, and powers borrowing experiences for some of the largest companies in the world. While we've received numerous awards and recognition—such as Entrepreneur's Top Company Cultures, Inc Magazine's Top Private Companies, GlassDoor’s Best Places to Work, and Forbes FinTech 50 — it is our people, our culture, and our leaders that make Kabbage such a great place to work.

Our Technology teams are growing fast and we're hiring for the following roles:

* iOS Developer: https://www.kabbage.com/company/careers/job/1582281

* Software Engineer: https://www.kabbage.com/company/careers/job/1475102

* Senior Software Engineer: https://www.kabbage.com/company/careers/job/1488822

* Software Engineering Manager: https://www.kabbage.com/company/careers/job/1593159

* Software Development Engineer in Test: https://www.kabbage.com/company/careers/job/1487682

* Data Platform Engineer: https://www.kabbage.com/company/careers/job/1515789

* Database Administrator: https://www.kabbage.com/company/careers/job/1593607

See all of our job postings at: https://www.kabbage.com/company/careers/

rwojo · on Sept 10, 2015

Anyone find a secret command line 'default' setting to change how Mission Control / Spaces works so it shows the previews without having to go to the top of the screen?

josho · on Sept 10, 2015

What gets me is the useless numeric desktop naming. I'm shocked that we still can't rename a desktop, e.g. TaskXDesktop instead of Desktop 4.

ackyshake · on Sept 10, 2015

I hate this change too, but I believe this was done to improve the frame rates. 10.11 finally has smooth Mission Control animations on a retina display, even when a lot of apps are up and running, which is great.

I don't see any other reason why they would hide this from showing up by default.

dzhiurgis · on Sept 10, 2015

If you use your display in some Scaled mode, that could be the culprit behind the low frame rate, as it then renders at something like 3k by 2k resolution, which is then downscaled to your selection.

rwojo · on Sept 11, 2015

I think the improvements are more around Metal and such. I see improvements everywhere, not just in Mission Control, especially when hooked up to my 4K monitor.

ryenus · on Sept 10, 2015

+1, also annoyed by this change/regression.

rwojo · on Nov 27, 2012

Indeed it is cheaper to just get this at $49 than upgrade for $249. This is a great way to upgrade to the V3 chip.

You get more SNPs if you have V2 and they add in V3, so I'll call them and see if I can link it to my account.

Here are the SNP counts per chip version combo:

V2 only: 576,000 SNPs
V3 only: 967,000 SNPs
V2 + V3 (upgraded): 996,000 SNPs
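To see what the upgrade actually adds, here is a quick back-of-the-envelope calculation. It assumes the combined figure counts distinct SNPs across both chips (so the two sets overlap) and that the rounded counts above are close enough; the function name `snp_breakdown` is just for illustration.

```python
from typing import Dict


def snp_breakdown(v2: int, v3: int, combined: int) -> Dict[str, int]:
    """Split the combined SNP count into shared and chip-specific parts.

    Assumes `combined` is the number of distinct SNPs across both chips,
    so by inclusion-exclusion: shared = v2 + v3 - combined.
    """
    return {
        "shared": v2 + v3 - combined,
        "only_on_v2": combined - v3,  # what V2 owners keep beyond V3
        "only_on_v3": combined - v2,  # what the V3 chip adds for V2 owners
    }


print(snp_breakdown(576_000, 967_000, 996_000))
```

Under that assumption, the V3 chip adds about 420,000 SNPs for someone who already has V2, while the V2 data contributes about 29,000 SNPs that are not on V3.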