nice! an alternative solution I came up with (it's the same intuition as divide and conquer, just a flattened out version, same value of 49):
Just go left to right on each bottle, and keep track of how often each prefix has appeared (e.g. if the first bottle measures 1, 0, 0, we'd be tracking {"1": 1, "10": 1, "100": 1}). Now, if a prefix of length 1 has already appeared 7 times, or a prefix of length 2 has appeared 3 times, we stop measuring that bottle (because only 1 bottle with that prefix is left, so the rest of its bits follow by elimination).
In all cases, 8 bottles will need 4 measurements each, 4 bottles will need 3, 2 bottles will need 2, and 2 bottles will need 1: (8 * 4) + (4 * 3) + (2 * 2) + (2 * 1) = 32 + 12 + 4 + 2 = 50. But the very last bottle actually needs 0 measurements, by process of elimination, so 50 - 1 = 49.
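To sanity-check the count, here's a quick simulation of the scheme above (my assumed setup: 16 bottles holding an unknown permutation of the 4-bit codes, with each measurement revealing the next bit of the current bottle — the function name is made up):

```python
import itertools
import random

def measurements_needed(perm):
    """Count measurements under the prefix-counting early-stop rule."""
    n = len(perm)                     # 16 bottles
    seen = {}                         # prefix -> identified bottles starting with it
    total = 0
    for idx, code in enumerate(perm):
        if idx == n - 1:
            break                     # last bottle: known by elimination, 0 measurements
        prefix = ""
        for bit in code:
            group = n >> len(prefix)  # bottles sharing the current prefix
            if seen.get(prefix, 0) == group - 1:
                break                 # only this bottle left with the prefix: stop early
            total += 1                # one measurement reveals the next bit
            prefix += bit
        # record every prefix of the now fully identified bottle
        for k in range(1, len(code) + 1):
            p = code[:k]
            seen[p] = seen.get(p, 0) + 1
    return total

# Every ordering of the 16 bottles needs exactly 49 measurements:
codes = ["".join(bits) for bits in itertools.product("01", repeat=4)]
for _ in range(100):
    random.shuffle(codes)
    assert measurements_needed(codes) == 49
```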
yeah, I've heard a bit about this - basically, it is technically legal right now to have multiple job offers from separate employers to get multiple H1B lottery tickets. The abuse comes from some shady operations that will basically give you 3-10 "offers" from various consulting firms, where you'll be paid way less than market rate to work at any of them - the idea being that if you win a ticket from any of them, you trade off your potential salary for the H1B.
nice, I've been looking for something like this! A few notes / wishlist items:
* Looks like for gpt-4 turbo (https://artificialanalysis.ai/models/gpt-4-turbo-1106-previe...), there was a huge latency spike on December 28, which is causing the avg. latency to be very high. Perhaps dropping the top and bottom 10% of requests would help the average (or switch over to the median and include variance)
* Adding latency variance would be truly awesome; I've run into issues with some LLM API providers where they've had incredibly high variance, but I haven't seen concrete data across providers
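For the avg-vs-median point, a trimmed mean is roughly what I mean - a sketch (the latency numbers here are made up to mimic a one-off spike):

```python
import statistics

def trimmed_mean(samples, trim=0.10):
    """Mean after dropping the top and bottom `trim` fraction of samples."""
    s = sorted(samples)
    k = int(len(s) * trim)
    core = s[k:len(s) - k] if k else s
    return sum(core) / len(core)

# Hypothetical latency series (seconds) with one outlier spike:
latencies = [1.1, 1.2, 1.0, 1.3, 1.1, 1.2, 1.0, 1.1, 1.4, 30.0]
plain = sum(latencies) / len(latencies)   # 3.94 s, dominated by the spike
robust = trimmed_mean(latencies)          # 1.175 s, spike dropped
med = statistics.median(latencies)        # 1.15 s
spread = statistics.pstdev(latencies)     # the variance figure worth showing alongside
```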
Thanks for the feedback, and glad it is useful! Yes, agree that might be more representative of future use.
I think a view of variance would be a good idea - currently it's just shown in the over-time views. Maybe a histogram of response times or a box-and-whisker plot.
We have a newsletter subscribe form on the website, or Twitter (https://twitter.com/ArtificialAnlys), if you want to follow future updates.
Variance would be good, and I've also seen significant variance on "cold" request patterns, which may correspond to resources scaling up on the backend of providers.
Would be interesting to see request latency and throughput when API calls occur cold (first data point), and then once per hour, once per minute, and once per second, with the first N samples dropped.
Also, at least with Azure OpenAI, the AI safety features (filtering & annotations) make a significant difference in time to first token.
This was surprisingly fast, 276.27 T/s (although Llama 2 70B is noticeably worse than GPT-4 turbo). I'm actually curious whether there are good benchmarks for inference tokens per second - I imagine it's a bit different for throughput vs. single-inference optimization, but curious if there's an analysis somewhere on this
edit: I re-ran the same prompt on perplexity llama-2-70b and got 59 tokens per second there
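The two numbers I'd want from such a benchmark, as a sketch (the iterator stands in for a streaming API response; real clients differ):

```python
import time

def stream_stats(token_stream):
    """Report time-to-first-token and overall tokens/sec for a token iterator."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token latency
        count += 1
    elapsed = time.perf_counter() - start
    tokens_per_sec = count / elapsed if elapsed > 0 else float("inf")
    return ttft, tokens_per_sec
```

Single-stream T/s measured this way will look very different from batched throughput on the same hardware, which is presumably why published numbers vary so much.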
But if the quality of the response is poor, it's irrelevant that it was generated quickly. If it was using different data to generate higher quality responses, would that not slow it down?
I've been pretty bearish on gen AI for music, but this is the most fun I've had playing with an AI tool in a long time- the filters remind me of the OG Instagram filter effect, where even shitty photos from phones could "magically" be enhanced.
This + the Music ControlNet post from yesterday gives me some hope that audio AI will go the direction of creative tools, rather than dystopian full song generation.