Apr 11, 2024
Besides cost, the other bottleneck is time: if the model has to process millions of tokens on every request, it will obviously be slower.
Moreover, I feel it's counterintuitive. Which is the better option: passing millions of the SAME words in every API call, or somehow passing them once and then querying them as many times as we want?
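To make the contrast concrete, here is a minimal sketch of the two call patterns. The client class is a stub standing in for an LLM API; it is not any real provider's SDK, and the method names (`complete`, `ingest`) are assumptions made purely for illustration.

```python
# A stub client contrasting "resend the context every call" with "pass it once, query many times".
# Purely hypothetical -- no real provider SDK or caching API is assumed.

class StubLLMClient:
    """Hypothetical client: `complete` would normally hit an LLM API endpoint."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def complete(self, prompt: str, context_ref: str | None = None) -> str:
        context = self._cache.get(context_ref, "") if context_ref else ""
        # Cost and latency scale roughly with the total input length processed here.
        return f"[answer derived from {len(context) + len(prompt)} chars of input]"

    def ingest(self, context: str) -> str:
        """One-time upload of a large context; returns a handle for reuse."""
        cache_id = f"ctx-{len(self._cache)}"
        self._cache[cache_id] = context
        return cache_id


LARGE_CONTEXT = "lorem ipsum " * 1_000_000  # stand-in for millions of tokens of documents
client = StubLLMClient()

# Pattern 1: resend the full context on every call -- paid for and reprocessed each time.
answer_1 = client.complete(prompt=LARGE_CONTEXT + "\n\nQ: What changed in v2?")

# Pattern 2: pass the context once, then send only short questions against it.
cache_id = client.ingest(LARGE_CONTEXT)  # one-time cost
answer_2 = client.complete(prompt="Q: What changed in v2?", context_ref=cache_id)
```

In the first pattern every request carries the full context, so both the bill and the latency grow with it on each call; in the second, only the short question is sent per call once the context has been ingested.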