@srchvrs.bsky.social: "This is too little for pre-training, but pre-training nowadays is probably not a bottleneck. For post-training 16M training samples can meaningfully improve performance on a lot of tasks."
@srchvrs.bsky.social: "This is too little for pre-training, but pre-training nowadays is probably not a bottleneck. For post-training 16M training samples can meaningfully improve performance on a lot of tasks."
It's yet another example of how the rich get richer: seasoned devs benefit from LLM-assisted coding, but their skills have already been developed, so they get the best of both worlds.
"Meta is cutting around 600 positions out of the several thousand roles in its Superintelligence Labs, the Facebook owner said on Wednesday as it looks to make its artificial intelligence unit more flexible and responsive."
www.reuters.com/business/met...
The paper presents a fascinating (as described by AI 😆) case study of how a seemingly innocuous modification to a neural net can drastically alter its perceived robustness against gradient-based adversarial attacks.
searchivarius.org/blog/curious...
🧵I recently finished my nerdiest computer science paper so far and it was accepted by TMLR: A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection. This work was done while I was at Bosch. ↩️
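For intuition only, here is a minimal PyTorch sketch of what a fully convolutional, differentiable front end with a skip connection can look like; this is an illustrative guess, not the exact architecture from the paper:

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    # A small fully convolutional front end. The skip connection lets the
    # module learn a residual correction while gradients can also flow
    # around the conv body. NOT the paper's exact architecture.
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # skip connection

# Prepend to any classifier: the combined model remains end-to-end
# differentiable, so gradient-based attacks can still be attempted.
# model = nn.Sequential(ConvFrontEnd(), classifier)
```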
PS: Yes, this is a frontier LLM and it still cannot fully replace an editor.⏹️
4. Model complains about its own suggestion.
5. Bonus point: of course, the complaints are oftentimes incorrect. If you poke the model further, it will likely admit to being wrong. This, in turn, may not mean much, because models are also clearly trained to agree with humans as much as possible.
↩️
🧵Hot take: LLMs still fail at basic grammar/style checking. A recurring situation I encounter:
1. Ask a model about an issue.
2. The model suggests some rewrite for clarity/accuracy. Typically it's actually quite good (but watch for factual errors!).
3. Recheck the text again.
↩️
The results? If I read the tables correctly, there is only a very modest boost in both recall & NDCG, within 2%. Given that the procedure requires a second retrieval, it does not seem to be worth the effort.
🟦
dl.acm.org/doi/abs/10.1...
PRF was not forgotten in the neural IR era, but how well does it really perform? Revanth Gangi Reddy & colleagues ran a rather thorough experiment and published it at SIGIR.
↩️
It was doc2query before doc2query and, in fact, it improved the performance (by a few percent) of the IBM Watson QA system that beat human champions in Jeopardy!
↩️
research.ibm.com/publications...
I think the problem is the completely unsupervised and blind approach to adding terms to the query. If we had some supervision signal to filter out potentially bad terms, this would work out better (see the sketch below). In fact, a supervised approach was previously used to add terms to documents!
↩️
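As a toy illustration of the supervised-filtering idea (a made-up sketch, not any specific published method; the features and training data are invented), one could train a small classifier to decide which candidate expansion terms to keep:

```python
from sklearn.linear_model import LogisticRegression

# Features per candidate term: (frequency in the pseudo-relevant docs,
# inverse document frequency in the collection). Both the features and
# the labels below are invented for illustration only.
X_train = [[5, 0.2], [4, 0.5], [1, 3.0], [2, 2.5]]
y_train = [0, 0, 1, 1]  # 1 = adding the term helped on held-out queries
clf = LogisticRegression().fit(X_train, y_train)

def keep_term(freq_in_feedback: int, idf: float) -> bool:
    # Keep a candidate expansion term only if the classifier predicts
    # that adding it is likely to help retrieval.
    return bool(clf.predict([[freq_in_feedback, idf]])[0] == 1)

print(keep_term(3, 2.8))  # e.g., a rare, on-topic candidate term
```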
This issue spawned a sub-topic in the IR community devoted to fixing it and to identifying, in advance, the queries whose performance degrades substantially. Dozens of approaches were proposed, but I do not think the effort was successful. Why⁉️
↩️
PRF tends to improve things on average, but it has a nasty property of tanking outcomes for some queries quite dramatically: when things go wrong (i.e., unlucky, unrelated terms are added to the query), they can go very wrong. ↩️
PRF is an old technique introduced 40 years ago in the SMART system (arguably the first open-source IR system). ↩️
x.com/srchvrs/stat...
🧵Pseudo-relevance feedback (PRF) (also known as blind feedback) is a technique of first retrieving/re-ranking top-k documents and adding some of their words to the initial query. Then, a second retrieval/ranking stage uses an updated query. ↩️
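To make the two-stage procedure concrete, here is a minimal self-contained sketch; the term-frequency scorer is a toy stand-in for a real ranker such as BM25:

```python
from collections import Counter

def score(query_terms, doc_terms):
    # Toy relevance score: query-term frequency (a stand-in for BM25).
    tf = Counter(doc_terms)
    return sum(tf[t] for t in query_terms)

def retrieve(query_terms, docs, k):
    # Retrieval stage: return indices of the top-k documents.
    ranked = sorted(range(len(docs)),
                    key=lambda i: score(query_terms, docs[i]),
                    reverse=True)
    return ranked[:k]

def prf_expand(query_terms, docs, k=2, n_new_terms=3):
    # Blind feedback: assume the top-k docs are relevant and add their
    # most frequent non-query terms to the query.
    pseudo_relevant = retrieve(query_terms, docs, k)
    counts = Counter(t for i in pseudo_relevant for t in docs[i]
                     if t not in query_terms)
    return query_terms + [t for t, _ in counts.most_common(n_new_terms)]

docs = [d.split() for d in [
    "neural ranking models for information retrieval",
    "bm25 is a classic retrieval model",
    "cats are great pets",
]]
query = "retrieval model".split()
expanded = prf_expand(query, docs)           # stage 1 + query expansion
final_ranking = retrieve(expanded, docs, 3)  # stage 2 with the new query
```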
If you submitted a messy paper, it's pointless to address every little comment and promise to fix it in the final version. 🟦
Instead, think hard about the questions you can ask. What is the main misunderstanding? What will you have to do so that a reviewer accepts your work next time? Which concise questions can you ask to avoid misunderstandings in the future? ↩️
🧵 Dear (scientific) authors: I have been in the same boat too. However, if you receive a ton of detailed complaints regarding paper quality, do NOT try to address them during the rebuttal phase. It's just a waste of everybody's time. ↩️
@microsoft.com faces an interesting issue that might affect others selling wrappers around ChatGPT and Claude models: users prefer to use ChatGPT directly rather than engage with Microsoft's Copilot.
futurism.com/microsoft-co...
This is a rather blockbuster piece of news: the @hf.co library is dropping support for both JAX and TensorFlow.
www.linkedin.com/posts/lysand...
Humans are creating AGI and you claim that their intelligence is overrated?
Laptop keyboards are close to being unusable. Tremendous productivity hit.
Found a hidden gem on IR evaluation methodology from Microsoft: "What Matters in a Measure? A Perspective from Large-Scale Search Evaluation."
dl.acm.org/doi/pdf/10.1...
Parental advice: if you master algebra you will know how to deal with your x-es.
@ccanonne.bsky.social feel free to borrow!
Some people say: A prompt is worth a thousand words! Excuse me, but have you seen these prompts? They are way longer!
However, unlike many others who see the threat in the form of a "terminator-like" superintelligence, @lawrennd.bsky.social worries about the unpredictability of automated decision-making by an entity that is superior in some ways, inferior in others, but, importantly, disconnected from the needs of humans. ⏹️
🧵A fascinating perspective on the nature of intelligence and the history of automation (and, ahem, the development of AI). It is also a cautionary tale about trusting AI too much. ↩️
Thus, it was quite insightful to read a recent blog post by @netflix detailing their experience in training foundation RecSys LLMs. It's an informative read, packed with detailed, behind-the-scenes information.
🟦
Pre-training can be non-trivial. If you represent a set of users or items using fixed IDs, your model will not generalize well to a domain with a different set of users or items, although there are some workarounds (arxiv.org/abs/2405.03562); a sketch of one generic workaround follows below.
↩️
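For illustration, here is a minimal, hypothetical sketch of one generic workaround, the hashing trick for ID embeddings; it is not Netflix's method or the one from the linked paper:

```python
import hashlib
import torch
import torch.nn as nn

def stable_bucket(raw_id: str, num_buckets: int) -> int:
    # Deterministic hash of an arbitrary string ID into a bucket index,
    # so unseen users/items still map to *some* embedding row.
    digest = hashlib.md5(raw_id.encode()).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

class HashedIdEmbedding(nn.Module):
    def __init__(self, num_buckets: int = 100_000, dim: int = 64):
        super().__init__()
        self.num_buckets = num_buckets
        self.table = nn.Embedding(num_buckets, dim)

    def forward(self, raw_ids):
        idx = torch.tensor([stable_bucket(i, self.num_buckets)
                            for i in raw_ids])
        return self.table(idx)

emb = HashedIdEmbedding()
# Works even for IDs never seen during training (at the cost of
# occasional hash collisions between IDs).
vectors = emb(["user_123", "item_from_a_new_domain"])
```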