@roydanroy
Research Director, Founding Faculty, Canada CIFAR AI Chair @VectorInst. Full Prof @UofT - Statistics and Computer Sci. (x-appt) danroy.org I study assumption-free prediction and decision making under uncertainty, with inference emerging from optimality.
Tian and Karolina and team are at ICLR. Come say hi.
Curious. Didn't know Meta had a PPL team.
I like to think about non-reasoning model responses as vibes.
So who's read the 2027 article? What do you think?
Someone has suggested I check out bsky again. So I'm back looking around here. Notification list is kinda boring. So any good conversations going on? Perhaps about LLM/AI reasoning?
Of course.
Anyone else have the worry that a lot of LLM research is .... just bad psychology?
And, to achieve the results in this paper, what was the most challenging part? Why had previous attempts fallen short? What was your key new insight?
Very interesting. So, what was the biggest hole to fill, in terms of hypotheses?
Okay, so just a few* thoughts (*this got longer as I wrote… long thread) -
Acknowledgments.
I got to ski Revelstoke this winter break.
Couple observations: the price of receiving 600 cm of snow by Jan 8 is that it is constantly snowing. Saw almost no sun the whole time and the peak was often in whiteout conditions (though North Bowl was always clear…).
See image for more.
Multiple friends have likely lost their homes in Los Angeles. Can't imagine how disorienting this would be. They had only minutes to flee and grab belongings.
What are the key papers to read?
OK. Practical question times. How are you adjusting your research given progress in reasoning style models? Also how are you adjusting the way you work?
A $100,000,000 experiment is no longer "consequence-free". Ilya is saying "scaling is over", but it may simply be that the scaling "laws" (not actual laws) are no longer accurate. Also, those laws are tied to particular hyperparameter tunings.
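For context (an aside added here, not a claim from the thread): the "laws" in question are parametric fits like the Chinchilla form of Hoffmann et al. (2022), where every constant is estimated from training runs with particular hyperparameter settings, so the fit inherits those choices:

```latex
% Chinchilla-style parametric scaling law (Hoffmann et al., 2022).
% N = parameter count, D = training tokens.
% E, A, B, \alpha, \beta are constants fitted to observed training runs,
% hence tied to the hyperparameters of those runs.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

If better hyperparameters shift the fitted constants, extrapolations from the old fit can fail without the underlying phenomenon going away.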
Sure, some were empirical. Some were not.
I'd say no in a sense. Xavier/He initialization was theoretical work. And that was absolutely critical.
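A sketch for readers of the two schemes the post lumps together (the variance formulas are the published ones from Glorot & Bengio, 2010, and He et al., 2015; the NumPy wrappers and names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(fan_in, fan_out):
    # Xavier/Glorot: Var(W) = 2/(fan_in + fan_out), derived to keep
    # activation variance stable across layers of tanh/sigmoid nets.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # He: Var(W) = 2/fan_in, the correction for ReLU units,
    # which zero out half their inputs in expectation.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W_tanh = glorot_uniform(256, 128)   # for a tanh layer
W_relu = he_normal(256, 128)        # for a ReLU layer
```

Both came out of variance-propagation analysis, which is the "theoretical work" the post is pointing at.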
Pretraining is not done. It's just that theorists haven't told the hackers how to do it better.
Annoying. If it could be automatic, sure.
I'd say wait then.
That's part of the spec. I don't think this is too problematic. The example they give is problems in NP, where there is a polynomial time checker (i.e., a polytime EV), but generating an instance that passes the checker is hard in the worst case.
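(Illustration added here, not from the thread.) The NP asymmetry in a few lines: checking that an assignment satisfies a CNF formula is polynomial time, while finding one is NP-hard in the worst case. The encoding and names below are mine:

```python
def check_sat(clauses, assignment):
    # Polytime verifier. Each clause is a list of signed ints (DIMACS style):
    # 3 means x3 must be True, -3 means x3 must be False.
    # assignment maps variable index -> bool.
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# (x1 or not x2) and (x2 or x3)
clauses = [[1, -2], [2, 3]]
print(check_sat(clauses, {1: True, 2: False, 3: True}))    # True: certificate accepted
print(check_sat(clauses, {1: False, 2: False, 3: False}))  # False: rejected
```

Producing an assignment the checker accepts is the hard direction, which is exactly the gap being pointed at.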
Now that I've had a taste of X without post length limitations, I've got to say that it is quite annoying having to fit tweets into 256 characters here on bsky. On X, when they get too long, they go below the fold, so you're still incentivized to keep them short. Can't we have that here?
Lottery ticket?
@gkdziugaite.bsky.social. Works at GDM and Mila. Influential, technical work.
OK
Many of these sound very problematic if you hope the result will be accepted by the mathematical community. E.g., "The proof appears to use computational evidence (listing out cases) as a substitute for theoretical proof." It seems you're not meeting the usual standard.
Please ask Claude. What would likely be the chief criticisms of my argument above were I to submit it to a traditional mathematical journal?
I've now read this paper carefully if anyone wants to discuss it.
Great analogy.