"RL is hard" - probably a lion, 2025
Why does it matter? I would let people project whatever identities they want, as long as the work gets done.
And there is probably a strong survivorship bias: the "engineers" like popularizing their work, while the "scientists" do not so much :)
It is quite depressing that there is no existing platform for article clubs, with some cool features like ranking articles by "surprisingness of results" or by "existence of the problem"… Maybe a review system built on top…
I think the author confused "winter" with "bubble"; e.g., the dot-com bubble didn't imply a dot-com winter.
I had something like Bottou 1998 in mind - the first non-convex convergence proof in the online-learning framework.
That's the earliest work I know of on convergence in a non-convex setting :)
Maybe you could cite the first non-convex convergence proof?
I'd guess it was written around the 2010s, though of course it's nearly impossible to pin down the "first" proof.
Isn't it somewhat trivial? As in, if an LLM has answered the question (adequately) within the first 25 tokens, then it doesn't need to search :)
I guess most of the training is done in a regime where l_inf is about the same, or at least "similar", so I would actually expect this method to work well.
But I don't believe it generalizes to arbitrary datasets without seeing the data in advance.
But you should get a slightly longer runtime when running half-batches in sequence;
that is the reason to use the "largest possible batch size": it guarantees 100% utilization of the CUDA/tensor cores.
Good point nonetheless; for many applications, running things in sequence works out.
I dislike that the thing is part of neither the thing nor the other thing.
And the median is the point that minimizes the average of absolute differences to the points drawn...
M-estimators (a generalization of these location estimators) are wonderful; I'm sad that people outside of robust statistics have barely heard of them :(
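A quick NumPy sketch of the claim (my own toy example, not from the original post): minimizing average squared distance over a grid of candidate points recovers the mean, while minimizing average absolute distance recovers the median - and with injected outliers the median barely moves.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1001)
x[:50] += 20.0  # inject outliers: the mean shifts, the median barely moves

# Evaluate both losses on a grid of candidate locations c
grid = np.linspace(-5, 5, 2001)
l1 = np.abs(x[:, None] - grid[None, :]).mean(axis=0)      # mean |x - c|
l2 = ((x[:, None] - grid[None, :]) ** 2).mean(axis=0)     # mean (x - c)^2

c_l1 = grid[l1.argmin()]
c_l2 = grid[l2.argmin()]
print(c_l1, np.median(x))  # the L1 minimizer matches the median
print(c_l2, np.mean(x))    # the L2 minimizer matches the mean
```

Huber loss, the classic M-estimator, interpolates between exactly these two penalties.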
A sewing machine is probably the most complex mechanical apparatus that most people encounter in life lol
Hmm, an introduction to stochastic calculus without measure theory? I guess that's the only reason to put quotes - if the reader knows only the Riemann definition, then the stochastic integral does not really make sense, but it is still cute.
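For what it's worth, the Itô integral can at least be *motivated* with Riemann-style sums; the catch (a standard fact, sketched here from memory) is that the limit has to be taken in L², and the left-endpoint choice actually matters because Brownian motion has unbounded variation but finite quadratic variation:

```latex
% Ito integral as an L^2 limit of left-endpoint sums over a partition
% 0 = t_0 < t_1 < \dots < t_n = T:
\int_0^T f(t)\, dW_t
  = \lim_{n \to \infty} \sum_{i=0}^{n-1} f(t_i)\,\bigl(W_{t_{i+1}} - W_{t_i}\bigr)
  \quad \text{(limit in } L^2\text{)},
% while the quadratic variation is finite and nonzero:
\qquad \sum_{i=0}^{n-1} \bigl(W_{t_{i+1}} - W_{t_i}\bigr)^2
  \;\xrightarrow{\;L^2\;}\; T .
```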
So you are effectively killing the same neurons over and over again, unless the gradient norm is extremely large?
Like, you should get qualitatively the same behaviour as a bottleneck, I would guess.
I once had a script called "logging.py", which was also quite fun.
But there are also cool tricks to improve the speed of these methods, which Rahul actually talks about, like the neat connection to Wald's sequential testing.
My conclusion is that you need to derive the RANSAC method anew for each setting; I'm pretty sure the current SOTA for panoramas is MLESAC-ish (Rahul doesn't mention it?), which is just an M-estimator.
Maybe we should write a paper on "how to derive RANSAC for your problem" 🤔
RANSAC always feels to me like something that could easily be improved, but every time I remember that there are like a thousand variations, and I decide it's not worth the effort :)
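For reference, here is what the vanilla method looks like - a minimal RANSAC sketch for 2-D line fitting (my own toy example; function name and parameters are made up, not from Rahul's post):

```python
import numpy as np

def ransac_line(pts, n_iters=200, thresh=0.1, rng=None):
    """Fit y = a*x + b to 2-D points via vanilla RANSAC."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        # Minimal sample: two random points define a candidate line
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Score by counting points within thresh of the line (vertically)
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers

# 70% inliers on y = 2x + 1, 30% of the points corrupted
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + rng.normal(scale=0.02, size=100)
y[:30] = rng.uniform(-3, 3, 30)
(a, b), inliers = ransac_line(np.column_stack([x, y]))
print(a, b)  # close to (2, 1) despite 30% outliers
```

MLESAC-style variants replace the hard 0/1 inlier count with a robust (M-estimator) score of the residuals, which is exactly why each setting wants its own derivation.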
There are so many problems with science that can be solved by a decent publishing system, but oh well
I also think the situation is improving: many more people just use preprints for most of their work these days, which might slowly force journals to consider preprints to be real papers.
I'm currently in the process of attempting to publish such a paper, and my idea to make it "novel" is to just add a heuristic on top of the "old method" that improves performance.
It's not super novel, true, but it will satisfy reviewers who desire novelty, I guess.
But - isn't this a problem with the institutions, not with the students? If e.g. Oxford requires 3 letters but "no prior research experience is required", what should the student do?
I'd write a P.S. in the letter saying that you hate unis 'requiring' the recommendations, but still help the student.
This could actually be very useful for quick experimenting with LLMs, by putting the doc into the context, though the scale is somewhat small.
Harry Potter and the Methods of Rationality? :) Some people like it, some don't, but I guess the earlier you try to read it, the better.
So we just need a billionaire to make their own journal :))
A simple way to enforce "paid only for corporations" is to strongly encourage preprints, as was already mentioned in other comments.
Another fun idea: publish not a % of all submitted papers, but a number proportional to the scientists in the field - i.e., a nearly constant number of papers per field. This would significantly slow down publish-or-perish. And, with the right financial incentives, it would push people to produce high-quality (high-risk) research.
The perfect publishing system would pay the authors and reviewers a % of the money earned from the paper + require only companies/academia to pay for access. So far the main issue is the starting capital - you can't go to YC if you're not expecting decent returns.
watch -n 0.1 ...
I only store the papers related to currently running projects/ideas. If it's not related and I don't have time to read it -> 🗑️
"Choosing the right problems to work on is the most useful skill one can have"
Tell that to BPTT (which explodes, but is unbiased!)
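A toy NumPy sketch of the exploding part (my own hypothetical linear RNN, not from the original post): backprop through time multiplies the gradient by the recurrent Jacobian at every step, so when its spectral radius exceeds 1 the gradient norm grows roughly geometrically with sequence length.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Random recurrent weight matrix scaled so its spectral radius is ~2 (> 1)
W = rng.normal(scale=2.0 / np.sqrt(n), size=(n, n))

g = np.ones(n)  # gradient arriving at the last time step
norms = []
for _ in range(50):   # backprop through 50 steps: g <- W^T g
    g = W.T @ g
    norms.append(np.linalg.norm(g))

print(norms[-1] / norms[0])  # geometric blow-up over the sequence
```

Truncated BPTT caps this by cutting the product short - which is exactly where the bias comes from.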