Yasser Souri

@yassersouri

Senior Applied Scientist - Microsoft (Opinions are my own) Ex: Google Research PhD intern https://yassersouri.github.io

109 Followers · 355 Following · 35 Posts · Joined 19.11.2024

Latest posts by Yasser Souri @yassersouri

As expected. Congrats to the authors.

15.07.2025 05:17 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I remember that when I saw the ICML 2015 Test of Time award winner, I noticed the paper "Learning to Rank Using Gradient Descent" for the first time. That is where I got the idea for the "Deep Relative Attributes" paper.

13.07.2025 04:49 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

#ICML2025 Test of Time award is likely going to the Batch Normalization paper.

13.07.2025 04:42 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 2

A while back, when Sam Altman was in India, he was asked whether a team with around $10M could build something to compete with OpenAI, and he said it was "hopeless".

DeepSeek-V3, whose pre-training run cost around $6M, was just released and shows very high capability (on benchmarks).

26.12.2024 21:32 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

During graduate school I lived in Germany for around 5 years without learning German. It is certainly possible. But not learning the local language makes everyday life a bit too hard and long-term living there infeasible.

16.12.2024 17:18 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Jason Weston comments on Ilya’s ToT award talk.

16.12.2024 00:35 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Says: "Pre-training _as we know it_ _will_ end"

emphasis on "as we know it" and "will"

15.12.2024 04:06 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

And language

14.12.2024 15:58 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I hope this idea of multiple smaller meetings won't converge to one big meeting in North America and some small meetings around the world.

14.12.2024 15:57 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image Post image

Thomas Kipf with the Google I/O DJ bathrobe :D
#NeurIPS2024

13.12.2024 08:41 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

It is sad to see authors not being able to present their work at #NeurIPS2024 because of visa issues.
But some authors went above and beyond.

Here is @hadivafaii.bsky.social tele-presenting his work with an impressive setup (iPad, mic, speaker, holder, battery).
Well done sir!

13.12.2024 08:35 πŸ‘ 11 πŸ” 2 πŸ’¬ 0 πŸ“Œ 1

Who has written this? Seems a little fishy. (Not saying it is untrue)

13.12.2024 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Given the large number of posters, it was really hard, if not impossible, to check them all out, but I still came across some interesting ones, and the authors usually did a great job explaining their work.

12.12.2024 17:54 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Although it seems that some work and good engineering remain to be done to make this optimizer work in large-scale distributed settings.

12.12.2024 17:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

The talk by the Meta folks about their schedule-free learning was great.

They provide nice theoretical insights as well as good experiments in their paper "The Road Less Scheduled".
arxiv.org/abs/2405.15682

I guess the picture below shows the "any-time stopping" property well.
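For intuition, here is a minimal NumPy sketch of the schedule-free SGD recursion as I understand it from the paper (my own naming and toy problem, not the authors' reference implementation): a base iterate z takes plain SGD steps, gradients are evaluated at an interpolation y between z and a running average x, and x is what you evaluate at any step, hence "any-time stopping".

```python
import numpy as np

# Hedged sketch of the schedule-free SGD recursion from arXiv:2405.15682
# (variable names and toy problem are mine, not the authors' code):
#   y_t     = (1 - beta) * z_t + beta * x_t        (gradient location)
#   z_{t+1} = z_t - lr * grad(y_t)                 (base SGD step)
#   x_{t+1} = (1 - c_t) * x_t + c_t * z_{t+1},  c_t = 1/t  (uniform average)
# Note there is no learning-rate schedule anywhere in the loop, and x_t is
# always a valid averaged iterate, so training can stop at any step.
def schedule_free_sgd(grad, w0, lr=0.1, beta=0.9, steps=10_000):
    z = w0.copy()  # base iterate
    x = w0.copy()  # averaged iterate (the one you evaluate/deploy)
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x
        z = z - lr * grad(y)
        c = 1.0 / t
        x = (1 - c) * x + c * z
    return x

# Toy quadratic: f(w) = 0.5 * ||w - target||^2, so grad(w) = w - target.
target = np.array([3.0, -2.0])
w = schedule_free_sgd(lambda w: w - target, np.zeros(2))
```

For real training you would want the authors' released implementation; this sketch only shows the structure of the updates.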

12.12.2024 17:51 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Attending orals yesterday was not easy; 4 parallel tracks made it really hard. I think shorter orals with fewer tracks (more similar papers in the same track) would be better than the current format.

But one Oral presentation did stand out.

12.12.2024 17:51 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

One is Saurabh Tiwary, whom I have heard talk about "Industrial Deep Learning" many times before.

The rest of the talk was about xLSTM, among some other work, and his company.

He also claimed that "the bitter lesson is over!"
bsky.app/profile/yass...

12.12.2024 17:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image Post image

In the morning we had Sepp Hochreiter's talk about "Industrial AI". He drew interesting analogies to steam engines and the production of ammonium nitrate for fertilizers.

I guess many people have noticed these analogies and have given similar talks before.

12.12.2024 17:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

#NeurIPS2024 (@neuripsconf.bsky.social) Day 2 (Wednesday) Experience

One of the great things about conferences like NeurIPS is that you get to see people you admire for different reasons. I also got to see and talk to some of them. Really happy I got to talk to William Agnew.

12.12.2024 17:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

Lots of grokking papers recently. Lol
#NeurIPS2024

12.12.2024 07:37 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Sepp Hochreiter claims β€œthe bitter lesson is over”!
#neurips2024

11.12.2024 17:15 πŸ‘ 15 πŸ” 1 πŸ’¬ 0 πŸ“Œ 6

There was a panel discussion at the end which I missed. (Hope to catch up on the video.)
Fun fact: Sean Welleck is the host of the amazing "The Thesis Review" podcast: wellecks.com/thesisreview

This Tutorial was mostly based on their recent paper: arxiv.org/abs/2406.16838

11.12.2024 07:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

In the afternoon I attended the "Beyond Decoding" tutorial by Sean Welleck and others.
cmu-l3.github.io/neurips2024-...
This was truly an amazing tutorial on generation/sampling for decoding, meta-generation, and efficient decoding; highly recommended.

11.12.2024 07:48 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

- Usually there is around a 65x reduction in data volume after filtering.
- Training a good reward model is still a challenge.

11.12.2024 07:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Some notes:
- It seems that if the scientific community does not do something, it may face major challenges accessing large-scale data. The inequality in data access is widening.
- User-provided content like Wikipedia and arXiv amounts to less than 1% of the data used in pre-training.

11.12.2024 07:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
GitHub - allenai/awesome-open-source-lms: Friends of OLMo and their links.

In the morning I attended the "Opening the Language Model Pipeline" Tutorial by @natolambert.bsky.social and others from Allen AI.
github.com/allenai/awes...
They talked about their work on data, pre-training, and post-training, highlighting recent works such as OLMo 2, Tülu 3, etc.

11.12.2024 07:48 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

#NeurIPS2024 (@neuripsconf.bsky.social) Day 1 Experience

There were a bunch of interesting tutorials, talks, and events today at NeurIPS. But the highlight of the day was definitely catching up with friends and current and past colleagues, and seeing folks.

11.12.2024 07:48 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Ilya Sutskever has now won 3 Test of Time awards at NeurIPS!

2022: for the AlexNet paper
2023: for the word2vec paper
2024: for the Seq2Seq paper

05.12.2024 02:54 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Often you will figure out that your intuition was wrong and everyone else was right. But that only happens often, not all the time, which is great!

30.11.2024 22:33 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Excellent explanation of RoPE embeddings, from scratch with all the math needed: https://fleetwood.dev/posts/you-could-have-designed-SOTA-positional-encoding

And with beautiful 3blue1brown-style animations (made with manim: https://github.com/3b1b/manim).

Original RoPE paper: arxiv.org/abs/2104.09864
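The linked post walks through the math; the core operation is compact enough to sketch in a few lines. Here is a hedged NumPy sketch of the mechanism described in the RoPE paper (not any particular library's implementation): each pair of dimensions of a query or key is rotated by an angle proportional to the token position, so the dot product of a rotated query and a rotated key depends only on their relative offset.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding for a single vector (illustrative sketch).

    x: 1-D float array of even length d; pairs (x[2i], x[2i+1]) are rotated
    by angle pos * theta_i, with theta_i = base^(-2i/d) as in the paper.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin  # 2-D rotation of each pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

# Rotations preserve norms; applied to queries and keys, attention scores
# then depend only on relative position.
q = np.arange(8.0)
rotated = rope(q, pos=3)
```

The key property is that the inner product of rope(q, m) and rope(k, n) equals that of q with k rotated by position n − m, which is what makes the attention score a function of relative position only.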

29.11.2024 13:45 πŸ‘ 53 πŸ” 10 πŸ’¬ 0 πŸ“Œ 0