
Ethan

@ethansmith2000.com

a boy and his gpu vs the world. cofounder/directing research at @leonardoai_. (now at @canva) trying to feel the magic. www.ethansmith2000.com

181
Followers
61
Following
35
Posts
17.11.2024
Joined

Latest posts by Ethan @ethansmith2000.com

Reminds me of the SolidGoldMagikarp

02.12.2024 00:36 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Differentiable Image Parameterizations A powerful, under-explored tool for neural network visualizations and art.

distill.pub/2018/differe...

01.12.2024 04:40 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Video thumbnail

this is so cool

01.12.2024 04:40 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image

it's crazy to me that RoPE's issue with BF16 wasn't noticed earlier.
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x) for fp32 above and bf16 below.
Given how short the period of simple trig functions is, this difference is catastrophic at large values.

28.11.2024 12:09 πŸ‘ 8 πŸ” 1 πŸ’¬ 1 πŸ“Œ 1
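A minimal pure-Python sketch of the problem described above (bf16 is emulated here by truncating the low 16 bits of the fp32 bit pattern; the helper name is illustrative, not from any library):

```python
import math
import struct

def truncate_to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the fp32 pattern."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# Rotation angle for the fastest-rotating RoPE pair (inv_freq = 1.0)
# at the last position of an N = 2048 context.
pos = 2047.0
angle_fp32 = pos                    # exactly representable in fp32
angle_bf16 = truncate_to_bf16(pos)  # 2040.0 -- off by 7 whole radians

err = abs(angle_fp32 - angle_bf16)
print(err, err > 2 * math.pi)       # error exceeds a full sin/cos period
```

Real bf16 rounds to nearest (giving 2048 here) rather than truncating, but representable bf16 values near 2048 are spaced 8 apart, so errors of several radians are unavoidable either way, and the resulting sin/cos values are essentially scrambled.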


It’s an unstoppable force and all I can say is don’t hate the player (especially not the underdog), hate the game. Whether or not HF released this dataset, your data is being used; you may as well also have access to its collection.

28.11.2024 09:45 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This is the big one and I can’t stress this enough. All of your data everywhere is being gathered and used by private actors anyway. The only fire you can fight back with is to play on that same field and democratize it. This anger is way mistargeted.

28.11.2024 09:41 πŸ‘ 6 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Post image
28.11.2024 09:35 πŸ‘ 4 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Aren’t there datasets just like this for Twitter and everything else imaginable? Idk why this is suddenly taboo; most people making these datasets aren’t sharing them publicly either

27.11.2024 12:56 πŸ‘ 9 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0

πŸ™„

26.11.2024 12:18 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
GitHub - ethansmith2000/fsdp_optimizers: supporting pytorch FSDP for optimizers

github.com/ethansmith20...

25.11.2024 22:39 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Just added FSDP2 support for MARS and Muon!

25.11.2024 22:39 πŸ‘ 8 πŸ” 2 πŸ’¬ 1 πŸ“Œ 0

that's what they all say

25.11.2024 22:37 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image
25.11.2024 16:01 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Awesome list, thanks!

25.11.2024 08:32 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Excellent writeup on GPU streams / CUDA memory
dev-discuss.pytorch.org/t/fsdp-cudac...

TL;DR: by default, memory belongs to the stream that allocated it. To share it across streams:
- `Tensor.record_stream` -> automatic, but can be suboptimal and nondeterministic
- `Stream.wait_stream` -> manual, but precise control

24.11.2024 22:04 πŸ‘ 29 πŸ” 1 πŸ’¬ 2 πŸ“Œ 0

Incredible to see what are likely SOTA results coming out of open source with full reproducibility!
Happy to have helped provide the compute for this and hoping to support more awesome research like this!

25.11.2024 02:50 πŸ‘ 11 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

First, my sincerest thanks to @leonardoai.bsky.social, with the help of
@ethansmith2000.com, for generously providing H100s to support this research and enable this release. Y'all rock, thanks so much! <3

25.11.2024 01:59 πŸ‘ 2 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0

Absolutely sick!

25.11.2024 02:07 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks

25.11.2024 01:53 πŸ‘ 33 πŸ” 3 πŸ’¬ 2 πŸ“Œ 1

i'm trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfps is just mean spirited (im lazy and learned people's pfps, not names)

24.11.2024 17:08 πŸ‘ 36 πŸ” 1 πŸ’¬ 8 πŸ“Œ 0

Greetings xjdr

25.11.2024 00:33 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Untuned SOAP beats tuned AdamW at every single step

25.11.2024 00:08 πŸ‘ 6 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

Yes @hessianfree.bsky.social can speak more to this

25.11.2024 00:09 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Post image

Adam's been tuned but SOAP and PSGD are just using default params, you love to see it.

24.11.2024 23:36 πŸ‘ 9 πŸ” 1 πŸ’¬ 1 πŸ“Œ 1

There’s a void of PSGD hype that needs to be filled here

24.11.2024 20:43 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I goofed and never tested distributed saving, but now it works!
It was a little annoying, as both SOAP and PSGD maintain preconditioners as lists of varying size, which fail to be pickled. To fix this I hardcoded a max of 4 (based on conv layers being 4D tensors).

24.11.2024 20:35 πŸ‘ 6 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
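A hypothetical pure-Python sketch of that padding trick (names are mine, not the repo's): pad each preconditioner list out to a fixed length so every parameter's saved state has a uniform structure, then strip the padding on load.

```python
MAX_PRECONDS = 4  # conv weights are at most 4-D, so at most 4 preconditioners

def pad_preconds(preconds, pad_value=None):
    """Pad a variable-length preconditioner list to a fixed length for saving."""
    if len(preconds) > MAX_PRECONDS:
        raise ValueError("more preconditioners than expected")
    return list(preconds) + [pad_value] * (MAX_PRECONDS - len(preconds))

def unpad_preconds(padded):
    """Drop padding entries when loading state back."""
    return [p for p in padded if p is not None]

state = pad_preconds(["q_left", "q_right"])  # e.g. a 2-D linear weight
print(state)             # ['q_left', 'q_right', None, None]
print(unpad_preconds(state))
```

The uniform shape is what matters: distributed checkpointing wants every rank to describe its state with the same structure, and variable-length lists break that assumption.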

I’ve generally preferred research to software engineering, but I’m growing a liking for building the tools used for research

24.11.2024 10:50 πŸ‘ 12 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Fancy seeing you here πŸ‘‹

24.11.2024 10:34 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Self-proclaimed hessianfree guy going back on his word

24.11.2024 06:50 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Haven’t tested, but should be typical FSDP experience.

24.11.2024 06:45 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0