Fast, Simple, Fun - Video Understanding with <40M Parameters
A very approachable jumping-off point for video captioning. If you're GPU-poor (<24GB VRAM), this is for you.
Lead Machine Learning Engineer | Victoria, BC
A small contribution to the community: adds caption-like variety samples to the SSV2 dataset.
Some thoughts about the potential power of Meta's V-JEPA 2.