Fast, Simple, Fun - Video Understanding with <40M Parameters
A very approachable jumping-off point for video captioning. If you're GPU-poor (<24GB VRAM), this is for you.
Lead Machine Learning Engineer | Victoria, BC
I’ve spent the past 6 years working as a Machine Learning Engineer at progressively higher levels of responsibility across a handful of startups. I am currently a Lead at Kibeam, where I oversee everything from Ops to edge model training and deployment to LLMs. In my spare time, I tinker with whatever else interests me, and that gets posted here.
A small contribution to the community. Adds caption-like variety samples to the SSV2 dataset.
Some thoughts about the potential power of Meta's V-JEPA 2.