Gemma-3-4b run alone

The Gemma model comes in a range of sizes. Here I'm using the 4B version, the smallest and fastest that has vision capabilities. The 12B version would also work (possibly better). I haven't tried any of the others, since the larger sizes are too big to train on one H100.

We can do exactly the same with Gemma as we did with SmolVLM.

The code is mostly the same (just change the `--model` argument), but the scripts to run Gemma are slightly different.
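As a hedged sketch of what "just change the `--model` argument" means in practice: the flag name and the SmolVLM default below match the text, but the exact Hugging Face model IDs and the argument-parsing setup are assumptions about how the training script is wired up.

```python
# Minimal sketch of a training script's model selection (names are assumptions).
import argparse

parser = argparse.ArgumentParser(description="Fine-tune a small VLM")
# Default to the SmolVLM ID used earlier; pass --model to swap in Gemma.
parser.add_argument(
    "--model",
    default="HuggingFaceTB/SmolVLM-Instruct",
    help="Hugging Face model ID to fine-tune",
)

# Simulate invoking the script with the Gemma 4B checkpoint instead:
args = parser.parse_args(["--model", "google/gemma-3-4b-it"])
print(args.model)  # the rest of the pipeline would load this checkpoint
```

Everything downstream (data loading, the training loop) stays the same; only the checkpoint the script loads changes.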