Resolution matters a lot in image domains: an upscale from 512 to 1024 can make or break your AI model.
U2Net still works better for segmentation than most of the SAM3 models when it comes to well-defined, smooth edges.
SAM models are data hungry; they need a lot of data to produce anything good.
Convolution operations and kernels are underrated; a lot can be done if the filter / kernel values are set correctly.
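
To make the point about kernels concrete, here is a minimal sketch of a plain 2D convolution in NumPy. It assumes a grayscale image stored as a 2D array. The `conv2d` helper, the synthetic test image, and the choice of a Sobel kernel and a box blur are illustrative assumptions, not something taken from a specific library or model.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution (hypothetical helper, written for illustration)."""
    # True convolution flips the kernel; without the flip this is cross-correlation.
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# Synthetic 6x6 image: dark left half, bright right half (one vertical edge)
image = np.zeros((6, 6), dtype=float)
image[:, 3:] = 1.0

# Hand-set kernels: same operation, very different behavior
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)   # responds to horizontal intensity change
box_blur = np.full((3, 3), 1.0 / 9.0)           # averages each 3x3 neighborhood

edges = np.abs(conv2d(image, sobel_x))   # nonzero only where the window straddles the edge
blurred = conv2d(image, box_blur)        # the hard step becomes a gradual ramp

print(edges)
print(blurred)
```

Only the kernel values change between the two calls: one set of nine numbers gives an edge detector, another gives a smoothing filter. For real images, a library routine such as `scipy.signal.convolve2d` would be the faster choice than this double loop.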