[feat] OSS flatten state dict (#65)
Changes the structure of the state dict returned by OSS with respect to `param_groups`, making it closer to what a vanilla optimizer would return: the per-rank param groups are un-sharded (merged) into a single set. The state dict is sharded again when it is loaded.
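The flatten/re-shard round trip can be sketched in plain Python. This is a hypothetical illustration, not the code from this PR. It mimics the layout of a vanilla `torch.optim.Optimizer.state_dict()` (a `"state"` dict keyed by integer param index plus a `"param_groups"` list whose `"params"` entries are those indices). The function names `flatten_state_dicts` and `shard_state_dict` and the `sizes` bookkeeping are assumptions. The sketch also assumes that every rank has the same param groups, in the same order and with identical hyperparameters, and that each rank numbers its own params locally from 0 in group order.

```python
from copy import deepcopy
from typing import Any, Dict, List

StateDict = Dict[str, Any]


def flatten_state_dicts(shards: List[StateDict]) -> StateDict:
    """Merge per-rank (sharded) state dicts into one vanilla-looking dict.

    Hypothetical sketch: group g of the result concatenates group g of
    every shard, in rank order, and params get new global indices.
    """
    flat_groups = []
    flat_state: Dict[int, Any] = {}
    next_id = 0
    for g in range(len(shards[0]["param_groups"])):
        # Hyperparameters are assumed identical across ranks; take rank 0's.
        merged = {k: v for k, v in shards[0]["param_groups"][g].items() if k != "params"}
        merged["params"] = []
        for shard in shards:
            for local_id in shard["param_groups"][g]["params"]:
                merged["params"].append(next_id)
                if local_id in shard["state"]:
                    flat_state[next_id] = deepcopy(shard["state"][local_id])
                next_id += 1
        flat_groups.append(merged)
    return {"state": flat_state, "param_groups": flat_groups}


def shard_state_dict(flat: StateDict, sizes: List[List[int]], rank: int) -> StateDict:
    """Re-shard a flattened state dict for one rank on load.

    sizes[g][r] is the (hypothetical) number of params rank r owns in group g.
    """
    groups = []
    state: Dict[int, Any] = {}
    local_id = 0
    for g, group in enumerate(flat["param_groups"]):
        start = sum(sizes[g][:rank])
        owned = group["params"][start:start + sizes[g][rank]]
        new_group = {k: v for k, v in group.items() if k != "params"}
        new_group["params"] = []
        for global_id in owned:
            new_group["params"].append(local_id)
            if global_id in flat["state"]:
                state[local_id] = deepcopy(flat["state"][global_id])
            local_id += 1
        groups.append(new_group)
    return {"state": state, "param_groups": groups}
```

Recording how many params each rank owns per group is what makes re-sharding a pure slicing step. A real implementation would more likely recompute the partition from the model's parameters than store it.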