README.md

# Benchmark

the benchmark results are available in [benchmark.csv](benchmark.csv).
You can visualize the results in the [notebook](results.ipynb)

# How to reproduce the CLIP benchmark results

## Webdataset evaluation: VTAB+ and retrieval datasets (MSCOCO, Flickr8k, Flickr30k)

```bash
clip_benchmark eval --pretrained_model openai openclip_base \
    --dataset "webdatasets.txt" \
    --dataset_root "https://huggingface.co/datasets/clip-benchmark/wds_{dataset_cleaned}/tree/main" \
    --output "benchmark_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

Once the evaluation finishes, you can construct a CSV with all the results:

```bash
clip_benchmark build benchmark_*.json --output benchmark.csv
```

*Notes:* Pascal VOC 2007 multilabel is not yet included in the webdataset test suite. Multilingual support with webdataset is in progress.

## Alternative: Local download

```bash
clip_benchmark eval --pretrained_model  openai openclip_base  --dataset vtab+ retrieval \
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "benchmark_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

(Change `--dataset_root` accordingly)

## Multilingual ImageNet benchmark

To run the multilingual ImageNet benchmark, use:

```bash
clip_benchmark eval --pretrained_model openclip_multilingual openclip_base openai  --dataset imagenet1k --language cn it jp en ar\
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "multilingual_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

(Change `--dataset_root` accordingly)

## Multilingual MS-COCO benchmark

To run the multilingual MS-COCO benchmark, use:

```bash
clip_benchmark eval --pretrained_model openclip_multilingual openclip_base openai --dataset multilingual_mscoco_captions --language es it ko pl ru tr zh en \
--dataset_root "clip_benchmark_datasets/{dataset}" \
--output "multilingual_{dataset}_{pretrained}_{model}_{language}_{task}.json"
```

(Change `--dataset_root` accordingly)