---
hide: toc
---
# Tutorials
- **End-to-end tutorials** provide detailed step-by-step explanations and the code used for end-to-end workflows.
- **Paper implementations** provide reproductions of fundamental papers in the synthetic data domain.
- **Examples** don't provide explanations but simply show code for different tasks.
## End-to-end tutorials
- __Generate a preference dataset__
---
Learn about synthetic data generation for ORPO and DPO.
[:octicons-arrow-right-24: Tutorial](tutorials/generate_preference_dataset.ipynb)
- __Clean an existing preference dataset__
---
Learn how to provide AI feedback to clean an existing dataset.
[:octicons-arrow-right-24: Tutorial](tutorials/clean_existing_dataset.ipynb)
- __Retrieval and reranking models__
---
Learn about synthetic data generation for fine-tuning custom retrieval and reranking models.
[:octicons-arrow-right-24: Tutorial](tutorials/GenerateSentencePair.ipynb)
- __Generate text classification data__
---
Learn how synthetic data generation for text classification can help address data imbalance or scarcity.
[:octicons-arrow-right-24: Tutorial](tutorials/generate_textcat_dataset.ipynb)
## Paper implementations
- __DeepSeek Prover__
---
Learn about an approach to generate mathematical proofs for theorems generated from informal math problems.
[:octicons-arrow-right-24: Paper](papers/deepseek_prover.md)
- __DEITA__
---
Learn about prompt and response tuning for complexity and quality, and about LLMs as judges for automatic data selection.
[:octicons-arrow-right-24: Paper](papers/deita.md)
- __Instruction Backtranslation__
---
Learn about automatically labeling human-written text with corresponding instructions.
[:octicons-arrow-right-24: Paper](papers/instruction_backtranslation.md)
- __Prometheus 2__
---
Learn about using open-source models as judges for direct assessment and pairwise ranking.
[:octicons-arrow-right-24: Paper](papers/prometheus.md)
- __UltraFeedback__
---
Learn about a large-scale, fine-grained, diverse preference dataset, used for training powerful reward and critic models.
[:octicons-arrow-right-24: Paper](papers/ultrafeedback.md)
- __APIGen__
---
Learn how to create verifiable, high-quality datasets for function-calling applications.
[:octicons-arrow-right-24: Paper](papers/apigen.md)
- __CLAIR__
---
Learn about Contrastive Learning from AI Revisions (CLAIR), a data-creation method that leads to more contrastive preference pairs.
[:octicons-arrow-right-24: Paper](papers/clair.md)
- __Math Shepherd__
---
Learn about Math-Shepherd, a framework for generating datasets to train process reward models (PRMs), which assign a reward score to each step of a math problem solution.
[:octicons-arrow-right-24: Paper](papers/math_shepherd.md)
## Examples
- __Benchmarking with distilabel__
---
Learn about reproducing the Arena Hard benchmark with distilabel.
[:octicons-arrow-right-24: Example](examples/benchmarking_with_distilabel.md)
- __Structured generation with outlines__
---
Learn about generating RPG characters following a pydantic.BaseModel with outlines in distilabel.
[:octicons-arrow-right-24: Example](examples/llama_cpp_with_outlines.md)
- __Structured generation with instructor__
---
Learn about answering instructions with knowledge graphs defined as pydantic.BaseModel objects using instructor in distilabel.
[:octicons-arrow-right-24: Example](examples/mistralai_with_instructor.md)
- __Create a social network with FinePersonas__
---
Learn how to leverage FinePersonas to create a synthetic social network and fine-tune adapters for Multi-LoRA.
[:octicons-arrow-right-24: Example](examples/fine_personas_social_network.md)
- __Create questions and answers for an exam__
---
Learn how to generate questions and answers for an exam, using a raw Wikipedia page and structured generation.
[:octicons-arrow-right-24: Example](examples/exam_questions.md)
- __Image generation with distilabel__
---
Generate synthetic images using distilabel.
[:octicons-arrow-right-24: Example](examples/image_generation.md)
- __Text generation with images in distilabel__
---
Ask questions about images using distilabel.
[:octicons-arrow-right-24: Example](examples/text_generation_with_image.md)