# Project information site_name: Distilabel Docs site_url: https://argilla-io.github.io/distilabel site_author: Argilla, Inc. site_description: Distilabel is an AI Feedback (AIF) framework for building datasets with and for LLMs. # Repository repo_name: argilla-io/distilabel repo_url: https://github.com/argilla-io/distilabel edit_uri: edit/main/docs/ extra: version: provider: mike social: - icon: fontawesome/brands/linkedin link: https://www.linkedin.com/company/argilla-io - icon: fontawesome/brands/x-twitter link: https://twitter.com/argilla_io - icon: fontawesome/brands/youtube link: https://www.youtube.com/channel/UCAIz8TmvQQrLqbD7sd-5S2A - icon: fontawesome/brands/discord link: http://hf.co/join/discord analytics: provider: plausible domain: distilabel.argilla.io feedback: title: Was this page helpful? ratings: - icon: material/thumb-up-outline name: This page was helpful data: 1 note: >- Thanks for your feedback! - icon: material/thumb-down-outline name: This page could be improved data: 0 note: >- Thanks for your feedback! Help us improve this page by opening a GitHub issue. extra_css: - stylesheets/extra.css extra_javascript: - javascripts/mathjax.js - https://polyfill.io/v3/polyfill.min.js?features=es6 - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js theme: name: material logo: assets/logo.svg favicon: assets/logo.svg icon: repo: fontawesome/brands/github features: - navigation.instant - navigation.sections - navigation.tabs - navigation.footer - navigation.top - navigation.tracking - navigation.path - header.autohide # header disappears as you scroll - content.code.copy - content.code.annotate - content.tabs.link - content.action.edit - toc.follow - search.suggest - search.highlight - search.share palette: - media: "(prefers-color-scheme)" primary: white toggle: icon: material/brightness-auto name: Switch to light mode - media: "(prefers-color-scheme: light)" scheme: default primary: custom toggle: icon: material/brightness-7 name: Switch to dark mode # Palette toggle for dark mode - media: "(prefers-color-scheme: dark)" scheme: slate primary: custom toggle: icon: material/brightness-4 name: Switch to system preference watch: - src/distilabel strict: true # Extensions markdown_extensions: - attr_list - md_in_html - admonition - pymdownx.superfences - pymdownx.arithmatex: generic: true - pymdownx.highlight: anchor_linenums: true line_spans: __span pygments_lang_class: true - pymdownx.inlinehilite - pymdownx.keys - pymdownx.superfences: custom_fences: - name: mermaid class: mermaid format: !!python/name:pymdownx.superfences.fence_code_format - pymdownx.snippets: check_paths: true base_path: [examples/, docs/, "."] - pymdownx.details - pymdownx.tabbed: alternate_style: true - pymdownx.emoji: emoji_index: !!python/name:material.extensions.emoji.twemoji emoji_generator: !!python/name:material.extensions.emoji.to_svg - footnotes - toc: permalink: true plugins: - search - autorefs # Cross-links to headings - gen-files: scripts: - docs/scripts/gen_popular_issues.py - section-index - mkdocstrings: handlers: python: setup_commands: - import sys; sys.path.insert(0, 'src') # API references are built from source options: show_inheritance_diagram: false show_source: true # include source code # Headings heading_level: 3 show_root_heading: true # show the python path of the class show_root_toc_entry: true # show the toc entry for the root class show_root_full_path: false # display "diffrax.asdf" not just "asdf" show_object_full_path: false # display "diffrax.asdf" not just "asdf" show_symbol_type_heading: true show_symbol_type_toc: true # Members inherited_members: false # allow looking up inherited methods members_order: source # order methods according to their order of definition in the source code, not alphabetical order show_labels: true # Docstring docstring_style: google # more info: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html show_if_no_docstring: false # Signature separate_signature: false show_signature_annotations: false - social - mknotebooks - material-plausible - glightbox - distilabel/components-gallery: add_after_page: How-to guides nav: - Distilabel: "index.md" - Getting started: - Quickstart: "sections/getting_started/quickstart.md" - Installation: "sections/getting_started/installation.md" - FAQ: "sections/getting_started/faq.md" - How-to guides: - "sections/how_to_guides/index.md" - Basic: - Steps for processing data: - "sections/how_to_guides/basic/step/index.md" - GeneratorStep: "sections/how_to_guides/basic/step/generator_step.md" - GlobalStep: "sections/how_to_guides/basic/step/global_step.md" - Tasks for generating and judging with LLMs: - "sections/how_to_guides/basic/task/index.md" - GeneratorTask: "sections/how_to_guides/basic/task/generator_task.md" - ImageTask: "sections/how_to_guides/basic/task/image_task.md" - Executing Tasks with LLMs: "sections/how_to_guides/basic/llm/index.md" - Execute Steps and Tasks in a Pipeline: "sections/how_to_guides/basic/pipeline/index.md" - Advanced: - The Distiset dataset object: "sections/how_to_guides/advanced/distiset.md" - Pipeline cache: "sections/how_to_guides/advanced/caching.md" - Exporting data to Argilla: "sections/how_to_guides/advanced/argilla.md" - Structured data generation: "sections/how_to_guides/advanced/structured_generation.md" - Offline Batch Generation: "sections/how_to_guides/advanced/offline_batch_generation.md" - Specifying requirements for pipelines and steps: "sections/how_to_guides/advanced/pipeline_requirements.md" - Load groups and execution stages: "sections/how_to_guides/advanced/load_groups_and_execution_stages.md" - Using CLI to explore and re-run existing Pipelines: "sections/how_to_guides/advanced/cli/index.md" - Using a file system to pass data of batches between steps: "sections/how_to_guides/advanced/fs_to_pass_data.md" - Assigning resources to a step: "sections/how_to_guides/advanced/assigning_resources_to_step.md" - Saving step generated artifacts: "sections/how_to_guides/advanced/saving_step_generated_artifacts.md" - Serving an LLM for sharing it between several tasks: "sections/how_to_guides/advanced/serving_an_llm_for_reuse.md" - Scaling and distributing a pipeline with Ray: "sections/how_to_guides/advanced/scaling_with_ray.md" - Tutorials: - "sections/pipeline_samples/index.md" - Tutorials: - Generate a preference dataset: "sections/pipeline_samples/tutorials/generate_preference_dataset.ipynb" - Clean an existing preference dataset: "sections/pipeline_samples/tutorials/clean_existing_dataset.ipynb" - Synthetic data generation for fine-tuning custom retrieval and reranking models: "sections/pipeline_samples/tutorials/GenerateSentencePair.ipynb" - Generate synthetic text classification data: "sections/pipeline_samples/tutorials/generate_textcat_dataset.ipynb" - Papers: - DeepSeek Prover: "sections/pipeline_samples/papers/deepseek_prover.md" - DEITA: "sections/pipeline_samples/papers/deita.md" - Instruction Backtranslation: "sections/pipeline_samples/papers/instruction_backtranslation.md" - Prometheus 2: "sections/pipeline_samples/papers/prometheus.md" - UltraFeedback: "sections/pipeline_samples/papers/ultrafeedback.md" - APIGen: "sections/pipeline_samples/papers/apigen.md" - CLAIR: "sections/pipeline_samples/papers/clair.md" - Math Shepherd: "sections/pipeline_samples/papers/math_shepherd.md" - Examples: - Benchmarking with distilabel: "sections/pipeline_samples/examples/benchmarking_with_distilabel.md" - Structured generation with outlines: "sections/pipeline_samples/examples/llama_cpp_with_outlines.md" - Structured generation with instructor: "sections/pipeline_samples/examples/mistralai_with_instructor.md" - Create a social network with FinePersonas: "sections/pipeline_samples/examples/fine_personas_social_network.md" - Create questions and answers for a exam: "sections/pipeline_samples/examples/exam_questions.md" - Image generation with distilabel: "sections/pipeline_samples/examples/image_generation.md" - Text generation with images in distilabel: "sections/pipeline_samples/examples/text_generation_with_image.md" - API Reference: - Step: - "api/step/index.md" - GeneratorStep: "api/step/generator_step.md" - GlobalStep: "api/step/global_step.md" - "@step": "api/step/decorator.md" - StepResources: "api/step/resources.md" - Step Gallery: - Argilla: "api/step_gallery/argilla.md" - Hugging Face: "api/step_gallery/hugging_face.md" - Columns: "api/step_gallery/columns.md" - Extra: "api/step_gallery/extra.md" - Task: - "api/task/index.md" - GeneratorTask: "api/task/generator_task.md" - Task Gallery: "api/task/task_gallery.md" - LLM: - "api/models/llm/index.md" - LLM Gallery: "api/models/llm/llm_gallery.md" - Embedding: - "api/models/embedding/index.md" - Embedding Gallery: "api/models/embedding/embedding_gallery.md" - ImageGenerationModels: - "api/models/image_generation/index.md" - Image Generation Gallery: "api/models/image_generation/image_generation_gallery.md" - Pipeline: - "api/pipeline/index.md" - Routing Batch Function: "api/pipeline/routing_batch_function.md" - Step Wrapper: "api/pipeline/step_wrapper.md" - Mixins: - RuntimeParametersMixin: "api/mixins/runtime_parameters.md" - RequirementsMixin: "api/mixins/requirements.md" - Exceptions: "api/exceptions.md" - Errors: "api/errors.md" - Distiset: "api/distiset.md" - CLI: "api/cli.md" - Types: "api/typing.md" - Community: - sections/community/index.md - How to contribute?: sections/community/contributor.md - Developer Documentation: sections/community/developer_documentation.md - Issue dashboard: sections/community/popular_issues.md