Update Dockerfile, eval_gsm8k.py, eval_math.py, LICENSE, read.md, README.MD,...

Update Dockerfile, eval_gsm8k.py, eval_math.py, LICENSE, read.md, README.MD, requirements.txt, run_mistral.sh, run.sh, util.py, train_math.py, imgs/dcu.png, imgs/metamath.svg, data/README.md, data/test/MATH_test.jsonl, data/test/GSM8K_test.jsonl, data/test/GSM8K_Backward.jsonl, data/train/README.md, code_for_generating_data/ReadMe.md, code_for_generating_data/code/main_backward_reasoning.py, code_for_generating_data/code/main_forward_reasoning.py, code_for_generating_data/code/main_create_backward_questions.py, code_for_generating_data/code/main_rephrase_question.py, code_for_generating_data/code/path_init.py, code_for_generating_data/code/main_self_verification.py, code_for_generating_data/code/run_create_backward_questions.sh, code_for_generating_data/code/run_forward.sh, code_for_generating_data/code/run_backward.sh, code_for_generating_data/code/run_sv.sh, code_for_generating_data/code/run_rephrase.sh, code_for_generating_data/code/utils/__init__.py, code_for_generating_data/code/utils/answer_clean_utils.py, code_for_generating_data/code/utils/config_utils.py, code_for_generating_data/code/utils/log_utils.py, code_for_generating_data/code/utils/math_utils.py, code_for_generating_data/code/utils/openai_api_utils.py, code_for_generating_data/code/utils/parallel_utils.py, code_for_generating_data/code/utils/time_utils.py, code_for_generating_data/code/utils/path_utils.py, code_for_generating_data/configs/ansaug_cot_math.txt, code_for_generating_data/configs/ansaug_cot_gsm8k.txt, code_for_generating_data/configs/fobar_cot_gsm8k.txt, code_for_generating_data/configs/fobar_cot_math.txt, code_for_generating_data/configs/rephrase_cot_gsm8k.txt, code_for_generating_data/configs/rephrase_cot_math.txt, code_for_generating_data/configs/sv_cot_gsm8k.txt, code_for_generating_data/configs/sv_cot_math.txt, code_for_generating_data/configs/sv_rewrite_question_prompt_gsm8k.txt, code_for_generating_data/configs/sv_rewrite_question_prompt_math.txt, 
code_for_generating_data/data/gsm8k_train.json, code_for_generating_data/data/MATH_train.json files

Commit c36f0de4 authored Aug 23, 2024 by zhougaofeng
20 changed files
--- a/Dockerfile
+++ b/Dockerfile
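+# Base image: DCU PyTorch 2.1.0 (CentOS 7.6, DTK 24.04, Python 3.10)
+FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
+COPY requirements.txt requirements.txt
+ENV LANG=C.UTF-8
+# Variables set by `source` do not persist across RUN layers, so load the DTK environment in the same layer as pip.
+RUN source /opt/dtk-24.04/env.sh && \
+    pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com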
+
--- a/LICENSE
+++ b/LICENSE
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
--- a/README.MD
+++ b/README.MD
+# MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
+
+[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](CODE_LICENSE)
+[![Model Weight License](https://img.shields.io/badge/Model%20Weights%20License-LLaMA2-yellow)](MetaMath/LICENSE)
+[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
+
+<p align="center">
+🤗 <a href="https://huggingface.co/meta-math" target="_blank">HF Repo</a> • 📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a><br>
+</p>
+
+<p align="center" width="100%">
+<a ><img src="./imgs/metamath.svg" alt="MetaMath" style="width: 80%; min-width: 300px; display: block; margin: auto;"></a>
+</p>
+
+
+## News
+- 🔥 Our **MetaMath-Llemma-7B** model achieves **30.0 pass@1** on the MATH benchmark, surpassing all SOTA open-source LLMs at the 7B-13B scale! All training scripts and the model are released.
+- 🔥 Our **MetaMath-Mistral-7B** model achieves **77.7 pass@1** on the [GSM8k benchmark](https://github.com/openai/grade-school-math), surpassing all SOTA open-source LLMs! All training scripts and the model are released.
+- 🔥 The full **MetaMathQA** dataset is now available on Hugging Face: [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main)!
+- 🔥 The GSM8K_Backward dataset is also available on Hugging Face ([GSM8K_Backward](https://huggingface.co/datasets/meta-math/GSM8K_Backward)) for evaluating reversal mathematical reasoning ability!
+- 🔥 Although the data augmentation for **MetaMathQA** is sourced from **ChatGPT 3.5**, our **MetaMath-70B** model outperforms the closed-source LLM **ChatGPT 3.5** on GSM8K!
+- 🔥 Our **MetaMath-7B** model achieves **66.5 pass@1** on the [GSM8k benchmark](https://github.com/openai/grade-school-math), **11.6** points higher than the SOTA open-source LLM!
+- 🔥 Our **MetaMath-7B** model achieves **19.8 pass@1** on the [MATH benchmark](https://github.com/hendrycks/math), **9.1** points higher than the SOTA open-source LLM!
+
+| Model | Checkpoint | Paper  | GSM8k | MATH  | License|
+| ----- |------| ---- |------|-------| ----- |
+| MetaMath-70B-V1.0 | 🤗 <a href="https://huggingface.co/meta-math/MetaMath-70B-V1.0" target="_blank">HF Link</a> |  📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a>| **82.3**  |  **26.6**	| <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a> |
+| MetaMath-13B-V1.0 | 🤗 <a href="https://huggingface.co/meta-math/MetaMath-13B-V1.0" target="_blank">HF Link</a> |  📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a>| **72.3**  |  **22.4** | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 </a> |
+| MetaMath-7B-V1.0 | 🤗 <a href="https://huggingface.co/meta-math/MetaMath-7B-V1.0" target="_blank">HF Link</a>  |  📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a>| 	 **66.5**  |  **19.8** |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2  </a>|
+| MetaMath-Mistral-7B | 🤗 <a href="https://huggingface.co/meta-math/MetaMath-Mistral-7B" target="_blank">HF Link</a>  |  📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a>| 	 **77.7**  |  **28.2** |  <a href="http://www.apache.org/licenses/" target="_blank">Apache License 2.0  </a>|
+| MetaMath-Llemma-7B | 🤗 <a href="https://huggingface.co/meta-math/MetaMath-Llemma-7B" target="_blank">HF Link</a>  |  📃 <a href="https://arxiv.org/abs/2309.12284" target="_blank">[MetaMath]</a>| 	 **69.2**  |  **30.0** |  <a href="http://www.apache.org/licenses/" target="_blank">Apache License 2.0  </a>|
+                                                                                                                                                                                                                                                                                                   
+                                                                                                                                                                                                                                                                                                                                                                             
+
+## Comparing MetaMath with other LLMs
+
+🔥 Comprehensive Results
+
+| Model               | GSM8k Pass@1 | MATH Pass@1 |
+|---------------------|--------------|-------------|
+| MPT-7B              | 6.8          | 3.0         |
+| Falcon-7B           | 6.8          | 2.3         |
+| LLaMA-1-7B          | 11.0         | 2.9         |
+| LLaMA-2-7B          | 14.6         | 2.5         |
+| MPT-30B             | 15.2         | 3.1         |
+| LLaMA-1-13B         | 17.8         | 3.9         |
+| GPT-Neo-2.7B        | 19.5         | --          |
+| Falcon-40B          | 19.6         | 2.5         |
+| Baichuan-chat-13B   | 23.9         | --          |
+| Vicuna-v1.3-13B     | 27.6         | --          |
+| LLaMA-2-13B         | 28.7         | 3.9         |
+| InternLM-7B         | 31.2         | --          |
+| ChatGLM-2-6B        | 32.4         | --          |
+| GPT-J-6B            | 34.9         | --          |
+| LLaMA-1-33B         | 35.6         | 3.9         |
+| LLaMA-2-34B         | 42.2         | 6.24        |
+| RFT-7B              | 50.3         | --          |
+| LLaMA-1-65B         | 50.9         | 10.6        |
+| Qwen-7B             | 51.6         | --          |
+| WizardMath-7B       | 54.9         | 10.7        |
+| LLaMA-2-70B         | 56.8         | 13.5        |
+| WizardMath-13B      | 63.9         | 14.0        |
+| 🔥 MetaMath-7B         | **66.5**     | **19.8**    |
+| 🔥 MetaMath-13B        | **72.3**     | **22.4**    |
+| 🔥 MetaMath-Mistral-7B | **77.7**     | **28.2**    |
+| 🔥 MetaMath-Llemma-7B  | **69.2**     | **30.0**    |
+| WizardMath-70B      | 81.6         | 22.7        |
+| 🔥 MetaMath-70B        | **82.3**     | **26.6**    |
+
+<h2 id="env">Quick Start</h2>
+
+Clone MetaMath and install the required packages:
+
+```bash
+git clone https://github.com/meta-math/MetaMath.git
+cd MetaMath
+pip install -r requirements.txt
+```
+
+If you encounter a Ray installation problem, please run:
+
+```bash
+pip install --upgrade ray
+pip install --upgrade pyarrow
+pip install pandas
+```
+
+<h2 id="Inference">Dataset Usage</h2>
+
+Run the following command to load the data:
+
+```python
+from datasets import load_dataset
+dataset = load_dataset("meta-math/MetaMathQA")
+```
+
+
+<h2 id="train">Training</h2>
+
+To train, prepare the Llama 2 base model and our **MetaMathQA** dataset from Hugging Face ([MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA/tree/main)), then run:
+
+```
+bash run.sh
+```
+or
+
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} --master_port ${MASTER_PORT} --nproc_per_node=8 --use_env train_math.py \
+    --model_name_or_path "meta-llama/Llama-2-7b-hf" \
+    --data_path "path/to/metamathqa" \
+    --data_length 10000000 \
+    --bf16 True \
+    --output_dir "path/to/save" \
+    --num_train_epochs 3 \
+    --per_device_train_batch_size 4 \
+    --per_device_eval_batch_size 4 \
+    --gradient_accumulation_steps 4 \
+    --evaluation_strategy "no" \
+    --save_strategy "steps" \
+    --save_steps 1000 \
+    --save_total_limit 2 \
+    --learning_rate 2e-5 \
+    --weight_decay 0. \
+    --warmup_ratio 0.03 \
+    --lr_scheduler_type "cosine" \
+    --logging_steps 1 \
+    --fsdp "full_shard auto_wrap" \
+    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
+    --tf32 True
+```
+
+### Supervised fine-tuning
+
+MetaMath-7B is trained by supervised fine-tuning with the following hyperparameters:
+
+| Hyperparameter | LLaMA 2 7B |
+|----------------|-------------|
+| Batch size     | 128         |
+| Learning rate  | 2e-5        |
+| Epochs         | 3           |
+| Max length     | 512         |
+| LR scheduler   | cosine      |
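
The batch size of 128 in the table is consistent with the launch command above if it is read as the effective (global) batch size. A quick check, assuming one process per GPU and that this is how the 128 is obtained (the README does not state it explicitly):

```python
# Effective batch size implied by the training command.
# This is an interpretation of the flags, not something the README states.
nproc_per_node = 8                # --nproc_per_node=8
per_device_train_batch_size = 4   # --per_device_train_batch_size 4
gradient_accumulation_steps = 4   # --gradient_accumulation_steps 4

effective_batch_size = (
    nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 128
```

If you train on fewer GPUs, raising `--gradient_accumulation_steps` by the same factor would keep this product unchanged.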
+
+<h2 id="evaluation">Evaluation</h2>
+
+We use vLLM for fast generation:
+
+```
+python eval_gsm8k.py --model "path/to/save" --data_file ./data/test/GSM8K_test.jsonl
+python eval_math.py --model "path/to/save" --data_file ./data/test/MATH_test.jsonl
+```
+where `"path/to/save"` should be replaced by the path to the fine-tuned model. You can also download our MetaMath models from Hugging Face:  
+🤗 <a href="https://huggingface.co/meta-math/MetaMath-7B-V1.0" target="_blank">MetaMath 7B</a> 🤗 <a href="https://huggingface.co/meta-math/MetaMath-13B-V1.0" target="_blank">MetaMath 13B</a> 🤗 <a href="https://huggingface.co/meta-math/MetaMath-70B-V1.0" target="_blank">MetaMath 70B</a>
+
+The inference prompt for our MetaMath is:
+```
+"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
+```
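
As a minimal sketch of how the template above might be filled in before generation: the template string is copied from the README, while `build_prompt` and the sample question are hypothetical and not taken from the repository.

```python
# Template copied from the README; `build_prompt` is a hypothetical helper name.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
)


def build_prompt(instruction: str) -> str:
    """Insert one question into the MetaMath inference template."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


# Made-up question, used only to show the resulting layout.
prompt = build_prompt("What is 12 * 7?")
print(prompt)
```

The model is expected to continue the text after "Let's think step by step.", so the prompt ends without a trailing newline.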
+
+Thanks to the open-source code of [WizardMath](https://github.com/nlpxucan/WizardLM/tree/main/WizardMath) and [RFT](https://github.com/OFA-Sys/gsm8k-ScRel/tree/main); some of our code is based on them.
+
+<h2 id="citation">Citation</h2>
+Please cite the paper if you refer to the MetaMath model, code, data, or paper.
+
+```
+@article{yu2023metamath,
+  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
+  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
+  journal={arXiv preprint arXiv:2309.12284},
+  year={2023}
+}
+```
--- a/code_for_generating_data/ReadMe.md
+++ b/code_for_generating_data/ReadMe.md
+# Code for generating data in MetaMath
+
+> Tips
+- Set `export APIKEY="YOUR_API_KEY"` in `~/.bash_profile`.
+- Argument `--num_repeat`: number of outputs sampled from ChatGPT for each input via temperature sampling.
+- Use argument `--part` to avoid overwriting previously generated data.
+- Use argument `--cont` to continue a previous generation run; otherwise the data is re-fetched.
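
The three flags in the tips could be declared roughly as follows. This is a sketch only: the flag names come from the tips, and the file-name suffix rule mirrors `main_backward_reasoning.py`, but the types, defaults, and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names come from the tips, types and defaults are assumed.
    parser = argparse.ArgumentParser(description="Data-generation options (illustrative)")
    parser.add_argument("--num_repeat", type=int, default=1,
                        help="number of sampled outputs per input")
    parser.add_argument("--part", type=str, default="",
                        help="suffix appended to the output file name")
    parser.add_argument("--cont", action="store_true",
                        help="resume from the existing output file")
    return parser


args = build_parser().parse_args(["--num_repeat", "4", "--part", "run2", "--cont"])
# An empty --part adds no suffix; otherwise the suffix is "_<part>".
part_suffix = f"_{args.part}" if len(args.part) > 0 else ""
print(args.num_repeat, part_suffix, args.cont)
```

Giving each run a distinct `--part` value produces a distinct output file name, which is how earlier results avoid being overwritten.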
+
+### 0. Create backward questions
+```bash
+cd code
+bash -x run_create_backward_questions.sh
+```
+
+### 1. AnsAug
+
+```bash
+cd code
+bash -x run_forward.sh
+```
+
+### 2. Rephrasing
+
+```bash
+cd code
+bash -x run_rephrase.sh
+```
+
+### 3. Self-Verification
+
+```bash
+cd code
+bash -x run_sv.sh
+```
+
+### 4. FOBAR
+
+```bash
+cd code
+bash -x run_backward.sh
+```
\ No newline at end of file
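
For the two backward-reasoning methods (`fobar` and `SV`), `main_backward_reasoning.py` wraps each backward question before sending it to the model. The standalone sketch below mirrors the f-strings in that script but leaves out the few-shot prompt prefix; `wrap_question` and the sample questions are illustrative names and data, not part of the repository.

```python
def wrap_question(inv_question: str, candidate_answer: str, method: str, ds_name: str) -> str:
    """Standalone mirror of the question wrapping in main_backward_reasoning.py."""
    # GSM8K uses a plain "x"; MATH uses "$X$" and a "### " marker before the extra sentence.
    unknown_var = "X" if "MATH" in ds_name else "x"
    if "GSM8K" in ds_name:
        variable, special_token = unknown_var, ""
    else:
        variable, special_token = f"${unknown_var}$", "### "

    if method == "fobar":
        # FOBAR appends the known final answer and asks for the masked value.
        return (f"{inv_question}\n{special_token}If we know the answer to the above question is "
                f"{candidate_answer}, what is the value of unknown variable {variable}?")
    if method == "SV":
        # Self-verification only asks for the masked value.
        return f"{inv_question} What is the value of unknown variable {variable}?"
    raise ValueError(f"unknown method: {method}")


# Made-up questions, for illustration only.
print(wrap_question("Tom has x apples and buys 3 more. How many apples does he have?",
                    "8", "fobar", "GSM8K"))
print(wrap_question("If $X + 2 = 7$, Tom is happy.", "5", "SV", "MATH"))
```

In the script itself, the wrapped question is then placed after the method's few-shot prompt and followed by "A: Let's think step by step."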
--- a/code_for_generating_data/code/main_backward_reasoning.py
+++ b/code_for_generating_data/code/main_backward_reasoning.py
+import path_init
+
+from tqdm import tqdm
+import argparse
+import json
+import os
+import copy
+
+from utils.log_utils import LogUtils
+from utils.math_utils import MATH_DS_LIST
+from utils.parallel_utils import batch_get_api_merge
+from utils.path_utils import PathUtils
+import numpy as np
+
+from utils.answer_clean_utils import answer_cleansing
+
+ds_path_dict = {
+    "GSM8K": "GSM8K/gsm8k_train-cleaned",
+    "MATH": "MATH/MATH_train-cleaned",
+
+    "GSM8K_SV": "GSM8K/gsm8k_train-cleaned_SV",
+    "MATH_SV": "MATH/MATH_train-cleaned_SV",
+}
+
+string_number_dict = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
+ "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12, "fifth": 5, "sixteen": 16, "half": "50"}
+
+
+class BackwardReasoning():
+    def __init__(self, args):
+        self.args = args
+        self.ds_name = args.ds
+        self.temperature = args.temp
+        self.method = args.method_name
+        self.eng = args.eng
+        self.num_repeat = args.num_repeat
+        self.part = f"_{args.part}" if len(args.part) > 0 else ""
+
+        self.logger = LogUtils.get_or_init_logger(f"backward_cot_{self.args.method_name}_{self.ds_name}_{self.get_eng()}{self.part}", "backward")
+
+        self.inv_q_dict = {}
+        self.inv_question_path = os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict[self.ds_name]}-backward-questions.json")
+        with open(self.inv_question_path) as f:
+            inv_qs = json.load(f)
+            self.logger.info(f"number of backward question: {len(inv_qs)}")
+            for e in inv_qs:
+                if "inverse_question" in e:
+                    if e["question"] not in self.inv_q_dict:
+                        self.inv_q_dict[e["question"]] = []
+                    self.inv_q_dict[e["question"]].append((e["inverse_question"], e['inverse_question_answer']))
+
+        self.save_file = os.path.join(PathUtils.DATA_HOME_PATH,
+                                      f"{ds_path_dict[self.ds_name]}_{self.args.method_name}_{self.get_eng()}-backward-answers{self.part}.json")
+        if not args.cont:
+            new_examples = []
+            self.todo_path = os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict['GSM8K']}.json") if 'GSM8K' in self.ds_name else os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict['MATH']}.json")
+            with open(self.todo_path) as f:
+                self.examples = json.load(f)
+                for e in self.examples:
+                    candidate_answer = e['answer']
+                    if e["question"] in self.inv_q_dict:
+                        for temp_inv_e in self.inv_q_dict[e["question"]]:
+                            new_e = copy.deepcopy(e)
+                            new_e["candidate_answer"] = candidate_answer
+                            new_e["inv_question"] = temp_inv_e[0]
+                            if temp_inv_e[1] in string_number_dict:
+                                new_e["inv_question_ans"] = str(string_number_dict[temp_inv_e[1]])
+                            else:
+                                new_e['inv_question_ans'] = temp_inv_e[1]
+                            new_e['inv_question_ans'] = answer_cleansing(new_e['inv_question_ans'], ds_name=self.ds_name)
+                            new_examples.append(new_e)
+                self.examples = np.repeat(new_examples, args.num_repeat).tolist()
+
+            self.save_data()
+
+        with open(self.save_file) as f:
+            self.examples = json.load(f)
+
+        self.unknown_var = "x"
+        if "MATH" in self.ds_name:
+            self.unknown_var = "X"
+
+        if self.method == "SV":
+            self.prompt = self.get_prompt("sv_cot_math.txt")
+        elif self.method == "fobar":
+            self.prompt = self.get_prompt("fobar_cot_math.txt")
+        else:
+            raise ValueError(f"unknown method: {self.method}")
+
+    def get_eng(self):
+        if "gpt-4" in self.eng:
+            return "gpt-4"
+        elif "gpt-3.5-turbo" in self.eng:
+            return "gpt-3.5-turbo"
+        else:
+            return self.eng
+
+    def save_data(self):
+        with open(self.save_file, 'w', encoding='utf-8') as f:
+            json.dump(self.examples, f, ensure_ascii=False, indent=4)
+
+    def get_prompt(self, prompt_file_name):
+        prompt_file = os.path.join(PathUtils.CONFIG_HOME_PATH, prompt_file_name)
+        with open(prompt_file, "r", encoding='utf-8') as f:
+            prompt = f.read().strip()
+        return prompt
+
+    def evaluate(self, end_idx):
+        num_correct = 0
+        for e in self.examples[0:end_idx]:
+            pred_ans = e['pred_inv_answer_cleaned']
+            gt_ans = answer_cleansing(e['inv_question_ans'], ds_name=self.ds_name)
+
+            if pred_ans == gt_ans:
+                num_correct += 1
+        return num_correct, end_idx, num_correct / end_idx
+
+    def get_inv_split_str(self):
+        if self.ds_name in MATH_DS_LIST:
+            return f"The value of ${self.unknown_var}$ is"
+        else:
+            return f"The value of {self.unknown_var} is"
+
+    def fetch_data_from_openai(self):
+        def wrap(e):
+            variable, special_token = (f"{self.unknown_var}", "") if "GSM8K" in self.ds_name else (f"${self.unknown_var}$", "### ")
+            if self.method == "fobar":
+                wrap_q = f"""{e['inv_question']}\n{special_token}If we know the answer to the above question is {e['candidate_answer']}, what is the value of unknown variable {variable}?"""
+            elif self.method == "SV":
+                wrap_q = f"""{e['inv_question']} What is the value of unknown variable {variable}?"""
+            else:
+                raise ValueError(f"unknown method: {self.method}")
+            return f"""{self.prompt}\n\nQuestion: {wrap_q}\nA: Let's think step by step.\n"""
+
+        def extract(e, reply):
+            e['inv_question_pred_answer'] = reply
+            e['pred_inv_answer_cleaned'] = answer_cleansing(pred=reply, ds_name=self.ds_name, split_str=self.get_inv_split_str())
+
+        todo_list = []
+        for i, example in tqdm(enumerate(self.examples), total=len(self.examples)):
+            if i % 10 == 0:
+                self.logger.info(f"processing: {i}/{len(self.examples)}")
+
+            if "inv_question_pred_answer" in example or 'inv_question' not in example:
+                self.logger.info(f"skip {i}th question: already answered or has no inv question.")
+                continue
+
+            todo_list.append(example)
+
+            if (len(todo_list) >= self.args.batch_size) or i >= (len(self.examples) - 1):
+                batch_get_api_merge(examples=todo_list, eng=self.args.eng, pre_fun=wrap, post_fun=extract,
+                                    logger=self.logger, n_processes=self.args.num_proc,
+                                    temperature=self.temperature, timeout=self.args.time_out, max_try=8)
+                self.save_data()
+                todo_list = []
+                num_correct, num_examples, acc = self.evaluate(i + 1)
+                self.logger.info(
+                    "=" * 20 + f"processed: {i}/{len(self.examples)}, acc: {num_correct}/{num_examples}={100 * acc:.2f}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('--eng', default="gpt-3.5-turbo", type=str)
+    parser.add_argument('--ds', default="GSM8K", type=str)
+    parser.add_argument('--temp', default=0.7, type=float)
+    parser.add_argument('--part',  type=str)
+    parser.add_argument('--cont', action='store_true', help="true=continue previous fetching, default=false")
+    parser.add_argument('--method_name', default="fobar", type=str)
+    parser.add_argument('--num_repeat', default=20, type=int)
+    parser.add_argument('--batch_size', default=20, type=int)
+    parser.add_argument('--time_out', default=30, type=int)
+    parser.add_argument('--num_proc', default=16, type=int)
+    args = parser.parse_args()
+
+    rephrase_cot = BackwardReasoning(args)
+    rephrase_cot.fetch_data_from_openai()
\ No newline at end of file
--- a/code_for_generating_data/code/main_create_backward_questions.py
+++ b/code_for_generating_data/code/main_create_backward_questions.py
+import path_init
+import re
+import json
+import os
+import copy
+import argparse
+
+from utils.answer_clean_utils import delete_extra_zero, string_number_dict, _strip_string
+from utils.path_utils import PathUtils
+from abc import ABC, abstractmethod
+
+
+class InverseQuestions(ABC):
+    def __init__(self, args):
+
+        self.output_clean_path = f"{self.input_path}-cleaned.json"
+        self.output_path = f"{self.input_path}-cleaned-backward-questions.json"
+
+        self.load_dataset()
+        self.parse_examples()
+        print(f"#samples: {len(self.input_examples)}")
+        self.save_cleaned_data()
+
+        self.unknown_var = "x"
+
+    def load_dataset(self):
+        with open(os.path.join(PathUtils.DATA_HOME_PATH, f"{self.input_path}.json"), 'r') as f:
+            self.input_examples = json.load(f)
+
+    @abstractmethod
+    def parse_examples(self): pass
+
+    def replace_number_x(self, s):
+        if s in string_number_dict:
+            s = str(string_number_dict[s])
+        if s[-1] in (",", ".", "?", ";", "”", "'", "!", "\"", "%"):
+            # Replace everything up to the last digit and keep the trailing punctuation.
+            mo = re.match('.*([0-9])[^0-9]*$', s)
+            if mo is None:
+                print(f"the string is {s}")
+                return self.unknown_var
+            return self.unknown_var + s[mo.end(1):]
+        elif s[0] == "$":
+            return "$" + self.unknown_var
+        else:
+            return self.unknown_var
+
+    @staticmethod
+    def search_number(s):
+        if s in string_number_dict:
+            return True
+        if re.search(r'\d', s) is not None:
+            if re.search('[a-zA-Z]', s) or re.search(r'[\n:()*"+–-]', s):
+                return None
+            else:
+                return True
+
+    def save_cleaned_data(self):
+        with open(os.path.join(PathUtils.DATA_HOME_PATH, self.output_clean_path), 'w',
+                  encoding='utf-8') as f:
+            json.dump(self.input_examples, f, ensure_ascii=False, indent=4)
+
+    def save_data(self):
+        print(f"#samples for backward reasoning: {len(self.output_examples)}")
+        with open(os.path.join(PathUtils.DATA_HOME_PATH, self.output_path), 'w',
+                  encoding='utf-8') as f:
+            json.dump(self.output_examples, f, ensure_ascii=False, indent=4)
+
+    def make_inv_question(self):
+        self.output_examples = []
+        num_example_has_backward_question = 0
+        for e in self.input_examples:
+            token_list = e['question'].split(' ')
+            numbers_idx = [idx for idx, _ in enumerate(token_list) if self.search_number(_) is not None]
+            if len(numbers_idx) > 0:
+                num_example_has_backward_question += 1
+                for x_idx in numbers_idx:
+                    _e = copy.deepcopy(e)
+                    _token_list = copy.deepcopy(token_list)
+                    inverse_question_answer = _token_list[x_idx]
+                    _token_list[x_idx] = self.replace_number_x(_token_list[x_idx])
+                    _e['inverse_question'] = " ".join(_token_list)
+                    _e['inverse_question_answer'] = inverse_question_answer
+                    self.output_examples.append(_e)
+        print(f"has_inv_q: {num_example_has_backward_question}/{len(self.input_examples)}")
+
+        self.save_data()
+
+
+class GSM8K(InverseQuestions):
+    def __init__(self, args):
+        self.ds_name = "GSM8K"
+        self.input_path = f"GSM8K/gsm8k_train"
+        super(GSM8K, self).__init__(args=args)
+
+    def parse_examples(self):
+        temp_examples = []
+        for e in self.input_examples:
+            q = e['question']
+            a = e['answer']
+            if a[-2:] == ".0":
+                a = a[:-2]
+            a = delete_extra_zero(a)
+
+            ans_detail = e['answer_detail']
+
+            temp_examples.append(dict(question=q, answer=a, answer_detail=ans_detail))
+
+        self.input_examples = temp_examples
+
+
+class _MATH(InverseQuestions):
+    def __init__(self, args):
+        super(_MATH, self).__init__(args=args)
+        self.unknown_var = "X"
+
+    @staticmethod
+    def search_number(s):
+        if re.search(r'\d', s) is not None:
+            if re.search('[a-zA-Z]', s) or re.search(r'[\n:()*"+–-]', s):
+                return None
+            else:
+                return True
+
+    def find_math_answer(self, s):
+
+        assert ('boxed' in s)
+        # s = s.replace(",", "")
+        ans = s.split('boxed')[-1]
+        if (ans[0] == '{'):
+            stack = 1
+            a = ''
+            for c in ans[1:]:
+                if (c == '{'):
+                    stack += 1
+                    a += c
+                elif (c == '}'):
+                    stack -= 1
+                    if (stack == 0): break
+                    a += c
+                else:
+                    a += c
+        else:
+            a = ans.split('$')[0].strip()
+        a = _strip_string(a)
+        return a
+
+    def parse_examples(self):
+        temp_examples = []
+        for e in self.input_examples:
+            q = e['problem']
+            ans_detail = e['solution']
+            level = e['level']
+            type = e['type']
+            question_id = e['question_id']
+
+            a = self.find_math_answer(ans_detail)
+
+            temp_examples.append(dict(question=q, answer=a, answer_detail=ans_detail,
+                                      level=level, type=type, question_id=question_id))
+
+        self.input_examples = temp_examples
+
+    def replace_number_x(self, s):
+        if s[-1] in (",", ".", "?", ";", "”", "'", "!", "\"", "%"):
+            # Replace everything up to the last digit and keep the trailing punctuation.
+            mo = re.match('.*([0-9])[^0-9]*$', s)
+            if mo is None:
+                print(f"the string is {s}")
+                return self.unknown_var
+            return self.unknown_var + s[mo.end(1):]
+        else:
+            return self.unknown_var
+
+    def make_inv_question(self):
+        self.output_examples = []
+        num_example_has_backward_question = 0
+        for e in self.input_examples:
+            token_list = e['question'].split(' ')
+            numbers_idx = [idx for idx, _ in enumerate(token_list) if self.search_number(_) is not None]
+            if len(numbers_idx) > 0:
+                num_example_has_backward_question += 1
+                for x_idx in numbers_idx:
+                    _e = copy.deepcopy(e)
+                    _token_list = copy.deepcopy(token_list)
+                    inverse_question_answer = _token_list[x_idx]
+                    _token_list[x_idx] = self.replace_number_x(_token_list[x_idx])
+                    _e['inverse_question'] = " ".join(_token_list)
+                    _e['inverse_question_answer'] = inverse_question_answer
+                    self.output_examples.append(_e)
+        print(f"has_inv_q: {num_example_has_backward_question}/{len(self.input_examples)}")
+
+        self.save_data()
+
+
+class MATH(_MATH):
+    def __init__(self, args):
+        self.ds_name = "MATH"
+        self.input_path = f"MATH/MATH_train"
+        super(MATH, self).__init__(args=args)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    args = parser.parse_args()
+
+    for method in [GSM8K, MATH]:
+        method(args).make_inv_question()
--- a/code_for_generating_data/code/main_forward_reasoning.py
+++ b/code_for_generating_data/code/main_forward_reasoning.py
+import path_init
+
+from tqdm import tqdm
+from statistics import mode
+import argparse
+
+from utils.parallel_utils import batch_get_api_merge
+from utils.answer_clean_utils import answer_cleansing
+from collections import Counter
+from utils.log_utils import LogUtils
+from utils.path_utils import PathUtils
+import os
+import json
+import numpy as np
+
+ds_path_dict = {
+    "GSM8K": "GSM8K/gsm8k_train-cleaned",
+    "MATH": "MATH/MATH_train-cleaned",
+    "GSM8K_rephrased": "GSM8K/gsm8k_train-cleaned_rephrased_questions",
+    "MATH_rephrased": "MATH/MATH_train-cleaned_rephrased_questions",
+}
+
+
+class ForwardReasoning():
+    def __init__(self, args):
+        self.args = args
+        self.ds_name = args.ds
+        self.temperature = args.temp
+        self.part = f"_{args.part}" if len(args.part) > 0 else ""
+
+        self.eng = args.eng
+        self.num_repeat = args.num_repeat
+        self.method_name = self.get_method_name()
+
+        self.logger = LogUtils.get_or_init_logger(f"{self.method_name}_{self.ds_name}_{self.get_eng()}{self.part}",
+                                                  "forward")
+        self.json_file = os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict[self.ds_name]}.json")
+        self.save_file = os.path.join(PathUtils.DATA_HOME_PATH,
+                                      f"{ds_path_dict[self.ds_name]}_{self.method_name}_answer_{self.get_eng()}_{self.part}.json")
+        self.save_stat_file = os.path.join(PathUtils.DATA_HOME_PATH,
+                                           f"{ds_path_dict[self.ds_name]}_{self.method_name}_answer_{self.get_eng()}_{self.part}_stat.json")
+        if not args.cont:
+            with open(self.json_file) as f:
+                self.examples = json.load(f)
+                self.examples = np.repeat(self.examples, self.num_repeat).tolist()
+            self.save_data()
+
+        with open(self.save_file) as f:
+            self.examples = json.load(f)
+
+        if "GSM8K" in self.ds_name:
+            self.prompt = self.get_prompt("ansaug_cot_gsm8k.txt")
+        elif "MATH" in self.ds_name:
+            self.prompt = self.get_prompt("ansaug_cot_math.txt")
+        else:
+            raise ValueError(f"unknown dataset={self.ds_name}")
+
+    def save_data(self):
+        with open(self.save_file, 'w', encoding='utf-8') as f:
+            json.dump(self.examples, f, ensure_ascii=False, indent=4)
+
+    def get_method_name(self):
+        return "SCComplexCoT"
+
+    def get_eng(self):
+        if "gpt-4" in self.eng:
+            return "gpt-4"
+        elif "gpt-3.5-turbo" in self.eng:
+            return "gpt-3.5-turbo"
+        else:
+            return self.eng
+
+    def get_prompt(self, prompt_file_name):
+        prompt_file = os.path.join(PathUtils.CONFIG_HOME_PATH, prompt_file_name)
+        with open(prompt_file, "r", encoding='utf-8') as f:
+            prompt = f.read().strip()
+        return prompt
+
+    def save_ans_stat(self):
+        examples_collect = {}
+        for e in self.examples[0:len(self.examples)]:
+            question = e['question']
+            if question not in examples_collect:
+                examples_collect[question] = {}
+
+                for k in ["answer", "question", "answer_detail"]:
+                    examples_collect[question][k] = e[k]
+
+                examples_collect[question]['pred_answer_cleaned_list'] = []
+
+            examples_collect[question]['pred_answer_cleaned_list'].append(e['pred_answer_cleaned'])
+
+        stat_list = []
+        for e in examples_collect.values():
+            counter = Counter(e['pred_answer_cleaned_list'])
+
+            e["ans_stat"] = dict(counter)
+
+            del e['pred_answer_cleaned_list']
+            stat_list.append(e)
+
+        with open(self.save_stat_file, 'w', encoding='utf-8') as f:
+            json.dump(stat_list, f, ensure_ascii=False, indent=4)
+
+    def evaluate(self, end_idx):
+        result_stat_dict = {}
+        for e in self.examples[0:end_idx]:
+            question = e['question']
+
+            if question not in result_stat_dict:
+                result_stat_dict[question] = []
+
+            result_stat_dict[question].append(e)
+
+        num_correct = 0
+        for q in result_stat_dict:
+            e_list = result_stat_dict[q]
+            answer = e_list[0]['answer']
+            pred_answers = [_['pred_answer_cleaned'] for _ in e_list]
+            freq_answer = mode(pred_answers)
+
+            if freq_answer == answer:
+                num_correct += 1
+        msg = f"acc: {100 * num_correct / len(result_stat_dict.keys()):.4f}"
+        self.logger.info(msg)
+        return num_correct, len(result_stat_dict.keys()), num_correct / len(result_stat_dict.keys())
+
+    def fetch_data_from_openai(self):
+        def wrap(e):
+            return "{}\n\nQuestion: {}\nA: Let's think step by step.\n".format(self.prompt, e['question'])
+
+        def extract(e, reply):
+            e['pred_answer'] = reply
+            e['pred_answer_cleaned'] = answer_cleansing(pred=reply, ds_name=self.ds_name)
+
+        todo_list = []
+        for i, example in tqdm(enumerate(self.examples), total=len(self.examples)):
+            if i % 10 == 0:
+                self.logger.info(f"processing: {i}/{len(self.examples)}")
+
+            if "pred_answer" in example and len(example['pred_answer']) > 10: # contain answer
+                continue
+
+            todo_list.append(example)
+
+            if len(todo_list) >= self.args.batch_size or i >= (len(self.examples) - 1):
+                if len(todo_list) > 0:
+                    batch_get_api_merge(examples=todo_list, eng=self.args.eng, pre_fun=wrap, post_fun=extract,
+                                        logger=self.logger, n_processes=self.args.num_proc,
+                                        temperature=self.temperature, timeout=self.args.time_out, max_try=0)
+                    todo_list = []
+
+                self.save_data()
+
+                num_correct, num_examples, acc = self.evaluate(i + 1)
+                self.logger.info(
+                    "=" * 20 + f"processed: {i}/{len(self.examples)}, acc: {num_correct}/{num_examples}={100 * acc:.2f}")
+
+        self.save_ans_stat()
+
+
+class SCComplexCoT(ForwardReasoning):
+    def __init__(self, args):
+        super(SCComplexCoT, self).__init__(args=args)
+
+    def get_method_name(self):
+        return "SCComplexCoT"
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('--eng', default="gpt-3.5-turbo", type=str)
+    parser.add_argument('--ds', default="GSM8K", type=str)
+    parser.add_argument('--part', default="", type=str)
+    parser.add_argument('--temp', default=0.7, help="temperature", type=float)
+    parser.add_argument('--method_name', default="SCComplexCoT", type=str)
+    parser.add_argument('--cont', action='store_true', help="true=continue previous fetching, default=false")
+    parser.add_argument('--num_repeat', default=10, type=int, help="for self-consistency")
+    parser.add_argument('--batch_size', default=20, type=int)
+    parser.add_argument('--time_out', default=30, type=int)
+    parser.add_argument('--num_proc', default=16, type=int)
+    args = parser.parse_args()
+
+    method = SCComplexCoT(args)
+
+    method.fetch_data_from_openai()
+    method.logger.info("final evaluation")
+    num_correct, num_question, acc = method.evaluate(len(method.examples))
+    msg = f"finished acc: {100 * num_correct / num_question:.4f}"
+    method.logger.info(msg)
--- a/code_for_generating_data/code/main_rephrase_question.py
+++ b/code_for_generating_data/code/main_rephrase_question.py
+from tqdm import tqdm
+
+import path_init
+import argparse
+import json
+import os
+
+from utils.log_utils import LogUtils
+from utils.parallel_utils import batch_get_api_merge
+from utils.path_utils import PathUtils
+import numpy as np
+
+
+ds_path_dict = {
+    "GSM8K": "GSM8K/gsm8k_train-cleaned",
+    "MATH": "MATH/MATH_train-cleaned",
+}
+
+
+class RephraseQuestion():
+    def __init__(self, args):
+        self.args = args
+        self.ds_name = args.ds
+        self.temperature = args.temp
+
+        self.eng = args.eng
+        self.method_name = self.get_method_name()
+        self.num_repeat = args.num_repeat
+        self.logger = LogUtils.get_or_init_logger(f"rephrase_question_{self.ds_name}_{self.get_eng()}_rephrased_questions", "rephrase")
+
+        self.json_file = os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict[self.ds_name]}.json")
+        self.save_file = os.path.join(PathUtils.DATA_HOME_PATH,
+                                      f"{ds_path_dict[self.ds_name]}_rephrased_questions.json")
+
+        if not args.cont:
+            with open(self.json_file) as f:
+                self.examples = json.load(f)
+                self.examples = np.repeat(self.examples, self.num_repeat).tolist()
+            self.save_data()
+
+        with open(self.save_file) as f:
+            self.examples = json.load(f)
+
+        if "GSM8K" in self.ds_name:
+            self.prompt = self.get_prompt(f"rephrase_cot_gsm8k.txt")
+        elif "MATH" in self.ds_name:
+            self.prompt = self.get_prompt(f"rephrase_cot_math.txt")
+        else:
+            raise ValueError(f"unknown dataset={self.ds_name}")
+
+    def get_method_name(self):
+        return "SCComplexCoT"
+
+    def get_prompt(self, prompt_file_name):
+        prompt_file = os.path.join(PathUtils.CONFIG_HOME_PATH, prompt_file_name)
+        with open(prompt_file, "r", encoding='utf-8') as f:
+            prompt = f.read().strip()
+        return prompt
+
+    def get_eng(self):
+        if "gpt-4" in self.eng:
+            return "gpt-4"
+        elif "gpt-3.5-turbo" in self.eng:
+            return "gpt-3.5-turbo"
+        else:
+            return self.eng
+
+    def save_data(self):
+        with open(self.save_file, 'w', encoding='utf-8') as f:
+            json.dump(self.examples, f, ensure_ascii=False, indent=4)
+
+    def fetch_data_from_openai(self):
+        def wrap(e):
+            return f"""{self.prompt}\n\nQuestion: {e['question']}\nRephrase the above question: """
+
+        def extract(e, reply):
+            original_question = e['question']
+            e['question'] = reply
+            e['original_question'] = original_question
+
+        todo_list = []
+        for i, example in tqdm(enumerate(self.examples), total=len(self.examples)):
+            if i % 10 == 0:
+                self.logger.info(f"processing: {i}/{len(self.examples)}")
+
+            if "original_question" in example:
+                continue
+
+            todo_list.append(example)
+
+            if len(todo_list) >= self.args.batch_size or i >= (len(self.examples) - 1):
+                if len(todo_list) > 0:
+                    batch_get_api_merge(examples=todo_list, eng=self.args.eng, pre_fun=wrap, post_fun=extract,
+                                        logger=self.logger, n_processes=self.args.num_proc,
+                                        temperature=self.temperature, timeout=self.args.time_out, max_try=8)
+                    todo_list = []
+
+                self.save_data()
+
+                self.logger.info("=" * 40 + f"processed: {i}/{len(self.examples)}")
+        self.logger.info("Finished.")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('--eng', default="gpt-3.5-turbo", type=str)
+    parser.add_argument('--ds', default="GSM8K", type=str)
+    parser.add_argument('--temp', default=0.7, type=float)
+    parser.add_argument('--cont', action='store_true', help="true=continue previous fetching, default=false")
+    parser.add_argument('--num_repeat', default=40, type=int)
+    parser.add_argument('--batch_size', default=20, type=int)
+    parser.add_argument('--time_out', default=30, type=int)
+    parser.add_argument('--num_proc', default=20, type=int)
+    args = parser.parse_args()
+
+    rephrase_cot = RephraseQuestion(args)
+    rephrase_cot.fetch_data_from_openai()
\ No newline at end of file
--- a/code_for_generating_data/code/main_self_verification.py
+++ b/code_for_generating_data/code/main_self_verification.py
+import path_init
+
+import argparse
+import json
+import os
+
+from utils.log_utils import LogUtils
+from utils.math_utils import MATH_DS_LIST
+from utils.parallel_utils import batch_get_api_merge
+from utils.path_utils import PathUtils
+import numpy as np
+from utils.answer_clean_utils import answer_cleansing
+
+
+ds_path_dict = {
+    "GSM8K": "GSM8K/gsm8k_train-cleaned",
+    "MATH": "MATH/MATH_train-cleaned",
+}
+
+class SelfVerification():
+    def __init__(self, args):
+        self.args = args
+        self.ds_name = args.ds
+        self.temperature = args.temp
+
+        self.eng = args.eng
+        self.method_name = self.get_method_name()
+        self.num_repeat = args.num_repeat
+        self.logger = LogUtils.get_or_init_logger(f"SV_{self.ds_name}_{self.get_eng()}_rewritten_questions", "for_tuning")
+
+        self.json_file = os.path.join(PathUtils.DATA_HOME_PATH, f"{ds_path_dict[self.ds_name]}-backward-questions.json")
+        self.save_file = os.path.join(PathUtils.DATA_HOME_PATH,
+                                      f"{ds_path_dict[self.ds_name]}_SV-backward-questions.json")
+
+        if not args.cont:
+            with open(self.json_file) as f:
+                self.examples = json.load(f)
+                self.examples = np.repeat(self.examples, self.num_repeat).tolist()
+            self.save_data()
+
+        with open(self.save_file) as f:
+            self.examples = json.load(f)
+
+        if "GSM8K" in self.ds_name:
+            self.prompt = self.get_prompt("sv_rewrite_question_prompt_gsm8k.txt")
+        else:
+            self.prompt = self.get_prompt("sv_rewrite_question_prompt_math.txt")
+
+    def get_method_name(self):
+        return "SCComplexCoT"
+
+    def get_eng(self):
+        if "gpt-4" in self.eng:
+            return "gpt-4"
+        elif "gpt-3.5-turbo" in self.eng:
+            return "gpt-3.5-turbo"
+        else:
+            return self.eng
+
+    def get_prompt(self, prompt_file_name):
+        prompt_file = os.path.join(PathUtils.CONFIG_HOME_PATH, prompt_file_name)
+        with open(prompt_file, "r", encoding='utf-8') as f:
+            prompt = f.read().strip()
+        return prompt
+
+    def save_data(self):
+        with open(self.save_file, 'w', encoding='utf-8') as f:
+            json.dump(self.examples, f, ensure_ascii=False, indent=4)
+
+    def fetch_data_from_openai(self):
+        def wrap(e):
+            text = e['inverse_question'].replace(',', '.')
+            position_fullstop = text[::-1].find('.')
+            answer = answer_cleansing(e['answer'], ds_name=self.ds_name)
+            question = text[len(text) - position_fullstop:].strip()
+            e['base_text'] = e['inverse_question'][:len(text) - position_fullstop].strip()
+            return f"{self.prompt}\n\nQuestion: {question} The answer is {answer}.\n Result: "
+
+        def extract(e, reply):
+            e['inverse_question'] = f"{e['base_text']} {reply}"
+
+        todo_list = []
+        for i, example in enumerate(self.examples):
+            if i % 10 == 0:
+                self.logger.info(f"processing: {i}/{len(self.examples)}")
+
+            if "rephrased_question" in example:
+                continue
+
+            todo_list.append(example)
+
+            if len(todo_list) >= self.args.batch_size or i >= (len(self.examples) - 1):
+                if len(todo_list) > 0:
+                    batch_get_api_merge(examples=todo_list, eng=self.args.eng, pre_fun=wrap, post_fun=extract,
+                                        logger=self.logger, n_processes=self.args.num_proc,
+                                        temperature=self.temperature, timeout=self.args.time_out, max_try=8)
+                    todo_list = []
+
+                self.save_data()
+                self.logger.info("=" * 40 + f"processed: {i}/{len(self.examples)}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('--eng', default="gpt-3.5-turbo", type=str)
+    parser.add_argument('--ds', default="GSM8K", type=str)
+    parser.add_argument('--temp', default=0.7, type=float)
+    parser.add_argument('--cont', action='store_true', help="true=continue previous fetching, default=false")
+    parser.add_argument('--num_repeat', default=40, type=int)
+    parser.add_argument('--batch_size', default=20, type=int)
+    parser.add_argument('--time_out', default=10, type=int)
+    parser.add_argument('--num_proc', default=16, type=int)
+    args = parser.parse_args()
+
+    rephrase_cot = SelfVerification(args)
+    rephrase_cot.fetch_data_from_openai()
\ No newline at end of file
--- a/code_for_generating_data/code/path_init.py
+++ b/code_for_generating_data/code/path_init.py
+
+from pathlib import Path
+
+from utils.path_utils import PathUtils
+
+"""
+Set the project home path (the parent of the current working directory).
+"""
+
+work_path = Path().resolve().parent
+PathUtils.HOME_PATH = work_path
+PathUtils.set_path()
+print(work_path)
+
+
+def null_fun():
+    pass
\ No newline at end of file
--- a/code_for_generating_data/code/run_backward.sh
+++ b/code_for_generating_data/code/run_backward.sh
+source ~/.bash_profile
+
+model_name=gpt-3.5-turbo
+
+ds_name="GSM8K"
+#ds_name="MATH"
+
+python main_backward_reasoning.py --part part1 --method_name fobar --eng $model_name --ds $ds_name --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30  --num_proc 20
--- a/code_for_generating_data/code/run_create_backward_questions.sh
+++ b/code_for_generating_data/code/run_create_backward_questions.sh
+python main_create_backward_questions.py
\ No newline at end of file
--- a/code_for_generating_data/code/run_forward.sh
+++ b/code_for_generating_data/code/run_forward.sh
+source ~/.bash_profile
+
+model_name=gpt-3.5-turbo
+
+method_name="SCComplexCoT"
+
+ds_name="GSM8K"
+#ds_name="MATH"
+
+python main_forward_reasoning.py --part part1 --eng $model_name --ds $ds_name --method $method_name --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30 --num_proc 20
\ No newline at end of file
--- a/code_for_generating_data/code/run_rephrase.sh
+++ b/code_for_generating_data/code/run_rephrase.sh
+source ~/.bash_profile
+
+model_name=gpt-3.5-turbo
+
+method_name="SCComplexCoT"
+
+ds_name="GSM8K"
+#ds_name="MATH"
+
+python main_rephrase_question.py --eng $model_name --ds $ds_name --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30  --num_proc 20
+
+python main_forward_reasoning.py --part part1 --eng $model_name --ds "${ds_name}_rephrased" --method $method_name --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30 --num_proc 20
--- a/code_for_generating_data/code/run_sv.sh
+++ b/code_for_generating_data/code/run_sv.sh
+source ~/.bash_profile
+
+model_name=gpt-3.5-turbo
+
+ds_name="GSM8K"
+#ds_name="MATH"
+
+python main_self_verification.py --eng $model_name --ds $ds_name --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30 --num_proc 20
+
+
+python main_backward_reasoning.py --part part1 --method_name SV --eng $model_name --ds "${ds_name}_SV" --temp 0.7 --num_repeat 2 --batch_size 500 --time_out 30  --num_proc 20
+
--- a/code_for_generating_data/code/utils/__init__.py
+++ b/code_for_generating_data/code/utils/__init__.py
--- a/code_for_generating_data/code/utils/answer_clean_utils.py
+++ b/code_for_generating_data/code/utils/answer_clean_utils.py
+import re
+
+from utils.math_utils import MATH_DS_LIST
+
+string_number_dict = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
+                      "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
+                      "eleven": 11, "twelve": 12, "fifth": 5,
+                      "sixteen": 16, "half": "50%"}
+
+
+def delete_extra_zero(n):
+    try:
+        n=float(n)
+    except:
+        # print("None {}".format(n))
+        return n
+    if isinstance(n, int):
+        return str(n)
+    if isinstance(n, float):
+        n = str(n).rstrip('0')  # strip trailing zeros after the decimal point
+        n = int(n.rstrip('.')) if n.endswith('.') else float(n)  # if only the decimal point is left, convert to int; otherwise back to float
+        n=str(n)
+        return n
+
+def extract_math_answer(pred_str, split_str='The answer is '):
+    if(split_str in pred_str):
+        pred = pred_str.split(split_str)[-1].strip()
+    elif('the answer is ' in pred_str):
+        pred = pred_str.split('the answer is ')[-1].strip()
+    elif 'boxed' in pred_str:
+        ans = pred_str.split('boxed')[-1]
+        if (ans[0] == '{'):
+            stack = 1
+            a = ''
+            for c in ans[1:]:
+                if (c == '{'):
+                    stack += 1
+                    a += c
+                elif (c == '}'):
+                    stack -= 1
+                    if (stack == 0): break
+                    a += c
+                else:
+                    a += c
+        else:
+            a = ans.split('$')[0].strip()
+        a = _strip_string(a)
+        pred=a
+
+    else:
+        pattern = r'-?\d*\.?\d+'
+        pred = re.findall(pattern, pred_str)
+        if(len(pred) >= 1):
+            # print(pred_str)
+            pred = pred[-1]
+        else:
+            pred = ''
+    if pred != "" and len(pred) >= 1:
+        if pred[-1] == ".":
+            pred = pred[:-1]
+        if len(pred) >= 1 and pred[-1] == "/":
+            pred = pred[:-1]
+    pred = _strip_string(pred)
+    if 'boxed' in pred:
+        ans = pred.split('boxed')[-1]
+        if ans.startswith('{'):  # also safe when nothing follows 'boxed'
+            stack = 1
+            a = ''
+            for c in ans[1:]:
+                if (c == '{'):
+                    stack += 1
+                    a += c
+                elif (c == '}'):
+                    stack -= 1
+                    if (stack == 0): break
+                    a += c
+                else:
+                    a += c
+        else:
+            a = ans.split('$')[0].strip()
+        a = _strip_string(a)
+        pred=a
+    return pred
+
+
+def answer_cleansing(pred, ds_name, split_str="The answer is"):
+    if ds_name in MATH_DS_LIST:
+        return extract_math_answer(pred, split_str)
+    preds = pred.split(split_str)
+
+    pred = preds[-1]
+
+    pred = pred.replace(",", "")
+    pred = [delete_extra_zero(s.replace(",", "")) for s in re.findall(r'-?\d+/?\.?\d*', pred)]
+
+    # If there is no candidate in list, null is set.
+    if len(pred) == 0:
+        pred = ""
+    else:
+        pred = pred[-1]
+
+    # (For arithmetic tasks) if a word ends with period, it will be omitted ...
+    if pred != "":
+        if pred[-1] == ".":
+            pred = pred[:-1]
+        if pred != "" and pred[-1] == "/":
+            pred = pred[:-1]
+    return pred
+
+def _fix_fracs(string):
+    substrs = string.split("\\frac")
+    new_str = substrs[0]
+    if len(substrs) > 1:
+        substrs = substrs[1:]
+        for substr in substrs:
+            new_str += "\\frac"
+            if substr.startswith("{"):
+                new_str += substr
+            else:
+                # need at least two characters to rebuild \frac{a}{b}
+                if len(substr) < 2:
+                    return string
+                a = substr[0]
+                b = substr[1]
+                if b != "{":
+                    if len(substr) > 2:
+                        post_substr = substr[2:]
+                        new_str += "{" + a + "}{" + b + "}" + post_substr
+                    else:
+                        new_str += "{" + a + "}{" + b + "}"
+                else:
+                    if len(substr) > 2:
+                        post_substr = substr[2:]
+                        new_str += "{" + a + "}" + b + post_substr
+                    else:
+                        new_str += "{" + a + "}" + b
+    string = new_str
+    return string
+
+
+def _fix_a_slash_b(string):
+    if len(string.split("/")) != 2:
+        return string
+    a = string.split("/")[0]
+    b = string.split("/")[1]
+    try:
+        a = int(a)
+        b = int(b)
+    except ValueError:
+        return string
+    # only rewrite plain "a/b" with no extra characters or whitespace
+    if string != "{}/{}".format(a, b):
+        return string
+    return "\\frac{" + str(a) + "}{" + str(b) + "}"
+
+
+def _remove_right_units(string):
+    # "\\text{ " only ever occurs (at least in the val set) when describing units
+    if "\\text{ " in string:
+        splits = string.split("\\text{ ")
+        # assert len(splits) == 2
+        return splits[0]
+    else:
+        return string
+
+
+def _fix_sqrt(string):
+    if "\\sqrt" not in string:
+        return string
+    splits = string.split("\\sqrt")
+    new_string = splits[0]
+    for split in splits[1:]:
+        if split and split[0] != "{":  # guard against a trailing "\sqrt"
+            a = split[0]
+            new_substr = "\\sqrt{" + a + "}" + split[1:]
+        else:
+            new_substr = "\\sqrt" + split
+        new_string += new_substr
+    return new_string
+
+def _strip_string(string):
+    # linebreaks
+    string = string.replace("\n", "")
+    # print(string)
+
+    # remove inverse spaces
+    string = string.replace("\\!", "")
+    # print(string)
+
+    # replace \\ with \
+    string = string.replace("\\\\", "\\")
+    # print(string)
+
+    # replace tfrac and dfrac with frac
+    string = string.replace("tfrac", "frac")
+    string = string.replace("dfrac", "frac")
+    # print(string)
+
+    # remove \left and \right
+    string = string.replace("\\left", "")
+    string = string.replace("\\right", "")
+    # print(string)
+
+    # Remove circ (degrees)
+    string = string.replace("^{\\circ}", "")
+    string = string.replace("^\\circ", "")
+
+    # remove dollar signs
+    string = string.replace("\\$", "")
+
+    # remove units (on the right)
+    string = _remove_right_units(string)
+
+    # remove percentage
+    string = string.replace("\\%", "")
+    string = string.replace(r"\%", "")
+
+    # " 0." equivalent to " ." and "{0." equivalent to "{." Alternatively, add "0" if "." is the start of the string
+    string = string.replace(" .", " 0.")
+    string = string.replace("{.", "{0.")
+    # if empty, return empty string
+    if len(string) == 0:
+        return string
+    if string[0] == ".":
+        string = "0" + string
+
+    # to consider: get rid of e.g. "k = " or "q = " at beginning
+    if len(string.split("=")) == 2:
+        if len(string.split("=")[0]) <= 2:
+            string = string.split("=")[1]
+
+    # fix sqrt3 --> sqrt{3}
+    string = _fix_sqrt(string)
+
+    # remove spaces
+    string = string.replace(" ", "")
+
+    # \frac1b or \frac12 --> \frac{1}{b} and \frac{1}{2}, etc. Even works with \frac1{72} (but not \frac{72}1). Also does a/b --> \\frac{a}{b}
+    string = _fix_fracs(string)
+
+    # manually change 0.5 --> \frac{1}{2}
+    if string == "0.5":
+        string = "\\frac{1}{2}"
+
+    # NOTE: X/Y changed to \frac{X}{Y} in dataset, but in simple cases fix in case the model output is X/Y
+    string = _fix_a_slash_b(string)
+
+    return string
\ No newline at end of file
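Review note: the brace-matching loop appears twice in `extract_math_answer`. A minimal standalone sketch of that step follows, assuming a hypothetical helper name `extract_boxed` that is not part of this PR. Unlike the code above, it does not apply `_strip_string` to the result.

```python
def extract_boxed(text):
    """Return the contents of the last boxed{...} group in text, or '' if none."""
    marker = "boxed"
    if marker not in text:
        return ""
    tail = text.split(marker)[-1]
    if not tail.startswith("{"):
        # No braces after 'boxed': take everything up to the next dollar sign.
        return tail.split("$")[0].strip()
    depth = 1  # the opening brace has already been seen
    chars = []
    for c in tail[1:]:
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break  # matching close brace of the boxed group
        chars.append(c)
    return "".join(chars)


if __name__ == "__main__":
    print(extract_boxed(r"Thus the answer is $\boxed{\frac{1}{2}}$."))
```

Using `startswith` instead of indexing keeps the helper from raising when the text ends right after `boxed`.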
--- a/code_for_generating_data/code/utils/config_utils.py
+++ b/code_for_generating_data/code/utils/config_utils.py
+from utils.path_utils import PathUtils
+
+import yaml
+import torch
+import os
+
+
+class ConfigUtils(object):
+
+    def __init__(self):
+        pass
+
+    @staticmethod
+    def get_device(device_id=0):
+        device = torch.device("cpu")
+        if torch.cuda.is_available():
+            print("GPU is available, using GPU:{}".format(device_id))
+            device = torch.device('cuda:{}'.format(device_id))
+        else:
+            print("GPU is unavailable, using CPU")
+        return device
+
+    @staticmethod
+    def get_config_dict(config_file_name):
+        config_file_full_path = os.path.join(PathUtils.CONFIG_HOME_PATH, config_file_name)
+        # close the file handle once the config has been parsed
+        with open(config_file_full_path, "r") as config_file:
+            return yaml.load(config_file, Loader=yaml.Loader)
--- a/code_for_generating_data/code/utils/log_utils.py
+++ b/code_for_generating_data/code/utils/log_utils.py
+import logging
+import os
+from pathlib import Path
+
+from utils.path_utils import PathUtils
+
+
+class LogUtils:
+
+    LOGGER_FORMATTER = logging.Formatter('%(asctime)s %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
+    LOGGER_DICT = {}
+
+    def __init__(self):
+        pass
+
+    @staticmethod
+    def get_or_init_logger(file_name, dir_name=None, level=logging.DEBUG) -> logging.Logger:
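+        """
+        Return the cached logger for this log file, creating it on first use.
+        Log line format: time level: message
+        :param file_name: name used for the log file (log_<file_name>.log)
+        :param dir_name: subdirectory of PathUtils.Log_HOME_PATH that holds the log file; required
+        :param level: logging level for the logger
+        :return: the configured logging.Logger
+        """
+        if dir_name is None:
+            raise ValueError("dir_name should not be None!")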
+
+        log_dir_path = "{}/{}".format(PathUtils.Log_HOME_PATH, dir_name)
+        log_file = "{}/log_{}.log".format(log_dir_path, file_name)
+
+        # return the logger if exists
+        if log_file in LogUtils.LOGGER_DICT:
+            return LogUtils.LOGGER_DICT[log_file]
+
+        # create a logger if not exist
+        os.makedirs(os.path.dirname(log_file), exist_ok=True)
+        handler = logging.FileHandler(log_file, mode='a')
+        handler.setFormatter(LogUtils.LOGGER_FORMATTER)
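+        # name the logger after the full path so the same file_name under another dir_name gets its own logger
+        logger = logging.getLogger(log_file)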
+        logger.setLevel(level)
+        logger.addHandler(handler)
+
+        LogUtils.LOGGER_DICT[log_file] = logger
+        os.utime(Path(log_dir_path))
+
+        return logger
+
+    @staticmethod
+    def get_stat_from_dict(stat_dict, keys=None, is_simple=True):
+        if not keys:
+            keys = stat_dict.keys()
+
+        msg = []
+        for key in keys:
+            if is_simple:
+                msg.append(stat_dict[key].simple_repr())
+            else:
+                msg.append(str(stat_dict[key]))
+
+        return "; ".join(msg)
+
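Review note: the get-or-create pattern in `LogUtils.get_or_init_logger` can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's implementation: the module-level cache `_LOGGERS`, the simplified signature, and the temporary directory standing in for `PathUtils.Log_HOME_PATH` are all hypothetical.

```python
import logging
import os
import tempfile

_LOGGERS = {}  # hypothetical cache: log file path -> logger


def get_or_init_logger(log_file, level=logging.DEBUG):
    # Return the cached logger if this file already has one.
    if log_file in _LOGGERS:
        return _LOGGERS[log_file]
    # Otherwise create the directory, attach one file handler, and cache it.
    os.makedirs(os.path.dirname(log_file), exist_ok=True)
    handler = logging.FileHandler(log_file, mode="a")
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s: %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger(log_file)  # keyed by path, same as the cache
    logger.setLevel(level)
    logger.addHandler(handler)
    _LOGGERS[log_file] = logger
    return logger


# Temporary directory used here in place of the project's log home.
log_path = os.path.join(tempfile.mkdtemp(), "job1", "log_demo.log")
first = get_or_init_logger(log_path)
second = get_or_init_logger(log_path)
first.info("hello")
```

Because the second call hits the cache, only one handler is attached and each message is written to the file once.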
--- a/code_for_generating_data/code/utils/math_utils.py
+++ b/code_for_generating_data/code/utils/math_utils.py
+MATH_DS_LIST = ["MATH"]