README_official.md 1.37 KB
Newer Older
mashun1's avatar
omnisql  
mashun1 committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Web Table-Driven Database Synthesis

This is the first step in our data synthesis framework, designed to generate realistic databases using web tables.

## Prepare Web Tables
Unzip `web_tables.json.zip` to access 19,935 high-quality web tables from [Tablib](https://arxiv.org/pdf/2310.07875).

## Step 1: Initial Database Generation
Generate an initial database from the web tables.

1. Run `python3 generate_schema_synthesis_prompts.py` to create prompts for database generation.
2. Run `python3 synthesize_schema.py` to generate initial database schemas. (Implement the `llm_inference()` function to use your preferred LLMs.)

## Step 2: Database Enhancement
Enhance the initially generated databases to increase complexity and realism.

1. Run `python3 generate_schema_enhancement_prompts.py` to create prompts for database enhancement.
2. Run `python3 enhance_schema.py` to generate enhanced database schemas. (Implement the `llm_inference()` function to use your preferred LLMs.)

## Step 3: Building SQLite Databases
Build SQLite databases based on the enhanced database schemas.

1. Run `python3 build_sqlite_databases.py` to construct SQLite databases, which are stored in the `synthetic_sqlite_databases` folder.
2. Run `python3 generate_tables_json.py` to create the `tables.json` file, containing detailed information about the synthetic databases, aligning with previous text-to-SQL datasets.