  - Appendix

    This appendix covers the Python code specification, the document specification, and the Pull Request process. Please follow the relevant sections.

    - [Appendix 1: Python Code Specification](#Appendix1)
    - [Appendix 2: Document Specification](#Appendix2)
    - [Appendix 3: Pull Request Description](#Appendix3)

    <a name="Appendix1"></a>

    ## Appendix 1: Python Code Specification

    The Python code of PaddleOCR follows the [PEP8 Specification](https://www.python.org/dev/peps/pep-0008/). Some of the key points are listed below.

    - Spaces

      - Put a space after commas, semicolons, and colons, not before them

        ```python
        # Correct:
        print(x, y)

        # Incorrect:
        print(x , y)
        ```

      - When specifying a keyword argument or a default parameter value in a function, do not put spaces around the `=` sign

        ```python
        # Correct:
        def complex(real, imag=0.0):
            pass

        # Incorrect:
        def complex(real, imag = 0.0):
            pass
        ```

    - Comments

      - Inline comments: inline comments start with the `#` sign. Leave two spaces between the code and `#`, and one space between `#` and the comment text, for example

        ```python
        x = x + 1  # Compensate for border
        ```

      - Functions and methods: The definition of each function should include the following:

        - Function description: the purpose, inputs, and outputs of the function

        - Args: the name and description of each parameter

        - Returns: the meaning and type of the return value

        ```python
        def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
            """Fetches rows from a Bigtable.
        
            Retrieves rows pertaining to the given keys from the Table instance
            represented by big_table.  Silly things may happen if
            other_silly_variable is not None.
        
            Args:
                big_table: An open Bigtable Table instance.
                keys: A sequence of strings representing the key of each table row
                    to fetch.
                other_silly_variable: Another optional variable, that has a much
                    longer name than the other args, and which does nothing.
        
            Returns:
                A dict mapping keys to the corresponding table row data
                fetched. Each row is represented as a tuple of strings. For
                example:
        
                {'Serak': ('Rigel VII', 'Preparer'),
                 'Zim': ('Irk', 'Invader'),
                 'Lrrr': ('Omicron Persei 8', 'Emperor')}
        
                If a key from the keys argument is missing from the dictionary,
                then that row was not found in the table.
            """
            pass
        ```
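
The three docstring parts listed above can also be checked mechanically. The following is a minimal sketch, not part of the PaddleOCR tooling: `scale_box` and `has_required_sections` are hypothetical names introduced only for illustration, and the check simply looks for `Args:` and `Returns:` lines in the docstring.

```python
import inspect


def scale_box(box, factor=1.0):
    """Scales a bounding box about its top-left corner.

    Multiplies the width and height of the box by factor and leaves the
    top-left corner unchanged.

    Args:
        box: A tuple (x, y, width, height) describing the box.
        factor: A positive number used to scale the width and height.

    Returns:
        A tuple (x, y, width, height) with the scaled size.
    """
    x, y, width, height = box
    return (x, y, width * factor, height * factor)


def has_required_sections(func):
    """Checks that a docstring has a description, Args and Returns.

    Args:
        func: The function whose docstring is inspected.

    Returns:
        True if the docstring is non-empty and contains both an "Args:"
        line and a "Returns:" line, otherwise False.
    """
    doc = inspect.getdoc(func) or ""
    lines = [line.strip() for line in doc.splitlines()]
    return bool(lines) and "Args:" in lines and "Returns:" in lines
```

Calling `has_required_sections(scale_box)` returns `True`, while a function without a docstring yields `False`.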

    <a name="Appendix2"></a>

    ## Appendix 2: Document Specification

    ### 2.1 Overall Description

    - Document location: If you are documenting a new feature that belongs in an existing Markdown file, please **do not create** a new file. If you don't know where to add it, you can first submit the code PR and then ask the official maintainers in the commit.

    - New Markdown document name: Describe the content of the document in English, typically as a combination of lowercase letters and underscores, such as `add_new_algorithm.md`

    - New Markdown document format: Table of contents - Body - FAQ

      > The table of contents can be generated with [this site](https://ecotrust-canada.github.io/markdown-toc/), which extracts it automatically after you paste in the MD contents. Then add `<a name='XXXX'></a>` before each heading of the MD file.

    - English and Chinese: Any changes or additions to the document need to be made in both Chinese and English documents.
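
The naming rule for new Markdown documents can be expressed as a small check. This is only a sketch under stated assumptions: the guide says "lowercase letters and underscores", and the pattern below additionally allows digits and requires the `.md` extension, which are assumptions rather than rules taken from the guide. `is_valid_doc_name` is a hypothetical helper name.

```python
import re

# Assumed rule: words made of lowercase letters or digits, joined by single
# underscores, followed by the ".md" extension.
DOC_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.md")


def is_valid_doc_name(filename):
    """Returns True if filename matches the assumed naming rule."""
    return DOC_NAME_PATTERN.fullmatch(filename) is not None
```

Under these assumptions, a hypothetical name such as `text_detection_faq.md` passes, while names containing uppercase letters or spaces are rejected.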

    ### 2.2 Format Specification

    - Title format: Document headings follow the pattern number, space, title, where the number is made of Arabic numerals joined by decimal points (for example, `2.1 XXXX`, `2. XXXX`)

    - Code block: Show code that needs to be run in a code block, and describe the meaning of the command parameters before the code block. For example:

      > Pipeline of detection + direction classification + recognition: vertical text can be recognized after setting the direction classifier parameter `--use_angle_cls true`.
      >
      > ```
      > paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true
      > ```

    - Variable references: Code variables or command parameters referenced inline must be written as inline code, with one space before and one space after, as with `--use_angle_cls true` above

    - Uniform naming: e.g. PP-OCRv2, PP-OCR mobile, `paddleocr` whl package, PPOCRLabel, Paddle Lite, etc.

    - Supplementary notes: Write supplementary notes in the quote format `>`.

    - Pictures: If you add a picture to a document, give the picture a name that describes its content and place it under `doc/`.

    - Title: Capitalize the first letter of each word in the title.
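
Two of the format rules above, the numbered title format and the capitalization of each word, lend themselves to an automated check. The sketch below is an illustration only: `check_heading` and the regular expression are hypothetical, and the assumption that a heading line looks like `#` marks, a space, a dotted number, a space, and the title is an interpretation of the rule rather than an official definition.

```python
import re

# Assumed shape: "#" marks, a space, a dotted number such as "2", "2.1" or
# "3.2.10" (optionally ending with a dot), a space, then the title text.
HEADING_PATTERN = re.compile(r"(#+) (\d+(?:\.\d+)*\.?) (\S.*)")


def check_heading(line):
    """Returns a list of problems found in one Markdown heading line."""
    match = HEADING_PATTERN.fullmatch(line.strip())
    if match is None:
        return ["heading is not in 'number, space, title' form"]
    problems = []
    for word in match.group(3).split():
        # Only words that start with a lowercase letter are reported;
        # inline code, numbers and symbols are skipped.
        if word[0].islower():
            problems.append(f"word '{word}' is not capitalized")
    return problems
```

An empty list means the heading satisfies both checks; each string in a non-empty list names one violation.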

    <a name="Appendix3"></a>

    ## Appendix 3: Pull Request Description

    ### 3.1 PaddleOCR Branch Description

    PaddleOCR will maintain two kinds of branches going forward:

    - release/x.x branches: stable release branches; the latest one is also the default branch. PaddleOCR creates a new release branch when features are updated, adapted to the corresponding release version of Paddle. As versions iterate, the number of release/x.x branches grows, and by default only the latest release branch is maintained.
    - dygraph branch: the development branch, adapted to the dygraph (dynamic graph) version of Paddle and used mainly for developing new functionality. If you need to do secondary development, choose the dygraph branch. So that a release/x.x branch can be created from the dygraph branch when needed, code on the dygraph branch may only use APIs that are valid in the latest release branch of Paddle. That is, if a new API has been developed in the Paddle dygraph branch but has not yet appeared in the release branch code, do not use it in PaddleOCR. Work that does not involve the API, such as performance optimization, parameter tuning, and policy updates, can proceed normally.

    The historical branches of PaddleOCR will no longer be maintained. Because some users may still be using them, these branches will be kept available:

    - develop branch: This branch was used for the development and testing of static graphs and is currently compatible with Paddle versions >=1.7. If you have special needs, you can use this branch to work with older versions of Paddle, but apart from bug fixes its code will no longer be updated.

    PaddleOCR welcomes code contributions to the repo. The basic process for contributing code is described below.

    ### 3.2 PaddleOCR Code Submission Process And Specification

    > If you are already familiar with Git, you can jump directly to [3.2.10 Some Conventions For Submitting Code](#Some_conventions_for_submitting_code)

    #### 3.2.1 Create Your `Remote Repo`

    - On the PaddleOCR [GitHub home page](https://github.com/PaddlePaddle/PaddleOCR), click the `Fork` button in the upper right corner to create a `remote repo` under your own account, such as `https://github.com/{your_name}/PaddleOCR`.

    ![banner](../banner.png)

    - Clone the `remote repo`

    ```
    # Clone the code of the dygraph branch
    git clone https://github.com/{your_name}/PaddleOCR.git -b dygraph
    cd PaddleOCR
    ```

    > Clone failures are mostly caused by network problems. Try again later or configure a proxy.

    #### 3.2.2 Login And Connect Using Token

    Start by viewing the information for the current `remote repo`.

    ```
    git remote -v
    # origin    https://github.com/{your_name}/PaddleOCR.git (fetch)
    # origin    https://github.com/{your_name}/PaddleOCR.git (push)
    ```

    At this point only the cloned `remote repo`, i.e. the PaddleOCR repo under your username, is listed. Because GitHub changed its login method, you need to reconfigure the `remote repo` address with a token. Generate the token as follows:

    1. Find Personal Access Tokens: Click on your avatar in the upper right corner of the GitHub page and choose Settings --> Developer settings --> Personal access tokens.

    2. Click Generate new token: Fill in the token name in Note, such as 'paddle'. In Select scopes, select repo (required), admin:repo_hook, delete_repo, etc., according to your needs. Then click Generate token and copy the generated token.

    Delete the existing origin configuration:
   
    ```
    git remote rm origin
    ```
    
    Change the remote address to `https://oauth2:{token}@github.com/{your_name}/PaddleOCR.git`. For example, if the token value is 12345 and your user name is PPOCR, run the following command:
    
    ```
    git remote add origin https://oauth2:12345@github.com/PPOCR/PaddleOCR.git
    ```
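
The address pattern above can be assembled programmatically when scripting the setup. This is a minimal sketch: `build_token_remote_url`, `remote_add_command`, and the `repo` parameter are hypothetical names introduced for illustration, and the functions only build strings without calling Git.

```python
def build_token_remote_url(token, user_name, repo="PaddleOCR"):
    """Builds the HTTPS remote address that embeds a personal access token."""
    if not token or not user_name:
        raise ValueError("token and user_name must be non-empty")
    return f"https://oauth2:{token}@github.com/{user_name}/{repo}.git"


def remote_add_command(token, user_name, remote="origin"):
    """Returns the argument list for the `git remote add` command."""
    url = build_token_remote_url(token, user_name)
    return ["git", "remote", "add", remote, url]
```

With the example values from the text, `build_token_remote_url("12345", "PPOCR")` produces the same address as the command shown above.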
    
    This establishes the connection to your own `remote repo`. Next, add the original PaddleOCR repo as a remote named upstream.

    ```
    git remote add upstream https://github.com/PaddlePaddle/PaddleOCR.git
    ```

    Run `git remote -v` to view the current `remote repo` information. The output now lists two remotes, origin and upstream, each with a fetch and a push address:

    ```
    origin    https://github.com/{your_name}/PaddleOCR.git (fetch)
    origin    https://github.com/{your_name}/PaddleOCR.git (push)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (fetch)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (push)
    ```

    The upstream remote is mainly used to keep the local repository up to date when subsequent pull requests (PRs) are submitted.
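
To confirm that both remotes are configured, the text printed by `git remote -v` can be parsed. The sketch below is an assumption-laden illustration: `parse_remotes` and `missing_remotes` are hypothetical helpers, and they expect each line to have the shape `<name> <url> (fetch|push)` as in the output above.

```python
def parse_remotes(remote_v_output):
    """Parses `git remote -v` text into {name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Expected shape per line: <name> <url> (fetch|push)
        if len(parts) != 3:
            continue
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def missing_remotes(remote_v_output, required=("origin", "upstream")):
    """Returns the required remote names that are not configured."""
    remotes = parse_remotes(remote_v_output)
    return [name for name in required if name not in remotes]
```

An empty list from `missing_remotes` means both origin and upstream are present.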

    #### 3.2.3 Create Local Branch

    First fetch the latest code from upstream, then create a `new_branch` branch based on the dygraph branch of the upstream repo (upstream).
    
    ```
    git fetch upstream
    git checkout -b new_branch upstream/dygraph
    ```
    
    > For a newly forked PaddleOCR project, the user's remote repo (origin) has the same branch updates as the upstream repo (upstream), so you can also create the new local branch from the default branch or a specified branch of the origin repo with the following commands
    >
    > ```
    > # Create new_branch branch on user remote repo (origin) based on develop branch
    > git checkout -b new_branch origin/develop
    > # Create new_branch branch based on upstream remote repo develop branch
    > # If you need to create a new branch from upstream, 
    > # you need to first use git fetch upstream to get upstream code
    > git checkout -b new_branch upstream/develop
    > ```

    Once the switch to the new branch succeeds, the following output is displayed.

    ```
    Branch new_branch set up to track remote branch develop from upstream.
    Switched to a new branch 'new_branch'
    ```
    
    After switching branches, you can make file changes on this branch.
    
    #### 3.2.4 Use Pre-Commit Hook

    Paddle developers use the pre-commit tool to manage Git pre-commit hooks. It formats the source code (C++, Python) and automatically runs basic checks before each commit (for example, that every file has only one EOL and that no large files are added to Git).

    The pre-commit check is part of the unit tests in Travis-CI. A PR that does not satisfy the hooks cannot be submitted to PaddleOCR. Install pre-commit first, then set it up in the current directory:

    ```
    pip install pre-commit
    pre-commit install
    ```

    > 1. Paddle uses clang-format to adjust the C/C++ source code format. Make sure the `clang-format` version is above 3.8.
    >
    > 2. The yapf installed through `pip install pre-commit` is slightly different from the one installed through `conda install -c conda-forge pre-commit`. PaddleOCR developers use `pip install pre-commit`.

    #### 3.2.5 Modify And Submit Code

    If you change `README.md` in PaddleOCR, you can view the changed files with `git status` and then stage them with `git add`.

    ```
    git status # View changed files
    git add README.md
    pre-commit
    ```

    Repeat these steps until the pre-commit format check reports no errors, as shown below.

    ![img](../precommit_pass.png)

    Use the following command to complete the submission.

    ```
    git commit -m "your commit info"
    ```

    #### 3.2.6 Keep Local Repo Up To Date

    Fetch the latest code from upstream and update the current branch. Here upstream is the remote added in section 3.2.2, `Login And Connect Using Token`.

    ```
    git fetch upstream
    # To submit to another branch, pull that branch from upstream instead; develop is used here
    git pull upstream develop
    ```

    #### 3.2.7 Push To Remote Repo

    ```
    git push origin new_branch
    ```

    #### 3.2.7 Submit Pull Request

    Click `New pull request` and select the local branch and the target branch, as shown in the following figure. In the PR description, describe what the PR accomplishes. Then wait for review. If changes are requested, update the corresponding branch in origin by following the steps above.

    ![banner](../pr.png)

    #### 3.2.8 Sign CLA Agreement And Pass Unit Tests

    - Signing the CLA: When submitting a Pull Request to PaddlePaddle for the first time, you need to sign the CLA (Contributor License Agreement) so that your code can be merged. The steps are as follows:

      1. In the Checks section of the PR, find license/cla and click the Details link on its right to open the CLA website.

      2. On the CLA website, click Sign in with GitHub to agree. After you click it, you are taken back to your Pull Request page.

    #### 3.2.9 Delete Branch

    - Remove remote branch

      After the PR is merged into the main repo, you can delete the branch of the remote repo from the PR page.
      You can also use `git push origin :branch_name` to delete a remote branch, for example:

      ```
      git push origin :new_branch
      ```

    - Delete local branch

      ```
      # Switch to the develop branch first, otherwise the current branch cannot be deleted
      git checkout develop

      # Delete the new_branch branch
      git branch -D new_branch
      ```

    <a name="Some_conventions_for_submitting_code"></a>

    #### 3.2.10 Some Conventions For Submitting Code

    So that the official maintainers can focus on the code itself during review, please follow these conventions each time you submit code:

    1) Please ensure that the unit tests in Travis-CI pass. If they do not, there is a problem with the submitted code, and the official maintainers generally will not review it.

    2) Before submitting a Pull Request:

    - Note the number of commits.

      Reason: If you modify only one file but submit more than a dozen commits, each containing only a few changes, the reviewer can easily get confused. The reviewer has to look at each commit individually to see what changed, and changes in different commits may even overwrite each other.

      Suggestion: Keep the number of commits as small as possible each time you submit, and use `git commit --amend` to add to your last commit. For multiple commits that have already been pushed to a remote repo, you can refer to [squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).

    - Note the name of each commit: it should reflect the content of the current commit and not be too arbitrary.


    3) If the PR solves an issue, add `fix #issue_number` to the first comment box of the Pull Request. This automatically closes the corresponding issue when the Pull Request is merged. Keywords include: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. Please choose the appropriate one. For details, see [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
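
Before opening a PR, it can be handy to check which issues a description would close. The following sketch only approximates the keyword rule described above: `referenced_issues` is a hypothetical helper, and the pattern (a keyword, whitespace, then `#` and a number, matched case-insensitively) is an assumption rather than GitHub's exact parsing logic.

```python
import re

CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# A keyword as a whole word, whitespace, then "#<number>".
CLOSING_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)", re.IGNORECASE
)


def referenced_issues(description):
    """Returns the issue numbers that a PR description marks for closing."""
    return [int(number) for number in CLOSING_PATTERN.findall(description)]
```

For instance, a description containing `fix #12` yields `[12]`, while a plain mention such as `see #12` yields an empty list.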
    
    In addition, when responding to the reviewers' comments, please follow these conventions:

    1) Please respond to every review comment from the official maintainers. This helps improve contributions to the open source community.

    - If you agree with a review comment and have made the change, simply reply `Done`.
    - If you disagree with a review comment, please explain your reasons.

    2) If there are many review comments:

    - Please give an overview of the changes.
    - Please reply using `Start a review` rather than replying directly, because each direct reply sends an e-mail message, which can flood inboxes.