dangminh214 committed · Commit b26e93d · 0 parent(s)

Clean initial commit (no large files, no LFS pointers)

This view is limited to 50 files because the commit contains too many changes.

Files changed (50):
  1. .gitignore +9 -0
  2. D-FINE/.github/ISSUE_TEMPLATE/bug_report.md +38 -0
  3. D-FINE/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
  4. D-FINE/.github/workflows/is.yml +63 -0
  5. D-FINE/.github/workflows/pre-commit.yml +15 -0
  6. D-FINE/.gitignore +10 -0
  7. D-FINE/.pre-commit-config.yaml +68 -0
  8. D-FINE/Dockerfile +48 -0
  9. D-FINE/LICENSE +201 -0
  10. D-FINE/README.md +700 -0
  11. D-FINE/README_cn.md +673 -0
  12. D-FINE/README_ja.md +698 -0
  13. D-FINE/configs/dataset/coco_detection.yml +41 -0
  14. D-FINE/configs/dataset/crowdhuman_detection.yml +41 -0
  15. D-FINE/configs/dataset/custom_detection.yml +41 -0
  16. D-FINE/configs/dataset/obj365_detection.yml +41 -0
  17. D-FINE/configs/dataset/voc_detection.yml +40 -0
  18. D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_l_ch.yml +44 -0
  19. D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_m_ch.yml +60 -0
  20. D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_n_ch.yml +82 -0
  21. D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_s_ch.yml +65 -0
  22. D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_x_ch.yml +55 -0
  23. D-FINE/configs/dfine/custom/dfine_hgnetv2_l_custom.yml +44 -0
  24. D-FINE/configs/dfine/custom/dfine_hgnetv2_m_custom.yml +60 -0
  25. D-FINE/configs/dfine/custom/dfine_hgnetv2_n_custom.yml +76 -0
  26. D-FINE/configs/dfine/custom/dfine_hgnetv2_s_custom.yml +65 -0
  27. D-FINE/configs/dfine/custom/dfine_hgnetv2_x_custom.yml +55 -0
  28. D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_l_obj2custom.yml +53 -0
  29. D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_m_obj2custom.yml +66 -0
  30. D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml +67 -0
  31. D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_x_obj2custom.yml +62 -0
  32. D-FINE/configs/dfine/dfine_hgnetv2_l_coco.yml +44 -0
  33. D-FINE/configs/dfine/dfine_hgnetv2_m_coco.yml +60 -0
  34. D-FINE/configs/dfine/dfine_hgnetv2_n_coco.yml +82 -0
  35. D-FINE/configs/dfine/dfine_hgnetv2_s_coco.yml +61 -0
  36. D-FINE/configs/dfine/dfine_hgnetv2_x_coco.yml +56 -0
  37. D-FINE/configs/dfine/include/dataloader.yml +39 -0
  38. D-FINE/configs/dfine/include/dfine_hgnetv2.yml +82 -0
  39. D-FINE/configs/dfine/include/optimizer.yml +36 -0
  40. D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml +52 -0
  41. D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml +49 -0
  42. D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml +65 -0
  43. D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml +62 -0
  44. D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj2coco.yml +88 -0
  45. D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj365.yml +84 -0
  46. D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml +66 -0
  47. D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml +63 -0
  48. D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml +61 -0
  49. D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml +58 -0
  50. D-FINE/configs/runtime.yml +24 -0
.gitignore ADDED
@@ -0,0 +1,9 @@
+ # ignore large files and runtime outputs
+ *.pth
+ *.pt
+ *.engine
+ *.onnx
+ engines/
+ D-FINE/weight/
+ examples/
+ output/
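
A quick way to confirm these rules behave as intended (not part of the commit) is `git check-ignore`, which reports the pattern that matches each path; the paths below are placeholders:

```shell
# Ask git which ignore rule (if any) matches each path; -v prints the matching pattern and its source line.
git check-ignore -v D-FINE/weight/dfine_l_coco.pth engines/model.engine output/train.log
```

Each listed path should be reported as matched by one of the new patterns (`D-FINE/weight/`, `engines/`, `output/`, or `*.pth`).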
D-FINE/.github/ISSUE_TEMPLATE/bug_report.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ name: Bug report
+ about: Create a report to help us improve
+ title: ''
+ labels: ''
+ assignees: ''
+
+ ---
+
+ **Describe the bug**
+ A clear and concise description of what the bug is.
+
+ **To Reproduce**
+ Steps to reproduce the behavior:
+ 1. Go to '...'
+ 2. Click on '....'
+ 3. Scroll down to '....'
+ 4. See error
+
+ **Expected behavior**
+ A clear and concise description of what you expected to happen.
+
+ **Screenshots**
+ If applicable, add screenshots to help explain your problem.
+
+ **Desktop (please complete the following information):**
+ - OS: [e.g. iOS]
+ - Browser [e.g. chrome, safari]
+ - Version [e.g. 22]
+
+ **Smartphone (please complete the following information):**
+ - Device: [e.g. iPhone6]
+ - OS: [e.g. iOS8.1]
+ - Browser [e.g. stock browser, safari]
+ - Version [e.g. 22]
+
+ **Additional context**
+ Add any other context about the problem here.
D-FINE/.github/ISSUE_TEMPLATE/feature_request.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ name: Feature request
+ about: Suggest an idea for this project
+ title: ''
+ labels: ''
+ assignees: ''
+
+ ---
+
+ **Is your feature request related to a problem? Please describe.**
+ A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+ **Describe the solution you'd like**
+ A clear and concise description of what you want to happen.
+
+ **Describe alternatives you've considered**
+ A clear and concise description of any alternative solutions or features you've considered.
+
+ **Additional context**
+ Add any other context or screenshots about the feature request here.
D-FINE/.github/workflows/is.yml ADDED
@@ -0,0 +1,63 @@
+ name: Issue Screening
+
+ on:
+   issues:
+     types: [opened, edited]
+
+ jobs:
+   screen-issues:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v2
+
+       - name: Get details and check for keywords
+         id: issue_check
+         uses: actions/github-script@v5
+         with:
+           script: |
+             const issue = context.payload.issue;
+             const issueNumber = issue.number;
+             const title = (issue.title || "").toLowerCase();
+             const body = (issue.body || "").toLowerCase();
+             core.setOutput('number', issueNumber);
+
+             const keywords = ["spam", "badword", "inappropriate", "suspicious", "unusual", "star", "stars", "buy", "buying"];
+             let containsKeyword = false;
+
+             console.log(`Checking issue #${issueNumber} for whole word keywords...`);
+             for (const keyword of keywords) {
+               const regex = new RegExp(`\\b${keyword}\\b`);
+               if (regex.test(title) || regex.test(body)) {
+                 containsKeyword = true;
+                 console.log(`Whole word keyword '${keyword}' found in issue #${issueNumber}.`);
+                 break;
+               }
+             }
+
+             console.log(`Keyword check for issue #${issueNumber} completed. contains_keyword=${containsKeyword}`);
+             core.setOutput('contains_keyword', containsKeyword);
+
+       - name: Close and Modify Issue if it contains keywords
+         if: steps.issue_check.outputs.contains_keyword == 'true'
+         uses: actions/github-script@v5
+         with:
+           github-token: ${{ secrets.ISSUE }}
+           script: |
+             const issueNumber = ${{ steps.issue_check.outputs.number }};
+             try {
+               console.log(`Attempting to close, clear body, and rename title of issue #${issueNumber} due to keyword.`);
+               await github.rest.issues.update({
+                 owner: context.repo.owner,
+                 repo: context.repo.repo,
+                 issue_number: issueNumber,
+                 state: 'closed',
+                 title: "Cleared suspicious issues",
+                 body: ""
+               });
+               console.log(`Successfully closed, cleared body, and renamed title of issue #${issueNumber}.`);
+             } catch (error) {
+               console.error(`Failed to update issue #${issueNumber}:`, error);
+               throw error;
+             }
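
The screening step only flags whole-word matches, so for example "starting" does not trip the "star" keyword. A rough local approximation of that check (not part of the commit; assumes GNU grep, whose `-E` mode accepts `\b` word boundaries like the JavaScript regex above):

```shell
# Whole-word keyword present: flagged
echo "please give us stars" | grep -Eqi '\b(star|stars|buy|buying|spam)\b' && echo flagged
# Keyword only as a substring: not flagged
echo "starting the training run" | grep -Eqi '\b(star|stars|buy|buying|spam)\b' || echo clean
```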
D-FINE/.github/workflows/pre-commit.yml ADDED
@@ -0,0 +1,15 @@
+ name: pre-commit
+
+ on:
+   pull_request:
+     branches: [master]
+   push:
+     branches: [master]
+
+ jobs:
+   pre-commit:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+       - uses: actions/setup-python@v3
+       - uses: pre-commit/[email protected]
D-FINE/.gitignore ADDED
@@ -0,0 +1,10 @@
+ # existing entries
+ output/
+ *.pyc
+ wandb/
+ *.onnx
+ weight/dfine-s.pth
+
+ # ignore tensorRT engine files
+ *.engine
+ engines/
D-FINE/.pre-commit-config.yaml ADDED
@@ -0,0 +1,68 @@
+ # Copyright The Lightning team.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ default_language_version:
+   python: python3
+
+ ci:
+   autofix_prs: true
+   autoupdate_commit_msg: "[pre-commit.ci] pre-commit suggestions"
+   autoupdate_schedule: quarterly
+   # submodules: true
+
+ repos:
+   - repo: https://github.com/pre-commit/pre-commit-hooks
+     rev: v5.0.0
+     hooks:
+       - id: end-of-file-fixer
+       - id: trailing-whitespace
+       # - id: check-json # skip for incompatibility with .devcontainer/devcontainer.json
+       - id: check-yaml
+       - id: check-toml
+       - id: check-docstring-first
+       - id: check-executables-have-shebangs
+       - id: check-case-conflict
+       # - id: check-added-large-files
+       #   args: ["--maxkb=100", "--enforce-all"]
+       - id: detect-private-key
+
+   # - repo: https://github.com/PyCQA/docformatter
+   #   rev: v1.7.5
+   #   hooks:
+   #     - id: docformatter
+   #       additional_dependencies: [tomli]
+   #       args: ["--in-place"]
+
+   - repo: https://github.com/executablebooks/mdformat
+     rev: 0.7.17
+     hooks:
+       - id: mdformat
+         exclude: '^.*\.md$'
+         args: ["--number"]
+         additional_dependencies:
+           - mdformat-gfm
+           - mdformat-black
+           - mdformat_frontmatter
+
+   - repo: https://github.com/astral-sh/ruff-pre-commit
+     rev: v0.6.9
+     hooks:
+       # try to fix what is possible
+       - id: ruff
+         args: ["--fix", "--ignore", "E501,F401,F403,F841,E741"]
+       # # perform formatting updates
+       # - id: ruff-format
+       # validate if all is fine with preview mode
+       - id: ruff
+         args: ["--ignore", "E501,F401,F403,F841,E741"]
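
With this configuration in place, the same hooks that pre-commit.ci applies to pull requests can be run locally before pushing; a typical sequence (not part of the commit) is:

```shell
pip install pre-commit        # one-time install
pre-commit install            # register the git hook in this clone
pre-commit run --all-files    # run end-of-file-fixer, check-yaml, mdformat, ruff, etc. on the whole tree
```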
D-FINE/Dockerfile ADDED
@@ -0,0 +1,48 @@
+ FROM registry.cn-hangzhou.aliyuncs.com/peterande/dfine:v1
+
+ # FULL BUILDING INFO:
+
+ # docker login --username=xxx registry.cn-hangzhou.aliyuncs.com
+ # cd [PATH_2_Dockerfile]
+ # docker build -t xxx:v1 .
+ # docker tag xxx:v1 registry.cn-hangzhou.aliyuncs.com/xxx/xxx:v1
+ # docker push registry.cn-hangzhou.aliyuncs.com/xxx/xxx:v1
+
+ # FROM dockerpull.com/nvidia/cuda:12.0.1-cudnn8-devel-ubuntu18.04
+ # ARG DEBIAN_FRONTEND=noninteractive
+ # ENV PATH="/root/miniconda3/bin:${PATH}"
+ # ARG PATH="/root/miniconda3/bin:${PATH}"
+
+ # RUN sed -i "s/archive.ubuntu./mirrors.aliyun./g" /etc/apt/sources.list
+ # RUN sed -i "s/deb.debian.org/mirrors.aliyun.com/g" /etc/apt/sources.list
+ # RUN sed -i "s/security.debian.org/mirrors.aliyun.com\/debian-security/g" /etc/apt/sources.list
+ # RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list
+
+ # RUN apt-get update && apt-get install -y --no-install-recommends apt-utils && \
+ #     apt-get upgrade -y && \
+ #     apt-get install -y vim git libgl1-mesa-glx libglib2.0-0 libsm6 && \
+ #     apt-get install -y libxrender1 libxext6 tmux wget htop && \
+ #     apt-get install -y build-essential gcc g++ gdb binutils pciutils net-tools iputils-ping iproute2 git vim wget curl make openssh-server openssh-client tmux tree man unzip unrar
+
+ # ENV PYTHONIOENCODING=UTF-8
+
+ # RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+ #     mkdir /root/.conda && \
+ #     bash Miniconda3-latest-Linux-x86_64.sh -b && \
+ #     rm -f Miniconda3-latest-Linux-x86_64.sh && \
+ #     conda init bash
+
+ # RUN conda config --set show_channel_urls yes \
+ #     && echo "channels:" > ~/.condarc \
+ #     && echo "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/" >> ~/.condarc \
+ #     && echo "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/" >> ~/.condarc \
+ #     && echo "show_channel_urls: true" \
+ #     && cat ~/.condarc \
+ #     && pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple \
+ #     && cat ~/.config/pip/pip.conf
+
+ # RUN python3 -m pip install --upgrade pip && \
+ #     python3 -m pip install --upgrade setuptools
+
+ # RUN python3 -m pip install jupyterlab pycocotools PyYAML tensorboard scipy
+ # RUN python3 -m pip --default-timeout=10000 install torch torchvision
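
Since everything below the `FROM` line is commented out, the image is consumed as-is from the registry above. A minimal build-and-run sketch (not part of the commit; the `dfine:dev` tag and `/workspace` mount are illustrative, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host):

```shell
docker build -t dfine:dev .                                        # builds a thin layer on top of the published base image
docker run --gpus all -it --rm -v "$PWD":/workspace dfine:dev bash # interactive shell with the repo mounted
```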
D-FINE/LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
D-FINE/README.md ADDED
@@ -0,0 +1,700 @@
1
+ <!--# [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->
2
+
3
+ English | [简体中文](README_cn.md) | [日本語](README_ja.md) | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)
4
+
5
+ <h2 align="center">
6
+ D-FINE: Redefine Regression Task of DETRs as Fine&#8209;grained&nbsp;Distribution&nbsp;Refinement
7
+ </h2>
8
+
9
+
10
+
11
+ <p align="center">
12
+ <a href="https://huggingface.co/spaces/developer0hye/D-FINE">
13
+ <img alt="hf" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
14
+ </a>
15
+ <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
16
+ <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
17
+ </a>
18
+ <a href="https://github.com/Peterande/D-FINE/pulls">
19
+ <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
20
+ </a>
21
+ <a href="https://github.com/Peterande/D-FINE/issues">
22
+ <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
23
+ </a>
24
+ <a href="https://arxiv.org/abs/2410.13842">
25
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
26
+ </a>
27
+ <!-- <a href="mailto: [email protected]">
28
+ <img alt="email" src="https://img.shields.io/badge/contact_me-email-yellow">
29
+ </a> -->
30
+ <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
31
+ <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
32
+ </a>
33
+ <a href="https://github.com/Peterande/D-FINE">
34
+ <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
35
+ </a>
36
+ </p>
37
+
38
+
39
+
40
+ <p align="center">
41
+ 📄 This is the official implementation of the paper:
42
+ <br>
43
+ <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
44
+ </p>
45
+
46
+
47
+
48
+ <p align="center">
49
+ Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu
50
+ </p>
51
+
52
+ <p align="center">
53
+ University of Science and Technology of China
54
+ </p>
55
+
56
+ <p align="center">
57
+ <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
58
+ <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
59
+ </a>
60
+ </p>
61
+
62
+ <!-- <table><tr>
63
+ <td><img src=https://github.com/Peterande/storage/blob/master/latency.png border=0 width=333></td>
64
+ <td><img src=https://github.com/Peterande/storage/blob/master/params.png border=0 width=333></td>
65
+ <td><img src=https://github.com/Peterande/storage/blob/master/flops.png border=0 width=333></td>
66
+ </tr></table> -->
67
+
68
+ <p align="center">
69
+ <strong>If you like D-FINE, please give us a ⭐! Your support motivates us to keep improving!</strong>
70
+ </p>
71
+
72
+ <p align="center">
73
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
74
+ </p>
75
+
76
+ D-FINE is a powerful real-time object detector that redefines the bounding box regression task in DETRs as Fine-grained Distribution Refinement (FDR) and introduces Global Optimal Localization Self-Distillation (GO-LSD), achieving outstanding performance without introducing additional inference and training costs.
77
+
78
+ <details open>
79
+ <summary> Video </summary>
80
+
81
+ We conduct object detection using D-FINE and YOLO11 on a complex street scene video from [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A). Despite challenging conditions such as backlighting, motion blur, and dense crowds, D-FINE-X successfully detects nearly all targets, including subtle small objects like backpacks, bicycles, and traffic lights. Its confidence scores and the localization precision for blurred edges are significantly higher than those of YOLO11.
82
+
83
+ <!-- We use D-FINE and YOLO11 on a street scene video from [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A). Despite challenges like backlighting, motion blur, and dense crowds, D-FINE-X outperforms YOLO11x, detecting more objects with higher confidence and better precision. -->
84
+
85
+ https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9
86
+
87
+ </details>
88
+
89
+ ## 🚀 Updates
90
+ - [x] **\[2024.10.18\]** Release D-FINE series.
91
+ - [x] **\[2024.10.25\]** Add custom dataset finetuning configs ([#7](https://github.com/Peterande/D-FINE/issues/7)).
92
+ - [x] **\[2024.10.30\]** Update D-FINE-L (E25) pretrained model, with performance improved by 2.0%.
93
+ - [x] **\[2024.11.07\]** Release **D-FINE-N**, achiving 42.8% AP<sup>val</sup> on COCO @ 472 FPS<sup>T4</sup>!
94
+
95
+ ## Model Zoo
96
+
97
+ ### COCO
98
+ | Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
99
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
100
+ **D&#8209;FINE&#8209;N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
101
+ **D&#8209;FINE&#8209;S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
102
+ **D&#8209;FINE&#8209;M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
103
+ **D&#8209;FINE&#8209;L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
104
+ **D&#8209;FINE&#8209;X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)
105
+
106
+
107
+ ### Objects365+COCO
108
+ | Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
109
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
110
+ **D&#8209;FINE&#8209;S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
111
+ **D&#8209;FINE&#8209;M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
112
+ **D&#8209;FINE&#8209;L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
113
+ **D&#8209;FINE&#8209;X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)
114
+
115
+ **We highly recommend that you use the Objects365 pre-trained model for fine-tuning:**
116
+
117
+ ⚠️ **Important**: Please note that this is generally beneficial for complex scene understanding. If your categories are very simple, it might lead to overfitting and suboptimal performance.
118
+ <details>
119
+ <summary><strong> 🔥 Pretrained Models on Objects365 (Best generalization) </strong></summary>
120
+
121
+ | Model | Dataset | AP<sup>val</sup> | AP<sup>5000</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
122
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
123
+ **D&#8209;FINE&#8209;S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
124
+ **D&#8209;FINE&#8209;M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
125
+ **D&#8209;FINE&#8209;L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
126
+ **D&#8209;FINE&#8209;L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
127
+ **D&#8209;FINE&#8209;X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
128
+ - **E25**: Re-trained and extended the pretraining to 25 epochs.
129
+ - **AP<sup>val</sup>** is evaluated on *Objects365* full validation set.
130
+ - **AP<sup>5000</sup>** is evaluated on the first 5000 samples of the *Objects365* validation set.
131
+ </details>
132
+
133
+ **Notes:**
134
+ - **AP<sup>val</sup>** is evaluated on *MSCOCO val2017* dataset.
135
+ - **Latency** is evaluated on a single T4 GPU with $batch\\_size = 1$, $fp16$, and $TensorRT==10.4.0$.
136
+ - **Objects365+COCO** means finetuned model on *COCO* using pretrained weights trained on *Objects365*.
137
+
138
+
139
+
140
+ ## Quick start
141
+
142
+ ### Setup
143
+
144
+ ```shell
145
+ conda create -n dfine python=3.11.9
146
+ conda activate dfine
147
+ pip install -r requirements.txt
148
+ ```
149
+
150
+
151
+ ### Data Preparation
152
+
153
+ <details>
154
+ <summary> COCO2017 Dataset </summary>
155
+
156
+ 1. Download COCO2017 from [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) or [COCO](https://cocodataset.org/#download).
157
+ 1. Modify paths in [coco_detection.yml](./configs/dataset/coco_detection.yml)
158
+
159
+ ```yaml
160
+ train_dataloader:
161
+ img_folder: /data/COCO2017/train2017/
162
+ ann_file: /data/COCO2017/annotations/instances_train2017.json
163
+ val_dataloader:
164
+ img_folder: /data/COCO2017/val2017/
165
+ ann_file: /data/COCO2017/annotations/instances_val2017.json
166
+ ```
167
+
168
+ </details>
169
+
170
+ <details>
171
+ <summary> Objects365 Dataset </summary>
172
+
173
+ 1. Download Objects365 from [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365).
174
+
175
+ 2. Set the Base Directory:
176
+ ```shell
177
+ export BASE_DIR=/data/Objects365/data
178
+ ```
179
+
180
+ 3. Extract and organize the downloaded files, resulting directory structure:
181
+
182
+ ```shell
183
+ ${BASE_DIR}/train
184
+ ├── images
185
+ │ ├── v1
186
+ │ │ ├── patch0
187
+ │ │ │ ├── 000000000.jpg
188
+ │ │ │ ├── 000000001.jpg
189
+ │ │ │ └── ... (more images)
190
+ │ ├── v2
191
+ │ │ ├── patchx
192
+ │ │ │ ├── 000000000.jpg
193
+ │ │ │ ├── 000000001.jpg
194
+ │ │ │ └── ... (more images)
195
+ ├── zhiyuan_objv2_train.json
196
+ ```
197
+
198
+ ```shell
199
+ ${BASE_DIR}/val
200
+ ├── images
201
+ │ ├── v1
202
+ │ │ ├── patch0
203
+ │ │ │ ├── 000000000.jpg
204
+ │ │ │ └── ... (more images)
205
+ │ ├── v2
206
+ │ │ ├── patchx
207
+ │ │ │ ├── 000000000.jpg
208
+ │ │ │ └── ... (more images)
209
+ ├── zhiyuan_objv2_val.json
210
+ ```
211
+
212
+ 4. Create a New Directory to Store Images from the Validation Set:
213
+ ```shell
214
+ mkdir -p ${BASE_DIR}/train/images_from_val
215
+ ```
216
+
217
+ 5. Copy the v1 and v2 folders from the val directory into the train/images_from_val directory
218
+ ```shell
219
+ cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
220
+ cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
221
+ ```
222
+
223
+ 6. Run remap_obj365.py to merge a subset of the validation set into the training set. Specifically, this script moves samples with indices between 5000 and 800000 from the validation set to the training set.
224
+ ```shell
225
+ python tools/remap_obj365.py --base_dir ${BASE_DIR}
226
+ ```
227
+
228
+
229
+ 7. Run the resize_obj365.py script to resize any images in the dataset where the maximum edge length exceeds 640 pixels. Use the updated JSON file generated in Step 5 to process the sample data. Ensure that you resize images in both the train and val datasets to maintain consistency.
230
+ ```shell
231
+ python tools/resize_obj365.py --base_dir ${BASE_DIR}
232
+ ```
233
+
234
+ 8. Modify paths in [obj365_detection.yml](./configs/dataset/obj365_detection.yml)
235
+
236
+ ```yaml
237
+ train_dataloader:
238
+ img_folder: /data/Objects365/data/train
239
+ ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
240
+ val_dataloader:
241
+ img_folder: /data/Objects365/data/val/
242
+ ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
243
+ ```
244
+
245
+
246
+ </details>
247
+
248
+ <details>
249
+ <summary>CrowdHuman</summary>
250
+
251
+ Download COCO format dataset here: [url](https://aistudio.baidu.com/datasetdetail/231455)
252
+
253
+ </details>
254
+
255
+ <details>
256
+ <summary>Custom Dataset</summary>
257
+
258
+ To train on your custom dataset, you need to organize it in the COCO format. Follow the steps below to prepare your dataset:
259
+
260
+ 1. **Set `remap_mscoco_category` to `False`:**
261
+
262
+ This prevents the automatic remapping of category IDs to match the MSCOCO categories.
263
+
264
+ ```yaml
265
+ remap_mscoco_category: False
266
+ ```
267
+
268
+ 2. **Organize Images:**
269
+
270
+ Structure your dataset directories as follows:
271
+
272
+ ```shell
273
+ dataset/
274
+ ├── images/
275
+ │ ├── train/
276
+ │ │ ├── image1.jpg
277
+ │ │ ├── image2.jpg
278
+ │ │ └── ...
279
+ │ ├── val/
280
+ │ │ ├── image1.jpg
281
+ │ │ ├── image2.jpg
282
+ │ │ └── ...
283
+ └── annotations/
284
+ ├── instances_train.json
285
+ ├── instances_val.json
286
+ └── ...
287
+ ```
288
+
289
+ - **`images/train/`**: Contains all training images.
290
+ - **`images/val/`**: Contains all validation images.
291
+ - **`annotations/`**: Contains COCO-formatted annotation files.
292
+
293
+ 3. **Convert Annotations to COCO Format:**
294
+
295
+ If your annotations are not already in COCO format, you'll need to convert them. You can use the following Python script as a reference or utilize existing tools:
296
+
297
+ ```python
298
+ import json
299
+
300
+ def convert_to_coco(input_annotations, output_annotations):
301
+ # Implement conversion logic here
302
+ pass
303
+
304
+ if __name__ == "__main__":
305
+ convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
306
+ ```
307
+
308
+ 4. **Update Configuration Files:**
309
+
310
+ Modify your [custom_detection.yml](./configs/dataset/custom_detection.yml).
311
+
312
+ ```yaml
313
+ task: detection
314
+
315
+ evaluator:
316
+ type: CocoEvaluator
317
+ iou_types: ['bbox', ]
318
+
319
+ num_classes: 777 # your dataset classes
320
+ remap_mscoco_category: False
321
+
322
+ train_dataloader:
323
+ type: DataLoader
324
+ dataset:
325
+ type: CocoDetection
326
+ img_folder: /data/yourdataset/train
327
+ ann_file: /data/yourdataset/train/train.json
328
+ return_masks: False
329
+ transforms:
330
+ type: Compose
331
+ ops: ~
332
+ shuffle: True
333
+ num_workers: 4
334
+ drop_last: True
335
+ collate_fn:
336
+ type: BatchImageCollateFunction
337
+
338
+ val_dataloader:
339
+ type: DataLoader
340
+ dataset:
341
+ type: CocoDetection
342
+ img_folder: /data/yourdataset/val
343
+ ann_file: /data/yourdataset/val/ann.json
344
+ return_masks: False
345
+ transforms:
346
+ type: Compose
347
+ ops: ~
348
+ shuffle: False
349
+ num_workers: 4
350
+ drop_last: False
351
+ collate_fn:
352
+ type: BatchImageCollateFunction
353
+ ```
354
+
355
+ </details>
356
+
357
+
358
+ ## Usage
359
+ <details open>
360
+ <summary> COCO2017 </summary>
361
+
362
+ <!-- <summary>1. Training </summary> -->
363
+ 1. Set Model
364
+ ```shell
365
+ export model=l # n s m l x
366
+ ```
367
+
368
+ 2. Training
369
+ ```shell
370
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
371
+ ```
372
+
373
+ <!-- <summary>2. Testing </summary> -->
374
+ 3. Testing
375
+ ```shell
376
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
377
+ ```
378
+
379
+ <!-- <summary>3. Tuning </summary> -->
380
+ 4. Tuning
381
+ ```shell
382
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
383
+ ```
384
+ </details>
385
+
386
+
387
+ <details>
388
+ <summary> Objects365 to COCO2017 </summary>
389
+
390
+ 1. Set Model
391
+ ```shell
392
+ export model=l # n s m l x
393
+ ```
394
+
395
+ 2. Training on Objects365
396
+ ```shell
397
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
398
+ ```
399
+
400
+ 3. Tuning on COCO2017
401
+ ```shell
402
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
403
+ ```
404
+
405
+ <!-- <summary>2. Testing </summary> -->
406
+ 4. Testing
407
+ ```shell
408
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
409
+ ```
410
+ </details>
411
+
412
+
413
+ <details>
414
+ <summary> Custom Dataset </summary>
415
+
416
+ 1. Set Model
417
+ ```shell
418
+ export model=l # n s m l x
419
+ ```
420
+
421
+ 2. Training on Custom Dataset
422
+ ```shell
423
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
424
+ ```
425
+ <!-- <summary>2. Testing </summary> -->
426
+ 3. Testing
427
+ ```shell
428
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
429
+ ```
430
+
431
+ 4. Tuning on Custom Dataset
432
+ ```shell
433
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
434
+ ```
435
+
436
+ 5. **[Optional]** Modify Class Mappings:
437
+
438
+ When using the Objects365 pre-trained weights to train on your custom dataset, the example assumes that your dataset only contains the classes `'Person'` and `'Car'`. For faster convergence, you can modify `self.obj365_ids` in `src/solver/_solver.py` as follows:
439
+
440
+
441
+ ```python
442
+ self.obj365_ids = [0, 5] # Person, Cars
443
+ ```
444
+ You can replace these with any corresponding classes from your dataset. The list of Objects365 classes with their corresponding IDs:
445
+ https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330
446
+
447
+ New training command:
448
+
449
+ ```shell
450
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
451
+ ```
452
+
453
+ However, if you don't wish to modify the class mappings, the pre-trained Objects365 weights will still work without any changes. Modifying the class mappings is optional and can potentially accelerate convergence for specific tasks.
454
+
455
+
456
+
457
+ </details>
458
+
459
+ <details>
460
+ <summary> Customizing Batch Size </summary>
461
+
462
+ For example, if you want to double the total batch size when training D-FINE-L on COCO2017, here are the steps you should follow:
463
+
464
+ 1. **Modify your [dataloader.yml](./configs/dfine/include/dataloader.yml)** to increase the `total_batch_size`:
465
+
466
+ ```yaml
467
+ train_dataloader:
468
+ total_batch_size: 64 # Previously it was 32, now doubled
469
+ ```
470
+
471
+ 2. **Modify your [dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml)**. Here’s how the key parameters should be adjusted:
472
+
473
+ ```yaml
474
+ optimizer:
475
+ type: AdamW
476
+ params:
477
+ -
478
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
479
+ lr: 0.000025 # doubled, linear scaling law
480
+ -
481
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
482
+ weight_decay: 0.
483
+
484
+ lr: 0.0005 # doubled, linear scaling law
485
+ betas: [0.9, 0.999]
486
+ weight_decay: 0.0001 # need a grid search
487
+
488
+ ema: # added EMA settings
489
+ decay: 0.9998 # adjusted by 1 - (1 - decay) * 2
490
+ warmups: 500 # halved
491
+
492
+ lr_warmup_scheduler:
493
+ warmup_duration: 250 # halved
494
+ ```
495
+
496
+ </details>
497
+
498
+
499
+ <details>
500
+ <summary> Customizing Input Size </summary>
501
+
502
+ If you'd like to train **D-FINE-L** on COCO2017 with an input size of 320x320, follow these steps:
503
+
504
+ 1. **Modify your [dataloader.yml](./configs/dfine/include/dataloader.yml)**:
505
+
506
+ ```yaml
507
+
508
+ train_dataloader:
509
+ dataset:
510
+ transforms:
511
+ ops:
512
+ - {type: Resize, size: [320, 320], }
513
+ collate_fn:
514
+ base_size: 320
515
+ dataset:
516
+ transforms:
517
+ ops:
518
+ - {type: Resize, size: [320, 320], }
519
+ ```
520
+
521
+ 2. **Modify your [dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml)**:
522
+
523
+ ```yaml
524
+ eval_spatial_size: [320, 320]
525
+ ```
526
+
527
+ </details>
528
+
529
+ ## Tools
530
+ <details>
531
+ <summary> Deployment </summary>
532
+
533
+ <!-- <summary>4. Export onnx </summary> -->
534
+ 1. Setup
535
+ ```shell
536
+ pip install onnx onnxsim
537
+ export model=l # n s m l x
538
+ ```
539
+
540
+ 2. Export onnx
541
+ ```shell
542
+ python tools/deployment/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
543
+ ```
544
+
545
+ 3. Export [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
546
+ ```shell
547
+ trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
548
+ ```
549
+
550
+ </details>
551
+
552
+ <details>
553
+ <summary> Inference (Visualization) </summary>
554
+
555
+
556
+ 1. Setup
557
+ ```shell
558
+ pip install -r tools/inference/requirements.txt
559
+ export model=l # n s m l x
560
+ ```
561
+
562
+
563
+ <!-- <summary>5. Inference </summary> -->
564
+ 2. Inference (onnxruntime / tensorrt / torch)
565
+
566
+ Inference on images and videos is now supported.
567
+ ```shell
568
+ python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg # video.mp4
569
+ python tools/inference/trt_inf.py --trt model.engine --input image.jpg
570
+ python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
571
+ ```
572
+ </details>
573
+
574
+ <details>
575
+ <summary> Benchmark </summary>
576
+
577
+ 1. Setup
578
+ ```shell
579
+ pip install -r tools/benchmark/requirements.txt
580
+ export model=l # n s m l x
581
+ ```
582
+
583
+ <!-- <summary>6. Benchmark </summary> -->
584
+ 2. Model FLOPs, MACs, and Params
585
+ ```shell
586
+ python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
587
+ ```
588
+
589
+ 2. TensorRT Latency
590
+ ```shell
591
+ python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
592
+ ```
593
+ </details>
594
+
595
+ <details>
596
+ <summary> Fiftyone Visualization </summary>
597
+
598
+ 1. Setup
599
+ ```shell
600
+ pip install fiftyone
601
+ export model=l # n s m l x
602
+ ```
603
+ 4. Voxel51 Fiftyone Visualization ([fiftyone](https://github.com/voxel51/fiftyone))
604
+ ```shell
605
+ python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
606
+ ```
607
+ </details>
608
+
609
+ <details>
610
+ <summary> Others </summary>
611
+
612
+ 1. Auto Resume Training
613
+ ```shell
614
+ bash reference/safe_training.sh
615
+ ```
616
+
617
+ 2. Converting Model Weights
618
+ ```shell
619
+ python reference/convert_weight.py model.pth
620
+ ```
621
+ </details>
622
+
623
+ ## Figures and Visualizations
624
+
625
+ <details>
626
+ <summary> FDR and GO-LSD </summary>
627
+
628
+ 1. Overview of D-FINE with FDR. The probability distributions that act as a more fine-
629
+ grained intermediate representation are iteratively refined by the decoder layers in a residual manner.
630
+ Non-uniform weighting functions are applied to allow for finer localization.
631
+
632
+ <p align="center">
633
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="Fine-grained Distribution Refinement Process" width="1000">
634
+ </p>
635
+
636
+ 2. Overview of GO-LSD process. Localization knowledge from the final layer’s refined
637
+ distributions is distilled into earlier layers through DDF loss with decoupled weighting strategies.
638
+
639
+ <p align="center">
640
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSD Process" width="1000">
641
+ </p>
642
+
643
+ </details>
644
+
645
+ <details open>
646
+ <summary> Distributions </summary>
647
+
648
+ Visualizations of FDR across detection scenarios with initial and refined bounding boxes, along with unweighted and weighted distributions.
649
+
650
+ <p align="center">
651
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
652
+ </p>
653
+
654
+ </details>
655
+
656
+ <details>
657
+ <summary> Hard Cases </summary>
658
+
659
+ The following visualization demonstrates D-FINE's predictions in various complex detection scenarios. These include cases with occlusion, low-light conditions, motion blur, depth of field effects, and densely populated scenes. Despite these challenges, D-FINE consistently produces accurate localization results.
660
+
661
+ <p align="center">
662
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="D-FINE Predictions in Challenging Scenarios" width="1000">
663
+ </p>
664
+
665
+ </details>
666
+
667
+
668
+ <!-- <div style="display: flex; flex-wrap: wrap; justify-content: center; margin: 0; padding: 0;">
669
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" style="width:99.96%; margin: 0; padding: 0;" />
670
+ </div>
671
+
672
+ <table><tr>
673
+ <td><img src=https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg border=0 width=1000></td>
674
+ </tr></table> -->
675
+
676
+
677
+
678
+
679
+ ## Citation
680
+ If you use `D-FINE` or its methods in your work, please cite the following BibTeX entries:
681
+ <details open>
682
+ <summary> bibtex </summary>
683
+
684
+ ```latex
685
+ @misc{peng2024dfine,
686
+ title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
687
+ author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
688
+ year={2024},
689
+ eprint={2410.13842},
690
+ archivePrefix={arXiv},
691
+ primaryClass={cs.CV}
692
+ }
693
+ ```
694
+ </details>
695
+
696
+ ## Acknowledgement
697
+ Our work is built upon [RT-DETR](https://github.com/lyuwenyu/RT-DETR).
698
+ Thanks to the inspirations from [RT-DETR](https://github.com/lyuwenyu/RT-DETR), [GFocal](https://github.com/implus/GFocal), [LD](https://github.com/HikariTJU/LD), and [YOLOv9](https://github.com/WongKinYiu/yolov9).
699
+
700
+ ✨ Feel free to contribute and reach out if you have any questions! ✨
D-FINE/README_cn.md ADDED
@@ -0,0 +1,673 @@
1
+ <!--# [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->
2
+
3
+ [English](README.md) | 简体中文 | [日本語](README_ja.md) | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)
4
+
5
+ <h2 align="center">
6
+ D-FINE: Redefine Regression Task of DETRs as Fine&#8209;grained&nbsp;Distribution&nbsp;Refinement
7
+ </h2>
8
+
9
+ <p align="center">
10
+ <!-- <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
11
+ <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
12
+ </a> -->
13
+ <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
14
+ <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
15
+ </a>
16
+ <a href="https://github.com/Peterande/D-FINE/pulls">
17
+ <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
18
+ </a>
19
+ <a href="https://github.com/Peterande/D-FINE/issues">
20
+ <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
21
+ </a>
22
+ <a href="https://arxiv.org/abs/2410.13842">
23
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
24
+ </a>
25
+ <!-- <a href="mailto: [email protected]">
26
+ <img alt="email" src="https://img.shields.io/badge/contact_me-email-yellow">
27
+ </a> -->
28
+ <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
29
+ <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
30
+ </a>
31
+ <a href="https://github.com/Peterande/D-FINE">
32
+ <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
33
+ </a>
34
+ </p>
35
+
36
+ <p align="center">
37
+ 📄 这是该文章的官方实现:
38
+ <br>
39
+ <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
40
+ </p>
41
+
42
+
43
+ <p align="center">
44
+ 彭岩松,李和倍,吴沛熹,张越一,孙晓艳,吴枫
45
+ </p>
46
+
47
+ <p align="center">
48
+ 中国科学技术大学
49
+ </p>
50
+
51
+ <p align="center">
52
+ <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
53
+ <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
54
+ </a>
55
+ </p>
56
+
57
+
58
+ <p align="center"> <strong>如果你喜欢 D-FINE,请给我们一个 ⭐!你的支持激励我们不断前进!</strong> </p>
59
+
60
+ <p align="center">
61
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
62
+ </p>
63
+
64
+ D-FINE 是一个强大的实时目标检测器,将 DETR 中的边界框回归任务重新定义为了细粒度的分布优化(FDR),并引入全局最优的定位自蒸馏(GO-LSD),在不增加额外推理和训练成本的情况下,实现了卓越的性能。
65
+
66
+ <details open>
67
+ <summary> 视频 </summary>
68
+
69
+ 我们分别使用 D-FINE 和 YOLO11 对 [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A) 上的一段复杂街景视频进行了目标检测。尽管存在逆光、虚化模糊和密集遮挡等不利因素,D-FINE-X 依然成功检测出几乎所有目标,包括背包、自行车和信号灯等难以察觉的小目标,其置信度、以及模糊边缘的定位准确度明显高于 YOLO11x。
70
+
71
+ https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9
72
+
73
+ </details>
74
+
75
+ ## 🚀 Updates
76
+ - [x] **\[2024.10.18\]** 发布 D-FINE 系列。
77
+ - [x] **\[2024.10.25\]** 添加了自定义数据集微调配置文件 ([#7](https://github.com/Peterande/D-FINE/issues/7))。
78
+ - [x] **\[2024.10.30\]** 更新 D-FINE-L (E25) 预训练模型,性能提升了 2.0%。
79
+ - [x] **\[2024.11.07\]** 发布 **D-FINE-N**, 在 COCO 上达到 42.8% AP<sup>val</sup> @ 472 FPS<sup>T4</sup>!
80
+
81
+ ## 模型库
82
+
83
+ ### COCO
84
+ | 模型 | 数据集 | AP<sup>val</sup> | 参数量 | 时延 (ms) | GFLOPs | 配置 | 权重 | 日志 |
85
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
86
+ **D&#8209;FINE&#8209;N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
87
+ **D&#8209;FINE&#8209;S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
88
+ **D&#8209;FINE&#8209;M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
89
+ **D&#8209;FINE&#8209;L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
90
+ **D&#8209;FINE&#8209;X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)
91
+
92
+ ### Objects365+COCO
93
+ | 模型 | 数据集 | AP<sup>val</sup> | 参数量 | 时延 (ms) | GFLOPs | 配置 | 权重 | 日志 |
94
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
95
+ **D&#8209;FINE&#8209;S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
96
+ **D&#8209;FINE&#8209;M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
97
+ **D&#8209;FINE&#8209;L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
98
+ **D&#8209;FINE&#8209;X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)
99
+
100
+ **我们强烈推荐您使用 Objects365 预训练模型进行微调:**
101
+
102
+ ⚠️ 重要提醒:通常这种预训练模型对复杂场景的理解非常有用。如果您的类别非常简单,请注意,这可能会导致过拟合和次优性能。
103
+
104
+ <details> <summary><strong> 🔥 Objects365 预训练模型(泛化性最好)</strong></summary>
105
+
106
+ | 模型 | 数据集 | AP<sup>val</sup> | AP<sup>5000</sup> | 参数量 | 时延 (ms) | GFLOPs | 配置 | 权重 | 日志 |
107
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
108
+ **D&#8209;FINE&#8209;S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
109
+ **D&#8209;FINE&#8209;M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
110
+ **D&#8209;FINE&#8209;L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
111
+ **D&#8209;FINE&#8209;L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
112
+ **D&#8209;FINE&#8209;X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
113
+ - **E25**: 重新训练,并将训练延长至 25 个 epoch。
114
+ - **AP<sup>val</sup>** 是在 *Objects365* 完整的验证集上进行评估的。
115
+ - **AP<sup>5000</sup>** 是在 *Objects365* 验证集的前5000个样本上评估的。
116
+ </details>
117
+
118
+ **注意:**
119
+ - **AP<sup>val</sup>** 是在 *MSCOCO val2017* 数据集上评估的。
120
+ - **时延** 是在单张 T4 GPU 上以 $batch\\_size = 1$, $fp16$, 和 $TensorRT==10.4.0$ 评估的。
121
+ - **Objects365+COCO** 表示使用在 *Objects365* 上预训练的权重在 *COCO* 上微调的模型。
122
+
123
+
124
+
125
+ ## 快速开始
126
+
127
+ ### 设置
128
+
129
+ ```shell
130
+ conda create -n dfine python=3.11.9
131
+ conda activate dfine
132
+ pip install -r requirements.txt
133
+ ```
134
+
135
+ </details>
136
+
137
+
138
+
139
+ ### 数据集准备
140
+
141
+
142
+ <details>
143
+
144
+ <summary> COCO2017 数据集 </summary>
145
+
146
+ 1. 从 [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) 或者 [COCO](https://cocodataset.org/#download) 下载 COCO2017。
147
+ 1. 修改 [coco_detection.yml](./configs/dataset/coco_detection.yml) 中的路径。
148
+
149
+ ```yaml
150
+ train_dataloader:
151
+ img_folder: /data/COCO2017/train2017/
152
+ ann_file: /data/COCO2017/annotations/instances_train2017.json
153
+ val_dataloader:
154
+ img_folder: /data/COCO2017/val2017/
155
+ ann_file: /data/COCO2017/annotations/instances_val2017.json
156
+ ```
157
+
158
+ </details>
159
+
160
+ <details>
161
+ <summary> Objects365 数据集 </summary>
162
+
163
+ 1. 从 [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365) 下载 Objects365。
164
+
165
+ 2. 设置数据集的基础目录:
166
+ ```shell
167
+ export BASE_DIR=/data/Objects365/data
168
+ ```
169
+
170
+ 3. 解压并整理目录结构如下:
171
+
172
+ ```shell
173
+ ${BASE_DIR}/train
174
+ ├── images
175
+ │ ├── v1
176
+ │ │ ├── patch0
177
+ │ │ │ ├── 000000000.jpg
178
+ │ │ │ ├── 000000001.jpg
179
+ │ │ │ └── ... (more images)
180
+ │ ├── v2
181
+ │ │ ├── patchx
182
+ │ │ │ ├── 000000000.jpg
183
+ │ │ │ ├── 000000001.jpg
184
+ │ │ │ └── ... (more images)
185
+ ├── zhiyuan_objv2_train.json
186
+ ```
187
+
188
+ ```shell
189
+ ${BASE_DIR}/val
190
+ ├── images
191
+ │ ├── v1
192
+ │ │ ├── patch0
193
+ │ │ │ ├── 000000000.jpg
194
+ │ │ │ └── ... (more images)
195
+ │ ├── v2
196
+ │ │ ├── patchx
197
+ │ │ │ ├── 000000000.jpg
198
+ │ │ │ └── ... (more images)
199
+ ├── zhiyuan_objv2_val.json
200
+ ```
201
+
202
+
203
+ 4. 创建一个新目录来存储验证集中的图像:
204
+ ```shell
205
+ mkdir -p ${BASE_DIR}/train/images_from_val
206
+ ```
207
+
208
+ 5. 将 val 目录中的 v1 和 v2 文件夹复制到 train/images_from_val 目录中
209
+ ```shell
210
+ cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
211
+ cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
212
+ ```
213
+
214
+
215
+ 6. 运行 remap_obj365.py 将验证集中的部分样本合并到训练集中。具体来说,该脚本将索引在 5000 到 800000 之间的样本从验证集移动到训练集。
216
+ ```shell
217
+ python tools/remap_obj365.py --base_dir ${BASE_DIR}
218
+ ```
219
+
220
+
221
+ 7. 运行 resize_obj365.py 脚本,将数据集中任何最大边长超过 640 像素的图像进行大小调整。使用步骤 6 中生成的更新后的 JSON 文件处理样本数据。
222
+ ```shell
223
+ python tools/resize_obj365.py --base_dir ${BASE_DIR}
224
+ ```
225
+
226
+ 8. 修改 [obj365_detection.yml](./configs/dataset/obj365_detection.yml) 中的路径。
227
+
228
+ ```yaml
229
+ train_dataloader:
230
+ img_folder: /data/Objects365/data/train
231
+ ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
232
+ val_dataloader:
233
+ img_folder: /data/Objects365/data/val/
234
+ ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
235
+ ```
236
+
237
+
238
+ </details>
239
+
240
+ <details>
241
+ <summary>CrowdHuman</summary>
242
+
243
+ 在此下载 COCO 格式的数据集:[链接](https://aistudio.baidu.com/datasetdetail/231455)
244
+
245
+ </details>
246
+
247
+ <details>
248
+ <summary>自定义数据集</summary>
249
+
250
+ 要在你的自定义数据集上训练,你需要将其组织为 COCO 格式。请按照以下步骤准备你的数据集:
251
+
252
+ 1. **将 `remap_mscoco_category` 设置为 `False`:**
253
+
254
+ 这可以防止类别 ID 自动映射以匹配 MSCOCO 类别。
255
+
256
+ ```yaml
257
+ remap_mscoco_category: False
258
+ ```
259
+
260
+ 2. **组织图像:**
261
+
262
+ 按以下结构组织你的数据集目录:
263
+
264
+ ```shell
265
+ dataset/
266
+ ├── images/
267
+ │ ├── train/
268
+ │ │ ├── image1.jpg
269
+ │ │ ├── image2.jpg
270
+ │ │ └── ...
271
+ │ ├── val/
272
+ │ │ ├── image1.jpg
273
+ │ │ ├── image2.jpg
274
+ │ │ └── ...
275
+ └── annotations/
276
+ ├── instances_train.json
277
+ ├── instances_val.json
278
+ └── ...
279
+ ```
280
+
281
+ - **`images/train/`**: 包含所有训练图像。
282
+ - **`images/val/`**: 包含所有验证图像。
283
+ - **`annotations/`**: 包含 COCO 格式的注释文件。
284
+
285
+ 3. **将注释转换为 COCO 格式:**
286
+
287
+ 如果你的注释尚未为 COCO 格式,你需要进行转换。你可以参考以下 Python 脚本或使用现有工具:
288
+
289
+ ```python
290
+ import json
291
+
292
+ def convert_to_coco(input_annotations, output_annotations):
293
+ # Implement conversion logic here
294
+ pass
295
+
296
+ if __name__ == "__main__":
297
+ convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
298
+ ```
299
+
300
+ 4. **更新配置文件:**
301
+
302
+ 修改你的 [custom_detection.yml](./configs/dataset/custom_detection.yml)。
303
+
304
+ ```yaml
305
+ task: detection
306
+
307
+ evaluator:
308
+ type: CocoEvaluator
309
+ iou_types: ['bbox', ]
310
+
311
+ num_classes: 777 # your dataset classes
312
+ remap_mscoco_category: False
313
+
314
+ train_dataloader:
315
+ type: DataLoader
316
+ dataset:
317
+ type: CocoDetection
318
+ img_folder: /data/yourdataset/train
319
+ ann_file: /data/yourdataset/train/train.json
320
+ return_masks: False
321
+ transforms:
322
+ type: Compose
323
+ ops: ~
324
+ shuffle: True
325
+ num_workers: 4
326
+ drop_last: True
327
+ collate_fn:
328
+ type: BatchImageCollateFunction
329
+
330
+ val_dataloader:
331
+ type: DataLoader
332
+ dataset:
333
+ type: CocoDetection
334
+ img_folder: /data/yourdataset/val
335
+ ann_file: /data/yourdataset/val/ann.json
336
+ return_masks: False
337
+ transforms:
338
+ type: Compose
339
+ ops: ~
340
+ shuffle: False
341
+ num_workers: 4
342
+ drop_last: False
343
+ collate_fn:
344
+ type: BatchImageCollateFunction
345
+ ```
346
+ </details>
347
+
348
+
349
+ ## 使用方法
350
+ <details open>
351
+ <summary> COCO2017 </summary>
352
+
353
+ <!-- <summary>1. Training </summary> -->
354
+ 1. 设置模型
355
+ ```shell
356
+ export model=l # n s m l x
357
+ ```
358
+
359
+ 2. 训练
360
+ ```shell
361
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
362
+ ```
363
+
364
+ <!-- <summary>2. Testing </summary> -->
365
+ 3. 测试
366
+ ```shell
367
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
368
+ ```
369
+
370
+ <!-- <summary>3. Tuning </summary> -->
371
+ 4. 微调
372
+ ```shell
373
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
374
+ ```
375
+ </details>
376
+
377
+
378
+ <details>
379
+ <summary> 在 Objects365 上训练,在COCO2017上微调 </summary>
380
+
381
+ 1. 设置模型
382
+ ```shell
383
+ export model=l # n s m l x
384
+ ```
385
+
386
+ 2. 在 Objects365 上训练
387
+ ```shell
388
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
389
+ ```
390
+
391
+ 3. 在 COCO2017 上微调
392
+ ```shell
393
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
394
+ ```
395
+
396
+ <!-- <summary>2. Testing </summary> -->
397
+ 4. 测试
398
+ ```shell
399
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
400
+ ```
401
+ </details>
402
+
403
+
404
+ <details>
405
+ <summary> 自定义数据集 </summary>
406
+
407
+ 1. 设置模型
408
+ ```shell
409
+ export model=l # n s m l x
410
+ ```
411
+
412
+ 2. 在自定义数据集上训练
413
+ ```shell
414
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
415
+ ```
416
+ <!-- <summary>2. Testing </summary> -->
417
+ 3. 测试
418
+ ```shell
419
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
420
+ ```
421
+
422
+ 4. 在自定义数据集上微调
423
+ ```shell
424
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
425
+ ```
426
+
427
+ 5. **[可选项]** 修改类映射:
428
+
429
+ 在使用 Objects365 预训练权重训练自定义数据集时,示例中假设自定义数据集仅有 `'Person'` 和 `'Car'` 类,您可以将其替换为数据集中对应的任何类别。为了加快收敛,可以在 `src/solver/_solver.py` 中修改 `self.obj365_ids`,如下所示:
430
+
431
+ ```python
432
+ self.obj365_ids = [0, 5] # Person, Cars
433
+ ```
434
+ Objects365 类及其对应 ID 的完整列表:
435
+ https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330
436
+
437
+ 新的训练启动命令:
438
+
439
+ ```shell
440
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
441
+ ```
442
+
443
+ 如果您不想修改类映射,预训练的 Objects365 权重依然可以不做任何更改直接使用。修改类映射是可选的,但针对特定任务可能会加快收敛速度。
444
+
445
+ </details>
446
+
447
+ <details>
448
+ <summary> 自定义批次大小 </summary>
449
+
450
+ 例如,如果你想在训练 D-FINE-L 时将 COCO2017 的总批次大小增加一倍,请按照以下步骤操作:
451
+
452
+ 1. **修改你的 [dataloader.yml](./configs/dfine/include/dataloader.yml)**,增加 `total_batch_size`:
453
+
454
+ ```yaml
455
+ train_dataloader:
456
+ total_batch_size: 64 # 原来是 32,现在增加了一倍
457
+ ```
458
+
459
+ 2. **修改你的 [dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml)**。
460
+
461
+ ```yaml
462
+ optimizer:
463
+ type: AdamW
464
+ params:
465
+ -
466
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
467
+ lr: 0.000025 # 翻倍,线性缩放原则
468
+ -
469
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
470
+ weight_decay: 0.
471
+
472
+ lr: 0.0005 # 翻倍,线性缩放原则
473
+ betas: [0.9, 0.999]
474
+ weight_decay: 0.0001 # 需要网格搜索找到最优值
475
+
476
+ ema: # 添加 EMA 设置
477
+ decay: 0.9998 # 根据 1 - (1 - decay) * 2 调整
478
+ warmups: 500 # 减半
479
+
480
+ lr_warmup_scheduler:
481
+ warmup_duration: 250 # 减半
482
+ ```
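+
+ 下面给出一个纯示意性的 Python 小片段(并非本仓库自带的工具,函数名与参数均为示例假设),按上面注释中的线性缩放原则,根据批次大小倍数推算新的学习率、EMA decay 和 warmup 步数:
+
+ ```python
+ # 示意脚本:按线性缩放原则换算超参数(仅供参考,非仓库代码)
+ def scale_hyperparams(base_lr, ema_decay, warmup_steps, old_bs=32, new_bs=64):
+     k = new_bs / old_bs
+     return {
+         "lr": base_lr * k,                     # 学习率随总批次大小线性放大
+         "ema_decay": 1 - (1 - ema_decay) * k,  # 对应注释中的 1 - (1 - decay) * 2
+         "warmup": int(warmup_steps / k),       # warmup 步数相应减半
+     }
+
+ print(scale_hyperparams(base_lr=0.00025, ema_decay=0.9999, warmup_steps=500))
+ # 约为 {'lr': 0.0005, 'ema_decay': 0.9998, 'warmup': 250}
+ ```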
483
+
484
+ </details>
485
+
486
+ <details>
487
+ <summary> 自定义输入尺寸 </summary>
488
+
489
+ 如果你想在 COCO2017 上使用 **D-FINE-L** 进行 320x320 尺寸的图片训练,按照以下步骤操作:
490
+
491
+ 1. **修改你的 [dataloader.yml](./configs/dfine/include/dataloader.yml)**:
492
+
493
+ ```yaml
494
+
495
+ train_dataloader:
496
+ dataset:
497
+ transforms:
498
+ ops:
499
+ - {type: Resize, size: [320, 320], }
500
+ collate_fn:
501
+ base_size: 320
502
+ dataset:
503
+ transforms:
504
+ ops:
505
+ - {type: Resize, size: [320, 320], }
506
+ ```
507
+
508
+ 2. **修改你的 [dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml)**:
509
+
510
+ ```yaml
511
+ eval_spatial_size: [320, 320]
512
+ ```
513
+
514
+ </details>
515
+
516
+
517
+ ## 工具
518
+
519
+ <details>
520
+ <summary> 部署 </summary>
521
+
522
+ <!-- <summary>4. Export onnx </summary> -->
523
+ 1. 设置
524
+ ```shell
525
+ pip install onnx onnxsim onnxruntime
526
+ export model=l # n s m l x
527
+ ```
528
+
529
+ 2. 导出 onnx
530
+ ```shell
531
+ python tools/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
532
+ ```
533
+
534
+ 3. 导出 [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
535
+ ```shell
536
+ trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
537
+ ```
538
+
539
+ </details>
540
+
541
+ <details>
542
+ <summary> 推理(可视化) </summary>
543
+
544
+
545
+ 1. 设置
546
+ ```shell
547
+ pip install -r tools/inference/requirements.txt
548
+ export model=l # n s m l x
549
+ ```
550
+
551
+
552
+ <!-- <summary>5. Inference </summary> -->
553
+ 2. 推理 (onnxruntime / tensorrt / torch)
554
+
555
+ 目前支持对图像和视频的推理。
556
+ ```shell
557
+ python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg # video.mp4
558
+ python tools/inference/trt_inf.py --trt model.engine --input image.jpg
559
+ python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
560
+ ```
561
+ </details>
562
+
563
+ <details>
564
+ <summary> 基准测试 </summary>
565
+
566
+ 1. 设置
567
+ ```shell
568
+ pip install -r tools/benchmark/requirements.txt
569
+ export model=l # n s m l x
570
+ ```
571
+
572
+ <!-- <summary>6. Benchmark </summary> -->
573
+ 2. 模型 FLOPs、MACs、参数量
574
+ ```shell
575
+ python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
576
+ ```
577
+
578
+ 3. TensorRT 延迟
579
+ ```shell
580
+ python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
581
+ ```
582
+ </details>
583
+
584
+ <details>
585
+ <summary> Voxel51 Fiftyone 可视化 </summary>
586
+
587
+ 1. 设置
588
+ ```shell
589
+ pip install fiftyone
590
+ export model=l # n s m l x
591
+ ```
592
+ 2. Voxel51 Fiftyone 可视化 ([fiftyone](https://github.com/voxel51/fiftyone))
593
+ ```shell
594
+ python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
595
+ ```
596
+ </details>
597
+
598
+ <details>
599
+ <summary> 其他 </summary>
600
+
601
+ 1. 自动恢复(Auto Resume)训练
602
+ ```shell
603
+ bash reference/safe_training.sh
604
+ ```
605
+
606
+ 2. 模型权重转换
607
+ ```shell
608
+ python reference/convert_weight.py model.pth
609
+ ```
610
+ </details>
611
+
612
+ ## 图表与可视化
613
+
614
+ <details>
615
+ <summary> FDR 和 GO-LSD </summary>
616
+
617
+ D-FINE与FDR概览。概率分布作为更细粒度的中间表征,通过解码器层以残差方式进行迭代优化。应用非均匀加权函数以实现更精细的定位。
618
+ <p align="center">
619
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="细粒度分布优化过程" width="1000"> </p>
620
+ GO-LSD流程概览。通过DDF损失函数和解耦加权策略将最终层分布中的定位知识蒸馏到前面的层中。
621
+ <p align="center"> <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSD流程" width="1000"> </p>
622
+
623
+ </details>
624
+
625
+ <details open>
626
+ <summary> 分布可视化 </summary>
627
+
628
+ FDR在检测场景中的可视化,包括初始和优化后的边界框,以及未加权和加权的分布图。
629
+
630
+ <p align="center">
631
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
632
+ </p>
633
+
634
+ </details>
635
+
636
+ <details>
637
+ <summary> 困难场景 </summary>
638
+
639
+ 以下可视化展示了D-FINE在各种复杂检测场景中的预测结果。这些场景包括遮挡、低光条件、运动模糊、景深效果和密集场景。尽管面临这些挑战,D-FINE依然能够生成准确的定位结果。
640
+
641
+ <p align="center">
642
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="D-FINE在挑战性场景中的预测" width="1000">
643
+ </p>
644
+
645
+ </details>
646
+
647
+
648
+ <!-- <table><tr>
649
+ <td><img src=https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg border=0 width=1000></td>
650
+ </tr></table> -->
651
+
652
+ ## 引用
653
+ 如果你在工作中使用了 `D-FINE` 或其方法,请引用以下 BibTeX 条目:
654
+ <details open>
655
+ <summary> bibtex </summary>
656
+
657
+ ```latex
658
+ @misc{peng2024dfine,
659
+ title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
660
+ author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
661
+ year={2024},
662
+ eprint={2410.13842},
663
+ archivePrefix={arXiv},
664
+ primaryClass={cs.CV}
665
+ }
666
+ ```
667
+ </details>
668
+
669
+ ## 致谢
670
+ 我们的工作基于 [RT-DETR](https://github.com/lyuwenyu/RT-DETR)。
671
+ 感谢 [RT-DETR](https://github.com/lyuwenyu/RT-DETR), [GFocal](https://github.com/implus/GFocal), [LD](https://github.com/HikariTJU/LD), 和 [YOLOv9](https://github.com/WongKinYiu/yolov9) 的启发。
672
+
673
+ ✨ 欢迎贡献并在有任何问题时联系我! ✨
D-FINE/README_ja.md ADDED
@@ -0,0 +1,698 @@
1
+ <!--# [D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->
2
+
3
+ [English](README.md) | [简体中文](README_cn.md) | 日本語 | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)
4
+
5
+ <h2 align="center">
6
+ D-FINE: Redefine Regression Task of DETRs as Fine&#8209;grained&nbsp;Distribution&nbsp;Refinement
7
+ </h2>
8
+
9
+
10
+
11
+ <p align="center">
12
+ <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
13
+ <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
14
+ </a>
15
+ <a href="https://github.com/Peterande/D-FINE/pulls">
16
+ <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
17
+ </a>
18
+ <a href="https://github.com/Peterande/D-FINE/issues">
19
+ <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
20
+ </a>
21
+ <a href="https://arxiv.org/abs/2410.13842">
22
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
23
+ </a>
24
+ <!-- <a href="mailto: [email protected]">
25
+ <img alt="email" src="https://img.shields.io/badge/contact_me-email-yellow">
26
+ </a> -->
27
+ <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
28
+ <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
29
+ </a>
30
+ <a href="https://github.com/Peterande/D-FINE">
31
+ <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
32
+ </a>
33
+ </p>
34
+
35
+
36
+
37
+ <p align="center">
38
+ 📄 これは論文の公式実装です:
39
+ <br>
40
+ <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
41
+ </p>
42
+ <p align="center">
43
+ D-FINE: DETRの回帰タスクを細粒度分布最適化として再定義
44
+ </p>
45
+
46
+
47
+
48
+ <p align="center">
49
+ Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu
50
+ </p>
51
+
52
+ <p align="center">
53
+ 中国科学技術大学
54
+ </p>
55
+
56
+ <p align="center">
57
+ <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
58
+ <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
59
+ </a>
60
+ </p>
61
+
62
+ <!-- <table><tr>
63
+ <td><img src=https://github.com/Peterande/storage/blob/master/latency.png border=0 width=333></td>
64
+ <td><img src=https://github.com/Peterande/storage/blob/master/params.png border=0 width=333></td>
65
+ <td><img src=https://github.com/Peterande/storage/blob/master/flops.png border=0 width=333></td>
66
+ </tr></table> -->
67
+
68
+ <p align="center">
69
+ <strong>もしD-FINEが気に入ったら、ぜひ⭐をください!あなたのサポートが私たちのモチベーションになります!</strong>
70
+ </p>
71
+
72
+ <p align="center">
73
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
74
+ </p>
75
+
76
+ D-FINEは、DETRの境界ボックス回帰タスクを細粒度分布最適化(FDR)として再定義し、グローバル最適な位置特定自己蒸留(GO-LSD)を導入することで、追加の推論およびトレーニングコストを増やすことなく、優れたパフォーマンスを実現する強力なリアルタイムオブジェクト検出器です。
77
+
78
+ <details open>
79
+ <summary> ビデオ </summary>
80
+
81
+ D-FINEとYOLO11を使用して、[YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A)の複雑な街並みのビデオでオブジェクト検出を行いました。逆光、モーションブラー、密集した群衆などの厳しい条件にもかかわらず、D-FINE-Xはほぼすべてのターゲットを検出し、バックパック、自転車、信号機などの微妙な小さなオブジェクトも含まれます。その信頼スコアとぼやけたエッジの位置特定精度はYOLO11よりもはるかに高いです。
82
+
83
+ <!-- We use D-FINE and YOLO11 on a street scene video from [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A). Despite challenges like backlighting, motion blur, and dense crowds, D-FINE-X outperforms YOLO11x, detecting more objects with higher confidence and better precision. -->
84
+
85
+ https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9
86
+
87
+ </details>
88
+
89
+ ## 🚀 更新情報
90
+ - [x] **\[2024.10.18\]** D-FINEシリーズをリリース。
91
+ - [x] **\[2024.10.25\]** カスタムデータセットの微調整設定を追加 ([#7](https://github.com/Peterande/D-FINE/issues/7))。
92
+ - [x] **\[2024.10.30\]** D-FINE-L (E25) 事前トレーニングモデルを更新し、パフォーマンスが2.0%向上。
93
+ - [x] **\[2024.11.07\]** **D-FINE-N** をリリース, COCO で 42.8% の AP<sup>val</sup> を達成 @ 472 FPS<sup>T4</sup>!
94
+
95
+ ## モデルズー
96
+
97
+ ### COCO
98
+ | モデル | データセット | AP<sup>val</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
99
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
100
+ **D&#8209;FINE&#8209;N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
101
+ **D&#8209;FINE&#8209;S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
102
+ **D&#8209;FINE&#8209;M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
103
+ **D&#8209;FINE&#8209;L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
104
+ **D&#8209;FINE&#8209;X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)
105
+
106
+
107
+ ### Objects365+COCO
108
+ | モデル | データセット | AP<sup>val</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
109
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
110
+ **D&#8209;FINE&#8209;S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
111
+ **D&#8209;FINE&#8209;M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
112
+ **D&#8209;FINE&#8209;L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
113
+ **D&#8209;FINE&#8209;X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)
114
+
115
+ **微調整のために Objects365 の事前学習モデルを使用することを強くお勧めします:**
116
+
117
+ ⚠️ 重要なお知らせ:このプリトレインモデルは複雑なシーンの理解に有益ですが、カテゴリが非常に単純な場合、過学習や最適ではない性能につながる可能性がありますので、ご注意ください。
118
+
119
+ <details> <summary><strong> 🔥 Objects365で事前トレーニングされたモデル(最良の汎化性能)</strong></summary>
120
+
121
+
122
+ | モデル | データセット | AP<sup>val</sup> | AP<sup>5000</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
123
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
124
+ **D&#8209;FINE&#8209;S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
125
+ **D&#8209;FINE&#8209;M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
126
+ **D&#8209;FINE&#8209;L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
127
+ **D&#8209;FINE&#8209;L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
128
+ **D&#8209;FINE&#8209;X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
129
+ - **E25**: 再トレーニングし、事前トレーニングを25エポックに延長。
130
+ - **AP<sup>val</sup>** は *Objects365* のフルバリデーションセットで評価されます。
131
+ - **AP<sup>5000</sup>** は *Objects365* 検証セットの最初の5000サンプルで評価されます。
132
+ </details>
133
+
134
+ **注意事項:**
135
+ - **AP<sup>val</sup>** は *MSCOCO val2017* データセットで評価されます。
136
+ - **レイテンシ** は単一のT4 GPUで $batch\\_size = 1$, $fp16$, および $TensorRT==10.4.0$ で評価されます。
137
+ - **Objects365+COCO** は *Objects365* で事前トレーニングされた重みを使用して *COCO* で微調整されたモデルを意味します。
138
+
139
+
140
+
141
+ ## クイックスタート
142
+
143
+ ### セットアップ
144
+
145
+ ```shell
146
+ conda create -n dfine python=3.11.9
147
+ conda activate dfine
148
+ pip install -r requirements.txt
149
+ ```
150
+
151
+
152
+ ### データ準備
153
+
154
+ <details>
155
+ <summary> COCO2017 データセット </summary>
156
+
157
+ 1. [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) または [COCO](https://cocodataset.org/#download) からCOCO2017をダウンロードします。
158
+ 1. [coco_detection.yml](./configs/dataset/coco_detection.yml) のパスを修正します。
159
+
160
+ ```yaml
161
+ train_dataloader:
162
+ img_folder: /data/COCO2017/train2017/
163
+ ann_file: /data/COCO2017/annotations/instances_train2017.json
164
+ val_dataloader:
165
+ img_folder: /data/COCO2017/val2017/
166
+ ann_file: /data/COCO2017/annotations/instances_val2017.json
167
+ ```
168
+
169
+ </details>
170
+
171
+ <details>
172
+ <summary> Objects365 データセット </summary>
173
+
174
+ 1. [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365) からObjects365をダウンロードします。
175
+
176
+ 2. ベースディレクトリを設定します:
177
+ ```shell
178
+ export BASE_DIR=/data/Objects365/data
179
+ ```
180
+
181
+ 3. ダウンロードしたファイルを解凍し、以下のディレクトリ構造に整理します:
182
+
183
+ ```shell
184
+ ${BASE_DIR}/train
185
+ ├── images
186
+ │ ├── v1
187
+ │ │ ├── patch0
188
+ │ │ │ ├── 000000000.jpg
189
+ │ │ │ ├── 000000001.jpg
190
+ │ │ │ └── ... (more images)
191
+ │ ├── v2
192
+ │ │ ├── patchx
193
+ │ │ │ ├── 000000000.jpg
194
+ │ │ │ ├── 000000001.jpg
195
+ │ │ │ └── ... (more images)
196
+ ├── zhiyuan_objv2_train.json
197
+ ```
198
+
199
+ ```shell
200
+ ${BASE_DIR}/val
201
+ ├── images
202
+ │ ├── v1
203
+ │ │ ├── patch0
204
+ │ │ │ ├── 000000000.jpg
205
+ │ │ │ └── ... (more images)
206
+ │ ├── v2
207
+ │ │ ├── patchx
208
+ │ │ │ ├── 000000000.jpg
209
+ │ │ │ └── ... (more images)
210
+ ├── zhiyuan_objv2_val.json
211
+ ```
212
+
213
+ 4. 検証セットの画像を保存する新しいディレクトリを作成します:
214
+ ```shell
215
+ mkdir -p ${BASE_DIR}/train/images_from_val
216
+ ```
217
+
218
+ 5. valディレクトリのv1およびv2フォルダをtrain/images_from_valディレクトリにコピーします
219
+ ```shell
220
+ cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
221
+ cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
222
+ ```
223
+
224
+ 6. remap_obj365.pyを実行して、検証セットの一部をトレーニングセットにマージします。具体的には、このスクリプトはインデックスが5000から800000のサンプルを検証セットからトレーニングセットに移動します。
225
+ ```shell
226
+ python tools/remap_obj365.py --base_dir ${BASE_DIR}
227
+ ```
228
+
229
+
230
+ 7. resize_obj365.pyスクリプトを実行して、データセット内の最大エッジ長が640ピクセルを超える画像をリサイズします。ステップ6で生成された更新されたJSONファイルを使用してサンプルデータを処理します。トレーニングセットと検証セットの両方の画像をリサイズして、一貫性を保ちます。
231
+ ```shell
232
+ python tools/resize_obj365.py --base_dir ${BASE_DIR}
233
+ ```
234
+
235
+ 8. [obj365_detection.yml](./configs/dataset/obj365_detection.yml) のパスを修正します。
236
+
237
+ ```yaml
238
+ train_dataloader:
239
+ img_folder: /data/Objects365/data/train
240
+ ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
241
+ val_dataloader:
242
+ img_folder: /data/Objects365/data/val/
243
+ ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
244
+ ```
245
+
246
+
247
+ </details>
248
+
249
+ <details>
250
+ <summary>CrowdHuman</summary>
251
+
252
+ こちらからCOCOフォーマットのデータセットをダウンロードしてください:[リンク](https://aistudio.baidu.com/datasetdetail/231455)
253
+
254
+ </details>
255
+
256
+ <details>
257
+ <summary>カスタムデータセット</summary>
258
+
259
+ カスタムデータセットでトレーニングするには、COCO形式で整理する必要があります。以下の手順に従ってデータセットを準備してください:
260
+
261
+ 1. **`remap_mscoco_category` を `False` に設定します**:
262
+
263
+ これにより、カテゴリIDがMSCOCOカテゴリに自動的にマッピングされるのを防ぎます。
264
+
265
+ ```yaml
266
+ remap_mscoco_category: False
267
+ ```
268
+
269
+ 2. **画像を整理します**:
270
+
271
+ データセットディレクトリを以下のように構造化します:
272
+
273
+ ```shell
274
+ dataset/
275
+ ├── images/
276
+ │ ├── train/
277
+ │ │ ├── image1.jpg
278
+ │ │ ├── image2.jpg
279
+ │ │ └── ...
280
+ │ ├── val/
281
+ │ │ ├── image1.jpg
282
+ │ │ ├── image2.jpg
283
+ │ │ └── ...
284
+ └── annotations/
285
+ ├── instances_train.json
286
+ ├── instances_val.json
287
+ └── ...
288
+ ```
289
+
290
+ - **`images/train/`**: すべてのトレーニング画像を含みます。
291
+ - **`images/val/`**: すべての検証画像を含みます。
292
+ - **`annotations/`**: COCO形式の注釈ファイルを含みます。
293
+
294
+ 3. **注釈をCOCO形式に変換します**:
295
+
296
+ 注釈がまだCOCO形式でない場合は、変換する必要があります。以下のPythonスクリプトを参考にするか、既存のツールを利用してください:
297
+
298
+ ```python
299
+ import json
300
+
301
+ def convert_to_coco(input_annotations, output_annotations):
302
+ # 変換ロジックをここに実装します
303
+ pass
304
+
305
+ if __name__ == "__main__":
306
+ convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
307
+ ```
308
+
309
+ 4. **設定ファイルを更新します**:
310
+
311
+ [custom_detection.yml](./configs/dataset/custom_detection.yml) を修正します。
312
+
313
+ ```yaml
314
+ task: detection
315
+
316
+ evaluator:
317
+ type: CocoEvaluator
318
+ iou_types: ['bbox', ]
319
+
320
+ num_classes: 777 # データセットのクラス数
321
+ remap_mscoco_category: False
322
+
323
+ train_dataloader:
324
+ type: DataLoader
325
+ dataset:
326
+ type: CocoDetection
327
+ img_folder: /data/yourdataset/train
328
+ ann_file: /data/yourdataset/train/train.json
329
+ return_masks: False
330
+ transforms:
331
+ type: Compose
332
+ ops: ~
333
+ shuffle: True
334
+ num_workers: 4
335
+ drop_last: True
336
+ collate_fn:
337
+ type: BatchImageCollateFunction
338
+
339
+ val_dataloader:
340
+ type: DataLoader
341
+ dataset:
342
+ type: CocoDetection
343
+ img_folder: /data/yourdataset/val
344
+ ann_file: /data/yourdataset/val/ann.json
345
+ return_masks: False
346
+ transforms:
347
+ type: Compose
348
+ ops: ~
349
+ shuffle: False
350
+ num_workers: 4
351
+ drop_last: False
352
+ collate_fn:
353
+ type: BatchImageCollateFunction
354
+ ```
355
+
356
+ </details>
357
+
358
+
359
+ ## 使用方法
360
+ <details open>
361
+ <summary> COCO2017 </summary>
362
+
363
+ <!-- <summary>1. トレーニング </summary> -->
364
+ 1. モデルを設定します
365
+ ```shell
366
+ export model=l # n s m l x
367
+ ```
368
+
369
+ 2. トレーニング
370
+ ```shell
371
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
372
+ ```
373
+
374
+ <!-- <summary>2. テスト </summary> -->
375
+ 3. テスト
376
+ ```shell
377
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
378
+ ```
379
+
380
+ <!-- <summary>3. 微調整 </summary> -->
381
+ 4. 微調整
382
+ ```shell
383
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
384
+ ```
385
+ </details>
386
+
387
+
388
+ <details>
389
+ <summary> Objects365からCOCO2017へ </summary>
390
+
391
+ 1. モデルを設定します
392
+ ```shell
393
+ export model=l # n s m l x
394
+ ```
395
+
396
+ 2. Objects365でトレーニング
397
+ ```shell
398
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
399
+ ```
400
+
401
+ 3. COCO2017で微調整
402
+ ```shell
403
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
404
+ ```
405
+
406
+ <!-- <summary>2. テスト </summary> -->
407
+ 4. テスト
408
+ ```shell
409
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
410
+ ```
411
+ </details>
412
+
413
+
414
+ <details>
415
+ <summary> カスタムデータセット </summary>
416
+
417
+ 1. モデルを設定します
418
+ ```shell
419
+ export model=l # n s m l x
420
+ ```
421
+
422
+ 2. カスタムデータセットでトレーニング
423
+ ```shell
424
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
425
+ ```
426
+ <!-- <summary>2. テスト </summary> -->
427
+ 3. テスト
428
+ ```shell
429
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
430
+ ```
431
+
432
+ 4. カスタムデータセットで微調整
433
+ ```shell
434
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
435
+ ```
436
+
437
+ 5. **[オプション]** クラスマッピングを変更します:
438
+
439
+ Objects365の事前トレーニング済みの重みを使用してカスタムデータセットでトレーニングする場合、例ではデータセットに `'Person'` と `'Car'` クラスのみが含まれていると仮定しています。特定のタスクに対して収束を早めるために、`src/solver/_solver.py` の `self.obj365_ids` を以下のように変更できます:
440
+
441
+ ```python
442
+ self.obj365_ids = [0, 5] # Person, Cars
443
+ ```
444
+ これらをデータセットの対応するクラスに置き換えることができます。Objects365クラスとその対応IDのリスト:
445
+ https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330
446
+
447
+ 新しいトレーニングコマンド:
448
+
449
+ ```shell
450
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
451
+ ```
452
+
453
+ ただし、クラスマッピングを変更したくない場合、事前トレーニング済みのObjects365の重みは変更なしでそのまま使用できます。クラスマッピングの変更はオプションであり、特定のタスクに対して収束を早める可能性があります。
454
+
455
+
456
+
457
+ </details>
458
+
459
+ <details>
460
+ <summary> バッチサイズのカスタマイズ </summary>
461
+
462
+ 例えば、COCO2017でD-FINE-Lをトレーニングする際にバッチサイズを2倍にしたい場合、以下の手順に従ってください:
463
+
464
+ 1. **[dataloader.yml](./configs/dfine/include/dataloader.yml) を修正して `total_batch_size` を増やします**:
465
+
466
+ ```yaml
467
+ train_dataloader:
468
+ total_batch_size: 64 # 以前は32、今は2倍
469
+ ```
470
+
471
+ 2. **[dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) を修正します**。以下のように主要なパラメータを調整します:
472
+
473
+ ```yaml
474
+ optimizer:
475
+ type: AdamW
476
+ params:
477
+ -
478
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
479
+ lr: 0.000025 # 2倍、線形スケーリング法則
480
+ -
481
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
482
+ weight_decay: 0.
483
+
484
+ lr: 0.0005 # 2倍、線形スケーリング法則
485
+ betas: [0.9, 0.999]
486
+ weight_decay: 0.0001 # グリッドサーチが必要です
487
+
488
+ ema: # EMA設定を追加
489
+ decay: 0.9998 # 1 - (1 - decay) * 2 によって調整
490
+ warmups: 500 # 半分
491
+
492
+ lr_warmup_scheduler:
493
+ warmup_duration: 250 # 半分
494
+ ```
495
+
496
+ </details>
497
+
498
+
499
+ <details>
500
+ <summary> 入力サイズのカスタマイズ </summary>
501
+
502
+ COCO2017で **D-FINE-L** を320x320の入力サイズでトレーニングしたい場合、以下の手順に従ってください:
503
+
504
+ 1. **[dataloader.yml](./configs/dfine/include/dataloader.yml) を修正します**:
505
+
506
+ ```yaml
507
+
508
+ train_dataloader:
509
+ dataset:
510
+ transforms:
511
+ ops:
512
+ - {type: Resize, size: [320, 320], }
513
+ collate_fn:
514
+ base_size: 320
515
+ dataset:
516
+ transforms:
517
+ ops:
518
+ - {type: Resize, size: [320, 320], }
519
+ ```
520
+
521
+ 2. **[dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml) を修正します**:
522
+
523
+ ```yaml
524
+ eval_spatial_size: [320, 320]
525
+ ```
526
+
527
+ </details>
528
+
529
+ ## ツール
530
+ <details>
531
+ <summary> デプロイ </summary>
532
+
533
+ <!-- <summary>4. onnxのエクスポート </summary> -->
534
+ 1. セットアップ
535
+ ```shell
536
+ pip install onnx onnxsim
537
+ export model=l # n s m l x
538
+ ```
539
+
540
+ 2. onnxのエクスポート
541
+ ```shell
542
+ python tools/deployment/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
543
+ ```
544
+
545
+ 3. [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) のエクスポート
546
+ ```shell
547
+ trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
548
+ ```
549
+
550
+ </details>
551
+
552
+ <details>
553
+ <summary> 推論(可視化) </summary>
554
+
555
+
556
+ 1. セットアップ
557
+ ```shell
558
+ pip install -r tools/inference/requirements.txt
559
+ export model=l # n s m l x
560
+ ```
561
+
562
+
563
+ <!-- <summary>5. 推論 </summary> -->
564
+ 2. 推論 (onnxruntime / tensorrt / torch)
565
+
566
+ 現在、画像とビデオの推論がサポートされています。
567
+ ```shell
568
+ python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg # video.mp4
569
+ python tools/inference/trt_inf.py --trt model.engine --input image.jpg
570
+ python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
571
+ ```
572
+ </details>
573
+
574
+ <details>
575
+ <summary> ベンチマーク </summary>
576
+
577
+ 1. セットアップ
578
+ ```shell
579
+ pip install -r tools/benchmark/requirements.txt
580
+ export model=l # n s m l x
581
+ ```
582
+
583
+ <!-- <summary>6. ベンチマーク </summary> -->
584
+ 2. モデルのFLOPs、MACs、およびパラメータ数
585
+ ```shell
586
+ python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
587
+ ```
588
+
589
+ 3. TensorRTのレイテンシ
590
+ ```shell
591
+ python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
592
+ ```
593
+ </details>
594
+
595
+ <details>
596
+ <summary> Fiftyoneの可視化 </summary>
597
+
598
+ 1. セットアップ
599
+ ```shell
600
+ pip install fiftyone
601
+ export model=l # n s m l x
602
+ ```
603
+ 2. Voxel51 Fiftyoneの可視化 ([fiftyone](https://github.com/voxel51/fiftyone))
604
+ ```shell
605
+ python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
606
+ ```
607
+ </details>
608
+
609
+ <details>
610
+ <summary> その他 </summary>
611
+
612
+ 1. 自動再開トレーニング
613
+ ```shell
614
+ bash reference/safe_training.sh
615
+ ```
616
+
617
+ 2. モデルの重みの変換
618
+ ```shell
619
+ python reference/convert_weight.py model.pth
620
+ ```
621
+ </details>
622
+
623
+ ## 図と可視化
624
+
625
+ <details>
626
+ <summary> FDRとGO-LSD </summary>
627
+
628
+ 1. FDRを搭載したD-FINEの概要。より細粒度の中間表現として機能する確率分布は、残差的にデコーダ層によって逐次最適化されます。
629
+ 不均一な重み付け関数が適用され、より細かい位置特定が可能になります。
630
+
631
+ <p align="center">
632
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="細粒度分布最適化プロセス" width="1000">
633
+ </p>
634
+
635
+ 2. GO-LSDプロセスの概要。最終層の最適化された分布からの位置特定知識は、デカップリングされた重み付け戦略を使用してDDF損失を通じて前の層に蒸留されます。
636
+
637
+ <p align="center">
638
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSDプロセス" width="1000">
639
+ </p>
640
+
641
+ </details>
642
+
643
+ <details open>
644
+ <summary> 分布 </summary>
645
+
646
+ 初期および最適化された境界ボックスと、未重み付けおよび重み付けされた分布とともに、さまざまな検出シナリオにおけるFDRの可視化。
647
+
648
+ <p align="center">
649
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
650
+ </p>
651
+
652
+ </details>
653
+
654
+ <details>
655
+ <summary> 難しいケース </summary>
656
+
657
+ 以下の可視化は、さまざまな複雑な検出シナリオにおけるD-FINEの予測を示しています。これらのシナリオには、遮蔽、低光条件、モーションブラー、被写界深度効果、および密集したシーンが含まれます。これらの課題にもかかわらず、D-FINEは一貫して正確な位置特定結果を生成します。
658
+
659
+ <p align="center">
660
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="複雑なシナリオにおけるD-FINEの予測" width="1000">
661
+ </p>
662
+
663
+ </details>
664
+
665
+
666
+ <!-- <div style="display: flex; flex-wrap: wrap; justify-content: center; margin: 0; padding: 0;">
667
+ <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" style="width:99.96%; margin: 0; padding: 0;" />
668
+ </div>
669
+
670
+ <table><tr>
671
+ <td><img src=https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg border=0 width=1000></td>
672
+ </tr></table> -->
673
+
674
+
675
+
676
+
677
+ ## 引用
678
+ もし`D-FINE`やその方法をあなたの仕事で使用する場合、以下のBibTeXエントリを引用してください:
679
+ <details open>
680
+ <summary> bibtex </summary>
681
+
682
+ ```latex
683
+ @misc{peng2024dfine,
684
+ title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
685
+ author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
686
+ year={2024},
687
+ eprint={2410.13842},
688
+ archivePrefix={arXiv},
689
+ primaryClass={cs.CV}
690
+ }
691
+ ```
692
+ </details>
693
+
694
+ ## 謝辞
695
+ 私たちの仕事は [RT-DETR](https://github.com/lyuwenyu/RT-DETR) に基づいています。
696
+ [RT-DETR](https://github.com/lyuwenyu/RT-DETR), [GFocal](https://github.com/implus/GFocal), [LD](https://github.com/HikariTJU/LD), および [YOLOv9](https://github.com/WongKinYiu/yolov9) からのインスピレーションに感謝します。
697
+
698
+ ✨ 貢献を歓迎し、質問があればお気軽にお問い合わせください! ✨
D-FINE/configs/dataset/coco_detection.yml ADDED
@@ -0,0 +1,41 @@
1
+ task: detection
2
+
3
+ evaluator:
4
+ type: CocoEvaluator
5
+ iou_types: ['bbox', ]
6
+
7
+ num_classes: 80
8
+ remap_mscoco_category: True
9
+
10
+ train_dataloader:
11
+ type: DataLoader
12
+ dataset:
13
+ type: CocoDetection
14
+ img_folder: /data/COCO2017/train2017/
15
+ ann_file: /data/COCO2017/annotations/instances_train2017.json
16
+ return_masks: False
17
+ transforms:
18
+ type: Compose
19
+ ops: ~
20
+ shuffle: True
21
+ num_workers: 4
22
+ drop_last: True
23
+ collate_fn:
24
+ type: BatchImageCollateFunction
25
+
26
+
27
+ val_dataloader:
28
+ type: DataLoader
29
+ dataset:
30
+ type: CocoDetection
31
+ img_folder: /data/COCO2017/val2017/
32
+ ann_file: /data/COCO2017/annotations/instances_val2017.json
33
+ return_masks: False
34
+ transforms:
35
+ type: Compose
36
+ ops: ~
37
+ shuffle: False
38
+ num_workers: 4
39
+ drop_last: False
40
+ collate_fn:
41
+ type: BatchImageCollateFunction
D-FINE/configs/dataset/crowdhuman_detection.yml ADDED
@@ -0,0 +1,41 @@
1
+ task: detection
2
+
3
+ evaluator:
4
+ type: CocoEvaluator
5
+ iou_types: ['bbox', ]
6
+
7
+ num_classes: 1 # your dataset classes
8
+ remap_mscoco_category: False
9
+
10
+ train_dataloader:
11
+ type: DataLoader
12
+ dataset:
13
+ type: CocoDetection
14
+ img_folder: /data/CrowdHuman/coco/CrowdHuman_train
15
+ ann_file: /data/CrowdHuman/coco/Chuman-train.json
16
+ return_masks: False
17
+ transforms:
18
+ type: Compose
19
+ ops: ~
20
+ shuffle: True
21
+ num_workers: 4
22
+ drop_last: True
23
+ collate_fn:
24
+ type: BatchImageCollateFunction
25
+
26
+
27
+ val_dataloader:
28
+ type: DataLoader
29
+ dataset:
30
+ type: CocoDetection
31
+ img_folder: /data/CrowdHuman/coco/CrowdHuman_val
32
+ ann_file: /data/CrowdHuman/coco/Chuman-val.json
33
+ return_masks: False
34
+ transforms:
35
+ type: Compose
36
+ ops: ~
37
+ shuffle: False
38
+ num_workers: 4
39
+ drop_last: False
40
+ collate_fn:
41
+ type: BatchImageCollateFunction
D-FINE/configs/dataset/custom_detection.yml ADDED
@@ -0,0 +1,41 @@
1
+ task: detection
2
+
3
+ evaluator:
4
+ type: CocoEvaluator
5
+ iou_types: ['bbox', ]
6
+
7
+ num_classes: 777 # your dataset classes
8
+ remap_mscoco_category: False
9
+
10
+ train_dataloader:
11
+ type: DataLoader
12
+ dataset:
13
+ type: CocoDetection
14
+ img_folder: /data/yourdataset/train
15
+ ann_file: /data/yourdataset/train/train.json
16
+ return_masks: False
17
+ transforms:
18
+ type: Compose
19
+ ops: ~
20
+ shuffle: True
21
+ num_workers: 4
22
+ drop_last: True
23
+ collate_fn:
24
+ type: BatchImageCollateFunction
25
+
26
+
27
+ val_dataloader:
28
+ type: DataLoader
29
+ dataset:
30
+ type: CocoDetection
31
+ img_folder: /data/yourdataset/val
32
+ ann_file: /data/yourdataset/val/val.json
33
+ return_masks: False
34
+ transforms:
35
+ type: Compose
36
+ ops: ~
37
+ shuffle: False
38
+ num_workers: 4
39
+ drop_last: False
40
+ collate_fn:
41
+ type: BatchImageCollateFunction
D-FINE/configs/dataset/obj365_detection.yml ADDED
@@ -0,0 +1,41 @@
1
+ task: detection
2
+
3
+ evaluator:
4
+ type: CocoEvaluator
5
+ iou_types: ['bbox', ]
6
+
7
+ num_classes: 366
8
+ remap_mscoco_category: False
9
+
10
+ train_dataloader:
11
+ type: DataLoader
12
+ dataset:
13
+ type: CocoDetection
14
+ img_folder: /data/Objects365/data/train
15
+ ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
16
+ return_masks: False
17
+ transforms:
18
+ type: Compose
19
+ ops: ~
20
+ shuffle: True
21
+ num_workers: 4
22
+ drop_last: True
23
+ collate_fn:
24
+ type: BatchImageCollateFunction
25
+
26
+
27
+ val_dataloader:
28
+ type: DataLoader
29
+ dataset:
30
+ type: CocoDetection
31
+ img_folder: /data/Objects365/data/val/
32
+ ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
33
+ return_masks: False
34
+ transforms:
35
+ type: Compose
36
+ ops: ~
37
+ shuffle: False
38
+ num_workers: 4
39
+ drop_last: False
40
+ collate_fn:
41
+ type: BatchImageCollateFunction
D-FINE/configs/dataset/voc_detection.yml ADDED
@@ -0,0 +1,40 @@
1
+ task: detection
2
+
3
+ evaluator:
4
+ type: CocoEvaluator
5
+ iou_types: ['bbox', ]
6
+
7
+ num_classes: 20
8
+
9
+ train_dataloader:
10
+ type: DataLoader
11
+ dataset:
12
+ type: VOCDetection
13
+ root: ./dataset/voc/
14
+ ann_file: trainval.txt
15
+ label_file: label_list.txt
16
+ transforms:
17
+ type: Compose
18
+ ops: ~
19
+ shuffle: True
20
+ num_workers: 4
21
+ drop_last: True
22
+ collate_fn:
23
+ type: BatchImageCollateFunction
24
+
25
+
26
+ val_dataloader:
27
+ type: DataLoader
28
+ dataset:
29
+ type: VOCDetection
30
+ root: ./dataset/voc/
31
+ ann_file: test.txt
32
+ label_file: label_list.txt
33
+ transforms:
34
+ type: Compose
35
+ ops: ~
36
+ shuffle: False
37
+ num_workers: 4
38
+ drop_last: False
39
+ collate_fn:
40
+ type: BatchImageCollateFunction
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_l_ch.yml ADDED
@@ -0,0 +1,44 @@
1
+ __include__: [
2
+ '../../dataset/crowdhuman_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_crowdhuman
10
+
11
+
12
+ HGNetv2:
13
+ name: 'B4'
14
+ return_idx: [1, 2, 3]
15
+ freeze_stem_only: True
16
+ freeze_at: 0
17
+ freeze_norm: True
18
+
19
+ optimizer:
20
+ type: AdamW
21
+ params:
22
+ -
23
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
24
+ lr: 0.0000125
25
+ -
26
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
27
+ weight_decay: 0.
28
+
29
+ lr: 0.00025
30
+ betas: [0.9, 0.999]
31
+ weight_decay: 0.000125
32
+
33
+
34
+ # Increase to search for the optimal ema
35
+ epochs: 140
36
+ train_dataloader:
37
+ dataset:
38
+ transforms:
39
+ policy:
40
+ epoch: 120
41
+ collate_fn:
42
+ stop_epoch: 120
43
+ ema_restart_decay: 0.9999
44
+ base_size_repeat: 4
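
The optimizer.params entries above are regular expressions matched against parameter names: backbone weights get a reduced learning rate and normalization parameters are exempted from weight decay, while everything not matched falls back to the top-level lr and weight_decay. A minimal sketch of how such regex-keyed groups could be expanded into AdamW parameter groups (an illustration, not D-FINE's solver code):

# Sketch: expand regex-keyed param groups like the YAML above into AdamW groups.
import re
import torch

def build_param_groups(model, group_specs, default_lr, default_wd):
    named = {n: p for n, p in model.named_parameters() if p.requires_grad}
    groups, used = [], set()
    for spec in group_specs:
        pattern = re.compile(spec["params"])
        names = [n for n in named if n not in used and pattern.match(n)]
        used.update(names)
        groups.append({
            "params": [named[n] for n in names],
            "lr": spec.get("lr", default_lr),
            "weight_decay": spec.get("weight_decay", default_wd),
        })
    # parameters matched by no pattern fall back to the top-level lr / weight_decay
    groups.append({
        "params": [p for n, p in named.items() if n not in used],
        "lr": default_lr,
        "weight_decay": default_wd,
    })
    return groups

# e.g. torch.optim.AdamW(build_param_groups(model, specs, 2.5e-4, 1.25e-4), betas=(0.9, 0.999))
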
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_m_ch.yml ADDED
@@ -0,0 +1,60 @@
1
+ __include__: [
2
+ '../../dataset/crowdhuman_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_crowdhuman
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 4 # 5 6
24
+ eval_idx: -1 # -2 -3
25
+
26
+ HybridEncoder:
27
+ in_channels: [384, 768, 1536]
28
+ hidden_dim: 256
29
+ depth_mult: 0.67
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.000025
37
+ -
38
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
39
+ lr: 0.000025
40
+ weight_decay: 0.
41
+ -
42
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
43
+ weight_decay: 0.
44
+
45
+ lr: 0.00025
46
+ betas: [0.9, 0.999]
47
+ weight_decay: 0.000125
48
+
49
+
50
+ # Increase to search for the optimal ema
51
+ epochs: 220
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 200
57
+ collate_fn:
58
+ stop_epoch: 200
59
+ ema_restart_decay: 0.9999
60
+ base_size_repeat: 6
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_n_ch.yml ADDED
@@ -0,0 +1,82 @@
1
+ __include__: [
2
+ '../../dataset/crowdhuman_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_n_crowdhuman
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+
23
+ HybridEncoder:
24
+ in_channels: [512, 1024]
25
+ feat_strides: [16, 32]
26
+
27
+ # intra
28
+ hidden_dim: 128
29
+ use_encoder_idx: [1]
30
+ dim_feedforward: 512
31
+
32
+ # cross
33
+ expansion: 0.34
34
+ depth_mult: 0.5
35
+
36
+
37
+ DFINETransformer:
38
+ feat_channels: [128, 128]
39
+ feat_strides: [16, 32]
40
+ hidden_dim: 128
41
+ dim_feedforward: 512
42
+ num_levels: 2
43
+
44
+ num_layers: 3
45
+ eval_idx: -1
46
+
47
+ num_points: [6, 6]
48
+
49
+ optimizer:
50
+ type: AdamW
51
+ params:
52
+ -
53
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
54
+ lr: 0.0004
55
+ -
56
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
57
+ lr: 0.0004
58
+ weight_decay: 0.
59
+ -
60
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
61
+ weight_decay: 0.
62
+
63
+ lr: 0.0008
64
+ betas: [0.9, 0.999]
65
+ weight_decay: 0.0001
66
+
67
+
68
+ # Increase to search for the optimal ema
69
+ epochs: 220
70
+ train_dataloader:
71
+ total_batch_size: 128
72
+ dataset:
73
+ transforms:
74
+ policy:
75
+ epoch: 200
76
+ collate_fn:
77
+ stop_epoch: 200
78
+ ema_restart_decay: 0.9999
79
+ base_size_repeat: ~
80
+
81
+ val_dataloader:
82
+ total_batch_size: 256
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_s_ch.yml ADDED
@@ -0,0 +1,65 @@
1
+ __include__: [
2
+ '../../dataset/crowdhuman_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_crowdhuman
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 3 # 4 5 6
24
+ eval_idx: -1 # -2 -3 -4
25
+
26
+ HybridEncoder:
27
+ in_channels: [256, 512, 1024]
28
+ hidden_dim: 256
29
+ depth_mult: 0.34
30
+ expansion: 0.5
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.0002
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.0002
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.0004
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.0001
49
+
50
+
51
+ # Increase to search for the optimal ema
52
+ epochs: 220
53
+ train_dataloader:
54
+ total_batch_size: 64
55
+ dataset:
56
+ transforms:
57
+ policy:
58
+ epoch: 200
59
+ collate_fn:
60
+ stop_epoch: 200
61
+ ema_restart_decay: 0.9999
62
+ base_size_repeat: 20
63
+
64
+ val_dataloader:
65
+ total_batch_size: 128
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_x_ch.yml ADDED
@@ -0,0 +1,55 @@
1
+ __include__: [
2
+ '../../dataset/crowdhuman_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_crowdhuman
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ HybridEncoder:
23
+ hidden_dim: 384
24
+ dim_feedforward: 2048
25
+
26
+ DFINETransformer:
27
+ feat_channels: [384, 384, 384]
28
+ reg_scale: 8
29
+
30
+ optimizer:
31
+ type: AdamW
32
+ params:
33
+ -
34
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
35
+ lr: 0.0000025
36
+ -
37
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
38
+ weight_decay: 0.
39
+
40
+ lr: 0.00025
41
+ betas: [0.9, 0.999]
42
+ weight_decay: 0.000125
43
+
44
+
45
+ # Increase to search for the optimal ema
46
+ epochs: 140
47
+ train_dataloader:
48
+ dataset:
49
+ transforms:
50
+ policy:
51
+ epoch: 120
52
+ collate_fn:
53
+ stop_epoch: 120
54
+ ema_restart_decay: 0.9998
55
+ base_size_repeat: 3
D-FINE/configs/dfine/custom/dfine_hgnetv2_l_custom.yml ADDED
@@ -0,0 +1,44 @@
1
+ __include__: [
2
+ '../../dataset/custom_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_custom
10
+
11
+
12
+ HGNetv2:
13
+ name: 'B4'
14
+ return_idx: [1, 2, 3]
15
+ freeze_stem_only: True
16
+ freeze_at: 0
17
+ freeze_norm: True
18
+
19
+ optimizer:
20
+ type: AdamW
21
+ params:
22
+ -
23
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
24
+ lr: 0.0000125
25
+ -
26
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
27
+ weight_decay: 0.
28
+
29
+ lr: 0.00025
30
+ betas: [0.9, 0.999]
31
+ weight_decay: 0.000125
32
+
33
+
34
+ # Increase to search for the optimal ema
35
+ epochs: 80 # 72 + 2n
36
+ train_dataloader:
37
+ dataset:
38
+ transforms:
39
+ policy:
40
+ epoch: 72
41
+ collate_fn:
42
+ stop_epoch: 72
43
+ ema_restart_decay: 0.9999
44
+ base_size_repeat: 4
D-FINE/configs/dfine/custom/dfine_hgnetv2_m_custom.yml ADDED
@@ -0,0 +1,60 @@
1
+ __include__: [
2
+ '../../dataset/custom_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 4 # 5 6
24
+ eval_idx: -1 # -2 -3
25
+
26
+ HybridEncoder:
27
+ in_channels: [384, 768, 1536]
28
+ hidden_dim: 256
29
+ depth_mult: 0.67
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.000025
37
+ -
38
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
39
+ lr: 0.000025
40
+ weight_decay: 0.
41
+ -
42
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
43
+ weight_decay: 0.
44
+
45
+ lr: 0.00025
46
+ betas: [0.9, 0.999]
47
+ weight_decay: 0.000125
48
+
49
+
50
+ # Increase to search for the optimal ema
51
+ epochs: 132 # 120 + 4n
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 120
57
+ collate_fn:
58
+ stop_epoch: 120
59
+ ema_restart_decay: 0.9999
60
+ base_size_repeat: 6
D-FINE/configs/dfine/custom/dfine_hgnetv2_n_custom.yml ADDED
@@ -0,0 +1,76 @@
1
+ __include__:
2
+ [
3
+ "../../dataset/custom_detection.yml",
4
+ "../../runtime.yml",
5
+ "../include/dataloader.yml",
6
+ "../include/optimizer.yml",
7
+ "../include/dfine_hgnetv2.yml",
8
+ ]
9
+
10
+ output_dir: ../../../inference_output
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: "B0"
17
+ return_idx: [2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ HybridEncoder:
23
+ in_channels: [512, 1024]
24
+ feat_strides: [16, 32]
25
+
26
+ # intra
27
+ hidden_dim: 128
28
+ use_encoder_idx: [1]
29
+ dim_feedforward: 512
30
+
31
+ # cross
32
+ expansion: 0.34
33
+ depth_mult: 0.5
34
+
35
+ DFINETransformer:
36
+ feat_channels: [128, 128]
37
+ feat_strides: [16, 32]
38
+ hidden_dim: 128
39
+ dim_feedforward: 512
40
+ num_levels: 2
41
+
42
+ num_layers: 3
43
+ eval_idx: -1
44
+
45
+ num_points: [6, 6]
46
+
47
+ optimizer:
48
+ type: AdamW
49
+ params:
50
+ - params: "^(?=.*backbone)(?!.*norm|bn).*$"
51
+ lr: 0.0004
52
+ - params: "^(?=.*backbone)(?=.*norm|bn).*$"
53
+ lr: 0.0004
54
+ weight_decay: 0.
55
+ - params: "^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$"
56
+ weight_decay: 0.
57
+
58
+ lr: 0.0008
59
+ betas: [0.9, 0.999]
60
+ weight_decay: 0.0001
61
+
62
+ # Increase to search for the optimal ema
63
+ epochs: 135
64
+ train_dataloader:
65
+ total_batch_size: 32
66
+ dataset:
67
+ transforms:
68
+ policy:
69
+ epoch: 123
70
+ collate_fn:
71
+ stop_epoch: 123
72
+ ema_restart_decay: 0.9999
73
+ base_size_repeat: ~
74
+
75
+ val_dataloader:
76
+ total_batch_size: 32
D-FINE/configs/dfine/custom/dfine_hgnetv2_s_custom.yml ADDED
@@ -0,0 +1,65 @@
1
+ __include__: [
2
+ '../../dataset/custom_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 3 # 4 5 6
24
+ eval_idx: -1 # -2 -3 -4
25
+
26
+ HybridEncoder:
27
+ in_channels: [256, 512, 1024]
28
+ hidden_dim: 256
29
+ depth_mult: 0.34
30
+ expansion: 0.5
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.0002
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.0002
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.0004
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.0001
49
+
50
+
51
+ # Increase to search for the optimal ema
52
+ epochs: 220
53
+ train_dataloader:
54
+ total_batch_size: 64
55
+ dataset:
56
+ transforms:
57
+ policy:
58
+ epoch: 200
59
+ collate_fn:
60
+ stop_epoch: 200
61
+ ema_restart_decay: 0.9999
62
+ base_size_repeat: 20
63
+
64
+ val_dataloader:
65
+ total_batch_size: 128
D-FINE/configs/dfine/custom/dfine_hgnetv2_x_custom.yml ADDED
@@ -0,0 +1,55 @@
1
+ __include__: [
2
+ '../../dataset/custom_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ HybridEncoder:
23
+ hidden_dim: 384
24
+ dim_feedforward: 2048
25
+
26
+ DFINETransformer:
27
+ feat_channels: [384, 384, 384]
28
+ reg_scale: 8
29
+
30
+ optimizer:
31
+ type: AdamW
32
+ params:
33
+ -
34
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
35
+ lr: 0.0000025
36
+ -
37
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
38
+ weight_decay: 0.
39
+
40
+ lr: 0.00025
41
+ betas: [0.9, 0.999]
42
+ weight_decay: 0.000125
43
+
44
+
45
+ # Increase to search for the optimal ema
46
+ epochs: 80 # 72 + 2n
47
+ train_dataloader:
48
+ dataset:
49
+ transforms:
50
+ policy:
51
+ epoch: 72
52
+ collate_fn:
53
+ stop_epoch: 72
54
+ ema_restart_decay: 0.9998
55
+ base_size_repeat: 3
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_l_obj2custom.yml ADDED
@@ -0,0 +1,53 @@
1
+ __include__: [
2
+ '../../../dataset/custom_detection.yml',
3
+ '../../../runtime.yml',
4
+ '../../include/dataloader.yml',
5
+ '../../include/optimizer.yml',
6
+ '../../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_obj2custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B4'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+ pretrained: False
22
+
23
+ optimizer:
24
+ type: AdamW
25
+ params:
26
+ -
27
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
28
+ lr: 0.0000125
29
+ -
30
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
31
+ weight_decay: 0.
32
+
33
+ lr: 0.00025
34
+ betas: [0.9, 0.999]
35
+ weight_decay: 0.000125
36
+
37
+
38
+ epochs: 36 # Early stop
39
+ train_dataloader:
40
+ dataset:
41
+ transforms:
42
+ policy:
43
+ epoch: 30
44
+ collate_fn:
45
+ stop_epoch: 30
46
+ ema_restart_decay: 0.9999
47
+ base_size_repeat: 4
48
+
49
+ ema:
50
+ warmups: 0
51
+
52
+ lr_warmup_scheduler:
53
+ warmup_duration: 0
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_m_obj2custom.yml ADDED
@@ -0,0 +1,66 @@
1
+ __include__: [
2
+ '../../../dataset/custom_detection.yml',
3
+ '../../../runtime.yml',
4
+ '../../include/dataloader.yml',
5
+ '../../include/optimizer.yml',
6
+ '../../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_obj2custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+ pretrained: False
22
+
23
+ DFINETransformer:
24
+ num_layers: 4 # 5 6
25
+ eval_idx: -1 # -2 -3
26
+
27
+ HybridEncoder:
28
+ in_channels: [384, 768, 1536]
29
+ hidden_dim: 256
30
+ depth_mult: 0.67
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.000025
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.000025
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.00025
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.000125
49
+
50
+
51
+ epochs: 56 # Early stop
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 48
57
+ collate_fn:
58
+ stop_epoch: 48
59
+ ema_restart_decay: 0.9999
60
+ base_size_repeat: 6
61
+
62
+ ema:
63
+ warmups: 0
64
+
65
+ lr_warmup_scheduler:
66
+ warmup_duration: 0
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml ADDED
@@ -0,0 +1,67 @@
1
+ __include__: [
2
+ '../../../dataset/custom_detection.yml',
3
+ '../../../runtime.yml',
4
+ '../../include/dataloader.yml',
5
+ '../../include/optimizer.yml',
6
+ '../../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_obj2custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+ pretrained: False
22
+
23
+ DFINETransformer:
24
+ num_layers: 3 # 4 5 6
25
+ eval_idx: -1 # -2 -3 -4
26
+
27
+ HybridEncoder:
28
+ in_channels: [256, 512, 1024]
29
+ hidden_dim: 256
30
+ depth_mult: 0.34
31
+ expansion: 0.5
32
+
33
+ optimizer:
34
+ type: AdamW
35
+ params:
36
+ -
37
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
38
+ lr: 0.000125
39
+ -
40
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
41
+ lr: 0.000125
42
+ weight_decay: 0.
43
+ -
44
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
45
+ weight_decay: 0.
46
+
47
+ lr: 0.00025
48
+ betas: [0.9, 0.999]
49
+ weight_decay: 0.000125
50
+
51
+
52
+ epochs: 64 # Early stop
53
+ train_dataloader:
54
+ dataset:
55
+ transforms:
56
+ policy:
57
+ epoch: 56
58
+ collate_fn:
59
+ stop_epoch: 56
60
+ ema_restart_decay: 0.9999
61
+ base_size_repeat: 10
62
+
63
+ ema:
64
+ warmups: 0
65
+
66
+ lr_warmup_scheduler:
67
+ warmup_duration: 0
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_x_obj2custom.yml ADDED
@@ -0,0 +1,62 @@
1
+ __include__: [
2
+ '../../../dataset/custom_detection.yml',
3
+ '../../../runtime.yml',
4
+ '../../include/dataloader.yml',
5
+ '../../include/optimizer.yml',
6
+ '../../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_obj2custom
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+ pretrained: False
22
+
23
+ HybridEncoder:
24
+ # intra
25
+ hidden_dim: 384
26
+ dim_feedforward: 2048
27
+
28
+ DFINETransformer:
29
+ feat_channels: [384, 384, 384]
30
+ reg_scale: 8
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.0000025
38
+ -
39
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
40
+ weight_decay: 0.
41
+
42
+ lr: 0.00025
43
+ betas: [0.9, 0.999]
44
+ weight_decay: 0.000125
45
+
46
+
47
+ epochs: 36 # Early stop
48
+ train_dataloader:
49
+ dataset:
50
+ transforms:
51
+ policy:
52
+ epoch: 30
53
+ collate_fn:
54
+ stop_epoch: 30
55
+ ema_restart_decay: 0.9999
56
+ base_size_repeat: 3
57
+
58
+ ema:
59
+ warmups: 0
60
+
61
+ lr_warmup_scheduler:
62
+ warmup_duration: 0
D-FINE/configs/dfine/dfine_hgnetv2_l_coco.yml ADDED
@@ -0,0 +1,44 @@
1
+ __include__: [
2
+ '../dataset/coco_detection.yml',
3
+ '../runtime.yml',
4
+ './include/dataloader.yml',
5
+ './include/optimizer.yml',
6
+ './include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_coco
10
+
11
+
12
+ HGNetv2:
13
+ name: 'B4'
14
+ return_idx: [1, 2, 3]
15
+ freeze_stem_only: True
16
+ freeze_at: 0
17
+ freeze_norm: True
18
+
19
+ optimizer:
20
+ type: AdamW
21
+ params:
22
+ -
23
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
24
+ lr: 0.0000125
25
+ -
26
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
27
+ weight_decay: 0.
28
+
29
+ lr: 0.00025
30
+ betas: [0.9, 0.999]
31
+ weight_decay: 0.000125
32
+
33
+
34
+ # Increase to search for the optimal ema
35
+ epochs: 80 # 72 + 2n
36
+ train_dataloader:
37
+ dataset:
38
+ transforms:
39
+ policy:
40
+ epoch: 72
41
+ collate_fn:
42
+ stop_epoch: 72
43
+ ema_restart_decay: 0.9999
44
+ base_size_repeat: 4
D-FINE/configs/dfine/dfine_hgnetv2_m_coco.yml ADDED
@@ -0,0 +1,60 @@
1
+ __include__: [
2
+ '../dataset/coco_detection.yml',
3
+ '../runtime.yml',
4
+ './include/dataloader.yml',
5
+ './include/optimizer.yml',
6
+ './include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 4 # 5 6
24
+ eval_idx: -1 # -2 -3
25
+
26
+ HybridEncoder:
27
+ in_channels: [384, 768, 1536]
28
+ hidden_dim: 256
29
+ depth_mult: 0.67
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.00002
37
+ -
38
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
39
+ lr: 0.00002
40
+ weight_decay: 0.
41
+ -
42
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
43
+ weight_decay: 0.
44
+
45
+ lr: 0.0002
46
+ betas: [0.9, 0.999]
47
+ weight_decay: 0.0001
48
+
49
+
50
+ # Increase to search for the optimal ema
51
+ epochs: 132 # 120 + 4n
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 120
57
+ collate_fn:
58
+ stop_epoch: 120
59
+ ema_restart_decay: 0.9999
60
+ base_size_repeat: 6
D-FINE/configs/dfine/dfine_hgnetv2_n_coco.yml ADDED
@@ -0,0 +1,82 @@
1
+ __include__: [
2
+ '../dataset/coco_detection.yml',
3
+ '../runtime.yml',
4
+ './include/dataloader.yml',
5
+ './include/optimizer.yml',
6
+ './include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_n_coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+
23
+ HybridEncoder:
24
+ in_channels: [512, 1024]
25
+ feat_strides: [16, 32]
26
+
27
+ # intra
28
+ hidden_dim: 128
29
+ use_encoder_idx: [1]
30
+ dim_feedforward: 512
31
+
32
+ # cross
33
+ expansion: 0.34
34
+ depth_mult: 0.5
35
+
36
+
37
+ DFINETransformer:
38
+ feat_channels: [128, 128]
39
+ feat_strides: [16, 32]
40
+ hidden_dim: 128
41
+ dim_feedforward: 512
42
+ num_levels: 2
43
+
44
+ num_layers: 3
45
+ eval_idx: -1
46
+
47
+ num_points: [6, 6]
48
+
49
+ optimizer:
50
+ type: AdamW
51
+ params:
52
+ -
53
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
54
+ lr: 0.0004
55
+ -
56
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
57
+ lr: 0.0004
58
+ weight_decay: 0.
59
+ -
60
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
61
+ weight_decay: 0.
62
+
63
+ lr: 0.0008
64
+ betas: [0.9, 0.999]
65
+ weight_decay: 0.0001
66
+
67
+
68
+ # Increase to search for the optimal ema
69
+ epochs: 160 # 148 + 4n
70
+ train_dataloader:
71
+ total_batch_size: 128
72
+ dataset:
73
+ transforms:
74
+ policy:
75
+ epoch: 148
76
+ collate_fn:
77
+ stop_epoch: 148
78
+ ema_restart_decay: 0.9999
79
+ base_size_repeat: ~
80
+
81
+ val_dataloader:
82
+ total_batch_size: 256
D-FINE/configs/dfine/dfine_hgnetv2_s_coco.yml ADDED
@@ -0,0 +1,61 @@
1
+ __include__: [
2
+ '../dataset/coco_detection.yml',
3
+ '../runtime.yml',
4
+ './include/dataloader.yml',
5
+ './include/optimizer.yml',
6
+ './include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 3 # 4 5 6
24
+ eval_idx: -1 # -2 -3 -4
25
+
26
+ HybridEncoder:
27
+ in_channels: [256, 512, 1024]
28
+ hidden_dim: 256
29
+ depth_mult: 0.34
30
+ expansion: 0.5
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.0001
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.0001
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.0002
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.0001
49
+
50
+
51
+ # Increase to search for the optimal ema
52
+ epochs: 132 # 120 + 4n
53
+ train_dataloader:
54
+ dataset:
55
+ transforms:
56
+ policy:
57
+ epoch: 120
58
+ collate_fn:
59
+ stop_epoch: 120
60
+ ema_restart_decay: 0.9999
61
+ base_size_repeat: 20
D-FINE/configs/dfine/dfine_hgnetv2_x_coco.yml ADDED
@@ -0,0 +1,56 @@
1
+ __include__: [
2
+ '../dataset/coco_detection.yml',
3
+ '../runtime.yml',
4
+ './include/dataloader.yml',
5
+ './include/optimizer.yml',
6
+ './include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ HybridEncoder:
23
+ # intra
24
+ hidden_dim: 384
25
+ dim_feedforward: 2048
26
+
27
+ DFINETransformer:
28
+ feat_channels: [384, 384, 384]
29
+ reg_scale: 8
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.0000025
37
+ -
38
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
39
+ weight_decay: 0.
40
+
41
+ lr: 0.00025
42
+ betas: [0.9, 0.999]
43
+ weight_decay: 0.000125
44
+
45
+
46
+ # Increase to search for the optimal ema
47
+ epochs: 80 # 72 + 2n
48
+ train_dataloader:
49
+ dataset:
50
+ transforms:
51
+ policy:
52
+ epoch: 72
53
+ collate_fn:
54
+ stop_epoch: 72
55
+ ema_restart_decay: 0.9998
56
+ base_size_repeat: 3
D-FINE/configs/dfine/include/dataloader.yml ADDED
@@ -0,0 +1,39 @@
1
+
2
+ train_dataloader:
3
+ dataset:
4
+ transforms:
5
+ ops:
6
+ - {type: RandomPhotometricDistort, p: 0.5}
7
+ - {type: RandomZoomOut, fill: 0}
8
+ - {type: RandomIoUCrop, p: 0.8}
9
+ - {type: SanitizeBoundingBoxes, min_size: 1}
10
+ - {type: RandomHorizontalFlip}
11
+ - {type: Resize, size: [640, 640], }
12
+ - {type: SanitizeBoundingBoxes, min_size: 1}
13
+ - {type: ConvertPILImage, dtype: 'float32', scale: True}
14
+ - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
15
+ policy:
16
+ name: stop_epoch
17
+ epoch: 72 # epoch in [72, ~) stop `ops`
18
+ ops: ['RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']
19
+
20
+ collate_fn:
21
+ type: BatchImageCollateFunction
22
+ base_size: 640
23
+ base_size_repeat: 3
24
+ stop_epoch: 72 # epoch in [72, ~) stop `multiscales`
25
+
26
+ shuffle: True
27
+ total_batch_size: 32 # total batch size of 32 (e.g. 4 GPUs x 8 images per GPU)
28
+ num_workers: 4
29
+
30
+
31
+ val_dataloader:
32
+ dataset:
33
+ transforms:
34
+ ops:
35
+ - {type: Resize, size: [640, 640], }
36
+ - {type: ConvertPILImage, dtype: 'float32', scale: True}
37
+ shuffle: False
38
+ total_batch_size: 64
39
+ num_workers: 4
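
The collate_fn keys above drive multi-scale training: until stop_epoch, each batch is resampled to a size around base_size (base_size_repeat biases the sampling toward the base resolution), after which training stays at the fixed 640x640. A rough sketch of that idea, assuming the behaviour rather than quoting the real BatchImageCollateFunction:

# Rough sketch of multi-scale batching; the real BatchImageCollateFunction lives
# in the D-FINE codebase and differs in details (e.g. how sizes are sampled).
import random
import torch
import torch.nn.functional as F

def multiscale_collate(batch, base_size=640, stop=False, step=32, span=5):
    images = torch.stack([img for img, _ in batch])
    targets = [tgt for _, tgt in batch]
    if not stop:  # after `stop_epoch` the size is pinned to base_size
        size = base_size + step * random.randint(-span, span)
        images = F.interpolate(images, size=size, mode="bilinear", align_corners=False)
    return images, targets
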
D-FINE/configs/dfine/include/dfine_hgnetv2.yml ADDED
@@ -0,0 +1,82 @@
1
+ task: detection
2
+
3
+ model: DFINE
4
+ criterion: DFINECriterion
5
+ postprocessor: DFINEPostProcessor
6
+
7
+ use_focal_loss: True
8
+ eval_spatial_size: [640, 640] # h w
9
+
10
+ DFINE:
11
+ backbone: HGNetv2
12
+ encoder: HybridEncoder
13
+ decoder: DFINETransformer
14
+
15
+ HGNetv2:
16
+ pretrained: True
17
+ local_model_dir: weight/hgnetv2/
18
+
19
+ HybridEncoder:
20
+ in_channels: [512, 1024, 2048]
21
+ feat_strides: [8, 16, 32]
22
+
23
+ # intra
24
+ hidden_dim: 256
25
+ use_encoder_idx: [2]
26
+ num_encoder_layers: 1
27
+ nhead: 8
28
+ dim_feedforward: 1024
29
+ dropout: 0.
30
+ enc_act: 'gelu'
31
+
32
+ # cross
33
+ expansion: 1.0
34
+ depth_mult: 1
35
+ act: 'silu'
36
+
37
+
38
+ DFINETransformer:
39
+ feat_channels: [256, 256, 256]
40
+ feat_strides: [8, 16, 32]
41
+ hidden_dim: 256
42
+ num_levels: 3
43
+
44
+ num_layers: 6
45
+ eval_idx: -1
46
+ num_queries: 300
47
+
48
+ num_denoising: 100
49
+ label_noise_ratio: 0.5
50
+ box_noise_scale: 1.0
51
+
52
+ # NEW
53
+ reg_max: 32
54
+ reg_scale: 4
55
+
56
+ # Auxiliary decoder layers dimension scaling
57
+ # "eg. If num_layers: 6 eval_idx: -4,
58
+ # then layer 3, 4, 5 are auxiliary decoder layers."
59
+ layer_scale: 1 # 2
60
+
61
+
62
+ num_points: [3, 6, 3] # [4, 4, 4] [3, 6, 3]
63
+ cross_attn_method: default # default, discrete
64
+ query_select_method: default # default, agnostic
65
+
66
+
67
+ DFINEPostProcessor:
68
+ num_top_queries: 300
69
+
70
+
71
+ DFINECriterion:
72
+ weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2, loss_fgl: 0.15, loss_ddf: 1.5}
73
+ losses: ['vfl', 'boxes', 'local']
74
+ alpha: 0.75
75
+ gamma: 2.0
76
+ reg_max: 32
77
+
78
+ matcher:
79
+ type: HungarianMatcher
80
+ weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
81
+ alpha: 0.25
82
+ gamma: 2.0
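
This include is the piece that wires the detector together: DFINE names a backbone, an encoder, and a decoder, and the remaining blocks configure each of them plus the criterion and post-processor. Conceptually the forward pass is a simple chain, as in this sketch (the wrapper is an illustration; only the component names come from the config):

# Conceptual wiring of the DFINE model described by this include (sketch only).
import torch.nn as nn

class DFINESketch(nn.Module):
    def __init__(self, backbone, encoder, decoder):
        super().__init__()
        self.backbone, self.encoder, self.decoder = backbone, encoder, decoder

    def forward(self, images, targets=None):
        feats = self.backbone(images)        # HGNetv2: multi-scale features (strides 8/16/32)
        feats = self.encoder(feats)          # HybridEncoder: intra- and cross-scale fusion
        return self.decoder(feats, targets)  # DFINETransformer: queries -> boxes + logits
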
D-FINE/configs/dfine/include/optimizer.yml ADDED
@@ -0,0 +1,36 @@
1
+ use_amp: True
2
+ use_ema: True
3
+ ema:
4
+ type: ModelEMA
5
+ decay: 0.9999
6
+ warmups: 1000
7
+ start: 0
8
+
9
+
10
+ epochs: 72
11
+ clip_max_norm: 0.1
12
+
13
+
14
+ optimizer:
15
+ type: AdamW
16
+ params:
17
+ -
18
+ params: '^(?=.*backbone)(?!.*norm).*$'
19
+ lr: 0.0000125
20
+ -
21
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
22
+ weight_decay: 0.
23
+
24
+ lr: 0.00025
25
+ betas: [0.9, 0.999]
26
+ weight_decay: 0.000125
27
+
28
+
29
+ lr_scheduler:
30
+ type: MultiStepLR
31
+ milestones: [500]
32
+ gamma: 0.1
33
+
34
+ lr_warmup_scheduler:
35
+ type: LinearWarmup
36
+ warmup_duration: 500
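
The ema block enables a weight exponential moving average whose decay is warmed up over the first `warmups` updates, while the scheduler pair combines a linear LR warmup with a MultiStepLR whose milestone (500) is effectively never reached within the configured epoch budgets. The decay warm-up is commonly implemented along these lines (an assumed formulation, not necessarily the repo's ModelEMA):

# Sketch of an EMA with warmed-up decay matching decay: 0.9999, warmups: 1000.
import copy
import math
import torch

class EMASketch:
    def __init__(self, model, decay=0.9999, warmups=1000):
        self.module = copy.deepcopy(model).eval()
        self.decay, self.warmups, self.updates = decay, warmups, 0

    @torch.no_grad()
    def update(self, model):
        self.updates += 1
        d = self.decay * (1 - math.exp(-self.updates / self.warmups))  # ramps 0 -> decay
        msd = model.state_dict()
        for k, v in self.module.state_dict().items():
            if v.dtype.is_floating_point:
                v.mul_(d).add_(msd[k].detach(), alpha=1 - d)
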
D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml ADDED
@@ -0,0 +1,52 @@
1
+ __include__: [
2
+ '../../dataset/coco_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_obj2coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B4'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ optimizer:
23
+ type: AdamW
24
+ params:
25
+ -
26
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
27
+ lr: 0.0000125
28
+ -
29
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
30
+ weight_decay: 0.
31
+
32
+ lr: 0.00025
33
+ betas: [0.9, 0.999]
34
+ weight_decay: 0.000125
35
+
36
+
37
+ epochs: 36 # Early stop
38
+ train_dataloader:
39
+ dataset:
40
+ transforms:
41
+ policy:
42
+ epoch: 30
43
+ collate_fn:
44
+ stop_epoch: 30
45
+ ema_restart_decay: 0.9999
46
+ base_size_repeat: 4
47
+
48
+ ema:
49
+ warmups: 0
50
+
51
+ lr_warmup_scheduler:
52
+ warmup_duration: 0
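
The *_obj2coco and *_obj2custom configs fine-tune from an Objects365-pretrained checkpoint, which is why warmups: 0 and warmup_duration: 0 appear here. Loading such a checkpoint onto a model with a different num_classes typically means skipping the shape-mismatched classification tensors; a hedged sketch follows (the "model" key and the filtering rule are assumptions about the checkpoint layout):

# Sketch: load an Objects365-pretrained checkpoint for fine-tuning, dropping any
# tensor whose shape no longer matches (e.g. heads sized for 366 classes).
import torch

def load_for_finetune(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("model", ckpt)  # checkpoint layout assumed; adjust if it stores "ema"
    own = model.state_dict()
    kept = {k: v for k, v in state.items() if k in own and v.shape == own[k].shape}
    model.load_state_dict(kept, strict=False)
    print(f"loaded {len(kept)}/{len(own)} tensors; the rest keep their fresh init")
    return model
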
D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml ADDED
@@ -0,0 +1,49 @@
1
+ __include__: [
2
+ '../../dataset/obj365_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_l_obj365
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B4'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ optimizer:
23
+ type: AdamW
24
+ params:
25
+ -
26
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
27
+ lr: 0.0000125
28
+ -
29
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
30
+ weight_decay: 0.
31
+
32
+ lr: 0.00025
33
+ betas: [0.9, 0.999]
34
+ weight_decay: 0.000125
35
+ # weight_decay: 0.00005 # Faster convergence (optional)
36
+
37
+
38
+ epochs: 24 # Early stop
39
+ train_dataloader:
40
+ dataset:
41
+ transforms:
42
+ policy:
43
+ epoch: 500
44
+ collate_fn:
45
+ stop_epoch: 500
46
+ base_size_repeat: 4
47
+
48
+ checkpoint_freq: 1
49
+ print_freq: 1000
D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml ADDED
@@ -0,0 +1,65 @@
1
+ __include__: [
2
+ '../../dataset/coco_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_obj2coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 4 # 5 6
24
+ eval_idx: -1 # -2 -3
25
+
26
+ HybridEncoder:
27
+ in_channels: [384, 768, 1536]
28
+ hidden_dim: 256
29
+ depth_mult: 0.67
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.000025
37
+ -
38
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
39
+ lr: 0.000025
40
+ weight_decay: 0.
41
+ -
42
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
43
+ weight_decay: 0.
44
+
45
+ lr: 0.00025
46
+ betas: [0.9, 0.999]
47
+ weight_decay: 0.000125
48
+
49
+
50
+ epochs: 56 # Early stop
51
+ train_dataloader:
52
+ dataset:
53
+ transforms:
54
+ policy:
55
+ epoch: 48
56
+ collate_fn:
57
+ stop_epoch: 48
58
+ ema_restart_decay: 0.9999
59
+ base_size_repeat: 6
60
+
61
+ ema:
62
+ warmups: 0
63
+
64
+ lr_warmup_scheduler:
65
+ warmup_duration: 0
D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml ADDED
@@ -0,0 +1,62 @@
1
+ __include__: [
2
+ '../../dataset/obj365_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_m_obj365
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B2'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 4 # 5 6
24
+ eval_idx: -1 # -2 -3
25
+
26
+ HybridEncoder:
27
+ in_channels: [384, 768, 1536]
28
+ hidden_dim: 256
29
+ depth_mult: 0.67
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.000025
37
+ -
38
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
39
+ lr: 0.000025
40
+ weight_decay: 0.
41
+ -
42
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
43
+ weight_decay: 0.
44
+
45
+ lr: 0.00025
46
+ betas: [0.9, 0.999]
47
+ weight_decay: 0.000125
48
+ # weight_decay: 0.00005 # Faster convergence (optional)
49
+
50
+
51
+ epochs: 36 # Early stop
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 500
57
+ collate_fn:
58
+ stop_epoch: 500
59
+ base_size_repeat: 6
60
+
61
+ checkpoint_freq: 1
62
+ print_freq: 1000
D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj2coco.yml ADDED
@@ -0,0 +1,88 @@
1
+ __include__: [
2
+ '../../dataset/coco_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_n_obj2coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+
23
+ HybridEncoder:
24
+ in_channels: [512, 1024]
25
+ feat_strides: [16, 32]
26
+
27
+ # intra
28
+ hidden_dim: 128
29
+ use_encoder_idx: [1]
30
+ dim_feedforward: 512
31
+
32
+ # cross
33
+ expansion: 0.34
34
+ depth_mult: 0.5
35
+
36
+
37
+ DFINETransformer:
38
+ feat_channels: [128, 128]
39
+ feat_strides: [16, 32]
40
+ hidden_dim: 128
41
+ dim_feedforward: 512
42
+ num_levels: 2
43
+
44
+ num_layers: 3
45
+ eval_idx: -1
46
+
47
+ num_points: [6, 6]
48
+
49
+ optimizer:
50
+ type: AdamW
51
+ params:
52
+ -
53
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
54
+ lr: 0.0004
55
+ -
56
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
57
+ lr: 0.0004
58
+ weight_decay: 0.
59
+ -
60
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
61
+ weight_decay: 0.
62
+
63
+ lr: 0.0008
64
+ betas: [0.9, 0.999]
65
+ weight_decay: 0.0001
66
+
67
+
68
+
69
+ epochs: 64 # Early stop
70
+ train_dataloader:
71
+ total_batch_size: 128
72
+ dataset:
73
+ transforms:
74
+ policy:
75
+ epoch: 56
76
+ collate_fn:
77
+ stop_epoch: 56
78
+ ema_restart_decay: 0.9999
79
+ base_size_repeat: ~
80
+
81
+ ema:
82
+ warmups: 0
83
+
84
+ lr_warmup_scheduler:
85
+ warmup_duration: 0
86
+
87
+ val_dataloader:
88
+ total_batch_size: 256
D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj365.yml ADDED
@@ -0,0 +1,84 @@
1
+ __include__: [
2
+ '../../dataset/obj365_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_n_obj365
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+
23
+ HybridEncoder:
24
+ in_channels: [512, 1024]
25
+ feat_strides: [16, 32]
26
+
27
+ # intra
28
+ hidden_dim: 128
29
+ use_encoder_idx: [1]
30
+ dim_feedforward: 512
31
+
32
+ # cross
33
+ expansion: 0.34
34
+ depth_mult: 0.5
35
+
36
+
37
+ DFINETransformer:
38
+ feat_channels: [128, 128]
39
+ feat_strides: [16, 32]
40
+ hidden_dim: 128
41
+ dim_feedforward: 512
42
+ num_levels: 2
43
+
44
+ num_layers: 3
45
+ eval_idx: -1
46
+
47
+ num_points: [6, 6]
48
+
49
+ optimizer:
50
+ type: AdamW
51
+ params:
52
+ -
53
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
54
+ lr: 0.0004
55
+ -
56
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
57
+ lr: 0.0004
58
+ weight_decay: 0.
59
+ -
60
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
61
+ weight_decay: 0.
62
+
63
+ lr: 0.0008
64
+ betas: [0.9, 0.999]
65
+ weight_decay: 0.0001
66
+
67
+
68
+
69
+ epochs: 48 # Early stop
70
+ train_dataloader:
71
+ total_batch_size: 128
72
+ dataset:
73
+ transforms:
74
+ policy:
75
+ epoch: 500
76
+ collate_fn:
77
+ stop_epoch: 500
78
+ base_size_repeat: ~
79
+
80
+ checkpoint_freq: 1
81
+ print_freq: 500
82
+
83
+ val_dataloader:
84
+ total_batch_size: 256
D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml ADDED
@@ -0,0 +1,66 @@
1
+ __include__: [
2
+ '../../dataset/coco_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_obj2coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 3 # 4 5 6
24
+ eval_idx: -1 # -2 -3 -4
25
+
26
+ HybridEncoder:
27
+ in_channels: [256, 512, 1024]
28
+ hidden_dim: 256
29
+ depth_mult: 0.34
30
+ expansion: 0.5
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.000125
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.000125
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.00025
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.000125
49
+
50
+
51
+ epochs: 64 # Early stop
52
+ train_dataloader:
53
+ dataset:
54
+ transforms:
55
+ policy:
56
+ epoch: 56
57
+ collate_fn:
58
+ stop_epoch: 56
59
+ ema_restart_decay: 0.9999
60
+ base_size_repeat: 10
61
+
62
+ ema:
63
+ warmups: 0
64
+
65
+ lr_warmup_scheduler:
66
+ warmup_duration: 0
D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml ADDED
@@ -0,0 +1,63 @@
1
+ __include__: [
2
+ '../../dataset/obj365_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_s_obj365
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B0'
17
+ return_idx: [1, 2, 3]
18
+ freeze_at: -1
19
+ freeze_norm: False
20
+ use_lab: True
21
+
22
+ DFINETransformer:
23
+ num_layers: 3 # 4 5 6
24
+ eval_idx: -1 # -2 -3 -4
25
+
26
+ HybridEncoder:
27
+ in_channels: [256, 512, 1024]
28
+ hidden_dim: 256
29
+ depth_mult: 0.34
30
+ expansion: 0.5
31
+
32
+ optimizer:
33
+ type: AdamW
34
+ params:
35
+ -
36
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
37
+ lr: 0.000125
38
+ -
39
+ params: '^(?=.*backbone)(?=.*norm|bn).*$'
40
+ lr: 0.000125
41
+ weight_decay: 0.
42
+ -
43
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
44
+ weight_decay: 0.
45
+
46
+ lr: 0.00025
47
+ betas: [0.9, 0.999]
48
+ weight_decay: 0.000125
49
+ # weight_decay: 0.00005 # Faster convergence (optional)
50
+
51
+
52
+ epochs: 36 # Early stop
53
+ train_dataloader:
54
+ dataset:
55
+ transforms:
56
+ policy:
57
+ epoch: 500
58
+ collate_fn:
59
+ stop_epoch: 500
60
+ base_size_repeat: 20
61
+
62
+ checkpoint_freq: 1
63
+ print_freq: 1000
D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml ADDED
@@ -0,0 +1,61 @@
1
+ __include__: [
2
+ '../../dataset/coco_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_obj2coco
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ HybridEncoder:
23
+ # intra
24
+ hidden_dim: 384
25
+ dim_feedforward: 2048
26
+
27
+ DFINETransformer:
28
+ feat_channels: [384, 384, 384]
29
+ reg_scale: 8
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.0000025
37
+ -
38
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
39
+ weight_decay: 0.
40
+
41
+ lr: 0.00025
42
+ betas: [0.9, 0.999]
43
+ weight_decay: 0.000125
44
+
45
+
46
+ epochs: 36 # Early stop
47
+ train_dataloader:
48
+ dataset:
49
+ transforms:
50
+ policy:
51
+ epoch: 30
52
+ collate_fn:
53
+ stop_epoch: 30
54
+ ema_restart_decay: 0.9999
55
+ base_size_repeat: 3
56
+
57
+ ema:
58
+ warmups: 0
59
+
60
+ lr_warmup_scheduler:
61
+ warmup_duration: 0
D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml ADDED
@@ -0,0 +1,58 @@
1
+ __include__: [
2
+ '../../dataset/obj365_detection.yml',
3
+ '../../runtime.yml',
4
+ '../include/dataloader.yml',
5
+ '../include/optimizer.yml',
6
+ '../include/dfine_hgnetv2.yml',
7
+ ]
8
+
9
+ output_dir: ./output/dfine_hgnetv2_x_obj365
10
+
11
+
12
+ DFINE:
13
+ backbone: HGNetv2
14
+
15
+ HGNetv2:
16
+ name: 'B5'
17
+ return_idx: [1, 2, 3]
18
+ freeze_stem_only: True
19
+ freeze_at: 0
20
+ freeze_norm: True
21
+
22
+ HybridEncoder:
23
+ # intra
24
+ hidden_dim: 384
25
+ dim_feedforward: 2048
26
+
27
+ DFINETransformer:
28
+ feat_channels: [384, 384, 384]
29
+ reg_scale: 8
30
+
31
+ optimizer:
32
+ type: AdamW
33
+ params:
34
+ -
35
+ params: '^(?=.*backbone)(?!.*norm|bn).*$'
36
+ lr: 0.0000025
37
+ -
38
+ params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
39
+ weight_decay: 0.
40
+
41
+ lr: 0.00025
42
+ betas: [0.9, 0.999]
43
+ weight_decay: 0.000125
44
+ # weight_decay: 0.00005 # Faster convergence (optional)
45
+
46
+
47
+ epochs: 24 # Early stop
48
+ train_dataloader:
49
+ dataset:
50
+ transforms:
51
+ policy:
52
+ epoch: 500
53
+ collate_fn:
54
+ stop_epoch: 500
55
+ base_size_repeat: 3
56
+
57
+ checkpoint_freq: 1
58
+ print_freq: 1000
D-FINE/configs/runtime.yml ADDED
@@ -0,0 +1,24 @@
1
+ print_freq: 100
2
+ output_dir: './logs'
3
+ checkpoint_freq: 12
4
+
5
+
6
+ sync_bn: True
7
+ find_unused_parameters: False
8
+
9
+
10
+ use_amp: False
11
+ scaler:
12
+ type: GradScaler
13
+ enabled: True
14
+
15
+
16
+ use_ema: False
17
+ ema:
18
+ type: ModelEMA
19
+ decay: 0.9999
20
+ warmups: 1000
21
+
22
+ use_wandb: False
23
+ project_name: D-FINE # for wandb
24
+ exp_name: baseline # wandb experiment name
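
runtime.yml leaves AMP and EMA off by default (individual configs re-enable them via the optimizer include), keeps sync BN on for multi-GPU runs, and makes Weights & Biases logging opt-in. For orientation, the use_amp / scaler settings correspond to the standard GradScaler pattern, combined with the clip_max_norm: 0.1 from the optimizer include; the following is a sketch, not the repo's training loop:

# Sketch of an AMP training step matching use_amp + scaler: GradScaler and
# clip_max_norm: 0.1; loss weighting is simplified to a plain sum here.
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(enabled=True)

def train_step(model, criterion, optimizer, images, targets, clip_max_norm=0.1):
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=True):
        outputs = model(images, targets)
        loss = sum(criterion(outputs, targets).values())
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                      # so clipping sees unscaled grads
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_max_norm)
    scaler.step(optimizer)
    scaler.update()
    return float(loss.detach())
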