Spaces: No application file

Commit b26e93d (0 parents)
Clean initial commit (no large files, no LFS pointers)

This view is limited to 50 files because it contains too many changes. See the raw diff for the full change set.
- .gitignore +9 -0
- D-FINE/.github/ISSUE_TEMPLATE/bug_report.md +38 -0
- D-FINE/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
- D-FINE/.github/workflows/is.yml +63 -0
- D-FINE/.github/workflows/pre-commit.yml +15 -0
- D-FINE/.gitignore +10 -0
- D-FINE/.pre-commit-config.yaml +68 -0
- D-FINE/Dockerfile +48 -0
- D-FINE/LICENSE +201 -0
- D-FINE/README.md +700 -0
- D-FINE/README_cn.md +673 -0
- D-FINE/README_ja.md +698 -0
- D-FINE/configs/dataset/coco_detection.yml +41 -0
- D-FINE/configs/dataset/crowdhuman_detection.yml +41 -0
- D-FINE/configs/dataset/custom_detection.yml +41 -0
- D-FINE/configs/dataset/obj365_detection.yml +41 -0
- D-FINE/configs/dataset/voc_detection.yml +40 -0
- D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_l_ch.yml +44 -0
- D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_m_ch.yml +60 -0
- D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_n_ch.yml +82 -0
- D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_s_ch.yml +65 -0
- D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_x_ch.yml +55 -0
- D-FINE/configs/dfine/custom/dfine_hgnetv2_l_custom.yml +44 -0
- D-FINE/configs/dfine/custom/dfine_hgnetv2_m_custom.yml +60 -0
- D-FINE/configs/dfine/custom/dfine_hgnetv2_n_custom.yml +76 -0
- D-FINE/configs/dfine/custom/dfine_hgnetv2_s_custom.yml +65 -0
- D-FINE/configs/dfine/custom/dfine_hgnetv2_x_custom.yml +55 -0
- D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_l_obj2custom.yml +53 -0
- D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_m_obj2custom.yml +66 -0
- D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml +67 -0
- D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_x_obj2custom.yml +62 -0
- D-FINE/configs/dfine/dfine_hgnetv2_l_coco.yml +44 -0
- D-FINE/configs/dfine/dfine_hgnetv2_m_coco.yml +60 -0
- D-FINE/configs/dfine/dfine_hgnetv2_n_coco.yml +82 -0
- D-FINE/configs/dfine/dfine_hgnetv2_s_coco.yml +61 -0
- D-FINE/configs/dfine/dfine_hgnetv2_x_coco.yml +56 -0
- D-FINE/configs/dfine/include/dataloader.yml +39 -0
- D-FINE/configs/dfine/include/dfine_hgnetv2.yml +82 -0
- D-FINE/configs/dfine/include/optimizer.yml +36 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml +52 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml +49 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml +65 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml +62 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj2coco.yml +88 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj365.yml +84 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml +66 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml +63 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml +61 -0
- D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml +58 -0
- D-FINE/configs/runtime.yml +24 -0
.gitignore
ADDED
@@ -0,0 +1,9 @@
# ignore large files and runtime outputs
*.pth
*.pt
*.engine
*.onnx
engines/
D-FINE/weight/
examples/
output/
D-FINE/.github/ISSUE_TEMPLATE/bug_report.md
ADDED
@@ -0,0 +1,38 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]

**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]

**Additional context**
Add any other context about the problem here.
D-FINE/.github/ISSUE_TEMPLATE/feature_request.md
ADDED
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
D-FINE/.github/workflows/is.yml
ADDED
@@ -0,0 +1,63 @@
name: Issue Screening

on:
  issues:
    types: [opened, edited]

jobs:
  screen-issues:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Get details and check for keywords
        id: issue_check
        uses: actions/github-script@v5
        with:
          script: |
            const issue = context.payload.issue;
            const issueNumber = issue.number;
            const title = (issue.title || "").toLowerCase();
            const body = (issue.body || "").toLowerCase();
            core.setOutput('number', issueNumber);

            const keywords = ["spam", "badword", "inappropriate", "suspicious", "unusual", "star", "stars", "buy", "buying"];
            let containsKeyword = false;

            console.log(`Checking issue #${issueNumber} for whole word keywords...`);
            for (const keyword of keywords) {
              const regex = new RegExp(`\\b${keyword}\\b`);
              if (regex.test(title) || regex.test(body)) {
                containsKeyword = true;
                console.log(`Whole word keyword '${keyword}' found in issue #${issueNumber}.`);
                break;
              }
            }

            console.log(`Keyword check for issue #${issueNumber} completed. contains_keyword=${containsKeyword}`);
            core.setOutput('contains_keyword', containsKeyword);

      - name: Close and Modify Issue if it contains keywords
        if: steps.issue_check.outputs.contains_keyword == 'true'
        uses: actions/github-script@v5
        with:
          github-token: ${{ secrets.ISSUE }}
          script: |
            const issueNumber = ${{ steps.issue_check.outputs.number }};
            try {
              console.log(`Attempting to close, clear body, and rename title of issue #${issueNumber} due to keyword.`);
              await github.rest.issues.update({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: issueNumber,
                state: 'closed',
                title: "Cleared suspicious issues",
                body: ""
              });
              console.log(`Successfully closed, cleared body, and renamed title of issue #${issueNumber}.`);
            } catch (error) {
              console.error(`Failed to update issue #${issueNumber}:`, error);
              throw error;
            }
D-FINE/.github/workflows/pre-commit.yml
ADDED
@@ -0,0 +1,15 @@
name: pre-commit

on:
  pull_request:
    branches: [master]
  push:
    branches: [master]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - uses: pre-commit/[email protected]
D-FINE/.gitignore
ADDED
@@ -0,0 +1,10 @@
# existing entries
output/
*.pyc
wandb/
*.onnx
weight/dfine-s.pth

# ignore tensorRT engine files
*.engine
engines/
D-FINE/.pre-commit-config.yaml
ADDED
@@ -0,0 +1,68 @@
# Copyright The Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

default_language_version:
  python: python3

ci:
  autofix_prs: true
  autoupdate_commit_msg: "[pre-commit.ci] pre-commit suggestions"
  autoupdate_schedule: quarterly
  # submodules: true

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
      # - id: check-json # skip for incompatibility with .devcontainer/devcontainer.json
      - id: check-yaml
      - id: check-toml
      - id: check-docstring-first
      - id: check-executables-have-shebangs
      - id: check-case-conflict
      # - id: check-added-large-files
      #   args: ["--maxkb=100", "--enforce-all"]
      - id: detect-private-key

  # - repo: https://github.com/PyCQA/docformatter
  #   rev: v1.7.5
  #   hooks:
  #     - id: docformatter
  #       additional_dependencies: [tomli]
  #       args: ["--in-place"]

  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.17
    hooks:
      - id: mdformat
        exclude: '^.*\.md$'
        args: ["--number"]
        additional_dependencies:
          - mdformat-gfm
          - mdformat-black
          - mdformat_frontmatter

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9
    hooks:
      # try to fix what is possible
      - id: ruff
        args: ["--fix", "--ignore", "E501,F401,F403,F841,E741"]
      # # perform formatting updates
      # - id: ruff-format
      # validate if all is fine with preview mode
      - id: ruff
        args: ["--ignore", "E501,F401,F403,F841,E741"]
D-FINE/Dockerfile
ADDED
@@ -0,0 +1,48 @@
FROM registry.cn-hangzhou.aliyuncs.com/peterande/dfine:v1

# FULL BUILDING INFO:

# docker login --username=xxx registry.cn-hangzhou.aliyuncs.com
# cd [PATH_2_Dockerfile]
# docker build -t xxx:v1 .
# docker tag xxx:v1 registry.cn-hangzhou.aliyuncs.com/xxx/xxx:v1
# docker push registry.cn-hangzhou.aliyuncs.com/xxx/xxx:v1

# FROM dockerpull.com/nvidia/cuda:12.0.1-cudnn8-devel-ubuntu18.04
# ARG DEBIAN_FRONTEND=noninteractive
# ENV PATH="/root/miniconda3/bin:${PATH}"
# ARG PATH="/root/miniconda3/bin:${PATH}"

# RUN sed -i "s/archive.ubuntu./mirrors.aliyun./g" /etc/apt/sources.list
# RUN sed -i "s/deb.debian.org/mirrors.aliyun.com/g" /etc/apt/sources.list
# RUN sed -i "s/security.debian.org/mirrors.aliyun.com\/debian-security/g" /etc/apt/sources.list
# RUN sed -i 's/archive.ubuntu.com/mirrors.ustc.edu.cn/g' /etc/apt/sources.list

# RUN apt-get update && apt-get install -y --no-install-recommends apt-utils && \
#     apt-get upgrade -y && \
#     apt-get install -y vim git libgl1-mesa-glx libglib2.0-0 libsm6 && \
#     apt-get install -y libxrender1 libxext6 tmux wget htop && \
#     apt-get install -y build-essential gcc g++ gdb binutils pciutils net-tools iputils-ping iproute2 git vim wget curl make openssh-server openssh-client tmux tree man unzip unrar

# ENV PYTHONIOENCODING=UTF-8

# RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
#     mkdir /root/.conda && \
#     bash Miniconda3-latest-Linux-x86_64.sh -b && \
#     rm -f Miniconda3-latest-Linux-x86_64.sh && \
#     conda init bash

# RUN conda config --set show_channel_urls yes \
#     && echo "channels:" > ~/.condarc \
#     && echo "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/" >> ~/.condarc \
#     && echo "  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/" >> ~/.condarc \
#     && echo "show_channel_urls: true" \
#     && cat ~/.condarc \
#     && pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple \
#     && cat ~/.config/pip/pip.conf

# RUN python3 -m pip install --upgrade pip && \
#     python3 -m pip install --upgrade setuptools

# RUN python3 -m pip install jupyterlab pycocotools PyYAML tensorboard scipy
# RUN python3 -m pip --default-timeout=10000 install torch torchvision
D-FINE/LICENSE
ADDED
@@ -0,0 +1,201 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
D-FINE/README.md
ADDED
@@ -0,0 +1,700 @@
<!--# [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/xxxxxx) -->

English | [简体中文](README_cn.md) | [日本語](README_ja.md) | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)

<h2 align="center">
  D-FINE: Redefine Regression Task of DETRs as Fine‑grained Distribution Refinement
</h2>

<p align="center">
    <a href="https://huggingface.co/spaces/developer0hye/D-FINE">
        <img alt="hf" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
    </a>
    <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
        <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
    </a>
    <a href="https://github.com/Peterande/D-FINE/pulls">
        <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
    </a>
    <a href="https://github.com/Peterande/D-FINE/issues">
        <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
    </a>
    <a href="https://arxiv.org/abs/2410.13842">
        <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
    </a>
    <!-- <a href="mailto: [email protected]">
        <img alt="email" src="https://img.shields.io/badge/contact_me-email-yellow">
    </a> -->
    <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
        <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
    </a>
    <a href="https://github.com/Peterande/D-FINE">
        <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
    </a>
</p>

<p align="center">
    📄 This is the official implementation of the paper:
    <br>
    <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
</p>

<p align="center">
Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu
</p>

<p align="center">
University of Science and Technology of China
</p>

<p align="center">
    <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
        <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
    </a>
</p>

<!-- <table><tr>
<td><img src=https://github.com/Peterande/storage/blob/master/latency.png border=0 width=333></td>
<td><img src=https://github.com/Peterande/storage/blob/master/params.png border=0 width=333></td>
<td><img src=https://github.com/Peterande/storage/blob/master/flops.png border=0 width=333></td>
</tr></table> -->

<p align="center">
<strong>If you like D-FINE, please give us a ⭐! Your support motivates us to keep improving!</strong>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
</p>

D-FINE is a powerful real-time object detector that redefines the bounding box regression task in DETRs as Fine-grained Distribution Refinement (FDR) and introduces Global Optimal Localization Self-Distillation (GO-LSD), achieving outstanding performance without introducing additional inference and training costs.

<details open>
<summary> Video </summary>

We conduct object detection using D-FINE and YOLO11 on a complex street scene video from [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A). Despite challenging conditions such as backlighting, motion blur, and dense crowds, D-FINE-X successfully detects nearly all targets, including subtle small objects like backpacks, bicycles, and traffic lights. Its confidence scores and its localization precision on blurred edges are significantly higher than those of YOLO11.

<!-- We use D-FINE and YOLO11 on a street scene video from [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A). Despite challenges like backlighting, motion blur, and dense crowds, D-FINE-X outperforms YOLO11x, detecting more objects with higher confidence and better precision. -->

https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9

</details>

## 🚀 Updates
- [x] **\[2024.10.18\]** Release D-FINE series.
- [x] **\[2024.10.25\]** Add custom dataset finetuning configs ([#7](https://github.com/Peterande/D-FINE/issues/7)).
- [x] **\[2024.10.30\]** Update D-FINE-L (E25) pretrained model, with performance improved by 2.0%.
- [x] **\[2024.11.07\]** Release **D-FINE-N**, achieving 42.8% AP<sup>val</sup> on COCO @ 472 FPS<sup>T4</sup>!

## Model Zoo

### COCO
| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
**D‑FINE‑S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
**D‑FINE‑M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
**D‑FINE‑L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
**D‑FINE‑X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)

### Objects365+COCO
| Model | Dataset | AP<sup>val</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
**D‑FINE‑M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
**D‑FINE‑L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
**D‑FINE‑X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)

**We highly recommend that you use the Objects365 pre-trained model for fine-tuning:**

⚠️ **Important**: Please note that this is generally beneficial for complex scene understanding. If your categories are very simple, it might lead to overfitting and suboptimal performance.
<details>
<summary><strong> 🔥 Pretrained Models on Objects365 (Best generalization) </strong></summary>

| Model | Dataset | AP<sup>val</sup> | AP<sup>5000</sup> | #Params | Latency | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
**D‑FINE‑M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
**D‑FINE‑L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
**D‑FINE‑L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
**D‑FINE‑X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
- **E25**: Re-trained and extended the pretraining to 25 epochs.
- **AP<sup>val</sup>** is evaluated on the *Objects365* full validation set.
- **AP<sup>5000</sup>** is evaluated on the first 5000 samples of the *Objects365* validation set.
</details>

**Notes:**
- **AP<sup>val</sup>** is evaluated on the *MSCOCO val2017* dataset.
- **Latency** is evaluated on a single T4 GPU with $batch\_size = 1$, $fp16$, and $TensorRT==10.4.0$.
- **Objects365+COCO** means the model was finetuned on *COCO* using pretrained weights trained on *Objects365*.

## Quick start

### Setup

```shell
conda create -n dfine python=3.11.9
conda activate dfine
pip install -r requirements.txt
```

### Data Preparation

<details>
<summary> COCO2017 Dataset </summary>

1. Download COCO2017 from [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) or [COCO](https://cocodataset.org/#download).
1. Modify paths in [coco_detection.yml](./configs/dataset/coco_detection.yml)

    ```yaml
    train_dataloader:
      img_folder: /data/COCO2017/train2017/
      ann_file: /data/COCO2017/annotations/instances_train2017.json
    val_dataloader:
      img_folder: /data/COCO2017/val2017/
      ann_file: /data/COCO2017/annotations/instances_val2017.json
    ```

</details>

<details>
<summary> Objects365 Dataset </summary>

1. Download Objects365 from [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365).

2. Set the Base Directory:
```shell
export BASE_DIR=/data/Objects365/data
```

3. Extract and organize the downloaded files; the resulting directory structure should look like this:

```shell
${BASE_DIR}/train
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_train.json
```

```shell
${BASE_DIR}/val
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_val.json
```

4. Create a new directory to store images from the validation set:
```shell
mkdir -p ${BASE_DIR}/train/images_from_val
```

5. Copy the v1 and v2 folders from the val directory into the train/images_from_val directory:
```shell
cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
```

6. Run remap_obj365.py to merge a subset of the validation set into the training set. Specifically, this script moves samples with indices between 5000 and 800000 from the validation set to the training set:
```shell
python tools/remap_obj365.py --base_dir ${BASE_DIR}
```

7. Run the resize_obj365.py script to resize any images in the dataset whose maximum edge length exceeds 640 pixels. Use the updated JSON file generated in the previous step to process the sample data. Ensure that you resize images in both the train and val datasets to maintain consistency. A sketch of the resize rule is shown after this step:
```shell
python tools/resize_obj365.py --base_dir ${BASE_DIR}
```
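
The exact behavior lives in `tools/resize_obj365.py`; the snippet below is only a rough illustration of the resize rule described above (downscale so the longer side is at most 640 px, keeping the aspect ratio). Note that the real script also rewrites the matching annotations, which this sketch does not:

```python
from pathlib import Path
from PIL import Image

def resize_max_edge(path: Path, max_edge: int = 640) -> None:
    """Shrink an image in place if its longer side exceeds max_edge."""
    img = Image.open(path)
    w, h = img.size
    scale = max_edge / max(w, h)
    if scale < 1.0:  # only downscale, never upscale
        img.resize((round(w * scale), round(h * scale)), Image.BILINEAR).save(path)

for p in Path("/data/Objects365/data/train/images").rglob("*.jpg"):
    resize_max_edge(p)
```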

8. Modify paths in [obj365_detection.yml](./configs/dataset/obj365_detection.yml)

    ```yaml
    train_dataloader:
      img_folder: /data/Objects365/data/train
      ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
    val_dataloader:
      img_folder: /data/Objects365/data/val/
      ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
    ```

</details>

<details>
<summary>CrowdHuman</summary>

Download the COCO-format dataset here: [url](https://aistudio.baidu.com/datasetdetail/231455)

</details>

<details>
<summary>Custom Dataset</summary>

To train on your custom dataset, you need to organize it in the COCO format. Follow the steps below to prepare your dataset:

1. **Set `remap_mscoco_category` to `False`:**

    This prevents the automatic remapping of category IDs to match the MSCOCO categories.

    ```yaml
    remap_mscoco_category: False
    ```

2. **Organize Images:**

    Structure your dataset directories as follows:

    ```shell
    dataset/
    ├── images/
    │   ├── train/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    │   ├── val/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    └── annotations/
        ├── instances_train.json
        ├── instances_val.json
        └── ...
    ```

    - **`images/train/`**: Contains all training images.
    - **`images/val/`**: Contains all validation images.
    - **`annotations/`**: Contains COCO-formatted annotation files.

3. **Convert Annotations to COCO Format:**

    If your annotations are not already in COCO format, you'll need to convert them. You can use the following Python script as a reference or utilize existing tools; a fleshed-out sketch follows the stub below:

    ```python
    import json

    def convert_to_coco(input_annotations, output_annotations):
        # Implement conversion logic here
        pass

    if __name__ == "__main__":
        convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
    ```
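
    A minimal sketch of what that conversion logic can look like, assuming, purely for illustration, an input format of one JSON record per image such as `{"file": ..., "width": ..., "height": ..., "boxes": [[x, y, w, h, label], ...]}`; your real annotation schema will differ:

    ```python
    import json

    def convert_to_coco(input_annotations, output_annotations):
        with open(input_annotations) as f:
            records = json.load(f)

        categories = {}  # label name -> contiguous category id
        images, annotations = [], []
        ann_id = 0
        for img_id, rec in enumerate(records):
            images.append({"id": img_id, "file_name": rec["file"],
                           "width": rec["width"], "height": rec["height"]})
            for x, y, w, h, name in rec["boxes"]:
                cat_id = categories.setdefault(name, len(categories))
                annotations.append({
                    "id": ann_id, "image_id": img_id, "category_id": cat_id,
                    "bbox": [x, y, w, h],  # COCO bbox format is [x, y, width, height]
                    "area": w * h, "iscrowd": 0,
                })
                ann_id += 1

        coco = {
            "images": images,
            "annotations": annotations,
            "categories": [{"id": i, "name": n} for n, i in categories.items()],
        }
        with open(output_annotations, "w") as f:
            json.dump(coco, f)
    ```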

4. **Update Configuration Files:**

    Modify your [custom_detection.yml](./configs/dataset/custom_detection.yml).

    ```yaml
    task: detection

    evaluator:
      type: CocoEvaluator
      iou_types: ['bbox', ]

    num_classes: 777 # your dataset classes
    remap_mscoco_category: False

    train_dataloader:
      type: DataLoader
      dataset:
        type: CocoDetection
        img_folder: /data/yourdataset/train
        ann_file: /data/yourdataset/train/train.json
        return_masks: False
        transforms:
          type: Compose
          ops: ~
      shuffle: True
      num_workers: 4
      drop_last: True
      collate_fn:
        type: BatchImageCollateFunction

    val_dataloader:
      type: DataLoader
      dataset:
        type: CocoDetection
        img_folder: /data/yourdataset/val
        ann_file: /data/yourdataset/val/ann.json
        return_masks: False
        transforms:
          type: Compose
          ops: ~
      shuffle: False
      num_workers: 4
      drop_last: False
      collate_fn:
        type: BatchImageCollateFunction
    ```

</details>


## Usage
<details open>
<summary> COCO2017 </summary>

<!-- <summary>1. Training </summary> -->
1. Set Model
```shell
export model=l # n s m l x
```

2. Training
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
```

<!-- <summary>2. Testing </summary> -->
3. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```

<!-- <summary>3. Tuning </summary> -->
4. Tuning
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
```
</details>


<details>
<summary> Objects365 to COCO2017 </summary>

1. Set Model
```shell
export model=l # n s m l x
```

2. Training on Objects365
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
```

3. Tuning on COCO2017
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
```

<!-- <summary>2. Testing </summary> -->
4. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```
</details>


<details>
<summary> Custom Dataset </summary>

1. Set Model
```shell
export model=l # n s m l x
```

2. Training on Custom Dataset
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
```
<!-- <summary>2. Testing </summary> -->
3. Testing
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
```

4. Tuning on Custom Dataset
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
```

5. **[Optional]** Modify Class Mappings:

    When using the Objects365 pre-trained weights to train on your custom dataset, the example assumes that your dataset only contains the classes `'Person'` and `'Car'`. For faster convergence, you can modify `self.obj365_ids` in `src/solver/_solver.py` as follows:

    ```python
    self.obj365_ids = [0, 5]  # Person, Cars
    ```
    You can replace these with any corresponding classes from your dataset. The list of Objects365 classes with their corresponding IDs:
    https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330

    New training command:

    ```shell
    CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
    ```

    However, if you don't wish to modify the class mappings, the pre-trained Objects365 weights will still work without any changes. Modifying the class mappings is optional and can potentially accelerate convergence for specific tasks.

</details>

<details>
<summary> Customizing Batch Size </summary>

For example, if you want to double the total batch size when training D-FINE-L on COCO2017, here are the steps you should follow (a scripted version of these scaling rules appears after this section):

1. **Modify your [dataloader.yml](./configs/dfine/include/dataloader.yml)** to increase the `total_batch_size`:

    ```yaml
    train_dataloader:
      total_batch_size: 64 # Previously it was 32, now doubled
    ```

2. **Modify your [dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml)**. Here's how the key parameters should be adjusted:

    ```yaml
    optimizer:
      type: AdamW
      params:
        -
          params: '^(?=.*backbone)(?!.*norm|bn).*$'
          lr: 0.000025 # doubled, linear scaling law
        -
          params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
          weight_decay: 0.

      lr: 0.0005 # doubled, linear scaling law
      betas: [0.9, 0.999]
      weight_decay: 0.0001 # need a grid search

    ema: # added EMA settings
      decay: 0.9998 # adjusted by 1 - (1 - decay) * 2
      warmups: 500 # halved

    lr_warmup_scheduler:
      warmup_duration: 250 # halved
    ```

</details>
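
A small helper, purely illustrative and not part of the repo, that applies the same linear-scaling rules when the total batch size is multiplied by a factor `k` (the base values below are the pre-scaling defaults implied by the comments above):

```python
def scale_hparams(lr: float, ema_decay: float, warmup_steps: int, k: float) -> dict:
    """Scaling rules used above when the total batch size is multiplied by k."""
    return {
        "lr": lr * k,                           # linear scaling law
        "ema_decay": 1 - (1 - ema_decay) * k,   # keeps the same EMA horizon in samples
        "warmup_steps": int(warmup_steps / k),  # halved when k = 2
    }

# Doubling the total batch size (32 -> 64):
cfg = scale_hparams(lr=0.00025, ema_decay=0.9999, warmup_steps=500, k=2)
print({name: round(v, 6) for name, v in cfg.items()})
# lr: 0.0005, ema_decay: 0.9998, warmup_steps: 250
```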

<details>
<summary> Customizing Input Size </summary>

If you'd like to train **D-FINE-L** on COCO2017 with an input size of 320x320, follow these steps:

1. **Modify your [dataloader.yml](./configs/dfine/include/dataloader.yml)**:

    ```yaml
    train_dataloader:
      dataset:
        transforms:
          ops:
            - {type: Resize, size: [320, 320], }
      collate_fn:
        base_size: 320
      dataset:
        transforms:
          ops:
            - {type: Resize, size: [320, 320], }
    ```

2. **Modify your [dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml)**:

    ```yaml
    eval_spatial_size: [320, 320]
    ```

</details>

## Tools
<details>
<summary> Deployment </summary>

<!-- <summary>4. Export onnx </summary> -->
1. Setup
```shell
pip install onnx onnxsim
export model=l # n s m l x
```

2. Export onnx (a quick load-and-run sanity check is sketched after this section)
```shell
python tools/deployment/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
```

3. Export [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
```shell
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
```

</details>
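
As a quick sanity check on an exported model, you can load it with onnxruntime and run a single dummy forward pass. This is an illustrative sketch, not a script shipped in the repo; rather than assuming the model's exact input signature, it reads the input names and shapes from the session and binds any dynamic dimensions to 1:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
feeds = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # dynamic dims -> 1
    dtype = np.float32 if "float" in inp.type else np.int64
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

outputs = sess.run(None, feeds)
print([o.shape for o in outputs])  # output shapes of one forward pass
```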
551 |
+
|
552 |
+
<details>
|
553 |
+
<summary> Inference (Visualization) </summary>
|
554 |
+
|
555 |
+
|
556 |
+
1. Setup
|
557 |
+
```shell
|
558 |
+
pip install -r tools/inference/requirements.txt
|
559 |
+
export model=l # n s m l x
|
560 |
+
```
|
561 |
+
|
562 |
+
|
563 |
+
<!-- <summary>5. Inference </summary> -->
|
564 |
+
2. Inference (onnxruntime / tensorrt / torch)
|
565 |
+
|
566 |
+
Inference on images and videos is now supported.
|
567 |
+
```shell
|
568 |
+
python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg # video.mp4
|
569 |
+
python tools/inference/trt_inf.py --trt model.engine --input image.jpg
|
570 |
+
python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
|
571 |
+
```
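For a quick programmatic check without the helper scripts, a minimal onnxruntime sketch. Note the input names (`images`, `orig_target_sizes`) and the `(labels, boxes, scores)` output order follow the RT-DETR-style export used here — if your graph differs, inspect `sess.get_inputs()` / `sess.get_outputs()` first:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

im = Image.open("image.jpg").convert("RGB").resize((640, 640))
blob = np.asarray(im, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0   # NCHW, [0, 1]
orig_size = np.array([[640, 640]], dtype=np.int64)  # post-processing rescales boxes to this size

labels, boxes, scores = sess.run(None, {"images": blob, "orig_target_sizes": orig_size})
print(labels.shape, boxes.shape, scores.shape)
```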
</details>

<details>
<summary> Benchmark </summary>

1. Setup
```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l x
```

2. Model FLOPs, MACs, and Params
```shell
python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
```

3. TensorRT Latency
```shell
python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
```
</details>

<details>
<summary> Fiftyone Visualization </summary>

1. Setup
```shell
pip install fiftyone
export model=l  # n s m l x
```

2. Voxel51 Fiftyone Visualization ([fiftyone](https://github.com/voxel51/fiftyone))
```shell
python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
```

</details>

<details>
<summary> Others </summary>

1. Auto Resume Training
```shell
bash reference/safe_training.sh
```

2. Converting Model Weights
```shell
python reference/convert_weight.py model.pth
```
</details>

## Figures and Visualizations

<details>
<summary> FDR and GO-LSD </summary>

1. Overview of D-FINE with FDR. The probability distributions that act as a more fine-grained intermediate representation are iteratively refined by the decoder layers in a residual manner. Non-uniform weighting functions are applied to allow for finer localization.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="Fine-grained Distribution Refinement Process" width="1000">
</p>

2. Overview of GO-LSD process. Localization knowledge from the final layer's refined distributions is distilled into earlier layers through DDF loss with decoupled weighting strategies.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSD Process" width="1000">
</p>

</details>

<details open>
<summary> Distributions </summary>

Visualizations of FDR across detection scenarios with initial and refined bounding boxes, along with unweighted and weighted distributions.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
</p>

</details>

<details>
<summary> Hard Cases </summary>

The following visualization demonstrates D-FINE's predictions in various complex detection scenarios. These include cases with occlusion, low-light conditions, motion blur, depth-of-field effects, and densely populated scenes. Despite these challenges, D-FINE consistently produces accurate localization results.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="D-FINE Predictions in Challenging Scenarios" width="1000">
</p>

</details>

## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entries:
<details open>
<summary> bibtex </summary>

```latex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
</details>

## Acknowledgement
Our work is built upon [RT-DETR](https://github.com/lyuwenyu/RT-DETR).
Thanks to the inspirations from [RT-DETR](https://github.com/lyuwenyu/RT-DETR), [GFocal](https://github.com/implus/GFocal), [LD](https://github.com/HikariTJU/LD), and [YOLOv9](https://github.com/WongKinYiu/yolov9).

✨ Feel free to contribute and reach out if you have any questions! ✨
D-FINE/README_cn.md
ADDED
@@ -0,0 +1,673 @@
[English](README.md) | 简体中文 | [日本語](README_ja.md) | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)

<h2 align="center">
    D-FINE: Redefine Regression Task of DETRs as Fine‑grained Distribution Refinement
</h2>

<p align="center">
    <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
        <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
    </a>
    <a href="https://github.com/Peterande/D-FINE/pulls">
        <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
    </a>
    <a href="https://github.com/Peterande/D-FINE/issues">
        <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
    </a>
    <a href="https://arxiv.org/abs/2410.13842">
        <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
    </a>
    <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
        <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
    </a>
    <a href="https://github.com/Peterande/D-FINE">
        <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
    </a>
</p>

<p align="center">
    📄 这是该文章的官方实现:
    <br>
    <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
</p>

<p align="center">
    彭岩松,李和倍,吴沛熹,张越一,孙晓艳,吴枫
</p>

<p align="center">
    中国科学技术大学
</p>

<p align="center">
    <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
        <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
    </a>
</p>

<p align="center"> <strong>如果你喜欢 D-FINE,请给我们一个 ⭐!你的支持激励我们不断前进!</strong> </p>

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
</p>

D-FINE 是一个强大的实时目标检测器,将 DETR 中的边界框回归任务重新定义为细粒度的分布优化(FDR),并引入全局最优的定位自蒸馏(GO-LSD),在不增加额外推理和训练成本的情况下,实现了卓越的性能。

<details open>
<summary> 视频 </summary>

我们分别使用 D-FINE 和 YOLO11 对 [YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A) 上的一段复杂街景视频进行了目标检测。尽管存在逆光、虚化模糊和密集遮挡等不利因素,D-FINE-X 依然成功检测出几乎所有目标,包括背包、自行车和信号灯等难以察觉的小目标,其置信度以及模糊边缘的定位准确度明显高于 YOLO11x。

https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9

</details>

## 🚀 Updates
- [x] **\[2024.10.18\]** 发布 D-FINE 系列。
- [x] **\[2024.10.25\]** 添加了自定义数据集微调配置文件([#7](https://github.com/Peterande/D-FINE/issues/7))。
- [x] **\[2024.10.30\]** 更新 D-FINE-L (E25) 预训练模型,性能提升了 2.0%。
- [x] **\[2024.11.07\]** 发布 **D-FINE-N**,在 COCO 上达到 42.8% AP<sup>val</sup> @ 472 FPS<sup>T4</sup>!

## 模型库

### COCO
| 模型 | 数据集 | AP<sup>val</sup> | 参数量 | 时延 | GFLOPs | 配置 | 权重 | 日志 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
**D‑FINE‑S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
**D‑FINE‑M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
**D‑FINE‑L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
**D‑FINE‑X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)

### Objects365+COCO
| 模型 | 数据集 | AP<sup>val</sup> | 参数量 | 时延 | GFLOPs | 配置 | 权重 | 日志 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
**D‑FINE‑M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
**D‑FINE‑L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
**D‑FINE‑X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)

**我们强烈推荐您使用 Objects365 预训练模型进行微调:**

⚠️ 重要提醒:通常这种预训练模型对复杂场景的理解非常有用。如果您的类别非常简单,请注意,这可能会导致过拟合和次优性能。

<details> <summary><strong> 🔥 Objects365 预训练模型(泛化性最好)</strong></summary>

| 模型 | 数据集 | AP<sup>val</sup> | AP<sup>5000</sup> | 参数量 | 时延 | GFLOPs | 配置 | 权重 | 日志 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
**D‑FINE‑M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
**D‑FINE‑L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
**D‑FINE‑L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
**D‑FINE‑X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
- **E25**: 重新训练,并将训练延长至 25 个 epoch。
- **AP<sup>val</sup>** 是在 *Objects365* 完整的验证集上评估的。
- **AP<sup>5000</sup>** 是在 *Objects365* 验证集的前 5000 个样本上评估的。
</details>

**注意:**
- **AP<sup>val</sup>** 是在 *MSCOCO val2017* 数据集上评估的。
- **时延** 是在单张 T4 GPU 上以 $batch\_size = 1$、$fp16$ 和 $TensorRT==10.4.0$ 评估的。
- **Objects365+COCO** 表示使用在 *Objects365* 上预训练的权重在 *COCO* 上微调的模型。
## 快速开始

### 设置

```shell
conda create -n dfine python=3.11.9
conda activate dfine
pip install -r requirements.txt
```

### 数据集准备

<details>
<summary> COCO2017 数据集 </summary>

1. 从 [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) 或者 [COCO](https://cocodataset.org/#download) 下载 COCO2017。
2. 修改 [coco_detection.yml](./configs/dataset/coco_detection.yml) 中的路径。

    ```yaml
    train_dataloader:
        img_folder: /data/COCO2017/train2017/
        ann_file: /data/COCO2017/annotations/instances_train2017.json
    val_dataloader:
        img_folder: /data/COCO2017/val2017/
        ann_file: /data/COCO2017/annotations/instances_val2017.json
    ```

</details>

<details>
<summary> Objects365 数据集 </summary>

1. 从 [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365) 下载 Objects365。

2. 设置数据集的基础目录:
```shell
export BASE_DIR=/data/Objects365/data
```

3. 解压并整理目录结构如下:

```shell
${BASE_DIR}/train
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_train.json
```

```shell
${BASE_DIR}/val
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_val.json
```

4. 创建一个新目录来存储验证集中的图像:
```shell
mkdir -p ${BASE_DIR}/train/images_from_val
```

5. 将 val 目录中的 v1 和 v2 文件夹复制到 train/images_from_val 目录中
```shell
cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
```

6. 运行 remap_obj365.py 将验证集中的部分样本合并到训练集中。具体来说,该脚本将索引在 5000 到 800000 之间的样本从验证集移动到训练集。
```shell
python tools/remap_obj365.py --base_dir ${BASE_DIR}
```

7. 运行 resize_obj365.py 脚本,将数据集中最大边长超过 640 像素的图像缩小,并使用步骤 6 中生成的更新后的 JSON 文件处理样本数据(缩放逻辑可参考本节末尾的示意代码)。
```shell
python tools/resize_obj365.py --base_dir ${BASE_DIR}
```

8. 修改 [obj365_detection.yml](./configs/dataset/obj365_detection.yml) 中的路径。

    ```yaml
    train_dataloader:
        img_folder: /data/Objects365/data/train
        ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
    val_dataloader:
        img_folder: /data/Objects365/data/val/
        ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
    ```
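下面是步骤 7 中"最长边不超过 640、保持宽高比"这一缩放逻辑的概念性示意(并非仓库脚本本身;真实脚本还会同步缩放标注框坐标,文件名仅为示例):

```python
from PIL import Image

def resize_max_edge(src_path, dst_path, max_edge=640):
    im = Image.open(src_path)
    scale = max_edge / max(im.size)
    if scale < 1:  # 仅缩小超过阈值的图像,保持宽高比
        im = im.resize((round(im.width * scale), round(im.height * scale)))
    im.save(dst_path)

resize_max_edge("000000000.jpg", "000000000_resized.jpg")
```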

</details>

<details>
<summary>CrowdHuman</summary>

在此下载 COCO 格式的数据集:[链接](https://aistudio.baidu.com/datasetdetail/231455)

</details>

<details>
<summary>自定义数据集</summary>

要在你的自定义数据集上训练,你需要将其组织为 COCO 格式。请按照以下步骤准备你的数据集:

1. **将 `remap_mscoco_category` 设置为 `False`:**

    这可以防止类别 ID 自动映射以匹配 MSCOCO 类别。

    ```yaml
    remap_mscoco_category: False
    ```

2. **组织图像:**

    按以下结构组织你的数据集目录:

    ```shell
    dataset/
    ├── images/
    │   ├── train/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    │   ├── val/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    └── annotations/
        ├── instances_train.json
        ├── instances_val.json
        └── ...
    ```

    - **`images/train/`**: 包含所有训练图像。
    - **`images/val/`**: 包含所有验证图像。
    - **`annotations/`**: 包含 COCO 格式的注释文件。

3. **将注释转换为 COCO 格式:**

    如果你的注释尚未采用 COCO 格式,你需要进行转换。可以参考以下 Python 脚本(其中假设的源格式见注释,请按实际情况调整),或使用现有工具:

    ```python
    import json

    def convert_to_coco(input_annotations, output_annotations):
        # 假设源格式为 [{"file_name", "width", "height", "boxes": [[x, y, w, h, cls], ...]}, ...]
        src = json.load(open(input_annotations))
        images, anns = [], []
        for img_id, item in enumerate(src):
            images.append({"id": img_id, "file_name": item["file_name"], "width": item["width"], "height": item["height"]})
            for x, y, w, h, cls in item["boxes"]:
                anns.append({"id": len(anns), "image_id": img_id, "category_id": cls, "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0})
        cats = [{"id": c, "name": str(c)} for c in sorted({a["category_id"] for a in anns})]
        with open(output_annotations, "w") as f:
            json.dump({"images": images, "annotations": anns, "categories": cats}, f)

    if __name__ == "__main__":
        convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
    ```

4. **更新配置文件:**

    修改你的 [custom_detection.yml](./configs/dataset/custom_detection.yml)。

    ```yaml
    task: detection

    evaluator:
        type: CocoEvaluator
        iou_types: ['bbox', ]

    num_classes: 777  # your dataset classes
    remap_mscoco_category: False

    train_dataloader:
        type: DataLoader
        dataset:
            type: CocoDetection
            img_folder: /data/yourdataset/train
            ann_file: /data/yourdataset/train/train.json
            return_masks: False
            transforms:
                type: Compose
                ops: ~
        shuffle: True
        num_workers: 4
        drop_last: True
        collate_fn:
            type: BatchImageCollateFunction

    val_dataloader:
        type: DataLoader
        dataset:
            type: CocoDetection
            img_folder: /data/yourdataset/val
            ann_file: /data/yourdataset/val/ann.json
            return_masks: False
            transforms:
                type: Compose
                ops: ~
        shuffle: False
        num_workers: 4
        drop_last: False
        collate_fn:
            type: BatchImageCollateFunction
    ```
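配置完成后,可以用下面的小脚本快速核对标注文件与配置中的 `num_classes` 是否一致(文件路径仅为示例):

```python
import json

with open("/data/yourdataset/train/train.json") as f:
    coco = json.load(f)

print("类别数:", len(coco["categories"]))   # 应与配置中的 num_classes 一致
print("图像数:", len(coco["images"]))
print("标注数:", len(coco["annotations"]))
```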

</details>


## 使用方法
<details open>
<summary> COCO2017 </summary>

1. 设置模型
```shell
export model=l  # n s m l x
```

2. 训练
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
```

3. 测试
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```

4. 微调
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
```
</details>


<details>
<summary> 在 Objects365 上训练,在 COCO2017 上微调 </summary>

1. 设置模型
```shell
export model=l  # n s m l x
```

2. 在 Objects365 上训练
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
```

3. 在 COCO2017 上微调
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
```

4. 测试
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```
</details>


<details>
<summary> 自定义数据集 </summary>

1. 设置模型
```shell
export model=l  # n s m l x
```

2. 在自定义数据集上训练
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
```

3. 测试
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
```

4. 在自定义数据集上微调
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
```

5. **[可选项]** 修改类映射:

在使用 Objects365 预训练权重训练自定义数据集时,示例中假设自定义数据集仅有 `'Person'` 和 `'Car'` 类,您可以将其替换为数据集中对应的任何类别。为了加快收敛,可以在 `src/solver/_solver.py` 中修改 `self.obj365_ids`,如下所示:

```python
self.obj365_ids = [0, 5]  # Person, Cars
```
Objects365 类及其对应 ID 的完整列表:
https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330

新的训练启动命令:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
```

如果您不想修改类映射,预训练的 Objects365 权重依然可以不做任何更改直接使用。修改类映射是可选的,但针对特定任务可能会加快收敛速度。

</details>

<details>
<summary> 自定义批次大小 </summary>

例如,如果你想在 COCO2017 上训练 D-FINE-L 时将总批次大小增加一倍,请按照以下步骤操作:

1. **修改你的 [dataloader.yml](./configs/dfine/include/dataloader.yml)**,增加 `total_batch_size`:

    ```yaml
    train_dataloader:
        total_batch_size: 64  # 原来是 32,现在增加了一倍
    ```

2. **修改你的 [dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml)**。关键参数调整如下:

    ```yaml
    optimizer:
        type: AdamW
        params:
            -
                params: '^(?=.*backbone)(?!.*norm|bn).*$'
                lr: 0.000025  # 翻倍,线性缩放原则
            -
                params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
                weight_decay: 0.

        lr: 0.0005  # 翻倍,线性缩放原则
        betas: [0.9, 0.999]
        weight_decay: 0.0001  # 可能需要网格搜索找到最优值

    ema:  # 添加 EMA 设置
        decay: 0.9998  # 根据 1 - (1 - decay) * 2 调整
        warmups: 500  # 减半

    lr_warmup_scheduler:
        warmup_duration: 250  # 减半
    ```
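上面的各项调整都可以由批次大小的倍率推出,可用下面的小例子核算(该函数仅为示意,并非仓库代码):

```python
def scale_for_batch_size(base, ratio=2):
    # 学习率线性缩放;EMA 保持 (1 - decay) * ratio 不变;warmup 步数按倍率缩短
    return {
        "lr": base["lr"] * ratio,                             # 0.00025 -> 0.0005
        "ema_decay": 1 - (1 - base["ema_decay"]) * ratio,     # 0.9999  -> 0.9998
        "warmup_duration": base["warmup_duration"] // ratio,  # 500     -> 250
    }

print(scale_for_batch_size({"lr": 2.5e-4, "ema_decay": 0.9999, "warmup_duration": 500}))
```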

</details>

<details>
<summary> 自定义输入尺寸 </summary>

如果你想在 COCO2017 上以 320x320 的输入尺寸训练 **D-FINE-L**,按照以下步骤操作:

1. **修改你的 [dataloader.yml](./configs/dfine/include/dataloader.yml)**,使训练变换与验证流程都使用新尺寸:

    ```yaml
    train_dataloader:
        dataset:
            transforms:
                ops:
                    - {type: Resize, size: [320, 320], }
        collate_fn:
            base_size: 320
    val_dataloader:
        dataset:
            transforms:
                ops:
                    - {type: Resize, size: [320, 320], }
    ```

2. **修改你的 [dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml)**:

    ```yaml
    eval_spatial_size: [320, 320]
    ```

</details>


## 工具

<details>
<summary> 部署 </summary>

1. 设置
```shell
pip install onnx onnxsim onnxruntime
export model=l  # n s m l x
```

2. 导出 onnx
```shell
python tools/deployment/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
```

3. 导出 [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
```shell
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
```

</details>

<details>
<summary> 推理(可视化) </summary>

1. 设置
```shell
pip install -r tools/inference/requirements.txt
export model=l  # n s m l x
```

2. 推理 (onnxruntime / tensorrt / torch)

目前支持对图像和视频的推理。
```shell
python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg  # video.mp4
python tools/inference/trt_inf.py --trt model.engine --input image.jpg
python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
```
</details>

<details>
<summary> 基准测试 </summary>

1. 设置
```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l x
```

2. 模型 FLOPs、MACs、参数量
```shell
python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
```

3. TensorRT 延迟
```shell
python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
```
</details>

<details>
<summary> Voxel51 Fiftyone 可视化 </summary>

1. 设置
```shell
pip install fiftyone
export model=l  # n s m l x
```

2. Voxel51 Fiftyone 可视化 ([fiftyone](https://github.com/voxel51/fiftyone))
```shell
python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
```
</details>

<details>
<summary> 其他 </summary>

1. 自动恢复(Auto Resume)训练
```shell
bash reference/safe_training.sh
```

2. 模型权重转换
```shell
python reference/convert_weight.py model.pth
```
</details>

## 图表与可视化

<details>
<summary> FDR 和 GO-LSD </summary>

D-FINE 与 FDR 概览。概率分布作为更细粒度的中间表征,通过解码器层以残差方式进行迭代优化,并应用非均匀加权函数以实现更精细的定位。

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="细粒度分布优化过程" width="1000">
</p>

GO-LSD 流程概览。通过 DDF 损失函数和解耦加权策略,将最终层分布中的定位知识蒸馏到前面的层中。

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSD流程" width="1000">
</p>

</details>

<details open>
<summary> 分布可视化 </summary>

FDR 在检测场景中的可视化,包括初始和优化后的边界框,以及未加权和加权的分布图。

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
</p>

</details>

<details>
<summary> 困难场景 </summary>

以下可视化展示了 D-FINE 在各种复杂检测场景中的预测结果。这些场景包括遮挡、低光条件、运动模糊、景深效果和密集场景。尽管面临这些挑战,D-FINE 依然能够生成准确的定位结果。

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="D-FINE在挑战性场景中的预测" width="1000">
</p>

</details>

## 引用
如果你在工作中使用了 `D-FINE` 或其方法,请引用以下 BibTeX 条目:
<details open>
<summary> bibtex </summary>

```latex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
</details>

## 致谢
我们的工作基于 [RT-DETR](https://github.com/lyuwenyu/RT-DETR)。
感谢 [RT-DETR](https://github.com/lyuwenyu/RT-DETR)、[GFocal](https://github.com/implus/GFocal)、[LD](https://github.com/HikariTJU/LD) 和 [YOLOv9](https://github.com/WongKinYiu/yolov9) 的启发。

✨ 欢迎贡献并在有任何问题时联系我! ✨
D-FINE/README_ja.md
ADDED
@@ -0,0 +1,698 @@
[English](README.md) | [简体中文](README_cn.md) | 日本語 | [English Blog](src/zoo/dfine/blog.md) | [中文博客](src/zoo/dfine/blog_cn.md)

<h2 align="center">
    D-FINE: Redefine Regression Task of DETRs as Fine‑grained Distribution Refinement
</h2>

<p align="center">
    <a href="https://github.com/Peterande/D-FINE/blob/master/LICENSE">
        <img alt="license" src="https://img.shields.io/badge/LICENSE-Apache%202.0-blue">
    </a>
    <a href="https://github.com/Peterande/D-FINE/pulls">
        <img alt="prs" src="https://img.shields.io/github/issues-pr/Peterande/D-FINE">
    </a>
    <a href="https://github.com/Peterande/D-FINE/issues">
        <img alt="issues" src="https://img.shields.io/github/issues/Peterande/D-FINE?color=olive">
    </a>
    <a href="https://arxiv.org/abs/2410.13842">
        <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2410.13842-red">
    </a>
    <a href="https://results.pre-commit.ci/latest/github/Peterande/D-FINE/master">
        <img alt="pre-commit.ci status" src="https://results.pre-commit.ci/badge/github/Peterande/D-FINE/master.svg">
    </a>
    <a href="https://github.com/Peterande/D-FINE">
        <img alt="stars" src="https://img.shields.io/github/stars/Peterande/D-FINE">
    </a>
</p>

<p align="center">
    📄 これは論文の公式実装です:
    <br>
    <a href="https://arxiv.org/abs/2410.13842">D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement</a>
</p>
<p align="center">
    D-FINE: DETRの回帰タスクを細粒度分布最適化として再定義
</p>

<p align="center">
    Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, and Feng Wu
</p>

<p align="center">
    中国科学技術大学
</p>

<p align="center">
    <a href="https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=d-fine-redefine-regression-task-in-detrs-as">
        <img alt="sota" src="https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/d-fine-redefine-regression-task-in-detrs-as/real-time-object-detection-on-coco">
    </a>
</p>

<p align="center">
    <strong>もしD-FINEが気に入ったら、ぜひ⭐をください!あなたのサポートが私たちのモチベーションになります!</strong>
</p>

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/stats_padded.png" width="1000">
</p>

D-FINEは、DETRの境界ボックス回帰タスクを細粒度分布最適化(FDR)として再定義し、グローバル最適な位置特定自己蒸留(GO-LSD)を導入することで、追加の推論およびトレーニングコストを増やすことなく、優れたパフォーマンスを実現する強力なリアルタイムオブジェクト検出器です。

<details open>
<summary> ビデオ </summary>

D-FINEとYOLO11を使用して、[YouTube](https://www.youtube.com/watch?v=CfhEWj9sd9A)の複雑な街並みのビデオでオブジェクト検出を行いました。逆光、モーションブラー、密集した群衆などの厳しい条件にもかかわらず、D-FINE-Xはバックパック、自転車、信号機などの微妙な小さなオブジェクトを含むほぼすべてのターゲットを検出しました。その信頼スコアとぼやけたエッジの位置特定精度はYOLO11よりもはるかに高いです。

https://github.com/user-attachments/assets/e5933d8e-3c8a-400e-870b-4e452f5321d9

</details>

## 🚀 更新情報
- [x] **\[2024.10.18\]** D-FINEシリーズをリリース。
- [x] **\[2024.10.25\]** カスタムデータセットの微調整設定を追加([#7](https://github.com/Peterande/D-FINE/issues/7))。
- [x] **\[2024.10.30\]** D-FINE-L (E25) 事前トレーニングモデルを更新し、パフォーマンスが2.0%向上。
- [x] **\[2024.11.07\]** **D-FINE-N** をリリース、COCO で 42.8% の AP<sup>val</sup> を達成 @ 472 FPS<sup>T4</sup>!

## モデルズー

### COCO
| モデル | データセット | AP<sup>val</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑N** | COCO | **42.8** | 4M | 2.12ms | 7 | [yml](./configs/dfine/dfine_hgnetv2_n_coco.yml) | [42.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_n_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_n_coco_log.txt)
**D‑FINE‑S** | COCO | **48.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/dfine_hgnetv2_s_coco.yml) | [48.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_s_coco_log.txt)
**D‑FINE‑M** | COCO | **52.3** | 19M | 5.62ms | 57 | [yml](./configs/dfine/dfine_hgnetv2_m_coco.yml) | [52.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_m_coco_log.txt)
**D‑FINE‑L** | COCO | **54.0** | 31M | 8.07ms | 91 | [yml](./configs/dfine/dfine_hgnetv2_l_coco.yml) | [54.0](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_l_coco_log.txt)
**D‑FINE‑X** | COCO | **55.8** | 62M | 12.89ms | 202 | [yml](./configs/dfine/dfine_hgnetv2_x_coco.yml) | [55.8](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/coco/dfine_x_coco_log.txt)

### Objects365+COCO
| モデル | データセット | AP<sup>val</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365+COCO | **50.7** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml) | [50.7](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_s_obj2coco_log.txt)
**D‑FINE‑M** | Objects365+COCO | **55.1** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml) | [55.1](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_m_obj2coco_log.txt)
**D‑FINE‑L** | Objects365+COCO | **57.3** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml) | [57.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj2coco_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_l_obj2coco_log_e25.txt)
**D‑FINE‑X** | Objects365+COCO | **59.3** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml) | [59.3](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj2coco.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj2coco/dfine_x_obj2coco_log.txt)

**微調整のために Objects365 の事前学習モデルを使用することを強くお勧めします:**

⚠️ 重要なお知らせ:このプリトレインモデルは複雑なシーンの理解に有益ですが、カテゴリが非常に単純な場合、過学習や最適ではない性能につながる可能性がありますので、ご注意ください。

<details> <summary><strong> 🔥 Objects365で事前トレーニングされたモデル(最良の汎化性能)</strong></summary>

| モデル | データセット | AP<sup>val</sup> | AP<sup>5000</sup> | パラメータ数 | レイテンシ | GFLOPs | config | checkpoint | logs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
**D‑FINE‑S** | Objects365 | **31.0** | **30.5** | 10M | 3.49ms | 25 | [yml](./configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml) | [30.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_s_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_s_obj365_log.txt)
**D‑FINE‑M** | Objects365 | **38.6** | **37.4** | 19M | 5.62ms | 57 | [yml](./configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml) | [37.4](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_m_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_m_obj365_log.txt)
**D‑FINE‑L** | Objects365 | - | **40.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [40.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log.txt)
**D‑FINE‑L (E25)** | Objects365 | **44.7** | **42.6** | 31M | 8.07ms | 91 | [yml](./configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml) | [42.6](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_l_obj365_e25.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_l_obj365_log_e25.txt)
**D‑FINE‑X** | Objects365 | **49.5** | **46.5** | 62M | 12.89ms | 202 | [yml](./configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml) | [46.5](https://github.com/Peterande/storage/releases/download/dfinev1.0/dfine_x_obj365.pth) | [url](https://raw.githubusercontent.com/Peterande/storage/refs/heads/master/logs/obj365/dfine_x_obj365_log.txt)
- **E25**: 再トレーニングし、事前トレーニングを25エポックに延長。
- **AP<sup>val</sup>** は *Objects365* のフルバリデーションセットで評価されます。
- **AP<sup>5000</sup>** は *Objects365* 検証セットの最初の5000サンプルで評価されます。
</details>

**注意事項:**
- **AP<sup>val</sup>** は *MSCOCO val2017* データセットで評価されます。
- **レイテンシ** は単一のT4 GPUで $batch\_size = 1$、$fp16$、および $TensorRT==10.4.0$ で評価されます。
- **Objects365+COCO** は *Objects365* で事前トレーニングされた重みを使用して *COCO* で微調整されたモデルを意味します。
## クイックスタート

### セットアップ

```shell
conda create -n dfine python=3.11.9
conda activate dfine
pip install -r requirements.txt
```

### データ準備

<details>
<summary> COCO2017 データセット </summary>

1. [OpenDataLab](https://opendatalab.com/OpenDataLab/COCO_2017) または [COCO](https://cocodataset.org/#download) からCOCO2017をダウンロードします。
2. [coco_detection.yml](./configs/dataset/coco_detection.yml) のパスを修正します。

    ```yaml
    train_dataloader:
        img_folder: /data/COCO2017/train2017/
        ann_file: /data/COCO2017/annotations/instances_train2017.json
    val_dataloader:
        img_folder: /data/COCO2017/val2017/
        ann_file: /data/COCO2017/annotations/instances_val2017.json
    ```

</details>

<details>
<summary> Objects365 データセット </summary>

1. [OpenDataLab](https://opendatalab.com/OpenDataLab/Objects365) からObjects365をダウンロードします。

2. ベースディレクトリを設定します:
```shell
export BASE_DIR=/data/Objects365/data
```

3. ダウンロードしたファイルを解凍し、以下のディレクトリ構造に整理します:

```shell
${BASE_DIR}/train
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   ├── 000000001.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_train.json
```

```shell
${BASE_DIR}/val
├── images
│   ├── v1
│   │   ├── patch0
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
│   ├── v2
│   │   ├── patchx
│   │   │   ├── 000000000.jpg
│   │   │   └── ... (more images)
├── zhiyuan_objv2_val.json
```

4. 検証セットの画像を保存する新しいディレクトリを作成します:
```shell
mkdir -p ${BASE_DIR}/train/images_from_val
```

5. valディレクトリのv1およびv2フォルダをtrain/images_from_valディレクトリにコピーします
```shell
cp -r ${BASE_DIR}/val/images/v1 ${BASE_DIR}/train/images_from_val/
cp -r ${BASE_DIR}/val/images/v2 ${BASE_DIR}/train/images_from_val/
```

6. remap_obj365.pyを実行して、検証セットの一部をトレーニングセットにマージします。具体的には、このスクリプトはインデックスが5000から800000のサンプルを検証セットからトレーニングセットに移動します。
```shell
python tools/remap_obj365.py --base_dir ${BASE_DIR}
```

7. resize_obj365.pyスクリプトを実行して、データセット内の最大エッジ長が640ピクセルを超える画像をリサイズします。ステップ6で生成された更新されたJSONファイルを使用してサンプルデータを処理します。トレーニングセットと検証セットの両方の画像をリサイズして、一貫性を保ちます。
```shell
python tools/resize_obj365.py --base_dir ${BASE_DIR}
```

8. [obj365_detection.yml](./configs/dataset/obj365_detection.yml) のパスを修正します。

    ```yaml
    train_dataloader:
        img_folder: /data/Objects365/data/train
        ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
    val_dataloader:
        img_folder: /data/Objects365/data/val/
        ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
    ```

</details>

<details>
<summary>CrowdHuman</summary>

こちらからCOCOフォーマットのデータセットをダウンロードしてください:[リンク](https://aistudio.baidu.com/datasetdetail/231455)

</details>

<details>
<summary>カスタムデータセット</summary>

カスタムデータセットでトレーニングするには、COCO形式で整理する必要があります。以下の手順に従ってデータセットを準備してください:

1. **`remap_mscoco_category` を `False` に設定します**:

    これにより、カテゴリIDがMSCOCOカテゴリに自動的にマッピングされるのを防ぎます。

    ```yaml
    remap_mscoco_category: False
    ```

2. **画像を整理します**:

    データセットディレクトリを以下のように構造化します:

    ```shell
    dataset/
    ├── images/
    │   ├── train/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    │   ├── val/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │   │   └── ...
    └── annotations/
        ├── instances_train.json
        ├── instances_val.json
        └── ...
    ```

    - **`images/train/`**: すべてのトレーニング画像を含みます。
    - **`images/val/`**: すべての検証画像を含みます。
    - **`annotations/`**: COCO形式の注釈ファイルを含みます。

3. **注釈をCOCO形式に変換します**:

    注釈がまだCOCO形式でない場合は、変換する必要があります。以下のPythonスクリプト(想定する独自形式はコメント参照)を参考にするか、既存のツールを利用してください:

    ```python
    import json

    def convert_to_coco(input_annotations, output_annotations):
        # 独自形式 [{"file_name", "width", "height", "boxes": [[x, y, w, h, cls], ...]}, ...] を想定した例です。実際の形式に合わせて調整してください
        src = json.load(open(input_annotations))
        images, anns = [], []
        for img_id, item in enumerate(src):
            images.append({"id": img_id, "file_name": item["file_name"], "width": item["width"], "height": item["height"]})
            for x, y, w, h, cls in item["boxes"]:
                anns.append({"id": len(anns), "image_id": img_id, "category_id": cls, "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0})
        cats = [{"id": c, "name": str(c)} for c in sorted({a["category_id"] for a in anns})]
        with open(output_annotations, "w") as f:
            json.dump({"images": images, "annotations": anns, "categories": cats}, f)

    if __name__ == "__main__":
        convert_to_coco('path/to/your_annotations.json', 'dataset/annotations/instances_train.json')
    ```

4. **設定ファイルを更新します**:

    [custom_detection.yml](./configs/dataset/custom_detection.yml) を修正します。

    ```yaml
    task: detection

    evaluator:
        type: CocoEvaluator
        iou_types: ['bbox', ]

    num_classes: 777  # データセットのクラス数
    remap_mscoco_category: False

    train_dataloader:
        type: DataLoader
        dataset:
            type: CocoDetection
            img_folder: /data/yourdataset/train
            ann_file: /data/yourdataset/train/train.json
            return_masks: False
            transforms:
                type: Compose
                ops: ~
        shuffle: True
        num_workers: 4
        drop_last: True
        collate_fn:
            type: BatchImageCollateFunction

    val_dataloader:
        type: DataLoader
        dataset:
            type: CocoDetection
            img_folder: /data/yourdataset/val
            ann_file: /data/yourdataset/val/ann.json
            return_masks: False
            transforms:
                type: Compose
                ops: ~
        shuffle: False
        num_workers: 4
        drop_last: False
        collate_fn:
            type: BatchImageCollateFunction
    ```
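設定を書いたら、以下の小さなスクリプトでアノテーションファイルと設定の `num_classes` の整合性を確認できます(ファイルパスは例です):

```python
import json

with open("/data/yourdataset/train/train.json") as f:
    coco = json.load(f)

print("categories:", len(coco["categories"]))  # 設定の num_classes と一致するはず
print("images:", len(coco["images"]))
print("annotations:", len(coco["annotations"]))
```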
|

</details>


## Usage
<details open>
<summary> COCO2017 </summary>

1. Set the model:
```shell
export model=l  # n s m l x
```

2. Training:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0
```

3. Testing:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```

4. Fine-tuning:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --use-amp --seed=0 -t model.pth
```
</details>


<details>
<summary> Objects365 to COCO2017 </summary>

1. Set the model:
```shell
export model=l  # n s m l x
```

2. Training on Objects365:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj365.yml --use-amp --seed=0
```

3. Fine-tuning on COCO2017:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/objects365/dfine_hgnetv2_${model}_obj2coco.yml --use-amp --seed=0 -t model.pth
```

4. Testing:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml --test-only -r model.pth
```
</details>


<details>
<summary> Custom Dataset </summary>

1. Set the model:
```shell
export model=l  # n s m l x
```

2. Training on a custom dataset:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0
```

3. Testing:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --test-only -r model.pth
```

4. Fine-tuning on a custom dataset:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/objects365/dfine_hgnetv2_${model}_obj2custom.yml --use-amp --seed=0 -t model.pth
```

5. **[Optional]** Modify the class mappings:

When training on a custom dataset with Objects365-pretrained weights, the example assumes your dataset contains only the `'Person'` and `'Car'` classes. To speed up convergence on your specific task, you can modify `self.obj365_ids` in `src/solver/_solver.py` as follows:

```python
self.obj365_ids = [0, 5]  # Person, Cars
```

Replace these with the classes that correspond to your dataset. The list of Objects365 classes and their corresponding IDs:
https://github.com/Peterande/D-FINE/blob/352a94ece291e26e1957df81277bef00fe88a8e3/src/solver/_solver.py#L330

New training command:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7777 --nproc_per_node=4 train.py -c configs/dfine/custom/dfine_hgnetv2_${model}_custom.yml --use-amp --seed=0 -t model.pth
```

However, if you do not want to modify the class mappings, the pretrained Objects365 weights can be used as-is without any changes. Modifying the class mappings is optional and may accelerate convergence on your specific task.
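
As a purely illustrative aid (not part of the repository), a lookup like the following can produce the IDs to place in `self.obj365_ids`. The truncated `obj365_names` list below follows the common Objects365 ordering but is an assumption; verify it against the `_solver.py` list linked above:

```python
# Hypothetical: the Objects365 class list in ID order, truncated for brevity.
obj365_names = ['Person', 'Sneakers', 'Chair', 'Other Shoes', 'Hat', 'Car']

my_classes = ['Person', 'Car']  # the classes of your custom dataset
obj365_ids = [obj365_names.index(name) for name in my_classes]
print(obj365_ids)  # -> [0, 5]; paste this into self.obj365_ids
```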

</details>

<details>
<summary> Customizing Batch Size </summary>

For example, if you want to double the total batch size when training D-FINE-L on COCO2017, follow these steps (a sketch that computes the scaled values follows this block):

1. **Modify [dataloader.yml](./configs/dfine/include/dataloader.yml) to increase `total_batch_size`:**

    ```yaml
    train_dataloader:
      total_batch_size: 64  # previously 32, now doubled
    ```

2. **Modify [dfine_hgnetv2_l_coco.yml](./configs/dfine/dfine_hgnetv2_l_coco.yml).** Adjust the key parameters as follows:

    ```yaml
    optimizer:
      type: AdamW
      params:
        -
          params: '^(?=.*backbone)(?!.*norm|bn).*$'
          lr: 0.000025  # doubled, linear scaling law
        -
          params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
          weight_decay: 0.

      lr: 0.0005  # doubled, linear scaling law
      betas: [0.9, 0.999]
      weight_decay: 0.0001  # requires a grid search

    ema:  # added EMA settings
      decay: 0.9998  # adjusted by 1 - (1 - decay) * 2
      warmups: 500  # halved

    lr_warmup_scheduler:
      warmup_duration: 250  # halved
    ```

</details>
+
|
498 |
+
|
499 |
+
<details>
|
500 |
+
<summary> 入力サイズのカスタマイズ </summary>
|
501 |
+
|
502 |
+
COCO2017で **D-FINE-L** を320x320の入力サイズでトレーニングしたい場合、以下の手順に従ってください:
|
503 |
+
|
504 |
+
1. **[dataloader.yml](./configs/dfine/include/dataloader.yml) を修正します**:
|
505 |
+
|
506 |
+
```yaml
|
507 |
+
|
508 |
+
train_dataloader:
|
509 |
+
dataset:
|
510 |
+
transforms:
|
511 |
+
ops:
|
512 |
+
- {type: Resize, size: [320, 320], }
|
513 |
+
collate_fn:
|
514 |
+
base_size: 320
|
515 |
+
dataset:
|
516 |
+
transforms:
|
517 |
+
ops:
|
518 |
+
- {type: Resize, size: [320, 320], }
|
519 |
+
```
|
520 |
+
|
521 |
+
2. **[dfine_hgnetv2.yml](./configs/dfine/include/dfine_hgnetv2.yml) を修正します**:
|
522 |
+
|
523 |
+
```yaml
|
524 |
+
eval_spatial_size: [320, 320]
|
525 |
+
```
|
526 |
+
|
527 |
+
</details>
|
528 |
+
|
529 |
+
## ツール
|
530 |
+
<details>
|
531 |
+
<summary> デプロイ </summary>
|
532 |
+
|
533 |
+
<!-- <summary>4. onnxのエクスポート </summary> -->
|
534 |
+
1. セットアップ
|
535 |
+
```shell
|
536 |
+
pip install onnx onnxsim
|
537 |
+
export model=l # n s m l x
|
538 |
+
```
|
539 |
+
|
540 |
+
2. onnxのエクスポート
|
541 |
+
```shell
|
542 |
+
python tools/deployment/export_onnx.py --check -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
|
543 |
+
```
|
544 |
+
|
545 |
+
3. [tensorrt](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html) のエクスポート
|
546 |
+
```shell
|
547 |
+
trtexec --onnx="model.onnx" --saveEngine="model.engine" --fp16
|
548 |
+
```
|
549 |
+
|
550 |
+
</details>
|
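
Before deploying, you can smoke-test the exported ONNX file with a minimal onnxruntime sketch like the one below. The input names (`images`, `orig_target_sizes`) are an assumption about what the exporter produces; print the session's actual inputs first and adjust if they differ:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print([(i.name, i.shape) for i in sess.get_inputs()])  # confirm the real input names

# Assumed input names; adjust to match the printout above.
im = np.random.rand(1, 3, 640, 640).astype(np.float32)
size = np.array([[640, 640]], dtype=np.int64)
outputs = sess.run(None, {"images": im, "orig_target_sizes": size})
for name, out in zip([o.name for o in sess.get_outputs()], outputs):
    print(name, out.shape)
```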

<details>
<summary> Inference (Visualization) </summary>

1. Setup:
```shell
pip install -r tools/inference/requirements.txt
export model=l  # n s m l x
```

2. Inference (onnxruntime / tensorrt / torch)

Inference on images and videos is currently supported.
```shell
python tools/inference/onnx_inf.py --onnx model.onnx --input image.jpg  # video.mp4
python tools/inference/trt_inf.py --trt model.engine --input image.jpg
python tools/inference/torch_inf.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth --input image.jpg --device cuda:0
```
</details>

<details>
<summary> Benchmark </summary>

1. Setup:
```shell
pip install -r tools/benchmark/requirements.txt
export model=l  # n s m l x
```

2. Model FLOPs, MACs, and number of parameters:
```shell
python tools/benchmark/get_info.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml
```

3. TensorRT latency:
```shell
python tools/benchmark/trt_benchmark.py --COCO_dir path/to/COCO2017 --engine_dir model.engine
```
</details>

<details>
<summary> Fiftyone Visualization </summary>

1. Setup:
```shell
pip install fiftyone
export model=l  # n s m l x
```
2. Voxel51 Fiftyone visualization ([fiftyone](https://github.com/voxel51/fiftyone)):
```shell
python tools/visualization/fiftyone_vis.py -c configs/dfine/dfine_hgnetv2_${model}_coco.yml -r model.pth
```
</details>

<details>
<summary> Others </summary>

1. Auto-resume training:
```shell
bash reference/safe_training.sh
```

2. Converting model weights:
```shell
python reference/convert_weight.py model.pth
```
</details>

## Figures and Visualizations

<details>
<summary> FDR and GO-LSD </summary>

1. Overview of D-FINE with FDR. The probability distributions, which act as finer-grained intermediate representations, are refined layer by layer in the decoder in a residual manner. Non-uniform weighting functions are applied for finer localization.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/fdr-1.jpg" alt="Fine-grained distribution refinement process" width="1000">
</p>

2. Overview of the GO-LSD process. Localization knowledge from the refined distributions of the final layer is distilled into the earlier layers through the DDF loss with a decoupled weighting strategy.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/go_lsd-1.jpg" alt="GO-LSD process" width="1000">
</p>

</details>

<details open>
<summary> Distributions </summary>

Visualizations of FDR across various detection scenarios, showing the initial and refined bounding boxes together with the unweighted and weighted distributions.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/merged_image.jpg" width="1000">
</p>

</details>

<details>
<summary> Hard Cases </summary>

The following visualizations show D-FINE's predictions in a range of challenging detection scenarios, including occlusion, low-light conditions, motion blur, depth-of-field effects, and densely populated scenes. Despite these difficulties, D-FINE consistently produces accurate localization results.

<p align="center">
    <img src="https://raw.githubusercontent.com/Peterande/storage/master/figs/hard_case-1.jpg" alt="D-FINE predictions in complex scenarios" width="1000">
</p>

</details>


## Citation
If you use `D-FINE` or its methods in your work, please cite the following BibTeX entry:
<details open>
<summary> bibtex </summary>

```latex
@misc{peng2024dfine,
      title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
      author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
      year={2024},
      eprint={2410.13842},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
</details>

## Acknowledgement
Our work is built upon [RT-DETR](https://github.com/lyuwenyu/RT-DETR).
Thanks to the inspirations from [RT-DETR](https://github.com/lyuwenyu/RT-DETR), [GFocal](https://github.com/implus/GFocal), [LD](https://github.com/HikariTJU/LD), and [YOLOv9](https://github.com/WongKinYiu/yolov9).

✨ Feel free to contribute and reach out if you have any questions! ✨
D-FINE/configs/dataset/coco_detection.yml
ADDED
@@ -0,0 +1,41 @@
```yaml
task: detection

evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]

num_classes: 80
remap_mscoco_category: True

train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/COCO2017/train2017/
    ann_file: /data/COCO2017/annotations/instances_train2017.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction


val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/COCO2017/val2017/
    ann_file: /data/COCO2017/annotations/instances_val2017.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
```
D-FINE/configs/dataset/crowdhuman_detection.yml
ADDED
@@ -0,0 +1,41 @@
```yaml
task: detection

evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]

num_classes: 1 # your dataset classes
remap_mscoco_category: False

train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/CrowdHuman/coco/CrowdHuman_train
    ann_file: /data/CrowdHuman/coco/Chuman-train.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction


val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/CrowdHuman/coco/CrowdHuman_val
    ann_file: /data/CrowdHuman/coco/Chuman-val.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
```
D-FINE/configs/dataset/custom_detection.yml
ADDED
@@ -0,0 +1,41 @@
```yaml
task: detection

evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]

num_classes: 777 # your dataset classes
remap_mscoco_category: False

train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/yourdataset/train
    ann_file: /data/yourdataset/train/train.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction


val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/yourdataset/val
    ann_file: /data/yourdataset/val/val.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
```
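A common pitfall with this file is a `num_classes` that does not match the annotation JSON. The following standalone check is illustrative only (plain `json`, no repository code; the path is the placeholder from the config above):

```python
import json

with open("/data/yourdataset/train/train.json") as f:  # placeholder path from the config
    coco = json.load(f)

cats = coco["categories"]
print(len(cats), "categories")  # num_classes should cover this many classes
# With remap_mscoco_category: False the category ids are used as-is,
# so also inspect their range:
ids = sorted(c["id"] for c in cats)
print("id range:", ids[0], "-", ids[-1])
```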
D-FINE/configs/dataset/obj365_detection.yml
ADDED
@@ -0,0 +1,41 @@
```yaml
task: detection

evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]

num_classes: 366
remap_mscoco_category: False

train_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/Objects365/data/train
    ann_file: /data/Objects365/data/train/new_zhiyuan_objv2_train_resized.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction


val_dataloader:
  type: DataLoader
  dataset:
    type: CocoDetection
    img_folder: /data/Objects365/data/val/
    ann_file: /data/Objects365/data/val/new_zhiyuan_objv2_val_resized.json
    return_masks: False
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
```
D-FINE/configs/dataset/voc_detection.yml
ADDED
@@ -0,0 +1,40 @@
```yaml
task: detection

evaluator:
  type: CocoEvaluator
  iou_types: ['bbox', ]

num_classes: 20

train_dataloader:
  type: DataLoader
  dataset:
    type: VOCDetection
    root: ./dataset/voc/
    ann_file: trainval.txt
    label_file: label_list.txt
    transforms:
      type: Compose
      ops: ~
  shuffle: True
  num_workers: 4
  drop_last: True
  collate_fn:
    type: BatchImageCollateFunction


val_dataloader:
  type: DataLoader
  dataset:
    type: VOCDetection
    root: ./dataset/voc/
    ann_file: test.txt
    label_file: label_list.txt
    transforms:
      type: Compose
      ops: ~
  shuffle: False
  num_workers: 4
  drop_last: False
  collate_fn:
    type: BatchImageCollateFunction
```
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_l_ch.yml
ADDED
@@ -0,0 +1,44 @@
```yaml
__include__: [
  '../../dataset/crowdhuman_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_crowdhuman


HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 140
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 120
  collate_fn:
    stop_epoch: 120
    ema_restart_decay: 0.9999
    base_size_repeat: 4
```
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_m_ch.yml
ADDED
@@ -0,0 +1,60 @@
```yaml
__include__: [
  '../../dataset/crowdhuman_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_crowdhuman


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 4  # 5 6
  eval_idx: -1  # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000025
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000025
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 220
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 200
  collate_fn:
    stop_epoch: 200
    ema_restart_decay: 0.9999
    base_size_repeat: 6
```
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_n_ch.yml
ADDED
@@ -0,0 +1,82 @@
```yaml
__include__: [
  '../../dataset/crowdhuman_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_n_crowdhuman


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True


HybridEncoder:
  in_channels: [512, 1024]
  feat_strides: [16, 32]

  # intra
  hidden_dim: 128
  use_encoder_idx: [1]
  dim_feedforward: 512

  # cross
  expansion: 0.34
  depth_mult: 0.5


DFINETransformer:
  feat_channels: [128, 128]
  feat_strides: [16, 32]
  hidden_dim: 128
  dim_feedforward: 512
  num_levels: 2

  num_layers: 3
  eval_idx: -1

  num_points: [6, 6]

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0004
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0004
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0008
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 220
train_dataloader:
  total_batch_size: 128
  dataset:
    transforms:
      policy:
        epoch: 200
  collate_fn:
    stop_epoch: 200
    ema_restart_decay: 0.9999
    base_size_repeat: ~

val_dataloader:
  total_batch_size: 256
```
ADDED
@@ -0,0 +1,65 @@
|
```yaml
__include__: [
  '../../dataset/crowdhuman_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_crowdhuman


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 3  # 4 5 6
  eval_idx: -1  # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0002
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0002
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0004
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 220
train_dataloader:
  total_batch_size: 64
  dataset:
    transforms:
      policy:
        epoch: 200
  collate_fn:
    stop_epoch: 200
    ema_restart_decay: 0.9999
    base_size_repeat: 20

val_dataloader:
  total_batch_size: 128
```
D-FINE/configs/dfine/crowdhuman/dfine_hgnetv2_x_ch.yml
ADDED
@@ -0,0 +1,55 @@
```yaml
__include__: [
  '../../dataset/crowdhuman_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_crowdhuman


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

HybridEncoder:
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 140
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 120
  collate_fn:
    stop_epoch: 120
    ema_restart_decay: 0.9998
    base_size_repeat: 3
```
D-FINE/configs/dfine/custom/dfine_hgnetv2_l_custom.yml
ADDED
@@ -0,0 +1,44 @@
```yaml
__include__: [
  '../../dataset/custom_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_custom


HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 80 # 72 + 2n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 72
  collate_fn:
    stop_epoch: 72
    ema_restart_decay: 0.9999
    base_size_repeat: 4
```
ADDED
@@ -0,0 +1,60 @@
|
```yaml
__include__: [
  '../../dataset/custom_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 4  # 5 6
  eval_idx: -1  # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000025
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000025
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 132 # 120 + 4n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 120
  collate_fn:
    stop_epoch: 120
    ema_restart_decay: 0.9999
    base_size_repeat: 6
```
D-FINE/configs/dfine/custom/dfine_hgnetv2_n_custom.yml
ADDED
@@ -0,0 +1,76 @@
```yaml
__include__:
  [
    "../../dataset/custom_detection.yml",
    "../../runtime.yml",
    "../include/dataloader.yml",
    "../include/optimizer.yml",
    "../include/dfine_hgnetv2.yml",
  ]

output_dir: ../../../inference_output

DFINE:
  backbone: HGNetv2

HGNetv2:
  name: "B0"
  return_idx: [2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

HybridEncoder:
  in_channels: [512, 1024]
  feat_strides: [16, 32]

  # intra
  hidden_dim: 128
  use_encoder_idx: [1]
  dim_feedforward: 512

  # cross
  expansion: 0.34
  depth_mult: 0.5

DFINETransformer:
  feat_channels: [128, 128]
  feat_strides: [16, 32]
  hidden_dim: 128
  dim_feedforward: 512
  num_levels: 2

  num_layers: 3
  eval_idx: -1

  num_points: [6, 6]

optimizer:
  type: AdamW
  params:
    - params: "^(?=.*backbone)(?!.*norm|bn).*$"
      lr: 0.0004
    - params: "^(?=.*backbone)(?=.*norm|bn).*$"
      lr: 0.0004
      weight_decay: 0.
    - params: "^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$"
      weight_decay: 0.

  lr: 0.0008
  betas: [0.9, 0.999]
  weight_decay: 0.0001

# Increase to search for the optimal ema
epochs: 135
train_dataloader:
  total_batch_size: 32
  dataset:
    transforms:
      policy:
        epoch: 123
  collate_fn:
    stop_epoch: 123
    ema_restart_decay: 0.9999
    base_size_repeat: ~

val_dataloader:
  total_batch_size: 32
```
D-FINE/configs/dfine/custom/dfine_hgnetv2_s_custom.yml
ADDED
@@ -0,0 +1,65 @@
```yaml
__include__: [
  '../../dataset/custom_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 3  # 4 5 6
  eval_idx: -1  # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0002
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0002
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0004
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 220
train_dataloader:
  total_batch_size: 64
  dataset:
    transforms:
      policy:
        epoch: 200
  collate_fn:
    stop_epoch: 200
    ema_restart_decay: 0.9999
    base_size_repeat: 20

val_dataloader:
  total_batch_size: 128
```
D-FINE/configs/dfine/custom/dfine_hgnetv2_x_custom.yml
ADDED
@@ -0,0 +1,55 @@
```yaml
__include__: [
  '../../dataset/custom_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

HybridEncoder:
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 80 # 72 + 2n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 72
  collate_fn:
    stop_epoch: 72
    ema_restart_decay: 0.9998
    base_size_repeat: 3
```
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_l_obj2custom.yml
ADDED
@@ -0,0 +1,53 @@
```yaml
__include__: [
  '../../../dataset/custom_detection.yml',
  '../../../runtime.yml',
  '../../include/dataloader.yml',
  '../../include/optimizer.yml',
  '../../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_obj2custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True
  pretrained: False

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 30
  collate_fn:
    stop_epoch: 30
    ema_restart_decay: 0.9999
    base_size_repeat: 4

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
```
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_m_obj2custom.yml
ADDED
@@ -0,0 +1,66 @@
```yaml
__include__: [
  '../../../dataset/custom_detection.yml',
  '../../../runtime.yml',
  '../../include/dataloader.yml',
  '../../include/optimizer.yml',
  '../../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_obj2custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True
  pretrained: False

DFINETransformer:
  num_layers: 4  # 5 6
  eval_idx: -1  # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000025
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000025
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 56 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 48
  collate_fn:
    stop_epoch: 48
    ema_restart_decay: 0.9999
    base_size_repeat: 6

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
```
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_s_obj2custom.yml
ADDED
@@ -0,0 +1,67 @@
```yaml
__include__: [
  '../../../dataset/custom_detection.yml',
  '../../../runtime.yml',
  '../../include/dataloader.yml',
  '../../include/optimizer.yml',
  '../../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_obj2custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True
  pretrained: False

DFINETransformer:
  num_layers: 3  # 4 5 6
  eval_idx: -1  # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000125
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000125
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 64 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 56
  collate_fn:
    stop_epoch: 56
    ema_restart_decay: 0.9999
    base_size_repeat: 10

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
```
D-FINE/configs/dfine/custom/objects365/dfine_hgnetv2_x_obj2custom.yml
ADDED
@@ -0,0 +1,62 @@
```yaml
__include__: [
  '../../../dataset/custom_detection.yml',
  '../../../runtime.yml',
  '../../include/dataloader.yml',
  '../../include/optimizer.yml',
  '../../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_obj2custom


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True
  pretrained: False

HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 30
  collate_fn:
    stop_epoch: 30
    ema_restart_decay: 0.9999
    base_size_repeat: 3

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
```
D-FINE/configs/dfine/dfine_hgnetv2_l_coco.yml
ADDED
@@ -0,0 +1,44 @@
```yaml
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_coco


HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 80 # 72 + 2n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 72
  collate_fn:
    stop_epoch: 72
    ema_restart_decay: 0.9999
    base_size_repeat: 4
```
D-FINE/configs/dfine/dfine_hgnetv2_m_coco.yml
ADDED
@@ -0,0 +1,60 @@
```yaml
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 4  # 5 6
  eval_idx: -1  # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.00002
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.00002
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0002
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 132 # 120 + 4n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 120
  collate_fn:
    stop_epoch: 120
    ema_restart_decay: 0.9999
    base_size_repeat: 6
```
D-FINE/configs/dfine/dfine_hgnetv2_n_coco.yml
ADDED
@@ -0,0 +1,82 @@
```yaml
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_n_coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True


HybridEncoder:
  in_channels: [512, 1024]
  feat_strides: [16, 32]

  # intra
  hidden_dim: 128
  use_encoder_idx: [1]
  dim_feedforward: 512

  # cross
  expansion: 0.34
  depth_mult: 0.5


DFINETransformer:
  feat_channels: [128, 128]
  feat_strides: [16, 32]
  hidden_dim: 128
  dim_feedforward: 512
  num_levels: 2

  num_layers: 3
  eval_idx: -1

  num_points: [6, 6]

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0004
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0004
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0008
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 160 # 148 + 4n
train_dataloader:
  total_batch_size: 128
  dataset:
    transforms:
      policy:
        epoch: 148
  collate_fn:
    stop_epoch: 148
    ema_restart_decay: 0.9999
    base_size_repeat: ~

val_dataloader:
  total_batch_size: 256
```
D-FINE/configs/dfine/dfine_hgnetv2_s_coco.yml
ADDED
@@ -0,0 +1,61 @@
```yaml
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 3  # 4 5 6
  eval_idx: -1  # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0001
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0001
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0002
  betas: [0.9, 0.999]
  weight_decay: 0.0001


# Increase to search for the optimal ema
epochs: 132 # 120 + 4n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 120
  collate_fn:
    stop_epoch: 120
    ema_restart_decay: 0.9999
    base_size_repeat: 20
```
D-FINE/configs/dfine/dfine_hgnetv2_x_coco.yml
ADDED
@@ -0,0 +1,56 @@
```yaml
__include__: [
  '../dataset/coco_detection.yml',
  '../runtime.yml',
  './include/dataloader.yml',
  './include/optimizer.yml',
  './include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


# Increase to search for the optimal ema
epochs: 80 # 72 + 2n
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 72
  collate_fn:
    stop_epoch: 72
    ema_restart_decay: 0.9998
    base_size_repeat: 3
```
D-FINE/configs/dfine/include/dataloader.yml
ADDED
@@ -0,0 +1,39 @@
```yaml

train_dataloader:
  dataset:
    transforms:
      ops:
        - {type: RandomPhotometricDistort, p: 0.5}
        - {type: RandomZoomOut, fill: 0}
        - {type: RandomIoUCrop, p: 0.8}
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: RandomHorizontalFlip}
        - {type: Resize, size: [640, 640], }
        - {type: SanitizeBoundingBoxes, min_size: 1}
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
        - {type: ConvertBoxes, fmt: 'cxcywh', normalize: True}
      policy:
        name: stop_epoch
        epoch: 72 # epoch in [71, ~) stop `ops`
        ops: ['RandomPhotometricDistort', 'RandomZoomOut', 'RandomIoUCrop']

  collate_fn:
    type: BatchImageCollateFunction
    base_size: 640
    base_size_repeat: 3
    stop_epoch: 72 # epoch in [72, ~) stop `multiscales`

  shuffle: True
  total_batch_size: 32 # total batch size equals to 32 (4 * 8)
  num_workers: 4


val_dataloader:
  dataset:
    transforms:
      ops:
        - {type: Resize, size: [640, 640], }
        - {type: ConvertPILImage, dtype: 'float32', scale: True}
  shuffle: False
  total_batch_size: 64
  num_workers: 4
```
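The `policy`/`stop_epoch` pair in this file gates the strong augmentations (`RandomPhotometricDistort`, `RandomZoomOut`, `RandomIoUCrop`) and the multiscale collation by epoch, so the final epochs train on clean, fixed-size images. A minimal illustrative sketch of that gating logic, not the repository's actual transform code:

```python
# Illustrative only: mimics a stop_epoch policy over a list of (name, fn) ops.
def apply_ops(image, ops, epoch, policy_epoch=72,
              stopped=("RandomPhotometricDistort", "RandomZoomOut", "RandomIoUCrop")):
    for name, fn in ops:
        if epoch >= policy_epoch and name in stopped:
            continue  # augmentation disabled for the final epochs
        image = fn(image)
    return image

ops = [("RandomPhotometricDistort", lambda x: x + "+distort"),
       ("Resize", lambda x: x + "+resize")]
print(apply_ops("img", ops, epoch=10))  # img+distort+resize
print(apply_ops("img", ops, epoch=72))  # img+resize
```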
D-FINE/configs/dfine/include/dfine_hgnetv2.yml
ADDED
@@ -0,0 +1,82 @@
```yaml
task: detection

model: DFINE
criterion: DFINECriterion
postprocessor: DFINEPostProcessor

use_focal_loss: True
eval_spatial_size: [640, 640] # h w

DFINE:
  backbone: HGNetv2
  encoder: HybridEncoder
  decoder: DFINETransformer

HGNetv2:
  pretrained: True
  local_model_dir: weight/hgnetv2/

HybridEncoder:
  in_channels: [512, 1024, 2048]
  feat_strides: [8, 16, 32]

  # intra
  hidden_dim: 256
  use_encoder_idx: [2]
  num_encoder_layers: 1
  nhead: 8
  dim_feedforward: 1024
  dropout: 0.
  enc_act: 'gelu'

  # cross
  expansion: 1.0
  depth_mult: 1
  act: 'silu'


DFINETransformer:
  feat_channels: [256, 256, 256]
  feat_strides: [8, 16, 32]
  hidden_dim: 256
  num_levels: 3

  num_layers: 6
  eval_idx: -1
  num_queries: 300

  num_denoising: 100
  label_noise_ratio: 0.5
  box_noise_scale: 1.0

  # NEW
  reg_max: 32
  reg_scale: 4

  # Auxiliary decoder layers dimension scaling
  # "eg. If num_layers: 6 eval_idx: -4,
  # then layer 3, 4, 5 are auxiliary decoder layers."
  layer_scale: 1  # 2


  num_points: [3, 6, 3] # [4, 4, 4] [3, 6, 3]
  cross_attn_method: default # default, discrete
  query_select_method: default # default, agnostic


DFINEPostProcessor:
  num_top_queries: 300


DFINECriterion:
  weight_dict: {loss_vfl: 1, loss_bbox: 5, loss_giou: 2, loss_fgl: 0.15, loss_ddf: 1.5}
  losses: ['vfl', 'boxes', 'local']
  alpha: 0.75
  gamma: 2.0
  reg_max: 32

  matcher:
    type: HungarianMatcher
    weight_dict: {cost_class: 2, cost_bbox: 5, cost_giou: 2}
    alpha: 0.25
    gamma: 2.0
```
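`reg_max` and `reg_scale` configure the distribution-based box regression head: each edge offset is predicted as a probability distribution over discrete bins rather than as a single scalar. As a rough illustration of the general idea only (a plain expectation over uniformly spaced bins; D-FINE's actual FDR head uses a non-uniform weighting function), decoding can be sketched like this:

```python
import math

def decode_offset(logits, reg_max=32, reg_scale=4):
    """Turn bin logits into a scalar edge offset via a softmax expectation.

    Illustrative sketch: uniform bin values in [-reg_scale/2, reg_scale/2];
    the real FDR head weights bins non-uniformly.
    """
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    bin_values = [reg_scale * (i / reg_max - 0.5) for i in range(reg_max + 1)]
    return sum(p * v for p, v in zip(probs, bin_values))

logits = [0.0] * 33
logits[20] = 6.0  # distribution sharply peaked on bin 20
print(round(decode_offset(logits), 3))
```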
D-FINE/configs/dfine/include/optimizer.yml
ADDED
@@ -0,0 +1,36 @@
use_amp: True
use_ema: True
ema:
  type: ModelEMA
  decay: 0.9999
  warmups: 1000
  start: 0


epochs: 72
clip_max_norm: 0.1


optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


lr_scheduler:
  type: MultiStepLR
  milestones: [500]
  gamma: 0.1

lr_warmup_scheduler:
  type: LinearWarmup
  warmup_duration: 500
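The `params` entries are regular expressions matched against model parameter names to build AdamW parameter groups: the first gives non-norm backbone weights a reduced learning rate, the second removes weight decay from encoder/decoder normalization layers, and anything unmatched falls back to the top-level `lr`/`weight_decay`. (Note also that the `MultiStepLR` milestone of 500 lies beyond the 72 scheduled epochs, so the step decay effectively never fires.) A small sketch of the matching, with illustrative parameter names:

```python
import re

patterns = [
    r'^(?=.*backbone)(?!.*norm).*$',                    # backbone, excluding norm
    r'^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$',  # encoder/decoder norms
]

names = [
    'backbone.stages.0.blocks.0.conv.weight',            # -> group 0 (low lr)
    'backbone.stages.0.blocks.0.norm.weight',            # -> default group
    'encoder.encoder.0.layers.0.norm1.weight',           # -> group 1 (no decay)
    'decoder.layers.0.cross_attn.value_proj.weight',     # -> default group
]

for name in names:
    group = next((i for i, p in enumerate(patterns) if re.match(p, name)), None)
    print(f'{name} -> group {group}')  # None means top-level lr/weight_decay
```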
D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj2coco.yml
ADDED
@@ -0,0 +1,52 @@
__include__: [
  '../../dataset/coco_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_obj2coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 30
  collate_fn:
    stop_epoch: 30
    ema_restart_decay: 0.9999
    base_size_repeat: 4

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
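Each training config composes dataset, runtime, dataloader, optimizer, and architecture fragments via `__include__`, with the including file's own keys overriding the included defaults (the `optimizer` and `ema` blocks here, for instance, override those from `optimizer.yml`). A rough sketch of that override semantics, assuming a simple recursive dict merge (not D-FINE's actual config loader):

```python
import yaml  # pyyaml

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override wins on conflicts."""
    out = dict(base)
    for k, v in override.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

base = yaml.safe_load("ema: {warmups: 1000}\nepochs: 72")  # from optimizer.yml
child = yaml.safe_load("ema: {warmups: 0}\nepochs: 36")    # from this file
print(deep_merge(base, child))  # {'ema': {'warmups': 0}, 'epochs': 36}
```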
D-FINE/configs/dfine/objects365/dfine_hgnetv2_l_obj365.yml
ADDED
@@ -0,0 +1,49 @@
__include__: [
  '../../dataset/obj365_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_l_obj365


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B4'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000125
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125
  # weight_decay: 0.00005 # Faster convergence (optional)


epochs: 24 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 500
  collate_fn:
    stop_epoch: 500
    base_size_repeat: 4

checkpoint_freq: 1
print_freq: 1000
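The `policy.epoch` / `stop_epoch` pair controls when the stronger data augmentations are switched off near the end of training; setting both to 500 while training only 24 epochs keeps the strong pipeline active throughout Objects365 pretraining, whereas the COCO fine-tuning config above disables it for the final 6 epochs (30 of 36). A toy sketch of that gating (semantics assumed from the config names):

```python
# Assumed gating: strong augmentations apply only before `policy.epoch`.
policy_epoch, total_epochs = 500, 24
for epoch in (0, total_epochs - 1):
    strong = epoch < policy_epoch
    print(f'epoch {epoch}: {"strong" if strong else "light"} augmentation')
# Both epochs print "strong": the cutoff is never reached in 24 epochs.
```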
D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj2coco.yml
ADDED
@@ -0,0 +1,65 @@
__include__: [
  '../../dataset/coco_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_obj2coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 4 # 5 6
  eval_idx: -1 # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000025
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000025
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 56 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 48
  collate_fn:
    stop_epoch: 48
    ema_restart_decay: 0.9999
    base_size_repeat: 6

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
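The commented alternatives on `num_layers`/`eval_idx` (`# 5 6`, `# -2 -3`) reflect how the M model is slimmed down: every decoder layer is supervised during training, but inference reads the predictions of the layer at `eval_idx`, so trailing layers can be treated as auxiliary and trimmed. A toy sketch of that selection over a list of per-layer outputs:

```python
num_layers, eval_idx = 4, -1
layer_outputs = [f'preds_from_layer_{i}' for i in range(num_layers)]

eval_output = layer_outputs[eval_idx]   # what inference returns
aux_outputs = layer_outputs[:eval_idx]  # earlier layers, trained with aux losses
print(eval_output)       # preds_from_layer_3
print(len(aux_outputs))  # 3
```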
D-FINE/configs/dfine/objects365/dfine_hgnetv2_m_obj365.yml
ADDED
@@ -0,0 +1,62 @@
__include__: [
  '../../dataset/obj365_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_m_obj365


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B2'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 4 # 5 6
  eval_idx: -1 # -2 -3

HybridEncoder:
  in_channels: [384, 768, 1536]
  hidden_dim: 256
  depth_mult: 0.67

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000025
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000025
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125
  # weight_decay: 0.00005 # Faster convergence (optional)


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 500
  collate_fn:
    stop_epoch: 500
    base_size_repeat: 6

checkpoint_freq: 1
print_freq: 1000
D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj2coco.yml
ADDED
@@ -0,0 +1,88 @@
__include__: [
  '../../dataset/coco_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_n_obj2coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True


HybridEncoder:
  in_channels: [512, 1024]
  feat_strides: [16, 32]

  # intra
  hidden_dim: 128
  use_encoder_idx: [1]
  dim_feedforward: 512

  # cross
  expansion: 0.34
  depth_mult: 0.5


DFINETransformer:
  feat_channels: [128, 128]
  feat_strides: [16, 32]
  hidden_dim: 128
  dim_feedforward: 512
  num_levels: 2

  num_layers: 3
  eval_idx: -1

  num_points: [6, 6]

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0004
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0004
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0008
  betas: [0.9, 0.999]
  weight_decay: 0.0001


epochs: 64 # Early stop
train_dataloader:
  total_batch_size: 128
  dataset:
    transforms:
      policy:
        epoch: 56
  collate_fn:
    stop_epoch: 56
    ema_restart_decay: 0.9999
    base_size_repeat: ~

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0

val_dataloader:
  total_batch_size: 256
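The N variant's biggest saving is dropping the stride-8 pyramid level (`return_idx: [2, 3]`, `feat_strides: [16, 32]`, `num_levels: 2`): at the 640×640 eval size, the stride-8 map alone carries more tokens than the other two levels combined. A quick check of that arithmetic:

```python
# Token counts per feature level at a 640x640 input.
size = 640
for strides in ([8, 16, 32], [16, 32]):  # S/M/L/X variants vs N variant
    tokens = [(size // s) ** 2 for s in strides]
    print(strides, tokens, 'total =', sum(tokens))
# [8, 16, 32] [6400, 1600, 400] total = 8400
# [16, 32] [1600, 400] total = 2000
```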
D-FINE/configs/dfine/objects365/dfine_hgnetv2_n_obj365.yml
ADDED
@@ -0,0 +1,84 @@
__include__: [
  '../../dataset/obj365_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_n_obj365


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True


HybridEncoder:
  in_channels: [512, 1024]
  feat_strides: [16, 32]

  # intra
  hidden_dim: 128
  use_encoder_idx: [1]
  dim_feedforward: 512

  # cross
  expansion: 0.34
  depth_mult: 0.5


DFINETransformer:
  feat_channels: [128, 128]
  feat_strides: [16, 32]
  hidden_dim: 128
  dim_feedforward: 512
  num_levels: 2

  num_layers: 3
  eval_idx: -1

  num_points: [6, 6]

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0004
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.0004
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.0008
  betas: [0.9, 0.999]
  weight_decay: 0.0001


epochs: 48 # Early stop
train_dataloader:
  total_batch_size: 128
  dataset:
    transforms:
      policy:
        epoch: 500
  collate_fn:
    stop_epoch: 500
    base_size_repeat: ~

checkpoint_freq: 1
print_freq: 500

val_dataloader:
  total_batch_size: 256
D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj2coco.yml
ADDED
@@ -0,0 +1,66 @@
__include__: [
  '../../dataset/coco_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_obj2coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 3 # 4 5 6
  eval_idx: -1 # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000125
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000125
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 64 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 56
  collate_fn:
    stop_epoch: 56
    ema_restart_decay: 0.9999
    base_size_repeat: 10

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
D-FINE/configs/dfine/objects365/dfine_hgnetv2_s_obj365.yml
ADDED
@@ -0,0 +1,63 @@
__include__: [
  '../../dataset/obj365_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_s_obj365


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B0'
  return_idx: [1, 2, 3]
  freeze_at: -1
  freeze_norm: False
  use_lab: True

DFINETransformer:
  num_layers: 3 # 4 5 6
  eval_idx: -1 # -2 -3 -4

HybridEncoder:
  in_channels: [256, 512, 1024]
  hidden_dim: 256
  depth_mult: 0.34
  expansion: 0.5

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.000125
    -
      params: '^(?=.*backbone)(?=.*norm|bn).*$'
      lr: 0.000125
      weight_decay: 0.
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn|bias)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125
  # weight_decay: 0.00005 # Faster convergence (optional)


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 500
  collate_fn:
    stop_epoch: 500
    base_size_repeat: 20

checkpoint_freq: 1
print_freq: 1000
D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj2coco.yml
ADDED
@@ -0,0 +1,61 @@
__include__: [
  '../../dataset/coco_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_obj2coco


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125


epochs: 36 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 30
  collate_fn:
    stop_epoch: 30
    ema_restart_decay: 0.9999
    base_size_repeat: 3

ema:
  warmups: 0

lr_warmup_scheduler:
  warmup_duration: 0
D-FINE/configs/dfine/objects365/dfine_hgnetv2_x_obj365.yml
ADDED
@@ -0,0 +1,58 @@
__include__: [
  '../../dataset/obj365_detection.yml',
  '../../runtime.yml',
  '../include/dataloader.yml',
  '../include/optimizer.yml',
  '../include/dfine_hgnetv2.yml',
]

output_dir: ./output/dfine_hgnetv2_x_obj365


DFINE:
  backbone: HGNetv2

HGNetv2:
  name: 'B5'
  return_idx: [1, 2, 3]
  freeze_stem_only: True
  freeze_at: 0
  freeze_norm: True

HybridEncoder:
  # intra
  hidden_dim: 384
  dim_feedforward: 2048

DFINETransformer:
  feat_channels: [384, 384, 384]
  reg_scale: 8

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?!.*norm|bn).*$'
      lr: 0.0000025
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bn)).*$'
      weight_decay: 0.

  lr: 0.00025
  betas: [0.9, 0.999]
  weight_decay: 0.000125
  # weight_decay: 0.00005 # Faster convergence (optional)


epochs: 24 # Early stop
train_dataloader:
  dataset:
    transforms:
      policy:
        epoch: 500
  collate_fn:
    stop_epoch: 500
    base_size_repeat: 3

checkpoint_freq: 1
print_freq: 1000
D-FINE/configs/runtime.yml
ADDED
@@ -0,0 +1,24 @@
print_freq: 100
output_dir: './logs'
checkpoint_freq: 12


sync_bn: True
find_unused_parameters: False


use_amp: False
scaler:
  type: GradScaler
  enabled: True


use_ema: False
ema:
  type: ModelEMA
  decay: 0.9999
  warmups: 1000

use_wandb: False
project_name: D-FINE # for wandb
exp_name: baseline # wandb experiment name
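`ModelEMA` maintains an exponential moving average of the model weights, with `warmups` ramping the decay up from zero so the average is not dominated by early, noisy weights (which is also why the obj2coco fine-tuning configs above set `warmups: 0` and restart from an already-stable EMA via `ema_restart_decay`). A minimal sketch assuming the common exponential ramp; the exact schedule in D-FINE may differ:

```python
import math

def ema_decay(step: int, decay: float = 0.9999, warmups: int = 1000) -> float:
    # Ramp the effective decay from 0 toward `decay` over the warmup steps.
    return decay * (1 - math.exp(-step / warmups))

for step in (1, 100, 1000, 10000):
    print(step, round(ema_decay(step), 6))
# Per-step update with effective decay d:  ema_w = d * ema_w + (1 - d) * model_w
```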