Commit a35e164
zach committed
Parent: a61d782

Remove gradio-env directory from tracking

Note: this view is limited to 50 files because the commit contains too many changes; see the raw diff for the full change set.
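The commit above removes an accidentally committed virtualenv from version control without deleting it from disk: the directory is dropped from the index and then ignored. A minimal sketch of the equivalent commands, run here in a throwaway scratch repository (the scratch repo, demo identity, and fake venv file below are illustrative, not part of this commit):

```shell
#!/bin/sh
# Sketch: untrack a committed virtualenv directory, keep it on disk,
# and ignore it going forward. Demonstrated in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # demo identity, not the commit author
git config user.name "demo"

# Simulate the mistake: commit a (fake) virtualenv directory.
mkdir -p gradio-env/bin
echo 'fake venv file' > gradio-env/bin/activate
git add .
git commit -qm "accidentally track gradio-env"

# The fix: remove from the index only (--cached leaves the working tree),
# then ignore the directory so it is never re-added.
git rm -r -q --cached gradio-env
echo 'gradio-env/' >> .gitignore
git add .gitignore
git commit -qm "Remove gradio-env directory from tracking"

git ls-files   # now lists only .gitignore; gradio-env is untracked
test -f gradio-env/bin/activate && echo "venv still on disk"
```

The key detail is `--cached`: without it, `git rm -r` would also delete the local environment.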
- gradio-env/bin/Activate.ps1 +0 -247
- gradio-env/bin/activate +0 -63
- gradio-env/bin/activate.csh +0 -26
- gradio-env/bin/activate.fish +0 -69
- gradio-env/bin/dotenv +0 -8
- gradio-env/bin/normalizer +0 -8
- gradio-env/bin/pip +0 -8
- gradio-env/bin/pip3 +0 -8
- gradio-env/bin/pip3.11 +0 -8
- gradio-env/bin/python +0 -1
- gradio-env/bin/python3 +0 -1
- gradio-env/bin/python3.11 +0 -1
- gradio-env/lib/python3.11/site-packages/_distutils_hack/__init__.py +0 -239
- gradio-env/lib/python3.11/site-packages/_distutils_hack/__pycache__/__init__.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/_distutils_hack/__pycache__/override.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/_distutils_hack/override.py +0 -1
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/INSTALLER +0 -1
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/LICENSE +0 -20
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/METADATA +0 -68
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/RECORD +0 -14
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/WHEEL +0 -5
- gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/top_level.txt +0 -1
- gradio-env/lib/python3.11/site-packages/certifi/__init__.py +0 -4
- gradio-env/lib/python3.11/site-packages/certifi/__main__.py +0 -12
- gradio-env/lib/python3.11/site-packages/certifi/__pycache__/__init__.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/certifi/__pycache__/__main__.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/certifi/__pycache__/core.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/certifi/cacert.pem +0 -0
- gradio-env/lib/python3.11/site-packages/certifi/core.py +0 -114
- gradio-env/lib/python3.11/site-packages/certifi/py.typed +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/INSTALLER +0 -1
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/LICENSE +0 -21
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/METADATA +0 -721
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/RECORD +0 -35
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/WHEEL +0 -5
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/entry_points.txt +0 -2
- gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/top_level.txt +0 -1
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__init__.py +0 -48
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__main__.py +0 -6
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/__init__.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/__main__.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/api.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/cd.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/constant.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/legacy.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/md.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/models.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/utils.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/version.cpython-311.pyc +0 -0
- gradio-env/lib/python3.11/site-packages/charset_normalizer/api.py +0 -668
gradio-env/bin/Activate.ps1
DELETED
@@ -1,247 +0,0 @@
-<#
-.Synopsis
-Activate a Python virtual environment for the current PowerShell session.
-
-.Description
-Pushes the python executable for a virtual environment to the front of the
-$Env:PATH environment variable and sets the prompt to signify that you are
-in a Python virtual environment. Makes use of the command line switches as
-well as the `pyvenv.cfg` file values present in the virtual environment.
-
-.Parameter VenvDir
-Path to the directory that contains the virtual environment to activate. The
-default value for this is the parent of the directory that the Activate.ps1
-script is located within.
-
-.Parameter Prompt
-The prompt prefix to display when this virtual environment is activated. By
-default, this prompt is the name of the virtual environment folder (VenvDir)
-surrounded by parentheses and followed by a single space (ie. '(.venv) ').
-
-.Example
-Activate.ps1
-Activates the Python virtual environment that contains the Activate.ps1 script.
-
-.Example
-Activate.ps1 -Verbose
-Activates the Python virtual environment that contains the Activate.ps1 script,
-and shows extra information about the activation as it executes.
-
-.Example
-Activate.ps1 -VenvDir C:\Users\MyUser\Common\.venv
-Activates the Python virtual environment located in the specified location.
-
-.Example
-Activate.ps1 -Prompt "MyPython"
-Activates the Python virtual environment that contains the Activate.ps1 script,
-and prefixes the current prompt with the specified string (surrounded in
-parentheses) while the virtual environment is active.
-
-.Notes
-On Windows, it may be required to enable this Activate.ps1 script by setting the
-execution policy for the user. You can do this by issuing the following PowerShell
-command:
-
-PS C:\> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
-
-For more information on Execution Policies:
-https://go.microsoft.com/fwlink/?LinkID=135170
-
-#>
-Param(
-    [Parameter(Mandatory = $false)]
-    [String]
-    $VenvDir,
-    [Parameter(Mandatory = $false)]
-    [String]
-    $Prompt
-)
-
-<# Function declarations --------------------------------------------------- #>
-
-<#
-.Synopsis
-Remove all shell session elements added by the Activate script, including the
-addition of the virtual environment's Python executable from the beginning of
-the PATH variable.
-
-.Parameter NonDestructive
-If present, do not remove this function from the global namespace for the
-session.
-
-#>
-function global:deactivate ([switch]$NonDestructive) {
-    # Revert to original values
-
-    # The prior prompt:
-    if (Test-Path -Path Function:_OLD_VIRTUAL_PROMPT) {
-        Copy-Item -Path Function:_OLD_VIRTUAL_PROMPT -Destination Function:prompt
-        Remove-Item -Path Function:_OLD_VIRTUAL_PROMPT
-    }
-
-    # The prior PYTHONHOME:
-    if (Test-Path -Path Env:_OLD_VIRTUAL_PYTHONHOME) {
-        Copy-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME -Destination Env:PYTHONHOME
-        Remove-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME
-    }
-
-    # The prior PATH:
-    if (Test-Path -Path Env:_OLD_VIRTUAL_PATH) {
-        Copy-Item -Path Env:_OLD_VIRTUAL_PATH -Destination Env:PATH
-        Remove-Item -Path Env:_OLD_VIRTUAL_PATH
-    }
-
-    # Just remove the VIRTUAL_ENV altogether:
-    if (Test-Path -Path Env:VIRTUAL_ENV) {
-        Remove-Item -Path env:VIRTUAL_ENV
-    }
-
-    # Just remove VIRTUAL_ENV_PROMPT altogether.
-    if (Test-Path -Path Env:VIRTUAL_ENV_PROMPT) {
-        Remove-Item -Path env:VIRTUAL_ENV_PROMPT
-    }
-
-    # Just remove the _PYTHON_VENV_PROMPT_PREFIX altogether:
-    if (Get-Variable -Name "_PYTHON_VENV_PROMPT_PREFIX" -ErrorAction SilentlyContinue) {
-        Remove-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Scope Global -Force
-    }
-
-    # Leave deactivate function in the global namespace if requested:
-    if (-not $NonDestructive) {
-        Remove-Item -Path function:deactivate
-    }
-}
-
-<#
-.Description
-Get-PyVenvConfig parses the values from the pyvenv.cfg file located in the
-given folder, and returns them in a map.
-
-For each line in the pyvenv.cfg file, if that line can be parsed into exactly
-two strings separated by `=` (with any amount of whitespace surrounding the =)
-then it is considered a `key = value` line. The left hand string is the key,
-the right hand is the value.
-
-If the value starts with a `'` or a `"` then the first and last character is
-stripped from the value before being captured.
-
-.Parameter ConfigDir
-Path to the directory that contains the `pyvenv.cfg` file.
-#>
-function Get-PyVenvConfig(
-    [String]
-    $ConfigDir
-) {
-    Write-Verbose "Given ConfigDir=$ConfigDir, obtain values in pyvenv.cfg"
-
-    # Ensure the file exists, and issue a warning if it doesn't (but still allow the function to continue).
-    $pyvenvConfigPath = Join-Path -Resolve -Path $ConfigDir -ChildPath 'pyvenv.cfg' -ErrorAction Continue
-
-    # An empty map will be returned if no config file is found.
-    $pyvenvConfig = @{ }
-
-    if ($pyvenvConfigPath) {
-
-        Write-Verbose "File exists, parse `key = value` lines"
-        $pyvenvConfigContent = Get-Content -Path $pyvenvConfigPath
-
-        $pyvenvConfigContent | ForEach-Object {
-            $keyval = $PSItem -split "\s*=\s*", 2
-            if ($keyval[0] -and $keyval[1]) {
-                $val = $keyval[1]
-
-                # Remove extraneous quotations around a string value.
-                if ("'""".Contains($val.Substring(0, 1))) {
-                    $val = $val.Substring(1, $val.Length - 2)
-                }
-
-                $pyvenvConfig[$keyval[0]] = $val
-                Write-Verbose "Adding Key: '$($keyval[0])'='$val'"
-            }
-        }
-    }
-    return $pyvenvConfig
-}
-
-
-<# Begin Activate script --------------------------------------------------- #>
-
-# Determine the containing directory of this script
-$VenvExecPath = Split-Path -Parent $MyInvocation.MyCommand.Definition
-$VenvExecDir = Get-Item -Path $VenvExecPath
-
-Write-Verbose "Activation script is located in path: '$VenvExecPath'"
-Write-Verbose "VenvExecDir Fullname: '$($VenvExecDir.FullName)"
-Write-Verbose "VenvExecDir Name: '$($VenvExecDir.Name)"
-
-# Set values required in priority: CmdLine, ConfigFile, Default
-# First, get the location of the virtual environment, it might not be
-# VenvExecDir if specified on the command line.
-if ($VenvDir) {
-    Write-Verbose "VenvDir given as parameter, using '$VenvDir' to determine values"
-}
-else {
-    Write-Verbose "VenvDir not given as a parameter, using parent directory name as VenvDir."
-    $VenvDir = $VenvExecDir.Parent.FullName.TrimEnd("\\/")
-    Write-Verbose "VenvDir=$VenvDir"
-}
-
-# Next, read the `pyvenv.cfg` file to determine any required value such
-# as `prompt`.
-$pyvenvCfg = Get-PyVenvConfig -ConfigDir $VenvDir
-
-# Next, set the prompt from the command line, or the config file, or
-# just use the name of the virtual environment folder.
-if ($Prompt) {
-    Write-Verbose "Prompt specified as argument, using '$Prompt'"
-}
-else {
-    Write-Verbose "Prompt not specified as argument to script, checking pyvenv.cfg value"
-    if ($pyvenvCfg -and $pyvenvCfg['prompt']) {
-        Write-Verbose "  Setting based on value in pyvenv.cfg='$($pyvenvCfg['prompt'])'"
-        $Prompt = $pyvenvCfg['prompt'];
-    }
-    else {
-        Write-Verbose "  Setting prompt based on parent's directory's name. (Is the directory name passed to venv module when creating the virtual environment)"
-        Write-Verbose "  Got leaf-name of $VenvDir='$(Split-Path -Path $venvDir -Leaf)'"
-        $Prompt = Split-Path -Path $venvDir -Leaf
-    }
-}
-
-Write-Verbose "Prompt = '$Prompt'"
-Write-Verbose "VenvDir='$VenvDir'"
-
-# Deactivate any currently active virtual environment, but leave the
-# deactivate function in place.
-deactivate -nondestructive
-
-# Now set the environment variable VIRTUAL_ENV, used by many tools to determine
-# that there is an activated venv.
-$env:VIRTUAL_ENV = $VenvDir
-
-if (-not $Env:VIRTUAL_ENV_DISABLE_PROMPT) {
-
-    Write-Verbose "Setting prompt to '$Prompt'"
-
-    # Set the prompt to include the env name
-    # Make sure _OLD_VIRTUAL_PROMPT is global
-    function global:_OLD_VIRTUAL_PROMPT { "" }
-    Copy-Item -Path function:prompt -Destination function:_OLD_VIRTUAL_PROMPT
-    New-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Description "Python virtual environment prompt prefix" -Scope Global -Option ReadOnly -Visibility Public -Value $Prompt
-
-    function global:prompt {
-        Write-Host -NoNewline -ForegroundColor Green "($_PYTHON_VENV_PROMPT_PREFIX) "
-        _OLD_VIRTUAL_PROMPT
-    }
-    $env:VIRTUAL_ENV_PROMPT = $Prompt
-}
-
-# Clear PYTHONHOME
-if (Test-Path -Path Env:PYTHONHOME) {
-    Copy-Item -Path Env:PYTHONHOME -Destination Env:_OLD_VIRTUAL_PYTHONHOME
-    Remove-Item -Path Env:PYTHONHOME
-}
-
-# Add the venv to the PATH
-Copy-Item -Path Env:PATH -Destination Env:_OLD_VIRTUAL_PATH
-$Env:PATH = "$VenvExecDir$([System.IO.Path]::PathSeparator)$Env:PATH"
gradio-env/bin/activate
DELETED
@@ -1,63 +0,0 @@
-# This file must be used with "source bin/activate" *from bash*
-# you cannot run it directly
-
-deactivate () {
-    # reset old environment variables
-    if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
-        PATH="${_OLD_VIRTUAL_PATH:-}"
-        export PATH
-        unset _OLD_VIRTUAL_PATH
-    fi
-    if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
-        PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
-        export PYTHONHOME
-        unset _OLD_VIRTUAL_PYTHONHOME
-    fi
-
-    # Call hash to forget past commands. Without forgetting
-    # past commands the $PATH changes we made may not be respected
-    hash -r 2> /dev/null
-
-    if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
-        PS1="${_OLD_VIRTUAL_PS1:-}"
-        export PS1
-        unset _OLD_VIRTUAL_PS1
-    fi
-
-    unset VIRTUAL_ENV
-    unset VIRTUAL_ENV_PROMPT
-    if [ ! "${1:-}" = "nondestructive" ] ; then
-    # Self destruct!
-        unset -f deactivate
-    fi
-}
-
-# unset irrelevant variables
-deactivate nondestructive
-
-VIRTUAL_ENV=/Users/zach/Dev/tts-arena/gradio-env
-export VIRTUAL_ENV
-
-_OLD_VIRTUAL_PATH="$PATH"
-PATH="$VIRTUAL_ENV/"bin":$PATH"
-export PATH
-
-# unset PYTHONHOME if set
-# this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
-# could use `if (set -u; : $PYTHONHOME) ;` in bash
-if [ -n "${PYTHONHOME:-}" ] ; then
-    _OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
-    unset PYTHONHOME
-fi
-
-if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
-    _OLD_VIRTUAL_PS1="${PS1:-}"
-    PS1='(gradio-env) '"${PS1:-}"
-    export PS1
-    VIRTUAL_ENV_PROMPT='(gradio-env) '
-    export VIRTUAL_ENV_PROMPT
-fi
-
-# Call hash to forget past commands. Without forgetting
-# past commands the $PATH changes we made may not be respected
-hash -r 2> /dev/null
gradio-env/bin/activate.csh
DELETED
@@ -1,26 +0,0 @@
-# This file must be used with "source bin/activate.csh" *from csh*.
-# You cannot run it directly.
-# Created by Davide Di Blasi <davidedb@gmail.com>.
-# Ported to Python 3.3 venv by Andrew Svetlov <andrew.svetlov@gmail.com>
-
-alias deactivate 'test $?_OLD_VIRTUAL_PATH != 0 && setenv PATH "$_OLD_VIRTUAL_PATH" && unset _OLD_VIRTUAL_PATH; rehash; test $?_OLD_VIRTUAL_PROMPT != 0 && set prompt="$_OLD_VIRTUAL_PROMPT" && unset _OLD_VIRTUAL_PROMPT; unsetenv VIRTUAL_ENV; unsetenv VIRTUAL_ENV_PROMPT; test "\!:*" != "nondestructive" && unalias deactivate'
-
-# Unset irrelevant variables.
-deactivate nondestructive
-
-setenv VIRTUAL_ENV /Users/zach/Dev/tts-arena/gradio-env
-
-set _OLD_VIRTUAL_PATH="$PATH"
-setenv PATH "$VIRTUAL_ENV/"bin":$PATH"
-
-
-set _OLD_VIRTUAL_PROMPT="$prompt"
-
-if (! "$?VIRTUAL_ENV_DISABLE_PROMPT") then
-    set prompt = '(gradio-env) '"$prompt"
-    setenv VIRTUAL_ENV_PROMPT '(gradio-env) '
-endif
-
-alias pydoc python -m pydoc
-
-rehash
gradio-env/bin/activate.fish
DELETED
@@ -1,69 +0,0 @@
-# This file must be used with "source <venv>/bin/activate.fish" *from fish*
-# (https://fishshell.com/); you cannot run it directly.
-
-function deactivate -d "Exit virtual environment and return to normal shell environment"
-    # reset old environment variables
-    if test -n "$_OLD_VIRTUAL_PATH"
-        set -gx PATH $_OLD_VIRTUAL_PATH
-        set -e _OLD_VIRTUAL_PATH
-    end
-    if test -n "$_OLD_VIRTUAL_PYTHONHOME"
-        set -gx PYTHONHOME $_OLD_VIRTUAL_PYTHONHOME
-        set -e _OLD_VIRTUAL_PYTHONHOME
-    end
-
-    if test -n "$_OLD_FISH_PROMPT_OVERRIDE"
-        set -e _OLD_FISH_PROMPT_OVERRIDE
-        # prevents error when using nested fish instances (Issue #93858)
-        if functions -q _old_fish_prompt
-            functions -e fish_prompt
-            functions -c _old_fish_prompt fish_prompt
-            functions -e _old_fish_prompt
-        end
-    end
-
-    set -e VIRTUAL_ENV
-    set -e VIRTUAL_ENV_PROMPT
-    if test "$argv[1]" != "nondestructive"
-        # Self-destruct!
-        functions -e deactivate
-    end
-end
-
-# Unset irrelevant variables.
-deactivate nondestructive
-
-set -gx VIRTUAL_ENV /Users/zach/Dev/tts-arena/gradio-env
-
-set -gx _OLD_VIRTUAL_PATH $PATH
-set -gx PATH "$VIRTUAL_ENV/"bin $PATH
-
-# Unset PYTHONHOME if set.
-if set -q PYTHONHOME
-    set -gx _OLD_VIRTUAL_PYTHONHOME $PYTHONHOME
-    set -e PYTHONHOME
-end
-
-if test -z "$VIRTUAL_ENV_DISABLE_PROMPT"
-    # fish uses a function instead of an env var to generate the prompt.
-
-    # Save the current fish_prompt function as the function _old_fish_prompt.
-    functions -c fish_prompt _old_fish_prompt
-
-    # With the original prompt function renamed, we can override with our own.
-    function fish_prompt
-        # Save the return status of the last command.
-        set -l old_status $status
-
-        # Output the venv prompt; color taken from the blue of the Python logo.
-        printf "%s%s%s" (set_color 4B8BBE) '(gradio-env) ' (set_color normal)
-
-        # Restore the return status of the previous command.
-        echo "exit $old_status" | .
-        # Output the original/"old" prompt.
-        _old_fish_prompt
-    end
-
-    set -gx _OLD_FISH_PROMPT_OVERRIDE "$VIRTUAL_ENV"
-    set -gx VIRTUAL_ENV_PROMPT '(gradio-env) '
-end
gradio-env/bin/dotenv
DELETED
@@ -1,8 +0,0 @@
-#!/Users/zach/Dev/tts-arena/gradio-env/bin/python3.11
-# -*- coding: utf-8 -*-
-import re
-import sys
-from dotenv.__main__ import cli
-if __name__ == '__main__':
-    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.exit(cli())
gradio-env/bin/normalizer
DELETED
@@ -1,8 +0,0 @@
-#!/Users/zach/Dev/tts-arena/gradio-env/bin/python3.11
-# -*- coding: utf-8 -*-
-import re
-import sys
-from charset_normalizer import cli
-if __name__ == '__main__':
-    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.exit(cli.cli_detect())
gradio-env/bin/pip
DELETED
@@ -1,8 +0,0 @@
-#!/Users/zach/Dev/tts-arena/gradio-env/bin/python3.11
-# -*- coding: utf-8 -*-
-import re
-import sys
-from pip._internal.cli.main import main
-if __name__ == '__main__':
-    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.exit(main())
gradio-env/bin/pip3
DELETED
@@ -1,8 +0,0 @@
-#!/Users/zach/Dev/tts-arena/gradio-env/bin/python3.11
-# -*- coding: utf-8 -*-
-import re
-import sys
-from pip._internal.cli.main import main
-if __name__ == '__main__':
-    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.exit(main())
gradio-env/bin/pip3.11
DELETED
@@ -1,8 +0,0 @@
-#!/Users/zach/Dev/tts-arena/gradio-env/bin/python3.11
-# -*- coding: utf-8 -*-
-import re
-import sys
-from pip._internal.cli.main import main
-if __name__ == '__main__':
-    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
-    sys.exit(main())
gradio-env/bin/python
DELETED
@@ -1 +0,0 @@
-python3.11
gradio-env/bin/python3
DELETED
@@ -1 +0,0 @@
-python3.11
gradio-env/bin/python3.11
DELETED
@@ -1 +0,0 @@
-/opt/homebrew/Cellar/python@3.11/3.11.11/Frameworks/Python.framework/Versions/3.11/bin/python3.11
gradio-env/lib/python3.11/site-packages/_distutils_hack/__init__.py
DELETED
@@ -1,239 +0,0 @@
|
|
1 |
-
# don't import any costly modules
|
2 |
-
import os
|
3 |
-
import sys
|
4 |
-
|
5 |
-
report_url = (
|
6 |
-
"https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml"
|
7 |
-
)
|
8 |
-
|
9 |
-
|
10 |
-
def warn_distutils_present():
|
11 |
-
if 'distutils' not in sys.modules:
|
12 |
-
return
|
13 |
-
import warnings
|
14 |
-
|
15 |
-
warnings.warn(
|
16 |
-
"Distutils was imported before Setuptools, but importing Setuptools "
|
17 |
-
"also replaces the `distutils` module in `sys.modules`. This may lead "
|
18 |
-
"to undesirable behaviors or errors. To avoid these issues, avoid "
|
19 |
-
"using distutils directly, ensure that setuptools is installed in the "
|
20 |
-
"traditional way (e.g. not an editable install), and/or make sure "
|
21 |
-
"that setuptools is always imported before distutils."
|
22 |
-
)
|
23 |
-
|
24 |
-
|
25 |
-
def clear_distutils():
|
26 |
-
if 'distutils' not in sys.modules:
|
27 |
-
return
|
28 |
-
import warnings
|
29 |
-
|
30 |
-
warnings.warn(
|
31 |
-
"Setuptools is replacing distutils. Support for replacing "
|
32 |
-
"an already imported distutils is deprecated. In the future, "
|
33 |
-
"this condition will fail. "
|
34 |
-
f"Register concerns at {report_url}"
|
35 |
-
)
|
36 |
-
mods = [
|
37 |
-
name
|
38 |
-
for name in sys.modules
|
39 |
-
if name == "distutils" or name.startswith("distutils.")
|
40 |
-
]
|
41 |
-
for name in mods:
|
42 |
-
del sys.modules[name]
|
43 |
-
|
44 |
-
|
45 |
-
def enabled():
|
46 |
-
"""
|
47 |
-
Allow selection of distutils by environment variable.
|
48 |
-
"""
|
49 |
-
which = os.environ.get('SETUPTOOLS_USE_DISTUTILS', 'local')
|
50 |
-
if which == 'stdlib':
|
51 |
-
import warnings
|
52 |
-
|
53 |
-
warnings.warn(
|
54 |
-
"Reliance on distutils from stdlib is deprecated. Users "
|
55 |
-
"must rely on setuptools to provide the distutils module. "
|
56 |
-
"Avoid importing distutils or import setuptools first, "
|
57 |
-
"and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. "
|
58 |
-
f"Register concerns at {report_url}"
|
59 |
-
)
|
60 |
-
return which == 'local'
|
61 |
-
|
62 |
-
|
63 |
-
def ensure_local_distutils():
|
64 |
-
import importlib
|
65 |
-
|
66 |
-
clear_distutils()
|
67 |
-
|
68 |
-
# With the DistutilsMetaFinder in place,
|
69 |
-
# perform an import to cause distutils to be
|
70 |
-
# loaded from setuptools._distutils. Ref #2906.
|
71 |
-
with shim():
|
72 |
-
importlib.import_module('distutils')
|
73 |
-
|
74 |
-
# check that submodules load as expected
|
75 |
-
core = importlib.import_module('distutils.core')
|
76 |
-
assert '_distutils' in core.__file__, core.__file__
|
77 |
-
assert 'setuptools._distutils.log' not in sys.modules
|
78 |
-
|
79 |
-
|
80 |
-
def do_override():
|
81 |
-
"""
|
82 |
-
Ensure that the local copy of distutils is preferred over stdlib.
|
83 |
-
|
84 |
-
See https://github.com/pypa/setuptools/issues/417#issuecomment-392298401
|
85 |
-
for more motivation.
|
86 |
-
"""
|
87 |
-
if enabled():
|
88 |
-
warn_distutils_present()
|
89 |
-
ensure_local_distutils()
|
90 |
-
|
91 |
-
|
92 |
-
class _TrivialRe:
|
93 |
-
def __init__(self, *patterns) -> None:
|
94 |
-
self._patterns = patterns
|
95 |
-
|
96 |
-
def match(self, string):
|
97 |
-
return all(pat in string for pat in self._patterns)
|
98 |
-
|
99 |
-
|
100 |
-
class DistutilsMetaFinder:
|
101 |
-
def find_spec(self, fullname, path, target=None):
|
102 |
-
# optimization: only consider top level modules and those
|
103 |
-
# found in the CPython test suite.
|
104 |
-
if path is not None and not fullname.startswith('test.'):
|
105 |
-
return None
|
106 |
-
|
107 |
-
method_name = 'spec_for_{fullname}'.format(**locals())
|
108 |
-
method = getattr(self, method_name, lambda: None)
|
109 |
-
return method()
|
110 |
-
|
111 |
-
def spec_for_distutils(self):
|
112 |
-
if self.is_cpython():
|
113 |
-
return None
|
114 |
-
|
115 |
-
import importlib
|
116 |
-
import importlib.abc
|
117 |
-
import importlib.util
|
118 |
-
|
119 |
-
try:
|
120 |
-
mod = importlib.import_module('setuptools._distutils')
|
121 |
-
except Exception:
|
122 |
-
# There are a couple of cases where setuptools._distutils
|
123 |
-
# may not be present:
|
124 |
-
# - An older Setuptools without a local distutils is
|
125 |
-
# taking precedence. Ref #2957.
|
126 |
-
# - Path manipulation during sitecustomize removes
|
127 |
-
# setuptools from the path but only after the hook
|
128 |
-
# has been loaded. Ref #2980.
|
129 |
-
# In either case, fall back to stdlib behavior.
|
130 |
-
return None
|
131 |
-
|
132 |
-
class DistutilsLoader(importlib.abc.Loader):
|
133 |
-
def create_module(self, spec):
|
134 |
-
mod.__name__ = 'distutils'
|
135 |
-
return mod
|
136 |
-
|
137 |
-
def exec_module(self, module):
|
138 |
-
pass
|
139 |
-
|
140 |
-
return importlib.util.spec_from_loader(
|
141 |
-
'distutils', DistutilsLoader(), origin=mod.__file__
|
142 |
-
)
|
143 |
-
|
144 |
-
@staticmethod
|
145 |
-
def is_cpython():
|
146 |
-
"""
|
147 |
-
Suppress supplying distutils for CPython (build and tests).
|
148 |
-
Ref #2965 and #3007.
|
149 |
-
"""
|
150 |
-
return os.path.isfile('pybuilddir.txt')
|
151 |
-
|
152 |
-
def spec_for_pip(self):
|
153 |
-
"""
|
154 |
-
Ensure stdlib distutils when running under pip.
|
155 |
-
See pypa/pip#8761 for rationale.
|
156 |
-
"""
|
157 |
-
if sys.version_info >= (3, 12) or self.pip_imported_during_build():
|
158 |
-
return
|
159 |
-
-        clear_distutils()
-        self.spec_for_distutils = lambda: None
-
-    @classmethod
-    def pip_imported_during_build(cls):
-        """
-        Detect if pip is being imported in a build script. Ref #2355.
-        """
-        import traceback
-
-        return any(
-            cls.frame_file_is_setup(frame) for frame, line in traceback.walk_stack(None)
-        )
-
-    @staticmethod
-    def frame_file_is_setup(frame):
-        """
-        Return True if the indicated frame suggests a setup.py file.
-        """
-        # some frames may not have __file__ (#2940)
-        return frame.f_globals.get('__file__', '').endswith('setup.py')
-
-    def spec_for_sensitive_tests(self):
-        """
-        Ensure stdlib distutils when running select tests under CPython.
-
-        python/cpython#91169
-        """
-        clear_distutils()
-        self.spec_for_distutils = lambda: None
-
-    sensitive_tests = (
-        [
-            'test.test_distutils',
-            'test.test_peg_generator',
-            'test.test_importlib',
-        ]
-        if sys.version_info < (3, 10)
-        else [
-            'test.test_distutils',
-        ]
-    )
-
-
-for name in DistutilsMetaFinder.sensitive_tests:
-    setattr(
-        DistutilsMetaFinder,
-        f'spec_for_{name}',
-        DistutilsMetaFinder.spec_for_sensitive_tests,
-    )
-
-
-DISTUTILS_FINDER = DistutilsMetaFinder()
-
-
-def add_shim():
-    DISTUTILS_FINDER in sys.meta_path or insert_shim()
-
-
-class shim:
-    def __enter__(self) -> None:
-        insert_shim()
-
-    def __exit__(self, exc: object, value: object, tb: object) -> None:
-        _remove_shim()
-
-
-def insert_shim():
-    sys.meta_path.insert(0, DISTUTILS_FINDER)
-
-
-def _remove_shim():
-    try:
-        sys.meta_path.remove(DISTUTILS_FINDER)
-    except ValueError:
-        pass
-
-
-if sys.version_info < (3, 12):
-    # DistutilsMetaFinder can only be disabled in Python < 3.12 (PEP 632)
-    remove_shim = _remove_shim
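The hunk above deletes the shim plumbing: a single finder object is pushed to the front of `sys.meta_path` and later removed tolerantly. A minimal, self-contained sketch of that same pattern (the `NullFinder` name is hypothetical, standing in for `DistutilsMetaFinder`):

```python
import sys


class NullFinder:
    """Stand-in meta-path finder: returning None defers to later finders."""

    def find_spec(self, fullname, path=None, target=None):
        return None


FINDER = NullFinder()


def insert_shim():
    # Front of sys.meta_path: consulted before every other import finder.
    sys.meta_path.insert(0, FINDER)


def remove_shim():
    try:
        sys.meta_path.remove(FINDER)
    except ValueError:
        pass  # already gone; removal is idempotent, like the deleted code


insert_shim()
print(sys.meta_path[0] is FINDER)  # True
remove_shim()
```

Because `find_spec` returns `None`, imports still resolve normally while the shim is installed; the deleted module instead returns real specs for `distutils` names to redirect them at setuptools' bundled copy.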
gradio-env/lib/python3.11/site-packages/_distutils_hack/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (12 kB)

gradio-env/lib/python3.11/site-packages/_distutils_hack/__pycache__/override.cpython-311.pyc
DELETED
Binary file (324 Bytes)
gradio-env/lib/python3.11/site-packages/_distutils_hack/override.py
DELETED
@@ -1 +0,0 @@
-__import__('_distutils_hack').do_override()
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/INSTALLER
DELETED
@@ -1 +0,0 @@
-pip
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/LICENSE
DELETED
@@ -1,20 +0,0 @@
-This package contains a modified version of ca-bundle.crt:
-
-ca-bundle.crt -- Bundle of CA Root Certificates
-
-This is a bundle of X.509 certificates of public Certificate Authorities
-(CA). These were automatically extracted from Mozilla's root certificates
-file (certdata.txt). This file can be found in the mozilla source tree:
-https://hg.mozilla.org/mozilla-central/file/tip/security/nss/lib/ckfw/builtins/certdata.txt
-It contains the certificates in PEM format and therefore
-can be directly used with curl / libcurl / php_curl, or with
-an Apache+mod_ssl webserver for SSL client authentication.
-Just configure this file as the SSLCACertificateFile.#
-
-***** BEGIN LICENSE BLOCK *****
-This Source Code Form is subject to the terms of the Mozilla Public License,
-v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain
-one at http://mozilla.org/MPL/2.0/.
-
-***** END LICENSE BLOCK *****
-@(#) $RCSfile: certdata.txt,v $ $Revision: 1.80 $ $Date: 2011/11/03 15:11:58 $
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/METADATA
DELETED
@@ -1,68 +0,0 @@
-Metadata-Version: 2.1
-Name: certifi
-Version: 2024.12.14
-Summary: Python package for providing Mozilla's CA Bundle.
-Home-page: https://github.com/certifi/python-certifi
-Author: Kenneth Reitz
-Author-email: [email protected]
-License: MPL-2.0
-Project-URL: Source, https://github.com/certifi/python-certifi
-Classifier: Development Status :: 5 - Production/Stable
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
-Classifier: Natural Language :: English
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.6
-Classifier: Programming Language :: Python :: 3.7
-Classifier: Programming Language :: Python :: 3.8
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Requires-Python: >=3.6
-License-File: LICENSE
-
-Certifi: Python SSL Certificates
-================================
-
-Certifi provides Mozilla's carefully curated collection of Root Certificates for
-validating the trustworthiness of SSL certificates while verifying the identity
-of TLS hosts. It has been extracted from the `Requests`_ project.
-
-Installation
-------------
-
-``certifi`` is available on PyPI. Simply install it with ``pip``::
-
-    $ pip install certifi
-
-Usage
------
-
-To reference the installed certificate authority (CA) bundle, you can use the
-built-in function::
-
-    >>> import certifi
-
-    >>> certifi.where()
-    '/usr/local/lib/python3.7/site-packages/certifi/cacert.pem'
-
-Or from the command line::
-
-    $ python -m certifi
-    /usr/local/lib/python3.7/site-packages/certifi/cacert.pem
-
-Enjoy!
-
-.. _`Requests`: https://requests.readthedocs.io/en/master/
-
-Addition/Removal of Certificates
---------------------------------
-
-Certifi does not support any addition/removal or other modification of the
-CA trust store content. This project is intended to provide a reliable and
-highly portable root of trust to python deployments. Look to upstream projects
-for methods to use alternate trust.
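The deleted METADATA documents `certifi.where()`; as a hedged illustration of the typical call site, here is the bundle wired into a standard-library SSL context (assumes `certifi` is installed in the running environment):

```python
import os
import ssl

import certifi

# Path to the bundled cacert.pem; usable anywhere a CA file path is accepted.
bundle = certifi.where()
print(bundle)

# An SSL context that verifies peers against Mozilla's curated roots.
ctx = ssl.create_default_context(cafile=bundle)
print(os.path.exists(bundle))  # True
```

`ssl.create_default_context()` already enables hostname checking and certificate verification; passing `cafile` only swaps in certifi's trust store.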
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/RECORD
DELETED
@@ -1,14 +0,0 @@
-certifi-2024.12.14.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
-certifi-2024.12.14.dist-info/LICENSE,sha256=6TcW2mucDVpKHfYP5pWzcPBpVgPSH2-D8FPkLPwQyvc,989
-certifi-2024.12.14.dist-info/METADATA,sha256=z71eRGTFszr4qsHenZ_vG2Fd5bV9PBWmJgShthc8IkY,2274
-certifi-2024.12.14.dist-info/RECORD,,
-certifi-2024.12.14.dist-info/WHEEL,sha256=PZUExdf71Ui_so67QXpySuHtCi3-J3wvF4ORK6k_S8U,91
-certifi-2024.12.14.dist-info/top_level.txt,sha256=KMu4vUCfsjLrkPbSNdgdekS-pVJzBAJFO__nI8NF6-U,8
-certifi/__init__.py,sha256=LqjNcwt1sYSS3uhPXrf6jJzVCuHtNVpuirg5rb7mVm8,94
-certifi/__main__.py,sha256=xBBoj905TUWBLRGANOcf7oi6e-3dMP4cEoG9OyMs11g,243
-certifi/__pycache__/__init__.cpython-311.pyc,,
-certifi/__pycache__/__main__.cpython-311.pyc,,
-certifi/__pycache__/core.cpython-311.pyc,,
-certifi/cacert.pem,sha256=gHiXJU84Oif0XkT0llbzeKurIUHt5DpK08JCCll90j8,294769
-certifi/core.py,sha256=qRDDFyXVJwTB_EmoGppaXU_R9qCZvhl-EzxPMuV3nTA,4426
-certifi/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/WHEEL
DELETED
@@ -1,5 +0,0 @@
-Wheel-Version: 1.0
-Generator: setuptools (75.6.0)
-Root-Is-Purelib: true
-Tag: py3-none-any
-
gradio-env/lib/python3.11/site-packages/certifi-2024.12.14.dist-info/top_level.txt
DELETED
@@ -1 +0,0 @@
-certifi
gradio-env/lib/python3.11/site-packages/certifi/__init__.py
DELETED
@@ -1,4 +0,0 @@
-from .core import contents, where
-
-__all__ = ["contents", "where"]
-__version__ = "2024.12.14"
gradio-env/lib/python3.11/site-packages/certifi/__main__.py
DELETED
@@ -1,12 +0,0 @@
-import argparse
-
-from certifi import contents, where
-
-parser = argparse.ArgumentParser()
-parser.add_argument("-c", "--contents", action="store_true")
-args = parser.parse_args()
-
-if args.contents:
-    print(contents())
-else:
-    print(where())
gradio-env/lib/python3.11/site-packages/certifi/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (336 Bytes)

gradio-env/lib/python3.11/site-packages/certifi/__pycache__/__main__.cpython-311.pyc
DELETED
Binary file (725 Bytes)

gradio-env/lib/python3.11/site-packages/certifi/__pycache__/core.cpython-311.pyc
DELETED
Binary file (3.77 kB)

gradio-env/lib/python3.11/site-packages/certifi/cacert.pem
DELETED
The diff for this file is too large to render.
gradio-env/lib/python3.11/site-packages/certifi/core.py
DELETED
@@ -1,114 +0,0 @@
-"""
-certifi.py
-~~~~~~~~~~
-
-This module returns the installation location of cacert.pem or its contents.
-"""
-import sys
-import atexit
-
-def exit_cacert_ctx() -> None:
-    _CACERT_CTX.__exit__(None, None, None)  # type: ignore[union-attr]
-
-
-if sys.version_info >= (3, 11):
-
-    from importlib.resources import as_file, files
-
-    _CACERT_CTX = None
-    _CACERT_PATH = None
-
-    def where() -> str:
-        # This is slightly terrible, but we want to delay extracting the file
-        # in cases where we're inside of a zipimport situation until someone
-        # actually calls where(), but we don't want to re-extract the file
-        # on every call of where(), so we'll do it once then store it in a
-        # global variable.
-        global _CACERT_CTX
-        global _CACERT_PATH
-        if _CACERT_PATH is None:
-            # This is slightly janky, the importlib.resources API wants you to
-            # manage the cleanup of this file, so it doesn't actually return a
-            # path, it returns a context manager that will give you the path
-            # when you enter it and will do any cleanup when you leave it. In
-            # the common case of not needing a temporary file, it will just
-            # return the file system location and the __exit__() is a no-op.
-            #
-            # We also have to hold onto the actual context manager, because
-            # it will do the cleanup whenever it gets garbage collected, so
-            # we will also store that at the global level as well.
-            _CACERT_CTX = as_file(files("certifi").joinpath("cacert.pem"))
-            _CACERT_PATH = str(_CACERT_CTX.__enter__())
-            atexit.register(exit_cacert_ctx)
-
-        return _CACERT_PATH
-
-    def contents() -> str:
-        return files("certifi").joinpath("cacert.pem").read_text(encoding="ascii")
-
-elif sys.version_info >= (3, 7):
-
-    from importlib.resources import path as get_path, read_text
-
-    _CACERT_CTX = None
-    _CACERT_PATH = None
-
-    def where() -> str:
-        # This is slightly terrible, but we want to delay extracting the
-        # file in cases where we're inside of a zipimport situation until
-        # someone actually calls where(), but we don't want to re-extract
-        # the file on every call of where(), so we'll do it once then store
-        # it in a global variable.
-        global _CACERT_CTX
-        global _CACERT_PATH
-        if _CACERT_PATH is None:
-            # This is slightly janky, the importlib.resources API wants you
-            # to manage the cleanup of this file, so it doesn't actually
-            # return a path, it returns a context manager that will give
-            # you the path when you enter it and will do any cleanup when
-            # you leave it. In the common case of not needing a temporary
-            # file, it will just return the file system location and the
-            # __exit__() is a no-op.
-            #
-            # We also have to hold onto the actual context manager, because
-            # it will do the cleanup whenever it gets garbage collected, so
-            # we will also store that at the global level as well.
-            _CACERT_CTX = get_path("certifi", "cacert.pem")
-            _CACERT_PATH = str(_CACERT_CTX.__enter__())
-            atexit.register(exit_cacert_ctx)
-
-        return _CACERT_PATH
-
-    def contents() -> str:
-        return read_text("certifi", "cacert.pem", encoding="ascii")
-
-else:
-    import os
-    import types
-    from typing import Union
-
-    Package = Union[types.ModuleType, str]
-    Resource = Union[str, "os.PathLike"]
-
-    # This fallback will work for Python versions prior to 3.7 that lack the
-    # importlib.resources module but relies on the existing `where` function
-    # so won't address issues with environments like PyOxidizer that don't set
-    # __file__ on modules.
-    def read_text(
-        package: Package,
-        resource: Resource,
-        encoding: str = 'utf-8',
-        errors: str = 'strict'
-    ) -> str:
-        with open(where(), encoding=encoding) as data:
-            return data.read()
-
-    # If we don't have importlib.resources, then we will just do the old logic
-    # of assuming we're on the filesystem and munge the path directly.
-    def where() -> str:
-        f = os.path.dirname(__file__)
-
-        return os.path.join(f, "cacert.pem")
-
-    def contents() -> str:
-        return read_text("certifi", "cacert.pem", encoding="ascii")
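The deleted `where()` above caches an `as_file()` context manager and defers its cleanup to `atexit`, so a resource buried in a zipped package is extracted at most once. A small sketch of that lazy-extraction pattern for an arbitrary package resource (`resource_path` is a made-up helper; `json`/`__init__.py` merely serve as a resource that always exists):

```python
import atexit
from importlib.resources import as_file, files

_CTX = None
_PATH = None


def resource_path(package: str, name: str) -> str:
    """Resolve a package resource to a real filesystem path once, caching it."""
    global _CTX, _PATH
    if _PATH is None:
        # files()/as_file() yield a context manager: entering it gives a path,
        # extracting to a temporary file only if the package lives in a zip.
        _CTX = as_file(files(package).joinpath(name))
        _PATH = str(_CTX.__enter__())
        # Delay cleanup of any extracted copy until interpreter shutdown.
        atexit.register(lambda: _CTX.__exit__(None, None, None))
    return _PATH


path = resource_path("json", "__init__.py")
```

Holding the context manager in a global matters: if it were garbage collected, any temporary extraction would be deleted while callers still hold the path.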
gradio-env/lib/python3.11/site-packages/certifi/py.typed
DELETED
File without changes
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/INSTALLER
DELETED
@@ -1 +0,0 @@
-pip
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/LICENSE
DELETED
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2025 TAHRI Ahmed R.
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/METADATA
DELETED
@@ -1,721 +0,0 @@
|
|
1 |
-
Metadata-Version: 2.1
|
2 |
-
Name: charset-normalizer
|
3 |
-
Version: 3.4.1
|
4 |
-
Summary: The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
|
5 |
-
Author-email: "Ahmed R. TAHRI" <[email protected]>
|
6 |
-
Maintainer-email: "Ahmed R. TAHRI" <[email protected]>
|
7 |
-
License: MIT
|
8 |
-
Project-URL: Changelog, https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md
|
9 |
-
Project-URL: Documentation, https://charset-normalizer.readthedocs.io/
|
10 |
-
Project-URL: Code, https://github.com/jawah/charset_normalizer
|
11 |
-
Project-URL: Issue tracker, https://github.com/jawah/charset_normalizer/issues
|
12 |
-
Keywords: encoding,charset,charset-detector,detector,normalization,unicode,chardet,detect
|
13 |
-
Classifier: Development Status :: 5 - Production/Stable
|
14 |
-
Classifier: Intended Audience :: Developers
|
15 |
-
Classifier: License :: OSI Approved :: MIT License
|
16 |
-
Classifier: Operating System :: OS Independent
|
17 |
-
Classifier: Programming Language :: Python
|
18 |
-
Classifier: Programming Language :: Python :: 3
|
19 |
-
Classifier: Programming Language :: Python :: 3.7
|
20 |
-
Classifier: Programming Language :: Python :: 3.8
|
21 |
-
Classifier: Programming Language :: Python :: 3.9
|
22 |
-
Classifier: Programming Language :: Python :: 3.10
|
23 |
-
Classifier: Programming Language :: Python :: 3.11
|
24 |
-
Classifier: Programming Language :: Python :: 3.12
|
25 |
-
Classifier: Programming Language :: Python :: 3.13
|
26 |
-
Classifier: Programming Language :: Python :: 3 :: Only
|
27 |
-
Classifier: Programming Language :: Python :: Implementation :: CPython
|
28 |
-
Classifier: Programming Language :: Python :: Implementation :: PyPy
|
29 |
-
Classifier: Topic :: Text Processing :: Linguistic
|
30 |
-
Classifier: Topic :: Utilities
|
31 |
-
Classifier: Typing :: Typed
|
32 |
-
Requires-Python: >=3.7
|
33 |
-
Description-Content-Type: text/markdown
|
34 |
-
License-File: LICENSE
|
35 |
-
Provides-Extra: unicode-backport
|
36 |
-
|
37 |
-
<h1 align="center">Charset Detection, for Everyone 👋</h1>
|
38 |
-
|
39 |
-
<p align="center">
|
40 |
-
<sup>The Real First Universal Charset Detector</sup><br>
|
41 |
-
<a href="https://pypi.org/project/charset-normalizer">
|
42 |
-
<img src="https://img.shields.io/pypi/pyversions/charset_normalizer.svg?orange=blue" />
|
43 |
-
</a>
|
44 |
-
<a href="https://pepy.tech/project/charset-normalizer/">
|
45 |
-
<img alt="Download Count Total" src="https://static.pepy.tech/badge/charset-normalizer/month" />
|
46 |
-
</a>
|
47 |
-
<a href="https://bestpractices.coreinfrastructure.org/projects/7297">
|
48 |
-
<img src="https://bestpractices.coreinfrastructure.org/projects/7297/badge">
|
49 |
-
</a>
|
50 |
-
</p>
|
51 |
-
<p align="center">
|
52 |
-
<sup><i>Featured Packages</i></sup><br>
|
53 |
-
<a href="https://github.com/jawah/niquests">
|
54 |
-
<img alt="Static Badge" src="https://img.shields.io/badge/Niquests-Best_HTTP_Client-cyan">
|
55 |
-
</a>
|
56 |
-
<a href="https://github.com/jawah/wassima">
|
57 |
-
<img alt="Static Badge" src="https://img.shields.io/badge/Wassima-Certifi_Killer-cyan">
|
58 |
-
</a>
|
59 |
-
</p>
|
60 |
-
<p align="center">
|
61 |
-
<sup><i>In other language (unofficial port - by the community)</i></sup><br>
|
62 |
-
<a href="https://github.com/nickspring/charset-normalizer-rs">
|
63 |
-
<img alt="Static Badge" src="https://img.shields.io/badge/Rust-red">
|
64 |
-
</a>
|
65 |
-
</p>
|
66 |
-
|
67 |
-
> A library that helps you read text from an unknown charset encoding.<br /> Motivated by `chardet`,
|
68 |
-
> I'm trying to resolve the issue by taking a new approach.
|
69 |
-
> All IANA character set names for which the Python core library provides codecs are supported.
|
70 |
-
|
71 |
-
<p align="center">
|
72 |
-
>>>>> <a href="https://charsetnormalizerweb.ousret.now.sh" target="_blank">👉 Try Me Online Now, Then Adopt Me 👈 </a> <<<<<
|
73 |
-
</p>
|
74 |
-
|
75 |
-
This project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.
|
76 |
-
|
77 |
-
| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |
|
78 |
-
|--------------------------------------------------|:---------------------------------------------:|:--------------------------------------------------------------------------------------------------:|:-----------------------------------------------:|
|
79 |
-
| `Fast` | ❌ | ✅ | ✅ |
|
80 |
-
| `Universal**` | ❌ | ✅ | ❌ |
|
81 |
-
| `Reliable` **without** distinguishable standards | ❌ | ✅ | ✅ |
|
82 |
-
| `Reliable` **with** distinguishable standards | ✅ | ✅ | ✅ |
|
83 |
-
| `License` | LGPL-2.1<br>_restrictive_ | MIT | MPL-1.1<br>_restrictive_ |
|
84 |
-
| `Native Python` | ✅ | ✅ | ❌ |
|
85 |
-
| `Detect spoken language` | ❌ | ✅ | N/A |
|
86 |
-
| `UnicodeDecodeError Safety` | ❌ | ✅ | ❌ |
|
87 |
-
| `Whl Size (min)` | 193.6 kB | 42 kB | ~200 kB |
|
88 |
-
| `Supported Encoding` | 33 | 🎉 [99](https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings) | 40 |
|
89 |
-
|
90 |
-
<p align="center">
|
91 |
-
<img src="https://i.imgflip.com/373iay.gif" alt="Reading Normalized Text" width="226"/><img src="https://media.tenor.com/images/c0180f70732a18b4965448d33adba3d0/tenor.gif" alt="Cat Reading Text" width="200"/>
|
92 |
-
</p>
|
93 |
-
|
94 |
-
*\*\* : They are clearly using specific code for a specific encoding even if covering most of used one*<br>
|
95 |
-
|
96 |
-
## ⚡ Performance
|
97 |
-
|
98 |
-
This package offer better performance than its counterpart Chardet. Here are some numbers.
|
99 |
-
|
100 |
-
| Package | Accuracy | Mean per file (ms) | File per sec (est) |
|
101 |
-
|-----------------------------------------------|:--------:|:------------------:|:------------------:|
|
102 |
-
| [chardet](https://github.com/chardet/chardet) | 86 % | 63 ms | 16 file/sec |
|
103 |
-
| charset-normalizer | **98 %** | **10 ms** | 100 file/sec |
|
104 |
-
|
105 |
-
| Package | 99th percentile | 95th percentile | 50th percentile |
|
106 |
-
|-----------------------------------------------|:---------------:|:---------------:|:---------------:|
|
107 |
-
| [chardet](https://github.com/chardet/chardet) | 265 ms | 71 ms | 7 ms |
|
108 |
-
| charset-normalizer | 100 ms | 50 ms | 5 ms |
|
109 |
-
|
110 |
-
_updated as of december 2024 using CPython 3.12_
|
111 |
-
|
112 |
-
Chardet's performance on larger file (1MB+) are very poor. Expect huge difference on large payload.
|
113 |
-
|
114 |
-
> Stats are generated using 400+ files using default parameters. More details on used files, see GHA workflows.
|
115 |
-
> And yes, these results might change at any time. The dataset can be updated to include more files.
|
116 |
-
> The actual delays heavily depends on your CPU capabilities. The factors should remain the same.
|
117 |
-
> Keep in mind that the stats are generous and that Chardet accuracy vs our is measured using Chardet initial capability
|
118 |
-
> (e.g. Supported Encoding) Challenge-them if you want.
|
119 |
-
|
120 |
-
## ✨ Installation
|
121 |
-
|
122 |
-
Using pip:
|
123 |
-
|
124 |
-
```sh
|
125 |
-
pip install charset-normalizer -U
|
126 |
-
```
|
127 |
-
|
128 |
-
## 🚀 Basic Usage
|
129 |
-
|
130 |
-
### CLI
|
131 |
-
This package comes with a CLI.
|
132 |
-
|
133 |
-
```
|
134 |
-
usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
|
135 |
-
file [file ...]
|
136 |
-
|
137 |
-
The Real First Universal Charset Detector. Discover originating encoding used
|
138 |
-
on text file. Normalize text to unicode.
|
139 |
-
|
140 |
-
positional arguments:
|
141 |
-
files File(s) to be analysed
|
142 |
-
|
143 |
-
optional arguments:
|
144 |
-
-h, --help show this help message and exit
|
145 |
-
-v, --verbose Display complementary information about file if any.
|
146 |
-
Stdout will contain logs about the detection process.
|
147 |
-
-a, --with-alternative
|
148 |
-
Output complementary possibilities if any. Top-level
|
149 |
-
JSON WILL be a list.
|
150 |
-
-n, --normalize Permit to normalize input file. If not set, program
|
151 |
-
does not write anything.
|
152 |
-
-m, --minimal Only output the charset detected to STDOUT. Disabling
|
153 |
-
JSON output.
|
154 |
-
-r, --replace Replace file when trying to normalize it instead of
|
155 |
-
creating a new one.
|
156 |
-
-f, --force Replace file without asking if you are sure, use this
|
157 |
-
flag with caution.
|
158 |
-
-t THRESHOLD, --threshold THRESHOLD
|
159 |
-
Define a custom maximum amount of chaos allowed in
|
160 |
-
decoded content. 0. <= chaos <= 1.
|
161 |
-
--version Show version information and exit.
|
162 |
-
```
|
163 |
-
|
164 |
-
```bash
|
165 |
-
normalizer ./data/sample.1.fr.srt
|
166 |
-
```
|
167 |
-
|
168 |
-
or
|
169 |
-
|
170 |
-
```bash
|
171 |
-
python -m charset_normalizer ./data/sample.1.fr.srt
|
172 |
-
```
|
173 |
-
|
174 |
-
🎉 Since version 1.4.0 the CLI produce easily usable stdout result in JSON format.
|
175 |
-
|
176 |
-
```json
|
177 |
-
{
|
178 |
-
"path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
|
179 |
-
"encoding": "cp1252",
|
180 |
-
"encoding_aliases": [
|
181 |
-
"1252",
|
182 |
-
"windows_1252"
|
183 |
-
],
|
184 |
-
"alternative_encodings": [
|
185 |
-
"cp1254",
|
186 |
-
"cp1256",
|
187 |
-
"cp1258",
|
188 |
-
"iso8859_14",
|
189 |
-
"iso8859_15",
|
190 |
-
"iso8859_16",
|
191 |
-
"iso8859_3",
|
192 |
-
"iso8859_9",
|
193 |
-
"latin_1",
|
194 |
-
"mbcs"
|
195 |
-
],
|
196 |
-
"language": "French",
|
197 |
-
"alphabets": [
|
198 |
-
"Basic Latin",
|
199 |
-
"Latin-1 Supplement"
|
200 |
-
],
|
201 |
-
"has_sig_or_bom": false,
|
202 |
-
"chaos": 0.149,
|
203 |
-
"coherence": 97.152,
|
204 |
-
"unicode_path": null,
|
205 |
-
"is_preferred": true
|
206 |
-
}
|
207 |
-
```
|
208 |
-
|
209 |
-
### Python
|
210 |
-
*Just print out normalized text*
|
211 |
-
```python
|
212 |
-
from charset_normalizer import from_path
|
213 |
-
|
214 |
-
results = from_path('./my_subtitle.srt')
|
215 |
-
|
216 |
-
print(str(results.best()))
|
217 |
-
```
|
218 |
-
|
219 |
-
*Upgrade your code without effort*
|
220 |
-
```python
|
221 |
-
from charset_normalizer import detect
|
222 |
-
```
|
223 |
-
|
224 |
-
The above code will behave the same as **chardet**. We ensure that we offer the best (reasonable) BC result possible.
|
225 |
-
|
226 |
-
See the docs for advanced usage : [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)
|
227 |
-
|
228 |
-
## 😇 Why
|
229 |
-
|
230 |
-
When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
|
231 |
-
reliable alternative using a completely different method. Also! I never back down on a good challenge!
|
232 |
-
|
233 |
-
I **don't care** about the **originating charset** encoding, because **two different tables** can
|
234 |
-
produce **two identical rendered string.**
|
235 |
-
What I want is to get readable text, the best I can.
|
236 |
-
|
237 |
-
In a way, **I'm brute forcing text decoding.** How cool is that ? 😎
|
238 |
-
|
239 |
-
Don't confuse package **ftfy** with charset-normalizer or chardet. ftfy goal is to repair Unicode string whereas charset-normalizer to convert raw file in unknown encoding to unicode.
|
240 |
-
|
241 |
-
## 🍰 How
|
242 |
-
|
243 |
-
- Discard all charset encoding table that could not fit the binary content.
|
244 |
-
- Measure noise, or the mess once opened (by chunks) with a corresponding charset encoding.
|
245 |
-
- Extract matches with the lowest mess detected.
|
246 |
-
- Additionally, we measure coherence / probe for a language.
|
247 |
-
|
248 |
-
**Wait a minute**, what is noise/mess and coherence according to **YOU ?**
|
249 |
-
|
250 |
-
*Noise :* I opened hundred of text files, **written by humans**, with the wrong encoding table. **I observed**, then
|
251 |
-
**I established** some ground rules about **what is obvious** when **it seems like** a mess (aka. defining noise in rendered text).
|
252 |
-
I know that my interpretation of what is noise is probably incomplete, feel free to contribute in order to
|
253 |
-
improve or rewrite it.
|
254 |
-
|
255 |
-
*Coherence :* For each language there is on earth, we have computed ranked letter appearance occurrences (the best we can). So I thought
|
256 |
-
that intel is worth something here. So I use those records against decoded text to check if I can detect intelligent design.
|
257 |
-
|
258 |
-
## ⚡ Known limitations
|
259 |
-
|
260 |
-
- Language detection is unreliable when text contains two or more languages sharing identical letters. (eg. HTML (english tags) + Turkish content (Sharing Latin characters))
|
261 |
-
- Every charset detector heavily depends on sufficient content. In common cases, do not bother run detection on very tiny content.
|
262 |
-
|
263 |
-
## ⚠️ About Python EOLs
|
264 |
-
|
265 |
-
**If you are running:**
|
266 |
-
|
267 |
-
- Python >=2.7,<3.5: Unsupported
|
268 |
-
- Python 3.5: charset-normalizer < 2.1
|
269 |
-
- Python 3.6: charset-normalizer < 3.1
|
270 |
-
- Python 3.7: charset-normalizer < 4.0
|
271 |
-
|
272 |
-
Upgrade your Python interpreter as soon as possible.
|
273 |
-
|
274 |
-
## 👤 Contributing
|
275 |
-
|
276 |
-
Contributions, issues and feature requests are very much welcome.<br />
|
277 |
-
Feel free to check [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.
|
278 |
-
|
279 |
-
## 📝 License
|
280 |
-
|
281 |
-
Copyright © [Ahmed TAHRI @Ousret](https://github.com/Ousret).<br />
|
282 |
-
This project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.
|
283 |
-
|
284 |
-
Characters frequencies used in this project © 2012 [Denny Vrandečić](http://simia.net/letters/)
|
285 |
-
|
286 |
-
## 💼 For Enterprise
|
287 |
-
|
288 |
-
Professional support for charset-normalizer is available as part of the [Tidelift
|
289 |
-
Subscription][1]. Tidelift gives software development teams a single source for
|
290 |
-
purchasing and maintaining their software, with professional grade assurances
|
291 |
-
from the experts who know it best, while seamlessly integrating with existing
|
292 |
-
tools.
|
293 |
-
|
294 |
-
[1]: https://tidelift.com/subscription/pkg/pypi-charset-normalizer?utm_source=pypi-charset-normalizer&utm_medium=readme
|
295 |
-
|
296 |
-
[](https://www.bestpractices.dev/projects/7297)
|
297 |
-
|
298 |
-
# Changelog

All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.4.1](https://github.com/Ousret/charset_normalizer/compare/3.4.0...3.4.1) (2024-12-24)

### Changed
- Project metadata is now stored in `pyproject.toml` instead of `setup.cfg`, using setuptools as the build backend.
- Enforce delayed annotation loading for simpler and more consistent types across the project.
- Optional mypyc compilation upgraded to version 1.14 for Python >= 3.8

### Added
- pre-commit configuration.
- noxfile.

### Removed
- `build-requirements.txt`, as per using `pyproject.toml` native build configuration.
- `bin/integration.py` and `bin/serve.py` in favor of downstream integration tests (see noxfile).
- `setup.cfg` in favor of `pyproject.toml` metadata configuration.
- Unused `utils.range_scan` function.

### Fixed
- Converting content to Unicode bytes may insert `utf_8` instead of the preferred `utf-8`. (#572)
- Deprecation warning "'count' is passed as positional argument" when converting to Unicode bytes on Python 3.13+

## [3.4.0](https://github.com/Ousret/charset_normalizer/compare/3.3.2...3.4.0) (2024-10-08)

### Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13 (#512)

### Fixed
- Relaxed the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback. (#520) (#509) (#498) (#407) (#537)
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes. (#381)

## [3.3.2](https://github.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2) (2023-10-31)

### Fixed
- Unintentional memory usage regression when using a large payload that matches several encodings (#376)
- Regression on some detection cases showcased in the documentation (#371)

### Added
- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)

## [3.3.1](https://github.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1) (2023-10-22)

### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

## [3.3.0](https://github.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0) (2023-09-30)

### Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encoding.aliases` as they have no alias (#323)

### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.8

### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close, due to an unreachable condition in \_\_lt\_\_ (#350)

## [3.2.0](https://github.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0) (2023-06-07)

### Changed
- Type hint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability

### Added
- Introduce function `is_binary`, which relies on the main capabilities and is optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12

### Fixed
- Edge case detection failure where a file would contain a 'very-long' camel-cased word (Issue #289)

## [3.1.0](https://github.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0) (2023-03-06)

### Added
- Argument `should_rename_legacy` for the legacy function `detect`, which now disregards any new arguments without errors (PR #262)

### Removed
- Support for Python 3.6 (PR #260)

### Changed
- Optional speedup provided by mypy/c 1.0.1

## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1) (2022-11-18)

### Fixed
- Multi-byte cutter/chunk generator did not always cut correctly (PR #233)

### Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added
- Extended the capability of explain=True: when cp_isolation contains at most two entries (min one), details of the Mess-detector results will be logged
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

### Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed
- CLI with opt --normalize failed when using the full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters had been fed to it
- Sphinx warnings when generating the documentation

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (unreliable/conflicts with ASCII)
- Breaking: class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: top-level function `normalize`
- Breaking: properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extended the capability of explain=True: when cp_isolation contains at most two entries (min one), details of the Mess-detector results will be logged
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio

### Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter

### Fixed
- CLI with opt --normalize failed when using the full path for files
- TooManyAccentuatedPlugin induced false positives on the mess detection when too few alpha characters had been fed to it

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

### Removed
- Breaking: methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (unreliable/conflicts with ASCII)

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: module `md.py` can be compiled using mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed
- Breaking: class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: top-level function `normalize`
- Breaking: properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated
- Function `normalize` scheduled for removal in 3.0

### Changed
- Removed useless call to decode in fn is_unprintable (#206)

### Fixed
- Third-party library (i18n xgettext) crashing by not recognizing utf_8 (PEP 263) with underscore, from [@aleksandernovikov](https://github.com/aleksandernovikov) (#204)

## [2.1.0](https://github.com/Ousret/charset_normalizer/compare/2.0.12...2.1.0) (2022-06-19)

### Added
- Output the Unicode table version when running the CLI with `--version` (PR #194)

### Changed
- Re-use decoded buffer for single-byte character sets from [@nijel](https://github.com/nijel) (PR #175)
- Fixed some performance bottlenecks from [@deedy5](https://github.com/deedy5) (PR #183)

### Fixed
- Workaround for a potential bug in CPython with Zero Width No-Break Space located in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as a space (PR #175)
- CLI default threshold aligned with the API threshold from [@oleksandr-kuzmenko](https://github.com/oleksandr-kuzmenko) (PR #181)

### Removed
- Support for Python 3.5 (PR #192)

### Deprecated
- Use of the backport unicodedata from `unicodedata2`, as Python is quickly catching up; scheduled for removal in 3.0 (PR #194)

## [2.0.12](https://github.com/Ousret/charset_normalizer/compare/2.0.11...2.0.12) (2022-02-12)

### Fixed
- ASCII mis-detection in rare cases (PR #170)

## [2.0.11](https://github.com/Ousret/charset_normalizer/compare/2.0.10...2.0.11) (2022-01-30)

### Added
- Explicit support for Python 3.11 (PR #164)

### Changed
- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels (PR #163 #165)

## [2.0.10](https://github.com/Ousret/charset_normalizer/compare/2.0.9...2.0.10) (2022-01-04)

### Fixed
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences (PR #154)

### Changed
- Skip the language detection (CD) on ASCII (PR #155)

## [2.0.9](https://github.com/Ousret/charset_normalizer/compare/2.0.8...2.0.9) (2021-12-03)

### Changed
- Moderating the logging impact (since 2.0.8) for specific environments (PR #147)

### Fixed
- Wrong logging level applied when setting kwarg `explain` to True (PR #146)

## [2.0.8](https://github.com/Ousret/charset_normalizer/compare/2.0.7...2.0.8) (2021-11-24)
### Changed
- Improvement over Vietnamese detection (PR #126)
- MD improvement on trailing data and long foreign (non-pure-latin) data (PR #124)
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar) (PR #122)
- Call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar) (PR #129)
- Code style as refactored by Sourcery-AI (PR #131)
- Minor adjustment on the MD around European words (PR #133)
- Remove and replace SRTs from assets / tests (PR #139)
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Setting kwarg `explain` to True will provisionally add (bounded to the function lifespan) a specific stream handler (PR #135)

### Fixed
- Fix large (misleading) sequences giving UnicodeDecodeError (PR #137)
- Avoid using too-insignificant chunks (PR #137)

### Added
- Add and expose function `set_logging_handler` to configure a specific StreamHandler from [@nmaynes](https://github.com/nmaynes) (PR #135)
- Add `CHANGELOG.md` entries; the format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) (PR #141)

## [2.0.7](https://github.com/Ousret/charset_normalizer/compare/2.0.6...2.0.7) (2021-10-11)
### Added
- Add support for Kazakh (Cyrillic) language detection (PR #109)

### Changed
- Further improve inferring the language from a given single-byte code page (PR #112)
- Vainly trying to leverage PEP 263 when PEP 3120 is not supported (PR #116)
- Refactoring for potential performance improvements in loops from [@adbar](https://github.com/adbar) (PR #113)
- Various detection improvements (MD+CD) (PR #117)

### Removed
- Remove redundant logging entry about detected language(s) (PR #115)

### Fixed
- Fix a minor inconsistency between Python 3.5 and other versions regarding language detection (PR #117 #102)

## [2.0.6](https://github.com/Ousret/charset_normalizer/compare/2.0.5...2.0.6) (2021-09-18)
### Fixed
- Unforeseen regression with the loss of backward compatibility with some older minor releases of Python 3.5.x (PR #100)
- Fix CLI crash when using --minimal output in certain cases (PR #103)

### Changed
- Minor improvement to the detection efficiency (less than 1%) (PR #106 #101)

## [2.0.5](https://github.com/Ousret/charset_normalizer/compare/2.0.4...2.0.5) (2021-09-14)
### Changed
- The project now complies with flake8, mypy, isort and black to ensure a better overall quality (PR #81)
- The BC support with v1.x was improved; the old staticmethods are restored (PR #82)
- The Unicode detection is slightly improved (PR #93)
- Add syntax sugar \_\_bool\_\_ for the results CharsetMatches list-container (PR #91)

### Removed
- The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead (PR #92)

### Fixed
- In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and mislead the mess detection (PR #95)
- Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection (PR #96)
- The MANIFEST.in was not exhaustive (PR #78)

## [2.0.4](https://github.com/Ousret/charset_normalizer/compare/2.0.3...2.0.4) (2021-07-30)
### Fixed
- The CLI no longer raises an unexpected exception when no encoding has been found (PR #70)
- Fix accessing the 'alphabets' property when the payload contains surrogate characters (PR #68)
- The logger could mislead (explain=True) on detected languages and the impact of one MBCS match (PR #72)
- Submatch factoring could be wrong in rare edge cases (PR #72)
- Multiple files given to the CLI were ignored when publishing results to STDOUT (after the first path) (PR #72)
- Fix line endings from CRLF to LF for certain project files (PR #67)

### Changed
- Adjusted the MD to lower the sensitivity, thus improving the global detection reliability (PR #69 #76)
- Allow fallback on the specified encoding, if any (PR #71)

## [2.0.3](https://github.com/Ousret/charset_normalizer/compare/2.0.2...2.0.3) (2021-07-16)
### Changed
- Part of the detection mechanism has been improved to be less sensitive, resulting in more accurate detection results, especially for ASCII (PR #63)
- According to the community's wishes, the detection will fall back on ASCII or UTF-8 as a last resort (PR #64)

## [2.0.2](https://github.com/Ousret/charset_normalizer/compare/2.0.1...2.0.2) (2021-07-15)
### Fixed
- Empty/too-small JSON payload mis-detection fixed. Report from [@tseaver](https://github.com/tseaver) (PR #59)

### Changed
- Don't inject unicodedata2 into sys.modules from [@akx](https://github.com/akx) (PR #57)

## [2.0.1](https://github.com/Ousret/charset_normalizer/compare/2.0.0...2.0.1) (2021-07-13)
### Fixed
- Make it work where there isn't a filesystem available by dropping assets frequencies.json. Report from [@sethmlarson](https://github.com/sethmlarson). (PR #55)
- Using explain=False permanently disabled the verbose output in the current runtime (PR #47)
- One log entry (language target preemptive) was not shown in logs when using explain=True (PR #47)
- Fix undesired exception (ValueError) on getitem of a CharsetMatches instance (PR #52)

### Changed
- Public function normalize default argument values were not aligned with from_bytes (PR #53)

### Added
- You may now use charset aliases in the cp_isolation and cp_exclusion arguments (PR #47)

## [2.0.0](https://github.com/Ousret/charset_normalizer/compare/1.4.1...2.0.0) (2021-07-02)
### Changed
- 4x to 5x faster than the previous 1.4.0 release. At least 2x faster than Chardet.
- Emphasis has been put on UTF-8 detection; it should perform near-instantaneously.
- The backward compatibility with Chardet has been greatly improved. The legacy detect function returns an identical charset name whenever possible.
- The detection mechanism has been slightly improved; Turkish content is now detected correctly (most of the time)
- The program has been rewritten to ease readability and maintainability (+ using static typing)
- utf_7 detection has been reinstated.

### Removed
- This package no longer requires anything when used with Python 3.5 (dropped cached_property)
- Removed support for these languages: Catalan, Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk, Macedonian, and Serbocroatian.
- The exception hook on UnicodeDecodeError has been removed.

### Deprecated
- Methods coherence_non_latin, w_counter, chaos_secondary_pass of the class CharsetMatch are now deprecated and scheduled for removal in v3.0

### Fixed
- The CLI output used the relative path of the file(s). It should be absolute.

## [1.4.1](https://github.com/Ousret/charset_normalizer/compare/1.4.0...1.4.1) (2021-05-28)
### Fixed
- Logger configuration/usage no longer conflicts with others (PR #44)

## [1.4.0](https://github.com/Ousret/charset_normalizer/compare/1.3.9...1.4.0) (2021-05-21)
### Removed
- Using standard logging instead of the package loguru.
- Dropping the nose test framework in favor of the maintained pytest.
- Chose not to use the dragonmapper package to help with gibberish Chinese/CJK text.
- Require cached_property only for Python 3.5 due to a constraint; dropped for every other interpreter version.
- Stopped support for UTF-7 that does not contain a SIG.
- Dropped PrettyTable, replaced with pure JSON output in the CLI.

### Fixed
- The BOM marker in a CharsetNormalizerMatch instance could be False in rare cases even when obviously present, due to the sub-match factoring process.
- Not searching properly for the BOM when trying the utf32/16 parent codec.

### Changed
- Improved the package's final size by compressing frequencies.json.
- Huge improvement over the largest payloads.

### Added
- CLI now produces JSON-consumable output.
- Return ASCII if the given sequences fit, given reasonable confidence.

## [1.3.9](https://github.com/Ousret/charset_normalizer/compare/1.3.8...1.3.9) (2021-05-13)

### Fixed
- In some very rare cases, you could end up getting encode/decode errors due to a bad bytes payload (PR #40)

## [1.3.8](https://github.com/Ousret/charset_normalizer/compare/1.3.7...1.3.8) (2021-05-12)

### Fixed
- An empty payload given for detection could cause an exception when accessing the `alphabets` property. (PR #39)

## [1.3.7](https://github.com/Ousret/charset_normalizer/compare/1.3.6...1.3.7) (2021-05-12)

### Fixed
- The legacy detect function should return UTF-8-SIG if a sig is present in the payload. (PR #38)

## [1.3.6](https://github.com/Ousret/charset_normalizer/compare/1.3.5...1.3.6) (2021-02-09)

### Changed
- Amend the previous release to allow prettytable 2.0 (PR #35)

## [1.3.5](https://github.com/Ousret/charset_normalizer/compare/1.3.4...1.3.5) (2021-02-08)

### Fixed
- Fix error while using the package with a Python pre-release interpreter (PR #33)

### Changed
- Dependencies refactoring, constraints revised.

### Added
- Add Python 3.9 and 3.10 to the supported interpreters

MIT License

Copyright (c) 2025 TAHRI Ahmed R.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/RECORD
DELETED
@@ -1,35 +0,0 @@
../../../bin/normalizer,sha256=NDTiQG9iK2sidLPZlp_7rPS0-suXOcYpEDrNI0Ho18c,261
charset_normalizer-3.4.1.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
charset_normalizer-3.4.1.dist-info/LICENSE,sha256=bQ1Bv-FwrGx9wkjJpj4lTQ-0WmDVCoJX0K-SxuJJuIc,1071
charset_normalizer-3.4.1.dist-info/METADATA,sha256=JbyHzhmqZh_ugEn1Y7TY7CDYZA9FoU6BP25hrCNDf50,35313
charset_normalizer-3.4.1.dist-info/RECORD,,
charset_normalizer-3.4.1.dist-info/WHEEL,sha256=SAXwsvUnStmqZDZIFc0R93dpIgZzQCxgSCg6H6Io4Og,114
charset_normalizer-3.4.1.dist-info/entry_points.txt,sha256=8C-Y3iXIfyXQ83Tpir2B8t-XLJYpxF5xbb38d_js-h4,65
charset_normalizer-3.4.1.dist-info/top_level.txt,sha256=7ASyzePr8_xuZWJsnqJjIBtyV8vhEo0wBCv1MPRRi3Q,19
charset_normalizer/__init__.py,sha256=OKRxRv2Zhnqk00tqkN0c1BtJjm165fWXLydE52IKuHc,1590
charset_normalizer/__main__.py,sha256=yzYxMR-IhKRHYwcSlavEv8oGdwxsR89mr2X09qXGdps,109
charset_normalizer/__pycache__/__init__.cpython-311.pyc,,
charset_normalizer/__pycache__/__main__.cpython-311.pyc,,
charset_normalizer/__pycache__/api.cpython-311.pyc,,
charset_normalizer/__pycache__/cd.cpython-311.pyc,,
charset_normalizer/__pycache__/constant.cpython-311.pyc,,
charset_normalizer/__pycache__/legacy.cpython-311.pyc,,
charset_normalizer/__pycache__/md.cpython-311.pyc,,
charset_normalizer/__pycache__/models.cpython-311.pyc,,
charset_normalizer/__pycache__/utils.cpython-311.pyc,,
charset_normalizer/__pycache__/version.cpython-311.pyc,,
charset_normalizer/api.py,sha256=qBRz8mJ_R5E713R6TOyqHEdnmyxbEDnCSHvx32ubDGg,22617
charset_normalizer/cd.py,sha256=WKTo1HDb-H9HfCDc3Bfwq5jzS25Ziy9SE2a74SgTq88,12522
charset_normalizer/cli/__init__.py,sha256=D8I86lFk2-py45JvqxniTirSj_sFyE6sjaY_0-G1shc,136
charset_normalizer/cli/__main__.py,sha256=VGC9klOoi6_R2z8rmyrc936kv7u2A1udjjHtlmNPDTM,10410
charset_normalizer/cli/__pycache__/__init__.cpython-311.pyc,,
charset_normalizer/cli/__pycache__/__main__.cpython-311.pyc,,
charset_normalizer/constant.py,sha256=4VuTcZNLew1j_8ixA-Rt_VVqNWD4pwgHOHMCMlr0964,40477
charset_normalizer/legacy.py,sha256=yhNXsPHkBfqPXKRb-sPXNj3Bscp9-mFGcYOkJ62tg9c,2328
charset_normalizer/md.cpython-311-darwin.so,sha256=splQDu5cXvaQDimc0DHFha_4iQpKdJw_4lG0jbJ-0Gg,115664
charset_normalizer/md.py,sha256=iyXXQGWl54nnLQLueMWTmUtlivO0-rTBgVkmJxIIAGU,20036
charset_normalizer/md__mypyc.cpython-311-darwin.so,sha256=Ta4Btq1YDSz47DBkBFuqsX-X_QFwAoT197Kv1vuuwQ8,482024
charset_normalizer/models.py,sha256=lKXhOnIPtiakbK3i__J9wpOfzx3JDTKj7Dn3Rg0VaRI,12394
charset_normalizer/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
charset_normalizer/utils.py,sha256=T5UHo8AS7NVMmgruWoZyqEf0WrZVcQpgUNetRoborSk,12002
charset_normalizer/version.py,sha256=Ambcj3O8FfvdLfDLc8dkaxZx97O1IM_R4_aKGD_TDdE,115
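Each RECORD row above carries a urlsafe base64-encoded (unpadded) SHA-256 digest and a byte size, the format installers use to verify installed files. A minimal sketch of how such a digest is produced:

```python
import base64
import hashlib

def record_hash(data: bytes) -> str:
    # RECORD-style digest: SHA-256, urlsafe base64, "=" padding stripped.
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# The zero-byte py.typed file in the RECORD above has size 0, so its
# digest is the well-known SHA-256 of the empty byte string.
print(record_hash(b""))
```

Running this on empty input reproduces the `py.typed` row's digest, `47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU`.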
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/WHEEL
DELETED
@@ -1,5 +0,0 @@
Wheel-Version: 1.0
Generator: setuptools (75.6.0)
Root-Is-Purelib: false
Tag: cp311-cp311-macosx_10_9_universal2
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/entry_points.txt
DELETED
@@ -1,2 +0,0 @@
[console_scripts]
normalizer = charset_normalizer:cli.cli_detect
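The `console_scripts` declaration above is what makes `pip` generate the `normalizer` launcher. A sketch of how the same entry point can be resolved programmatically with the standard library (the lookup simply yields an empty list if charset-normalizer is not installed):

```python
from importlib.metadata import entry_points

eps = entry_points()
# entry_points() returns a dict on Python < 3.10 and an EntryPoints
# collection on 3.10+; normalize both shapes to a flat list.
if isinstance(eps, dict):
    scripts = list(eps.get("console_scripts", []))
else:
    scripts = list(eps.select(group="console_scripts"))

matches = [ep for ep in scripts if ep.name == "normalizer"]
if matches:
    cli_detect = matches[0].load()  # the charset_normalizer:cli.cli_detect callable
```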
gradio-env/lib/python3.11/site-packages/charset_normalizer-3.4.1.dist-info/top_level.txt
DELETED
@@ -1 +0,0 @@
charset_normalizer
gradio-env/lib/python3.11/site-packages/charset_normalizer/__init__.py
DELETED
@@ -1,48 +0,0 @@
"""
Charset-Normalizer
~~~~~~~~~~~~~~
The Real First Universal Charset Detector.
A library that helps you read text from an unknown charset encoding.
Motivated by chardet, this package tries to resolve the issue by taking a new approach.
All IANA character set names for which the Python core library provides codecs are supported.

Basic usage:
>>> from charset_normalizer import from_bytes
>>> results = from_bytes('Bсеки човек има право на образование. Oбразованието!'.encode('utf_8'))
>>> best_guess = results.best()
>>> str(best_guess)
'Bсеки човек има право на образование. Oбразованието!'

Other methods and usages are available - see the full documentation
at <https://github.com/Ousret/charset_normalizer>.
:copyright: (c) 2021 by Ahmed TAHRI
:license: MIT, see LICENSE for more details.
"""

from __future__ import annotations

import logging

from .api import from_bytes, from_fp, from_path, is_binary
from .legacy import detect
from .models import CharsetMatch, CharsetMatches
from .utils import set_logging_handler
from .version import VERSION, __version__

__all__ = (
    "from_fp",
    "from_path",
    "from_bytes",
    "is_binary",
    "detect",
    "CharsetMatch",
    "CharsetMatches",
    "__version__",
    "VERSION",
    "set_logging_handler",
)

# Attach a NullHandler to the top level logger by default
# https://docs.python.org/3.3/howto/logging.html#configuring-logging-for-a-library

logging.getLogger("charset_normalizer").addHandler(logging.NullHandler())
gradio-env/lib/python3.11/site-packages/charset_normalizer/__main__.py
DELETED
@@ -1,6 +0,0 @@
-from __future__ import annotations
-
-from .cli import cli_detect
-
-if __name__ == "__main__":
-    cli_detect()
gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/__init__.cpython-311.pyc
DELETED
Binary file (1.92 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/__main__.cpython-311.pyc
DELETED
Binary file (408 Bytes)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/api.cpython-311.pyc
DELETED
Binary file (20.8 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/cd.cpython-311.pyc
DELETED
Binary file (15.9 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/constant.cpython-311.pyc
DELETED
Binary file (43.6 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/legacy.cpython-311.pyc
DELETED
Binary file (3.16 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/md.cpython-311.pyc
DELETED
Binary file (27.6 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/models.cpython-311.pyc
DELETED
Binary file (18.6 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/utils.cpython-311.pyc
DELETED
Binary file (15.4 kB)

gradio-env/lib/python3.11/site-packages/charset_normalizer/__pycache__/version.cpython-311.pyc
DELETED
Binary file (414 Bytes)
gradio-env/lib/python3.11/site-packages/charset_normalizer/api.py
DELETED
@@ -1,668 +0,0 @@
|
|
1 |
-
from __future__ import annotations
|
2 |
-
|
3 |
-
import logging
|
4 |
-
from os import PathLike
|
5 |
-
from typing import BinaryIO
|
6 |
-
|
7 |
-
from .cd import (
|
8 |
-
coherence_ratio,
|
9 |
-
encoding_languages,
|
10 |
-
mb_encoding_languages,
|
11 |
-
merge_coherence_ratios,
|
12 |
-
)
|
13 |
-
from .constant import IANA_SUPPORTED, TOO_BIG_SEQUENCE, TOO_SMALL_SEQUENCE, TRACE
|
14 |
-
from .md import mess_ratio
|
15 |
-
from .models import CharsetMatch, CharsetMatches
|
16 |
-
from .utils import (
|
17 |
-
any_specified_encoding,
|
18 |
-
cut_sequence_chunks,
|
19 |
-
iana_name,
|
20 |
-
identify_sig_or_bom,
|
21 |
-
is_cp_similar,
|
22 |
-
is_multi_byte_encoding,
|
23 |
-
should_strip_sig_or_bom,
|
24 |
-
)
|
25 |
-
|
26 |
-
logger = logging.getLogger("charset_normalizer")
|
27 |
-
explain_handler = logging.StreamHandler()
|
28 |
-
explain_handler.setFormatter(
|
29 |
-
logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
|
30 |
-
)
|
31 |
-
|
32 |
-
|
33 |
-
def from_bytes(
|
34 |
-
sequences: bytes | bytearray,
|
35 |
-
steps: int = 5,
|
36 |
-
chunk_size: int = 512,
|
37 |
-
threshold: float = 0.2,
|
38 |
-
cp_isolation: list[str] | None = None,
|
39 |
-
cp_exclusion: list[str] | None = None,
|
40 |
-
preemptive_behaviour: bool = True,
|
41 |
-
explain: bool = False,
|
42 |
-
language_threshold: float = 0.1,
|
43 |
-
enable_fallback: bool = True,
|
44 |
-
) -> CharsetMatches:
|
45 |
-
"""
|
46 |
-
Given a raw bytes sequence, return the best possibles charset usable to render str objects.
|
47 |
-
If there is no results, it is a strong indicator that the source is binary/not text.
|
48 |
-
By default, the process will extract 5 blocks of 512o each to assess the mess and coherence of a given sequence.
|
49 |
-
And will give up a particular code page after 20% of measured mess. Those criteria are customizable at will.
|
50 |
-
|
51 |
-
The preemptive behavior DOES NOT replace the traditional detection workflow, it prioritize a particular code page
|
52 |
-
but never take it for granted. Can improve the performance.
|
53 |
-
|
54 |
-
You may want to focus your attention to some code page or/and not others, use cp_isolation and cp_exclusion for that
|
55 |
-
purpose.
|
56 |
-
|
57 |
-
This function will strip the SIG in the payload/sequence every time except on UTF-16, UTF-32.
|
58 |
-
By default the library does not setup any handler other than the NullHandler, if you choose to set the 'explain'
|
59 |
-
toggle to True it will alter the logger configuration to add a StreamHandler that is suitable for debugging.
|
60 |
-
Custom logging format and handler can be set manually.
|
61 |
-
"""
|
62 |
-
|
63 |
-
if not isinstance(sequences, (bytearray, bytes)):
|
64 |
-
raise TypeError(
|
65 |
-
"Expected object of type bytes or bytearray, got: {}".format(
|
66 |
-
type(sequences)
|
67 |
-
)
|
68 |
-
)
|
69 |
-
|
70 |
-
if explain:
|
71 |
-
previous_logger_level: int = logger.level
|
72 |
-
logger.addHandler(explain_handler)
|
73 |
-
logger.setLevel(TRACE)
|
74 |
-
|
75 |
-
length: int = len(sequences)
|
76 |
-
|
77 |
-
if length == 0:
|
78 |
-
logger.debug("Encoding detection on empty bytes, assuming utf_8 intention.")
|
79 |
-
if explain: # Defensive: ensure exit path clean handler
|
80 |
-
logger.removeHandler(explain_handler)
|
81 |
-
logger.setLevel(previous_logger_level or logging.WARNING)
|
82 |
-
return CharsetMatches([CharsetMatch(sequences, "utf_8", 0.0, False, [], "")])
|
83 |
-
|
84 |
-
if cp_isolation is not None:
|
85 |
-
logger.log(
|
86 |
-
TRACE,
|
87 |
-
"cp_isolation is set. use this flag for debugging purpose. "
|
88 |
-
"limited list of encoding allowed : %s.",
|
89 |
-
", ".join(cp_isolation),
|
90 |
-
)
|
91 |
-
cp_isolation = [iana_name(cp, False) for cp in cp_isolation]
|
92 |
-
else:
|
93 |
-
cp_isolation = []
|
94 |
-
|
95 |
-
if cp_exclusion is not None:
|
96 |
-
logger.log(
|
97 |
-
TRACE,
|
98 |
-
"cp_exclusion is set. use this flag for debugging purpose. "
|
99 |
-
"limited list of encoding excluded : %s.",
|
100 |
-
", ".join(cp_exclusion),
|
101 |
-
)
|
102 |
-
cp_exclusion = [iana_name(cp, False) for cp in cp_exclusion]
|
103 |
-
else:
|
104 |
-
cp_exclusion = []
|
105 |
-
|
106 |
-
if length <= (chunk_size * steps):
|
107 |
-
logger.log(
|
108 |
-
TRACE,
|
109 |
-
"override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.",
|
110 |
-
steps,
|
111 |
-
chunk_size,
|
112 |
-
length,
|
113 |
-
)
|
114 |
-
steps = 1
|
115 |
-
chunk_size = length
|
116 |
-
|
117 |
-
if steps > 1 and length / steps < chunk_size:
|
118 |
-
chunk_size = int(length / steps)
|
119 |
-
|
120 |
-
is_too_small_sequence: bool = len(sequences) < TOO_SMALL_SEQUENCE
|
121 |
-
is_too_large_sequence: bool = len(sequences) >= TOO_BIG_SEQUENCE
|
122 |
-
|
123 |
-
if is_too_small_sequence:
|
124 |
-
logger.log(
|
125 |
-
TRACE,
|
126 |
-
"Trying to detect encoding from a tiny portion of ({}) byte(s).".format(
|
127 |
-
length
|
128 |
-
),
|
129 |
-
)
|
130 |
-
elif is_too_large_sequence:
|
131 |
-
logger.log(
|
132 |
-
TRACE,
|
133 |
-
"Using lazy str decoding because the payload is quite large, ({}) byte(s).".format(
|
134 |
-
length
|
135 |
-
),
|
136 |
-
)
|
137 |
-
|
138 |
-
prioritized_encodings: list[str] = []
|
139 |
-
|
140 |
-
specified_encoding: str | None = (
|
141 |
-
any_specified_encoding(sequences) if preemptive_behaviour else None
|
142 |
-
)
|
143 |
-
|
144 |
-
if specified_encoding is not None:
|
145 |
-
prioritized_encodings.append(specified_encoding)
|
146 |
-
logger.log(
|
147 |
-
TRACE,
|
148 |
-
"Detected declarative mark in sequence. Priority +1 given for %s.",
|
149 |
-
specified_encoding,
|
150 |
-
)
|
151 |
-
|
152 |
-
tested: set[str] = set()
|
153 |
-
tested_but_hard_failure: list[str] = []
|
154 |
-
tested_but_soft_failure: list[str] = []
|
155 |
-
|
156 |
-
fallback_ascii: CharsetMatch | None = None
|
157 |
-
fallback_u8: CharsetMatch | None = None
|
158 |
-
fallback_specified: CharsetMatch | None = None
|
159 |
-
|
160 |
-
results: CharsetMatches = CharsetMatches()
|
161 |
-
|
162 |
-
early_stop_results: CharsetMatches = CharsetMatches()
|
163 |
-
|
164 |
-
sig_encoding, sig_payload = identify_sig_or_bom(sequences)
|
165 |
-
|
166 |
-
if sig_encoding is not None:
|
167 |
-
prioritized_encodings.append(sig_encoding)
|
168 |
-
logger.log(
|
169 |
-
TRACE,
|
170 |
-
"Detected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.",
|
171 |
-
len(sig_payload),
|
172 |
-
sig_encoding,
|
173 |
-
)
|
174 |
-
|
175 |
-
prioritized_encodings.append("ascii")
|
176 |
-
|
177 |
-
if "utf_8" not in prioritized_encodings:
|
178 |
-
prioritized_encodings.append("utf_8")
|
179 |
-
|
180 |
-
for encoding_iana in prioritized_encodings + IANA_SUPPORTED:
|
181 |
-
if cp_isolation and encoding_iana not in cp_isolation:
|
182 |
-
continue
|
183 |
-
|
184 |
-
if cp_exclusion and encoding_iana in cp_exclusion:
|
185 |
-
continue
|
186 |
-
|
187 |
-
if encoding_iana in tested:
|
188 |
-
continue
|
189 |
-
|
190 |
-
tested.add(encoding_iana)
|
191 |
-
|
192 |
-
decoded_payload: str | None = None
|
193 |
-
bom_or_sig_available: bool = sig_encoding == encoding_iana
|
194 |
-
strip_sig_or_bom: bool = bom_or_sig_available and should_strip_sig_or_bom(
|
195 |
-
encoding_iana
|
196 |
-
)
|
197 |
-
|
198 |
-
if encoding_iana in {"utf_16", "utf_32"} and not bom_or_sig_available:
|
199 |
-
logger.log(
|
200 |
-
TRACE,
|
201 |
-
"Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.",
|
202 |
-
encoding_iana,
|
203 |
-
)
|
204 |
-
continue
|
205 |
-
if encoding_iana in {"utf_7"} and not bom_or_sig_available:
|
206 |
-
logger.log(
|
207 |
-
TRACE,
|
208 |
-
"Encoding %s won't be tested as-is because detection is unreliable without BOM/SIG.",
|
209 |
-
encoding_iana,
|
210 |
-
)
|
211 |
-
continue
|
212 |
-
|
213 |
-
try:
|
214 |
-
is_multi_byte_decoder: bool = is_multi_byte_encoding(encoding_iana)
|
215 |
-
except (ModuleNotFoundError, ImportError):
|
216 |
-
logger.log(
|
217 |
-
TRACE,
|
218 |
-
"Encoding %s does not provide an IncrementalDecoder",
|
219 |
-
encoding_iana,
|
220 |
-
)
|
221 |
-
continue
|
222 |
-
|
223 |
-
try:
|
224 |
-
if is_too_large_sequence and is_multi_byte_decoder is False:
|
225 |
-
str(
|
226 |
-
(
|
227 |
-
sequences[: int(50e4)]
|
228 |
-
if strip_sig_or_bom is False
|
229 |
-
else sequences[len(sig_payload) : int(50e4)]
|
230 |
-
),
|
231 |
-
encoding=encoding_iana,
|
232 |
-
)
|
233 |
-
else:
|
234 |
-
decoded_payload = str(
|
235 |
-
(
|
236 |
-
sequences
|
237 |
-
if strip_sig_or_bom is False
|
238 |
-
else sequences[len(sig_payload) :]
|
239 |
-
),
|
240 |
-
encoding=encoding_iana,
|
241 |
-
)
|
242 |
-
except (UnicodeDecodeError, LookupError) as e:
|
243 |
-
if not isinstance(e, LookupError):
|
244 |
-
logger.log(
|
245 |
-
TRACE,
|
246 |
-
"Code page %s does not fit given bytes sequence at ALL. %s",
|
247 |
-
encoding_iana,
|
248 |
-
str(e),
|
249 |
-
)
|
250 |
-
tested_but_hard_failure.append(encoding_iana)
|
251 |
-
continue
|
252 |
-
|
253 |
-
similar_soft_failure_test: bool = False
|
254 |
-
|
255 |
-
for encoding_soft_failed in tested_but_soft_failure:
|
256 |
-
if is_cp_similar(encoding_iana, encoding_soft_failed):
|
257 |
-
similar_soft_failure_test = True
|
258 |
-
break
|
259 |
-
|
260 |
-
if similar_soft_failure_test:
|
261 |
-
logger.log(
|
262 |
-
TRACE,
|
263 |
-
"%s is deemed too similar to code page %s and was consider unsuited already. Continuing!",
|
264 |
-
encoding_iana,
|
265 |
-
encoding_soft_failed,
|
266 |
-
)
|
267 |
-
continue
|
268 |
-
|
269 |
-
r_ = range(
|
270 |
-
0 if not bom_or_sig_available else len(sig_payload),
|
271 |
-
length,
|
272 |
-
int(length / steps),
|
273 |
-
)
|
274 |
-
|
275 |
-
multi_byte_bonus: bool = (
|
276 |
-
is_multi_byte_decoder
|
277 |
-
and decoded_payload is not None
|
278 |
-
and len(decoded_payload) < length
|
279 |
-
)
|
280 |
-
|
281 |
-
if multi_byte_bonus:
|
282 |
-
logger.log(
|
283 |
-
TRACE,
|
284 |
-
"Code page %s is a multi byte encoding table and it appear that at least one character "
|
285 |
-
"was encoded using n-bytes.",
|
286 |
-
encoding_iana,
|
287 |
-
)
|
288 |
-
|
289 |
-
max_chunk_gave_up: int = int(len(r_) / 4)
|
290 |
-
|
291 |
-
max_chunk_gave_up = max(max_chunk_gave_up, 2)
|
292 |
-
early_stop_count: int = 0
|
293 |
-
lazy_str_hard_failure = False
|
294 |
-
|
295 |
-
md_chunks: list[str] = []
|
296 |
-
md_ratios = []
|
297 |
-
|
298 |
-
try:
|
299 |
-
for chunk in cut_sequence_chunks(
|
300 |
-
sequences,
|
301 |
-
encoding_iana,
|
302 |
-
r_,
|
303 |
-
chunk_size,
|
304 |
-
bom_or_sig_available,
|
305 |
-
strip_sig_or_bom,
|
306 |
-
sig_payload,
|
307 |
-
is_multi_byte_decoder,
|
308 |
-
decoded_payload,
|
309 |
-
):
|
310 |
-
md_chunks.append(chunk)
|
311 |
-
|
312 |
-
md_ratios.append(
|
313 |
-
mess_ratio(
|
314 |
-
chunk,
|
315 |
-
threshold,
|
316 |
-
explain is True and 1 <= len(cp_isolation) <= 2,
|
317 |
-
)
|
318 |
-
)
|
319 |
-
|
320 |
-
if md_ratios[-1] >= threshold:
|
321 |
-
early_stop_count += 1
|
322 |
-
|
323 |
-
if (early_stop_count >= max_chunk_gave_up) or (
|
324 |
-
bom_or_sig_available and strip_sig_or_bom is False
|
325 |
-
):
|
326 |
-
break
|
327 |
-
except (
|
328 |
-
UnicodeDecodeError
|
329 |
-
) as e: # Lazy str loading may have missed something there
|
330 |
-
logger.log(
|
331 |
-
TRACE,
|
332 |
-
"LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %s",
|
333 |
-
encoding_iana,
|
334 |
-
str(e),
|
335 |
-
)
|
336 |
-
early_stop_count = max_chunk_gave_up
|
337 |
-
lazy_str_hard_failure = True
|
338 |
-
|
339 |
-
# We might want to check the sequence again with the whole content
|
340 |
-
# Only if initial MD tests passes
|
341 |
-
if (
|
342 |
-
not lazy_str_hard_failure
|
343 |
-
and is_too_large_sequence
|
344 |
-
and not is_multi_byte_decoder
|
345 |
-
):
|
346 |
-
try:
|
347 |
-
sequences[int(50e3) :].decode(encoding_iana, errors="strict")
|
348 |
-
except UnicodeDecodeError as e:
|
349 |
-
logger.log(
|
350 |
-
TRACE,
|
351 |
-
"LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %s",
|
352 |
-
encoding_iana,
|
353 |
-
str(e),
|
354 |
-
)
|
355 |
-
tested_but_hard_failure.append(encoding_iana)
|
356 |
-
continue
|
357 |
-
|
358 |
-
mean_mess_ratio: float = sum(md_ratios) / len(md_ratios) if md_ratios else 0.0
|
359 |
-
if mean_mess_ratio >= threshold or early_stop_count >= max_chunk_gave_up:
|
360 |
-
tested_but_soft_failure.append(encoding_iana)
|
361 |
-
logger.log(
|
362 |
-
TRACE,
|
363 |
-
"%s was excluded because of initial chaos probing. Gave up %i time(s). "
|
364 |
-
"Computed mean chaos is %f %%.",
|
365 |
-
encoding_iana,
|
366 |
-
early_stop_count,
|
367 |
-
round(mean_mess_ratio * 100, ndigits=3),
|
368 |
-
)
|
369 |
-
# Preparing those fallbacks in case we got nothing.
|
370 |
-
if (
|
371 |
-
enable_fallback
|
372 |
-
and encoding_iana in ["ascii", "utf_8", specified_encoding]
|
373 |
-
and not lazy_str_hard_failure
|
374 |
-
):
|
375 |
-
fallback_entry = CharsetMatch(
|
376 |
-
sequences,
|
377 |
-
encoding_iana,
|
378 |
-
threshold,
|
379 |
-
False,
|
380 |
-
[],
|
381 |
-
decoded_payload,
|
382 |
-
preemptive_declaration=specified_encoding,
|
383 |
-
)
|
384 |
-
if encoding_iana == specified_encoding:
|
385 |
-
fallback_specified = fallback_entry
|
386 |
-
elif encoding_iana == "ascii":
|
387 |
-
fallback_ascii = fallback_entry
|
388 |
-
else:
|
389 |
-
fallback_u8 = fallback_entry
|
390 |
-
continue
|
391 |
-
|
392 |
-
logger.log(
|
393 |
-
TRACE,
|
394 |
-
"%s passed initial chaos probing. Mean measured chaos is %f %%",
|
395 |
-
encoding_iana,
|
396 |
-
round(mean_mess_ratio * 100, ndigits=3),
|
397 |
-
)
|
398 |
-
|
399 |
-
if not is_multi_byte_decoder:
|
400 |
-
target_languages: list[str] = encoding_languages(encoding_iana)
|
401 |
-
else:
|
402 |
-
target_languages = mb_encoding_languages(encoding_iana)
|
403 |
-
|
404 |
-
if target_languages:
|
405 |
-
logger.log(
|
406 |
-
TRACE,
|
407 |
-
"{} should target any language(s) of {}".format(
|
408 |
-
encoding_iana, str(target_languages)
|
409 |
-
),
|
410 |
-
)
|
411 |
-
|
412 |
-
cd_ratios = []
|
413 |
-
|
414 |
-
# We shall skip the CD when its about ASCII
|
415 |
-
# Most of the time its not relevant to run "language-detection" on it.
|
416 |
-
if encoding_iana != "ascii":
|
417 |
-
for chunk in md_chunks:
|
418 |
-
chunk_languages = coherence_ratio(
|
419 |
-
chunk,
|
420 |
-
language_threshold,
|
421 |
-
",".join(target_languages) if target_languages else None,
|
422 |
-
)
|
423 |
-
|
424 |
-
cd_ratios.append(chunk_languages)
|
425 |
-
|
426 |
-
cd_ratios_merged = merge_coherence_ratios(cd_ratios)
|
427 |
-
|
428 |
-
if cd_ratios_merged:
|
429 |
-
logger.log(
|
430 |
-
TRACE,
|
431 |
-
"We detected language {} using {}".format(
|
432 |
-
cd_ratios_merged, encoding_iana
|
433 |
-
),
|
434 |
-
)
|
435 |
-
|
436 |
-
current_match = CharsetMatch(
|
437 |
-
sequences,
|
438 |
-
encoding_iana,
|
439 |
-
mean_mess_ratio,
|
440 |
-
bom_or_sig_available,
|
441 |
-
cd_ratios_merged,
|
442 |
-
(
|
443 |
-
decoded_payload
|
444 |
-
if (
|
445 |
-
is_too_large_sequence is False
|
446 |
-
or encoding_iana in [specified_encoding, "ascii", "utf_8"]
|
447 |
-
)
|
448 |
-
else None
|
449 |
-
),
|
450 |
-
preemptive_declaration=specified_encoding,
|
451 |
-
)
|
452 |
-
|
453 |
-
results.append(current_match)
|
454 |
-
|
455 |
-
if (
|
456 |
-
encoding_iana in [specified_encoding, "ascii", "utf_8"]
|
457 |
-
and mean_mess_ratio < 0.1
|
458 |
-
):
|
459 |
-
# If md says nothing to worry about, then... stop immediately!
|
460 |
-
if mean_mess_ratio == 0.0:
|
461 |
-
logger.debug(
|
462 |
-
"Encoding detection: %s is most likely the one.",
|
463 |
-
current_match.encoding,
|
464 |
-
)
|
465 |
-
if explain: # Defensive: ensure exit path clean handler
|
466 |
-
logger.removeHandler(explain_handler)
|
467 |
-
logger.setLevel(previous_logger_level)
|
468 |
-
return CharsetMatches([current_match])
|
469 |
-
|
470 |
-
early_stop_results.append(current_match)
|
471 |
-
|
472 |
-
if (
|
473 |
-
len(early_stop_results)
|
474 |
-
and (specified_encoding is None or specified_encoding in tested)
|
475 |
-
and "ascii" in tested
|
476 |
-
and "utf_8" in tested
|
477 |
-
):
|
478 |
-
probable_result: CharsetMatch = early_stop_results.best() # type: ignore[assignment]
|
479 |
-
logger.debug(
|
480 |
-
"Encoding detection: %s is most likely the one.",
|
481 |
-
probable_result.encoding,
|
482 |
-
)
|
483 |
-
if explain: # Defensive: ensure exit path clean handler
|
484 |
-
logger.removeHandler(explain_handler)
|
485 |
-
logger.setLevel(previous_logger_level)
|
486 |
-
|
487 |
-
return CharsetMatches([probable_result])
|
488 |
-
|
489 |
-
if encoding_iana == sig_encoding:
|
490 |
-
logger.debug(
|
491 |
-
"Encoding detection: %s is most likely the one as we detected a BOM or SIG within "
|
492 |
-
"the beginning of the sequence.",
|
493 |
-
encoding_iana,
|
494 |
-
)
|
495 |
-
if explain: # Defensive: ensure exit path clean handler
|
496 |
-
logger.removeHandler(explain_handler)
|
497 |
-
logger.setLevel(previous_logger_level)
|
498 |
-
return CharsetMatches([results[encoding_iana]])
|
499 |
-
|
500 |
-
if len(results) == 0:
|
501 |
-
if fallback_u8 or fallback_ascii or fallback_specified:
|
502 |
-
logger.log(
|
503 |
-
TRACE,
|
504 |
-
"Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.",
|
505 |
-
)
|
506 |
-
|
507 |
-
if fallback_specified:
|
508 |
-
logger.debug(
|
509 |
-
"Encoding detection: %s will be used as a fallback match",
|
510 |
-
fallback_specified.encoding,
|
511 |
-
)
|
512 |
-
results.append(fallback_specified)
|
513 |
-
elif (
|
514 |
-
(fallback_u8 and fallback_ascii is None)
|
515 |
-
or (
|
516 |
-
fallback_u8
|
517 |
-
and fallback_ascii
|
518 |
-
and fallback_u8.fingerprint != fallback_ascii.fingerprint
|
519 |
-
)
|
520 |
-
or (fallback_u8 is not None)
|
521 |
-
):
|
522 |
-
logger.debug("Encoding detection: utf_8 will be used as a fallback match")
|
523 |
-
results.append(fallback_u8)
|
524 |
-
elif fallback_ascii:
|
525 |
-
logger.debug("Encoding detection: ascii will be used as a fallback match")
|
526 |
-
results.append(fallback_ascii)
|
527 |
-
|
528 |
-
if results:
|
529 |
-
logger.debug(
|
530 |
-
"Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.",
|
531 |
-
results.best().encoding, # type: ignore
|
532 |
-
len(results) - 1,
|
533 |
-
)
|
534 |
-
else:
|
535 |
-
logger.debug("Encoding detection: Unable to determine any suitable charset.")
|
536 |
-
|
537 |
-
if explain:
|
538 |
-
logger.removeHandler(explain_handler)
|
539 |
-
logger.setLevel(previous_logger_level)
|
540 |
-
|
541 |
-
return results
|
542 |
-
|
543 |
-
|
544 |
-
def from_fp(
|
545 |
-
fp: BinaryIO,
|
546 |
-
steps: int = 5,
|
547 |
-
chunk_size: int = 512,
|
548 |
-
threshold: float = 0.20,
|
549 |
-
cp_isolation: list[str] | None = None,
|
550 |
-
cp_exclusion: list[str] | None = None,
|
551 |
-
preemptive_behaviour: bool = True,
|
552 |
-
explain: bool = False,
|
553 |
-
language_threshold: float = 0.1,
|
554 |
-
enable_fallback: bool = True,
|
555 |
-
) -> CharsetMatches:
|
556 |
-
"""
|
557 |
-
Same thing than the function from_bytes but using a file pointer that is already ready.
|
558 |
-
Will not close the file pointer.
|
559 |
-
"""
|
560 |
-
return from_bytes(
|
561 |
-
fp.read(),
|
562 |
-
steps,
|
563 |
-
chunk_size,
|
564 |
-
threshold,
|
565 |
-
cp_isolation,
|
566 |
-
cp_exclusion,
|
567 |
-
preemptive_behaviour,
|
568 |
-
explain,
|
569 |
-
language_threshold,
|
570 |
-
enable_fallback,
|
571 |
-
)
|
572 |
-
|
573 |
-
|
574 |
-
def from_path(
|
575 |
-
path: str | bytes | PathLike, # type: ignore[type-arg]
|
576 |
-
steps: int = 5,
|
577 |
-
chunk_size: int = 512,
|
578 |
-
threshold: float = 0.20,
|
579 |
-
cp_isolation: list[str] | None = None,
|
580 |
-
cp_exclusion: list[str] | None = None,
|
581 |
-
preemptive_behaviour: bool = True,
|
582 |
-
explain: bool = False,
|
583 |
-
language_threshold: float = 0.1,
|
584 |
-
enable_fallback: bool = True,
|
585 |
-
) -> CharsetMatches:
|
586 |
-
"""
|
587 |
-
Same thing than the function from_bytes but with one extra step. Opening and reading given file path in binary mode.
|
588 |
-
Can raise IOError.
|
589 |
-
"""
|
590 |
-
with open(path, "rb") as fp:
|
591 |
-
return from_fp(
|
592 |
-
fp,
|
593 |
-
steps,
|
594 |
-
chunk_size,
|
595 |
-
threshold,
|
596 |
-
cp_isolation,
|
597 |
-
cp_exclusion,
|
598 |
-
preemptive_behaviour,
|
599 |
-
explain,
|
600 |
-
language_threshold,
|
601 |
-
enable_fallback,
|
602 |
-
)
|
603 |
-
|
604 |
-
|
605 |
-
def is_binary(
|
606 |
-
fp_or_path_or_payload: PathLike | str | BinaryIO | bytes, # type: ignore[type-arg]
|
607 |
-
steps: int = 5,
|
608 |
-
chunk_size: int = 512,
|
609 |
-
threshold: float = 0.20,
|
610 |
-
cp_isolation: list[str] | None = None,
|
611 |
-
cp_exclusion: list[str] | None = None,
|
612 |
-
preemptive_behaviour: bool = True,
|
613 |
-
explain: bool = False,
|
614 |
-
language_threshold: float = 0.1,
|
615 |
-
enable_fallback: bool = False,
|
616 |
-
) -> bool:
|
617 |
-
"""
|
618 |
-
Detect if the given input (file, bytes, or path) points to a binary file. aka. not a string.
|
619 |
-
Based on the same main heuristic algorithms and default kwargs at the sole exception that fallbacks match
|
620 |
-
are disabled to be stricter around ASCII-compatible but unlikely to be a string.
|
621 |
-
"""
|
622 |
-
if isinstance(fp_or_path_or_payload, (str, PathLike)):
|
623 |
-
guesses = from_path(
|
624 |
-
fp_or_path_or_payload,
|
625 |
-
steps=steps,
|
626 |
-
chunk_size=chunk_size,
|
627 |
-
threshold=threshold,
|
628 |
-
cp_isolation=cp_isolation,
|
629 |
-
cp_exclusion=cp_exclusion,
|
630 |
-
preemptive_behaviour=preemptive_behaviour,
|
631 |
-
explain=explain,
|
632 |
-
language_threshold=language_threshold,
|
633 |
-
enable_fallback=enable_fallback,
|
634 |
-
)
|
635 |
-
elif isinstance(
|
636 |
-
fp_or_path_or_payload,
|
637 |
-
(
|
638 |
-
bytes,
|
639 |
-
bytearray,
|
640 |
-
),
|
641 |
-
):
|
642 |
-
guesses = from_bytes(
|
643 |
-
fp_or_path_or_payload,
|
644 |
-
steps=steps,
|
645 |
-
chunk_size=chunk_size,
|
646 |
-
threshold=threshold,
|
647 |
-
cp_isolation=cp_isolation,
|
648 |
-
cp_exclusion=cp_exclusion,
|
649 |
-
preemptive_behaviour=preemptive_behaviour,
|
650 |
-
explain=explain,
|
651 |
-
language_threshold=language_threshold,
|
652 |
-
enable_fallback=enable_fallback,
|
653 |
-
)
|
654 |
-
else:
|
655 |
-
guesses = from_fp(
|
656 |
-
fp_or_path_or_payload,
|
657 |
-
steps=steps,
|
658 |
-
chunk_size=chunk_size,
|
659 |
-
threshold=threshold,
|
660 |
-
cp_isolation=cp_isolation,
|
661 |
-
cp_exclusion=cp_exclusion,
|
662 |
-
preemptive_behaviour=preemptive_behaviour,
|
663 |
-
explain=explain,
|
664 |
-
language_threshold=language_threshold,
|
665 |
-
enable_fallback=enable_fallback,
|
666 |
-
)
|
667 |
-
|
668 |
-
return not guesses
|