# he [Build status](https://travis-ci.org/mathiasbynens/he) [Code coverage status](https://codecov.io/github/mathiasbynens/he?branch=master) [Dependency status](https://gemnasium.com/mathiasbynens/he)
_he_ (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports [all standardized named character references as per HTML](https://html.spec.whatwg.org/multipage/syntax.html#named-character-references), handles [ambiguous ampersands](https://mathiasbynens.be/notes/ambiguous-ampersands) and other edge cases [just like a browser would](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references), has an extensive test suite, and, contrary to many other JavaScript solutions, handles astral Unicode symbols just fine. [An online demo is available.](https://mothereff.in/html-entities)
## Installation
Via [npm](https://www.npmjs.com/):
```bash
npm install he
```
Via [Bower](http://bower.io/):
```bash
bower install he
```
Via [Component](https://github.com/component/component):
```bash
component install mathiasbynens/he
```
In a browser:
```html
<script src="he.js"></script>
```
In [Node.js](https://nodejs.org/), [io.js](https://iojs.org/), [Narwhal](http://narwhaljs.org/), and [RingoJS](http://ringojs.org/):
```js
var he = require('he');
```
In [Rhino](http://www.mozilla.org/rhino/):
```js
load('he.js');
```
Using an AMD loader like [RequireJS](http://requirejs.org/):
```js
require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);
```
## API
### `he.version`
A string representing the semantic version number.
### `he.encode(text, options)`
This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols and `&`, `<`, `>`, `"`, `'`, and `` ` ``, replacing them with character references.
```js
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```
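To illustrate why astral symbols need care, here is a minimal sketch of turning code points into hexadecimal escapes. It is not how _he_ is implemented, and the name `toHexEscapes` is hypothetical. Unlike `he.encode`, it passes all printable ASCII through unchanged, including `&` and `<`. The point is only that iterating by code point keeps a symbol such as U+1D306 in one piece instead of splitting it into two surrogate halves.

```js
// Hypothetical helper, for illustration only; this is not he's implementation.
function toHexEscapes(text) {
  let result = '';
  // `for...of` iterates over code points, so surrogate pairs stay together.
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    if (codePoint >= 0x20 && codePoint <= 0x7E) {
      // Printable ASCII is passed through unchanged in this sketch.
      result += symbol;
    } else {
      result += '&#x' + codePoint.toString(16).toUpperCase() + ';';
    }
  }
  return result;
}

console.log(toHexEscapes('foo © bar ≠ baz 𝌆 qux'));
// logs: foo &#xA9; bar &#x2260; baz &#x1D306; qux
```

A loop over `text.length` and `charCodeAt` would instead emit two references for 𝌆, one per surrogate.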
As long as the input string contains [allowed code points](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream) only, the return value of this function is always valid HTML. Any [(invalid) code points that cannot be represented using a character reference](https://html.spec.whatwg.org/multipage/syntax.html#table-charref-overrides) in the input are not encoded:
```js
he.encode('foo \0 bar');
// → 'foo \0 bar'
```
However, enabling [the `strict` option](https://github.com/mathiasbynens/he#strict) causes invalid code points to throw an exception. With `strict` enabled, `he.encode` either throws (if the input contains invalid code points) or returns a string of valid HTML.
The `options` object is optional. It recognizes the following properties:
#### `useNamedReferences`
The default value for the `useNamedReferences` option is `false`. This means that `encode()` will not use any named character references (e.g. `&copy;`) in the output; hexadecimal escapes (e.g. `&#xA9;`) will be used instead. Set it to `true` to enable the use of named references.
**Note that if compatibility with older browsers is a concern, this option should remain disabled.**
```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```
#### `decimal`
The default value for the `decimal` option is `false`. If the option is enabled, `encode` will generally use decimal escapes (e.g. `&#169;`) rather than hexadecimal escapes (e.g. `&#xA9;`). Apart from this replacement, the basic behavior remains the same when combined with other options. For example, if both `useNamedReferences` and `decimal` are enabled, named references (e.g. `&copy;`) take precedence over decimal escapes, and symbols without a named reference are encoded using decimal escapes.
```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly enable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': true
});
// → 'foo &#169; bar &#8800; baz &#119558; qux'

// Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true,
  'decimal': true
});
// → 'foo &copy; bar &ne; baz &#119558; qux'
```
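Hexadecimal and decimal escapes are two notations for the same code point. The sketch below shows the conversion. The function name `formatReference` is hypothetical, and the code only illustrates the notation, not _he_’s internals.

```js
// Hypothetical helper, for illustration only.
// Formats one code point as a decimal or hexadecimal character reference.
function formatReference(codePoint, decimal) {
  return decimal
    ? '&#' + codePoint + ';'
    : '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

console.log(formatReference(0x1D306, false)); // logs: &#x1D306;
console.log(formatReference(0x1D306, true));  // logs: &#119558;
```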
#### `encodeEverything`
The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option takes precedence over `allowUnsafeSymbols` (i.e. setting the latter to `true` in such a case has no effect).
```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
```
#### `strict`
The default value for the `strict` option is `false`. This means that `encode()` will encode any HTML text content you feed it, even if it contains any symbols that cause [parse errors](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.
```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
```
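Because strict mode signals invalid input by throwing, callers that validate untrusted text may want to turn the exception into a result value. The following is one possible sketch. The helper name `tryStrictEncode` and the shape of its return value are assumptions, and the encoder is passed in as a parameter so the snippet doesn’t depend on how _he_ is loaded.

```js
// Hypothetical helper, for illustration only.
// `encoder` is expected to be the he object, e.g. the result of require('he').
function tryStrictEncode(encoder, text) {
  try {
    // With `strict` enabled, `encode` either throws or returns valid HTML.
    return { ok: true, html: encoder.encode(text, { strict: true }) };
  } catch (error) {
    return { ok: false, message: error.message };
  }
}
```

With _he_ loaded as `he`, `tryStrictEncode(he, '\x01')` would be expected to return an object whose `ok` property is `false`.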
#### `allowUnsafeSymbols`
The default value for the `allowUnsafeSymbols` option is `false`. This means that characters that are unsafe for use in HTML content (`&`, `<`, `>`, `"`, `'`, and `` ` ``) will be encoded. When set to `true`, only non-ASCII characters will be encoded. If the `encodeEverything` option is set to `true`, this option will be ignored.
```js
he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'
```
#### Overriding default `encode` options globally
The global default setting can be overridden by modifying the `he.encode.options` object. This saves you from passing in an `options` object for every call to `encode` if you want to use the non-default setting.
```js
// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```
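Changing `he.encode.options` affects every caller that shares the same _he_ instance. If that is undesirable, one alternative is a small wrapper that carries its own defaults. This is only a sketch. `createEncoder` is a hypothetical name, not part of _he_’s API, and the encoder is passed in as a parameter.

```js
// Hypothetical helper, for illustration only; not part of he's API.
// Returns a function that calls `encoder.encode` with preset options,
// which individual calls can still override.
function createEncoder(encoder, defaults) {
  return function (text, overrides) {
    return encoder.encode(text, Object.assign({}, defaults, overrides));
  };
}
```

With _he_ loaded as `he`, `createEncoder(he, { useNamedReferences: true })` returns a function that encodes with named references while leaving the global defaults untouched.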
### `he.decode(html, options)`
This function takes a string of HTML and decodes any named and numerical character references in it using [the algorithm described in section 12.2.4.69 of the HTML spec](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references).
```js
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
```
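For intuition about the simplest case only, the sketch below decodes well-formed numeric references. It is not _he_’s algorithm. It ignores named references, missing semicolons, and the spec’s special cases, all of which `he.decode` handles. The name `decodeNumericSketch` is hypothetical.

```js
// Hypothetical helper, for illustration only; he.decode does far more than this.
function decodeNumericSketch(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of guessing a replacement.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    // fromCodePoint (unlike fromCharCode) handles astral code points.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericSketch('foo &#xA9; bar &#8800; baz &#x1D306; qux'));
// logs: foo © bar ≠ baz 𝌆 qux
```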
The `options` object is optional. It recognizes the following properties:
#### `isAttributeValue`
The default value for the `isAttributeValue` option is `false`. This means that `decode()` will decode the string as if it were used in [a text context in an HTML document](https://html.spec.whatwg.org/multipage/syntax.html#data-state). HTML has different rules for [parsing character references in attribute values](https://html.spec.whatwg.org/multipage/syntax.html#character-reference-in-attribute-value-state). Set this option to `true` to treat the input string as if it were used as an attribute value.
```js
// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
```
#### `strict`
The default value for the `strict` option is `false`. This means that `decode()` will decode any HTML text content you feed it, even if it contains any entities that cause [parse errors](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.
```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
```
#### Overriding default `decode` options globally
The global default settings for the `decode` function can be overridden by modifying the `he.decode.options` object. This saves you from passing in an `options` object for every call to `decode` if you want to use a non-default setting.
```js
// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'
```
### `he.escape(text)`
This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: `&`, `<`, `>`, `"`, `'`, and `` ` ``.
```js
he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```
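The behavior described above amounts to a lookup over six characters. The sketch below approximates it in plain JavaScript. It is not necessarily how _he_ implements `escape`. `escapeSketch` is a hypothetical name, and the replacement strings are assumptions about a reasonable output format.

```js
// Hypothetical approximation, for illustration only.
// Maps each unsafe character to a character reference (assumed spellings).
const escapeMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'
};

function escapeSketch(text) {
  // A single pass over the input, so already-inserted `&` are not re-escaped.
  return text.replace(/[&<>"'`]/g, (symbol) => escapeMap[symbol]);
}

console.log(escapeSketch('<img src=\'x\' onerror="prompt(1)">'));
// logs: &lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;
```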
### `he.unescape(html, options)`
`he.unescape` is an alias for `he.decode`. It takes a string of HTML and decodes any named and numerical character references in it.
### Using the `he` binary
To use the `he` binary in your shell, simply install _he_ globally using npm:
```bash
npm install -g he
```
After that you will be able to encode/decode HTML entities from the command line:
```bash
$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz
```
Read a local text file, encode it for use in an HTML text context, and save the result to a new file:
```bash
$ he --encode < foo.txt > foo-escaped.html
```
Or do the same with an online text file:
```bash
$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
```
Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:
```bash
$ he --decode < foo-escaped.html > foo.txt
```
Or do the same with an online HTML snippet:
```bash
$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
```
See `he --help` for the full list of options.
## Support
_he_ has been tested in at least:
* Chrome 27-50
* Firefox 3-45
* Safari 4-9
* Opera 10-12, 15-37
* IE 6-11
* Edge
* Narwhal 0.3.2
* Node.js v0.10, v0.12, v4, v5
* PhantomJS 1.9.0
* Rhino 1.7RC4
* RingoJS 0.8-0.11
## Unit tests & code coverage
After cloning this repository, run `npm install` to install the dependencies needed for _he_ development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.
Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.
To generate the code coverage report, use `grunt cover`.
## Acknowledgements
Thanks to [Simon Pieters](https://simon.html5.org/) ([@zcorpan](https://twitter.com/zcorpan)) for the many suggestions.
## Author
| [@mathias](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |
## License
_he_ is available under the [MIT](https://mths.be/mit) license.