For quite some time now I’ve been working on tldts (npm), an extemely fast, feature-full, battle-tested JavaScript library for domain parsing. It allows to answer questions such has:
- What’s the hostname of an URL?
- What’s the registrable domain of an URL, a hostname or an email address?
- What’s the sub-domain of a domain?
- etc.
What puts tldts appart from other libraries is:
- It is much faster than alternatives, allowing to parse between 1 and 2 million domains per second (that’s up to 1000 times faster than other popular libraries).
- It is more feature-full, supporting IPs detection, domain validation and complex URLs parsing.
- It offers the smallest bundles, in both
cjs
,esm
andumd
formats; it runs anywhere. - It is written in TypeScript and benefits from 100% test coverage.
Most of the features are made possible by the public suffix list project. But tldts offers some bells and whistles on top. One of the goals is to be conveniant to use and as fast as it gets. Some libraries require you to provide already-valid hostnames, but tldts has no such constraints and will happily parse complex URLs, as well as already-extracted hostnames. The best part is that this does not come with any over-head!
const tldts = require('tldts');
// Retrieving hostname related informations of a given URL
tldts.parse('https://remusao.github.io/posts/tldts-benchmarks.html');
// { domain: 'github.io',
// hostname: 'remusao.github.io',
// isIcann: true,
// isIp: false,
// isPrivate: false,
// publicSuffix: 'io',
// subdomain: 'remusao' }
In this post I’d like to share some results of a performance comparison against other available JavaScript libraries offering the same kind of features. We will see that they vary greatly in terms of performance, ease of use or features. In fact, we observed that tldts
is up to 1000 times faster than some of the other libraries!
Before presenting the results, a few words about the instrumentation and what was measured. We aim at comparing the libraries in the following aspects:
- Features
- Performance (in terms of operations per second)
- Loading time (time it took V8 to load the bundle)
- Memory used
All the measurements were performed in the following environment:
Node.js
version 11.6.0- Hardware: X1 Carbon 4th with i7-6600U CPU and 16GB of RAM
And now the list of the contenders:
In the results you will also see tldts-experimental
mentioned. It is a probabilistic data-structure implementing the exact same features as tldts
but using much less memory, loading instantly and offering even higher performances. It can be used in contexts with very constrained hardware capabilities such as mobiles.
Now let’s now dig into the results!
Feature Matrix
The following features are considered:
- Is there
IDNA
support? Does the library support inputs with unicode such as 中国? - Does the library accept complex
URLs
as input, or is it necessary to extract the hostname before-hand? - Is the library able to detect if the given input is an
IP
address? This is important, otherwisegetPublicSuffix('192.168.0.1')
would return1
! - Does the library allow you to extract
domain
andpublic suffix
? - Does the library support ICANN/Private sections of the public suffix list? Can they be disabled individually?
- Will the public suffix rules be
shipped
with the library, or do they need to be fetched separately?
Library | IDNA | URLs | IPs | getDomain | getPublicSuffix | ICANN/Private | Ships lists |
---|---|---|---|---|---|---|---|
tldts | X | X | X | X | X | X | X |
tld.js | X | X | X | X | X | X | X |
psl | X | X | X | X | |||
parse-domain | X | X | X | X | X | X | |
haraka-tld | X | X | X | X | X | ||
uBlock publicsuffixlist | ? | X | X |
Performance
Here we measure the performance of three common operations offered by domain parsing library: getting the public suffix of a hostname, getting the domain (tld + sld) and getting the subdomain.
A few notes about this benchmark:
- The inputs used are always already valid hostnames (no URLs, although some libraries like tldts support it). You can find the list of inputs there:
- The selection of hostnames can be seen in bench_performance.js and was selected to contain a mix of non-existing suffixes, ICANN rules, private rules as well as wildcards and exceptions.
- All hostnames were ASCII (puny-encoded if needed before-hand)
- All libraries were used in their default setup (no option given, with the exception of
tldts-no-parse
which runstldts
disabling the parsing phase and assuming that the input is already a valid hostname, to match the behavior of other libraries).
The results are expressed in terms of operations per second (where each operation is calling the function once on a hostname).
Library | getPublicSuffix | getDomain | getSubdomain |
---|---|---|---|
tldts-experimental | 1 898 446 | 1 690 572 | 1 615 166 |
tldts no parsing | 1 780 469 | 1 515 703 | 1 502 692 |
tldts | 1 280 063 | 1 134 956 | 1 125 362 |
tld.js | 1 141 414 | 1 049 180 | 1 125 362 |
ublock publicsuffix | 620 816 | 567 664 | ? |
parse-domain | 554 355 | 528 217 | 551 008 |
haraka-tld | ? | 105 321 | ? |
psl | 1 654 | 1 693 | 1 673 |
Here we see that the performance varies a lot between libraries, for the same operations. tldts
is 1000 faster than psl
, which is the most popular library.
Memory Usage
Here we estimate the memory used by each library. The measurements are done using the bench_memory.js script, which loads each file ten times and measure the average memory usage before and after GC using process.memoryUsage()
. The result are then compared to a reference memory usage computed in the same way using noop_test.js
which does not import anything.
Library | Before GC | After GC |
---|---|---|
tldts-experimental | 461KB | 229 KB |
parse-domain | 2.579 MB | 1.310MB |
psl | 2.199 MB | 1.537MB |
tldjs | 2.621 MB | 1.714MB |
tldts | 3.094 MB | 1.792MB |
UBlock publicsuffix | 4.529 MB* | 2.399MB |
haraka-tld | 4.405 MB | 2.595MB |
(*) The memory of uBlock cannot be estimated correctly as for this benchmarks the lists were inlined in the source code, which is not how it’s used in production.
Loading Time
One point of comparison which can make a difference in some contexts (e.g.: mobile or if the library is embedded in a website) is the loading time of the bundle itself (or time it takes to parse the code and initialize it). It can have a big impact if you use the library on very slow devices (like mobiles) and here again, not all the libraries are equal.
The benchmark code can be found in bench_startup.sh. It measures the time it takes to import each of the libraries. The measurements are performed using the bench CLI, looking at the mean
time returned for each.
Note that this benchmark was performed using the cjs bundle. The performance might be different in another environment or different bundle (e.g.: UMD in a browser).
Library | Mean (ms) |
---|---|
Ref (no require ) | 48.21 |
tldts-experimental | 47.93 |
psl | 53.77 |
tld.js | 58.74 |
parse-domain | 61.96 |
tldts | 64.48 |
ublock | 78.05 |
haraka-tld | 84.93 |
Note that some libraries like ublock
or haraka-tld
perform some form of parsing of the rules at loading-time, which incurs an initial cost when importing the library.
Bundles
Comparison of bundle sizes, when applicable (not all libraries provide bundles):
Library | Normal | Minified | Gzipped |
---|---|---|---|
tldts-experimental | 100KB | 94KB | 38KB |
tldts | 140KB | 95KB | 37KB |
psl | 138KB | 122KB | 39KB |
tld.js | 209KB | 141KB | 40KB |
parse-domain | ? | ? | ? |
ublock | ? | ? | ? |
haraka-tld | ? | ? | ? |
Dependencies
Here is a comparison of dependencies for each library:
Library | Dependencies |
---|---|
tldts | (none) |
psl | punycode |
tld.js | punycode |
ublock | punycode |
haraka-tld | punycode |
parse-domain | ? |
Conclusion
=> Tldts on NPM
npm install tldts
If that sounds appealing to you, give it a shot and do not hesite to open issues for any feedback you might have!