Very fast link-checking.
A good utility is custom-made for a job. There are many link checkers out there, but none of them seems to be striving for the following set of goals.
You want to run the link-checker at least before every deploy (on CI or manually). When it takes ages, you're less likely to do so.
linkcheck is currently several times faster than blc and all other link checkers that go to at least comparable depth. It is 40 times faster than the only tool that goes to the same depth (linkchecker).
Finds all relevant problems
No link-checker can guarantee correct results: the web is too flaky for that. But at least the tool should correctly parse the HTML (not just try to guess what's a URL and what isn't) and the CSS (for
- PENDING: srcset support
Leaves out irrelevant problems
linkcheck only supports http: and https:. It won't try to check FTP or telnet or nntp links.
linkcheck will currently completely ignore unsupported schemes like data:. This may change in the future to at least show an info-level warning.
linkcheck doesn't validate file system directories. Servers often behave very differently than file systems, so validating links on the file system often leads to both false positives and false negatives. Links should be checked in their natural habitat, and as close to the production environment as possible. You can (and should) run linkcheck on your localhost server, of course.
Yes, a command line utility can have good or bad UX. It has mostly to do with giving sane defaults, not forcing users to learn new constructs, not making them type more than needed, and showing concise output.
The most frequent use cases should need only a few arguments.
- For example, unleashing linkcheck on http://localhost:4001/ can be done via linkcheck :4001.
linkcheck doesn't throttle itself on localhost.
linkcheck follows POSIX CLI standards (no @input and similar constructs like in linklint).
Brief and meaningful output
When everything works, you don't want to see a huge list of links.
- In this scenario, linkcheck just outputs 'Perfect' and some stats on a single line.
When things are broken, you want to see exactly where the problem is, and you want it sorted in a sane way.
linkcheck lists broken links by their source URL first so that you can fix many links at once. It also sorts the URLs alphabetically, and shows both the exact location of the link (line:column) and the anchor text (or the tag if it wasn't an anchor).
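The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not linkcheck's implementation: the BrokenLink record, its field names, and the report layout are all made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class BrokenLink(NamedTuple):
    # Hypothetical record; linkcheck's internal data model may differ.
    source: str   # page on which the broken link appears
    line: int     # position of the link in the source page
    column: int
    target: str   # URL the link points to
    label: str    # anchor text, or the tag name for non-anchor links


def group_by_source(links):
    """Return (source, links) pairs sorted by source URL, then by position."""
    grouped = defaultdict(list)
    for link in links:
        grouped[link.source].append(link)
    return [
        (source, sorted(items, key=lambda item: (item.line, item.column)))
        for source, items in sorted(grouped.items())
    ]


def format_report(links):
    """Render one block per source page (layout invented for illustration)."""
    lines = []
    for source, items in group_by_source(links):
        lines.append(source)
        for item in items:
            lines.append(
                f"- ({item.line}:{item.column}) '{item.label}' => {item.target}"
            )
    return "\n".join(lines)
```

Grouping by source page means that a single editing session on one page can fix every broken link reported for it.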
For CI builds, you want a non-zero exit code whenever there is a problem.
linkcheck returns status code 1 if there are warnings, and status code 2 if there are errors.
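A CI wrapper might interpret those status codes roughly as follows. This is a minimal sketch, assuming linkcheck is on your PATH and that status code 0 means no problems were found (the text above implies this but does not state it). The helper names and the allow_warnings switch are hypothetical.

```python
import subprocess

# Status codes as described above. Treating 0 as "ok" is an assumption.
OUTCOMES = {0: "ok", 1: "warnings", 2: "errors"}


def classify(exit_code: int) -> str:
    """Map a linkcheck status code to a short label."""
    return OUTCOMES.get(exit_code, "unknown")


def should_fail_build(exit_code: int, allow_warnings: bool = False) -> bool:
    """Decide whether the CI job should fail for the given status code."""
    outcome = classify(exit_code)
    if outcome == "ok":
        return False
    if outcome == "warnings":
        return not allow_warnings
    # Errors and any unexpected status code fail the build.
    return True


def run_linkcheck(args):
    """Run linkcheck with the given arguments and return its status code."""
    return subprocess.run(["linkcheck", *args]).returncode
```

A build script could call run_linkcheck([":4000"]) and exit non-zero when should_fail_build returns True. Whether warnings should break the build is a policy choice for your project, which is why the sketch makes it a parameter.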
It goes without saying that
linkcheck honors robots.txt and throttles itself
when accessing websites.
Step 1. Install Dart
Full installation guides per platform:
For example, on a Mac, assuming you have Homebrew, you just run:
$ brew tap dart-lang/dart
$ brew install dart
Step 2. Install linkcheck
Once Dart is installed, run:
$ pub global activate linkcheck
Pub installs executables into
~/.pub-cache/bin, which may not be on your path.
You can fix that by adding the following to your shell's config file (.bashrc, .bash_profile, etc.):
export PATH="$PATH":"$HOME/.pub-cache/bin"
Then either restart the terminal or run
source ~/.bash_profile (assuming
~/.bash_profile is where you put the PATH export above).
If in doubt, run
linkcheck -h. Here are some examples to get you started.
linkcheck without arguments will try to crawl
http://localhost:8080/ (which is the most common local server URL).
- linkcheck to crawl the site and ignore external links
- linkcheck -e to try external links
If you run your local server on http://localhost:4000/, for example, you can do:
- linkcheck :4000 to crawl the site and ignore external links
- linkcheck :4000 -e to try external links
linkcheck will not throttle itself when accessing localhost. It will go as
fast as possible.
- linkcheck www.example.com to crawl www.example.com and ignore external links
- linkcheck https://www.example.com to start directly on https
- linkcheck www.example.com www.other.com to crawl both sites and check links between the two (but ignore external links outside those two sites)
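To make the shorthand in the examples above concrete, here is a rough Python sketch of how such arguments could be expanded into full URLs. It is a guess at the behavior based only on the examples in this section, not linkcheck's actual parsing code. The function names, the http default for bare hostnames, and the trailing slash are assumptions.

```python
DEFAULT_TARGET = "http://localhost:8080/"  # default stated in this README


def expand_target(arg: str) -> str:
    """Expand a shorthand argument into a full URL (illustrative only)."""
    if arg.startswith(":"):
        # ":4000" is shorthand for a port on localhost.
        return f"http://localhost{arg}/"
    if "://" not in arg:
        # Assumption: a bare hostname starts on plain http.
        return f"http://{arg}/"
    # A full URL, such as https://www.example.com, is used as given.
    return arg


def targets(args):
    """Return the URLs to crawl, falling back to the default when none given."""
    return [expand_target(a) for a in args] or [DEFAULT_TARGET]
```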
Many entry points
Assuming you have a text file
mysites.txt like this:
http://egamebook.com/
http://filiph.net/
https://alojz.cz/
You can run
linkcheck -i mysites.txt and it will crawl all of them and also
check links between them. This is useful for:
- Link-checking projects spanning many domains (or subdomains).
- Checking all your public websites / blogs / etc.
There's another use for this, and that is when you have a list of inbound links, like this:
http://www.dartlang.org/
http://www.dartlang.org/tools/
http://www.dartlang.org/downloads/
You probably want to make sure you never break your inbound links. For example, if a page changes URL, the previous URL should still work (redirecting to the new page when appropriate).
Where do you get a list of inbound links? Try your site's sitemap.xml as a starting point and, additionally, something like the Google Webmaster Tools' crawl error page.
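If you start from sitemap.xml, a small script can turn it into an input file for the -i option. The sketch below assumes a standard sitemaps.org sitemap and an input file with one URL per line, as in the mysites.txt example. The function names and the sample sitemap are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample sitemap, reusing URLs from the inbound-links example.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.dartlang.org/</loc></url>
  <url><loc>http://www.dartlang.org/tools/</loc></url>
</urlset>
"""


def urls_from_sitemap(xml_text: str):
    """Return every <loc> URL found in the sitemap, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def to_input_file(urls) -> str:
    """Format URLs one per line, the layout assumed for linkcheck -i."""
    return "\n".join(urls) + "\n"
```

Writing the result of to_input_file(urls_from_sitemap(...)) to a text file gives you something you can pass to linkcheck -i.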
Sometimes, it is legitimate to ignore some failing URLs. This is done via
the --skip-file option.
Let's say you're working on a site and a significant portion of it is currently
under construction. You can create a file called my_skip_file.txt, for
example, and fill it with regular expressions like so:
    # Lines starting with a hash are comments.

    admin/
    \.s?css$
    \#info
The file above includes a comment on line 1 which will be ignored. Line 2 is
blank and will be ignored as well. Line 3 contains a broad regular expression
that will make linkcheck ignore any link to a URL containing admin/
anywhere in it. Line 4 shows that there is full support for
regular expressions: it will ignore URLs ending with .css and
.scss. Line 5 shows the only special escape sequence.
If you need to start your regular expression with a #
(which linkcheck would normally parse as a comment) you can
precede the # with a backslash (\). This will force linkcheck
not to ignore the line. In this case, the regular expression on line 5
will match #info anywhere in the URL.
To use this file, you run linkcheck like this:
linkcheck example.com --skip-file my_skip_file.txt
Regular expressions are hard. If unsure, use the
-d option to see what URLs
your skip file is ignoring, exactly.
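If you want to reason about a skip file before running the crawl, the rules described above can be approximated in Python. This is a sketch of the documented behavior, not linkcheck's own parser. It assumes surrounding whitespace is insignificant, and it uses Python's re module, whose syntax differs in places from the Dart regular expressions linkcheck uses. The function names are hypothetical.

```python
import re

# The example skip file from above.
EXAMPLE = r"""# Lines starting with a hash are comments.

admin/
\.s?css$
\#info
"""


def parse_skip_file(text: str):
    """Compile one regular expression per non-blank, non-comment line."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("\\#"):
            line = line[1:]  # escaped hash: keep the line, drop the backslash
        patterns.append(re.compile(line))
    return patterns


def is_skipped(url: str, patterns) -> bool:
    """A URL is skipped when any pattern matches anywhere in it."""
    return any(p.search(url) for p in patterns)
```

For example, is_skipped("http://example.com/admin/users", parse_skip_file(EXAMPLE)) is true because of the admin/ pattern. For the authoritative answer on what linkcheck itself skips, the -d option remains the tool to use.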