This package has been renamed to 'html'.

Future releases of this package will be published under the html package.

To continue using html5lib without deprecation warnings, change your pubspec to depend on html5lib: '<=0.12.0'.

See the html package for details.

html5lib in Pure Dart

This is a pure Dart HTML5 parser. It's a port of html5lib from Python. Since it's 100% Dart, you can use it safely from a script or a server-side app.

Eventually the parse tree API will be compatible with dart:html, so the same code will work on the client and the server.

Installation

Add this to your pubspec.yaml (or create it):

dependencies:
  html5lib: any

Then run the Pub Package Manager (comes with the Dart SDK):

pub get

Usage

Parsing HTML is easy!

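import 'package:html5lib/parser.dart' show parse;
import 'package:html5lib/dom.dart';

main() {
  // The <body> and <a> tags are left unclosed here; the HTML5 parsing
  // algorithm closes them and still builds a complete document.
  var document = parse(
      '<body>Hello world! <a href="www.html5rocks.com">HTML5 rocks!');
  // Serialize the resulting tree back to HTML.
  print(document.outerHtml);
}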

You can pass either a String or a list of bytes to parse. There's also parseFragment for parsing a document fragment, and HtmlParser if you want lower-level control.
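
As a rough sketch of the other two input styles, the snippet below passes a list of bytes to parse and a partial snippet to parseFragment. The helper names fragmentText and bodyTextFromBytes are hypothetical, and the code assumes the pre-null-safety Dart SDKs this package targets. Using codeUnits as a stand-in for encoded bytes only works because the input is plain ASCII.

```dart
import 'package:html5lib/parser.dart' show parse, parseFragment;

// Hypothetical helper: parses a partial snippet (no <html> or <body>
// wrapper) and returns the text content of the resulting fragment.
String fragmentText(String html) {
  var fragment = parseFragment(html);
  return fragment.text;
}

// Hypothetical helper: parses a list of bytes instead of a String and
// returns the text content of the <body> element.
String bodyTextFromBytes(List<int> bytes) {
  var document = parse(bytes);
  return document.body.text;
}

main() {
  print(fragmentText('<li>one</li><li>two</li>'));
  // For ASCII-only input, the code units match the encoded bytes.
  print(bodyTextFromBytes('<p>Hello bytes</p>'.codeUnits));
}
```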

Running Tests

# From Dart SVN checkout
./tools/build.py -m release
./tools/test.py -m release html5lib
./tools/test.py -m release -r drt html5lib

Libraries

dom

A simple tree API for the result of parsing HTML. It is intended to be compatible with dart:html, but many types and APIs are still missing.
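
To give a feel for the tree API, here is a minimal sketch that looks up an element and reads one of its attributes. It assumes the dart:html-style querySelector method and attributes map are available in the version you depend on; firstLinkHref is a hypothetical helper, and the URL is only a placeholder.

```dart
import 'package:html5lib/parser.dart' show parse;

// Hypothetical helper: returns the href of the first <a> element,
// or null if the document contains no links.
String firstLinkHref(String html) {
  var document = parse(html);
  var link = document.querySelector('a');
  return link == null ? null : link.attributes['href'];
}

main() {
  print(firstLinkHref('<p>See <a href="http://example.com/">this</a>.'));
}
```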

dom_parsing

This library contains extra APIs that aren't in the DOM, but are useful when interacting with the parse tree.
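
As an illustration only, the sketch below uses two helpers that this library is assumed to export, isVoidElement and htmlSerializeEscape; check the API of the version you depend on before relying on them. The describeTag function is hypothetical.

```dart
import 'package:html5lib/dom_parsing.dart'
    show htmlSerializeEscape, isVoidElement;

// Hypothetical helper: reports whether a tag needs a closing tag.
String describeTag(String tagName) {
  return isVoidElement(tagName)
      ? '$tagName is a void element (no closing tag)'
      : '$tagName needs a closing tag';
}

main() {
  print(describeTag('br'));
  print(describeTag('div'));
  // Escape text so it can be embedded safely in serialized HTML.
  print(htmlSerializeEscape('a < b & c'));
}
```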

parser

This library provides a parser for HTML5 documents that lets you parse HTML easily from a script or a server-side application.
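
For lower-level control, a sketch along these lines constructs an HtmlParser directly and inspects the parse errors it collected. It assumes the errors list is populated after parse() returns; countParseErrors is a hypothetical helper.

```dart
import 'package:html5lib/parser.dart' show HtmlParser;

// Hypothetical helper: parses the input and returns how many parse
// errors the parser recorded along the way.
int countParseErrors(String html) {
  var parser = new HtmlParser(html);
  parser.parse();
  return parser.errors.length;
}

main() {
  // Input with no doctype and unclosed tags.
  print(countParseErrors('<p>Unclosed <b>tag'));
}
```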

parser_console

This library adds dart:io support to the HTML5 parser. Call initDartIOSupport before calling the parse methods and they will accept a RandomAccessFile as input, in addition to the other input types.
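
A minimal sketch of that flow might look like the following. The parseFile helper and the index.html path are hypothetical; the code simply opens a RandomAccessFile, hands it to parse, and closes it afterwards.

```dart
import 'dart:io';

import 'package:html5lib/parser.dart' show parse;
import 'package:html5lib/parser_console.dart' show initDartIOSupport;

// Hypothetical helper: parses the file at the given path and returns
// the serialized document.
String parseFile(String path) {
  var file = new File(path).openSync();
  try {
    return parse(file).outerHtml;
  } finally {
    // Release the file handle whether or not parsing succeeded.
    file.closeSync();
  }
}

main() {
  // Must be called once before a RandomAccessFile is passed to parse.
  initDartIOSupport();
  print(parseFile('index.html'));
}
```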