HTMLKit

An Objective-C framework for your everyday HTML needs.

Quick Overview

HTMLKit is a WHATWG specification-compliant framework for parsing and serializing HTML documents and document fragments for iOS and OSX. HTMLKit parses real-world HTML the same way modern web browsers would.

HTMLKit provides a rich DOM implementation for manipulating and navigating the document tree. It also understands CSS3 selectors, which makes selecting nodes and querying the DOM a piece of cake.

DOM Validation

DOM mutations are validated as described in the WHATWG DOM Standard. Invalid DOM manipulations throw hierarchy-related exceptions. You can disable these validations by defining the HTMLKIT_NO_DOM_CHECKS compiler constant, which also improves performance by about 20-30%.
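
As a rough sketch of what a rejected mutation looks like, the snippet below tries to append an element to its own descendant, which the DOM Standard forbids because it would create a cycle. The helper name `HKDemoCycleIsRejected` is made up for illustration, and the framework is assumed to be importable via its umbrella header. The code deliberately catches a generic NSException instead of assuming a specific exception name, and it assumes a default build without HTMLKIT_NO_DOM_CHECKS.

```objc
#import <Foundation/Foundation.h>
#import <HTMLKit/HTMLKit.h>

// Hypothetical helper: returns YES if HTMLKit rejected the invalid mutation.
static BOOL HKDemoCycleIsRejected(void) {
	HTMLElement *parent = [[HTMLElement alloc] initWithTagName:@"div"];
	HTMLElement *child = [[HTMLElement alloc] initWithTagName:@"p"];
	[parent appendNode:child];

	@try {
		// Invalid: parent is an ancestor of child, so this would create a cycle.
		[child appendNode:parent];
	} @catch (NSException *exception) {
		NSLog(@"Rejected DOM mutation: %@ (%@)", exception.name, exception.reason);
		return YES;
	}
	return NO;
}
```

With the checks compiled out, no exception would be raised here, so only disable them for input and mutations you already trust.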

Tests

HTMLKit passes all of the HTML5Lib Tokenizer and Tree Construction tests. The html5lib-tests repository is configured as a git submodule, so if you plan to run the tests, do not forget to pull it too.

The CSS3 Selector implementation is tested with an adapted version of the CSS3 Selectors Test Suite, ignoring the tests that require user interaction, session history, and scripting.

Does it Swift?

Check out the playground!

Installation

Carthage

Carthage is a decentralized dependency manager that builds your dependencies and provides you with binary frameworks.

If you don't have Carthage yet, you can install it with Homebrew using the following commands:

$ brew update
$ brew install carthage

To add HTMLKit as a dependency into your project using Carthage just add the following line in your Cartfile:

github "iabudiab/HTMLKit"

Then run the following command to build the framework and drag the built HTMLKit.framework into your Xcode project.

$ carthage update

CocoaPods

CocoaPods is a dependency manager for Cocoa projects.

If you don't have CocoaPods yet, you can install it with the following command:

$ gem install cocoapods

To add HTMLKit as a dependency into your project using CocoaPods just add the following in your Podfile:

target 'MyTarget' do
  pod 'HTMLKit', '~> 4.2'
end

Then, run the following command:

$ pod install

Swift Package Manager

Swift Package Manager is the package manager for the Swift programming language.

Add HTMLKit to your Package.swift dependencies:

.package(url: "https://github.com/iabudiab/HTMLKit", .upToNextMajor(from: "4.0.0")),

Then run:

$ swift build

Manually

1- Add HTMLKit as git submodule

$ git submodule add https://github.com/iabudiab/HTMLKit.git

2- Open the HTMLKit folder and drag'n'drop the HTMLKit.xcodeproj into the Project Navigator in Xcode to add it as a sub-project.

3- In the General panel of your target, add HTMLKit.framework under Embedded Binaries.

Parsing

Parsing Documents

Given some HTML content, you can parse it either via the HTMLParser or by instantiating an HTMLDocument directly:

NSString *htmlString = @"<div><h1>HTMLKit</h1><p>Hello there!</p></div>";

// Option 1: via the parser
HTMLParser *parser = [[HTMLParser alloc] initWithString:htmlString];
HTMLDocument *parsedDocument = [parser parseDocument];

// Option 2: via the static initializer (yields an equivalent document)
HTMLDocument *document = [HTMLDocument documentWithString:htmlString];

Parsing Fragments

You can also parse HTML content as a document fragment with a specified context element:

NSString *htmlString = @"<div><h1>HTMLKit</h1><p>Hello there!</p></div>";

HTMLParser *parser = [[HTMLParser alloc] initWithString: htmlString];

HTMLElement *tableContext = [[HTMLElement alloc] initWithTagName:@"table"];
NSArray *nodes = [parser parseFragmentWithContextElement:tableContext];

for (HTMLNode *node in nodes) {
	NSLog(@"%@", node.outerHTML);
}

// The same parser instance can be reused:
HTMLElement *bodyContext = [[HTMLElement alloc] initWithTagName:@"body"];
nodes = [parser parseFragmentWithContextElement:bodyContext];

The DOM

The DOM tree can be manipulated in several ways. Here are just a few:

  • Create new elements and assign attributes
HTMLElement *description = [[HTMLElement alloc] initWithTagName:@"meta" attributes:@{@"name": @"description"}];
description[@"content"] = @"HTMLKit for iOS & OSX";
  • Append nodes to the document
HTMLElement *head = document.head;
[head appendNode:description];

HTMLElement *body = document.body;
NSArray *nodes = @[
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"red"}],
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"green"}],
	[[HTMLElement alloc] initWithTagName:@"div" attributes: @{@"class": @"blue"}]
];
[body appendNodes:nodes];
  • Enumerate child elements and perform DOM editing
[body enumerateChildElementsUsingBlock:^(HTMLElement *element, NSUInteger idx, BOOL *stop) {
	if ([element.tagName isEqualToString:@"div"]) {
		HTMLElement *lorem = [[HTMLElement alloc] initWithTagName:@"p"];
		lorem.textContent = [NSString stringWithFormat:@"Lorem ipsum: %lu", (unsigned long)idx];
		[element appendNode:lorem];
	}
}];
  • Remove nodes from the document
[body removeChildNodeAtIndex:1];
[head removeAllChildNodes];
[body.lastChild removeFromParentNode];
  • Navigate to child and sibling nodes
// Assumes the three divs appended above are still children of body
HTMLNode *firstChild = body.firstChild;
HTMLNode *greenDiv = firstChild.nextSibling;
  • Manipulate the HTML directly
greenDiv.innerHTML = @"<ul><li>item 1<li>item 2";
  • Iterate the DOM tree with custom filters
HTMLNodeFilterBlock *filter = [HTMLNodeFilterBlock filterWithBlock:^ HTMLNodeFilterValue (HTMLNode *node) {
	if (node.childNodesCount != 1) {
		return HTMLNodeFilterReject;
	}
	return HTMLNodeFilterAccept;
}];

for (HTMLElement *element in [body nodeIteratorWithShowOptions:HTMLNodeFilterShowElement filter:filter]) {
	NSLog(@"%@", element.outerHTML);
}
  • Create and manipulate DOM Ranges
HTMLDocument *document = [HTMLDocument documentWithString:@"<div><h1>HTMLKit</h1><p id='foo'>Hello there!</p></div>"];
HTMLRange *range = [[HTMLRange alloc] initWithDocument:document];

HTMLNode *paragraph = [document querySelector:@"#foo"];
[range selectNode:paragraph];
[range extractContents];
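
To make the effect of the range example more visible, here is a small sketch that extracts the paragraph and then inspects what remains. It assumes that `extractContents` follows the DOM Standard and returns the removed nodes wrapped in a fragment; the result is typed as a plain `HTMLNode` so the sketch does not depend on the concrete class. The helper name `HKDemoExtractParagraph` is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <HTMLKit/HTMLKit.h>

// Hypothetical helper: extracts the paragraph and returns what is left in <body>.
static NSString *HKDemoExtractParagraph(void) {
	HTMLDocument *document = [HTMLDocument documentWithString:@"<div><h1>HTMLKit</h1><p id='foo'>Hello there!</p></div>"];
	HTMLRange *range = [[HTMLRange alloc] initWithDocument:document];

	HTMLNode *paragraph = [document querySelector:@"#foo"];
	[range selectNode:paragraph];

	// Assumption: the extracted nodes come back inside a fragment node.
	HTMLNode *extracted = [range extractContents];
	NSLog(@"Extracted text: %@", extracted.textContent);

	// The paragraph should no longer be part of the document tree.
	return document.body.innerHTML;
}
```

Under these assumptions the returned markup should no longer contain the `#foo` paragraph, while the extracted node can be appended elsewhere in the tree.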

CSS3 Selectors

All CSS3 Selectors are supported except for the pseudo-elements (::first-line, ::first-letter, etc.). You can use them the way you always have:

// Given the document:
NSString *htmlString = @"<div><h1>HTMLKit</h1><p class='greeting'>Hello there!</p><p class='description'>This is a demo of HTMLKit</p></div>";
HTMLDocument *document = [HTMLDocument documentWithString: htmlString];

// Here are some of the supported selectors
NSArray *paragraphs = [document querySelectorAll:@"p"];
NSArray *paragraphsOrHeaders = [document querySelectorAll:@"p, h1"];
NSArray *hasClassAttribute = [document querySelectorAll:@"[class]"];
NSArray *greetings = [document querySelectorAll:@".greeting"];
NSArray *classNameStartsWith_de = [document querySelectorAll:@"[class^='de']"];

NSArray *hasAdjacentHeader = [document querySelectorAll:@"h1 + *"];
NSArray *hasSiblingHeader = [document querySelectorAll:@"h1 ~ *"];
NSArray *hasSiblingParagraph = [document querySelectorAll:@"p ~ *"];

NSArray *nonParagraphChildOfDiv = [document querySelectorAll:@"div :not(p)"];
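
The query results are ordinary arrays of elements, so they can be consumed with fast enumeration. The following sketch collects the text of every match; the helper name `HKDemoTexts` is hypothetical, and only calls already shown in this document (`documentWithString:`, `querySelectorAll:`, `textContent`) are used.

```objc
#import <Foundation/Foundation.h>
#import <HTMLKit/HTMLKit.h>

// Hypothetical helper: collects the text of every element matching a selector.
static NSArray<NSString *> *HKDemoTexts(NSString *html, NSString *selector) {
	HTMLDocument *document = [HTMLDocument documentWithString:html];
	NSMutableArray<NSString *> *texts = [NSMutableArray array];

	for (HTMLElement *element in [document querySelectorAll:selector]) {
		[texts addObject:element.textContent];
	}
	return texts;
}
```

For example, calling `HKDemoTexts(htmlString, @"p.greeting")` with the document above should yield a single entry containing the greeting text.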

HTMLKit also provides an API to create selector instances in a type-safe manner without the need to parse them first. The previous examples would look like this:

NSArray *paragraphs = [document elementsMatchingSelector:typeSelector(@"p")];
NSArray *paragraphsOrHeaders = [document elementsMatchingSelector:
	anyOf(@[
		typeSelector(@"p"), typeSelector(@"h1")
	])
];

NSArray *hasClassAttribute = [document elementsMatchingSelector:hasAttributeSelector(@"class")];
NSArray *greetings = [document elementsMatchingSelector:classSelector(@"greeting")];
NSArray *classNameStartsWith_de = [document elementsMatchingSelector:attributeSelector(CSSAttributeSelectorBegins, @"class", @"de")];

NSArray *hasAdjacentHeader = [document elementsMatchingSelector:adjacentSiblingSelector(typeSelector(@"h1"))];
NSArray *hasSiblingHeader = [document elementsMatchingSelector:generalSiblingSelector(typeSelector(@"h1"))];
NSArray *hasSiblingParagraph = [document elementsMatchingSelector:generalSiblingSelector(typeSelector(@"p"))];

NSArray *nonParagraphChildOfDiv = [document elementsMatchingSelector:
	allOf(@[
		childOfElementSelector(typeSelector(@"div")),
		not(typeSelector(@"p"))
	])
];

Here are more examples:

HTMLNode *firstDivElement = [document firstElementMatchingSelector:typeSelector(@"div")];

NSArray *secondChildOfDiv = [firstDivElement querySelectorAll:@":nth-child(2)"];
NSArray *secondOfType = [firstDivElement querySelectorAll:@":nth-of-type(2n)"];

secondChildOfDiv = [firstDivElement elementsMatchingSelector:nthChildSelector(CSSNthExpressionMake(0, 2))];
secondOfType = [firstDivElement elementsMatchingSelector:nthOfTypeSelector(CSSNthExpressionMake(2, 0))];

NSArray *notParagraphAndNotDiv = [firstDivElement querySelectorAll:@":not(p):not(div)"];
notParagraphAndNotDiv = [firstDivElement elementsMatchingSelector:
	allOf(@[
		not(typeSelector(@"p")),
		not(typeSelector(@"div"))
	])
];

One more thing! You can also create your own selectors. You can either subclass CSSSelector or just use the block-based wrapper. For example, the previous selector can be implemented like this:

CSSSelector *myAwesomeSelector = namedBlockSelector(@"myAwesomeSelector", ^BOOL (HTMLElement *element) {
	return ![element.tagName isEqualToString:@"p"] && ![element.tagName isEqualToString:@"div"];
});
notParagraphAndNotDiv = [firstDivElement elementsMatchingSelector:myAwesomeSelector];

Change Log

See the CHANGELOG.md for more info.

License

HTMLKit is available under the MIT license. See the LICENSE file for more info.