I have all the HTML code for a page of the Vermont Lottery's website (this page). In that page, there's this giant table:


[Image: lottery prize table]

I am fetching the HTML of that page successfully with this code (in a Playground):


import UIKit
import Foundation
import PlaygroundSupport

let url:URL = URL(string: "https://vtlottery.com/games/instant-tickets/outstanding-prizes")!
let session = URLSession.shared

let request = NSMutableURLRequest(url: url)
request.httpMethod = "POST"
request.cachePolicy = NSURLRequest.CachePolicy.reloadIgnoringCacheData

let paramString = "data=Hello"
request.httpBody = paramString.data(using: String.Encoding.utf8)

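let task = session.dataTask(with: request as URLRequest) {
    (data, response, error) in

    // Bail out on a transport error or a missing response
    guard let data = data, let _: URLResponse = response, error == nil else {
        print("error")
        return
    }

    let dataString = NSString(data: data, encoding: String.Encoding.utf8.rawValue)
    print(dataString ?? "")
}

// The task does not start until it is resumed
task.resume()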

PlaygroundPage.current.needsIndefiniteExecution = true

How can I go about fetching the data from the table in such a way that I can turn it into a giant JSON hash table (or similar)?


Basically, I want the HTML code of that table to turn into something like:


        "row 1": {
            "Game #": "1333",
            "Game Name": "Money",
            "Top Prizes": ["$150000", "$5000", "$500"],
            "Unclaimed Top Prizes": ["2", "16", "246"],
            "Total Unclaimed": "$2479510",
            "% Sold": "1",
            "# Of Tickets": "183600"
        },
        "row 2": {
            "Game #": "1339",
            "Game Name": "Diamonds & Pearls",
            "Top Prizes": ["$50000", "$5000", "$500"],
            "Unclaimed Top Prizes": ["3", "5", "174"],
            "Total Unclaimed": "$1264925",
            "% Sold": "4",
            "# Of Tickets": "201650"
        }, ... (etc.)

How can I begin to go about parsing that data?


1 Answer



You need an HTML parser to go through the response HTML. Do not even think about using regex. The answer below uses HTMLReader, which you can add to your project via CocoaPods. The Playground kept crashing on me, so I converted your code to an IBAction instead:


import UIKit
import MapKit
import HTMLReader

class ViewController: UIViewController {
    // Parsed rows keyed by "row 1", "row 2", ...; values are strings or string arrays
    var result = [String: [String: Any]]()

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

    @IBAction func loadHTML(_ sender: AnyObject) {
        let url = URL(string: "https://vtlottery.com/games/instant-tickets/outstanding-prizes")!
        let session = URLSession.shared

        let request = NSMutableURLRequest(url: url)
        request.httpMethod = "POST"
        request.cachePolicy = NSURLRequest.CachePolicy.reloadIgnoringCacheData

        let paramString = "data=Hello"
        request.httpBody = paramString.data(using: String.Encoding.utf8)

        let task = session.dataTask(with: request as URLRequest) { data, response, error in
            guard let data = data, let _ = response, error == nil else {
                print("error")
                return
            }

            var index = 0

            // Many columns have newlines, tabs or spaces around their
            // textual content. Here, define a character set to trim
            // them off
            let spaceCharacterSet = CharacterSet(charactersIn: "\n\t ")

            let html = HTMLDocument(data: data, contentTypeHeader: "text/html; charset=utf-8")
            for node in html.nodes(matchingSelector: "#tblData tbody tr") {
                let columns = node.nodes(matchingSelector: "td")

                // Skip rows that do not have all eight expected cells
                guard columns.count >= 8 else { continue }

                let topPrizes = columns[3].nodes(matchingSelector: "p").map { $0.textContent }
                let unclaimedTopPrizes = columns[4].nodes(matchingSelector: "p").map { $0.textContent }

                // You have to open the Web Inspector in Safari to grab
                // the table's structure
                let rowData: [String: Any] = [
                    "Price"                : columns[0].textContent.trimmingCharacters(in: spaceCharacterSet),
                    "Game #"               : columns[1].textContent.trimmingCharacters(in: spaceCharacterSet),
                    "Game Name"            : columns[2].textContent.trimmingCharacters(in: spaceCharacterSet),
                    "Top Prizes"           : topPrizes,
                    "Unclaimed Top Prizes" : unclaimedTopPrizes,
                    "Total Unclaimed"      : columns[5].textContent.trimmingCharacters(in: spaceCharacterSet),
                    "% Sold"               : columns[6].textContent.trimmingCharacters(in: spaceCharacterSet),
                    "# of tickets"         : columns[7].textContent.trimmingCharacters(in: spaceCharacterSet)
                ]

                index += 1
                self.result["row \(index)"] = rowData
            }

            // It's better if you put a breakpoint on the line below.
            // Swift 3's logging is too verbose at the moment.
            print(self.result["row 1"]?["Unclaimed Top Prizes"] ?? "row 1 not found")
        }

        // The task does not start until it is resumed
        task.resume()
    }
}
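
Since the question asks for a JSON hash table, here is a minimal sketch of how a dictionary shaped like `result` could be turned into a JSON string with Foundation's `JSONSerialization`. The sample row is copied from the question's desired output, and `makeJSONString(from:)` is a hypothetical helper name, not part of HTMLReader or of the code above. It assumes every value is a string or an array of strings, which is what the parsing loop produces.

```swift
import Foundation

// Sample data copied from the question's desired output. In the view
// controller above, this would be `self.result` after the task finishes.
let result: [String: [String: Any]] = [
    "row 1": [
        "Game #": "1333",
        "Game Name": "Money",
        "Top Prizes": ["$150000", "$5000", "$500"],
        "Unclaimed Top Prizes": ["2", "16", "246"],
        "Total Unclaimed": "$2479510",
        "% Sold": "1",
        "# Of Tickets": "183600"
    ]
]

// Hypothetical helper: encodes the rows as a pretty-printed JSON string,
// or returns nil if the dictionary cannot be represented as JSON.
func makeJSONString(from rows: [String: [String: Any]]) -> String? {
    // JSONSerialization only accepts strings, numbers, arrays,
    // dictionaries and NSNull, so validate before encoding.
    guard JSONSerialization.isValidJSONObject(rows),
          let data = try? JSONSerialization.data(withJSONObject: rows,
                                                 options: [.prettyPrinted])
    else { return nil }
    return String(data: data, encoding: .utf8)
}

if let json = makeJSONString(from: result) {
    print(json)
}
```

JSON objects are unordered, so the keys may not print in the order they were inserted. If row order matters, an array of row dictionaries is probably a better fit than "row 1", "row 2" keys.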


