{"id":20425722,"url":"https://github.com/mapmeld/scraping-by-in-nodejs","last_synced_at":"2025-03-05T05:16:23.510Z","repository":{"id":57140616,"uuid":"44181872","full_name":"mapmeld/scraping-by-in-nodejs","owner":"mapmeld","description":"Tutorial for jQuery users to learn NodeJS by writing a scraper module","archived":false,"fork":false,"pushed_at":"2017-09-12T22:01:25.000Z","size":30,"stargazers_count":2,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-06-11T18:07:54.869Z","etag":null,"topics":["nodejs","scraping","tutorial"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mapmeld.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-13T14:27:05.000Z","updated_at":"2017-05-18T17:47:57.000Z","dependencies_parsed_at":"2022-09-05T01:31:10.135Z","dependency_job_id":null,"html_url":"https://github.com/mapmeld/scraping-by-in-nodejs","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapmeld%2Fscraping-by-in-nodejs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapmeld%2Fscraping-by-in-nodejs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapmeld%2Fscraping-by-in-nodejs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mapmeld%2Fscraping-by-in-nodejs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mapmeld","download_url":"https://codeload.github.com/mapmeld/scraping
-by-in-nodejs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241967055,"owners_count":20050331,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nodejs","scraping","tutorial"],"created_at":"2024-11-15T07:14:06.708Z","updated_at":"2025-03-05T05:16:23.491Z","avatar_url":"https://github.com/mapmeld.png","language":"JavaScript","readme":"# How To:\n## Scraping By in NodeJS\n\nThis tutorial is for newbies or near-newbies to NodeJS, who have written some jQuery before.\n\nBy the end of the tutorial, you will publish a site-scraping module on NPM - then you and other developers can get data out of that site by installing your module.\n\nThis repo contains the code for my own scraper module, which returns a list of world leaders from Wikipedia. You're welcome to look at the code or use it in a real application.\n\n### Thinking about what web servers are\n\nIf you're coming into this with 100% client-side experience, you might have questions about \"server-side JavaScript\". Most of my client-side\nwork was creating visualizations and maps on web pages, so at first I wondered, how would I load the Google Maps API into Node? How would I respond to events like click and drag?\nHow would people see my pages?\n\nA web server program is a hub designed for three things: figuring out what the user wants, finding that information, and responding to the user. 
It can be written in many languages, and JavaScript is just\nbeing repurposed to write this kind of program.\n\nLet's look at a pseudocode example:\n\n```javascript\n// the server waits until an event, like someone requesting a page on the website\n// this happens anytime someone follows a link or types in a URL\n// if two users arrive it will call this function twice, and handle them separately\n\n// pseudocode\nserver.onRequest = function (url) {\n  if (url == \"/\") {\n    // homepage - respond with static HTML page\n    send(homepage.html);\n  } else if (url == \"/profile/:username\") {\n    // look up this user in the database\n    database.findUser(username, function (userData) {\n      // after the database finds this user and reports back, we can continue responding\n      // the browser will see just this HTML or other text\n      send('hello ' + userData.name);\n    });\n    // don't return anything from this function - wait for the callback function to be called with data\n  }\n};\n```\n\nIn this code, we don't know what the website looks like, and we didn't write any code for the browser. The browser just receives HTML without knowing what happened inside the server.\n\nTo repeat from before: the server was designed around three things: figuring out what the user wants, finding that information, and responding to the user.\n\n### Thinking about NodeJS, servers, and modules\n\nNodeJS servers, especially frameworks like ExpressJS, are similar to that pseudocode. But before you write a server, it's easier to write a module. A module is a set of data and functions which you can import into other NodeJS programs. Using tools such as Browserify or WebPack, you could also use modules as client-side / browser libraries. 
There are also NodeJS command line tools and small hardware devices.\n\nIn this example, scraping a \u003ca href='https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government'\u003elist of world leaders from Wikipedia\u003c/a\u003e, there is a good use case to separate your app and the leader module. You could probably turn it into an interesting API, where people can request the full table, an individual country, or historical data.\n\nYour first guess might be to make a step-by-step program like this:\n\n```javascript\n// pseudocode!\nserver.onRequest = function(url) {\n  leaders = getLeaders();\n  send(leaders);\n};\n```\n\nThis won't work because code is fast, and connecting to Wikipedia takes a lot of time by comparison. Your program will:\n\n* ask Wikipedia for a list of world leaders\n* wait for Wikipedia to respond\n* wait to finish downloading the response\n* run some code to add each leader to a list\n\nMaking the server wait here in NodeJS is blocking new requests from coming in to your server.\n\nSo you want to write an asynchronous program. This will make a request, and use a new anonymous callback function to process the data when it's done:\n\n```javascript\n// pseudocode!\nserver.onRequest = function(url) {\n  getLeaders(function (leaders) {\n    send(leaders);\n  });\n};\n```\n\nBefore we wrote ```leaders = getLeaders()``` because the function returned data immediately and would store data in the ```leaders``` variable.  In the async version, nothing is returned and instead data is returned inside the callback function, after all of the internal work is completed.\n\n### Create your project with git init and npm init\n\nFirst, install NodeJS and git on your computer.\n\nRun this code in the command prompt:\n\n```bash\nmkdir world-leaders\ncd world-leaders\ngit init\nnpm init\n```\n\nnpm init will ask you some questions. 
You can type answers or press Enter to accept a suggestion / leave it blank.\n\nFor \"git repository\" you can leave it blank, or paste a URL for your GitHub repo.\n\nFor \"license\" you can review [several options](http://choosealicense.com/) for open-sourcing your code, but I typically use MIT.\n\nThese settings are stored in package.json, the main source for NPM's information about your module, its use, and the other libraries that it depends on to work. You can modify this file later directly, or re-run npm init.\n\n### Install Node modules as dependencies\n\nYou don't need to re-invent the wheel to download HTML from the web. In the command prompt, install the \"request\" module:\n\n```bash\nnpm install request --save\n```\n\nThis installs the latest version on npmjs.com for this module. Adding --save puts the module and its version into your package.json file, under \"dependencies\". Make sure to list any and all dependencies for your module in the package.json, so other developers can get them all at the same time.\n\nThis tutorial also uses Cheerio, which lets you use jQuery-like features. Let's install that one, too:\n\n```bash\nnpm install cheerio --save\n```\n\n### GET-ing a page\n\nI created a file named index.js to be my main script. The first thing it needs to do to scrape a webpage is to load the HTML source of a webpage as a string. 
Let's try that, and then use\nconsole.log to print it out to the command line and check if it worked.\n\n```javascript\n// real JavaScript, not pseudocode anymore\n// versions/1.js\n\n// load the module that I installed\nvar request = require('request');\n\n// request is a function that I can use like this:\n\nrequest(\"https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government\",\n  function (anyError, serverResponse, body) {\n  // when you're done getting the page, this gets called\n\n  if (anyError) {\n    // crash the script if there is an error\n    throw anyError;\n  } else {\n    // console.log will output to the command line\n    console.log(body);\n  }\n});\n```\n\nRun ```node index.js``` and see what happens.\n\nIf everything works, you should see a lot of HTML output to the command line. If you are offline and unable to connect to Wikipedia, you might see an error like this:\n\n```bash\nscraping-by-in-node-js/versions/1.js:14\n    throw anyError;\n    ^\n\nError: getaddrinfo ENOTFOUND en.wikipedia.org en.wikipedia.org:443\n    at errnoException (dns.js:26:10)\n    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:77:26)\n```\n\nThis is a good time to **check that the data that you want to scrape is in the HTML source**. 
If data is loaded later by JavaScript, as it is on Airbnb and other house-listing sites, then you should look up the right URL to request and scrape.\n\nHere's part of the HTML source for the head of state and government:\n\n```html\n\u003ctable class=\"wikitable\"\u003e\n...\n\u003ctr\u003e\n  \u003cth style=\"font-weight:normal; text-align:left;\"\u003e\n    \u003ca href=\"/wiki/United_Kingdom\" title=\"United Kingdom\"\u003eUnited Kingdom\u003c/a\u003e\n  \u003c/th\u003e\n  \u003ctd\u003e\n    \u003ca href=\"/wiki/Monarchy_of_the_United_Kingdom\" title=\"Monarchy of the United Kingdom\"\u003eQueen\u003c/a\u003e\u0026#160;–\n    \u003ca href=\"/wiki/Elizabeth_II\" title=\"Elizabeth II\"\u003eElizabeth II\u003c/a\u003e\n    \u003csup id=\"cite_ref-ERII_3-15\" class=\"reference\"\u003e\n      \u003ca href=\"#cite_note-ERII-3\"\u003e\n        \u003cspan\u003e[\u003c/span\u003en 3\u003cspan\u003e]\u003c/span\u003e\n      \u003c/a\u003e\n    \u003c/sup\u003e\n  \u003c/td\u003e\n  \u003ctd style=\"background-color:LightYellow;\"\u003e\n    \u003ca href=\"/wiki/Prime_Minister_of_the_United_Kingdom\" title=\"Prime Minister of the United Kingdom\"\u003ePrime Minister\u003c/a\u003e\u0026#160;–\n    \u003ca href=\"/wiki/David_Cameron\" title=\"David Cameron\"\u003eDavid Cameron\u003c/a\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n  \u003cth style=\"font-weight:normal; text-align:left;\"\u003e\n    \u003ca href=\"/wiki/United_States\" title=\"United States\"\u003eUnited States\u003c/a\u003e\n  \u003c/th\u003e\n  \u003ctd colspan=\"2\" style=\"background-color:LightYellow;\"\u003e\n    \u003cdiv align=\"center\"\u003e\n      \u003ca href=\"/wiki/President_of_the_United_States\" title=\"President of the United States\"\u003ePresident\u003c/a\u003e\u0026#160;–\n      \u003ca href=\"/wiki/Barack_Obama\" title=\"Barack Obama\"\u003eBarack Obama\u003c/a\u003e\n    \u003c/div\u003e\n  \u003c/td\u003e\n\u003c/tr\u003e\n...\n\u003c/table\u003e\n```\n\nThe HTML has some 
interesting data: the country's name and article, the position's name and Wikipedia article, and the current name and Wikipedia article for that leader. We can see that each country gets a ```tr``` element, and the leader gets a ```td``` element, which can be two columns wide if -like in the US- the leader is head of state and head of government.\n\n### Using jQuery to get leader names\n\nIn jQuery, if you wanted to get a list of leader ```td``` elements from that HTML, you would write a selector such as this:\n\n```javascript\n$(\"table.wikitable td\")\n```\n\nOn Wikipedia, you can open the developer tools and test this yourself.\n\nThere are several jQuery and modern-JavaScript functions you could use to iterate, but to avoid overcomplicating things, let's write a for loop:\n\n```javascript\nvar leaders = $(\"table.wikitable td\");\nfor (var i = 0; i \u003c leaders.length; i++) {\n  console.log( $(leaders[i]).text() );\n}\n```\n\nAlthough some of the leaders' names have additional text, it's mostly good:\n\n```\nPresident – Park Geun-hye[n 1]\nPrime Minister – Hwang Kyo-ahn\n```\n\n### Translating jQuery to Cheerio to get leader names\n\nLet's go back to index.js and start using the cheerio module that we installed. 
It's a good idea to open up \u003ca href=\"https://www.npmjs.com/package/cheerio\"\u003ethe official documentation\u003c/a\u003e for this module as a reference.\n\n```javascript\n// versions/2.js\n\n// load both modules now\nvar request = require('request');\nvar cheerio = require('cheerio');\n\nrequest(\"https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government\",\n  function (anyError, serverResponse, body) {\n  // when you're done getting the page, this gets called\n\n  if (anyError) {\n    throw anyError;\n  } else {\n    // body is just a string - let's turn it into a jQuery-like object\n    // declare $ with var so it doesn't leak out as a global variable\n    var $ = cheerio.load(body);\n\n    // now let's run that same for loop that we used in the browser\n    var leaders = $(\"table.wikitable td\");\n    for (var i = 0; i \u003c leaders.length; i++) {\n      console.log( $(leaders[i]).text() );\n    }\n  }\n});\n```\n\nCool! It should work the same in NodeJS as in the browser!\n\n### Returning data instead of logging\n\nIf I want this scraper to become a re-usable module, I need to hide these implementation details someplace and create a single function that other users can call.\n\nI'm going to name this function scrapeData. Because it has asynchronous code requesting a page inside of it, scrapeData also needs to be asynchronous. I will pass the data back through a callback function instead of trying to use \"return\". 
This callback function will be the new way to handle errors and world leader data.\n\n```javascript\n// versions/3.js\n\nvar request = require('request');\nvar cheerio = require('cheerio');\n\nfunction scrapeData (callback) {\n  request(\"https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government\",\n    function (anyError, serverResponse, body) {\n    if (anyError) {\n      // pass the error to the caller instead of throwing it\n      callback(anyError, null);\n    } else {\n      var $ = cheerio.load(body);\n\n      var leaders = $(\"table.wikitable td\");\n      var leaderData = [];\n      for (var i = 0; i \u003c leaders.length; i++) {\n        leaderData.push( $(leaders[i]).text() );\n      }\n      // no error on this branch, so the first argument is null\n      callback(null, leaderData);\n    }\n  });\n}\n```\n\nCode which calls scrapeData and handles its responses would look like this:\n\n```javascript\n// we've separated the reusable scraping and DOM manipulation from the application and output\nscrapeData(function (anyError, leaders) {\n  if (anyError) {\n    throw anyError;\n  } else {\n    console.log(leaders);\n  }\n});\n```\n\nWhen I run ```node index.js```, I now get an array of leader names:\n\n```\n...\n'Prime Minister – Ralph Gonsalves',\n'O le Ao o le Malo – Tufuga Efi',\n'Prime Minister – Tuilaepa Aiono Sailele Malielegaoi',\n'\\nCaptain Regent – Lorella Stefanelli[n 8]\\n',\n...\n```\n\n### Make your data awesome and structured\n\nIf you want this module to be useful to you and others, keep pushing to organize and improve the data.\n\nA list of leader names is interesting, but we can aim for cleaner, more structured JSON data to be returned. 
I'd like it to organize leaders by country, and handle unusual cases such as countries with multiple heads of state.\n\nSomething like this:\n\n```javascript\n[  // array of countries\n  {\n    \"country\": \"United States\",\n    \"wiki\": \"https://en.wikipedia.org/wiki/United_States\",\n    \"heads_of_state\": [ // array of heads of state\n      {\n        \"title\": {\n          \"name\": \"President\",\n          \"wiki\": \"https://en.wikipedia.org/wiki/President_of_the_United_States\"\n        },\n        \"person\": {\n          \"name\": \"Barack Obama\",\n          \"wiki\": \"https://en.wikipedia.org/wiki/Barack_Obama\"\n        }\n      }\n    ],\n    \"heads_of_government\": [ // array of heads of government (can be same as head of state)\n      {\n        \"title\": {\n          \"name\": \"President\",\n          \"wiki\": \"https://en.wikipedia.org/wiki/President_of_the_United_States\"\n        },\n        \"person\": {\n          \"name\": \"Barack Obama\",\n          \"wiki\": \"https://en.wikipedia.org/wiki/Barack_Obama\"\n        }\n      }\n    ]\n  }\n]\n```\n\nOnce you know what you want the response to look like, you can use Cheerio to start pulling more data from the HTML and inserting it into this structure. My final code is about 100 lines, most of it is specific to this Wikipedia page and my scraper. 
Here are a few snippets which show you how much Cheerio looks like jQuery:\n\n```javascript\nif (country.find(\"td\").length \u003e 1) {\n  // head of government and heads of state \u003ctd\u003es exist\n  heads_of_government = $(country.find(\"td\")[1]);\n} else {\n  // head of government is the same as heads of state, copy object\n  heads_of_government = heads_of_state;\n}\n...\n// get the URL of a link\nwikiLink = title.find(\"a\").attr(\"href\");\n...\n// remove an element\nperson.find(\"sup\").remove();\n```\n\nYou can view the full code here: \u003ca href=\"https://github.com/mapmeld/scraping-by-in-nodejs/blob/master/versions/4.js\"\u003ehttps://github.com/mapmeld/scraping-by-in-nodejs/blob/master/versions/4.js\u003c/a\u003e\n\n### Turning your script into a module\n\nOn the client-side, JavaScript programs are a collection of libraries and scripts. In NodeJS, you want to publish\nall of your code as a reusable module. For example, your code uses the request and cheerio modules. As you've seen with request and cheerio, once you've installed a module you can use it like this:\n\n```javascript\nvar request = require(\"request\");\nrequest(\"http://example.com\", function(anyError, serverResponse, body) { ... });\n```\n\nLet's share your scrapeData function so that people can ```npm install``` your module someday.\nIn the script, add this line to make the module's main export your scrapeData function.\n\n```javascript\nmodule.exports = scrapeData;\n```\n\nYour full file is now looking like this:\n\n```javascript\nvar request = require('request');\nvar cheerio = require('cheerio');\n\nfunction scrapeData (callback) { ... 
}\n\n// remove the part after the function where you called scrapeData...\n// now scrapeData is called by people who use this module\n\nmodule.exports = scrapeData;\n```\n\nYou can test your module in the Node REPL:\n\nOn your command line, type ```node```, enter, and then enter these lines:\n\n```javascript\ngetLeaders = require('./index.js');\n\u003e [Function: scrapeData]\ngetLeaders(function(err, data) { console.log(JSON.stringify(data)) })\n```\n\nIf everything went OK, you should first get an \"undefined\" response, from your getLeaders call not returning anything synchronously, then when the request finishes your data will come out.\n\n### Publishing your node module\n\nGo to npmjs.com and create an account. Confirm your e-mail.\n\nOpen package.json again to make sure you have a good name and version number for your module.\n\nThen, on the command line, run ```npm publish```. You will be asked to log in.\n\nIf everything goes well, you should have a module listed at npmjs.com/package/MODULENAME and it should be possible\nfor others to download it by running ```npm install MODULENAME``` or including it as a dependency in\n*their* package.json file.\n\nWhen you want to update the module, re-open package.json, increase your version number, and re-run ```npm publish```. You cannot re-publish a module without changing the version number, because that would be confusing.\n\n### Modules with multiple functions\n\nIn that example, we have one ```scrapeData``` function and, as with ```request```, we make the module\ncontain just one function.\n\nWhat if you want your one module to be a little smarter, and have multiple functions and options?\nSuppose I want to return leaders for a specific country, too, or look up the country where a specific\nleader is from.\n\nYou can add a new function and reorganize module.exports so you share both functions:\n\n```javascript\nfunction scrapeData() { ... }\n\nfunction fromCountry() { ... 
}\n\nmodule.exports = {\n  all: scrapeData,\n  fromCountry: fromCountry\n};\n```\n\nYou can also include JSON data in your exports. This isn't done so often, but it's helpful if your module comes with an interesting dataset.\n\n```javascript\nmodule.exports = {\n  all: scrapeData,\n  fromCountry: fromCountry,\n  credit: \"CC-BY-SA Wikipedia.org\"\n};\n```\n\nIf someone is writing a script, and they have your module installed, they can still use ```require(\"MODULENAME\")```,\nbut they need to call the specific function.\n\n```javascript\nvar leaders = require('world-leaders');\n\n// change from leaders() to leaders.all()\nleaders.all(function (err, allLeaders) {\n  console.log(allLeaders);\n  console.log(\"from \" + leaders.credit);\n});\n```\n\nTo avoid repeating your scraper code, you can have fromCountry use the same scrapeData function.\n**Don't overload Wikipedia with requests** - save your scraped data somewhere.\n\n```javascript\nvar savedData = null;\n\nfunction scrapeData(callback) {\n  if (savedData) {\n    // already scraped! so fast now\n    return callback(null, savedData);\n  }\n  ...\n    // whenever you finish scraping, save the response\n    savedData = countryData;\n    callback(null, countryData);\n  ...\n}\n\nfunction fromCountry(countryName, callback) {\n  // fromCountry doesn't need to know whether scrapeData is scraping or returning a cached copy\n  // just reuse your existing code\n  scrapeData(function (err, countries) {\n    if (err) {\n      return callback(err, null);\n    }\n    for (var c = 0; c \u003c countries.length; c++) {\n      if (countries[c].country === countryName) {\n        return callback(null, countries[c]);\n      }\n    }\n    callback(\"country not found\", null);\n  });\n}\n```\n\n### Testing your node module\n\nYou should write tests to make sure that your module returns consistent responses, even as you continue changing the code. In fact, many people believe you should\nwrite tests first (test-driven development!). 
But this is a tutorial so you'll learn the testing part now.\n\nInstall the mocha test module and command line tools:\n\n```bash\nnpm install mocha -g\nnpm install mocha --save-dev\n```\n\nIn your package.json file, look for a \"scripts\" property and add a test script:\n\n```javascript\n{\n  \"name\": \"world-leaders\",\n   ...\n   ...\n  \"scripts\": {\n    \"test\": \"mocha\"\n  },\n  ...\n}\n```\n\nNow when you run ```npm test``` on the command line, you should get this error:\n\n```bash\nError: cannot resolve path (or pattern) 'test'\n    at Object.lookupFiles (/usr/local/lib/node_modules/mocha/lib/utils.js:591:32)\n```\n\nRun ```mkdir test``` and create test.js inside of it. Here's what a simple test might look like:\n\n```javascript\n// use Node's built-in assert library\nvar assert = require('assert');\n\n// import your own module from the parent directory\nvar worldLeaders = require('../index.js');\n\n// use describe(function() { .. }) blocks to organize your tests\n// on the top level, we describe a feature (\"list all world leaders\")\n// on the next level, we describe an expectation of how it'll work (\"separate head of state and head of government\")\ndescribe(\"list all world leaders:\", function() {\n  it(\"has leaders for 203 countries\", function (done) {\n    // run your module's code\n    worldLeaders.all(function(anyErrors, leaders) {\n      // use assert.equal and other assert functions to check that responses match expectations\n      assert.equal(anyErrors, null);\n      assert.equal(leaders.length, 203);\n\n      // when you test async functions, call the done() function afterward\n      done();\n    });\n\n    // usually this test fails if it takes 2 seconds or longer to call done()\n    // scraping takes some time, so let's give 4 seconds (4000 milliseconds)\n    this.timeout(4000);\n  });\n});\n```\n\nThis test doesn't tell you much about your code, because if it fails it is most likely because Wikipedia editors added or removed\na country. 
The same thing could happen if you test that a leader's name is in the response.\nHere are some better tests which look at the structure of your data and behavior of your code:\n\n* each country has a head of state and a head of government\n* the list includes the United States\n* the United States has a President as both head of state and head of government, and they are equivalent\n* the United Kingdom's head of state is different from the head of government, who is the Prime Minister\n* there are between 190 and 210 countries\n\nHere's how I would test the first three:\n\n```javascript\nvar assert = require('assert');\nvar worldLeaders = require('../index.js');\n\ndescribe(\"calling worldLeaders.all() \", function() {\n  it(\"has one head of state and head of government for each country\", function (done) {\n    worldLeaders.all(function(anyErrors, leaders) {\n      assert.equal(anyErrors, null);\n      // go through the list of countries and find any obvious missing people\n      for (var i = 0; i \u003c leaders.length; i++) {\n        // requirements for each country\n        assert.notEqual(leaders[i].heads_of_state, null);\n        assert.notEqual(leaders[i].heads_of_state.length, 0);\n        assert.notEqual(leaders[i].heads_of_government, null);\n        assert.notEqual(leaders[i].heads_of_government.length, 0);\n      }\n      done();\n    });\n    // keep the timeout\n    this.timeout(4000);\n  });\n});\n\ndescribe(\"calling worldLeaders.for country() \", function() {\n  it(\"returns the US president as head of government and head of state\", function (done) {\n    worldLeaders.fromCountry('United States', function(anyErrors, usLeaders) {\n      assert.equal(anyErrors, null);\n      assert.equal(usLeaders.heads_of_state.length, 1);\n      assert.equal(usLeaders.heads_of_state[0].title.name, \"President\");\n      assert.deepEqual(usLeaders.heads_of_state, usLeaders.heads_of_government);\n      done();\n    });\n    // change the timeout time here, too\n   
 this.timeout(4000);\n  });\n});\n```\n\nWhen I ran ```npm test```, one thing that I discovered was that I was getting \"President President\" as the title instead of\njust \"President\". Not good!\n\n```bash\n1) calling worldLeaders.for country()  returns the US president as head of government and head of state:\n\n      Uncaught AssertionError: 'President President' == 'President'\n      + expected - actual\n\n      -President President\n      +President\n\n      at test/test.js:30:14\n```\n\nThis message tells me which test failed, what came out of the program, and what I expected.\n\nAfter I wrote some code to fix this particular error, I could re-run ```npm test```:\n\n```bash\n  calling worldLeaders.all()\n    ✓ has one head of state and head of government for each country (2864ms)\n\n  calling worldLeaders.for country()\n    ✓ returns the US president as head of government and head of state (2024ms)\n\n  2 passing (5s)\n```\n\nYou should test your errors, too. My code sends back a \"country not found\" error if someone asks for a non-existent country. Here's\nhow you can test one:\n\n```javascript\nit(\"returns an error when requesting a fake country\", function (done) {\n  worldLeaders.fromCountry('Narnia', function(anyErrors, narniaLeaders) {\n    assert.equal(anyErrors, \"country not found\");\n    assert.equal(narniaLeaders, null);\n    done();\n  });\n  this.timeout(4000);\n});\n```\n\n### Including your package in a server\n\nThere are several different web frameworks and servers in the NodeJS world, and they can all use your module. I'm going\nto create a simple example using the Express framework.\n\nCreate another project folder where you are creating the server. 
Install only Express and your own package for now:\n\n```bash\ncd ..\nmkdir my-first-server\ncd my-first-server\nnpm install express MYPACKAGE\n```\n\nYour Express server will set up a website on http://localhost:8080/. When someone goes to that page, we should show them a JSON list of all world leaders,\nand if someone goes to http://localhost:8080/country/Albania they should see a list of Albanian leaders. Let's write app.js for that purpose:\n\n```javascript\n// load express and your own module\nvar express = require(\"express\");\nvar worldLeaders = require(\"world-leaders\");\n\n// this is how we initialize an Express app\nvar app = express();\n\napp.get(\"/\", function (req, res) {\n  // this is the homepage, where we return all world leaders\n  // req is the initial request from the browser. We don't call it \"request\" because it'd be confused with the \"request\" module\n  // res is the response. When you're finished getting callbacks and other data, use a response method to talk back to the browser\n\n  worldLeaders.all(function (anyError, leaders) {\n    if (anyError) {\n      // don't throw errors anymore - you could crash the server! instead let the user know that they got an error\n      res.json({ error: anyError });\n    } else {\n      // here we are sending back JSON data, so it's best to use res.json method\n      res.json(leaders);\n    }\n  });\n});\n\napp.get(\"/country/:requestedCountry\", function (req, res) {\n  // this is the API page for any country\n  // requestedCountry is available on req.params\n  // if it were a URL like ?requestedState=NY , you would check req.query.requestedState\n  // if it were a POST request, you'd need the body-parser module and req.body\n\n  worldLeaders.fromCountry(req.params.requestedCountry, function (anyError, leaders) {\n    if (anyError) {\n      res.json({ error: anyError });\n    } else {\n      res.json(leaders);\n    }\n  });\n});\n\n// OK now the server knows what to do. 
Let's launch it:\napp.listen(8080, function() {\n  console.log(\"Server ready on http://localhost:8080\");\n});\n```\n\nNow run your server with ```node app.js``` and wait for confirmation that it's running.\n\nThat's all it takes. If you request http://localhost:8080 and http://localhost:8080/country/Albania they should work. If you try http://localhost:8080/country/Narnia\nor another incorrect country name, look what comes back:\n\n```javascript\n{\"error\": \"country not found\"}\n```\n\nThat's the error passed back by the world-leaders module! By writing a good module, you've made it really easy to deliver data to this server, without having to create new\nerrors or responses.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmapmeld%2Fscraping-by-in-nodejs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmapmeld%2Fscraping-by-in-nodejs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmapmeld%2Fscraping-by-in-nodejs/lists"}