Data Journalism Lab 4

Lab 4 instructions

In Lab #4 you will use the scraping techniques outlined in class to identify and pull data from a site that has content of interest to you.

You need to use Node on repl.it to grab the data and generate a CSV. That can be done by:

  • Extracting tabular data from a single or multiple pages
  • Extracting non-tabular data from a single or multiple pages
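
As a rough illustration of the tabular case, the sketch below pulls the rows out of a small HTML table and turns them into CSV text. It is not taken from the class examples: the sample HTML is made up, the function names (extractRows, extractCells, toCsvField, toCsv) are hypothetical, and it uses plain regular expressions only to stay dependency-free, where the repl.it examples may well use an HTML parsing library instead.

```javascript
// Minimal sketch: turn an HTML table into CSV text using only built-in JavaScript.
// ASSUMPTION: the HTML below is made-up sample data, not from any real site.
const sampleHtml = `
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30,720</td></tr>
  <tr><td>Shelbyville</td><td>12,050</td></tr>
</table>`;

// Pull the text of every <th> or <td> cell out of one row's inner HTML.
function extractCells(rowHtml) {
  const cellPattern = /<t[hd]\b[^>]*>([\s\S]*?)<\/t[hd]>/gi;
  const cells = [];
  let match;
  while ((match = cellPattern.exec(rowHtml)) !== null) {
    // Drop any nested tags and trim surrounding whitespace.
    cells.push(match[1].replace(/<[^>]+>/g, '').trim());
  }
  return cells;
}

// Split the table into rows, then each row into its cells.
function extractRows(html) {
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  const rows = [];
  let match;
  while ((match = rowPattern.exec(html)) !== null) {
    rows.push(extractCells(match[1]));
  }
  return rows;
}

// Quote a value when it contains a comma, a double quote, or a line break.
function toCsvField(value) {
  const text = String(value);
  return /[",\n\r]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Join cells with commas and rows with newlines.
function toCsv(rows) {
  return rows.map((row) => row.map(toCsvField).join(',')).join('\n');
}

const csvText = toCsv(extractRows(sampleHtml));
console.log(csvText);
```

To save the result as a file, the text could be passed to Node's built-in fs module, for example `require('fs').writeFileSync('output.csv', csvText)`.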

Each of these approaches can be handled by adapting one of the following repl.it examples:

Single page table scraper

Multi-page table scraper

Note that the multi-page table scraper outputs two files to copy and paste.

Single page non-table scraper

Paged results non-table scraper
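
For the paged, non-table case, the general shape is a loop over page URLs with an extraction step run on each page. The sketch below is only an illustration under stated assumptions, not the code behind the example above: the URL pattern (`?page=N`), the `headline` class, and every function name are hypothetical, and a stand-in `fakeFetchPage` function replaces the real HTTP request so the sketch runs offline.

```javascript
// Sketch of a paged, non-table scrape.
// ASSUMPTION: the URL pattern, the "headline" class, and fakeFetchPage are all
// hypothetical stand-ins; a real site will need its own values.

// Build one URL per results page, e.g. ...?page=1, ...?page=2
function buildPageUrls(baseUrl, firstPage, lastPage) {
  const urls = [];
  for (let page = firstPage; page <= lastPage; page += 1) {
    urls.push(`${baseUrl}?page=${page}`);
  }
  return urls;
}

// Find every <h2 class="headline"><a href="...">Title</a></h2> in a page's HTML.
function extractHeadlines(html) {
  const pattern = /<h2 class="headline">\s*<a href="([^"]+)">([\s\S]*?)<\/a>\s*<\/h2>/gi;
  const items = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    items.push({ link: match[1], title: match[2].trim() });
  }
  return items;
}

// Visit each page in order and collect the items found on it.
async function scrapeAllPages(urls, fetchPage) {
  const allItems = [];
  for (const url of urls) {
    const html = await fetchPage(url);
    allItems.push(...extractHeadlines(html));
  }
  return allItems;
}

// Stand-in for a real HTTP request: returns one fake headline per page.
async function fakeFetchPage(url) {
  const pageNumber = new URL(url).searchParams.get('page');
  return `<h2 class="headline"><a href="/story/${pageNumber}">Story ${pageNumber}</a></h2>`;
}

const pageUrls = buildPageUrls('https://example.com/news', 1, 3);
const scrapePromise = scrapeAllPages(pageUrls, fakeFetchPage);
scrapePromise.then((items) => console.log(items));
```

Passing the fetch function in as an argument keeps the paging loop separate from the network code, so the extraction logic can be checked against saved HTML before pointing it at a live site.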

Steps

  1. Create a new repl.it Node.js project using this link
  2. Go to the page you’re using and “view source” in your browser.
  3. Choose the appropriate type of script and customize it to fit your situation
  4. Once it’s working, export to GitHub as a gist using the “share” button
  5. Submit the gist URL to ICON

Grading

  • Runs and generates CSV output: 3 points
  • Code is correct with good variable names: 5 points
  • At least 10 meaningful comments: 2 points