Simple scraper.
Simon Vieille 229242ed11
packaging
2020-04-14 18:18:04 +02:00
src refactoring: index.js is now a module, add cli.js 2020-04-14 18:16:19 +02:00
.gitignore add scraper 2020-04-11 22:18:33 +02:00
README.md refactoring: index.js is now a module, add cli.js 2020-04-14 18:16:19 +02:00
package.json packaging 2020-04-14 18:18:04 +02:00
yarn.lock add dependences 2020-04-11 22:18:09 +02:00

Scraper

This project is a basic tool for scraping a piece of data from a web page using a CSS selector.

For example, to retrieve the number of releases of a project hosted on GitHub:

With CLI

node src/cli.js \
  --url https://github.com/foo/bar \
  --selector '.repository-content .numbers-summary li:nth-child(4) a' \
  --tags \
  --breaks \
  --spaces \
  --trim

...will show XXX releases.

For more help, run node src/cli.js --help.
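
The README does not describe what each filter does. The sketch below is only a guess based on the flag names: it assumes tags strips HTML tags, breaks replaces line breaks with a space, spaces collapses runs of whitespace, and trim removes leading and trailing whitespace. The names filterFns and applyFilters are hypothetical and are not part of this project.

```javascript
// Hypothetical filter implementations; the behaviour is guessed from the
// flag names and is NOT taken from the project's source code.
const filterFns = {
  tags: (s) => s.replace(/<[^>]*>/g, ''),     // drop anything that looks like an HTML tag
  breaks: (s) => s.replace(/[\r\n]+/g, ' '),  // turn line breaks into a single space
  spaces: (s) => s.replace(/\s+/g, ' '),      // collapse runs of whitespace
  trim: (s) => s.trim(),                      // strip leading/trailing whitespace
}

// Apply the requested filters in the order they are listed.
function applyFilters(value, names) {
  return names.reduce((acc, name) => {
    const fn = filterFns[name]
    if (!fn) throw new Error(`Unknown filter: ${name}`)
    return fn(acc)
  }, value)
}

// Made-up markup resembling what the selector might match.
const raw = '\n  <a href="/foo/bar/releases">\n    <span>12</span>\n    releases\n  </a>\n'

console.log(applyFilters(raw, ['tags', 'breaks', 'spaces', 'trim']))
// prints "12 releases"
```

Under these assumptions the order matters: running trim before tags would leave the whitespace that surrounded the removed tags in place.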

With code

const scraper = require('deblan-scraper')

const options = {
	url: 'https://github.com/foo/bar',
	acceptAllStatus: false, // Optional
	method: 'GET', // Optional
}

const selector = '.repository-content .numbers-summary li:nth-child(4) a'

const filters = {
	tags: null,
	breaks: null,
	spaces: null,
	trim: null,
}

scraper(
	options,
	selector,
	filters,
	function(value) {
		console.log(value)
	},
	function(error) {
		console.log(error)
	}
)
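
If you prefer Promises to callbacks, the call above can be wrapped. This is a minimal sketch, assuming only the five-argument signature shown in the example (options, selector, filters, success callback, error callback). promisifyScraper is a hypothetical helper, and fakeScraper is a stand-in for require('deblan-scraper') so that the snippet runs without network access.

```javascript
// Wrap a callback-style scraper function so that it returns a Promise.
// `scraperFn` is assumed to take (options, selector, filters, onValue, onError).
function promisifyScraper(scraperFn) {
  return (options, selector, filters) =>
    new Promise((resolve, reject) => {
      scraperFn(options, selector, filters, resolve, reject)
    })
}

// Stand-in with the same call shape as the module; it performs no HTTP request.
function fakeScraper(options, selector, filters, onValue, onError) {
  if (!options.url) {
    onError(new Error('missing url'))
    return
  }
  onValue('12 releases')
}

const scrape = promisifyScraper(fakeScraper)

scrape(
  { url: 'https://github.com/foo/bar' },
  '.repository-content .numbers-summary li:nth-child(4) a',
  { tags: null, breaks: null, spaces: null, trim: null }
)
  .then((value) => console.log(value))
  .catch((error) => console.error(error))
```

To use it with the real module, pass require('deblan-scraper') to promisifyScraper instead of fakeScraper.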

Installation

Requirements:

  • Node.js >= 10
  • yarn

$ git clone https://gitnet.fr/deblan/scraper.git
$ cd scraper
$ yarn