Simple scraper.

Scraper

This project is a basic tool for scraping data from a website using a CSS selector.

For example, you can retrieve the number of releases of a project hosted on GitHub:

node src/index.js \
	--url https://github.com/foo/bar \
	--selector '.repository-content .numbers-summary li:nth-child(4) a' \
	--tags \
	--breaks \
	--spaces \
	--trim

...will show XXX releases.

For more options, run node src/index.js --help.
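
To illustrate what the cleanup flags in the example command might do to the selected element, here is a minimal sketch. It is not the project's code: the function name cleanText, the sample markup, and the behaviour assigned to each option (tags, breaks, spaces, trim) are assumptions inferred from the flag names. Check node src/index.js --help for the actual behaviour.

```javascript
// Hypothetical sketch: option names mirror the CLI flags, but what each
// step does is an assumption, not the project's implementation.
function cleanText(html, options = {}) {
  let text = html;

  if (options.tags) {
    // Assumed: --tags strips HTML tags from the selected markup.
    text = text.replace(/<[^>]*>/g, "");
  }
  if (options.breaks) {
    // Assumed: --breaks replaces line breaks with a single space.
    text = text.replace(/[\r\n]+/g, " ");
  }
  if (options.spaces) {
    // Assumed: --spaces collapses runs of whitespace into one space.
    text = text.replace(/\s{2,}/g, " ");
  }
  if (options.trim) {
    // Assumed: --trim removes leading and trailing whitespace.
    text = text.trim();
  }
  return text;
}

// Made-up markup standing in for the element matched by the selector.
const sample =
  '<a href="/foo/bar/releases">\n  <span class="num">12</span>\n  releases\n</a>';

console.log(
  cleanText(sample, { tags: true, breaks: true, spaces: true, trim: true })
); // prints "12 releases"
```

The steps run in a fixed order (tags, then breaks, then spaces, then trim), so the whitespace left behind by removed tags and line breaks is collapsed before trimming.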

Installation

Requirements:

  • node >= 10
  • yarn

$ git clone https://gitnet.fr/deblan/scraper.git
$ cd scraper
$ yarn