Scraper
=======

[![Build Status](https://ci.gitnet.fr/buildStatus/icon?job=Gitnet%2Fscraper%2Fmaster)](https://ci.gitnet.fr/job/Gitnet/job/scraper/job/master/)

This project is a basic tool to scrape data from a website using a CSS selector.

For example, if you want to retrieve the number of releases of a project hosted on GitHub:

With CLI
--------

```
node src/cli.js \
    --url https://github.com/foo/bar \
    --selector '.repository-content .numbers-summary li:nth-child(4) a' \
    --tags \
    --breaks \
    --spaces \
    --breaks \
    --trim
```

...will show `XXX releases`.

More help with `node src/cli.js --help`.

With code
---------

```
const scraper = require('deblan-scraper')

const options = {
  url: 'https://github.com/foo/bar',
  acceptAllStatus: false, // Optional, default is `false`
  method: 'GET', // Optional, default is `GET`
}

const isMultiple = false // Get the first result; set to `true` to get an array of results

const selector = '.repository-content .numbers-summary li:nth-child(4) a'

const filters = {
  tags: null, // Removes tags. You can specify the tags to remove (separated by commas)
  breaks: null, // Removes line breaks (\n, \r)
  spaces: null, // Replaces two successive spaces with one, except line breaks
  trim: null, // Strips whitespace from the beginning and end of the value
}

scraper(
  options,
  selector,
  filters,
  // Success callback: receives the scraped value
  function(value) {
    console.log(value)
  },
  // Error callback
  function(error) {
    console.log(error)
  },
  isMultiple
)
```

Installation
------------

Requirements:

* node >= 10
* yarn

```
$ git clone https://gitnet.fr/deblan/scraper.git
$ cd scraper
$ yarn
```
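
To make the filter descriptions in "With code" more concrete, here is a rough sketch of what the four filters might do to a scraped string. It is not the library's implementation: `applyFilters` is a hypothetical helper, and the exact matching rules (for example, that a `null` value for `tags` strips every tag) are assumptions drawn from the comments above.

```javascript
// Hypothetical helper, NOT part of deblan-scraper: approximates the
// documented filters on a plain string, in the order tags, breaks, spaces, trim.
function applyFilters(value, filters) {
  let result = String(value)

  if ('tags' in filters) {
    // Assumption: `null` strips every tag; a comma-separated string
    // strips only the named tags and keeps their inner text.
    const names = filters.tags
      ? filters.tags.split(',').map((name) => name.trim())
      : null
    const pattern = names
      ? new RegExp(`</?(?:${names.join('|')})\\b[^>]*>`, 'gi')
      : /<[^>]+>/g
    result = result.replace(pattern, '')
  }

  if ('breaks' in filters) {
    // Remove \n and \r
    result = result.replace(/[\r\n]/g, '')
  }

  if ('spaces' in filters) {
    // Collapse runs of two or more whitespace characters (line breaks excluded)
    result = result.replace(/[^\S\r\n]{2,}/g, ' ')
  }

  if ('trim' in filters) {
    result = result.trim()
  }

  return result
}

const raw = '  <a href="#">12\n   releases</a> '
console.log(applyFilters(raw, { tags: null, breaks: null, spaces: null, trim: null }))
```

Running the filters in a fixed order keeps the sketch short; the real tool may apply them in the order they are given.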