Simple scraper.
Simon Vieille a7d23b9027
fix ci syntax
2023-09-29 16:37:56 +02:00
src apply linter 2023-03-31 17:53:24 +02:00
test apply linter 2023-03-31 17:53:50 +02:00
.gitignore add scraper 2020-04-11 22:18:33 +02:00
.woodpecker.yml fix ci syntax 2023-09-29 16:37:56 +02:00
README.md remove jenkins stuff 2023-03-31 21:34:27 +02:00
package-lock.json update dependencies 2023-03-31 17:53:40 +02:00
package.json update dependencies 2023-03-31 17:53:40 +02:00
yarn.lock update dependencies 2023-03-31 17:53:40 +02:00

Scraper

This project is a basic tool to scrape data from a website using a CSS selector.

For example, if you want to retrieve the number of releases of a project hosted on GitHub:

With CLI

node src/cli.js \
  --url https://github.com/foo/bar \
  --selector '.repository-content .numbers-summary li:nth-child(4) a' \
  --tags \
  --breaks \
  --spaces \
  --trim

...will show XXX releases.

For more help, run node src/cli.js --help.
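
The filter flags used above (--tags, --breaks, --spaces, --trim) match the filters described in the comments of the code example in this README. As a rough illustration of what they do to a scraped value, here is a minimal sketch. The `applyFilters` helper is hypothetical and is not part of this project. The tag removal uses a naive regular expression, the `spaces` filter is assumed to collapse any run of spaces or tabs, and the option to pass specific tag names to `tags` is left out.

```javascript
// Hypothetical helper illustrating the documented filters.
// This is not the library's own implementation.
function applyFilters(value, filters = {}) {
  let result = String(value)

  if ('tags' in filters) {
    // Naive tag removal; good enough for a sketch, not a real HTML parser.
    result = result.replace(/<[^>]*>/g, '')
  }

  if ('breaks' in filters) {
    // Remove line breaks (\n, \r).
    result = result.replace(/[\r\n]/g, '')
  }

  if ('spaces' in filters) {
    // Assumption: collapse runs of spaces or tabs, leaving breaks alone.
    result = result.replace(/[ \t]{2,}/g, ' ')
  }

  if ('trim' in filters) {
    // Strip whitespace from both ends.
    result = result.trim()
  }

  return result
}

const raw = '  <a href="#">\n  <span>12</span>   releases\n</a>  '
const allFilters = { tags: null, breaks: null, spaces: null, trim: null }

console.log(applyFilters(raw, allFilters)) // prints "12 releases"
```

Applying the filters in this order (tags, breaks, spaces, trim) means the whitespace left behind by removed tags is cleaned up by the later steps.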

With code

const scraper = require('deblan-scraper')

const options = {
  url: 'https://github.com/foo/bar',
  acceptAllStatus: false, // Optional, default is `false`
  method: 'GET', // Optional, default is `GET`
}

const isMultiple = false // `false` returns the first result, `true` returns an array of results

const selector = '.repository-content .numbers-summary li:nth-child(4) a'

const filters = {
  tags: null, // Removes tags. You can specify the tags to remove (separated by comma)
  breaks: null, // Removes breaks (\n, \r)
  spaces: null, // Replaces 2 successive spaces with 1, except breaks
  trim: null, // Strips whitespace from the beginning and end of the value
}

scraper(
  options,
  selector,
  filters,
  function(value) {
    console.log(value)
  },
  function(error) {
    console.error(error)
  },
  isMultiple
)
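
The callback-style call above can be wrapped in a Promise if you prefer `then`/`catch` or `async`/`await`. The following is only a sketch. `scrapeAsync` and `fakeScraper` are hypothetical names that do not exist in this project. The wrapper assumes the argument order shown above (options, selector, filters, success callback, error callback, isMultiple), and the stub stands in for `require('deblan-scraper')` so the example runs without a network request.

```javascript
// Hypothetical wrapper: turns the callback-style scraper into a Promise.
// `scraper` is passed in so the sketch can be tried with a stub.
function scrapeAsync(scraper, options, selector, filters = {}, isMultiple = false) {
  return new Promise((resolve, reject) => {
    scraper(options, selector, filters, resolve, reject, isMultiple)
  })
}

// Stub standing in for require('deblan-scraper'); performs no HTTP request.
function fakeScraper(options, selector, filters, onSuccess, onError, isMultiple) {
  if (!options.url) {
    onError(new Error('missing url'))
    return
  }
  onSuccess(isMultiple ? ['12 releases'] : '12 releases')
}

scrapeAsync(
  fakeScraper,
  { url: 'https://github.com/foo/bar' },
  '.repository-content .numbers-summary li:nth-child(4) a',
  { tags: null, trim: null }
)
  .then((value) => console.log(value))
  .catch((error) => console.error(error))
```

To use it with the real module, pass `require('deblan-scraper')` as the first argument instead of the stub.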

Installation

Requirements:

  • node >= 10
  • yarn

$ git clone https://gitnet.fr/deblan/scraper.git
$ cd scraper
$ yarn