Dynamic

Template

Unlike the other parsers, this parser uses JavaScript files, so the options described here must be included inside the JavaScript file. Also, the dynamic parser uses a scrape function instead of scrape options.

Below is a template for the dynamic parser source file:

module.exports = {
    name: "csd.uoc.gr",
    type: "dynamic",
    // ...
    scrape: async function (utils, Article) {
        // Do your thing...
        const articles = [];

        const article = new Article();
        article.title = '';
        // ...
        articles.push(article);

        if (error) // e.g. a flag you set while scraping
            throw new Error('Failed for csd.uoc.gr!');

        return articles;
    }
}

Starting

Create the asynchronous scrape function to hold the code needed for scraping. Saffron will ignore the rest of the file when it comes to scraping, so all code must be included inside the scrape function: imports, user-defined functions, and so on.

info

Any extra libraries used here must be added to your package.json file.
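For example, if your scrape function requires cheerio (an assumption here; any library works the same way), it must be listed in your package.json dependencies:

```json
{
  "dependencies": {
    "cheerio": "^1.0.0"
  }
}
```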

Utils

Utils provide a set of necessary functions and fields that are used by all scrapers and that you will need as well.

isScrapeAfterError

A field that is true if the previous job failed.

url

A field that contains the URL that must be scraped.

aliases

A field that contains the categories passed alongside the URL. You do not have to insert them into each article; saffron does this for you.

instructions

A set of instructions. It is mainly used by other parsers and also contains the current scrape function, as a string, in the scrapeFunction field.

amount

The maximum number of articles to return.
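Putting the fields above together, here is a hedged sketch of how a scrape function might use them. The utils and Article values are supplied by saffron at run time, and the titles below are made-up placeholder data:

```javascript
// Sketch only: how the utils fields described above might fit together.
// `utils` and `Article` are injected by saffron; the titles are fake data.
const scrape = async function (utils, Article) {
    const articles = [];

    // React to a previously failed job if needed.
    if (utils.isScrapeAfterError)
        console.log(`Retrying ${utils.url} after a failed job.`);

    // Pretend these were scraped from utils.url.
    const titles = ['First post', 'Second post', 'Third post'];

    for (const title of titles) {
        if (articles.length >= utils.amount) break; // respect the limit

        const article = new Article();
        article.title = title;
        articles.push(article);
    }

    return articles;
};
```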

get

A middleware function for axios that performs a GET request.

post

A middleware function for axios that performs a POST request.

request

A middleware function for axios that performs any kind of request.
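For example, a hedged sketch of fetching the target page through the GET middleware. Since the middleware wraps axios, the body should live in response.data; the fetchPage helper name is made up for illustration:

```javascript
// Hypothetical helper: fetch the target page through saffron's
// axios-style GET middleware. The body is in `response.data` (axios shape).
async function fetchPage(utils) {
    const response = await utils.get(utils.url);
    return response.data;
}
```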

parse

A function that takes as its first parameter the contents of another source file and returns the articles that were scraped.

If the source file is invalid or parsing fails, an exception will be thrown.

This function returns an object array that contains the url, aliases, and articles for each url specified in the nested source file.
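Based on that description, the returned array might look like the following (the values are illustrative, not taken from a real run):

```javascript
// Illustrative shape of parse's return value: one entry per url
// declared in the nested source file (all values here are made up).
const parsed = [
    {
        url: 'https://example.com/news',
        aliases: ['news'],
        articles: [] // Article objects scraped for this url
    }
];
```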

htmlStrip

A function that strips a string of all HTML tags.

A function that takes as its first parameter valid HTML code and returns an attachment array with the URLs found in the following tags: a, img, link.
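To illustrate what stripping means, here is a naive regex-based approximation of the behavior. This is not the util's actual implementation; in a real scraper, call the provided htmlStrip instead:

```javascript
// Illustration only: a naive tag-stripper approximating what an
// HTML-strip util does. Real HTML needs a proper parser; use the
// provided util in actual scrapers.
function naiveHtmlStrip(html) {
    return html.replace(/<[^>]*>/g, '');
}
```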

Writing code

Nested functions & Imports

As mentioned above, Saffron ignores everything outside the scrape function when it comes to scraping, so nested functions, imports, and any other code must be defined inside it:

scrape: async (utils, Article) => {
    const log = (message) => {
        console.log(message);
    }

    const other = require('other');
    log('Using a nested function.');
    // ...
}

Callbacks

The dynamic parser does not use callbacks. If you want to use callback-based code, wrap it in a promise and return that promise:

scrape: (utils, Article) => {
    return new Promise((resolve, reject) => {
        utils.get(utils.url).then(response => {
            // ...

            resolve(articles);
        });
    });
}

Fail job

If you want to mark the scraping job as failed and return no articles, throw an Error:

scrape: async (utils, Article) => {
    // ...
    throw new Error("Parsing failed.");
}

or reject the promise:

scrape: async (utils, Article) => {
    return new Promise((resolve, reject) => {
        // ...
        reject(new Error("Parsing failed."));
    });
}