@hackage / scrapbook

Collects posts from the sites listed in a YAML config file, using feeds or scraping.

Latest: 0.5.0

About

Metadata

  • Last updated by matsubara0507
  • License MIT
  • Categories Web Development
  • Maintained by: MATSUBARA Nobutada

  • Lottery factor: 0

Readme

scrapbook

This is a CLI tool that collects posts from the sites listed in a YAML config file, using feeds or scraping.

Usage

  1. Clone this repository, or add the scrapbook package to extra-deps in stack.yaml.
  2. Run stack install.

e.g.

$ stack exec -- scrapbook -o "example" example/sites.yaml

Docker

$ docker run --rm -v `pwd`/example:/work matsubara0507/scrapbook scrapbook sites.yaml

Build the Docker image:

$ stack --docker build -j 1 Cabal # only needed if the build runs out of memory in Docker
$ stack --docker --local-bin-path=./bin install
$ docker build -t matsubara0507/scrapbook . --build-arg local_bin_path=./bin

Command

scrapbook [options] [input-file]
  -o DIR                --output=DIR                 Write output to DIR instead of stdout.
  -t FORMAT, -w FORMAT  --to=FORMAT, --write=FORMAT  Specify output format. default is `feed`.
                        --version                    Show version
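
The option table above has the layout produced by `System.Console.GetOpt` from base. As a minimal sketch of how such options could be declared and parsed, assuming a hypothetical `Flag` type and helper names (`parseArgs`, `outputFormat`) that are not taken from scrapbook's source:

```haskell
import System.Console.GetOpt

-- Hypothetical flag type; the constructor names are illustrative only.
data Flag
  = Output FilePath
  | Format String
  | Version
  deriving (Eq, Show)

-- Option descriptions mirroring the usage text shown above.
options :: [OptDescr Flag]
options =
  [ Option "o"  ["output"]      (ReqArg Output "DIR")    "Write output to DIR instead of stdout."
  , Option "tw" ["to", "write"] (ReqArg Format "FORMAT") "Specify output format. default is `feed`."
  , Option ""   ["version"]     (NoArg Version)          "Show version"
  ]

-- Split raw arguments into recognised flags and positional input files.
parseArgs :: [String] -> Either String ([Flag], [FilePath])
parseArgs argv =
  case getOpt Permute options argv of
    (flags, files, []) -> Right (flags, files)
    (_, _, errs)       -> Left (concat errs ++ usageInfo "scrapbook [options] [input-file]" options)

-- Fall back to "feed" when neither -t nor -w is given; the last one wins otherwise.
outputFormat :: [Flag] -> String
outputFormat flags =
  case [f | Format f <- flags] of
    []      -> "feed"
    formats -> last formats

main :: IO ()
main = do
  print (parseArgs ["-o", "example", "example/sites.yaml"])
  putStrLn (outputFormat [Output "example"])
```

The sketch only illustrates the declared options; how scrapbook itself interprets them may differ.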

GHCi

>> import Control.Lens ((^.))
>> import Data.Maybe
>> conf <- fromJust <$> readConfig "example/sites.yaml"
>> (Right posts) <- collect . fmap concat $ mapM (fetch . toSite) (conf ^. #sites)
>> collect $ writeFeed "example" (fromJust $ conf ^. #feed) posts
Right ()

Example

See matsubara0507/scrapbook-example.

Documentation

How to write the YAML config file:

# configuration for generating Atom feed (Optional)
feed:
  ## written to the Atom feed as the site title
  title: "Sample Site Posts"
  ## written to the Atom feed as the site URL
  baseUrl: "https://example.com"
  ## file name (Optional)
  ### if omitted, the name is taken from the input file
  name: atom.xml

# configuration of the sites to collect (Haskellers' sites in this example)
sites:
    ## Title of site
  - title: "ひげメモ"
    ## Author of site
    author: matsubara0507
    ## URL of site
    url: https://matsubara0507.github.io
    ## Feed URL of site
    ### there are several fields for setting the feed URL
    ### `feed` is the basic field; it automatically detects whether the feed is Atom or RSS 2.0.
    feed: https://matsubara0507.github.io/feed
  - title: "Kuro's Blog"
    author: "Hiroyuki Kurokawa"
    url: http://kurokawh.blogspot.com/
    ### `atom` is for an Atom feed.
    atom:
      ### URL of the Atom feed
      url: http://kurokawh.blogspot.com/feeds/posts/default
      ### attributes that the link chosen from each Atom entry must have (Optional)
      ### if omitted, the first link is used; if several attributes are set, all must match.
      linkAttrs:
        rel: alternate
  - title: "あどけない話"
    author: "kazu-yamamoto"
    url: http://d.hatena.ne.jp/kazu-yamamoto
    ### `rss` is for an RSS 2.0 feed.
    ### set it to the feed URL.
    rss: http://d.hatena.ne.jp/kazu-yamamoto/rss2
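
The `linkAttrs` rule in the config comments (first link when nothing is set, all attributes required when several are set) can be sketched as follows. This is an illustration of that rule only, not scrapbook's implementation; the `Attrs` type and the `matches` and `selectLink` helpers are hypothetical, and the example URLs are made up.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical model of one <link> element of an Atom entry:
-- its attributes as key/value pairs.
type Attrs = [(String, String)]

-- A link satisfies the constraint when every configured attribute
-- is present with the same value (conjunction).
matches :: Attrs -> Attrs -> Bool
matches wanted link = all (\(k, v) -> lookup k link == Just v) wanted

-- Pick the first link that satisfies all constraints. With no
-- constraints every link matches, so the first link is chosen.
selectLink :: Attrs -> [Attrs] -> Maybe Attrs
selectLink wanted = listToMaybe . filter (matches wanted)

main :: IO ()
main = do
  let links =
        [ [("rel", "self"), ("href", "http://example.com/feeds/1")]
        , [("rel", "alternate"), ("type", "text/html"), ("href", "http://example.com/post-1")]
        ]
  -- With `rel: alternate` configured, the second link is selected.
  print (lookup "href" =<< selectLink [("rel", "alternate")] links)
  -- With no linkAttrs, the first link is selected.
  print (lookup "href" =<< selectLink [] links)
```

Feeds often carry a `self` link and an `alternate` link on each entry, which is why constraining on `rel: alternate` is useful for picking the post URL.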