GitHub scraper is broken #2507

Closed
@spamguy

Description

Contributor

Bug report

Scrapers subclassed from Docs::Github no longer work. GitHub URLs now return an HTML page instead of JSON, so the JSON parsing that the Github scrapers depend on fails.
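
The failure mode can be illustrated in isolation. The snippet below is only a rough sketch, not devdocs code: `parse_json_body` is a hypothetical helper, the sample bodies are made up, and it assumes Ruby's standard `json` library rather than whichever JSON parser devdocs actually loads. It shows that a body starting with `<!DOCTYPE html>` cannot be parsed as JSON, and how a caller could detect that case instead of crashing.

```ruby
require 'json'

# Hypothetical helper (not part of devdocs): try to parse a response body
# as JSON and return nil when the body is not JSON, e.g. an HTML page.
def parse_json_body(body)
  JSON.parse(body)
rescue JSON::ParserError, TypeError
  # JSON::ParserError: the body is not valid JSON (such as HTML markup).
  # TypeError: the body is nil or otherwise not a String.
  nil
end

# Made-up sample bodies for illustration only.
HTML_BODY = '<!DOCTYPE html> <html lang="en"><body>README</body></html>'
JSON_BODY = '{"title": "README.md"}'

html_result = parse_json_body(HTML_BODY)
json_result = parse_json_body(JSON_BODY)

puts "HTML body parsed as JSON? #{!html_result.nil?}"
puts "JSON body parsed as JSON? #{!json_result.nil?}"
```

A guard like this would only turn the crash into a detectable condition; it would not restore the JSON that the scraper expects.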

OS information

Using macOS 15.5, running devdocs locally.

Steps to reproduce

Attempt to generate docs for any of the GitHub-based projects.

Examples: thor docs:generate koa and thor docs:generate fluture

More resources

Output from the above:

% thor docs:generate fluture --debug
/!\ WARNING /!\

Some scrapers send thousands of HTTP requests in a short period of time,
which can slow down the source site and trouble its maintainers.

Please scrape responsibly. Don't do it unless you're modifying the code.

To download the latest tested version of this documentation, run:
  thor docs:download fluture

Proceed? (y/n) y
Queue:   github.com/fluture-js/Fluture/blob/14.0.0/README.md
Process: github.com/fluture-js/Fluture/blob/14.0.0/README.md                                                           [0ms]
ERROR:
  https://github.com/fluture-js/Fluture/blob/14.0.0/README.md
  JSON::ParserError: lexical error: invalid char in json text.
                                           <!DOCTYPE html> <html   lang="e
                         (right here) ------^


  /Users/X/src/devdocs/lib/docs/scrapers/github.rb:19:in 'Docs::Github#parse'
  /Users/X/src/devdocs/lib/docs/core/scraper.rb:176:in 'Docs::Scraper#process_response'
  /Users/X/src/devdocs/lib/docs/core/scraper.rb:160:in 'block in Docs::Scraper#handle_response'
  /Users/X/src/devdocs/lib/docs/core/instrumentable.rb:15:in 'Docs::Instrumentable::Methods#instrument'
  /Users/X/src/devdocs/lib/docs/core/scraper.rb:159:in 'Docs::Scraper#handle_response'
  /Users/X/src/devdocs/lib/docs/core/scraper.rb:77:in 'block in Docs::Scraper#build_pages'
  /Users/X/src/devdocs/lib/docs/core/requester.rb:59:in 'block (2 levels) in Docs::Requester#handle_response'
  /Users/X/src/devdocs/lib/docs/core/requester.rb:58:in 'Array#each'
  /Users/X/src/devdocs/lib/docs/core/requester.rb:58:in 'block in Docs::Requester#handle_response'
  /Users/X/src/devdocs/lib/docs/core/instrumentable.rb:15:in 'Docs::Instrumentable::Methods#instrument'
  /Users/X/src/devdocs/lib/docs/core/requester.rb:57:in 'Docs::Requester#handle_response'
  /Users/X/src/devdocs/lib/docs/core/requester.rb:18:in 'Docs::Requester.run'
  /Users/X/src/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in 'Docs::UrlScraper#request_all'
  /Users/X/src/devdocs/lib/docs/core/scraper.rb:76:in 'Docs::Scraper#build_pages'
  /Users/X/src/devdocs/lib/docs/core/doc.rb:115:in 'block in Docs::Doc.store_pages'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:87:in 'block (2 levels) in Docs::AbstractStore#replace'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:182:in 'Docs::AbstractStore#track_touched'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:87:in 'block in Docs::AbstractStore#replace'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:170:in 'Docs::AbstractStore#lock'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:87:in 'Docs::AbstractStore#replace'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:85:in 'block in Docs::AbstractStore#replace'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:144:in 'Docs::AbstractStore#open_yield_close'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:30:in 'Docs::AbstractStore#open'
  /Users/X/src/devdocs/lib/docs/storage/abstract_store.rb:85:in 'Docs::AbstractStore#replace'
  /Users/X/src/devdocs/lib/docs/core/doc.rb:114:in 'Docs::Doc.store_pages'
  /Users/X/src/devdocs/lib/docs.rb:100:in 'Docs.generate'
  /Users/X/src/devdocs/lib/tasks/docs.thor:301:in 'Thor::Sandbox::DocsCLI#generate_doc'
  /Users/X/src/devdocs/lib/tasks/docs.thor:105:in 'Thor::Sandbox::DocsCLI#generate'

Possible fix

        GitHub scraper is broken · Issue #2507 · freeCodeCamp/devdocs