Scrapy

Scrapy
Developer(s): Zyte (formerly Scrapinghub)
Initial release: 26 June 2008
Stable release: 2.6.1[1] / 1 March 2022
Repository: github.com/scrapy/scrapy
Written in: Python
Operating system: Windows, macOS, Linux
Type: Web crawler
License: BSD License
Website: scrapy.org

Scrapy (/ˈskreɪpaɪ/[2] SKRAY-peye) is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler.[3] It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", self-contained crawlers that are given a set of instructions. Following the spirit of other don't-repeat-yourself frameworks such as Django,[4] Scrapy makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
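
A minimal spider might look like the following sketch. It targets quotes.toscrape.com, the demonstration site used in Scrapy's own tutorial; the site and the CSS selectors are illustrative rather than part of Scrapy itself.

    import scrapy

    class QuotesSpider(scrapy.Spider):
        # Every spider has a unique name and a list of URLs to start crawling from.
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # parse() is called with the downloaded response for each request.
            for quote in response.css("div.quote"):
                # Yield one item (a plain dict) per quotation on the page.
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow the "next page" link, if present, and parse it with the same callback.
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

Such a spider is typically run from inside a Scrapy project with the scrapy crawl command, or on its own with scrapy runspider, and its output can be exported to formats such as JSON or CSV.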

The framework provides features such as automatic request throttling (AutoThrottle) and, through its middleware system, support for rotating proxies and user agents, which reduce the chance of a crawl being blocked by the target site. Scrapy also provides an interactive web-crawling shell, which developers can use to test their assumptions about a site's behaviour.[5]
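
For example, the shell can be opened against a URL to try selectors interactively before committing them to a spider; the URL and selectors below are illustrative.

    $ scrapy shell "https://quotes.toscrape.com/"
    >>> response.status                                      # HTTP status code of the fetched page
    >>> response.css("title::text").get()                    # extract the page title with a CSS selector
    >>> response.css("div.quote span.text::text").getall()   # list all quotation texts on the page
    >>> fetch("https://quotes.toscrape.com/page/2/")         # download another page in the same session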

Some well-known companies and products using Scrapy are Lyst,[6][7] Parse.ly,[8][9] Sciences Po Medialab,[10] and Data.gov.uk's World Government Data site.[11]

History

Scrapy was born at London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, with a milestone 1.0 release happening in June 2015.[12] In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.[13][14]

References

  1. ^ https://github.com/scrapy/scrapy/releases/tag/2.6.1. Published 1 March 2022. Retrieved 11 March 2022.
  2. ^ Commit 975f150
  3. ^ Scrapy at a glance.
  4. ^ "Frequently Asked Questions". Retrieved 28 July 2015.
  5. ^ "Scrapy shell". Retrieved 28 July 2015.
  6. ^ Bell, Eddie; Heusser, Jonathan. "Scalable Scraping Using Machine Learning". Retrieved 28 July 2015.
  7. ^ Scrapy | Companies using Scrapy
  8. ^ Montalenti, Andrew. "Web Crawling & Metadata Extraction in Python".
  9. ^ "Scrapy Companies". Scrapy website.
  10. ^ Hyphe v0.0.0: the first release of our new webcrawler is out!
  11. ^ Ben Firshman [@bfirsh] (21 January 2010). "World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore" (Tweet) – via Twitter.
  12. ^ Medina, Julia (19 June 2015). "Scrapy 1.0 official release out!". scrapy-users (Mailing list).
  13. ^ Pablo Hoffman (2013). List of the primary authors & contributors. Retrieved 18 November 2013.
  14. ^ Interview Scraping Hub.
