Dask (software)

Original author(s): Matthew Rocklin
Developer(s): Dask
Initial release: January 8, 2015
Stable release: 2021.09.01 / September 1, 2021
Repository: Dask Repository
Written in: Python[1]
Operating system: Linux, Microsoft Windows, macOS
Available in: Python
Type: Data analytics
License: New BSD
Website: dask.org

Dask is an open-source library for parallel computing written in Python.[2][3] Originally developed by Matthew Rocklin, Dask is now a community project maintained and sponsored by individual developers and organizations.

Overview

Dask is composed of two parts. The first is a task scheduling component that builds dependency graphs and schedules the tasks in them. The second is a set of distributed data structures with APIs similar to pandas DataFrames and NumPy arrays. Dask has a variety of use cases; it can run on a single node and scale to thousand-node clusters.[4]
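The two parts can be illustrated with a minimal pure-Python sketch. It follows the description in reference 1 of a graph encoded with plain dictionaries, tuples, and callables, and applies it to a blocked sum over a list. The `get` function here is a hypothetical serial evaluator written for illustration, not Dask's scheduler, and the key names (`chunk-0`, `sum-0`, `total`) and the `combine` helper are assumptions made for this example.

```python
def is_task(value):
    # A task is a tuple whose first element is callable: (func, arg1, arg2, ...)
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def get(graph, key, cache=None):
    """Evaluate `key` by resolving its dependencies depth-first, serially.

    Illustrative only: a real scheduler would run independent tasks in parallel.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = graph[key]
    if is_task(value):
        func, *args = value
        # String arguments that name other graph entries are dependencies.
        resolved = [
            get(graph, arg, cache) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        result = func(*resolved)
    else:
        result = value  # literal data, such as one block of the input
    cache[key] = result
    return result


def combine(*parts):
    # Hypothetical helper: merge the per-block partial sums.
    return sum(parts)


# Blocked algorithm: split the data into chunks, sum each chunk, then combine.
data = list(range(10))
chunk_size = 4
graph = {}
partial_keys = []
for i, start in enumerate(range(0, len(data), chunk_size)):
    graph[f"chunk-{i}"] = data[start:start + chunk_size]
    graph[f"sum-{i}"] = (sum, f"chunk-{i}")
    partial_keys.append(f"sum-{i}")
graph["total"] = (combine, *partial_keys)

print(get(graph, "total"))  # 45
```

The per-chunk sums do not depend on one another, which is what lets a scheduler run them concurrently. With Dask installed, a graph of this shape can in principle be passed to one of the library's own schedulers (for example `dask.threaded.get`) instead of the toy `get` above.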

References

  1. ^ "Dask: Parallel Computation with Blocked algorithms and Task Scheduling" (PDF). "This paper introduces dask, a specification to encode parallel algorithms, using primitive Python dictionaries, tuples, and callables."
  2. ^ Daniel, Jesse C. (2019). Data Science at Scale with Python and Dask. Manning Publications. ISBN 9781617295607.
  3. ^ Rocklin, Matthew (2015). "Dask: Parallel Computation with Blocked algorithms and Task Scheduling". Proceedings of the 14th Python in Science Conference: 126–132. doi:10.25080/Majora-7b98e3ed-013.
  4. ^ "Dask — Dask documentation".
