What is the Paradigm Data Portal?
A collection of open source cryptocurrency datasets aimed at researchers and tool builders.
How much do these datasets cost?
All of these datasets are free.
Are there any restrictions on how these datasets can be used?
All of these datasets can be used without restriction. They are released into the public domain under the CC0 license.
How can I acquire these datasets?
The files can be downloaded using either 1) the links on individual dataset pages or 2) the downloader tool in the project repo.
What is the format of these datasets?
Each dataset is distributed as a set of parquet files. See individual dataset pages to view the schema of each dataset.
These files are too big. How am I supposed to use them?
There are many options for processing large amounts of parquet data. In many cases, you don’t even need a database or a large amount of memory. See below for an overview of parquet, or see example dataset usage in this notebook.
Datasets are provided in parquet format.
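Parquet is a columnar, tabular data format with several advantages over older formats such as CSV. Each file stores its schema alongside the data, columns are compressed individually, and readers can load only the columns a query needs.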
You can run efficient queries against parquet files without any database and without needing to fit the files in memory. See this notebook for example usage.
Parquet files can be read directly by libraries in many programming languages:
- Python: polars, pandas
- Rust: rust-polars, parquet, parquet2
- R: arrow
- TypeScript: nodejs-polars, parquets
- SQL: duckdb
Alternatively, you can import parquet files to data platforms such as