Concepts¶
Root directories¶
The most important concept in scilo is the "root directory".
Root directories sit at the root of your project and organize your files by function.
Code¶
This directory contains all the code used to process files from data, perform statistical analyses, create visualizations, and save the outputs in results. Generally, it contains a series of subdirectories, each of which contains a set of related scripts for a single research question. For example, each subdirectory may contain:
- a workflow file (e.g. Snakefile or Nextflow),
- a data pre-processing script,
- a statistical analysis and calculation script, or
- a plotting script
An example code directory is:
code/
├─ YYYY-mm-dd_first-analysis/
│ ├─ plot.r
│ ├─ preprocess.py
│ ├─ README.md
│ └─ Snakefile
│
├─ YYYY-mm-dd_second-analysis/
│ ├─ analysis.java
│ ├─ main.nf
│ ├─ plot.r
│ └─ README.org
└─ README.md
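The subdirectory names in the tree above follow a date-prefixed pattern. As a minimal sketch, assuming you want that same `YYYY-mm-dd_name` pattern and a README.md in each new subdirectory, a small helper could create one. The function `new_analysis_dir` is hypothetical and is not part of scilo.

```python
from datetime import date
from pathlib import Path


def new_analysis_dir(project_root: Path, name: str, day: date) -> Path:
    """Create code/<YYYY-mm-dd>_<name>/ containing an empty README.md.

    Hypothetical helper for illustration; not provided by scilo.
    """
    # date objects support strftime-style format specs inside f-strings.
    analysis_dir = project_root / "code" / f"{day:%Y-%m-%d}_{name}"
    analysis_dir.mkdir(parents=True, exist_ok=True)
    # touch() creates the file if it is missing and leaves existing content alone.
    (analysis_dir / "README.md").touch()
    return analysis_dir
```

Calling `new_analysis_dir(Path("."), "first-analysis", date.today())` from the project root would create a subdirectory named after today's date.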
Data¶
All raw and processed data should be found within this directory. The datasets located here will then be sourced by code within the code directory and used to generate outputs in the results directory.
An example data directory is:
data/
├─ dataset1/
│ ├─ data.tsv
│ ├─ metadata.tsv
│ └─ README.md
├─ dataset2_soft-link
├─ dataset3/
│ ├─ raw_data.parquet
│ ├─ README.md
│ ├─ processed_data.csv
│ └─ processing_script.py
└─ README.md
Results¶
Code and scripts in the code directory should create outputs in this directory to ensure a separation of inputs, code, and outputs. Generally, this directory contains a series of subdirectories, each of which will contain all the outputs originating from the code in the corresponding code subdirectory.
An example of corresponding code and results directories is:
project_root/
├─ code/
│ ├─ YYYY-mm-dd_first-analysis/
│ │ ├─ plot.r
│ │ ├─ preprocess.py
│ │ ├─ README.md
│ │ └─ Snakefile
│ │
│ └─ YYYY-mm-dd_second-analysis/
│   ├─ analysis.py
│   ├─ plot.r
│   ├─ README.org
│   └─ main.nf
│
└─ results/
  ├─ YYYY-mm-dd_first-analysis/
  │ ├─ figure.png
  │ └─ preprocessed_data.tsv
  │
  └─ YYYY-mm-dd_second-analysis/
    ├─ data1.tsv
    ├─ data2.tsv
    ├─ data3.tsv
    ├─ figure1.png
    ├─ figure2.png
    └─ figure3.png
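One way a script could honor this one-to-one mapping is to derive its output directory from its own location. The sketch below assumes the script lives directly inside `code/<analysis>/`. The helper `results_dir_for` is hypothetical and is not part of scilo.

```python
from pathlib import Path


def results_dir_for(script_path: Path) -> Path:
    """Return results/<analysis>/ for a script at code/<analysis>/<script>.

    Hypothetical helper for illustration; creates the directory if needed.
    """
    script_path = script_path.resolve()
    analysis_dir = script_path.parent   # code/<analysis>/
    code_dir = analysis_dir.parent      # code/
    if code_dir.name != "code":
        raise ValueError(f"{script_path} is not inside a code/<analysis>/ directory")
    project_root = code_dir.parent
    out_dir = project_root / "results" / analysis_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Inside `code/YYYY-mm-dd_first-analysis/preprocess.py`, a call such as `results_dir_for(Path(__file__))` would then point at `results/YYYY-mm-dd_first-analysis/`, so the script never has to hard-code its output location.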
External¶
In many research projects you will need a package manifest, like a Nix expression or an Anaconda recipe. Alternatively, you may need to vendor a copy of some external code using a git submodule. This directory can house all of that material so that it does not clutter the custom code in code.
Documentation¶
Documentation such as a manuscript, experimental descriptions from contract research organizations, or software user guides can go in this directory. Interactive HTML notebooks like Jupyter, R Markdown, or Quarto that render or explain the data, but don't create artefacts that belong in results, could also go here. Data that belongs in data, and notebooks that process data and therefore belong in code, should not be placed here.
Required files¶
You may want certain directories to contain specific files.
An example would be a README.md file located in every data subdirectory, or a Snakefile in every code subdirectory.
You can specify which files are required in each part of the project, depending on your workflow and needs, in the configuration file.
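To illustrate what such a requirement means in practice, here is a minimal sketch that checks every subdirectory of a root directory for a set of required files. The `REQUIRED` mapping and the function `missing_required_files` are assumptions made for this example. They do not reflect scilo's actual configuration format or implementation.

```python
from pathlib import Path

# Hypothetical rules mirroring the examples above; NOT scilo's config format.
REQUIRED = {
    "data": ["README.md"],
    "code": ["Snakefile"],
}


def missing_required_files(project_root: Path, required: dict[str, list[str]]) -> list[Path]:
    """List required files that are absent from subdirectories of each root directory."""
    missing = []
    for root_name, filenames in required.items():
        root_dir = project_root / root_name
        if not root_dir.is_dir():
            continue  # a root directory that does not exist has nothing to check
        # is_dir() follows symlinks, so soft-linked datasets are checked too.
        for subdir in sorted(p for p in root_dir.iterdir() if p.is_dir()):
            for filename in filenames:
                if not (subdir / filename).is_file():
                    missing.append(subdir / filename)
    return missing
```

Running `missing_required_files(Path("."), REQUIRED)` from the project root returns the paths that would need to be created. An empty list means every subdirectory satisfies the rules.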