rashitig/ethz_webarchive

ethz_webarchive

Some code to help with processing the webarchive files of ETH Zürich

Clone the repository

git clone git@github.com:rashitig/ethz_webarchive.git

Set up the environment

conda create -n "env_warc" python=3.10 ipython

or

python3.10 -m venv env_warc

then activate the environment

conda activate env_warc

or

source env_warc/bin/activate

then install the requirements

pip install -r requirements.txt

Finally, set the file paths and run the script:

python prep_warc_files.py
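This README does not say how the file paths are set, so here is a minimal sketch of one way it could look, assuming the script reads WARC files from a single input directory. The names INPUT_DIR, find_warc_files and main, and the path data/warc, are hypothetical and are not taken from prep_warc_files.py. Only the Python standard library is used.

```python
from pathlib import Path

# Hypothetical location; the real variable name and path used in
# prep_warc_files.py may differ.
INPUT_DIR = Path("data/warc")


def find_warc_files(input_dir: Path) -> list[Path]:
    """Return all .warc and .warc.gz files below input_dir, sorted by path."""
    if not input_dir.is_dir():
        raise FileNotFoundError(f"Input directory not found: {input_dir}")
    return sorted(
        p
        for p in input_dir.rglob("*")
        if p.is_file() and p.name.endswith((".warc", ".warc.gz"))
    )


def main() -> None:
    """List the WARC files that would be processed."""
    try:
        warc_files = find_warc_files(INPUT_DIR)
    except FileNotFoundError as err:
        # Report a wrong path clearly instead of failing with a traceback.
        print(err)
        return
    for warc_path in warc_files:
        print(warc_path)


if __name__ == "__main__":
    main()
```

Checking that the input directory exists before any processing starts makes a mistyped path show up immediately, which may be useful when the archive files are large.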
