The following is the stepwise procedure to be used in working on the shared programming exercise.
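- Create your project directory and name it Project_dengue, making sure it has the subdirectories needed for any bioinformatics project. The inner path is written without its own braces: Bash does not expand a brace group that holds a single item, so raw_data/{compressed} would create a directory literally named {compressed}.
mkdir -p Project_dengue/{raw_data/compressed,processed_data,scripts}
- Put the downloaded zip file into the compressed subdirectory of raw_data and unzip it.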
- First move into the compressed directory:
cd Project_dengue/raw_data/compressed/
- Then copy the file using the cp command as below:
cp /mnt/c/Users/micro/Desktop/Msc/lisso/dengue.zip .
- Extract it with the unzip command, which can be installed with sudo apt install unzip if it is not already available.
- Use unzip with the -d option to specify the output directory:
unzip dengue.zip -d /home/genomics/Project_dengue/raw_data/
Use the unzip command with the -l option to get a summary of the zipped file without extracting it:
unzip -l dengue.zip
Output:
Archive:  ./dengue.zip
   Length        Date   Time   Name
---------  ----------  -----   ----
     2795  2024-11-19  09:23   dengueseq1.fasta
     8006  2024-11-19  09:23   dengueseq2.fasta
     6476  2024-11-19  09:24   dengueseq3.fasta
     7844  2024-11-19  09:24   dengueseq4.fasta
    10936  2024-11-19  09:20   dengueseq5.fasta
---------                      -------
    36057                      5 files
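The setup, copy, and unzip steps above can be sketched as one small script. This is only an illustration under stated assumptions: the variables project_root and archive are hypothetical (they are not part of the exercise), the project is built in a temporary directory unless PROJECT_ROOT is set, and the archive steps are skipped when the file or the unzip tool is missing.

```shell
#!/bin/sh
# Sketch: build the project layout, then list and extract the archive if possible.
# ASSUMPTION: project_root and archive are hypothetical names; adjust them to your system.
project_root="${PROJECT_ROOT:-$(mktemp -d)}/Project_dengue"
archive="${ARCHIVE:-$HOME/Downloads/dengue.zip}"

# Portable equivalent of the brace-expansion mkdir (also works in plain sh).
for sub in raw_data/compressed processed_data scripts; do
    mkdir -p "$project_root/$sub"
done

# Only touch the archive when both the file and the unzip tool are present.
if [ -f "$archive" ] && command -v unzip >/dev/null 2>&1; then
    zip_copy="$project_root/raw_data/compressed/$(basename "$archive")"
    cp "$archive" "$zip_copy"
    # -l lists the contents without extracting anything.
    unzip -l "$zip_copy"
    # -d sets the extraction directory; -o overwrites existing files without prompting.
    unzip -o "$zip_copy" -d "$project_root/raw_data/"
else
    echo "Skipping archive steps: archive or unzip not found" >&2
fi
```

Listing with -l before extracting is a cheap way to confirm that the archive holds the five expected .fasta files.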
Use wc with the -l option to count only the lines in each file with the .fasta extension. Run it from the raw_data directory, where the extracted files were placed (cd .. from compressed):
wc -l *.fasta
Output:
42 dengueseq1.fasta
115 dengueseq2.fasta
94 dengueseq3.fasta
113 dengueseq4.fasta
157 dengueseq5.fasta
521 total
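The per-file counts and their total can also be reproduced in a loop, which is handy when the numbers are needed in a script rather than on screen. The sketch below is a minimal example on two made-up files; the names workdir, demo1.fasta and demo2.fasta, and their contents, are hypothetical and only stand in for the real dengue files.

```shell
#!/bin/sh
# Sketch: count lines per file and sum them, using hypothetical demo files.
workdir="$(mktemp -d)"
printf '>seqA demo record one\nACGT\nGGCC\n' > "$workdir/demo1.fasta"
printf '>seqB demo record two\nTTAA\n' > "$workdir/demo2.fasta"

total=0
for f in "$workdir"/*.fasta; do
    # Reading from stdin makes wc print the number only, without the file name.
    n="$(wc -l < "$f")"
    n="$((n + 0))"   # normalises any padding some wc versions add
    echo "$(basename "$f"): $n"
    total="$((total + n))"
done
echo "total: $total"
```

Because wc -l counts newline characters, a file whose last line lacks a trailing newline is reported as one line shorter than it looks in an editor.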
Use cat to combine all the files and pipe (|) the output to wc -l to count the lines of all files together:
cat *.fasta | wc -l
Output:
521
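Use the cat command and redirect the output to a new file:
cat *.fasta > dengue_merged.fasta
Use the grep command with the -c flag to count the header lines (those containing ">"), which gives the number of sequences:
grep -c ">" dengue_merged.fasta
Output: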
5
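One thing to watch when merging: on a second run the existing dengue_merged.fasta itself matches *.fasta, so cat may refuse to run or the merged file may end up with unexpected content. A minimal sketch of one way around this, assuming the input files share a narrower name pattern, is shown below on made-up data; workdir, merged, and the demo file names are hypothetical.

```shell
#!/bin/sh
# Sketch: merge FASTA files with a glob that cannot match the merged output,
# then count the records. All file names here are hypothetical demo names.
workdir="$(mktemp -d)"
printf '>seqA demo record one\nACGT\nGGCC\n' > "$workdir/demo1.fasta"
printf '>seqB demo record two\nTTAA\n' > "$workdir/demo2.fasta"

merged="$workdir/demo_merged.fasta"
# demo[0-9].fasta matches demo1.fasta and demo2.fasta but never demo_merged.fasta.
cat "$workdir"/demo[0-9].fasta > "$merged"

# Anchoring the pattern with ^ counts only lines that start with ">".
records="$(grep -c '^>' "$merged")"
echo "records: $records"
```

For the dengue files the equivalent pattern would be dengueseq*.fasta, and the record count should equal the number of input files when each file holds one sequence.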
Use the grep command and redirect the output to a new file. You can use cat to view the contents of the new file, dengue_headers.txt:
grep ">" dengue_merged.fasta > dengue_headers.txt
Use awk to select the part of each header between ">" and the first comma, and pipe the output to sed to remove the identifiers, leaving the virus names:
awk -F '[>,]' '{print $2}' dengue_headers.txt | sed 's/^[^ ]* //' > viruses.txt
Use awk to extract the identifiers, that is, the first whitespace-separated field after ">":
awk -F '>' '{print $2}' dengue_headers.txt | awk '{print $1}' > identifiers.txt
Use grep -v to invert the match, or sed with the d command, to delete all headers and keep only the sequence lines:
grep -v '^>' dengue_merged.fasta > dengue_seq.txt
or
sed '/^>/d' < dengue_merged.fasta > dengue_seq.txt
Use the tr command to translate the sequences from upper to lower case:
tr '[:upper:]' '[:lower:]' < dengue_seq.txt > dengue_seq_lowercase.txt
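To see what each of these commands produces, the whole chain can be run on a tiny made-up file. This sketch assumes headers of the form ">IDENTIFIER Virus name, description", which is what the awk and sed patterns above expect; the identifiers, virus names, and file names below are invented for illustration and are not taken from the dengue data.

```shell
#!/bin/sh
# Sketch: header parsing and sequence clean-up on a hypothetical two-record FASTA.
workdir="$(mktemp -d)"
printf '>XX000001.1 Demo virus 1, complete genome\nACGT\nGGCC\n' > "$workdir/demo_merged.fasta"
printf '>XX000002.1 Demo virus 2, partial genome\nTTAA\n' >> "$workdir/demo_merged.fasta"

# Keep only the header lines.
grep '^>' "$workdir/demo_merged.fasta" > "$workdir/demo_headers.txt"

# Field 2 (between ">" and the first comma) is "ID name"; sed drops the leading ID.
awk -F '[>,]' '{print $2}' "$workdir/demo_headers.txt" | sed 's/^[^ ]* //' > "$workdir/viruses.txt"

# The identifier is the first whitespace-separated word after ">".
awk -F '>' '{print $2}' "$workdir/demo_headers.txt" | awk '{print $1}' > "$workdir/identifiers.txt"

# Drop the headers, then lower-case the remaining sequence lines.
grep -v '^>' "$workdir/demo_merged.fasta" > "$workdir/demo_seq.txt"
tr '[:upper:]' '[:lower:]' < "$workdir/demo_seq.txt" > "$workdir/demo_seq_lowercase.txt"

cat "$workdir/viruses.txt" "$workdir/identifiers.txt" "$workdir/demo_seq_lowercase.txt"
```

If the real headers contain no comma, field 2 is simply everything after ">", so viruses.txt would hold the full description rather than just the virus name.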