diff --git a/Ishika/assignment_1/FinalBoss/README.md b/Ishika/assignment_1/FinalBoss/README.md
new file mode 100644
index 0000000..1e64597
--- /dev/null
+++ b/Ishika/assignment_1/FinalBoss/README.md
@@ -0,0 +1,29 @@
+# File Format Converter
+
+This is a simple file format converter which allows you to convert your files between some common document formats, such as:
+
+- .pdf
+- .epub
+- .md
+- .html
+- .txt
+- .docx
+
+### Why use this?
+
+Looking up sites and uploading files for such basic conversions can be very time-consuming, especially when one frequently needs to make them. This script automates the whole process and lets us convert a file into another format by simply passing in the file name and the format we want.
+
+### How to use?
+
+```bash
+./file_format_converter.sh file.extension required_format
+# e.g. ./file_format_converter.sh notes.md pdf
+```
+
+### What to make sure of?
+
+- pandoc, the command-line tool used for most of the conversions, requires a LaTeX engine whenever it converts to PDF. Make sure to have one installed beforehand.
+_(Source: https://pandoc.org/MANUAL.html#options)_
+
+- The input file must carry its original format as its extension, i.e. it must be named file_name.extension, since the script infers the source format from it.
diff --git a/Ishika/assignment_1/FinalBoss/converted_file.png b/Ishika/assignment_1/FinalBoss/converted_file.png
new file mode 100644
index 0000000..1aaa7cd
Binary files /dev/null and b/Ishika/assignment_1/FinalBoss/converted_file.png differ
diff --git a/Ishika/assignment_1/FinalBoss/file_format_converter.sh b/Ishika/assignment_1/FinalBoss/file_format_converter.sh
new file mode 100644
index 0000000..f29031f
--- /dev/null
+++ b/Ishika/assignment_1/FinalBoss/file_format_converter.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# input file, its extension, and the requested output format
+input_file="$1"
+input_format="${1##*.}"
+output_format="$2"
+
+output_file="${input_file%.*}.$output_format"
+text_file="/tmp/tmp_conversion.txt"
+
+case "${input_format}-${output_format}" in
+# conversions pandoc can do directly
+txt-md|epub-md|docx-md|html-md|\
+ md-txt|html-txt|epub-txt|docx-txt|\
+ md-html|txt-html|epub-html|docx-html|\
+ md-pdf|html-pdf|txt-pdf|epub-pdf|\
+ md-docx|html-docx|docx-pdf|txt-docx|\
+ md-epub|txt-epub|html-epub)
+    pandoc "$input_file" -o "$output_file"
+    echo "Converted $input_file to $output_format"
+    ;;
+
+# pandoc cannot read pdf input, so extract the text first
+pdf-txt|pdf-html|pdf-md|pdf-docx|pdf-epub)
+    pdftotext "$input_file" "$text_file"
+    pandoc "$text_file" -o "$output_file"
+    echo "Converted $input_file to $output_format"
+    ;;
+
+*)
+    echo "Conversion from $input_format to $output_format not supported yet."
+    ;;
+
+esac
diff --git a/Ishika/assignment_1/FinalBoss/sample.png b/Ishika/assignment_1/FinalBoss/sample.png
new file mode 100644
index 0000000..50655f2
Binary files /dev/null and b/Ishika/assignment_1/FinalBoss/sample.png differ
diff --git a/Ishika/assignment_1/Git/missing-semester-lec-3/README.md b/Ishika/assignment_1/Git/missing-semester-lec-3/README.md
new file mode 100644
index 0000000..e5119d8
--- /dev/null
+++ b/Ishika/assignment_1/Git/missing-semester-lec-3/README.md
@@ -0,0 +1,64 @@
+# Lec 6 : Version Control Systems (git)
+
+## Solutions
+
+
+2) The repo can be cloned as:
+```bash
+$ git clone https://github.com/missing-semester/missing-semester.git
+```
+
+a) Gives the version history of the given repo in graphical form: --all includes all refs, --graph draws the commit graph, and --decorate labels commits with branch and tag names.
+```bash
+$ git clone https://github.com/missing-semester/missing-semester.git
+$ cd missing-semester
+/missing-semester$ git log --all --graph --decorate
+```
+
+b) -1 limits the output to the last commit only, and -- README.md restricts the log to commits that modified README.md.
+```bash
+~/missing-semester$ git log -1 -- README.md
+```
+
+c) git blame reports who last modified each line; grep keeps only the line containing the string "collections:". This gives us the hash of that particular commit, which we can pass as an argument to the show command. It prints information about that commit, including the message it was committed with.
+```bash
+~/missing-semester$ git blame _config.yml | grep "collections:"
+~/missing-semester$ git show a88b4eac
+```
+
+3)
+```bash
+ishika@LAPTOP-0EUE93V4:~$ git init demo
+ishika@LAPTOP-0EUE93V4:~$ cd demo
+ishika@LAPTOP-0EUE93V4:~/demo$ touch secret_file
+ishika@LAPTOP-0EUE93V4:~/demo$ git add secret_file
+ishika@LAPTOP-0EUE93V4:~/demo$ git commit -m "added the secret file, uh oh!"
+[master (root-commit) ee60ad4] added the secret file, uh oh
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+ create mode 100644 secret_file
+ishika@LAPTOP-0EUE93V4:~/demo$ git rm secret_file
+rm 'secret_file'
+ishika@LAPTOP-0EUE93V4:~/demo$ git commit -m "deleted the secret file, sigh..."
+[master e18c37f] deleted the secret file, sigh
+ 1 file changed, 0 insertions(+), 0 deletions(-)
+ delete mode 100644 secret_file
+ishika@LAPTOP-0EUE93V4:~/demo$ git log
+commit e18c37f3222d12c92b78438f755137625b9b7123 (HEAD -> master)
+Author: ishika
+Date: Tue Jun 3 17:15:01 2025 +0000
+
+ deleted the secret file, sigh
+
+commit ee60ad485f2e65227e1eb3e4d39ea545b617f582
+Author: ishika
+Date: Tue Jun 3 17:14:06 2025 +0000
+
+ added the secret file, uh oh
+```
+As we can see, the commit that added the file still shows up in the history. That means anyone with the git repo can access the contents of the file by using checkout to revert back to the state at which that commit was made.
+
+To resolve this and remove our sensitive files from the history, we use git-filter-repo.
+
+
diff --git a/Ishika/assignment_1/Shell/bandit-wargame/README.md b/Ishika/assignment_1/Shell/bandit-wargame/README.md
new file mode 100644
index 0000000..2a22494
--- /dev/null
+++ b/Ishika/assignment_1/Shell/bandit-wargame/README.md
@@ -0,0 +1,182 @@
+# Bandit : OverTheWire
+
+## Solutions Lvl 0 - 15
+
+Note: I had already done all these levels earlier, so this time around I tried to use some of the commands and tricks which were taught in the missing semester lectures and assignments.
+
+0) We are using the ssh command, which allows us to securely access remote computers over a network. Here we are using it to log into another computer using the info provided.
+```bash
+$ ssh -p 2220 bandit0@bandit.labs.overthewire.org
+```
+
+1) cat reads the contents of a file. ~ denotes the home directory.
+```bash
+$ cat ~/readme
+```
+
+2)
+```bash
+$ cat ~/-
+```
+
+3) For filenames with spaces, we can simply put them under quotation marks, so the shell does not consider each word of the file name as a separate argument.
+```bash
+$ cat "spaces in this filename"
+```
+
+4) For hidden files, the -a flag can be used along with the ls command to list them along with the other files.
+```bash
+$ ls
+inhere
+$ ls -a inhere
+. .. ...Hiding-From-You
+$ cat inhere/...Hiding-From-You
+```
+
+5) The file command with ./* gives the type of every file stored in the current directory.
+-file07 is the only one in a human-readable format.
+```bash
+bandit4@bandit:~$ ls
+inhere
+bandit4@bandit:~$ cd inhere
+bandit4@bandit:~/inhere$ ls
+-file00 -file02 -file04 -file06 -file08
+-file01 -file03 -file05 -file07 -file09
+bandit4@bandit:~/inhere$ file ./*
+./-file00: PGP Secret Sub-key -
+./-file01: data
+./-file02: data
+./-file03: data
+./-file04: data
+./-file05: data
+./-file06: data
+./-file07: ASCII text
+./-file08: data
+./-file09: data
+bandit4@bandit:~/inhere$ cat ./-file07
+```
+6) Here we are using find to get all the files satisfying the file type and size conditions, and piping the result into xargs, which runs ls in long-list format on every file found so we can check whether it is executable.
+```bash
+bandit5@bandit:~$ find . -type f -size 1033c | xargs ls -l
+-rw-r----- 1 root bandit5 1033 Apr 10 14:23 ./inhere/maybehere07/.file2
+bandit5@bandit:~$ cat ./inhere/maybehere07/.file2
+```
+
+7) Since the file is somewhere on the server, we have to search the entire filesystem. In the process, we will hit lots of files we are not permitted to open, which would unnecessarily flood our terminal with errors. To prevent this, we redirect the STDERR stream to /dev/null.
+```bash
+bandit6@bandit:~$ find / -user bandit7 -group bandit6 -size 33c 2>/dev/null
+/var/lib/dpkg/info/bandit7.password
+bandit6@bandit:~$ cat /var/lib/dpkg/info/bandit7.password
+```
+
+8) grep looks for all the lines containing the word "millionth" in the file data.txt and shows them on our terminal.
+```bash
+bandit7@bandit:~$ ls
+data.txt
+bandit7@bandit:~$ grep "millionth" data.txt
+```
+
+9) The 'sort' command, as the name suggests, sorts the lines of a text either alphabetically or numerically. 'uniq' removes duplicate adjacent lines and is generally used along with 'sort'. The -c flag counts the number of times a line occurs.
+```bash
+bandit8@bandit:~$ sort data.txt | uniq -c
+```
+
+10) The 'strings' command only shows the human-readable character sequences from a binary file; piping its output into grep keeps the lines containing "=".
+```bash
+bandit9@bandit:~$ strings data.txt | grep "="
+```
+
+11) The -d flag decodes the base64 data present in the file.
+```bash
+bandit10@bandit:~$ base64 -d data.txt
+```
+
+12) Here, we are taking the text from the file data.txt and piping it into the 'tr' command, which maps every letter to the letter 13 places away, translating the rot13 text back into plain text.
+```bash
+bandit11@bandit:~$ cat data.txt | tr '[a-zA-Z]' '[n-za-mN-ZA-M]'
+```
+
+13) I gave the temporary directory a hard-to-guess name by using base64 encoding. Then I moved into that directory and copied the data.txt file there. This kept the home directory clean, since there was a lot of decompressing to do and we could experiment here however we wanted. I also had to rename the file at every decompression step so that it carried the extension of its compression type, because the tools refused to treat the files as valid input without the proper extensions.
+```bash
+bandit12@bandit:~$ mktemp -d "/tmp/$(echo -n 'hard_to_guess' | base64).XXXXXX"
+/tmp/aGFyZF90b19ndWVzcw==.LUb4ip
+bandit12@bandit:/$ tmpdir="/tmp/aGFyZF90b19ndWVzcw==.LUb4ip"
+bandit12@bandit:~$ cp data.txt $tmpdir
+bandit12@bandit:/$ cd $tmpdir
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv data.txt hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ xxd -r hexdump.txt > decoded
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file decoded
+decoded: gzip compressed data, was "data2.bin", last modified: Thu Apr 10 14:22:57 2025, max compression, from Unix, original size modulo 2^32 585
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv decoded decoded.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ gzip -d decoded.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls -la
+total 9988
+drwx------ 2 bandit12 bandit12 4096 Jun 3 07:26 .
+drwxrwx-wt 2118 root root 10211328 Jun 3 07:26 ..
+-rw-rw-r-- 1 bandit12 bandit12 585 Jun 3 07:24 decoded
+-rw-r----- 1 bandit12 bandit12 2646 Jun 3 06:59 hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file decoded
+decoded: bzip2 compressed data, block size = 900k
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv decoded decoded.bz2
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ bzip2 -d decoded.bz
+bzip2: Can't open input file decoded.bz: No such file or directory.
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ bzip2 -d decoded.bz2
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls -l
+total 8
+-rw-rw-r-- 1 bandit12 bandit12 443 Jun 3 07:24 decoded
+-rw-r----- 1 bandit12 bandit12 2646 Jun 3 06:59 hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file decoded
+decoded: gzip compressed data, was "data4.bin", last modified: Thu Apr 10 14:22:57 2025, max compression, from Unix, original size modulo 2^32 20480
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv decoded decoded.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ gzip -d decoded.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file decoded
+decoded: POSIX tar archive (GNU)
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv decoded decoded.tar
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ tar -xf decoded.tar
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file decoded
+decoded: cannot open `decoded' (No such file or directory)
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls -l
+total 36
+-rw-r--r-- 1 bandit12 bandit12 10240 Apr 10 14:22 data5.bin
+-rw-rw-r-- 1 bandit12 bandit12 20480 Jun 3 07:24 decoded.tar
+-rw-r----- 1 bandit12 bandit12 2646 Jun 3 06:59 hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file data5.bin
+data5.bin: POSIX tar archive (GNU)
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ tar -xf data5.bin
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls -l
+total 40
+-rw-r--r-- 1 bandit12 bandit12 10240 Apr 10 14:22 data5.bin
+-rw-r--r-- 1 bandit12 bandit12 222 Apr 10 14:22 data6.bin
+-rw-rw-r-- 1 bandit12 bandit12 20480 Jun 3 07:24 decoded.tar
+-rw-r----- 1 bandit12 bandit12 2646 Jun 3 06:59 hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file data6.bin
+data6.bin: bzip2 compressed data, block size = 900k
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv data6.bin data6.bz2
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ bzip2 -d data6.bz2
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls
+data5.bin data6 data8.bin decoded.tar hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file data6
+data6: POSIX tar archive (GNU)
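+# note: 'file' identifies each layer from its contents alone; the renames above were
+# only needed because gzip and bzip2 expect their usual filename suffixes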
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ mv data8.bin data8.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ gzip -d data8.gz
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ ls
+data5.bin data6 data8 decoded.tar hexdump.txt
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ file data8
+data8: ASCII text
+bandit12@bandit:/tmp/aGFyZF90b19ndWVzcw==.LUb4ip$ cat data8
+```
+
+14) The -i flag stands for identity file; it tells ssh which private key to use for authentication.
+```bash
+bandit13@bandit:~$ ls
+sshkey.private
+bandit13@bandit:~$ ssh -i sshkey.private -p 2220 bandit14@localhost
+bandit14@bandit:~$ cat /etc/bandit_pass/bandit14
+```
+
+15) nc connects us to the service on port 30000, allowing us to send the current password and receive the next one.
+```bash
+bandit14@bandit:~$ nc localhost 30000
+```
+
+
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-1/README.md b/Ishika/assignment_1/Shell/missing-semester-lec-1/README.md
new file mode 100644
index 0000000..cd541d1
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-1/README.md
@@ -0,0 +1,171 @@
+# Missing Semester Lecture 1: The Shell
+
+## Solutions
+
+1) I already have Ubuntu installed on my system via dual boot. I can also use the Ubuntu terminal inside the Windows terminal, which runs through WSL. I verified this by running the echo $SHELL command, which gave /bin/bash as output, confirming that I am working in the required shell.
+```bash
+ ishika@LAPTOP-0EUE93V4:~$ echo $SHELL
+ /bin/bash
+```
+
+2) I used the mkdir command to create a directory named missing under /tmp.
+
+```bash
+ishika@LAPTOP-0EUE93V4:~$ mkdir /tmp/missing
+```
+I confirmed it using the ls command, which lists all the files/sub-directories present in a directory.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ ls /tmp
+missing
+snap-private-tmp
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-apt-news.service-y2QpVy
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-esm-cache.service-wEACev
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-systemd-logind.service-XjpEJ3
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-systemd-resolved.service-lykdnH
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-systemd-timesyncd.service-ykYq35
+systemd-private-e23e66b1b8c54d69a0ab9b9f7a0ad759-wsl-pro.service-MBlJDc
+```
+
+3) The touch command allows us to create empty files or update the relevant timestamps of a file. Using touch, we can update the access time (i.e. the last time the file was read) and the modification time (i.e. when the contents of the file were last changed) to the present time.
+
+
+4)
+```bash
+ishika@LAPTOP-0EUE93V4:~$ touch /tmp/missing/semester
+```
+I confirmed it later using the ls command.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ ls /tmp/missing
+semester
+```
+
+5) The latest version of Ubuntu points the shell path /bin/sh at the shell program dash via a symbolic link. The dash shell does not perform History Expansion, which changes the behavior of '!' and is used as a shortcut to re-run previous commands. Since this feature does not exist in dash, using either single or double quotes causes no issues here. In bash, we would have to use single quotes in place of double quotes so that the '!' in the shebang line is not expanded.
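+
+As a small illustration of the difference (a sketch; this applies to an *interactive* bash session, where history expansion is on by default, while dash would print both lines literally):
+```bash
+$ echo "previous command was: !!"   # bash expands !! inside double quotes
+$ echo 'previous command was: !!'   # single quotes suppress history expansion
+```
+The exercise itself: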
+```bash
+ishika@LAPTOP-0EUE93V4:~$ echo '#!/bin/sh' > /tmp/missing/semester
+ishika@LAPTOP-0EUE93V4:~$ echo "curl --head --silent https://missing.csail.mit.edu" >> /tmp/missing/semester
+ishika@LAPTOP-0EUE93V4:~$ cat /tmp/missing/semester
+#!/bin/sh
+curl --head --silent https://missing.csail.mit.edu
+```
+
+I confirmed that /bin/sh points to dash using the ls command.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ ls -l /bin/sh
+lrwxrwxrwx 1 root root 4 Mar 31 2024 /bin/sh -> dash
+```
+
+6) Executing the script stored under /tmp/missing shows:
+```bash
+ishika@LAPTOP-0EUE93V4:~$ /tmp/missing/semester
+-bash: /tmp/missing/semester: Permission denied
+ishika@LAPTOP-0EUE93V4:~$ ls -l /tmp/missing
+total 4
+-rw-r--r-- 1 ishika ishika 61 Jun 1 08:26 semester
+```
+The first column of the listing gives us information about the permissions placed on a file or directory. The - in place of x indicates that we are not allowed to execute the script stored in semester.
+
+
+7) Explicitly running the sh interpreter on the script stored in semester gives:
+```bash
+ishika@LAPTOP-0EUE93V4:~$ sh /tmp/missing/semester
+HTTP/2 200
+server: GitHub.com
+content-type: text/html; charset=utf-8
+last-modified: Sat, 19 Apr 2025 16:35:21 GMT
+access-control-allow-origin: *
+etag: "6803d0c9-205d"
+expires: Sun, 01 Jun 2025 12:06:12 GMT
+cache-control: max-age=600
+x-proxy-cache: MISS
+x-github-request-id: 527D:17B599:49773E:4D4B7C:683C3FDC
+accept-ranges: bytes
+age: 0
+date: Sun, 01 Jun 2025 12:28:43 GMT
+via: 1.1 varnish
+x-served-by: cache-bom4739-BOM
+x-cache: HIT
+x-cache-hits: 0
+x-timer: S1748780923.727788,VS0,VE299
+vary: Accept-Encoding
+x-fastly-request-id: 447ce580bec995a3cfeb640c8fe48afc22ac15a0
+content-length: 8285
+```
+
+Running the file semester under sh worked, but simply executing it by its path didn't, because a newly created file is non-executable by default, which we checked earlier by looking at its permissions with the ls command. sh, on the other hand, is a command interpreter: it directly executes whatever file is passed to it as an argument. Here is an excerpt from the manual of the sh command:
+
+>It incorporates many features to aid interactive use and has the
+>advantage that the interpretative language is common to both interactive and non-interactive use (shell
+>scripts). That is, commands can be typed directly to the running shell or can be put into a file and the file
+>can be executed directly by the shell.
+
+So, it does not matter whether the file is executable or not. The sh command will execute it, because that is what it is supposed to do.
+
+
+8) chmod is used to make changes to the permissions of a file or directory.
+
+
+9) We can use chmod to make a file executable.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ chmod +x /tmp/missing/semester
+ishika@LAPTOP-0EUE93V4:~$ /tmp/missing/semester
+HTTP/2 200
+server: GitHub.com
+content-type: text/html; charset=utf-8
+last-modified: Sat, 19 Apr 2025 16:35:21 GMT
+access-control-allow-origin: *
+etag: "6803d0c9-205d"
+expires: Sun, 01 Jun 2025 12:06:12 GMT
+cache-control: max-age=600
+x-proxy-cache: MISS
+x-github-request-id: 527D:17B599:49773E:4D4B7C:683C3FDC
+accept-ranges: bytes
+age: 0
+date: Sun, 01 Jun 2025 13:07:50 GMT
+via: 1.1 varnish
+x-served-by: cache-bom4738-BOM
+x-cache: HIT
+x-cache-hits: 0
+x-timer: S1748783270.092229,VS0,VE196
+vary: Accept-Encoding
+x-fastly-request-id: da5e66bf18c4cc88696139218aaa00c92fd799ac
+content-length: 8285
+```
+
+The change in permissions can be verified:
+```bash
+ishika@LAPTOP-0EUE93V4:~$ ls -l /tmp/missing/semester
+-rwxr-xr-x 1 ishika ishika 61 Jun 1 08:26 /tmp/missing/semester
+```
+The x's in the first column show that the file is now executable by its owner, its group, and all other users.
+
+To make sure our script is executed by the sh interpreter, we wrote the shebang line (#!/bin/sh) into the file. The shebang defines the interpreter a script is supposed to run under: whenever we execute a file that has one, we are telling the system to run the script under the interpreter named in the shebang.
+
+
+10)
+```bash
+ishika@LAPTOP-0EUE93V4:~$ curl --head --silent https://missing.csail.mit.edu | grep -i last-modified > last-modified.txt
+ishika@LAPTOP-0EUE93V4:~$ cat last-modified.txt
+last-modified: Sat, 19 Apr 2025 16:35:21 GMT
+```
+This text file is by default created in the current directory, which happens to be my home directory, as can be verified using the pwd command.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ pwd
+/home/ishika
+```
+
+
+11) The battery parameters are stored in the power_supply directory under the parent directory class in /sys.
+```bash
+ishika@LAPTOP-0EUE93V4:~$ cat /sys/class/power_supply/BAT1/capacity
+56
+```
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/README.md b/Ishika/assignment_1/Shell/missing-semester-lec-2/README.md
new file mode 100644
index 0000000..cd9a6bf
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/README.md
@@ -0,0 +1,113 @@
+# Lecture 2 : Shell Tools and Scripting
+
+## Solutions
+
+1) To fulfill the given requirements, I used the following flags and options:
+
+- -l : to list the files and directories with the fields given in the sample output (permissions, link count, owner, group, size, timestamp, name, from left to right)
+- -a : to show hidden files in the list as well
+- -h : shows the sizes in human-readable format (such as M, K etc.)
+- -t : sorts the files by modification time, beginning with the most recent ones
+- --color=auto : shows the output in colorized format, but only when the output stream, i.e. STDOUT, is directed towards the terminal
+
+```bash
+ishika@LAPTOP-0EUE93V4:~$ ls -laht --color=auto
+total 56K
+drwxr-x--- 8 ishika ishika 4.0K Jun 1 19:36 .
+drwxr-xr-x 2 ishika ishika 4.0K Jun 1 19:36 three
+drwxr-xr-x 2 ishika ishika 4.0K Jun 1 19:36 two
+drwxr-xr-x 2 ishika ishika 4.0K Jun 1 19:36 one
+-rw------- 1 ishika ishika 1.8K Jun 1 14:25 .bash_history
+drwx------ 2 ishika ishika 4.0K Jun 1 14:12 .ssh
+-rw-r--r-- 1 ishika ishika 46 Jun 1 13:26 last-modified.txt
+-rw------- 1 ishika ishika 20 Jun 1 13:06 .lesshst
+-rw-r--r-- 1 ishika ishika 0 May 31 14:50 .hushlogin
+-rw-r--r-- 1 ishika ishika 0 May 31 14:49 .motd_shown
+drwx------ 2 ishika ishika 4.0K May 31 14:49 .cache
+drwxr-xr-x 2 ishika ishika 4.0K May 31 14:49 .landscape
+-rw-r--r-- 1 ishika ishika 220 May 31 14:48 .bash_logout
+-rw-r--r-- 1 ishika ishika 3.7K May 31 14:48 .bashrc
+-rw-r--r-- 1 ishika ishika 807 May 31 14:48 .profile
+drwxr-xr-x 3 root root 4.0K May 31 14:48 ..
+```
+
+2) Bash script written in marco.sh:
+ ```bash
+ #!/bin/sh
+
+ marco(){
+     touch /tmp/save_directory
+     # save the current directory (the redirection would create the file anyway)
+     pwd > /tmp/save_directory
+ }
+
+ polo(){
+     # jump back to whatever directory marco last saved
+     cd "$(cat /tmp/save_directory)"
+ }
+ ```
+
+Example of execution:
+```bash
+ishika@LAPTOP-0EUE93V4:~$ mkdir one two three
+ishika@LAPTOP-0EUE93V4:~$ mkdir one/here
+ishika@LAPTOP-0EUE93V4:~$ source '/mnt/c/Users/Lenovo/OneDrive/Desktop/dsg assignment 1/marco.sh'
+ishika@LAPTOP-0EUE93V4:~$ chmod +x '/mnt/c/Users/Lenovo/OneDrive/Desktop/dsg assignment 1/marco.sh'
+ishika@LAPTOP-0EUE93V4:~$ cd one/here
+ishika@LAPTOP-0EUE93V4:~/one/here$ marco
+ishika@LAPTOP-0EUE93V4:~/one/here$ cd ~
+ishika@LAPTOP-0EUE93V4:~$ cd three
+ishika@LAPTOP-0EUE93V4:~/three$ polo
+ishika@LAPTOP-0EUE93V4:~/one/here$
+```
+
+Here I created a file save_directory in the /tmp directory. I gave it an absolute path so the file can be accessed outside the function definition as well. This file saves the path of the directory we are currently in, which is later used to cd back into that directory upon calling polo.
+
+3) The script to check the given script can be written as:
+ ```bash
+ #!/usr/bin/env bash
+
+ count=0
+
+ while true; do
+     ./script_q3.sh >> stdout_file 2>> stderr_file
+     if [[ $? -ne 0 ]]; then
+         break
+     fi
+     ((count++))
+ done
+
+ echo "The standard output is:"
+ cat stdout_file
+
+ echo "The standard error is:"
+ cat stderr_file
+
+ echo "The error count is $count"
+ ```
+
+4)
+```bash
+$ find . -type f -name "*.html" | xargs -d '\n' zip file.zip
+```
+
+This command looks for all the files with the html extension and pipes them into the xargs command, which zips all these files into file.zip. The -d '\n' option makes xargs split its input on newlines only, so filenames containing spaces are handled correctly.
+
+5) Looking for the most recently modified file in a directory can be done simply as:
+```bash
+$ ls -lt | head -n 2 | tail -n 1
+```
+As for searching the entire filesystem for the most recently modified file, I tried something along the logic of:
+```bash
+$ find / -type f 2>/dev/null | xargs stat -c '%Y %n' | sort -n | tail -n 1
+# (probably fixable by letting find print the timestamps itself, which avoids
+# xargs mangling filenames that contain spaces:)
+$ find / -type f -printf '%T@ %p\n' 2>/dev/null | sort -n | tail -n 1
+```
+which seems logically intact but does not work as we want it to. I could not figure it out before the deadline.
+
+
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/marco.sh b/Ishika/assignment_1/Shell/missing-semester-lec-2/marco.sh
new file mode 100644
index 0000000..a89a544
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/marco.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+marco(){
+    touch /tmp/save_directory
+    pwd > /tmp/save_directory
+}
+
+polo(){
+    cd "$(cat /tmp/save_directory)"
+}
\ No newline at end of file
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/script_checker_q3.sh b/Ishika/assignment_1/Shell/missing-semester-lec-2/script_checker_q3.sh
new file mode 100644
index 0000000..2bf6653
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/script_checker_q3.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+
+count=0
+
+while true; do
+    ./script_q3.sh >> stdout_file 2> stderr_file
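+    # $? expands to the exit status of the last command, so a non-zero
+    # value here means script_q3.sh just hit its failure case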
+    if [[ $? -ne 0 ]]; then
+        break
+    fi
+    ((count++))
+done
+
+echo "The standard output is:"
+cat stdout_file
+
+echo "The standard error is:"
+cat stderr_file
+
+echo "The error count is $count"
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/script_q3.sh b/Ishika/assignment_1/Shell/missing-semester-lec-2/script_q3.sh
new file mode 100644
index 0000000..c8335cb
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/script_q3.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+n=$(( RANDOM % 100 ))
+
+# compare the variable's value explicitly: [[ n -eq 42 ]] happens to work because
+# [[ ]] arithmetic-evaluates bare names, but $n is the clearer form
+if [[ $n -eq 42 ]]; then
+    echo "Something went wrong"
+    >&2 echo "The error was using magic numbers"
+    exit 1
+fi
+
+echo "Everything went according to plan"
\ No newline at end of file
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/stderr_file b/Ishika/assignment_1/Shell/missing-semester-lec-2/stderr_file
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/stderr_file
@@ -0,0 +1 @@
+
diff --git a/Ishika/assignment_1/Shell/missing-semester-lec-2/stdout_file b/Ishika/assignment_1/Shell/missing-semester-lec-2/stdout_file
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/Ishika/assignment_1/Shell/missing-semester-lec-2/stdout_file
@@ -0,0 +1 @@
+
diff --git a/Ishika/probabilty-and-statistics/assignment_1.ipynb b/Ishika/probabilty-and-statistics/assignment_1.ipynb
new file mode 100644
index 0000000..cb5c8d0
--- /dev/null
+++ b/Ishika/probabilty-and-statistics/assignment_1.ipynb
@@ -0,0 +1,248 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b2680569",
+   "metadata": {},
+   "source": [
+    "## Probability and Statistics : Assignment - 1\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e47aaad1",
+   "metadata": {},
+   "source": [
+    "### Problem 1\n",
+    "It is given that $Y = |Z|$, where $Z \\sim \\mathcal{N}(0, 1)$, i.e. $\\mu = 0 ~\\text{and}~ \\sigma^2 = 1$.\n",
\n", + "\n", + "$$f_z(z)=\\frac{1}{\\sqrt{2\\pi}}e^{-\\frac{z^2}{2}}~ \\text{for}~ z \\in (-\\infty, \\infty) $$\n", + "\n", + "\\begin{align*}\n", + "F_Y(y) &= P(Y \\le y) \\\\\n", + " &= P(|Z|\\le y)\\\\\n", + " &= P(-y \\le |Z| \\le y)\\\\\n", + " &= F_Z(y)-F_Z(-y)\n", + "\\end{align*}\n", + "\n", + "Hence, \n", + "\\begin{align*}\n", + "\\frac {dF_Y(y)}{dy} &= \\frac {d(F_Z(y)- F_Z(-y))}{dy} \\\\\n", + " &= f_z(y) - f_z(-y)\\\\\n", + " &= 2*\\frac{1}{\\sqrt{2\\pi}}e^{-\\frac{y^2}{2}}\\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}}e^{-\\frac{y^2}{2}}, \\text{for}~ y \\ge 0; ~ 0 ~ \\text{otherwise} \n", + "\\end{align*} \n", + "\n", + "Now, \n", + "$$\n", + "\\begin{align*}\n", + "\\mathbb{E}[Y]&= \\int_{-\\infty}^{+\\infty}yf_Y(y)dy\\\\\n", + " &= \\int_{0}^{+\\infty}y\\sqrt{\\frac{2}{\\pi}}e^{-\\frac{y^2}{2}}dy\n", + " \n", + "\\end{align*}\n", + "$$\n", + "\n", + "Let $\\frac{y^2}{2}=t \\Rightarrow ydy = dt$\n", + "\n", + "\\begin{align*}\n", + " \\mathbb{E}[Y] &=~ \\sqrt{\\frac{2}{\\pi}} \\int_{0}^{+\\infty}e^{-t}dt\\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}}(-e^{-t})|_0^\\infty\\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}}\n", + "\\end{align*}\n", + "\n", + "We know, $\\mathrm{Var}(Y) =\\mathbb{E}[Y^2]-(\\mathbb{E}[Y])^2 $\n", + "\n", + "MGF of Y is given as:\n", + "\\begin{align*}\n", + " \\mathbb{E}(e^{ty}) &= \\int_0^\\infty e^{ty}f_Y(y)dy\\\\\n", + " &= \\sqrt \\frac{2}{\\pi}\\int_0^\\infty e^{ty}e^{\\frac{-y^2}{2}}dy\\\\\n", + " &= \\sqrt \\frac{2}{\\pi}\\int_0^\\infty e^{-\\frac{1}{2}(y^2 -2ty)} dy\\\\\n", + " &= \\sqrt \\frac{2}{\\pi} e^\\frac{t^2}{2} \\int_0^\\infty e^{\\frac{-1}{2}(y-t)^2}dy\\\\\n", + " &= \\sqrt \\frac{2}{\\pi} e^\\frac{t^2}{2} \\int_{-t}^\\infty e^{\\frac{-1}{2}k^2}dk ~ \\text{, taking $y -t=k \\Rightarrow dy=dk$}\n", + "\\end{align*}\n", + "\n", + "MGF is directly related to different moments of Y as:\n", + "$$\n", + "\\begin{align*}\n", + "\\left. \\frac{d^k M_Y(t)}{dt^k} \\right|_{t=0} &= \\mathbb{E}[Y^k] \\\\\n", + "\\mathbb{E}[Y^2] &= \\left. \\frac{d^2 M_Y(t)}{dt^2} \\right|_{t=0} \\\\\n", + " &= \\left. \\frac{d}{dt} \\left( \\frac{dM_Y(t)}{dt} \\right) \\right|_{t=0} \\\\\n", + "\\frac{dM_Y(t)}{dt} &= \\sqrt{\\frac{2}{\\pi}} \\left( 1 + t \\cdot \\int_{-t}^\\infty e^{-\\frac{1}{2}k^2} \\, dk \\right) \\\\\n", + "\\mathbb{E}[Y^2] &= \\left. \\frac{d}{dt} \\left( \\frac{dM_Y(t)}{dt} \\right) \\right|_{t=0} \\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}} \\int_{0}^\\infty e^{-\\frac{1}{2}k^2} \\, dk \\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}} \\int_{-\\infty}^\\infty \\frac{1}{2} e^{-\\frac{1}{2}k^2} \\, dk \\\\\n", + " &= \\sqrt{\\frac{2}{\\pi}} \\cdot \\frac{1}{2} \\cdot \\sqrt{2\\pi} \\\\\n", + " &= 1\n", + "\\end{align*}\n", + "\n", + "$$\n", + "\n", + "Similarly, \n", + "$$\n", + "\\begin{align*}\n", + "\\mathbb{E}[Y^3] &= \\left. \\frac{d^3 M_Y(t)}{dt^3} \\right|_{t=0} \\\\\n", + "\\mathbb{E}[Y^3] &= \\left. \\frac{d^2}{dt^2} \\left( \\frac{dM_Y(t)}{dt} \\right) \\right|_{t=0} \\\\\n", + "\\frac{d^2 M_Y(t)}{dt^2} &= \\sqrt{\\frac{2}{\\pi}} \\left( 1 + t^2 u + u \\right), \\quad \\text{where } u = e^{\\frac{t^2}{2}} \\int_{-t}^\\infty e^{-\\frac{1}{2}k^2} \\, dk \\\\\n", + "\n", + "\\text{Hence,} \\\\\n", + "\\mathbb{E}[Y^3] &= \\left. 
+    "Similarly, \n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "\\frac{d^2 M_Y(t)}{dt^2} &= \\sqrt{\\frac{2}{\\pi}} \\left( u + t^2 u + t \\right) \\\\\n",
+    "\\mathbb{E}[Y^3] &= \\left. \\frac{d^3 M_Y(t)}{dt^3} \\right|_{t=0} = \\left. \\sqrt{\\frac{2}{\\pi}} \\left( u' + 2tu + t^2 u' + 1 \\right) \\right|_{t=0} = \\sqrt{\\frac{2}{\\pi}}\\left(1 + u'(0)\\right) \\\\\n",
+    " &= 2 \\sqrt{\\frac{2}{\\pi}} \\qquad (\\text{using } u'(0) = 1)\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "(The third moment is not needed for the variance; it is computed only as a further illustration of the MGF method.)\n",
+    "\n",
+    "$\\mathrm{Var}(Y) = \\mathbb{E}[Y^2] - (\\mathbb{E}[Y])^2 = 1 - \\frac{2}{\\pi}$\n",
+    "\n",
+    "\n",
+    "### Problem 2\n",
+    "\n",
+    "$X \\sim \\text{Poisson}(\\lambda)$ and g is any function for which the expectation exists.\n",
+    "As per LOTUS,\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "\\mathbb{E}(Xg(X)) &= \\sum_{x=0}^{\\infty} xg(x)p_X(x) \\\\\n",
+    "\\text{where } p_X(x) &= \\frac{e^{-\\lambda} \\lambda^x}{x!}, \\quad x \\in \\{0,1,\\dots\\} \\\\\n",
+    "&= \\sum_{x=1}^{\\infty} g(x) \\frac{e^{-\\lambda} \\lambda^x x}{x!} \\\\\n",
+    "&= \\sum_{x=1}^{\\infty} g(x) \\frac{e^{-\\lambda} \\lambda^x}{(x-1)!} \\qquad \\left( \\because \\text{the } x=0 \\text{ term contributes } 0 \\right) \\\\\n",
+    "\\text{Let } x - 1 &= t \\\\\n",
+    "\\Rightarrow \\mathbb{E}(Xg(X)) &= \\sum_{t=0}^{\\infty} g(t+1) \\frac{e^{-\\lambda} \\lambda^{t+1}}{t!} \\\\\n",
+    "&= \\lambda \\sum_{t=0}^{\\infty} g(t+1) \\frac{e^{-\\lambda} \\lambda^t}{t!} \\\\\n",
+    "&= \\lambda \\mathbb{E}(g(X+1)) \\\\\n",
+    "\\\\\n",
+    "\\mathbb{E}(X^3) &= \\lambda \\mathbb{E}[(X+1)^2] \\\\\n",
+    "&= \\lambda \\mathbb{E}[X^2 + 2X + 1] \\\\\n",
+    "&= \\lambda \\left( \\mathbb{E}(X^2) + 2 \\mathbb{E}(X) + 1 \\right) \\\\\n",
+    "\\text{We know:} \\quad \\text{Var}(X) &= \\mathbb{E}(X^2) - [\\mathbb{E}(X)]^2 \\\\\n",
+    "\\Rightarrow \\mathbb{E}(X^2) &= \\lambda + \\lambda^2 \\\\\n",
+    "\\text{Taking } g(X) = X^2 &\\text{ is valid because for Poisson, all moments exist} \\\\\n",
+    "\\mathbb{E}(X^3) &= \\lambda \\left[ (\\lambda + \\lambda^2) + 2\\lambda + 1 \\right] \\\\\n",
+    "&= \\lambda^3 + 3\\lambda^2 + \\lambda\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "### Problem 3\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "T_1 &\\sim \\text{Expo}(\\lambda_1) \\quad \\text{for student 1} \\\\\n",
+    "T_2 &\\sim \\text{Expo}(\\lambda_2) \\quad \\text{for student 2} \\\\\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "where $T_1$ and $T_2$ are the random variables for the times at which the students start losing focus.\n",
+    "\n",
+    "For the exponential distribution, \n",
+    "\n",
+    "$$\n",
+    "f_X(x)=\\lambda e^{-\\lambda x} ~\\text{for} ~x > 0, ~\\lambda > 0~;~ 0~\\text{otherwise}\n",
+    "$$\n",
+    "Now, \n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "P(T_1 < T_2) &= \\int_0^\\infty \\int_{t_1}^\\infty f_{T_1}(t_1) f_{T_2}(t_2) \\, dt_2 \\, dt_1 \\quad (\\because T_1, T_2 ~\\text{are independent})\\\\\n",
+    "&= \\int_0^\\infty \\int_{0}^{t_2} \\lambda_1 e^{-\\lambda_1 t_1} \\cdot \\lambda_2 e^{-\\lambda_2 t_2} \\, dt_1 \\, dt_2 \\quad \\text{(swapping the order of integration)}\\\\\n",
+    "&= \\lambda_1 \\lambda_2 \\int_0^\\infty e^{-\\lambda_2 t_2} \\int_{0}^{t_2} e^{-\\lambda_1 t_1} dt_1 dt_2 \\\\\n",
+    "&= \\lambda_2 \\int_0^\\infty e^{-\\lambda_2t_2} [1-e^{-\\lambda_1t_2}] dt_2 \\\\\n",
+    "&= -\\lambda_2\\left[\\frac{1}{\\lambda_1+\\lambda_2}- \\frac{1}{\\lambda_2}\\right] \\\\\n",
+    "&= \\frac{\\lambda_1}{\\lambda_1 + \\lambda_2}\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "I was trying to do this via the hint provided but got stuck in between. I first defined another random variable $Y=\\min(T_1,T_2)$ and equated it to $T_1$ to find the probability for $T_1 < T_2$.\n",
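+    "\n",
+    "(One way to finish the hint's approach, sketched here: $P(\\min(T_1,T_2) > t) = e^{-(\\lambda_1+\\lambda_2)t}$, so $\\min(T_1,T_2) \\sim \\text{Expo}(\\lambda_1+\\lambda_2)$. Alternatively, conditioning on $T_1$ gives $P(T_1 < T_2) = \\int_0^\\infty \\lambda_1 e^{-\\lambda_1 t}\\, e^{-\\lambda_2 t}\\, dt = \\frac{\\lambda_1}{\\lambda_1+\\lambda_2}$ in one line.)\n",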
+    "\n",
+    "\n",
+    "### Problem 4\n",
+    "\n",
+    "Let X be the message sent by Devansh, Z the actual message received by Shree, and Y the noise that got added during transmission.\n",
+    "\n",
+    "$$\n",
+    "X \\sim \\text{Bernoulli}(p=1/2)~(\\text{assumed}), \\quad Y \\sim \\mathcal{N}(0, \\sigma^2), \\quad Z = X + Y\n",
+    "$$\n",
+    "\n",
+    "Shree reads the message as **yes** if the value received is greater than 1/2, and as **no** if it is less than 1/2.\n",
+    "\n",
+    "(a) The probability that Shree reads the message correctly is given by,\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "P(Z > \\tfrac{1}{2}, X = 1) + P(Z < \\tfrac{1}{2}, X = 0) \n",
+    "&= P(Z > \\tfrac{1}{2} \\mid X = 1) P(X = 1) + P(Z < \\tfrac{1}{2} \\mid X = 0) P(X = 0) \\\\\n",
+    "&= P(X + Y > \\tfrac{1}{2} \\mid X = 1) P(X = 1) + P(X + Y < \\tfrac{1}{2} \\mid X = 0)P(X = 0) \\\\\n",
+    "&= P(Y > -\\tfrac{1}{2})P(X = 1) + P(Y < \\tfrac{1}{2})P(X = 0)\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "Y follows a Normal distribution with mean $\\mu = 0$ and variance $\\sigma^2$.\n",
+    "Since Normal distributions are symmetric about the mean, which is 0 in this case:\n",
+    "$$\n",
+    "P(Y < 1/2)=P(Y> -1/2)\n",
+    "$$\n",
+    "\n",
+    "Now, \n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "f_Y(y) &= \\frac{1}{\\sqrt{2\\pi}\\sigma} e^{-\\frac{1}{2} \\frac{y^2}{\\sigma^2}} \\\\\n",
+    "\\therefore \\quad P(Y < 1/2) &= \\frac{1}{\\sqrt{2\\pi} \\sigma} \\int_{-\\infty}^{1/2} e^{-\\frac{1}{2} \\frac{y^2}{\\sigma^2}} \\, dy \\\\\n",
+    " &= \\frac{1}{\\sqrt{2\\pi}} \\int_{-\\infty}^{1/(2\\sigma)} e^{-\\frac{1}{2} t^2} \\, dt \\\\\n",
+    " &= \\Phi \\left( \\frac{1}{2\\sigma} \\right)\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "Hence $P(\\text{correct interpretation}) = 0.5\\,P(Y < 1/2) + 0.5\\,P(Y > -1/2) = \\Phi \\left( \\frac{1}{2\\sigma} \\right)$\n",
+    "\n",
+    "(b) When $\\sigma$ is very small, \n",
+    "\n",
+    "$$\n",
+    "\\lim_{\\sigma \\to 0} \\Phi\\left( \\frac{1}{2\\sigma} \\right) = \\Phi(\\infty) = 1\n",
+    "$$\n",
+    "\n",
+    "which is expected, as a very small $\\sigma$ implies almost negligible noise, since it is mostly clustered around the mean 0. Thus there is a very high probability that the message is correctly relayed.\n",
+    "\n",
+    "When $\\sigma$ is very large,\n",
+    "$$\n",
+    "\\lim_{\\sigma \\to \\infty} \\Phi\\left( \\frac{1}{2\\sigma} \\right) = \\Phi(0) = 1/2\n",
+    "$$\n",
+    "\n",
+    "which makes sense, as a large variance means that the noise is spread all over, corrupting the message. As a result, the message is correctly received only half of the time.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef07819c",
+   "metadata": {},
+   "source": []
+  }
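+  ,
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c0ffee01",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Monte Carlo sanity check of Problem 1's results (a quick sketch): for Y = |Z|,\n",
+    "# E[Y] should be sqrt(2/pi) ~ 0.7979 and Var(Y) should be 1 - 2/pi ~ 0.3634.\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(0)\n",
+    "y = np.abs(rng.standard_normal(1_000_000))\n",
+    "print(y.mean(), (2 / np.pi) ** 0.5)  # sample mean vs sqrt(2/pi)\n",
+    "print(y.var(), 1 - 2 / np.pi)        # sample variance vs 1 - 2/pi\n"
+   ]
+  }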
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/Ishika/probabilty-and-statistics/assignment_2.ipynb b/Ishika/probabilty-and-statistics/assignment_2.ipynb
new file mode 100644
index 0000000..c373fc1
--- /dev/null
+++ b/Ishika/probabilty-and-statistics/assignment_2.ipynb
@@ -0,0 +1,382 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e6aca925",
+   "metadata": {},
+   "source": [
+    "## Probability and Statistics: Assignment 2\n",
+    "\n",
+    "\n",
+    "### Problem 1 - Bayesian Interpretation and Probability\n",
+    "\n",
+    "Let the events be defined as follows:\n",
+    "\n",
+    "H: *Event that Alex is an expert, i.e. the hypothesis*\n",
+    "\n",
+    "E: *Event of hitting 3 bull's eyes out of 5, i.e. the evidence*\n",
+    "\n",
+    "**Prior Belief** : It is the probability of an event before we get any new evidence. It is largely based on general data, existing knowledge, or just our intuition. It is used as a starting point from which to factor in new evidence and update our belief.\n",
+    "\n",
+    "In our case, we are assuming there is only a 1% chance that Alex is as good as he claims. \n",
+    "Hence,\n",
+    "$$\n",
+    "P(\\text{Expert})=P(H)=0.01 ~\\text{or} ~\\frac{1}{100}\n",
+    "$$\n",
+    "\n",
+    "**Likelihood** : The number of bull's eyes he hits in a given number of trials follows a simple Binomial distribution: $X\\sim \\text{Binom}(5, 0.7)$ if he is an expert and is as good as he claims, and $X\\sim \\text{Binom}(5, 0.1)$ if he is not an expert and his accuracy matches the general population. If Alex is an expert, the likelihood of him hitting 3 bull's eyes out of 5 is\n",
+    "$$\n",
+    "P(\\text{3 bull's eyes out of 5} \\mid \\text{Expert})=P(E \\mid H)=\\binom{5}{3}(0.7)^3(0.3)^2 \\approx 0.3087 ~\\text{or}~ 30.87\\%\n",
+    "$$\n",
+    "\n",
+    "If he is not an expert, the same likelihood is given as\n",
+    "$$\n",
+    "P(\\text{3 bull's eyes out of 5} \\mid \\text{Not an Expert})=P(E \\mid H')=\\binom{5}{3}(0.1)^3(0.9)^2 \\approx 0.0081 ~\\text{or}~ 0.81\\%\n",
+    "$$\n",
+    "\n",
+    "**Bayesian Update** : Bayes' theorem is given by\n",
+    "$$\n",
+    "P(H \\mid E) = \\frac {P(E \\mid H)P(H)}{P(E \\mid H)P(H) + P(E \\mid H')P(H')} \n",
+    "$$ \n",
+    "In our case,\n",
+    "\n",
+    "$$\n",
+    "P(H \\mid E)= \\frac {\\binom{5}{3}(0.7)^3(0.3)^2\\cdot\\frac{1}{100}}{\\binom{5}{3}(0.7)^3(0.3)^2\\cdot\\frac{1}{100}+\\binom{5}{3}(0.1)^3(0.9)^2\\cdot\\frac{99}{100}} \\approx 0.278\n",
+    "$$\n",
+    "\n",
+    "**Interpretation**\n",
+    "\n",
+    "- (a) The posterior, that is, our updated belief given the new evidence, is approximately **0.278**. \n",
+    "\n",
+    "- (b) Initially, our belief was that Alex is not as good as he claims, and based on our intuition we decided that the probability of him being an expert is only 1%. The evidence presented to us, that Alex was able to hit 3 bull's eyes out of 5, increased the odds of him being an expert up to 27.8%.\n",
+    "\n",
+    "If we look at this intuitively, getting 3 hits out of 5 makes far more sense for an expert than otherwise. We can see this numerically as well: the probability of him doing this as an expert is 30.87%, compared to a mere 0.81% when we consider he is not.\n",
+    "\n",
+    "- (c) If we change our prior belief to 20%, then $P(H) = 0.2$\n",
+    "$$\n",
+    "\\therefore P(H \\mid E) = \\frac {\\binom{5}{3}(0.7)^3(0.3)^2\\cdot\\frac{1}{5}}{\\binom{5}{3}(0.7)^3(0.3)^2\\cdot\\frac{1}{5}+\\binom{5}{3}(0.1)^3(0.9)^2\\cdot\\frac{4}{5}} \\approx 0.905 ~\\text{or}~ 90.5\\%\n",
+    "$$\n",
+    "\n",
+    "This drastically increases the probability that Alex is an expert given the evidence. Although our initial assumption was only 1%, the evidence was strong enough in favour of Alex being an expert to raise the belief to 27.8%. When we instead start off with an already considerable prior, the same evidence raises the belief to 90.5%. With so few trials, the prior strongly influences the result, and the posterior is highly sensitive to any changes made in it.\n",
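+    "\n",
+    "(As a quick arithmetic check of the posterior: numerator $= 0.3087 \\times 0.01 \\approx 0.0031$; denominator $\\approx 0.0031 + 0.0081 \\times 0.99 \\approx 0.0111$; ratio $\\approx 0.278$.)\n",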
\n", + "\n", + "### Problem 2: Estimating an Exponential Rate from Truncated data\n", + "\n", + "**Part A**\n", + "\n", + "Since $T_i$ follows exponential distribution with rate $\\lambda$, its pdf is given as:\n", + "$$\n", + "f_X(x) = \n", + "\\begin{cases}\n", + "\\lambda e^{-\\lambda x}, & x \\geq 0 \\\\\n", + "0, & x < 0\n", + "\\end{cases}\n", + "$$\n", + "Since we are only considering $T_j$ for $T \\geq 10$, the pdf is given as:\n", + "$$\n", + "\\begin{align*}\n", + "f_{T | T \\geq 10}(t) &= \\frac{f_{T}(t)}{P(T \\geq 10)}~(\\text{upon normalizing pdf over the new domain, so that integration of the pdf over the domain remains 1}) \\\\\n", + " &= \\frac{\\lambda e^{-\\lambda t}}{\\int_{10}^\\infty \\lambda e^{-\\lambda t } dt} = \\lambda e^{-\\lambda (t-10)}, ~t \\geq 10\n", + " \n", + "\\end{align*}\n", + "$$\n", + "\n", + "From this, we get the likelihood function as\n", + "$$\n", + "L(\\lambda)= \\prod_{i=1}^n \\lambda e^{-\\lambda (t_i-10)} \n", + "$$\n", + "\n", + "The log-likelihood is given as:\n", + "\n", + "$$\n", + "\\begin{align*}\n", + "l(\\lambda) &= \\log(L(\\lambda)) \\\\\n", + " &= \\log(\\lambda^n \\prod_{i=1}^n e^{-\\lambda (t_i-10)}) \\\\\n", + " &= n\\log(\\lambda) -\\lambda \\sum_{i=1}^n (t_i-10)\n", + " \n", + "\\end{align*}\n", + "$$\n", + "\n", + "For MLE,\n", + "\n", + "$$\n", + "\\begin{align*}\n", + "\\frac{dl(\\lambda)}{d\\lambda} &= 0 \\Rightarrow~ \\frac{n}{\\lambda}-\\sum_{i=1}^n (t_i-10)=0 \\Rightarrow~ \\hat{\\lambda}=\\frac{n}{\\sum_{i=1}^n (t_i-10)}\n", + "\\end{align*}\n", + "$$\n", + "\n", + "- Not using truncation while finding estimate means we are taking into account the part as well which has no data points. As a result, the estimate for $\\lambda$ would have been lower than expected, thus giving a wrong account of the data we have.\n", + "\n", + "- For our MLE, we have taken the assumption that the waiting times which are greater than 10 minutes are always taken into consideration. In this case, missing data points for t>10 will lead to wrong results. This could probably be sorted by conditioning our pdf or rather modifying it to take into account the randomness of the device as well. 
+    "\n",
+    "- Not using the truncation while finding the estimate would mean taking into account the region below 10 minutes as well, which has no data points. As a result, the estimate for $\\lambda$ would come out lower than it should be, giving a wrong account of the data we have.\n",
+    "\n",
+    "- Our MLE assumes that waiting times greater than 10 minutes are always recorded. If the device randomly drops some data points with $t > 10$, this assumption fails and the estimate is biased. This could be sorted out by conditioning or modifying the pdf (or, equivalently, the likelihood function) to take the randomness of the device into account as well, whichever makes the process smoother and more efficient.\n",
+    "\n",
+    "**Part B**\n",
+    "\n",
+    "Since we have assumed a Gamma prior on $\\lambda$, $\\lambda \\sim \\text{Gamma}(\\alpha, \\beta)$:\n",
+    "\n",
+    "$$\n",
+    "p(\\lambda) = \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\lambda^{\\alpha - 1} e^{-\\beta \\lambda}, ~ \\lambda > 0\n",
+    "$$\n",
+    "\n",
+    "Thus the (unnormalized) posterior is:\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "p(\\lambda \\mid \\text{data}) &\\propto p(\\text{data} \\mid \\lambda)\\,p(\\lambda) \\\\\n",
+    " &= \\lambda^n \\prod_{i=1}^n e^{-\\lambda (t_i-10)}\\cdot\\lambda^{\\alpha - 1} e^{-\\beta \\lambda} \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)}\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "The log-posterior is (up to a constant $k$):\n",
+    "\n",
+    "$$\n",
+    "\\log(p(\\lambda \\mid \\text{data})) = n\\log(\\lambda) -\\lambda \\sum_{i=1}^n (t_i-10) + (\\alpha -1)\\log(\\lambda) -\\beta\\lambda + k\n",
+    "$$\n",
+    "\n",
+    "To maximise this, we equate its derivative to 0:\n",
+    "\n",
+    "$$\n",
+    "\\begin{align*}\n",
+    "\\frac{d\\log(p(\\lambda \\mid \\text{data}))}{d\\lambda} &= 0 \\Rightarrow \\frac{n}{\\lambda}-\\sum_{i=1}^n (t_i-10) + \\frac{(\\alpha -1)}{\\lambda} - \\beta = 0 \\Rightarrow \\hat{\\lambda}=\\frac{n+\\alpha-1}{\\sum_{i=1}^n (t_i-10) + \\beta}\n",
+    "\\end{align*}\n",
+    "$$\n",
+    "\n",
+    "- In the MAP result, the prior parameters $\\alpha$ and $\\beta$ enter the estimate too; they encode what we knew before we started observing, whereas the MLE result was based purely on the observations we made.\n",
+    " \n",
+    "MLE and MAP can differ significantly when the sample size is small, as we saw in the earlier question: with little data, the result is strongly prior-driven and, consequently, highly sensitive to the prior as well. \n",
+    "\n",
+    "**Interpretation**\n",
+    "- A $\\lambda$ calculated using the pure MLE is based only on the observed data, so a single outlier can give a drastically different result, which is often not an accurate representation of how things actually are. This can be understood through the difference between climate and weather. Suppose you visit a new city planning to buy a property there, but it is raining on that day, so you decide against it, assuming it's always rainy. This is similar to the MLE, where the decision is based only on immediate observations.\n",
+    "On the other hand, the climate gives the trend of the weather in that place over the years, thus helping your decision-making with more reliable data. This is similar to MAP, which takes prior knowledge into account as well and can therefore lead to more holistic results. \n",
+    "Whether to use MAP or MLE is mostly case-dependent, and both have their uses. \n",
+    "In our case, adding the prior gives a more holistic view of the bus arrivals, so the $\\lambda$ obtained this way reflects the overall trends as well.\n",
+    "\n",
+    "- Say that on a particular day a huge rally is passing through the main roads. This can suddenly increase the bus arrival times for that day due to traffic and delays, leading to a skewed estimate from the MLE. In this case, MAP gives a better estimate of $\\lambda$ than MLE, as we are taking into account what we already know about the usual arrival times, and a single outlier does not affect the estimate drastically.\n",
+    "\n",
+    "\n",
+    "\n",
+    "### Problem 4: Hypothesis Testing with Known Variance\n",
+    "\n",
+    "The filling amount is described by a Normal distribution with $\\mu = 500$ grams and $\\sigma = 10$ grams. To check whether our machine is correctly calibrated, we formulate our hypotheses:\n",
+    "\n",
+    "**Null Hypothesis, $H_0$** : $\\mu = 500$, i.e. the machine is correctly calibrated and $\\mu$ matches the intended value\n",
+    "\n",
+    "**Alternative Hypothesis, $H_1$** : $\\mu \\neq 500$\n",
+    "\n",
+    "Upon sampling $n=16$ bags, we get $\\bar{X} = 504$ grams.\n",
+    "\n",
+    "For known variance, the test statistic $Z$ for $\\mu$ is given as:\n",
+    "\n",
+    "$$\n",
+    "Z = \\frac{\\bar{X}-\\mu_0}{\\sigma / \\sqrt{n}} = \\frac{504-500}{10 / 4}= 1.6\n",
+    "$$\n",
+    "\n",
+    "For significance level $\\alpha = 0.05$, \n",
+    "\n",
+    "$$\n",
+    "z_{\\frac{\\alpha}{2}}=z_{0.025}=1.96\n",
+    "$$\n",
+    "\n",
+    "The acceptance region for such two-sided tests with known variance is given by $|Z| \\leq z_{\\frac{\\alpha}{2}}$.\n",
+    "\n",
+    "Since our test statistic safely satisfies this condition, we fail to reject the Null Hypothesis: the data given to us is not strong enough to reject it and conclude that the machine is miscalibrated. \n",
+    "\n",
+    "**Interpretation**\n",
+    "- Rejecting $H_0$ implies that the data we obtained from sampling was sufficient to conclude against it. In hypothesis testing, this means that our test statistic lies in the rejection region defined by $\\alpha$, the level of significance. $\\alpha$ represents the probability of rejecting the Null Hypothesis when it is actually true. If our test statistic lies beyond the acceptance region defined by this $\\alpha$, we have enough evidence to reject the Null Hypothesis, and the result supports the alternative. \n",
+    "\n",
+    "In case we fail to reject $H_0$, the data we collected from the samples was not enough to reject it. The point of hypothesis testing is to see whether the evidence is sufficient to cast doubt on the Null Hypothesis: in the first case we reject $H_0$ because the evidence strongly goes against it, while in this case we conclude that the Null Hypothesis cannot be rejected, as the evidence is not strong enough to deny it outright.\n",
+    "\n",
+    "- In a smaller sample, random factors in the data influence the result more strongly, leading to errors. A larger number of samples helps reduce the effect of these factors and provides a better basis to reject $H_0$ when it is actually false, thus improving the power of the test.\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}