- Pre LLM time period: December 1st 2018 - September 30th 2020
- Post LLM time period: December 1st 2022 - September 30th 2024
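
As a rough illustration of how the two study windows above could be applied to a post's creation date, here is a minimal sketch. The names `PRE_LLM`, `POST_LLM`, and `period_for` are hypothetical and not taken from the project, and treating both end dates as inclusive is an assumption.

```python
from datetime import date
from typing import Optional

# Study windows copied from the list above.
# Assumption: both the start and end dates are inclusive.
PRE_LLM = (date(2018, 12, 1), date(2020, 9, 30))
POST_LLM = (date(2022, 12, 1), date(2024, 9, 30))


def period_for(day: date) -> Optional[str]:
    """Return "pre", "post", or None when the day is outside both windows."""
    if PRE_LLM[0] <= day <= PRE_LLM[1]:
        return "pre"
    if POST_LLM[0] <= day <= POST_LLM[1]:
        return "post"
    return None
```

Dates that fall in the gap between the two windows map to `None`, so they can be dropped before any comparison.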
- SQL queries
- Average number of posts
- Average number of posts per day
- Average length of body of post
- Average number of posts per hour of the day
- Number of questions posted
- Average score per question
- Average view count per question
- Average number of replies per post
- Number of post replies
- Average number of replies per day
- Average length of answer
- Average number of votes per top answer
- Average number of accounts created
- Number of active users
- Average number of daily users
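
One of the per-day metrics listed above (average number of posts per day) might be computed along the following lines. This is only a sketch: the table name `posts`, the column `creation_date`, and the function `average_posts_per_day` are assumptions made for illustration, and an in-memory SQLite database stands in for whatever data source the project actually queries.

```python
import sqlite3


def average_posts_per_day(conn, start, end):
    """Average posts per calendar day, over days that have at least one post.

    Assumes a hypothetical `posts` table with an ISO-8601 `creation_date`
    text column. `start` and `end` are inclusive 'YYYY-MM-DD' strings.
    """
    row = conn.execute(
        """
        SELECT AVG(daily_count) FROM (
            SELECT COUNT(*) AS daily_count
            FROM posts
            WHERE DATE(creation_date) BETWEEN ? AND ?
            GROUP BY DATE(creation_date)
        )
        """,
        (start, end),
    ).fetchone()
    return row[0]  # None when no post falls inside the window


# Tiny made-up data set, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, creation_date TEXT)")
conn.executemany(
    "INSERT INTO posts (creation_date) VALUES (?)",
    [
        ("2019-01-01 09:00:00",),
        ("2019-01-01 17:30:00",),
        ("2019-01-02 08:15:00",),
        ("2021-05-05 12:00:00",),  # outside the Pre LLM window
    ],
)
avg = average_posts_per_day(conn, "2018-12-01", "2020-09-30")
```

Note that days without any posts do not appear in the grouped result, so they are not counted in the average. Dividing the total post count by the number of days in the window would be the alternative if empty days should count as zero.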
- Record the preferred response
- Record the number of verbs vs. adjectives used per response.
- Determine which topics/coding languages are most searched/tagged in posts.
- Count the number of tags associated with each topic.
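
The tag counting described above could be sketched as follows. The function `count_tags` is hypothetical, and the input shape (one list of tag strings per post) is an assumption. How tags are actually stored and parsed in the project is not shown here.

```python
from collections import Counter
from typing import Iterable


def count_tags(posts_tags: Iterable[Iterable[str]]) -> Counter:
    """Count how many posts carry each tag (case-insensitive)."""
    counts: Counter = Counter()
    for tags in posts_tags:
        # Use a set so a tag repeated within one post is counted once.
        counts.update({tag.lower() for tag in tags})
    return counts


# Made-up example input: one list of tags per post.
tag_counts = count_tags([["python", "pandas"], ["Python"], ["java"]])
```

`tag_counts.most_common(n)` then gives the `n` most frequently used tags.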
- A Post is considered to be anything "posted" to Stack Overflow. This includes questions, answers, comments, and forum discussions.
- A Question is considered to be a Post that has no parentId. It is therefore a question that marks the beginning of a discussion; it is not a reply to another Post.
- An Answer is considered to be a Post that has a valid parentId. When a user engages in a discussion, they are therefore considered to be replying to a Post of type Question.
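
The definitions above can be expressed as a small helper. This is a sketch under assumptions: `classify_post` is a hypothetical name, a post is represented as a mapping with a `parentId` key, and any non-null `parentId` is treated as "valid" without checking that the parent exists.

```python
from typing import Any, Mapping


def classify_post(post: Mapping[str, Any]) -> str:
    """Return "question" when the post has no parentId, otherwise "answer"."""
    # Assumption: a missing key and an explicit None both mean "no parent".
    return "question" if post.get("parentId") is None else "answer"
```

For example, `classify_post({"id": 1, "parentId": None})` yields `"question"`, while `classify_post({"id": 2, "parentId": 1})` yields `"answer"`.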
- Install Python 3.9
- Create virtual environment:
python3 -m venv venv
- Install requirements
pip install -r requirements.txt
- Run the Pre vs Post LLM graph generation separately
- Pre LLM:
python3 -m src.pre_LLM
- Post LLM:
python3 -m src.post_LLM
- Comparison Graphs:
python3 -m src.comparison_LLM
- (optional) Run all graph generation simultaneously.
python3 -m src.run_all
- (optional) Run PDF generation.
- Pre-LLM:
python3 -m src.pdf_generation.pre_llm_pdf_generation
- Post-LLM:
python3 -m src.pdf_generation.post_llm_pdf_generation
- To run project lint:
python3 -m black .