This project aims to analyze Stack Overflow usage before and after the rise of Large Language Models (LLMs) to assess whether the increasing popularity of LLMs has impacted the usage of Stack Overflow.

colton456p/Stack-Overflow-Data-Analysis

Research Study: Stack Overflow Usage Pre vs Post LLM

  • Pre LLM time period: December 1st 2018 - September 30th 2020
  • Post LLM time period: December 1st 2022 - September 30th 2024
  • SQL queries
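The two study windows can be written down as inclusive date ranges. The following is a minimal sketch, not code from this repository: the names `PRE_LLM`, `POST_LLM`, `classify_period`, and `period_length_days` are hypothetical, and treating both end dates as inclusive is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

# Period boundaries copied from the list above.
# Assumption: both the start and end dates are inclusive.
PRE_LLM: Tuple[date, date] = (date(2018, 12, 1), date(2020, 9, 30))
POST_LLM: Tuple[date, date] = (date(2022, 12, 1), date(2024, 9, 30))


def classify_period(day: date) -> Optional[str]:
    """Return "pre", "post", or None if the day is outside both windows."""
    if PRE_LLM[0] <= day <= PRE_LLM[1]:
        return "pre"
    if POST_LLM[0] <= day <= POST_LLM[1]:
        return "post"
    return None


def period_length_days(period: Tuple[date, date]) -> int:
    """Number of calendar days in an inclusive (start, end) window."""
    return (period[1] - period[0]).days + 1
```

With inclusive end dates, both windows cover the same number of calendar days, so totals from the two periods can be compared directly.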

Data research plan:

Posts:

  • Average number of posts
  • Average number of posts per day
  • Average length of body of post
  • Average number of posts per hour of the day
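As an illustration of the per-day and per-hour averages, here is a minimal sketch that works on a plain list of post creation timestamps. The function names are hypothetical and the project's own code may compute these values differently (for example, directly in SQL). It assumes that only days with at least one post count toward the averages.

```python
from collections import Counter
from datetime import datetime
from typing import List


def average_posts_per_day(timestamps: List[datetime]) -> float:
    """Mean number of posts per calendar day that has at least one post."""
    per_day = Counter(ts.date() for ts in timestamps)
    if not per_day:
        return 0.0
    return sum(per_day.values()) / len(per_day)


def average_posts_per_hour_of_day(timestamps: List[datetime]) -> List[float]:
    """For each hour 0-23, the mean number of posts in that hour per active day."""
    days = {ts.date() for ts in timestamps}
    if not days:
        return [0.0] * 24
    per_hour = Counter(ts.hour for ts in timestamps)
    return [per_hour[hour] / len(days) for hour in range(24)]
```

If days without any posts should also count, divide by the length of the study period instead of the number of active days.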

Questions:

  • Number of questions posted
  • Average score per question
  • Average view count per question

Answers:

  • Average number of replies per post
  • Number of post replies
  • Average number of replies per day
  • Average length of answer
  • Average number of votes per top answer

Users:

  • Average number of accounts created
  • Number of active users
  • Average number of daily users

Quality of responses:

  • Record the preferred response
  • Record the number of verbs vs. adjectives used per response.

Popularity of topics discussed:

  • Determine which topics/coding languages are most searched/tagged in posts.
  • Count the number of tags associated with each topic.
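Tag popularity can be sketched with a simple counter. This example assumes each post's tags have already been parsed into a list of strings; the function name `count_tags` is hypothetical, and parsing the raw tag field is left out because its format depends on the data source.

```python
from collections import Counter
from typing import Iterable, List


def count_tags(posts_tags: Iterable[List[str]]) -> Counter:
    """Count how many posts carry each tag (a tag counts once per post)."""
    counts: Counter = Counter()
    for tags in posts_tags:
        # set() guards against a tag being listed twice on the same post.
        counts.update(set(tags))
    return counts
```

Calling `count_tags(...).most_common(10)` would then give the ten most frequently used tags for a period.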

Important study information to consider:

  • A Post is considered to be anything "posted" to Stack Overflow. This includes questions, answers, comments, and forum discussions.
  • A Question is considered to be a Post that has no parentId. It therefore marks the beginning of a discussion and is not a reply to another Post.
  • An Answer is considered to be a Post that has a valid parentId. Therefore, when a user engages in a discussion, they are considered to be replying to a Post of type Question.
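The Question/Answer definitions above can be expressed as a small sketch. The `Post` record and its `id` field are hypothetical stand-ins for a row of post data, and "valid parentId" is interpreted here simply as "not missing", which is an assumption.

```python
from typing import List, Optional, TypedDict


class Post(TypedDict):
    # Hypothetical minimal record; real rows carry many more columns.
    id: int
    parentId: Optional[int]


def is_question(post: Post) -> bool:
    """A Question is a Post with no parentId."""
    return post["parentId"] is None


def is_answer(post: Post) -> bool:
    """An Answer is a Post with a parentId (assumed valid when present)."""
    return post["parentId"] is not None


def average_answers_per_question(posts: List[Post]) -> float:
    """Total number of Answers divided by total number of Questions."""
    questions = sum(1 for p in posts if is_question(p))
    answers = sum(1 for p in posts if is_answer(p))
    return answers / questions if questions else 0.0
```

A stricter check could also require that `parentId` matches the `id` of a known Question before counting a Post as an Answer.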

Installation guide:

  1. Install Python 3.9
  2. Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate  (on Windows: venv\Scripts\activate)
  3. Install requirements
    pip install -r requirements.txt
  4. Run the Pre vs Post LLM graph generation separately:
    • Pre LLM:
      python3 -m src.pre_LLM
    • Post LLM:
      python3 -m src.post_LLM
    • Comparison Graphs:
      python3 -m src.comparison_LLM
  5. (Optional) Run all graph generation at once:
    python3 -m src.run_all
  6. (Optional) Run PDF generation:
    • Pre-LLM:
      python3 -m src.pdf_generation.pre_llm_pdf_generation
    • Post-LLM:
      python3 -m src.pdf_generation.post_llm_pdf_generation

Linter

  • To run the project linter:
    python3 -m black .

Code style: black
