36 changes: 33 additions & 3 deletions README.md
Expand Up @@ -5,6 +5,7 @@
&ensp;[<kbd> <br> Overview <br> </kbd>](#overview)&ensp;
&ensp;[<kbd> <br> Technologies <br> </kbd>](#technologies)&ensp;
&ensp;[<kbd> <br> Deploy <br> </kbd>](#deploy)&ensp;
&ensp;[<kbd> <br> EDA <br> </kbd>](#eda)&ensp;
&ensp;[<kbd> <br> Current state <br> </kbd>](#current-state)&ensp;
<br><br><br><br></div>

Expand All @@ -30,11 +31,12 @@
- [Development](#development)
- [Structure](#structure)
- [Adding new service](#adding-new-service)
- [Current state](#current-state)
- [EDA](#eda)
- [Distribution of political positions overall](#distribution-of-political-positions-overall)
- [Distribution of political positions in the USA](#distribution-of-political-positions-in-the-usa)
- [Distribution of political positions in the UK](#distribution-of-political-positions-in-the-uk)
- [Bonus: Distribution of political positions of sources which require VPN](#bonus-distribution-of-political-positions-of-sources-which-require-vpn)
- [Current state](#current-state)


## Overview
Expand Down Expand Up @@ -90,6 +92,8 @@ A NATS message queue which is used for S2S communication.
## Deploy

The initial version is available at https://data-wrangling-and-visualisation.github.io/DeBias/
The EDA is available at https://data-wrangling-and-visualisation.github.io/DeBias/
The draft JavaScript visualization is available at https://debias.inno.dartt0n.ru/

### Using external S3 provider

Expand Down Expand Up @@ -181,7 +185,7 @@ To add new service:



## Current state
## EDA

We have collected 38 news sources from the USA and the UK and determined their political positions.

Expand All @@ -201,4 +205,30 @@ It seems left parties are indeed more liberal.

We have parsed several news articles using Python and prepared a deployment describing general trends in these articles.

The deployment can be found on [Github Pages](https://data-wrangling-and-visualisation.github.io/DeBias/)


## Current state

We have added a draft of our frontend visualization. It can be viewed in the **experiments/frontend** directory, in the **index.html** file.

It is not yet connected to the backend, but the file represents our vision of the final visualization: a graph of connections between keywords, showing their corresponding themes and number of occurrences.

The file can be opened directly in a browser, or served by running the following command from the **frontend** directory:

```sh
python3 -m http.server
```

We are also incorporating NLP into the data analysis. We perform the following operations on the data extracted from the websites:

- Extract named entities from the text: this helps identify the most important keywords (people's names, countries, organizations). Performed with spaCy.
- Find themes in the data: identify the general theme of the text (politics, economics, etc.). Performed with transformers.

The collected keywords are then combined for future analysis.
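
The combination step can be pictured roughly as follows. This is a minimal sketch using only the standard library, not the code in **processor.py**: the `ProcessedArticle` container, its field names, and the `combine_keywords` helper are hypothetical, and the case normalisation is an assumption about how duplicates might be merged.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProcessedArticle:
    """Hypothetical container for the NLP output of a single article."""

    date: str  # ISO date of publication, e.g. "2025-04-01"
    theme: str  # theme label produced by the classifier, e.g. "politics"
    keywords: list[str] = field(default_factory=list)  # extracted named entities


def combine_keywords(articles):
    """Count keyword occurrences per (date, theme) pair across all articles."""
    combined = {}
    for article in articles:
        counter = combined.setdefault((article.date, article.theme), Counter())
        # Assumption: keywords are lower-cased and trimmed so that
        # "NATO" and "Nato " are counted as the same keyword.
        counter.update(k.strip().lower() for k in article.keywords if k.strip())
    return combined
```

Grouping by date and theme is one possible choice; it matches the date and category controls of the draft frontend, but the real pipeline may aggregate differently.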

An example of the NLP preprocessing can be found in the **debias/processor** directory, in the **processor.py** file.

The deployment can be found at: https://debias.inno.dartt0n.ru/

We have added the functionality to filter by date, category, and number of keyword occurrences. The number of shown nodes can also be limited.
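
The filtering logic can be sketched as below. The frontend itself is written in JavaScript; this sketch uses Python only to keep the examples in one language, and it is not a copy of the frontend script. The node field names (`date`, `category`, `count`) and the `filter_nodes` helper are hypothetical, and keeping the most frequent keywords when the node limit is hit is an assumption. The defaults of 1 and 150 mirror the initial values of the controls in **index.html**.

```python
def filter_nodes(nodes, date, categories, min_occurrences=1, max_nodes=150):
    """Return the nodes that pass the date, category and popularity filters.

    Each node is assumed to be a dict with "date", "category" and "count" keys.
    """
    selected = [
        node
        for node in nodes
        if node["date"] == date
        and node["category"] in categories
        and node["count"] >= min_occurrences
    ]
    # Assumption: when there are too many nodes, keep the most frequent ones.
    selected.sort(key=lambda node: node["count"], reverse=True)
    return selected[:max_nodes]
```

For example, calling `filter_nodes(nodes, "2025-04-01", {"politics"}, min_occurrences=2)` would keep only politics keywords from that date that occur at least twice.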
52 changes: 52 additions & 0 deletions experiments/frontend/index.html
@@ -0,0 +1,52 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>News Article Keyword Network by Date</title>
<!-- Link to the external CSS file -->
<link rel="stylesheet" href="style.css">
<!-- Include D3.js library -->
<script src="https://d3js.org/d3.v7.min.js"></script>
</head>
<body>
<!-- Control Panel -->
<div id="controls">
<h3>Date Selection</h3>
<label for="date-select">Select Date:</label>
<select id="date-select"></select>

<h3>Filter by Category</h3>
<div id="category-filters">
<!-- Category checkboxes will be added by JS -->
<button id="select-all-cats">Select All</button>
<button id="deselect-all-cats">Deselect All</button>
</div>

<h3>Filter by Popularity</h3>
<label for="popularity-slider">Min Keyword Occurrences:</label>
<input type="range" id="popularity-slider" min="1" max="10" value="1" step="1">
<span id="popularity-value">1</span>
<label for="popularity-max-nodes">Max Nodes to Display:</label>
<input type="number" id="popularity-max-nodes" min="10" value="150" step="10">

<button id="reset-filters" style="margin-top: 15px;">Reset Filters</button>
</div>

<!-- Graph Container -->
<div id="graph-container">
<div id="graph"></div> <!-- SVG will be appended here by JS -->
<div class="tooltip"></div> <!-- Tooltip div -->
<div class="edge-info"></div> <!-- Edge info div -->
<div class="legend"></div> <!-- Legend div -->
<div id="no-data-message">No data available for the selected date and filters.</div> <!-- Message div -->
</div>

<!-- Loading Indicator -->
<div class="loading">Loading data...</div>

<!-- Link to the external JavaScript file -->
<!-- Place it at the end of body so DOM is loaded before script runs -->
<script src="script.js"></script>
</body>
</html>