Rework project structure, update arguments, update math.
Dmitry committed Nov 17, 2022
1 parent a60e775 commit ca478cd
Showing 15 changed files with 864 additions and 82 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
/tmp/
/test/.pytest_cache/
*.json
8 changes: 5 additions & 3 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
FROM python:3.8.6
FROM python:3.9.15

ADD src /src
ADD test /test
ADD calculate_ephemerality.py /
ADD ephemerality.py /
ADD requirements.txt /
ADD _version.py /
ADD setup.py /

RUN pip install -r requirements.txt

ENTRYPOINT ["python", "calculate_ephemerality.py"]
ENTRYPOINT ["python", "ephemerality.py"]
105 changes: 92 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,50 @@
# Ephemerality metric
In [[1]](#1) we formalized the ephemerality metrics used to estimate the healthiness of online discussions. It shows how 'ephemeral' topics are, that is whether the discussions are more or less uniformly active or only revolve around one or several peaks of activity.
In [[1]](#1) we formalized the ephemerality metrics used to estimate the healthiness of online discussions. They show how
'ephemeral' topics are, that is, whether the discussions are more or less uniformly active or only revolve around one or
several peaks of activity.

### Requirements
The code was tested to work with Python 3.8.6 and Numpy 1.20.3, but is expected to also run on their previous versions.
The code was tested with Python 3.8.6 and NumPy 1.21.5, but is expected to also run with older versions of both.

## How to run the experiments
The code can be run directly via the calculate_ephemerality.py script or via a Docker container built with the provided Dockerfile.
The code can be run directly via the `ephemerality.py` script or via a Docker container built with the provided
Dockerfile.

### Input
The script and the container accept the following input arguments:

* **Frequency vector file**. The file should contain a vector of numbers in csv format. It does not need to be normalized, if it is not --- it will be done automatically.
* **Output file**. Optional. If it is provided, the results will be written into this file in JSON format.
* **Frequency vector file**. `[-i PATH, --input PATH]` _Optional_. Path to a file containing one or several arrays of
numbers in CSV format (one array per line), representing temporal frequency vectors. They do not need to be normalized;
unnormalized vectors are normalized automatically.
* **Frequency vector**. _Optional_. If no input file is provided, a frequency vector is expected as positional
argument(s): either separate values or a single comma- or space-separated string.
* **Output file**. `[-o PATH, --output PATH]` _Optional_. If provided, the results are written into this file
in JSON format.
* **Threshold**. `[-t FLOAT, --threshold FLOAT]` _Optional_. Threshold value for ephemerality computations. Defaults
to 0.8.
* **Print**. `[-p, --print]` _Optional_. If an output file is provided, forces the results to also be printed to stdout.
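When the vector is passed as a single string, the separators may be commas or spaces. The following is a minimal sketch of a tolerant parser for that case, using a hypothetical helper `parse_frequency_vector` (not part of this repository) that treats commas and any run of whitespace alike:

```python
from typing import List


def parse_frequency_vector(text: str) -> List[float]:
    """Hypothetical helper: parse '0.1,0.2', '0.1 0.2' or '0.1, 0.2' into floats."""
    # Map commas to spaces, then split on any run of whitespace,
    # so repeated or mixed separators never produce empty tokens.
    tokens = text.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty frequency vector")
    return [float(token) for token in tokens]


print(parse_frequency_vector("0.0 0.2  0.55,0.25"))  # [0.0, 0.2, 0.55, 0.25]
```

Mapping commas to spaces first means an input such as `0.1, 0.2` (both separators at once) parses too, which a plain `split(' ')` or `split(',')` would reject.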

### Output
The results are printed to STDOUT in **(ε<sub>2</sub>, ε<sub>4</sub>)** format. Additionally, if the output file was specified among the input arguments, the results will also be written into this file in JSON format.
If no output file is specified, or the `-p` option is used, the results are printed to STDOUT as space-separated values
in the order **ε<sub>orig</sub> ε<sub>orig_span</sub> ε<sub>filtered</sub> ε<sub>filtered_span</sub> ε<sub>sorted</sub>
ε<sub>sorted_span</sub>**, one line for each line of the input file (or a single line for command-line input).
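Each STDOUT line can be mapped back to the key names used in the JSON output. A sketch with a hypothetical helper `parse_stdout_line`, assuming exactly six space-separated values per line in the order given above:

```python
from typing import Dict, Union

# Field order of one STDOUT line, matching the JSON keys listed in this README.
FIELDS = (
    "ephemerality_original", "ephemerality_original_span",
    "ephemerality_filtered", "ephemerality_filtered_span",
    "ephemerality_sorted", "ephemerality_sorted_span",
)


def parse_stdout_line(line: str) -> Dict[str, Union[float, int]]:
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    # Span fields are integers; the ephemerality values themselves are floats.
    return {
        name: (int(value) if name.endswith("_span") else float(value))
        for name, value in zip(FIELDS, values)
    }


print(parse_stdout_line("0.0 5 0.8 1 0.8 1"))
```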

If the output file was specified among the input arguments, the results will be written into that file in JSON format as
a list of dictionaries, one per input line:

```
[
  {
    "ephemerality_original": FLOAT,
    "ephemerality_original_span": INT,
    "ephemerality_filtered": FLOAT,
    "ephemerality_filtered_span": INT,
    "ephemerality_sorted": FLOAT,
    "ephemerality_sorted_span": INT
  },
  ...
]
```

### Example

@@ -25,33 +55,82 @@ Input file `test_input.csv`:

#### Python execution:

Input 1:

```
python calculate_ephemerality.py ./test_input.csv ./test_output.json
python ephemerality.py -i tmp/test_input.csv -t 0.8 --output tmp/test_output.json -p
```

Output:
Output 1:
```
0.2, 0.25
0.1250000000000001 7 0.5 4 0.625 3
0.2500000000000001 3 0.5 2 0.5 2
```

`test_output.json` content:
```
{"ephemerality2": 0.2, "ephemerality4": 0.25}
[
  {
    "ephemerality_original": 0.1250000000000001,
    "ephemerality_original_span": 7,
    "ephemerality_filtered": 0.5,
    "ephemerality_filtered_span": 4,
    "ephemerality_sorted": 0.625,
    "ephemerality_sorted_span": 3
  },
  {
    "ephemerality_original": 0.2500000000000001,
    "ephemerality_original_span": 3,
    "ephemerality_filtered": 0.5,
    "ephemerality_filtered_span": 2,
    "ephemerality_sorted": 0.5,
    "ephemerality_sorted_span": 2
  }
]
```

Input 2:

```
python ephemerality.py 0.0 0.0 0.0 0.2 0.55 0.0 0.15 0.1 0.0 0.0 -t 0.5
```

Output 2:
```
0.0 5 0.8 1 0.8 1
```
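`src/ephemerality_computation.py` is not shown in this commit view, so the following is only a sketch reverse-engineered from the example outputs, not the package's implementation. It assumes the three spans are (a) the shortest prefix, (b) the shortest contiguous window and (c) the fewest time steps taken in decreasing order of activity that together hold at least the `threshold` share of total activity, and that each ephemerality is ε = 1 − (span / N) / threshold, clipped at 0 as a further assumption. The name `sketch_ephemeralities` is hypothetical. Under these assumptions it reproduces Output 2 up to floating-point rounding:

```python
from typing import Dict, Sequence, Union


def _ephemerality(span: int, n: int, threshold: float) -> float:
    # Assumed formula: 1 - (span / N) / threshold, clipped at 0 (assumption).
    return max(0.0, 1.0 - (span / n) / threshold)


def _shortest_prefix(values: Sequence[float], target: float) -> int:
    # Length of the shortest prefix whose cumulative sum reaches the target.
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if running >= target:
            return i + 1
    return len(values)


def sketch_ephemeralities(frequency_vector: Sequence[float],
                          threshold: float = 0.8) -> Dict[str, Union[float, int]]:
    values = [float(x) for x in frequency_vector]
    n, total = len(values), sum(values)
    if n == 0 or total <= 0:
        raise ValueError("frequency vector must be non-empty with a positive total")
    # Comparing against threshold * total makes explicit normalization unnecessary.
    target = threshold * total

    original_span = _shortest_prefix(values, target)
    sorted_span = _shortest_prefix(sorted(values, reverse=True), target)

    # Shortest contiguous window reaching the target (sliding window,
    # valid because frequencies are non-negative).
    filtered_span, left, window = n, 0, 0.0
    for right, v in enumerate(values):
        window += v
        while left <= right and window >= target:
            filtered_span = min(filtered_span, right - left + 1)
            window -= values[left]
            left += 1

    return {
        "ephemerality_original": _ephemerality(original_span, n, threshold),
        "ephemerality_original_span": original_span,
        "ephemerality_filtered": _ephemerality(filtered_span, n, threshold),
        "ephemerality_filtered_span": filtered_span,
        "ephemerality_sorted": _ephemerality(sorted_span, n, threshold),
        "ephemerality_sorted_span": sorted_span,
    }


vector = [0.0, 0.0, 0.0, 0.2, 0.55, 0.0, 0.15, 0.1, 0.0, 0.0]
print(sketch_ephemeralities(vector, threshold=0.5))
```

Comparing cumulative sums against `threshold * total` is also why an unnormalized vector such as `0 0 0 2 5.5 0 1.5 1 0 0` yields the same spans.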

#### Docker execution
```
docker run -a STDOUT -v [PATH_TO_FOLDER]/tmp/:/tmp/ ephemerality:0.1 /tmp/test_input.csv /tmp/test_output.json
docker run -a STDOUT -v [PATH_TO_FOLDER]/tmp/:/tmp/ ephemerality:1.0.0 -i /tmp/test_input.csv -o /tmp/test_output.json -t 0.5 -p
```

Output:
```
0.2, 0.25
0.0 5 0.8 1 0.8 1
0.19999999999999996 2 0.6 1 0.6 1
```

`test_output.json` content:
```
{"ephemerality2": 0.2, "ephemerality4": 0.25}
[
  {
    "ephemerality_original": 0.0,
    "ephemerality_original_span": 5,
    "ephemerality_filtered": 0.8,
    "ephemerality_filtered_span": 1,
    "ephemerality_sorted": 0.8,
    "ephemerality_sorted_span": 1
  },
  {
    "ephemerality_original": 0.19999999999999996,
    "ephemerality_original_span": 2,
    "ephemerality_filtered": 0.6,
    "ephemerality_filtered_span": 1,
    "ephemerality_sorted": 0.6,
    "ephemerality_sorted_span": 1
  }
]
```


1 change: 1 addition & 0 deletions _version.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__version__ = "1.0.0"
21 changes: 0 additions & 21 deletions calculate_ephemerality.py

This file was deleted.

89 changes: 89 additions & 0 deletions ephemerality.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
from _version import __version__
import sys
import json
import argparse
from typing import List

import numpy as np
from src import compute_ephemeralities


HELP_INFO = ""


def init_argparse() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        usage="%(prog)s [FREQUENCY_VECTOR] [-h] [-v] [-p] [-i INPUT_FILE] [-o OUTPUT_FILE.json] [-t THRESHOLD]",
        description="Calculate ephemerality for a given vector of frequencies."
    )
    parser.add_argument(
        "-v", "--version", action="version",
        version=f"{parser.prog} version {__version__}"
    )
    parser.add_argument(
        "-p", "--print", action="store_true",
        help="If an output file is provided, forces the results to also be printed to stdout."
    )
    parser.add_argument(
        "-i", "--input", action="store",
        help="Path to the input csv file. If not specified, the command line arguments are used "
             "(delimited either by commas or spaces)."
    )
    parser.add_argument(
        "-o", "--output", action="store",
        help="Path to the output json file. If not specified, ephemerality values are printed to stdout, "
             "separated by spaces, in the following order: \"EPH_ORIG EPH_ORIG_SPAN EPH_FILT EPH_FILT_SPAN "
             "EPH_SORT EPH_SORT_SPAN\""
    )
    parser.add_argument(
        "-t", "--threshold", action="store", type=float, default=0.8,
        help="Threshold value for ephemerality computations. Defaults to 0.8."
    )
    parser.add_argument(
        'frequencies',
        help='frequency vector (if the input file is not specified)',
        nargs='*'
    )
    return parser


def print_ephemeralities(ephemerality_list: List[dict]):
    # typing.List keeps this annotation valid on Python 3.8 (built-in list[...] needs 3.9+).
    for ephemeralities in ephemerality_list:
        print(f"{ephemeralities['ephemerality_original']} {ephemeralities['ephemerality_original_span']} "
              f"{ephemeralities['ephemerality_filtered']} {ephemeralities['ephemerality_filtered_span']} "
              f"{ephemeralities['ephemerality_sorted']} {ephemeralities['ephemerality_sorted_span']}")


if __name__ == '__main__':
    parser = init_argparse()
    args = parser.parse_args()

    frequency_vectors = list()

    if args.input:
        # One frequency vector per non-empty line of the CSV file.
        with open(args.input, 'r') as f:
            for line in f:
                if line.strip():
                    frequency_vectors.append(np.array(line.strip().split(','), dtype=float))
    else:
        if len(args.frequencies) > 1:
            frequency_vectors.append(np.array(args.frequencies, dtype=float))
        elif len(args.frequencies) == 1:
            # A single argument holds the whole vector, either space- or comma-separated;
            # split() without arguments tolerates repeated spaces.
            if ' ' in args.frequencies[0]:
                frequency_vectors.append(np.array(args.frequencies[0].split(), dtype=float))
            else:
                frequency_vectors.append(np.array(args.frequencies[0].split(','), dtype=float))
        else:
            sys.exit('No input provided!')

    threshold = float(args.threshold)

    ephemerality_list = list()
    for frequency_vector in frequency_vectors:
        ephemerality_list.append(compute_ephemeralities(frequency_vector=frequency_vector, threshold=threshold))

    if args.output:
        with open(args.output, 'w') as f:
            json.dump(ephemerality_list, f, indent=2)
        if args.print:
            print_ephemeralities(ephemerality_list)
    else:
        print_ephemeralities(ephemerality_list)
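Because `frequencies` is declared with `nargs='*'`, argparse collects the positional values and the options independently, so both README command lines parse as intended. The sketch below checks this with a trimmed, hypothetical copy (`build_parser`) of the options defined by `init_argparse()`; `type=float` is used for `-t` here so the parsed threshold is already a number:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Trimmed, hypothetical copy of the options defined by init_argparse().
    parser = argparse.ArgumentParser(prog="ephemerality.py")
    parser.add_argument("-p", "--print", action="store_true")
    parser.add_argument("-i", "--input")
    parser.add_argument("-o", "--output")
    parser.add_argument("-t", "--threshold", type=float, default=0.8)
    parser.add_argument("frequencies", nargs="*")
    return parser


parser = build_parser()

# README "Input 2": positional values followed by an option.
args = parser.parse_args("0.0 0.0 0.0 0.2 0.55 0.0 0.15 0.1 0.0 0.0 -t 0.5".split())
print(args.frequencies, args.threshold)

# README "Input 1": file-based input, so no positional values are collected.
args = parser.parse_args("-i tmp/test_input.csv -t 0.8 --output tmp/test_output.json -p".split())
print(args.input, args.output, args.print, args.frequencies)
```

With no positional values, `frequencies` is an empty list, which is what the script's `len(args.frequencies)` checks rely on.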
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
numpy==1.20.3
numpy==1.21.5
30 changes: 30 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
import os
import re
from setuptools import setup

VERSION_FILE = "_version.py"
VERSION_REGEX = r"^__version__ = ['\"]([^'\"]*)['\"]"


def read(file_name):
    # Resolve paths relative to this file and close the handle after reading.
    with open(os.path.join(os.path.dirname(__file__), file_name)) as f:
        return f.read()


version_lines = read(VERSION_FILE)
match = re.search(VERSION_REGEX, version_lines, re.M)
if match:
    version = match.group(1)
else:
    raise RuntimeError("Unable to find version string in %s." % (VERSION_FILE,))

setup(
    name='ephemerality',
    version=version,
    packages=['src', 'test'],
    url='https://github.com/HPAI-BSC/ephemerality',
    license='MIT',
    author='HPAI BSC',
    author_email='[email protected]',
    description='Module for computing ephemerality metrics of temporal arrays.',
    long_description=read('README.md'),
    long_description_content_type='text/markdown'
)
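`setup.py` reads the version by matching `VERSION_REGEX` against `_version.py` rather than importing the module. A small check of that pattern against the line added in `_version.py` and a single-quoted variant; `extract_version` is a hypothetical helper, and `re.M` lets `^` match at the start of any line:

```python
import re

# Same pattern as VERSION_REGEX in setup.py.
VERSION_REGEX = r"^__version__ = ['\"]([^'\"]*)['\"]"


def extract_version(text: str) -> str:
    match = re.search(VERSION_REGEX, text, re.M)
    if match is None:
        raise RuntimeError("Unable to find version string.")
    return match.group(1)


print(extract_version('__version__ = "1.0.0"\n'))       # 1.0.0
print(extract_version("# header\n__version__ = '2.1'"))  # 2.1
```

The pattern requires exactly one space on each side of `=`, so a line written as `__version__="1.0.0"` would not match and `setup.py` would raise its `RuntimeError`.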
4 changes: 2 additions & 2 deletions src/__init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
from src.ephemerality import compute_ephemerality_measures
from src.ephemerality_computation import compute_ephemeralities

__all__ = ['compute_ephemerality_measures']
__all__ = ['compute_ephemeralities']
Binary file added src/__pycache__/__init__.cpython-310.pyc
Binary file not shown.
Binary file added src/__pycache__/__init__.cpython-38.pyc
Binary file not shown.
Binary file not shown.
36 changes: 0 additions & 36 deletions src/ephemerality.py

This file was deleted.
