Sunset pandas in favor of polars (#128)
* Strip aggregations from dataframe creation

* Migrate visualizations

* Update dependencies

* Toggle between pandas/polars return type

* Auto convert pandas to polars if pandas provided
joweich authored Dec 23, 2023
1 parent 0d5379f commit a559502
Showing 9 changed files with 113 additions and 258 deletions.
11 changes: 2 additions & 9 deletions README.md
@@ -14,16 +14,9 @@
 [![codecov](https://codecov.io/gh/joweich/chat-miner/branch/main/graph/badge.svg?token=6EQF0YNGLK)](https://codecov.io/gh/joweich/chat-miner)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
-🌐
-**English**
-[Русский][RU]
-
-[EN]:README.md
-[RU]:README.ru.md
-
 -----------------
 
-**chat-miner** provides lean parsers for every major platform transforming chats into pandas dataframes. Artistic visualizations allow you to explore your data and create artwork from your chats.
+**chat-miner** provides lean parsers for every major platform transforming chats into dataframes. Artistic visualizations allow you to explore your data and create artwork from your chats.
 
 
 ## 1. Installation
@@ -50,7 +43,7 @@ from chatminer.chatparsers import WhatsAppParser
 
 parser = WhatsAppParser(FILEPATH)
 parser.parse_file()
-df = parser.parsed_messages.get_df()
+df = parser.parsed_messages.get_df(as_pandas=True) # as_pandas=False returns polars dataframe
 ```
 **Note:**
 Depending on your source system, Python requires to convert the filepath to a raw string.
157 changes: 0 additions & 157 deletions README.ru.md

This file was deleted.

12 changes: 4 additions & 8 deletions chatminer/chatparsers.py
@@ -8,7 +8,7 @@
 from pathlib import Path
 from typing import Any, Dict, List, Optional
 
-import pandas as pd
+import polars as pl
 from dateutil import parser as datetimeparser
 from tqdm import tqdm
 from tqdm.contrib.logging import logging_redirect_tqdm
@@ -34,14 +34,10 @@ def __init__(self):
     def append(self, mess: ParsedMessage):
         self._parsed_messages.append(mess)
 
-    def get_df(self):
+    def get_df(self, as_pandas=False):
         messages_as_dict = [asdict(mess) for mess in self._parsed_messages]
-        df = pd.DataFrame(messages_as_dict)
-        df["weekday"] = df["timestamp"].dt.day_name()
-        df["hour"] = df["timestamp"].dt.hour
-        df["words"] = df["message"].apply(lambda s: len(s.split(" ")))
-        df["letters"] = df["message"].apply(len)
-        return df
+        df = pl.DataFrame(messages_as_dict)
+        return df.to_pandas() if as_pandas else df
 
     def write_to_json(self, file: str):
         def serialize_message(mess: ParsedMessage):
2 changes: 1 addition & 1 deletion chatminer/cli.py
@@ -15,7 +15,7 @@
 def get_args():
     try:
         cliparser = argparse.ArgumentParser(
-            description="chat-miner provides lean parsers for every major platform transforming chats into pandas dataframes.\
+            description="chat-miner provides lean parsers for every major platform transforming chats into dataframes.\
             Artistic visualizations allow you to explore your data and create artwork from your chats."
         )
 
15 changes: 8 additions & 7 deletions chatminer/nlp.py
@@ -1,19 +1,19 @@
 from typing import Optional
 
-import pandas as pd
+import polars as pl
 from transformers import pipeline
 
 
-def add_sentiment(df: pd.DataFrame, lang: str = "en") -> pd.DataFrame:
+def add_sentiment(df: pl.DataFrame, lang: str = "en") -> pl.DataFrame:
     """
     Add sentiment column to the input dataframe
     Parameters:
-    df (pd.DataFrame): The input dataframe
+    df (pl.DataFrame): The input dataframe
     lang (str): Language of the messages, defaults to "en"
     Returns:
-    pd.DataFrame: The input dataframe with an additional column "sentiment"
+    pl.DataFrame: The input dataframe with an additional column "sentiment"
     """
     if "message" not in df.columns:
@@ -38,11 +38,12 @@ def extract_sentiment(message: str) -> Optional[str]:
         """
         try:
-            return str(sentiment_pipeline(message)[0]["label"])
+            return sentiment_pipeline(message)[0]["label"]
         except Exception as e:
             print(f"Error processing message: {message}: {e}")
             return None
 
-    df["sentiment"] = df["message"].apply(extract_sentiment)
 
+    df = df.with_columns(
+        (pl.col("message")).map_elements(extract_sentiment).alias("sentiment"),
+    )
     return df