Fix dividends databank XBRL parsing and deduplication #148
- Added exact XBRL XML tag regex parsing in `nse_lib.py` to correctly extract `RateOfFinalDividendRecommendedPerEquityShare` directly from NSE Corporate Announcements, resolving the missing BHEL dividend issue without triggering N+1 requests.
- Extracted and added the Annual General Meeting (AGM) intimation date to the board meeting purpose string in `nse_lib.py`.
- Updated `special_sit_routes.py` and `workbench.html` to tighten chronological deduplication logic by checking if the expected dividend amounts match before arbitrarily merging synthetic intimations into corporate actions, resolving the HINDZINC/POWERGRID duplicates and BPCL missing intimation bugs.
- Fixed the UI to format and display the exact HH:MM:SS broadcast time from the backend rather than discarding it.

Co-authored-by: letssayx <[email protected]>
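As a rough illustration of the tag-based extraction described above, the sketch below pulls a numeric value out of the `RateOfFinalDividendRecommendedPerEquityShare` element with a regex. The function name `extract_final_dividend`, the optional namespace-prefix handling, and the sample markup are assumptions for illustration; the actual code in `nse_lib.py` is not shown in this PR text and may differ.

```python
import re
from typing import Optional

# Tag name taken from the PR description.
FINAL_DIVIDEND_TAG = "RateOfFinalDividendRecommendedPerEquityShare"

# Assumption: the element may carry a namespace prefix and attributes.
_TAG_RE = re.compile(
    r"<(?:[\w.-]+:)?" + re.escape(FINAL_DIVIDEND_TAG) + r"\b[^>]*>"
    r"\s*([^<]*?)\s*"
    r"</(?:[\w.-]+:)?" + re.escape(FINAL_DIVIDEND_TAG) + r"\s*>",
    re.IGNORECASE,
)


def extract_final_dividend(xbrl_text: str) -> Optional[float]:
    """Return the per-share final dividend found in the text, or None."""
    match = _TAG_RE.search(xbrl_text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    raw = match.group(1).replace(",", "").strip()
    try:
        return float(raw)
    except ValueError:
        # Empty or non-numeric element content.
        return None
```

Because the value is read from the announcement payload that is already in hand, a sketch like this needs no extra request per announcement, which is consistent with the "without triggering N+1 requests" note above.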
- Update regex in `nse_lib.py` and `field_mapper.py` to correctly parse edge-case fractions (e.g. `Re0.25`, `\u20b9`) and XBRL tags (BHEL amounts).
- Add specific checks for 'Annual General Meeting' to properly capture AGM intimation dates.
- Relax strict purpose checks to accurately catch board meetings labeled vaguely as 'Financial Results' (e.g. BPCL) by cross-referencing globally pre-fetched corporate announcements while rigorously checking dates.
- Ensure the broadcast time component is parsed and displayed alongside the date in the Dividends Databank UI.
- Apply exact deduplication rules based on chronologically matching the absolute minimum time difference to preserve back-to-back 15-day interim edge cases (HINDZINC, POWERGRID) in both the frontend and backend.

Co-authored-by: letssayx <[email protected]>
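The edge-case amounts mentioned above (`Re0.25`, the `\u20b9` rupee sign) could be handled by a pattern along the following lines. This is only a sketch: the helper name `parse_dividend_amount`, the set of accepted currency markers, and the sample strings are assumptions, not the regex actually used in `nse_lib.py` or `field_mapper.py`.

```python
import re
from typing import Optional

# Assumed currency markers: the rupee sign, "Rs", "Re" and "INR".
# The lookbehind stops "Re"/"Rs" from matching inside words such as "share".
_AMOUNT_RE = re.compile(
    r"(?<![A-Za-z])(?:\u20b9|Rs\.?|Re\.?|INR)\s*"
    r"(\d+(?:,\d+)*(?:\.\d+)?|\.\d+)",
    re.IGNORECASE,
)


def parse_dividend_amount(text: str) -> Optional[float]:
    """Return the first rupee amount found in free text, or None."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    # Remove thousands separators such as "1,250.50".
    return float(match.group(1).replace(",", ""))
```

The letter lookbehind is a deliberate choice here: without it, a case-insensitive `Re` would also match the tail of ordinary words that happen to precede a number.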
- Update regex in `nse_lib.py` to extract XBRL dividend amounts from specific XML tags.
- Cross-reference global corporate announcements in `nse_lib.py` to capture missing board meeting intimations (e.g., BPCL) that use generic purposes like "Financial Results".
- Safely extract AGM dates from board meeting intimations for UI display.
- Overhaul timeline deduplication logic in `special_sit_routes.py` and `workbench.html` to link intimations to final corporate actions strictly by chronological distance (0-90 days), fixing duplication issues while preserving identical 15-day interim dividends.
- Remove completely destructive synthetic data generation from `nse_importer.py` that was triggering mass `DELETE` queries and destroying historical corporate actions (2019-2026). Switched pipelines to safe `upsert_batch`.
- Improve time string parsing in frontend UI rendering to accurately display native Board Meeting exact times without UTC shift bleeds.

Co-authored-by: letssayx <[email protected]>
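The chronological linking rule above (0-90 days, nearest match wins) might look roughly like the sketch below. The `Intimation` and `CorporateAction` dataclasses, their field names, and the assumption that the corporate action falls on or after the meeting date are all hypothetical; the real logic lives in `special_sit_routes.py` and `workbench.html` and is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

MAX_GAP_DAYS = 90  # upper bound of the 0-90 day window from the PR text


@dataclass(frozen=True)
class Intimation:  # hypothetical shape
    symbol: str
    meeting_date: date


@dataclass(frozen=True)
class CorporateAction:  # hypothetical shape
    symbol: str
    ex_date: date


def link_nearest_action(
    intimation: Intimation, actions: Sequence[CorporateAction]
) -> Optional[CorporateAction]:
    """Pick the same-symbol action closest in time within the window."""
    best: Optional[CorporateAction] = None
    best_gap: Optional[int] = None
    for action in actions:
        if action.symbol != intimation.symbol:
            continue
        # Assumption: the action must not precede the meeting.
        gap = (action.ex_date - intimation.meeting_date).days
        if not 0 <= gap <= MAX_GAP_DAYS:
            continue
        if best_gap is None or gap < best_gap:
            best, best_gap = action, gap
    return best
```

Choosing the minimum gap rather than the first action inside the window is what would keep two interim dividends 15 days apart from collapsing onto the same intimation.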
- Captures synthetic events for Board Meetings where dividend amounts are directly extracted or AGMs are discussed.
- Deduplicates upcoming synthetic meetings within a 5-day window to prevent triplicate rows (e.g., POWERGRID).
- Formats dates and explicitly extracts and renders board meeting and broadcast times instead of overwriting official corporate action times.
- Added strict null safety checks when reading `_matchedMeeting` properties to prevent `TypeError` during rendering.

Co-authored-by: letssayx <[email protected]>
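The 5-day deduplication window for synthetic meetings could be sketched as follows. The function name `dedupe_meetings`, the `(symbol, date)` tuple shape, and the choice to treat the window as inclusive and measured from the last kept row are assumptions made for illustration only.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

WINDOW_DAYS = 5  # window size from the PR text; inclusive bound is assumed

Meeting = Tuple[str, date]  # hypothetical (symbol, meeting_date) row


def dedupe_meetings(meetings: Iterable[Meeting]) -> List[Meeting]:
    """Keep the earliest meeting per symbol within each 5-day window."""
    kept: List[Meeting] = []
    last_kept: Dict[str, date] = {}
    # Sorting groups rows by symbol, then by date.
    for symbol, meeting_date in sorted(meetings):
        previous = last_kept.get(symbol)
        if previous is not None and (meeting_date - previous).days <= WINDOW_DAYS:
            continue  # near-duplicate of a row already kept
        kept.append((symbol, meeting_date))
        last_kept[symbol] = meeting_date
    return kept
```

With this rule, three rows for the same symbol dated within a few days of each other reduce to one, while a meeting weeks later is still kept.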
Fixes 6 functional UI and ingestion bugs in the Dividends Databank, including missing XBRL data, aggressive deduplication dropping board intimations, and missing broadcast times.
PR created automatically by Jules for task 7184079409821878691 started by @letssayx