Components: Combine summaries and descriptions #60

Draft: wants to merge 1 commit into base: live

Conversation

Contributor

@duncandewhurst duncandewhurst commented May 8, 2025

@kathryn-ods here's Gemini's attempt at combining the summaries and descriptions. I'm not terribly happy with it, but I think that's more a symptom of the quality of the existing content than of what Gemini is doing with it.

I tried rewriting a couple myself, but it's quite time-consuming:

API specification

An application programming interface (API) specification standardises how data publishers should provide interactive access to data. It defines things like endpoints, data formats, and authentication methods.

APIs enable developers to use data without having to download and process entire datasets. Standardised APIs reduce the obstacles to building applications that use data, thereby encouraging adoption.

An API specification also reduces design costs for anyone wanting to publish data via an API.

Advocacy plan

An advocacy plan outlines the specific actions and resources needed to promote the adoption of a data standard among potential adopters, such as government agencies, civil society organisations and multilateral institutions.

It incorporates compelling arguments in favour of adoption, tailored to address the specific needs and benefits of different adopters. The plan establishes a structured approach to encourage widespread uptake, which may include communication campaigns, workshops, development of tools, and endorsement by international organisations. The plan also identifies key metrics for success, such as public commitments, published datasets and documented uses. The plan should be regularly updated to reflect the standard's evolving maturity, its impact, and potential challenges to adoption, such as resistance to change or perceived complexity.

Essentially, I think we need to decide whether the Gemini-produced content is good enough for now (with a quick review to ensure accuracy) or whether to spend time on fully rewriting everything ourselves.

What do you think?


Here's the very rough code I used, including the prompt to the Gemini API:

Python code

import os

from google import genai

client = genai.Client(api_key="[API KEY]")

model = "gemini-2.0-flash"

prompt = "Context: On reviewing a range of policy-related data standards, including those that we maintain and those maintained by others, we identified approximately 60 components that at least two standards have chosen to create or adopt. Combine the following markdown-formatted summary and description of one of the components. Provide your response as raw markdown text in British English. Do not add a title. By combine, I mean rewrite and copy-edit. Not simply append. Begin with a one-sentence definition/description of the component. Subsequent sentences can expand on the definition and provide additional information, based on what is in the source text."

directory_path = "docs/components"

# Process every component page except the index.
for filename in os.listdir(directory_path):
    if filename != "index.md":
        filepath = os.path.join(directory_path, filename)
        with open(filepath, "r", encoding="utf-8") as f:
            lines = f.readlines()

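        # Collect the lines under "## Summary" and "## Description" and record
        # the span they occupy (assumes Summary comes before Description).
        content = {"summary": [], "description": []}
        start = None  # index of the "## Summary" heading
        end = None  # index of the last line in either section
        section = None

        for index, line in enumerate(lines):
            if line.startswith("## Summary"):
                start = index
                section = "summary"
            elif line.startswith("## Description"):
                section = "description"
            elif line.startswith("## "):
                # Any other level-2 heading ends the section being collected.
                section = None
            elif section:
                end = index
                content[section].append(line)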

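        if len(content["summary"]) > 0 and len(content["description"]) > 0:
            # Build the request: the prompt followed by the labelled source sections.
            # A plain str.join avoids the backslash inside an f-string expression,
            # which is a syntax error before Python 3.12.
            contents = "\n\n".join([prompt, "summary:", "".join(content["summary"]), "description:", "".join(content["description"])])

            response = client.models.generate_content(
                model=model,
                contents=contents
            )

            # Replace everything from the Summary heading to the end of the Description.
            newlines = lines[0:start] + [response.text] + ["\n\n"] + lines[end + 1 :]

            with open(filepath, "w", encoding="utf-8") as f:
                f.writelines(newlines)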
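
The section-splitting and splicing steps in the script above can be exercised without calling the Gemini API. The following is a minimal sketch, not part of the original script: the helper names `split_sections` and `splice` and the sample page are hypothetical, introduced only to show what the loop collects and which lines get replaced.

```python
def split_sections(lines):
    """Return (content, start, end) for the Summary and Description sections."""
    content = {"summary": [], "description": []}
    start = end = section = None
    for index, line in enumerate(lines):
        if line.startswith("## Summary"):
            start = index
            section = "summary"
        elif line.startswith("## Description"):
            section = "description"
        elif line.startswith("## "):
            # Any other level-2 heading ends the section being collected.
            section = None
        elif section:
            end = index
            content[section].append(line)
    return content, start, end


def splice(lines, start, end, replacement):
    """Replace lines[start..end] (inclusive) with the replacement text."""
    return lines[:start] + [replacement, "\n\n"] + lines[end + 1:]


# Hypothetical component page, invented for illustration.
sample = [
    "# API specification\n",
    "\n",
    "## Summary\n",
    "Short summary.\n",
    "\n",
    "## Description\n",
    "Longer description.\n",
    "\n",
    "## Examples\n",
    "An example.\n",
]

content, start, end = split_sections(sample)
new_lines = splice(sample, start, end, "Combined text.")
print("".join(new_lines))
```

Note that the splice removes both the `## Summary` and `## Description` headings along with their bodies, which is consistent with the prompt's "Do not add a title" instruction. It also assumes the Summary section comes before the Description section in each file.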

@kathryn-ods
Contributor

Combining the two sections is probably only worthwhile if we're happy with the resulting content. I don't see loads of value in an interim rewrite to improve formatting if the content itself isn't good enough.

Fwiw I have a good amount of capacity right now so could spend some time either manually rewriting or reviewing Gemini rewrites. Maybe we should have a call and think about where we want to get with os4d as a whole in the next few months?

@duncandewhurst
Contributor Author

> Combining the two sections is probably only worthwhile if we're happy with the resulting content. I don't see loads of value in an interim rewrite to improve formatting if the content itself isn't good enough.

Agreed!

> Fwiw I have a good amount of capacity right now so could spend some time either manually rewriting or reviewing Gemini rewrites. Maybe we should have a call and think about where we want to get with os4d as a whole in the next few months?

Great. Let's do that. I'll send an invitation.

Before starting on rewriting, I think it'll be good to agree on some guidelines. I've made a start on that in a Google doc and have tried redrafting three of the component descriptions according to the guidance.
