Shall we rename the Indexer component ? #21

Closed
agoncal opened this issue Feb 26, 2024 · 8 comments
Labels
question Further information is requested

Comments

@agoncal
Collaborator

agoncal commented Feb 26, 2024

At first I didn't understand what "Indexer" meant in the document processor. And the documentation mentions "Data ingestion". So what about renaming the component "Indexer" to something else:

  • data-ingestion
  • data-ingestor
  • ingestor (does it exist in English or is it ingester)
@agoncal added the "question" label Feb 26, 2024
@sinedied
Collaborator

It's been on the to-do list to rename it to ingestion, but I left it as is last time because I did not get the chance to refactor everything.

I would prefer to keep it to one word for simplicity. What about ingestion then?

@SandraAhlgrimm
Contributor

Well, I believe that ingestion itself is not very self-explanatory. But I also can't come up with a one-word solution. Why does it have to be just one?

@sinedied
Collaborator

It will be used to name the service in Bicep and YAML too, and I think using more than one word might be confusing, e.g. data-ingestion-java-quarkus.

What bothers me is that we called the process "data ingestion" here, but what we're technically doing is indexing data into the DB (that's also what it's called in the Azure docs). Maybe the right solution is to keep the indexer name and change what we call the process in the docs?

@agoncal
Collaborator Author

agoncal commented Feb 26, 2024

Cultural difference between JS and Java developers. We love our long class names ;o) ObjectFactoryCreatingFactoryBean

@agoncal
Collaborator Author

agoncal commented Feb 26, 2024

I quite like the 1-word version: ingestion is ok for me

@SandraAhlgrimm
Contributor

> It will be used to name the service in Bicep and YAML too, and I think using more than one word might be confusing, e.g. data-ingestion-java-quarkus.
>
> What bothers me is that we called the process "data ingestion" here, but what we're technically doing is indexing data into the DB (that's also what it's called in the Azure docs). Maybe the right solution is to keep the indexer name and change what we call the process in the docs?

I agree here. The way I understood the TypeScript code, you're doing indexing here.

Copilot answer: Data ingestion and indexing are both crucial steps in managing and utilizing data, but they serve different purposes:

  1. Data Ingestion:

    • Definition: Data ingestion is the process of moving and replicating data from various sources to a target landing or raw zone. This destination could be a cloud data lake, a cloud data warehouse, or another storage medium where the data can be accessed, used, and analyzed by an organization.
    • Purpose: Data ingestion ensures that data from diverse sources (such as databases, APIs, logs, files, sensors, etc.) is collected and made available for further processing. It's the first step in the data pipeline.
    • Key Activities:
      • Extraction: Retrieving data from source systems.
      • Transformation: Converting data into a suitable format.
      • Loading: Storing data in the target location.
    • Example: Collecting customer orders from an e-commerce website and storing them in a data lake for analysis.
  2. Indexing:

    • Definition: Indexing is the process of creating a searchable structure (an index) that allows efficient retrieval of data from a large dataset.
    • Purpose: Indexing enhances data query performance by organizing data in a way that accelerates search operations.
    • Key Activities:
      • Creating Indexes: Identifying relevant fields and creating index structures.
      • Updating Indexes: Keeping indexes up-to-date as data changes.
      • Query Optimization: Utilizing indexes to speed up search queries.
    • Example: Creating an index on a database table's primary key column to quickly locate specific records.

In summary, data ingestion focuses on getting data into the system, while indexing optimizes data access and retrieval. Both processes are essential for effective data management and analysis.
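
To make the distinction above concrete, here is a minimal Python sketch. It is not taken from this project's code: the function names (`ingest`, `build_index`, `search`) and the sample documents are hypothetical, and a plain dictionary stands in for whatever store is actually used. Ingestion extracts, transforms, and loads raw text; indexing then builds a lookup structure over what was loaded.

```python
from collections import defaultdict


def ingest(sources):
    """Ingestion: extract raw records, normalize them, load them into a store."""
    store = {}
    for doc_id, raw_text in sources:          # extraction
        cleaned = " ".join(raw_text.split())  # transformation: collapse whitespace
        store[doc_id] = cleaned               # loading
    return store


def build_index(store):
    """Indexing: map each lowercased word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in store.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, word):
    """Look a word up in the index instead of scanning every document."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample data, for illustration only.
sources = [
    ("a.txt", "  Terms of   service "),
    ("b.txt", "Privacy statement and terms"),
]
store = ingest(sources)
index = build_index(store)
print(search(index, "terms"))  # ['a.txt', 'b.txt']
```

In this sketch a single component could reasonably be named after either step, which may be why both "indexer" and "ingestion" feel plausible for it.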

@agoncal
Collaborator Author

agoncal commented Feb 27, 2024

@sinedied @SandraAhlgrimm so, shall we rename indexer to ingestion then?

+1 for me

@agoncal
Collaborator Author

agoncal commented Feb 27, 2024

I've renamed indexer to ingestion in the java branch only

9b1bef9

@agoncal agoncal closed this as completed Feb 27, 2024