GSoC Idea: Extend data model and user interface to capture better information about contributors #415

Closed
vchrombie opened this issue Mar 11, 2021 · 16 comments


@vchrombie
Member

The information stored about organizations is very basic. For each organization, only its name and domains (e.g., example.com) are stored. Organizations might have hierarchical structures composed of several groups, areas, and departments that employees work for. We would like to be able to track all this information.

SortingHat is the tool that we use to manage identities data in GrimoireLab. As individuals in a project can have different identities - several usernames or email addresses - this tool allows creating unified profiles of them. Then, the platform will use this information to generate accurate results of the activity of these participants.

SortingHat started as a command-line tool, but after some years we saw its potential and decided to create a new version, this time as a service. This new version provides a new GraphQL API to operate with the server and a web-based UI app that replaces Hatstall, the old UI for SortingHat.

Although its development is at a late stage and it will soon be ready for the stable version of the platform, there are many good ideas that we would like to incorporate. Some of them were selected for GSoC 2021.

The aims of the project are as follows:

  • Define a data model to store an organization's internal structure.
  • Implement methods to manage this information.
  • Integrate this information into the UI.
  • (Extra stretch goal) Store organization aliases (e.g., Google and Google, LLC).
  • Difficulty: Hard
  • Requirements: Interest in software analytics. Python programming. JavaScript programming. SQL knowledge. Willingness to understand GrimoireLab internals.
  • Recommended: Experience with Python, JavaScript, UI development, GraphQL, Django, and Vue.js would be convenient but can be learned during the project.
  • Mentors: @sduenas @evamillan @mafesan

Microtasks

To become familiar with GrimoireLab, you can start by reading some documentation. You can find useful information at:

Once you're familiar with GrimoireLab, you can have a look at the following microtasks.

  • Microtask 0:
    Download PyCharm and get familiar with it (for instance, you can follow this tutorial).

  • Microtask 1:
    Set up a dev environment to work on GrimoireLab. Have a look at chaoss/grimoirelab-sirmordred - Getting-Started.md.

  • Microtask 2:
    Execute micro-mordred to collect, enrich and visualize data from Git repositories.

  • Microtask 3:
    Based on the Elasticsearch documents produced by micro-mordred and the source code of chaoss/grimoirelab-elk, try to answer the following questions:

    • What is the meaning of the JSON attribute author_id?
    • What is the meaning of the JSON attribute author_org_name?
    • What is the meaning of the JSON attribute author_uuid?
    • What is the meaning of the JSON attribute author_domain?
    • What is the meaning of the JSON attribute uuid?
    • What is the meaning of the JSON attribute utc_commit?
    • What is the meaning of the JSON attribute origin?
  • Microtask 4:
    Set up the developer environment of SortingHat (muggle branch).

    NOTE: The sortinghat muggle branch is a WIP branch. As of now, it doesn't work with the core of the GrimoireLab platform but we hope to have it ready soon.

  • Microtask 5:
    Create a sample profile with different identities and enrollments using the SortingHat UI.

  • Microtask 6:
    Using the SortingHat GraphQL Console, create a query that fetches the data (identities, enrollments) of an individual profile.

  • Microtask 7:
    Create a script that can parse the gitdm developer affiliation files and load the data in a SortingHat database using GraphQL.

  • Microtask 8:
    Improve the visualization of the individualCards component. You need not send a PR, please update the work in your personal fork.

  • Microtask 9:
    Submit a PR to any of the GrimoireLab components to increase the test coverage of one or more files of the source code.

  • Microtask 10:
    Submit at least one PR to one of the GrimoireLab repositories to fix an issue, improve the documentation, etc. Some good-first-issues are:
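
For Microtask 7, a minimal sketch of the parsing half might look like the following. It assumes that each non-comment line of a gitdm affiliation file has the form `email organization [< YYYY-MM-DD]`, so please check this against the files you actually load. `Affiliation`, `parse_affiliations`, and `graphql_body` are hypothetical names used only for illustration. The GraphQL document is left as a parameter on purpose: the real mutation names and arguments have to be taken from the schema shown in the SortingHat GraphQL Console.

```python
import json
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# ASSUMPTION: one affiliation per line, "<email> <organization> [< YYYY-MM-DD]".
LINE_RE = re.compile(
    r"^(?P<email>\S+@\S+)\s+(?P<org>.+?)"
    r"(?:\s+<\s+(?P<until>\d{4}-\d{2}-\d{2}))?$"
)


@dataclass(frozen=True)
class Affiliation:
    """Hypothetical record for one parsed line."""
    email: str
    organization: str
    until: Optional[str]  # end date of the affiliation, if the line gives one


def parse_affiliations(lines: Iterable[str]) -> Iterator[Affiliation]:
    """Yield one Affiliation per data line, skipping blanks and '#' comments."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        yield Affiliation(match["email"], match["org"], match["until"])


def graphql_body(document: str, variables: dict) -> str:
    """Serialize a GraphQL document and its variables as a JSON POST body.

    The document text is supplied by the caller; nothing here knows the
    SortingHat schema.
    """
    return json.dumps({"query": document, "variables": variables})
```

Each parsed `Affiliation` can then be turned into one request body and sent to the server with any HTTP client. Keeping the parser separate from the GraphQL calls makes it easy to test without a running SortingHat instance.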

@AllMight2099

Hey there!
My name is Nishanth and I'm a sophomore from IIT Roorkee. I interned as a Django backend developer for a couple of months, and I believe I have a strong understanding of not only Python but Django as well. I have also worked on multiple JavaScript projects, one of them being the front-end portion of our campus group's website that hosts machine learning competitions (using React), so I have a strong understanding of JavaScript too!
I've also set up GrimoireLab and I'm fairly familiar with its workings, but I would love to get a little bit of guidance on this issue. Looking forward to working with you guys!

@vchrombie
Member Author

Hi everyone, thanks for your interest in applying for this idea. You can start working on the microtasks to get a better idea of the project. Let us know if you have any doubts. 🙂

For all students interested in this idea, please comment on this issue to get in touch with the mentors. This is the main communication channel.

@Rashmi-K-A

Hey 👋! I am Rashmi. I am an incoming grad student at Carnegie Mellon University. I have completed most of the microtasks related to this project. You can find them here in this repo: https://github.com/Rashmi-K-A/chaoss-sortinghat. I am very interested in contributing to this project and would love to learn more! It would be great if I could get in touch with the mentors so I can understand more about this project.

@rinamanta

Hello, my name is Rina and I'm in my first year of a master's program in data science. Your project sounds very interesting. I am not sure I have time to complete the microtasks by March 29th, but I will give it a go!

@vchrombie
Member Author

Hi @rinamanta, thanks for your interest in applying for this idea.

> I am not sure I have time to complete the microtasks by March 29th, but I will give it a go!

The student application period ends on April 13, 2021. You can work on the microtasks and the proposal until the deadline.
https://summerofcode.withgoogle.com/how-it-works/

@vchrombie
Member Author

Hi, @AllMight2099 @Rashmi-K-A @rinamanta

I hope you have started working on the microtasks. As you might know, you have to submit a proposal before the GSoC deadline. You are also expected to attempt at least one microtask for your application to be considered.

The main reason behind the microtasks is that they give a good minimum understanding of the SortingHat tool as well as the GrimoireLab platform as a whole. This will be really helpful for writing your proposal.

If you haven't started working on the microtasks yet, I would suggest you start ASAP. You can create a GitHub repository for storing the microtasks, and you can open issues in that repo to ask questions or request a review of the tasks.

Thanks.

@rinamanta

rinamanta commented Mar 22, 2021 via email

@rinamanta

rinamanta commented Mar 23, 2021 via email

@vchrombie
Member Author

Hi @rinamanta

> I have already installed Anaconda for my Python class in college. I see it has a version of PyCharm. Is it okay to use this version of PyCharm for the microtasks?

Sure. You can use it.

> I'm sorry. I started to follow the instructions to download the various software, and I don't think my old laptop can handle it all. I'm a middle-aged career switcher so the tech world is all very new to me.
> I think applying for GSoC right now is just too big a leap for me. But thank you again for your clear instructions. Your project does sound very interesting.

Thanks for your interest in applying for this idea. You can still continue to contribute to the project. Please let us know if you need any help.

@Rashmi-K-A

@vchrombie It would be great if you could elaborate on what kind of information we are looking to capture regarding the internal structure of the organization. I understand we want to store information about teams or units within the organization, but would this be just one level down, or are we looking to have the ability to break it down into further subunits?

@sduenas
Member

sduenas commented Mar 24, 2021

@Rashmi-K-A we're looking for a hierarchical structure of many levels. Think of a tree (graph) structure.
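
To make the "tree of many levels" idea concrete, here is a minimal in-memory sketch. It is not SortingHat code: the `Team` class and its `add_child`, `path`, and `walk` methods are hypothetical names chosen for illustration, and the root node simply stands for the organization itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # compare nodes by identity to avoid recursive equality
class Team:
    """Hypothetical node in an organization tree of arbitrary depth."""
    name: str
    parent: Optional[Team] = field(default=None, repr=False)
    children: list[Team] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> Team:
        """Create a sub-team under this node and return it."""
        child = Team(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Names from the root (the organization) down to this node."""
        names = []
        node: Optional[Team] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def walk(self) -> Iterator[Team]:
        """Depth-first traversal of this node and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


org = Team("Example Corp")
engineering = org.add_child("Engineering")
backend = engineering.add_child("Backend")
marketing = org.add_child("Marketing")
```

Because every node only knows its parent and its children, nothing limits the depth: a department can hold areas, which can hold groups, and so on.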

@Rashmi-K-A

@sduenas @vchrombie Am I correct in thinking that there would be more reads than writes for organizational data?
Since we need to be holding hierarchical data inside a relational DB, I want to make sure I am thinking of the right model to store the data in terms of efficiency.

@sduenas
Member

sduenas commented Mar 26, 2021

> @sduenas @vchrombie Am I correct in thinking that there would be more reads than writes for organizational data?

Yes, that's right.

> Since we need to be holding hierarchical data inside a relational DB, I want to make sure I am thinking of the right model to store the data in terms of efficiency.

In any case, keep in mind that we have implemented everything with Django so far. That means you'll need to use the framework and its ORM to map data and tables. Queries can be performed directly with SQL, but it'll be better if we use the Django API for that.
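
As a rough illustration of holding a hierarchy in a relational DB, the sketch below uses one common approach: an adjacency list (each row points to its parent) read back with a recursive query. It uses Python's built-in `sqlite3` module only so that it runs on its own; it is not the Django ORM, and the `team` table, its columns, and the helper names are assumptions made up for this example. In Django, the same shape would usually be a model with a self-referential foreign key.

```python
import sqlite3

# Recursive CTE: start at one row, then repeatedly join in its children.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id, name, depth) AS (
    SELECT id, name, 0 FROM team WHERE id = ?
    UNION ALL
    SELECT t.id, t.name, s.depth + 1
    FROM team AS t
    JOIN subtree AS s ON t.parent_id = s.id
)
SELECT name, depth FROM subtree ORDER BY depth, name
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE team (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER REFERENCES team(id)  -- NULL for the root
        )
        """
    )
    conn.executemany(
        "INSERT INTO team (id, name, parent_id) VALUES (?, ?, ?)",
        [
            (1, "Example Corp", None),
            (2, "Engineering", 1),
            (3, "Marketing", 1),
            (4, "Backend", 2),
        ],
    )
    return conn


def subtree(conn: sqlite3.Connection, root_id: int) -> list:
    """Return (name, depth) for the given row and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()


conn = build_demo_db()
whole_org = subtree(conn, 1)
engineering_only = subtree(conn, 2)
```

With this layout a write touches a single row, while a read of a whole subtree needs the recursive query. Since reads are expected to dominate, other layouts that trade more expensive writes for cheaper reads may also be worth comparing in a proposal.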

@vchrombie
Member Author

Hi everyone, the student application period has started and the deadline is 13 April 2021, 18:00 UTC (see the GSoC Timeline).

Please continue working on the proposal and complete as many microtasks as possible. Please let us know if you need any help with doubts or reviewing the microtasks.
Thanks!

@vchrombie
Member Author

Thanks to everyone who showed interest in applying for this idea and worked on making a proposal and the microtasks. It was great working with you.

As the final steps, please make sure you submit your proposal on the Google Summer of Code website. You also need to open a PR adding your name and details to the GSoC-interest.md file in order to qualify as an interested candidate. Both have to be completed before the deadline mentioned on the GSoC website.

Thanks once again! All the best.

@vchrombie
Member Author

Thank you for participating in this idea, and congratulations to @Rashmi-K-A for being selected for the project idea.

If you have any questions, comments, or concerns about the selection process, feel free to send an email to [email protected]. Thanks!

This issue is going to be closed on Friday.


5 participants