113 changes: 76 additions & 37 deletions content/role/data-ml-engineer/home.md
@@ -17,13 +17,9 @@ tags:
}
</style>

It’s your job to ensure the right data powers the right applications at the right time and in the right place.
You’ve got your hands full addressing all aspects of data logistics and processing required to ensure the success of your data-intensive projects, including those involving analytics and AI/machine learning. On this page, we provide content to help you meet these challenges, as well as practical tips on key issues to improve efficiency and performance.

With an increased number and variety of workloads, how can you address all aspects of data logistics and processing that can make or break the success of any data-intensive project, including analytics and AI/machine learning? And do it easily and reliably?

On this page, we provide content to help you meet these challenges. You will find a rotating selection of foundational material, ideas to help you get inspired, as well as practical tips on key issues to improve efficiency and performance. You’ll also learn what Hewlett Packard Enterprise (HPE) offers.

The roles of the Data/ML Engineer and Data Scientist can overlap. You may also find content of interest to you on the [Data Scientist](/role/data-scientist/home/) page. Content on this page changes as new material becomes available or new topics arise, so check back regularly.
You may also find content of interest to you on the [Data Scientist](/role/data-scientist/home/) page, so be sure to check it out.

<br>
<hr style="background: #7630EA; height: 5px; border: none">
@@ -32,38 +28,36 @@ The roles of the Data/ML Engineer and Data Scientist can overlap. You may also f
<div class="row">
<div class="column">
### Get Inspired
**A sampler of new ideas related to data/ML engineering:**
**New ideas related to data/ML engineering**

*Learn how industry innovation may affect your job.*

</div>
<div class="column">
### Building a Foundation
**Key to data science projects is a unifying data infrastructure to handle logistics and the containerization of applications**
**Setting up the pipeline**

*Simplify operations and workflows with the right data fabric and orchestrate containerized applications with open source Kubernetes.*
*Simplify operations and workflows*

</div>
</div>

<div class="row">
<div class="column">
- Unit testing isn’t just for code: you need to unit test your data. [Watch Deequ: Unit Tests for Data](https://www.youtube.com/watch?v=2f_JewK79GI)
- Check out [How fine-grained data placement helps optimize application performance](/blog/how-fine-grained-data-placement-helps-optimize-application-performance/)

- Data locality helps support GPUs and other accelerators from a data point of view. Read [How fine-grained data placement helps optimize application performance](https://developer.hpe.com/blog/how-fine-grained-data-placement-helps-optimize-application-performance/)
- See [Swarm Learning: Turn your distributed data into a competitive edge](https://www.hpe.com/us/en/collaterals/collateral.a50000344.Swarm-Learning-Turn-Your-Distributed-Data-into-Competitive-Edge-technical-white-paper.html?rpv=cpf&parentPage=/us/en/products/compute/hpc/deep-learning)

- Better connections between data producers and data consumers make data science more successful. Read [Getting value from your data shouldn’t be this hard](https://www.hpe.com/us/en/insights/articles/getting-value-from-your-data-shouldn-t-be-this-hard-2106.html)
- Read [Writing deep learning tools for all data scientists, not just unicorns](/blog/writing-deep-learning-tools-for-all-data-scientists-not-just-unicorns/)
</div>
<div class="column">
- Study the technical paper [HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management](https://www.hpe.com/psnow/doc/a00110846enw)
- Read [Deploying an ML model in HPE GreenLake for ML Ops](/blog/mlops-–-deploying-an-ml-model-in-greenlake-platform-mlops-service/)

- Read [What’s your superpower for data management?](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/What-s-your-superpower-for-data-management/ba-p/7100920#.Ya5RTb3ML0p)
- Learn more about [HPE GreenLake for AI/ML/Data Analytics](https://www.hpe.com/us/en/greenlake/ai-ml-analytics.html)

- View the [HPE Ezmeral Data Fabric platform page](https://developer.hpe.com/platform/hpe-ezmeral-data-fabric/home/)
- Check out how to [build transformative AI applications at scale with HPE Machine Learning Development Environment](/blog/build-transformative-ai-applications-at-scale-with-hpe-cray-ai-development-environment/)

- Read [Kubernetized machine learning and AI using Kubeflow](https://developer.hpe.com/blog/kubernetized-machine-learning-and-ai-using-kubeflow/)

- Learn how management of large scale Kubernetes clusters is made easier with [HPE Ezmeral Runtime Enterprise](https://developer.hpe.com/platform/hpe-ezmeral/home/)
- Study the technical paper [HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management](https://www.hpe.com/psnow/doc/a00110846enw)

</div>
</div>
@@ -74,43 +68,44 @@ The roles of the Data/ML Engineer and Data Scientist can overlap. You may also f

### Addressing Key Concerns

**What can I do to lower the entry barriers to developing new AI/ML/data science projects?**
**Lowering the entry barriers to developing new AI/ML/data science projects**

- AI/ML projects can and should be run on the same system as analytics projects: Read “Chapter 3: AI and Analytics Together” in the free eBook [AI and Analytics at Scale: Lessons from Real-World Production Systems](https://www.hpe.com/us/en/resources/software/ai-and-analytics-systems.html)
- AI/ML projects can and should be run on the same system as analytics projects: Read “Chapter 3: AI and Analytics Together” in the free eBook [AI and Analytics at Scale: Lessons from Real-World Production Systems](https://www.hpe.com/us/en/resources/software/ai-and-analytics-systems.html)



**Who should be included on the team to ensure the success of the project?**
**Getting started**

- Consider [The New Data Science Team: Who’s on First?](https://community.hpe.com/t5/hpe-ezmeral-uncut/the-new-data-science-team-who-s-on-first/ba-p/7154783#.Y4Zsq-zMLlw)

- Explore Deep learning model training – a first-time user’s experience with Determined [Part 1](/blog/deep-learning-model-training-–-a-first-time-user’s-experience-with-determined-part-1/) and [Part 2](/blog/deep-learning-model-training-–-a-first-time-user’s-experience-with-determined-–-part-2/)

- Read [The New Data Science Team: Who’s on First?](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/The-New-Data-Science-Team-Who-s-on-First/ba-p/7154783#.Ybi1pb3MI2y)



**How do I handle data movement?**
**Handling data movement**

- Read [A better approach to major data motion: built-in data mirroring](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/A-better-approach-to-major-data-motion-Efficient-built-in/ba-p/7135056#.Ya5Xqb3ML0p)
- Read [A better approach to major data motion: built-in data mirroring](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/A-better-approach-to-major-data-motion-Efficient-built-in/ba-p/7135056#.Ya5Xqb3ML0p)

- Watch the webinar [Data Motion at Scale: the Untold Story](https://www.hpe.com/h22228/video-gallery/us/en/5a1ff1b7-faf8-43f2-98a3-d5b7331616b6/video?jumpid=em_4pbhacrk27_aid-520049397&utm_source=RE)
- Watch the webinar [Data Motion at Scale: the Untold Story](https://www.hpe.com/h22228/video-gallery/us/en/5a1ff1b7-faf8-43f2-98a3-d5b7331616b6/video?jumpid=em_4pbhacrk27_aid-520049397&utm_source=RE)

**What makes it easier to deal with edge computing in large-scale systems?**
**Dealing with edge computing in large-scale systems**

- Read [To the edge and back again: Meeting the challenges of edge computing](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/To-the-edge-and-back-again-Meeting-the-challenges-of-edge/ba-p/7132609#.Ya5X3r3ML0o)
- Read [To the edge and back again: Meeting the challenges of edge computing](https://community.hpe.com/t5/HPE-Ezmeral-Uncut/To-the-edge-and-back-again-Meeting-the-challenges-of-edge/ba-p/7132609#.Ya5X3r3ML0o)



**How do I ensure data trust and security?**
**Ensuring data trust and security**

- New approaches are improving the connection between data producers and data consumers. See how in the video [Dataspaces: connecting to data you can trust](https://www.youtube.com/watch?v=9VTLA1nxpoo)
- View [Dataspaces: connecting to data you can trust](https://www.youtube.com/watch?v=9VTLA1nxpoo)

- Learn about the [SPIFFE and SPIRE projects](https://developer.hpe.com/platform/spiffe-and-spire-projects/home/) that are hosted by the CNCF Foundation
- Explore the CNCF [SPIFFE and SPIRE projects](https://developer.hpe.com/platform/spiffe-and-spire-projects/home/)

**How are others doing this?**
**Case studies**

Check out these real-world case studies
- [Accelerating Autonomous Car Development with Ready Access to Global Data Fabric](https://www.hpe.com/psnow/doc/a50003176enw?jumpid=in_lit-psnow-red)

- [Accelerating Autonomous Car Development with Ready Access to Global Data Fabric](https://www.hpe.com/psnow/doc/a50003176enw?jumpid=in_lit-psnow-red)

- [Accelerating Data Insight for a Better Work Life](https://www.hpe.com/psnow/doc/a50003827enw)
- [Accelerating Data Insight for a Better Work Life](https://www.hpe.com/psnow/doc/a50003827enw)

<br>
<hr style="background: #FF8300; height: 5px; border: none">
@@ -133,6 +128,41 @@ Check out these real-world case studies
- [Data Science Unplugged: Part 2](https://www.youtube.com/watch?v=Va4tSr__Yok)

- [How to make data consumable for real-world data science](https://www.youtube.com/watch?v=4WKjRqflF7M)

- [The Great Unification: Building analytic pipelines with Apache Spark workloads](https://www.youtube.com/watch?v=TxZP_T9CC5Y&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF)

- [Location, location, location! Succeed at the edge with HPE Ezmeral and NVIDIA](https://www.youtube.com/watch?v=C5HfiLatauQ&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF)

- [Golden age of AI, dark ages of AI infrastructure](https://www.youtube.com/watch?v=ktZFLD-9qgw&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF)

- [Accelerate public sector AI use cases using a powerful ML Ops platform](https://www.youtube.com/watch?v=5pejLKu32Js&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF&index=1)

</div>
</div>

---

<br><br>
<a href="/campaign/meetups/" style="font-weight: 700; font-size: 27px">Meetups</a>

<div class="row">
<div class="column">
A series of in-depth talks on open source developer technologies.

</div>
<div class="column">

- [Streamlit - The fastest way to build and share data science apps](https://www.youtube.com/watch?v=sdgTYy3BJiM&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF)

- [Scaling language training to trillion-parameter models on a GPU cluster](https://www.youtube.com/watch?v=rIPqCvvMmms&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF&index=2)

- [OpenSearch – The open-source search and analytics suite you can run yourself](https://www.youtube.com/watch?v=KdssEOIdO_0&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF&index=2)

- [Machine Learning Data Version Control (DVC): Reproducibility and collaboration in your ML projects](https://www.youtube.com/watch?v=sgkN09LkCP4&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF&index=2)

- [Boost Spark AI workloads with Pepperdata](https://www.youtube.com/watch?v=N36DTliNmck&list=PLtS6YX0YOX4f5TyRI7jUdjm7D9H4laNlF&index=2)


</div>
</div>

@@ -147,8 +177,17 @@ Check out these real-world case studies

</div>
<div class="column">
- [HPE Ezmeral Data Fabric 101 – Get to know the basics around the data fabric](https://hackshack.hpedev.io/workshop/26)
- [HPE Ezmeral Data Fabric 101 – Get to know the basics around the data fabric](/hackshack/workshop/26)

- [Spark 101 – Introduction to Apache Spark concepts](/hackshack/workshop/34)

- [Deep Learning model training at scale with Determined](/hackshack/replays/38)

- [Machine Learning 101 – Introduction to ML concepts](/hackshack/workshop/35)

- [Building a dynamic Machine Learning pipeline with KubeDirector](/hackshack/workshop/18)

- [Deploying end-to-end machine learning workflows with HPE Ezmeral ML Ops](/hackshack/workshop/28)

</div>
</div>
@@ -185,7 +224,7 @@ Check out these real-world case studies

</div>
<div class="column">
- [HPE Dev Slack]( https://slack.hpedev.io/)
- [HPE Developer Community Slack]( https://slack.hpedev.io/)


</div>