Hello, I’m Salma.
AI Scientist

Data Science lead on the Foundation Model Research team at SAP, building and fine-tuning large language models for enterprise applications.

Contact me!

Who am I?

With an academic background in machine learning and computational biology, I work as a Senior AI Scientist in the Foundation Model Research team at SAP. My work focuses on fine-tuning large language models and building generative AI applications. I currently lead the Data Science team at SAP, where we build AI assistants for ABAP developers.

My academic research focused on developing statistical and machine learning models (NLP) based on high-throughput sequencing datasets to better understand biological systems.

I like spending my free time hiking, reading popular science books, and playing board games. The pandemic encouraged me to rediscover my passion for painting.


Projects

AI Assistants for ABAP Developers

Leading a data science team at SAP focusing on ABAP developer efficiency features for Joule (SAP's AI Copilot), including code completion, code generation, and code explanation.

  • developed ABAP foundation models,
  • applied supervised fine-tuning for code completion and explanation,
  • released as part of ABAP development tools.
TechEd Session

Automated Document Processing

SAP's premium Document Information Extraction service uses generative AI to automate the processing of business documents such as invoices and delivery notes. Generative AI enables customizable extraction schemas and adds support for over 40 languages.

  • developed custom multi-modal models,
  • integrated LLMs for document processing,
  • fine-tuned large language models for business document processing.

Documentation

Bipartite Motif Finder (BMF)

Open-source tool and web server for predicting spaced RNA sub-sequences that attract RNA-binding proteins.

  • optimized code for speed (AVX-optimization/C++/Cython),
  • handled several terabytes of interaction data and explored several datasets to uncover interesting patterns,
  • created a user-friendly interface and documentation,
  • programmed a web server for optimal accessibility.
Manuscript

CNNs to Predict Transcription Activation

Leveraged CNNs to find protein sequences that can activate gene transcription.

  • developed methods to interpret CNN predictions,
  • compared the performance of the trained CNN with simpler ML classifiers.

Manuscript

Recent Outreach

Dec 2024
Building Specialized LLM-based Enterprise Applications
Berlin AI Meetup
Nov 2024
Agentic AI in Joule
SAP Select keynote demo
Oct 2024
Revolutionize ABAP development with Joule, our generative AI copilot
TechEd lecture
Feb 2024
Beyond Chatbots: Building Enterprise LLM-based Applications
360DevConnect Berlin Meetup

Education

2017-2021
Max Planck Institute for Biophysical Chemistry
PhD, Quantitative and Computational Biology
2015-2017
Georg-August University
MSc, International Max Planck Research School for Molecular Biology
2014-2015
University of Tehran
MSc, Direct PhD program in Biotechnology
2010-2014
University of Tehran
BSc, Direct PhD program in Biotechnology
2006-2010
Farzanegan HS
High School Diploma, Math & Physics

Certifications

2020
Deep Learning Specialization
DeepLearning.AI | Coursera
2021
Natural Language Processing Specialization
DeepLearning.AI | Coursera
2021
Practical Data Science Specialization
AWS & DeepLearning.AI | Coursera
2022
TensorFlow: Advanced Techniques Specialization
AWS & DeepLearning.AI | Coursera
2021
Deep Neural Networks with PyTorch
IBM | Coursera
2021
SQL for Data Science
University of California, Davis | Coursera
2021
AWS Cloud Technical Essentials
AWS | Coursera
2021
Agile Project Management
Google | Coursera

Selected Publications


2021
Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins
Salma Sohrabi-Jahromi, Johannes Söding (Bioinformatics)
2020
A high-throughput screen for transcription activation domains reveals their sequence features and permits prediction by deep learning
Ariel Erijman, Lukasz Kozlowski, Salma Sohrabi-Jahromi,..., Johannes Söding, Steven Hahn (Molecular Cell)
2019
Transcriptome maps of general eukaryotic RNA degradation factors
Salma Sohrabi-Jahromi, Katharina B Hofmann,..., Johannes Söding, Patrick Cramer (eLife)
2016
A kidney-specific genome-scale metabolic network model for analyzing focal segmental glomerulosclerosis
Salma Sohrabi-Jahromi, Sayed-Amir Marashi, Shiva Kalantari (Mammalian Genome)


Google Scholar

Contact