Assist in cleaning, transforming, and preparing structured and unstructured datasets for analysis using Python and SQL.
Perform exploratory data analysis (EDA) to uncover trends, patterns, and anomalies in datasets, contributing to actionable insights.

Data Feed Support:

Collaborate with senior analysts to design and implement data pipelines for external data sources.
Develop basic automation scripts for data collection and ingestion using Python libraries such as Pandas.

Leveraging LLMs and Building RAG Systems:

Support the development of Retrieval-Augmented Generation (RAG) systems by integrating large language models (LLMs) with knowledge bases.
Assist in curating and pre-processing data to feed into vector databases, ensuring efficient retrieval for LLM queries.
Work with senior team members to fine-tune LLMs on domain-specific datasets for enhanced performance.
Contribute to evaluating and optimizing RAG workflows to ensure accurate and relevant output from LLM systems.

Data Science Projects:

Participate in projects that leverage LLMs or machine learning models to accomplish specific tasks, such as classification, regression, or NLP-based solutions.
Assist in designing workflows to preprocess and structure input data for model training and evaluation.
Collaborate with team members to iterate on model improvements, including hyperparameter tuning and feature engineering.
Support the integration of machine learning or LLM outputs into real-world applications, ensuring scalability and usability.

Visualization and Reporting:

Create clear, concise visualizations to communicate data-driven insights using tools like Matplotlib, Seaborn, or Tableau.
Prepare summaries and presentations to share findings with the team and stakeholders.

Collaboration and Communication:

Work in a diverse, multicultural environment, learning to adapt to various workflows and perspectives.
Maintain open communication with mentors and peers to ensure alignment on project objectives and deliverables.

Requirements

Technical Skills:

Proficiency in Python: Experience with data manipulation libraries such as Pandas and NumPy, plus basic familiarity with Scikit-learn.
Strong foundational knowledge of SQL: Ability to perform data extraction, manipulation, and simple database management.
Familiarity with LLMs and their applications, such as OpenAI GPT or similar models.
Basic understanding of vector databases (e.g., Pinecone, Weaviate) and their integration with LLMs.
Experience with EDA techniques and statistical analysis.
Familiarity with data visualization tools and techniques.

Soft Skills:

Willingness to Learn: Open to exploring new tools, techniques, and methodologies in the field of data science and AI.
Problem-Solving Mindset: Ability to approach challenges methodically and think critically about data-related problems.
Effective Communicator: Comfortable explaining technical findings to a non-technical audience.

Advantages:

Familiarity with cloud platforms like AWS or Google Cloud is a plus.
Experience with data engineering platforms such as Databricks will be advantageous.
Prior academic or personal projects involving LLMs, RAG systems, or advanced data analytics are highly valued.

What You’ll Gain:

This internship provides a unique opportunity to work on cutting-edge AI technologies, including LLMs and RAG systems. You will gain hands-on experience in data science workflows, learn to build advanced AI-enabled applications, and enhance your skills in data analytics, setting a strong foundation for your future career in AI and data science.

About ActiveFence

ActiveFence is the leading tool stack for Trust & Safety teams worldwide. By relying on ActiveFence’s end-to-end solution, Trust & Safety teams of all sizes can keep users safe from the widest spectrum of online harms, unwanted content, and malicious behavior, including child safety, disinformation, fraud, hate speech, terror, nudity, and more.

Using cutting-edge AI and a team of world-class subject-matter experts to continuously collect, analyze, and contextualize data, ActiveFence ensures that in an ever-changing world, customers are always two steps ahead of bad actors. As a result, Trust & Safety teams can be proactive and provide maximum protection to users across a multitude of abuse areas in 70+ languages.

Backed by leading Silicon Valley investors such as CRV and Norwest, ActiveFence has raised $100M to date, employs 300 people worldwide, and has contributed to the online safety of billions of users across the globe.
