Data Analyst Intern

ActiveFence

IT, Data Science
Vietnam
Posted on Mar 4, 2025

  • Intelligence
  • Vietnam
  • Intern
  • Part-time

Description

Data Preparation and Exploration:

  • Assist in cleaning, transforming, and preparing structured and unstructured datasets for analysis using Python and SQL.
  • Perform exploratory data analysis (EDA) to uncover trends, patterns, and anomalies in datasets, contributing to actionable insights.

Data Feed Support:

  • Collaborate with senior analysts to design and implement data pipelines for external data sources.
  • Develop basic automation scripts for data collection and ingestion, using Python libraries such as Pandas.

Leveraging LLMs and Building RAG Systems:

  • Support the development of Retrieval-Augmented Generation (RAG) systems by integrating large language models (LLMs) with knowledge bases.
  • Assist in curating and pre-processing data to feed into vector databases, ensuring efficient retrieval for LLM queries.
  • Work with senior team members to fine-tune LLMs on domain-specific datasets for enhanced performance.
  • Contribute to evaluating and optimizing RAG workflows to ensure accurate and relevant output from LLM systems.

Data Science Projects:

  • Participate in projects that use LLMs or machine learning models for specific tasks, such as classification, regression, or NLP.
  • Assist in designing workflows to preprocess and structure input data for model training and evaluation.
  • Collaborate with team members to iterate on model improvements, including hyperparameter tuning and feature engineering.
  • Support the integration of machine learning or LLM outputs into real-world applications, ensuring scalability and usability.

Visualization and Reporting:

  • Create clear, concise visualizations to communicate data-driven insights using tools like Matplotlib, Seaborn, or Tableau.
  • Prepare summaries and presentations to share findings with the team and stakeholders.

Collaboration and Communication:

  • Work in a diverse, multicultural environment, learning to adapt to various workflows and perspectives.
  • Maintain open communication with mentors and peers to ensure alignment on project objectives and deliverables.

Requirements

Technical Skills:

  • Proficiency in Python: Experience with data manipulation libraries such as Pandas and NumPy, plus basic familiarity with Scikit-learn.
  • Strong foundational knowledge of SQL: Ability to perform data extraction, manipulation, and simple database management.
  • Familiarity with LLMs and their applications, such as OpenAI GPT or similar models.
  • Basic understanding of vector databases (e.g., Pinecone, Weaviate) and their integration with LLMs.
  • Experience with EDA techniques and statistical analysis.
  • Familiarity with data visualization tools and techniques.

Soft Skills:

  • Willingness to Learn: Open to exploring new tools, techniques, and methodologies in the field of data science and AI.
  • Problem-Solving Mindset: Ability to approach challenges methodically and think critically about data-related problems.
  • Effective Communicator: Comfortable explaining technical findings to a non-technical audience.

Advantages:

  • Familiarity with cloud platforms like AWS or Google Cloud is a plus.
  • Experience with data engineering platforms such as Databricks will be advantageous.
  • Prior academic or personal projects involving LLMs, RAG systems, or advanced data analytics are highly valued.

What You’ll Gain:

This internship provides a unique opportunity to work on cutting-edge AI technologies, including LLMs and RAG systems. You will gain hands-on experience in data science workflows, learn to build advanced AI-enabled applications, and enhance your skills in data analytics, setting a strong foundation for your future career in AI and data science.

About ActiveFence

ActiveFence is the leading tool stack for Trust & Safety teams worldwide. By relying on ActiveFence’s end-to-end solution, Trust & Safety teams of all sizes can keep users safe from the widest spectrum of online harms, unwanted content, and malicious behavior, including child safety, disinformation, fraud, hate speech, terror, nudity, and more.

Using cutting-edge AI and a team of world-class subject-matter experts to continuously collect, analyze, and contextualize data, ActiveFence ensures that in an ever-changing world, customers are always two steps ahead of bad actors. As a result, Trust & Safety teams can be proactive and provide maximum protection to users across a multitude of abuse areas in 70+ languages.

Backed by leading Silicon Valley investors such as CRV and Norwest, ActiveFence has raised $100M to date, employs 300 people worldwide, and has contributed to the online safety of billions of users across the globe.