Data Analyst Intern
ActiveFence
- Intelligence
- Vietnam
- Intern
- Part-time
Description
Data Preparation and Exploration:
- Assist in cleaning, transforming, and preparing structured and unstructured datasets for analysis using Python and SQL.
- Perform exploratory data analysis (EDA) to uncover trends, patterns, and anomalies in datasets, contributing to actionable insights.
Data Feed Support:
- Collaborate with senior analysts to design and implement data pipelines for external data sources.
- Develop basic automation scripts for data collection and ingestion, using Python libraries such as Pandas.
Leveraging LLMs and Building RAG Systems:
- Support the development of Retrieval-Augmented Generation (RAG) systems by integrating large language models (LLMs) with knowledge bases.
- Assist in curating and preprocessing data to feed into vector databases, ensuring efficient retrieval for LLM queries.
- Work with senior team members to fine-tune LLMs on domain-specific datasets for enhanced performance.
- Contribute to evaluating and optimizing RAG workflows to ensure accurate and relevant output from LLM systems.
Data Science Projects:
- Participate in projects that leverage LLMs or machine learning models to accomplish specific tasks, such as classification, regression, or NLP-based solutions.
- Assist in designing workflows to preprocess and structure input data for model training and evaluation.
- Collaborate with team members to iterate on model improvements, including hyperparameter tuning and feature engineering.
- Support the integration of machine learning or LLM outputs into real-world applications, ensuring scalability and usability.
Visualization and Reporting:
- Create clear, concise visualizations to communicate data-driven insights using tools like Matplotlib, Seaborn, or Tableau.
- Prepare summaries and presentations to share findings with the team and stakeholders.
Collaboration and Communication:
- Work in a diverse, multicultural environment, learning to adapt to various workflows and perspectives.
- Maintain open communication with mentors and peers to ensure alignment on project objectives and deliverables.
Requirements
Technical Skills:
- Proficiency in Python: Experience with data manipulation libraries such as Pandas and NumPy, and basic familiarity with Scikit-learn.
- Strong foundational knowledge of SQL: Ability to perform data extraction, manipulation, and simple database management.
- Familiarity with LLMs (such as OpenAI GPT or similar models) and their applications.
- Basic understanding of vector databases (e.g., Pinecone, Weaviate) and their integration with LLMs.
- Experience with EDA techniques and statistical analysis.
- Familiarity with data visualization tools and techniques.
Soft Skills:
- Willingness to Learn: Open to exploring new tools, techniques, and methodologies in the field of data science and AI.
- Problem-Solving Mindset: Ability to approach challenges methodically and think critically about data-related problems.
- Effective Communicator: Comfortable explaining technical findings to a non-technical audience.
Advantages:
- Familiarity with cloud platforms like AWS or Google Cloud is a plus.
- Experience with data engineering platforms such as Databricks will be advantageous.
- Prior academic or personal projects involving LLMs, RAG systems, or advanced data analytics are highly valued.
What You’ll Gain:
This internship provides a unique opportunity to work on cutting-edge AI technologies, including LLMs and RAG systems. You will gain hands-on experience in data science workflows, learn to build advanced AI-enabled applications, and enhance your skills in data analytics, setting a strong foundation for your future career in AI and data science.
About ActiveFence
ActiveFence is the leading tool stack for Trust & Safety teams worldwide. By relying on ActiveFence’s end-to-end solution, Trust & Safety teams of all sizes can keep users safe from the widest spectrum of online harms, unwanted content, and malicious behavior, including child safety, disinformation, fraud, hate speech, terror, nudity, and more.
Using cutting-edge AI and a team of world-class subject-matter experts to continuously collect, analyze, and contextualize data, ActiveFence ensures that in an ever-changing world, customers are always two steps ahead of bad actors. As a result, Trust & Safety teams can be proactive and provide maximum protection to users across a multitude of abuse areas in 70+ languages.
Backed by leading Silicon Valley investors such as CRV and Norwest, ActiveFence has raised $100M to date, employs 300 people worldwide, and has contributed to the online safety of billions of users across the globe.