What You'll Do:
- Work closely with the product team to fetch real-time data and design complex scraping flows to extract information from multiple sources.
- Research new data sources independently and document scraping methods, infrastructure requirements, and scaling and monitoring strategies.
- Acquire, clean, standardize, transform, structure, and store data.
- Develop modules to extract data from documents and identify entities and relationships.
- Perform exploratory analysis on datasets to identify potential insights.
- Help other team members with optimizing data models and analytics.
- Maintain data integrity and consistency across multiple databases and applications.
What Makes You A Great Fit:
- Expertise in web crawling and scraping (e.g., Scrapy, Selenium, BS4).
- Knowledge of working with Page Models, JS Rendering, Pop-Ups, Tabs, IP Proxies, and Captchas.
- Knowledge of SQL and NoSQL databases (e.g., PostgreSQL, MongoDB/DynamoDB, Neo4j).
- Knowledge of API development using frameworks like Flask.
- Knowledge of machine learning libraries/frameworks is essential.
- Experience extracting, cleaning, and structuring data from unstructured or semi-structured sources such as PDFs, text files, and log files.
- Proficiency in Python is a must.