The Synthesio R&D department is focused on data provisioning, enrichment and exploitation.
Synthesio crawls numerous public consumer data footprints including social, reviews & ratings, survey, search and press data, representing an average of 70M new documents per day. Each document is then analysed and enriched using custom NLP and image recognition models developed by our data scientists. All this data is stored in databases and can be accessed by our customers via our dashboarding solution or via our APIs.
Our stack: Go, Python, JS, MySQL, Elasticsearch, Cassandra (ScyllaDB), Kafka, GitLab, Docker, Ansible.
Our platform follows a microservices architecture, backed by large databases and powerful GPU servers for AI workloads (training and inference).
Our infrastructure is mainly composed of rented bare-metal servers running Debian. A large part of it is dedicated to storage, including:
1 PB+ of Elasticsearch, 750 TB+ of MySQL clusters, 150 TB+ of ScyllaDB, and 250 TB+ of Kafka.
Everything is automated with Ansible, from OS installation to application and monitoring deployment.
Observability relies on Prometheus for metrics and alerting (more than 8M active time series), syslog with Elasticsearch/Kibana for centralized logs, Jaeger for tracing, and Grafana for data visualization.
- Join a strong SRE team of 5 engineers (and more to come).
- Ensure the efficiency and reliability of the processing, enrichment and storage of 70M new documents per day, in close collaboration with the whole R&D team (Data Engineers, Data Scientists, Frontend Engineers …).
- Collaborate closely with R&D engineers and feature teams on their projects, from software architecture design to CI/CD processes and performance measurements, always balancing these against production availability.
- Take part in a weekly on-call rotation shared by the whole SRE team (less than 1 week per month per person) to meet the SLAs for our clients. On-call is fairly compensated, and special attention is paid to its impact on engineers' health.
- Ensure the scalability, high availability and backups of the platform and its data stores, which face large-scale data querying challenges (1 PB+ of Elasticsearch, 750 TB+ of MySQL clusters, 150 TB+ of ScyllaDB).
- Challenge existing practices to improve our day-to-day team experience.
- Take part in the definition of the SRE roadmap and architectures.
- Be an advocate of SRE values across the whole company.