Carlos Pérez Miguel

PhD and Data Engineer

About Me

Hi, I’m Carlos Pérez-Miguel. I received my PhD in Computer Science from the University of the Basque Country, Spain, in 2015. I have spent the past 16 years developing software in different sectors, in both industry and academia. Over these years, I have been a backend developer, a PhD student, a CTO and a data engineer. I am currently working as an HPC applications specialist at the Donostia International Physics Center.

Experience

Donostia International Physics Center

HPC Support and Application Specialist

June 2024 - Present

https://dipc.ehu.eus/

Cutting-edge research facility with one of the largest HPC clusters in Spain.

HPC application specialist. Technologies: Slurm, Grafana, OpenMP, MPI.

Sparta Commodities

Senior Data Engineer

November 2021 - January 2024

https://www.spartacommodities.com/

Real-time data ingestion and processing for a Series A scale-up, an international Oil and Gas SaaS company.

Senior data engineer at an international, remote-first Oil and Gas data company. Main responsibilities:

  • Full software life cycle from conception to application and validation of tests.
  • Building and maintaining real-time data pipelines.
  • Administration of infrastructure.
  • Mentoring of new team members.
  • Tech lead of several projects:
    • New product to improve visualization of historical data.
    • Migration of a large amount of real-time data and related services from PostgreSQL to Redshift.
    • Design and development of an MVP for real-time price data ingestion and anomaly detection.

Technologies: Apache Kafka, Python, Docker, PostgreSQL, Redshift, AWS, Redis, Apache Flink, Vector.dev, Datadog, Apache Iceberg.

Safecont

Cofounder - CTO - Data Engineer

September 2015 - February 2021

https://twitter.com/safecontes

Improving the experience of SEO specialists and online marketers by using machine learning and Big Data techniques.

Design of tools using Data Mining and Big Data technologies to optimise the online visibility of our clients. Main responsibilities:

  • Full software life cycle from conception to application and validation of tests.
  • Management of research projects:
    • Duplicate content detection
    • Document classification and clustering
    • PageRank optimization
    • Time-series forecasting
    • Temporal Traveling Salesman Problem
  • Administration of several servers.
  • Training of new application users and new team members.
  • Customer support.

Technologies: Python, Spark, Scikit-learn, NumPy, Pandas, Dask, Numba, MongoDB, Mesos, Hadoop, Solr, Nutch, Google App Studio, MySQL, REST APIs, WordPress, CodeIgniter, D3.js, Highcharts, Nginx, Apache, Docker, Google Docs API, AWS, Streamlit, SQLite.

Orange-France Telecom

System Administrator

October 2007 - August 2008

Sysadmin for the biggest telecom company in France.

Administration and integration of Orange’s Address Book web applications. Main responsibilities:

  • Administration of several web servers and load balancers.
  • Validation of integration tests.
  • Partnership with several developer teams in India and France.
  • Partnership with in-house production teams.

Technologies: Apache, Tomcat, Perl, Bash, VMware, Java, Red Hat Linux.

Crédit Agricole CIB

Back Office Application Developer

May 2006 - September 2007

Back-office developer

Main developer of the application Suprema, Calyon’s Back Office application for securities borrowing and lending. Main responsibilities:

  • Full software life cycle from conception to application and validation of tests.
  • C++ development of in-house financial libraries, including enhancements and bug fixing.
  • Training of new application users and new team members.
  • Partnership with several teams including Tokyo and New York.
  • Customer support.

Technologies: C, C++, Java, Shell, SQL, Unix.

KUTXABANK

Embedded Systems Developer

October 2005 - May 2006

Main responsibilities:

  • Analysis and design of an embedded application for processing banking operations.
  • C development of in-house financial libraries.
  • Full software life cycle from conception to application and validation of tests.
  • Training of new application users.

Technologies: C, Java, Debian Linux, XML, Shell.

Education

University of the Basque Country

PhD in Computer Science

2010 - 2015

PhD thesis: "High Throughput Computing over Peer-to-peer Networks"

Advisors: J. Miguel-Alonso and A. Mendiburu

Viva: 18/06/2015

Cum Laude, International mention with a visit to the NICAL group, University of Science and Technology of China, Hefei, China (January-April 2013).

Partially funded by a Doctoral Grant from the Basque Government.

University of the Basque Country

MSc in Distributed Systems

2008 - 2010

MSc thesis: "Parallelization of Estimation of Distribution Algorithms over the Cell Broadband Engine"

Parallelization and vectorization of scientific codes over several parallel architectures.

University of the Basque Country

MSc in Computer Science

1999 - 2005

MSc thesis: "Peer-to-peer file sharing system"

Design, analysis and assessment of IT systems in general, considering the different aspects of management, organisation and direction of IT projects, maintenance of equipment and infrastructure, artificial intelligence, parallelism and mass management of information.

Nine months at the Université de Versailles-Saint-Quentin-en-Yvelines, Versailles, France (Erasmus Grant).

Publications

Journals

  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Competition-based failure-aware scheduling for High-Throughput Computing systems on peer-to-peer networks”. Cluster Computing, 18(3), 1229-1249, 2015.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Modeling the availability of Cassandra”. Journal of Parallel and Distributed Computing, 86, 29-44, 2015.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “High throughput computing over peer-to-peer networks”. Future Generation Computer Systems, 29(1), 352-360, 2013.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Porting Estimation of Distribution Algorithms to the Cell Broadband Engine”. Parallel Computing, 36(10-11), 618-634, 2010.

Conferences

  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Sistemas HTC sobre redes P2P”. XXI Jornadas de Paralelismo, CEDI, Valencia, Spain, 2010.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Porting Estimation of Distribution Algorithms to the Cell Broadband Engine”. Best Paper Award. Second International Workshop on Parallel Architectures and Bioinspired Algorithms (WPABA), Raleigh, NC, USA, 2009.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Evaluating the cell broadband engine as a platform to run estimation of distribution algorithms”. GECCO, Montreal, Canada, 2009.
  • Pérez-Miguel, C., Miguel-Alonso, J. and Mendiburu, A. “Evaluation of the Cell Broadband Engine running Continuous Estimation of Distribution Algorithms”. XX Jornadas de Paralelismo, A Coruña, Spain, 2009.

A Little More About Me

Alongside my interests in parallel and distributed systems, some of my other interests and hobbies are:

  • Traveling
  • Reading
  • Bikes
  • Bread baking