
Cell Lines CoCoPUTs: A Database of Codon and Codon-pair Usage Frequencies in Cell Lines

  • Nigam Padhiar
  • Nathan Clement
  • Upendra Katneni
  • Patrick Dopler
  • Tigran Ghazanchyan
  • Luis Santana-Quintero
  • Anton Golikov
  • Michael DiCuccio
  • Haim Bar
  • Anton A Komar
  • Chava Kimchi-Sarfaty
  • Division of Plasma Protein Therapeutics, Food and Drug Administration
  • Independent Researcher
  • University of Connecticut

Research output: Contribution to journal › Article › peer-review

Abstract

Cell lines are essential tools for studying biological mechanisms, advancing pre-clinical drug discovery and supporting biologics production. To further research in these fields, we introduce the Cell Lines CoCoPUTs (Codon and Codon Pair Usage Tables, https://dnahive.fda.gov/hivecuts/cell-lines/), a comprehensive resource of transcriptomic-weighted codon and codon-pair usages for 1866 unique cell lines derived from two cancer databases, Catalogue of Somatic Mutations in Cancer (COSMIC) and Cancer Cell Line Encyclopedia (CCLE), and the Human Protein Atlas (HPA) database. Despite differences in the number of cell lines in each database and in the platforms used for the analysis (microarray vs RNA-Seq), codon usage distributions were broadly similar for all overlapping cell lines across the three databases. Application of unsupervised machine learning approaches, including hierarchical and spectral clustering, to the analysis of 1355 cell lines of non-metastatic origin yielded more distinct clusters based on codon-pair usage than on codon usage. However, distance-based comparisons indicated that codon usage often yields equal or smaller within-group distances than codon-pair usage and that cell lines are, on average, closer to their site of origin than to their disease phenotype.
Original language: English
Article number: 169718
Journal: Journal of Molecular Biology
DOIs
State: Accepted/In press - Jan 1 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Cell Lines CoCoPUTs
  • codon usage
  • codon-pair usage
  • dinucleotide usage
  • transcriptomic weighted cell line data
