Using characteristic features of ECM proteins and a computational pipeline combining interrogation of protein and gene databases, we previously defined the matrisome as the ensemble of ECM and ECM-associated proteins [[16], [17], [18]]. In mammals, the matrisome represents 4% of the genome, or approximately 1000 genes. We further classified these genes into core matrisome components, consisting of collagens, proteoglycans, and glycoproteins (including laminins, fibronectins, etc.), and matrisome-associated components, including proteins that could incorporate into ECMs or are co-purified with ECM proteins. The matrisome-associated components are further subdivided into ECM-affiliated proteins (e.g., C-type lectins, galectins, annexins, semaphorins, syndecans, and glypicans), ECM regulators (e.g., MMPs, ADAMs, and crosslinking enzymes), and secreted factors (e.g., TGF-β, BMPs, FGFs, Wnt proteins, and chemokines) [[16], [17], [18]]. More recently, we employed a computational approach to predict the in-silico matrisome of the zebrafish [19]. Defining the matrisome of organisms has been instrumental in annotating transcriptomic and proteomic data and has permitted the identification of ECM signatures of biological processes [20] and of human diseases including cancers and fibrosis [[21], [22], [23], [24], [25]].
Here, we devised a novel bioinformatic pipeline combining gene orthology and de-novo identification to define the C. elegans matrisome. We report the identification of 719 genes potentially encoding ECM and ECM-associated proteins, including 181 collagens, of which 173 are predicted to be components of the cuticle. Based on their collagen-domain organization, we propose to group these cuticular collagens into five novel clusters and to divide them further into sub-clusters. In addition, we demonstrate that the newly defined C. elegans matrisome can be used to annotate data from high-throughput RNAi screens as well as transcriptomic and proteomic datasets, and can assist with the identification of ECM genes or signatures relevant to various physiological and pathological processes.
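As an illustration of how a matrisome list can be used to annotate a dataset, the following is a minimal Python sketch, not the pipeline used in this study. It assumes the matrisome is available as a lookup table from gene identifier to division and category, using the divisions and categories described above. The gene identifiers (`gene-001`, etc.), the table `EXAMPLE_MATRISOME`, and the functions `annotate` and `summarize` are hypothetical placeholders introduced only for this example.

```python
from collections import Counter

# Divisions and categories as named in the text.
MATRISOME_CATEGORIES = {
    "Core matrisome": ["Collagens", "Proteoglycans", "Glycoproteins"],
    "Matrisome-associated": [
        "ECM-affiliated proteins",
        "ECM regulators",
        "Secreted factors",
    ],
}

# Hypothetical lookup table: placeholder gene IDs, not real annotations.
EXAMPLE_MATRISOME = {
    "gene-001": ("Core matrisome", "Collagens"),
    "gene-002": ("Core matrisome", "Proteoglycans"),
    "gene-003": ("Matrisome-associated", "ECM regulators"),
}


def annotate(genes, matrisome):
    """Return (gene, division, category) per gene.

    Genes absent from the matrisome table get (gene, None, None).
    """
    return [(g, *matrisome.get(g, (None, None))) for g in genes]


def summarize(annotated):
    """Count annotated genes per matrisome category, ignoring non-matrisome genes."""
    return Counter(cat for _, _, cat in annotated if cat is not None)


if __name__ == "__main__":
    # Hypothetical hit list, e.g. from a screen or a differential-expression analysis.
    hits = ["gene-001", "gene-999", "gene-003"]
    rows = annotate(hits, EXAMPLE_MATRISOME)
    for row in rows:
        print(row)
    print(summarize(rows))
```

In practice the lookup table would be loaded from the published matrisome list rather than written inline, and identifiers would need to be harmonized between the dataset and the list before matching.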
The workflow and steps for defining the C. elegans matrisome are outlined in Fig. 1.
To better classify and study the 185 collagen-domain-containing proteins in C. elegans, we propose a novel nomenclature based on their collagen-domain organization and the presence of other characteristic protein domains (e.g., C-type lectin; C4, the collagen IV NC1 domain; TSP; FNIII), similar to the mammalian collagen classification [46]. To do so, we clustered the 181 collagens and the 4 other collagen-domain-containing proteins into four major groups: (1) the vertebrate-like collagens (similar to mammalian type IV, XVIII, XXV), (2) the collagen-domain-containing proteins with mammalian orthologues (collectins and gliomedin), (3) the non-cuticular collagens with no clear orthology to mammalian collagens, and (4) the cuticular collagens. This last group is the largest, with 173 collagens, and we propose to subdivide it further into five main clusters (A to E). For detailed comparison and to facilitate the dissemination of this proposed classification, we constructed the C. elegans collagen database, CeColDB, available at: http://CeColDB.permalink.cc/.