List full lines based on single field much faster than grep


By : Tahir.T
Date : July 29 2020, 08:00 PM
The fgrep solution I proposed above will not make a big difference. It is better to use the join tool, provided it is acceptable to sort the files position.txt and build37.txt.
code :
join -1 1 -2 2 <(sort -k1,1 position.txt) <(sort -k2,2 build37.txt) | awk '{print $2, $1, $3, $4}'
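
As a rough illustration of the join approach, here is a small sketch run on made-up data. The file contents are assumptions (the real column layouts are not shown in the answer), and temporary sorted files stand in for the process substitution so that the snippet also runs in a plain POSIX shell.

```shell
#!/bin/sh
# Hypothetical sample data: the real column layouts are not shown above.
tmp=$(mktemp -d)
printf '%s\n' 300 100 > "$tmp/position.txt"
printf '%s\n' 'rs1 100 A G' 'rs2 200 C T' 'rs3 300 G A' > "$tmp/build37.txt"

# join needs both inputs sorted on the join field, in the same collation order.
LC_ALL=C sort -k1,1 "$tmp/position.txt" > "$tmp/position.sorted"
LC_ALL=C sort -k2,2 "$tmp/build37.txt" > "$tmp/build37.sorted"

# join prints the key first, so awk puts the columns back in their original order.
result=$(LC_ALL=C join -1 1 -2 2 "$tmp/position.sorted" "$tmp/build37.sorted" |
  awk '{print $2, $1, $3, $4}')
echo "$result"
rm -rf "$tmp"
```

Only the build37.txt lines whose second field appears in position.txt are printed, here the rs1 and rs3 lines.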



How to grep a variable which stores a full text file and print matching lines


By : Alexis
Date : March 29 2020, 07:55 AM
I have been trying to run code in which a variable, $logs, holds all of my Linux logs. Now I want to grep that variable for a pattern and print every whole line that contains the pattern. Try this:
code :
foreach (grep(/$search/, split(/\n/, $logs))) {
    print $_."\n";
}

how to grep lines according to 6th field contents


By : user3704119
Date : March 29 2020, 07:55 AM
I would like to know how to filter lines by memory usage: I want to find only the PIDs whose RES memory usage exceeds 5GB. The awk command below keeps the lines whose 6th field contains "g" and whose numeric value is greater than 5.
code :
$ awk '$6~/g/ && ($6+0)>5' file
12654 user    20   0 11.2g 6.1g  10m S 104.8 13.0 784:41.07 MATLAB
3088 user    20   0 17.4g  12g  11m S 101.0 26.3   2043:48 MATLAB
12909 user    20   0 12.1g 5.5g  28m S 99.1 11.8 796:01.45 MATLAB
13154 user    20   0 10.1g 5.6g  10m S 99.1 11.8 797:07.14 MATLAB
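
The one-liner above only looks at values that carry a "g" suffix. A possible variant, sketched under the assumption that the 6th field may also end in "m" or "t", and that a value with no suffix is in KiB as in top's default display, converts everything to GB first. The function name filter_res and the two extra input lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PID (field 1) of every line whose RES value
# (field 6) exceeds the given limit in GB.
filter_res() {
  awk -v limit="$1" '
    {
      v = $6 + 0                               # numeric part of the value
      u = tolower(substr($6, length($6)))      # last character = unit suffix
      if (u == "t") v = v * 1024
      else if (u == "m") v = v / 1024
      else if (u != "g") v = v / (1024 * 1024) # assumption: no suffix means KiB
      if (v > limit) print $1
    }'
}

pids=$(printf '%s\n' \
  '12654 user 20 0 11.2g 6.1g 10m S 104.8 13.0 784:41.07 MATLAB' \
  '4242 user 20 0 900m 512m 10m S 1.0 0.1 0:01.00 bash' \
  '3088 user 20 0 17.4g 12g 11m S 101.0 26.3 2043:48 MATLAB' \
  '5151 user 20 0 9000 4096 10m S 0.5 0.0 0:00.10 sh' | filter_res 5)
echo "$pids"
```

With this input only PIDs 12654 and 3088 are printed; the 512m and 4096 entries fall well below the 5GB limit.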

How to run grep in parallel on single lines from a list


By : Zhuo Chen
Date : March 29 2020, 07:55 AM
This was a little trickier than I first thought, since matched lines need to go to separate files. Please post performance figures if you get the chance. The solution can also be used in parallel: the species list file and/or the FASTA file can be chunked and fed to parallel runs of the script.
It takes about 1 minute on an Intel Xeon E5 to check a 6GB fake data file for 10,000 species. Increasing the species list to 100,000, even in chunks of 10,000, was problematic: I ran into disk issues with that many files being created and appended to in one directory. The problems began when the species list crossed 50,000; this number will be different on other systems. I modified the script to create 100 subdirectories and limited each directory to 1000 files. This worked well, and all 100,000 files were generated without having to chunk the species list or the 6GB data file.
code :
specieslist=$1
nspecies=$(wc -l < "$specieslist")
echo -e "grep $nspecies species from $specieslist\n"
# no file argument is given to grep, so the FASTA data is read from standard input
grep -A1 -F -f "$specieslist" |
awk '
# skip context marker
/^--$/{next}
# process pair of lines
# first line is matching species header line
# species is semicolon-delimited field 7 of first line
# second line is sequence - both lines are written to a file with sanitized species name
{
  split($0, flds, ";")
  species=flds[7]
  filekey=gensub(/\W/,".","g",species)
  file="fastaout." filekey
  if(!(filekey in outfiles))  {
    outfiles[filekey]=file
    printf("species \"%s\" outfile \"%s\" first match line %d: \"%s\"\n", species, file, NR, $0)
    print >file
  }
  getline; print >>file
  # close may be needed on systems where awk cannot juggle too many open files
  close(file)
}
'
outfiles=(fastaout.*)
noutfiles=${#outfiles[*]}
echo -e "\ncreated $noutfiles fastaout.* files"
head -5 fastaout*
$ head out.*
==> out.Brachyspira_innocens <==
brachyspira_innocens.1:-UC-U-G--G-U---------------------------
brachyspira_innocens.2:-UC-U-G--G-U---------------------------

==> out.Methanohalophilus_portucalensis_FDF-1 <==
methanohalophilus_portucalensis_fdf-1:-UC-U-G--G-U---------------------------

==> out.Pucciniomycotina <==
pucciniomycotina:-UC-U-G--G-U---------------------------
Species list:
Allobacillus_halotolerans
Allochromatium_truperi
Allochromatium_vinosum
Methanohalophilus_portucalensis_FDF-1
Brachyspira_innocens
Pucciniomycotina
Fake FASTA data file:
>HP451749.6.1794_Eukaryota;Opisthokonta;Nucletmycea;Fungi;Dikarya;Basidiomycota;Pucciniomycotina;Pucciniomycetes;Pucciniales;Pucciniaceae;Puccinia;Puccinia_triticina;.............................................................................
pucciniomycotina:-UC-U-G--G-U---------------------------
>EF164983.1.1433_Bacteria;Spirochaetae;Spirochaetes;Spirochaetales;Brachyspiraceae;Brachyspira;Brachyspira_innocens
brachyspira_innocens.1:-UC-U-G--G-U---------------------------
>X96499.1.1810_Eukaryota;Archaeplastida;Chloroplastida;Charophyta;Phragmoplastophyta;Streptophyta;Embryophyta;Marchantiophyta;Jungermanniales;Calypogeia;Plagiochila_adiantoides
plagiochila_adiantoides:-UC-U-G--G-U---------------------------
>AB034906.1.1763_Eukaryota;Opisthokonta;Nucletmycea;Fungi;Dikarya;Ascomycota;Saccharomycotina;Saccharomycetes;Saccharomycetales;Saccharomycetaceae;Citeromyces;Citeromyces_siamensis
citeromyces_siamensis:-UC-U-G--G-U---------------------------
>AY290717.1.1208_Archaea;Euryarchaeota;Methanomicrobia;Methanosarcinales;Methanosarcinaceae;Methanohalophilus;Methanohalophilus_portucalensis_FDF-1
methanohalophilus_portucalensis_fdf-1:-UC-U-G--G-U---------------------------
>EF164984.1.1433_Bacteria;Spirochaetae;Spirochaetes;Spirochaetales;Brachyspiraceae;Brachyspira;Brachyspira_pulli
brachyspira_pulli:-UC-U-G--G-U---------------------------
>AY291120.1.1477_Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;Comamonadaceae;Lampropedia;Lampropedia_hyalina
lampropedia_hyalina:-UC-U-G--G-U---------------------------
>EF164987.1.1433_Bacteria;Spirochaetae;Spirochaetes;Spirochaetales;Brachyspiraceae;Brachyspira;Brachyspira_alvinipulli
brachyspira_alvinipulli:-UC-U-G--G-U---------------------------
>JQ838073.1.1461_Bacteria;Actinobacteria;Actinobacteria;Streptomycetales;Streptomycetaceae;Streptomyces;Streptomyces_sp._QLS01
streptomyces_sp._qls01:-UC-U-G--G-U---------------------------
>EF164989.1.1433_Bacteria;Spirochaetae;Spirochaetes;Spirochaetales;Brachyspiraceae;Brachyspira;Brachyspira_alvinipulli
brachyspira_alvinipulli:-UC-U-G--G-U---------------------------
>JQ838076.1.1460_Bacteria;Actinobacteria;Actinobacteria;Streptomycetales;Streptomycetaceae;Streptomyces;Streptomyces_sp._QLS04
streptomyces_sp._qls04:-UC-U-G--G-U---------------------------
>AB035584.1.1789_Eukaryota;Opisthokonta;Nucletmycea;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Tremellomycetes;Tremellales;Trichosporonaceae;Trichosporon;Trichosporon_debeurmannianum
trichosporon_debeurmannianum:-UC-U-G--G-U---------------------------
>JQ838080.1.1457_Bacteria;Actinobacteria;Actinobacteria;Streptomycetales;Streptomycetaceae;Streptomyces;Streptomyces_sp._QLS11
streptomyces_sp._qls11:-UC-U-G--G-U---------------------------
>EF165015.1.1527_Bacteria;Firmicutes;Clostridia;Clostridiales;Family_XI;Tepidimicrobium;Clostridium_sp._PML3-1
clostridium_sp._pml3-1:-UC-U-G--G-U---------------------------
>U85867.1.1424_Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae;Marinobacter;Marinobacter_sp.
Marinobacter_sp.:-UC-U-G--G-U---------------------------
>EF165044.1.1398_Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Methylobacteriaceae;Methylobacterium;Methylobacterium_sp._CBMB38
methylobacterium_sp._cbmb38:-UC-U-G--G-U---------------------------
>U85870.1.1458_Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;Pseudomonas_sp.
pseudomonas_sp.:-UC-U-G--G-U---------------------------
>EF165046.1.1380_Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacteriales;Enterobacteriaceae;Pantoea;Pantoea_sp._CBMB55
pantoea_sp._cbmb55:-UC-U-G--G-U---------------------------
>EF164983.1.1433_Bacteria;Spirochaetae;Spirochaetes;Spirochaetales;Brachyspiraceae;Brachyspira;Brachyspira_innocens
brachyspira_innocens.2:-UC-U-G--G-U---------------------------
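
The subdirectory layout mentioned in the answer is not part of the script shown. One way it might look is sketched below: each new species gets a file in a numbered subdirectory, and a new subdirectory is started once the current one holds the allowed number of files. The function name bucket_files, the directory naming, and the simplified input (one species name per line rather than FASTA pairs) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: spread per-species output files over subdirectories,
# with at most "perdir" files in each one.
bucket_files() {
  outdir=$1
  perdir=$2
  awk -v outdir="$outdir" -v perdir="$perdir" '
    {
      key = $0
      gsub(/[^A-Za-z0-9_]/, ".", key)          # sanitize the species name
      if (!(key in path)) {
        # n counts the distinct species seen so far; it picks the subdirectory
        sub_dir = sprintf("%s/%03d", outdir, int(n / perdir))
        if (n % perdir == 0) system("mkdir -p \"" sub_dir "\"")
        path[key] = sub_dir "/fastaout." key
        n++
      }
      print >> path[key]
      close(path[key])                         # keep the number of open files low
    }'
}

tmp=$(mktemp -d)
printf '%s\n' A_one B_two A_one C_three | bucket_files "$tmp" 2
count=$(find "$tmp" -type f | wc -l | tr -d ' ')
a_lines=$(wc -l < "$tmp/000/fastaout.A_one" | tr -d ' ')
third=$(ls "$tmp/001")
echo "$count files, $a_lines lines for A_one, second directory holds: $third"
rm -rf "$tmp"
```

With a limit of 2 files per directory, the first two species land in 000 and the third in 001. Closing each file after every write trades some speed for a small number of open file handles.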

grep those matching lines which contain single quotes


By : Antony
Date : March 29 2020, 07:55 AM
Without -E, grep uses POSIX basic regular expressions, in which an unescaped + matches a literal plus sign. You may either switch to the POSIX ERE syntax by adding the -E option, or change the + quantified subpatterns to their equivalents with * (e.g. [0-9]+ => [0-9][0-9]*), or escape the + as \+ in GNU grep.
code :
s="{'ID': 'X.id: 4243.4, Y.id 534534', 'text': 'text - text'}"
echo "$s" | grep -E "{'ID': 'X\.id: [0-9]+\.[0-9], Y\.id [0-9]+', 'text': 'text - text'}"
echo "$s" | grep "{'ID': 'X\.id: [0-9][0-9]*\.[0-9], Y\.id [0-9][0-9]*', 'text': 'text - text'}"
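
The difference can be checked on a shorter string. The sketch below uses a made-up input and counts matching lines with grep -c for three patterns: + in basic syntax, + with -E, and the [0-9][0-9]* rewrite.

```shell
#!/bin/sh
s='id 4243'
# grep -c prints the number of matching lines; "|| true" hides the non-zero
# exit status that grep returns when nothing matches.
bre=$(echo "$s" | grep -c 'id [0-9]+' || true)         # basic syntax: + is a literal character
ere=$(echo "$s" | grep -Ec 'id [0-9]+' || true)        # extended syntax: + means "one or more"
star=$(echo "$s" | grep -c 'id [0-9][0-9]*' || true)   # basic syntax equivalent of [0-9]+
echo "BRE with +: $bre, ERE with +: $ere, BRE with [0-9][0-9]*: $star"
```

Only the first pattern fails to match, because it looks for a digit followed by an actual plus sign.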

Grouping a List with XSLT 1.0 on each single Day based on a Date-Field within the List


By : Manas Mandlekar
Date : March 29 2020, 07:55 AM
Since you are using XSLT 1.0, you will have to use a technique called Muenchian grouping.
For this requirement, define a composite key made up of the date parts (the first 10 characters) of the TRMBEGIN_Q and TRMENDE_Q elements.
code :
<xsl:key name="keyDate" match="SRO_IF_POLYTERMINE_PRO_PATIENT" use="concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10))" />
<xsl:for-each select="SRO_IF_POLYTERMINE_PRO_PATIENT[generate-id() = generate-id(key('keyDate',concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))[1])]">
    <xsl:sort select="substring(key('keyDate', concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))/TRMBEGIN_Q, 1, 10)" />
    ....
</xsl:for-each>
<xsl:for-each select="key('keyDate', concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))">
    <xsl:sort select="substring(TRMBEGIN_Q, 12, 5)" />
    ....
</xsl:for-each>
Complete stylesheet:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />
    <xsl:strip-space elements="*" />

    <xsl:key name="keyDate" match="SRO_IF_POLYTERMINE_PRO_PATIENT" use="concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10))" />

    <xsl:template match="XML_DATA">
        <table border="0" cellspacing="0" cellpadding="5">
            <xsl:for-each select="SRO_IF_POLYTERMINE_PRO_PATIENT[generate-id() = generate-id(key('keyDate',concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))[1])]">
                <xsl:sort select="substring(key('keyDate', concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))/TRMBEGIN_Q, 1, 10)" />
                <tr>
                    <td colspan="2">
                        <b><xsl:value-of select="substring(key('keyDate', concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))/TRMBEGIN_Q, 1, 10)" /></b>
                    </td>
                </tr>
                <xsl:for-each select="key('keyDate', concat(substring(TRMBEGIN_Q, 1, 10), '|', substring(TRMENDE_Q, 1, 10)))">
                    <xsl:sort select="substring(TRMBEGIN_Q, 12, 5)" />
                    <tr>
                        <td><xsl:value-of select="concat(substring(TRMBEGIN_Q, 12, 5), ' - ', substring(TRMENDE_Q, 12, 5))" /></td>
                        <td><xsl:value-of select="XRO_PATNAME" /></td>
                    </tr>
                </xsl:for-each>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>