(BASH) Say I have an array of strings holding about 100 numbers, each 3 to 6 digits long, that represent some sort of ID. The array contains some duplicate numbers. What I am trying to do is show only the top 12 numbers in the form (ID, occurrence count), ranked by occurrence count. So the first number would not be the ID with the most digits but the one with the most occurrences. Is there a way to do this with AWK, perhaps in a BEGIN block? Or how could I approach this problem?
Solution:
For example, for the array
declare -a arr=("ABS" "ABC" "SAF" "ABS" "ABC" "SAF" "ABS" "ABC" "SAF" "SAF")
the equivalent space-separated string is
str="ABS ABC SAF ABS ABC SAF ABS ABC SAF SAF"
Command: echo $str | tr '[:space:]' '[\n*]' | sort | uniq -c | sort -bnr | head -12
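where:
tr: replaces the spaces in the string with newlines, so each ID sits on its own line
sort: puts identical IDs next to each other
uniq -c: collapses each run of identical IDs into one line prefixed with its count
sort -bnr: orders those lines numerically by count, highest first
head -12: keeps only the first 12 lines

Complete bash code: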
str=""
## declare an array variable
declare -a arr=("ABS" "ABC" "SAF" "ABS" "ABC" "SAD" "ABS" "ABC" "SAF" "SAF" "ABS" "ABC" "SAG" "ABD" "DBC" "CAF" "CAF" "ABD" "SDF" "SSF" "ABS" "ABC" "SAF" "ABS" "ABC" "SRF" "ACS" "ABD" "SRF" "SEF")
## loop through the above array and convert to string str
for i in "${arr[@]}"
do
str="${str} $i"
done
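## Find the top 12 IDs with the highest occurrence count and print them as ID,count
## $str is left unquoted on purpose: word splitting drops the leading space added by the loop
echo $str | tr '[:space:]' '[\n*]' | sort | uniq -c | sort -bnr | head -12 | awk '{print $2","$1}'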
Output:
ABS,6
ABC,6
SAF,4
ABD,3
SRF,2
CAF,2
SSF,1
SEF,1
SDF,1
SAG,1
SAD,1
DBC,1
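
Since the question asks about AWK specifically, here is a rough sketch of how the counting step could be moved into awk instead of sort | uniq -c. It also feeds the array to the pipeline directly with printf, so the loop that builds str is not needed. The function name top_ids and its argument order are made up for this illustration and are not part of the answer above. Ties are broken by ID in descending order, which is an assumption chosen to mirror the ordering in the output shown above.

```bash
#!/usr/bin/env bash

# top_ids N ID... : print the N most frequent IDs as "ID,count" lines.
# (top_ids is a hypothetical helper name used only for this sketch.)
top_ids() {
  local n=$1
  shift
  # No IDs given: print nothing instead of feeding awk a single empty line.
  (( $# )) || return 0
  # printf writes one ID per line; awk tallies each line in an associative
  # array and prints "ID,count" for every distinct ID once input ends.
  printf '%s\n' "$@" |
    awk '{ count[$0]++ } END { for (id in count) print id "," count[id] }' |
    # Sort on the count (field 2) numerically, highest first;
    # break ties on the ID (field 1) in descending order.
    sort -t, -k2,2nr -k1,1r |
    head -n "$n"
}

arr=("ABS" "ABC" "SAF" "ABS" "ABC" "SAF" "ABS" "ABC" "SAF" "SAF")
top_ids 12 "${arr[@]}"
# prints:
# SAF,4
# ABS,3
# ABC,3
```

The counting happens in the main awk rule and the printing in the END block; a BEGIN block runs before any input is read, so it is not the place for the tally. awk's for (id in count) loop visits keys in no guaranteed order, which is why the sort step is still required afterwards.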
PS: Let me know if you have any doubts.