Forum: >>> Magnum BBS <<<

Unique In Column

From Mike Sanders@21:1/5 to All on Mon Oct 2 07:10:04 2023

# verifies an item is unique to the 2nd column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+" }

{ Field2Values[tolower($2)] = 1 }

END { if (uniqueItem("apple", FILENAME) != 0) exit 1 }

function uniqueItem(field2, file) {

    lowerField2 = tolower(field2)

    if (lowerField2 in Field2Values) {
        print "Error: '" field2 "' was found in 2nd column of " file
        return 1
    } else print "Item: '" field2 "' is unique to 2nd column of " file

    return 0
}

# eof

--
:wq
Mike Sanders

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Mike Sanders@21:1/5 to Mike Sanders on Mon Oct 2 07:37:03 2023

Mike Sanders <porkchop@invalid.foo> wrote:

# verifies an item is unique to the 2nd column

quick update, why hard-code a field number anyhow?

# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

# eof
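
A side note on the sample data: the commented example appears to start with a header line ("name, alias"). If the real file does too, the script above would record "alias" as a column value. A minimal sketch of skipping it, assuming the header is always line 1; the wrapper name in_column is hypothetical and not part of the original post:

```shell
#!/bin/sh
# in_column ITEM COL FILE  (hypothetical helper, not from the thread)
# Exit status 0 if ITEM occurs in column COL of FILE, 1 otherwise.
# Assumption: line 1 of FILE is a header and must not count as data.
in_column() {
    awk -v item="$1" -v col="$2" '
        BEGIN { FS = ",[ \t]*|[ \t]+" }
        NR > 1 { seen[tolower($col)] = 1 }      # skip the assumed header line
        END { exit !(tolower(item) in seen) }   # 0 = present, 1 = absent
    ' "$3"
}

# Usage with a throwaway copy of the sample data.
data=$(mktemp)
printf 'name, alias\njohn, kiwi\nsuzi, apple\nsuzi, orange\n' > "$data"
if in_column apple 2 "$data"; then echo "apple: present"; fi
if in_column alias 2 "$data"; then echo "alias: present"; else echo "alias: header only"; fi
rm -f "$data"
```

Like the original script, this only tests for presence; it says nothing about how many times the value occurs.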

--
:wq
Mike Sanders

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ed Morton@21:1/5 to Mike Sanders on Sun Nov 5 09:26:30 2023

On 10/2/2023 2:37 AM, Mike Sanders wrote:

Mike Sanders <porkchop@invalid.foo> wrote:

# verifies an item is unique to the 2nd column

quick update, why hard-code a field number anyhow?

# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

# eof

That's checking whether or not a value exists, not whether or not it's
unique, so the "is unique" message it prints is wrong. If we modify it to
read the item from a variable named fruit:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

and add a second "apple" in column 2 of your CSV:

$ cat file.csv
john, kiwi
suzi, apple
suzi, orange
gwen, apple

then we can run it as:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Error: 'kiwi' was found in column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Item: 'grape' is unique to column 2 of file.csv

and you can see it's reporting that "grape" is a unique value when it's
not actually present at all.
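
One way to see why a membership test alone cannot answer the uniqueness question is to tally how often each value occurs. The following is only a sketch; the function name column_counts is made up for illustration, and the output is piped through sort because awk's for-in order is unspecified:

```shell
#!/bin/sh
# column_counts COL FILE  (hypothetical helper, not from the thread)
# Prints "COUNT VALUE" for every distinct, lower-cased value in column COL.
column_counts() {
    awk -v col="$1" '
        BEGIN { FS = ",[ \t]*|[ \t]+" }
        { count[tolower($col)]++ }                  # one tally per distinct value
        END { for (v in count) print count[v], v }
    ' "$2" | LC_ALL=C sort -k2,2                    # stable, readable ordering
}

# Usage with the four-line file.csv shown above.
data=$(mktemp)
printf 'john, kiwi\nsuzi, apple\nsuzi, orange\ngwen, apple\n' > "$data"
column_counts 2 "$data"
rm -f "$data"
```

A value is unique only when its count is exactly 1. "apple" has a count of 2, and "grape" never appears in the tally at all, so neither can be called unique.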

If we change the script to:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)]++ }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        if (FieldValues[lowerField] == 1) {
            print "Item: '" field "' is unique to column " col " of " file
        }
        else {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        }
    }
    else {
        print "Error: '" field "' was not found in column " col " of " file
    }

    return 0
}

THEN it'll report unique "fruit" values correctly as well as reporting
which are present/absent:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Item: 'kiwi' is unique to column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Error: 'grape' was not found in column 2 of file.csv
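
If a calling script needs to tell the three outcomes apart without parsing the message, one option is a distinct exit status for each. This is a sketch under assumptions, not the script above: the wrapper name check_unique, the status values 0/1/2, and the "found N times" wording are all invented for illustration.

```shell
#!/bin/sh
# check_unique ITEM COL FILE  (hypothetical wrapper, not from the thread)
# Exit status: 0 = exactly one occurrence, 1 = duplicated, 2 = absent.
check_unique() {
    awk -v item="$1" -v col="$2" '
        BEGIN { FS = ",[ \t]*|[ \t]+" }
        { count[tolower($col)]++ }              # tally every value, case-folded
        END {
            n = count[tolower(item)] + 0        # 0 when the value never appeared
            if (n == 1) {
                printf "Item: \047%s\047 is unique to column %d of %s\n", item, col, FILENAME
                exit 0
            }
            if (n > 1) {
                printf "Error: \047%s\047 was found %d times in column %d of %s\n", item, n, col, FILENAME
                exit 1
            }
            printf "Error: \047%s\047 was not found in column %d of %s\n", item, col, FILENAME
            exit 2
        }
    ' "$3"
}

# Usage with a throwaway copy of file.csv.
data=$(mktemp)
printf 'john, kiwi\nsuzi, apple\nsuzi, orange\ngwen, apple\n' > "$data"
for f in kiwi apple grape; do
    check_unique "$f" 2 "$data" || echo "(exit status $?)"
done
rm -f "$data"
```

The \047 escapes stand in for single quotes, which cannot appear literally inside a single-quoted shell string.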

Regards,

Ed.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Mike Sanders@21:1/5 to Ed Morton on Mon Nov 6 03:14:17 2023

Ed Morton <mortonspam@gmail.com> wrote:

[...]

Thanks Ed, must study your example & mull it over =)

--
:wq
Mike Sanders

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
