This SO question was interesting and attracted several different approaches. Here's a sample to illustrate the problem:

```
$ cat ip.txt
caller_number=034082394234324, clear_number=33335345435,  direction=1,
caller_number=83479234234,     clear_number=34836424733, direction=2,
caller_number=83479234234,     clear_number=64237384533, direction=2,

$ cat list.txt
642
3333
534234235

$ cat op.txt
caller_number=83479234234,     clear_number=64237384533, direction=2,
```

Any entry in `list.txt` has to match immediately after `clear_number=`, and the input line must also contain `direction=2,`. In the sample above, the first line matches `3333` but fails the second criterion. The second line fails even though it contains `642`, since that is not immediately after `clear_number=`. The `list.txt` file can have 10K-50K lines and `ip.txt` is around 10GB.

Here's a slightly modified answer based on existing solutions on that thread. Since an entry from `list.txt` only has to match the start of the value after `clear_number=`, a single direct lookup against the keys saved in `arr` is not possible. Each key is saved with a leading `=` so that `index()` can match it only immediately after `clear_number=`. This solution loops over all the keys for every input line that satisfies the `direction=2,` criterion, and stops at the first match.

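```
# first file (list.txt): save each key with a leading "=" so that it can
# match only immediately after "clear_number="
FNR==NR{ arr["=" $0]; next }

# second file (ip.txt): check only the lines that have direction=2
$3=="direction=2,"{
    for(i in arr)
        if(index($2,i)){
            print
            next
        }
}
```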

To run a solution, save it as `script.awk` and use `mawk -f script.awk list.txt ip.txt`
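To try this on the sample data, the files can be recreated in a scratch directory. This is only a sketch: the `tmpdir` and `result` names are made up for illustration, and plain `awk` is assumed in place of `mawk` (use whichever implementation is installed).

```shell
#!/bin/sh
# Recreate the sample files from this post in a scratch directory.
tmpdir=$(mktemp -d)

cat > "$tmpdir/ip.txt" <<'EOF'
caller_number=034082394234324, clear_number=33335345435,  direction=1,
caller_number=83479234234,     clear_number=34836424733, direction=2,
caller_number=83479234234,     clear_number=64237384533, direction=2,
EOF

printf '%s\n' 642 3333 534234235 > "$tmpdir/list.txt"

# Same logic as the script above, written out to a file.
cat > "$tmpdir/script.awk" <<'EOF'
FNR==NR{ arr["=" $0]; next }

$3=="direction=2,"{
    for(i in arr)
        if(index($2,i)){
            print
            next
        }
}
EOF

# 'awk' is an assumption here; substitute mawk or gawk as available.
result=$(awk -f "$tmpdir/script.awk" "$tmpdir/list.txt" "$tmpdir/ip.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Only the third line of `ip.txt` should be printed, matching `op.txt`.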

In my dreams that night, I realized that the solution can be improved drastically by looping over the prefixes of the digits after `clear_number=` instead of looping over the keys saved in `arr`. Checking whether a string is a key in `arr` is `O(1)`, so the time saving is huge: the inner loop now runs a maximum of 12 times (length of digits after `clear_number=`) instead of a maximum of 10K-50K times! With a 35M sample input file and 12K keys that I created for testing, I found this solution to be about 200 times faster.

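```
# first file (list.txt): save the keys as they are
FNR==NR{ arr[$0]; next }

$3=="direction=2,"{
    # skip the 13 characters of "clear_number="; val keeps the trailing comma
    val=substr($2,14)
    # i<length(val) tries every prefix of the digits but never the comma
    for(i=1; i<length(val); i++)
        if(substr(val,1,i) in arr){
            print
            next
        }
}
```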
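To see what the loop above does for a single line, here's a small trace of the prefix lookups. It is a sketch only: the keys and the field value are hard-coded from the sample files, the `trace`, `field` and `prefix` names are made up for illustration, and plain `awk` is assumed.

```shell
#!/bin/sh
# Trace the prefix lookups for the second field of the matching sample line.
trace=$(awk 'BEGIN {
    # keys from the sample list.txt
    arr["642"]; arr["3333"]; arr["534234235"]

    field = "clear_number=64237384533,"
    val = substr(field, 14)    # "64237384533," (trailing comma included)

    for (i = 1; i < length(val); i++) {
        prefix = substr(val, 1, i)
        if (prefix in arr) {
            print prefix, "-> found after", i, "lookups"
            exit
        }
        print prefix, "-> not a key"
    }
}')
printf '%s\n' "$trace"
```

For this value, the prefixes `6` and `64` are rejected and `642` is found on the third lookup, so the last line of the output is `642 -> found after 3 lookups`. The number of lookups depends only on the length of the value, not on the number of keys in `list.txt`.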